Migration from SAS to Python

Sas2py – Migration is the only way from SAS to Python


Introduction

The idea to write the SAS2py project started when the migration from SAS to Python seemed necessary. SAS is a widely-used, proprietary software for data management and statistical analysis. Despite its popularity, modern business demands have highlighted several shortcomings:

  • High costs for maintaining, developing, and licensing the software
  • Customization limits, which can frustrate advanced users
  • Smaller community than open-source alternatives like Python or R, making it harder to find developers or students familiar with it
  • Dependency on SAS, given its closed-source nature

Due to these factors, many companies are switching to Python and R for their flexibility, scalability, and active developer communities.


Date: September 16, 2024
Duration: 4 hours

Migration from SAS to Python

Migration from SAS to other languages, particularly Python, has become an essential task. Python is especially favored for its:

  • Scalability and open-source nature
  • Large community of developers
  • Availability of SAS-to-Python converters

However, converting code from SAS to Python is not without its challenges. Existing converters, while useful, tend to have accuracy issues (typically less than 90%), which necessitates additional debugging and refinement post-conversion.

The traditional “line-to-line” code conversion approach is highly inefficient, as it only changes the syntax without understanding the context of the code. Semantic Designs offers a solution for smaller SAS projects, but struggles with larger, more complex codebases.

Our Approach

Our solution utilizes advanced AI techniques and OpenAI’s O1 API. Instead of simply translating the code line-by-line, we optimize the converted code to achieve better performance and more readable, maintainable results. This goes beyond simple translation, using machine learning to create faster, cleaner code.

Steps in the process:

  1. Convert SAS to Python using O1 API: This not only translates the code but optimizes it for performance.
  2. Post-Processing Scripts: To merge different parts of the code together for better structure.
  3. CI/CD Automation: Ensuring continuous integration and smooth deployment.

Example: SAS to Python Code Conversion

Original SAS Code:

/* Survival Analysis in SAS using PROC LIFETEST and PROC PHREG */
data survival_data;
    input patient_id time status treatment age sex;
    datalines;
1  5  1  1  65 1
...
run;

/* Kaplan-Meier Survival Analysis */
proc lifetest data=survival_data plots=survival(atrisk);
    time timestatus(0);
    strata treatment;
run;

/* Cox Proportional Hazards Model */
proc phreg data=survival_data;
    class treatment(ref='1') sex(ref='0') / param=ref;
    model timestatus(0) = treatment age sex;
run;

Converted Python Code:

import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
import matplotlib.pyplot as plt

data = {
    'patient_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'time': [5, 8, 4, 7, 6, 9, 3, 10, 2, 5],
    'status': [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    'treatment': [1, 1, 2, 2, 1, 2, 1, 2, 1, 2],
    'age': [65, 70, 50, 55, 60, 75, 45, 80, 40, 68],
    'sex': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
}

df = pd.DataFrame(data)

# Kaplan-Meier Survival Analysis
kmf = KaplanMeierFitter()

# Separate data by treatment groups
groups = df['treatment']
ix = (groups == 1)

# Survival curve for Control and Treatment groups
T = df['time']
E = df['status']
kmf.fit(T[ix], event_observed=E[ix], label='Control')
kmf.plot_survival_function()
kmf.fit(T[~ix], event_observed=E[~ix], label='Treatment')
kmf.plot_survival_function()

plt.show()

# Fit Cox Proportional Hazards Model
df['treatment'] = df['treatment'].astype('category')
df['sex'] = df['sex'].astype('category')

cph = CoxPHFitter()
cph.fit(df, duration_col='time', event_col='status', formula='treatment + age + sex')
cph.print_summary()
cph.plot(hazard_ratios=True)
plt.show()

Conclusion

Using advanced AI technologies like OpenAI’s O1 API, we’ve successfully translated SAS code to Python, resulting in faster, optimized, and maintainable Python code. This approach offers a significant improvement over traditional methods, ensuring that companies can easily move away from SAS with minimal disruption while enjoying the benefits of open-source technologies.


This report summarizes the key steps taken during the migration process, offering a clearer understanding of the benefits and solutions used in the SAS-to-Python conversion.