`import os`

os.chdir('directory')

Best practice: don't do this; bring the data to you by opening Visual Studio Code in a project root folder and using relative paths. |\n", "| `use file.dta` | `df = pd.read_stata('file.dta')` |\n", "| `use varlist using dtafile` |

`df = pd.read_stata('dtafile', columns=varlist) `

|\n",
"| `import excel using excelfile` | `df = pd.read_excel('excelfile') `

|\n",
"| `import delimited using csvfile` | `df = pd.read_csv('csvfile') `

|\n",
"| `save filename, replace` | `df.to_stata('filename') `

Best practice: don't save data in .dta files. |\n", "| `outsheet using filename, comma` |

`df.to_csv('filename') `

|\n",
"| `export excel using filename` | `df.to_excel('filename') `

Best practice: don't save data in Excel files. |\n", "| `keep if condition` | `df = df[condition]` |\n", "| `drop if condition` | `df = df[~condition]` |\n", "| `keep variable` | `df = df[['variable']]` |\n", "| `keep varstem*` | `df = df.filter(regex='^varstem')` |\n", "| `drop variable` | `df = df.drop('variable', axis=1)` |\n", "| `drop varstem*` | `df = df.drop(df.filter(regex='^varstem').columns, axis=1)` |\n", "| `describe` | `df.info()` |\n", "| `describe variable` | `df['variable'].dtype` |\n", "| `count` | `len(df)` |\n", "| `count if condition` | `df[condition].shape[0]` |\n", "| `summ variable` | `df['variable'].describe()` |\n", "| `summ variable if condition` | `df.loc[condition, 'variable'].describe()` |\n", "| `gen newvar = expression` | `df['newvar'] = expression` |\n", "| `gen newvar = expression if condition` | `df.loc[condition, 'newvar'] = expression` |\n", "| `replace newvar = expression if condition` | `df.loc[condition, 'newvar'] = expression` |\n", "| `rename var newvar` | `df = df.rename(columns={'var': 'newvar'})` or `df.columns = list_new_columns` |\n", "| `subinstr(string, \" \", \"_\", .)` | `df['var'].str.replace(' ', '_')` |\n", "| `egen newvar = statistic(var), by(groupvars)` | `df['newvar'] = df.groupby(groupvars)['var'].transform('statistic')` |\n", "| `collapse (sd) var (median) var (max) var (min)

`import pyfixest as pf`

fit = pf.feols(\"yvar ~ xvar\", data=df[condition], vcov=\"HC2\")

|\n",
"| `reg yvar xvar if condition, vce(cluster clustervar)` | `import pyfixest as pf`

fit = pf.feols(\"yvar ~ xvar\", data=df[condition], vcov={\"CRV1\": \"clustervar\"})

|\n",
"| `areg yvar xvar, absorb(fe_var)` | `import pyfixest as pf`

fit = pf.feols(\"yvar ~ xvar \\| fe_var\", data=df)

|\n",
"| `_b[var], _se[var]` | `results_sw.coef()[\"var\"], results_sw.se()[\"var\"]` following creation of `results_sw` via `results_sw = pf.feols(...)` |\n",
"| `ivreg2 lwage exper expersq (educ=age)` | ` pf.feols(\"lwage ~ exper + expersq \\| educ ~ age\", data=dfiv) `

|\n",
"| `outreg2` | `results = pf.feols(...)` then `results.tidy()` |\n",
"| `binscatter` | `binsreg` from the [**binsreg**](https://pypi.org/project/binsreg/) package; see {ref}`regression-diagnostics`. |\n",
"| `twoway scatter var1 var2` | `df.plot.scatter(x='var2', y='var1')` |\n",
"\n",
"The table below presents further examples of doing regression with both the **statsmodels** and [**pyfixest**](https://s3alfisc.github.io/pyfixest/) packages.\n",
"\n",
"Note that, in the examples below, you need only run `import pyfixest as pf` once per Python session; to look at results, assign the fit with `results = pf.feols(...)` and then call `results.summary()`.\n",
"\n",
"| Command | Stata | Python |\n",
"| ----------- | ----------- | ----------- |\n",
"| Fixed Effects (absorbing) | `reghdfe y x, absorb(fe)` | `import pyfixest as pf`

fit = pf.feols(\"y ~ x \\| fe\", data=df)

|\n",
"| Categorical regression | `reghdfe y x i.cat` | `import pyfixest as pf`

fit = pf.feols(\"y ~ x + C(cat)\", data=df)

But if `cat` is of type categorical it can be run with `y ~ x + cat`|\n", "| Interacting categoricals | `reghdfe y x i.cat#i.cat2` |

`import pyfixest as pf`

fit = pf.feols(\"y ~ x + C(cat):C(cat2)\", data=df)

Note that `a*b` is a short-hand for `a + b + a:b`, with the last term representing the interaction.|\n", "| Robust standard errors | `reghdfe y x, r` |

`import pyfixest as pf`

fit = pf.feols(\"y ~ x\", data=df, vcov=\"HC1\")

Note that a range of heteroskedasticity-robust standard errors is available: see {ref}`regression` for more.|\n", "| Clustered standard errors | `reghdfe y x, cluster(clust)` |

`import pyfixest as pf`

fit = pf.feols(\"y ~ x\", data=df, vcov={\"CRV1\": \"clust\"})

|\n",
"| Two-way clustered standard errors | `reghdfe y x, cluster(clust1 clust2)` |`import pyfixest as pf`

fit = pf.feols(\"y ~ x\", data=df, vcov={\"CRV1\": \"clust1 + clust2\"})

|\n",
"| Instrumental variables | `ivreghdfe 2sls y exog (endog = instrument)` | `import pyfixest as pf`

fit = pf.feols(\"y ~ exog \\| endog ~ instrument\", data=df)

|"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": "0.8",
"jupytext_version": "1.5.0"
}
},
"kernelspec": {
"display_name": "Python 3.10.12 64-bit ('codeforecon': conda)",
"language": "python",
"name": "python3"
},
"source_map": [
14
]
},
"nbformat": 4,
"nbformat_minor": 5
}