SpecificationCurve

SpecificationCurve(
    self,
    df,
    y_endog=None,
    x_exog=None,
    controls=None,
    exclu_grps=[[None]],
    cat_expand=[],
    always_include=None,
    formula=None,
)

Specification curve object. Fits a model to every variant of a specification, stores the regression results in a tidy-format pandas DataFrame, and plots them in a chart that can optionally be saved. Multiple endogenous (dependent) and exogenous (independent) variables can be supplied, and the curve iterates over all of them. Note that categorical variables that are expanded cannot be mutually excluded from other expanded categorical variables.
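To give a sense of what "every variant of a specification" can mean, here is a minimal sketch that enumerates combinations with the standard library. It is not the package's implementation: the helper name `enumerate_specifications` is hypothetical, and it assumes that every subset of `controls` is tried and that a subset is skipped when it contains two or more members of the same exclusion group.

```python
from itertools import chain, combinations, product


def enumerate_specifications(y_endog, x_exog, controls, always_include=(), exclu_grps=()):
    """Hypothetical helper: list every (y, x, controls) combination.

    Assumptions (not confirmed by the source): every subset of `controls`
    is tried, and a subset is skipped if it holds two or more members of
    the same exclusion group.
    """
    # Accept either a single name or a list of names, as the parameters allow.
    ys = [y_endog] if isinstance(y_endog, str) else list(y_endog)
    xs = [x_exog] if isinstance(x_exog, str) else list(x_exog)
    always = [always_include] if isinstance(always_include, str) else list(always_include)

    # Power set of the controls, from the empty set up to all controls.
    subsets = chain.from_iterable(
        combinations(controls, r) for r in range(len(controls) + 1)
    )
    # Keep only subsets with at most one member of each exclusion group.
    allowed = [
        s for s in subsets
        if all(len(set(s) & set(grp)) <= 1 for grp in exclu_grps)
    ]
    return [
        {"y": y, "x": x, "controls": always + list(s)}
        for y, x, s in product(ys, xs, allowed)
    ]


specs = enumerate_specifications(
    y_endog=["y1", "y2"],
    x_exog="x1",
    controls=["c1", "c2", "c3"],
    exclu_grps=[["c1", "c2"]],
)
print(len(specs))  # 12: 2 outcomes x 1 regressor x 6 allowed control subsets
```

The count grows quickly with the number of controls, which is worth keeping in mind before fitting.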

The class can be initialized in two mutually exclusive ways:

1. Using a formula string (e.g., “y ~ x1 + x2”)
2. Using separate y_endog, x_exog, controls, and always_include parameters

Parameters

Name Type Description Default
df pd.DataFrame Input DataFrame required
formula str R-style formula string (e.g., “y ~ x1 + x2”) None
y_endog Union[str, List[str]] Dependent variable(s) None
x_exog Union[str, List[str]] Independent variable(s) None
controls List[str] Control variables None
exclu_grps Union[List[List[None]], List[str], str, List[List[str]]] Groups of variables to exclude. Defaults to [[None]] [[None]]
cat_expand Union[str, List[None], List[str], List[List[str]]] Categorical variables to expand. Defaults to [] []
always_include Union[str, List[str]] Variables to always include None

Raises

Name Type Description
ValueError If neither formula nor (y_endog, x_exog, controls) are provided
ValueError If both formula and (y_endog, x_exog, controls) are provided
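
The two ValueError conditions can be pictured with a small stand-alone check. This is only a sketch of the documented behavior, not the package's code: `check_init_args` is a hypothetical name, and it assumes the separate-argument route counts as "provided" when any of y_endog, x_exog, or controls is not None.

```python
def check_init_args(formula=None, y_endog=None, x_exog=None, controls=None):
    """Hypothetical check mirroring the documented ValueError conditions."""
    # Assumption: any non-None separate argument counts as "provided".
    separate_given = any(arg is not None for arg in (y_endog, x_exog, controls))
    if formula is None and not separate_given:
        raise ValueError("Provide either formula or (y_endog, x_exog, controls).")
    if formula is not None and separate_given:
        raise ValueError("Provide formula or (y_endog, x_exog, controls), not both.")
    # Report which initialization route was chosen.
    return "formula" if formula is not None else "separate"


print(check_init_args(formula="y ~ x1 + x2"))
print(check_init_args(y_endog="y", x_exog="x1", controls=["c1"]))
```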

Methods

Name Description
fit Fits a specification curve by performing regressions.
fit_null Refits all of the specifications under the null of \(y_{i(k)}^* = y_{i(k)} - b_k*x_{i(k)}\)
plot Makes a plot of the fitted specification curve. Optionally returns figure and axes for onward adjustment.

fit

SpecificationCurve.fit(estimator=sm.OLS)

Fits a specification curve by performing regressions.

Parameters

Name Type Description Default
estimator statsmodels.regression.linear_model or statsmodels.discrete.discrete_model statsmodels estimator. Defaults to sm.OLS. sm.OLS
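
A hedged usage sketch for building and fitting a curve from separate arguments follows. The wrapper name `fit_curve` is hypothetical, and the top-level import path `from specification_curve import SpecificationCurve` is an assumption about the package layout; the constructor and `.fit()` arguments are the ones documented above.

```python
def fit_curve(df, y_endog, x_exog, controls, estimator=None):
    """Build a SpecificationCurve from separate arguments and fit it."""
    # Imported lazily; the import path is an assumption about the package layout.
    from specification_curve import SpecificationCurve

    sc = SpecificationCurve(df, y_endog=y_endog, x_exog=x_exog, controls=controls)
    if estimator is None:
        sc.fit()  # falls back to the documented default, sm.OLS
    else:
        sc.fit(estimator=estimator)  # e.g. an estimator from statsmodels' discrete models
    return sc
```

With a DataFrame `df` that has columns named, say, "y", "x1", "c1" and "c2" (illustrative names), a call would look like `sc = fit_curve(df, "y", "x1", ["c1", "c2"])`. For a binary outcome, passing `estimator=sm.Logit` after `import statsmodels.api as sm` is one option consistent with the accepted estimator types.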

fit_null

SpecificationCurve.fit_null(n_boot=30, f_sample=0.1)

Refits all of the specifications under the null of \(y_{i(k)}^* = y_{i(k)} - b_k*x_{i(k)}\), where i indexes rows and k indexes specifications. Here i is a function of k because the y and x rows can change with k when there are multiple y_endog and multiple x_exog. In each bootstrap, a fraction f_sample of the rows is sampled and a new specification curve is fit under the null. Summary statistics are then computed for each specification, along with statistical tests.

Parameters

Name Type Description Default
n_boot int Number of bootstraps. Defaults to 30. 30
f_sample float Fraction of rows to sample in each bootstrap. Defaults to 0.1. 0.1

Raises

Name Type Description
ValueError If .fit() has not been run first.

Returns

Name Type Description
None None Results saved in self.null_stats_summary.
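
The null transformation used by fit_null can be illustrated with a single specification and plain Python. This is a simplified sketch, not the package's procedure: it uses a one-regressor OLS slope written by hand (`ols_slope` is a hypothetical helper), synthetic data, and it assumes rows are resampled with replacement, which the source does not state.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # synthetic data, true slope 2

b = ols_slope(x, y)                              # b_k for this specification
y_star = [yi - b * xi for xi, yi in zip(x, y)]   # y* = y - b_k * x
slope_under_null = ols_slope(x, y_star)          # zero up to rounding error

# Bootstrap under the null: sample a fraction of rows, then refit.
n_boot, f_sample = 30, 0.1
m = round(f_sample * len(x))
boot_slopes = []
for _ in range(n_boot):
    # Assumption: rows are drawn with replacement.
    idx = [rng.randrange(len(x)) for _ in range(m)]
    boot_slopes.append(ols_slope([x[i] for i in idx], [y_star[i] for i in idx]))

print(abs(slope_under_null) < 1e-9, len(boot_slopes))
```

The spread of `boot_slopes` shows how large an estimate could be by chance when the true effect is zero, which is the comparison the null curve is meant to support.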

plot

SpecificationCurve.plot(
    save_path=None,
    pretty_plots=True,
    preferred_spec=[],
    show_null_curve=False,
    return_fig=False,
    **kwargs,
)

Makes a plot of the fitted specification curve. Optionally returns figure and axes for onward adjustment.

Parameters

Name Type Description Default
save_path string or path Filename for the exported figure. Defaults to None. None
pretty_plots bool Whether to use this package’s figure formatting. Defaults to True. True
preferred_spec list Preferred specification. Defaults to []. []
show_null_curve bool Whether to include the curve under the null. Defaults to False. False
return_fig bool Whether to return the figure and axes objects. Defaults to False. False
**kwargs dict Additional arguments passed to .fit_null() when show_null_curve is True: n_boot (int), the number of bootstrap iterations for the null curve calculation, e.g. **{"n_boot": 5}; and f_sample (float), the fraction of rows to sample in each bootstrap, which defaults to 0.1. {}

Returns

Name Type Description
Union[None, tuple[mpl.figure.Figure, List[mpl.axes._axes.Axes]]] None, or the figure and axes holding the chart when return_fig is True.

Raises

Name Type Description
ValueError If .plot() is called before .fit() - the fit must be run first.
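
As a sketch of how the plot arguments fit together, the hypothetical wrapper below asks for the null curve and gets the figure and axes back for further adjustment. It assumes `sc` is a SpecificationCurve on which .fit() has already been called; the keyword names are the documented ones, with n_boot and f_sample forwarded to .fit_null() through **kwargs.

```python
def plot_with_null_curve(sc, n_boot=5, f_sample=0.1, save_path=None):
    """Plot a fitted curve with its null curve and return (fig, axes).

    `sc` is assumed to be a SpecificationCurve after .fit() has been run;
    calling .plot() before .fit() raises ValueError.
    """
    fig, axes = sc.plot(
        save_path=save_path,
        show_null_curve=True,  # triggers the null-curve calculation
        return_fig=True,       # ask for the figure and axes back
        n_boot=n_boot,         # forwarded to .fit_null() via **kwargs
        f_sample=f_sample,     # forwarded to .fit_null() via **kwargs
    )
    return fig, axes
```

The returned matplotlib objects can then be adjusted in the usual way, for example by calling `fig.suptitle(...)` before saving.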