prep_data_for_model_run()
Expects a data frame in the following format:
datetime | measure | region | value
Usage
prep_data_for_model_run(
    df,
    macro_names,
    region_names,
    region_covariate_names,
    aggregate_measure="gva_q_on_q",
    aggregation_region="uk",
    region_measure="gva_q_on_4q",
    region_q_on_q_measure=None
)
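As an illustration of the long format the function expects, the sketch below builds a small `df` with the columns datetime | measure | region | value. The GVA levels, the region name "north" and the helper `to_long` are hypothetical and exist only for this example. It also assumes that q-on-q means growth on the previous quarter and q-on-4q means growth on the same quarter one year earlier, both stored as decimals.

```python
import pandas as pd

# Hypothetical quarter-end dates and made-up GVA levels, for illustration only.
quarters = pd.to_datetime(
    ["2022-03-31", "2022-06-30", "2022-09-30",
     "2022-12-31", "2023-03-31", "2023-06-30"]
)
levels = pd.DataFrame(
    {
        "uk": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
        "north": [40.0, 40.4, 40.8, 41.2, 42.0, 42.4],
    },
    index=quarters,
)


def to_long(wide: pd.DataFrame, measure: str) -> pd.DataFrame:
    """Melt a wide (quarter x region) frame into long rows (hypothetical helper)."""
    long = (
        wide.rename_axis("datetime")
        .reset_index()
        .melt(id_vars="datetime", var_name="region", value_name="value")
    )
    long["measure"] = measure
    # Drop the leading quarters where no growth rate can be computed.
    long = long.dropna(subset=["value"])
    return long[["datetime", "measure", "region", "value"]]


# Assumed definitions: decimal growth rates, not percentages.
uk_q_on_q = levels[["uk"]] / levels[["uk"]].shift(1) - 1
north_q_on_4q = levels[["north"]] / levels[["north"]].shift(4) - 1

df = pd.concat(
    [to_long(uk_q_on_q, "gva_q_on_q"), to_long(north_q_on_4q, "gva_q_on_4q")],
    ignore_index=True,
)
print(df)
```

A real input would also carry rows for every series named in `macro_names` and `region_covariate_names`, stacked into the same four columns.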
Parameters
df: pd.DataFrame
Long-format dataframe containing all input data.
macro_names: list[str]
Names of the UK macro series.
region_names: list[str]
Names of the (local) regions.
region_covariate_names: list[str]
Names of the regional-level series. These are absorbed into the exogenous factors.
aggregate_measure: str = "gva_q_on_q"
UK-wide measure, q-on-q growth rate. Defaults to "gva_q_on_q".
aggregation_region: str = "uk"
Highest-level geography, to which the other regions sum. Defaults to "uk".
region_measure: str = "gva_q_on_4q"
Regional measure, q-on-4q growth rate. Defaults to "gva_q_on_4q".
region_q_on_q_measure: str | None = None
Optional measure name in df supplying published quarterly (q-on-q) growth rates for any subset of regions and quarters. Rows live in the same long dataframe as every other input, with the standard columns datetime | measure | region | value:
* ``datetime``: quarter-end timestamp.
* ``measure``: equal to the string passed here.
* ``region``: one of the names in ``region_names``.
* ``value``: decimal q-on-q growth rate (e.g. ``0.005`` for 0.5%, **not** a percentage), matching the convention used for ``aggregate_measure`` / ``region_measure``.
Partial coverage is fully supported: rows may be supplied for only a subset of regions and a subset of quarters. After pivoting, absent region/quarter pairs become NaN and are automatically masked out of the downstream likelihood, so uncovered regions and quarters contribute nothing.
Defaults to None (no q-on-q observations, backwards compatible).
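The partial-coverage behaviour described for `region_q_on_q_measure` can be sketched with plain pandas. This is not the function's actual implementation. It is a minimal illustration, under assumed names, of how pivoting long rows and reindexing to the full quarter and region grid leaves NaN in every uncovered cell. The region names, the measure name "gva_q_on_q_published" and the values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical model grid: T = 4 quarters, R = 3 regions.
region_names = ["north", "south", "wales"]
quarters = pd.to_datetime(
    ["2022-12-31", "2023-03-31", "2023-06-30", "2023-09-30"]
)

# Published q-on-q rows covering only some regions and some quarters.
qoq_rows = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2023-03-31", "2023-06-30", "2023-06-30"]),
        "measure": "gva_q_on_q_published",  # hypothetical measure name
        "region": ["north", "north", "south"],
        "value": [0.005, 0.002, -0.001],  # decimals, not percentages
    }
)

# Pivot to quarter x region, then align to the full grid; absent pairs become NaN.
panel = (
    qoq_rows.pivot(index="datetime", columns="region", values="value")
    .reindex(index=quarters, columns=region_names)
)
y_qoq = panel.to_numpy(dtype=float)  # shape (T, R), columns ordered as region_names

# Boolean mask of observed cells; a likelihood could skip the False entries.
observed = ~np.isnan(y_qoq)
print(panel)
print(observed.sum(), "observed cells out of", observed.size)
```

Here "wales" has no published rows, so its whole column is NaN, and only the three supplied region/quarter pairs are marked as observed.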
Returns
tuple
(y_uk_extracted, y_a_r_extracted, Z_panel_extract, macro_extracted, lag_qtrs, y_qoq_r_extracted). Element types include npt.NDArray[np.float64] and list[npt.NDArray[np.float64]].
y_qoq_r_extracted is a (T, R) array aligned to region_names with NaN in unobserved cells, or None when region_q_on_q_measure is None.