prep_data_for_model_run()

Expects a long-format data frame with the following columns:

datetime | measure | region | value

Usage

prep_data_for_model_run(
    df,
    macro_names,
    region_names,
    region_covariate_names,
    aggregate_measure="gva_q_on_q",
    aggregation_region="uk",
    region_measure="gva_q_on_4q",
    region_q_on_q_measure=None
)
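As a rough illustration of the expected input, the sketch below builds a tiny long-format frame with the four columns datetime, measure, region and value. The measure names are the documented defaults; the region names other than "uk", the dates and the values are hypothetical and chosen only to show the layout.

```python
import pandas as pd

# Hypothetical quarter-end dates; real data would span many more quarters.
dates = pd.to_datetime(["2023-03-31", "2023-06-30"])

rows = [
    # (datetime, measure, region, value) with values as decimal growth rates
    (dates[0], "gva_q_on_q", "uk", 0.002),
    (dates[1], "gva_q_on_q", "uk", 0.001),
    (dates[0], "gva_q_on_4q", "north_east", 0.011),  # hypothetical region
    (dates[1], "gva_q_on_4q", "north_east", 0.009),
    (dates[0], "gva_q_on_4q", "london", 0.015),      # hypothetical region
    (dates[1], "gva_q_on_4q", "london", 0.013),
]

# One observation per row: the long format described above.
df = pd.DataFrame(rows, columns=["datetime", "measure", "region", "value"])
```

Macro series and regional covariates would be added as further rows in the same frame, each under its own measure name.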

Parameters

df: pd.DataFrame

Long-format dataframe containing all input data.

macro_names: list[str]

Names of macro UK series

region_names: list[str]

Names of (local) regions

region_covariate_names: list[str]

Names of regional-level series. These will be absorbed into exogenous factors.

aggregate_measure: str = "gva_q_on_q"

UK-wide measure in q-on-q growth rate. Defaults to “gva_q_on_q”.

aggregation_region: str = "uk"

Highest level geography, which other regions sum to. Defaults to “uk”.

region_measure: str = "gva_q_on_4q"

Regional measure, q-on-4q growth rate. Defaults to “gva_q_on_4q”.

region_q_on_q_measure: str | None = None

Optional measure name in df supplying published quarterly (q-on-q) growth rates for any subset of regions and quarters. Rows live in the same long dataframe as every other input, with the standard columns datetime | measure | region | value:

* ``datetime``: quarter-end timestamp.
* ``measure``: equal to the string passed here.
* ``region``: one of the names in ``region_names``.
* ``value``: decimal q-on-q growth rate (e.g. ``0.005`` for 0.5%, **not** a percentage), matching the convention used for ``aggregate_measure`` / ``region_measure``.

Partial coverage is supported: rows may cover only a subset of regions and only a subset of quarters. After pivoting, absent region/quarter pairs become NaN and are masked out of the downstream likelihood, so they contribute nothing to it. Defaults to None (no q-on-q observations), which is backwards compatible.
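The pivot-and-mask behaviour described for partial coverage can be sketched as follows. This is not the function's actual implementation, only one plausible way to obtain a (T, R) array aligned to region_names with NaN in unobserved cells. The measure name "regional_gva_q_on_q", the region names, the dates and the values are all hypothetical.

```python
import numpy as np
import pandas as pd

region_names = ["north_east", "london", "wales"]  # hypothetical regions
quarters = pd.to_datetime(["2023-03-31", "2023-06-30", "2023-09-30"])

# Published q-on-q rows for only some regions and some quarters.
qoq_rows = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2023-03-31", "2023-06-30", "2023-06-30"]),
        "measure": "regional_gva_q_on_q",  # hypothetical measure name
        "region": ["london", "london", "wales"],
        "value": [0.004, 0.003, -0.001],
    }
)

# Pivot to wide form, then align to the full quarter index and region order.
# Region/quarter pairs with no row become NaN.
wide = qoq_rows.pivot(index="datetime", columns="region", values="value").reindex(
    index=quarters, columns=region_names
)

y_qoq = wide.to_numpy(dtype=np.float64)  # shape (T, R)
observed = ~np.isnan(y_qoq)              # True where a q-on-q value was supplied
```

Here "north_east" and the final quarter have no rows at all, so their cells are NaN and would be excluded by a mask such as `observed`.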

Returns

tuple

(y_uk_extracted, y_a_r_extracted, Z_panel_extract, macro_extracted, lag_qtrs, y_qoq_r_extracted). The annotated element types include npt.NDArray[np.float64] and list[npt.NDArray[np.float64]]. y_qoq_r_extracted is a (T, R) array aligned to region_names with NaN in unobserved cells, or None when region_q_on_q_measure is None.