run_out_of_sample_exercise()

Run an out-of-sample exercise to evaluate model performance.

Usage

run_out_of_sample_exercise(
    df,
    macro_names,
    region_names,
    region_covariate_names,
    n_factors=4,
    aggregate_measure="gva_q_on_q",
    aggregation_region="uk",
    region_measure="gva_q_on_4q",
    region_q_on_q_measure=None,
    step_size=1,
    init_chunk_size=20,
    lag_qtrs=6,
    lag_qtrs_qoq=1,
    n_its=100000,
    n_posterior_samples=3000
)

Masks the most recent lag_qtrs quarters of annual regional data and fits the model in rolling chunks. The out-of-sample nowcasts and outturns from each step are collected and returned.
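As a rough illustration of how init_chunk_size and step_size interact, the sketch below lists where each step's data window could end. It assumes the window end advances by step_size from init_chunk_size until the data run out; the helper name oos_step_ends is hypothetical and the function's actual windowing may differ.

```python
def oos_step_ends(n_quarters, init_chunk_size=20, step_size=1):
    """Exclusive end index of the data window at each OOS step.

    Hypothetical helper: assumes the first window covers the first
    `init_chunk_size` quarters and each later step adds `step_size` more.
    """
    return list(range(init_chunk_size, n_quarters + 1, step_size))


# With 24 quarters of data and a step of 2, three fits would be run.
ends = oos_step_ends(24, init_chunk_size=20, step_size=2)
```

Under this assumption, a larger step_size reduces the number of model fits, which matters when each fit runs n_its iterations.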

Parameters

df: pd.DataFrame

Dataframe containing the relevant columns. Measures are stored in long format (datetime | measure | region | value).

macro_names: list[str]

Names of macro series.

region_names: list[str]

Names of regions.

region_covariate_names: list[str]

Names of by-region covariates.

n_factors: int = 4

Number of factors. Defaults to 4.

aggregate_measure: str = "gva_q_on_q"

Nationwide measure. Defaults to "gva_q_on_q".

aggregation_region: str = "uk"

Top-level geography. Defaults to "uk".

region_measure: str = "gva_q_on_4q"

Regional growth measure. Defaults to "gva_q_on_4q".

region_q_on_q_measure: str | None = None

Optional name of a measure in df that supplies published quarterly (q-on-q) regional growth rates, used as a hard clamp on y_reg. It uses the same format as every other measure (datetime | measure | region | value), and values must be decimal growth rates. At every step, this exercise NaN-masks these rows inside the out-of-sample window (as it does for region_measure), so published quarterly values in that window do not leak into the nowcast. Older published values remain as clamps. Defaults to None (no q-on-q clamp, backwards compatible).
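The long format and the NaN-masking described above can be illustrated with a small pandas sketch. The helper mask_recent, the measure name "gva_q_on_q_regional", the region "scotland", and the values are all hypothetical; this is not the library's internal masking code.

```python
import numpy as np
import pandas as pd


def mask_recent(df, measure, cutoff):
    """Return a copy of `df` with `value` set to NaN for rows of `measure`
    dated on or after `cutoff` (hypothetical helper)."""
    out = df.copy()
    hit = (out["measure"] == measure) & (out["datetime"] >= cutoff)
    out.loc[hit, "value"] = np.nan
    return out


# Toy long-format data: datetime | measure | region | value,
# with values as decimal growth rates (0.004 means 0.4%).
df = pd.DataFrame({
    "datetime": pd.date_range("2022-01-01", periods=4, freq="QS"),
    "measure": "gva_q_on_q_regional",
    "region": "scotland",
    "value": [0.004, 0.002, -0.001, 0.003],
})

# Rows from 2022 Q3 onwards are hidden; earlier rows stay available as clamps.
masked = mask_recent(df, "gva_q_on_q_regional", pd.Timestamp("2022-07-01"))
```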

step_size: int = 1

Quarters to advance per OOS step. Defaults to 1.

init_chunk_size: int = 20

Initial learning window size. Defaults to 20.

lag_qtrs: int = 6

Publication lag of annual regional data, in quarters; drives the OOS mask for region_measure. Tuned for the ONS regional annual GVA release (~6 quarters). Defaults to 6.

lag_qtrs_qoq: int = 1

Publication lag of quarterly regional data, in quarters; drives a separate OOS mask for region_q_on_q_measure. Scottish Government quarterly GDP is published with a lag of about 1 quarter, so a smaller value than lag_qtrs is realistic. Only used when region_q_on_q_measure is not None. Defaults to 1.
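To make the difference between the two lags concrete, the sketch below lists which quarters would be hidden at a given step. It assumes each mask covers the last lag quarters up to and including the end of the step's window; the helper masked_quarters is hypothetical and the exact boundary convention used by the function is not stated here.

```python
import pandas as pd


def masked_quarters(window_end, lag):
    """Quarters assumed hidden at a step: the last `lag` quarters up to and
    including `window_end` (hypothetical helper)."""
    return [window_end - i for i in range(lag - 1, -1, -1)]


end = pd.Period("2023Q4", freq="Q")

# Annual regional data (lag_qtrs=6): six quarters are hidden.
annual_mask = [str(q) for q in masked_quarters(end, 6)]

# Quarterly regional data (lag_qtrs_qoq=1): only the latest quarter is hidden.
qoq_mask = [str(q) for q in masked_quarters(end, 1)]
```

Under this assumption, the quarterly clamp stays informative for five of the six quarters in which the annual measure is masked.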

n_its: int = 100000

Number of ADVI iterations used for Bayesian inference. Defaults to 100000.

n_posterior_samples: int = 3000

Number of posterior samples to draw. Defaults to 3000.

Returns

pd.DataFrame

Combined out-of-sample nowcasts and outturns across all rolling steps.
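One way to summarise the returned nowcasts and outturns is a per-region root-mean-square error. The sketch below assumes the result has columns named "region", "nowcast" and "outturn"; these column names are hypothetical, as the schema of the returned dataframe is not documented here, and the toy values are illustrative only.

```python
import pandas as pd


def rmse_by_region(results, nowcast_col="nowcast", outturn_col="outturn",
                   region_col="region"):
    """Root-mean-square nowcast error per region.

    Column names are assumptions; adjust them to the actual output.
    """
    err = results[nowcast_col] - results[outturn_col]
    return (err ** 2).groupby(results[region_col]).mean().pow(0.5)


# Toy stand-in for the function's output.
toy = pd.DataFrame({
    "region": ["a", "a", "b", "b"],
    "nowcast": [0.02, 0.00, 0.03, 0.01],
    "outturn": [0.01, 0.01, 0.01, 0.03],
})
scores = rmse_by_region(toy)
```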