----------------------------------------------------------------------
This is the API documentation for the ambric library.
----------------------------------------------------------------------

## Classes

Main classes provided by the package

Ambric(df: pandas.core.frame.DataFrame, macro_names: list[str], region_names: list[str], region_covariate_names: list[str], n_factors: int = 4, aggregate_measure: str = 'gva_q_on_q', aggregation_region: str = 'uk', region_measure: str = 'gva_q_on_4q', region_q_on_q_measure: str | None = None)

    Augmented Mixed-frequency Bayesian Regional Inference with Constraints.

    Combines factor-analytic Bayesian state-space inference with XGBoost-driven predictions via a MIDAS bridge equation for regional nowcasting.

## Ambric Methods

Methods for the Ambric class

__repr__(self) -> str

fit(self, n_model_fit_iterations: int = 200000, n_posterior_samples: int = 3000, xgb_params: dict | None = None, bridge_use_almon: bool = True, bridge_ridge_alpha: float = 1.0) -> 'Ambric'

    Fit the ambric model.

    Pipeline:

    1. Extract factors from regional indicator panel.
    2. Train XGBoost on annually-aggregated raw indicators to predict annual regional growth.
    3. Fit MIDAS bridge equation to disaggregate XGBoost annual predictions to quarterly frequency.
    4. Build Bayesian state-space model with factors, macro, and bridge signal.
    5. Run variational inference.

    Args:
        n_model_fit_iterations: Number of ADVI iterations.
        n_posterior_samples: Number of posterior samples to draw.
        xgb_params: Optional XGBoost hyperparameters override.
        bridge_use_almon: Use Almon polynomial for MIDAS weights.
        bridge_ridge_alpha: Ridge regularisation for bridge equation.

    Returns:
        Self for method chaining.

save_trace(self, path: str | pathlib.Path) -> None

    Saves the model trace to a NetCDF file.
    Args:
        path (str | Path): Path to save the trace file.

populate_results(self) -> pandas.core.frame.DataFrame

    Returns results from model estimation, together with the original data, in the format:

    | datetime | region | value | measure | type |

    where ``type`` is either "outturn" or "nowcast".

    Raises:
        ValueError: If model not fitted.

    Returns:
        pd.DataFrame: Dataframe of results.

plot_national_quarterly_vs_implied(self, path: str | pathlib.Path | None = None) -> None

    Plot national quarterly growth rates against the estimates implied by the model.

    Args:
        path (str | Path | None, optional): Directory in which to save the image. Defaults to None.

    Raises:
        ValueError: If model not fitted.

plot_regional_annual_estimate(self, path: str | pathlib.Path | None = None) -> None

    Plot regional annual growth rates against the model's estimates.

    Args:
        path (str | Path | None, optional): Directory in which to save the figure. Defaults to None.

    Raises:
        ValueError: If model not fitted.

plot_single_region_annual_estimate(self, region_name: str, path: str | pathlib.Path | None = None) -> None

    Plot a single region's annual growth rates against the model's estimates.

    Args:
        region_name (str): The region to plot.
        path (str | Path | None, optional): Directory in which to save the figure. Defaults to None.

    Raises:
        ValueError: If model not fitted.

plot_estimated_regional_quarterly(self, path: str | pathlib.Path | None = None) -> None

    Plot estimated regional quarterly growth rates from the model.

    Args:
        path (str | Path | None, optional): Directory in which to save the figure. Defaults to None.

    Raises:
        ValueError: If model not fitted.

plot_current_nowcast(self, path: pathlib.Path | None = None) -> None

    Plot the latest nowcast (i.e. the period for which no annual regional observations are available).

    Args:
        path (Path | None, optional): Directory in which to save the figure. Defaults to None.

    Raises:
        ValueError: If model not fitted.

assemble_loadings_data(self) -> pandas.core.frame.DataFrame

    Assemble estimated loadings from the model posterior.
Separates data assembly from plotting so the returned frame can be inspected, exported, or passed to the companion plot methods. The frame contains one row per (region, loading) combination with the posterior mean and 94 % HDI bounds. Loadings are scaled by the standard deviation of their corresponding input variable so that the three signal types are on a comparable *contribution* scale. Raises: ValueError: If the model has not been fitted yet. Returns: pd.DataFrame: Long-format loadings frame; see :func:`~ambric.diagnostics.assemble_loadings_data` for column details. plot_loadings_by_region(self, path: pathlib.Path | None = None) -> None Plot estimated loadings for each region, coloured by broad type. Assembles loadings from the posterior and passes them to :func:`~ambric.diagnostics.plot_loadings_by_region`. One panel per region shows all factor, macro, and bridge-signal loadings as a horizontal dot chart with 94 % HDI bars, enabling within-region comparison of the three signal categories. Args: path (Path | None): Directory in which to save the figure as SVG. When ``None`` the figure is displayed interactively. Raises: ValueError: If the model has not been fitted yet. plot_loadings_aggregate(self, path: pathlib.Path | None = None) -> None Plot loading distributions across regions, grouped by broad type. Assembles loadings from the posterior and passes them to :func:`~ambric.diagnostics.plot_loadings_aggregate`. One panel per broad loading type (factors, macro, boost_signal) compares individual region estimates against the cross-region mean, enabling assessment of which signal category dominates model dynamics and how consistently loadings behave across regions. Args: path (Path | None): Directory in which to save the figure as SVG. When ``None`` the figure is displayed interactively. Raises: ValueError: If the model has not been fitted yet. 
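
As a rough illustration of the summary described for `assemble_loadings_data`, the sketch below scales posterior draws of a single loading by the standard deviation of its input series and reports the posterior mean with a 94 % highest-density interval. It uses NumPy only and is not the package's implementation: `hdi_bounds`, `summarise_scaled_loading`, and the simulated draws are hypothetical names and data introduced here for illustration.

```python
import numpy as np


def hdi_bounds(samples: np.ndarray, prob: float = 0.94) -> tuple[float, float]:
    """Return the narrowest interval containing `prob` of the 1-D samples."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = ordered.size
    # Number of ordered samples that must fall inside the interval.
    n_inside = int(np.ceil(prob * n))
    if n_inside < 2 or n_inside > n:
        raise ValueError("need more samples for the requested probability")
    # Width of every candidate interval spanning n_inside consecutive samples.
    widths = ordered[n_inside - 1:] - ordered[: n - n_inside + 1]
    start = int(np.argmin(widths))
    return float(ordered[start]), float(ordered[start + n_inside - 1])


def summarise_scaled_loading(
    draws: np.ndarray, input_series: np.ndarray, prob: float = 0.94
) -> dict[str, float]:
    """Scale loading draws by the input's standard deviation, then summarise."""
    scaled = np.asarray(draws, dtype=float) * float(np.std(input_series))
    lower, upper = hdi_bounds(scaled, prob)
    return {"mean": float(scaled.mean()), "hdi_lower": lower, "hdi_upper": upper}


# Hypothetical posterior draws for one loading and its input series.
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.5, scale=0.1, size=4000)
input_series = rng.normal(scale=2.0, size=120)
summary = summarise_scaled_loading(draws, input_series)
print(summary)
```

Scaling before summarising means the interval is expressed in contribution units, so factor, macro, and bridge-signal loadings can be compared on the same axis.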
bands_indicator(self, path: pathlib.Path | None = None) -> pandas.core.frame.DataFrame

    Produce a table classifying each region's growth into bands.

    Uses seasonally adjusted q-on-q growth estimates at quarterly frequency.

    Args:
        path (Path | None): Directory to save the table as Parquet. When ``None`` no file is written.

    Raises:
        ValueError: If model not fitted.

    Returns:
        pd.DataFrame: Wide-format table with datetime index and one column per region containing the classification.

point_estimates_q_on_4q(self, path: pathlib.Path | None = None) -> pandas.core.frame.DataFrame

    Produce a table of q-on-4q nowcast point estimates by region.

    Returns q-on-4q annual growth estimates (in percentage points, rounded to 2 d.p.) at quarterly frequency.

    Args:
        path (Path | None): Directory to save the table as Parquet. When ``None`` no file is written.

    Raises:
        ValueError: If model not fitted.

    Returns:
        pd.DataFrame: Wide-format table with datetime index and one column per region containing the point estimate.

point_estimates_q_on_q(self, path: pathlib.Path | None = None) -> pandas.core.frame.DataFrame

    Produce a table of q-on-q nowcast point estimates by region.

    Returns q-on-q growth estimates (in percentage points, rounded to 2 d.p.) at quarterly frequency.

    Args:
        path (Path | None): Directory to save the table as Parquet. When ``None`` no file is written.

    Raises:
        ValueError: If model not fitted.

    Returns:
        pd.DataFrame: Wide-format table with datetime index and one column per region containing the point estimate.

to_index_q_on_q(self, path: pathlib.Path | None = None) -> pandas.core.frame.DataFrame

    Produce a table of the nowcast index by region.

    Returns an index rebased to the earliest data point (rounded to 2 d.p.) at quarterly frequency.

    Args:
        path (Path | None): Directory to save the table as Parquet. When ``None`` no file is written.

    Raises:
        ValueError: If model not fitted.

    Returns:
        pd.DataFrame: Wide-format table with datetime index and one column per region containing the index value.
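
The three growth tables above are linked by simple arithmetic: chaining q-on-q growth rates gives an index, and comparing the index with its value four quarters earlier gives q-on-4q growth. The sketch below shows that arithmetic with pandas. It is not the package's code: `growth_to_index` and `index_to_q_on_4q` are hypothetical helpers, and setting the first period to exactly 100 is an assumption about the rebasing convention.

```python
import pandas as pd


def growth_to_index(q_on_q: pd.Series, base: float = 100.0) -> pd.Series:
    """Chain decimal q-on-q growth rates into an index level."""
    # level_t = level_{t-1} * (1 + g_t)
    levels = (1.0 + q_on_q).cumprod()
    # Assumption: rebase so that the first period equals `base`.
    return base * levels / levels.iloc[0]


def index_to_q_on_4q(index: pd.Series) -> pd.Series:
    """Growth relative to the same quarter one year earlier (decimal)."""
    return index / index.shift(4) - 1.0


# Hypothetical series: 1% growth in every quarter for two years.
quarters = pd.period_range("2020Q1", periods=8, freq="Q")
q_on_q = pd.Series(0.01, index=quarters)

index = growth_to_index(q_on_q)
q_on_4q = index_to_q_on_4q(index)
print(index.round(2))
print((100 * q_on_4q).round(2))  # percentage points, NaN for the first year
```

The first four q-on-4q values are NaN because no observation exists four quarters earlier.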
seasonally_adjusted_index_and_growth_by_region(self, path: pathlib.Path | None = None) -> pandas.core.frame.DataFrame Produce seasonally adjusted index and q-on-q growth rates Returns data to earliest data point (rounded to 2 d.p.) at quarterly frequency. Args: path (Path | None): Directory to save the table as Parquet. When ``None`` no file is written. Raises: ValueError: If model not fitted. Returns: pd.DataFrame: Wide-format table with datetime index and one column per region containing the point estimate. ## Functions Utility functions bands_indicator_out_of_sample_results(df_results: pandas.core.frame.DataFrame, path: pathlib.Path | None = None, bands: list[float] = [-0.8, -0.1, 0.1, 0.8]) -> pandas.core.frame.DataFrame Bands classification of out-of-sample trend q-on-q nowcasts. For every ``(region, datetime, quarters_to_publication, nowcast_index)`` in the OOS results, extracts the X13 trend of the q-on-q nowcast (per ``quarters_to_publication`` slice), converts it back to q-on-q growth in percentage points, and classifies into bands using :func:`~ambric.diagnostics.bands_indicator`. Args: df_results (pd.DataFrame): Output of :func:`run_out_of_sample_exercise`. path (Path | None): Directory to save the table as Parquet. When ``None`` no file is written. bands (list[float]): Interior bin edges in percentage points. Defaults to ``[-0.8, -0.1, 0.1, 0.8]``. Returns: pd.DataFrame: Long-format frame with columns ``region``, ``datetime``, ``quarters_to_publication``, ``nowcast_index``, ``classification``. 
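
To make the band classification concrete, here is a minimal sketch of binning q-on-q growth (in percentage points) with the default interior edges ``[-0.8, -0.1, 0.1, 0.8]``. Four interior edges produce five bands. The function `classify_into_bands` and the band labels are hypothetical, chosen for illustration; the labels and edge-inclusion rule used by `ambric.diagnostics.bands_indicator` may differ.

```python
import numpy as np
import pandas as pd

DEFAULT_BANDS = [-0.8, -0.1, 0.1, 0.8]
# Hypothetical labels: one per band, ordered from lowest to highest growth.
BAND_LABELS = ["strong contraction", "contraction", "flat", "growth", "strong growth"]


def classify_into_bands(growth_pp, bands=None, labels=None) -> pd.Series:
    """Map growth in percentage points to a band label."""
    edges = list(DEFAULT_BANDS if bands is None else bands)
    names = list(BAND_LABELS if labels is None else labels)
    if len(names) != len(edges) + 1:
        raise ValueError("need exactly one more label than interior edges")
    values = growth_pp.to_numpy(dtype=float)
    # np.digitize returns the bin number 0..len(edges); a value equal to an
    # edge falls in the band above that edge.
    codes = np.digitize(values, edges)
    result = pd.Series(
        np.array(names, dtype=object)[codes],
        index=growth_pp.index,
        name="classification",
    )
    # Missing growth values stay missing instead of landing in the top band.
    return result.where(growth_pp.notna())


growth = pd.Series([-1.0, -0.5, 0.0, 0.5, 1.0, np.nan])
classified = classify_into_bands(growth)
print(classified)
```

Passing ``None`` and filling in the default inside the function avoids sharing one mutable list between calls.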
build_ambric_model(y_uk: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], y_annual: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], factors: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], macro: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], bridge_signal: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], region_q_on_q: numpy.ndarray[tuple[Any, ...], numpy.dtype[numpy.float64]] | None = None) -> pymc.model.core.Model Build the AMBRIC Bayesian state-space model. The exogenous mean for latent regional growth is: mu_exog[t,r] = Lambda[r] @ F[t] + Gamma[r] @ X[t] + delta_r * s[t,r] where s[t,r] is the quarterly bridge signal from XGBoost + MIDAS. delta_r has a hierarchical shrinkage prior centred at zero. Args: y_uk: UK quarterly growth rates, shape (T,). y_annual: Regional annual growth rates, shape (T, R). factors: Extracted factors, shape (T, K). macro: Macro UK series, shape (T, M). bridge_signal: Quarterly bridge signal, shape (T, R). region_q_on_q: Optional published quarterly regional growth rates, shape ``(T, R)`` with NaN in unobserved cells. Values must be decimal growth rates (e.g. ``0.005`` for 0.5%). When supplied, ``y_reg`` is built as a hybrid ``pm.Deterministic``: at every ``(t, r)`` with an observed value, ``y_reg[t, r]`` is hard-clamped to that value; at NaN cells, ``y_reg[t, r]`` equals the sampled latent ``y_reg_free[t, r]``. The clamped values then propagate through the UK aggregation, annual aggregation, and AR(1) dynamics, informing unobserved neighbours rather than competing with them through a likelihood term. Defaults to ``None`` (no clamp — backwards compatible). Returns: PyMC model object. oos_q_on_4q_performance_table(df_results: pandas.core.frame.DataFrame, region_measure: str, path: pathlib.Path | None = None) -> pandas.core.frame.DataFrame Compute up/down classification accuracy by region and horizon. 
Compares the sign of each nowcast to its corresponding outturn. Returns a pivot table of percentage accuracy grouped by region and quarters-to-publication. Args: df_results (pd.DataFrame): Out-of-sample results from :func:`~ambric.run_out_of_sample_exercise`. region_measure (str): Regional measure name to filter on. path (Path | None): Directory to save a CSV of the table. When ``None`` no file is written. Returns: pd.DataFrame: Pivot table of classification accuracy (%) with regions as rows and quarters-to-publication as columns. plot_out_of_sample_nowcasts(df_results: pandas.core.frame.DataFrame, region_measure: str, path: pathlib.Path | None = None) -> None Plot out-of-sample nowcasts vs outturns for each region. Produces one figure per region showing observed outturns as dots and nowcasts at varying horizons with transparency indicating proximity to publication. Args: df_results (pd.DataFrame): Out-of-sample results from :func:`~ambric.run_out_of_sample_exercise`. region_measure (str): Regional measure name to filter on. path (Path | None): Directory to save the figures. When ``None`` the figures are displayed interactively. plot_out_of_sample_rmse(df_results: pandas.core.frame.DataFrame, region_measure: str, path: pathlib.Path | None = None) -> None Plot out-of-sample RMSEs by quarters-to-publication for each region. One subplot per region in a grid layout showing how forecast accuracy improves as publication approaches. Args: df_results (pd.DataFrame): Out-of-sample results from :func:`~ambric.run_out_of_sample_exercise`. region_measure (str): Regional measure name to filter on. path (Path | None): Directory to save the figure. When ``None`` the figure is displayed interactively. 
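
The two evaluation summaries described above, sign accuracy and RMSE by quarters-to-publication, can be sketched with plain pandas. This is an illustration under assumptions: the column names ``nowcast`` and ``outturn`` and the helpers `sign_accuracy_table` and `rmse_by_horizon` are hypothetical, and the real output of `run_out_of_sample_exercise` may be laid out differently.

```python
import numpy as np
import pandas as pd


def sign_accuracy_table(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of nowcasts with the same sign as the outturn."""
    hit = (np.sign(df["nowcast"]) == np.sign(df["outturn"])).astype(float) * 100.0
    return df.assign(hit=hit).pivot_table(
        index="region",
        columns="quarters_to_publication",
        values="hit",
        aggfunc="mean",
    )


def rmse_by_horizon(df: pd.DataFrame) -> pd.DataFrame:
    """Root mean squared error per region and quarters-to-publication."""
    sq_err = (df["nowcast"] - df["outturn"]) ** 2
    return (
        df.assign(sq_err=sq_err)
        .groupby(["region", "quarters_to_publication"])["sq_err"]
        .mean()
        .pow(0.5)
        .unstack("quarters_to_publication")
    )


# Hypothetical out-of-sample results, one row per (region, horizon, period).
df_example = pd.DataFrame(
    {
        "region": ["a", "a", "a", "b", "b"],
        "quarters_to_publication": [1, 1, 2, 1, 2],
        "nowcast": [0.01, -0.02, 0.01, -0.01, 0.02],
        "outturn": [0.02, 0.01, 0.03, -0.02, -0.01],
    }
)
accuracy = sign_accuracy_table(df_example)
rmse = rmse_by_horizon(df_example)
print(accuracy)
print(rmse)
```

Both tables have regions as rows and quarters-to-publication as columns, matching the layout described for the performance table.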
plot_seasonally_adjusted_q_on_q_growth(df_sa_trend_orig: pandas.core.frame.DataFrame, path: pathlib.Path | None) -> None _summary_ Args: df_sa_trend_orig (pd.DataFrame): _description_ prep_data_for_model_run(df: pandas.core.frame.DataFrame, macro_names: list[str], region_names: list[str], region_covariate_names: list[str], aggregate_measure: str = 'gva_q_on_q', aggregation_region: str = 'uk', region_measure: str = 'gva_q_on_4q', region_q_on_q_measure: str | None = None) -> tuple[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], list[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], int, numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]] | None] Expects a data frame in following format: datetime | measure | region | value Args: df (pd.DataFrame): Long format dataframe with all data in macro_names (list[str]): Names of macro UK series region_names (list[str]): Names of (local) regions region_covariate_names (list[str]): Names of regional level series. These will be absorbed into exogenous factors. aggregate_measure (str, optional): UK-wide measure in q-on-q growth rate. Defaults to "gva_q_on_q". aggregation_region (str, optional): Highest level geography, which other regions sum to. Defaults to "uk". region_measure (str, optional): Regional measure, q-on-4q growth rate. Defaults to "gva_q_on_4q". region_q_on_q_measure (str | None, optional): Optional measure name in ``df`` supplying published quarterly (q-on-q) growth rates for any subset of regions and quarters. Rows live in the same long dataframe as every other input, with the standard columns ``datetime | measure | region | value``: * ``datetime``: quarter-end timestamp. * ``measure``: equal to the string passed here. * ``region``: one of the names in ``region_names``. * ``value``: decimal q-on-q growth rate (e.g. 
``0.005`` for 0.5% — **not** a percentage), matching the convention used for ``aggregate_measure`` / ``region_measure``. Partial coverage is fully supported: rows may be supplied for only a subset of regions and a subset of quarters. After pivoting, absent region/quarter pairs become NaN and are automatically masked out of the downstream likelihood, so uncovered regions and quarters contribute nothing. Defaults to ``None`` (no q-on-q observations — backwards compatible). Returns: tuple: ``(y_uk_extracted, y_a_r_extracted, Z_panel_extract, macro_extracted, lag_qtrs, y_qoq_r_extracted)``. ``y_qoq_r_extracted`` is a ``(T, R)`` array aligned to ``region_names`` with NaN in unobserved cells, or ``None`` when ``region_q_on_q_measure`` is ``None``. run_out_of_sample_exercise(df: pandas.core.frame.DataFrame, macro_names: list[str], region_names: list[str], region_covariate_names: list[str], n_factors: int = 4, aggregate_measure: str = 'gva_q_on_q', aggregation_region: str = 'uk', region_measure: str = 'gva_q_on_4q', region_q_on_q_measure: str | None = None, step_size: int = 1, init_chunk_size: int = 20, lag_qtrs: int = 6, lag_qtrs_qoq: int = 1, n_its: int = 100000, n_posterior_samples: int = 3000) -> pandas.core.frame.DataFrame Run out-of-sample exercise to evaluate model performance. Masks the most recent annual regional data by ``lag_qtrs`` quarters and fits the model in rolling chunks. Out-of-sample nowcasts and outturns for each step are collected and returned. Args: df (pd.DataFrame): Dataframe containing relevant columns. macro_names (list[str]): Names of macro series. region_names (list[str]): Names of regions. region_covariate_names (list[str]): Names of by-region covariates. n_factors (int): Number of factors. Defaults to 4. aggregate_measure (str): Nation-wide measure. Defaults to "gva_q_on_q". aggregation_region (str): Top level geography. Defaults to "uk". region_measure (str): Growth measure regional. Defaults to "gva_q_on_4q". 
region_q_on_q_measure (str | None): Optional measure name in ``df`` supplying published quarterly (q-on-q) regional growth rates as a hard clamp on ``y_reg``. Same format as every other measure (``datetime | measure | region | value``); values must be decimal growth rates. Inside the out-of-sample window this exercise NaN-masks these rows for every step (as with ``region_measure``), so published quarterly values inside the OOS window do not leak into the nowcast. Older published values remain as clamps. Defaults to ``None`` (no q-on-q clamp, backwards compatible). step_size (int): Quarters to advance per OOS step. Defaults to 1. init_chunk_size (int): Initial learning window size. Defaults to 20. lag_qtrs (int): How many quarters before annual regional data are published; drives the OOS mask for ``region_measure``. Tuned for the ONS regional annual GVA release (~6 quarters). Defaults to 6. lag_qtrs_qoq (int): How many quarters before quarterly regional data are published; drives a separate OOS mask for ``region_q_on_q_measure``. Scot Gov quarterly GDP publishes with ~1 quarter lag, so a smaller value than ``lag_qtrs`` is realistic. Only used when ``region_q_on_q_measure`` is not ``None``. Defaults to 1. n_its (int): Iterations of ADVI for Bayesian inference. Defaults to 100000. n_posterior_samples (int): Samples of the posterior. Defaults to 3000. Returns: pd.DataFrame: Combined out-of-sample nowcasts and outturns across all rolling steps. trace_to_series(trace: arviz.data.inference_data.InferenceData) -> tuple[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]] Convert a trace to the relevant estimated series coming out of the model. Args: trace (az.InferenceData): Trace containing posterior. 
Returns: tuple[npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]: Estimated UK quarterly, regional quarterly, and annual growth rates trend_adjust_out_of_sample_results(df_results: pandas.core.frame.DataFrame, path: pathlib.Path | None = None, quarters_to_pub: float = 6.0) generate_realistic_simulated_data(T: int = 130, R: int = 12, J: int = 4, n_factors: int = 2, n_macro: int = 2, lag_qtrs: int = 6) -> pandas.core.frame.DataFrame Generate simulated mixed-frequency regional data in long format. Produces a ``pd.DataFrame`` resembling real-world data suitable for initialising an :class:`~ambric.Ambric` model, including UK quarterly growth, macro series, regional covariates, and lagged annual regional growth. Args: T (int): Number of quarterly time periods. R (int): Number of regions. J (int): Number of regional covariate panels. n_factors (int): Number of latent factors in the DGP. n_macro (int): Number of macro indicator series. lag_qtrs (int): Publication lag in quarters for annual regional data. Returns: pd.DataFrame: Long-format frame with columns ``datetime``, ``measure``, ``region``, ``value``. simulate_data(T: int = 80, R: int = 6, J: int = 3, n_factors: int = 2, n_macro: int = 2, seed: int = 42) -> tuple[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], list[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]] Simulate mixed-frequency data with regional panels and macro indicators. Args: T (int): Number of quarterly time periods. R (int): Number of regions. J (int): Number of regional covariate panels. n_factors (int): Number of latent factors. n_macro (int): Number of macro indicator series. seed (int): Random seed for reproducibility. 
    Returns:
        tuple: ``(y_uk, y_annual, y_reg_true, Z_panel, macro)``: national quarterly growth (T,), annual regional growth (T, R) with NaNs, true quarterly regional growth (T, R), regional covariate panels (list of J arrays each (T, R)), and macro series (T, M).

simulate_real_time_data(T: int = 80, R: int = 6, J: int = 3, n_factors: int = 2, n_macro: int = 2, annnual_regional_lag_qrtrs: int = 6, seed: int = 42) -> tuple[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], list[numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]]

    Simulate mixed-frequency data with a realistic publication lag.

    Wraps :func:`simulate_data` and removes the most recent ``annnual_regional_lag_qrtrs`` quarters of annual regional data to mimic real-time data availability.

    Args:
        T (int): Number of quarterly time periods.
        R (int): Number of regions.
        J (int): Number of regional covariate panels.
        n_factors (int): Number of latent factors.
        n_macro (int): Number of macro indicator series.
        annnual_regional_lag_qrtrs (int): Quarters of annual data to mask.
        seed (int): Random seed for reproducibility.

    Returns:
        tuple: ``(y_uk, y_annual, y_reg_true, Z_panel, macro, y_annual_no_lags)``: same as :func:`simulate_data` with an additional copy of the annual data before the lag was applied.

## Constants

Module-level constants and data utilities.

OMEGA: numpy.ndarray

----------------------------------------------------------------------
This is the User Guide documentation for the package.
----------------------------------------------------------------------

### How to use AMBRIC

```{python}
#| echo: false
import matplotlib_inline.backend_inline
matplotlib_inline.backend_inline.set_matplotlib_formats("svg")
```

Let's import the package.

```{python}
from ambric import Ambric
from ambric.utilities import generate_realistic_simulated_data
```

As a user, we have to bring a few things to the party. The first, of course, is data, which we will simulate. Let's set up some simulated data. We'll first specify how many underlying factors drive regional dynamics.

```{python}
n_factors = 2
df = generate_realistic_simulated_data(n_factors=n_factors, R=12)
```

Data that are input into the model must have this structure:

```{python}
df.sample(10)
```

The user must also specify what the model will use: in particular, which regions, macroeconomic indicators (these should always have the aggregate region, i.e. the top-level geography, as their region), and regional covariates to use. We'll just use all of these:

```{python}
aggregation_region = "uk"
region_names = [x for x in df["region"].unique() if x != aggregation_region]
macro_names = [x for x in df["measure"].unique() if "macro" in x]
region_covariate_names = [x for x in df["measure"].unique() if "regional_covar" in x]
```

Gotcha: you must ensure that the annual regional data have datetime index entries for **all** quarters up to the last published annual macro value. For non-year-end quarters, the values should be NaN. In a typical use case, there will be quarters with non-NaN quarterly growth for the aggregate region (e.g. the UK) for which the regional data are NaN. Those NaNs in the intervening period are what we are nowcasting.

Okay, we're ready to build an **AMBRIC** model!
```{python}
amb = Ambric(
    df,
    macro_names,
    region_names,
    region_covariate_names,
    n_factors=n_factors,
)
amb
```

Note that the printed summary lists all of the model's details, including that 6 rows of the regional data are missing and will be estimated by the model. It also tells us the model isn't fitted yet, so let's sort that. We recommend using at least 100k iterations.

```{python}
n_iterations = 200000
n_posterior_samples = 3000
amb.fit(n_iterations, n_posterior_samples)
```

That's it! It's done.

Now let's look at some results. First, our regional estimates of quarterly growth must be consistent with the observed national growth. We can check the implied vs the true growth at the national level.

```{python}
amb.plot_national_quarterly_vs_implied()
```

Next, let's look at regional growth (each quarter on the same quarter a year earlier, q-on-4q) for all regions.

```{python}
amb.plot_regional_annual_estimate()
```

We can also look at the underlying quarterly regional growth estimates (the latent $y_{t,r}$):

```{python}
amb.plot_estimated_regional_quarterly()
```

And, if we want tables of the nowcasts, there's a built-in for that at either q-on-4q:

```{python}
amb.point_estimates_q_on_4q().iloc[-3:, :]
```

or q-on-q:

```{python}
amb.point_estimates_q_on_q().iloc[-3:, :]
```

These can be turned into a regional index, rebased to 100 at the start of the sample:

```{python}
amb.to_index_q_on_q().iloc[-3:, :]
```

For a less granular binned signal, `bands_indicator()` classifies each period into growth bands rather than just recession/expansion:

```{python}
amb.bands_indicator().set_index(["region", "datetime"]).unstack(0).tail()
```

There is also access to all of the internal data generated when the model runs.
The raw Bayesian samples can be retrieved using `amb.trace`, while the full set of predictions and outturns are available through `amb.populate_results()`:

```{python}
amb.populate_results()
```

## Factor and macro loadings

AMBRIC's hierarchical loadings can be inspected after fitting: $\Lambda$ for the regional factors, $\Gamma$ for the macro covariates, and $\delta_r$ for the XGBoost bridge signal (see the README for the full specification).

```{python}
amb.assemble_loadings_data().head()
```

```{python}
amb.plot_loadings_by_region()
```

```{python}
amb.plot_loadings_aggregate()
```

If you want to persist the fitted posterior to disk for reuse, call `amb.save_trace('path/to/trace.nc')`.
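
To see how the three loading types enter the model, recall the exogenous mean given in the API reference for `build_ambric_model`: `mu_exog[t,r] = Lambda[r] @ F[t] + Gamma[r] @ X[t] + delta_r * s[t,r]`. The NumPy sketch below evaluates that expression for fixed arrays. It is only an illustration: in the fitted model the loadings are random variables with priors, and the function `exogenous_mean`, the argument names, and the random inputs are hypothetical.

```python
import numpy as np


def exogenous_mean(
    factors: np.ndarray,        # F, shape (T, K)
    macro: np.ndarray,          # X, shape (T, M)
    bridge_signal: np.ndarray,  # s, shape (T, R)
    lam: np.ndarray,            # Lambda, shape (R, K)
    gam: np.ndarray,            # Gamma, shape (R, M)
    delta: np.ndarray,          # delta_r, shape (R,)
) -> np.ndarray:
    """Evaluate mu_exog[t, r] for every quarter t and region r."""
    factor_part = factors @ lam.T        # (T, K) @ (K, R) -> (T, R)
    macro_part = macro @ gam.T           # (T, M) @ (M, R) -> (T, R)
    bridge_part = bridge_signal * delta  # (T, R) * (R,) broadcasts over t
    return factor_part + macro_part + bridge_part


# Hypothetical dimensions and values, for illustration only.
T, R, K, M = 5, 3, 2, 2
rng = np.random.default_rng(1)
F = rng.normal(size=(T, K))
X = rng.normal(size=(T, M))
s = rng.normal(size=(T, R))
Lambda = rng.normal(size=(R, K))
Gamma = rng.normal(size=(R, M))
delta = rng.normal(size=R)

mu_exog = exogenous_mean(F, X, s, Lambda, Gamma, delta)
print(mu_exog.shape)
```

Each term yields a `(T, R)` array, which is why the scaled loadings for factors, macro series, and the bridge signal can be compared as contributions to the same regional growth mean.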