{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(econmt-bayes-bambi)=\n",
    "# Bayesian Inference Made Easier\n",
    "\n",
    "In this chapter, we'll look at how to perform analysis and regressions using Bayesian techniques using [**Bambi**](https://bambinos.github.io/bambi/), which stands for *BAyesian Model-Building Interface*. **Bambi** uses the full-fat Bayesian package **PyMC** under the hood but aims to make doing Bayesian inference much simpler.\n",
    "\n",
    "If you haven't yet read {ref}`econmt-bayesian`, you should start there before tackling this chapter.\n",
    "\n",
    "As in the previous chapter, we'll be using [**ArviZ**](https://arviz-devs.github.io/) for visualisation of the results from Bayesian inference, a package that builds on **Matplotlib**. You should follow the install instructions for **Bambi** carefully and, if you're confident with using different Python environments, it's a good idea to spin up a new and dedicated Python environment to try them out in. In case you need a refresher, the {ref}`wrkflow-environments` covers how to create distinct Python environments.\n",
    "\n",
    "Here are our initial imports and settings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import xarray as xr\n",
    "import warnings\n",
    "import bambi as bmb\n",
    "import arviz as az\n",
    "\n",
    "# Plot settings\n",
    "plt.style.use(\n",
    "    \"https://github.com/aeturrell/coding-for-economists/raw/main/plot_style.txt\"\n",
    ")\n",
    "az.style.use(\"arviz-darkgrid\")\n",
    "\n",
    "# Pandas: Set max rows displayed for readability\n",
    "pd.set_option(\"display.max_rows\", 23)\n",
    "\n",
    "# Set seed for random numbers\n",
    "seed_for_prng = 78557\n",
    "prng = np.random.default_rng(seed_for_prng)  # prng=probabilistic random number generator\n",
    "# Turn off warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51a55374",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import matplotlib_inline.backend_inline\n",
    "\n",
    "matplotlib_inline.backend_inline.set_matplotlib_formats(\"svg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bayesian Models via Formulae with Bambi\n",
    "\n",
    "Packages like **PyMC** give you a lot of flexibility, but you do have to think about what to specify for priors and model setup, even for quite simple and standard Bayesian models. [**Bambi**](https://bambinos.github.io/bambi/), BAyesian Model-Building Interface, provides a more user friendly and high level model-building interface that builds on **PyMC** and is designed to make it easy to fit Bayesian mixed-effects models. Most notably, it comes with a formulae API that allows you to specify your model with a string describing a formula (like **statsmodels** for conventional regression).\n",
    "\n",
    "Let's see how to specify the model we did in {ref}`econmt-bayesian` only using **Bambi**. As a reminder, the model was\n",
    "\n",
    "$$\n",
    "\\mu = \\alpha + \\beta x\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "$$\n",
    "Y \\mid \\alpha, \\beta, \\sigma \\stackrel{\\text{ind}}{\\thicksim} \\mathcal{N}(\\mu, \\sigma^2)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, though, we need to put the data from the first example in the previous chapter into a dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# True parameter values\n",
    "alpha_true, beta_true, sigma_true = 1, 2.5, 1.5\n",
    "\n",
    "# Size of dataset\n",
    "size = 100\n",
    "\n",
    "num_samples = 1000\n",
    "num_chains = 2\n",
    "\n",
    "# Predictor variable: random sample\n",
    "X = prng.standard_normal(size)\n",
    "\n",
    "# Simulate outcome variable\n",
    "Y = alpha_true + beta_true * X + prng.standard_normal(size) * sigma_true\n",
    "df_bambi = pd.DataFrame({\"X\": X, \"Y\": Y})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialise the model; use \"-1\" at the end to suppress the constant term\n",
    "model = bmb.Model('Y ~ X', df_bambi)\n",
    "\n",
    "# Fit the model using 1000 on each of 4 chains\n",
    "results = model.fit(draws=1000, chains=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use ArviZ to plot the results\n",
    "az.plot_trace(results)\n",
    "\n",
    "# Key summary and diagnostic info on the model parameters\n",
    "az.summary(results, round_to=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This was a a lot easier to run, and it found much the same results! Note that the same set of variables as we saw in the first example are just appearing with different names here—the true values are an intercept of 1, a coefficient on $X$ of 2.5, and a standard deviation of 1.5 (the variable `Y_sigma`) above. We didn't even have to specify priors; **Bambi** chose these for us. Let's see exactly what it did by looking at the `model` variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see from this description that **Bambi** went for slightly more tight priors than we did, and opted for the (very popular) half-Student T distribution for the standard deviation\n",
    "\n",
    "- $\\text{Intercept} \\thicksim \\mathcal{N}(1.3, 7.1)$\n",
    "- $X \\thicksim \\mathcal{N}(0, 7)$\n",
    "- $\\sigma \\thicksim \\mathcal{t_{+}}(4.0, 2.8)$ (the half-Student-t distribution)\n",
    "\n",
    "\n",
    "where the half-Student-t distribution is\n",
    "\n",
    "$$\n",
    "\\begin{split}    \\begin{align}\n",
    "    f(y;\\mu, \\sigma) = \\left\\{\\begin{array}{cll}\n",
    "\\frac{\\Gamma\\left(\\frac{\\nu+1}{2}\\right)}{\\Gamma\\left(\\frac{\\nu}{2}\\right)\\sqrt{\\pi \\nu \\sigma^2}}\\left(1 + \\frac{(y-\\mu)^2}{\\nu \\sigma^2}\\right)^{-\\frac{\\nu + 1}{2}} &  & y \\ge \\mu \\\\[1em]\n",
    "0 & & \\text{otherwise}.\n",
    "\\end{array}\\right.\n",
    "    \\end{align}\\end{split}\n",
    "$$\n",
    "\n",
    "**Bambi** will try to choose sensible priors for you."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fixed Effects in Bambi\n",
    "\n",
    "Models with fixed effects are also easy to implement. When a categorical common effect with $N$ levels is added to a model, by default, it is coded by $N-1$ dummy variables (i.e., reduced-rank coding). To explicitly remove the intercept, add \"-1\" to the formula string—just like with frequentist regression in **statsmodels**.\n",
    "\n",
    "**Bambi** will recognise when a variable is of categorical or Boolean type. But, as the Zen of Python says, explicit is better than implicit; we should also specify that a variable is categorical or boolean (a fixed effect) directly in the formula. This is done, just as in **statsmodels**, by enclosing the variable in a \"C(...)\", for \"category\". So we'll add `\"C(D)\"` to the model formula specification.\n",
    "\n",
    "Let's revisit the example from earlier but implement it directly in **Bambi** and remove the intercept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# True parameter values\n",
    "α_true, β_true, σ_true, γ_true = 1, 2.5, 1.5, 6\n",
    "\n",
    "# Size of dataset\n",
    "size = 200\n",
    "\n",
    "num_samples = 1000\n",
    "num_chains = 4\n",
    "\n",
    "# Predictor variables\n",
    "X_cat = prng.standard_normal(size)\n",
    "D = prng.binomial(1, 0.4, size)  # This chooses 1 or 0 with 0.4 prob for 1\n",
    "\n",
    "# Simulate outcome variable\n",
    "Y_cat = α_true + β_true * X_cat + γ_true*D + prng.standard_normal(size) * σ_true"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now let's specify the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_bambi_cat = pd.DataFrame({\"X\": X_cat, \"Y\": Y_cat, \"D\": D})\n",
    "df_bambi_cat[\"D\"] = df_bambi_cat[\"D\"].astype(\"category\")\n",
    "model_cat = bmb.Model(\"Y ~ X + C(D)\", data=df_bambi_cat)\n",
    "model_cat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, **Bambi** has made slightly different choices but ones that seem reasonable. Let's fit the model and look at the estimated values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fit the model using 1000 on each of 4 chains\n",
    "results_cat = model_cat.fit(draws=1000, chains=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "az.summary(results_cat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is very similar to the estimate using the longer way of describing a model using **PyMC** alone."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Specifying Priors in Bambi\n",
    "\n",
    "Using *Bambi*, we didn't specify the prior at all—it chose for us. So how do we modify or specify a prior should we need to?\n",
    "\n",
    "You can always specify priors from the full **PyMC** selection should you need them. The first way is to more vaguely specify them in the form of a dictionary mapping variable names to types, for example\n",
    "\n",
    "```python\n",
    "prior = {\"condition\":\"superwide\"}\n",
    "```\n",
    "\n",
    "which scales the priors of the distribution by 0.8 (the default is `\"wide\"`, which has a scale of $\\sqrt{1/3}$). But we can also specify full priors. Let's change the prior in this model to demonstrate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bambi import Prior\n",
    "\n",
    "prior = {\"D\": Prior(\"Normal\", mu=0.5, sigma=10)}\n",
    "model_cat = bmb.Model(\"Y ~ X + C(D)\", data=df_bambi_cat, priors=prior)\n",
    "results_cat = model_cat.fit(draws=1000, chains=4)\n",
    "az.summary(results_cat, round_to=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mixed Effects Models in Bambi\n",
    "\n",
    "This section is indebted to an example in the **Bambi** [documentation](https://bambinos.github.io/bambi/).\n",
    "\n",
    "We are going to demonstrate how to perform a random and fixed effects analysis making use of a replication of a study by Strack, Martin & Stepper (1988). The original Strack et al. study tested a facial feedback hypothesis arguing that emotional responses are, in part, driven by facial expressions (rather than expressions always following from emotions). Strack and colleagues reported that participants rated cartoons as more funny when the participants held a pen in their teeth (inducing a smile) than when they held a pen between their lips (inducing a pout). This outcome variable is recorded as `\"value\"` in the data. The article has been cited over 1,400 times, and has been influential in popularising the view that affective experiences and outward expressions of affective experiences can both influence each other (instead of the relationship being a one-way street from experience to expression). In 2016, a Registered Replication Report (RRR) led by Wagenmakers and colleagues attempted to replicate Study 1 from Strack, Martin, & Stepper (1988) in 17 independent experiments comprising over 2,500 participants.\n",
    "\n",
    "Here we use the Bambi model-building interface to quickly re-analyse the RRR data using a Bayesian approach and making use of random effects and fixed effects.\n",
    "\n",
    "Let's pull down the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rrr = pd.read_csv(\"https://github.com/bambinos/bambi/raw/main/docs/notebooks/data/rrr_long.csv\")\n",
    "df_rrr.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`\"value\"` represents the rating of the cartoon, while `\"condition\"` is an indicator of whether the participant was made to smile or not. `\"uid\"` is a unique identifier for each individual. We'll also introduce controls for gender and age, and drop any invalid values.\n",
    "\n",
    "Now, a purely fixed effects model for this would look like"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bmb.Model(\"value ~ condition + age + gender\", df_rrr, dropna=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that an intercept was automatically added.\n",
    "\n",
    "This model has steam-rollered through some potentially useful information. For example, there is a variable `\"study\"` that captures which study the subject's experiment was performed in. It takes one of 17 values. While we might expect that subjects responses will have the same pattern of effect sizes, it's reasonable to think some features of different studies might vary. \n",
    "\n",
    "To capture some of the variation, we can add a random effect to the model. We'll add intercept deviations for each study. What we're saying here is that we'll allow the intercept to be different for each study, as long the values are drawn from a distribution. Let $i$ represent an individual and $j$ a study. The model is\n",
    "\n",
    "$$\n",
    "\\text{value}_{ij} = \\alpha \\cdot \\text{condition}_{i} + \\beta \\cdot \\text{age}_{i} + \\gamma \\cdot \\text{gender}_{i} + W_j\n",
    "$$\n",
    "\n",
    "where $W_j$ is a random effect intercept. Here's how to specify it in **Bambi**; we use the notation `\"(1|study)\"` to declare that there should be a constant offset for each study drawn from a distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fixed effects and group specific (or random) intercepts for study\n",
    "model_rrr = bmb.Model(\"value ~ condition + age + gender + (1|study)\", df_rrr, dropna=True)\n",
    "model_rrr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Bambi** has chosen to draw the study-level intercepts from a normal distribution, with a prior on the standard deviation of that normal that is itself a half-normal distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_rrr = model_rrr.fit(draws=1000, chains=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "az.plot_trace(results_rrr,\n",
    "              compact=True);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've seen much of the above before, but the posterior that looks a bit different is the one for the intercept based on the study. We see here that it is not one parameter, but 17 deviations, plus a mean that the deviations are relative to, plus a standard deviation. We can see the estimates in the normal way, using `az.summary`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "az.summary(results_rrr, round_to=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, this isn't the only way we can add random effects. As well as study-specific intercepts, we can add study-specific slopes to the model. That is, we’ll assume that the subjects at each research site have a different baseline appreciation of the cartoons (some find the cartoons funnier than others), and that the effect of condition also varies across sites. The equation for this model is\n",
    "\n",
    "$$\n",
    "\\text{value}_{ij} = \\delta_{j} + \\alpha_{j} \\cdot \\text{condition}_{i} + \\beta \\cdot \\text{age}_{i} + \\gamma \\cdot \\text{gender}_{i}\n",
    "$$\n",
    "\n",
    "where $\\delta_{j}$ is an intercept that depends on the study, and $\\alpha_{j}$ is a slope that depends on the study. Let's run this model using the syntax `\"(condition|study)\"` to apply the slope and intercept based on the study random effect. If we wanted slopes specific to each study without including a study specific intercept, we would write `\"value ~ condition + age + gender + (0 + condition | study)\"` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fixed effects and group specific (or random) intercepts & slopes for study\n",
    "model_slope = bmb.Model(\"value ~ condition + age + gender + (condition|study)\", df_rrr, dropna=True)\n",
    "model_slope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at the diagram of the model, with the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_slope = model_slope.fit(draws=2000, chains=2, target_accept=0.99)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's have a look at the extra variables we introduced through this one change:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "az.plot_trace(results_slope,\n",
    "              var_names=['condition|study_sigma', \"condition|study\", \"1|study\", '1|study_sigma']);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, as well as having study specific intercepts, we have a slope that characterises how the effect of condition on value varies by study.\n",
    "\n",
    "We can look at the posterior of condition more closely to determine whether it does affect how funny the study participants found cartoons—and now we're taking lots of extra variation into account. We can also compare the size of this coefficient against other variables in the model to get a rough idea of how substantial this effect is. We'll use the `plot_forest` function to do this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "az.plot_forest(\n",
    "    results_slope,\n",
    "    var_names=[\"Intercept\", \"condition\", \"age\", \"gender\"],\n",
    "    figsize=(8, 2),\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we see at best a weak relationship between the condition of a subject and the value they give, and the coefficient's range includes 0. While this model is just meant to be an example (not a proper analysis), the conclusion you might draw from the above is that the results of the original study don't replicate, and there is *no* effect.\n",
    "\n",
    "This was just a short tour of what can be achieved using formulae in **Bambi**. Check out the [documentation](https://bambinos.github.io/bambi/) for more."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bayesian Generalised Linear Models\n",
    "\n",
    "Just as we can perform a wider variety of regressions using frequentist maximum likelihood methods, for example logit (aka following Fermi-Dirac statistics), probit, and poission models, so too can we perform these regressions using Bayesian methods.\n",
    "\n",
    "Let's see an example of how to perform a logistic regression using some synthetic data. We're going to examine the propensity of students to stay in education at 18 years of age as a *binary outcome* (0 for leaving education and 1 for staying in it) and see how it is predicted by a measure of parental income as a fraction of the median income, frac. We'll also use a fixed effect for whether their parents are divorced or not (1 or 0 respectively), called div. We'll create our own synthetic data to illustrate the problem.\n",
    "\n",
    "The model we'll use is a *logit*, assuming that the data generating process goes like this:\n",
    "\n",
    "$$\n",
    "\\sigma^{-1}(p) = \\ln\\left(\\frac{p}{1-p}\\right) = X'\\cdot \\vec{\\beta} = \\alpha + \\beta \\cdot \\text{frac} + \\gamma \\cdot \\text{div}\n",
    "$$\n",
    "\n",
    "where $p$ is probability of staying in education and $\\sigma$ is the \"link function\" that translates variables and coefficients into a probability. $\\ln( p/(1-p))$ is called the log-odds as it is the log of the odds ratio. The odds ratio is the ratio of the probability, $p$, to the complement of the probability, $1-p$. Note that this model implies that the log-odds ratio is modelled by a standard linear regression. These definitions also imply that the link function is\n",
    "\n",
    "$$\n",
    "p = \\sigma\\left(X'\\cdot \\vec{\\beta}\\right) = \\frac{1}{1 + e^{-X'\\cdot \\vec{\\beta}}}\n",
    "$$\n",
    "\n",
    "While $p\\in[0, 1]$, $\\sigma^{-1}(p) \\in (-\\infty, \\infty)$ so this link function maps the real number line into the interval zero to one.\n",
    "\n",
    "Of course, we're actually dealing with a *binary* variable here—outcomes can be 0 or 1, and nothing inbetween, so there's one more piece of the puzzle. Conditional on confounders, i.e. given the value of $p$, the chance of the outcome variable $y$ being 0 or 1 is $p$. In other words, $P(Y=1|y) = p$, which is the definition of the probability mass function of the Bernoulli distribution. Taking that approach over all outcomes $y$, we can say that\n",
    "\n",
    "$$\n",
    "y_i \\thicksim \\text{Bernoulli}(p_i)\n",
    "$$\n",
    "\n",
    "Let's generate some synthetic data with these properties; first we set the true values of the data generating process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "beta_true = 10\n",
    "gamma_true = -2\n",
    "alpha_true = -6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now let's generate some synthetic data. We'll just grab random numbers uniformly between 0 and 2x median income, and assume that there is equal chance of parents being divorced or not (ie a balanced class)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nobs = 100\n",
    "df_sch = pd.DataFrame({\"frac\": prng.uniform(0, 2, nobs),\n",
    "                                \"div\": prng.integers(0, 2, size=nobs)})\n",
    "\n",
    "beta_dot_x = df_sch[\"frac\"]*beta_true + df_sch[\"div\"]*gamma_true + alpha_true\n",
    "p_vec = 1/(1+np.exp(-(beta_dot_x)))\n",
    "# Now sample from Bernoulli (binomial with n=1)\n",
    "y_vec = prng.binomial(1, p=p_vec, size=nobs)\n",
    "df_sch[\"stay\"] = y_vec\n",
    "df_sch.sample(5, random_state=seed_for_prng)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll perform Bayesian inference on the model we've constructed. We could write this model out in full using **PyMC**, but that's quite long-winded and **Bambi** offers an easier syntax to do it via a formulae specification. The only addition to what we've seen already is that we're going to specify the family of link functions. There are plenty available, but in this case we'll want the \"Bernoulli\" family because we're dealing with a single 0 or 1 outcome.\n",
    "\n",
    "Let's specify and fit the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_logit = bmb.Model('stay ~ frac + C(div)', df_sch, family=\"bernoulli\")\n",
    "model_logit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_logit = model_logit.fit(draws=3000, chains=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, we know the true values, so we'll create a chart of the coefficients and the highest density interval around them alongside the true values. We'll use **ArviZ**'s `plot_forest()` function for this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "az.plot_forest(results_logit, ax=ax)\n",
    "for i, val in enumerate([gamma_true, beta_true, alpha_true]):\n",
    "    ax.scatter(val, ax.get_yticks()[i], s=75, color=\"red\", zorder=5, edgecolors=\"k\")\n",
    "plt.suptitle(\"Estimated vs true parameter values\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It would be nice to see some example samples from the posterior alongside the original data, to check that the model produces sensible results. For this, we can use the predict function. Rather than predicting the entire density range, we'll just predict some means here (`kind=\"mean\"`) and put them into a separate variable (via `inplace=False`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idata_mean = model_logit.predict(results_logit, kind=\"mean\", inplace=False)\n",
    "idata_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Armed with predictions of the posterior distribution, we can now chart the original data alongside the model predictions. In the plot below, you'll notice that the predictions seem to lie along two separate lines. Do you know why?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dot_transp = 0.4\n",
    "dot_size = 40\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(df_sch.loc[df_sch[\"div\"] == 0, \"frac\"], df_sch.loc[df_sch[\"div\"] == 0, \"stay\"],\n",
    "           color=\"blue\", alpha=dot_transp, label=\"Parents together\", s=dot_size)\n",
    "ax.scatter(df_sch.loc[df_sch[\"div\"] == 1, \"frac\"], df_sch.loc[df_sch[\"div\"] == 1, \"stay\"],\n",
    "           color=\"green\", alpha=dot_transp, label=\"Parents divorced\", s=dot_size)\n",
    "ax.scatter(df_sch[\"frac\"], idata_mean.posterior.stay_mean.mean(axis=0).mean(axis=0),\n",
    "           label=\"posterior mean\", color=\"C1\", alpha=dot_transp, s=dot_size)\n",
    "ax.set_xlabel(\"Fraction of median income\")\n",
    "ax.set_title(\"Staying in education or not: posterior predictions\")\n",
    "ax.set_yticks([0, 1])\n",
    "ax.set_yticklabels([\"0: Left education\", \"1: Stayed in education\"])\n",
    "ax.legend(fontsize=10, loc=\"lower right\")\n",
    "ax.set_xlim(-0.25, 2.25)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to logit models, **PyMC** and **Bambi** both support a wide range of other generalised linear models."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('codeforecon')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
    "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}