Data Visualisation using the Grammar of Graphics with Plotnine

Introduction

Here you’ll see how to use the declarative plotting library plotnine.

plotnine is, like seaborn, a declarative library. Unlike seaborn, it adopts the ‘grammar of graphics’ approach inspired by the book ‘The Grammar of Graphics’ by Leland Wilkinson. plotnine is heavily inspired by the API of the popular ggplot2 plotting package in the statistical programming language R. The point behind the grammar of graphics approach is that users can compose plots by explicitly mapping data to the various elements that make up the plot. It is a particularly effective approach for a whole slew of standard plots created from tidy data.
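To make "tidy data" concrete, here is a minimal sketch of reshaping a wide table into tidy (long) form with pandas. The table and its column names are made up for illustration; the point is only that each row ends up as one observation and each column as one variable, which is the shape plotnine's aesthetic mappings expect.

```python
import pandas as pd

# Hypothetical "wide" table: one row per year, one column per country
wide = pd.DataFrame({"year": [2020, 2021], "US": [1.0, 2.0], "UK": [3.0, 4.0]})

# Tidy ("long") form: one row per observation, one column per variable
tidy = wide.melt(id_vars="year", var_name="country", value_name="value")
print(tidy)
```

With the data in this shape, a column such as country can be mapped directly to colour or to facets.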

As ever, we’ll start by importing some key packages:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # Just for some data

# Set seed for random numbers
seed_for_prng = 78557
prng = np.random.default_rng(seed_for_prng)  # prng = pseudo-random number generator

Let’s take a look at how to do a simple scatter plot in plotnine. We’ll use the mtcars dataset.

from plotnine import ggplot, geom_point, aes
from plotnine.data import mtcars

(ggplot(mtcars, aes("wt", "mpg")) + geom_point())

Here, ggplot is the organising framework for creating a plot, and mtcars is a dataframe containing the data that we’d like to plot. aes stands for aesthetic mapping; it tells plotnine which columns of the dataframe to map to the x and y axes (in that order). Finally, geom_point tells plotnine to add scatter points to the plot.

If we want to add colour, we pass a color keyword argument to aes like so (wrapping the column name in ‘factor’ tells plotnine to treat the variable as categorical):

(ggplot(mtcars, aes("wt", "mpg", color="factor(gear)")) + geom_point())

One of the nice aspects of the grammar of graphics approach, perhaps its best feature, is that switching to other types of ‘geom’ (aka chart type) is as easy as calling the same code but with a different ‘geom’ switched in. Note that, because we imported only specific names from plotnine, we need to explicitly import any other ‘geoms’ that we’d like to use, as in the next example. Alternatively, we could have imported everything at once using from plotnine import *.

The next example shows how easy it is to switch between ‘geoms’.

from plotnine import geom_smooth

(ggplot(mtcars, aes("wt", "mpg")) + geom_smooth())
/opt/anaconda3/envs/codeforecon/lib/python3.8/site-packages/plotnine/stats/smoothers.py:309: PlotnineWarning: Confidence intervals are not yet implemented for lowess smoothings.

Furthermore, we can add multiple geoms to the same chart by layering them onto the same ggplot object with +:

(ggplot(mtcars, aes("wt", "mpg")) + geom_smooth(color="blue") + geom_point())
/opt/anaconda3/envs/codeforecon/lib/python3.8/site-packages/plotnine/stats/smoothers.py:309: PlotnineWarning: Confidence intervals are not yet implemented for lowess smoothings.

Just like seaborn and matplotlib, plotnine can create facet plots too, but this time they’re just a variation on the same underlying call to ggplot. Let’s see that same example of GDP by country rendered with plotnine. First, we need to grab the data:

import pandas_datareader.data as web

ts_start_date = pd.to_datetime("1999-01-01")

df = pd.concat(
    [
        web.DataReader("ticker=RGDP" + x, "econdb", start=ts_start_date)
        for x in ["US", "UK"]
    ],
    axis=1,
)
df.columns = ["US", "UK"]
df.index.name = "Date"
tidy_df = (100 * df.pct_change(4)).stack().reset_index()
tidy_df.columns = ["Date", "Country", "Real GDP growth, %"]
tidy_df.head()
        Date Country  Real GDP growth, %
0 2000-01-01      US            4.225956
1 2000-01-01      UK            4.774235
2 2000-04-01      US            5.244683
3 2000-04-01      UK            5.035943
4 2000-07-01      US            3.974084
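The download above needs network access, so here is a sketch of the same reshaping steps on made-up quarterly data. The numbers are randomly generated and are not real GDP figures. It also uses melt in place of stack, which is a substitution that should give the same tidy layout (one row per date and country).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("1999-01-01", periods=12, freq="QS")

# Made-up quarterly GDP levels (illustrative only, not real data)
levels = pd.DataFrame(
    {
        "US": 100 * np.cumprod(1 + rng.normal(0.005, 0.01, size=12)),
        "UK": 100 * np.cumprod(1 + rng.normal(0.005, 0.01, size=12)),
    },
    index=dates,
)
levels.index.name = "Date"

# Year-on-year growth for quarterly data: compare with 4 periods earlier;
# the first four quarters have no comparison and are dropped
growth = (100 * levels.pct_change(4)).dropna()

# Wide to tidy: one row per (Date, Country) pair
tidy = (
    growth.reset_index()
    .melt(id_vars="Date", var_name="Country", value_name="Real GDP growth, %")
    .sort_values("Date", kind="stable")
    .reset_index(drop=True)
)
print(tidy.head())
```

Starting the data in 1999 and differencing over four quarters is why the first date in the tidy dataframe is in 2000.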

Now we can get on with plotting it:

from plotnine import geom_line, facet_wrap, theme, element_text

(
    ggplot(tidy_df, aes(x="Date", y="Real GDP growth, %", color="factor(Country)"))
    + geom_line()
    + facet_wrap("Country", nrow=2)
    + theme(axis_text_x=element_text(rotation=90))
)

plotnine can do many of the same types of charts as seaborn; let’s see some similar examples:

from plotnine import geom_violin, scale_fill_manual

tips = sns.load_dataset("tips")

(
    ggplot(tips, aes("day", "total_bill", fill="smoker"))
    + geom_violin()
    + scale_fill_manual(values=["dodgerblue", "darkorange"])
)
A second example is a scatter plot of the penguins data with a linear fit for each species:

from plotnine import labs

penguins = sns.load_dataset("penguins")

(
    ggplot(penguins, aes(x="bill_length_mm", y="bill_depth_mm", color="factor(species)"))
    + geom_point()
    + geom_smooth(method="lm")
    + labs(x="Bill length (mm)", y="Bill depth (mm)")
)
/opt/anaconda3/envs/codeforecon/lib/python3.8/site-packages/plotnine/layer.py:412: PlotnineWarning: geom_point : Removed 2 rows containing missing values.

Finally, an example of great practical use during exploratory analysis, the kernel density plot:

from plotnine import geom_density
from plotnine.data import mpg

(ggplot(mpg, aes(x="cty", color="drv", fill="drv")) + geom_density(alpha=0.1))