Data Visualisation using the Grammar of Graphics with Plotnine



Introduction

Here you’ll see how to use the declarative plotting library plotnine.

Note

We recommend you use lets-plot for declarative plotting, but plotnine is an excellent alternative.

plotnine is, like seaborn, a declarative library. Unlike seaborn, it adopts the ‘grammar of graphics’ approach set out in Leland Wilkinson’s book ‘The Grammar of Graphics’. plotnine’s API is heavily inspired by ggplot2, the popular plotting package for the statistical programming language R. The idea behind the grammar of graphics is that users compose plots by explicitly mapping columns of data to the elements that make up a plot (position, colour, fill, facets, and so on) and then adding layers of geometric objects on top. It is a particularly effective approach for a whole slew of standard plots created from tidy data.

As ever, we’ll start by importing some key packages:

import numpy as np
import pandas as pd
import seaborn as sns  # Just for some data

# Set seed for random numbers
seed_for_prng = 78557
prng = np.random.default_rng(
    seed_for_prng
)  # prng=pseudo-random number generator

Let’s take a look at how to do a simple scatter plot in plotnine. We’ll use the mtcars dataset.

from plotnine import aes, geom_point, ggplot
from plotnine.data import mtcars

(ggplot(mtcars, aes("wt", "mpg")) + geom_point())

Here, ggplot is the organising framework for creating a plot, and mtcars is the dataframe containing the data we’d like to plot. aes stands for aesthetic mapping: it tells plotnine which columns of the dataframe to map to the x and y axes (positional arguments are taken in that order). Finally, geom_point() tells plotnine to draw each row as a point, giving a scatter plot. Each component is added to the plot with +.

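If we want to add colour, we pass a color keyword argument to aes (plotnine accepts both color and colour). Wrapping the column name in factor() tells plotnine to treat the variable as categorical, so each number of gears gets its own distinct colour: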

(ggplot(mtcars, aes("wt", "mpg", color="factor(gear)")) + geom_point())

One of the nice aspects of the grammar of graphics approach, perhaps its best feature, is that switching to another type of ‘geom’ (the geometric object that determines the chart type) is as easy as running the same code with a different geom swapped in. Note that, because we imported plotnine’s components one at a time, we need to explicitly import any other geoms we’d like to use, as in the next example. Alternatively, we could have imported everything at once using from plotnine import *.

The next example shows how easy it is to switch between ‘geoms’.

from plotnine import geom_smooth

(ggplot(mtcars, aes("wt", "mpg")) + geom_smooth())
PlotnineWarning: Confidence intervals are not yet implemented for lowess smoothings.

Furthermore, we can draw multiple geoms on the same chart by adding them as successive layers, each joined with +, to the same ggplot() call:

(ggplot(mtcars, aes("wt", "mpg")) + geom_smooth(color="blue") + geom_point())
PlotnineWarning: Confidence intervals are not yet implemented for lowess smoothings.

Just like seaborn and matplotlib, plotnine can create facet plots, but here a facet is just one more component added to the same underlying ggplot() call. Let’s see the same example of GDP growth by country, this time rendered with plotnine. First, we need to grab the data from the World Bank:

from datetime import datetime

from pandas_datareader import wb

ts_start_date = pd.to_datetime("1999-01-01")
ts_end_date = datetime.now()
countries = ["GBR", "USA"]
gdp_const_2015_usd_code = "NY.GDP.MKTP.KD"
df = wb.download(
    indicator=gdp_const_2015_usd_code,
    country=countries,
    start=ts_start_date,
    end=ts_end_date,
).reset_index()
df["year"] = df["year"].astype("float")  # plotnine needs a numeric x-axis
# wb.download returns the most recent year first; sort chronologically so that
# pct_change compares each year with the one before it
df = df.sort_values(by=["year", "country"]).reset_index(drop=True)
df["growth, %"] = df.groupby("country")[gdp_const_2015_usd_code].transform(
    lambda x: 100 * x.pct_change(1)
)
df.head().round(2)
country year NY.GDP.MKTP.KD growth, %
0 United Kingdom 1999.0 2.196830e+12 NaN
1 United States 1999.0 1.318024e+13 NaN
2 United Kingdom 2000.0 2.292208e+12 4.34
3 United States 2000.0 1.371768e+13 4.08
4 United Kingdom 2001.0 2.351182e+12 2.57
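Year-on-year growth computed with pct_change depends on row order, because each row is compared with the row above it. A minimal pandas sketch with hypothetical GDP levels, deliberately listed newest-first, shows why sorting within each country comes first:

```python
import pandas as pd

# Hypothetical GDP levels, listed newest-first
gdp = pd.DataFrame(
    {
        "country": ["A", "A", "A"],
        "year": [2002, 2001, 2000],
        "gdp": [121.0, 110.0, 100.0],
    }
)

# pct_change compares each row with the previous one, so put rows in
# chronological order within each country before computing growth
gdp = gdp.sort_values(["country", "year"]).reset_index(drop=True)
gdp["growth, %"] = gdp.groupby("country")["gdp"].transform(
    lambda x: 100 * x.pct_change()
)
```

Had the rows been left newest-first, each year would have been compared with the following one. That gives about -9.1% instead of the correct +10% growth.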

Now we can get on with plotting it:

from plotnine import element_text, facet_wrap, geom_line, theme

(
    ggplot(df.dropna(), aes(x="year", y="growth, %", color="country"))
    + geom_line()
    + facet_wrap("country", nrow=2)
    + theme(axis_text_x=element_text(rotation=90))
)

plotnine can do many of the same types of charts as seaborn; let’s see some similar examples:

from plotnine import geom_violin, scale_fill_manual

tips = sns.load_dataset("tips")

(
    ggplot(tips, aes("day", "total_bill", fill="smoker"))
    + geom_violin()
    + scale_fill_manual(values=["dodgerblue", "darkorange"])
)
Next, a scatter plot of penguin bill dimensions with a linear fit (method="lm") for each species:
from plotnine import labs

penguins = sns.load_dataset("penguins")

(
    ggplot(
        penguins, aes(x="bill_length_mm", y="bill_depth_mm", color="factor(species)")
    )
    + geom_point()
    + geom_smooth(method="lm")
    + labs(x="Bill length (mm)", y="Bill depth (mm)")
)
PlotnineWarning: geom_point : Removed 2 rows containing missing values.

Finally, an example of great practical use during exploratory analysis: the kernel density plot.

from plotnine import geom_density
from plotnine.data import mpg

(ggplot(mpg, aes(x="cty", color="drv", fill="drv")) + geom_density(alpha=0.1))