Data Visualisation using the Grammar of Graphics with Plotnine
Introduction
Here you’ll see how to use the declarative plotting library plotnine.
Note
We recommend you use lets-plot for declarative plotting, but plotnine is an excellent alternative.
plotnine is, like seaborn, a declarative library. Unlike seaborn, it adopts the ‘grammar of graphics’ approach inspired by the book ‘The Grammar of Graphics’ by Leland Wilkinson. plotnine is heavily inspired by the API of the popular ggplot2 plotting package in the statistical programming language R. The point behind the grammar of graphics approach is that users can compose plots by explicitly mapping data to the various elements that make up the plot. It is a particularly effective approach for a whole slew of standard plots created from tidy data.
As ever, we’ll start by importing some key packages:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns # Just for some data
# Set seed for random numbers
seed_for_prng = 78557
prng = np.random.default_rng(seed_for_prng) # prng = pseudo-random number generator
Let’s take a look at how to do a simple scatter plot in plotnine. We’ll use the mtcars dataset.
from plotnine import ggplot, geom_point, aes
from plotnine.data import mtcars
(ggplot(mtcars, aes("wt", "mpg")) + geom_point())
<Figure Size: (640 x 480)>
Here, ggplot is the organising framework for creating a plot and mtcars is a dataframe containing the data that we’d like to plot. aes stands for aesthetic mapping, and it tells plotnine which columns of the dataframe to map to the x and y axes (in that order). Finally, geom_point() tells plotnine to add scatter points to the plot.
If we want to add colour, we pass a color (or colour) keyword argument to aes like so (with ‘factor’ meaning treat the variable as categorical):
(ggplot(mtcars, aes("wt", "mpg", color="factor(gear)")) + geom_point())
<Figure Size: (640 x 480)>
One of the nice aspects of the grammar of graphics approach, perhaps its best feature, is that switching to other types of ‘geom’ (aka chart type) is as easy as calling the same code but with a different ‘geom’ switched in. Note that, because we only imported one element at a time from plotnine, we do need to explicitly import any other ‘geoms’ that we’d like to use, as in the next example below. But we could instead have imported everything from plotnine using from plotnine import *.
The next example shows how easy it is to switch between ‘geoms’.
from plotnine import geom_smooth
(ggplot(mtcars, aes("wt", "mpg")) + geom_smooth())
/Users/aet/mambaforge/envs/codeforecon/lib/python3.10/site-packages/plotnine/stats/smoothers.py:330: PlotnineWarning: Confidence intervals are not yet implemented for lowess smoothings.
<Figure Size: (640 x 480)>
Furthermore, we can add multiple geoms to the same chart by adding them as layers, one after another, to the same ggplot() object:
(ggplot(mtcars, aes("wt", "mpg")) + geom_smooth(color="blue") + geom_point())
/Users/aet/mambaforge/envs/codeforecon/lib/python3.10/site-packages/plotnine/stats/smoothers.py:330: PlotnineWarning: Confidence intervals are not yet implemented for lowess smoothings.
<Figure Size: (640 x 480)>
Just like seaborn and matplotlib, we can create facet plots too, but this time they’re just a variation on the same underlying call to ggplot(). Let’s see that same example of GDP by country rendered with plotnine. First, we need to grab the data:
from pandas_datareader import wb
from datetime import datetime
ts_start_date = pd.to_datetime("1999-01-01")
ts_end_date = datetime.now()
countries = ["GBR", "USA"]
gdf_const_2015_usd_code = 'NY.GDP.MKTP.KD'
df = wb.download(indicator=gdf_const_2015_usd_code, country=countries, start=ts_start_date, end=ts_end_date).reset_index()
# Sort oldest-first before computing growth, so that pct_change compares each year with the previous one
df = df.sort_values(by="year").reset_index(drop=True)
df["growth, %"] = df.groupby("country")[gdf_const_2015_usd_code].transform(lambda x: 100*x.pct_change(1))
df["year"] = df["year"].astype("float") # needed for plotnine
df.head()
| | country | year | NY.GDP.MKTP.KD | growth, % |
|---|---|---|---|---|
| 0 | United Kingdom | 1999.0 | 2.197118e+12 | NaN |
| 1 | United States | 1999.0 | 1.321548e+13 | NaN |
| 2 | United States | 2000.0 | 1.375430e+13 | 4.0772 |
| 3 | United Kingdom | 2000.0 | 2.292006e+12 | 4.3187 |
| 4 | United Kingdom | 2001.0 | 2.351111e+12 | 2.5787 |
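Growth rates computed with pct_change depend on row order: each row is compared with the row before it within its group, so the data need to run from oldest to newest. Here is a small sketch using a made-up dataframe (the countries "A" and "B" and their values are invented for illustration) that shows the pattern.

```python
import pandas as pd

# Made-up data, deliberately stored newest-first
toy = pd.DataFrame(
    {
        "country": ["A", "A", "A", "B", "B", "B"],
        "year": [2002, 2001, 2000, 2002, 2001, 2000],
        "gdp": [121.0, 110.0, 100.0, 53.0, 50.0, 40.0],
    }
)

# Sort oldest-first within each country before taking percentage changes
toy = toy.sort_values(["country", "year"]).reset_index(drop=True)

# Year-on-year growth in percent, computed separately for each country
toy["growth, %"] = toy.groupby("country")["gdp"].transform(
    lambda x: 100 * x.pct_change()
)
print(toy)
```

The first year for each country has no previous year to compare with, so its growth is missing (NaN).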
Now we can get on with plotting it:
from plotnine import geom_line, facet_wrap, theme, element_text
(
ggplot(df.dropna(), aes(x="year", y="growth, %", color="country"))
+ geom_line()
+ facet_wrap("country", nrow=2)
+ theme(axis_text_x=element_text(rotation=90))
)
<Figure Size: (640 x 480)>
plotnine can do many of the same types of charts as seaborn; let’s see some similar examples:
from plotnine import geom_violin, scale_fill_manual
tips = sns.load_dataset("tips")
(
ggplot(tips, aes("day", "total_bill", fill="smoker"))
+ geom_violin()
+ scale_fill_manual(values=["dodgerblue", "darkorange"])
)
<Figure Size: (640 x 480)>
from plotnine import labs
penguins = sns.load_dataset("penguins")
(
ggplot(penguins, aes(x="bill_length_mm", y="bill_depth_mm", color="factor(species)"))
+ geom_point()
+ geom_smooth(method="lm")
+ labs(x="Bill length (mm)", y="Bill depth (mm)")
)
/Users/aet/mambaforge/envs/codeforecon/lib/python3.10/site-packages/plotnine/layer.py:364: PlotnineWarning: geom_point : Removed 2 rows containing missing values.
<Figure Size: (640 x 480)>
Finally, an example of great practical use during exploratory analysis, the kernel density plot:
from plotnine import geom_density
from plotnine.data import mpg
(ggplot(mpg, aes(x="cty", color="drv", fill="drv")) + geom_density(alpha=0.1))
<Figure Size: (640 x 480)>
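For intuition about what a kernel density plot shows, here is a NumPy-only sketch of a Gaussian kernel density estimate. It is an illustration of the general technique rather than plotnine’s own implementation: the sample is made up, and the rule-of-thumb bandwidth used here is an assumption that may differ from the settings plotnine applies.

```python
import numpy as np

# Made-up sample standing in for a numeric column
rng = np.random.default_rng(78557)
sample = rng.normal(loc=20, scale=4, size=200)
n = sample.size

# Rule-of-thumb bandwidth based on the spread of the data
iqr = np.subtract(*np.percentile(sample, [75, 25]))
bandwidth = 0.9 * min(sample.std(ddof=1), iqr / 1.34) * n ** (-1 / 5)

# Evaluate the estimate on a grid that extends a little past the data
grid = np.linspace(sample.min() - 3 * bandwidth, sample.max() + 3 * bandwidth, 512)

# Average of Gaussian bumps, one centred on each observation
z = (grid[:, None] - sample[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# Trapezoid rule: the area under a density should be close to one
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(round(bandwidth, 3), round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is worth bearing in mind when comparing densities across groups.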