Common Plots#

Introduction#

In this chapter, we’ll look at some of the most common plots that you might want to make, and how to create them using the most popular data visualisation libraries, including matplotlib, plotnine, seaborn, altair, and plotly. If you need an introduction to these libraries, see the previous chapter.

This chapter has benefited from viztech, a repository that aims to recreate the entire Financial Times Visual Vocabulary using plotnine; from the documentation for plotnine, matplotlib, seaborn, altair, and plotly; and from examples posted around the web on forums and in blog posts. It’s also worth noting that I’m more of an expert in matplotlib than anything else, so I would really welcome contributions in the form of plots with particular libraries that I have not been able to find or implement myself.

Bear in mind that for many of the matplotlib examples, using the df.plot.* syntax can get the plot you want more quickly! To be more comprehensive, the examples below show the more general solution, which works for any kind of data.
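As a rough illustration of the df.plot.* shortcut, here is a minimal sketch using pandas' built-in plotting on a small made-up data frame. The values are invented for illustration and only the column names mirror the cars data used later; `df.plot.scatter` returns a matplotlib Axes, so the usual label tweaks still apply.

```python
import pandas as pd

# Small made-up data frame (hypothetical values, for illustration only)
df = pd.DataFrame(
    {
        "Horsepower": [130, 165, 95, 88, 70, 52],
        "Miles_per_Gallon": [18, 15, 24, 27, 31, 44],
    }
)

# One call creates the figure and axes and draws the scatter
ax = df.plot.scatter(x="Horsepower", y="Miles_per_Gallon")

# The returned object is an ordinary matplotlib Axes
ax.set_ylabel("Miles per Gallon")
```

This is quicker to write, but it only works when the data are already in a pandas data frame, which is one reason the examples in this chapter use the plain matplotlib calls instead.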

Throughout, we’ll assume that the data are in a tidy format (one row per observation, one variable per column). Remember that all Altair plots can be made interactive by adding .interactive() at the end.
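If your data are not yet tidy, one possible way to reshape them is with pandas' `melt`. The sketch below assumes a hypothetical "wide" data frame with one column per year; the names `wide`, `tidy`, `country`, `year`, and `value` are made up for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one column per year, so not tidy
wide = pd.DataFrame(
    {
        "country": ["A", "B"],
        "2019": [1.0, 2.0],
        "2020": [1.5, 2.5],
    }
)

# Reshape so that each row is one observation (a country in a year)
# and each variable (country, year, value) has its own column
tidy = wide.melt(id_vars="country", var_name="year", value_name="value")
print(tidy)
```

After reshaping, a column such as `year` can be mapped directly to an axis or to colour in any of the libraries used in this chapter.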

First, though, let’s import the libraries we’ll need.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
import altair as alt
import plotly.express as px
from vega_datasets import data
import os
from pathlib import Path
import warnings

# Set seed for reproducibility
np.random.seed(10)
# Turn off warnings
warnings.filterwarnings('ignore')

Scatter plot#

In this example, we will see a simple scatter plot with several categories using the “cars” data:

cars = data.cars()
cars.head()
                        Name  Miles_per_Gallon  Cylinders  Displacement  Horsepower  Weight_in_lbs  Acceleration        Year  Origin
0  chevrolet chevelle malibu              18.0          8         307.0       130.0           3504          12.0  1970-01-01     USA
1          buick skylark 320              15.0          8         350.0       165.0           3693          11.5  1970-01-01     USA
2         plymouth satellite              18.0          8         318.0       150.0           3436          11.0  1970-01-01     USA
3              amc rebel sst              16.0          8         304.0       150.0           3433          12.0  1970-01-01     USA
4                ford torino              17.0          8         302.0       140.0           3449          10.5  1970-01-01     USA

Matplotlib#

fig, ax = plt.subplots()
for origin in cars["Origin"].unique():
    cars_sub = cars[cars["Origin"] == origin]
    ax.scatter(cars_sub["Horsepower"], cars_sub["Miles_per_Gallon"], label=origin)
ax.set_ylabel("Miles per Gallon")
ax.set_xlabel("Horsepower")
ax.legend()
plt.show()

Seaborn#

In this first example, I’ll also show how to tweak the labels by using the underlying matplotlib Axes object, here called ax.

fig, ax = plt.subplots()
sns.scatterplot(data=cars, x="Horsepower", y="Miles_per_Gallon", hue="Origin", ax=ax)
ax.set_ylabel("Miles per Gallon")
ax.set_xlabel("Horsepower")
plt.show()

Plotnine#

(
    ggplot(cars, aes(x="Horsepower", y="Miles_per_Gallon", color="Origin"))
    + geom_point()
    + ylab("Miles per Gallon")
)

Altair#

For this first example, we’ll also show how to make the altair plot interactive, with movable axes and extra information shown on mouse hover.

alt.Chart(cars).mark_circle(size=60).encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    tooltip=["Name", "Origin", "Horsepower", "Miles_per_Gallon"],
).interactive()

Plotly#

Plotly is another declarative plotting library, at least sometimes (!), but one that is interactive by default.

fig = px.scatter(
    cars,
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    hover_data=["Name", "Origin", "Horsepower", "Miles_per_Gallon"],
)
fig.show()