In this chapter, we’ll look at some of the most common plots that you might want to make, and how to create them using the most popular data visualisation libraries, including matplotlib, lets-plot, seaborn, altair, and plotly. If you need an introduction to these libraries, see the previous chapter.
This chapter has benefited from the phenomenal matplotlib documentation; from the lets-plot, seaborn, altair, and plotly documentation; from viztech (a repository that aimed to recreate the entire Financial Times Visual Vocabulary using plotnine); and from examples posted around the web on forums and in blog posts. You may be wondering why plotnine isn’t featured here: its functions have almost exactly the same names as those in lets-plot, and we have opted to include the latter as it is currently the more mature plotting package. However, most of the code below for lets-plot also works in plotnine, and you can read more about plotnine in Data Visualisation using the Grammar of Graphics with Plotnine.
Bear in mind that for many of the matplotlib examples, using the df.plot.* syntax can get you the plot you want more quickly! To be more comprehensive, the examples below instead show the general matplotlib solution, which works for any kind of data.
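To make the contrast concrete, here is a minimal sketch of both routes on a tiny data frame. The three rows are an illustrative stand-in, not a real dataset, and the names `df`, `ax_quick`, and `ax_full` are made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few illustrative rows (stand-in data, assumed for this sketch)
df = pd.DataFrame(
    {
        "Horsepower": [130.0, 165.0, 150.0],
        "Miles_per_Gallon": [18.0, 15.0, 16.0],
    }
)

# pandas route: a single call that returns a matplotlib Axes
ax_quick = df.plot.scatter(x="Horsepower", y="Miles_per_Gallon")

# Explicit matplotlib route: more typing, but it accepts any array-like data
fig, ax_full = plt.subplots()
ax_full.scatter(df["Horsepower"], df["Miles_per_Gallon"])
ax_full.set_xlabel("Horsepower")
ax_full.set_ylabel("Miles_per_Gallon")
```

Both calls draw the same points. The pandas version labels the axes from the column names, while the explicit version leaves every step under your control.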
Throughout, we’ll assume that the data are in a tidy format (one row per observation, one variable per column). Remember that all Altair plots can be made interactive by adding .interactive() at the end.
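If your data are not yet tidy, a reshape is usually needed first. The following is a small sketch, assuming a hypothetical "wide" table with one column per year; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per year, so not tidy
wide = pd.DataFrame(
    {
        "country": ["A", "B"],
        "2020": [1.0, 2.0],
        "2021": [1.5, 2.5],
    }
)

# Reshape so that each row is a single (country, year) observation
tidy = wide.melt(id_vars="country", var_name="year", value_name="value")
print(tidy)
```

After melting, `year` and `value` are ordinary columns, so they can be mapped directly to axes or colours in any of the libraries used in this chapter.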
First, though, let’s import the libraries we’ll need.
import numpy as np
import pandas as pd
import seaborn as sns
import seaborn.objects as so
import matplotlib.pyplot as plt
from lets_plot import *
from lets_plot.mapping import as_discrete
import altair as alt
from vega_datasets import data
import plotly.express as px
import os
from pathlib import Path
import warnings

# Set seed for reproducibility
np.random.seed(10)
# Turn off warnings
warnings.filterwarnings('ignore')
# Set up lets-plot charts
LetsPlot.setup_html()
In this example, we will see a simple scatter plot with several categories using the “cars” data:
cars = data.cars()
cars.head()
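| | Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin |
|---|---|---|---|---|---|---|---|---|---|
| 0 | chevrolet chevelle malibu | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 1970-01-01 | USA |
| 1 | buick skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 1970-01-01 | USA |
| 3 | amc rebel sst | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 1970-01-01 | USA |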
fig, ax = plt.subplots()
for origin in cars["Origin"].unique():
    cars_sub = cars[cars["Origin"] == origin]
    ax.scatter(cars_sub["Horsepower"], cars_sub["Miles_per_Gallon"], label=origin)
ax.set_ylabel("Miles per Gallon")
ax.set_xlabel("Horsepower")
ax.legend()
plt.show()
Note that this uses the seaborn objects API.
(
    so.Plot(cars, x="Horsepower", y="Miles_per_Gallon", color="Origin")
    .add(so.Dot())
)
(
    ggplot(cars, aes(x="Horsepower", y="Miles_per_Gallon", color="Origin"))
    + geom_point()
    + ylab("Miles per Gallon")
)
For this first example, we’ll also show how to make the altair plot interactive, with axes you can pan and zoom and more information shown on mouse-hover.
alt.Chart(cars).mark_circle(size=60).encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    tooltip=["Name", "Origin", "Horsepower", "Miles_per_Gallon"],
).interactive()
Plotly is another declarative plotting library, at least sometimes (!), but one that is interactive by default.
fig = px.scatter(
    cars,
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    hover_data=["Name", "Origin", "Horsepower", "Miles_per_Gallon"],
)
fig.show()