1. Whole Game

Our goal in this part of the book is to give you a rapid overview of the main tools of data science: importing, cleaning, transforming, and visualising data, as shown in the figure below. We want to show you the “whole game” of data science, giving you just enough of each major piece that you can tackle real, if simple, datasets. The later parts of the book cover each of these topics in more depth, widening the range of data science challenges that you can tackle.

_images/4e2bd4941bbc1cfa3d6b8df68f6864a99960b19b594432ee0f693e0f785f7a6b.svg

After this chapter, we have four main chapters that focus on the tools of data science:

  • Visualisation is a great place to start with Python programming because the pay-off is so clear: you get to make elegant and informative plots that help you understand data. In Data Visualisation, you’ll dive into visualisation, learning the basic structure of a plot, and powerful techniques for turning data into plots.

  • Visualisation alone is typically not enough, so in Data Transformation, you’ll learn the key verbs that allow you to select important variables, filter for key observations, create new variables, and compute summaries.

  • In Tidy Data, you’ll learn about cleaning data and specifically “tidy” data, a consistent way of storing tabular data that makes transformation, visualisation, and modelling easier. You’ll learn the underlying principles, and how to get your data into a “tidy” format.

  • Before you can transform and visualise your data, you first need to get it into a Python session. In Data Import, you’ll learn the basics of getting .csv files into your Python session.

These are interspersed with four other chapters that focus on your Python workflow:

Finally, Postscript: Getting Further Help contains some short advice on how to get help and keep learning.
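To make the import, clean, transform, and summarise steps described above a little more concrete, here is a minimal sketch that uses only the Python standard library. It is not the approach the later chapters necessarily take (a dataframe library would usually be more concise); the data, the column names, and the helper functions `load_rows`, `clean_rows`, `add_mass_kg`, and `mean_mass_by_species` are all made up for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical data standing in for a .csv file on disk (values are invented).
RAW_CSV = """species,mass_g,length_mm
adelie,3700,181
adelie,,186
gentoo,5000,217
gentoo,5400,221
"""


def load_rows(text):
    """Import: parse CSV text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: drop rows with a missing mass and convert numeric columns."""
    cleaned = []
    for row in rows:
        if row["mass_g"] == "":
            continue  # skip observations with no recorded mass
        cleaned.append(
            {
                "species": row["species"],
                "mass_g": float(row["mass_g"]),
                "length_mm": float(row["length_mm"]),
            }
        )
    return cleaned


def add_mass_kg(rows):
    """Transform: create a new variable derived from an existing one."""
    return [{**row, "mass_kg": row["mass_g"] / 1000} for row in rows]


def mean_mass_by_species(rows):
    """Summarise: compute the mean mass in kg for each species."""
    groups = {}
    for row in rows:
        groups.setdefault(row["species"], []).append(row["mass_kg"])
    return {species: mean(values) for species, values in groups.items()}


rows = add_mass_kg(clean_rows(load_rows(RAW_CSV)))
summary = mean_mass_by_species(rows)
print(summary)
```

Each function corresponds to one stage of the workflow, so the final pipeline reads in the same order as the stages are introduced: import, clean, create a new variable, then summarise.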