Coming from Excel#
Like it or loathe it, Microsoft’s Excel spreadsheet programme is ubiquitous in our offices and homes. If you’re looking to get off Excel, look no further because code can do everything Excel canand a whole lot more!
The biggest difference between a fullyfledged programming language, such as Python, and a graphical tool for spreadsheet analysis is that you write out all of your instructions in code in the former, but only some instructions are written in code in the latter (such as cell formulae). Another big difference is that Python (and related tools like Visual Studio Code and git) are free!
The three main packages (extensions of the basic Python programming language) that cover Excel’s functionality are pandas, which is for panel data analysis (such as spreadsheets). For some Excel functions, numpy, which provides tools for working with numbers, and matplotlib, which creates charts, will be necessary.
In this chapter, we’ll focus on how to go from Excel to Python by replacing Excel. The situation where you want to work with Excel from Python (eg by populating a spreadsheet with data, formulae, and charts) will be covered in another chapter on automation. Your first port of call for replacing Excel should be the chapter on Spreadsheets. There is a more indepth walkthrough of replacing Excel with Python here.
Excel <==> Python#
To get started with Excel, you open the application and create a new empty file. To get started with Python, follow the instructions in the preliminaries chapter of this book and create a new script (a file ending in .py
) in Visual Studio Code. You will also need to have the Python package pandas installed. The translations of common Excel tasks in the table below assume you have already run import pandas as pd
, and that you have your data in a pandas dataframe called df
(see the parts of this book on data analysis for more on using pandas). Although the table only covers a small fraction of what Excel does, and is certainly not comprehensive, it should give an idea of the ease and flexibility of pandas in replacing common Excel functions.
Action 
Excel 
Python (pandas) 

View data 
Open the file using Excel 

Count rows 
Select a column with the mouse and look at the count indicator 

Find Sum, Average, Max, or Min of the 1st column 
eg, 
eg, 
Perform an operation on, eg sum, on the first row 


Count occurrences of distinct items within a column 
See here (complicated!) 

Return true or false given values in another column based on a condition 


Adding two columns 


Create a date column 
Click on the column; select “Date” from list of numberformat categories. 

Count days between two dates 


Create a cumulative sum on a column 


Concatenate string columns 


Find out which cells in a column have invalid entries 


Remove excess whitespace 


Transpose data 


Create a pivot table 
Manually select data and use the various menus (example here) 

For more on how to replace Excel with Python and pandas, see the Data Analysis Quickstart and Data Analysis sections.