Coming from Excel#
Like it or loathe it, Microsoft’s Excel spreadsheet programme is ubiquitous in our offices and homes. If you’re looking to get off Excel, look no further because code can do everything Excel can-and a whole lot more!
The biggest difference between a fully-fledged programming language, such as Python, and a graphical tool for spreadsheet analysis is that you write out all of your instructions in code in the former, but only some instructions are written in code in the latter (such as cell formulae). Another big difference is that Python (and related tools like Visual Studio Code and git) are free!
The three main packages (extensions of the basic Python programming language) that cover Excel’s functionality are pandas, which is for panel data analysis (such as spreadsheets). For some Excel functions, numpy, which provides tools for working with numbers, and matplotlib, which creates charts, will be necessary.
In this chapter, we’ll focus on how to go from Excel to Python by replacing Excel. The situation where you want to work with Excel from Python (eg by populating a spreadsheet with data, formulae, and charts) will be covered in another chapter on automation. Your first port of call for replacing Excel should be the chapter on Spreadsheets. There is a more in-depth walkthrough of replacing Excel with Python here.
Excel <==> Python#
To get started with Excel, you open the application and create a new empty file. To get started with Python, follow the instructions in the preliminaries chapter of this book and create a new script (a file ending in .py
) in Visual Studio Code. You will also need to have the Python package pandas installed. The translations of common Excel tasks in the table below assume you have already run import pandas as pd
, and that you have your data in a pandas dataframe called df
(see the parts of this book on data analysis for more on using pandas). Although the table only covers a small fraction of what Excel does, and is certainly not comprehensive, it should give an idea of the ease and flexibility of pandas in replacing common Excel functions.
Action |
Excel |
Python (pandas) |
---|---|---|
View data |
Open the file using Excel |
|
Count rows |
Select a column with the mouse and look at the count indicator |
|
Find Sum, Average, Max, or Min of the 1st column |
eg, |
eg, |
Perform an operation on, eg sum, on the first row |
|
|
Count occurrences of distinct items within a column |
See here (complicated!) |
|
Return true or false given values in another column based on a condition |
|
|
Adding two columns |
|
|
Create a date column |
Click on the column; select “Date” from list of number-format categories. |
|
Count days between two dates |
|
|
Create a cumulative sum on a column |
|
|
Concatenate string columns |
|
|
Find out which cells in a column have invalid entries |
|
|
Remove excess whitespace |
|
|
Transpose data |
|
|
Create a pivot table |
Manually select data and use the various menus (example here) |
|
For more on how to replace Excel with Python and pandas, see the Data Analysis Quickstart and Data Analysis sections.