7. Workflow: Writing Code

In this chapter, we’ll look at the different ways you can both write and run code. This is something that can be very confusing if you’re just getting into coding.

There are different ways to write (and run) code that suit different needs. For example, for creating a reproducible pipeline of tasks or writing production-grade software, you might opt for a script, which is a file that is mostly code. But for sending instructions to a colleague or exploring a narrative, you might choose to write your code in a notebook because it can present text and code together more naturally than a script can.

We already met some ways to write and run code in the previous chapters. Here, we’ll be a bit more systematic so that, by the end of the chapter, you’ll be comfortable writing code in both scripts and notebooks. For advanced users, there’s also information on how to write code with markdown, using markdown files that contain executable code chunks. Scripts and notebooks are by far the most popular ways to write code though.

Let’s start with some definitions.

  • IDE, or integrated development environment: this is the application that you write code of all different kinds in (scripts, notebooks, markdown). This book recommends Visual Studio Code as an IDE. It supports many languages. Making an analogy with writing documents, VS Code is to programming languages what a word processor is for actual written languages. JupyterLab is another IDE, but one which is geared towards the use of notebooks.

  • the interpreter: this is the programming language (eg Python) that has to be installed separately onto your computer. It’s what takes your written commands and turns them into actions. VS Code and other IDEs will look for interpreters on your computer and use them to execute the code you’ve written.

  • scripts: these are files that almost exclusively contain code. They can be edited and run in an IDE. Python scripts always have the file extension .py.

  • notebooks: aka Jupyter Notebooks, these are files that can contain code and text in different blocks called “cells”. They can be edited, and the code parts can be run, in an IDE. Jupyter Notebooks always have the file extension .ipynb. Notebooks can be exported to other formats, like Word documents, HTML pages, PDFs, and even slides!

  • the terminal: this is the text interface that you use to send instructions to your computer’s operating system. It’s typically what you use to install new packages, for example with commands like pip install packagename. Although your computer will come with a separate terminal application too, you can open a terminal in VS Code by clicking on Terminal > New Terminal at the top of your VS Code window.

  • markdown: this is a lightweight language that turns simple text commands into professional looking documents. It’s widely used by people who code. It’s also what’s used for the text cells in Jupyter Notebooks. When not in a notebook, files containing markdown always have the extension .md. With the Visual Studio Code markdown extensions installed, you can right-click within a Markdown file in VS Code and then select Markdown Preview Enhanced to see how the rendered document will look. The difference between HTML code and the same website viewed in a browser like Chrome is a good analogy for the difference between what you see in a .md file and what you see in the preview of that markdown file.

  • Quarto markdown: this is a special variant of markdown, with file extension .qmd, that can be used to combine text and code that gets executed, so that code outputs are inserted into final outputs such as HTML or PDF documents.
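To make the distinction between the IDE and the interpreter a little more concrete, the short sketch below asks whichever Python interpreter is running the code to identify itself. It uses only the standard library; the function name describe_interpreter is invented for this illustration and is not part of any tool mentioned above.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Return a few facts about the Python interpreter running this code.

    `describe_interpreter` is a hypothetical helper written for this sketch.
    """
    return {
        # Path to the interpreter executable that your IDE or terminal picked
        "executable": sys.executable,
        # Version string in the form major.minor.patch
        "version": platform.python_version(),
        # Name of the Python implementation, for example CPython
        "implementation": platform.python_implementation(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If you run this from two different places (say, a terminal and the VS Code interactive window) and the reported paths differ, the two are probably using different Python installations.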

Let’s now turn to all of the different ways you can write code in a fully-featured integrated development environment like Visual Studio Code. They each have pros and cons, and you’re likely to want to use them at different times. The table below sets out all of the different ways you can write, and execute, code.

If you’re looking for a typical workflow, this book recommends working with scripts (files that end in .py) and the VS Code interactive window. Remember, if you’re working with a .py file, you can always open the Visual Studio Code interactive window by right-clicking somewhere within the script and selecting ‘Run in interactive window’.

| What | How to use | Prerequisites | Pros | Cons |
|---|---|---|---|---|
| Script, eg script.py | ‘Run in interactive window’ in an integrated development environment (IDE) | Python installation + an IDE with Python support, eg Visual Studio Code. | Can be run all-in-one or step-by-step as needed. Very powerful tools available to aid coding in scripts. De facto standard for production-quality code. Can be imported by other scripts. Version control friendly. | Not very good if you want to have lots of text alongside code. |
| Jupyter Notebook, eg notebook.ipynb | Open the file with Visual Studio Code. | Use Visual Studio Code and the VS Code Jupyter extension. | Code and text can alternate in the same document. Rich outputs of code can be integrated into document. Can export to PDF, HTML, and more, with control over whether code inputs/outputs are shown, and either exported directly or via Quarto. Can be run all-in-one or step-by-step as needed. | Fussy to use with version control. Code and text cannot be mixed in same ‘cell’. Not easy to import in other code files. |
| Markdown with executable code chunks using Quarto, eg markdown_script.qmd | To produce output, write in a mix of markdown and code blocks and then export with commands like quarto render markdown_script.qmd --to html on the command line or using the Visual Studio Code extension. Other output types available. | Installations of Python and Quarto, plus their dependencies. | Allows for true mixing of text and code. Can export to wide variety of other formats, such as PDF and HTML, with control over whether code inputs/outputs are shown. Version control friendly. | Cannot be imported by other code files. |

Some of the options above make use of the command line, a way to issue text-based instructions to your computer. Remember, the command line (aka the terminal) can be accessed via the Terminal app on Mac, the Command Prompt app on Windows, or ctrl + alt + t on Linux. To open up the command line within Visual Studio Code, you can use the keyboard shortcut ⌃ + ` (on Mac) or ctrl + ` (Windows/Linux), or click “View > Terminal”.

Now let’s look at each of these ways to run code in more detail using a common example: Hello World!

7.1. Scripts

Most code is written in scripts, and they should be your go-to.

We already met scripts, but let’s have a recap. Create a new file in Visual Studio Code called hello_world.py. In the Visual Studio Code editor, add a single line to the file:

print('Hello World!')

Save the file. To run the script, right-click and choose ‘Run current file in interactive window’, ‘Run current file in terminal’, or ‘Run selection/line in interactive window’. These options cover two different ways of running the script: in the IDE (VS Code in this case) or on the command line.

A typical workflow would be selecting some lines within a script, and then hitting ‘Run selection/line in interactive window’ or using the keyboard shortcut of shift + enter.

As an alternative to ‘Run current file in terminal’, you can open up the command line yourself and run

python hello_world.py

which will execute the script.
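One of the advantages of scripts listed in the table earlier is that they can be imported by other scripts. The sketch below shows one common way to arrange a script so that it works both when run directly and when imported. It is an illustrative extension of hello_world.py rather than something you need at this stage, and the function name make_greeting is invented for the example.

```python
# hello_world.py (an extended, illustrative version of the one-line script)


def make_greeting(name: str = "World") -> str:
    """Build the greeting text; `make_greeting` is a name invented for this sketch."""
    return f"Hello {name}!"


if __name__ == "__main__":
    # This branch runs when the file is executed directly, for example with
    # `python hello_world.py`. It is skipped when another script imports
    # the file, for example with `from hello_world import make_greeting`.
    print(make_greeting())
```

Running python hello_world.py on the command line prints the same message as before, while another script in the same folder could reuse make_greeting without triggering the print.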

7.2. Jupyter Notebooks

Jupyter Notebooks are for experimentation, tinkering, and keeping text and code together. They are the lab books of the coding world. This book is mostly written in Jupyter Notebooks! The name, ‘Jupyter’, is a reference to the three original languages supported by Jupyter, which are Julia, Python, and R, and to Galileo’s notebooks recording the discovery of the moons of Jupiter. Jupyter notebooks now support a vast number of languages beyond the original three, including Ruby, Haskell, Go, Scala, Octave, Java, and more.

To get started with Jupyter Notebooks, you’ll need to have a Python installation and to run pip install jupyterlab on the command line.

If you get stuck with this tutorial, there’s a more in-depth VS Code and Jupyter tutorial available here.

Create a new file in Visual Studio Code and save it as hello_world.ipynb. Close the file and re-open it. The notebook interface should automatically load and you’ll see options to create cells with plus signs labelled ‘Code’ and ‘Markdown’. A cell is an independent chunk of either code or text. Text cells have markdown in them, a lightweight language for creating text outputs that you will find out more about in Markdown. For now, create a markdown cell containing the following:

# This is a title

## This is a subtitle

This notebook demonstrates printing 'hello world!' to screen.

Now, for the next cell, choose code and write:

print('hello world!')

To run the notebook, you can choose to run all cells (usually a double play button at the top of the notebook page) or just one cell at a time (a play button beside a cell). ‘Running’ a markdown cell will render the markdown in display mode; running a code cell will execute it and insert the output below. When you play the code cell, you should see the ‘hello world!’ message appear.

Jupyter Notebooks are versatile and popular for early exploration of ideas, especially in fields like data science. This entire book is written in a combination of Jupyter Notebooks and executable markdown (more on that in a moment). Jupyter Notebooks can easily be run in the cloud using a browser too (via Binder or Google Colab) without any prior installation. Although it’s not got very much code in, the page you’re reading now can be loaded into Google Colab as a Jupyter Notebook by clicking ‘Colab’ under the rocket icon at the top of the page.

One really nice feature of Jupyter Notebooks is that you can use them as the input files for Quarto instead of using .qmd files, and this opens up many export options and possibilities (like hiding some code inputs). You can find more information here (look for the guidance on Jupyter Notebooks aka .ipynb files) or look ahead to the chapters on Markdown and Quarto.

You can try a Jupyter Notebook without installing anything online at https://jupyter.org/try. Click on Try Classic Notebook for a tutorial.

7.2.1. Tips when using Jupyter Notebooks

  • Version control: if you are using version control, be wary of saving the outputs of Jupyter Notebooks when you only want to save code. Most IDEs that support Jupyter Notebooks have a clear outputs option. You can also automate this as a pre-commit git hook (if you don’t know what that is, don’t worry). You could also pair your notebook to a script or markdown file (covered in the next section). Outputs or not, Jupyter Notebooks will render on GitHub.

  • Terminal commands: these can be run from inside a Jupyter Notebook by placing a ! in front of the command and executing the cell. For example, !ls lists the contents of the directory the notebook is in. You can also !pip install and !conda install in this way. This works in Google Colab notebooks too.

  • Magic commands: statements that begin with % are magic commands. %whos displays information about defined variables. %run script.py runs a script called script.py. %timeit times how long a single statement takes to execute (use %%timeit at the top of a cell to time the whole cell). Finally, you can see many more magic commands using %quickref.

  • Notebook cells can be executed in any sequence you choose. But if you’re planning to share your notebook or use it again for yourself, it’s good practice to check that its cells do what you want when run in sequence, from top to bottom.

  • There are tons of extensions to Jupyter Notebooks; you can find a list here. Of particular note is ipywidgets, which adds interactivity.

  • Get help info on a command by running it with ? appended to the end.
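The version control tip above is easier to follow once you know that an .ipynb file is stored as JSON, with each code cell’s outputs saved alongside its source. The sketch below builds a tiny notebook in memory and strips its outputs using only the standard library. The helper name clear_outputs and the example notebook are invented for this illustration; in practice your IDE’s clear outputs option or an established tool does this job, so treat this as a picture of the idea rather than a replacement for them.

```python
import copy
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook (as a dict) with code cell outputs removed.

    `clear_outputs` is a hypothetical helper written for this sketch.
    """
    cleaned = copy.deepcopy(notebook)  # leave the original notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A minimal hand-written notebook in the version 4 file format
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# This is a title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print('hello world!')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello world!\n"]}
            ],
        },
    ],
}

if __name__ == "__main__":
    # Show the cleaned notebook as it would be written to an .ipynb file
    print(json.dumps(clear_outputs(example), indent=1))
```

Because the outputs live in the same file as the code, re-running a notebook changes the file even when the code itself is unchanged, which is why notebooks can be fussy under version control.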

7.3. Markdown with Executable Code Chunks

This is by far the least common way of coding, though it has gained popularity in recent years and it’s great if you’re going to ultimately export to other formats like slides, documents, or even a website!

When you have a lot of text combined with code, even using Jupyter Notebooks can feel a bit onerous and, historically, editing the text in a notebook was a bit tedious, especially if you wanted to move cells around a lot. Markdown makes for a much more pleasant writing experience, but markdown on its own cannot execute code. Imagine, though, that you want to combine reproducibility, text, and code plus code outputs: there is a tool called Quarto that allows you to do this by adding executable code chunks to markdown.

As this is a bit more of an advanced topic, and as much about communication as it is about writing code, we’ll come back to how to do it in Quarto.