3. Workflow: Basics

If you followed the previous chapters, you will now have some experience running Python code. We didn’t give you many details, but you’ve obviously figured out the basics, or you would’ve thrown this book away in frustration! Frustration is natural when you start programming in Python, because it is such a stickler for punctuation, and even one character out of place will cause it to complain. But while you should expect to be a little frustrated, take comfort in the fact that this experience is both typical and temporary: it happens to everyone, and the only way to get over it is to keep trying.

Before we go any further, let’s make sure you’ve got a solid foundation in running Python code, and that you know about some of the most helpful Visual Studio Code features for working with Python.

3.1. Coding Basics

Let’s review some basics we’ve omitted thus far in the interests of getting you up to speed as quickly as possible. You can use Python as a calculator:

print(1 / 200 * 30)
print((59 + 73 + 2) / 3)

0.15
44.666666666666664
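Beyond the four basic operations, a few other arithmetic operators come up often. The sketch below is an illustration rather than part of the original examples, and the variable names are made up for the purpose. Note in particular that powers are written with `**`; the `^` symbol, which means "to the power of" in some other languages, is a bitwise operator in Python.

```python
# A few arithmetic operators beyond +, -, * and /
power = 2 ** 3       # exponentiation: 2 to the power of 3
floor_div = 7 // 2   # floor division: divide, then round down
remainder = 7 % 2    # remainder after division
xor = 2 ^ 3          # bitwise XOR, NOT a power

print(power, floor_div, remainder, xor)
```

Running this prints `8 3 1 1`.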

The extra package numpy contains many of the additional mathematical operators that you might need. If you don’t already have numpy installed, open up the terminal in Visual Studio Code (go to “Terminal -> New Terminal”), type pip install numpy into the terminal, and then hit return. Once you have numpy installed, you can import it and use it like this:

import numpy as np

print(np.sin(np.pi / 2))

1.0

You can create new objects with the assignment operator =. You should think of this as attaching the name on the left-hand side to the value computed on the right-hand side.

x = 3 * 4
print(x)

There are several structures in Python that capture multiple objects simultaneously but perhaps the most common is the list, which is designated by square brackets.

primes = [1, 2, 3, 5, 7, 11, 13]
print(primes)

[1, 2, 3, 5, 7, 11, 13]

To do basic arithmetic on a list, use a list comprehension, which has the structure “for every element in this list, perform an operation”. For example, to multiply each element by three:

[element * 3 for element in primes]

[3, 6, 9, 15, 21, 33, 39]

Note that the word “element” above could have been almost any word because we define it by saying ...for element in .... You can try the above with a different word, e.g. [entry*3 for entry in primes].
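As a small sketch of the same idea, the block below repeats the comprehension with a different loop word and stores the result under a name so it can be reused. The second comprehension adds an `if` filter, which goes slightly beyond what is described above and is included only to show where the pattern leads; the names `tripled` and `large_only` are illustrative.

```python
primes = [1, 2, 3, 5, 7, 11, 13]

# Same operation as above, using "entry" instead of "element"
tripled = [entry * 3 for entry in primes]

# An optional "if" at the end keeps only the elements that pass a test
large_only = [entry for entry in primes if entry > 5]

print(tripled)
print(large_only)
```

The first line printed matches the earlier output, `[3, 6, 9, 15, 21, 33, 39]`, and the second is `[7, 11, 13]`. The original list `primes` is left unchanged in both cases.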

All Python statements where you create objects (known as assignment statements) have the same form:

object_name = value

When reading that code, say “object name gets value” in your head.
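One subtlety is worth seeing once, even if you don't need it yet: assigning an existing list to a second name does not make a second list. The sketch below uses made-up names (`original`, `alias`, `independent`) to show the difference between a second name and an explicit copy.

```python
original = [1, 2, 3]

alias = original               # a second name for the same list
alias.append(4)                # so this change is visible through both names

independent = list(original)   # an explicit copy: a separate list
independent.append(5)          # this change does not affect original

print(original)
print(independent)
```

This prints `[1, 2, 3, 4]` and then `[1, 2, 3, 4, 5]`. For numbers and strings the distinction rarely matters, because those objects cannot be changed in place.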

3.2. Comments

Python will ignore any text after #. This allows you to write comments, text that is ignored by Python but can be read by other humans. We’ll sometimes include comments in examples explaining what’s happening with the code.

Comments can be helpful for briefly describing what the subsequent code does.

# define primes
primes = [1, 2, 3, 5, 7, 11, 13]
# multiply primes by 2
[el * 2 for el in primes]

[2, 4, 6, 10, 14, 22, 26]

With short pieces of code like this, it is not necessary to leave a comment for every single line of code. You should also try to use informative names wherever you can, because these help readers of your code (likely to be you in the future) understand what is going on!

Our advice is to use comments to explain the why of your code, not the how or the what. The what and how of your code are always possible to figure out, even if it might be tedious, by carefully reading it. If you describe every step in the comments, and then change the code, you will have to remember to update the comments as well (tedious) or it will be confusing when you return to your code in the future.

Figuring out why something was done is much more difficult, if not impossible. For example, geom_smooth() has an argument called span that controls the smoothness of the curve, with larger values yielding a smoother curve. Suppose you decide to change the value of span from its default of 0.75 to 0.9: it’s easy for a future reader to understand what is happening, but unless you note your thinking in a comment, no one will understand why you changed the default.

For data analysis code, use comments to explain your overall plan of attack and record important insights as you encounter them. There’s no way to re-capture this knowledge from the code itself.
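To make the distinction concrete, here is a minimal sketch contrasting a "what" comment with a "why" comment. The data and the story behind it (a sensor that reports 250.0 when it loses power) are entirely hypothetical and exist only to illustrate the point.

```python
# Hypothetical temperature readings, invented for this example
readings = [12.1, 11.8, 250.0, 12.4]

# A "what" comment adds nothing the code doesn't already say:
# keep readings below 100

# A "why" comment records knowledge that is not in the code:
# the (hypothetical) sensor reports 250.0 when it loses power,
# so values of 100 or more are faults rather than measurements.
valid_readings = [reading for reading in readings if reading < 100]

print(valid_readings)
```

A future reader can work out from the code alone that large values are dropped, but only the second comment tells them whether that was deliberate.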

3.3. Keeping Track of Variables

You can always inspect an already-created object by typing its name into the interactive window:

primes

[1, 2, 3, 5, 7, 11, 13]

If you want to know what type of object it is, use type(object) in the interactive window like this:

type(primes)

list
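The same check works on any object. As a rough sketch, the block below collects the type names of a few example values; the list `examples` is invented for illustration, and `__name__` is used only to get the short name (such as `list`) that the interactive window displays.

```python
examples = [3, 2.5, "hello", [1, 2, 3], True]

# Look up the type of each example and keep just its short name
type_names = [type(example).__name__ for example in examples]

print(type_names)
```

This prints `['int', 'float', 'str', 'list', 'bool']`.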

Visual Studio Code has some powerful features to help you keep track of objects:

At the top of your interactive window, you should see a ‘Variables’ button. Click it to see a panel appear with all variables that you’ve defined.
Hover your mouse over variables you’ve previously entered into the interactive window; you will see a pop-up that tells you what type of object it is.
If you start typing a variable name into the interactive window, Visual Studio Code will try to auto-complete the name for you. Press the ‘tab’ key on your keyboard to accept the top option.

3.4. What’s In A Name?

Object (aka “variable”) names must do the following to be valid in Python:

start with a letter or the underscore character
not start with a number
only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _)

Object names in Python are case-sensitive too, so age, Age and AGE could all be three different variables.

When you’re naming objects, it’s best to make them descriptive so you can keep track of what they are. You’ll need to adopt a convention for multiple words. We recommend snake_case, where you separate lowercase words with _. For example, i_use_snake_case is a valid snake case name for an object.
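If you are unsure whether a name is allowed, you can ask Python. The sketch below checks a handful of made-up candidate names using the string method `isidentifier()` and the standard library's `keyword` module. One assumption goes beyond the rules listed above: reserved words such as `class` follow the character rules but still cannot be used as names, so the check excludes them too.

```python
import keyword

# Candidate names, invented for illustration
candidates = ["age", "_age", "2nd_age", "my-age", "i_use_snake_case", "class"]

usable_names = []
for name in candidates:
    # Valid characters AND not a word Python reserves for itself
    if name.isidentifier() and not keyword.iskeyword(name):
        usable_names.append(name)

print(usable_names)
```

Only `age`, `_age`, and `i_use_snake_case` pass: `2nd_age` starts with a number, `my-age` contains a hyphen, and `class` is reserved.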

Exercise

Try creating the object age and assigning it the value 10. What happens when you type Age into your console?

Remember that you can always inspect an object that you’ve created by typing its name again:

primes

[1, 2, 3, 5, 7, 11, 13]

Make another assignment:

this_is_a_really_long_name = 2.5

To save yourself time in inspecting this object via the interactive window, you can just begin typing the name (type “this”) and then hit the Tab key. Visual Studio Code will autocomplete what you’ve written using the variables you’ve defined during your session. This is a top tip to save time!

If you’re using the interactive console, rather than a notebook, there’s another top tip. Let’s say you previously ran this_is_a_really_long_name = 2.5 but you meant to set it to 3.5. Don’t despair; you don’t have to type it all out again. With your cursor in the interactive window, you can simply hit ↑ on your keyboard and cycle through previous commands you issued. Change 2.5 to 3.5, hit shift + return, and you’ll have redefined your variable.

Let’s define another variable:

py_variable = 2**3  # ** means "to the power of"; in Python, ^ is bitwise XOR

Now let’s try to inspect it:

py_variabl

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/Users/aet/Documents/git_projects/python4DS/workflow-basics.ipynb Cell 31 in ()
----> 1 py_variabl

NameError: name 'py_variabl' is not defined

This illustrates the brilliance and frustration of coding: your IDE (Visual Studio Code) will do tedious computations for you, but, in exchange, you must be precise in your instructions. If not, you’re likely to get an error that says the object you’re looking for was not found. Typos matter; Python can’t read your mind and say, “oh, they probably meant py_variable when they typed py_variabl”.
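When you hit a NameError, the fastest fix is usually to compare the name in the message against the names you actually defined. As a hedged sketch of that process, the block below triggers the same error on purpose, captures its message, and then uses the standard library's `difflib.get_close_matches` to look for near-misses. The list `defined_names` is written out by hand here purely for illustration.

```python
import difflib

py_variable = 2 ** 3

try:
    py_variabl  # deliberate typo: the final "e" is missing
except NameError as error:
    message = str(error)

# Names we know we defined, listed by hand for this example
defined_names = ["py_variable", "primes"]
suggestions = difflib.get_close_matches("py_variabl", defined_names)

print(message)
print(suggestions)
```

The suggestion list contains only `py_variable`, which is the name that was intended.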

3.5. Calling Functions

Python has a large number of built-in functions. You can also import functions from packages (like we did with np.sin) or define your own.

In coding, a function has inputs, it performs its function, and it returns any outputs. Let’s see a simple example of using a built-in function, sum():

sum(primes)

The general structure of a function call is the function name, followed by brackets that contain zero or more arguments. Sometimes there will also be keyword arguments. For example, sum() comes with a keyword argument, start, that tells the function to start counting from a specific number. Let’s see this in action by starting from ten:

sum(primes, start=10)
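The two calls above can be checked directly, and the same pattern of positional and keyword arguments applies to functions you define yourself. In the sketch below, `total_above` and its `threshold` argument are hypothetical names invented for this example. Passing `start` by keyword assumes Python 3.8 or later.

```python
primes = [1, 2, 3, 5, 7, 11, 13]

plain_total = sum(primes)              # start defaults to 0
shifted_total = sum(primes, start=10)  # begin counting from 10


def total_above(numbers, threshold=0):
    """Add up the numbers that are greater than threshold."""
    return sum(number for number in numbers if number > threshold)


print(plain_total, shifted_total)
print(total_above(primes), total_above(primes, threshold=5))
```

This prints `42 52` and then `42 31`. Because `threshold` has a default value, callers only need to supply it when they want something other than 0, just as with `start` in `sum()`.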

If you’re ever unsure of what a function does, you can call help() on it (itself a function):

help(sum)

Help on built-in function sum in module builtins:

sum(iterable, /, start=0)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.

Or, in Visual Studio Code, hover your mouse over the function name.

Just as with variables, code completion works on functions too. Try typing in su and hitting tab to see this in action.

You’ll need to be extra careful with objects that are strings (words, sentences, letters, and phrases), because these always need to come with quotation marks around them. You can use single or double quotation marks as you like, but i) the convention is double quotation marks, and ii) it’s good to be consistent, whichever you choose.

Here’s an example of some code that throws an error:

x = "hello

  Input In [3]
    x = "hello
        ^
SyntaxError: unterminated string literal (detected at line 1)

Again, Visual Studio Code can really help you out here because as soon as you open a double quotation mark, it will have the closing one ready for you.
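For comparison, here is a short sketch of strings that are terminated correctly. It also shows one practical reason to know both kinds of quotation mark: a string can contain one kind as long as it is wrapped in the other. The variable names are illustrative.

```python
greeting = "hello"        # double quotation marks (the convention)
same_greeting = 'hello'   # single quotation marks give the same string

# Wrap the string in whichever kind of mark it does not contain
quote = 'She said "hello" to me'
apostrophe = "It's a list"

print(greeting == same_greeting)
print(quote)
print(apostrophe)
```

The first line printed is `True`, since the choice of quotation mark does not change the string itself.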

3.6. Exercises

Why does this code not work?
```
my_variable = 10
my_varıable
```
Look carefully! This may seem like an exercise in pointlessness, but training your brain to notice even the tiniest difference will pay off when programming.

Tweak each of the following Python commands so that they run correctly:

import pandas as pd
from palmerpenguins import load_penguins
from lets_pot import *

LetsPlot.setup_html()
penguins = load_penguins()

(
    ggplot(
        dTA=penguins,
        maping=aes(x="flipper_length_mm", y="body_mass_g", color="species"),
    )
    + geom_smooth(method="lm)
)