Advanced Coding#

Introduction#

This chapter covers some slightly more advanced programming concepts you’re likely to run into when coding in Python, in addition to some recommended extensions and options for advanced use of Visual Studio Code. You probably won’t need much of this for everyday Python use but, if you start to develop packages yourself, a few of these topics (like classes and type hints) are invaluable.

This chapter has benefitted from the online book Research Software Engineering with Python, the official Python documentation, the excellent 30 days of Python, and the Hitchhiker’s Guide to Python.

Higher order functions#

Functions are like any other variable in Python, which means you can do some interesting things with them and, well, it can get a bit meta. For example, a function can take one or more functions as parameters, a function can be returned as a result of another function, functions can be defined within functions, a function can be assigned to a variable, and you can iterate over functions (for example, if they are in a list).

Here’s an example that shows how to use a higher order function: it accepts a function, f, as an argument and then, using the splat operator *, it accepts all arguments of that function.

def join_a_string(str_list):
    return " ".join(str_list)


def higher_order_function(f, *args):
    """Lowers case of result"""
    out_string = f(*args)
    return out_string.lower()


result = higher_order_function(join_a_string, ["Hello", "World!"])
print(result)
hello world!

In the next example, we show how to return a function from a function (assigning the returned function to a variable, result, in the process):

def square(x):
    return x**2


def cube(x):
    return x**3


def higher_order_function(kind):  # a higher order function returning a function
    if kind == "square":
        return square
    elif kind == "cube":
        return cube
    raise ValueError(f"Unknown kind: {kind!r}")


result = higher_order_function("square")
print(f"Using higher_order_function('square'), result(3) yields {result(3)}")
result = higher_order_function("cube")
print(f"Using higher_order_function('cube'), result(3) yields {result(3)}")
Using higher_order_function('square'), result(3) yields 9
Using higher_order_function('cube'), result(3) yields 27

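Functions can also be defined within functions; these are known as nested (or inner) functions. When an inner function also uses variables from the function that encloses it, it is known as a closure. Here’s a simple (if contrived) example of a nested function: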

from datetime import datetime


def print_time_now():
    def get_curr_time():
        return datetime.now().strftime("%H:%M")

    now = get_curr_time()
    print(now)


print_time_now()
00:58
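
In the example above, get_curr_time() does not use anything from the function that encloses it. The term closure is usually reserved for an inner function that remembers variables from its enclosing scope, even after the outer function has returned. Here is a minimal sketch; the names make_multiplier and multiply_by are hypothetical, chosen only for illustration:

```python
def make_multiplier(factor):
    """Return a function that multiplies its input by `factor`."""

    def multiply_by(x):
        # `factor` is not local to multiply_by; it is remembered
        # from the enclosing call to make_multiplier
        return x * factor

    return multiply_by


double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5), triple(5))
```

Each call to make_multiplier() produces a separate function with its own remembered value of factor.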

Finally, let’s see how to iterate over functions:

def square_root(x):
    return x ** (0.5)


functions_list = [square_root, square, cube]

for func in functions_list:
    print(f"{func.__name__} applied to 4 is {func(4)}")
square_root applied to 4 is 2.0
square applied to 4 is 16
cube applied to 4 is 64

Iterators#

An iterator is an object that contains a countable number of values that a single function, next(), steps through one at a time. Before that’s possible though, we need to take an iterable of some kind (for example, a list) and use the built-in iter() function on it to turn it into an iterator. Let’s see an example with some text:

text_lst = ["Mumbai", "Delhi", "Bangalore"]

myiterator = iter(text_lst)

Okay, nothing has happened yet, but that’s because we didn’t call it yet. To get the next iteration, whatever it is, use next():

next(myiterator)
'Mumbai'
next(myiterator)
'Delhi'
next(myiterator)
'Bangalore'

Alright, we’ve been through all of the values so… what’s going to happen next!?

next(myiterator)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-27-29fb3b4dbbec> in <module>
----> 1 next(myiterator)

StopIteration: 

Iterating beyond the end raises a StopIteration exception because there are no values left. To keep going indefinitely, use cycle() from the built-in itertools module in place of iter(). Note that you can build your own iterators (here we used a built-in object type, the list, to create an iterator of type list_iterator).
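
As a sketch of both points, the block below uses cycle() from the standard library’s itertools module to loop over the cities indefinitely, and then builds a small custom iterator. The class name Countdown is hypothetical; what makes it an iterator is that it defines the __iter__() and __next__() methods.

```python
from itertools import cycle


text_lst = ["Mumbai", "Delhi", "Bangalore"]

# cycle() starts again from the beginning instead of raising StopIteration
endless = cycle(text_lst)
first_four = [next(endless) for _ in range(4)]
print(first_four)


class Countdown:
    """Iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# list() (like a for loop) calls iter() and next() for us,
# and stops when StopIteration is raised
countdown_values = list(Countdown(3))
print(countdown_values)
```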

Generators#

Generator functions return ‘lazy’ iterators. They are lazy because they do not store their contents in memory. This has big advantages for some operations in specific situations: datasets larger than can fit into your computer’s memory, or a complex function that needs to maintain an internal state every time it’s called.

To give an idea of how and when they work, imagine that (exogenously) integers are really costly, taking as much as 10 MB of space to store (the real figure is more like 28 bytes for a small integer in 64-bit CPython). We will write a function that produces the first \(n\) non-negative integers, where \(n\) is large. The most naive possible way of doing this would be to build the full list in memory like so:

def first_n_naive(n):
    """Build and return a list"""
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums


sum_of_first_n = sum(first_n_naive(1000000))
sum_of_first_n
499999500000

Note that nums stores every number before returning all of them. In our imagined case, this is completely infeasible because we don’t have enough memory to keep all \(n\) 10 MB integers at once.

Now we’ll rewrite the list-based function as a generator-based function:

def first_n_generator(n):
    """A generator that yields items instead of returning a list"""
    num = 0
    while num < n:
        yield num
        num += 1


sum_of_first_n = sum(first_n_generator(1000000))
sum_of_first_n
499999500000

Now, instead of creating an enormous list that has to be stored in memory, we yield up each number as it is ‘generated’. The cleverness that’s going on here is that the ‘state’ of the function is remembered from one call to the next. This means that when next() is called on a generator object (either explicitly or implicitly, as in this example), the previously yielded variable num is incremented, and then yielded again.
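
To see this state being remembered, we can call next() by hand. This sketch redefines first_n_generator so that it runs on its own:

```python
def first_n_generator(n):
    """A generator that yields items instead of returning a list"""
    num = 0
    while num < n:
        yield num
        num += 1


gen = first_n_generator(3)
first = next(gen)  # runs until the first yield
second = next(gen)  # resumes after the yield, increments num, yields again
third = next(gen)
print(first, second, third)

# The generator is now exhausted, so the default value is returned
# instead of a StopIteration exception being raised
leftover = next(gen, "no more values")
print(leftover)
```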

That was a fairly contrived example but there are plenty of practical ones. Working with pipelines that process very large datasets is a classic use case. For example, imagine you have a csv file that’s far too big to fit in memory, i.e. open all at once, but you’d like to check the contents of each row and perhaps process them. The code below would yield each row in turn.

def csv_reader(file_name):
    # The with block closes the file once every row has been yielded
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row

An even more concise way of defining this is via a generator expression, which syntactically looks a lot like a list comprehension but is a generator rather than a list. The example we just saw would be written as:

csv_gen = (row for row in open(file_name))
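
To make this runnable end to end, the sketch below first writes a tiny CSV file to a temporary directory (the file name and its contents are made up for illustration) and then processes it one row at a time, so that only a single row is in memory at any moment.

```python
import os
import tempfile


def csv_reader(file_name):
    """Yield one line of the file at a time, without the trailing newline."""
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row.rstrip("\n")


# Create a small example file (hypothetical data)
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, "example.csv")
with open(file_name, "w") as f:
    f.write("city,population\nMumbai,12\nDelhi,11\n")

# Generators can be chained into a pipeline: nothing is read
# from the file until a value is actually requested
rows = csv_reader(file_name)
split_rows = (row.split(",") for row in rows)
header = next(split_rows)
cities = [fields[0] for fields in split_rows]
print(header, cities)
```

For real CSV files, the csv module in the standard library handles quoting and embedded commas, which a plain split(",") does not.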

It’s easier to see the difference in the example below, which clearly shows the analogy between list comprehensions and generator expressions.

sq_nums_lc = [num**2 for num in range(2, 6)]
sq_nums_lc
[4, 9, 16, 25]
sq_nums_gc = (num**2 for num in range(2, 6))
sq_nums_gc
<generator object <genexpr> at 0x7f34d15eccf0>

The latter is a generator object; its values are produced one at a time, either by calling next() on it or by looping over it.

next(sq_nums_gc)
4

Note that for small numbers of entries, lists may actually be faster and more efficient than generators, but for large numbers of entries, generators will almost always win out on memory use.
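
One way to get a feel for the memory difference is sys.getsizeof(), which reports the size in bytes of the container object itself (not of the items it refers to). This is a rough sketch; the exact numbers depend on your Python version and platform:

```python
import sys

n = 100_000
squares_list = [num**2 for num in range(n)]
squares_gen = (num**2 for num in range(n))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# The list grows with n; the generator object stays small
# regardless of how many values it will eventually produce
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# Both give the same answer when summed
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)
```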

Decorators#

Decorators ‘decorate’ functions: they adorn them, modifying how they behave when they execute. Let’s say we want to run some numerical functions but we’d like to add ten on to whatever results we get. We could do it like this:

def multiply(num_one, num_two):
    return num_one * num_two


def add_ten(in_num):
    return in_num + 10


answer = add_ten(multiply(3, 4))
answer
22

This is fine for a one-off but a bit tedious if we’re going to be using add_ten() a lot, and on many functions. Decorators allow for a more general solution that can be applied, in this case, to any function that has two arguments and returns a numeric value. The decorator wraps the original function in a new function, inner(), and returns that wrapper.

def add_ten(func):
    def inner(a, b):
        return func(a, b) + 10

    return inner


@add_ten
def multiply(num_one, num_two):
    return num_one * num_two


multiply(3, 4)
22

We can use the same decorator for a different function (albeit one of the same form) now.

@add_ten
def divide(num_one, num_two):
    return num_one / num_two


divide(10, 5)
12.0

But the magic of decorators is such that we can define them for much more general cases, regardless of the number of arguments or even keyword arguments:

def add_ten(func):
    def inner(*args, **kwargs):
        print("Function has been decorated!")
        print("Adding ten...")
        return func(*args, **kwargs) + 10

    return inner


@add_ten
def combine_three_nums(a, b, c):
    return a * b - c


@add_ten
def combine_four_nums(a, b, c, d=0):
    return a * b - c - d


combine_three_nums(1, 2, 2)
Function has been decorated!
Adding ten...
10

Let’s now see it applied to a function with a different number of (keyword) arguments:

combine_four_nums(3, 4, 2, d=2)
Function has been decorated!
Adding ten...
18
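
One side effect of wrapping a function like this is that the decorated function takes on the name and docstring of inner(). The standard library’s functools.wraps decorator copies them across from the original function. A sketch, using a cut-down version of add_ten (without the print statements):

```python
from functools import wraps


def add_ten(func):
    @wraps(func)  # copy __name__, __doc__, etc. from func onto inner
    def inner(*args, **kwargs):
        return func(*args, **kwargs) + 10

    return inner


@add_ten
def combine_three_nums(a, b, c):
    """Multiply a and b, then subtract c."""
    return a * b - c


print(combine_three_nums(1, 2, 2))
print(combine_three_nums.__name__)
```

Without @wraps(func), the last line would print inner rather than combine_three_nums.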

Decorators can be chained too (and order matters):

def dividing_line(func):
    def inner(*args, **kwargs):
        print("-" * 30)
        out = func(*args, **kwargs)
        return out

    return inner


@dividing_line
@add_ten
def multiply(num_one, num_two):
    return num_one * num_two


multiply(3, 5)
------------------------------
Function has been decorated!
Adding ten...
25
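
To see why order matters, note that the decorator closest to the function is applied first, so stacking @a above @b is the same as writing a(b(func)). The sketch below uses two numeric decorators, a silent version of add_ten and a hypothetical double, to make the difference visible:

```python
def add_ten(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs) + 10

    return inner


def double(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs) * 2

    return inner


@double
@add_ten
def multiply_a(num_one, num_two):
    # Equivalent to double(add_ten(multiply_a)): add ten, then double
    return num_one * num_two


@add_ten
@double
def multiply_b(num_one, num_two):
    # Equivalent to add_ten(double(multiply_b)): double, then add ten
    return num_one * num_two


print(multiply_a(3, 5))  # (15 + 10) * 2
print(multiply_b(3, 5))  # (15 * 2) + 10
```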

Classes and objects#

Python is an object-oriented programming language. Everything is an object (and every object has a type). A class is an object constructor, a blueprint for creating objects. An object is a ‘live’ instance of a class. Objects are to classes what a yellow VW Beetle is to cars. The class defines the attributes that its objects have and the methods that they can perform.

Classes and instances of them are useful in certain situations, the most common being when you need something that has ‘state’, i.e. it can remember things that have happened to it, carry information with it, and change form.

While you’re quite unlikely to need to build classes in economics (unless you’re doing something really fancy), some of the biggest Python packages are based around classes so it’s useful to understand a bit about how they work, and especially how they have state.

The syntax to create a class is

class ClassName:
  ...code...

But it’s easiest to show with an example:

# Define a class called Person


class Person:
    def __init__(self, name):
        self.name = name


# Create an instance of the class
p = Person("Adam")

When we check type(), that’s when it gets really interesting:

type(p)
__main__.Person

Woah! We created a whole new data type based on the class name. The class has a constructor method, __init__, that, in this case, takes an input variable name and assigns it to an instance attribute, also called name. The self variable that you can also see refers to the particular object being created (or, in other methods, the object that the method was called on); Python passes it in automatically. We can access any instance attributes like this:

p.name
'Adam'

Okay but what’s the point of all this? Well we can now create as many objects as we like of class ‘Person’ and they will have the same structure, but not the same state, as other objects of class ‘Person’.

m = Person("Ada")
m.name
'Ada'

This is a very boring class! Let’s add a method, which will allow us to change the state of objects. Here, we add a method increment_age which is also indented under the class Person header. Note that it takes self as an input, just like the constructor, but it only acts on objects of type Person that have already been created.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def increment_age(self):
        self.age = self.age + 1


# Create an instance of the class
p = Person("Adam", 231)

print(p.age)
# Call the method increment_age
p.increment_age()
print(p.age)
231
232

This very simple method changes the internal state. Just like class constructors and regular functions, methods can take arguments. Here’s an example:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def increment_age(self):
        self.age = self.age + 1

    def change_age(self, new_age):
        self.age = new_age


# Create an instance of the class
p = Person("Adam", 231)

print(p.age)
# Call the method change_age
p.change_age(67)
print(p.age)
231
67

It can be tedious to have to initialise a class with a whole load of parameters every time. Just like with functions, we can define default values for the constructor’s parameters:

class Person:
    def __init__(self, name="default_name", age=20):
        self.name = name
        self.age = age


p = Person()
p.name
'default_name'

That covers a lot of the basics of classes but if you’re using classes in anger then you might also want to look up inheritance and composition.
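
As a taster, here is a minimal sketch of both ideas. The Student and Address classes are hypothetical examples: Student inherits from Person (a student is a person), while Person gains an address through composition (a person has an address).

```python
class Address:
    def __init__(self, city):
        self.city = city


class Person:
    def __init__(self, name="default_name", age=20, address=None):
        self.name = name
        self.age = age
        # Composition: a Person *has an* Address
        self.address = address

    def describe(self):
        return f"{self.name}, aged {self.age}"


class Student(Person):
    """Inheritance: a Student *is a* Person with an extra attribute."""

    def __init__(self, name, age, course, address=None):
        # Reuse the parent constructor for the shared attributes
        super().__init__(name, age, address)
        self.course = course

    def describe(self):
        # Extend, rather than replace, the parent method
        return f"{super().describe()}, studying {self.course}"


s = Student("Ada", 36, "mathematics", address=Address("London"))
print(s.describe())
print(s.address.city)
print(isinstance(s, Person))
```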

Dataclasses#

The basic classes we created above come with a lot of ‘boilerplate’: code we need but which is not very surprising. Dataclasses were introduced in Python 3.7 as a way to remove this boilerplate when the classes being created are quite simple. Think of a dataclass as a class with sensible defaults that is for light object-oriented programming.

A simple example, with a Circle class, demonstrates why they are effective. First, the full class way of doing things:

import numpy as np


class Circle1:
    def __init__(self, colour: str, radius: float) -> None:
        self.colour = colour
        self.radius = radius

    def area(self) -> float:
        return np.pi * self.radius**2


circle1 = Circle1("red", 2)
circle1
<__main__.Circle1 at 0x7f34d15b15d0>

We don’t get a very informative message when we call circle1, as you can see. At least we can compute its area:

circle1.area()
12.566370614359172

Now we’ll create the same object with dataclasses:

from dataclasses import dataclass


@dataclass
class Circle2:
    colour: str
    radius: float

    def area(self) -> float:
        return np.pi * self.radius**2


circle2 = Circle2("blue", 2)
circle2
Circle2(colour='blue', radius=2)

Right away we get a much more informative message when we call the object, and the class definition is a whole lot simpler. Everything else is just the same (just try calling circle2.area()).
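
The informative message is not the only thing that @dataclass generates. It also writes an __eq__() method, so two objects with the same field values compare as equal, and fields can be given default values. A sketch, with a hypothetical Circle3 class that uses math.pi so that it needs nothing beyond the standard library:

```python
import math
from dataclasses import dataclass


@dataclass
class Circle3:
    colour: str
    radius: float = 1.0  # default value, used if no radius is given

    def area(self) -> float:
        return math.pi * self.radius**2


a = Circle3("green", 2)
b = Circle3("green", 2)
unit = Circle3("green")

print(a == b)  # compares field by field
print(unit)
print(round(a.area(), 2))
```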

Type annotations and type checkers#

Type annotations were introduced in Python 3.5, with the syntax for annotating variables following in Python 3.6 (these notes are written in 3.8). If you’ve seen more low-level languages, static typing will be familiar to you. Python uses ‘duck typing’ (“if it walks and quacks like a duck, it is a duck”) which means that if a variable walks like an integer, and talks like an integer, then it gets treated as if it is an integer. Ditto for other variable types. Duck typing is useful if you just want to code quickly and aren’t writing production code.

But… there are times when you do know what variable types you’re going to be dealing with ahead of time and you want to prevent the propagation of the wrong kinds of variable types. In these situations, you can clearly say what variable types are supposed to be. And, when used with some other packages, typing can make code easier to understand, debug, and maintain.

Note that it doesn’t have to be all or nothing on type checking: you can add it in gradually or where you think it’s most important.

Now it’s important to be really clear on one point, namely that Python does not enforce type annotations. But we can use static type checking to ensure all types are as they should be in advance of running. Before we do that, let’s see how we add type annotations.

This is the simplest example of a type annotation:

answer: int = 42

This explicitly says that answer is an integer. Type annotations can be used in functions too:

def increment(number: int) -> int:
    return number + 1
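
To underline the point that Python itself does not enforce annotations, the sketch below calls increment() with a float. It runs without complaint at runtime, even though a static type checker would be expected to flag the call. The annotations are simply stored on the function:

```python
def increment(number: int) -> int:
    return number + 1


# A float is passed where an int is annotated; Python does not object
result = increment(1.5)
print(result)

# The annotations are kept as metadata in a dictionary
print(increment.__annotations__)
```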

A static type checker uses these type annotations to verify the type correctness of a program without executing it. mypy is one of the most widely used static type checkers. After installing mypy, to run type checking on a file code_script.py use

mypy code_script.py

on the command line.

What do you see when you run it? Let’s say the content of your script is:

# Contents of code_script.py
def greeting(name: str) -> str:
    return 'Hello ' + name


greeting(3)

This would return:

Argument 1 to "greeting" has incompatible type "int"; expected "str"

Here are more of the type annotations that you might need or come across, courtesy of the mypy documentation:

from typing import List, Set, Dict, Tuple, Optional

# For simple built-in types, just use the name of the type
x: int = 1
x: float = 1.0
x: bool = True
x: str = "test"
x: bytes = b"test"

# For collections, the type of the collection item is in brackets
# (Python 3.9+ only)
x: list[int] = [1]
x: set[int] = {6, 7}

# In Python 3.8 and earlier, the name of the collection type is
# capitalized, and the type is imported from 'typing'
x: List[int] = [1]
x: Set[int] = {6, 7}

# Same as above, but with type comment syntax (Python 3.5 and earlier)
x = [1]  # type: List[int]

# For mappings, we need the types of both keys and values
x: dict[str, float] = {'field': 2.0}  # Python 3.9+
x: Dict[str, float] = {'field': 2.0}

# For tuples of fixed size, we specify the types of all the elements
x: tuple[int, str, float] = (3, "yes", 7.5)  # Python 3.9+
x: Tuple[int, str, float] = (3, "yes", 7.5)

# For tuples of variable size, we use one type and ellipsis
x: tuple[int, ...] = (1, 2, 3)  # Python 3.9+
x: Tuple[int, ...] = (1, 2, 3)

# Use Optional[] for values that could be None
x: Optional[str] = some_function()
# Mypy understands a value can't be None in an if-statement
if x is not None:
    print(x.upper())
# If a value can never be None due to some invariants, use an assert
assert x is not None
print(x.upper())

I am the Walrus#

The Walrus operator, :=, was introduced in Python 3.8 and, well, it’s fairly complicated but it does have its uses. The main use case for the Walrus operator is when you want to both evaluate an expression and assign a variable in one fell swoop.

Take this (trivial) example, which involves evaluating a condition, len(a) > 3, that returns a boolean and then assigning the value of len(a) to a variable n:

a = [1, 2, 3, 4]
if len(a) > 3:
    n = len(a)
    print(f"List is too long ({n} elements, expected <= 3)")
List is too long (4 elements, expected <= 3)

The Walrus operator allows us to skip the clumsy use of len(a) twice and do both steps in one go. As noted, that’s trivial here, but if evaluation were very computationally expensive, then this might save us some trouble. Here’s the version with the Walrus operator:

a = [1, 2, 3, 4]
if (n := len(a)) > 3:
    print(f"List is too long ({n} elements, expected <= 3)")
List is too long (4 elements, expected <= 3)
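
Another common pattern is in a while loop, where the value being tested is also needed inside the loop body. A sketch, in which the list pending and the batch size are made up for illustration:

```python
pending = [1, 2, 3, 4, 5, 6, 7]
batch_size = 3
batches = []

# Take a slice, assign it to `batch`, and keep looping while it is non-empty
while batch := pending[:batch_size]:
    batches.append(batch)
    pending = pending[batch_size:]

print(batches)
```

Without the Walrus operator, the slice would have to be taken once before the loop and again at the end of each pass.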

Map, filter, and reduce#

Map, filter, and reduce are higher order functions: map() and filter() are built in, while reduce() lives in the standard library’s functools module. Lambda functions, featured in the basics of coding chapter, can be passed into each of these as an argument and some of the best use cases of lambda functions are in conjunction with map, filter, and reduce.

Map#

map() takes a function and an iterable as arguments, i.e. the syntax is map(function, iterable). An iterable is an object composed of elements that can be iterated over. map() applies the function to each entry in the iterable and returns a lazy iterator over the results, which is why we wrap it in list() below to see them. Here’s an example where a list of strings is cast to integers via map():

numbers_str = ["1", "2", "3", "4", "5"]
mapped_result = map(int, numbers_str)
list(mapped_result)
[1, 2, 3, 4, 5]

Here’s an example with a lambda function. The benefit of using a lambda in this map operation is that otherwise we would have to write a whole function that simply returned the input with .title() at the end:

names = ["robinson", "fawcett", "ostrom"]
names_titled = map(lambda name: name.title(), names)
list(names_titled)
['Robinson', 'Fawcett', 'Ostrom']

Filter#

filter() calls a specified function, which should return a boolean, on each item of the specified iterable, and keeps only the items for which the function returns True. It uses the filter(function, iterable) syntax. In the example below, we take all the numbers from zero to five and filter them according to whether they are divisible by 2:

numbers = list(range(6))
fil_result = filter(lambda x: x % 2 == 0, numbers)
list(fil_result)
[0, 2, 4]
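
Both map() and filter() return lazy iterators, which is why the examples wrap them in list(). The same results can often be written as a list comprehension; which to prefer is largely a matter of taste. A sketch:

```python
numbers = list(range(6))

evens_filter = filter(lambda x: x % 2 == 0, numbers)
evens_comp = [x for x in numbers if x % 2 == 0]

# The filter object produces values only when asked
first_even = next(evens_filter)
remaining_evens = list(evens_filter)  # whatever has not been consumed yet

print(first_even, remaining_evens)
print(evens_comp)
```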

Reduce#

reduce() is defined in the functools module, which is part of the standard library. Like map() and filter(), reduce() takes a function and an iterable as parameters. However, it returns a single value rather than another iterable. The way reduce() works is to apply operations successively so that the example below effectively first sums 2 and 3 to make 5, then 5 and 5 to make 10, then 10 and 15 to make 25, and, finally, 25 and 20 to make the final result of 45.

from functools import reduce

numbers = [2, 3, 5, 15, 20]

reduce(lambda x, y: x + y, numbers)
45
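
reduce() also accepts an optional third argument, an initial value, which is used as the starting point and is returned as-is if the iterable is empty. A sketch using operator.add from the standard library in place of the lambda:

```python
from functools import reduce
from operator import add

numbers = [2, 3, 5, 15, 20]

# Same as the lambda version
total = reduce(add, numbers)
# Start the running total from 100 instead of the first element
total_from_100 = reduce(add, numbers, 100)
# With an initial value, an empty iterable does not raise an error
empty_total = reduce(add, [], 0)

print(total, total_from_100, empty_total)
```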

Non-local variables#

Non-local variables are used in nested functions as a means to say ‘hey, this variable is not just local to this nested function, it exists outside it too’. Here’s an example that prints “world” because we tell the inner function to use the same x as the outer function:

def outer_function():
    x = "hello"

    def nested_function():
        nonlocal x
        x = "world"

    nested_function()
    return x


print(outer_function())
world

Exercise

Re-write the above function without the nonlocal keyword. What does it print?

Multiple dispatch#

One can use object-oriented methods and inheritance to get different code objects to behave in different ways depending on the type of input. For example, a different behaviour might occur if you send a string into a function versus an integer. An alternative to the object-oriented approach is to use multiple dispatch. fastcore is a library that provides “goodies to make your coding faster, easier, and more maintainable” and has many neat features but amongst the goodies is multiple dispatch, with the typedispatch decorator. The example below doesn’t execute but shows you how the library can be used to define different behaviours for inputs of different types.

# fastcore is designed to be imported as *
from fastcore.dispatch import *


@typedispatch
def func_example(x: int, y: float):
    return x + y


@typedispatch
def func_example(x: int, y: int):
    return x * y


# Int and float
print(func_example(5, 5.0))

# Int and int
print(func_example(5, 5))

What we can see here is that we have the same function name, func_example, defined twice with very similar inputs. But the inputs are not the same; in the first instance it’s an integer and a float while in the second it’s two integers. The different inputs get routed into the different versions of the @typedispatch function. This decorator-based approach is not the only way to use fastcore to do typed dispatch but it’s one of the most convenient.
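
If you would like something similar that runs without installing anything, the standard library offers functools.singledispatch. It is a different and more limited tool than fastcore’s typedispatch: it dispatches on the type of the first argument only. A sketch, with a hypothetical function describe:

```python
from functools import singledispatch


@singledispatch
def describe(x):
    # Fallback for types with no registered implementation
    return f"something else: {x!r}"


@describe.register
def _(x: int):
    return f"an integer: {x}"


@describe.register
def _(x: str):
    return f"a string: {x}"


print(describe(5))
print(describe("five"))
print(describe(5.0))
```

The type annotation on each registered function tells singledispatch which type that version handles; anything unregistered falls through to the original definition.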

Fine-tuning Visual Studio Code#

This section has tips on making your IDE, Visual Studio Code, even more effective.

VS Code can do a whole lot more with some extra add-ons. You can install these using the extensions tab on the left hand side of VS Code. Here are the ones this book recommends and why:

  • Markdown extensions - markdown is a simple text language that is often used to provide READMEs for code repositories. It comes with the file extension .md

    • Markdown All in One, to help writing Markdown docs.

    • Markdown Preview Enhanced, to view rendered markdown as you type it (right click and select ‘Open Preview…’).

  • Coding extensions

    • Jupyter provides support for Jupyter Notebooks

    • indent-rainbow, gives different levels of indentation different colours for ease of reading.

    • Path Intellisense, autocompletes filenames in code.

    • autoDocstring, automatically creates a basic doc string whenever you write a function.

  • Version control

    • Git History, view and search your git log, along with a graph of git commits and their details.

    • GitLens, helps to visualise code authorship at a glance via ‘Git blame’ annotations, navigate and explore Git repositories, and more.

    • Git Graph, view a graph of your commit history.

    • Code Spell Checker, does exactly what it says, really useful for avoiding mangled variable name errors. If you need it to use, for example, ‘British English’, change the ‘C Spell: Language’ text from ‘en’ to ‘en-GB’ in VS Code’s settings. Other languages are available as separate extensions.

  • General

    • Rainbow CSV, uses colour to make plain old CSV files much more readable in VS Code.

    • vscode-icons, intelligent icons for your files as seen in the VS Code file explorer, e.g. a folder called data gets an icon showing a disc drive superimposed onto a folder.

    • polacode, take pictures of code snippets to share on social media

    • Excel viewer, does what it says

    • Selection Word Count, calculates and displays the word count of a document and, when there is a selection, the word count of a selection (both are shown in the status bar)

    • LiveShare, to collaborate on code with someone else in real-time

  • LaTeX - it’s a bit of a surprise, but VS Code is one of the best LaTeX editors out there. You will need LaTeX installed already though and initial setup of a compilation ‘recipe’ is a bit fiddly (though, once it works, it’s dreamy).

    • LaTeX Workshop, provides core features for LaTeX typesetting with Visual Studio Code.

    • LaTeX Preview, both in-line and side-by-side previews of LaTeX code. A really fantastic extension.

There are some extensions that most people won’t need but which experienced coders may find useful:

  • GitHub Pull Request - allows you to review and manage GitHub pull requests and issues in Visual Studio Code

  • Remote development — allows you to open any folder in: a container, a remote machine, or the Windows Subsystem for Linux (WSL)

  • Remote - WSL — run VS Code in the Windows Subsystem for Linux

  • Remote - SSH — run VS Code over an SSH connection, eg in the cloud

  • Remote - Container — run VS Code in a Docker container

  • Docker - makes it easy to build, manage, and deploy Docker containers from Visual Studio Code

A useful setting (under the cog icon, then settings) for coding is to change the ‘Editor: Render Whitespace’, aka editor.renderWhitespace, from ‘selection’ to ‘boundary’. This will now show any boundary whitespace, or more than one instance of whitespace contiguously, as a grey dot. This might seem odd but it’s really useful because the wrong amount of whitespace can create problems with code.