More Coding#

Introduction#

This chapter covers more programming, building on Coding Basics. Some of it will come in useful as you do more in code.

This chapter has benefitted from the online book Research Software Engineering with Python, the official Python documentation, the excellent 30 days of Python, and the Hitchhiker’s Guide to Python.

Sets#

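A set in coding is a collection of unordered and unindexed distinct elements (in analogy to the mathematical definition of a set). To define an empty set, use set(); note that a pair of empty curly braces creates an empty dictionary rather than a set:

st = set()
# {} on its own creates an empty dictionary, not a set
not_a_set = {}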

These aren’t very interesting though! Here’s a set with some values in:

people_set = {"Robinson", "Fawcett", "Ostrom"}

What can we do with it? We can check its length using len(people_set) and we can ask whether a particular entry is contained within it:

"Ostrom" in people_set
True

We can add multiple items or another set in place using .update (.union instead returns a new set and leaves the original unchanged), or a single item using:

people_set.add("Martineau")
people_set
{'Fawcett', 'Martineau', 'Ostrom', 'Robinson'}

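We can remove a specific entry with .remove(entry_name) or, to remove and return an arbitrary entry, .pop() (sets are unordered, so there is no ‘last’ entry). You can easily convert between lists and sets, though the order of the resulting list is not guaranteed: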

list(people_set)
['Robinson', 'Fawcett', 'Ostrom', 'Martineau']

The real benefits of sets are that they support set operations, though. The most important are intersection,

st1 = {"item1", "item2", "item3", "item4"}
st2 = {"item3", "item2"}
st1.intersection(st2)
{'item2', 'item3'}

difference,

st1 = {"item1", "item2", "item3", "item4"}
st2 = {"item2", "item3"}
st1.difference(st2)
{'item1', 'item4'}

and symmetric difference,

st1 = {"item1", "item2", "item3", "item4"}
st2 = {"item2", "item3"}
st2.symmetric_difference(st1)
{'item1', 'item4'}
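The same operations are also available as operators, and a pair of partially overlapping sets makes the distinction between difference and symmetric difference easier to see. This is a small sketch; the names left and right and their contents are made up for illustration.

```python
# Hypothetical sets chosen so that each has an element the other lacks
left = {"item1", "item2", "item3"}
right = {"item2", "item3", "item4"}

common = left & right        # intersection: elements in both sets
only_left = left - right     # difference: in left but not in right
in_one_only = left ^ right   # symmetric difference: in exactly one set
combined = left | right      # union: in either set

print(common)
print(only_left)
print(in_one_only)
print(combined)
```

Here the difference contains only item1, while the symmetric difference contains both item1 and item4.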

Truthy and falsy values#

Python objects can be used in expressions that will return a boolean value, such as when a list, listy, is used with if listy. Built-in Python objects that are empty are usually evaluated as False, and are said to be ‘falsy’. In contrast, when these built-in objects are not empty, they evaluate as True and are said to be ‘truthy’.

(If you are building your own classes, you can define this behaviour for them through the __bool__ dunder method.)
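As a rough sketch of how that might look, the class below (Basket is a hypothetical name, invented purely for illustration) counts as truthy only when it holds at least one item.

```python
class Basket:
    """Hypothetical container used only to illustrate __bool__."""

    def __init__(self, items=None):
        self.items = list(items) if items is not None else []

    def __bool__(self):
        # A basket is truthy only when it holds at least one item
        return len(self.items) > 0


empty_basket = Basket()
full_basket = Basket(["apple", "pear"])

print(bool(empty_basket))
print(bool(full_basket))
```

If a class defines neither __bool__ nor __len__, its instances are always truthy.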

Let’s see some examples:

def bool_check_var(input_variable):
    if not (input_variable):
        print("Falsy")
    else:
        print("Truthy")


listy = []
other_listy = [1, 2, 3]


bool_check_var(listy)
Falsy
bool_check_var(other_listy)
Truthy

The function we defined doesn’t just operate on lists; it’ll work for many other truthy and falsy objects:

bool_check_var(0)
Falsy
bool_check_var([0, 0, 0])
Truthy

Note that zero is falsy, as it’s the ‘nothing’ of the numeric types, but a list of three zeros is not an empty list, so it evaluates as truthy.

bool_check_var({})
Falsy
bool_check_var(None)
Falsy

Knowing what is truthy or falsy is useful in practice; imagine you’d like to default to a specific behaviour if a list called list_vals doesn’t have any values in. You now know you can do it simply with if not list_vals.
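Here is one way that default behaviour could be sketched. The function mean_or_default and its default argument are assumptions made up for this example rather than anything built in.

```python
def mean_or_default(list_vals, default=0.0):
    """Return the mean of list_vals, or default if the list is empty."""
    if not list_vals:
        # An empty list is falsy, so we fall back to the default
        return default
    return sum(list_vals) / len(list_vals)


print(mean_or_default([]))
print(mean_or_default([2, 4, 6]))
```

Be a little careful with this pattern: None is also falsy, so passing None would take the default branch too.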

Lambda functions#

Lambda functions are a very old idea in programming, and are part of the functional programming paradigm. Coding languages tend to be more imperative (a family that includes object-oriented languages) or more functional, with the imperative approach tracing back to Alan Turing’s “Turing machines” and the functional approach to Alonzo Church’s “lambda calculus”. These two approaches are mathematically equivalent and, on a more practical note, high-level programming languages often mix both. As examples, Haskell is strongly a functional language, statistics language R leans toward being more functional, Python is slightly more object-oriented, and powerhouse languages like Fortran and C are imperative. However, despite being less functional than some languages, Python does have lambda functions, for example:

plus_one = lambda x: x + 1
plus_one(3)
4

For a one-liner function that has a name it’s actually better practice here to use def plus_one(x): return x + 1, so you shouldn’t see this form of lambda function too much in the wild. However, you are likely to see lambda functions being used with dataframes and other objects. For example, if you had a dataframe with a column of strings called ‘strings’ that you want to change to “Title Case” and replace one phrase with another, you could use lambda functions to do that (there are better ways of doing this but this is useful as a simple example):

import pandas as pd

df = pd.DataFrame(
    data=[["hello my blah is Ada"], ["hElLo mY blah IS Adam"]],
    columns=["strings"],
    dtype="string",
)
df["strings"].apply(lambda x: x.title().replace("Blah", "Name"))
0     Hello My Name Is Ada
1    Hello My Name Is Adam
Name: strings, dtype: object

More complex lambda functions can be constructed, e.g. lambda x, y, z: x + y + z. One of the best use cases of lambdas is when you don’t want to go to the trouble of declaring a function. For example, let’s say you want to compose a series of functions and you want to specify those functions in a list, one after the other. Using def alone, you’d have to define a named function for each operation. With lambdas, it would look like this (again, there are easier ways to do this operation, but we’ll use simple functions to demonstrate the principle):

number = 1
for func in [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]:
    number = func(number)
    print(number)
2
4
16

Note that people often use x by convention, but there’s nothing to stop you writing lambda horses: horses**2 (apart from the looks your co-authors will give you).
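Another place you are likely to meet lambdas is as the key argument of built-in functions such as sorted and max, where a tiny throwaway function says what to compare. The sketch below uses made-up (name, year) pairs purely for illustration.

```python
# Illustrative (name, year) pairs; the data are assumptions for this example
people = [("Robinson", 1903), ("Fawcett", 1847), ("Ostrom", 1933)]

# Sort by the second element of each tuple rather than the first
by_year = sorted(people, key=lambda person: person[1])
print(by_year)

# Pick out the tuple with the longest name
longest_name = max(people, key=lambda person: len(person[0]))
print(longest_name)
```

Because the function is only needed in one place, giving it a name with def would add little.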

Exercise

Write a lambda function that takes the square root of an input number.

If you want to learn more about lambda functions, check out these short video tutorials.

Splat and splatty-splat#

You read those right, yes. These are also known as “unpacking operators”: they unpack iterables into a function’s positional arguments and dictionaries into its keyword arguments respectively. Splat is * and splatty-splat is **. Because they unpack, they allow us to efficiently send packages of arguments or keyword arguments into functions without laboriously writing out every single argument.

Inside a function, extra positional arguments are collected as a tuple and extra keyword arguments as a dictionary of key, value pairs. When calling a function, * can unpack any iterable (a tuple is the most common choice), while ** must be accompanied by a dictionary or other mapping whose keys are strings.

Let’s take a look at splat, which unpacks tuples into function arguments. If we have a function that takes two arguments we can send variables to it in different ways:

def add(a, b):
    return a + b


print(add(5, 10))

func_args = (6, 11)

print(add(*func_args))
15
17

The splat operator, *, unpacks the variable func_args into two different function arguments.
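Splat is not restricted to tuples when calling a function. The short sketch below, which redefines add so that it runs on its own, shows a list and a range being unpacked in the same way, and splat being used to build a new list.

```python
def add(a, b):
    return a + b


# A list and a range unpack in just the same way as a tuple
print(add(*[6, 11]))
print(add(*range(2)))

# Splat also works inside a list display, here merging two iterables
merged = [*range(3), *"ab"]
print(merged)
```

The number of unpacked items still has to match what the function expects, so add(*range(3)) would raise a TypeError.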

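Perhaps surprisingly, we can use the splat operator in the definition of a function. For example, sum_elements below accepts any number of positional arguments and collects them into a tuple called elements:

def sum_elements(*elements):
    # elements is a tuple of all the positional arguments
    return sum(elements)


nums = (1, 2, 3)

print(sum_elements(*nums))

more_nums = (1, 2, 3, 4, 5)

print(sum_elements(*more_nums))
6
15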

Exercise

Write a function multiply that multiplies two input numbers, a and b, together and returns the answer. Send the argument (10, 12) to it using the splat operator.

Splatty-splat, **, unpacks dictionaries into keyword arguments (aka kwargs):

def function_with_kwargs(a, x=0, y=0, z=0):
    return a + x + y + z


print(function_with_kwargs(5))

kwargs = {"x": 3, "y": 4, "z": 5}

print(function_with_kwargs(5, **kwargs))
5
17
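Splatty-splat can also appear in a function definition, where it collects any extra keyword arguments into a dictionary. The function describe_order and its argument names below are hypothetical, invented just to sketch the idea.

```python
def describe_order(item, **options):
    """Hypothetical function: options collects any extra keyword arguments."""
    # Inside the function, options is an ordinary dictionary
    extras = ", ".join(f"{key}={value}" for key, value in options.items())
    return f"{item} ({extras})" if extras else item


print(describe_order("coffee"))
print(describe_order("coffee", size="large", milk="oat"))
```

This is handy when a function needs to pass a flexible set of settings on to another function.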

Exercise

Using a dictionary and splatty-splat with the function_with_kwargs function, find the sum of 9, 6, 13, and 2.

Higher order functions#

Functions are like any other variable in Python, which means you can do some interesting things with them and, well, it can get a bit meta. For example, a function can take one or more functions as parameters, a function can be returned as a result of another function, functions can be defined within functions, a function can be assigned to a variable, and you can iterate over functions (for example, if they are in a list).

Here’s an example that shows how to use a higher order function: it accepts a function, f, as an argument and then, using the splat operator *, it accepts all arguments of that function.

def join_a_string(str_list):
    return " ".join(str_list)


def higher_order_function(f, *args):
    """Lowers case of result"""
    out_string = f(*args)
    return out_string.lower()


result = higher_order_function(join_a_string, ["Hello", "World!"])
print(result)
hello world!

In the next example, we show how to return a function from a function (assigning the returned function to a variable, result, in the process):

def square(x):
    return x ** 2


def cube(x):
    return x ** 3


def higher_order_function(kind):  # a higher order function returning a function
    # "kind" avoids shadowing the built-in function "type"
    if kind == "square":
        return square
    elif kind == "cube":
        return cube


result = higher_order_function("square")
print(f"Using higher_order_function('square'), result(3) yields {result(3)}")
result = higher_order_function("cube")
print(f"Using higher_order_function('cube'), result(3) yields {result(3)}")
Using higher_order_function('square'), result(3) yields 9
Using higher_order_function('cube'), result(3) yields 27

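Functions can be defined within other functions; these are known as nested (or inner) functions. When an inner function also uses variables from the enclosing function, it is called a closure. Here’s a simple (if contrived) example of a nested function: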

from datetime import datetime


def print_time_now():
    def get_curr_time():
        return datetime.now().strftime("%H:%M")

    now = get_curr_time()
    print(now)


print_time_now()
18:04
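A nested function becomes more useful when it remembers variables from the function that created it, even after that outer function has finished running. The sketch below uses a hypothetical make_multiplier function to illustrate this.

```python
def make_multiplier(factor):
    """Hypothetical factory that returns a function remembering factor."""

    def multiply_by(x):
        # factor is not an argument here; it is captured from the enclosing call
        return x * factor

    return multiply_by


double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5))
print(triple(5))
```

Each returned function carries its own copy of factor, so double and triple behave independently.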

Finally, let’s see how to iterate over functions:

def square_root(x):
    return x ** (0.5)


functions_list = [square_root, square, cube]

for func in functions_list:
    print(f"{func.__name__} applied to 4 is {func(4)}")
square_root applied to 4 is 2.0
square applied to 4 is 16
cube applied to 4 is 64

Iterators#

An iterator is an object that contains a countable number of values that a single function, next, steps through. Before that’s possible though, we need to take an iterable of some kind and use the built-in iter function on it to turn it into an iterator. Let’s see an example with some text:

text_lst = ["Mumbai", "Delhi", "Bangalore"]

myiterator = iter(text_lst)

Okay, nothing has happened yet, but that’s because we didn’t call it yet. To get the next iteration, whatever it is, use next:

next(myiterator)
'Mumbai'
next(myiterator)
'Delhi'
next(myiterator)
'Bangalore'

Alright, we’ve been through all of the values so… what’s going to happen next!?

next(myiterator)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-27-29fb3b4dbbec> in <module>
----> 1 next(myiterator)

StopIteration: 

Iterating beyond the end raises a StopIteration error because we reached the end. To keep going indefinitely, use cycle from the built-in itertools package in place of iter. Note that you can build your own iterators (here we used a built-in object type, the list, to create an iterator of type list_iterator).
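To give a flavour of building your own iterator, here is a minimal sketch. The class name Countdown is made up for this example; the two special methods, __iter__ and __next__, are what Python looks for.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(next(countdown))
print(list(countdown))
```

The call to list picks up from wherever the iterator had got to, which is why it does not include the first value.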

Generators#

Generator functions return ‘lazy’ iterators. They are lazy because they do not store their contents in memory. This has big advantages for some operations in specific situations: datasets larger than can fit into your computer’s memory, or a complex function that needs to maintain an internal state every time it’s called.

To give an idea of how and when they work, imagine that (exogenously) integers are really costly, taking as much as 10 MB of space to store (the real figure is more like 28 bytes for a small integer in CPython). We will write a function that produces the first \(n\) non-negative integers, where \(n\) is large. The most naive possible way of doing this would be to build the full list in memory like so:

def first_n_naive(n):
    """Build and return a list"""
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums


sum_of_first_n = sum(first_n_naive(1000000))
sum_of_first_n
499999500000

Note that nums stores every number before returning all of them. In our imagined case, this is completely infeasible because we don’t have enough computer space to keep all \(n\) 10MB integers in memory.

Now we’ll rewrite the list-based function as a generator-based function:

def first_n_generator(n):
    """A generator that yields items instead of returning a list"""
    num = 0
    while num < n:
        yield num
        num += 1


sum_of_first_n = sum(first_n_generator(1000000))
sum_of_first_n
499999500000

Now, instead of creating an enormous list that has to be stored in memory, we yield up each number as it is ‘generated’. The cleverness that’s going on here is that the ‘state’ of the function is remembered from one call to the next. This means that when next is called on a generator object (either explicitly or implicitly, as in this example), the previously yielded variable num is incremented, and then yielded again.
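To see the remembered state more directly, we can call next by hand on a small generator. This sketch repeats the definition of first_n_generator so that it runs on its own.

```python
def first_n_generator(n):
    num = 0
    while num < n:
        yield num
        num += 1


gen = first_n_generator(3)
print(next(gen))  # runs the function body until the first yield
print(next(gen))  # resumes after the yield, increments num, yields again
print(list(gen))  # consumes whatever is left
print(list(gen))  # an exhausted generator has nothing more to give
```

Note that a generator can only be consumed once; to go through the values again you need to call the generator function again.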

That was a fairly contrived example but there are plenty of practical ones. Working with pipelines that process very large datasets is a classic use case. For example, imagine you have a csv file that’s far too big to fit in memory, i.e. open all at once, but you’d like to check the contents of each row and perhaps process them. The code below would yield each row in turn.

def csv_reader(file_name):
    # The with block closes the file once the rows are exhausted
    with open(file_name, "r") as file:
        for row in file:
            yield row

An even more concise way of defining this is via a generator expression, which syntactically looks a lot like a list comprehension but is a generator rather than a list. The example we just saw would be written as:

csv_gen = (row for row in open(file_name))

It’s easier to see the difference in the below example, which clearly shows the analogy between list comprehensions and generator expressions.

sq_nums_lc = [num ** 2 for num in range(2, 6)]
sq_nums_lc
[4, 9, 16, 25]
sq_nums_gc = (num ** 2 for num in range(2, 6))
sq_nums_gc
<generator object <genexpr> at 0x7f875f7d44a0>

The latter is a generator object; we can access its values one at a time by calling next on it, or by looping over it.

next(sq_nums_gc)
4

Note that for small numbers of entries, lists may actually be faster than generators, but for large numbers of entries, generators will almost always win out on memory use.
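One rough way to see the memory difference is sys.getsizeof, which reports the size in bytes of the container object itself. The sketch below compares a list comprehension with a generator expression; the exact numbers will vary by Python version and platform, so none are shown.

```python
import sys

n = 1_000_000
squares_list = [num ** 2 for num in range(n)]
squares_gen = (num ** 2 for num in range(n))

# getsizeof measures only the container; for the list this
# excludes the integer objects that the list points to
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# Both give the same total when consumed
print(sum(squares_list) == sum(squares_gen))
```

The generator stays the same small size however large n becomes, because it only ever holds its current state.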

Decorators#

Decorators ‘decorate’ functions: they adorn them, modifying their behaviour when they are called. Let’s say we want to run some numerical functions but we’d like to add ten on to whatever results we get. We could do it like this:

def multiply(num_one, num_two):
    return num_one * num_two


def add_ten(in_num):
    return in_num + 10


answer = add_ten(multiply(3, 4))
answer
22

This is fine for a one-off but a bit tedious if we’re going to be using add_ten a lot, and on many functions. Decorators allow for a more general solution that can be applied, in this case, to any inner function that has two arguments and returns a numeric value.

def add_ten(func):
    def inner(a, b):
        return func(a, b) + 10

    return inner


@add_ten
def multiply(num_one, num_two):
    return num_one * num_two


multiply(3, 4)
22

We can use the same decorator for a different function (albeit one of the same form) now.

@add_ten
def divide(num_one, num_two):
    return num_one / num_two


divide(10, 5)
12.0

But the magic of decorators is such that we can define them for much more general cases, regardless of the number of arguments or even keyword arguments:

def add_ten(func):
    def inner(*args, **kwargs):
        print("Function has been decorated!")
        print("Adding ten...")
        return func(*args, **kwargs) + 10

    return inner


@add_ten
def combine_three_nums(a, b, c):
    return a * b - c


@add_ten
def combine_four_nums(a, b, c, d=0):
    return a * b - c - d


combine_three_nums(1, 2, 2)
Function has been decorated!
Adding ten...
10

Let’s now see it applied to a function with a different number of (keyword) arguments:

combine_four_nums(3, 4, 2, d=2)
Function has been decorated!
Adding ten...
18

Decorators can be chained too (and order matters):

def dividing_line(func):
    def inner(*args, **kwargs):
        print("-" * 30)
        out = func(*args, **kwargs)
        return out

    return inner


@dividing_line
@add_ten
def multiply(num_one, num_two):
    return num_one * num_two


multiply(3, 5)
------------------------------
Function has been decorated!
Adding ten...
25
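One side effect of decorating is that the decorated function takes on the name and docstring of the inner function. The standard library’s functools.wraps is the usual remedy; the sketch below applies it to a pared-down version of add_ten.

```python
from functools import wraps


def add_ten(func):
    @wraps(func)  # copy func's name and docstring onto inner
    def inner(*args, **kwargs):
        return func(*args, **kwargs) + 10

    return inner


@add_ten
def multiply(num_one, num_two):
    """Multiply two numbers."""
    return num_one * num_two


print(multiply(3, 5))
print(multiply.__name__)
print(multiply.__doc__)
```

Without @wraps, the last two lines would report inner and None, which can make debugging and documentation confusing.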

Time#

Let’s do a quick dive into how to deal with dates and times. This is only going to scratch the surface, but should give a sense of what’s possible. For more, see the Introduction to Time chapter.

The built-in library that deals with datetimes is called datetime. Let’s import it and ask it to give us a very precise account of the datetime (when the code is executed):

from datetime import datetime

now = datetime.now()
print(now)
2022-10-28 18:04:21.023795

You can pick out bits of the datetime that you need:

day = now.day
month = now.month
year = now.year
hour = now.hour
minute = now.minute
print(f"{year}/{month}/{day}, {hour}:{minute}")
2022/10/28, 18:4
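Notice that the minutes above lost their leading zero. If you want zero-padded fields, format codes are one option; the sketch below uses a fixed, illustrative datetime (the variable name moment is an assumption) so that the result is reproducible.

```python
from datetime import datetime

# A fixed, illustrative moment so that the output does not depend on the clock
moment = datetime(2022, 10, 28, 18, 4, 21)

# strftime pads each field with zeros where needed
print(moment.strftime("%Y/%m/%d, %H:%M"))

# The same format codes work directly inside an f-string
print(f"{moment:%Y/%m/%d, %H:%M}")
```

Both lines print 2022/10/28, 18:04.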

Exercise

Using an f-string, add seconds to the date and time string above.

To add or subtract time to a datetime, use timedelta:

from datetime import timedelta

new_time = now + timedelta(days=365, hours=5)
print(new_time)
2023-10-28 23:04:21.023795

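To take the difference of two dates (the result below is negative because the chosen New Year’s Day was already in the past when the code was run):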

from datetime import date

new_year = date(year=2022, month=1, day=1)
time_till_ny = new_year - date.today()
print(f"{time_till_ny.days} days until New Year")
-300 days until New Year

Note that date and datetime are two different types of objects: a datetime includes information on the date and time, whereas a date does not include the time.

Miscellaneous Fun#

Here are some other bits of basic coding that might be useful. They really show why Python is such a delightful language.

You can use Unicode characters for variable names:

α = 15
β = 30

print(α / β)
0.5

You can swap variables in a single assignment:

a = 10
b = "This is a string"

a, b = b, a

print(a)
This is a string

itertools offers counting, repeating, cycling, chaining, and slicing. Here’s a cycling example that uses the built-in next function to get the next iteration:

from itertools import cycle

lorrys = ["red lorry", "yellow lorry"]
lorry_iter = cycle(lorrys)

print(next(lorry_iter))
print(next(lorry_iter))
print(next(lorry_iter))
red lorry
yellow lorry
red lorry

itertools also offers products, combinations, combinations with replacement, and permutations. Here are the combinations of ‘abc’ of length 2:

from itertools import combinations

print(list(combinations("abc", 2)))
[('a', 'b'), ('a', 'c'), ('b', 'c')]
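For comparison, here is a short sketch of three of the other functions mentioned, applied to similarly tiny inputs (the inputs are chosen only for illustration).

```python
from itertools import combinations_with_replacement, permutations, product

# Ordered arrangements of length 2: ('a', 'b') and ('b', 'a') both appear
perms = list(permutations("abc", 2))
print(perms)

# Like combinations, but an element may be paired with itself
combos_wr = list(combinations_with_replacement("ab", 2))
print(combos_wr)

# Every pairing of one element from each input
pairs = list(product("ab", [1, 2]))
print(pairs)
```

Each of these returns a lazy iterator, which is why the results are wrapped in list before printing.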

Find out what the date is! (For a timezone-aware result, pass a timezone to datetime.now instead.)

from datetime import date

print(date.today())
2022-10-28

Because functions are just objects, you can iterate over them just like any other object:

functions = [str.isdigit, str.islower, str.isupper]

raw_str = "asdfaa3fa"

for str_func in functions:
    print(f"Function name: {str_func.__name__}, value is:")
    print(str_func(raw_str))
Function name: isdigit, value is:
False
Function name: islower, value is:
True
Function name: isupper, value is:
False

Functions can be defined recursively. For instance, the Fibonacci sequence is defined such that \( a_n = a_{n-1} + a_{n-2} \) for \( n>1 \), with \( a_0 = 0 \) and \( a_1 = 1 \).

def fibonacci(n):
    if n < 0:
        print("Please enter n>=0")
        return 0
    elif n <= 1:
        return n
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


[fibonacci(i) for i in range(10)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]