Boolean Data

Introduction

In this chapter, we’ll introduce boolean data: data that can be True or False (which can also be encoded as 1s or 0s). We’ll first look at Python’s built-in True and False values before seeing how booleans work in data frames.

Booleans

Some of the most important operations you will perform are with True and False values, also known as boolean data types. These are a fundamental Python type, just as numbers such as 1 are.

Boolean Variables and Conditions

Assigning the value True or False to a variable works the same way as any other assignment:

bool_variable = True
bool_variable

True

There are two types of operation that are associated with booleans: boolean operations, in which existing booleans are combined, and condition operations, which create a boolean when executed.

Boolean operators that return booleans are as follows:

Operator	Description
`x and y`	are `x` and `y` both True?
`x or y`	is at least one of `x` and `y` True?
`not x`	is `x` False?

These behave as you’d expect: True and False evaluates to False, while True or False evaluates to True. The not keyword flips a boolean to its opposite. For example,

not True

False

as you would expect.
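As a minimal sketch of how the three operators in the table combine, the following builds a small truth table. The variable name truth_table is just an illustrative choice, and itertools.product is used only to generate every pairing of True and False.

```python
from itertools import product

# Each row is (x, y, x and y, x or y) for one combination of booleans.
# `truth_table` is a hypothetical name used only for this illustration.
truth_table = [
    (x, y, x and y, x or y) for x, y in product([True, False], repeat=2)
]

for row in truth_table:
    print(row)

# `not` flips each value of x
negated = [not x for x, _, _, _ in truth_table]
print(negated)
```

Only the first row, where both inputs are True, gives True for and; only the last row, where both are False, gives False for or.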

Conditions are expressions that evaluate as booleans. A simple example is 10 == 20. The == is an operator that compares the objects on either side and returns True if they have the same value, though be careful using it with different data types.

Here’s a table of conditions that return booleans:

Operator	Description
`x == y`	is `x` equal to `y`?
`x != y`	is `x` not equal to `y`?
`x > y`	is `x` greater than `y`?
`x >= y`	is `x` greater than or equal to `y`?
`x < y`	is `x` less than `y`?
`x <= y`	is `x` less than or equal to `y`?
`x is y`	is `x` the same object as `y`?

As you can see from the table, the opposite of == is !=, which you can read as ‘not equal to the value of’. Here’s an example of ==:

boolean_condition = 10 == 20
print(boolean_condition)

False
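To illustrate the earlier warning about comparing different data types, here is a short sketch. The variable names are made up for this example, and math.isclose is one possible way to compare floats within a tolerance rather than a required approach.

```python
import math

# Integers and floats are compared by numerical value.
int_vs_float = 1 == 1.0  # True

# A string is not equal to a number, even if it looks like one.
str_vs_int = "1" == 1  # False

# Floating point rounding can make values that look equal differ.
float_exact = 0.1 + 0.2 == 0.3  # False

# Comparing within a tolerance is usually what you want for floats.
float_close = math.isclose(0.1 + 0.2, 0.3)  # True

print(int_vs_float, str_vs_int, float_exact, float_close)
```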

Exercise

What does not (not True) evaluate to?

The real power of conditions comes when we start to use them in more complex examples. Some of the keywords that evaluate conditions are if, else, and, or, in, not, and is. Here’s an example showing how some of these conditional keywords work:

name = "Ada"
score = 99

if name == "Ada" and score > 90:
    print("Ada, you achieved a high score.")

if name == "Smith" or score > 90:
    print("You could be called Smith or have a high score")

if name != "Smith" and score > 90:
    print("You are not called Smith and you have a high score")

Ada, you achieved a high score.
You could be called Smith or have a high score
You are not called Smith and you have a high score

All three of these conditions evaluate as True, and so all three messages get printed. Given that == and != test for equality and inequality, respectively, you may be wondering what the keywords is and is not are for. Remember that everything in Python is an object, and that variable names refer to objects. == and != compare values, while is and is not check whether two names refer to the very same object. For example,

name_list = ["Ada", "Adam"]
name_list_two = ["Ada", "Adam"]

# Compare values
print(name_list == name_list_two)

# Compare objects
print(name_list is name_list_two)

True
False
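A further sketch of the difference between values and objects: assigning a list to a second name does not copy it, whereas list() builds a new object with equal contents. The names alias, copy_of_list, and result are hypothetical. Checking for None is a common place where is appears in practice.

```python
name_list = ["Ada", "Adam"]

alias = name_list  # a second name for the same list object
copy_of_list = list(name_list)  # a new list with equal contents

same_object = alias is name_list  # True: both names point at one object
equal_values = copy_of_list == name_list  # True: the contents match
different_object = copy_of_list is not name_list  # True: separate objects

# None is a single object, so `is` is the usual way to test for it.
result = None
is_missing = result is None  # True

print(same_object, equal_values, different_object, is_missing)
```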

Note that code with lots of branching if statements is not very helpful to you or to anyone else who reads your code. Some automatic code checkers will pick this up and tell you that your code is too complex. Almost all of the time, there’s a way to rewrite your code without lots of branching logic that will be better and clearer than having many nested if statements.

One of the most useful conditional keywords is in. This one must pop up ten times a day in most coders’ lives because it checks whether a value is a member of a collection, such as a list or a string.

name_list = ["Lovelace", "Smith", "Hopper", "Babbage"]

print("Lovelace" in name_list)

print("Bob" in name_list)

True
False

Exercise

Check if “a” is in the string “Walloping weasels” using in. Is “a” in “Anodyne”?

The opposite is not in.
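As a quick sketch using the same list of names as above (the result variable names are illustrative):

```python
name_list = ["Lovelace", "Smith", "Hopper", "Babbage"]

bob_absent = "Bob" not in name_list  # True: "Bob" is not in the list
lovelace_absent = "Lovelace" not in name_list  # False: "Lovelace" is there

# `not in` gives the same answer as negating `in`, but reads more naturally.
same_answer = not ("Bob" in name_list)

print(bob_absent, lovelace_absent, same_answer)
```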

Finally, one conditional construct you’re bound to use at some point is the if…else structure:

score = 98

if score == 100:
    print("Top marks!")
elif score > 90 and score < 100:
    print("High score!")
elif score > 10 and score <= 90:
    pass
else:
    print("Better luck next time.")

High score!

Note that this does nothing if the score is greater than 10 and no more than 90, and prints a message otherwise.

Exercise

Create a new if … elif … else statement that prints “well done” if a score is over 90, “good” if between 40 and 90, and “bad luck” otherwise.

One nice feature of Python is that you can make multiple boolean comparisons in a single line.

a, b = 3, 6

1 < a < b < 20

True
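A chained comparison like this behaves as if each neighbouring pair were joined with and. The sketch below spells that out; the variable names are only for illustration.

```python
a, b = 3, 6

chained = 1 < a < b < 20
spelled_out = (1 < a) and (a < b) and (b < 20)

# The chain is False as soon as any single link fails; here 6 < 3 is False.
broken_chain = 1 < b < a < 20

print(chained, spelled_out, broken_chain)
```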

Conditions in list comprehensions

List comprehensions are an incredibly useful pattern in Python. Here’s a simple one that produces a list of the first 12 numbers starting from 0:

[x for x in range(12)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Booleans bring conditionality to the table. We’ll add an if statement followed by a condition that evaluates to either True or False depending on the value of x. So, for example, we can ask for only those numbers that are divisible by 2:

[x for x in range(12) if x % 2 == 0]

[0, 2, 4, 6, 8, 10]

This trick even works with an else clause, but note that both if and else have moved before the for x in ... part:

[x if x % 2 == 0 else "Not divisible by 2" for x in range(12)]

[0,
 'Not divisible by 2',
 2,
 'Not divisible by 2',
 4,
 'Not divisible by 2',
 6,
 'Not divisible by 2',
 8,
 'Not divisible by 2',
 10,
 'Not divisible by 2']
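The two positions do different jobs and can be combined: an if after the for part filters which items are kept, while an if ... else before it chooses what value each kept item becomes. A small sketch, with the "even" and "odd" labels chosen purely for illustration:

```python
# Keep only multiples of 3 (the trailing `if`), then label each one
# as even or odd (the leading `if ... else`).
labels = ["even" if x % 2 == 0 else "odd" for x in range(12) if x % 3 == 0]

print(labels)  # ['even', 'odd', 'even', 'odd'] for 0, 3, 6, 9
```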

Truthy and Falsy Values

Python objects can be used in expressions that will return a boolean value, such as when a list, listy, is used with if listy. Built-in Python objects that are empty are usually evaluated as False, and are said to be ‘falsy’. In contrast, when these built-in objects are not empty, they evaluate as True and are said to be ‘truthy’. Let’s see some examples:

listy = []
other_listy = [1, 2, 3]

if not (listy):
    print("Falsy")
else:
    print("Truthy")

Falsy

if not (other_listy):
    print("Falsy")
else:
    print("Truthy")

Truthy

This approach doesn’t just work on lists; it works for many other truthy and falsy objects:

if not 0:
    print("Falsy")
else:
    print("Truthy")

Falsy

if not [0, 0, 0]:
    print("Falsy")
else:
    print("Truthy")

Truthy

Note that zero is falsy, because it is the ‘nothing’ of numbers, but a list of three zeros is not an empty list, so it evaluates as truthy.

if not None:
    print("Falsy")
else:
    print("Truthy")

Falsy

Knowing what is truthy or falsy is useful in practice; imagine you’d like to default to a specific behaviour if a list called list_vals doesn’t have any values in it. You now know you can do this simply with if not list_vals.
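Here is a minimal sketch of that default-behaviour pattern. The function describe and its messages are hypothetical, invented for this illustration; the built-in bool() is used at the end to show the truthiness of a few other objects directly.

```python
def describe(list_vals):
    """Summarise list_vals, falling back to a default message if it is empty."""
    if not list_vals:  # an empty list is falsy
        return "no values supplied"
    return f"{len(list_vals)} values, first is {list_vals[0]}"


empty_case = describe([])
full_case = describe([4, 5, 6])
print(empty_case)
print(full_case)

# bool() converts any object to its truthy or falsy value.
truthiness = [bool(""), bool("text"), bool({}), bool(0.0), bool(-1)]
print(truthiness)  # [False, True, False, False, True]
```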

any() and all()

Of course, there is a big wide world of booleans out there; they don’t always occur on their own. That’s why the built-in functions any() and all() exist. These apply to iterables of booleans, like a list of booleans.

any() takes an iterable of booleans and returns True if at least one value is true:

any([True, False, False])

True

all() takes an iterable of booleans and returns True only if all values are true:

all([True, True, True, True])

True

Both of these also work for 1s and 0s:

all([0, 0, 0, 1])

False
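In practice, any() and all() are often fed conditions rather than ready-made booleans. The sketch below assumes a hypothetical list of scores and uses generator expressions to test each one; it also shows how both functions treat an empty iterable.

```python
scores = [72, 95, 88, 41]  # hypothetical data for illustration

any_high = any(score > 90 for score in scores)  # True: 95 is over 90
all_passed = all(score >= 50 for score in scores)  # False: 41 is under 50

# Edge case: with nothing to check, any() is False and all() is True.
any_empty = any([])
all_empty = all([])

print(any_high, all_passed, any_empty, all_empty)
```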

Booleans in pandas data frames

Operations on booleans in data frames

Quite often, you will run into a scenario where you’re working with data that have True or False values in a data frame. It is easy to create a column of booleans in a pandas data frame:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        "bool_col_1": [False] * 3 + [True, True],
        "bool_col_2": [True, False, True, False, True],
    }
)
df

	bool_col_1	bool_col_2
0	False	True
1	False	False
2	False	True
3	True	False
4	True	True

We can perform operations on these just like regular pandas data frame columns. Boolean columns accept & (and), | (or), == (equal), and != (not equal) as operations, each applied row by row:

df["bool_col_1"] | df["bool_col_2"]

0     True
1    False
2     True
3     True
4     True
dtype: bool
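The other operators work the same way. The sketch below recreates the same two columns and also uses ~, which negates a boolean column row by row. Note that Python’s own and, or, and not keywords do not work element by element on pandas columns, which is why the symbol versions are used. The result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "bool_col_1": [False, False, False, True, True],
        "bool_col_2": [True, False, True, False, True],
    }
)

both_true = df["bool_col_1"] & df["bool_col_2"]  # True only where both are True
negated = ~df["bool_col_1"]  # flips every value in the column
columns_differ = df["bool_col_1"] != df["bool_col_2"]  # True where they disagree

print(both_true.tolist())
print(negated.tolist())
print(columns_differ.tolist())
```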

Quite often, it’s useful to have a count of the number of true values. If you take the sum of boolean columns in a pandas data frame, it will tot up the number of True values:

df.sum()

bool_col_1    2
bool_col_2    3
dtype: int64
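Because True counts as 1 and False as 0, the same idea extends to other aggregations. As a sketch on the same two columns, the mean gives the share of True values in each column, and summing across columns (axis=1) counts the True values in each row. The variable names here are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "bool_col_1": [False, False, False, True, True],
        "bool_col_2": [True, False, True, False, True],
    }
)

share_true = df.mean()  # proportion of True values per column
true_per_row = df.sum(axis=1)  # number of True values in each row

print(share_true)
print(true_per_row.tolist())  # [1, 0, 1, 1, 2]
```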

And if you ever get data formatted as 1s and 0s rather than True and False, it’s easy to convert by changing the data type:

df = pd.DataFrame.from_dict({"bool_col": [0, 1, 0, 1, 1]})
df["bool_col"].astype(bool)

0    False
1     True
2    False
3     True
4     True
Name: bool_col, dtype: bool
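One thing to watch with this conversion is that it follows the truthy and falsy rules for numbers: any non-zero value becomes True, not just 1. A small sketch with a made-up series, which also shows the conversion back to integers:

```python
import pandas as pd

# Hypothetical data containing values other than 0 and 1.
numbers = pd.Series([0, 1, 2, -1])

as_bool = numbers.astype(bool)  # every non-zero value becomes True
back_to_int = as_bool.astype(int)  # True becomes 1 and False becomes 0

print(as_bool.tolist())  # [False, True, True, True]
print(back_to_int.tolist())  # [0, 1, 1, 1]
```

If a column is meant to hold only 1s and 0s, it may be worth checking that before converting, since the original values cannot be recovered afterwards.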

Creating booleans from comparisons using columns

It’s also possible to create boolean columns from numerical (or some other) columns. Let’s use the diamonds dataset to demonstrate this:

diamonds = pd.read_csv(
    "https://github.com/mwaskom/seaborn-data/raw/master/diamonds.csv"
)
diamonds.head()

	carat	cut	color	clarity	depth	table	price	x	y	z
0	0.23	Ideal	E	SI2	61.5	55.0	326	3.95	3.98	2.43
1	0.21	Premium	E	SI1	59.8	61.0	326	3.89	3.84	2.31
2	0.23	Good	E	VS1	56.9	65.0	327	4.05	4.07	2.31
3	0.29	Premium	I	VS2	62.4	58.0	334	4.20	4.23	2.63
4	0.31	Good	J	SI2	63.3	58.0	335	4.34	4.35	2.75

We’re going to create a new boolean variable for whenever the price is above 1000.

diamonds["expensive"] = diamonds["price"] > 1000
diamonds.sample(10)

	carat	cut	color	clarity	depth	table	price	x	y	z	expensive
5039	1.08	Premium	I	SI2	60.1	59.0	3750	6.75	6.69	4.04	True
18376	0.29	Good	E	VVS2	63.2	60.0	617	4.17	4.19	2.64	False
16248	1.01	Ideal	G	VS1	61.8	57.0	6499	6.40	6.45	3.97	True
40641	0.41	Ideal	E	VS1	62.7	55.0	1153	4.76	4.72	2.97	True
14852	1.07	Ideal	F	SI1	62.7	56.0	5982	6.47	6.53	4.08	True
39411	0.41	Ideal	F	VS1	61.3	55.0	1076	4.83	4.79	2.95	True
18891	0.90	Very Good	E	IF	63.9	55.0	7747	6.05	6.09	3.88	True
38260	0.30	Ideal	I	SI1	61.8	57.0	382	4.30	4.34	2.67	False
18363	0.37	Very Good	F	SI1	62.5	58.0	616	4.52	4.57	2.84	False
11523	1.24	Premium	J	VS2	61.4	59.0	5026	6.91	6.83	4.22	True

Of course, this could also have been achieved in a call to assign:

diamonds.assign(expensive=lambda x: x["price"] > 1000).head()

	carat	cut	color	clarity	depth	table	price	x	y	z	expensive
0	0.23	Ideal	E	SI2	61.5	55.0	326	3.95	3.98	2.43	False
1	0.21	Premium	E	SI1	59.8	61.0	326	3.89	3.84	2.31	False
2	0.23	Good	E	VS1	56.9	65.0	327	4.05	4.07	2.31	False
3	0.29	Premium	I	VS2	62.4	58.0	334	4.20	4.23	2.63	False
4	0.31	Good	J	SI2	63.3	58.0	335	4.34	4.35	2.75	False

Another use of booleans that is quite useful when it comes to data frames is the .isin() method. For example, if you want True or False values for whether each column name in a data frame appears in a given list:

diamonds.columns.isin(["x", "y", "z"])

array([False, False, False, False, False, False, False,  True,  True,
        True, False])
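The .isin() method also works on the values within a column, which makes it handy for picking out rows. The sketch below uses a small made-up data frame (small_df, loosely modelled on the diamonds columns) rather than the full dataset, so that it runs without a download.

```python
import pandas as pd

# A hypothetical, hand-written data frame for illustration.
small_df = pd.DataFrame(
    {
        "cut": ["Ideal", "Premium", "Good", "Fair"],
        "price": [326, 326, 327, 337],
    }
)

# True for each row whose cut is one of the listed values.
is_top_cut = small_df["cut"].isin(["Ideal", "Premium"])
print(is_top_cut.tolist())  # [True, True, False, False]

# The boolean column can then be used to keep only those rows.
top_cuts = small_df[is_top_cut]
print(top_cuts)
```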

any() and all() in data frames

A pandas column of booleans behaves a lot like a list of booleans, and we can apply the same logic to it via the built-in pandas .any() and .all() methods. We expect some entries for "expensive" to be true, so .any() should return True:

diamonds["expensive"].any()

np.True_
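The result may display as a NumPy boolean such as np.True_ rather than Python’s own True; it behaves the same way in conditions, and bool() converts it if a plain Python value is needed. A sketch of both methods on a small made-up series of prices:

```python
import pandas as pd

# Hypothetical prices for illustration.
prices = pd.Series([326, 1153, 6499])
expensive = prices > 1000

some_expensive = bool(expensive.any())  # True: at least one price is over 1000
all_expensive = bool(expensive.all())  # False: 326 is not over 1000

print(some_expensive, all_expensive)
```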

Logical subsetting

Although we’ve been effectively using this all along, it’s useful to make it explicit: booleans can be used to logically subset a data frame. Let’s say we only want the bits of a data frame where x is greater than y:

diamonds[diamonds["x"] > diamonds["y"]]

	carat	cut	color	clarity	depth	table	price	x	y	z	expensive
1	0.21	Premium	E	SI1	59.8	61.0	326	3.89	3.84	2.31	False
8	0.22	Fair	E	VS2	65.1	61.0	337	3.87	3.78	2.49	False
11	0.23	Ideal	J	VS1	62.8	56.0	340	3.93	3.90	2.46	False
12	0.22	Premium	F	SI1	60.4	61.0	342	3.88	3.84	2.33	False
14	0.20	Premium	E	SI2	60.2	62.0	345	3.79	3.75	2.27	False
...	...	...	...	...	...	...	...	...	...	...	...
53928	0.79	Premium	E	SI2	61.4	58.0	2756	6.03	5.96	3.68	True
53929	0.71	Ideal	G	VS1	61.4	56.0	2756	5.76	5.73	3.53	True
53930	0.71	Premium	E	SI1	60.5	55.0	2756	5.79	5.74	3.49	True
53931	0.71	Premium	F	SI1	59.8	62.0	2756	5.74	5.73	3.43	True
53938	0.86	Premium	H	SI2	61.0	58.0	2757	6.15	6.12	3.74	True

23423 rows × 11 columns

The expression diamonds["x"] > diamonds["y"] creates a column of booleans that is used to filter to just the rows where the condition is true.
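Conditions can be combined before subsetting, as long as each one is wrapped in parentheses so that & is applied to the finished comparisons. The sketch below uses a small made-up data frame, toy_df, with the same column names as the diamonds data; .loc is shown as one way to pick rows and columns at the same time.

```python
import pandas as pd

# A hypothetical, hand-written data frame for illustration.
toy_df = pd.DataFrame(
    {
        "x": [3.95, 3.89, 4.05, 4.20],
        "y": [3.98, 3.84, 4.07, 4.10],
        "price": [326, 326, 327, 1334],
    }
)

# Rows where x exceeds y AND the price is above 1000.
mask = (toy_df["x"] > toy_df["y"]) & (toy_df["price"] > 1000)
subset = toy_df[mask]
print(subset)

# .loc takes the same boolean column plus a list of columns to keep.
subset_xy = toy_df.loc[mask, ["x", "y"]]
print(subset_xy)
```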