Introduction to Time#

In this chapter, we’ll introduce the tools you need to manipulate time… well, in Python at least. We’ll cover times, dates, datetimes, time zones, and differences between datetimes.

This chapter has benefitted from the Python Data Science Handbook by Jake VanderPlas, and strftime.org.

Python’s built-in datetime#

The datetime object is the fundamental time object in, for want of a better description, ‘base’ Python. It’s useful to know about it before moving on to datetime operations using pandas (which you’re far more likely to use in practice). It combines information on date and time, capturing the year, month, day, hour, minute, second, and microsecond. Let’s import the class that deals with datetimes (whose objects are of type datetime.datetime) and take a look at it.

from datetime import datetime

now = datetime.now()
print(now)
2024-01-05 15:41:10.902253

Most people will be more used to working with day-month-year, while some people even use month-day-year, which clearly makes no sense at all! But note that datetime follows ISO 8601, the international standard for datetimes, which orders the parts from largest to smallest: year-month-day, then hours:minutes:seconds, with hours on the 24-hour clock. This is the format you should use when coding too.
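As a minimal sketch of working with the ISO 8601 form directly, the standard library can write and read it without any format codes. The variable names and the example date below are illustrative choices, not part of the original text.

```python
from datetime import datetime

# A fixed example datetime (values chosen purely for illustration)
meeting = datetime(2019, 11, 28, 14, 30)

# isoformat() writes the ISO 8601 form, with "T" separating date and time
iso_text = meeting.isoformat()
print(iso_text)  # 2019-11-28T14:30:00

# fromisoformat() reads that text back into an equal datetime
round_trip = datetime.fromisoformat(iso_text)
print(round_trip == meeting)  # True
```

Because the parts run from largest to smallest, ISO 8601 strings also sort chronologically when sorted as plain text, which is one practical reason to prefer them.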

As ever, the excellent rich library can give us a good idea of what properties and methods are available for objects of type datetime.datetime via its inspect() function:

from rich import inspect

inspect(now)
╭───────────────────────── <class 'datetime.datetime'> ──────────────────────────╮
 datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]]) 
                                                                                
 ╭────────────────────────────────────────────────────────────────────────────╮ 
  datetime.datetime(2024, 1, 5, 15, 41, 10, 902253)                           
 ╰────────────────────────────────────────────────────────────────────────────╯ 
                                                                                
         day = 5                                                                
        fold = 0                                                                
        hour = 15                                                               
         max = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)              
 microsecond = 902253                                                           
         min = datetime.datetime(1, 1, 1, 0, 0)                                 
      minute = 41                                                               
       month = 1                                                                
  resolution = datetime.timedelta(microseconds=1)                               
      second = 10                                                               
      tzinfo = None                                                             
        year = 2024                                                             
╰────────────────────────────────────────────────────────────────────────────────╯

We can see that the variable we created has attributes such as year, month, day, and so on, down to microsecond. Accessing these attributes on the now object we created (for example, now.year, with no parentheses) returns the relevant detail.
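To make the distinction between stored values and callable methods concrete, here is a small sketch using a fixed datetime rather than now, so that the printed values are reproducible. The name example is hypothetical.

```python
from datetime import datetime

# A fixed datetime so the output does not depend on when this is run
example = datetime(2024, 1, 5, 15, 41, 10)

# Attributes are read without parentheses
print(example.hour, example.minute)  # 15 41

# Methods are called with parentheses
print(example.weekday())  # 4, because Monday is 0 and this date is a Friday
print(example.date())  # 2024-01-05
```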

Exercise

Try accessing the year, month, and day attributes of a datetime created with datetime.now().

Note that, once created, now does not refresh itself: it’s frozen at the time that it was made.

To create a datetime for a specific date, pass the year, month, and day to datetime (the time defaults to midnight):

specific_datetime = datetime(2019, 11, 28)
print(specific_datetime)
2019-11-28 00:00:00

To make clearer and more readable code, you can also call this using keyword arguments: datetime(year=2019, month=11, day=28). Many of the operations you’d expect to just work with datetimes do, for example:

now > specific_datetime
True

Datetimes and strings#

One of the most common transformations you’re likely to need to do when it comes to times is the one from a string, like “4 July 2002”, to a datetime. You can do this using datetime.strptime(). Here’s an example:

date_string = "16 February in 2002"
datetime.strptime(date_string, "%d %B in %Y")
datetime.datetime(2002, 2, 16, 0, 0)

What’s going on? The pattern of the datestring is “day month ‘in’ year”. Python’s strptime function has codes for the different parts of a datetime (and the different ways they can be expressed). For example, if you had the short version of month instead of the long it would be:

date_string = "16 Feb in 2002"
datetime.strptime(date_string, "%d %b in %Y")
datetime.datetime(2002, 2, 16, 0, 0)

What about turning a datetime into a string? We can do that too, courtesy of the same codes.

now.strftime("%A, %m, %Y")
'Friday, 01, 2024'
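Because strftime and strptime share the same codes, a format string that writes a datetime can also read it back. The sketch below assumes a fixed example datetime and uses only numeric codes, so the result should not depend on your locale.

```python
from datetime import datetime

# A fixed example datetime (values chosen purely for illustration)
stamp = datetime(2013, 9, 30, 7, 6, 5)
fmt = "%Y-%m-%d %H:%M:%S"

# datetime -> string
text = stamp.strftime(fmt)
print(text)  # 2013-09-30 07:06:05

# string -> datetime, using the very same format
parsed = datetime.strptime(text, fmt)
print(parsed == stamp)  # True
```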

Of course, you don’t always want to have to worry about the ins and outs of what you’re passing in, and the dateutil package (a third-party library rather than part of the standard library, though it is installed alongside pandas) is here for flexible parsing of formats should you need that (explicit is better than implicit though!):

from dateutil.parser import parse

date_string = "03 Feb 02"
print(parse(date_string))
date_string = "3rd February 2002"
print(parse(date_string))
2002-02-03 00:00:00
2002-02-03 00:00:00
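Flexible parsing has to guess when a string is ambiguous, which is part of why explicit formats are safer. As a hedged illustration, the all-numeric string below (an invented example) is read month-first by default, and day-first only when you ask for it with the dayfirst argument.

```python
from dateutil.parser import parse

# Is this 3 February or 2 March? The string alone cannot say.
ambiguous = "03/02/2002"

# By default dateutil reads ambiguous dates month-first
month_first = parse(ambiguous)
print(month_first)  # 2002-03-02 00:00:00

# dayfirst=True switches to the day-month-year reading
day_first = parse(ambiguous, dayfirst=True)
print(day_first)  # 2002-02-03 00:00:00
```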

You can find a close-to-comprehensive list of strftime codes at https://strftime.org/, but they’re reproduced in the table below for convenience.

| Code | Meaning | Example |
| --- | --- | --- |
| %a | Weekday as locale’s abbreviated name. | Mon |
| %A | Weekday as locale’s full name. | Monday |
| %w | Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. | 1 |
| %d | Day of the month as a zero-padded decimal number. | 30 |
| %-d | Day of the month as a decimal number. (Platform specific) | 30 |
| %b | Month as locale’s abbreviated name. | Sep |
| %B | Month as locale’s full name. | September |
| %m | Month as a zero-padded decimal number. | 09 |
| %-m | Month as a decimal number. (Platform specific) | 9 |
| %y | Year without century as a zero-padded decimal number. | 13 |
| %Y | Year with century as a decimal number. | 2013 |
| %H | Hour (24-hour clock) as a zero-padded decimal number. | 07 |
| %-H | Hour (24-hour clock) as a decimal number. (Platform specific) | 7 |
| %I | Hour (12-hour clock) as a zero-padded decimal number. | 07 |
| %-I | Hour (12-hour clock) as a decimal number. (Platform specific) | 7 |
| %p | Locale’s equivalent of either AM or PM. | AM |
| %M | Minute as a zero-padded decimal number. | 06 |
| %-M | Minute as a decimal number. (Platform specific) | 6 |
| %S | Second as a zero-padded decimal number. | 05 |
| %-S | Second as a decimal number. (Platform specific) | 5 |
| %f | Microsecond as a decimal number, zero-padded on the left. | 000000 |
| %z | UTC offset in the form +HHMM or -HHMM (empty string if the object is naive). | |
| %Z | Time zone name (empty string if the object is naive). | |
| %j | Day of the year as a zero-padded decimal number. | 273 |
| %-j | Day of the year as a decimal number. (Platform specific) | 273 |
| %U | Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. | 39 |
| %W | Week number of the year (Monday as the first day of the week) as a decimal number. | 39 |
| %c | Locale’s appropriate date and time representation. | Mon Sep 30 07:06:05 2013 |
| %x | Locale’s appropriate date representation. | 09/30/13 |
| %X | Locale’s appropriate time representation. | 07:06:05 |
| %% | A literal ‘%’ character. | % |

From time to time#

As well as recording a single datetime, there are plenty of occasions when we’ll be interested in differences in datetimes. Let’s create one and then check its type.

time_diff = now - datetime(year=2020, month=1, day=1)
print(time_diff)
1465 days, 15:41:10.902253

This is in the format of days, hours, minutes, seconds, and microseconds. Let’s check the type, and more, with inspect():

inspect(time_diff)
╭──────────────────────────── <class 'datetime.timedelta'> ─────────────────────────────╮
 Difference between two datetime values.                                               
                                                                                       
 ╭───────────────────────────────────────────────────────────────────────────────────╮ 
  datetime.timedelta(days=1465, seconds=56470, microseconds=902253)                  
 ╰───────────────────────────────────────────────────────────────────────────────────╯ 
                                                                                       
         days = 1465                                                                   
          max = datetime.timedelta(days=999999999, seconds=86399, microseconds=999999) 
 microseconds = 902253                                                                 
          min = datetime.timedelta(days=-999999999)                                    
   resolution = datetime.timedelta(microseconds=1)                                     
      seconds = 56470                                                                  
╰───────────────────────────────────────────────────────────────────────────────────────╯

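This is of type datetime.timedelta. Note that, although it prints with hours and minutes, a timedelta only stores days, seconds, and microseconds.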
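A short sketch of what you can do with a timedelta: read its stored parts, collapse it to a single number of seconds, and add one to a datetime. The dates are fixed for reproducibility, and the names start, end, gap, and deadline are illustrative.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1)
end = datetime(2024, 1, 5, 15, 41, 10)

# Subtracting two datetimes gives a timedelta
gap = end - start
print(gap.days, gap.seconds)  # 1465 56470

# total_seconds() folds the days and seconds into one float
print(gap.total_seconds())  # 126632470.0

# Adding a timedelta to a datetime gives a new datetime
deadline = start + timedelta(weeks=2, hours=12)
print(deadline)  # 2020-01-15 12:00:00
```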

In the zone#

Date and time objects may be categorized as aware or naive depending on whether or not they include timezone information; an aware object can locate itself relative to other aware objects, but a naive object does not contain enough information to unambiguously locate itself relative to other date/time objects. So far we’ve been working with naive datetime objects.

The pytz package can help us work with time zones. It has two main use cases: i) localise timezone-naive datetimes so that they become aware, i.e. have a timezone, and ii) convert a datetime in one timezone to another timezone.

The default timezone for coding is UTC. ‘UTC’ is Coordinated Universal Time. It is a successor to, but distinct from, Greenwich Mean Time (GMT) and the various definitions of Universal Time. UTC is now the worldwide standard for regulating clocks and time measurement.

All other timezones are defined relative to UTC, and include offsets like UTC+0800 - hours to add or subtract from UTC to derive the local time. No daylight saving time occurs in UTC, making it a useful timezone to perform date arithmetic without worrying about the confusion and ambiguities caused by daylight saving time transitions, your country changing its timezone, or mobile computers that roam through multiple timezones.
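To illustrate what a fixed offset such as UTC+0800 means, here is a minimal sketch that uses only the standard library's timezone class (not pytz). The names utc_noon and utc_plus_8 are hypothetical, and a fixed offset like this knows nothing about daylight saving time.

```python
# Note: this is the standard library's timezone class, not pytz.timezone
from datetime import datetime, timedelta, timezone

# An aware datetime at noon UTC
utc_noon = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)

# A fixed offset of eight hours ahead of UTC
utc_plus_8 = timezone(timedelta(hours=8))

# The same instant, expressed on a clock that runs eight hours ahead
local_time = utc_noon.astimezone(utc_plus_8)
print(local_time.isoformat())  # 2020-01-01T20:00:00+08:00

# Aware datetimes compare by instant, not by wall-clock reading
print(local_time == utc_noon)  # True
```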

Let’s create a couple of time zone aware datetimes and look at their difference.

import pytz
from pytz import timezone

aware = datetime(tzinfo=pytz.UTC, year=2020, month=1, day=1)
unaware = datetime(year=2020, month=1, day=1)

us_tz = timezone("US/Eastern")
us_aware = us_tz.localize(unaware)

print(us_aware - aware)
5:00:00

So we find that there’s a five-hour difference between UTC and the time on the East Coast of the USA (in January, when daylight saving time is not in effect). In the above, we used the localize() method to convert a naive datetime into an aware one, and we also created an aware datetime directly by passing tzinfo.
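The second use case mentioned earlier, converting an aware datetime from one timezone to another, can be sketched with the standard astimezone() method. The example time and the choice of ‘Europe/London’ are illustrative assumptions.

```python
from datetime import datetime

import pytz

eastern = pytz.timezone("US/Eastern")
london = pytz.timezone("Europe/London")

# 9am on New Year's Day on the US East Coast, made timezone-aware
new_york_morning = eastern.localize(datetime(2020, 1, 1, 9, 0))

# The same instant expressed in UTC and in London time
in_utc = new_york_morning.astimezone(pytz.UTC)
in_london = new_york_morning.astimezone(london)
print(in_utc)  # 2020-01-01 14:00:00+00:00
print(in_london)  # 2020-01-01 14:00:00+00:00

# Conversion changes the clock reading, not the moment in time
print(in_utc == new_york_morning)  # True
```

London matches UTC here only because the date is in winter; in summer the London reading would be an hour ahead of UTC.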

For data where time really matters, such as some types of financial data, using timezone aware datetimes could prevent some nasty (and expensive) mistakes.

Exercise

Using datetime.now() and localize(), what is the time in the ‘Australia/Melbourne’ time zone?

A More User-Friendly Approach to Datetimes: arrow#

While Python’s standard library has near-complete date, time and timezone functionality, it’s not the most user-friendly. The arrow package attempts to offer a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps. Let’s take a quick look at some of the functionality of arrow.

Import arrow, create a datetime, and find the current datetime.

import arrow

dt = arrow.get("2013-05-11T21:23:00")
print(dt)
dt2 = arrow.now()
dt2
2013-05-11T21:23:00+00:00
<Arrow [2024-01-05T15:41:11.021698+00:00]>

Use arrow to shift a datetime back by an hour and a day.

dt.shift(hours=-1, days=-1)
<Arrow [2013-05-10T20:23:00+00:00]>

Convert to a different time zone:

dt2.to("US/Pacific")
<Arrow [2024-01-05T07:41:11.021698-08:00]>

Give simpler, human-readable descriptions of datetimes:

dt2.shift(hours=-1).humanize()
'an hour ago'
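arrow also has its own format tokens (such as YYYY and HH) for parsing and formatting, and can hand back a standard datetime when another library needs one. The sketch below assumes an invented meeting time; the variable names are illustrative.

```python
from datetime import datetime

import arrow

# Parse a string by giving arrow's own format tokens
meeting = arrow.get("2013-05-11 21:23", "YYYY-MM-DD HH:mm")

# Shift forward a week and format the result with the same tokens
next_week = meeting.shift(weeks=1)
print(next_week.format("YYYY-MM-DD HH:mm"))  # 2013-05-18 21:23

# The .datetime property gives a timezone-aware standard datetime
as_datetime = meeting.datetime
print(isinstance(as_datetime, datetime))  # True
```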

Vectorised Datetimes#

Now we come to vectorised operations on datetimes using the powerful numpy package (and this is what is used by pandas). numpy has its own version of datetime, called np.datetime64, and it’s very efficient at scale. Let’s see it in action:

import numpy as np

date = np.array("2020-01-01", dtype=np.datetime64)
date
array('2020-01-01', dtype='datetime64[D]')

The ‘D’ tells us that the smallest unit here is days. We can easily create a vector of dates from this object:

date + range(32)
array(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
       '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
       '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',
       '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
       '2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20',
       '2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24',
       '2020-01-25', '2020-01-26', '2020-01-27', '2020-01-28',
       '2020-01-29', '2020-01-30', '2020-01-31', '2020-02-01'],
      dtype='datetime64[D]')

Note how the last day rolls over into the next month.
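Differences between numpy datetimes work too, and give a np.timedelta64. As a small sketch (the dates are arbitrary examples), the snippet below measures a gap in days and builds a range of months with np.arange.

```python
import numpy as np

start = np.datetime64("2020-01-01")
end = np.datetime64("2020-03-01")

# Subtracting two day-resolution dates gives a timedelta64 in days
gap = end - start
print(gap)  # 60 days (2020 is a leap year, so 31 + 29)

# np.arange accepts dates as well as numbers; the end point is excluded
months = np.arange("2020-01", "2020-04", dtype="datetime64[M]")
print(months)  # ['2020-01' '2020-02' '2020-03']
```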

If you are creating a datetime with more precision than a day, numpy will figure out the unit from the input. For example, this gives resolution down to minutes:

np.datetime64("2020-01-01 09:00")
numpy.datetime64('2020-01-01T09:00')

One word of warning with numpy and datetimes though: the more precise you go, and you can go down to femtoseconds (\(10^{-15}\) seconds) if you wish, the smaller the range of dates you can hit, because every value is stored as a 64-bit integer count of the chosen unit. A popular choice of precision is datetime64[ns], which can encode times from 1678 AD to 2262 AD. Working with seconds gets you 2.9\(\times 10^{11}\) BC to 2.9\(\times 10^{11}\) AD.
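The trade-off between precision and range can be checked directly. This is a sketch under the assumption that you want to inspect the unit numpy inferred, request a unit explicitly, and see roughly where nanosecond precision runs out; the variable names are illustrative.

```python
import numpy as np

# numpy infers the unit from the text: hours and minutes give minutes
to_minute = np.datetime64("2020-01-01T09:00")
print(to_minute.dtype)  # datetime64[m]

# A second argument requests a specific unit instead
to_ns = np.datetime64("2020-01-01T09:00", "ns")
print(to_ns.dtype)  # datetime64[ns]

# The largest 64-bit integer, counted in nanoseconds since 1970,
# marks the latest instant that datetime64[ns] can represent
latest_ns = np.datetime64(np.iinfo(np.int64).max, "ns")
print(latest_ns)  # a date in the year 2262
```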

We’ll be seeing much more of numpy datetimes in the next chapter, on time series.