Effective coding with dates and times in Python | by Alicia Horsch | Aug, 2023


The datetime package lets you create date and datetime objects from scratch. These can be used, for example, as thresholds for filtering (try printing such objects and their types to better understand their format).
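As a minimal sketch of what this can look like (the concrete dates and variable names are made up for illustration):

```python
from datetime import date, datetime

# Hypothetical threshold values, chosen only for illustration.
threshold_date = date(2023, 8, 1)
threshold_datetime = datetime(2023, 8, 1, 14, 30, 0)

print(threshold_date, type(threshold_date))
# 2023-08-01 <class 'datetime.date'>
print(threshold_datetime, type(threshold_datetime))
# 2023-08-01 14:30:00 <class 'datetime.datetime'>
```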

Also, datetime lets you create date and time objects that refer to today or now.
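A short sketch of both calls; the printed values depend on when and where you run it:

```python
from datetime import date, datetime

today = date.today()   # current local date
now = datetime.now()   # current local date and time

print(today)
print(now)
print(now.tzinfo)      # None: the object carries no time zone information
```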

Be careful here: by default, datetime objects are “timezone naive”, meaning they do not refer to a specific time zone, which may get you into trouble when working with international colleagues!

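With the help of the zoneinfo module (built-in since Python version 3.9), you can convert a datetime to a specific time zone by passing a ZoneInfo object to the tz parameter of astimezone(). Note that calling astimezone() on a naive datetime assumes the value is in your system's local time.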
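One way this could look, assuming the IANA time zone database is available on your system (on some platforms, such as Windows, this may require installing the tzdata package). The example time and the zone "Europe/Berlin" are arbitrary choices:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A naive datetime: no time zone attached.
naive = datetime(2023, 8, 1, 12, 0)
print(naive.tzinfo)  # None

# An aware datetime: noon in UTC.
utc_noon = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)

# Convert the same instant to Berlin time via the tz parameter.
berlin = utc_noon.astimezone(tz=ZoneInfo("Europe/Berlin"))
print(berlin)  # 2023-08-01 14:00:00+02:00
```

Both aware objects describe the same moment; only the wall-clock representation differs.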

You might find yourself in a situation where you want to display your datetime object as a string or convert a string into a datetime object. Here, the methods strftime() and strptime() are helpful.

Converting a datetime object (or parts of it) to a string

Commonly used format codes for describing datetime objects (such as %Y, %m, %d, %H, and %M) are listed in the official Python documentation for strftime() and strptime().
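A small sketch of strftime() with a few format codes (the datetime value is made up; names such as weekday and month names depend on your locale, so no output is shown for those):

```python
from datetime import datetime

dt = datetime(2023, 8, 1, 14, 30)

print(dt.strftime("%Y-%m-%d"))        # 2023-08-01
print(dt.strftime("%d.%m.%Y %H:%M"))  # 01.08.2023 14:30
print(dt.strftime("%Y"))              # 2023 (only a part of the datetime)
print(dt.strftime("%A, %B %d"))       # locale-dependent weekday and month names
```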

Converting a string into a datetime object
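For the opposite direction, strptime() needs the string and a format that describes it exactly. A minimal sketch with an invented input string:

```python
from datetime import datetime

date_string = "01/08/2023 14:30"  # day/month/year, hypothetical input

# The format codes must match the layout of the string.
parsed = datetime.strptime(date_string, "%d/%m/%Y %H:%M")

print(parsed)        # 2023-08-01 14:30:00
print(type(parsed))  # <class 'datetime.datetime'>
```

If the string does not match the format, strptime() raises a ValueError.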

Parsing complex strings using dateutil
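When the format is not known in advance, the third-party package dateutil (installed as python-dateutil) can guess it. A sketch with invented input strings:

```python
from dateutil import parser

# No format string needed: dateutil infers the layout.
parsed = parser.parse("1 August 2023 2:30 PM")
print(parsed)  # 2023-08-01 14:30:00

# Ambiguous numeric dates are read month-first by default ...
month_first = parser.parse("01/08/2023")
print(month_first)  # 2023-01-08 00:00:00

# ... unless you tell the parser that the day comes first.
day_first = parser.parse("01/08/2023", dayfirst=True)
print(day_first)  # 2023-08-01 00:00:00
```

The results are ordinary datetime objects, so everything shown earlier for datetime applies to them as well.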

If you are handling large datasets, numpy’s datetime64 may come in handy as, due to its design, it can be much faster than working with datetime objects (including those returned by dateutil). The datetime64 data type in numpy encodes dates and times as 64-bit integers.

This stores dates and times compactly and allows vectorized operations (repeated operations applied to each element of a numpy array).
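A brief sketch of a vectorized operation on a datetime64 array (the dates are invented):

```python
import numpy as np

dates = np.array(
    ["2023-08-01", "2023-08-02", "2023-08-03"], dtype="datetime64[D]"
)

# Add one week to every element in a single operation.
shifted = dates + np.timedelta64(7, "D")
print(shifted)  # ['2023-08-08' '2023-08-09' '2023-08-10']

# Comparisons are vectorized too and return a boolean array.
mask = dates > np.datetime64("2023-08-01")
print(mask)     # [False  True  True]
```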

By contrast, a plain Python list of datetime objects (whether created with datetime or parsed with dateutil) does not support vectorized operations: adding a timedelta to the whole list raises a TypeError.
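To illustrate the difference, here is a small sketch using a plain list (invented dates); without numpy you need an explicit loop or list comprehension:

```python
from datetime import datetime, timedelta

dates = [datetime(2023, 8, 1), datetime(2023, 8, 2)]

failed = False
try:
    dates + timedelta(days=7)  # not supported for a Python list
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Element-wise alternative in plain Python.
shifted = [d + timedelta(days=7) for d in dates]
print(shifted[0])  # 2023-08-08 00:00:00
```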

Pandas can be a good choice when working on a time series data project.

The famous data-wrangling library pandas combines the convenience of datetime and dateutil with the efficient storage and manipulation capabilities of numpy.

Create a pandas dataframe (from CSV) parsing a date column

Now, we have a basic understanding of handling dates and times in Python using numpy and pandas. However, often, we do not create dates and times ourselves, but they are already part of the dataset we are dealing with. Let’s create a pandas data frame with a date column (Kaggle dataset NFL).

When loading from a CSV, a column that holds dates is read in as strings unless you tell pandas otherwise. To get a proper datetime column, you could either create an extra column (for example, “gameDate_dateformat”) by converting the string column with pd.to_datetime(), or pass the date column directly to the parse_dates parameter of pd.read_csv().
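Both options can be sketched as follows. To keep the example self-contained, it reads a tiny in-memory CSV instead of the Kaggle file; the rows and the gameId column are invented, and the gameDate column name is assumed from the text above:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file.
csv_text = "gameId,gameDate\n1,2023-09-07\n2,2023-09-10\n"

# Default behaviour: the date column is read as strings.
df = pd.read_csv(io.StringIO(csv_text))
print(df["gameDate"].dtype)

# Option 1: convert afterwards into an extra column.
df["gameDate_dateformat"] = pd.to_datetime(df["gameDate"])
print(df["gameDate_dateformat"].dtype)

# Option 2: parse while reading.
df_parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["gameDate"])
print(df_parsed["gameDate"].dtype)
```

If your dates are not in ISO format, passing an explicit format to pd.to_datetime() avoids ambiguous day/month guesses.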

Another handy manipulation when working with time series data is filtering or subsetting a data frame by date or time. There are two ways to do this: filtering with a condition on the date column, or indexing by date.

Filtering pandas data frames by time

Make sure that the threshold date you use for subsetting has the same format as the column!

If the column you want to filter by has a datetime type (like in the example), the comparison value cannot be a date object but needs to be a datetime as well!
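A minimal sketch of filtering with a boolean mask, using an invented data frame (the column names gameId and gameDate are assumptions):

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "gameId": [1, 2, 3],
        "gameDate": pd.to_datetime(["2023-09-07", "2023-09-10", "2023-09-17"]),
    }
)

# The threshold is a datetime, matching the type of the column.
threshold = datetime(2023, 9, 10)
later_games = df[df["gameDate"] >= threshold]
print(later_games)

# Conditions can be combined to select a range.
in_range = df[
    (df["gameDate"] >= pd.Timestamp("2023-09-08"))
    & (df["gameDate"] < pd.Timestamp("2023-09-17"))
]
print(in_range)
```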

Indexing pandas data frames by time

Even more powerful is indexing a pandas data frame by date or time.

Indexing can be especially useful when working with time series, as there are methods like rolling windows and time-shifting.
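The sketch below shows one way to do this with an invented data frame (the points column and all values are hypothetical): the date column becomes the index, which enables selection by date strings, rolling windows, and shifting.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "gameDate": pd.to_datetime(
            ["2023-09-07", "2023-09-10", "2023-09-17", "2023-10-01"]
        ),
        "points": [10, 20, 30, 40],
    }
).set_index("gameDate").sort_index()

# Select a whole month or a date range directly with strings.
september = df.loc["2023-09"]
mid_september = df.loc["2023-09-08":"2023-09-20"]

# Rolling sum over a 7-day window ending at each row.
rolling_sum = df["points"].rolling("7D").sum()

# Shift the values down by one row (previous game's points).
previous_points = df["points"].shift(1)

print(september, mid_september, rolling_sum, previous_points, sep="\n\n")
```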

Often, we are not interested in the date itself but in the duration, the weekday, or just a part of the datetime, e.g. the year. Both datetime and pandas provide some useful manipulations for this.

Timedelta

With pandas, you can calculate, for example, the difference between two datetimes. For this, we will look at a different dataset of Uber trips (Kaggle dataset Uber) with a start and an end timestamp. Some preprocessing is needed (removing the totals row) before looking into timedelta.
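As a sketch of the idea, here is a tiny invented stand-in for the trip data. The column names start_time and end_time are hypothetical and will differ from the real dataset, and the preprocessing step is left out:

```python
import pandas as pd

trips = pd.DataFrame(
    {
        "start_time": ["2016-01-01 21:11", "2016-01-02 01:25"],
        "end_time": ["2016-01-01 21:17", "2016-01-02 01:37"],
    }
)

# Convert both string columns to datetimes first.
trips["start_time"] = pd.to_datetime(trips["start_time"])
trips["end_time"] = pd.to_datetime(trips["end_time"])

# Subtracting two datetime columns yields a timedelta column.
trips["duration"] = trips["end_time"] - trips["start_time"]

# Express the duration in minutes as a plain number.
trips["duration_min"] = trips["duration"].dt.total_seconds() / 60

print(trips[["duration", "duration_min"]])
```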

Extract the weekday or the month

This works slightly differently for the single datetime versus the pandas Series. While the weekday or the month of the single datetime object can be directly accessed by adding an attribute (e.g., .month) or method (e.g., weekday()), the pandas Series always needs the .dt accessor.

The .dt accessor allows you to access datetime-specific attributes and methods from a datetime Series.
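A side-by-side sketch with invented dates:

```python
from datetime import datetime

import pandas as pd

# Single datetime object: attribute and method directly on the object.
single = datetime(2023, 8, 1)
print(single.month)      # 8
print(single.weekday())  # 1 (Monday is 0, so this is a Tuesday)

# pandas Series: go through the .dt accessor.
series = pd.Series(pd.to_datetime(["2023-08-01", "2023-12-24"]))
months = series.dt.month
weekdays = series.dt.weekday      # an attribute here, not a method call
day_names = series.dt.day_name()

print(months.tolist())     # [8, 12]
print(weekdays.tolist())   # [1, 6]
print(day_names.tolist())  # ['Tuesday', 'Sunday']
```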

Create a date/time lag

Another helpful manipulation for time series data is adding an extra column that holds a lagged version of a date or datetime column.
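There is more than one way to read "lag", so this sketch shows two common variants on an invented data frame: offsetting each date by a fixed amount, and taking the date from the previous row. All column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": pd.to_datetime(["2023-08-01", "2023-08-02", "2023-08-05"])}
)

# Variant 1: offset every date by a fixed amount of time.
df["date_plus_7d"] = df["date"] + pd.Timedelta(days=7)

# Variant 2: take the date of the previous row (first row becomes NaT).
df["previous_date"] = df["date"].shift(1)

# The gap between consecutive rows follows directly from variant 2.
df["gap"] = df["date"] - df["previous_date"]

print(df)
```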

To work with date or time objects in Python, knowing the basics of the built-in package datetime (e.g. date(), strftime(), and strptime()) is beneficial. zoneinfo is a built-in module (since Python 3.9) that makes working with different time zones possible without third-party packages. dateutil is a valuable library for more advanced date and time manipulations when working with single date objects, e.g., parsing complex strings. When working with dates and times in data frames, Series, or arrays, pandas combines the benefits of datetime, dateutil, and numpy and serves as a convenient library.
