Expanding Time. How to Increase the Value of… | by Kurt Klingensmith | Jun, 2023


Let’s revisit the original plot of temperature over time from section 1. Here’s an example code block visualizing one station (Whitefish North) over time with seasons represented by different colors:

# Show one station with seasons plotted:
import plotly.express as px  # needed if not already imported earlier

plot = px.scatter(df[df['station_key'] == 'Whitefish N'],
                  x='datetime', y='AirTempCelsius', color='Season',
                  color_discrete_sequence=["#3366cc", "#109618", "#d62728",
                                           "#ff9900"])
plot.update_layout(
    # Adjacent string literals are joined, so the title stays one string:
    title={'text': "Temperature Patterns by Season"
                   "<br><sup>Data from Whitefish N, MT Weather Station</sup>",
           'xanchor': 'left',
           'yanchor': 'top',
           'x': 0.1},
    xaxis_title='',
    yaxis_title='Temperature in Celsius',
    legend_title_text='Season:')
plot.show()

The resulting chart is:

Screenshot by author.

The new time features quickly show how seasons map to shifts in observed temperature. The Season column is already proving useful, but revisiting the histogram from section 1 is even more interesting. The updated code below passes the Season column to facet_row in the Plotly Express histogram, which draws one histogram row per season:

# Generate plot:
plot = px.histogram(df, x='AirTempCelsius', color='station_key',
                    barmode='overlay', facet_row='Season')
plot.update_layout(
    title={'text': "Temperature Recordings, 2019 to 2022"
                   "<br><sup>Whitefish N, MT and Harding "
                   "Cutoff, SD Weather Stations</sup>",
           'xanchor': 'left',
           'yanchor': 'top',
           'x': 0.1},
    legend_title_text='Weather Station:',
    xaxis_title='Recorded Temperature')
# Hide the y-axis title on every facet row:
plot.update_yaxes(title_text="")
# Shorten facet labels from "Season=Summer" to "Summer":
plot.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
plot.show()

The result is:

Screenshot by author.

Introducing new time features to the dataframe enhances the ability to compare the temperature distributions of the two weather stations over time. In this case, a noticeable divergence between the two stations occurs in Summer.
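The same comparison can also be checked numerically by grouping on the new column. The snippet below is a minimal sketch, not the article's code: it builds a small synthetic dataframe (the temperatures are made up, and the Harding Cutoff series is deliberately given a warmer summer), and it assumes seasons follow calendar months, with June through August as Summer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the article's dataframe; all values are made up.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
is_summer = dates.month.isin([6, 7, 8])

frames = []
for station, summer_boost in [("Whitefish N", 0.0), ("Harding Cutoff", 5.0)]:
    seasonal = 10 - 15 * np.cos(2 * np.pi * (doy - 15) / 366)
    temps = (seasonal
             + np.where(is_summer, summer_boost, 0.0)
             + rng.normal(0, 2, len(dates)))
    frames.append(pd.DataFrame({"datetime": dates,
                                "station_key": station,
                                "AirTempCelsius": temps}))
df_demo = pd.concat(frames, ignore_index=True)

# Assumed season definition: month-based, June to August is Summer.
season_by_month = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Fall", 10: "Fall", 11: "Fall"}
df_demo["Season"] = df_demo["datetime"].dt.month.map(season_by_month)

# Summary statistics per season and station.
summary = (df_demo.groupby(["Season", "station_key"])["AirTempCelsius"]
           .agg(["mean", "median", "std"])
           .round(2))

# Mean temperature gap between the stations, one row per season.
gap = summary["mean"].unstack("station_key")
gap["difference"] = gap["Harding Cutoff"] - gap["Whitefish N"]
print(gap)
```

With real data, the season with the largest absolute value in the difference column is where the two distributions diverge most.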

Here’s another example: suppose a meteorologist wants to compare the two weather stations in Summer 2020 by Julian date. Here’s how to visualize the temperature recordings using the new features:

# Prep Data:
# Filter and sort in one chained step. Calling sort_values(inplace=True)
# on a slice of df can raise pandas' SettingWithCopyWarning.
df1 = df[df['Year'] == 2020].sort_values(by='JulianDate')

# Generate plot:
plot = px.line(df1[df1['Season'] == 'Summer'],
               y="AirTempCelsius", x="JulianDate", color="station_key",
               color_discrete_sequence=["#3366cc", "#d62728"])
plot.update_layout(
    title={'text': "Summer Temperature Recordings, 2020"
                   "<br><sup>Whitefish N, MT and Harding "
                   "Cutoff, SD Weather Stations</sup>",
           'xanchor': 'left',
           'yanchor': 'top',
           'x': 0.1},
    legend_title_text='Weather Station:',
    xaxis_title='Julian Date',
    yaxis_title='Temperature in Degrees Celsius')
plot.show()

The graph looks like this:

Screenshot by author.

The additional time features let analysts quickly answer narrowly scoped questions. Note how the Harding Cutoff station’s Summer 2020 temperatures were typically higher than Whitefish N’s until an anomalous crossover occurred in the latter half of the season.
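A crossover like this can also be located programmatically rather than by eye. The following is a rough sketch under stated assumptions: the data is synthetic (a constant 25 °C for Harding Cutoff and 20 °C for Whitefish N, with an invented five-day spike at Whitefish N in mid-August), and JulianDate is assumed to be the integer day of the year.

```python
import numpy as np
import pandas as pd

# Synthetic Summer 2020 readings; the values and the spike are invented.
dates = pd.date_range("2020-06-01", "2020-08-31", freq="D")
spike = ((dates >= pd.Timestamp("2020-08-10"))
         & (dates <= pd.Timestamp("2020-08-14")))

harding = pd.DataFrame({"datetime": dates,
                        "station_key": "Harding Cutoff",
                        "AirTempCelsius": 25.0})
whitefish = pd.DataFrame({"datetime": dates,
                          "station_key": "Whitefish N",
                          "AirTempCelsius": np.where(spike, 28.0, 20.0)})
df_demo = pd.concat([harding, whitefish], ignore_index=True)

# Assumption: JulianDate is the integer day of the year.
df_demo["JulianDate"] = df_demo["datetime"].dt.dayofyear

# One row per day, one column per station (mean of that day's readings).
daily = df_demo.pivot_table(index="JulianDate", columns="station_key",
                            values="AirTempCelsius", aggfunc="mean")
daily["gap"] = daily["Harding Cutoff"] - daily["Whitefish N"]

# Days on which Whitefish N was warmer than Harding Cutoff.
crossover_days = daily[daily["gap"] < 0].index.tolist()
print(crossover_days)
```

Pivoting to one column per station makes the day-by-day difference a simple subtraction, and the sign of the gap column marks the crossover days.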

3.1. Directly Using the Original Date and Time Column

Recall in section 2 we discussed directly accessing additional date and time features versus extracting them into new columns. The above graph, “Summer Temperature Recordings, 2020,” is reproducible by using Pandas dt functions on the original “datetime” column inside the plotly chart code:

# Generate Plot:
plot = px.line(df[(df.datetime.dt.year == 2020) &
                  ((df.datetime.dt.month == 6) |
                   (df.datetime.dt.month == 7) |
                   (df.datetime.dt.month == 8))],
               # strftime('%j') returns the day of year as a zero-padded string
               x=df[(df.datetime.dt.year == 2020) &
                    ((df.datetime.dt.month == 6) |
                     (df.datetime.dt.month == 7) |
                     (df.datetime.dt.month == 8))].datetime.dt.strftime('%j'),
               y=df[(df.datetime.dt.year == 2020) &
                    ((df.datetime.dt.month == 6) |
                     (df.datetime.dt.month == 7) |
                     (df.datetime.dt.month == 8))].AirTempCelsius,
               color="station_key",
               color_discrete_sequence=["#d62728", "#3366cc"])
plot.update_layout(
    title={'text': "Summer Temperature Recordings, 2020"
                   "<br><sup>Whitefish N, MT and Harding "
                   "Cutoff, SD Weather Stations</sup>",
           'xanchor': 'left',
           'yanchor': 'top',
           'x': 0.1},
    legend_title_text='Weather Station:',
    xaxis_title='Julian Date',
    yaxis_title='Temperature in Degrees Celsius')
plot.show()

This results in the same exact graph:

Screenshot by author.

The advantage of this technique is that it does not increase the dataframe’s dimensionality. For very large datasets, this can help reduce computational load and keep the dataframe from becoming too large to work with. This technique also works well for a narrow analysis question that needs only limited use of additional time features.

The disadvantage is the large amount of code required to format and extract specific features within a visualization function. This can hinder the interpretability and repeatability of the code. There may also be functions or code that are incompatible with in-line Pandas dt operations.
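One possible middle ground is to build the filter once and reuse it, which still adds no columns to the dataframe. The sketch below illustrates the idea on a tiny made-up dataframe; summer_mask is a hypothetical helper introduced here for illustration, not part of pandas or of the original code.

```python
import pandas as pd

# Tiny made-up dataframe with the same column names as the article's data.
df_demo = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-07-04 12:00", "2020-05-31 12:00",
                                "2020-06-01 12:00", "2020-07-15 12:00",
                                "2020-08-31 12:00", "2020-09-01 12:00"]),
    "station_key": ["Whitefish N", "Harding Cutoff", "Whitefish N",
                    "Harding Cutoff", "Whitefish N", "Harding Cutoff"],
    "AirTempCelsius": [21.0, 18.5, 19.0, 30.2, 24.1, 22.3],
})


def summer_mask(frame, year):
    """Hypothetical helper: True for rows in June to August of `year`."""
    dt = frame["datetime"].dt
    return (dt.year == year) & dt.month.isin([6, 7, 8])


# Compute the boolean mask once instead of repeating it for data, x, and y.
mask = summer_mask(df_demo, 2020)
summer_2020 = df_demo.loc[mask]

# dt.dayofyear gives integers; strftime('%j') would give zero-padded strings.
julian = summer_2020["datetime"].dt.dayofyear

print(summer_2020.assign(JulianDate=julian))  # assign() returns a copy
```

The filtered frame and the julian series could then be passed to the plotting call, for example px.line(summer_2020, x=julian, y="AirTempCelsius", color="station_key"), while df_demo itself keeps its original three columns.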


