Time Series for Climate Change: Reducing Food Waste with Clustering | by Vitor Cerqueira | Jun, 2023

In the rest of this article, we’ll do a clustering analysis of food demand time series. You’ll learn how to:

  • summarise a set of time series using feature extraction;
  • use K-Means and a hierarchical method for time series clustering.

The full code is available on GitHub:

Data set

We’ll use a weekly food sales time series collected by the US Department of Agriculture. This data set contains information about food sales by product category and subcategory. The time series is split by state, but we’ll use national total sales in each period.

Below is a sample of the data set:

Amount of sales by product sub-category in the USA (in millions of dollars)

Here’s what the whole data looks like:

Sales amount (millions of dollars) for different food sub-categories. Image by author.

Feature-based Time Series Clustering

We’ll use a feature-based approach to time series clustering. This process involves two main steps:

  1. Summarise each time series into a set of features, such as the average value;
  2. Apply a conventional clustering algorithm to the feature set, such as K-means.

Let’s do each step in turn.
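
Before going into the details, the two steps can be sketched end to end on a toy example. This is only an illustration under stated assumptions: the series are synthetic, and the three hand-picked features (mean, standard deviation, linear slope) and the name data_demo are hypothetical stand-ins, not the features or data used in the article.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.arange(104)  # two years of weekly observations

# Synthetic stand-in for the sales data: three flat series and three trending ones
series = {f"flat_{i}": 10 + rng.normal(0, 1, t.size) for i in range(3)}
series.update({f"trend_{i}": 10 + 0.5 * t + rng.normal(0, 1, t.size) for i in range(3)})
data_demo = pd.DataFrame(series)

# Step 1: summarise each series (one column each) into a few features
summary = pd.DataFrame({
    "mean": data_demo.mean(),
    "std": data_demo.std(),
    "slope": data_demo.apply(lambda s: np.polyfit(t, s.to_numpy(), 1)[0]),
})

# Step 2: cluster the rows of the feature table
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(summary)
clusters = pd.Series(labels, index=summary.index)
print(clusters)
```

Each row of summary describes one series, so the clustering algorithm never sees the temporal order of the observations, only the features.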

Feature extraction using tsfel

We start by extracting a set of statistics to summarise each time series. The goal is to convert each series into a small set of features.

There are several tools for time series feature extraction. We’ll use tsfel, which performs competitively relative to other approaches [3].

Here’s how you can use tsfel:

import pandas as pd
import tsfel

# get configuration
cfg = tsfel.get_features_by_domain()

# extract features for each food subcategory
# (data is a DataFrame with one column per subcategory)
features = {col: tsfel.time_series_features_extractor(cfg, data[col])
            for col in data}

features_df = pd.concat(features, axis=0)

This process results in a large number of features. Some of these may be redundant, so we carry out a feature selection process.

Below, we apply three operations to the feature set:

  • normalization: convert variables into a 0–1 value range;
  • selection by variance: remove any variable with 0 variance;
  • selection by correlation: remove any variable that is highly correlated with another existing one.

from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import VarianceThreshold
from src.correlation_filter import correlation_filter

# normalizing the features
features_norm_df = pd.DataFrame(MinMaxScaler().fit_transform(features_df),
                                columns=features_df.columns)

# removing features with 0 variance
min_var = VarianceThreshold(threshold=0)
min_var.fit(features_norm_df)
features_norm_df = pd.DataFrame(min_var.transform(features_norm_df),
                                columns=min_var.get_feature_names_out())

# removing correlated features
features_norm_df = correlation_filter(features_norm_df, 0.9)
features_norm_df.index = data.columns
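
The correlation_filter helper is imported from the article's code repository and is not shown here. As a rough idea of what such a filter might do, here is a minimal sketch. The function name correlation_filter_sketch and its logic are assumptions for illustration, not the author's implementation.

```python
import numpy as np
import pandas as pd


def correlation_filter_sketch(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Drop each column whose absolute correlation with any preceding column exceeds threshold."""
    corr = df.corr().abs()
    # keep only the upper triangle so that each pair of columns is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Toy data: "a_copy" is almost a rescaled copy of "a", while "b" is independent
rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = rng.normal(size=50)
demo = pd.DataFrame({"a": a, "a_copy": 2 * a + 0.01 * rng.normal(size=50), "b": b})

filtered = correlation_filter_sketch(demo, 0.9)
print(filtered.columns.tolist())
```

In this sketch the first column of a correlated pair is kept and the later one is dropped, so the column order decides which of the two survives.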

Clustering with K-Means

After preprocessing the data set, we’re ready to cluster the time series. Each series is now summarised by a small set of unordered features, so we can use any conventional clustering algorithm. A popular choice is K-means.

With K-means, we need to pick the number of clusters we want. Unless we have some domain knowledge, there’s no obvious a priori value for this parameter. But we can take a data-driven approach to select the number of clusters: we test different values and pick the best one.

Below, we test K-means with 2 to 24 clusters. Then, we pick the number of clusters that maximizes the silhouette score. This metric quantifies how cohesive each cluster is and how well separated it is from the other clusters.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

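kmeans_parameters = {
    'init': 'k-means++',
    'n_init': 100,
    'max_iter': 50,
}

n_clusters = range(2, 25)
silhouette_coef = []
for k in n_clusters:
    # fit K-means with k clusters
    kmeans = KMeans(n_clusters=k, **kmeans_parameters)
    kmeans.fit(features_norm_df)

    # store the silhouette score obtained with k clusters
    score = silhouette_score(features_norm_df, kmeans.labels_)
    silhouette_coef.append(score)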

The silhouette score is maximized for 5 clusters as shown in the figure below.
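
Picking the best number of clusters from a list of silhouette scores can be sketched as follows. The example is self-contained and runs on synthetic blobs generated with make_blobs, not on the food sales features. The names candidate_k, scores and best_k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature table: four well-separated groups
X, _ = make_blobs(n_samples=200,
                  centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=0.5,
                  random_state=0)

candidate_k = list(range(2, 9))
scores = []
for k in candidate_k:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores.append(silhouette_score(X, model.labels_))

# the best k is the candidate with the highest silhouette score
best_k = candidate_k[int(np.argmax(scores))]
print(best_k)
```

The silhouette score is only defined when there are at least two clusters and fewer clusters than samples, which is why the candidate range starts at 2.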

Silhouette score for up to 24 clusters. Image by author.

We can draw a parallel coordinates plot to understand the profile of each cluster. Here’s an example with a sample of three features:

Parallel coordinates plot with a feature sample. Image by author.
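
A plot of this kind can be drawn with pandas. The sketch below uses made-up feature values and cluster labels for six series. The feature names and the plot_df table are hypothetical, chosen only to show the call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical normalised features and cluster labels for six series
plot_df = pd.DataFrame({
    "mean": [0.10, 0.20, 0.15, 0.80, 0.90, 0.85],
    "slope": [0.50, 0.40, 0.45, 0.10, 0.20, 0.15],
    "entropy": [0.30, 0.35, 0.25, 0.70, 0.75, 0.65],
    "cluster": ["0", "0", "0", "1", "1", "1"],
})

fig, ax = plt.subplots(figsize=(6, 3))
# one line per series, coloured by the cluster it belongs to
parallel_coordinates(plot_df, class_column="cluster", ax=ax, colormap="tab10")
ax.set_ylabel("normalised feature value")
fig.tight_layout()
```

Each vertical axis is one feature, so clusters whose lines sit at different heights on an axis differ on that feature.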

We can also use the information about clusters to improve demand forecasting models, for example by building a separate model for each cluster. The paper in reference [5] is a good example of this approach.

Hierarchical clustering

Hierarchical clustering is an alternative to K-means. It combines pairs of clusters iteratively, leading to a tree-like structure. The library scipy provides an implementation for this method.

import scipy.cluster.hierarchy as shc

# hierarchical clustering using the ward method
clustering = shc.linkage(features_norm_df, method='ward')

# plotting the dendrogram
dend = shc.dendrogram(clustering,
                      # assumption: label each leaf with its subcategory name
                      labels=features_norm_df.index.tolist())

The results of a hierarchical clustering model are best visualized with a dendrogram plot:

Visualizing the results of hierarchical clustering using a dendrogram. Image by author

We can use the dendrogram to understand the clusters’ profiles. For example, we can see that most canned items are grouped together (orange color). Oranges also cluster with pancake/cake mixes. These two often go together at breakfast.
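
Besides reading the dendrogram, the tree can be cut into flat cluster labels with scipy's fcluster. The sketch below is an illustration on synthetic points. The choice of two clusters and the names X, Z and flat_labels are assumptions, not values from the article.

```python
import numpy as np
import scipy.cluster.hierarchy as shc

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature table: two groups of five points each
X = np.vstack([rng.normal(0, 0.1, (5, 3)),
               rng.normal(5, 0.1, (5, 3))])

# build the tree with the ward method
Z = shc.linkage(X, method="ward")

# cut the tree so that at most two flat clusters remain
flat_labels = shc.fcluster(Z, t=2, criterion="maxclust")
print(flat_labels)
```

The resulting labels can be used in the same way as the K-means labels, for instance to profile or compare the groups.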
