In the rest of this article, we’ll do a clustering analysis of food demand time series. You’ll learn how to:

- summarise a set of time series using feature extraction;
- use K-Means and a hierarchical method for time series clustering.

The full code is available on GitHub:

## Data set

We’ll use a weekly food sales time series collected by the US Department of Agriculture. This data set contains information about food sales by product category and subcategory. The time series is split by state, but we’ll use national total sales in each period.

Below is a sample of the data set:

Here’s what the whole data set looks like:

## Feature-based Time Series Clustering

We’ll use a feature-based approach to time series clustering. This process involves two main steps:

- Summarise each time series into a set of features, such as the average value;
- Apply a conventional clustering algorithm to the feature set, such as K-means.
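The two steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the article's pipeline: the three summary statistics (mean, standard deviation, lag-1 autocorrelation) are hand-picked stand-ins for the *tsfel* features used below, the function names are hypothetical, and the features are left unscaled for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def summarise(series: pd.Series) -> pd.Series:
    """Summarise one series with three illustrative statistics."""
    return pd.Series({
        "mean": series.mean(),
        "std": series.std(),
        "acf1": series.autocorr(lag=1),  # week-to-week persistence
    })


def feature_based_clustering(data: pd.DataFrame, n_clusters: int, seed: int = 0) -> pd.Series:
    """Step 1: one row of features per series; step 2: K-means on that table."""
    features = data.apply(summarise, axis=0).T  # rows = series, columns = features
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    return pd.Series(labels, index=features.index, name="cluster")


# synthetic weekly data: three flat noisy series and three trending ones
rng = np.random.default_rng(42)
weeks = np.arange(104)
toy_data = pd.DataFrame({
    **{f"flat_{i}": 100 + rng.normal(0, 1, weeks.size) for i in range(3)},
    **{f"trend_{i}": 100 + 2 * weeks + rng.normal(0, 1, weeks.size) for i in range(3)},
})
print(feature_based_clustering(toy_data, n_clusters=2))
```

Because the clustering only ever sees the feature table, the series themselves could have different lengths or be misaligned in time; that is what lets a conventional algorithm like K-means do the grouping.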

Let’s do each step in turn.

## Feature extraction using *tsfel*

We start by extracting a set of statistics to summarise each time series. The goal is to convert each series into a small set of features.

There are several tools for time series feature extraction. We’ll use *tsfel*, which performs competitively against other approaches [3].

Here’s how you can use *tsfel*:

```python
import pandas as pd
import tsfel

# get configuration
cfg = tsfel.get_features_by_domain()

# extract features for each food subcategory
# (`data` holds one column of weekly national sales per subcategory)
features = {col: tsfel.time_series_features_extractor(cfg, data[col])
            for col in data}

features_df = pd.concat(features, axis=0)
```

This process results in a large number of features, some of which may be redundant, so we carry out a feature selection step.

Below, we apply three operations to the feature set:

- normalization: rescale each feature to the 0–1 range;
- selection by variance: remove any feature with zero variance;
- selection by correlation: remove any feature that is highly correlated with another remaining feature.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import VarianceThreshold

# helper from the article's repository (not shown here)
from src.correlation_filter import correlation_filter

# normalizing the features to the 0-1 range
features_norm_df = pd.DataFrame(MinMaxScaler().fit_transform(features_df),
                                columns=features_df.columns)

# removing features with 0 variance
min_var = VarianceThreshold(threshold=0)
min_var.fit(features_norm_df)
features_norm_df = pd.DataFrame(min_var.transform(features_norm_df),
                                columns=min_var.get_feature_names_out())

# removing correlated features (threshold 0.9)
features_norm_df = correlation_filter(features_norm_df, 0.9)

# label each row with its food subcategory
features_norm_df.index = data.columns
```
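The `correlation_filter` helper is imported from the article's own repository (`src.correlation_filter`) and isn't shown. Below is a hypothetical stand-in, assuming a greedy rule: scan the columns in order and drop any column whose absolute Pearson correlation with an already-kept column exceeds the threshold. The original helper may decide which column of a correlated pair to drop differently.

```python
import pandas as pd


def correlation_filter(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Hypothetical stand-in for the article's helper.

    Scans columns left to right and keeps a column only if its absolute
    Pearson correlation with every already-kept column is <= threshold.
    """
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        # keep the column only if it is not highly correlated with a kept one
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return df[kept]


# toy feature table: "b" is "a" scaled by one half, "c" is unrelated
toy = pd.DataFrame({
    "a": [0.0, 0.2, 0.5, 0.9, 1.0],
    "b": [0.0, 0.1, 0.25, 0.45, 0.5],
    "c": [0.3, 0.9, 0.1, 0.8, 0.2],
})
print(correlation_filter(toy, 0.9).columns.tolist())
```

Running the zero-variance filter first matters here: a constant column has an undefined correlation, which this sketch would silently drop.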

## Clustering with K-Means

After preprocessing the features, we’re ready to cluster the time series. Since each series is now summarised by a small set of unordered features, we can use any conventional clustering algorithm. A popular choice is K-means.

With K-means, we need to pick the number of clusters in advance. Unless we have some domain knowledge, there’s no obvious a priori value for this parameter. But we can take a data-driven approach: test different values and pick the best one.

Below, we test K-means with 2 to 24 clusters. Then, we pick the number of clusters that maximizes the silhouette score. This metric quantifies how cohesive each cluster is and how well separated it is from the others.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

kmeans_parameters = {
    'init': 'k-means++',
    'n_init': 100,
    'max_iter': 50,
}

n_clusters = range(2, 25)
silhouette_coef = []
for k in n_clusters:
    kmeans = KMeans(n_clusters=k, **kmeans_parameters)
    kmeans.fit(features_norm_df)
    score = silhouette_score(features_norm_df, kmeans.labels_)
    silhouette_coef.append(score)
```
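The loop stores one score per candidate k, so the selected number of clusters is the k with the highest score. For a single point i, the silhouette is `s(i) = (b(i) - a(i)) / max(a(i), b(i))`, where `a(i)` is the mean distance to the other members of its cluster and `b(i)` the mean distance to the members of the nearest other cluster; `silhouette_score` averages `s(i)` over all points. The sketch below computes this by hand on toy data (assuming no single-point clusters) and checks it against scikit-learn. `silhouette_by_hand` and `best_k` are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def silhouette_by_hand(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient; assumes every cluster has at least 2 points."""
    # pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False  # exclude the point itself
        a = dist[i, own].mean()  # cohesion: mean distance within its own cluster
        # separation: mean distance to the nearest other cluster
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def best_k(ks, scores) -> int:
    """Return the candidate k with the highest silhouette score."""
    return list(ks)[int(np.argmax(scores))]


X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
labels_toy = np.array([0, 0, 0, 1, 1, 1])
print(silhouette_by_hand(X_toy, labels_toy), silhouette_score(X_toy, labels_toy))
```

With the loop's variables, `best_k(n_clusters, silhouette_coef)` returns the selected number of clusters.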

The silhouette score is maximized for 5 clusters as shown in the figure below.

We can draw a parallel coordinates plot to understand the profile of each cluster. Here’s an example with a sample of three features:
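One way to draw such a plot is pandas' `parallel_coordinates` function, which draws one line per row and colours it by a class column. The sketch below uses a synthetic feature table with hypothetical column names in place of `features_norm_df`; `plot_cluster_profiles` is also a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates


def plot_cluster_profiles(features: pd.DataFrame, labels, cols):
    """Draw one line per series across the selected feature columns, coloured by cluster."""
    frame = features[cols].copy()
    frame["cluster"] = [f"cluster {c}" for c in labels]
    fig, ax = plt.subplots(figsize=(6, 3))
    parallel_coordinates(frame, class_column="cluster", ax=ax)
    ax.set_ylabel("normalised value")
    return ax


# synthetic stand-in for the normalised feature table; column names are hypothetical
rng = np.random.default_rng(0)
toy_features = pd.DataFrame(rng.random((8, 3)), columns=["feat_a", "feat_b", "feat_c"])
toy_labels = [0, 0, 0, 1, 1, 1, 2, 2]
ax = plot_cluster_profiles(toy_features, toy_labels, ["feat_a", "feat_b", "feat_c"])
```

To plot the article's clusters, refit K-means with the chosen number of clusters first, because after the loop `kmeans` holds the last model fitted (k = 24): `kmeans = KMeans(n_clusters=5, **kmeans_parameters).fit(features_norm_df)`, then call `plot_cluster_profiles(features_norm_df, kmeans.labels_, cols)` with any three feature names.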

We can also use the cluster assignments to improve demand forecasting models, for example, by building a separate model for each cluster. The paper in reference [5] is a good example of this approach.
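As an illustration of the per-cluster idea (not the method from reference [5]), the sketch below fits one linear autoregressive model per cluster. It pools the lagged values of all series in that cluster and solves a least-squares problem with NumPy. The helper names, the 4-lag window, and the toy data are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def lagged_matrix(series: np.ndarray, n_lags: int):
    """Return X (previous `n_lags` values per row, oldest first) and y (the value that follows)."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def fit_per_cluster(data: pd.DataFrame, labels: pd.Series, n_lags: int = 4) -> dict:
    """Fit one linear autoregressive model per cluster on the pooled series of that cluster."""
    models = {}
    for cluster in labels.unique():
        pairs = [lagged_matrix(data[col].to_numpy(dtype=float), n_lags)
                 for col in labels[labels == cluster].index]
        X = np.vstack([X_col for X_col, _ in pairs])
        y = np.concatenate([y_col for _, y_col in pairs])
        X = np.column_stack([np.ones(len(X)), X])  # intercept column
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        models[cluster] = coef
    return models


def forecast_next(series: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead forecast from the last observed lags of a series."""
    n_lags = len(coef) - 1
    return float(coef[0] + np.asarray(series[-n_lags:], dtype=float) @ coef[1:])


# synthetic example: two trending series and two flat ones, already clustered
t = np.arange(30, dtype=float)
toy_data = pd.DataFrame({
    "trend_a": 100 + 2 * t, "trend_b": 50 + 2 * t,
    "flat_a": np.full(30, 10.0), "flat_b": np.full(30, 20.0),
})
toy_labels = pd.Series({"trend_a": 0, "trend_b": 0, "flat_a": 1, "flat_b": 1})
models = fit_per_cluster(toy_data, toy_labels)
print(forecast_next(toy_data["trend_a"].to_numpy(), models[0]))
```

Pooling gives each model more training rows than any single series would provide, while keeping series with very different dynamics in separate models.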

## Hierarchical clustering

Hierarchical clustering is an alternative to K-means. In its agglomerative form, it starts with each series in its own cluster and iteratively merges the closest pair of clusters, leading to a tree-like structure. The *scipy* library provides an implementation of this method.

```python
import scipy.cluster.hierarchy as shc

# hierarchical clustering using the ward method
clustering = shc.linkage(features_norm_df, method='ward')

# plotting the dendrogram, one leaf per food subcategory
dend = shc.dendrogram(clustering,
                      labels=features_norm_df.index.to_list(),
                      orientation='right',
                      leaf_font_size=7)
```
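The linkage matrix encodes the whole tree rather than a fixed partition. To get flat cluster labels comparable to K-means, the tree can be cut with scipy's `fcluster`; with `criterion='maxclust'` it returns at most `t` clusters. A minimal sketch on toy data:

```python
import numpy as np
import scipy.cluster.hierarchy as shc

# toy feature table: two well-separated groups of three rows each
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])

# Ward linkage, as in the code above
Z_toy = shc.linkage(X_toy, method="ward")

# cut the tree into at most 2 flat clusters; cluster numbers start at 1
flat_labels = shc.fcluster(Z_toy, t=2, criterion="maxclust")
print(flat_labels)
```

On the article's data, `shc.fcluster(clustering, t=5, criterion='maxclust')` would cut the tree into at most five groups, matching the number chosen with K-means.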

The results of a hierarchical clustering model are best visualized with a dendrogram plot:

We can use the dendrogram to understand the clusters’ profiles. For example, we can see that most canned items are grouped together (in orange). Oranges also cluster with pancake/cake mixes. These two often go together at breakfast.