Creating an Infographic With Matplotlib | by Andy McDonald | Jul, 2023


Geological Lithology Variations Within The Zechstein Group of the Norwegian Continental Shelf

Radial bar plots of lithology variation across the Norwegian Continental Shelf. Image by the author.

Creating exciting and compelling data visualisations is essential to working with data and being a data scientist. It allows us to present information to readers in a concise form that helps them understand the data without having to view the raw values. Additionally, we can use charts and graphs to tell a compelling and interesting story that answers one or more questions about the data.

Within the Python world, there are numerous libraries that allow data scientists to create visualisations, and one of the first that many come across when starting their data science journey is matplotlib. However, after working with matplotlib for a little while, many people turn to other, more modern libraries because they view the default matplotlib plots as plain and boring.

With a bit of time, effort, code, and an understanding of matplotlib’s capabilities, we can transform the basic and boring plots into something much more compelling and visually appealing.

In my past several articles, I have focused on how we can transform individual plots with various styling methods. If you want to explore improving matplotlib data visualisations further, you can check out some of my previous articles.

These articles have mainly focused on single plots and styling them. Within this article, we are going to look at building infographics with matplotlib.

Infographics are used to transform complex datasets into compelling visual narratives that are informative and engaging for the reader. They visually represent data and consist of charts, tables and minimal text. Combining these allows us to provide an easy-to-understand overview of a topic or question.

After sharing my previous article on Polar Bar charts, I was tagged in a tweet from Russell Forbes, showing that it is possible to make infographics within matplotlib.

So, based on that, I thought to myself, why not try building an infographic with matplotlib?

And I did.

The following infographic was the result of that, and it is what we will be recreating in this article.

Example infographic that can be created using matplotlib. Image by the author.

Bear in mind that the infographic we will be building in this article may be suitable for web use or for inclusion within a presentation. However, if we were looking to include it within reports or display it in more formal settings, we may want to consider alternative colour palettes and a more professional feel.

Before we touch any data visualisation, we need to understand the purpose behind creating our infographic. Without this, it will be challenging to narrow down the plots we want to use and the story we want to tell.

For this example, we are going to use a set of well-log-derived lithology measurements that have been obtained from the Norwegian Continental Shelf. From this dataset, we are going to specifically look at the question:

What is the lithological variation of the Zechstein Group within this dataset?

This provides us with our starting point.

We know that we are looking for lithology data and data within the Zechstein Group.

To begin, we first need to import a number of key libraries.

These are pandas, for loading and storing our data; numpy, for performing the mathematical calculations that allow us to plot labels and data in a polar projection; matplotlib, for creating our plot; and adjustText, to ensure labels do not overlap on our scatter plot.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from adjustText import adjust_text

After the libraries have been imported, we next need to load our datasets. Details of the source for this dataset are included at the bottom of this article.

The first dataset we will load is the lithology composition of the Zechstein Group created in my previous article.

We can load this data in using pandas read_csv() function.

df = pd.read_csv('Data/LithologySummary.csv', index_col='WELL')

When we view our dataframe we have the following information about the lithologies present within the Zechstein Group as interpreted within each well.

Pandas dataframe containing lithology composition for eight wells that have penetrated the Zechstein Group. Image by the author.
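If you do not have the CSV file to hand, a small stand-in dataframe is enough to follow the rest of the article. The sketch below is only an illustration: the well names and percentages are invented and are not values from the dataset. It also includes a quick sanity check, assuming the values are percentages of the whole interval, that each row sums to 100.

```python
import pandas as pd

# Hypothetical stand-in data: well names and percentages are invented
demo_df = pd.DataFrame(
    {
        "WELL": ["WELL-A", "WELL-B", "WELL-C"],
        "Halite": [60.0, 10.0, 0.0],
        "Anhydrite": [25.0, 40.0, 50.0],
        "Dolomite": [15.0, 50.0, 50.0],
    }
).set_index("WELL")  # mirrors index_col='WELL' in pd.read_csv

# Each row should describe 100% of the interval
row_totals = demo_df.sum(axis=1)
rows_sum_to_100 = bool((row_totals.round(6) == 100).all())

# The column names become the category labels on the charts
demo_lith_names = list(demo_df.columns)
```

Setting the well name as the index keeps it out of the numeric columns, so every remaining column can be plotted directly as a bar height.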

To help our readers understand the data better, it would be good to have information about where the drilled wells intersected with the Zechstein Group.

We can load this data in the same way by using pd.read_csv(). However, this time, we do not need to set an index.

zechstein_well_intersections = pd.read_csv('Data/Zechstein_WellIntersection.csv')

When we view this dataframe we are presented with the following table containing the well name, the X & Y grid locations of where the well penetrated the Zechstein Group.

Pandas dataframe of the X & Y grid locations of where wells have penetrated the Zechstein Group.

Before we begin creating any figures, we need to create a few variables containing key information about our data. This will make things easier when it comes to making the plots.

First, we will get a list of all of the possible lithologies. This is done by converting the column names within our summary dataframe to a list.

lith_names = list(df.columns)

When we view this list, we get back the names of the lithologies present in the summary dataframe.

Next, we need to decide how we want the individual plots within the infographic to be set up.

For this dataset, we have 8 wells, which will be used to generate 8 radial bar charts.

We also want to show the well locations on the same figure, which gives us 9 subplots.

One way we can subdivide our figure is to have 3 columns and 3 rows. This allows us to create our first variable, num_cols representing the number of columns.

We can then generalise the number of rows (num_rows) so that the code can be reused with other datasets. In this example, it takes the number of wells we have (the number of rows in the dataframe) and divides it by the number of columns we want. Using np.ceil rounds this number up so that every plot has a place on the figure.

# Set the number of columns for your subplot grid
num_cols = 3

# Get the number of wells (rows in the DataFrame)
num_wells = len(df)

# Calculate the number of rows needed for the subplot grid
num_rows = np.ceil(num_wells / num_cols).astype(int)
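One thing worth noting is that this calculation counts only the wells. With 8 wells and 3 columns it gives 3 rows and leaves one free cell for the map, but with 9 wells there would be no spare cell. The sketch below is one possible way to handle that, counting the map as an extra plot; the helper names grid_shape and grid_position are hypothetical and are not part of the original code.

```python
import math


def grid_shape(num_plots, num_cols):
    """Return (rows, cols) needed to hold num_plots subplots."""
    return math.ceil(num_plots / num_cols), num_cols


def grid_position(i, num_cols):
    """Return the (row, col) cell for the i-th subplot, filling row by row."""
    return i // num_cols, i % num_cols


# 8 radial charts plus 1 map (the case in this article)
demo_rows, demo_cols = grid_shape(8 + 1, 3)

# Cells used by the 8 radial charts; the bottom-right cell stays free
positions = [grid_position(i, demo_cols) for i in range(8)]
```

The same integer division and remainder are what place each radial chart on the grid later in the article.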

The next set of variables we need to declare are as follows:

  • indexes : creates a list of numbers from 0 up to, but not including, the number of items in lith_names. In our case, this generates a list from 0 to 7, which covers the 8 lithologies in our dataset.
  • width : the angular width of each bar, calculated by dividing the full angle of a circle (2 * pi radians) by the number of lithologies in lith_names
  • angles : creates a list containing the angle at which each lithology's bar is placed
  • colours : a list of hexadecimal colours we want to use to represent each well
  • label_loc : creates a list of evenly spaced values between 0 and 2 * pi for displaying the lithology labels
indexes = list(range(0, len(lith_names)))
width = 2*np.pi / len(lith_names)
angles = [element * width for element in indexes]

colours = ["#ae1241", "#5ba8f7", "#c6a000", "#0050ae",
           "#9b54f3", "#ff7d67", "#dbc227", "#008c5c"]

label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(lith_names))
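To make these values more concrete, the short sketch below recomputes the bar width and angles with numpy for 8 categories. It also shows a detail of np.linspace that is easy to miss: by default it includes the stop value, so the last entry is 2 * pi, which points in the same direction as 0 on a polar axis. The variable names here are illustrative only. In the plotting code in this article, the labels are positioned using angles rather than label_loc.

```python
import numpy as np

n_categories = 8  # matches the lithology count stated in the article
bar_width = 2 * np.pi / n_categories
bar_angles = np.arange(n_categories) * bar_width

# linspace includes the stop value by default, so the last entry is 2*pi,
# which is the same direction as 0 on a polar axis
with_endpoint = np.linspace(0, 2 * np.pi, num=n_categories)

# endpoint=False gives the same evenly spaced angles as bar_angles
without_endpoint = np.linspace(0, 2 * np.pi, num=n_categories, endpoint=False)
```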

Adding Radial Bar Charts as Subplots

To begin creating our infographic, we first need to create a figure object. This is done by calling upon plt.figure().

To setup our figure, we need to pass in a few parameters:

  • figsize : controls the size of the infographic. As we may have varying numbers of rows, we can set the figure height to be a multiple of the number of rows. This will prevent the plots and figures from becoming distorted.
  • linewidth : controls the border thickness for the figure
  • edgecolor : sets the border colour
  • facecolor : sets the figure background colour
# Create a figure
fig = plt.figure(figsize=(20, num_rows * 7), linewidth=10,
                 edgecolor='#393d5c',
                 facecolor='#25253c')

Next, we need to define our grid layout. There are a few ways we can do this, but for this example, we are going to use GridSpec. This will allow us to specify the location of the subplots, and also the spacing between them.

# Create a grid layout
grid = plt.GridSpec(num_rows, num_cols, wspace=0.5, hspace=0.5)
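As a small, self-contained illustration of how a GridSpec is indexed, the sketch below builds a 2 by 3 grid and places one polar and one regular axis on it. The figure size and variable names are arbitrary choices for the example, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

demo_fig = plt.figure(figsize=(9, 6), facecolor="#25253c")
demo_grid = GridSpec(2, 3, figure=demo_fig, wspace=0.5, hspace=0.5)

# Indexing the grid with [row, col] gives a slot that add_subplot accepts
polar_ax = demo_fig.add_subplot(demo_grid[0, 0], projection="polar")

# Negative indices count from the end, so [-1, -1] is the bottom-right cell
map_ax = demo_fig.add_subplot(demo_grid[-1, -1])

grid_geometry = demo_grid.get_geometry()
```

Because each cell is requested individually, different projections can be mixed on the same figure, which is what allows the radial charts and the scatter plot to sit side by side.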

We are now ready to begin adding our radial bar plots.

To do this, we need to loop over each row within the lithology composition summary dataframe and add an axis to the grid using add_subplot(). As we are plotting radial bar charts, we want to set the projection parameter to polar.

Next, we can begin adding our data to the plot by calling upon ax.bar. Within this call, we pass in:

  • angles : provides the location of the bar in the polar projection and is also used to position the lithology labels
  • height : uses the percentage values for the current row to set the height of each bar
  • width : used to set the width of the bar
  • edgecolor : sets the edge colour of the radial bars
  • zorder : used to set the plotting order of the bars on the figure. In this case it is set to 2, so that it sits in the top layer of the figure
  • alpha : used to set the transparency of the bars
  • color : sets the colour of the bar based on the colours list defined earlier

We then repeat the process of adding bars in order to add a background fill to the radial bar plot. Instead of setting the height to a value from the table, we can set it to 100 so that it fills the entire area.
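The two layers described above can be sketched on a single polar axis. This is a minimal example with made-up percentages rather than values from the dataset, and the variable names are illustrative. The Agg backend is used only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

demo_values = [55, 20, 15, 10]  # made-up percentages for illustration
n_bars = len(demo_values)
demo_width = 2 * np.pi / n_bars
demo_angles = np.arange(n_bars) * demo_width

demo_fig, demo_ax = plt.subplots(subplot_kw={"projection": "polar"})

# Background layer: full-height bars drawn first (lower zorder)
background = demo_ax.bar(demo_angles, 100, width=demo_width,
                         color="#393d5c", edgecolor="#25253c", zorder=1)

# Foreground layer: the actual values drawn on top (higher zorder)
foreground = demo_ax.bar(demo_angles, demo_values, width=demo_width,
                         color="#5ba8f7", edgecolor="white",
                         alpha=0.8, zorder=2)

demo_ax.set_ylim(0, 100)
```

Fixing the radial limit at 100 keeps every chart on the same scale, so bar heights can be compared between wells.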

The next part of the setup involves the labels, subplot titles, and grid colours.

For the lithology labels, we need to create a for loop that will allow us to position the labels at the correct angle around the edge of the polar plot.

Within this loop, we need to check the angle of the current bar. If the angle is less than or equal to pi (the top half of the circle), 90 degrees is subtracted from the rotation angle. Otherwise, the bar is in the bottom half of the circle and 90 degrees is added to the rotation angle. This allows the labels on the left and right-hand sides of the plot to be easily read.

# Loop over each row in the DataFrame
for i, (index, row) in enumerate(df.iterrows()):
    ax = fig.add_subplot(grid[i // num_cols, i % num_cols], projection='polar')

    # Foreground bars showing the lithology percentages for this well
    bars = ax.bar(x=angles, height=row.values, width=width,
                  edgecolor='white', zorder=2, alpha=0.8, color=colours[i])

    # Background bars that fill the whole plot area
    bars_bg = ax.bar(x=angles, height=100, width=width, color='#393d5c',
                     edgecolor='#25253c', zorder=1)

    ax.set_title(index, pad=35, fontsize=22, fontweight='bold', color='white')
    ax.set_ylim(0, 100)
    ax.set_yticklabels([])
    ax.set_xticks([])
    ax.grid(color='#25253c')

    # Place each lithology label just outside the bars at the correct angle
    for angle, height, lith_name in zip(angles, row.values, lith_names):
        rotation_angle = np.degrees(angle)
        if angle <= np.pi:
            rotation_angle -= 90
        else:
            rotation_angle += 90
        ax.text(angle, 110, lith_name.upper(),
                ha='center', va='center',
                rotation=rotation_angle, rotation_mode='anchor', fontsize=12,
                fontweight='bold', color='white')

When we run the code at this point, we get back the following image containing all 8 wells.

Matplotlib figure with radial bar charts displaying lithology percentages for 8 wells from the Norwegian Continental Shelf. Image by the author.
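To see what the label rotation rule produces, the sketch below wraps it in a small helper and evaluates it for 8 evenly spaced bars. The function name label_rotation is hypothetical and is used here purely for illustration; the rule itself is the one used in the labelling loop.

```python
import numpy as np


def label_rotation(angle):
    """Return the text rotation in degrees for a label at `angle` radians."""
    rotation = np.degrees(angle)
    # Subtract 90 up to and including pi, add 90 for angles beyond pi
    return rotation - 90 if angle <= np.pi else rotation + 90


demo_bar_width = 2 * np.pi / 8
demo_rotations = [float(label_rotation(i * demo_bar_width)) for i in range(8)]
```

Rotations are periodic, so a value such as 405 degrees draws the text at the same orientation as 45 degrees.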

Adding a Scatter Plot as a Subplot

As you can see above, we have a gap within the figure in the bottom right. This is where we will place our scatter plot showing the locations of the wells.

To do this, we can add a new subplot outside of the for loop. As we want this to be the last plot on our figure, we need to subtract 1 from num_rows and num_cols.

We then add the scatter plot to the axis by calling upon ax.scatter() and passing in the X and Y locations from the zechstein_well_intersections dataframe.

The remainder of the code involves adding labels to the x and y axis, setting the tick formatting, and setting the edges (spines) of the scatterplot to white.

As we have 1 well that does not have location information, we can add a small footnote to the scatterplot informing the reader of this fact.

Finally, we need to add the well names as labels so that our readers can understand what each marker is. We can do this as part of a for loop and add the labels to a list.

# Add the scatter plot in the last subplot (subplot 9)
ax = fig.add_subplot(grid[num_rows - 1, num_cols - 1], facecolor='#393d5c')
ax.scatter(zechstein_well_intersections['X_LOC'],
           zechstein_well_intersections['Y_LOC'], c=colours, s=60)

ax.grid(alpha=0.5, color='#25253c')
ax.set_axisbelow(True)
ax.set_ylabel('NORTHING', fontsize=12,
              fontweight='bold', color='white')
ax.set_xlabel('EASTING', fontsize=12,
              fontweight='bold', color='white')

ax.tick_params(axis='both', colors='white')
ax.ticklabel_format(style='plain')
ax.set_title('WELL LOCATIONS', pad=35, fontsize=22, fontweight='bold', color='white')

ax.spines['bottom'].set_color('white')
ax.spines['top'].set_color('white')
ax.spines['right'].set_color('white')
ax.spines['left'].set_color('white')

ax.text(0.0, -0.2, 'Well 16/11-1 ST3 does not contain location information',
        ha='left', va='bottom', fontsize=10,
        color='white', transform=ax.transAxes)

# Collect the well name labels so their positions can be adjusted later
labels = []
for i, row in zechstein_well_intersections.iterrows():
    labels.append(ax.text(row['X_LOC'], row['Y_LOC'], row['WELL'],
                          color='white', fontsize=14))

When we run our plotting code, we will have the following figure. We can now see all eight wells represented as a radial bar chart and their locations represented by a scatter plot.

Matplotlib radial bar charts and a scatter plot all within a single figure. Image by the author.
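As an aside, the four spine calls can also be written as a loop, because ax.spines can be iterated like a dictionary. The sketch below is an optional alternative on a throwaway figure, not a change to the code above, and the variable names are illustrative. The Agg backend is used only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

demo_fig, demo_ax = plt.subplots(facecolor="#25253c")
demo_ax.set_facecolor("#393d5c")

# ax.spines is keyed by 'left', 'right', 'top' and 'bottom'
for spine in demo_ax.spines.values():
    spine.set_color("white")

# Read the colours back to confirm every edge was updated
spine_colours = {name: spine.get_edgecolor()
                 for name, spine in demo_ax.spines.items()}
```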

We do have one issue we need to resolve, and that is the positions of the labels. Currently, they are overlapping the data points, the spines and other labels.

We can resolve this by using the adjustText library we imported earlier. This library will work out the best label position to avoid any of these issues.

To use this, all we need to do is call upon adjust_text and pass in the labels list we created in the previous for loop. To reduce the amount of overlap, we can use the expand_points and expand_objects parameters. For this example, a value of 1.2 works well.

adjust_text(labels, expand_points=(1.2, 1.2), expand_objects=(1.2, 1.2))
Scatter plot showing well locations and associated labels after using the adjustText library. Image by the author.

Adding Footnotes and Figure Titles

To finish our infographic, we need to give the reader some extra information.

We will add a footnote to the figure to show where the data was sourced from and who created it.

To help the reader understand what the infographic is about, we can add a title using plt.suptitle and a subtitle using fig.text. This will instantly tell the reader what they can expect when looking at the charts.

footnote = """
Data Source:
Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for
machine learning competition [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4351156

Figure Created By: Andy McDonald
"""

plt.suptitle('LITHOLOGY VARIATION WITHIN THE ZECHSTEIN GP.', size=36, fontweight='bold', color='white')
plot_sub_title = """CHARTS OF LITHOLOGY PERCENTAGES ACROSS 8 WELLS FROM THE NORWEGIAN CONTINENTAL SHELF"""

fig.text(0.5, 0.95, plot_sub_title, ha='center', va='top', fontsize=18, color='white', fontweight='bold')
fig.text(0.1, 0.01, footnote, ha='left', va='bottom', fontsize=14, color='white')

plt.show()

After finishing the plotting code, we will end up with a matplotlib figure like the one below.

Matplotlib infographic showing lithology variation for the Zechstein Group on the Norwegian Continental Shelf. Image by the author.
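The positions passed to fig.text are fractions of the whole figure rather than data coordinates, which is why the subtitle and footnote stay in place regardless of what the subplots contain. The sketch below demonstrates this on an empty figure with placeholder text; the strings, sizes and variable names are illustrative only, and the Agg backend is used so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

demo_fig = plt.figure(figsize=(8, 6), facecolor="#25253c")

# suptitle is centred near the top of the figure by default
demo_title = demo_fig.suptitle("DEMO TITLE", size=20,
                               fontweight="bold", color="white")

# fig.text uses figure fractions: (0, 0) is bottom-left, (1, 1) is top-right
demo_subtitle = demo_fig.text(0.5, 0.93, "Demo subtitle",
                              ha="center", va="top", color="white")
demo_footnote = demo_fig.text(0.1, 0.01, "Source: placeholder",
                              ha="left", va="bottom", color="white")
```

If the subtitle collides with the main title on a differently sized figure, adjusting its y value slightly is usually enough.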

We have all the radial bar charts on display and where each of the wells is located. This allows the reader to understand any spatial variation between the wells, which in turn may help explain variances within the data.

For example, well 15/9-13 is located on the area’s western side and is composed of a mixture of dolomite, anhydrite and shale, whereas well 17/11-1 is located on the eastern side of the area and is predominantly composed of halite. This could be attributable to different depositional environments across the region.

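The full plotting code for the infographic is displayed below, with each of the main sections commented. It assumes that the libraries have been imported and that df, zechstein_well_intersections and lith_names have been created as shown earlier.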

# Set the number of columns for your subplot grid
num_cols = 3

# Get the number of wells (rows in the DataFrame)
num_wells = len(df)

# Calculate the number of rows needed for the subplot grid
num_rows = np.ceil(num_wells / num_cols).astype(int)

indexes = list(range(0, len(lith_names)))
width = 2*np.pi / len(lith_names)
angles = [element * width for element in indexes]

colours = ["#ae1241", "#5ba8f7", "#c6a000", "#0050ae", "#9b54f3", "#ff7d67", "#dbc227", "#008c5c"]

label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(lith_names))

# Create a figure
fig = plt.figure(figsize=(20, num_rows * 7), linewidth=10,
                 edgecolor='#393d5c',
                 facecolor='#25253c')

# Create a grid layout
grid = plt.GridSpec(num_rows, num_cols, wspace=0.5, hspace=0.5)

# Loop over each row in the DataFrame to create the radial bar charts per well
for i, (index, row) in enumerate(df.iterrows()):
    ax = fig.add_subplot(grid[i // num_cols, i % num_cols], projection='polar')
    bars = ax.bar(x=angles, height=row.values, width=width,
                  edgecolor='white', zorder=2, alpha=0.8, color=colours[i])

    bars_bg = ax.bar(x=angles, height=100, width=width, color='#393d5c',
                     edgecolor='#25253c', zorder=1)

    # Set up labels, ticks and grid
    ax.set_title(index, pad=35, fontsize=22, fontweight='bold', color='white')
    ax.set_ylim(0, 100)
    ax.set_yticklabels([])
    ax.set_xticks([])
    ax.grid(color='#25253c')

    # Set up the lithology / category labels to appear at the correct angle
    for angle, height, lith_name in zip(angles, row.values, lith_names):
        rotation_angle = np.degrees(angle)
        if angle <= np.pi:
            rotation_angle -= 90
        else:
            rotation_angle += 90
        ax.text(angle, 110, lith_name.upper(),
                ha='center', va='center',
                rotation=rotation_angle, rotation_mode='anchor', fontsize=12,
                fontweight='bold', color='white')

# Add the scatter plot in the last subplot (subplot 9)
ax = fig.add_subplot(grid[num_rows - 1, num_cols - 1], facecolor='#393d5c')
ax.scatter(zechstein_well_intersections['X_LOC'], zechstein_well_intersections['Y_LOC'], c=colours, s=60)
ax.grid(alpha=0.5, color='#25253c')
ax.set_axisbelow(True)

# Set up the labels and ticks for the scatter plot
ax.set_ylabel('NORTHING', fontsize=12,
              fontweight='bold', color='white')
ax.set_xlabel('EASTING', fontsize=12,
              fontweight='bold', color='white')

ax.tick_params(axis='both', colors='white')
ax.ticklabel_format(style='plain')
ax.set_title('WELL LOCATIONS', pad=35, fontsize=22, fontweight='bold', color='white')

# Set the outside borders of the scatter plot to white
ax.spines['bottom'].set_color('white')
ax.spines['top'].set_color('white')
ax.spines['right'].set_color('white')
ax.spines['left'].set_color('white')

# Add a footnote to the scatter plot explaining missing well
ax.text(0.0, -0.2, 'Well 16/11-1 ST3 does not contain location information',
        ha='left', va='bottom', fontsize=10,
        color='white', transform=ax.transAxes)

# Set up and display well name labels
labels = []
for i, row in zechstein_well_intersections.iterrows():
    labels.append(ax.text(row['X_LOC'], row['Y_LOC'], row['WELL'],
                          color='white', fontsize=14))

# Use adjust text to ensure text labels do not overlap with each other or the data points
adjust_text(labels, expand_points=(1.2, 1.2), expand_objects=(1.2, 1.2))

# Create a footnote explaining data source

footnote = """
Data Source:
Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for
machine learning competition [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4351156

Figure Created By: Andy McDonald
"""

# Display overall infographic title and footnote
plt.suptitle('LITHOLOGY VARIATION WITHIN THE ZECHSTEIN GP.', size=36, fontweight='bold', color='white')
plot_sub_title = """CHARTS OF LITHOLOGY PERCENTAGES ACROSS 8 WELLS FROM THE NORWEGIAN CONTINENTAL SHELF"""

fig.text(0.5, 0.95, plot_sub_title, ha='center', va='top', fontsize=18, color='white', fontweight='bold')
fig.text(0.1, 0.01, footnote, ha='left', va='bottom', fontsize=14, color='white')

plt.show()

Infographics are a great way to summarise data and present it to readers in a compelling and interesting way without them having to worry about the raw numbers. They are also a great way to tell stories about your data.

At first, you may not think matplotlib is geared up for creating infographics, but with some practice, time and effort, it is definitely possible.

Training dataset used as part of a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). This dataset is licensed under Creative Commons Attribution 4.0 International.

The full dataset can be accessed at the following link: https://doi.org/10.5281/zenodo.4351155.


