# Utilising pykrige and matplotlib for Spatial Visualisation of Geological Variations | by Andy McDonald | Jun, 2023

## Exploring Spatial Geological Variation from Well Log Measurements

When working with geological and petrophysical data, we often want to understand how that data changes over our field or region. One of the ways we can do this is to grid our actual measurement values and extrapolate what those values may be in other areas that have yet to be explored using boreholes.

One particular method for carrying out this extrapolation is kriging, a geostatistical procedure named after South African mining engineer Danie G. Krige. The idea behind kriging lies in its estimation technique: it uses spatial correlation between observed data to predict values at unmeasured locations.

By gauging how variables change over a distance, this method establishes a statistical relationship that can be used to predict values across an area, transforming scattered data points into a coherent spatial map.

Within this tutorial, we will look at a Python library called pykrige. This library has been designed for 2D and 3D kriging calculations and is easy to use with well log data.

First, we need to import the libraries we are going to need. For this article, we will require the following libraries:

• pandas — to read our data, which is in `csv` format
• matplotlib to create our visualisation
• pykrige to carry out the kriging
• numpy for some numerical calculations
`import pandas as pdimport matplotlib.pyplot as pltfrom pykrige import OrdinaryKrigingimport numpy as np`

Once we have imported the libraries, we can now import our data.

Within this tutorial, we will be using a dataset derived from the Xeek and Force 2020 Machine Learning competition for predicting lithology from well log measurements. Details of this dataset can be found at the bottom of this article.

This subset of the competition dataset contains 65 well locations with average acoustic compressional slowness measurements for the Balder Formation.

To read our data we can use the pandas `read_csv()` function, and pass in the location of the datafile. In this example, we use a path relative to our Jupyter Notebook, but we could use an absolute path if our file is located elsewhere.

`df = pd.read_csv('Data/Xeek Force 2020/Xeek_2020_Balder_DTC_AVG.csv')df`

When we view the dataframe, we will see that we have 65 wells, which contain the location of the top of the Balder Formation (X_LOC and Y_LOC for grid coordinates, and LAT & LON for latitude and longitude). We also have the True Vertical Depth Sub Sea (TVDSS) at which the formation was encountered, and the mean value for acoustic compressional slowness (DTC).

Now that our data has been successfully loaded into a dataframe, we can visualise our data to understand where our wells are located. To do this we will use matplotlib’s scatter plot and pass in the longitude and latitude columns.

`plt.scatter(df['Longitude'], df['Latitude'], c=df['DTC'])`

When we run the above code, we get the following plot.

We can see the above figure is very basic, with no colourbar or axis labels.

Let’s modify the plot slightly by adding these features to it.

`cm = plt.cm.get_cmap('viridis')plt.figure(figsize=(10,10))scatter = plt.scatter(df['LON'], df['LAT'], c=df['DTC_MEAN'], cmap=cm, s=50)plt.colorbar(scatter)plt.xlabel('Longitude')plt.ylabel('Latitude')plt.show()`

When we run the above code, we get back the following figure, which tells us more about our data. We can use the colourbar to estimate our point values.

To better understand our data points and how the DTC measurement varies across the area for the Balder Formation, we can use kriging and our data points to fill in the gaps between our measured values.

To do this, we need to create an `OrdinaryKriging` object from the pykrige library.

Into this object we pass our location data for x and y, and the data we want to map to the z parameter.

We also need to select what variogram model we want to use. In this case, we will use an exponential model. More details on the model types can be found in the documentation.

As we are using latitude and longitude for our x and y coordinates, we can change the coordinates_type parameter to `geographic`

`OK = OrdinaryKriging(x=df['LON'], y=df['LAT'], z=df['DTC_MEAN'],variogram_model='exponential',verbose=True, enable_plotting=True,coordinates_type='geographic')`

When we run the above, we return the following model summary and semi-variogram.

Here is a short rundown of the parameters that are returned:

• Nugget: The nugget is the y-intercept of the variogram, representing the variance at zero distance, often due to measurement errors or very small-scale variations.
• Full Sill: The sill is the maximum variance the variogram reaches and begins to level off, which happens when the points are very far apart.
• Range: The range is the distance at which the variogram reaches the sill, meaning the distance beyond which further separation of points does not increase the variance.
• Partial Sill: The partial sill is the difference between the sill and the nugget, representing the amount of variance that is spatially structured in the data.

This can give us an understanding of how suitable our model is for the data based on the shape of the generated line and points.

To begin displaying our data, we need to create a data grid.

To do this, we first create arrays for the latitudes and longitudes between the coordinates we define. In this case, we would like the chart to extend from 57.5 degrees N to 62 degrees N and from 1.5 degrees E to 4.5 degrees E.

Using `np.arange` will allow us to create these arrays at regular spacing.

`grid_lat = np.arange(57.5, 62, 0.01, dtype='float64')grid_long = np.arange(1.5, 4.5, 0.01,dtype='float64')`

Now that we have the X & Y coordinates, we can create our grid of values. For this, we call upon `OK.execute`, and pass in our latitude and longitude arrays.

`zstar, ss = OK.execute('grid', grid_long, grid_lat)`

This will return two arrays. Our data grid (zstar) and the uncertainty associated with it (ss)

Next, we can now use our data array and plot it using matplotlib’s `imshow`.

`import matplotlib.pyplot as pltfig, ax = plt.subplots(figsize=(10,10))image = ax.imshow(zstar, extent=(1.5, 4.5, 57.5, 62), origin='lower')ax.set_xlabel('Longitude', fontsize=14, fontweight='bold')ax.set_ylabel('Latitude', fontsize=14, fontweight='bold')scatter = ax.scatter(x=df['LON'], y=df['LAT'], color='black')colorbar = fig.colorbar(image)colorbar.set_label('DTC (us/ft)', fontsize=14, fontweight='bold')plt.show()`

When we run this, we get back the following map showing the variation in acoustic compressional slowness for the Balder Formation across our 65 wells.

We can see that around 59 to 60 degrees N we have much faster rocks, and in the North East and South West regions we have much slower rocks.

To interpret this, we would need to understand how deep the formation is at each of these wells. This will allow us to identify whether the difference relates to burial and compaction or other geological processes.

We will see how we can do this in a future article.

## Visualising Kriging Uncertainty

One of the key things when looking at data like this is to understand the uncertainty associated with the kriging.

We can do this by resuing the same plotting code, and instead of `zstar` being passed in, we can swap it for the `ss` variable we created earlier.

`fig, ax = plt.subplots(figsize=(10,10))image = ax.imshow(ss, extent=(1.5, 4.5, 57.5, 62), origin='lower')ax.set_xlabel('Longitude', fontsize=14, fontweight='bold')ax.set_ylabel('Latitude', fontsize=14, fontweight='bold')scatter = ax.scatter(x=df['LON'], y=df['LAT'], color='black')colorbar = fig.colorbar(image)colorbar.set_label('DTC (us/ft)', fontsize=14, fontweight='bold')plt.show()`

With the following plot, we are able to see the areas where we have a high or low uncertainty.

In areas where we have less coverage from the wells, we will have a much higher uncertainty, whereas in areas where we have multiple wells, our uncertainty will be much lower.

Within this tutorial, we have seen how we can take average values for a well log measurement (DTC) and map them across an entire region. This allows us to understand the trends in our data over a geographical area.

However, when looking at this data, we must bear in mind that we are looking at a 2D surface rather than a more complex 3D structure, which we encounter within the subsurface. Therefore, variations in measurement could be attributable to variations in depth.

The dataset used in this article is a subset of a training dataset used as part of a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). It is released under a NOLD 2.0 licence from the Norwegian Government, details of which can be found here: Norwegian Licence for Open Government Data (NLOD) 2.0. The full dataset can be accessed here.

The full reference for the dataset is:

Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for machine learning competition [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4351156