3D Deep Learning Python Tutorial for Segmentation

We notice that out of our 32,080,350 points, 23,131,067 belong to the ground (72%), 7,440,825 to the vegetation (23%), 1,146,575 to buildings (4%), 191,039 to water (less than 1%), and the remaining 170,844 are not labeled (class 0). This will be very interesting because we are in this specific imbalance case with predominant classes.

Now that we have analyzed what our point cloud contains and refined the labels, we can dive into feature selection.

When using the PointNet architecture for 3D point cloud semantic segmentation, feature selection is essential in preparing the data for training.

In traditional machine learning methods, feature engineering is often required to select and extract relevant features from the data. However, this step can be avoided with deep learning methods like PointNet since the model can learn to extract features from the data automatically.

However, ensuring that the input data contains the necessary information for the model to learn relevant and deducted features is still essential. We use seven features: X, Y, Z (spatial attributes), R, G, B (radiometric attributes), and the intensity I (LiDAR-derived).

X                Y              Z          R  G   B  INTENSITY
205980.49800000 465875.02398682 7.10500002 90 110 98 1175.000000
205980.20100001 465875.09802246 7.13500023 87 107 95 1115.000000
205982.29800010 465875.00000000 7.10799980 90 110 98 1112.000000

This is our reference. It means that we will build our model with this input, and any other dataset we would like to process with the trained PointNet model must contain these same features. Before moving within Python, the last step is to structure the data according to the Architecture specifications.

For several reasons, structuring a 3D point cloud into square tiles is essential when processing it with the neural network architecture PointNet.

The tile definition within this workflow is to train PointNet 3D Deep Learning Architecture. © F. Poux

First, PointNet requires the input data to be of fixed size, meaning that all input samples should have the same number of points. By dividing a 3D point cloud into square tiles, we can ensure that each tile has a more homogeneous number of points, allowing PointNet to process them consistently and effectively without the extra overhead or unreversible loss when sampling to the final fixed-point number.

Example of the impact of sampling strategies on the 3D Point Cloud dataset. © F. Poux

🌱 Growing: With PointNet, we need to have the input tile to a fixed number of points, recommended at 4096 points by the original paper’s authors. This means that a sampling strategy will be needed (which is not done in CloudCompare). As you can see from the illustration above, sampling the point cloud with different strategies will yield different results and object identification capabilities (E.,g. the electrical pole on the right). Do you think this impacts the 3D Deep Learning architecture performances?

Secondly, PointNet’s architecture involves a shared multi-layer perceptron (MLP) applied to each point independently, which means that the Network processes each point in isolation from its neighbors. By structuring the point cloud into tiles, we can preserve the local context of each point within its tile while still allowing the Network to process points independently, enabling it to extract some meaningful features from the data.

The resulting 3D point cloud tile. © F. Poux

Finally, structuring the 3D point cloud into tiles can also improve the computational efficiency of the neural Network, as it allows for parallel processing of the tiles, reducing the overall processing time required to analyze the entire point cloud (on GPU).

We use the “Cross Section” tool (1) to achieve this feat. We set up the size to 100 meters (2), we then shift along X and Y (minus) to get as close as possible to the lowest corner of the initial tile (3), we use the multiple slice button (4), we repeat along and Y axis (5) and get the resulting square tiles (6), as shown below.

The process to automate the tile creation within CloudCompare. © F. Poux
The live process to automate the tile creation within CloudCompare. © F. Poux

This allows defining tiles of approximately one hundred meters by one hundred meters along X and Y axes. We obtain 143 tiles, from which we discard the last 13 tiles, as they could be more representative of what we want our input to be (i.e., they are not square because they are on the edge). With the remaining 130 tiles, we choose a good (around 20%) of representative tiles (holding Shift + Selection), as shown below.

Selection and manual split into training and testing set for PointNet. © F. Poux

🌱 Growing: We split our data between training and testing following an 80/20 percent scheme. At this stage, what do you think about this approach? What would be, in your opinion, a good strategy?

At the end of this process, we have around 100 tiles in the train set and 30 tiles in the test set, each holding the original number of points. We then select one folder and export each tile as an ASCII file, as shown below.

Exporting the point cloud tiles to use with PointNet

🦚 Note: CloudCompare allows to export all the point clouds independently within a directory when choosing to export as an ASCII file. He will automatically indent after the last character, using a “_” character to ensure consistency. This is very handy and can be used/abused.

Structuring a 3D point cloud into square tiles is an essential preprocessing step when using PointNet. It allows for consistent input data size, preserves local context, and improves computational efficiency, all of which contribute to more accuracy and efficient processing of the data. This is the final step before moving on to 3D Python 🎉.

It is time to ingest the point cloud tiles in Python.

For this, we import the libraries that we need. If you use the Google Colab version accessible here: 💻 Google Colab Code, then it is important to run the first line as shown below:

from google.colab import drive

For any setup, we have to import the various libraries as illustrated below:

#Base libraries
import numpy as np
import random
import torch
#Plotting libraries
%matplotlib inline
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
import plotly
import plotly.graph_objects as go
#Utilities libraries
from glob import glob
import os

Great! From there, we split the datafile names in our respective folders in pointcloud_train_files, and pointcloud_test_files

#specify data paths and extract filenames
project_dir="gdrive/My Drive/_UTWENTE/DATA/AHN4_33EZ2_12"
pointcloud_train_files = glob(os.path.join(project_dir, "train/*.txt"))
pointcloud_test_files = glob(os.path.join(project_dir, "test/*.txt"))

🦚 Note: we have two folders in our explorer: the train folder, and the test folder, both in the AHN4_33EZ2_12 folder. What we do here is first to give the path to the root folder, and then, we will collect with glob all the files in train and test with the * that means “select all.” A convenient way to deal with multiple files!

At this step, two variables hold the paths to all the tiles we prepared. To ensure that this is correct, we can print one element taken randomly from a 0 to 20 random distribution:

>> gdrive/My Drive/_UTWENTE/DATA/AHN4_33EZ2_12/train/AHN4_33EZ2_12_train_000083.txt

Great, so what we can do, is thus to split further our dataset into three variables:

  • valid_list: This holds the validation data paths. The validation split helps to improve the model performance by fine-tuning the model after each epoch.
  • train_list: This holds the training data paths, which is the data set on which the training takes place.
  • test_list: This holds the test data paths. The test set informs us about the final accuracy of the model after completing the training phase

This is done using the friendly numpy functions that work on the array indexes from the list. Indeed, we randomly extract 20% from the pointcloud_train_files, then split what is retained for validation vs. what is not retained and constitutes the train_list variable.

#Prepare the data in a train set, a validation set (to tune the model parameters), and a test set (to evaluate the performances)
#The validation is made of a random 20% of the train set.
valid_index = np.random.choice(len(pointcloud_train_files),int(len(pointcloud_train_files)/5), replace=False)
valid_list = [pointcloud_train_files[i] for i in valid_index]
train_list = [pointcloud_train_files[i] for i in np.setdiff1d(list(range(len(pointcloud_train_files))),valid_index)]
test_list = pointcloud_test_files
print("%d tiles in train set, %d tiles in test set, %d files in valid list" % (len(train_list), len(test_list), len(valid_list)))

We then randomly study the properties of one data file by looking at the median, the standard deviation, and the min-max values with the following snippet:


Which gives:

gdrive/My Drive/_UTWENTE/DATA/AHN4_33EZ2_12/train/AHN4_33EZ2_12_train_000083.txt 
median [2.068e+05 4.659e+05 6.628e+00 1.060e+02 1.210e+02 1.030e+02 1.298e+03 1.000e+00]
std [ 28.892 30.155 0.679 29.986 21.4 17.041 189.388 0.266]
min [2.068e+05 4.659e+05 5.454e+00 3.600e+01 6.200e+01 5.700e+01 7.700e+01 0.000e+00]
max [2.068e+05 4.660e+05 1.505e+01 2.510e+02 2.470e+02 2.330e+02 1.625e+03 4.000e+00]

As we can notice, there is one central element that we have to address: data normalization. Indeed, so to avoid any mismatch, we need to be in this “canonical space,” which means we can replicate the same experimental context in the feature space. Using a T-Net would be like killing a fly 🪰 with a bazooka. This is fine, but if we can avoid and use an actual coherent approach, it would be smarter 😁.

Normalizing a 3D point cloud tile before feeding it to the PointNet architecture is crucial for three main reasons. First, normalization ensures that the input data is centered around the origin, which is essential for PointNet’s architecture which applies an MLP to each point independently. The MLP is more effective when the input data is centered around the origin, which allows for more meaningful feature extraction and better overall performance.

Illustration of the normalization impact on the results of training 3D Deep Learning Models. © F. Poux

🌱 Growing: Some sort of intuition is also good before normalizing bluntly. For example, we predominantly use gravity-based scenes, meaning that the Z-axis is almost always colinear to the Z-axis. Therefore, how would you approach this normalization?

Secondly, normalization scales the point cloud data to a consistent range, which helps prevent saturation of the activation functions within the MLP. This allows the Network to learn from the entire range of input values, improving its ability to accurately classify or segment the data.

The problem with [0,1] scaling illustrated on the intensity field of 3D point clouds. © F. Poux

Finally, normalization can help reduce the impact of different scales in the point cloud data, which can be caused by differences in sensor resolutions or distances from the scanned objects (which is a bit flattened in the case of Aerial LiDAR data). This improves the consistency of the data and the Network’s ability to extract meaningful features from it.

Okay, let us get on it. For our experiments, we will first capture the minimum value of the features in min_f, and the average in mean_f:


🦚 Note: We transposed our dataset to handle the data and the indexes much more efficiently and conveniently. Therefore, to take the X-axis elements of the point cloud, we can just pass cloud_data[0] instead of cloud_data[:,0], which involves a bit of overhead.

We will now normalize the different features to use in our PointNet networks. First, the spatial coordinates X, Y, and Z. We will center our data on the planimetric axis (X and Y) and ensure that we subtract the minimal value of Z to account for discrimination between roofs and ground, for example:

n_coords = cloud_data[0:3]
n_coords[0] -= mean_f[0]
n_coords[1] -= mean_f[1]
n_coords[2] -= min_f[2]

Great, now we can scale our colors by ensuring we are in a [0,1] range. This is done by dividing the max value (255) for all our colors:

colors = cloud_data[3:6]/255

Finally, we will attack the normalization of the intensity feature. Here, we will work with quantiles to obtain a normalization robust to outliers, as we saw when exploring our data. This is done in a three-stage process. First, we compute the interquartile difference IQR, which is the difference between the 75th and 25th quantile. Then we subtract the median from all the observations and divide by the interquartile difference. Finally, we subtract the minimum value of the intensity to have a significant normalization:

# The interquartile difference is the difference between the 75th and 25th quantile
IQR = np.quantile(cloud_data[-2],0.75)-np.quantile(cloud_data[-2],0.25)
# We subtract the median to all the observations and then divide by the interquartile difference
n_intensity = ((cloud_data[-2] - np.median(cloud_data[-2])) / IQR)
#This permits to have a scaling robust to outliers (which is often the case)
n_intensity -= np.min(n_intensity)

Wonderful! At this stage, we have a point cloud normalized and ready to be fed to a PointNet architecture. But automating this process is the next logical step to execute this on all the tiles.

Creating a Point Cloud Tile Load and Normalize function

We create a function cloud_loader that takes as input the path to a tile tile_path, and a string of features used, features_used, and outputs a cloud_data variable, which holds the normalized features, along with its ground-truth variable gt that holds the labels of each point. The function will act as follows:

The definition of a cloud loading function to process point cloud datasets and make them ready for training. © F. Poux

This translates into a simple cloud_loader function, as shown below:

# We create a function that loads and normalize a point cloud tile
def cloud_loader(tile_path, features_used):
cloud_data = np.loadtxt(tile_path).transpose()
if 'xyz' in features_used:
n_coords = cloud_data[0:3]
n_coords[0] -= mean_f[0]
n_coords[1] -= mean_f[1]
n_coords[2] -= min_f[2]
if 'rgb' in features_used:
colors = cloud_data[3:6]/255
if 'i' in features_used:
IQR = np.quantile(cloud_data[-2],0.75)-np.quantile(cloud_data[-2],0.25)
n_intensity = ((cloud_data[-2] - np.median(cloud_data[-2])) / IQR)
n_intensity -= np.min(n_intensity)

gt = cloud_data[-1]
gt = torch.from_numpy(gt).long()

cloud_data = torch.from_numpy(np.vstack(features))
return cloud_data, gt

This function is now used to obtain both point cloud features and labels as follows:

pc, labels = cloud_loader(tile_selected, ‘xyzrgbi’)

🌱 Growing: As you can see, we pass a string for the features. This is very convenient for our different ‘if’ tests indeed. However, note that we do not return errors if what is fed to the function is not expected. This is not a standard code practice, but this extends the scope of this tutorial. I recommend checking out PEP-8 guidelines if you want to start with beautiful code writing.

If we want to parallel a previous article, accessible here:

We can visualize our dataset with Open3D. First, we need to install a specific version (if working on a Jupyter Notebook environment, such as Google Colab or the CRIB platform) and load it in our script:

!pip install open3d==0.16
import open3d as o3d

🦚 Note: The “!” before pip is used when you work on Google Colab to say it should use the environment console directly. If you work locally, you should delete this character and use pip install open3d==0.16 directly.

Then we run the following successive steps :

The drawing function to plot interactive 3D scenes directly within Google Colab. © F. Poux

Which translates into the following code lines:

pc, gt = cloud_loader(tile_selected, ['xyz','rgb','i'])
o3d.visualization.draw_plotly([pcd],point_sample_factor=0.5, width=600, height=400)

🦚Note: As our pc variable capturing the cloud_data output of our cloud_loader function is transposed, we must not forget to transpose it back when plotting with open3d.

The previous code snippet will output the following visualization:

The results of plotting scenes with plotly and R,G,B fields. © F. Poux

🦚 Note: when using the draw_plotly function, we do not have a direct hand in the scaling of the plot, and we can notice that we have non-equal scales for X, Y, and Z, which emphasizes the Z largely in that case. 😁

Due to the limitations that you can notice, we create a custom visualization function to visualize a random tile so that running the function: visualize_input_tile outputs an interactive plotly visualization that lets us switch the rendering mode.

To test the provided function, we first need to define the class names in our experiments: class_names = [‘unclassified’, ‘ground’, ‘vegetation’, ‘buildings’, ‘water’]. Then, we provide the cloud features cloud_features=’xyzi’, randomly select a point cloud captured in the variable selection, and visualize the tile. This translates into the following code snippet:

class_names = ['unclassified', 'ground', 'vegetation', 'buildings', 'water']
visualize_input_tile(selection, class_names, cloud_features, sample_size=20000)

which outputs the interactive scene below.

interactive 3D scene within Google Colab. © F. Poux

🦚 Note: You can use the button to switch rendering modes between the feature intensity and the labels from the loaded features of interest.

We have a working solution for loading, normalizing, and visualizing a single tile in Python. The last step is to create what we call a tensor for use with the PointNet Architecture.

I want to show you how we can use PyTorch as an initiation. For clarity concerns, let me quickly define the primary Python object type we manipulate with this library: a tensor.

A PyTorch tensor is a multi-dimensional array used for storing and manipulating data in PyTorch. It is similar to a NumPy array but with the added benefit of being optimized for use with deep learning models. Tensors can be created using the torch.tensor() function and initialized with data or created as an empty tensor with a specified shape. For example, to create a 3×3 tensor with random data:

import torch

x = torch.tensor([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]])


which will output:

tensor([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]])

Pretty straightforward, hun? Now, to make things easier for us, there is also a tiny Pytorch library that we can use to prepare lists of datasets. This library is called TorchNet. TorchNet is designed to simplify the process of building and training complex neural network architectures by providing a set of predefined modules and helper functions for everyday tasks such as data loading, validation, and testing.

One of the main advantages of TorchNet is its modular design, which allows users to easily construct complex neural network architectures by combining a series of pre-built modules. This can save significant time and effort compared to building neural networks from scratch, especially when new to deep learning.

🦚 Note: Besides its modular design, TorchNet provides several helper functions for common deep learning tasks, such as data augmentation, early stopping, and model checkpointing. This can help users to achieve better results and optimize their neural network architectures more efficiently.

To install torchnet version 0.0.4 and import it into our script, we can do the following:

!pip install torchnet==0.0.4
import torchnet as tnt

We also import another utility module called functools. This module is for higher-order functions that act on or return other functions. For this, add to the import statements import functools.

In general, any callable object can be treated as a function for the purposes of this module. From this additional setup, it is straightforward to generate the train, validation, and test sets with the following four lines of code:

test_set = tnt.dataset.ListDataset(test_list,functools.partial(cloud_loader, features_used=cloud_features))
train_set = tnt.dataset.ListDataset(train_list,functools.partial(cloud_loader, features_used=cloud_features))
valid_set = tnt.dataset.ListDataset(valid_list,functools.partial(cloud_loader, features_used=cloud_features))

Now, if we would like to explore, you can use indexes like a classical numpy array to retrieve a tensor on a specific position, such as train_set[1], which outputs:

Finally, we have to save our results to a Python object to use straight out of the box for the following steps, such as PointNet training. We are using the library pickle, which is handy for saving Python objects. To save an object, just run the following:

import pickle
f = open(project_dir+"/data_prepared.pckl", 'wb')
pickle.dump([test_list, test_set, train_list, train_set, valid_list, valid_set], f)

If you want to test your setup, you can also run the following lines of code and ensure that you retrieve back what you intend:

f = open(project_dir+"/data_prepared.pckl", 'rb')
test_list_t, test_set_t, train_list_t, train_set_t, valid_list_t, valid_set_t = pickle.load(f)

Source link

Leave a Comment