Feature Transformations: A Tutorial on PCA and LDA | by Pádraig Cunningham | Jul, 2023

Reducing the dimension of a dataset using methods such as PCA


When dealing with high-dimensional data, it is common to use methods such as Principal Component Analysis (PCA) to reduce the dimension of the data. This converts the data to a different, lower-dimensional set of features. This contrasts with feature subset selection, which selects a subset of the original features (see [1] for a tutorial on feature selection).

PCA is a linear transformation of the data to a lower-dimensional space. In this article we start off by explaining what a linear transformation is. Then we show with Python examples how PCA works. The article concludes with a description of Linear Discriminant Analysis (LDA), a supervised linear transformation method. Python code for the methods presented in that paper is available on GitHub.

Linear Transformations

Imagine that after a holiday Bill owes Mary £5 and $15, and that the debt needs to be paid in euro (€). The rates of exchange are £1 = €1.15 and $1 = €0.93. So the debt in € is:

(5 × 1.15) + (15 × 0.93) = 5.75 + 13.95 = €19.70

Here we are converting a debt in two dimensions (£, $) to one dimension (€). Three examples of this are illustrated in Figure 1: the original (£5, $15) debt and two other debts of (£15, $20) and (£20, $35). The green dots are the original debts and the red dots are the debts projected into a single dimension. The red line is this new dimension.

A depiction of example currency conversions (£,$ -> €).
Figure 1. An illustration of how converting £,$ debts to € is a linear transformation. Image by author.

On the left in the figure we can see how this can be represented as matrix multiplication. The original dataset is a 3 by 2 matrix (3 samples, 2 features), the rates of exchange form a 1D matrix of two components and the output is a 1D matrix of 3 components. The exchange rate matrix is the transformation; if the exchange rates are changed then the transformation changes.

We can perform this matrix multiplication in Python using the code below. The matrices are represented as numpy arrays; the final line calls the dot method on the cur matrix to perform matrix multiplication (dot product). This…
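
A minimal sketch of such a multiplication, assuming the three debts from Figure 1 are stored in a numpy array called cur. The names rates and eur are illustrative and are not taken from the original code.

```python
import numpy as np

# Debts as a 3 x 2 matrix: one row per debt, columns are (GBP, USD).
cur = np.array([[5, 15],
                [15, 20],
                [20, 35]])

# Exchange rates to EUR (GBP -> EUR, USD -> EUR); 'rates' is an assumed name.
rates = np.array([1.15, 0.93])

# Matrix multiplication maps each two-dimensional debt to a single EUR value.
eur = cur.dot(rates)
print(eur)
```

With the rates given above, the three debts come to approximately €19.70, €35.85 and €55.55. Changing the values in rates changes the transformation, while the data in cur stays the same.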
