Building PCA from the Ground Up | by Harrison Hoffman | Aug, 2023


Supercharge your understanding of Principal Component Analysis with a step-by-step derivation

Hot air balloons. Image by Author.

Principal Component Analysis (PCA) is an old technique commonly used for dimensionality reduction. Despite being a well-known topic among data scientists, the derivation of PCA is often overlooked, leaving behind valuable insights about the nature of data and the relationship between calculus, statistics, and linear algebra.

In this article, we will derive PCA through a thought experiment, beginning with two dimensions and extending to arbitrary dimensions. As we progress through each derivation, we will see the harmonious interplay of seemingly distinct branches of mathematics, culminating in an elegant coordinate transformation. This derivation will unravel the mechanics of PCA and reveal the captivating interconnectedness of mathematical concepts. Let’s embark on this enlightening exploration of PCA and its beauty.

As humans living in a three-dimensional world, we generally grasp two-dimensional concepts, and this is where we will begin in this article. Starting in two dimensions will simplify our first thought experiment and allow us to better understand the nature of the problem.

Theory

We have a dataset that looks something like this (note that each feature should be scaled to have a mean of 0 and variance of 1):

(1) Correlated Data. Image by Author.
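To make the setup concrete, here is a minimal sketch of how a dataset like this could be generated and scaled. The synthetic data, the coefficients 0.8 and 0.6, the random seed, and the helper name `standardize` are all assumptions chosen for illustration; they are not the data behind the figure.

```python
import numpy as np

def standardize(X):
    """Scale each column (feature) to mean 0 and variance 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical correlated data: x2 depends linearly on x1, plus noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)

X = standardize(np.column_stack([x1, x2]))

print(X.mean(axis=0))                      # close to [0, 0]
print(X.var(axis=0))                       # close to [1, 1]
print(np.corrcoef(X, rowvar=False)[0, 1])  # positive: x1 and x2 are correlated
```

After scaling, the covariance matrix of the data equals its correlation matrix, which is why the correlation between x1 and x2 is the only quantity left to describe the shape of the point cloud.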

We immediately notice that this data lies in a coordinate system described by x1 and x2, and that these variables are correlated. Our goal is to find a new coordinate system informed by the covariance structure of the data. In particular, the first basis vector in the new coordinate system should capture as much of the variance as possible when the original data is projected onto it.
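As a rough numerical illustration of what "variance when projecting onto a vector" means, the sketch below projects the data onto unit vectors at different angles and measures the variance of the projections. This brute-force scan is not the derivation itself, only a way to see the quantity being optimized. The synthetic data and the helper name `projected_variance` are assumptions for illustration.

```python
import numpy as np

def projected_variance(X, u):
    """Variance of the data after projecting each row onto direction u."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)  # make sure u is a unit vector
    return np.var(X @ u)       # X @ u holds the scalar projection of each point

# Hypothetical correlated data, standardized to mean 0 and variance 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Scan directions from 0 to 180 degrees in 1-degree steps.
angles = np.linspace(0, np.pi, 181)
variances = [projected_variance(X, [np.cos(a), np.sin(a)]) for a in angles]

best_angle = angles[int(np.argmax(variances))]
best_variance = max(variances)

print(np.degrees(best_angle))  # direction with the largest projected variance
print(best_variance)           # variance captured along that direction
```

For two standardized, positively correlated features, the projected variance at angle a works out to 1 + r·sin(2a), where r is the correlation, so the scan peaks at 45 degrees with a variance of 1 + r. A grid search like this does not scale to higher dimensions, which is one motivation for deriving the direction analytically.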

Our first order of business is to find a vector such that when we project the original data onto the vector, the maximum amount of variance is preserved. In other words, the ideal vector points in the direction of maximal variance, as defined by the…


