Principal Component Analysis (PCA) is an old technique commonly used for dimensionality reduction. Despite being a well-known topic among data scientists, the derivation of PCA is often overlooked, leaving behind valuable insights about the nature of data and the relationship between calculus, statistics, and linear algebra.

In this article, we will derive PCA through a thought experiment, beginning with two dimensions and extending to arbitrary dimensions. As we progress through each derivation, we will see the harmonious interplay of seemingly distinct branches of mathematics, culminating in an elegant coordinate transformation. This derivation will unravel the mechanics of PCA and reveal the captivating interconnectedness of mathematical concepts. Let’s embark on this enlightening exploration of PCA and its beauty.

As humans living in a three-dimensional world, we generally grasp two-dimensional concepts with ease, so this is where we will begin. Starting in two dimensions will simplify our first thought experiment and allow us to better understand the nature of the problem.

## Theory

We have a dataset that looks something like this (note that each feature should be scaled to have a mean of 0 and variance of 1):
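A dataset of this kind can be simulated and scaled in a few lines. This is only a minimal sketch: the sample size, the seed, and the linear relationship between `x1` and `x2` are assumptions chosen for illustration, not the data used in this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Hypothetical correlated 2-D data: x2 depends linearly on x1 plus noise.
n = 500
x1 = rng.normal(loc=3.0, scale=2.0, size=n)
x2 = 0.8 * x1 + rng.normal(loc=0.0, scale=1.0, size=n)
X_raw = np.column_stack([x1, x2])

# Standardize each feature (column): subtract its mean, divide by its
# standard deviation, so every column has mean 0 and variance 1.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

print(X.mean(axis=0))  # approximately [0, 0]
print(X.var(axis=0))   # approximately [1, 1]
```

Plotting the two columns of `X` against each other would show an elongated, tilted cloud of points centered at the origin.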

We immediately notice this data lies in a coordinate system described by **x1** and **x2**, and these variables are correlated. *Our goal is to find a new coordinate system informed by the covariance structure of the data.* In particular, the first basis vector in the coordinate system should explain the majority of the variance when projecting the original data onto it.
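To make "variance when projecting the data onto a vector" concrete, the sketch below projects standardized 2-D data onto unit vectors at different angles and measures the variance of each projection. The synthetic data, the helper name `projected_variance`, and the one-degree angle grid are all assumptions made for illustration. It is a brute-force scan meant to build intuition, not the derivation itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical correlated data, standardized column by column.
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of the standardized data (ddof=0 to match the scaling).
C = X.T @ X / n

def projected_variance(X, theta):
    """Variance of the data projected onto the unit vector at angle theta."""
    u = np.array([np.cos(theta), np.sin(theta)])  # unit-length direction
    return np.var(X @ u)                          # X @ u holds the scalar projections

# Scan directions from 0 to 180 degrees in 1-degree steps.
thetas = np.linspace(0.0, np.pi, 181)
variances = np.array([projected_variance(X, t) for t in thetas])
best_theta = thetas[np.argmax(variances)]

print("covariance matrix:\n", C)
print("best angle (degrees):", np.degrees(best_theta))
print("variance along best direction:", variances.max())
```

The projected variance clearly changes with direction. For standardized, positively correlated two-dimensional data like this, the scan should peak at 45 degrees, where both features contribute equally.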

Our first order of business is to find a vector such that when we project the original data onto the vector, the maximum amount of variance is preserved. In other words, the ideal vector points in the direction of maximal variance, as defined by the…