Principal Component Analysis (PCA) is an important technique to understand in the fields of statistics and data science. It is the process of computing the principal components and utilising them to perform a change of basis on the data. It is very hard to visualise and understand data in high dimensions; this is where PCA comes to the rescue.
Introduction
Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction of large data sets. It requires some knowledge of linear algebra, such as vector projections, eigenvalues and eigenvectors, Lagrange multipliers, matrix derivatives, and the covariance matrix.
Derivation
Let’s go ahead and get into it. Let’s say we have $N$ different vectors $x_1, x_2, \ldots, x_N$, each living in a $D$-dimensional space, so $x_n \in \mathbb{R}^D$.
It might look like a scary and daunting task, so let’s take a whole step back and look at a simpler picture. Our objective is to maximise the variance of the projections onto some lower-dimensional space. In other words, we have this $D$-dimensional space that contains all of our data, and we want to reduce its dimensionality in a clever way: we want to project onto a space that preserves as much of the information (original variation) as possible while reducing the dimensions.
Projection
The projection of a data point $x_n$ onto a direction $u$ is the scalar $u^T x_n$,
where $u$ is a unit vector, so its length is 1 and $u^T u = 1$.
Finally, we know that the mean of the projections among all the data is $\frac{1}{N}\sum_{n=1}^{N} u^T x_n = u^T \bar{x}$, where $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$ is the sample mean of the original data.
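To make this concrete, here is a minimal NumPy sketch; the toy data and the direction $u$ are made up for illustration. It projects each data point onto a unit vector and checks that the mean of the projections equals the projection of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # N = 100 data points in D = 3 dimensions

u = np.array([1.0, 2.0, 2.0])
u = u / np.linalg.norm(u)       # make u a unit vector, so u.T @ u == 1

projections = X @ u             # the scalar u^T x_n for every data point

# Mean of the projections equals the projection of the mean, u^T x-bar.
print(np.allclose(projections.mean(), u @ X.mean(axis=0)))  # True
```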
Variance
Going back to our objective, our goal is to maximise the variance of the projected data. By the definition of variance, the variance of the projections is

$$\frac{1}{N}\sum_{n=1}^{N} \left( u^T x_n - u^T \bar{x} \right)^2$$

Next, we factor $u^T$ out of each term,

$$= \frac{1}{N}\sum_{n=1}^{N} \left( u^T (x_n - \bar{x}) \right)^2$$

And we expand it,

$$= \frac{1}{N}\sum_{n=1}^{N} u^T (x_n - \bar{x})(x_n - \bar{x})^T u$$

Next, we pull $u$ outside the sum,

$$= u^T S u$$

where

$$S = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$$

is the covariance matrix of the data.
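We can verify this numerically with another hedged sketch on made-up data: the variance of the projections agrees with the quadratic form $u^T S u$, where $S$ is computed with the $1/N$ convention used in the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # N = 100 points in D = 3 dimensions
u = np.array([1.0, 2.0, 2.0])
u = u / np.linalg.norm(u)                # unit-length direction

# Covariance matrix with the 1/N convention used in the derivation.
X_centred = X - X.mean(axis=0)
S = (X_centred.T @ X_centred) / len(X)   # shape (D, D)

# Variance of the projected data equals the quadratic form u^T S u.
projections = X @ u
print(np.allclose(projections.var(), u @ S @ u))  # True
```

(NumPy’s `var` divides by $N$ by default, matching the covariance convention above.)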
Lagrange Multiplier
In this part, we want to maximise the variance of the projections, which as we found is $u^T S u$. We need the constraint $u^T u = 1$, since otherwise we could make the variance arbitrarily large just by scaling $u$. Introducing a Lagrange multiplier $\lambda$ for this constraint, we maximise

$$L(u, \lambda) = u^T S u + \lambda \left( 1 - u^T u \right)$$

We are just going to take the derivative of this line with respect to the vector $u$ and set it to zero,

$$\frac{\partial L}{\partial u} = 2 S u - 2 \lambda u = 0$$

We get

$$S u = \lambda u$$
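As a quick numerical check (a sketch on made-up data, not part of the derivation itself), every eigenpair returned by NumPy’s symmetric eigensolver satisfies $S u = \lambda u$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_centred = X - X.mean(axis=0)
S = (X_centred.T @ X_centred) / len(X)   # symmetric covariance matrix

# eigh is the eigensolver for symmetric matrices; eigenvalues come back
# in ascending order, with the matching eigenvectors stored as columns.
eigenvalues, eigenvectors = np.linalg.eigh(S)

for lam, u in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(S @ u, lam * u))   # True for every eigenpair
```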
This means that whatever direction $u$ we choose to project onto has to be an eigenvector of the covariance matrix $S$, because this is exactly the definition of an eigenvector. But there are lots of eigenvectors and eigenvalues; which eigenvector and which eigenvalue should we use?
Eigenvectors and Eigenvalues
To figure this out, we know that left-multiplying both sides of $S u = \lambda u$ by $u^T$ and using $u^T u = 1$ gives

$$u^T S u = \lambda$$

so the variance of the projections is exactly the eigenvalue $\lambda$. If we want the maximum value of the variance $u^T S u$, we should choose the largest eigenvalue $\lambda_1$ and project onto its corresponding eigenvector $u_1$, which is known as the first principal component.
To be more general, if we want to project the data onto more than just one dimension, we figure out the second biggest eigenvalue and use the eigenvector corresponding to it, and so on. You simply go down the list of eigenvalues for however many components you want to end up with. (Each new direction is automatically orthogonal to the previous ones, because the eigenvectors of the symmetric matrix $S$ are orthogonal.)
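Putting the whole derivation together, here is a minimal PCA sketch; the function name `pca` and the toy data are my own for illustration, not from any particular library. It eigendecomposes the covariance matrix, sorts the eigenpairs by descending eigenvalue, and projects the centred data onto the top $k$ eigenvectors:

```python
import numpy as np

def pca(X, k):
    """Project X (shape N x D) onto its top-k principal components."""
    x_bar = X.mean(axis=0)
    X_centred = X - x_bar
    S = (X_centred.T @ X_centred) / len(X)   # D x D covariance matrix

    # Eigendecompose S and sort the eigenpairs by descending eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:k]]  # top-k eigenvectors, D x k

    return X_centred @ components            # projected data, N x k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = pca(X, k=2)
print(Z.shape)  # (200, 2)
```

In practice you would typically reach for an existing implementation (for example `sklearn.decomposition.PCA`), but the sketch above mirrors the derivation step by step.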
Conclusion
In this article, we walked through the moving parts behind Principal Component Analysis (PCA); I believe this gives you some insight into what’s actually happening under the hood. I hope you were able to follow along. Stay tuned! Bye!