# PCA (Principal Components Analysis)
# From the Ground Up
Let us start with a dataset $X \in \mathbb{R}^{n \times d}$. This means that we have $n$ points that live in a $d$-dimensional space. In our case the dimensions correspond to the nodes, so we can think of starting by describing our points in the *node basis*.
The node basis consists of a [Linearly Independent](Linear%20Independence.md) set of basis vectors. They are orthogonal to one another, but may not be a great fit for the data. What do I mean by that?
Consider the example below. We have an $X$ matrix with $d = 2$. We start by describing our data with respect to the standard basis, $e_1$ and $e_2$, which are indeed orthonormal. However, the data's actual *geometry* isn't a great fit for the standard basis. We might naturally ask: is there *another* orthonormal basis that we could find that would be tied to the actual, underlying geometry of the data?
That is just what PCA finds: a new orthonormal basis that is *aligned* with the main axes of variation of our data.

Depending on how extreme this is, PCA may find that your data actually has *low-rank structure*. This means that while your data was described with respect to the standard basis, its variation actually occurs in a *lower-dimensional subspace*. A classic example might be a 2D plane embedded in 3D space, or a line embedded in 2D space, shown below:

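For a rough numerical sketch of this (the data below is made up purely for illustration), we can generate 2D points that sit close to a line and check that one eigenvalue of the covariance matrix dominates:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 points that lie close to the line y = 2x: described with two
# coordinates, but with (almost) one-dimensional structure.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)               # center the data
Sigma = (Xc.T @ Xc) / (len(X) - 1)    # 2 x 2 covariance matrix
print(np.linalg.eigvalsh(Sigma))      # one tiny eigenvalue, one large one
```

The small eigenvalue only reflects the noise off the line; nearly all of the variation lives along a single direction, i.e. the data is approximately rank one.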
So we can state the objective of PCA quite nicely:
> **PCA** finds an orthonormal basis that is tied to the geometry of your data, where each basis vector also has a number attached to it describing how much variance it captures.
Now how does it work? Mathematically, we simply take $X$, center it (call the centered matrix $X_c$), and compute its covariance matrix:
$\Sigma = \frac{1}{n - 1} X_c^\top X_c \quad \text{(shape: } d \times d \text{)}$
Then perform an eigen decomposition:
$\Sigma = V \Lambda V^\top$
Where $V$ is a matrix whose columns are eigenvectors, $\Lambda$ is a diagonal matrix of eigenvalues. We can interpret this decomposition as a [Change of Basis](Change%20of%20Basis%20Linear%20Algebra.md).
1. $V^\top$ *rotates* from the standard basis to the PCA basis
2. $\Lambda$ *scales* along each axis of the PCA basis (it is diagonal in that basis)
3. $V$ rotates back to the standard basis
In this way our eigen decomposition has taken a symmetric, positive semi-definite matrix, $\Sigma$, and decomposed it into a rotation, followed by a scaling, followed by a rotation.
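Here is a minimal numpy sketch of those steps (the function and variable names are my own, just for illustration):

```python
import numpy as np

def pca_basis(X):
    """Eigenvalues (descending) and eigenvectors of the covariance of X."""
    Xc = X - X.mean(axis=0)                    # center the data
    Sigma = (Xc.T @ Xc) / (X.shape[0] - 1)     # d x d covariance matrix
    eigvals, V = np.linalg.eigh(Sigma)         # eigh: Sigma is symmetric
    order = np.argsort(eigvals)[::-1]          # sort by variance captured
    return eigvals[order], V[:, order]

# Sanity check on random data: Sigma = V diag(Lambda) V^T,
# i.e. a rotation, a scaling, and a rotation back.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
lam, V = pca_basis(X)
Xc = X - X.mean(axis=0)
Sigma = (Xc.T @ Xc) / (X.shape[0] - 1)
assert np.allclose(Sigma, V @ np.diag(lam) @ V.T)
assert np.allclose(V.T @ V, np.eye(3))         # the columns of V are orthonormal
```

Using `np.linalg.eigh` (rather than `np.linalg.eig`) is the natural choice here because $\Sigma$ is symmetric: it returns real eigenvalues and an orthonormal set of eigenvectors.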
Recall that a matrix's columns can be thought of as describing *where the basis vectors land* after the transformation. Given this, how should we interpret $V$ and $V^\top$?
$
V =
\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1d} \\
v_{21} & v_{22} & \cdots & v_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
v_{d1} & v_{d2} & \cdots & v_{dd}
\end{bmatrix}
\quad\text{and}\quad
V^\top =
\begin{bmatrix}
v_{11} & v_{21} & \cdots & v_{d1} \\
v_{12} & v_{22} & \cdots & v_{d2} \\
\vdots & \vdots & \ddots & \vdots \\
v_{1d} & v_{2d} & \cdots & v_{dd}
\end{bmatrix}
$
Let's say $V$ was:
$V = \begin{bmatrix} 0.6 & -0.7 & 0.4 \\ 0.6 & 0.7 & -0.3 \\ 0.5 & 0.0 & 0.86 \end{bmatrix}$
Each column represents an *eigenvector*, $v_i$. We need a way to *describe* that eigenvector, and $V$ does that by encoding it as a linear combination of the standard basis vectors. That *is* what the first column represents, for instance:
$v_1 = 0.6 e_1 + 0.6 e_2 + 0.5 e_3$
The same goes for the other columns:
$v_2 = -0.7 e_1 + 0.7 e_2 + 0.0 e_3$
$v_3 = 0.4 e_1 - 0.3 e_2 + 0.86 e_3$
In this way we can see that $V$ can take in linear combinations (vectors) described in the eigen basis, and return their representations in the original standard basis:
$V: \text{eigen basis} \rightarrow \text{standard basis}$
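As a quick sanity check of this reading, here is a small numpy sketch using the made-up $V$ from above; multiplying by the coordinate vector $(1, 0, 0)$, i.e. "one unit of $v_1$", just reads off the first column:

```python
import numpy as np

V = np.array([[0.6, -0.7,  0.4],
              [0.6,  0.7, -0.3],
              [0.5,  0.0,  0.86]])

# Coordinates in the eigen basis: one unit of v_1, nothing else.
coords_in_eigen_basis = np.array([1.0, 0.0, 0.0])

# V maps eigen-basis coordinates to standard-basis coordinates:
# this is exactly the first column, 0.6 e_1 + 0.6 e_2 + 0.5 e_3.
print(V @ coords_in_eigen_basis)   # [0.6 0.6 0.5]
```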
In that case, how do we go the other direction, from the standard basis to the eigen basis? That will require a matrix in which each column corresponds to a standard basis vector, and the column's values describe where that standard basis vector lands in the eigen basis. In other words, we need to represent $e_1$ as a linear combination of the $v_i$.
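Here is a small sketch of that reverse direction. The hand-written $V$ above was only illustrative (its columns are not exactly orthonormal), so the check below builds a genuinely orthonormal $V$ from an eigendecomposition and uses $V^\top$ to send $e_1$ into the eigen basis:

```python
import numpy as np

# Build a genuinely orthonormal V by eigendecomposing a random covariance matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
Sigma = np.cov(A, rowvar=False)
_, V = np.linalg.eigh(Sigma)

e1 = np.array([1.0, 0.0, 0.0])

# V.T sends standard-basis coordinates to eigen-basis coordinates:
# these are the coefficients that write e_1 as a combination of v_1, v_2, v_3.
coords = V.T @ e1
print(coords)

# Reconstructing e_1 from those coefficients lands us back where we started.
print(coords[0] * V[:, 0] + coords[1] * V[:, 1] + coords[2] * V[:, 2])  # ~ [1, 0, 0]
```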