# PCA (Principal Components Analysis)
# From the Ground Up
Let us start with a dataset $X \in \mathbb{R}^{n \times d}$. This means that we have $n$ points that live in a $d$-dimensional space. In our case the dimensions correspond to the nodes, so we can think of starting by describing our points in the *node basis*.
The node basis consists of a [Linearly Independent](Linear%20Independence.md) set of basis vectors. They are orthogonal to one another, but may not be a great fit for the data. What do I mean by that?
Consider the example below. We have an $X$ matrix with $d = 2$. We start by describing our data with respect to the standard basis, $e_1$ and $e_2$, which are indeed orthonormal. However, the data's actual *geometry* isn't a great fit for the standard basis. We might naturally ask: is there *another* orthonormal basis that we could find that would be tied to the actual, underlying geometry of the data?
That is just what PCA finds: a new orthonormal basis that is *aligned* with the main axes of variation of our data.

Depending on how extreme this is, PCA may find that your data actually has *low-rank structure*. This means that while your data was described with respect to the standard basis, its variation actually occurs in a *lower-dimensional subspace*. A classic example might be a 2D plane embedded in 3D space, or a line embedded in 2D space, shown below:

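For a rough numerical sketch of this (the data below is made up purely for illustration), we can generate 2D points that sit close to a line and check that one eigenvalue of the covariance matrix dominates:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 points that lie close to the line y = 2x: described with two
# coordinates, but with (almost) one-dimensional structure.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)               # center the data
Sigma = (Xc.T @ Xc) / (len(X) - 1)    # 2 x 2 covariance matrix
print(np.linalg.eigvalsh(Sigma))      # one tiny eigenvalue, one large one
```

The small eigenvalue only reflects the noise off the line; nearly all of the variation lives along a single direction, i.e. the data is approximately rank one.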
So we can state the objective of PCA quite nicely:
> **PCA** finds an orthonormal basis that is tied to the geometry of your data, where each basis vector also has a number attached to it describing how much variance it captures.
Now how does it work? Mathematically, we simply take $X$, center it (call the centered matrix $X_c$), and compute its covariance matrix:
$\Sigma = \frac{1}{n - 1} X_c^\top X_c \quad \text{(shape: } d \times d \text{)}$
Then perform an eigen decomposition:
$\Sigma = V \Lambda V^\top$
Where $V$ is a matrix whose columns are eigenvectors, $\Lambda$ is a diagonal matrix of eigenvalues. We can interpret this decomposition as a [Change of Basis](Change%20of%20Basis%20Linear%20Algebra.md).
1. $V^\top$ *rotates* from the standard basis to the PCA basis
2. $\Lambda$ *scales* along each axis of the PCA basis (it is diagonal in that basis)
3. $V$ rotates back to the standard basis
In this way our eigen decomposition has taken a symmetric, positive semi-definite matrix, $\Sigma$, and decomposed it into a rotation, followed by a scaling, followed by a rotation.
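Here is a minimal numpy sketch of those steps (the function and variable names are my own, just for illustration):

```python
import numpy as np

def pca_basis(X):
    """Eigenvalues (descending) and eigenvectors of the covariance of X."""
    Xc = X - X.mean(axis=0)                    # center the data
    Sigma = (Xc.T @ Xc) / (X.shape[0] - 1)     # d x d covariance matrix
    eigvals, V = np.linalg.eigh(Sigma)         # eigh: Sigma is symmetric
    order = np.argsort(eigvals)[::-1]          # sort by variance captured
    return eigvals[order], V[:, order]

# Sanity check on random data: Sigma = V diag(Lambda) V^T,
# i.e. a rotation, a scaling, and a rotation back.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
lam, V = pca_basis(X)
Xc = X - X.mean(axis=0)
Sigma = (Xc.T @ Xc) / (X.shape[0] - 1)
assert np.allclose(Sigma, V @ np.diag(lam) @ V.T)
assert np.allclose(V.T @ V, np.eye(3))         # the columns of V are orthonormal
```

Using `np.linalg.eigh` (rather than `np.linalg.eig`) is the natural choice here because $\Sigma$ is symmetric: it returns real eigenvalues and an orthonormal set of eigenvectors.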
Recall that a matrix's columns can be thought of as describing *where the basis vectors land* after the transformation. Given this, how should we interpret $V$ and $V^\top$?
$
V =
\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1d} \\
v_{21} & v_{22} & \cdots & v_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
v_{d1} & v_{d2} & \cdots & v_{dd}
\end{bmatrix}
\quad\text{and}\quad
V^\top =
\begin{bmatrix}
v_{11} & v_{21} & \cdots & v_{d1} \\
v_{12} & v_{22} & \cdots & v_{d2} \\
\vdots & \vdots & \ddots & \vdots \\
v_{1d} & v_{2d} & \cdots & v_{dd}
\end{bmatrix}
$
Let's say $V$ was:
$V = \begin{bmatrix} 0.6 & -0.7 & 0.4 \\ 0.6 & 0.7 & -0.3 \\ 0.5 & 0.0 & 0.86 \end{bmatrix}$
Each column represents an *eigenvector*, $v_i$. We need a way to *describe* that eigenvector, and $V$ does that by encoding it as a linear combination of the standard basis vectors. That *is* what the first column represents, for instance:
$v_1 = 0.6 e_1 + 0.6 e_2 + 0.5 e_3$
The same goes for the other columns:
$v_2 = -0.7 e_1 + 0.7 e_2 + 0.0 e_3$
$v_3 = 0.4 e_1 - 0.3 e_2 + 0.86 e_3$
In this way we can see that $V$ can take in linear combinations (vectors) described in the eigen basis, and return their representations in the original standard basis:
$V: \text{eigen basis} \rightarrow \text{standard basis}$
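As a quick sanity check of this reading, here is a small numpy sketch using the made-up $V$ from above; multiplying by the coordinate vector $(1, 0, 0)$, i.e. "one unit of $v_1$", just reads off the first column:

```python
import numpy as np

V = np.array([[0.6, -0.7,  0.4],
              [0.6,  0.7, -0.3],
              [0.5,  0.0,  0.86]])

# Coordinates in the eigen basis: one unit of v_1, nothing else.
coords_in_eigen_basis = np.array([1.0, 0.0, 0.0])

# V maps eigen-basis coordinates to standard-basis coordinates:
# this is exactly the first column, 0.6 e_1 + 0.6 e_2 + 0.5 e_3.
print(V @ coords_in_eigen_basis)   # [0.6 0.6 0.5]
```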
In that case, how do we go the other direction, from the standard basis to the eigen basis? That will require a matrix in which each column corresponds to a standard basis vector, and the column's values describe where that standard basis vector lands in the eigen basis. In other words, we need to represent $e_1$ as a linear combination of the $v_i$.
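Here is a small sketch of that reverse direction. The hand-written $V$ above was only illustrative (its columns are not exactly orthonormal), so the check below builds a genuinely orthonormal $V$ from an eigendecomposition and uses $V^\top$ to send $e_1$ into the eigen basis:

```python
import numpy as np

# Build a genuinely orthonormal V by eigendecomposing a random covariance matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
Sigma = np.cov(A, rowvar=False)
_, V = np.linalg.eigh(Sigma)

e1 = np.array([1.0, 0.0, 0.0])

# V.T sends standard-basis coordinates to eigen-basis coordinates:
# these are the coefficients that write e_1 as a combination of v_1, v_2, v_3.
coords = V.T @ e1
print(coords)

# Reconstructing e_1 from those coefficients lands us back where we started.
print(coords[0] * V[:, 0] + coords[1] * V[:, 1] + coords[2] * V[:, 2])  # ~ [1, 0, 0]
```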