# Sparse PCA

Say you have a bunch of documents. Each document is represented as a long, sparse vector.

![400](Screenshot%202024-05-24%20at%208.13.00%20AM.png)

We then stack all the document vectors into a data matrix:

![400](Screenshot%202024-05-24%20at%208.13.53%20AM.png)

From this we can compute a correlation (word co-occurrence) matrix, whose entry $(i, j)$ counts how many times word $i$ appears together with word $j$ in a document:

![400](Screenshot%202024-05-24%20at%208.14.43%20AM.png)

![400](Screenshot%202024-05-24%20at%208.17.38%20AM.png)

![400](Screenshot%202024-05-24%20at%208.17.46%20AM.png)

![400](Screenshot%202024-05-24%20at%208.18.47%20AM.png)

![400](Screenshot%202024-05-24%20at%208.20.41%20AM.png)

![400](Screenshot%202024-05-24%20at%208.22.03%20AM.png)

---
Date: 20240524
Links to:
Tags:
References:
* [Megasthenis Asteris: A Framework for Sparse PCA - YouTube](https://www.youtube.com/watch?v=uCzGi6zLu6s)
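The pipeline in this note (sparse document vectors, a data matrix, a word co-occurrence matrix, then a sparse principal component) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method from the referenced talk. It assumes binary bag-of-words vectors with one row per document, so the co-occurrence matrix is $A = X^\top X$. For the sparse component it swaps in a simple truncated power iteration: multiply by $A$, keep only the $k$ largest-magnitude entries, and renormalize. This is a different algorithm from the framework Asteris presents. The toy documents and the names `build_doc_vectors`, `cooccurrence`, and `sparse_pc` are hypothetical.

```python
import math

def build_doc_vectors(docs):
    """Binary bag-of-words (assumed encoding): X[d][i] = 1 if word i occurs in doc d."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = []
    for doc in docs:
        row = [0] * len(vocab)
        for w in set(doc.split()):
            row[index[w]] = 1
        X.append(row)
    return vocab, X

def cooccurrence(X):
    """A = X^T X: A[i][j] = number of documents containing both word i and word j."""
    n_words = len(X[0])
    A = [[0] * n_words for _ in range(n_words)]
    for row in X:
        present = [i for i, v in enumerate(row) if v]
        for i in present:
            for j in present:
                A[i][j] += 1
    return A

def sparse_pc(A, k, iters=200):
    """Truncated power iteration (a swapped-in heuristic): returns a unit
    vector with at most k nonzero entries that approximately maximizes x^T A x."""
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # Power step: y = A x
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Hard-threshold: keep only the k largest |y_i|
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:k])
        z = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm == 0.0:
            break
        x = [v / norm for v in z]
    return x

# Hypothetical toy corpus: two loose topics (PCA / sparsity vs. soccer)
docs = [
    "sparse pca finds few words",
    "pca finds principal components",
    "sparse vectors have few nonzeros",
    "soccer match tonight",
    "soccer fans watch the match",
]
vocab, X = build_doc_vectors(docs)
A = cooccurrence(X)
x = sparse_pc(A, k=3)
print([vocab[i] for i, v in enumerate(x) if v != 0.0])  # the words in the sparse component
```

The point of the sparsity constraint shows up in the last line: instead of a dense loading over the whole vocabulary, the component names only a handful of words, which is what makes it readable as a "topic". The hard-thresholding step is just one simple way to enforce that constraint.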