# Privileged Basis
Consider the **residual stream** of a transformer. This is a 784-dimensional vector space; let's call it $V$. At this point we have *not* defined a **[basis](Basis%20Vectors.md)** for $V$. It is just an [abstract vector space](Abstract%20Vector%20Spaces.md) at this point (of course, because [all vector spaces are isomorphic to $\mathbb{R}^n$](All%20Vectors%20Spaces%20are%20Isomorphic%20to%20Rn.md), any 784-dimensional space is isomorphic to $\mathbb{R}^{784}$).
Fundamentally, if you have a vector space, you are going to need some sort of basis to understand what is going on inside of it: some way to decompose vectors into coefficients of a bunch of fixed coordinate axes.
Now, there are a bunch of techniques for taking an arbitrary set of vectors and finding a basis that is sensible for them (such as [PCA](PCA.md)). But it would be very nice if we could just take a model and say, a priori, what the right basis is. If we knew the right basis, interpreting the model would go from untangling a huge mess of weights and activations that is very hard to get your head around, to reading off a bunch of numbers, where we hope that each number is meaningful independently of the others.
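As a concrete illustration of the "find a basis from data" route, here is a minimal numpy sketch of PCA via the SVD. The activation matrix is a random placeholder; in practice it would be collected from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 1000 activation vectors from a 784-dimensional space.
# In practice these would be collected from a real model.
activations = rng.standard_normal((1000, 784))

# PCA via SVD: the rows of `components` form an orthonormal basis,
# ordered by how much of the data's variance each direction explains.
centered = activations - activations.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

top_10_directions = components[:10]  # a data-derived basis, not an a priori one
```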
Pause! Let's break this down a little bit further (I was stuck on this for a moment). Consider a basic neural network of the following form: we have an input $x \in \mathbb{R}^{784}$, a linear transformation (layer) $W: \mathbb{R}^{784} \rightarrow \mathbb{R}^{256}$, and finally an element-wise nonlinearity $\mathrm{ReLU}: \mathbb{R}^{256} \rightarrow \mathbb{R}^{256}$. Let's focus on the *output of the ReLU*, $h \in \mathbb{R}^{256}$.
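Here is a minimal numpy sketch of that setup; the weights and input are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.standard_normal((256, 784)) * 0.05  # placeholder weights for the linear layer
x = rng.standard_normal(784)                # placeholder input vector

pre_activation = W @ x                      # a vector in R^256; nothing special about its axes yet
h = np.maximum(pre_activation, 0.0)         # ReLU, applied coordinate by coordinate

print(h.shape)  # (256,)
```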
At first, $h$ simply lives in the vector space $\mathbb{R}^{256}$, and we haven't chosen a particular basis yet.
There is nothing about the vector space itself that "encourages" it to have a special basis. We start with *no basis* and *no semantic meaning*.
Say we observe some activations in that space, where each activation corresponds to a particular dimension of the implicit standard basis. At this point, it isn't clear what our basis corresponds to *semantically* (see [this great work](https://distill.pub/2017/feature-visualization/) by Chris Olah on how to determine the semantic meaning of directions in image models). Maybe dimension 175, after optimization, corresponds to a dog's nose. With a different random initialization and optimization, it might have been dimension 213 that represented the dog's nose. *Which* dimension it is doesn't really matter when we are talking about privileged bases; what matters is that the network has a tendency at all to imbue its basis directions (dimensions) with semantic meaning.
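To make the "which dimension is arbitrary" point concrete, here is a small numpy sketch (all weights are random placeholders): permuting the hidden dimensions, and permuting the next layer's weights to match, leaves the network's output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(v):
    return np.maximum(v, 0.0)

# Hypothetical two-layer network: 784 -> 256 -> 10, with random placeholder weights.
W1 = rng.standard_normal((256, 784)) * 0.05
W2 = rng.standard_normal((10, 256)) * 0.05
x = rng.standard_normal(784)

# Relabel the hidden dimensions with a random permutation,
# permuting the next layer's weights to match.
perm = rng.permutation(256)
W1_perm = W1[perm]
W2_perm = W2[:, perm]

out = W2 @ relu(W1 @ x)
out_perm = W2_perm @ relu(W1_perm @ x)
print(np.allclose(out, out_perm))  # True: which index holds which feature is arbitrary
```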
This is not always the case, though: some parts of the model are more likely to end up with a privileged basis, and some parts are less likely to.
We can finally define a privileged basis:
> **Privileged Basis**: We can predict, a priori, without explicitly looking at the weights or activations, which basis vectors might be meaningful.
Now, taking a step back, it is kind of weird that we would ever expect a basis to be privileged. Vector spaces are geometric objects; the coordinate axes (basis vectors) are arbitrary, just a lens through which we can look at our space. You would need something special for a basis to be privileged, and that something would have to interact with the residual stream *nonlinearly*. But because everything interacting with the residual stream is linear, there is no reason for it to have a privileged basis.
Sometimes a neural network representation has a special basis that is particularly meaningful to study, and sometimes it doesn't. If you wish to study a particular neuron in a neural network, that is only meaningful if the representation has a privileged basis.
A privileged basis is usually created by an **element-wise non-linearity**, such as ReLU. Because ReLU acts on each coordinate independently and is sharp enough to create sparsity, it is very good at aligning features with the basis dimensions of your representation.
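One way to see why an *element-wise* non-linearity privileges the standard basis: ReLU does not commute with rotations, so the coordinates it acts on are genuinely special. A minimal numpy sketch, using a random rotation and a random vector as placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(v):
    return np.maximum(v, 0.0)

# A random rotation of R^4 (orthogonal matrix from a QR decomposition).
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
v = rng.standard_normal(4)

rotate_then_relu = relu(Q @ v)   # apply ReLU in the rotated coordinates
relu_then_rotate = Q @ relu(v)   # apply ReLU in the original coordinates, then rotate

print(np.allclose(rotate_then_relu, relu_then_rotate))  # almost surely False
```

If the ReLU were replaced by a linear (rotation-equivariant) map, the two computations would agree, and no basis would be singled out.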
### Another Intuition
Privileged Basis: Only some representations have a privileged basis which encourages features to align with basis directions (i.e. to correspond to neurons).
### Examples of Privileged Basis
These show up all the time in convolutional neural networks, where activations sit directly after element-wise non-linearities like ReLU.
### Examples of Non-Privileged Basis
In word embeddings we have a non-privileged basis. It doesn't make sense to ask "what does basis direction 0 mean?" or "what does basis direction 1 mean?". There is nothing special about those basis dimensions.
### What to do if we don't have a privileged basis?
One approach that is often successful is to find a new basis in terms of differences between *semantically meaningful vectors*. The canonical example comes from word embeddings. Say we have a bunch of word embedding vectors living in $V$, where $V = \mathbb{R}^{512}$. We have $v_1$, which represents `man`, and $v_2$, which represents `woman`. If we compute $w = v_1 - v_2$, we may be able to say that $w$ represents the *gender* direction. Of course $w$ lives in $V$, so we can take any other vector in $V$ and **project** it onto $w$. If the resulting projection is *large*, then the vector we projected had a lot of the gender direction present; if it is small, we can say the opposite.
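A minimal numpy sketch of this projection idea, with random placeholder vectors standing in for real embeddings (real word vectors would come from a trained embedding model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 512-dimensional embeddings; in practice these come from a trained model.
v_man = rng.standard_normal(512)
v_woman = rng.standard_normal(512)
v_other = rng.standard_normal(512)

w = v_man - v_woman  # candidate "gender" direction

def scalar_projection(u, direction):
    """Length of the component of u along `direction`."""
    return np.dot(u, direction) / np.linalg.norm(direction)

print(scalar_projection(v_other, w))  # large magnitude -> strong gender component
```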
### A Deeper dive...
See my conversation with Claude [here](https://claude.ai/chat/3ff26198-df18-4d7c-9943-fffa1ba7c98e).
**Claude**: "Imagine you have a vector space with a bunch of vectors floating around in it. Most bases we pick to describe this space are sort of arbitrary - we just pick some vectors that span the space. But certain bases are special - they align with the natural features of the space itself. These special, natural bases are called "privileged bases".
For example, imagine the vector space represents the possible positions and velocities of a particle in 3D space. We could pick any old orthogonal basis to describe this space - call them basis vectors e1, e2, e3, e4, e5, e6. But there's a more natural basis - the position basis (x, y, z) and the velocity basis (vx, vy, vz). These align with the intuitive notions of position and velocity in 3D space. So the (x, y, z, vx, vy, vz) basis is a privileged basis for this vector space.
The privileged basis captures something intrinsic about the vector space - it reflects its natural structure. Other bases feel artificial in comparison. That's the intuition behind privileged bases - they align with the inherent features of the space itself. Picking a privileged basis is like picking a natural coordinate system tailored to the space, rather than an arbitrary one."
**Nate**: "I like this explanation! Thanks. To put it another way: A vector space may contain objects (vectors) that correspond to something in the real world. We can describe these vectors via a basis that most closely aligns with out real world way of thinking. This is what is meant by a "natural" basis. Can you confirm if my understanding is correct?"
**Claude**: "Yes, your understanding is spot on! The key intuition is that a vector space can represent some real world system or phenomenon, and certain bases align more naturally with our real world notions than others.
Your example of position and velocity in 3D space is the quintessential example - the (x, y, z) and (vx, vy, vz) basis vectors clearly correspond to real world concepts we already understand. They form a natural, intuitive basis for describing the state of a particle.
More abstractly, a privileged basis captures the "true degrees of freedom" or "independent components" of the vectors in the space. It untangles and separates the distinct factors that the vectors represent. An arbitrary basis mixes these together, while a privileged basis teases them apart into natural, interpretable pieces.
So in summary - a privileged basis aligns with real world concepts and the inherent degrees of freedom in the vector space. It provides an intuitive coordinate system, rather than an arbitrary one."
**Nate**: "To push this a little further: I am thinking that a vector space in general is very abstract. On it's own it does not even define a basis! It just consists of vectors (objects) that satisfy certain axioms/properties. So in the case of a real world example, such as position and velocity, we are taking a system that we know has some intrinsic structure and semantic meaning, and then saying it can be thought of as a vector space. This takes away some of that semantic meaning until we pick a basis! By picking the privileged basis, we restore that semantic meaning. Is that correct?"
**Claude**: "You're absolutely right, that's a great way to think about it! The vector space itself is purely abstract - just vectors and axioms, no inherent meaning.
The semantic meaning comes in when we try to model some real world system as a vector space. By picking vectors that represent meaningful states/properties, we imbue the abstract vectors with concrete meaning.
But the vector space itself has no notion of what these vectors represent. The basis is what gives them meaning and structure relative to the real world concepts.
An arbitrary basis loses that meaning again by mixing the degrees of freedom together. But the privileged basis preserves the meaning by aligning the basis vectors with the "true" independent components.
So you summarized it perfectly - the privileged basis takes a meaningless abstract vector space and restores the original semantic meaning from the real world system it represents. That's exactly the right intuition behind privileged bases."
**Nate**: "Okay great! Can you give me an example of a privileged basis in a natural language context?"
**Claude**: "Here's an example of a privileged basis in a natural language context:
Imagine representing the meaning of sentences as vectors in a vector space. An arbitrary basis for this space doesn't align with linguistic meaning - the basis vectors don't correspond to real semantic concepts.
But we can define a privileged basis using vectors that represent the presence or absence of important semantic features like:
- Positive sentiment
- Negative sentiment
- Topic = politics
- Topic = science
- Topic = sports
- Contains question
- Provides answer
- etc.
These basis vectors isolate key semantic components of meaning. Sentences live in this space, but in an arbitrary basis their meaning is obscured. The privileged basis teases apart the core semantic factors so each basis vector aligns with a real interpretable concept.
This allows downstream tasks to leverage the privileged basis for things like sentiment analysis, topic modeling, QA systems etc. The privileged basis reveals the semantic structure rather than hiding it amongst arbitrary basis vectors.
So in NLP, a privileged basis captures the core semantic components of meaning and isolates them into interpretable basis vectors. This aligns with natural concepts rather than arbitrary combinations of meaning."
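To make the quoted intuition concrete, here is a toy numpy sketch. Everything in it is made up: a 16-dimensional "sentence space" and three orthonormal feature directions. The point is only that, once the basis aligns with the features, the coefficients become directly readable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three made-up orthonormal "semantic feature" directions in a 16-dimensional sentence space.
feature_names = ["positive_sentiment", "topic_politics", "contains_question"]
F, _ = np.linalg.qr(rng.standard_normal((16, 3)))  # columns are the feature directions

# A sentence vector built from known amounts of each feature.
true_coefficients = np.array([0.9, 0.1, 0.7])
sentence = F @ true_coefficients

# In this feature-aligned (privileged) basis, the coefficients are directly readable.
recovered = F.T @ sentence
for name, value in zip(feature_names, recovered):
    print(f"{name}: {value:.2f}")
```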
In a privileged basis, the basis directions are "neurons".
### Learned Features
Convolutional neural networks learn abstract features and concepts from raw image pixels. [Feature Visualization](https://christophm.github.io/interpretable-ml-book/cnn-features.html#feature-visualization) visualizes the learned features by activation maximization. [Network Dissection](https://christophm.github.io/interpretable-ml-book/cnn-features.html#network-dissection) labels neural network units (e.g. channels) with human concepts.
Deep neural networks learn high-level features in the hidden layers. This is one of their greatest strengths and reduces the need for feature engineering. Assume you want to build an image classifier with a support vector machine. The raw pixel matrices are not the best input for training your SVM, so you create new features based on color, frequency domain, edge detectors and so on. With convolutional neural networks, the image is fed into the network in its raw form (pixels). The network transforms the image many times. First, the image goes through many convolutional layers. In those convolutional layers, the network learns new and increasingly complex features in its layers. Then the transformed image information goes through the fully connected layers and turns into a classification or prediction.
It is useful to think about this in the context of [what exactly is a neuron?](Neuron.md)
### Activation Space
See [here](https://distill.pub/2017/feature-visualization/). If neurons are not the right way to understand neural nets, what is? In real life, combinations of neurons work together to represent images in neural networks. A helpful way to think about these combinations is geometrically: let’s define _activation space_ to be all possible combinations of neuron activations. We can then think of individual neuron activations as the _basis vectors_ of this activation space. Conversely, a combination of neuron activations is then just a vector in this space.
This framing unifies the concepts “neurons” and “combinations of neurons” as “vectors in activation space”. It allows us to ask: Should we expect the directions of the basis vectors to be any more interpretable than the directions of other vectors in this space?
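A small numpy sketch of this framing (the activation vector is a random placeholder): reading off a neuron and reading off a random direction are both just dot products with unit vectors in activation space.

```python
import numpy as np

rng = np.random.default_rng(0)

activations = rng.standard_normal(256)  # placeholder activation vector for one input

# Reading off a single neuron is a dot product with a standard basis vector...
e_17 = np.zeros(256)
e_17[17] = 1.0
neuron_17 = activations @ e_17          # identical to activations[17]

# ...while any unit vector defines an equally valid direction in activation space.
d = rng.standard_normal(256)
d /= np.linalg.norm(d)
along_random_direction = activations @ d
```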
Szegedy _et al._ found that random directions seem just as meaningful as the directions of the basis vectors. More recently Bau, Zhou _et al._ found the directions of the basis vectors to be interpretable more often than random directions. Our experience is broadly consistent with both results; we find that random directions often seem interpretable, but at a lower rate than basis directions.




---
Date: 20230801
Links to:
Tags:
References:
* [Privileged vs non-privileged bases in machine learning](https://harrisonpim.com/blog/privileged-vs-non-privileged-bases-in-machine-learning)
* [A Walkthrough of A Mathematical Framework for Transformer Circuits - YouTube](https://youtu.be/KV5gbOmHbjU?t=2000)
* [MLP Neurons - Privileged vs Non-Privileged Basis [rough early thoughts] - YouTube](https://youtu.be/-oKuDRFHW_Y?t=14)
* [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html#residual-comms)
* [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html#def-privileged-basis)
* [Privileged Bases in the Transformer Residual Stream](https://transformer-circuits.pub/2023/privileged-basis/index.html)
* [Toy Models of Superposition](https://transformer-circuits.pub/2022/toy_model/index.html)