# Concentration of Measure
# Summary
As we increase the dimension $n$, there are more ways to be orthogonal than to be similar. Volume grows faster far from the origin than near it.
# Intro
Just what is **concentration of measure**? When you first come across it, it can sound quite forbidding. But we will see that it really isn't so bad!
For our exploration, all we are going to need is a *space* with a *measure* on it. We will focus on the unit sphere in $n$-dimensional space, $S^{n-1}$[^2].

Note that the space is the *shell*, not its *interior*. For instance, in the 2d case, the space is the purple circle—*not* the area within the circle. Likewise for the 3d sphere: the space is the sphere itself, the *shell*—*not* the volume within it. This extends upwards to higher dimensions—the space we are interested in is always $n-1$ dimensional, but we tend to think of it as being *embedded* in an $n$ dimensional space.
These spaces have a *measure*—which just means they have a size. The size of the 2d circle is its perimeter. The size of the 3d sphere is its surface area. The size of a 4d hypersphere is its (3-dimensional) surface volume. And so on[^1].
Great. So we now have our space that is equipped with a measure—a size. This is a *structure* that we can analyze, that we can ask questions about. This is because the rules that define the structure have *consequences*—they have implications. But what are some questions we might ask?
What about the following:
> *What does the distribution of points look like on this surface?*
This might first seem like a boring question—after all in $2$ and $3$ dimensions the points look evenly distributed. But let's just see where this line of questioning leads us.
This will be the main idea—our core problem: how can we mathematically describe the relationships between a given point and all the other points in the space?
To answer this question, we must first address what it even means to measure the "distribution of points". Because these are symmetric objects (rotation invariant), a clean way to think about this is to simply pick a point at random and measure its relationship to all other points. Whatever this relationship is, it will apply equally well to all points due to symmetry.
Okay, but what do we mean by "relationship" between points? There are all sorts of ways to measure this, but we will stick to the *angle* between points. The angle is especially useful in this context because every point is the same *distance* from the origin (because we are dealing with *unit spheres*, all vectors have length $1$).
Excellent, our question is slowly starting to crystallize. We may phrase it informally as:
> Suppose we are looking at the $(n-1)$-dimensional unit sphere, $S^{n-1}$. Given any point $x$ in that space, what is the distribution of angles from $x$ to all other points in $S^{n-1}$?
>
> What percentage of points are within $p$ percent of being perfectly orthogonal to our point?
To answer this question we will use the tried and true mathematical technique for dealing with higher dimensions: start with $2$ dimensions, deeply internalize what happens when you move to $3$ dimensions, and then let your brain "extrapolate" out to higher dimensions. The intuition will be in this $2$ to $3$ dimensional transition.
## A $2$-d toy example
Below we can see our unit sphere, $S^1$, in black. I picked a random reference point, in red. I then show in green the points in the space (the green arc) that are within 5 degrees of the reference point. The blue bands represent the points that are within 5 degrees of being perfectly orthogonal to the reference point. If we look at the distribution of angles with respect to the reference point, it is nice and uniform. We are just as likely to find a vector perfectly orthogonal to our reference point as we are to find an exact match.
The key intuition that you must keep track of is simply: notice the size of the green arc relative to the blue arcs. They are effectively identical.
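If you want to poke at this yourself, here is a minimal Monte Carlo sketch (assuming numpy; the variable names are just for illustration). It samples points uniformly on $S^1$, computes their angles to a fixed reference point, and checks that equal-width windows of angles capture equal fractions of the space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly on S^1 and fix a reference point r.
phi = rng.uniform(0, 2 * np.pi, size=100_000)
points = np.stack([np.cos(phi), np.sin(phi)], axis=1)
r = np.array([1.0, 0.0])

# Angle between each point and r, in degrees (always lands in [0, 180]).
angles = np.degrees(np.arccos(np.clip(points @ r, -1.0, 1.0)))

# Two windows of equal (10 degree) width: one hugging the reference direction,
# one hugging orthogonality. In 2d they capture the same fraction of the space.
near_reference = np.mean(angles <= 10)
near_orthogonal = np.mean(np.abs(angles - 90) <= 5)
print(near_reference, near_orthogonal)  # both ~ 10/180 ~ 0.056
```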

## Stepping up to $3$-d
Now let's move up one dimension, from $S^1$ to $S^2$. In the image below we can see our original $S^1$, and reference point. But whereas previously the points that had an angle within 5 degrees of it made up the little green arc (a $1$-d line), now the set of points satisfying that criterion makes up the shaded green circle surrounding the reference point. Likewise, the set of all points within 5 degrees of being perfectly orthogonal to our reference point now makes up the blue band.
And this is the intuition builder: notice that the area of the blue band is much larger than the green circle. Said another way, it has a much greater measure.
We can see this in the distribution of angles of points in the space with respect to the reference point. The probability of being nearly identical is plummeting, while there is the start of a pileup around being orthogonal. This is showing something quite powerful: already, in just 3 dimensions, the distribution of angles is biased slightly towards being orthogonal. Put another way, for any two random points, the most likely angle between them is 90 degrees.
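To put numbers on that picture, here is a small sketch (numpy assumed) that computes the two fractions exactly, using the standard fact that the fraction of $S^2$ within angle $\theta$ of a point is $(1 - \cos\theta)/2$:

```python
import numpy as np

eps = np.radians(5)

# Fraction of S^2 within 5 degrees of the reference point (the green cap).
cap_fraction = (1 - np.cos(eps)) / 2

# Fraction of S^2 within 5 degrees of being orthogonal to the reference point
# (the blue band: points whose angle to the reference is in [85, 95] degrees).
band_fraction = (np.cos(np.pi / 2 - eps) - np.cos(np.pi / 2 + eps)) / 2  # equals sin(eps)

print(cap_fraction)   # ~ 0.0019
print(band_fraction)  # ~ 0.0872 -- roughly 45x the cap
```

Compare this with the 2d case above, where the green and blue regions were comparable in size: one extra dimension already tilts the balance by a factor of roughly 45.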

Notice that this is a property of the *space*, $S^{n-1}$. The *geometry* of this space is such that it is biased towards orthogonality, and this bias grows as $n$ increases.
## The inductive leap: Increasing the dimensionality
What happens as we increase the dimensionality? This behavior only intensifies—below we can see that as we increase $n$, the measure of points piles up around orthogonality.
> All points become nearly orthogonal to all other points as $n$ increases.

This plot is just generalizing what we already saw in the jump from 2 to 3 dimensions: when we add a dimension, we increase the proportion of ways to be *orthogonal* more than we increase the proportion of ways to be *similar*.
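If you want to reproduce this behavior numerically, here is a minimal Monte Carlo sketch (numpy assumed; the function name is illustrative). It samples uniform points on $S^{n-1}$ by normalizing Gaussian vectors and tracks how much of the angle distribution falls within 5 degrees of 90°:

```python
import numpy as np

rng = np.random.default_rng(0)

def angle_concentration(n, num_samples=50_000, eps_deg=5):
    """Sample points uniformly on S^(n-1) and measure their angles to a fixed reference."""
    x = rng.normal(size=(num_samples, n))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    r = np.zeros(n)
    r[0] = 1.0                                     # reference point e_1
    angles = np.degrees(np.arccos(np.clip(x @ r, -1.0, 1.0)))
    return angles.std(), np.mean(np.abs(angles - 90) <= eps_deg)

for n in [2, 3, 10, 100, 1000]:
    std, frac = angle_concentration(n)
    print(f"n={n:5d}   spread of angles={std:5.1f} deg   within 5 deg of 90: {frac:.3f}")
```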
## Adding dimensions grows the orthogonal space faster than the similar space
Let us continue building intuition here:
> When we add a new dimension, the number of ways to be orthogonal increases much faster than the number of ways to be similar.
In other words, adding a new dimension is literally adding more "room" to be orthogonal—and that room grows faster than the room to stay similar. A new dimension is, quite literally, more orthogonal space!
We can think about this as follows. We are in 2d space with a vector $v$ that falls on the unit circle, $S^1$. When we add a new dimension, that means adding a new direction that is orthogonal to our space $S^1$, which means adding a new direction orthogonal to $\mathbb{R}^2$ (for that is where $S^1$ is embedded). But note that this new direction will—*by definition*—be orthogonal to $v$ (for $v$ is part of $S^1$).
Really pause and think about this for a moment. We have added a new dimension that is *orthogonal* to our space, and our reference vector $v$. Will this *orthogonal dimension* provide more space for vectors to be *orthogonal* or *similar* to our vector $v$ (and $S^1$ as a whole)? This is not a trick question—the answer is it will provide more space for orthogonal vectors!
Of course each new dimension will allow for more ways to be similar—it is just that it heavily favors more ways to be orthogonal.
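One way to make this concrete: take unit vectors $u$ and $v$ in $\mathbb{R}^n$ with angle $\theta$ between them, embed $v$ unchanged (its new coordinate is $0$), and let $u$ pick up some component $z$ along the new dimension (renormalizing so it stays on the sphere). The new cosine is

$\cos \theta_{\text{new}} = \frac{\langle u, v \rangle}{\sqrt{1 + z^2}} = \frac{\cos \theta}{\sqrt{1 + z^2}}$

which is never further from $0$ than $\cos\theta$: any use of the new dimension nudges the pair towards orthogonality, never away from it.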
## Poor little $1$-d subspaces embedded in big bad $n$-d
Let us keep building intuition. Part of the reason the above behavior occurs is that vectors represent $1$-d subspaces embedded in our full space. These little $1$-d subspaces become vanishingly small relative to the ambient space as dimension increases.
Look back to our 2d and 3d examples—we can see that in both, the set of similar vectors (shaded green) forms a *cone* around our reference vector. This cone will always be tightly "wrapped around" the $1$-d subspace of our vector. As $n$ increases, the unit sphere's surface area spreads out into so many directions that the cone around the reference vector becomes negligible.
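For those who want the exact statement: relative to a fixed reference point, the angle $\theta$ of a uniformly random point on $S^{n-1}$ has density proportional to $\sin^{n-2}\theta$, so the fraction of the sphere inside the cone of half-angle $\theta_0$ around the reference is

$\frac{\int_0^{\theta_0} \sin^{n-2}\varphi \, d\varphi}{\int_0^{\pi} \sin^{n-2}\varphi \, d\varphi}$

For $n = 2$ this recovers the uniform distribution of angles we saw on the circle; for $n = 3$ the density is $\sin\theta$, the slight pileup at 90 degrees; and as $n$ grows, the $\sin^{n-2}$ factor crushes everything except a thin band around $\theta = \pi/2$, so any cone with $\theta_0 < \pi/2$ carries a vanishing fraction of the measure.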
## Reference points, global behavior and spikiness
Up until now I have generally been talking about a reference point. But this behavior applies to the whole space, globally. This might be hard to visualize—but here is a trick. Below I have 8 different plots of the behavior of $S^2$, each with a different reference point. You can see that the blue band of orthogonal points simply rotates based on the chosen reference point. This behavior applies equally to any reference point. In other words, it is rotationally invariant.

Okay, great. So we can say that this behavior applies to every point in the space—it is a global property in that sense. But that still leaves a nagging question: is the *measure* dependent on *a* reference point? Not necessarily any specific reference point (we just showed that any point will do), but a reference point in general. And the answer is *yes*.
The reason that the measure is dependent on a reference point is that *measure is a function* that we can define! And in this case we have defined the measure to be a function, $f$, that takes in a point, $x$, and maps it to the set of angles between it and all other points in the space. We then show that $f$—our measure function—*concentrates* around a value. In this case it concentrates around 90 degrees (orthogonality).
This is what is meant if you ever hear that the measure in high dimensions becomes "spiky". Each point has a distribution of angles (with respect to other points) that spikes heavily at 90 degrees, and this gets more extreme as $n$ increases. It is *this distribution* that becomes spiky. It is absolutely critical to note that it is *not* the space itself that becomes spiky! The space remains smooth! If this seems like a contradiction, let's move back to 3 dimensions to strengthen our intuitions.
In 3d we can already see that there is a pileup of angles centered at 90 degrees, but the space ($S^2$, the unit sphere) is perfectly smooth and uniform! Thus even at a low dimensionality that we can visualize, we have an example of a smooth space with a slightly spiky measure of angles.
## A bit more technical
Before explaining the technical innards of measure, *push forward*, and *pull back*, let's clarify the *problem* we are trying to solve—we want to answer the question:
> ###### The Problem
> We are looking at a specific reference vector, $r$, living in the space $S^{n-1}$. What fraction of points in this space have an angle with $r$ in the range of $[0.45 \pi, 0.55 \pi]$?
Before diving into a solution, how might we go about solving this on our own? As always, a *concrete toy example* will help. Suppose our space is $S^1$, the unit circle, and we have a reference vector $r$, as shown below. We can then ask: what fraction of points has an angle in the range $[0.45 \pi, 0.55 \pi]$?

In this simple example we can see that the answer is just going to be the length of the two blue arcs divided by the total length of the unit circle. But what tools did we need in order to get there?
1. A way of *identifying* the blue arcs
2. A way to measure how much of the full space is made up of the blue arcs.
###### 1. Identify the Blue Arcs
Let's now walk through how we might build this up, piece by piece. Our starting point is the set of angles we are interested in—we are trying to answer what fraction of points in the space have an angle with respect to $r$ in the range $E$:
$E = \overbrace{[0.45 \pi, 0.55 \pi]}^\text{A set of angles}$
The first thing we must do is *find the set of points* in $S^1$ that have an angle with respect to $r$ in this range. How can we do that? We will need two tools:
1. A *function* $f$
2. A *preimage* operator $f^{-1}$ (this is *not* an inverse)
This function maps points on the sphere to angles (relative to a fixed vector $r$). It takes each point $x \in S^1$, and returns the angle between $x$ and the reference vector $r$:
$f : S^1 \to [0, \pi]$
$f(x) = \arccos(\langle x, r \rangle)$
The *preimage* operator takes in sets and outputs sets—it is *not* a function inverse (no guarantees of an input mapping to a single output), but uses the same notation. Here it will take in our set of angles $E$ and output the set of points on the sphere whose angle with $r$ falls in $E$:
$f^{-1}(E) = \{ x \in S^1 \mid f(x) \in E \} = \{ x \in S^1 \mid \text{angle}(x, r) \in E \} = B$
So this preimage operator spits out the dark blue lines in the image above. Let us call this set $B$.
###### 2. What fraction of space is comprised of blue arcs?
We now have $B$ and we want to know how much of the total space, $S^1$, $B$ makes up. This requires a *new measure*, $\mu$ (the uniform surface measure):
$\mu: \{\text{subsets of } S^1\} \rightarrow [0, 1]$
$\mu(B) = \frac{\text{surface area of } B}{\text{surface area of } S^{1}}$
###### Summary
And just like that, we've answered our question. It required introducing two new objects:
1. A preimage operator to take our set of angles, $E$, and map it to points $x \in S^1$
2. A uniform measure $\mu$ to map this set of points to a fractional value (effectively a probability)
These two pieces combined form a *new measure*, which we could call $\nu$.
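Here is a minimal sketch of the whole pipeline for $S^1$ (numpy assumed; variable names are just for illustration): an exact computation of $\nu(E)$ from the arc lengths, and the same value recovered by Monte Carlo, i.e. by pushing uniformly sampled points through $f$ and applying $\mu$ to the preimage.

```python
import numpy as np

rng = np.random.default_rng(0)

# The set of angles we care about.
E_low, E_high = 0.45 * np.pi, 0.55 * np.pi

# Exact answer on S^1: the preimage f^(-1)(E) is two arcs, each of angular
# width (E_high - E_low), out of a total circumference of 2*pi.
nu_exact = 2 * (E_high - E_low) / (2 * np.pi)   # = 0.1

# Monte Carlo version of the same pushforward measure: sample uniformly from
# S^1, apply f(x) = arccos(<x, r>), and measure how often the result lands in E.
r = np.array([1.0, 0.0])
x = rng.normal(size=(100_000, 2))
x /= np.linalg.norm(x, axis=1, keepdims=True)
f_x = np.arccos(np.clip(x @ r, -1.0, 1.0))
nu_mc = np.mean((f_x >= E_low) & (f_x <= E_high))

print(nu_exact, nu_mc)   # ~ 0.1 for both
```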
## Is this due to $S^{n-1}$ being curved?
Not really. The same behavior occurs if we look at a hypercube. The real difference occurs when we deal with $\mathbb{R}^n$. This is because $\mathbb{R}^n$ doesn't come equipped with a clear measure (unlike $S^{n-1}$, which the uniform measure fits nicely).
On a hypersphere, each point is equally likely, so the uniform measure is incredibly natural.
However, in $\mathbb{R}^n$ (which is infinite), we don't have a clear-cut measure (because the space is infinite, if we tried to use the uniform measure we would be dividing by infinity and getting an undefined result).
To get around this we can use a *gaussian measure*. This just means that instead of being uniformly weighted, the points are weighted according to a gaussian density.
So whereas with the sphere we could ask "what is the probability that an angle will fall in range $E$?", we now ask that question and then weight it via the probability of each point (find all points in $\mathbb{R}^n$ that have an angle in $E$, then perform a weighted integral across the gaussian weighting for those points).
In other words: in $\mathbb{R}^n$ with a gaussian measure, some points count more than others and the gaussian density tells us how much.
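A sketch of what that looks like in practice (numpy assumed; the function name is mine): sampling from the gaussian already applies the weighting, because each region of $\mathbb{R}^n$ shows up in the sample in proportion to its density, so the weighted integral collapses to averaging an indicator.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_angle_probability(n, num_samples=100_000):
    """Probability, under the standard gaussian measure on R^n, that the angle
    between a point and a fixed reference direction lands in E = [0.45*pi, 0.55*pi]."""
    r = np.zeros(n)
    r[0] = 1.0
    # Drawing from the gaussian *is* the weighting: each draw already counts
    # according to its density, so we just average an indicator of the event.
    x = rng.normal(size=(num_samples, n))
    cos_theta = (x @ r) / np.linalg.norm(x, axis=1)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return np.mean((theta >= 0.45 * np.pi) & (theta <= 0.55 * np.pi))

for n in [2, 3, 10, 100, 1000]:
    # For n=2 this matches the 0.1 we computed on S^1; it climbs towards 1 as n grows.
    print(n, gaussian_angle_probability(n))
```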
## More on the Gaussian
You may hear that if our space is $\mathbb{R}^n$ and we use a gaussian measure, the resulting probability mass will concentrate in a *shell* far from the origin. This may sound counterintuitive at first, but it is very analogous to our sphere.
With the sphere, we had a uniform measure and saw that for any point, the probability distribution of angles would spike at 90 degrees (orthogonal) as the dimensionality, $n$, increased. This was mainly due to the effect of increasing dimension—as we add more dimensions, there are more ways to be orthogonal than similar.
Okay, but what about the case where our space is $\mathbb{R}^n$ and we use a gaussian measure? Here, the *density* specified by the gaussian will be highest at the origin—this aligns with intuition. The *problem* is that the volume near the origin is disproportionately small. So while the density there is high, the volume is so low that the probability mass is low.
The probability mass will be highest where the density and volume are both not too small. This is in a shell far from the origin. Again, this is due to the *volume* of the space—the *density* behaves as we'd expect: it is highest at the origin.
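Here is a quick numerical check of that claim (numpy assumed): draw standard gaussian vectors in $\mathbb{R}^n$ and look at their norms. The density peaks at the origin, yet the norms pile up around $\sqrt{n}$ with a spread that stays roughly constant, i.e. a thin shell.

```python
import numpy as np

rng = np.random.default_rng(0)

# For a standard gaussian vector in R^n, the norm concentrates around sqrt(n):
# the mean norm grows like sqrt(n) while its spread stays O(1).
for n in [1, 3, 10, 100, 1000]:
    x = rng.normal(size=(50_000, n))
    norms = np.linalg.norm(x, axis=1)
    print(f"n={n:5d}  sqrt(n)={np.sqrt(n):6.1f}  "
          f"mean norm={norms.mean():6.1f}  std of norm={norms.std():.2f}")
```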
This is surprisingly easy to visualize. Below, we are looking at the volume of cubes in 1, 2 and 3 dimensions. The purple cubes have side length 2, the red ones side length 1. In 1d, this means that the red makes up 50% of the purple. In 2d this shrinks to 0.25. And in 3d it is already down to 0.125.

This behavior continues, with the fraction occupied by the center red cube decreasing exponentially (plotted in log scale below):

The key idea is that the volume centered around the origin (red cube) becomes an exponentially small sliver of the total volume as $n$ increases. Thus, even though the gaussian provides high density in this area, the volume is so small that it is washed out.
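Concretely, the fraction occupied by the inner cube is

$\left(\frac{1}{2}\right)^n = 2^{-n}$

so by $n = 20$ the red cube already accounts for less than a millionth of the total volume.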
> **As $n$ increases, random vectors in high-dimensional space become nearly orthogonal to one another — not just frequently, but overwhelmingly so.**
> This reflects how high-dimensional geometry causes most directions to concentrate around 90° from any given reference point.
> **As $n$ increases, the geometry of $\mathbb{R}^n$ causes volume to concentrate in a thin spherical shell far from the origin.**
> In Gaussian space, this means that although density is highest at the origin, the _mass_ piles up where volume and density balance — around radius $\sqrt{n}$.
> In high dimensions:
- > **Similarity breaks down** — most vectors are nearly orthogonal to each other. This is a geometric effect: the sphere’s surface measure concentrates in an equatorial band orthogonal to any fixed direction.
- > **Volume concentrates in shells** — in $\mathbb{R}^n$ with Gaussian measure, mass doesn’t cluster at the origin, but instead in a thin spherical shell at radius $\sqrt{n}$. This is due to the interplay between decaying density and expanding volume.
>
> Together, these effects show that increasing dimension fundamentally reshapes intuition: both **angles and distances** behave in unintuitive, structured ways.
COME BACK TO THIS:
> As dimension grows, random points in high-dimensional space concentrate in a thin shell at fixed radius from the origin.
> This geometric concentration forces their **pairwise inner products** to vanish and their **angles** to converge to 90° — producing both uniform distances and pervasive orthogonality.
LEAVING OFF: [ChatGPT](https://chatgpt.com/share/e/6888e618-d710-8006-af0c-45d31276471e)
## At what rate does this occur?
Is this shift of measure towards orthogonality exponential? Something else?
# Jacobians
TODO
[^1]: The measure of a unit hypersphere of dimension $n-1$ is $\operatorname{Vol}_{n-1}(S^{n-1}) = \frac{2\pi^{n/2}}{\Gamma(n/2)}$
[^2]: Note that we are *not* focusing on $\mathbb{R}^n$. This would require different mathematical objects—for our purposes the unit hypersphere will suffice (note that the unit hypersphere is applicable anytime we have a bunch of unit vectors in $n$ dimensional space—for example, the eigenvectors returned from PCA)