# PDF Geometric Intuition

Consider the vector space $\mathbb{R}^N$. We wish to associate a **probability distribution** with this space. To do so we can consider an *additional dimension* that we can describe as a *curve* (when $N = 1$), a *surface* ($N = 2$), or a *hyper-surface* ($N \geq 3$) that sits on top of our space. This surface is known as a **probability density**. If our vector space were $\mathbb{R}^2$, then our density might look like:

![](Pasted%20image%2020220522142123.png)

The **volume** *between* the *probability density surface* and the *underlying space*, $\mathbb{R}^2$, is referred to as the **probability mass** and must be equal to $1$.

### Why is it called mass if it's a volume?

Good question! The key idea here is that our underlying space, here $\mathbb{R}^2$, has a corresponding *differential volume element*, $dx_1 \, dx_2$. Here that is simply the area given by the product of the two differentials. Because we are treating our probability surface as a **density**, we can recall the formula:

$$\text{density} = \frac{\text{mass}}{\text{volume}} \longrightarrow \text{mass} = \text{density} \cdot \text{volume}$$

And in our case that is:

$$\text{mass} = \int_{\mathbb{R}^2} f(x_1, x_2) \, dx_1 \, dx_2$$

Note that we think of our probability density function as a density *by construction* (this arises from measure theory; for more see Betancourt's writing on Radon-Nikodym derivatives). Because we think of the probability density surface as a density, the volume between it and our underlying space, while in a sense still a *volume*, can be thought of as a *mass*. The fact that this is a conserved quantity makes the name even more appropriate.

### Bayesian Updating

If these surfaces describe our prior distributions on the unknowns, what happens to our space after we incorporate our observed data $X$? The data $X$ *does not change the space*, but it ***changes the surface of the space by pulling and stretching the fabric of the prior surface to reflect where the true parameters likely live***. More data means more pulling and stretching, and our original surface may become mangled or insignificant compared to the newly formed surface. With less data, our original shape is more present. Regardless, the resulting surface describes the new posterior distribution.

Again I must stress that it is, unfortunately, impossible to visualize this in large dimensions. For two dimensions, the data essentially pushes up the original surface to make tall mountains. The tendency of the observed data to push up the posterior probability in certain areas is checked by the prior probability distribution, so that less prior probability means more resistance. Thus in the double-exponential prior case (where the prior concentrates its mass near the origin), a mountain (or multiple mountains) that might erupt near the $(0,0)$ corner would be much higher than mountains that erupt closer to $(5,5)$, since there is more resistance (low prior probability) near $(5,5)$. The mountain reflects the posterior probability of where the true parameters are likely to be found. It is important to note that if the prior has assigned a probability of $0$ to a point, then no posterior probability will be assigned there. A small numerical sketch of this mass computation and prior-to-posterior updating is included at the end of this note.

![](Screen%20Shot%202022-05-22%20at%202.31.59%20PM.png)

---
Date: 20220522
Links to: [Probability MOC](Probability%20MOC.md) [Probability Density](Probability%20Density.md)
Tags: #review

References:
* []()
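
### A Numerical Sketch

To make the mass computation and the "pulling and stretching" concrete, here is a minimal sketch on a discretized patch of $\mathbb{R}^2$. The Exp(1) prior on each coordinate, the Gaussian likelihood, and the observation at $(4, 4)$ are illustrative assumptions chosen for this sketch, not anything fixed by the discussion above.

```python
import numpy as np

# Grid over a patch of R^2; dx1*dx2 is the differential volume element
x1 = np.linspace(0.01, 5.0, 200)
x2 = np.linspace(0.01, 5.0, 200)
dx1, dx2 = x1[1] - x1[0], x2[1] - x2[0]
X1, X2 = np.meshgrid(x1, x2)

# Prior density surface: independent Exp(1) in each coordinate, so most prior
# mass ("resistance" to the data) sits near the (0, 0) corner.
prior = np.exp(-X1) * np.exp(-X2)

# mass = density * volume, summed over the grid (a Riemann sum for the integral)
prior_mass = prior.sum() * dx1 * dx2
print(f"prior mass ~ {prior_mass:.3f}")  # close to 1, up to truncating the grid

# Observed data pushes the surface up through the likelihood. Here we pretend
# we observed y = (4, 4) with y_i ~ Normal(x_i, 1) -- an illustrative model.
y1_obs, y2_obs = 4.0, 4.0
likelihood = np.exp(-0.5 * ((y1_obs - X1) ** 2 + (y2_obs - X2) ** 2))

# Posterior surface: reweight the prior, then renormalize so its mass is 1 again
unnormalized = prior * likelihood
posterior = unnormalized / (unnormalized.sum() * dx1 * dx2)
print(f"posterior mass ~ {posterior.sum() * dx1 * dx2:.3f}")  # 1 by construction

# The posterior mountain erupts near the data at (4, 4), but the exponential
# prior drags it back toward the origin; points with zero prior density stay 0.
peak = np.unravel_index(posterior.argmax(), posterior.shape)
print(f"posterior peak near x1 = {X1[peak]:.2f}, x2 = {X2[peak]:.2f}")  # ~(3, 3)
```

The printed peak near $(3, 3)$ illustrates the prior's resistance: the data alone would place the mountain at $(4, 4)$, but the prior mass concentrated near the origin pulls it back, and renormalizing keeps the total probability mass equal to $1$.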