# MCMC and Probability Distributions

### Geometric Perspective

When we set up a Bayesian inference problem with N unknowns, we are implicitly creating an N-dimensional space for the prior distributions to live in. Associated with this space is an additional dimension, which we can describe as a surface, or curve, that sits on top of the space and reflects the prior probability of a particular point.

![](Screen%20Shot%202022-05-11%20at%204.44.26%20PM.png)

![](Screen%20Shot%202022-05-11%20at%204.44.35%20PM.png)

#### How are these surfaces impacted when we incorporate observed data?

If these surfaces describe our prior distributions on the unknowns, what happens to our space after we incorporate our observed data X? The data X does not change the space, but it changes the surface of the space by pulling and stretching the fabric of the prior surface to reflect where the true parameters likely live. More data means more pulling and stretching, and our original surface may become mangled or insignificant compared to the newly formed surface. Less data, and our original shape is more present. Regardless, the resulting surface describes the new posterior distribution.

Again I must stress that it is, unfortunately, impossible to visualize this in large dimensions. For two dimensions, the data essentially pushes up the original surface to make tall mountains. The tendency of the observed data to push up the posterior probability in certain areas is checked by the prior probability distribution, so that less prior probability means more resistance. Thus, in the preceding double-exponential prior case, a mountain (or multiple mountains) that might erupt near the (0, 0) corner would be much higher than mountains that erupt closer to (5, 5), since there is more resistance (low prior probability) near (5, 5). The mountain reflects the posterior probability of where the true parameters are likely to be found. It is important to note that if the prior has assigned a probability of 0 to a point, then no posterior probability will be assigned there.

![](Screen%20Shot%202022-05-11%20at%204.46.10%20PM.png)

#### Exploring this landscape

We would like to explore the deformed posterior space generated by our prior surface and observed data in order to find the posterior mountain. However, we cannot naively search the space: any computer scientist will tell you that traversing N-dimensional space is exponentially difficult in N, since the size of the space quickly blows up as we increase N (see the curse of dimensionality: http://en.wikipedia.org/wiki/Curse_of_dimensionality). What hope do we have of finding these hidden mountains?

The idea behind MCMC is to perform an intelligent search of the space. To say “search” implies we are looking for a particular point, which is perhaps not an accurate description, as we are really looking for a broad mountain. Recall that MCMC returns samples from the posterior distribution, not the distribution itself. Stretching our mountainous analogy to its limit, MCMC performs a task similar to repeatedly asking “How likely is this pebble I found to be from the mountain I am searching for?” and completes its task by returning thousands of accepted pebbles in hopes of reconstructing the original mountain. In MCMC and PyMC lingo, the “pebbles” returned in the sequence are the samples, cumulatively called the traces.

When I say that MCMC “intelligently searches,” I am really saying that we hope it will converge toward the areas of high posterior probability. MCMC does this by exploring nearby positions and moving into areas with higher probability. “Converging” usually implies moving toward a point in space, but MCMC moves toward a broad area of the space and randomly walks around in that area, picking up samples from that area.

#### Algorithms to perform this exploration

There is a large family of algorithms that perform MCMC. Most of these algorithms can be expressed at a high level as follows (a minimal code sketch of this loop appears after the list):

1. Start at the current position.
2. Propose moving to a new position (investigate a pebble near you).
3. Accept/reject the new position based on the position’s adherence to the data and prior distributions (ask if the pebble likely came from the mountain).
4. (a) If you accept: move to the new position. Return to Step 1.
   (b) Else: do not move to the new position. Return to Step 1.
5. After a large number of iterations, return all accepted positions.
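To make the accept/reject loop concrete, here is a minimal random-walk Metropolis sketch in plain NumPy. The model is a made-up toy example (Exponential(1) priors on two positive unknowns and Poisson-distributed observations), and the names `log_posterior`, `metropolis`, and the data-splitting assumption are mine, not PyMC's; this illustrates the loop above rather than the sampler PyMC actually uses.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed counts; in a real problem these would be your data X.
data = rng.poisson(lam=2.0, size=30)

def log_posterior(theta):
    """Unnormalized log posterior: Exponential(1) priors plus a Poisson log-likelihood."""
    lam1, lam2 = theta
    if lam1 <= 0 or lam2 <= 0:
        return -np.inf               # zero prior probability means zero posterior probability
    log_prior = -lam1 - lam2         # log of exp(-lam1) * exp(-lam2)
    half = len(data) // 2            # toy assumption: first half governed by lam1, rest by lam2
    log_lik = (np.sum(data[:half] * np.log(lam1) - lam1)
               + np.sum(data[half:] * np.log(lam2) - lam2))
    return log_prior + log_lik

def metropolis(log_post, start, n_samples=5000, step=0.3):
    """Random-walk Metropolis: propose a nearby position, accept/reject, record the trace."""
    position = np.asarray(start, dtype=float)
    current_lp = log_post(position)
    trace = np.empty((n_samples, position.size))
    for i in range(n_samples):
        proposal = position + rng.normal(scale=step, size=position.shape)  # investigate a nearby pebble
        proposal_lp = log_post(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:               # accept/reject the proposal
            position, current_lp = proposal, proposal_lp
        trace[i] = position          # record the current position; the rows form the trace
    return trace

trace = metropolis(log_posterior, start=[5.0, 5.0])
print("posterior means (ignoring the early samples):", trace[1000:].mean(axis=0))
```

One design choice worth noting: the sketch works with log probabilities, so the acceptance test compares log-posterior values (avoiding numerical underflow), and any point with zero prior probability is rejected automatically because its log posterior is negative infinity.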
In this way, we move in the general direction of the regions where the posterior distribution exists, and collect samples sparingly on the journey. Once we reach the posterior distribution, we can easily collect samples, as they likely all belong to the posterior distribution.

If the current position of the MCMC algorithm is in an area of extremely low probability, which is often the case when the algorithm begins (typically at a random location in the space), the algorithm will move to positions that are likely not from the posterior but are better than everything else nearby. Thus the first moves of the algorithm are not very reflective of the posterior. We’ll deal with this later.

In the preceding algorithm’s pseudocode, notice that only the current position matters (new positions are investigated only near the current position). We can describe this property as memorylessness; that is, the algorithm does not care how it arrived at its current position, only that it is there.

---
Date: 20220511
Links to:
Tags:
References:
* Bayesian Methods for Hackers, Chapter 3

### Anki

START
Basic
How should you think about a probability distribution from a geometric perspective?
Back: It can be thought of as a **hyper-surface** that sits on top of a vector space (such as $\mathbb{R}^n$).
Tags: math
<!--ID: 1652280619573-->
END

START
Basic
Visualize an exponential probability distribution sitting on top of the vector space $\mathbb{R}^2$.
Back: ![](Screen%20Shot%202022-05-11%20at%204.44.35%20PM.png)
Tags: math
<!--ID: 1652280619580-->
END