# Markov Chain Monte Carlo
Markov Chain Monte Carlo lets us sample effectively from a high-dimensional probability distribution. It does this by performing a *biased random walk* that explores the distribution.
### Is MCMC a probabilistic gradient descent?
This was a great question asked [here](https://stats.stackexchange.com/questions/78876/is-the-mcmc-simply-a-probabilistic-gradient-descent). It had a great response:
> The key difference is that we are not attempting to optimize anything. Instead, we are attempting to estimate a density function - but not by estimating in some optimal manner a small number of parameters, instead by generating a lot of random numbers from the density function and going from there. So MCMC is really a random number generation technique, not an optimization technique.
>
> To see what a gradient descent-like algorithm with a stochastic component looks like, check out [stochastic approximation](http://en.wikipedia.org/wiki/Stochastic_approximation). The [simultaneous perturbation](http://www.jhuapl.edu/spsa/) variant is quite effective and accessible.
### Bayesian Intuition
We should explore the deformed posterior space generated by our prior surface and observed data to find the posterior mountain. However, we cannot naively search the space. Any computer scientist will tell you that traversing N-dimensional space is exponentially difficult in N: the size of the space quickly blows up as we increase N (see the curse of dimensionality: http://en.wikipedia.org/wiki/Curse_of_dimensionality). What hope do we have of finding these hidden mountains? The idea behind MCMC is to perform an intelligent search of the space. To say "search" implies we are looking for a particular point, which is perhaps not an accurate description, as we are really looking for a broad mountain.
Recall that MCMC returns samples from the posterior distribution, not the distribution itself. Stretching our mountainous analogy to its limit, MCMC performs a task similar to repeatedly asking “How likely is this pebble I found to be from the mountain I am searching for?” and completes its task by returning thousands of accepted pebbles in hopes of reconstructing the original mountain. In MCMC and PyMC lingo, the “pebbles” returned in the sequence are the samples, cumulatively called the traces.
When I say that MCMC “intelligently searches,” I am really saying that we hope that MCMC will converge toward the areas of posterior probability. MCMC does this by exploring nearby positions and moving into areas with higher probability. “Converging” usually implies moving toward a point in space, but MCMC moves toward a broad area in the space and randomly walks around in that area, picking up samples from that area.
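This "explore nearby positions and move into areas with higher probability" idea can be sketched as a minimal random-walk Metropolis sampler. The bimodal target density and the Gaussian proposal scale below are illustrative assumptions, not part of the text:

```python
import math
import random

def unnormalized_posterior(x):
    # Illustrative "mountain range": two peaks, near x = 2 and x = -2.
    # MCMC only ever needs the density up to a normalizing constant.
    return math.exp(-0.5 * (x - 2) ** 2) + 0.5 * math.exp(-0.5 * (x + 2) ** 2)

def metropolis(n_samples, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    trace = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # look at a nearby position
        # Accept with probability min(1, p(proposal) / p(x)):
        # always move "uphill", sometimes move "downhill".
        if rng.random() < unnormalized_posterior(proposal) / unnormalized_posterior(x):
            x = proposal
        trace.append(x)  # each recorded position is a "pebble" in the trace
    return trace

trace = metropolis(10_000)
```

The chain never converges to a single point; it wanders around the high-probability region, and the histogram of `trace` approximates the posterior shape.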
### Why Thousands of Samples
At first, returning thousands of samples to the user might sound like an inefficient way to describe the posterior distributions. I would argue that this is actually extremely efficient. Consider the alternative possibilities:
1. Returning a mathematical formula for the “mountain ranges” would involve describing an N-dimensional surface with arbitrary peaks and valleys. This is not easy.
2. Returning the “peak” of the landscape (the highest point of the mountain), while mathematically possible and a sensible thing to do (the highest point corresponds to the most probable estimate of the unknowns), ignores the shape of the landscape, which we have previously argued is very important in determining posterior confidence in unknowns.
Besides computational reasons, likely the strongest reason for returning samples is that we can easily use the Law of Large Numbers to solve otherwise intractable problems.
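The Law of Large Numbers point can be made concrete: with samples in hand, posterior expectations and probabilities reduce to simple averages. The stand-in "trace" below is drawn from a known normal purely for illustration; in practice it would come from a sampler:

```python
import random

# Stand-in for a posterior trace; assumed Normal(5, 2) for illustration only.
rng = random.Random(42)
samples = [rng.gauss(5.0, 2.0) for _ in range(100_000)]

# E[unknown] is approximated by the sample mean.
posterior_mean = sum(samples) / len(samples)

# P(unknown > 6) -- an integral that may be intractable analytically --
# is just the fraction of samples exceeding 6.
prob_above_6 = sum(1 for s in samples if s > 6) / len(samples)
```

Any question about the posterior (means, tail probabilities, credible intervals) becomes a counting exercise over the returned pebbles.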
---
Date: 20210629
Links to: [003-Data-Science-MOC](003-Data-Science-MOC.md)
Tags: #review
References:
* [Markov Chain Monte Carlo (MCMC) : Data Science Concepts - YouTube](https://www.youtube.com/watch?v=yApmR-c_hKU&t=99s)
* Markov Chain Monte Carlo - Introduction, mathematical monk (YouTube)
* [MCMC Animations](https://twiecki.io/blog/2014/01/02/visualizing-mcmc/)
* [MCMC Slides](http://www.cs.cmu.edu/~aarti/Class/10701_Spring14/slides/MCMC.pdf)
* Machine learning a probabilistic perspective, textbook