# Markov Chain Monte Carlo
Markov Chain Monte Carlo lets us sample effectively from a high-dimensional probability distribution. It does this by performing a *biased random walk* that explores the distribution.
### Is MCMC a probabilistic gradient descent?
This was a great question asked [here](https://stats.stackexchange.com/questions/78876/is-the-mcmc-simply-a-probabilistic-gradient-descent). It had a great response:
> The key difference is that we are not attempting to optimize anything. Instead, we are attempting to estimate a density function - but not by estimating in some optimal manner a small number of parameters, instead by generating a lot of random numbers from the density function and going from there. So MCMC is really a random number generation technique, not an optimization technique.
>
> To see what a gradient descent-like algorithm with a stochastic component looks like, check out [stochastic approximation](http://en.wikipedia.org/wiki/Stochastic_approximation). The [simultaneous perturbation](http://www.jhuapl.edu/spsa/) variant is quite effective and accessible.
### Bayesian Intuition
We should explore the deformed posterior space generated by our prior surface and observed data to find the posterior mountain. However, we cannot naively search the space. Any computer scientist will tell you that traversing N-dimensional space is exponentially difficult in N: the size of the space quickly blows up as we increase N (see the curse of dimensionality: http://en.wikipedia.org/wiki/Curse_of_dimensionality). What hope do we have of finding these hidden mountains? The idea behind MCMC is to perform an intelligent search of the space. To say "search" implies we are looking for a particular point, which is perhaps not an accurate description, as we are really looking for a broad mountain.
Recall that MCMC returns samples from the posterior distribution, not the distribution itself. Stretching our mountainous analogy to its limit, MCMC performs a task similar to repeatedly asking “How likely is this pebble I found to be from the mountain I am searching for?” and completes its task by returning thousands of accepted pebbles in hopes of reconstructing the original mountain. In MCMC and PyMC lingo, the “pebbles” returned in the sequence are the samples, cumulatively called the traces.
When I say that MCMC “intelligently searches,” I am really saying that we hope that MCMC will converge toward the areas of posterior probability. MCMC does this by exploring nearby positions and moving into areas with higher probability. “Converging” usually implies moving toward a point in space, but MCMC moves toward a broad area in the space and randomly walks around in that area, picking up samples from that area.
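This "explore nearby positions and move into areas with higher probability" idea can be sketched as a minimal random-walk Metropolis sampler. The bimodal target density and the Gaussian proposal scale below are illustrative assumptions, not part of the text:

```python
import math
import random

def unnormalized_posterior(x):
    # Illustrative "mountain range": two peaks, near x = 2 and x = -2.
    # MCMC only ever needs the density up to a normalizing constant.
    return math.exp(-0.5 * (x - 2) ** 2) + 0.5 * math.exp(-0.5 * (x + 2) ** 2)

def metropolis(n_samples, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    trace = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # look at a nearby position
        # Accept with probability min(1, p(proposal) / p(x)):
        # always move "uphill", sometimes move "downhill".
        if rng.random() < unnormalized_posterior(proposal) / unnormalized_posterior(x):
            x = proposal
        trace.append(x)  # each recorded position is a "pebble" in the trace
    return trace

trace = metropolis(10_000)
```

The chain never converges to a single point; it wanders around the high-probability region, and the histogram of `trace` approximates the posterior shape.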
### Why Thousands of Samples
At first, returning thousands of samples to the user might sound like an inefficient way to describe the posterior distributions. I would argue that this is actually extremely efficient. Consider the alternative possibilities:
1. Returning a mathematical formula for the “mountain ranges” would involve describing an N-dimensional surface with arbitrary peaks and valleys. This is not easy.
2. Returning the “peak” of the landscape (the highest point of the mountain), while mathematically possible and a sensible thing to do (the highest point corresponds to the most probable estimate of the unknowns), ignores the shape of the landscape, which we have previously argued is very important in determining posterior confidence in unknowns.
Besides computational reasons, likely the strongest reason for returning samples is that we can easily use the Law of Large Numbers to solve otherwise intractable problems.
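The Law of Large Numbers point can be made concrete: with samples in hand, posterior expectations and probabilities reduce to simple averages. The stand-in "trace" below is drawn from a known normal purely for illustration; in practice it would come from a sampler:

```python
import random

# Stand-in for a posterior trace; assumed Normal(5, 2) for illustration only.
rng = random.Random(42)
samples = [rng.gauss(5.0, 2.0) for _ in range(100_000)]

# E[unknown] is approximated by the sample mean.
posterior_mean = sum(samples) / len(samples)

# P(unknown > 6) -- an integral that may be intractable analytically --
# is just the fraction of samples exceeding 6.
prob_above_6 = sum(1 for s in samples if s > 6) / len(samples)
```

Any question about the posterior (means, tail probabilities, credible intervals) becomes a counting exercise over the returned pebbles.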
---
Date: 20210629
Links to: [003-Data-Science-MOC](003-Data-Science-MOC.md)
Tags: #review
References:
* [Markov Chain Monte Carlo (MCMC) : Data Science Concepts - YouTube](https://www.youtube.com/watch?v=yApmR-c_hKU&t=99s)
* Markov Chain Monte Carlo - Introduction, mathematical monk (YouTube)
* [MCMC Animations](https://twiecki.io/blog/2014/01/02/visualizing-mcmc/)
* [MCMC Slides](http://www.cs.cmu.edu/~aarti/Class/10701_Spring14/slides/MCMC.pdf)
* Machine learning a probabilistic perspective, textbook