# Probability from first principles
### Resources to include
* [Abstract Probability Distributions](Abstract%20Probability%20Distributions.md)
### Key Ideas
* We start with a [Space](Space.md) (and the structure/operations provided via [Set Theory](Set%20Theory.md))
* Probability is simply a positive, conserved quantity that we want to distribute across a given space. In particular, it does not necessarily refer to anything inherently random or uncertain.
* So if our probability function is $P$, it simply takes in a set and maps it to a number in $[0, 1]$ (where $P$ must satisfy certain conditions; see the axioms at the end of this list).
* A probability distribution defines a mathematically self-consistent allocation of this conserved quantity across the space $X$. If $A$ is a subset of $X$, then we write $P_{\pi}[A]$ for the probability assigned to $A$ by the probability distribution $\pi$.
* Once we have defined a probability distribution on a space, $X$, and a well-behaved collection of subsets, we can then consider how the probability distribution transforms when $X$ (the space) transforms
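* For reference, the "certain conditions" on $P$ are the Kolmogorov axioms: for a probability distribution $\pi$ on a space $X$,
$P_{\pi}[A] \geq 0, \quad P_{\pi}[X] = 1, \quad P_{\pi}\left[\bigcup_{n} A_{n}\right] = \sum_{n} P_{\pi}[A_{n}]$
where the additivity condition holds for any countable collection of disjoint subsets $A_{n}$ of $X$.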
### Measurable Transformations
A good visualization is provided [here](Logarithmic%20Transform.md#Can%20I%20get%20a%20visualization). You can also check out [Space, Transformations and Descriptions](Space,%20Transformations%20and%20Descriptions.md). With that said, here is how I think about it:
###### Probability Mass Functions
* We have to be ever vigilant in recognizing the limitations of a representation such as a probability mass function. Pushing our probability distribution forward along a measurable transformation yields a pushforward distribution with its own probability mass function, mean, and variance.
* What happens to the representative probability mass function when we reparameterize with the one-to-one measurable transformation $y = \sqrt{x}$? Using the transformation rules introduced above, we can readily visualize the probability mass function of the pushforward distribution.
* On this new space the probability mass function is no longer a Poissonian probability mass function!
* After applying our transformation the mean and variance are no longer equal!
* For intensities above 5 or so, the variance of the pushforward distribution stabilizes to a constant value independent of the intensity itself, decoupling the exact correlation between mean and variance featured in the original distribution. Indeed, the square root transformation is known as a _variance stabilizing transformation_ for distributions that admit parameterizations with Poissonian probability mass functions (see the numerical sketch after this list).
* Reparameterizations like these are often used to simplify the use of a given probability distribution by pushing it forward to a space where the corresponding probability mass function, or even means and variances, have certain desirable properties.
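A minimal numerical sketch of this variance stabilization, assuming NumPy is available (the intensities and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# For a Poisson distribution the mean and variance both equal the intensity
# lambda. Push samples forward along y = sqrt(x) and watch the variance of
# the pushforward distribution stabilize as the intensity grows.
for lam in [1, 5, 10, 50, 100]:
    x = rng.poisson(lam, size=1_000_000)
    y = np.sqrt(x)
    print(f"lambda={lam:>3}  mean(y)={y.mean():.3f}  var(y)={y.var():.3f}")
```

For intensities above roughly 5, the printed variance settles near $1/4$, matching the delta-method approximation $\mathrm{Var}[\sqrt{X}] \approx \left(\frac{1}{2\sqrt{\lambda}}\right)^{2} \lambda = \frac{1}{4}$.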
###### Probability Density Functions
* Unlike probability mass functions, probability densities don’t transform quite as naturally under a measurable transformation. The complication is that the differential volumes over which we integrate will in general change under such a transformation, and probability density functions have to change in the opposite way to compensate and ensure that probabilities are conserved.
* The change in volumes is quantified by the determinant of the Jacobian matrix of partial derivatives, as shown below.
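Concretely, for a one-to-one measurable transformation $g: X \to Y$, the pushforward probability density function is
$f_{Y}(y) = f_{X}\left(g^{-1}(y)\right) \left| \det J_{g^{-1}}(y) \right|$
where $J_{g^{-1}}$ is the Jacobian matrix of partial derivatives of the inverse transformation; in one dimension the correction is simply $\left| \frac{dg^{-1}(y)}{dy} \right|$.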
### Probability Density Functions
* Probability density functions (and their discrete counterpart, probability mass functions) are simply a way to *describe* how a random variable is distributed. Specifically, they assign the conserved quantity **probability** to different values of the random variable.
* As outlined above, if we *transform* the random variable, its density function will need to change as well!
* Below we can see a transformation of an exponentially distributed random variable, $X$, via a square root transformation:
$f_X(x) = e^{-x}$
$Y = \sqrt{X}$
* We wish to find the density function of $Y$. Since $y = \sqrt{x}$ is one-to-one on $x \geq 0$, the inverse is $x = g^{-1}(y) = y^{2}$ with $\frac{dx}{dy} = 2y$, so
$f_Y(y) = f_X(y^{2}) \left| 2y \right| = 2y e^{-y^{2}}, \quad y \geq 0$
(a numerical check of this result appears after this list).
* This example was taken from this video [here](https://www.youtube.com/watch?v=OeD3RJpeb-w).
* The big thing to keep in mind though is that both $f_X$ and $f_Y$ are simply **parameterizations** of the same **abstract probability distribution**.
* We see that as $g$ transforms space (where the amount of transformation can be described via the [Derivative](Derivative.md)) our density function must change inversely in order to ensure that probability is conserved.
* See more in my notes [here](https://photos.google.com/photo/AF1QipPuDwSKnDfao1LTHPef2W3bKUBispiSULABCC9I).
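As a sanity check on the result $f_Y(y) = 2y e^{-y^{2}}$ derived above, here is a small Monte Carlo sketch, assuming NumPy is available (the sample size and binning are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample X ~ Exponential(1), push the samples forward through y = sqrt(x),
# and compare a normalized histogram of Y against the derived density 2y exp(-y^2).
y = np.sqrt(rng.exponential(scale=1.0, size=1_000_000))

hist, edges = np.histogram(y, bins=50, range=(0, 3), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
analytic = 2 * centers * np.exp(-centers**2)

# The largest discrepancy should be small (sampling noise only).
print(f"max |histogram - analytic| = {np.abs(hist - analytic).max():.4f}")
```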
##### A Concrete Example
This is taken from Betancourt, [here](https://betanalpha.github.io/assets/case_studies/probability_theory.html). Consider a normal probability density function:
$f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left( -\frac{(x - \mu)^{2}}{2 \sigma^{2}} \right)$
And then consider the logistic transform:
$y = \frac{1}{ 1 + e^{-x}}$

This transforms space by mapping the entire real line onto the unit interval $(0, 1)$: as we get further from $0$, the transformation squishes values closer and closer together. Let's take stock of where we are:
* We have a **probability distribution** that, on space $X$, admits a **gaussian probability density**.
* This does not mean that our probability distribution is gaussian. There *are no gaussian probability distributions*! There are only probability distributions that admit gaussian probability densities on certain parameterizations (certain spaces).
* So, in this case our gaussian probability density is specifically tied to our space (parameterization) $X$.
* The question is: What happens if we transform our underlying space $X$ to $Y$ via the logistic transform? What will happen to our density?
We effectively need to do two things (shown in earlier sections of this post):
1. Find the inverse of the logistic so that, given a $y$, we can map it back to its original $x = g^{-1}(y)$ and evaluate the density there, $f_X(g^{-1}(y))$.
2. Because we are dealing with a density, we must account for how the transformation changes the underlying differential ($dx$), so that probability is conserved.
The final result is:
$f_Y(y) = f_X\left( \log \frac{y}{1 - y} \right) \cdot \frac{1}{y(1 - y)}$
since the inverse of the logistic is the logit, $g^{-1}(y) = \log \frac{y}{1 - y}$, whose derivative with respect to $y$ is $\frac{1}{y(1 - y)}$.
We will always start with some parameterization, and then we can move things around as long as we are careful about how space is transformed and ensure that probability is pushed forward correctly.
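To make this concrete, here is a minimal sketch of that pushforward, assuming NumPy/SciPy are available and taking a standard normal ($\mu = 0$, $\sigma = 1$) for $f_X$:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Pushforward of a standard normal density through the logistic transform.
# The inverse of the logistic is the logit, x = log(y / (1 - y)), whose
# derivative 1 / (y * (1 - y)) supplies the Jacobian correction.
y = np.linspace(1e-6, 1 - 1e-6, 10_000)
x = np.log(y / (1 - y))
f_y = stats.norm.pdf(x) / (y * (1 - y))

# Probability is conserved: the pushforward density integrates to 1 on (0, 1).
print(f"integral of f_Y over (0, 1) = {trapezoid(f_y, y):.4f}")
```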
---
Date: 20220713
Links to:
Tags: #review
References:
* [Probability Theory Overview.excalidraw](Probability%20Theory%20Overview.excalidraw.md)
* Probability, Jim Pitman (page 305)