Cumulative Distribution Function

# Cumulative Distribution Function

The **cumulative distribution function** $F_X(x)$ of a random variable $X$ is a function $\mathbb{R} \rightarrow \mathbb{R}$, with values in $[0, 1]$, defined by:

$F_X(x) = P \{ \omega \in \Omega : X(\omega) \leq x\}$

The argument $\omega$ is often omitted for brevity:

$F_X(x) = P \{ X \leq x\}$

![](Untitled%2013.png)

### Key Idea

Notice what we have done. We started with a [Sample Space](Sample%20Space.md), a *set* composed of specific outcomes. This set does not need to consist of numbers; it could just as easily consist of, say, people. We then create a [Random Variable](Random%20Variable.md) that maps elements of this sample space to the real numbers.

At that point, we can ask: what are the *properties* of the resulting set of real numbers? In other words, what are the properties of this random variable? If we want to investigate these properties, what tools do we have at our disposal? Calculus comes to mind, along with the idea of *ordering*.

Let us start by visualizing how the random variable maps elements of our sample space to the number line:

![](Untitled%2014.png)

We can see that certain areas end up with higher density than others!

![](Untitled%2015.png)

Now, something we may like to do is *characterize* or *describe* this mathematically. In other words, which areas of $\mathbb{R}$ end up with a greater density (where greater density implies a greater probability of observation)? How can we characterize how dense a given area is? For this we use a [Probability Density Function](Probability%20Density%20Function.md).

![](Untitled%2016.png)

What is beautiful is that there is another way to characterize our random variable. We can also ask: "what is the probability of observing a value less than or equal to some $x$?"

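
On a small finite sample space, that question can be answered by applying the definition of $F_X$ directly. The following is a minimal sketch, assuming a hypothetical sample space of four people, an equiprobable measure `P`, and a random variable `X` that maps each person to a height in centimetres. All names and values are made up for illustration.

```python
from fractions import Fraction

# Hypothetical sample space: the outcomes are people, not numbers.
omega = ["ana", "ben", "cho", "dev"]

# Hypothetical probability measure: every outcome is equally likely.
P = {w: Fraction(1, len(omega)) for w in omega}

# Hypothetical random variable X: maps each person to a height in cm.
X = {"ana": 160, "ben": 175, "cho": 175, "dev": 190}


def cdf(x):
    """F_X(x) = P{w in omega : X(w) <= x}, computed straight from the definition."""
    return sum(P[w] for w in omega if X[w] <= x)


for x in (150, 160, 175, 200):
    print(x, cdf(x))
```

The function only ever compares $X(\omega)$ with $x$, so the ordering of the real numbers is all it needs; the outcomes themselves never have to be numeric.
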
Again, this is a specific relationship about the structure of a mapping against a sample space, but via the tools of analytic geometry we can actually visualize it! If we create a function (the cumulative distribution function) that captures the probability of the random variable being less than or equal to a given $x$, we can then visualize the curve below:

![](Untitled%2017.png)

What is fantastic about this is that our curve nicely captures *density*! How does it capture density? Via the *slope* of the curve, i.e. the *derivative*. In areas where our random variable maps outcomes $\omega$ from $\Omega$ to $\mathbb{R}$ with high density, the slope/derivative will be large. In areas of low density, it will be close to 0.

This provides us a bridge to the [Probability Density Function](Probability%20Density%20Function.md)! Wherever the CDF is differentiable, we can take its derivative to get the PDF.

It is worth summarizing this big idea:

> We simply started with a *set* of outcomes, our sample space. Our random variable maps these outcomes to the real numbers. In this mapping we will end up with more and less dense areas, signifying more or less probable outcomes. We can describe this and talk about it nicely using the tools provided by *analytic geometry* and *calculus*.

### Technical note

The empirical CDF generally assumes that each point in a dataset is equiprobable, and hence areas where points are dense correspond to high probability. However, that need not be the case! We could have a probability measure that assigns a dense area of points a low probability of occurring, in which case our CDF would not have a high slope there. This can be seen below.

![](images/Probability%20(models%20and%20inequalities)%202.png)

Note: when we look at a CDF, regions where it has a *large slope* correspond to regions of *high probability density* (of the [Probability Density Function](Probability%20Density%20Function.md)), not necessarily to regions where many points land.
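
The technical note can be checked numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: `make_cdf` and `avg_slope` are hypothetical helper names, the data points are invented, and the average slope over an interval stands in for the derivative. It builds a CDF from a list of points twice, once with equiprobable points and once with a made-up measure that gives the crowded points very little probability.

```python
from bisect import bisect_right
from itertools import accumulate


def make_cdf(points, weights=None):
    """Return F with F(x) = total probability of the points <= x.

    With weights=None every point is equiprobable (the usual empirical CDF).
    """
    if weights is None:
        weights = [1.0] * len(points)
    pairs = sorted(zip(points, weights))
    xs = [p for p, _ in pairs]
    total = sum(w for _, w in pairs)
    # Running sum of normalized weights: cum[i] = P{X <= xs[i]}.
    cum = list(accumulate(w / total for _, w in pairs))

    def F(x):
        k = bisect_right(xs, x)  # number of points <= x
        return cum[k - 1] if k else 0.0

    return F


def avg_slope(F, a, b):
    """Average slope of F over [a, b], a finite-difference stand-in for the derivative."""
    return (F(b) - F(a)) / (b - a)


# Hypothetical data: 8 points crowded into [0, 1], 2 points spread over (1, 5].
points = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 3.0, 5.0]

# Equiprobable points: the crowded region carries most of the probability.
F_equal = make_cdf(points)

# Hypothetical measure: the 8 crowded points share 0.1, the 2 sparse points share 0.9.
F_weighted = make_cdf(points, [0.0125] * 8 + [0.45] * 2)

print(avg_slope(F_equal, 0, 1), avg_slope(F_equal, 1, 5))        # roughly 0.8 and 0.05
print(avg_slope(F_weighted, 0, 1), avg_slope(F_weighted, 1, 5))  # roughly 0.1 and 0.225
```

With equiprobable points the slope is steep where the points crowd together. Under the reweighted measure the same crowded region has the flatter slope, so the slope tracks probability density rather than the number of points.
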

---
Date: 20211007
Links to: [Probability MOC](Probability%20MOC.md) [Random Variable](Random%20Variable.md)
Tags:
References:
* Discrete Stochastic Processes Notes, Gallager ([here](https://drive.google.com/file/d/1DsCW0L8lLt6YdF2SBNK73UJh1oUcerwQ/view?usp=sharing))