# Copulas
We can map *any* distribution to the uniform distribution by applying its [Cumulative Distribution Function](Cumulative%20Distribution%20Function.md) (CDF):
* Its CDF maps from the RV's domain to $[0, 1]$
* Its inverse CDF maps from $[0, 1]$ back to the original domain of the RV
We can construct an entirely data-driven form of the CDF by using the empirical CDF (ECDF) and the inverse ECDF. This is effectively just ranking each value: via ranking, the ECDF maps a given RV to the uniform distribution[^1].
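As a minimal sketch of the ranking idea (the helper name here is my own, not a library function):

```python
import numpy as np

def ecdf_transform(x):
    """Map samples to (0, 1) by ranking: a bare-bones empirical CDF.

    Dividing by (n + 1) keeps the output strictly inside (0, 1).
    """
    ranks = np.argsort(np.argsort(x)) + 1  # 1-based ranks
    return ranks / (len(x) + 1)

rng = np.random.default_rng(0)
u = ecdf_transform(rng.exponential(size=1_000))
# u is approximately Uniform(0, 1) regardless of the input distribution
```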
In an analytic CDF, the information about how to transform the RV to the uniform distribution is encoded in the function itself (such as the [normal distribution's CDF](https://en.wikipedia.org/wiki/Normal_distribution#Cumulative_distribution_function)). But with an ECDF, some sort of structure or object will need to keep track of the rank of each value.
Okay, so at this point, we know how to:
* Map from any distribution to uniform (via the CDF or ECDF)
* Map from the uniform distribution to any other distribution (ICDF, IECDF)
$\text{CDF}: \text{Any Distribution} \rightarrow \text{Uniform}$
$\text{Inverse CDF}: \text{Uniform} \rightarrow \text{Any Distribution}$
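These two maps can be sketched with `scipy.stats` (the gamma target here is an arbitrary choice for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # samples from N(0, 1)

u = stats.norm().cdf(z)            # CDF: normal -> uniform
g = stats.gamma(a=2).ppf(u)        # inverse CDF (ppf): uniform -> gamma

# u is ~Uniform(0, 1); g is ~Gamma(shape=2), which has mean 2
```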
How does this help us with our problem of creating a custom joint probability distribution? We're actually almost done. We know how to convert anything uniformly distributed into an arbitrary probability distribution, so all we need is to generate uniformly distributed data with the correlations we want.
How do we do that? We simulate data from a multivariate Gaussian with the specific correlation structure that we want, transform the marginals to uniform (the uniforms will still maintain the correlations of our original multivariate Gaussian), and then transform the uniform marginals to whatever we like.
For example, we can draw samples from a correlated multivariate normal, which gives us the correlations we want:
```python
from scipy import stats

# Generate random samples from a multivariate normal with correlation 0.5
mvnorm = stats.multivariate_normal(mean=[0, 0], cov=[[1., 0.5], [0.5, 1.]])
x = mvnorm.rvs(100000)
```
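As a quick sanity check (with a seed added here so the draw is reproducible), the empirical correlation of the samples should sit near the 0.5 we specified:

```python
import numpy as np
from scipy import stats

mvnorm = stats.multivariate_normal(mean=[0, 0], cov=[[1., 0.5], [0.5, 1.]])
x = mvnorm.rvs(100_000, random_state=0)

# Empirical Pearson correlation of the two columns
r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
```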

Now, map this data to uniform via its CDF. A joint plot of the resulting uniform marginals is usually how copulas are visualized, and the correlations are still present. For instance, when $Y_1$ is near $1$, $Y_2$ is also very close to $1$. This is exactly what we would have expected.
```python
# Map each marginal to (0, 1) via the standard normal CDF
norm = stats.norm()
x_unif = norm.cdf(x)
```
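We can check both claims at once: the marginals really do land in $(0, 1)$, and the dependence from the Gaussian step survives in the ranks. For a Gaussian copula with Pearson $\rho = 0.5$, Spearman's rank correlation is roughly $0.48$ (a seed is added here for reproducibility):

```python
import numpy as np
from scipy import stats

mvnorm = stats.multivariate_normal(mean=[0, 0], cov=[[1., 0.5], [0.5, 1.]])
x = mvnorm.rvs(100_000, random_state=0)
x_unif = stats.norm().cdf(x)

# Marginals are uniform, but the dependence is preserved in the ranks
rho, _ = stats.spearmanr(x_unif[:, 0], x_unif[:, 1])
```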

Now we just transform the marginals again to what we want (Gumbel and Beta):
```python
# ppf is scipy's inverse CDF (percent point function)
m1 = stats.gumbel_l()
m2 = stats.beta(a=10, b=2)
x1_trans = m1.ppf(x_unif[:, 0])
x2_trans = m2.ppf(x_unif[:, 1])
```
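Because the CDF and ppf are monotone transforms, Spearman's rank correlation passes through unchanged, so the final Gumbel/Beta pair carries the same dependence as the Gaussian copula. A self-contained check (seed added for reproducibility):

```python
import numpy as np
from scipy import stats

mvnorm = stats.multivariate_normal(mean=[0, 0], cov=[[1., 0.5], [0.5, 1.]])
x_unif = stats.norm().cdf(mvnorm.rvs(100_000, random_state=0))

x1_trans = stats.gumbel_l().ppf(x_unif[:, 0])
x2_trans = stats.beta(a=10, b=2).ppf(x_unif[:, 1])

# Same rank correlation as the underlying Gaussian copula
rho, _ = stats.spearmanr(x1_trans, x2_trans)
```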

We can contrast this with the joint distribution without correlations:
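A sketch of that contrast: sample the two uniforms independently instead, then apply the same marginal transforms. The marginals match, but the dependence is gone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
u1 = rng.uniform(size=100_000)
u2 = rng.uniform(size=100_000)

# Same Gumbel and Beta marginals, but no copula tying them together
x1_indep = stats.gumbel_l().ppf(u1)
x2_indep = stats.beta(a=10, b=2).ppf(u2)

rho, _ = stats.spearmanr(x1_indep, x2_indep)
# rho is near zero this time
```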

---
Date: 20241124
Links to:
Tags:
References:
* [while my_mcmc: gently(samples) - An intuitive, visual guide to copulas](https://twiecki.io/blog/2018/05/03/copulas/)
* [intuitiveml/notebooks/Math-appendix/Probability/copulas.ipynb at e330416d6e4817c3a376fca650ea5eb85f194a8a · NathanielDake/intuitiveml · GitHub](https://github.com/NathanielDake/intuitiveml/blob/e330416d6e4817c3a376fca650ea5eb85f194a8a/notebooks/Math-appendix/Probability/copulas.ipynb)
[^1]: Note: if there are many identical values in your array for a given RV, the resulting tied ranks may make its uniform representation suboptimal.