# Probability in Machine Learning Contexts
We know that in Machine Learning we are often dealing with a high-dimensional vector space and the points within it. If we take a probabilistic view, we can think of *points* in this space as comprising our *sample space*. We can then define random variables on this space (*features* in ML terminology) that map from the sample space to the real numbers. If our sample space were all possible people in existence, we could define height and weight random variables, mapping each person from the sample space to their height and weight respectively.
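As a minimal sketch of this idea (using a hypothetical `Person` record standing in for a sample point), a random variable is just an ordinary function from the sample space to the real numbers:

```python
from dataclasses import dataclass

# A sample point: one "person" drawn from the sample space.
@dataclass
class Person:
    height_cm: float
    weight_kg: float

# Random variables are just functions: sample space -> reals.
def height(person: Person) -> float:
    return person.height_cm

def weight(person: Person) -> float:
    return person.weight_kg

omega = Person(height_cm=172.0, weight_kg=70.0)  # one outcome
print(height(omega), weight(omega))              # -> 172.0 70.0
```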
We would then have PDFs and CDFs for height and weight. We can then look at the joint distribution, i.e. the distribution over all possible $(height, weight)$ pairs that could exist (the Cartesian product of the two variables' ranges).
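For concreteness, here is a small sketch that assumes (purely for illustration, with made-up parameters) that height and weight are jointly Gaussian; `scipy.stats.multivariate_normal` then gives us the joint PDF and CDF directly:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative assumption: height (cm) and weight (kg) are jointly
# Gaussian with positive correlation. These numbers are invented.
mean = np.array([170.0, 70.0])
cov = np.array([[50.0,  40.0],
                [40.0, 100.0]])

joint = multivariate_normal(mean=mean, cov=cov)

# Joint density evaluated at a particular (height, weight) pair.
print(joint.pdf([180.0, 80.0]))

# Joint CDF: probability that height <= 180 AND weight <= 80.
print(joint.cdf([180.0, 80.0]))
```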
Now, imagine that we then add another random variable that we treat as a class, *gender*; it is Bernoulli distributed (1 or 0). Now we have 3 rvs, and our joint distribution gains an additional dimension. Our input space is now the Cartesian product of the three variables' ranges, i.e. the set of all possible tuples $(height, weight, gender)$. Remember, each rv maps from the sample space to the number line. In the case of gender, the rv runs through every possible person, mapping each to 0 or 1 (male or female). Suppose that in this case the probability mass on 0s and 1s is equal (i.e. there is a 50% chance of male or female).
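One way to see this three-variable joint distribution is generatively: draw gender from a Bernoulli(0.5), then draw $(height, weight)$ from a class-conditional joint distribution for that gender. The sketch below assumes Gaussian class-conditionals with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative class-conditional parameters (made up, not real data).
params = {
    0: {"mean": [178.0, 80.0], "cov": [[40.0, 30.0], [30.0, 90.0]]},  # male
    1: {"mean": [164.0, 62.0], "cov": [[35.0, 25.0], [25.0, 70.0]]},  # female
}

def sample_person():
    # The joint factors as p(gender) * p(height, weight | gender):
    # gender ~ Bernoulli(0.5), then (height, weight) from that
    # gender's class-conditional Gaussian.
    gender = rng.binomial(1, 0.5)
    height, weight = rng.multivariate_normal(params[gender]["mean"],
                                             params[gender]["cov"])
    return height, weight, gender

print([sample_person() for _ in range(3)])
```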
Again, we do the same for height and weight. Then, for a given height and weight, we can compare the joint density conditional on male, $p(height, weight \mid gender = 0)$, against the joint density conditional on female, $p(height, weight \mid gender = 1)$. Whichever is larger tells us the more likely gender for a particular human with that height and weight. (This direct comparison of class-conditional densities works only because the two classes have equal priors here; in general, Bayes' rule says the posterior $P(gender \mid height, weight)$ is proportional to the class-conditional density times the prior $P(gender)$.)
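A sketch of that comparison, reusing the made-up class-conditional Gaussians from above; since the priors are equal, comparing the two conditional densities is the whole classifier:

```python
from scipy.stats import multivariate_normal

# Class-conditional densities p(height, weight | gender), with the
# same illustrative (invented) parameters as the sampler above.
p_given_male   = multivariate_normal([178.0, 80.0], [[40.0, 30.0], [30.0, 90.0]])
p_given_female = multivariate_normal([164.0, 62.0], [[35.0, 25.0], [25.0, 70.0]])

def predict_gender(height, weight):
    # With equal priors P(male) = P(female) = 0.5, comparing the
    # class-conditional densities is equivalent to comparing posteriors.
    x = [height, weight]
    return "male" if p_given_male.pdf(x) > p_given_female.pdf(x) else "female"

print(predict_gender(183.0, 85.0))  # -> "male" under these parameters
print(predict_gender(160.0, 55.0))  # -> "female"
```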
---
Date: 20220216
Links to:
Tags:
References: