# Logarithmic Transform
### Log Space
See your notes on log transforms in intuitiveml. The key idea is that we always refer to *something* being transformed into log space; "being in log space" is always said with reference to some original quantity that has been log transformed.
> Yes, "predicting the probabilities in log space" is another way of saying "predicting the log of the probabilities". The reason for talking about "log space" is to suggest the perspective that we are still predicting the probabilities, only transformed into a different space where they are easier to work with.
### Can I get a visualization?
[Betancourt](https://betanalpha.github.io/assets/case_studies/probability_theory.html#42_probability_density_functions) has a great exposition on probability density functions and how they must transform under a measurable transformation. Unlike probability mass functions, probability densities don’t transform quite as naturally under a measurable transformation. The complication is that the differential volumes over which we integrate will in general change under such a transformation, and probability density functions have to change in the opposite way to compensate and ensure that probabilities are conserved. The change in volumes is quantified by the determinant of the _Jacobian matrix_ of partial derivatives.
Consider a PMF on a space $X$, where $X = \mathbb{N}$ (here the PMF is a Poisson distribution):
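A minimal sketch of this starting point (the rate $\lambda = 5$ is an assumption, chosen so that $x = 5$ carries the $\approx 0.175$ mass used in the worked example below):

```python
# A minimal sketch: a Poisson PMF on X = {0, 1, 2, ...}.
# lam = 5 is assumed for illustration, matching the example below.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 5
x = np.arange(0, 16)          # a finite window of the natural numbers
pmf = poisson.pmf(x, mu=lam)  # P(X = x) under a Poisson(5) distribution

plt.stem(x, pmf)
plt.xlabel("$x$")
plt.ylabel("$P(X = x)$")
plt.title("Poisson(5) PMF on the original space $X$")
plt.show()
```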

What happens to our distribution if we **log transform** our space $X$? To start, a log transform looks like:
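In symbols, using the natural logarithm (consistent with the worked value $\log(5) \approx 1.61$ below):

$$g: X \rightarrow Y, \qquad g(x) = \log(x)$$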

So, again we ask: what happens to the underlying PMF? We know that the probability assigned to a given $x$ in the original domain should be pushed forward to the transformed $x$ in the codomain. So, for instance, take $x=5$ in the domain $X$. It has approximately $0.175$ probability mass assigned to it by the PMF. $x = 5$ is then transformed into log space and lives at $\log(x) = \log(5) \approx 1.61$. So, the probability that was assigned to $5$ is now assigned to $1.61$ in *log space*. The visual result is as follows:
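A sketch of that pushforward (same assumed Poisson(5); $x = 0$ is dropped since $\log(0)$ is undefined, and its mass $e^{-5} \approx 0.007$ is negligible here):

```python
# Push the Poisson(5) PMF forward through g(x) = log(x):
# each mass P(X = x) is simply re-located to the point log(x).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

x = np.arange(1, 16)          # x = 0 excluded: log(0) is undefined
pmf = poisson.pmf(x, mu=5)
y = np.log(x)                 # transformed locations in log space

plt.stem(y, pmf)
plt.xlabel("$\\log(x)$")
plt.ylabel("probability mass")
plt.title("Poisson(5) PMF pushed forward into log space")
plt.show()
```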

We can see that in the new space (after the transformation) the PMF is no longer a Poisson probability mass function (we know that a [Poisson PMF has mean and variance that are equal](https://betanalpha.github.io/assets/case_studies/probability_theory.html#41_probability_mass_functions), and after applying our transformation the mean and variance are no longer equal!).
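A quick numerical check of that claim, under the same assumed Poisson(5) example ($x = 0$ again dropped since $\log(0)$ is undefined):

```python
# Check that the pushed-forward PMF is no longer Poisson: a Poisson PMF
# has mean == variance, but in log space the two clearly differ.
import numpy as np
from scipy.stats import poisson

x = np.arange(1, 200)   # wide enough window that the remaining tail mass is negligible
p = poisson.pmf(x, mu=5)
y = np.log(x)

mean = np.sum(p * y)
var = np.sum(p * (y - mean) ** 2)
print(mean, var)        # roughly 1.5 vs 0.27 -- not equal, unlike the original
                        # Poisson(5), where mean = variance = 5
```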
The key idea behind the transformation of a space as it applies to a PMF can be summarized succinctly as:
> * Let us have a PMF $F$ on the space $X$.
> * Transform the space $X$ to the new space $Y$ (above, $Y = \log(X)$) via a measurable function $g$.
> * Then, for each element $y \in Y$, assign it a probability equal to $F(g^{-1}(y))$.
> * This simply means that for every $y$ in the new space, we find the $x$'s that mapped to it and assign their total probability to that particular $y$ (a sketch of this rule follows below).
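A minimal sketch of this rule; the helper name `pushforward` and the toy PMF are hypothetical, just for illustration:

```python
# For each y in the new space, gather every x with g(x) = y and
# assign y their total probability.
from collections import defaultdict
import math

def pushforward(pmf: dict, g) -> dict:
    """Push a PMF {x: P(x)} forward through a measurable function g."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p  # all x's mapping to the same y pool their mass
    return dict(out)

# Example: a toy PMF pushed through g = log.
pmf = {1: 0.2, 2: 0.3, 5: 0.5}
print(pushforward(pmf, math.log))  # {0.0: 0.2, 0.693...: 0.3, 1.609...: 0.5}
```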
This is outlined fully in my [log transforms notebook](https://github.com/NathanielDake/intuitiveml/blob/master/notebooks/Math-appendix/functions/log_transforms.ipynb).
### How does log scale fit in?
Remember that transforming our underlying space can, in certain cases, be viewed as equivalent to transforming our coordinate system. We can describe the situation as follows: we have a domain $X_1$ and a codomain $X_2$, related by a function we will call $f$:
$f: X_1 \rightarrow X_2$
So let us look at a starting space:

We can then, as before, ask what it means to view $X_2$ in log space. This is done by applying $g = \log$:
$g: X_2 \rightarrow X_3$
Visually, this looks like:
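A sketch of both views, assuming (purely for illustration) that $f$ is the exponential map $f(x) = e^x$, so that $g = \log$ linearizes it; this assumption is consistent with the $e$-based inverse tick values worked out further below:

```python
# Plot X2 vs X1 in the original space, and X3 = log(X2) vs X1 in log space.
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(0.01, 6, 100)
x2 = np.exp(x1)      # f: X1 -> X2 (assumed exponential relationship)
x3 = np.log(x2)      # g: X2 -> X3

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x1, x2)
ax1.set_title("$X_2$ vs $X_1$ (original space)")
ax2.plot(x1, x3)
ax2.set_title("$X_3 = \\log(X_2)$ vs $X_1$ (log space: linear!)")
plt.show()
```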

A key realization here is the following:
> We are visualizing $X_3$, which is simply $X_2$ transformed into *log space* with respect to $X_1$. It is because we are looking at it with respect to $X_1$ that the *linear relationship* presents itself!
So what does it mean to view something in **log scale**? Well, we can apply this via a simple matplotlib command and see that it *looks* identical to the transformed version above:
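The matplotlib command in question is just a scale change on the axis; a minimal sketch, reusing the same assumed data:

```python
# Same data, but let matplotlib apply the log scale to the y-axis.
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(0.01, 6, 100)
x2 = np.exp(x1)

plt.plot(x1, x2)
plt.yscale("log")   # log-transform the axis; tick labels stay in X2 units
plt.show()
```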

Now *is* it identical? Of course not, the axes are different! Above, the upper tick mark reads $10^3$ (in $X_2$ units), whereas in the transformed version the upper tick mark is $X_3 = 6$.
However, there is indeed something deeper at play here: in *both* visuals a log transform is occurring! One is *explicit*, and we create a new variable $X_3$ to signify that. The other is *implicit*. If we investigate the geometric properties of our data on the log scale, we find that all curvature from the original representation has been lost, because a log transform *did* occur! The only *nice-to-have* visualization property we added was that, *after the log transformation*, we took the tick mark values and mapped them back to the $X_2$ space. Here is what I mean; consider the tick marks of $X_3$ (i.e. *after* the log transform): $2, 4, 6$. We can simply invert our map and see what points they came from in the previous space:
$g^{-1}(2) = e^2 \approx 7.39, \;\; g^{-1}(4) = e^4 \approx 54.60, \;\; g^{-1}(6) = e^6 \approx 403.43$
So, under the hood the following is happening when using a log scale:
1. $X_2$ is log transformed, resulting in $X_3$
2. Instead of plotting $X_3$ against $X_1$ directly, which would lose the label/meaning associated with $X_2$, we simply relabel the y-axis: each tick is mapped back via $g^{-1}$ to its original value in $X_2$ (sketched below).
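A minimal sketch of that two-step process, reusing the assumed exponential relationship from above; the tick values $2, 4, 6$ match the worked inverses:

```python
# What a log scale does under the hood: explicitly log transform,
# then relabel the ticks by mapping them back through g^{-1} = exp.
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(0.01, 6, 100)
x2 = np.exp(x1)
x3 = np.log(x2)           # step 1: log transform X2 into X3

fig, ax = plt.subplots()
ax.plot(x1, x3)           # plot in log space -- the relationship is linear
ticks = [2, 4, 6]
ax.set_yticks(ticks)      # step 2: relabel each tick with g^{-1}(tick) = e^tick
ax.set_yticklabels([f"{np.exp(t):.2f}" for t in ticks])  # 7.39, 54.60, 403.43
ax.set_ylabel("$X_2$ (tick labels mapped back via $g^{-1}$)")
plt.show()
```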
As a concrete example, consider COVID case counts. We have an original dataset that is exponential:

We then can log transform that space as shown below:

The problem is then that our y-axis is in terms of $\log(count)$, which may be less interpretable than the original count. Is there any way we can have *both*? Can we see the *linear* relationship that presents itself when the count of cases per day is transformed to log space, *and* keep our original description of *count* rather than $\log(count)$? Of course there is - we can simply use a log scale!
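A minimal sketch putting the three views side by side, using synthetic exponential "case count" data (an assumption for illustration, not real COVID numbers):

```python
# Raw counts, explicit log transform, and log scale, side by side.
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(60)
counts = 10 * np.exp(0.1 * days)   # hypothetical exponential case counts

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
axes[0].plot(days, counts)
axes[0].set_title("count (exponential)")
axes[1].plot(days, np.log(counts))
axes[1].set_title("log(count) (linear, but less interpretable axis)")
axes[2].plot(days, counts)
axes[2].set_yscale("log")          # linear shape *and* count-valued ticks
axes[2].set_title("count on a log scale")
plt.show()
```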

And there we have it! The log scale allows us to:
> 1. *Visualize* what a relationship may look like *after* performing a log transform
> 2. *And* keep our original (potentially more intuitive and understandable) description of the codomain (i.e. count instead of $log(count)$)
---
Date: 20220405
Links to: [Mathematics MOC](Mathematics%20MOC.md)
Tags:
References:
* [You should (usually) log transform your positive data | Statistical Modeling, Causal Inference, and Social Science](https://statmodeling.stat.columbia.edu/2019/08/21/you-should-usually-log-transform-your-positive-data/)
* [Logistic Regression from Bayes' Theorem — Count Bayesie](https://www.countbayesie.com/blog/2019/6/12/logistic-regression-from-bayes-theorem)
* [probability - In the context of machine learning deep learning, what does a "log space" mean? - Mathematics Stack Exchange](https://math.stackexchange.com/questions/3302446/in-the-context-of-machine-learning-deep-learning-what-does-a-log-space-mean)