# Space, Transformations and Descriptions
### Big Idea
At its core, we have [**space**](Space.md). A space is often **described** either intrinsically or with respect to some extrinsic space. Our space can also be **transformed** into another space! This *transformed space* can be described with respect to the original space, or another space altogether.
> You must always remember to distinguish between a **space** and its **description/representation**.
So, we have:
* Objects reside in a space
* Some spaces are more conducive to analysis/algorithms than others. So we sometimes want to transform objects from one space to another, where we have more favorable properties to work with.
* Once we have transformed (via a *function*) our objects in the original space, they live in the transformed space (codomain). This transformed space specifically makes reference to the original space, since that, at its core, is what defines a function (an association/map between two spaces)!
### Minimum Viable Concept
Let $A$ and $B$ be sets, and let $F$ be a subset of $A \times B$. We say that $F$ is a **function** if it satisfies the following property: for each $a \in A$, there is exactly one $b \in B$ such that $(a, b) \in F$ (an input must have exactly one output). The set $A$ is called the *domain* of $F$ and $B$ is called the *codomain* of $F$. To denote this, we use the arrow notation:
$F: A \rightarrow B$
We think of functions computationally as mappings from inputs to outputs[^1], so much so that the nouns *function* and *map* are synonyms. But notice that, by this definition, a function is literally a set.
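To make the set view concrete, here is a minimal Python sketch (the helper `is_function` is purely illustrative, not from any library) that writes $F$ literally as a set of pairs and checks the "exactly one output per input" property:
```python
# Represent a function F ⊆ A × B literally as a set of (input, output) pairs.
A = {1, 2, 3}
B = {-1, 0, 1, 2}               # the codomain (not every element needs to be hit)

F = {(1, -1), (2, 0), (3, 1)}   # f(a) = a - 2, written out as a set of pairs

def is_function(pairs, domain):
    """True iff every element of `domain` appears as a first coordinate exactly once."""
    return all(sum(1 for (a, _) in pairs if a == x) == 1 for x in domain)

print(is_function(F, A))                   # True
print(is_function({(1, -1), (1, 0)}, A))   # False: 1 has two outputs, while 2 and 3 have none
```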
So here we have *two* spaces: the domain and codomain. They are often the “same” space, as is the case for functions such as $f(x) = x - 2$ (here the domain is $\mathbb{R}$ and the codomain is $\mathbb{R}$).
However, the elements of our domain are *mapped* to elements in the codomain. So while both the domain and codomain share the same space, $\mathbb{R}$, we see that each element is indeed mapped to a corresponding element:
$f(5) = 5 - 2 = 3$
$5 \rightarrow 3$
So, $5$ is mapped to $3$ via $f$.
You need to be keenly aware of your underlying space!
### Can we get some examples please?
###### What exactly does “log space” mean?
Frequently in the ML and stats world you will hear about “transforming” your data to "**log space**". But what exactly does that mean? Taken literally, it seems to convey that there is a *space* that is somehow logarithmic. But, given the definition of a space we talked about, what exactly would that look like? Remember, a space is a collection of all elements under consideration. In the case of our logarithm (and any function) we are inherently dealing with *two spaces*: the domain, $(0, +\infty)$, and the codomain, $\mathbb{R}$.
This is part of the confusion. Log space does not mean a space in the sense of a domain or codomain per se. It certainly isn’t just the codomain, $\mathbb{R}$, considered on its own. No, it is simply the space that results from performing a log transform.
Wait, won’t that simply give the codomain? Well, yes. But that isn’t what people mean when they talk about it! For instance, the entire notion of log space depends on knowing that the data *came from another space to start*! This is a crucial idea. If we start with human heights, then log transform them, we say we are looking at heights in “log space”. However, if we did not know we started with human heights and were simply given a finite sample of log heights, we wouldn’t be able to conclude “Ah yes, these are from the log space!”.
The key idea is as follows:
> Log space depends on the *reference* space that you started in. It is *with respect to that space* that we think about log space.
Why may this be useful? Well, we know that a logarithm will *squish* higher numbers progressively closer together. For instance, in our starting space, 10 and 100 have a difference of 90. However, after a $\log_{10}$ transform, the difference is simply 1:
$\log_{10}(100) - \log_{10}(10) = 2 - 1 = 1$
So it has squished 100 and 10 closer together. In other words, our original space was log transformed, and the resulting space (log space) squished numbers together.
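A quick numerical sketch of this squishing, assuming nothing beyond NumPy:
```python
import numpy as np

x = np.array([10.0, 100.0, 1000.0])   # points in the original (reference) space
y = np.log10(x)                       # the same points, described in log space

print(np.diff(x))                     # [ 90. 900.] -> gaps keep growing in the original space
print(np.diff(y))                     # [1. 1.]     -> equal *ratios* become equal gaps after the transform
```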
This notion of the *starting space* (reference space) being of paramount importance makes complete sense if we remember our definition of a function! A function is specifically a *map* or *transformation* between two spaces! How could we ever hope to talk about a *function* without keeping a reference to the starting space?
###### And what about linear transformations?
[3b1b does a great job](https://youtu.be/kYB8IZa5AuE?t=92) of visualizing this idea of *keeping a reference* to the original space. That is why the gridlines remain in the background at all times.
Linear transformations are certainly a more *restrictive* class of transformations, obeying certain desirable properties (they respect vector addition and scalar multiplication), but they transform space nevertheless.
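For concreteness, here is a small sketch of a linear transformation of $\mathbb{R}^2$ (the matrix below is an arbitrary example), verifying the two properties that make it linear:
```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                       # an arbitrary linear map on R^2

u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])
c = 4.0

# A linear transformation respects vector addition and scalar multiplication.
print(np.allclose(A @ (u + v), A @ u + A @ v))   # True
print(np.allclose(A @ (c * u), c * (A @ u)))     # True
```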
###### Descriptions of space vs space?
Consider a vector, specifically in this case an arrow, living in space. We can describe it with respect to *other vectors* in the space. This is what is happening when we use a [basis](Basis%20Vectors.md). If we are describing a vector $v$ with respect to a particular basis (set of building block vectors), and we wish to describe $v$ with respect to another basis, we are simply changing the basis; we are changing the *building blocks* with which we *describe/compose* $v$.
A [change of basis](Change%20of%20Basis%20Linear%20Algebra.md) is performed via a special linear transformation (a matrix). This raises the question: in the case of a change of basis, what is being transformed: the space, or the description of the space?
Well, we know that it cannot be the underlying space our vector lives in! Our vector is *invariant* (unchanged) under a change of basis. It is simply the representation/description that changes. So it is our *description* of $v$ that changes.
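A small sketch of this, assuming only NumPy (the alternative basis below is arbitrary): the coordinates change, the vector does not.
```python
import numpy as np

v = np.array([3.0, 1.0])               # the vector, described in the standard basis of R^2

B = np.array([[1.0, 1.0],              # an alternative basis, given as the columns of B
              [0.0, 1.0]])

coords_in_B = np.linalg.solve(B, v)    # the *same* vector, described in the new basis
print(coords_in_B)                     # [2. 1.] -> a different description ...

print(B @ coords_in_B)                 # [3. 1.] -> ... of an unchanged vector
```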
This concept comes up in many places, another of which is the [Relationship between Translating Space and Translating Objects](Relationship%20between%20Translating%20Space%20and%20Translating%20Objects.md)
###### Transformations and Probability Distributions
[Betancourt](https://betanalpha.github.io/assets/case_studies/probability_theory.html#42_probability_density_functions) has a great exposition on probability density functions and how they must transform under a measurable transformation. Unlike probability mass functions, probability densities don’t transform quite so naturally. The complication is that the differential volumes over which we integrate will, in general, change under such a transformation, and probability density functions have to change in the opposite way to compensate and ensure that probabilities are conserved. The change in volumes is quantified by the determinant of the _Jacobian matrix_ of partial derivatives.
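Concretely, for a smooth, invertible transformation $g: X \rightarrow Y$, this is the standard change-of-variables formula:
$$\pi_Y(y) = \pi_X\big(g^{-1}(y)\big)\, \left| \det J_{g^{-1}}(y) \right|$$
The determinant factor is exactly the amount by which $g^{-1}$ expands or contracts a small volume around $y$, which is what keeps probabilities conserved.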
Consider a PMF on a space $X$, where $X = \mathbb{N}$ (here the PMF is a Poisson distribution):

What happens to our distribution if we **log transform** our space $X$? To start, a log transform looks like:

So, again we ask: what happens to the underlying PMF? We know that the probability assigned to a given $x$ in the original domain should be pushed forward to the transformed $x$ in the codomain. So, for instance, take $x = 5$ in the domain $X$. It has approximately $0.17$ probability mass assigned to it by the PMF. $x = 5$ is then transformed into log space, where it lives as $\log(5) \approx 1.61$ (natural log). So, the probability that was assigned to $5$ is now assigned to $1.61$ in *log space*. The visual result is as follows:

We can see that in the new space (after the transformation) the PMF is no longer a Poisson probability mass function (we know that a [Poisson PMF has mean and variance that are equivalent](https://betanalpha.github.io/assets/case_studies/probability_theory.html#41_probability_mass_functions), and after applying our transformation the mean and variance are no longer equal!).
The key idea behind the transformation of a space as it applies to a PMF can be summarized succinctly as:
> * Let us have a PMF $F$ on the space $X$.
> * Transform the space $X$ to the new space $Y$ (above, $Y = \log(X)$) via a measurable function $g$.
> * Then, for each element $y \in Y$, assign it a probability equal to that of $F(g^{-1}(y))$
> * This simply means: for every $y$ in the new space, find the $x$’s that mapped to it and assign their total probability to that particular $y$ (see the sketch below)
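A minimal sketch of this pushforward for the Poisson example above, assuming SciPy and a rate of $\lambda = 5$ (an assumption consistent with the $\approx 0.17$ mass at $x = 5$ mentioned earlier):
```python
import numpy as np
from scipy import stats

lam = 5                                # assumed Poisson rate for this sketch
x = np.arange(1, 21)                   # part of the original space X (0 skipped: log(0) is undefined)
pmf_x = stats.poisson(lam).pmf(x)

# Push each atom of probability forward: the mass at x now sits at log(x) in log space.
y = np.log(x)
pmf_y = pmf_x                          # log is one-to-one here, so masses move without merging

for yi, pi in zip(y[:6], pmf_y[:6]):
    print(f"log-space point {yi:.2f} carries probability {pi:.3f}")
# e.g. the ~0.175 mass at x = 5 now sits at log(5) ≈ 1.61
```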
This is outlined fully in my [log transforms notebook](https://github.com/NathanielDake/intuitiveml/blob/master/notebooks/Math-appendix/functions/log_transforms.ipynb).
###### Transformed spaces don’t make sense in isolation
Suppose we have a space $X$ that is transformed, via a simple multiplication by $3$, into a space $Y$. We can call the transformation $f$:
$f: X \rightarrow Y$
$f(x) = 3x$
Now, we know that both $X$ and $Y$ are $\mathbb{R}$. However, after transforming $X$ to $Y$ we may talk about the “transformed space”. The big idea, though, is that talking about this transformed space makes no sense without keeping the original space $X$ in mind. It is only with respect to the original space that the notion of transforming, and of a resulting transformed space, makes sense.
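A tiny sketch of why (with made-up sample data): handed only the transformed numbers, nothing about them says "times three"; the transformed space is only meaningful alongside $f$ and the original $X$.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)                 # a few points from the original space X
y = 3 * x                              # their images in the "transformed space" Y

print(y)                               # just five real numbers; nothing marks them as transformed
```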
#### Other things worth mentioning
* Why transform? Algorithms often exploit particular *structure*, and some spaces provide structure that an algorithm can exploit more readily than others.
### References to include
1. [Logarithmic Transform](Logarithmic%20Transform.md)
2. [Relationship between Translating Space and Translating Objects](Relationship%20between%20Translating%20Space%20and%20Translating%20Objects.md)
3. [Space Transformations](Space%20Transformations.md)
4. [Linear Transformations](Linear%20Transformations.md)
5. [Vectors and Representation](Vectors%20and%20Representation.md)
[^1]: A Programmer's Guide to Mathematics, pg. 42