# Linear Transformations

### Clear Definition[^1]

Recall that a linear map preserves linear structure. Given a vector $x$:

$$x = \alpha_1 v_1 + \dots + \alpha_n v_n$$

A linear map $f$ ensures that:

$$\begin{align} f(\alpha_1 v_1 + \dots + \alpha_n v_n) &= f(\alpha_1 v_1) + \dots + f(\alpha_n v_n) \\ &= \alpha_1 f(v_1) + \dots + \alpha_n f(v_n) \end{align}$$

where, again, this property is ensured by linearity. This *is* a **linear transformation**.

So we see how our linear map transforms an arbitrary vector $x$. If we were to select a set of basis vectors, it would transform them as well. For instance, if $v_1, \dots, v_n$ were our basis vectors, we see that they are mapped to $f(v_1), \dots, f(v_n)$ respectively.

But, given what we know, there is something missing! What is that? Let's think for a moment; we know the following:

* We start with an $n$-dimensional vector space.
* $f$ transforms vectors from that space (the **domain**) to the output space (the **codomain**).

Aha! That is it! We don't know the dimension of the *codomain* yet. Generally, a function is defined in such a way that we know this. For instance, it is often written as:

$$f: \mathbb{R}^n \rightarrow \mathbb{R}^m$$

So, we now know that $f(v_1) \in \mathbb{R}^m$.

### Matrices are parameterizations; can I get a visualization please?

A matrix is a parameterization of a linear transformation for a specific basis. Visually this can be seen in [Linear transformation visual.excalidraw](Linear%20transformation%20visual.excalidraw.md).

### Intuitions

$L$ is a transformation. This simply means it takes in a vector and spits out a vector. However, if we state that $L$ is **[linear](Linearity.md)**, that imposes a great deal of structure on how it must move vectors. This structure is _information_. This great deal of information means that we only need a few things to specify where $L$ takes any vector $\vec{v}$.

To take a step back for a moment, suppose we are dealing with a simple linear function: $f(x) = kx$. The fact that $f$ is linear means that if I tell you the value of $x$, you only need to know the value of $k$ to immediately obtain the value of $f(x)$. This applies to any $x$. So, we only need *one piece of information*, $k$, to describe where $f$ takes $x$.

If we decide to *scale* or *add* to our input, [linearity](Linearity.md) says that is no problem! We can evaluate in whatever order is *simplest* for our purposes. So, in our example:

$$f(3 + 4) = f(7) = 7k \longrightarrow f(3+4) = f(3) + f(4) = 3k + 4k = 7k$$

Note that if $f$ were not linear, say it was $f(x) = x^2$, we cannot choose to perform the evaluation in any order that we like. For instance:

$$\overbrace{f(3 + 4) = f(7) = 7^2 = 49}^{\text{LHS}} \neq \overbrace{f(3) + f(4) = 3^2 + 4^2 = 25}^{\text{RHS}}$$

We see that the LHS does not equal the RHS, meaning that we must in fact follow a specific evaluation order to arrive at the correct result.

Returning to linear transformations specifically, we need to know where the transformation takes the basis vectors. We can then simply scale the transformed basis vectors by the corresponding components of $\vec{v}$. This was proved on the whiteboard (see the numerical sketch after the links below), and is a fundamental deduction from the fact that $L$ is linear.

* [How does a linear transformation come to be represented by a matrix](https://photos.google.com/photo/AF1QipOJIrD1SyZUyuZ5OBQvzFnKQTSCeXGLAfGv7BFh)
* [Represent linear function as matrix, map, and represent a nonlinear function](https://photos.google.com/photo/AF1QipPyzor-qm6HY6ebEO_p6lDhHNkWMFBKCfXH1k-S)
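As a quick numerical check of this idea, here is a minimal NumPy sketch (the matrix `A` and vector `x` are made-up illustrative values, not taken from the whiteboard proof): evaluating $f(x)$ directly agrees with scaling the images of the basis vectors, $f(e_1)$ and $f(e_2)$, by the components of $x$ and summing.

```python
import numpy as np

# A made-up 3x2 matrix standing in for a linear map f: R^2 -> R^3.
# Its columns are exactly f(e1) and f(e2), the images of the basis vectors.
A = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 4.0]])

x = np.array([5.0, -2.0])  # an arbitrary input: x = 5*e1 - 2*e2

# Direct evaluation of f(x)
fx_direct = A @ x

# Evaluation using only where the basis vectors land:
# f(x) = x_1 * f(e1) + x_2 * f(e2)
fx_from_basis = x[0] * A[:, 0] + x[1] * A[:, 1]

print(fx_direct)                              # [1. 2. 7.]
print(np.allclose(fx_direct, fx_from_basis))  # True: f(e1), f(e2) are all we need
```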
### Locking gridlines in place

There is a great intuition [here](https://youtu.be/VmfTXVG9S0U?list=PLSQl0a2vh4HC5feHa6Rc5c0wbRTx56nF7&t=308) about how knowing where a linear transformation takes our basis vectors essentially *locks* our transformation in place. Thinking about gridlines, it essentially locks into place our entire transformed grid. Via four numbers encoding our matrix (in 2D), we can then read off where *any* point is moved to. I worked this out on the whiteboard [here](https://photos.google.com/photo/AF1QipOd0vVtsa0rOD6-mFmidCuchmCOf3ahDF2wNFFs).

Think about this computationally. If we have a function $L(x) = 2x$, we know that regardless of the input, the function will simply scale it by a factor of $2$:

$L(1) = \color{red}2 \times \color{black} 1$
$L(2) = \color{red}2 \times \color{black} 2$
$L(3) = \color{red}2 \times \color{black} 3$
$L(50) = \color{red}2 \times \color{black} 50$

We can clearly see that the function's action does not depend on the input (i.e. it does not depend on where our input lives in the input space). Now, consider $f(x) = x^2$. The action of this function is entirely dependent on the input!

$f(1) = \color{red}1 \times \color{black} 1$
$f(2) = \color{red}2 \times \color{black} 2$
$f(3) = \color{red}3 \times \color{black} 3$
$f(50) = \color{red}50 \times \color{black} 50$

In the case of the linear transformation/function, for any input we only need *one* piece of information to determine where it is transformed to: the constant factor by which it is multiplied ($2$ in the example above). In the case of the nonlinear transformation, the action to be taken depends on the input itself; in a sense, this requires infinite information.

More intuition can be found in this [quora response](https://www.quora.com/Why-is-it-correct-to-think-of-linear-transformations-as-moving-points-so-that-grid-lines-remain-parallel-and-evenly-spaced):

> When we first learn about multiplication it is by two numbers to produce a third, a x b = c. What linear algebra reveals is how multiplication works when multiplying a whole FIELD of points by a number, where multiplication becomes a spatial transformation. For example imagine a field of random points around the origin in (x,y,z). Now if you multiply the x coordinate of every point by some coefficient, it will move all the points in the field inward or outward from the x=0 plane, depending on whether the coefficient is lesser or greater than one. That is what they mean by the grid lines: they represent the whole field of points spanned by the grid, and you can view the various coefficients as "control points" that warp or skew the grid this way or that, shifting every point simultaneously with the shifting field.

And a proof in this [quora post](https://www.quora.com/Why-is-it-correct-to-think-of-linear-transformations-as-something-that-keeps-lines-as-lines-and-keeps-the-origin-fixed).

Note: we can also say that linear transforms are the class of transforms that preserve **collinearity** (see *Math and Architectures of Deep Learning*, pg. 62); a numerical sketch of this is at the end of this note.

---

[^1]: *A Programmer's Introduction to Mathematics*, Jeremy Kun ([here](https://drive.google.com/file/d/1gtKERIMkRj9M0lHVzd8FJeHvXBhiGFGv/view?usp=sharing))
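As a numerical companion to the collinearity note above, here is a short NumPy sketch (the matrix `L`, the chosen line, and the nonlinear comparison map are made-up illustrative choices, not from the cited sources): under a linear map, evenly spaced points on a line stay collinear and evenly spaced, while a nonlinear map such as squaring each coordinate does not preserve collinearity.

```python
import numpy as np

# A made-up 2x2 matrix: the four numbers that "lock the grid in place".
# Its columns are where the basis vectors e1 and e2 land.
L = np.array([[2.0, 1.0],
              [0.5, 3.0]])

def collinear(p, q, r, tol=1e-9):
    """True if three 2D points lie on a single line (zero 2D 'cross product')."""
    u, v = q - p, r - p
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

# Three evenly spaced points on an arbitrary line p(t) = a + t*d
a, d = np.array([1.0, 2.0]), np.array([3.0, -1.0])
pts = [a + t * d for t in (0.0, 1.0, 2.0)]

# Under the linear map the images stay collinear and evenly spaced
imgs = [L @ p for p in pts]
print(collinear(*imgs))                                   # True
print(np.allclose(imgs[1] - imgs[0], imgs[2] - imgs[1]))  # True: even spacing kept

# A nonlinear map (square each coordinate) does not preserve collinearity here
sq = [p ** 2 for p in pts]
print(collinear(*sq))                                     # False
```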