# Linear Regression
### General Form
Linear regression can be modeled most simply as:
$y(x) = w^T x + \epsilon = \sum_{j=1}^D w_j x_j + \epsilon$
Where $\epsilon$ is normally distributed.
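As a concrete (if artificial) example, here is a minimal NumPy sketch that draws data from this model; the weight vector, dimensions, and noise level are made-up illustrative values, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

D, N = 3, 100                      # feature dimension, number of samples (assumed)
w = np.array([2.0, -1.0, 0.5])     # hypothetical weight vector, for illustration only
sigma = 0.3                        # noise standard deviation (assumed)

X = rng.normal(size=(N, D))        # one input x per row
eps = rng.normal(0.0, sigma, N)    # normally distributed residual error
y = X @ w + eps                    # y(x) = w^T x + eps for each row
```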
### Linear in *weights*
We need to keep in mind that linear regression is linear in terms of its *weights*, not necessarily in terms of $x$. See [here](https://youtu.be/rVviNyIR-fI?t=469), and the sketch below.
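To make this concrete, the sketch below fits a quadratic curve with ordinary least squares by expanding $x$ into a polynomial basis $\phi(x) = [1, x, x^2]$: the model is nonlinear in $x$ but still linear in the weights. The data-generating coefficients here are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D inputs with a quadratic relationship to y (made-up coefficients)
x = rng.uniform(-3, 3, 200)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.5, 200)

# Polynomial basis expansion: phi(x) = [1, x, x^2].
# y = w^T phi(x) is nonlinear in x but linear in w, so least squares still applies.
Phi = np.column_stack([np.ones_like(x), x, x**2])
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w_hat)   # approximately [1.0, 0.5, -2.0]
```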
### A Probabilistic Perspective
Note that because $\epsilon$, our **residual error**, is normally distributed, we can write it as:
$\epsilon \sim \mathcal{N}(\mu, \sigma^2)$
This allows us to actually represent our model as a *conditional probability distribution*:
$p(y \mid x, \theta) = \mathcal{N}\big(y \mid \mu(x), \sigma^2(x) \big)$
In the canonical/simplest case, we let $\mu$ be a linear function in $x$:
$\mu(x) = w^Tx$
And assume that our noise, $\sigma^2$, is fixed. So, our model parameters would then be $\theta = (w, \sigma^2)$. We can then write our model as:
$p(y \mid x, \theta) = \mathcal{N}\big(y \mid w^Tx, \sigma^2 \big)$
To be clear, this has a nice and simple interpretation, namely:
> Given a specific input point $x$ and parameters $\theta$ (i.e. the weights of our linear transformation and the amount of noise), the conditional probability of a specific $y$ is a normal distribution, centered at mean $w^Tx$, with variance equal to that of our noise parameter.
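As a small illustration of this interpretation, the sketch below evaluates $p(y \mid x, \theta) = \mathcal{N}\big(y \mid w^Tx, \sigma^2 \big)$ at a specific input point. The parameter values are assumed for the example, and `conditional_density` is just a hypothetical helper name:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters theta = (w, sigma^2) -- illustrative values only
w = np.array([2.0, -1.0, 0.5])
sigma = 0.3

def conditional_density(y, x, w, sigma):
    """p(y | x, theta): a Gaussian over y, centered at mean w^T x, variance sigma^2."""
    mu = w @ x
    return norm.pdf(y, loc=mu, scale=sigma)

x = np.array([1.0, 2.0, -1.0])                  # a specific input point
print(conditional_density(-0.5, x, w, sigma))   # density of observing y = -0.5
```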
---
### References
* [Linear regression - Nonlinearity via basis functions](https://www.youtube.com/watch?v=rVviNyIR-fI&t=200s)