# Linear Regression

### General Form

Linear regression can be modeled most simply as:

$y(x) = w^T x + \epsilon = \sum_{j=1}^D w_j x_j + \epsilon$

where $\epsilon$ is normally distributed.

### Linear in *weights*

We need to keep in mind that linear regression is linear in terms of its weights, not necessarily in terms of $x$. See [here](https://youtu.be/rVviNyIR-fI?t=469).

### A Probabilistic Perspective

Note that because $\epsilon$, our **residual error**, is normally distributed, we can write it as:

$\epsilon \sim \mathcal{N}(\mu, \sigma^2)$

This allows us to represent our model as a *conditional probability distribution*:

$p(y \mid x, \theta) = \mathcal{N}\big(y \mid \mu(x), \sigma^2(x) \big)$

In the canonical/simplest case, we let $\mu$ be a linear function of $x$:

$\mu(x) = w^T x$

and assume that our noise, $\sigma^2$, is fixed. Our model parameters are then $\theta = (w, \sigma^2)$, and we can write our model as:

$p(y \mid x, \theta) = \mathcal{N}\big(y \mid w^T x, \sigma^2 \big)$

To be clear, this has a nice and simple interpretation, namely:

> Given a specific input point $x$ and parameters $\theta$ (i.e. the weights of our linear transformation and the amount of noise), the conditional probability of a specific $y$ is a normal distribution, centered at mean $w^T x$, with variance equal to that of our noise parameter.

---

References

* [Linear regression - Nonlinearity via basis functions](https://www.youtube.com/watch?v=rVviNyIR-fI&t=200s)
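
### A Minimal Numerical Sketch

To make the conditional-density view concrete, here is a small NumPy/SciPy sketch, not part of the original notes. The "true" weights `w_true`, the noise level `sigma`, and the data-generation step are assumptions invented for illustration. It samples data from $y = w^T x + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$, fits $w$ by least squares (the maximum-likelihood estimate under Gaussian noise), and then evaluates $p(y \mid x, \theta) = \mathcal{N}(y \mid w^T x, \sigma^2)$ at a query point.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed "true" parameters theta = (w, sigma^2) for this toy example
w_true = np.array([2.0, -1.0])   # weights of the linear map (hypothetical)
sigma = 0.5                      # fixed noise standard deviation (hypothetical)

# Generate N points: y = w^T x + eps, with eps ~ N(0, sigma^2)
N, D = 100, 2
X = rng.normal(size=(N, D))
eps = rng.normal(loc=0.0, scale=sigma, size=N)
y = X @ w_true + eps

# Least-squares estimate of w (the MLE when the noise is Gaussian)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Evaluate the conditional density p(y | x, theta) = N(y | w^T x, sigma^2)
x_query = np.array([1.0, 1.0])
y_query = 1.2
density = norm.pdf(y_query, loc=w_hat @ x_query, scale=sigma)

print("estimated weights:", w_hat)
print("p(y=1.2 | x=[1, 1], theta) =", density)
```

The key line is the last `norm.pdf` call: for a fixed $x$, the model is just a Gaussian over $y$ centered at $w^T x$ with variance $\sigma^2$, exactly as in the interpretation quoted above.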