# Machine Learning vs. Time Series
### Notes
* ML's emphasis on flexible nonparametric modeling of conditional-mean nonlinearity doesn't play a big role in TS. Of course there are the traditional TS conditional-mean nonlinearities: smooth non-linear trends, seasonal shifts, and so on. But there's very little evidence of important conditional-mean nonlinearity in the covariance-stationary (de-trended, de-seasonalized) dynamics of most economic time series. Not that people haven't tried hard -- really hard -- to find it, with nearest neighbors, neural nets, random forests, and lots more.
* ML tends to focus on estimating **conditional expectations** (i.e. what is the expected value of $y$, *given* that we observe some context $x$). Econometrics, on the other hand, aims more at causality via the estimation of **partial derivatives** (how does $y$ change when one input changes, holding the others fixed).
* If you feed time-series data to an ML algorithm, you must explicitly encode the time dynamics (e.g. lagged values, trend, seasonality) as features, or the model will miss a lot of structure that would help in forecasting (see the lag-feature sketch after this list).
* Time series data is ***not independent***. This is similar to image data, where the value of a pixel is not independent of the pixel next to it.
* The big difference between time-series data and “regular” data is the violation of the independent and identically distributed (i.i.d.) assumption, which is core to many standard machine learning models. This assumption states that every observation is independent of all others, and that all observations come from the same generative distribution.
* The central point that differentiates time-series problems from most other statistical problems is that in a time series, observations are not mutually independent. Rather a single chance event may affect all later data points.
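To make the point about encoding time dynamics concrete, here is a minimal sketch (assuming pandas and scikit-learn; the simulated series and column names are made up) of turning a univariate series into a supervised-learning problem by adding lagged values and a seasonal indicator as features:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical univariate series: an AR(1)-style process with a seasonal bump.
rng = np.random.default_rng(0)
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5)

df = pd.DataFrame({"y": y})
# Encode the time dynamics explicitly as features: lagged values and a seasonal indicator.
df["lag1"] = df["y"].shift(1)
df["lag2"] = df["y"].shift(2)
df["month"] = np.arange(n) % 12
df = df.dropna()

# Any off-the-shelf regressor can now "see" the dynamics through these columns.
X, target = df[["lag1", "lag2", "month"]], df["y"]
model = LinearRegression().fit(X.iloc[:-20], target.iloc[:-20])
print("R^2 on held-out tail:", model.score(X.iloc[-20:], target.iloc[-20:]))
```

Without the `lag1`, `lag2`, and `month` columns, a standard regressor would treat each row as an unrelated observation and miss the serial structure entirely.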
### On Independence and Identical Distribution
Mathematically, we state that two events $A$ and $B$ are **independent** if:
$P(A,B) = P(A)P(B)$
This can be said in a way that highlights *information flow*. Two events $A$ and $B$ are independent if *knowledge about the outcome of* $A$ provides *no information* about the outcome of $B$.
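A quick worked example: flip two fair coins, and let $A$ be "the first flip is heads" and $B$ be "the second flip is heads". Then
$P(A,B) = \tfrac{1}{4} = \tfrac{1}{2} \cdot \tfrac{1}{2} = P(A)P(B)$
so the events are independent: knowing the first flip tells us nothing about the second. If instead $B$ were "both flips are heads", then $P(A,B) = \tfrac{1}{4}$ while $P(A)P(B) = \tfrac{1}{2} \cdot \tfrac{1}{4} = \tfrac{1}{8}$, so those events are not independent.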
Identically distributed is even easier to understand: it simply means that every random variable in the sequence is drawn from, wait for it, an *identical* distribution.
In terms of [random variables](Random%20Variable.md), we say that they are independent if:
$F_{XY}(x,y) = F_X(x) F_Y(y) \;\;\;\;\; \text{for each } x \in \mathbb{R}, \; y \in \mathbb{R}$
Say we have a Bernoulli random variable representing the outcome of a coin flip. We then have a sequence of these random variables (each with a corresponding outcome):
$X_1, X_2, \dots, X_n$
The sample space for this sequence is all possible $n$-tuples of $0$s and $1$s. In this case the variables are clearly IID: the outcome of any flip does not impact any other, and each is Bernoulli distributed with the same probability $p$ of heads.
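A minimal simulation sketch of such an IID Bernoulli sequence (the parameter values here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.5          # probability of heads (assumed)
n = 10_000

# n independent Bernoulli(p) draws: 1 = heads, 0 = tails.
flips = rng.binomial(n=1, p=p, size=n)

# Independence check: the empirical correlation between consecutive flips is ~0.
lag1_corr = np.corrcoef(flips[:-1], flips[1:])[0, 1]
print(f"sample mean: {flips.mean():.3f} (should be close to p={p})")
print(f"lag-1 correlation: {lag1_corr:.3f} (should be close to 0)")
```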
However, consider a sequence of random variables representing the price of a stock (say, Google) on each day:

$P_1, P_2, \dots, P_n$

We see that IID clearly does not apply here:
* Not Independent: The price of a stock *today* depends on its price *yesterday*
* Not Identically Distributed: The variance clearly changes over time, meaning that the underlying distribution is changing (see the simulation sketch below).
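Both points are easy to see in a toy simulation, assuming a simple random-walk model for (hypothetical) log prices:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000

# Toy model: daily log returns are IID, but the *price level* is their cumulative sum,
# so today's price depends on every past shock.
returns = rng.normal(loc=0.0005, scale=0.02, size=n)
price = 100 * np.exp(np.cumsum(returns))

# Not independent: the price level is highly autocorrelated at lag 1.
lag1_corr = np.corrcoef(price[:-1], price[1:])[0, 1]
print(f"lag-1 autocorrelation of prices: {lag1_corr:.3f}")

# Not identically distributed: the spread of prices grows over time,
# as a rough comparison of the first and last thirds of the sample shows.
print(f"std, first third: {price[: n // 3].std():.2f}")
print(f"std, last third:  {price[-(n // 3):].std():.2f}")
```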
### A Key Distinction
It is worth noting a key distinction that is often lost when discussing time series and ML. In ML we are indeed often dealing with IID samples, but that does not mean that observing the outcome $y_1$ at one data point $x_1$ provides no information about the outcome at a nearby point $x_2$. For instance, consider the following situation:
$(x_1 = 2.0, \; y_1 = 9.0)$
$(x_2 = 2.1, \; y_2 = ?)$
Here we have two observations with inputs $x_1$ and $x_2$; we know $y_1$ but not $y_2$. A fundamental assumption of ML is that the underlying function can be *approximated*, so we can use **interpolation** to estimate unknown values from nearby observed ones. In this case, the fact that our $x$'s are very similar means our $y$'s should be similar as well, so we would predict $y_2 \approx 9.0$. The samples can still be IID draws from the joint distribution of $(x, y)$; it is the smoothness of the underlying function, not dependence between observations, that lets information flow between them.
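A minimal sketch of that interpolation idea, assuming the data come IID from a hypothetical underlying function $f(x) = x^2 + 5$ (chosen only so that $f(2.0) = 9.0$ matches the example above):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# IID training samples from the assumed function f(x) = x**2 + 5, plus noise.
x_train = rng.uniform(0, 4, size=200)
y_train = x_train**2 + 5 + rng.normal(scale=0.1, size=200)

# A nearest-neighbours regressor interpolates: its prediction at a new x is
# driven by the observed y values of nearby training points.
model = KNeighborsRegressor(n_neighbors=5).fit(x_train.reshape(-1, 1), y_train)

# Prediction at x = 2.1, informed by neighbours such as x ≈ 2.0:
print(model.predict([[2.1]]))  # close to 2.1**2 + 5 = 9.41
```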