# Machine Learning vs. Time Series
### Notes
* ML's emphasis on flexible nonparametric modeling of conditional-mean nonlinearity doesn't play a big role in TS. Of course there are the traditional TS conditional-mean nonlinearities: smooth non-linear trends, seasonal shifts, and so on. But there's very little evidence of important conditional-mean nonlinearity in the covariance-stationary (de-trended, de-seasonalized) dynamics of most economic time series. Not that people haven't tried hard -- really hard -- to find it, with nearest neighbors, neural nets, random forests, and lots more.
* ML tends to focus on estimating **conditional expectations** (i.e. what is the expected value of $y$, *given* that we observe some context $x$). Econometrics, on the other hand, aims more at causality via the estimation of **partial derivatives** (how does $y$ change when one input changes, holding the others fixed).
* If you feed time-series data to an ML algorithm, you must explicitly encode the time dynamics (e.g. lagged values, trend, seasonality) as features, or the model will miss a lot of structure that would help in forecasting (see the lag-feature sketch after this list).
* Time series data is ***not independent***. This is similar to image data, where the value of a pixel is not independent of the pixel next to it.
* The big difference between time-series data and “regular” data is the violation of the independent and identically distributed (i.i.d.) assumption, which is core to many standard machine learning models. This assumption states that every observation is independent of all others, and that all observations come from the same generative distribution.
* The central point that differentiates time-series problems from most other statistical problems is that in a time series, observations are not mutually independent. Rather a single chance event may affect all later data points.
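To make the point about encoding time dynamics concrete, here is a minimal sketch (assuming pandas and scikit-learn; the simulated series and column names are made up) of turning a univariate series into a supervised-learning problem by adding lagged values and a seasonal indicator as features:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical univariate series: an AR(1)-style process with a seasonal bump.
rng = np.random.default_rng(0)
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5)

df = pd.DataFrame({"y": y})
# Encode the time dynamics explicitly as features: lagged values and a seasonal indicator.
df["lag1"] = df["y"].shift(1)
df["lag2"] = df["y"].shift(2)
df["month"] = np.arange(n) % 12
df = df.dropna()

# Any off-the-shelf regressor can now "see" the dynamics through these columns.
X, target = df[["lag1", "lag2", "month"]], df["y"]
model = LinearRegression().fit(X.iloc[:-20], target.iloc[:-20])
print("R^2 on held-out tail:", model.score(X.iloc[-20:], target.iloc[-20:]))
```

Without the `lag1`, `lag2`, and `month` columns, a standard regressor would treat each row as an unrelated observation and miss the serial structure entirely.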
### On Independence and Identical Distribution
Mathematically, we state that two events $A$ and $B$ are **independent** if:
$P(A,B) = P(A)P(B)$
This can be said in a way that highlights *information flow*. Two events $A$ and $B$ are independent if *knowledge about the outcome of* $A$ provides *no information* about the outcome of $B$.
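A quick worked example: flip two fair coins, and let $A$ be "the first flip is heads" and $B$ be "the second flip is heads". Then
$P(A,B) = \tfrac{1}{4} = \tfrac{1}{2} \cdot \tfrac{1}{2} = P(A)P(B)$
so the events are independent: knowing the first flip tells us nothing about the second. If instead $B$ were "both flips are heads", then $P(A,B) = \tfrac{1}{4}$ while $P(A)P(B) = \tfrac{1}{2} \cdot \tfrac{1}{4} = \tfrac{1}{8}$, so those events are not independent.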
Identically distributed is even easier to understand: it simply means that every random variable in the sequence is drawn from, wait for it, an *identical* distribution.
In terms of [random variables](Random%20Variable.md), we say that they are independent if:
$F_{XY}(x,y) = F_X(x) F_Y(y) \;\;\;\;\; \text{for each } x \in \mathbb{R}, \; y \in \mathbb{R}$
Say we have a Bernoulli random variable representing the outcome of a coin flip. We then have a sequence of these random variables (each with a corresponding outcome):
$X_1, X_2, \dots, X_n$
The sample space for this sequence is all possible $n$-tuples of $0$s and $1$s. In this case the variables are clearly IID: the outcome of any flip does not impact any other, and each is Bernoulli distributed with the same probability $p$ of heads.
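A minimal simulation sketch of such an IID Bernoulli sequence (the parameter values here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.5          # probability of heads (assumed)
n = 10_000

# n independent Bernoulli(p) draws: 1 = heads, 0 = tails.
flips = rng.binomial(n=1, p=p, size=n)

# Independence check: the empirical correlation between consecutive flips is ~0.
lag1_corr = np.corrcoef(flips[:-1], flips[1:])[0, 1]
print(f"sample mean: {flips.mean():.3f} (should be close to p={p})")
print(f"lag-1 correlation: {lag1_corr:.3f} (should be close to 0)")
```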
However, consider a sequence of random variables representing the price of a stock (say, Google) on each day:

$P_1, P_2, \dots, P_n$

We see that IID clearly does not apply here:
* Not Independent: The price of a stock *today* depends on its price *yesterday*
* Not Identically Distributed: The variance clearly changes over time, meaning that the underlying distribution is changing (see the simulation sketch below).
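Both points are easy to see in a toy simulation, assuming a simple random-walk model for (hypothetical) log prices:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000

# Toy model: daily log returns are IID, but the *price level* is their cumulative sum,
# so today's price depends on every past shock.
returns = rng.normal(loc=0.0005, scale=0.02, size=n)
price = 100 * np.exp(np.cumsum(returns))

# Not independent: the price level is highly autocorrelated at lag 1.
lag1_corr = np.corrcoef(price[:-1], price[1:])[0, 1]
print(f"lag-1 autocorrelation of prices: {lag1_corr:.3f}")

# Not identically distributed: the spread of prices grows over time,
# as a rough comparison of the first and last thirds of the sample shows.
print(f"std, first third: {price[: n // 3].std():.2f}")
print(f"std, last third:  {price[-(n // 3):].std():.2f}")
```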
### A Key Distinction
It is worth noting a key distinction that is often lost when discussing time series and ML. In ML we are indeed often dealing with IID samples, but that does not mean that observing the outcome $y_1$ at one data point $x_1$ provides no information about the outcome at a nearby point $x_2$. For instance, consider the following situation:
$(x_1 = 2.0, \; y_1 = 9.0)$
$(x_2 = 2.1, \; y_2 = ?)$
Here we have two observations with inputs $x_1$ and $x_2$; we know $y_1$ but not $y_2$. A fundamental assumption of ML is that the underlying function can be *approximated*, so we can use **interpolation** to estimate unknown values from nearby observed ones. In this case, the fact that our $x$'s are very similar means our $y$'s should be similar as well, so we would predict $y_2 \approx 9.0$. The samples can still be IID draws from the joint distribution of $(x, y)$; it is the smoothness of the underlying function, not dependence between observations, that lets information flow between them.
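A minimal sketch of that interpolation idea, assuming the data come IID from a hypothetical underlying function $f(x) = x^2 + 5$ (chosen only so that $f(2.0) = 9.0$ matches the example above):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# IID training samples from the assumed function f(x) = x**2 + 5, plus noise.
x_train = rng.uniform(0, 4, size=200)
y_train = x_train**2 + 5 + rng.normal(scale=0.1, size=200)

# A nearest-neighbours regressor interpolates: its prediction at a new x is
# driven by the observed y values of nearby training points.
model = KNeighborsRegressor(n_neighbors=5).fit(x_train.reshape(-1, 1), y_train)

# Prediction at x = 2.1, informed by neighbours such as x ≈ 2.0:
print(model.predict([[2.1]]))  # close to 2.1**2 + 5 = 9.41
```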