# Darts (Time Series)

# Overview

First, we can start with some general terminology.

* **Endogenous Variables**: This is the **target** time series (the one we wish to predict). In sklearn terms this would be the `y` (which, due to the autoregressive nature of the problem, is also used to generate `X`).
* **Exogenous Variables**: These would be our **features** in traditional modeling, i.e. our `X`. These are variables that have predictive power for our problem, but that our model does not itself predict.
* **Covariates[^1]**: External data that can be used to help improve forecasts. In the context of forecasting models, the **target** is the series to be forecasted/predicted, and the covariates themselves are not predicted. As with exogenous variables, covariates are yet another way of describing *features*.
* **Past Covariates**: Covariates known only into the past (e.g. measurements, for example, *what the actual temperature was 2 days prior to inference time*).
* **Future Covariates**: Covariates known into the future (e.g. weather forecasts).
* **Static Covariates**: Covariates that are constant over time (e.g. the county that a given node is located in).

### How does Darts do probabilistic prediction?

This is described nicely in [G. Grosch, F. Lässig - Darts: Unifying time series forecasting models from ARIMA to Deep Learning - YouTube](https://www.youtube.com/watch?v=thg10qDqpRE).

> The model does not output a time series directly; rather, it outputs the *parameters*, $\theta$, of a given probability distribution. Using these parameters we can obtain an arbitrary number of sample predictions.

![](Screenshot%202023-05-19%20at%202.10.18%20PM.png)

(A minimal sketch of this "predict parameters, then sample" step is included below.)

### Misc Notes

* When performing quantile regression with LGBM, a model is trained for *each* quantile that we wish to estimate! [darts/lgbm.py at master · unit8co/darts · GitHub](https://github.com/unit8co/darts/blob/master/darts/models/forecasting/lgbm.py#L200)
* If we fit, say, a unique model for each of 10 different quantiles (via the [pinball loss](Quantile%20Regression.md)), then we have started to get useful information about the full distribution (with 100 evenly spaced quantiles that starts to be a nice approximation, under certain conditions). A sketch of the per-quantile fitting is shown below.
* When we call `predict`, each of our 10 fitted models generates a prediction (the target value at its associated quantile). Then, for each sample that we wish to generate (say we want 300 samples, see [here](https://github.com/unit8co/darts/blob/f067f27103ad9327e67938bb59e3e8c098562f78/darts/models/forecasting/regression_model.py#L552)), a random number between 0 and 1 is drawn under the hood, and we linearly interpolate (take a weighted average) between the nearest known quantiles ([darts.utils.likelihood_models — darts documentation](https://unit8co.github.io/darts/_modules/darts/utils/likelihood_models.html#QuantileRegression)). This interpolation step is also sketched below.

### Terminology

* `components` are features

### What other feature engineering could they do with us

* diff method
* ratio method

Both are sketched briefly at the end of the sketches below.
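To make the "predict parameters, then sample" idea concrete, here is a minimal NumPy sketch. This is **not** the Darts API; the Gaussian parameterization, the 5-step horizon, and all numbers are assumptions purely for illustration.

```python
import numpy as np

# Hypothetical predicted Gaussian parameters (theta) for 5 future time steps:
# a mean and a standard deviation per step.
theta_mu = np.array([10.0, 10.5, 11.2, 11.8, 12.1])
theta_sigma = np.array([0.5, 0.6, 0.7, 0.9, 1.1])

rng = np.random.default_rng(42)
n_samples = 300

# Draw sample paths from the predicted distributions; shape (n_samples, n_steps).
sample_paths = rng.normal(loc=theta_mu, scale=theta_sigma,
                          size=(n_samples, theta_mu.shape[0]))

# Any quantile of interest can then be estimated empirically from the samples.
p10, p50, p90 = np.quantile(sample_paths, [0.1, 0.5, 0.9], axis=0)
```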
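For the "one model per quantile" point in the Misc Notes: a minimal sketch using plain `lightgbm.LGBMRegressor` with its built-in quantile (pinball) objective on made-up data. This is the underlying idea only, not the Darts `LightGBMModel` wrapper; the feature matrix and quantile levels are assumptions.

```python
import numpy as np
from lightgbm import LGBMRegressor

rng = np.random.default_rng(0)

# Made-up lag-feature matrix and target, standing in for the tabularized series.
X = rng.normal(size=(500, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.3, size=500)

quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]

# One LightGBM model per quantile, each trained with the quantile (pinball) loss.
models = {
    q: LGBMRegressor(objective="quantile", alpha=q, n_estimators=100).fit(X, y)
    for q in quantiles
}

# Each model predicts the target value at its own quantile.
quantile_preds = {q: m.predict(X[:5]) for q, m in models.items()}
```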
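And for the `predict` step: a sketch of generating samples by drawing a uniform random number and linearly interpolating between the fitted quantiles. The quantile values are made up, and `np.interp` here stands in for Darts' own interpolation logic in `likelihood_models`.

```python
import numpy as np

# Quantile levels the models were fitted on, and (made-up) predicted values
# at those levels for a single future time step.
quantile_levels = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
quantile_values = np.array([8.1, 9.0, 9.6, 10.3, 11.4])

rng = np.random.default_rng(1)
n_samples = 300

# Draw u ~ Uniform(0, 1) per sample and linearly interpolate between the nearest
# known quantiles; np.interp clips u outside [0.05, 0.95] to the end-point values.
u = rng.uniform(size=n_samples)
samples = np.interp(u, quantile_levels, quantile_values)
```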
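Finally, a hypothetical pandas sketch of what the "diff" and "ratio" feature-engineering ideas might look like; the series values, column names, and lag of 1 are all assumptions.

```python
import pandas as pd

# Hypothetical target series.
s = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

features = pd.DataFrame({
    "diff_1": s.diff(1),         # additive change vs. the previous step
    "ratio_1": s / s.shift(1),   # multiplicative change vs. the previous step
})
```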
# Notes

* sub estimators should maintain a universal interface like sktime
    * could subclass
    * could have an adapter/shim to get things into the format of darts
* Train and inference functions?
* Who is calling the distributed estimator?

---
Date: 20230519
Links to:
Tags:
References:
* [G. Grosch, F. Lässig - Darts: Unifying time series forecasting models from ARIMA to Deep Learning - YouTube](https://www.youtube.com/watch?v=thg10qDqpRE)

[^1]: [Covariates — darts documentation](https://unit8co.github.io/darts/userguide/covariates.html)