# T-Digest
### Raison d'être
An [Online-Algorithm](Online-Algorithm.md) that allows us to compute approximate quantiles with adjustable accuracy limits.
### Overview
We may start by asking why it is even so important to estimate quantiles? Recall their definition:
> **Quantile (or percentile)**: A value that is used to split a group (vector/list) of numbers in a particular way. The **median** (50th percentile) is the value that is used to split a list of numbers into two equal groups. Percentiles can be used to reconstruct the [Cumulative Distribution Function](Cumulative%20Distribution%20Function.md).
It is important to note that T-Digest has variable accuracy and *constant relative accuracy*. This is a very important property to have, ensuring that it is more accurate towards the 99.9th percentile and less accurate towards the median.
### Original Approach
The way that T-Digest originally achieved its goals was as follows:
* It kept clusters (similar to k means in 1 dimension), where the size of the cluster was allowed to be large in the middle (where q ~ 0.5) and small at the 0 and 1 end
* Choosing how that size was selected could be done via $size = q(1 - q)$, ensuring that the size will decrease towards the end. This allows for us to have the *accuracy properties* that we want!
*
###
* See
* Input: Any sequence of numeric observations
* numeric observations paired with a mass
* The sketch is an estimate of the CDF


Under the hood T-Digest stores a bunch of clusters. A cluster is a location and a mass. It is a location somewhere in your data space.

When you update your T-Digest you have an incoming observation and you locate the cluster in the T-Digest nearest it (green below) and then you add the mass and update the location based on the weighted average.

Each cluster can be assigned a quantile. That is the sum of the mass to the left, divided by the total mass.




---
Date: 20210831
Links to:
Tags: #todo
References:
* [Tdunning overview](https://www.youtube.com/watch?v=CR4-aVvjE6A)
* [# Sketching Data with T Digest - youtube talk](https://www.youtube.com/watch?v=ETUYhEZRtWE)
* [PDF paper by dunning (marked up by me)](https://drive.google.com/file/d/11qk76KQHBwo6ZKuj8cKT-HJpD41Rmida/view?usp=sharing)