Kernel density estimation (KDE) is a way to approximate the distribution of a dataset using only the points in the dataset. KDE makes few modeling assumptions, so it is a general-purpose technique for modeling the probability distribution that generated your data. To understand KDE, we need some intuition about the kernel function $k(x,q)$: it should be large when $x$ and $q$ are similar and small when they are not. A kernel with these properties is called a similarity kernel because it measures how close $x$ and $q$ are to each other.
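One standard choice with these properties is the Gaussian kernel with bandwidth $h$ (a smoothing parameter you pick):

$$k(x, q) = \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{(x-q)^2}{2h^2}\right)$$

It takes its maximum value when $x = q$ and decays toward 0 as the two points move apart, and it integrates to 1 over $x$, which is what makes the normalization in the next paragraph work out.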

By adding together all of the $k(x,q)$ values, we can measure whether $q$ fits in with the observed data. If $q$ is similar to many examples from the dataset, many of the kernel values will be large and the sum will be large. If $q$ is not similar to any examples, the kernel sum will be small. The KDE model simply divides this sum by the number of points so that the result is a proper probability density, one that integrates to 1 (note that a density can still exceed 1 at a point). The model assigns a high probability density to a query that is close to many examples and a low density otherwise.
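Putting the pieces together for a dataset $\{x_1, \dots, x_n\}$, the density estimate at a query $q$ is

$$\hat{p}(q) = \frac{1}{n} \sum_{i=1}^{n} k(x_i, q)$$

Below is a minimal sketch in Python with NumPy, assuming the Gaussian kernel above; the bandwidth `h` and the example data are illustrative choices, not from the source.

```python
import numpy as np

def gaussian_kernel(x, q, h):
    """Similarity kernel: largest when x == q, decays with distance.
    Integrates to 1 over x, so the average below is a valid density."""
    return np.exp(-((x - q) ** 2) / (2 * h**2)) / (h * np.sqrt(2 * np.pi))

def kde(data, q, h=0.5):
    """Average the kernel values between the query q and every data point."""
    return np.mean(gaussian_kernel(data, q, h))

# Example: queries near the data get high density, far queries get low density.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)
print(kde(data, 0.0))  # near the bulk of the data -> high density
print(kde(data, 5.0))  # far from the data -> low density
```

Note that this naive estimator touches every data point per query, so it costs $O(n)$ per evaluation; that cost is exactly what sketching approaches like RACE (see the reference below) aim to reduce.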
---
Date: 20221030
Links to: [Machine Learning MOC](Machine%20Learning%20MOC.md)
Tags: #review
References:
* [RACE Sketches for Kernel Density Estimation - Randorithms](https://randorithms.com/2020/09/15/RACE-KDE.html)