# Mean Shift Clustering
Most people who have come across clustering algorithms have learned about **k-means**. Mean shift clustering is a newer and less well-known approach, but it has some important advantages:
* It doesn't require selecting the number of clusters in advance; instead it requires a **bandwidth** to be specified, which can often be chosen automatically.
* It can handle clusters of any shape, whereas k-means (without special extensions) requires that clusters be roughly ball-shaped.
The algorithm is as follows:
* For each data point x in the sample X, find the distance between x and every other point in X
* Create a weight for each point in X by applying the **Gaussian kernel** to that point's distance to x
    * This weighting penalizes points that are further away from x
    * The rate at which the weights fall to zero is determined by the **bandwidth**, which is the standard deviation of the Gaussian
* Update x to the weighted average of all points in X, using the weights from the previous step
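A single update step can be sketched in NumPy as follows. This is a minimal sketch: the function names and the default bandwidth of 2.5 are illustrative assumptions, not fixed by the algorithm.

```python
import numpy as np

def gaussian(d, bw):
    # Gaussian kernel of distance d, with bandwidth bw as the standard deviation
    return np.exp(-0.5 * (d / bw) ** 2)

def meanshift_step(X, bw=2.5):
    # One update: move every point to the weighted average of all points,
    # weighted by the Gaussian kernel of its distance to them
    new_X = np.empty_like(X)
    for i, x in enumerate(X):
        dist = np.linalg.norm(X - x, axis=1)   # distance from x to every point in X
        weight = gaussian(dist, bw)            # closer points get higher weight
        new_X[i] = (weight[:, None] * X).sum(0) / weight.sum()
    return new_X
```

Each step therefore pulls a point toward the local center of mass of its neighborhood, with the bandwidth controlling how wide that neighborhood effectively is.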
Repeating this step iteratively pushes points that are close together ever closer, until they essentially coincide. This can almost be thought of as a form of "iterative **gravity**".
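Iterating the update until points collapse onto their modes is a short loop, and clusters can then be read off as the distinct converged positions. Again a sketch under assumptions: the names, the iteration count, and the rounding tolerance used to group converged points are illustrative choices.

```python
import numpy as np

def meanshift(X, bw=2.5, n_iter=20):
    # Repeatedly move every point to the Gaussian-weighted average of all
    # points; nearby points collapse onto a shared mode
    X = X.copy()
    for _ in range(n_iter):
        new_X = np.empty_like(X)
        for i, x in enumerate(X):
            dist = np.linalg.norm(X - x, axis=1)
            weight = np.exp(-0.5 * (dist / bw) ** 2)
            new_X[i] = (weight[:, None] * X).sum(0) / weight.sum()
        X = new_X
    return X

def cluster_labels(shifted, decimals=3):
    # After convergence, points sharing a mode are numerically identical,
    # so clusters can be read off by rounding and taking unique rows
    _, labels = np.unique(shifted.round(decimals), axis=0, return_inverse=True)
    return labels
```

Note that, unlike k-means, the number of clusters was never specified; it falls out of how many modes the points collapse onto for the chosen bandwidth.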
---
Date: 20230810
Links to:
Tags:
References:
* [Lesson 12: Deep Learning Foundations to Stable Diffusion - YouTube](https://www.youtube.com/watch?v=_xIzPbCgutY&t=2554s)