# Mahalanobis Distance
Euclidean distance is tied to the basis (coordinate system) you are in. If that basis is not aligned with your data's *natural geometry* (the directions and scales of its actual variation), Euclidean distance will be misleading. Specifically:
1. Differences along directions of *high variance* will be *overstated* (look farther than they should)
2. Differences along directions of *low variance* will be *understated* (look closer than they should)
Mahalanobis distance fixes this by performing a change of basis that whitens the data (removing correlations and normalizing variances), so that Euclidean distance in the transformed space reflects the data's true geometric structure. It allows us to measure distance in a coordinate system aligned with the data's geometry.
This is easy to think about. Euclidean distance naturally reflects the geometry of your data only when two conditions hold:
1. Axes are orthogonal (no correlation between features) — $\Sigma$ is diagonal in your basis
2. Equal scale along all axes (all variances are equal, so each axis contributes equally)
But if either of those is not satisfied, then Euclidean distance won't correctly reflect the geometry of your data.
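As a quick numeric illustration of the two distortions above, here is a minimal sketch (the covariance values are made up for the example): two steps of the same Euclidean length, one along a high-variance axis and one along a low-variance axis, end up with very different Mahalanobis lengths.

```python
import numpy as np

# Toy covariance: big variance along x1 (std = 3), small along x2 (std = 0.5), no correlation.
Sigma = np.diag([9.0, 0.25])
Sigma_inv = np.linalg.inv(Sigma)

def d_M(x, y, M):
    """Quadratic-form distance sqrt((x - y)^T M (x - y))."""
    d = x - y
    return np.sqrt(d @ M @ d)

origin = np.zeros(2)
step_high_var = np.array([3.0, 0.0])   # 3 units along the high-variance axis = 1 std dev
step_low_var  = np.array([0.0, 3.0])   # 3 units along the low-variance axis  = 6 std devs

print(np.linalg.norm(step_high_var), np.linalg.norm(step_low_var))  # 3.0 3.0 -> same Euclidean length
print(d_M(step_high_var, origin, Sigma_inv))  # 1.0 -> Euclidean overstated this difference
print(d_M(step_low_var,  origin, Sigma_inv))  # 6.0 -> Euclidean understated this one
```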
Note that "choosing a geometry" really amounts to choosing a metric $M$ that defines how lengths are measured:
$d(x,y) = \sqrt{(x-y)^\top M (x-y)}$
- If $M = I$ → standard Euclidean geometry (equal, orthogonal axes)
- If $M = \Sigma^{-1}$ → Mahalanobis geometry (ellipsoid aligned with data covariance)
- Other $M$ → other geometries (weighted features, custom similarity, etc.)
And to be clear: we *must* pick $M$. Frequently we just do so *implicitly* by letting $M = I$. But that is still a *choice*—one we just don't think about making. The danger is *not realizing* you're making a choice.
Picking $M$ is saying: “This is what I consider equal-length steps in different directions.” In statistics, choosing $M = \Sigma^{-1}$ means: “One standard deviation in any principal direction counts the same amount.”
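Here is a minimal sketch of that choice made explicit: one distance function parameterized by $M$, evaluated with the three choices listed above (the covariance matrix and the custom weights are arbitrary illustrative values).

```python
import numpy as np

def dist(x, y, M):
    """General quadratic-form distance d(x, y) = sqrt((x - y)^T M (x - y))."""
    d = x - y
    return np.sqrt(d @ M @ d)

x = np.array([2.0, 1.0])
y = np.array([0.0, 0.0])

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])           # some data covariance (illustrative)

print(dist(x, y, np.eye(2)))             # M = I          -> standard Euclidean geometry
print(dist(x, y, np.linalg.inv(Sigma)))  # M = Sigma^-1   -> Mahalanobis geometry
print(dist(x, y, np.diag([1.0, 5.0])))   # other M        -> custom weighted features
```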
We can view $M$ in either the [Active or Passive View](Active%20and%20Passive%20View%20-%20Change%20of%20Basis%20and%20Linear%20Transformations.md).
# Overview 1
**Mahalanobis distance** is a measure of the distance between a point $x$ and a **probability distribution** $D$. Given $D$, parameterized by a mean $\mu = [\mu_1, \dots, \mu_N]$ and a covariance matrix $\Sigma$, the Mahalanobis distance between $x$ and $D$ is:
$D_M(x, D) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$
We can visualize this below. Note that both $p_1$ and $p_2$ have the same Euclidean distance to the point $\mu$. However, if we ask for the distance from each point to the *distribution* $D$, which is centered at $\mu$ and has covariance $\Sigma$, we see that $p_1$ is much closer than $p_2$.

If $\Sigma$ is the identity matrix, then this reduces to standard Euclidean distance.
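A small sketch of the picture described above (the specific $\mu$, $\Sigma$, $p_1$, and $p_2$ are invented for illustration): both points sit at the same Euclidean distance from $\mu$, but $p_1$ lies along the direction the distribution actually varies in, so its Mahalanobis distance is much smaller. The last line also checks the identity-matrix case.

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[5.0, 4.0],
                  [4.0, 5.0]])           # correlated: stretched along the [1, 1] direction
Sigma_inv = np.linalg.inv(Sigma)

def d_M(x, mu, Sigma_inv):
    d = x - mu
    return np.sqrt(d @ Sigma_inv @ d)

# Two points at the same Euclidean distance (3.0) from mu.
p1 = 3 / np.sqrt(2) * np.array([1.0, 1.0])    # along the direction of high variance
p2 = 3 / np.sqrt(2) * np.array([1.0, -1.0])   # against it

print(np.linalg.norm(p1 - mu), np.linalg.norm(p2 - mu))  # 3.0 3.0
print(d_M(p1, mu, Sigma_inv))   # 1.0 -> p1 is close to the distribution
print(d_M(p2, mu, Sigma_inv))   # 3.0 -> p2 is far from it

# With Sigma = I the formula collapses back to Euclidean distance.
print(d_M(p2, mu, np.eye(2)))   # 3.0
```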
In the next section I'll look at how the Mahalanobis distance can be used to compare two points under a geometry defined by $\Sigma$. All this really means is that, implicitly, under the hood, one of these points is treated as the mean (as it is above).
# Overview 2
Mahalanobis Distance is a generalization of Euclidean Distance. Euclidean distance assumes that all directions are equally important, that coordinates are uncorrelated, and that the geometry is [Isotropic](Isotropic.md). We define it as:
$D_E(x, y) = \sqrt{(x - y)^T I (x - y)} = \|x - y\|_2$
Where $I$ is the identity matrix.
Mahalanobis distance is defined as:
$D_M(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$
Where now $\Sigma$ is the covariance matrix. Its inverse encodes how to "weight" directions based on how much variation there is along them: high-variance directions are down-weighted and low-variance directions are up-weighted.
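A minimal sketch of the two formulas side by side (the points and covariance are illustrative); `scipy.spatial.distance.mahalanobis`, which takes the *inverse* covariance as its third argument, is used as a cross-check against the manual computation.

```python
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])           # illustrative covariance
Sigma_inv = np.linalg.inv(Sigma)

# D_E(x, y) = sqrt((x - y)^T I (x - y))
print(euclidean(x, y))

# D_M(x, y) = sqrt((x - y)^T Sigma^-1 (x - y))
d = x - y
print(np.sqrt(d @ Sigma_inv @ d))
print(mahalanobis(x, y, Sigma_inv))      # scipy agrees with the manual version
```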
Below we can see that the underlying [Space](Space.md) is the same, but the level sets (different fixed distances) are quite different. We say that Euclidean distance is [Isotropic](Isotropic.md) (level sets are circles). On the other hand Mahalanobis distance is [Anisotropic](Anisotropic.md) (level sets are ellipses).

When you think about different distance metrics being applied to a space, remember [Space vs Geometry](Geometry.md#Space%20vs%20Geometry). A distance metric defines a geometry, but it won't alter the underlying space.
However, there is an equivalence between applying Mahalanobis distance to a space and looking at its resulting geometry, and transforming the space in such a way that Euclidean distance in the new space equals Mahalanobis distance in the original space. Specifically, if we take our original space and apply the linear transformation $f$:
$f(x) = L^{-1}(x - \mu)$
Where $\Sigma = LL^T$, then:
$D_M(x, y) = \|f(x) - f(y)\|_2$
That is: Measuring Mahalanobis distance in the original space is **equivalent** to transforming the space and measuring Euclidean distance in the new coordinates.
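A minimal sketch of that equivalence, assuming an illustrative $\mu$, $\Sigma$, and pair of points: $L$ is the Cholesky factor of $\Sigma$, $f$ is the whitening map above, and the two printed numbers agree up to floating point.

```python
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])            # illustrative covariance
Sigma_inv = np.linalg.inv(Sigma)

x = np.array([2.5, 0.0])
y = np.array([-1.0, 1.0])

# Mahalanobis distance in the original space.
d = x - y
d_M = np.sqrt(d @ Sigma_inv @ d)

# Whitening transform f(v) = L^-1 (v - mu), with Sigma = L L^T (Cholesky factor).
L = np.linalg.cholesky(Sigma)
f = lambda v: np.linalg.solve(L, v - mu)

# Euclidean distance in the transformed space matches.
d_E_transformed = np.linalg.norm(f(x) - f(y))
print(d_M, d_E_transformed)               # equal (up to floating point)
```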
---
This is a measure used in multivariate statistical testing. Below we can see that, in terms of Euclidean distance, the red and blue squares have the same distance from the mean (the blue plus sign). However, we can intuitively see that the red square is in a sense more of an outlier; it does not fall within the band of points.

This highlights that when we are dealing with data that has **covariance**, Euclidean distance is not sufficient.
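A sketch of that picture with simulated correlated data (the data and the two test points are invented for illustration): both test points are equally far from the sample mean in Euclidean terms, but only the one lying off the band gets a large Mahalanobis distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a correlated "band" of points, like the scatter described above.
Sigma_true = np.array([[3.0, 2.5],
                       [2.5, 3.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma_true, size=500)

mu_hat = X.mean(axis=0)
Sigma_hat_inv = np.linalg.inv(np.cov(X, rowvar=False))

def d_M(x):
    d = x - mu_hat
    return np.sqrt(d @ Sigma_hat_inv @ d)

blue = mu_hat + np.array([2.0,  2.0])    # inside the band (along the correlation)
red  = mu_hat + np.array([2.0, -2.0])    # off the band (against the correlation)

print(np.linalg.norm(blue - mu_hat), np.linalg.norm(red - mu_hat))  # same Euclidean distance
print(d_M(blue))   # small -> consistent with the data
print(d_M(red))    # large -> stands out as an outlier
```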
What we can do, though, is perform a change of basis, so that instead of $x_1$ and $x_2$ we have two new basis vectors: the (scaled) eigenvectors of the covariance matrix (also known as the principal components):

We see here that the red and blue lines (the spans of the eigenvectors) form our new basis. In this basis there is no longer any covariance, and Euclidean distance works as intended. Mathematically this is written as below, where $S^{-1}$ represents the inverse covariance matrix (applying it amounts to a change of basis to our eigenbasis, together with a rescaling by the eigenvalues):
$D_M(x, \mu) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$
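A minimal sketch of that change of basis, using the eigendecomposition $S = V \Lambda V^T$ (the covariance matrix is an illustrative one): projecting onto the eigenvectors and rescaling each coordinate by $1/\sqrt{\lambda_i}$ removes the covariance, and plain Euclidean distance in those coordinates reproduces the Mahalanobis distance.

```python
import numpy as np

mu = np.array([0.0, 0.0])
S = np.array([[5.0, 4.0],
              [4.0, 5.0]])                 # illustrative covariance
S_inv = np.linalg.inv(S)

# Eigendecomposition S = V diag(lam) V^T (eigenvectors = principal components).
lam, V = np.linalg.eigh(S)

def to_eigenbasis(x):
    """Express x - mu in the eigenbasis, with each axis rescaled to unit variance."""
    return (V.T @ (x - mu)) / np.sqrt(lam)

x = np.array([2.0, 3.0])
d = x - mu

# Mahalanobis distance via S^-1 ...
print(np.sqrt(d @ S_inv @ d))
# ... equals Euclidean distance in the (scaled) eigenbasis.
print(np.linalg.norm(to_eigenbasis(x) - to_eigenbasis(mu)))
```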
---
Date: 20220119
Links to: [Machine-Learning-MOC](Machine-Learning-MOC.md)
Tags:
References:
* [23: Mahalanobis distance - YouTube](https://www.youtube.com/watch?v=spNpfmWZBmg)