# Mean Variance Optimization

The best entry point into understanding **Mean Variance Optimization (MVO)** is through its core mathematical objects. There are two main spaces we are interested in: **return space** and **weight space**. They are *not* the same space, even though they are both $\mathbb{R}^n$. There are four main objects of interest that live in these spaces:

| | **<span style="color:rgb(192, 0, 0)">Returns vector</span>** | **<span style="color:rgb(192, 0, 0)">Mu Vector</span>** | **<span style="color:rgb(192, 0, 0)">Covariance of returns</span>** | **<span style="color:rgb(192, 0, 0)">Portfolio weights vector</span>** |
| --- | --- | --- | --- | --- |
| **Symbol** | $R = [R_1, \dots, R_n]$ | $\mu = \mathbb{E}[R]$ | $\Sigma = \text{Cov}(R)$ | $w = [w_1, \dots, w_n]$ |
| **Definition** | Returns of each asset at a specific point in time | Mean return of each asset | Covariance matrix computed from historical return data | Proportion of capital allocated to each asset |
| **Space** | Return | Return | Return | Weight |
| **Notes** | [Random Variable](Random%20Variable.md) | Center of the $\Sigma$ ellipse in return space | Encodes variances and correlations of asset returns | Typically treated as fixed in analysis |
| **Geometry / Role in MD & MVO** | A _point_ in return space; the distribution of $R$ has shape given by $\Sigma$ | Origin point for Mahalanobis distance contours; in MVO, $\Sigma^{-1}\mu$ is the risk-adjusted optimal direction | Defines the _Mahalanobis geometry_ in return space: ellipses of equal risk; whitening by $\Sigma^{-1/2}$ makes them spheres | In weight space, $\Sigma$ defines the _variance ellipse_ $w^T \Sigma w = \text{const}$; $\Sigma^{-1}\mu$ gives the optimal direction in unconstrained MVO |

We can visualize these spaces and their objects. Until otherwise stated, we'll set $n=2$, so both return and weight space are $\mathbb{R}^2$. Let's start simple and just look at a single, fixed $r \sim R$ and $w$. We can compute the portfolio return, $r_p$, via their dot product: $r_p = w^T r$.

![center](Pasted%20image%2020250814155013.png)

But we aren't interested in a specific $r_p$ based on a specific $r$. We want to see the distribution of $R_p$ given the distribution of $R$ (that we have or could observe). This means we need to look at the [Random Variable](Random%20Variable.md) $R$. From this we can perform another dot product and get a new random variable, $R_p = w^T R = w_1 R_1 + \dots + w_n R_n$. This is all visualized below. In the left plot we have our fixed $w$ vector in weight space. In the middle plot, we have all of our $r \sim R$ observed points in return space. These are then *projected* onto the direction defined by $w$. And finally, the right plot shows that the distribution of projections is equivalent to the dot product $w^T R$. This will always hold when $w$ has unit length.

![](Pasted%20image%2020250814155140.png)

Of course, if we vary either $w$ or the samples $r \sim R$, the distribution of returns $R_p$ will change. But a key idea is this:

> $R_p$ is a random variable.
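To make this concrete, here is a minimal numpy sketch of the projection view. The two-asset return distribution and the weight vector are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-asset return samples r ~ R (n = 2), with some correlation
n_samples = 10_000
mu = np.array([0.05, 0.02])
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])
returns = rng.multivariate_normal(mu, sigma, size=n_samples)  # shape (10000, 2)

# Fixed, unit-length portfolio weight vector w
w = np.array([0.6, 0.8])
w = w / np.linalg.norm(w)

# Portfolio return samples: R_p = w^T R, one scalar per observed r
r_p = returns @ w

# Projecting each r onto the direction of w gives the same scalars,
# since w has unit length
projections = returns @ w / np.linalg.norm(w)
assert np.allclose(r_p, projections)

print("mean of R_p:    ", r_p.mean())   # ~ w^T mu
print("variance of R_p:", r_p.var())    # ~ w^T Sigma w
```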
As a random variable, $R_p$ has a distribution, and we can ask questions about that distribution. One question we can ask is: what is its [Variance](Variance.md)? In other words, for a portfolio $w$, what is the variance of its returns, $\text{Var}(R_p)$? To answer this we need to introduce our third object: $\Sigma$, the covariance matrix of $R$. Given $\Sigma$, we can compute the variance of returns simply as:

$\text{Var}(R_p) = w^T \Sigma w$

This $\Sigma$ object is incredibly important when reasoning about what the optimizer is going to do. There are two geometric interpretations that we must be able to hold in our heads:

###### $\Sigma$ as an ellipse in return space
We start with a cloud of return vectors and compute $\Sigma$. The contours of constant [Mahalanobis Distance](Mahalanobis%20Distance.md) (MD) are ellipses:

$\text{MD}(x, \mu) \;=\; \sqrt{ (x - \mu)^\top \Sigma^{-1} (x - \mu) }$

![center | 350](Screenshot%202025-08-14%20at%203.53.16%20PM.png)

The formula for MD contains $\Sigma^{-1}$ because it "whitens" the space: multiplying by $\Sigma^{-1/2}$ removes the stretching and rotation defined by $\Sigma$, making the geometry [Isotropic](Isotropic.md) so that Euclidean distance can be applied. However, the ellipse shape itself comes from the eigenvectors and eigenvalues of $\Sigma$:

- Eigenvectors = ellipse axis directions
- $\sqrt{\lambda_i}$ from $\Sigma$ = semi-axis lengths for the $1\sigma$ contour

A contour of constant MD means “all points here are equally far from the mean in the geometry defined by $\Sigma$”, which also corresponds to equal variance (and, under Gaussian assumptions, equal probability density).

###### $\Sigma$ as an ellipse in weight space
Now we'll switch to the space of portfolio weights $w$. The equation for constant variance here is:

$\text{Var}(R_p) = w^T \Sigma w = c$

![center | 350](Screenshot%202025-08-14%20at%203.53.37%20PM.png)

This also produces an ellipse, but now the semi-axis lengths are proportional to $1/\sqrt{\lambda_i}$. This is because along a direction of large variance in return space ($\lambda_i$ is big), you must keep $w$ small to maintain constant portfolio variance. This reciprocal relationship means the principal axes in weight space are aligned with those in return space, but their lengths are inverted:

$\text{Return-space axis length} \; \propto \sqrt{\lambda_i}, \quad \text{Weight-space axis length} \; \propto \frac{1}{\sqrt{\lambda_i}}$

###### Connection
This “two ellipse” mental model is critical:

- Return space ellipse: geometry of the _distribution of returns_.
- Weight space ellipse: geometry of the _risk constraint_ on portfolios.

They’re mathematically linked through the bilinear form $w^\top \Sigma w$ and the eigendecomposition of $\Sigma$.

![](Pasted%20image%2020250814155307.png)

###### $\Sigma$ instead of point clouds
To complete our mental model, let's now integrate this view of $\Sigma$ along with $w$. Starting with two different point clouds, we can compute two different $\Sigma$s. We will fix the direction of $w$. You can see what this looks like below. The purple $\Sigma$ is less elongated and has less variance in its returns.

![](Pasted%20image%2020250814155805.png)

We can also see what happens to the variance of $R_p$ as we change $w$. Here $w_1$ is in the direction of largest variance, and thus it has the largest $\text{Var}(R_p)$. On the other hand, $w_3$ is orthogonal to the direction of largest variance, and it has a much smaller $\text{Var}(R_p)$.

![](Pasted%20image%2020250814160108.png)
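A quick numerical check of this picture, with a made-up $\Sigma$: the portfolio variance $w^T \Sigma w$ is largest when $w$ lies along the top eigenvector and smallest when it is orthogonal to it.

```python
import numpy as np

# Toy covariance matrix (made up for illustration)
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])

# Eigendecomposition: eigenvectors = principal risk directions,
# eigenvalues = variances along those directions (ascending order)
eigvals, eigvecs = np.linalg.eigh(sigma)
v_small, v_large = eigvecs[:, 0], eigvecs[:, 1]   # unit-length axes

# w_1: weight vector along the direction of largest variance
# w_3: weight vector orthogonal to it (along the smallest-variance axis)
w1, w3 = v_large, v_small

var_w1 = w1 @ sigma @ w1   # equals the largest eigenvalue
var_w3 = w3 @ sigma @ w3   # equals the smallest eigenvalue

print("Var(R_p) for w_1 (along max-variance axis):", var_w1)
print("Var(R_p) for w_3 (orthogonal to it):       ", var_w3)
print("eigenvalues of Sigma:                      ", eigvals)
```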
# Risk Geometry

Let's pause for a moment. I've introduced $\mu$ and $\Sigma$, but they could use a bit more context. They both live in return space. $\mu$ is the expected value of $R$. It is the "center" of the distribution of returns:

$\mu = \mathbb{E} [R]$

And $\Sigma$ is the covariance matrix. It captures how returns co-vary. It encodes the *geometry* of the return distribution: spreads and correlations in all directions.

$\Sigma_{ij} = \text{Cov}(R_i, R_j)$

We can focus on the $1\sigma$ ellipse. This represents the set of points in return space ($R$ vectors) that have a [Mahalanobis Distance](Mahalanobis%20Distance.md) of $1$ from $\mu$. Put another way, if we measure distance according to the *geometry* of $\Sigma$, then the ellipse represents points that are all distance $1$ from $\mu$. This can be thought of as a multidimensional extension of standard deviation.

Think about return space for a moment. Each axis is the return of one asset over a time step. A portfolio weight vector $w$ is a direction in this space. The **risk** of $w$ comes from the variability of its portfolio return, where $R$ is the random asset-return vector and $\Sigma$ is its covariance matrix:

$\text{Var}(R_p) = w^T \Sigma w$

$\Sigma$ encodes the variability in return space. If we plot the contours of equal [Mahalanobis Distance](Mahalanobis%20Distance.md) (MD) from the mean (i.e., $(r-\mu)^T \Sigma^{-1} (r-\mu) = c$), we get ellipses in 2D. The axes of these ellipses are the eigenvectors of $\Sigma$ (principal risk directions), and their lengths are proportional to the square roots of the eigenvalues (standard deviations along those directions).

Why is this the "risk geometry"? Remember, every vector in return space corresponds to some portfolio direction. The MD length $\sqrt{u^T \Sigma^{-1} u}$ tells you how many "risk standard deviations" long that vector is. Whitening with $\Sigma^{-1/2}$ turns the ellipses into circles—in that space, Euclidean distance directly measures the risk-adjusted length. The takeaway is this:

> Portfolio vectors $w$ live in return space. A specific $w$'s *risk* is directly dependent on the *direction* it points in this space. Risky directions are encoded via $\Sigma$.

We want $w$ to be *risk aware*. This means that we want $w$'s direction to be influenced by the risk directions encoded in $\Sigma$. This raises the question: how do we use $\Sigma$ to update the direction of $w$ in a principled way?

# Mean Variance Optimization (MVO)

###### Mean Optimization (MO)
To start, we need to think about what $w$ would do if we don't consider $\Sigma$ at all. In this case we would take our $\mu$ vector (the average return of each asset), and pick a $w$ that maximizes the quantity $w^T \mu$:

$\max_w \; w^T \mu$

This will yield a $w$ pointing exactly in the direction of $\mu$: $w \parallel \mu$. When using an optimizer, it will place $w$ exactly in the direction of $\mu$ (with the length of $w$ constrained only by how much capital it can trade). So, if we don't account for variance: $w \parallel \mu$.

###### Mean Variance Optimization (MVO)
Now what will happen when we add the variance into the picture? It will cause $w$ to rotate away from the direction of $\mu$. How much will it rotate? Well, that depends on how much variance is in the direction of $\mu$, and what other constraints are present. But the key idea is that it will cause a rotation of $w$.
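Here is a small numerical sketch of that rotation, using toy values of $\mu$ and $\Sigma$ and the unconstrained solution $w \propto \Sigma^{-1}\mu$ derived just below:

```python
import numpy as np

# Toy inputs (made up for illustration)
mu = np.array([0.06, 0.03])              # expected returns
sigma = np.array([[0.09, 0.02],
                  [0.02, 0.01]])         # asset 1 is much riskier

def unit(v):
    return v / np.linalg.norm(v)

# Mean-only optimization: w points along mu
w_mo = unit(mu)

# Unconstrained MVO: w points along Sigma^{-1} mu
w_mvo = unit(np.linalg.solve(sigma, mu))

# Angle between the two directions: how far risk rotated w away from mu
angle = np.degrees(np.arccos(np.clip(w_mo @ w_mvo, -1.0, 1.0)))
print("mean-only direction:", w_mo)
print("MVO direction:      ", w_mvo)
print("rotation away from mu (degrees):", round(angle, 1))
```

With these toy numbers the optimal weight rotates entirely onto the less risky asset, because the high-$\mu$ asset carries most of the variance.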
If we don't have any constraints, then the unconstrained MVO gives:

$w \;\propto\; \Sigma^{-1} \mu$

![center](Pasted%20image%2020250813110947.png)

This is a rotation/scaling of $\mu$ by the whitening geometry of $\Sigma^{-1}$. In MVO, $\Sigma^{-1}\mu$ plays the role of a "direction in return space" we want to move toward, but *adjusted for risk geometry*. We can see this below. On the left we have the raw $\mu$. It points in a direction of high variance. On the right, we have whitened the space, and this has rotated $\mu$. In this transformed space, the $w$ that the optimizer finds will point in this new, rotated $\Sigma^{-1}\mu$ direction.

![center](Pasted%20image%2020250813113958.png)

Some intuition: in the raw space, $\mu$ points in the direction of the highest returns, but that direction might be dominated by a high-variance axis. The whitened space rescales so that 1 unit is $1\sigma$ of risk in any direction. This has the effect of:

* Squishing risky directions (so a $\mu$ along a risky direction gets pulled towards the origin—it is down-weighted)
* Stretching safe directions

> After whitening, MVO picks $w$ in the *risk-adjusted direction*.

I won't show this here, but this process of whitening $\mu$ and then aligning $w$ with the whitened $\mu$ is equivalent to the *Sharpe ratio maximization* problem—it will yield a $w$ that maximizes Sharpe:

$\max_{w \neq 0} \frac{w^\top \mu}{\sqrt{w^\top \Sigma w}}$

# Mahalanobis Distance

This has a direct connection to [Mahalanobis Distance](Mahalanobis%20Distance.md) (MD). But first, a quick recap on what MD captures. For a point $x$, a mean $\mu$ and a covariance $\Sigma$:

$\text{MD}(x, \mu) \;=\; \sqrt{ (x - \mu)^\top \Sigma^{-1} (x - \mu) }$

This measures how many *risk standard deviations* $x$ is away from $\mu$, when the risk geometry is given by $\Sigma$. We can think about this as the covariance being centered at $\mu$, or subtracting $\mu$ first so that $\mu$ is effectively the origin:

![center](Pasted%20image%2020250813112504.png)

MD can be thought of as *whitening* (transforming) the space, so that instead of an ellipse, the risk is a circle. This is effectively a change of basis.

![center | 350](Pasted%20image%2020250813112540.png)

# Connection between MVO and MD: $\Sigma^{-1}$ Geometry

###### Mahalanobis Distance
Given two points $\mu$ and $x$ in return space, we want their distance measured in **risk geometry**. This is encoded via $\Sigma$. Whitening with $\Sigma^{-1/2}$ turns the $1\sigma$ ellipse into a unit circle. Euclidean distance in this whitened space is risk-adjusted distance. Directions of high variance are *compressed*—therefore they contribute less to the distance in this space. Below we can see that after applying MD, $x_2$ is now closer to $\mu$ and $x_1$ is further away.

![center](Pasted%20image%2020250813131618.png)

###### Mean Variance Optimization
Given an expected return vector $\mu$ in return space, we want to find the portfolio weights $w$ that balance return and risk. Again, $\Sigma$ encodes the risk geometry. Whitening with $\Sigma^{-1/2}$ puts us in *risk-adjusted return space*. Here the geometry is [Isotropic](Isotropic.md). In whitened space, the optimal $w$ just points along the whitened mean: $\Sigma^{-1/2} \mu$. This whitening rotates and rescales $\mu$ so that high-risk (high-variance) directions count *less* in the optimization (i.e. directions with high variance are down-weighted, even if they have large $\mu$).
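A short sketch of this with made-up numbers: whiten via $\Sigma^{-1/2}$, read off the optimal whitened direction as the whitened mean, map it back to get $\Sigma^{-1}\mu$, and compare its Sharpe ratio against the naive $\mu$ direction.

```python
import numpy as np

# Toy inputs (made up for illustration)
mu = np.array([0.06, 0.03])
sigma = np.array([[0.09, 0.02],
                  [0.02, 0.01]])

# Sigma^{-1/2} via eigendecomposition (the whitening transform)
eigvals, eigvecs = np.linalg.eigh(sigma)
sigma_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# In whitened space, the optimal direction is just the whitened mean
mu_whitened = sigma_inv_sqrt @ mu

# Mapping that direction back to the original space gives Sigma^{-1} mu
w_back = sigma_inv_sqrt @ mu_whitened
assert np.allclose(w_back, np.linalg.solve(sigma, mu))

def sharpe(w):
    return (w @ mu) / np.sqrt(w @ sigma @ w)

# Sharpe of the risk-adjusted direction vs. the naive mu direction
print("Sharpe of w prop. to Sigma^-1 mu:", sharpe(w_back))
print("Sharpe of w prop. to mu:         ", sharpe(mu))
```

Since the Sharpe ratio is scale-invariant, only the directions matter, and the risk-adjusted direction comes out ahead.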
As a reminder, without accounting for $\Sigma$ (i.e. without whitening), $w$ would point in the direction of $\mu$ (orange).

![center](Pasted%20image%2020250813131639.png)

###### The Connection
> Both MD and MVO start with a vector in return space and *re-express it in the risk geometry* defined by $\Sigma$. Whitening removes the stretching and skewing of that geometry:
> - In MD, the vector is a _difference_ $(x - \mu)$ — we’re measuring separation in risk-adjusted units.
> - In MVO, the vector is $\mu$ itself — we’re finding the optimal direction for $w$ after accounting for risk.
>
> In both cases, distances in high-variance directions are given less "weight".

| Category | Mahalanobis Distance (MD) | Mean-Variance Optimization (MVO) |
| --- | --- | --- |
| **Goal** | Measure how far $x$ is from $\mu$ in the data’s geometry (the geometry of $\Sigma$) | Choose portfolio weights $w$ that balance expected return $\mu$ and risk $\Sigma$ |
| **Whitening Step** | Apply $\Sigma^{-1/2}$ to $(x - \mu)$, which rescales each direction so 1 unit means one std of risk | Apply $\Sigma^{-1/2}$ to $\mu$, producing a *risk-adjusted mean vector* in a whitened space |
| **Effect** | In high-variance directions, raw distances are *compressed*—differences there contribute less to the final distance | Components of $\mu$ in high-variance directions are *shrunk* and rotated towards lower-variance directions |
| **Interpretation** | *Discounts* differences along directions where the data naturally varies a lot, and *emphasizes/amplifies* differences along less varying directions | Whitening *discounts* return potential from risky directions, and *emphasizes* return potential from stable directions |

# Using MD to Cross Validate $\mu$ and improve MVO

We know that MVO is only as good as the inputs that we provide it: $\mu$ and $\Sigma$. But say we are making updates and improvements to $\mu$—how can we measure if we are actually improving the quality of $\mu$? In other words, how can we *cross validate* $\mu$? Sure, we could just run it through the optimization process, but there are two drawbacks to that:

1. This is very susceptible to backtest hacking.
2. If we find that $\mu_2$ is *not* an improvement over $\mu_1$, we'll have no idea why. And therefore we'll have no clear way to iterate forward and improve $\mu_2$.

Using the backtest as a means of improving $\mu$ is a dead end. So the cross validation of $\mu$ is important. But how can we achieve this? What would a good cross validation function (or functions) look like? It will serve two purposes:

1. **Correlate with financial performance (FP)**
   If we show that $\mu_2$ is an improvement over $\mu_1$, this should mean that $\text{FP}_2$ is an improvement over $\text{FP}_1$.
2. **Help identify areas of $\mu$ to improve**
   Some points of $\mu$ matter more than others. Some elements of $\mu$ aren't that useful to improve: those that the optimizer will never trade due to the associated risk, or their middle-of-the-pack value, or our balance constraint, and so on. But some elements are useful to improve: those that have low risk, those that really help us in terms of our balance constraint, and so on.

Moving forward, these are our *goals* for a cross validation function(s).

# Mahalanobis Distance

###### Intuition
One candidate cross validation function is MD.
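As a rough sketch of what that candidate score could look like (the function name `md_score` and the inputs `mu_hat`, `mu_true`, `sigma_hat` are hypothetical placeholders, not something defined elsewhere in these notes):

```python
import numpy as np

def md_score(mu_hat: np.ndarray, mu_true: np.ndarray, sigma_hat: np.ndarray) -> float:
    """Mahalanobis distance between the predicted and true mean vectors,
    measured in the risk geometry the optimizer will actually use (sigma_hat)."""
    diff = mu_hat - mu_true
    return float(np.sqrt(diff @ np.linalg.solve(sigma_hat, diff)))

# Toy example: an error of the same raw size scores worse (larger MD)
# along the low-variance axis than along the high-variance axis.
sigma_hat = np.diag([0.09, 0.01])          # asset 1 risky, asset 2 safe
mu_true = np.array([0.02, 0.02])

mu_hat_risky_err = np.array([0.05, 0.02])  # off by 0.03 on the risky asset
mu_hat_safe_err = np.array([0.02, 0.05])   # off by 0.03 on the safe asset

print(md_score(mu_hat_risky_err, mu_true, sigma_hat))  # 0.03 / 0.3 = 0.1
print(md_score(mu_hat_safe_err, mu_true, sigma_hat))   # 0.03 / 0.1 = 0.3
```

The toy example mirrors the intuition developed next: the same-sized error is penalized more when it sits along a low-variance direction.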
Let's try to build some intuition around why it would be useful. Start with $\hat{\mu}$ (prediction) and $\mu$ (true). Suppose they differ mainly along a low- (or high-) variance direction. When the optimizer performs the whitening step via $\Sigma^{-1/2}$, what will happen?

- **Low-variance direction difference**: Whitening barely shrinks it and MD is large. MD flags this as _bad_, because the optimizer is more likely to put weight on that direction (low risk, so high risk-adjusted return). If your prediction is wrong there, it will meaningfully mislead the allocation.
- **High-variance direction difference**: Whitening shrinks it a lot and MD is smaller. MD treats it as _less harmful_, because the optimizer tends to allocate less along high-variance axes anyway. Even if you’re wrong here, the portfolio impact is muted.

So MD isn’t just a generic “distance” — in this MVO setting, it’s _aligned_ with the economic consequence of our forecast errors.

I have yet to address what $\Sigma$ we will be using with MD. Considering we don't ever have access to the true $\Sigma$, we'll be using $\hat{\Sigma}$, our prediction. But that is actually the *correct choice* in this case. We are trying to measure errors that will *matter*—in the sense that the optimizer *will act on them*. And the optimizer is going to act based on what we pass to it, which in this case is $\hat{\Sigma}$. Remember, the optimizer will whiten return space via $\hat{\Sigma}$ (implicitly) and set $w \propto \hat{\Sigma}^{-1}\hat{\mu}$. This means the portfolio will be in the directions that $\hat{\Sigma}$ deems low-variance (low risk). MD measures forecast error in the *same geometry* that the optimizer uses.

| Direction (variance) | Error size | MD magnitude | Impact on optimizer |
| --- | --- | --- | --- |
| *Low-variance directions* | Big | Large MD | Harmful — optimizer tends to allocate weight there |
| *High-variance directions* | Big | Smaller MD | Less harmful — optimizer downweights these directions |

###### Connection Between Improving MD and Improving FP
Can we show that if we improve MD this will lead to improving FP? Can we offer any guarantees or assurances? Or highlight when this will hold and when it won't? The claim to verify: performance depends monotonically on the cosine similarity between $\hat{\mu}$ and $\mu$ in the $\hat{\Sigma}$-adjusted space. (TODO: walk through the math here step by step.)

---
Date: 20250813
Links to:
Tags:
References:
* []()