# Mendelian Randomization
In traditional observational studies, dividing people into groups based on a trait like high cholesterol is vulnerable to confounding. For instance, you could take 1000 people, where 500 have high cholesterol (group A) and 500 have low cholesterol (group B). You could then see that group A has a higher rate of heart disease and conclude that cholesterol causes heart disease. But here is the problem: what if the people in group A had high cholesterol *because* they didn't exercise? Then was it *not exercising* that led to the heart disease, or was it the high cholesterol?
In an observational study conducted this way, there is no way of disentangling the two.
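
To make the problem concrete, here is a minimal Python simulation under made-up assumptions: half the population exercises, exercise lowers the chance of high cholesterol (from 70% to 30%) and of heart disease (from 20% to 5%), and cholesterol has *no* effect on heart disease at all. The function name `simulate_observational` and every probability are hypothetical. Grouping by cholesterol still shows a gap in heart-disease rates: under these numbers the expected gap is 0.155 - 0.095 = 0.06, produced entirely by exercise.

```python
import random


def simulate_observational(n: int = 200_000, seed: int = 0) -> float:
    """Return P(heart disease | high cholesterol) - P(heart disease | low cholesterol).

    Hypothetical model (all numbers made up): exercise lowers both cholesterol
    and heart-disease risk; cholesterol itself has NO effect on heart disease.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}   # heart-disease cases, keyed by "has high cholesterol"
    sizes = {True: 0, False: 0}   # group sizes, same keys
    for _ in range(n):
        exercises = rng.random() < 0.5
        # exercise -> cholesterol edge
        high_chol = rng.random() < (0.3 if exercises else 0.7)
        # exercise -> heart disease edge; note there is no cholesterol term
        heart_disease = rng.random() < (0.05 if exercises else 0.20)
        sizes[high_chol] += 1
        cases[high_chol] += heart_disease
    return cases[True] / sizes[True] - cases[False] / sizes[False]


print(f"gap in heart-disease rate (high - low cholesterol): {simulate_observational():.3f}")
```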
That is where **Mendelian Randomization** comes into play. This technique leverages genetics to cut out these pesky confounders.

It does this by noticing a key fact about confounding: it requires two directed edges in a causal graph, one from the confounder into the exposure and one from the confounder into the outcome. Consider `exercise` above. `exercise` affects both `cholesterol` and `heart disease`. What makes `exercise` a confounder is *not* that it affects `heart disease` - that in and of itself is fine, since many things affect heart disease. It is that it *also* affects `cholesterol`. If we could get rid of that edge, `exercise` would no longer be a confounder.
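
The same idea can be written down as a graph. The sketch below stores the example's causal graph as a Python dict of parent -> children and flags a node as a confounder when it has an edge into *both* `cholesterol` and `heart disease`. This is a simplification (a full treatment looks at all back-door paths, not only direct common causes), and the helper `direct_confounders` is hypothetical. Deleting just the `exercise -> cholesterol` edge is enough to stop `exercise` being a confounder, even though it still affects heart disease.

```python
# Hypothetical causal graph from the example: parent -> set of children
graph = {
    "exercise": {"cholesterol", "heart disease"},
    "cholesterol": {"heart disease"},
}


def direct_confounders(graph: dict, exposure: str, outcome: str) -> set:
    """Nodes with a directed edge into BOTH the exposure and the outcome.

    Simplification: only direct common causes are checked, not longer back-door paths.
    """
    return {
        node
        for node, children in graph.items()
        if node not in (exposure, outcome) and exposure in children and outcome in children
    }


print(direct_confounders(graph, "cholesterol", "heart disease"))  # {'exercise'}

# Copy the graph and delete only the exercise -> cholesterol edge
cut = {node: set(children) for node, children in graph.items()}
cut["exercise"].discard("cholesterol")
print(cut["exercise"])                                          # {'heart disease'}
print(direct_confounders(cut, "cholesterol", "heart disease"))  # set()
```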
How does a **Randomized Controlled Trial** (RCT) accomplish this? It does so by randomly splitting a population into two groups and giving one the treatment and the other a control. If we were trying to see whether a drug was effective, it could be that, yet again, `exercise` interacts with the drug and alters its impact. But that is the beauty of the RCT: because the two groups are split totally at random, we expect each group to contain roughly the same share of participants who exercise. In graph terms, a coin flip rather than `exercise` decides who gets the drug, so there is no edge from `exercise` into the treatment. This effectively "washes out" (i.e. marginalizes out) the effect of exercise.
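
A minimal sketch of the RCT logic, with invented numbers: half the participants exercise, a coin flip assigns the drug, and the drug helps exercisers more than non-exercisers (+15 vs +5 percentage points in recovery), so exercise does change the drug's impact. Because assignment ignores exercise, both arms end up with about the same share of exercisers, and the difference in recovery rates estimates the drug's average effect over the population, which is 0.5 * 0.15 + 0.5 * 0.05 = 0.10 under these assumptions. The function `simulate_rct` and all probabilities are hypothetical.

```python
import random


def simulate_rct(n: int = 200_000, seed: int = 0):
    """Return (share of exercisers per arm, difference in recovery rates).

    Hypothetical model (all numbers made up): exercise raises recovery and also
    changes how much the drug helps (+0.15 for exercisers, +0.05 otherwise).
    """
    rng = random.Random(seed)
    stats = {arm: {"n": 0, "exercisers": 0, "recovered": 0} for arm in ("treatment", "control")}
    for _ in range(n):
        exercises = rng.random() < 0.5
        # A coin flip assigns the arm, so exercise has no edge into treatment
        arm = "treatment" if rng.random() < 0.5 else "control"
        drug_boost = (0.15 if exercises else 0.05) if arm == "treatment" else 0.0
        p_recover = 0.30 + (0.20 if exercises else 0.0) + drug_boost
        s = stats[arm]
        s["n"] += 1
        s["exercisers"] += exercises
        s["recovered"] += rng.random() < p_recover
    shares = {arm: s["exercisers"] / s["n"] for arm, s in stats.items()}
    rates = {arm: s["recovered"] / s["n"] for arm, s in stats.items()}
    return shares, rates["treatment"] - rates["control"]


shares, effect = simulate_rct()
print("share exercising per arm:", shares)
print(f"estimated average drug effect: {effect:.3f}")
```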
And it is from here that MR gets inspiration. MR attacks the edge from `exercise -> cholesterol`. It tries to eradicate that edge. It does so as follows:
* There are *genetic variants* that *cause* higher cholesterol, for example certain variants of the `PCSK9` gene
* MR splits our population into two groups: carriers and non-carriers of such a `PCSK9` variant
* The carriers will tend to have higher cholesterol, and the non-carriers lower cholesterol
* At first it may seem like we are right back where we started - we have two groups, one with high and the other with low cholesterol
* But there is a key difference! We have eradicated the edge from `exercise -> cholesterol`!
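* This is because we *know* what separates the groups: the `PCSK9` variant, which each person inherits at random at conception (Mendel's laws of inheritance), so exercise cannot influence which group they land in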
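
The key step, that grouping by genotype rather than by cholesterol cuts the `exercise` edge, can be checked in a small simulation. Assumed, hypothetical model: a cholesterol-raising `PCSK9` variant carried by 50% of people (real allele frequencies differ), 50% exercisers, and a chance of high cholesterol that goes up with the variant and down with exercise. Splitting by genotype gives two groups with about the same share of exercisers (0.5 each in expectation); splitting by observed cholesterol does not (about 0.35 vs 0.65 in expectation).

```python
import random


def group_exercise_shares(n: int = 200_000, seed: int = 0):
    """Return the share of exercisers per group, splitting once by genotype and
    once by observed cholesterol.

    Hypothetical model (all numbers made up): a cholesterol-raising PCSK9
    variant is inherited by coin flip, independently of exercise; both the
    variant and exercise shift the chance of high cholesterol.
    """
    rng = random.Random(seed)
    # group key -> [number of exercisers, group size]
    by_gene = {True: [0, 0], False: [0, 0]}   # key: carries the variant
    by_chol = {True: [0, 0], False: [0, 0]}   # key: has high cholesterol
    for _ in range(n):
        exercises = rng.random() < 0.5
        carrier = rng.random() < 0.5          # genotype ignores exercise: no exercise -> gene edge
        p_high = 0.5 + 0.3 * carrier - 0.3 * exercises
        high_chol = rng.random() < p_high     # exercise -> cholesterol edge is still present
        for groups, key in ((by_gene, carrier), (by_chol, high_chol)):
            groups[key][0] += exercises
            groups[key][1] += 1
    return (
        {k: ex / size for k, (ex, size) in by_gene.items()},
        {k: ex / size for k, (ex, size) in by_chol.items()},
    )


gene_shares, chol_shares = group_exercise_shares()
print("share exercising, by genotype:   ", gene_shares)
print("share exercising, by cholesterol:", chol_shares)
```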

To be clear, there can of course still be *causes* of high cholesterol not related to the gene. But now they should be equally present in each group. For instance, each group should have roughly the same share of participants who exercise, so any impact exercise has should be "washed out". The remaining difference in heart disease between the two groups can then be attributed to cholesterol, provided the variant affects heart disease only through cholesterol.

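
One standard way to turn the genotype comparison into an effect size is the Wald ratio: the difference in mean outcome between carriers and non-carriers, divided by the difference in mean cholesterol between the same groups. The sketch below uses a hypothetical linear model with a continuous heart-disease risk score (true effect of cholesterol: 0.3 per unit) and assumes the variant affects risk only through cholesterol. The naive regression of risk on cholesterol is pulled away from 0.3 by exercise (about 0.41 in expectation under these numbers), while the Wald ratio lands close to 0.3. `simulate_mr` and all coefficients are made up for illustration.

```python
import random


def simulate_mr(n: int = 200_000, seed: int = 0):
    """Return (naive OLS slope, Wald-ratio MR estimate) under a made-up linear model:

      cholesterol = 4.0 + 1.0*carrier - 0.5*exercises + noise
      risk        = 0.3*cholesterol - 0.5*exercises + noise   (true effect: 0.3)

    Assumption: the variant affects risk only through cholesterol.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        carrier = rng.random() < 0.5          # hypothetical 50% carrier frequency
        exercises = rng.random() < 0.5
        chol = 4.0 + 1.0 * carrier - 0.5 * exercises + rng.gauss(0.0, 0.5)
        risk = 0.3 * chol - 0.5 * exercises + rng.gauss(0.0, 1.0)
        rows.append((carrier, chol, risk))

    def mean(xs):
        return sum(xs) / len(xs)

    # Naive: least-squares slope of risk on cholesterol (confounded by exercise)
    mx = mean([c for _, c, _ in rows])
    my = mean([r for _, _, r in rows])
    naive = sum((c - mx) * (r - my) for _, c, r in rows) / sum((c - mx) ** 2 for _, c, _ in rows)

    # MR: risk difference between genotype groups, scaled by their cholesterol difference
    carriers = [(c, r) for g, c, r in rows if g]
    others = [(c, r) for g, c, r in rows if not g]
    d_risk = mean([r for _, r in carriers]) - mean([r for _, r in others])
    d_chol = mean([c for c, _ in carriers]) - mean([c for c, _ in others])
    return naive, d_risk / d_chol


naive, wald = simulate_mr()
print(f"naive slope: {naive:.3f}   Wald ratio: {wald:.3f}")
```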
---
Date: 20231203
Links to:
Tags:
References:
* []()