# Gambler's Fallacy vs. Regression to the Mean
The **gambler's fallacy** is the belief that independent events are not really independent. (For example, if 15 hasn't appeared on a roulette wheel in the past hundred spins, it must be more likely to appear on the next spin. This isn't true, but it is commonly believed.)
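To see the independence concretely, here is a minimal simulation sketch (the single-zero wheel with 37 pockets, the spin count, and the 100-spin window are all illustrative assumptions): the chance of 15 on the next spin is the same whether or not 15 has been "missing" for the last 100 spins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a single-zero (European) wheel with pockets 0-36.
spins = rng.integers(0, 37, size=2_000_000)
is_15 = (spins == 15)

# Overall frequency of 15.
overall = is_15.mean()

# Frequency of 15 on the spin right after a 100-spin stretch with no 15.
csum = np.concatenate(([0], np.cumsum(is_15)))
window_counts = csum[100:] - csum[:-100]   # number of 15s in spins[i : i+100]
drought = window_counts[:-1] == 0          # windows containing no 15 at all
next_is_15 = is_15[100:]                   # the spin that follows each window

print(f"P(15) overall:                   {overall:.4f}")                     # ~1/37 = 0.027
print(f"P(15 | no 15 in last 100 spins): {next_is_15[drought].mean():.4f}")  # also ~0.027
```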
**Regression to the mean** is more subtle. Take, for example, a PGA golfer's scores on the first two days of a tournament. If his score was better than average on the first day, what should we predict for him on the second day?
Well, the regression model would predict that part of the reason he was better than average on the first day is that he is a better-than-average golfer. The model therefore predicts that his score on day two will also be better than the day-two average.
However, the model also recognizes that part of the reason for his good score on the first day was just randomness (good luck, or some other unexplained variable). We shouldn't expect that "good luck" to carry over to day two, so the model ignores it when predicting the second-day score. As a result, while we still think he'll be better than average on day two, we don't think he will be AS MUCH better than average as he was on day one. We expect him to "regress to the mean."
The same thing applies to golfers who score worse than the mean on the first day. Some of their score was just bad luck, and we don't expect that bad luck to follow them to day two. So we predict that their second-day score will still be worse than average, but not as much worse as the first day's. (We still call it regression to the mean even though it represents a relative improvement.)
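A quick toy simulation makes the shrinkage visible. The decomposition of each score into persistent skill plus fresh daily luck, the normal distributions, and the specific spreads below are my own assumptions for illustration, not anything taken from real PGA data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: score relative to the field = skill + fresh daily luck.
n_golfers = 100_000
skill     = rng.normal(0.0, 2.0, n_golfers)  # persistent ability (strokes vs. field)
luck_day1 = rng.normal(0.0, 2.0, n_golfers)  # day-1 randomness
luck_day2 = rng.normal(0.0, 2.0, n_golfers)  # day-2 randomness, independent of day 1

day1 = skill + luck_day1
day2 = skill + luck_day2

# Golfers who beat the field by 4+ strokes on day 1 (lower = better in golf,
# so "better than average" means a negative score here).
good_day1 = day1 < -4
print(f"Day-1 mean of that group: {day1[good_day1].mean():.2f}")
print(f"Day-2 mean of that group: {day2[good_day1].mean():.2f}")
# Day 2 is still well below 0 (better than average), but only about half as
# far below: the skill persisted, the day-1 luck did not.
```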
### Key Idea
The key idea is that regression to the mean and the gambler's fallacy are actually ***saying very similar things***! But wait, how can that be? In my own words:
* The gambler's fallacy is saying: "If I observe 10 heads I am 'due' for tails." This is not true because the flips are independent events. The 10 heads in a row did not make tails more likely; it simply was a very unlikely sequence of events.
* Regression to the mean is saying: "Observations are made up of random variables/components. Observations above the mean likely had a bit of 'luck'/randomness on their side to help them get above the mean. Just as a string of many heads in a row is unlikely, the probability that the random component of an observation comes out strongly positive again and again is low. Hence, we expect outlying observations to regress to the mean over time." (A short symbolic sketch of this argument follows the list.)
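To make the second bullet a bit more precise, here is a small sketch in symbols. The decomposition of an observation into a persistent "skill" part plus fresh zero-mean "luck" is an assumption of this toy model (reasonable for, e.g., jointly normal components), not something taken directly from the answers cited below.

$$
O_1 = S + \varepsilon_1, \qquad O_2 = S + \varepsilon_2, \qquad E[\varepsilon_1] = E[\varepsilon_2] = 0,
$$

with $\varepsilon_2$ independent of $(S, \varepsilon_1)$. Then

$$
E[O_2 \mid O_1 = o] \;=\; E[S \mid O_1 = o] + E[\varepsilon_2] \;=\; o - E[\varepsilon_1 \mid O_1 = o].
$$

An $o$ above the mean is evidence of both above-average skill *and* positive luck, so $E[\varepsilon_1 \mid O_1 = o] > 0$, and the prediction for $O_2$ is pulled down from $o$ toward the overall mean (while typically staying above it, since the evidence of higher skill remains). Nothing here says $O_2$ is "due" to fall below the mean.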
### Regression to the Mean deals with *Dependent Events*
Peter Flom also gives a nice additional note on this issue [here](https://qr.ae/pv27yc). He notes that regression to the mean is actually a statement about *dependent* (correlated) random variables, such as $x$ = the heights of fathers and $y$ = the heights of sons.
> Regression to the mean means that if two variables are correlated, and you rank each, then the ones that are highest on one will tend to be not quite so high on the other. E.g. the sons of the 1000 tallest fathers in the world will tend to be taller than average, but they will not be the 1000 tallest boys in their generation.
Again, we can think of this as height being made up of many genetic factors plus some random variation. For a very tall father, that random variation likely pointed strongly in the positive/tall direction. But the probability of the random variation combining that strongly in the tall direction is low; far more of the probability mass sits on less extreme draws, which pulls the son back toward the mean. Hence we predict/expect that the son of a very tall father will be shorter than his father (though still taller than average).
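The quoted claim is easy to check in a toy simulation. The population size, the mean height, and the split of variance between a shared "genetic" term and individual noise below are illustrative assumptions, chosen only so that father and son heights come out positively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: height (cm) = population mean + shared genetic term + own noise.
n = 1_000_000
genetic = rng.normal(0.0, 5.0, n)
father  = 178 + genetic + rng.normal(0.0, 5.0, n)
son     = 178 + genetic + rng.normal(0.0, 5.0, n)

# The sons of the 1000 tallest fathers...
top_fathers = np.argsort(father)[-1000:]
print(f"Population mean height:           {son.mean():.1f} cm")
print(f"Mean of the 1000 tallest fathers: {father[top_fathers].mean():.1f} cm")
print(f"Mean of their sons:               {son[top_fathers].mean():.1f} cm")

# ...are much taller than average, but the large majority of them are not
# among the 1000 tallest sons of their own generation.
tallest_sons = set(np.argsort(son)[-1000:])
overlap = len(tallest_sons & set(top_fathers))
print(f"Of those sons, how many are themselves in the top 1000: {overlap}")
```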
### Outliers
Also, it is worth noting that this statement is most powerful for outliers/tails of distributions. If we are only focusing on observations near the mean, there isn't much we can say about their counterparts (for a father whose height is exactly the mean of all men, we can't say much about his son's height, i.e. knowing the father's height gives us no reason to predict anything other than the mean for the son).
## A string of heads, followed by another
The following intuition really made it click for me. Consider a string of 10 heads. Given that string, how likely is it to be followed by *another* string of 10 heads? Not very likely at all!
Heights work the same way. There is a genetic component and a random component (analogous to the coins in this example). The genetic component may stay constant from father to son, but the random component (i.e. the 10 heads in a row) likely will not.
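Here is a small sketch of that intuition with the components made explicit. The "height = fixed genetic value + number of heads in 10 coin flips" decomposition is purely illustrative (my assumption, mirroring the coin analogy above), but it shows the mechanism: the tallest fathers got unusually lucky flips, their sons get fresh flips, and the sons land above average yet well short of their fathers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy decomposition: "height" = genetic value + luck,
# where luck = number of heads in 10 fair coin flips.
# A son keeps his father's genetic value but gets 10 fresh flips.
n = 500_000
genetic     = rng.normal(0.0, 3.0, n)
father_luck = rng.binomial(10, 0.5, n)   # heads out of 10 flips
son_luck    = rng.binomial(10, 0.5, n)   # fresh, independent flips

father = genetic + father_luck
son    = genetic + son_luck

tallest = np.argsort(father)[-1000:]     # the 1000 tallest fathers
print(f"Tallest fathers' avg luck (heads out of 10): {father_luck[tallest].mean():.2f}")  # well above 5
print(f"Their sons' avg luck (heads out of 10):      {son_luck[tallest].mean():.2f}")     # ~5
print(f"Avg 'height': fathers {father[tallest].mean():.1f}, "
      f"sons {son[tallest].mean():.1f}, population {father.mean():.1f}")
# The genetic part carries over; the lucky streak of flips does not. So the
# sons are still above the population mean, but closer to it than their fathers.
```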
---
Date: 20220421
Links to:
Tags:
References:
* [Michael Lamar's answer to What's the difference between regression to the mean and the gambler's fallacy? - Quora](https://www.quora.com/Whats-the-difference-between-regression-to-the-mean-and-the-gamblers-fallacy/answer/Michael-Lamar?ch=17&oid=5495552&share=76444abe&srid=srcU&target_type=answer)
* [Peter's answer to What's the difference between regression to the mean and the gambler's fallacy? - Quora](https://qr.ae/pv27yc)