# Information Theory

Whenever thinking about information theory, there are two very crisp visuals to keep in mind: that of a *high entropy distribution* and a *low entropy distribution*:

![](Information%20theory.png)

### Key Intuitions

* One **bit** is the amount of information required to choose between two *equally probable* alternatives. To be clear, a bit is an **amount of information**.
* The more improbable a particular outcome is, the more surprised we are to observe it. The Shannon information (i.e. the **surprisal**) of a value $x$ measures how surprised we are to observe that value (see below for the mathematical definition). **Shannon information** is a measure of **surprise**.
* The *average surprise* of a variable is called the **entropy**. In essence, entropy is a measure of uncertainty. When our uncertainty is reduced we gain information, so information and entropy are two sides of the same coin.
* Average information actually shares the same definition as entropy, but whether we call a given quantity information or entropy usually depends on whether it is being given to us or taken away. I.e., receiving an amount of information is equivalent to having the exact same amount of entropy taken away. See Information Theory: A Tutorial Introduction, p. 41.

----

# Equations

### Shannon Information/Surprisal

$Surprise = \color{purple}\log(\frac{1}{p(x)})$

### Entropy $\rightarrow$ Average Shannon Information

$H(X) = \sum_{i=1}^m p(x_i) \overbrace{\color{purple}\log(\frac{1}{p(x_i)})}^{\substack{\text{shannon }\\ \text{information}}} = \overbrace{\mathbb{E}[\color{purple}\log(\frac{1}{p(x_i)}) \color{black}]}^{\substack{\text{Expected value} \\ \text{of shannon information}}}$

(A numeric sketch of these two definitions is included at the end of this note.)

---

* **Mutual information** maps to the question: "how much can I gamble on X, knowing Y?" (see the paper Informational Rescaling of PCA Maps with Application to Genetics)

---

Date: 20211220
Links to:
Tags:
References:
* Information Theory, A Tutorial Introduction
* My notability notes
* Notes in filing cabinet
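
---

### Numeric Sketch

A minimal sketch of the surprisal and entropy definitions above, assuming Python with NumPy. The logarithm is taken base 2 so the results come out in bits, and the example distributions are arbitrary choices for illustration.

```python
import numpy as np

def surprisal(p):
    """Shannon information (surprisal) of an outcome with probability p, in bits."""
    return np.log2(1.0 / p)

def entropy(dist):
    """Entropy = expected surprisal over a discrete distribution (array of probabilities)."""
    dist = np.asarray(dist, dtype=float)
    return float(np.sum(dist * np.log2(1.0 / dist)))

# A fair coin: choosing between two equally probable alternatives costs exactly 1 bit.
print(entropy([0.5, 0.5]))             # 1.0

# A biased coin: less average surprise, so lower entropy.
print(entropy([0.9, 0.1]))             # ~0.469 bits

# The improbable outcome is far more surprising than the probable one.
print(surprisal(0.1), surprisal(0.9))  # ~3.32 bits vs ~0.15 bits
```

Note how the fair coin maximizes entropy for two outcomes, which matches the high-entropy vs. low-entropy picture at the top of this note.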