# Einsum
> [Einstein summation](https://ajcr.net/Basic-guide-to-einsum/) ([`einsum`](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html)) is a compact representation for combining products and sums in a general way. The key rules are:
> - Repeating letters between input arrays means that values along those axes will be multiplied together.
> - Omitting a letter from the output means that values along that axis will be summed.
### Deep Dive
* **Free indices**: the indices that appear in the output specification.
    * Associated with the outer loops
* **Summation indices**: all other indices, i.e. those that appear in an input argument but *not* in the output specification.
    * Associated with the inner loops (see the loop sketch below)
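To make the outer/inner distinction concrete, here is a minimal sketch (the function name and explicit loops are mine, purely for illustration) of what `einsum('ik,kj->ij', A, B)` computes:
```python
import numpy as np

def einsum_ik_kj_to_ij(A, B):
    """Hand-rolled version of np.einsum('ik,kj->ij', A, B)."""
    I, K = A.shape
    _, J = B.shape
    out = np.zeros((I, J))
    for i in range(I):            # free index -> outer loop
        for j in range(J):        # free index -> outer loop
            total = 0.0
            for k in range(K):    # summation index -> inner loop
                total += A[i, k] * B[k, j]
            out[i, j] = total
    return out

A, B = np.random.rand(2, 3), np.random.rand(3, 4)
assert np.allclose(einsum_ik_kj_to_ij(A, B), np.einsum('ik,kj->ij', A, B))
```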

### Comparison to Marginalization
As an analogy, consider **marginalization** in probability theory, where you marginalize out some variable. Say you have a joint distribution $P(X, Y)$ and you want just $P(X)$. You can marginalize out $Y$ via $P(X) = \sum_{y} P(X, Y = y)$. Here $Y$ is being marginalized out, so it plays the role of a *summation index*, while the index of the $X$ dimension is a *free index* since it occurs in the output.
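This marginalization is itself a one-line einsum. A quick illustration (the toy joint distribution below is made up):
```python
import numpy as np

# Toy joint distribution P(X, Y): rows index X, columns index Y
P_XY = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

# Marginalize out Y: y is a summation index, x is a free index
P_X = np.einsum('xy->x', P_XY)
print(P_X)  # [0.3 0.7]
```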
### Visual to hold in your mind
If you are told that you are going to perform *any* aggregation along some axis $i$, always visualize **collapsing** along that dimension!
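For instance (values here are arbitrary), summing a matrix along its second axis collapses the `j` dimension away:
```python
import numpy as np

M = np.arange(6).reshape(2, 3)  # shape (2, 3)
row_sums = np.einsum('ij->i', M)  # collapse along j
print(row_sums)       # [ 3 12]
print(M.sum(axis=1))  # same result: [ 3 12]
```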

A very useful example to hold in mind: say we have two tensors `A` with shape `[i, k]` and `B` with shape `[k, j]`, where perhaps `i = 5`, `j = 10`, `k = 784`. Now look at the following two einsums:
```python
import numpy as np

A = np.random.rand(5, 784)   # shape [i, k]
B = np.random.rand(784, 10)  # shape [k, j]

# k is repeated in the inputs but kept in the output: multiply, don't sum
t1 = np.einsum('ik,kj->ikj', A, B)
t1.shape  # (5, 784, 10)

# k is omitted from the output: multiply, then sum (collapse) along k
t2 = np.einsum('ik,kj->ij', A, B)
t2.shape  # (5, 10)
```
The key idea here is that:
* **Repeating letters** between *input arrays* means that values along those axes will be **multiplied together**. In this case `k` is the repeated letter in the inputs, so the 784-dimensional axis present in both `A` and `B` is multiplied elementwise. However, in the first case it is *not* summed! That is because `k` is still present in the output!
* **Omitting a letter** from the *output* means that values along that axis will be **summed**. So, in the second case, we have omitted `k` from the output, meaning that we *sum* along the `k` axis, and again we can visualize that sum as **collapsing** along `k`! (See the sanity check below.)
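To tie the two rules together, a short sanity check (reusing the shapes from the example above): summing `t1` along `k` should recover `t2`, which is just the matrix product `A @ B`.
```python
import numpy as np

A = np.random.rand(5, 784)
B = np.random.rand(784, 10)

t1 = np.einsum('ik,kj->ikj', A, B)  # multiply along k, keep it
t2 = np.einsum('ik,kj->ij', A, B)   # multiply along k, then collapse it

# Collapsing t1 along the k axis (axis=1) recovers t2
assert np.allclose(t1.sum(axis=1), t2)
# And t2 is exactly ordinary matrix multiplication
assert np.allclose(t2, A @ B)
```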
---
Date: 20230720
Links to:
Tags:
References:
* [Einsum Is All You Need: NumPy, PyTorch and TensorFlow - YouTube](https://www.youtube.com/watch?v=pkVwUVEHmfI)