# Einsum
> [Einstein summation](https://ajcr.net/Basic-guide-to-einsum/) ([`einsum`](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html)) is a compact representation for combining products and sums in a general way. The key rules are:
> - Repeating letters between input arrays means that values along those axes will be multiplied together.
> - Omitting a letter from the output means that values along that axis will be summed.
### Deep Dive
* **Free indices**: the indices that appear in the output specification.
    * Associated with the outer loops
* **Summation indices**: all other indices, i.e. those that appear in an input argument but *not* in the output specification.
    * Associated with the inner loops (see the loop sketch below)
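To make the outer/inner distinction concrete, here is a minimal sketch (the function name and explicit loops are mine, purely for illustration) of what `einsum('ik,kj->ij', A, B)` computes:
```python
import numpy as np

def einsum_ik_kj_to_ij(A, B):
    """Hand-rolled version of np.einsum('ik,kj->ij', A, B)."""
    I, K = A.shape
    _, J = B.shape
    out = np.zeros((I, J))
    for i in range(I):            # free index -> outer loop
        for j in range(J):        # free index -> outer loop
            total = 0.0
            for k in range(K):    # summation index -> inner loop
                total += A[i, k] * B[k, j]
            out[i, j] = total
    return out

A, B = np.random.rand(2, 3), np.random.rand(3, 4)
assert np.allclose(einsum_ik_kj_to_ij(A, B), np.einsum('ik,kj->ij', A, B))
```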

### Comparison to Marginalization
As an analogy, consider **marginalization** in probability theory, where you marginalize out some variable. Say you have a joint distribution $P(X, Y)$ and you want just $P(X)$. You can marginalize out $Y$ via $P(X) = \sum_{y} P(X, Y = y)$. Here $Y$ is being marginalized out, so it plays the role of a *summation index*, while the index of the $X$ dimension is a *free index* since it occurs in the output.
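This marginalization is itself a one-line einsum. A quick illustration (the toy joint distribution below is made up):
```python
import numpy as np

# Toy joint distribution P(X, Y): rows index X, columns index Y
P_XY = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

# Marginalize out Y: y is a summation index, x is a free index
P_X = np.einsum('xy->x', P_XY)
print(P_X)  # [0.3 0.7]
```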
### Visual to hold in your mind
If you are told that you are going to perform *any* aggregation along some axis $i$, always visualize **collapsing** along that dimension!
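For instance (values here are arbitrary), summing a matrix along its second axis collapses the `j` dimension away:
```python
import numpy as np

M = np.arange(6).reshape(2, 3)  # shape (2, 3)
row_sums = np.einsum('ij->i', M)  # collapse along j
print(row_sums)       # [ 3 12]
print(M.sum(axis=1))  # same result: [ 3 12]
```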

A very useful example to hold in mind: say we have two tensors `A` with shape `[i, k]` and `B` with shape `[k, j]`, where perhaps `i = 5`, `j = 10`, `k = 784`. Now look at the following two einsums:
```python
import numpy as np

A = np.random.rand(5, 784)   # shape [i, k]
B = np.random.rand(784, 10)  # shape [k, j]

# k is repeated in the inputs but kept in the output: multiply, don't sum
t1 = np.einsum('ik,kj->ikj', A, B)
t1.shape  # (5, 784, 10)

# k is omitted from the output: multiply, then sum (collapse) along k
t2 = np.einsum('ik,kj->ij', A, B)
t2.shape  # (5, 10)
```
The key idea here is that:
* **Repeating letters** between *input arrays* means that values along those axes will be **multiplied together**. In this case `k` is the repeated letter in the inputs, so the 784-dimensional axis present in both `A` and `B` is multiplied elementwise. However, in the first case it is *not* summed! That is because `k` is still present in the output!
* **Omitting a letter** from the *output* means that values along that axis will be **summed**. So, in the second case, we have omitted `k` from the output, meaning that we *sum* along the `k` axis, and again we can visualize that sum as **collapsing** along `k`! (See the sanity check below.)
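To tie the two rules together, a short sanity check (reusing the shapes from the example above): summing `t1` along `k` should recover `t2`, which is just the matrix product `A @ B`.
```python
import numpy as np

A = np.random.rand(5, 784)
B = np.random.rand(784, 10)

t1 = np.einsum('ik,kj->ikj', A, B)  # multiply along k, keep it
t2 = np.einsum('ik,kj->ij', A, B)   # multiply along k, then collapse it

# Collapsing t1 along the k axis (axis=1) recovers t2
assert np.allclose(t1.sum(axis=1), t2)
# And t2 is exactly ordinary matrix multiplication
assert np.allclose(t2, A @ B)
```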
---
Date: 20230720
Links to:
Tags:
References:
* [Einsum Is All You Need: NumPy, PyTorch and TensorFlow - YouTube](https://www.youtube.com/watch?v=pkVwUVEHmfI)