# Hash Maps
The key idea to keep in mind with hash maps is that under the hood they are using a **list**, but the ingenuity stems from their clever use of two things:
1. Hash functions
2. Compression functions
A hash function takes in some *immutable* value (whether it be an integer, string, tuple, etc), and maps it to an *integer*.
A compression function then takes this integer and maps it to some *index* of the underlying list that will contain our values. The usefulness of a hashmap comes from the fact that they are very fast, generally having lookup and insertion time of O(1).
Let's think about why this is so fast. Consider a hash map that was built on top of a list containing 1000 slots. If we were looking to see if some name, say `John`, was in the list, we would need to scan the entire list, which is O(n) (where `n` is the length of the list). Where hash functions come in is they allow us to take the name `John` and map it to an index very quickly. Specifically the two steps look like:
```python
my_hash_list = []
# Map 'John' to an integer
>> hash('John')
-1005131343229729964
# Map this integer to a bucket of our list
def compress(i):
a = 11
b = 13
p = 109345121 # prime number larger than N
N = 1000 # num slots in list
return ((a * i + b) % p) % N
>> compress(-1005131343229729964)
390
my_hash_list[390] = 'John'
```
Now, we have an operation that is very fast! This ability to hash and then compression some key, such as `John`, only take several operations (for instance, our `compress` function requires `1` multiplication, `1` addition, and `2` mods, for a total of `4` operations. This is *constant* for any key that needs to be compressed).
What we *must* keep in mind though is that the entire usefulness of a hash function comes from the fact that they map *uniformly random* to indices of the list. If this process is not random, then we will end up with many values being mapped to the same index.
### Meta idea
One of the cool things to keep in mind here is that we do *not* need to "keep track" of where are particular key resides. In other words, we don't have some "table" that we go to an ask: "What index does `John` reside at?" and the table responds, "Ah `John` is at index `390`". In a sense, we have pushed that work off to the **mathematical process of hashing and compressing**. And ya know what, if we think about it in some ways we are "asking" what index `John` resides at, but the answer is algorithmically computable It lives in a space that our function captures.
### General Properties
Hash functions typically have these key properties:
* They map some type of input, such as strings or floats, to discrete values, such as integers.
* They’re designed so that two inputs will result in hash outputs that are either different or the same based on key properties of the inputs.
### Why do hash functions use prime numbers?
See [here](https://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/) as well as [this notebook](https://github.com/NathanielDake/intuitiveml/blob/master/notebooks/computer_science/Algorithms_and_data_structures_in_python/maps_and_dictionaries/hash_collisions_and_primes.ipynb). Note that if the data are random then we do not need to use a prime. However, if the data exhibit any sort of pattern then a prime will be very useful in spreading out collisions (see [here](https://qr.ae/pGJaSN)).
---
Date: 20210813
Links to:
Tags:
References:
* [Pythons default hash algorithm](https://andrewbrookins.com/technology/pythons-default-hash-algorithm/)