# Distillation
Distillation is where you have a **teacher** network that does a task very well but is slow or memory-hungry, and you train a smaller **student** network to do the same task faster and with less memory.
Distillation works in large part because the teacher gives you access to **intermediate outputs** (its logits, softened probability distributions, and layer activations) that the original labeled dataset never contained. These can be paired with the inputs to construct input/output pairs that carry a much richer training signal than hard labels alone, and the student is trained on these pairs.
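
A minimal sketch of the classic soft-target (logit) distillation loss. PyTorch is assumed, and the architectures, temperature `T`, and mixing weight `alpha` below are placeholder choices for illustration, not values from this note:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative architectures: a large teacher and a much smaller student.
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend of (1) KL divergence between temperature-softened teacher and
    student distributions and (2) ordinary cross-entropy on hard labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_preds = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales the soft term so its gradient magnitude is comparable
    # to the hard-label term.
    kd = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * T ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # stand-in batch

with torch.no_grad():          # teacher is frozen; only the student trains
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()
optimizer.step()
```

Note that the teacher only ever runs in inference mode here; its softened outputs are the extra input/output pairs the student learns from.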
---
Date: 20230712
Links to:
Tags:
References:
* []()