# Distillation

Distillation is when you have a **teacher** network that performs a task very well but is slow or memory-hungry, and you train a smaller **student** network to do the same task faster or with less memory. Distillation works in large part because you now have access to the teacher's **intermediate outputs** (e.g. its softened logits or internal activations) that you would not otherwise have. These can be used to construct input/output pairs, which in turn are used to train the student.
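As a concrete sketch of the input/output-pair idea: below is a minimal distillation training step in PyTorch (an assumed framework, not specified in this note). The student is trained to match the teacher's temperature-softened output distribution alongside the hard labels; `teacher`, `student`, `T`, and `alpha` are illustrative placeholders.

```python
# Minimal knowledge-distillation step (sketch). Assumes a classification task
# where both networks return logits; all names here are hypothetical.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, x, y, T=2.0, alpha=0.5):
    with torch.no_grad():
        teacher_logits = teacher(x)      # the teacher's "intermediate outputs"
    student_logits = student(x)

    # Soft targets: KL divergence between temperature-softened distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Hard targets: ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, y)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

---
Date: 20230712
Links to:
Tags:
References:
* []()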