
Distillation

knowledge_distillation

Knowledge Distillation is a machine learning technique that "distills" the knowledge of a large, complex model (the teacher model) into a small, compact model (the student model), compressing and speeding up the model while preserving as much of the original performance as possible. This makes it practical to run models efficiently on resource-constrained hardware such as phones or embedded devices.

The method works by adding an extra loss term, computed from the softmax output of the teacher network, to the traditional cross-entropy loss. The assumption is that the output activations of a properly trained teacher network carry additional information that a student network can exploit during training.
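
A common way to write this objective (following Hinton et al., 2015) is L = α · CE(student logits, labels) + (1 − α) · T² · KL(softmax(teacher / T) ‖ softmax(student / T)), where T is a temperature that softens both distributions. Below is a minimal PyTorch-style sketch of such a loss; the names (`distillation_loss`, `T`, `alpha`) and default values are illustrative assumptions, not taken from the original post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.5):
    """Combine hard-label cross entropy with a soft-target distillation term.

    student_logits, teacher_logits: (batch, num_classes) raw model outputs.
    labels: (batch,) integer class labels.
    T: temperature used to soften both softmax distributions (assumed value).
    alpha: weight on the hard-label cross-entropy term (assumed value).
    """
    # Standard cross entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # KL divergence between the softened teacher and student distributions.
    # The T**2 factor keeps gradient magnitudes roughly comparable across
    # different temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In a training loop, the teacher is run in evaluation mode (and typically under `torch.no_grad()`) to produce `teacher_logits`, while only the student's parameters receive gradients from this combined loss.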