KD for Vision Transformer

Knowledge Distillation

Knowledge distillation has become a common technique for obtaining better results with student models in both industry and academia. Student models learn implicit information by imitating the logit outputs and intermediate-layer features of a teacher model. This article summarizes and demonstrates my work on “Transformer model distillation” at IDL, Baidu Research, from 2022-04-13 to 2022-07-10. (As Prof. Wang said, my work is more like a survey than a conference paper. 😢)
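To make the logit-imitation part concrete, here is a minimal sketch of the classic soft-label distillation loss, assuming a PyTorch setup; the function name and temperature value are illustrative, not taken from the original work:

```python
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Soften both distributions; a higher temperature exposes the teacher's
    # "dark knowledge" about relative class similarities.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 so the gradient
    # magnitude stays comparable to the hard-label cross-entropy term.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels, weighted by a hyperparameter.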

A central issue in knowledge distillation is how to transfer information from the teacher model to the student model. In CNNs, we can align their logit outputs as well as their intermediate features. We can do the same in Transformers, but we also want to align the teacher and student at a finer granularity and dig deeper into the information flow between different layers.
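As one illustration of aligning intermediate features when the teacher and student have different hidden widths, here is a minimal PyTorch sketch; the class name and the choice of a linear projection with an MSE loss are assumptions for illustration, not a specific method from the internship work:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlignLoss(nn.Module):
    """Match a student's intermediate features to a teacher's.

    A learned linear projection maps the student's hidden size to the
    teacher's, so an MSE loss applies even when the widths differ.
    """
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # Features shaped (batch, num_tokens, dim), e.g. ViT patch tokens
        # taken from a chosen pair of teacher/student layers.
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())
```

Which teacher layer each student layer imitates, and whether token features or attention maps are matched, is exactly the finer-grained design space discussed above.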

Transformer

Transformer framework