Feature: Add knowledge distillation support #2595


Open
mrT23 wants to merge 6 commits into huggingface:main from mrT23:master

Conversation

mrT23 (Contributor) commented on Oct 15, 2025 (edited)

Knowledge distillation is one of the most effective techniques for achieving state-of-the-art results in model pre-training and finetuning. It has become a standard component in nearly every competitive paper and effectively sets the gold standard for top-tier performance.

Additionally, when training with knowledge distillation, the sensitivity to training hyperparameters and the reliance on training tricks are significantly reduced.

This pull request adds simple support for training with knowledge distillation on ImageNet.
The proposed implementation is straightforward, clean, and robust, building on the methodology of https://arxiv.org/abs/2204.03475 and the reference implementation at https://github.com/Alibaba-MIIL/Solving_ImageNet.
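For context, a minimal sketch of the kind of distillation objective this enables (illustrative only, not the PR's actual code): the student is trained on a weighted sum of the usual cross-entropy loss and a temperature-softened KL divergence against the teacher's logits.

```python
# Illustrative sketch of a standard soft-label distillation loss (not the PR's exact code).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, temperature=4.0):
    # Hard-label term: standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between temperature-softened student and teacher
    # distributions, scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction='batchmean',
    ) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd
```

In a training loop the teacher would run in eval mode under torch.no_grad(), with its input re-normalized to the teacher's mean/std whenever those differ from the student's (the point discussed further down in this thread).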

mrT23 changed the title Add knowledge distillation model and loss function support → Feature: Add knowledge distillation support on Oct 15, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

mrT23 (Contributor, Author) commented on Oct 21, 2025

Merged #2598 into this PR.

mrT23 (Contributor, Author) commented on Oct 27, 2025

@rwightman is anything else needed for merging?

rwightman (Collaborator) commented

@mrT23 well, I'd long had it in mind that I should alter the train interface a bit if I ever supported distill, so I have a few additions for that on top of my other PR... still testing some things.

On your input 're-normalization': is that actually correct? It should be this, no?

 # Undo the student normalization and re-apply the teacher (kd model) normalization in one affine step.
 mean_student = torch.as_tensor(mean_student).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 std_student = torch.as_tensor(std_student).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 mean_teacher = torch.as_tensor(self.mean_model_kd).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 std_teacher = torch.as_tensor(self.std_model_kd).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 input_kd = input * std_student / std_teacher + (mean_student - mean_teacher) / std_teacher

In the current code, the mean offset misses the rescale by the teacher std, no?

 # Current code: scales by std_student / std_teacher, but the mean offset is not divided by the teacher std.
 std = tuple(t_std / t_mean for t_std, t_mean in zip(self.std_model_kd, std_student))
 transform_std = T.Normalize(mean=(0, 0, 0), std=std)
 mean = tuple(t_mean - s_mean for t_mean, s_mean in zip(self.mean_model_kd, mean_student))
 transform_mean = T.Normalize(mean=mean, std=(1, 1, 1))
 input_kd_old = transform_mean(transform_std(input))
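For reference, a small standalone check (a sketch, not part of the PR; the student/teacher normalization constants below are purely illustrative) that works through the algebra and compares the two versions numerically:

```python
# Sketch only: if x = (raw - mean_s) / std_s, then the teacher input should be
#   (raw - mean_t) / std_t = x * std_s / std_t + (mean_s - mean_t) / std_t,
# i.e. the mean offset is also divided by the teacher std.
import torch
import torchvision.transforms as T

mean_student, std_student = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)  # illustrative
mean_teacher, std_teacher = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)              # illustrative

x = torch.rand(3, 8, 8)  # pretend this is already normalized with the student mean/std

# Proposed formula: mean offset divided by the teacher std.
ms, ss = torch.tensor(mean_student).view(-1, 1, 1), torch.tensor(std_student).view(-1, 1, 1)
mt, st = torch.tensor(mean_teacher).view(-1, 1, 1), torch.tensor(std_teacher).view(-1, 1, 1)
x_new = x * ss / st + (ms - mt) / st

# Current approach: correct rescale of the input, but the mean offset is not divided by the teacher std.
std = tuple(t / s for t, s in zip(std_teacher, std_student))
mean = tuple(t - s for t, s in zip(mean_teacher, mean_student))
x_old = T.Normalize(mean=mean, std=(1, 1, 1))(T.Normalize(mean=(0, 0, 0), std=std)(x))

print(torch.allclose(x_new, x_old))  # False whenever the teacher std is not (1, 1, 1)
```

The two only agree when the teacher std is (1, 1, 1), which matches the missing-rescale observation above.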
