Feature: Add knowledge distillation support #2595


Open
mrT23 wants to merge 6 commits into huggingface:main from mrT23:master

Conversation

mrT23 (Contributor) commented on Oct 15, 2025 (edited)

Knowledge distillation is one of the most effective techniques for achieving state-of-the-art results in model pre-training and finetuning. It has become a standard component in nearly every competitive paper and effectively sets the gold standard for top-tier performance.

Additionally, when training with knowledge distillation, the sensitivity to training hyperparameters and the reliance on training tricks are significantly reduced.

This pull request adds simple support for training with knowledge distillation on ImageNet.
The proposed implementation is straightforward, clean, and robust, building on the methodology of https://arxiv.org/abs/2204.03475 and the reference implementation at https://github.com/Alibaba-MIIL/Solving_ImageNet.
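For context, a minimal sketch of the kind of distillation objective this enables (illustrative only, not the PR's actual code): the student is trained on a weighted sum of the usual cross-entropy loss and a temperature-softened KL divergence against the teacher's logits.

```python
# Illustrative sketch of a standard soft-label distillation loss (not the PR's exact code).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, temperature=4.0):
    # Hard-label term: standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between temperature-softened student and teacher
    # distributions, scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction='batchmean',
    ) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd
```

In a training loop the teacher would run in eval mode under torch.no_grad(), with its input re-normalized to the teacher's mean/std whenever those differ from the student's (the point discussed further down in this thread).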

mrT23 changed the title Add knowledge distillation model and loss function support → Feature: Add knowledge distillation support on Oct 15, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

mrT23 (Contributor, Author) commented on Oct 21, 2025

Merged #2598 into this PR.

mrT23 (Contributor, Author) commented on Oct 27, 2025

@rwightman is anything else needed for merging?

rwightman (Collaborator) commented

@mrT23 well, I'd long had it in mind that I should alter the train interface a bit if I ever supported distill, so I have a few additions for that on top of my other PR... still testing some things.

On your input 're-normalization': is that actually correct? It should be this, no?

 # Undo the student normalization and re-apply the teacher (kd model) normalization in one affine step.
 mean_student = torch.as_tensor(mean_student).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 std_student = torch.as_tensor(std_student).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 mean_teacher = torch.as_tensor(self.mean_model_kd).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 std_teacher = torch.as_tensor(self.std_model_kd).to(device=input.device, dtype=input.dtype).view(-1, 1, 1)
 input_kd = input * std_student / std_teacher + (mean_student - mean_teacher) / std_teacher

In the current code, the mean offset misses the rescale by the teacher std, no?

 # Current code: scales by std_student / std_teacher, but the mean offset is not divided by the teacher std.
 std = tuple(t_std / t_mean for t_std, t_mean in zip(self.std_model_kd, std_student))
 transform_std = T.Normalize(mean=(0, 0, 0), std=std)
 mean = tuple(t_mean - s_mean for t_mean, s_mean in zip(self.mean_model_kd, mean_student))
 transform_mean = T.Normalize(mean=mean, std=(1, 1, 1))
 input_kd_old = transform_mean(transform_std(input))
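For reference, a small standalone check (a sketch, not part of the PR; the student/teacher normalization constants below are purely illustrative) that works through the algebra and compares the two versions numerically:

```python
# Sketch only: if x = (raw - mean_s) / std_s, then the teacher input should be
#   (raw - mean_t) / std_t = x * std_s / std_t + (mean_s - mean_t) / std_t,
# i.e. the mean offset is also divided by the teacher std.
import torch
import torchvision.transforms as T

mean_student, std_student = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)  # illustrative
mean_teacher, std_teacher = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)              # illustrative

x = torch.rand(3, 8, 8)  # pretend this is already normalized with the student mean/std

# Proposed formula: mean offset divided by the teacher std.
ms, ss = torch.tensor(mean_student).view(-1, 1, 1), torch.tensor(std_student).view(-1, 1, 1)
mt, st = torch.tensor(mean_teacher).view(-1, 1, 1), torch.tensor(std_teacher).view(-1, 1, 1)
x_new = x * ss / st + (ms - mt) / st

# Current approach: correct rescale of the input, but the mean offset is not divided by the teacher std.
std = tuple(t / s for t, s in zip(std_teacher, std_student))
mean = tuple(t - s for t, s in zip(mean_teacher, mean_student))
x_old = T.Normalize(mean=mean, std=(1, 1, 1))(T.Normalize(mean=(0, 0, 0), std=std)(x))

print(torch.allclose(x_new, x_old))  # False whenever the teacher std is not (1, 1, 1)
```

The two only agree when the teacher std is (1, 1, 1), which matches the missing-rescale observation above.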
