feat(scheduler): Add scale_betas_for_timesteps to DDPMScheduler #12341

Conversation
Hi, I wanted to add a quick comment with some further justification for this change. The PR addresses a subtle but critical issue where changing `num_train_timesteps` can cause the asymptotic variance of the forward process to deviate from unity, breaking the assumption that $x_T \sim \mathcal{N}(0, I)$.
Problem
Using the standard notation $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_T = \prod_{t=1}^{T} \alpha_t$, the forward process gives $x_T = \sqrt{\bar{\alpha}_T}\, x_0 + \sqrt{1 - \bar{\alpha}_T}\, \epsilon$, so the variance of the final state is $\mathrm{Var}(x_T) = 1 - \bar{\alpha}_T$. Since $\bar{\alpha}_T = \prod_t (1 - \beta_t) \approx \exp(-\sum_t \beta_t)$, obtaining $\mathrm{Var}(x_T) \approx 1$ requires $\sum_t \beta_t$ to be large enough that $\bar{\alpha}_T \approx 0$.
How the Current Implementation Fails
The current implementation leads to an inconsistent prior when `num_train_timesteps` is changed from its default:
- Default $T = 1000$: $\sum_i \beta_i \approx 10.05$ and $\mathrm{Var}(x_T) \approx 1$.
- Naive change to $T = 200$: $\sum_i \beta_i \approx 2.01$, which is far too small. This results in $\mathrm{Var}(x_T) \approx 0.87$, so the prior is incorrect, which can cause significant issues during the reverse sampling process.
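These figures are easy to verify numerically. The following is a minimal sketch using plain numpy (not the scheduler itself) to reproduce them:

```python
import numpy as np

def linear_schedule_stats(T, beta_start=0.0001, beta_end=0.02):
    """Return the sum of betas and terminal alpha_bar for a linear schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar_T = np.prod(1.0 - betas)
    return betas.sum(), alpha_bar_T

for T in (1000, 200):
    beta_sum, alpha_bar_T = linear_schedule_stats(T)
    print(f"T={T}: sum(betas)={beta_sum:.2f}, Var(x_T)={1 - alpha_bar_T:.2f}")
# T=1000: sum(betas)=10.05, Var(x_T)=1.00
# T=200:  sum(betas)=2.01,  Var(x_T)=0.87
```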
How the Proposed Fix Works
The proposed fix of scaling `beta_end` ensures the sum of betas remains approximately constant, thereby preserving the unit variance of the final state.
- With $T = 200$, `beta_end` is scaled to 0.1 and $\sum_i \beta_i \approx 10.01$.
- This ensures $\bar{\alpha}_T \approx 0$ and in turn $\mathrm{Var}(x_T) \approx 1$, preserving the mathematical integrity of the diffusion process.
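Running the same numpy check with the scaled `beta_end` confirms this (again a sketch, independent of the scheduler code):

```python
import numpy as np

T = 200
beta_end_scaled = 0.02 * (1000 / T)  # the proposed heuristic -> 0.1
betas = np.linspace(0.0001, beta_end_scaled, T)
alpha_bar_T = np.prod(1.0 - betas)
print(f"sum(betas)={betas.sum():.2f}, Var(x_T)={1 - alpha_bar_T:.2f}")
# sum(betas)=10.01, Var(x_T)=1.00
```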
With the flag enabled, the scheduler produces a theoretically sound noise schedule, preventing users from having to manually correct for variance issues when experimenting with different numbers of training timesteps.
What does this PR do?
This PR introduces a new boolean flag,
scale_betas_for_timesteps
, to theDDPMScheduler
. This flag provides an optional, more robust way to handle the beta schedule whennum_train_timesteps
is set to a value other than the default of 1000.Motivation and Context
The default parameters for the `DDPMScheduler` (`beta_start=0.0001`, `beta_end=0.02`) are implicitly tuned for `num_train_timesteps=1000`. This creates a potential "usability trap" for practitioners who may change the number of training timesteps without realizing they should also adjust the beta range.
- If `num_train_timesteps` is set to a large value (e.g., 4000), the linear beta schedule becomes too shallow, and noise is added too slowly.
- If `num_train_timesteps` is set to a small value (e.g., 200), the schedule becomes too steep, and noise is added too aggressively.

Both scenarios can lead to suboptimal training performance that is difficult to debug.
Proposed Solution
This PR introduces an opt-in solution to this problem.
- A new flag, `scale_betas_for_timesteps`, is added to the scheduler's `__init__` method.
- It defaults to `False` to ensure 100% backward compatibility with existing code.
- When set to `True`, it automatically scales the `beta_end` parameter using a simple heuristic (`beta_end * (1000 / num_train_timesteps)`). This ensures that the overall noise schedule remains sensible and robust, regardless of the number of training steps chosen by the user.
- The scaled `beta_end` is used by schedules dependent on it (e.g., `linear`, `scaled_linear`), while schedules that do not use this parameter (e.g., `squaredcos_cap_v2`) are naturally unaffected.

This change makes the scheduler more intuitive and helps prevent common configuration errors. A usage sketch follows below.
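For illustration, here is how the flag would be used once merged (a hypothetical usage sketch based on the description above; the flag does not exist in current diffusers releases):

```python
from diffusers import DDPMScheduler

# Default behavior is unchanged: the flag defaults to False.
scheduler = DDPMScheduler(num_train_timesteps=200)

# Opt-in scaling proposed in this PR: beta_end is rescaled by
# 1000 / num_train_timesteps, i.e. 0.02 * (1000 / 200) = 0.1.
scheduler_scaled = DDPMScheduler(
    num_train_timesteps=200,
    beta_schedule="linear",
    scale_betas_for_timesteps=True,  # proposed flag
)
```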
Fixes # (issue)
Who can review?
As suggested by the contribution guide for schedulers: @yiyixuxu