add Beta Schedule #811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Draft

Green-Sky wants to merge 1 commit into leejet:master

from Green-Sky:scheduler_beta_phil2sat

Draft

add Beta Schedule #811

Green-Sky wants to merge 1 commit into leejet:master from Green-Sky:scheduler_beta_phil2sat

Conversation

Green-Sky

Copy link

Contributor

@Green-Sky Green-Sky commented Sep 10, 2025 •

edited

Loading

Submitted by @phil2sat in #777

output (Chroma1-HD-Flash-Q4_K_S)

TODO:

model generation note?

@Green-Sky

Copy link

Contributor Author

Green-Sky commented Sep 10, 2025

BETA sigmas for 8steps:

from initial code.

@Green-Sky

Copy link

Contributor Author

Green-Sky commented Sep 10, 2025 •

edited

Loading

What i realized is that this scheduler allows for way too many variations that make no real sense. The paper only really uses alpha/beta 0.5 and diffusion practitioners seem to almost always use 0.6 for both. What I realized while looking at the functions, is that it almost looks exactly like a smoothstep/smootherstep (or rather the inverse of it).
So it would probably make sense to just use that and call it slightly differently (beta_smoothstep) or something instead.

Beta(1/2, 1/2) is equivalent to the arcsine distribution.

arcsine dist cdf:
image

inverse smoothstep plot:
image

f(x)	beta(0.6, 0.6) ppf	smoothstep
f(1.000000)	1.000000	1.000000
f(0.875000)	0.942844	0.957031
f(0.750000)	0.824320	0.843750
f(0.625000)	0.670619	0.683594
f(0.500000)	0.500000	0.500000
f(0.375000)	0.329381	0.316406
f(0.250000)	0.175680	0.156250
f(0.125000)	0.057156	0.042969

beta(0.6,0.6) ppf	smoothstep
output	output

(Chroma1-HD-Flash-Q4_K_S)

@phil2sat

Copy link

phil2sat commented Sep 10, 2025

struct BetaSchedule : SigmaSchedule {
 static constexpr double alpha = 0.6;
 static constexpr double beta = 0.6;
 // Log Beta function
 static double log_beta(double a, double b) {
 return std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
 }
 // Regularized incomplete beta function using continued fraction
 static double incbeta(double x, double a, double b) {
 if (x <= 0.0) return 0.0;
 if (x >= 1.0) return 1.0;
 // Use the continued fraction approximation (Lentz’s method)
 const int MAX_ITER = 200;
 const double EPSILON = 3.0e-7;
 double aa, c, d, del, h;
 double qab = a + b;
 double qap = a + 1.0;
 double qam = a - 1.0;
 c = 1.0;
 d = 1.0 - qab * x / qap;
 if (std::fabs(d) < 1e-30) d = 1e-30;
 d = 1.0 / d;
 h = d;
 for (int m = 1; m <= MAX_ITER; m++) {
 int m2 = 2 * m;
 // Even term
 aa = m * (b - m) * x / ((qam + m2) * (a + m2));
 d = 1.0 + aa * d;
 if (std::fabs(d) < 1e-30) d = 1e-30;
 c = 1.0 + aa / c;
 if (std::fabs(c) < 1e-30) c = 1e-30;
 d = 1.0 / d;
 h *= d * c;
 // Odd term
 aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
 d = 1.0 + aa * d;
 if (std::fabs(d) < 1e-30) d = 1e-30;
 c = 1.0 + aa / c;
 if (std::fabs(c) < 1e-30) c = 1e-30;
 d = 1.0 / d;
 del = d * c;
 h *= del;
 if (std::fabs(del - 1.0) < EPSILON) break;
 }
 return std::exp(a * std::log(x) + b * std::log(1.0 - x) - log_beta(a, b)) / a * h;
 }
 // Beta CDF using symmetry for better convergence
 static double beta_cdf(double x, double a, double b) {
 if (x == 0.0) return 0.0;
 if (x == 1.0) return 1.0;
 if (x < (a + 1.0) / (a + b + 2.0)) {
 return incbeta(x, a, b);
 } else {
 return 1.0 - incbeta(1.0 - x, b, a);
 }
 }
 // Inverse Beta CDF (PPF) using Newton-Raphson
 static double beta_ppf(double u, double a, double b, int max_iter = 30) {
 double x = 0.5; // initial guess
 for (int i = 0; i < max_iter; i++) {
 double f = beta_cdf(x, a, b) - u;
 if (std::fabs(f) < 1e-10) break;
 // derivative = x^(a-1) * (1-x)^(b-1) / B(a,b)
 double df = std::exp((a-1.0)*std::log(x) + (b-1.0)*std::log(1.0-x) - log_beta(a,b));
 x -= f / df;
 if (x <= 0.0) x = 1e-10;
 if (x >= 1.0) x = 1.0 - 1e-10;
 }
 return x;
 }
 std::vector<float> get_sigmas(uint32_t n, float /*sigma_min*/, float /*sigma_max*/, t_to_sigma_t t_to_sigma) override {
 std::vector<float> result;
 result.reserve(n + 1);
 int t_max = TIMESTEPS - 1;
 if (n == 0) return result;
 if (n == 1) {
 result.push_back(t_to_sigma((float)t_max));
 result.push_back(0.f);
 return result;
 }
 int last_t = -1;
 for (uint32_t i = 0; i < n; i++) {
 double u = 1.0 - double(i)/double(n); // reversed linspace
 double t_cont = beta_ppf(u, alpha, beta) * t_max;
 int t = (int)std::lround(t_cont);
 if (t != last_t) {
 result.push_back(t_to_sigma((float)t));
 last_t = t;
 }
 }
 result.push_back(0.f);
 return result;
 }
};

Thats the way to go... no speed loss no gain only some more lines

@Green-Sky

Copy link

Contributor Author

Green-Sky commented Sep 10, 2025 •

edited

Loading

image

The paper has those graphs here, indicating we might want to try 0.5 or 0.55 for low step chroma too.

ref: https://arxiv.org/abs/2407.12173

@Green-Sky Green-Sky force-pushed the scheduler_beta_phil2sat branch from f382a48 to 5635b0e Compare

September 10, 2025 13:57

@phil2sat

Copy link

phil2sat commented Sep 10, 2025

i dont think that makes a huge difference never seen beta configurable and other than 0.6 i guess 0.5 or 0.55 makes 3 of 1m pixel difference

@Green-Sky Green-Sky marked this pull request as ready for review

September 10, 2025 14:09

@wbruna

Copy link

Contributor

wbruna commented Sep 10, 2025

Just came up with an alternative, too: wbruna@2050ffe (looks like the same algorithm):

boost	`2050ffe`
test_beta_boost_1757513933	test_beta_local_1757513685

@Green-Sky Green-Sky marked this pull request as draft

September 10, 2025 14:50

@phil2sat @Green-Sky


 Beta Scheduler

d68873f

@Green-Sky Green-Sky force-pushed the scheduler_beta_phil2sat branch from 5635b0e to d68873f Compare

September 10, 2025 15:13

@phil2sat

Copy link

phil2sat commented Sep 10, 2025

Just came up with an alternative, too: wbruna@2050ffe (looks like the same algorithm):

boost 2050ffe
test_beta_boost_1757513933 test_beta_local_1757513685

For me that looks exactly different, lol, so like my first fake implementaion without boost or simple, compare it with the actual implementaion its exact what boost does but without boost dependency and also with the same speed, you posted two different pics, on the first view it seem the same bit copare it with simple i guess thats more simple than beta

@Green-Sky Green-Sky mentioned this pull request

Sep 10, 2025

add SmoothStep Schedule #813

Merged

@phil2sat

Copy link

phil2sat commented Sep 10, 2025

maybe its also time for an simple comeback https://github.com/user-attachments/files/22256634/simple_beta.tar.gz

@wbruna

Copy link

Contributor

wbruna commented Sep 10, 2025

For me that looks exactly different, lol, so like my first fake implementaion without boost or simple, compare it with the actual implementaion its exact what boost does but without boost dependency and also with the same speed, you posted two different pics, on the first view it seem the same bit copare it with simple i guess thats more simple than beta

Well... yeah, of course they are. But the difference is in the finishing steps, so that points to a precision issue. If we mindlessly crank up the precision:

diff --git a/denoiser.hpp b/denoiser.hpp
index d841f03..541bb99 100644
--- a/denoiser.hpp
+++ b/denoiser.hpp
@@ -280,8 +280,8 @@ struct BetaDist {
 
 double x = u < 0.5 ? u * u : 1.0 - (1.0 - u) * (1.0 - u);
 
- const int max_iterations = 50;
- const double tolerance = 1e-12;
+ const int max_iterations = 1000;
+ const double tolerance = 1e-20;
 
 for (int i = 0; i < max_iterations; ++i) {
 double err = beta_cdf(x) - u;
@@ -333,8 +333,8 @@ private:
 double incomplete_beta(double a, double b, double x) {
 
 double f = 1.0, c = 1.0, d = 0.0;
- const int max_iterations = 200;
- const double tolerance = 1e-15;
+ const int max_iterations = 1000;
+ const double tolerance = 1e-20;
 
 for (int i = 0; i <= max_iterations; ++i) {
 int m = i / 2;

... we get the same sha256 between images generated by Boost and this implementation.

Let me just clarify why I posted it as-is:

if I came up independently with almost the same algorithm, and I got almost the same results as yours, that validates both our versions;
if the results were different, perhaps a look at what I did could point to an issue on the PR;
it was a test of the Boost implementation, too;
I coded it so it'd be easier to compare implementations, and that approach could be useful to test yours.

@Green-Sky

Copy link

Contributor Author

Green-Sky commented Sep 10, 2025 •

edited

Loading

I suspect a Cubic Bezier fit might be another simple solution.

a visualization of what I meant: https://thebookofshaders.com/edit.php?log=160414041933
code: http://www.flong.com/archive/texts/code/shapers_bez/

a similar function is also called a gain function.

@phil2sat

Copy link

phil2sat commented Sep 11, 2025 •

edited

Loading

for me the question is, after testing this actual pull, the code is about twice as long as boost version.
that doesnt matter that much, so why check for boost it makes the code as long as without.

on a modern gpu i guess there is absolute zero speed gain or loss, even on my gpu.
to be clear the image i generated took 15.19s/it beta without boost while simple needed 15.15s/it, that is in the range of GPU throtteling, temp or random.

so extra checking for boost is in my opinion not neccesary as the actual implemetation is exactly cloning what boost does.
keep in mind that if the 0.04s on a 4090 or faster gets 0.00.... somewhat and thats simple against beta so boost is 0.0000.... somewhat nobody can measure or is in the range of measuring tolerance, theoretically my AI says its slower, practically if you do 1m steps there is maybe 1s.

and i dont really know it this simple math is slower than following a pointer to external lib or statically make the bin twice as big, even with my slow gpu they do the same speed. (didnt check size difference)

for me it works, im fine so whats next, maybe "pertubed attention guidance" PAG ? makes SDXL much better.
qwen-image tested? i tried but didnt work... no qwen2.5 text encoder option, nice model like chroma but with even better text_encoder. was the last on comfyui i tested with.
or maybe the t5xxl unchained config.
im not so close to the code to know all since the last three days, maybe sub quad attention like comfyui, dont know the default attention in sd.cpp but thats the comfyui default.
flash attention doesnt work for me it makes steps need 4x time. gfx900 is missing something it needs
fa2 or 3 i cant even think about also sage attention

@Green-Sky Green-Sky changed the title ~~(削除) Beta Scheduler (削除ここまで)~~ (追記) Beta Schedule (追記ここまで)

Sep 11, 2025

@Green-Sky Green-Sky changed the title ~~(削除) Beta Schedule (削除ここまで)~~ (追記) add Beta Schedule (追記ここまで)

Sep 11, 2025

@MrSnichovitch MrSnichovitch mentioned this pull request

Sep 12, 2025

Feature Request: add "beta" schedule type to better support Chroma models #777

Closed

@phil2sat

Copy link

phil2sat commented Sep 12, 2025 •

edited

Loading

some comparison:

Chroma V47 heun 8-Step

Top Beta(#811) /Bottom Bezier(test)	Top SmoothStep(main)/Bottom Simple
3	2
12	4

sorry for low res but it takes ages, @Green-Sky your idea with bezier. the first test looks promising but i have to tweak it a little bit, details are finer but a little noise in hair. have to generate larger resolutions later if fine tuned.

sd -M img_gen -p "Character: one 18-year-old elven woman, long delicate pastel pink hair in intricate braids, short pointed ears, realistic skin, natural hair, realistic proportions, hyper-realistic, life-like, photorealistic. 
Pose: Standing gracefully in a mystical forest clearing beneath the sacred elven life tree, shy yet seductive expression, averted gaze, slight smile, biting lower lip, subtle arch in back, one hand on collarbone, leaning forward slightly.
Clothing: Flowing, light, ethereal elven garments with intricate details on fabric and hair.
Atmosphere: Inviting, cozy, magical, warm dimmed lighting filtering through trees, cinematic lighting, enchanting and immersive, hyper-realistic textures, subtle motion blur for natural realism.
Technical: Medium-wide shot, professional color photography, high-resolution, perfect composition, captured with Bolex H16." -n "illustration, anime, drawing, artwork, bad hands. artwork" --sampling-method heun --steps 8 --scheduler simple -W 512 -H 512 -b 1 --cfg-scale 1 -s 512944566 --clip-skip -1 --embd-dir /home/phil2sat/sd.cpp-webui/models/embeddings/ --lora-model-dir /home/phil2sat/sd.cpp-webui/models/loras/ -t 0 --rng cuda -o /home/phil2sat/sd.cpp-webui/outputs/txt2img/4.png --diffusion-model /home/phil2sat/sd.cpp-webui/models/unet/chroma-unlocked-v47-flash-heun-8-steps-cfg-1-Q8_0.gguf --vae /home/phil2sat/sd.cpp-webui/models/vae/flux-vae-Q8_0.gguf --t5xxl /home/phil2sat/sd.cpp-webui/models/clip/t5xxl-unchained-Q8_0.gguf --guidance 7 --color

I think i have maxed out Detail vs. Noise/Artifacts.
Actual Bezier implementation:

struct BezierSchedule : SigmaSchedule {
 // Inverse Quadratic Bézier from Don Lancaster
 static float quadraticBezier(float x, float a, float b) {
 const float epsilon = 0.00001f;
 a = std::clamp(a, 0.0f, 1.0f);
 b = std::clamp(b, 0.0f, 1.0f);
 if (a == 0.5f) a += epsilon;
 // Solve t from x (inverse quadratic Bézier)
 float om2a = 1.0f - 2.0f * a;
 float t = (std::sqrt(a*a + om2a*x) - a) / om2a;
 float y = (1.0f - 2.0f * b) * (t*t) + 2.0f * b * t;
 return y;
 }
 std::vector<float> get_sigmas(uint32_t n, float /*sigma_min*/, float /*sigma_max*/, t_to_sigma_t t_to_sigma) override {
 std::vector<float> result;
 result.reserve(n + 1);
 const int t_max = TIMESTEPS - 1;
 if (n == 0) {
 // Handle zero-step case: push max sigma and 0
 result.push_back(t_to_sigma(static_cast<float>(t_max)));
 result.push_back(0.f);
 return result;
 }
 // Parameters for a Flux-Chroma-like distribution.
 // These values were tuned experimentally to match the desired 8-step curve
 const float a = 0.38f; // Quadratic Bézier X control point
 const float b = 0.96f; // Quadratic Bézier Y control point
 for (uint32_t i = 0; i < n; i++) {
 // Uniform parameter in [0,1] for all steps
 float u = static_cast<float>(i) / static_cast<float>(n - 1);
 
 // Compute the inverted quadratic Bézier value and scale to timestep
 float t_cont = (1.0f - quadraticBezier(u, a, b)) * t_max;
 
 int t = static_cast<int>(std::lround(t_cont));
 result.push_back(t_to_sigma(static_cast<float>(t)));
 }
 // Ensure the last sigma value is zero (final step)
 result.push_back(0.f);
 return result;
 }
};

same parameters heun bezier 1024x1024 cant get higher, dang.
13

add Beta Schedule #811

Are you sure you want to change the base?

add Beta Schedule #811

Uh oh!

Conversation

@Green-Sky Green-Sky commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TODO:

Uh oh!

Green-Sky commented Sep 10, 2025

Uh oh!

Green-Sky commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phil2sat commented Sep 10, 2025

Uh oh!

Green-Sky commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phil2sat commented Sep 10, 2025

Uh oh!

wbruna commented Sep 10, 2025

Uh oh!

phil2sat commented Sep 10, 2025

Uh oh!

phil2sat commented Sep 10, 2025

Uh oh!

wbruna commented Sep 10, 2025

Uh oh!

Green-Sky commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phil2sat commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phil2sat commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

@Green-Sky Green-Sky commented Sep 10, 2025 •

edited

Loading

Green-Sky commented Sep 10, 2025 •

edited

Loading

Green-Sky commented Sep 10, 2025 •

edited

Loading

Green-Sky commented Sep 10, 2025 •

edited

Loading

phil2sat commented Sep 11, 2025 •

edited

Loading

phil2sat commented Sep 12, 2025 •

edited

Loading