Allow models to run without all text encoder(s) #645
Conversation
Thanks to this, one can now run Flux on an 8GB Android phone
@rmatif is your comment about this PR specifically? It kind of sounds unrelated.
BTW, did you try one of the flux 8B "lite" prunes?
https://huggingface.co/Green-Sky/flux.1-lite-8B-GGUF/tree/main/lora-experiments
Here with hyper-sd lora merged, for lower step count.
@Green-Sky I think @rmatif meant that with this PR it's possible to drop T5, which makes Flux fit in only 8GB of system memory.
This is exactly what I meant, sorry if I wasn't clear. With this PR, we can drop the heavy T5, so we can squeeze Flux into just an 8GB phone.
@Green-Sky I just tested Flux.1-lite, and the q4_k version can now also fit on those kinds of devices, although you can't run inference at resolutions larger than 512x512 due to the compute buffer. I bet q3_k will do just fine, though.
For now only Flux and SD3.x.
It just prints a warning instead of crashing when text encoders are missing, then proceeds without them.
TODOs (maybe in follow up PRs):
Comparisons:
SD3.5 Large Turbo (iq4_nl):
With t5_xxl:
Without t5_xxl:
Flux Schnell (iq4_nl imatrix):