Description
Git commit
Operating System & Version
Ubuntu 24.04
GGML backends
Vulkan
Command-line arguments used
./sd -r /home/user/shirt8.png --diffusion-model /home/user/sd.cpp-webui/models/unet/flux1-kontext-dev-Q8_0.gguf -W 736 -H 1024 --lora-model-dir /home/user/sd.cpp-webui/models/loras/ --vae /home/user/sd.cpp-webui/models/vae/ae.safetensors --clip_l /home/user/sd.cpp-webui/models/clip/clip_l.safetensors --t5xxl /home/user/sd.cpp-webui/models/clip/t5xxl_fp16.safetensors -p "lora:tryanything_flux_kontext_lora:1 try on this outfit, man, copy shirt" --cfg-scale 1.0 --sampling-method euler -v --clip-on-cpu -o /home/user/sd.cpp-webui/outputs/imgedit/2.png --vae-on-cpu
Steps to reproduce
All versions prior to commit b25785b (which syncs ggml) allocate the following compute buffer when running the command above: [DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM).
Starting with that commit, the buffer allocation jumps to [DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM), which overflows into GTT memory and makes generation extremely slow.
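To quantify the regression, one way is to run the same command with -v against builds of the commit before and after b25785b, capture the logs, and compare the reported buffer sizes. A minimal sketch for extracting the MB figure from the debug line (the sed pattern is an assumption based on the exact log format shown above):

```shell
# Pull the compute-buffer size (in MB) out of a sd.cpp debug log line.
# The log-line format here is taken from the output quoted above.
line='[DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM)'
size=$(echo "$line" | sed -n 's/.*compute buffer size: \([0-9.]*\) MB.*/\1/p')
echo "$size"
```

Running this against the saved log of each build makes the before/after difference (2955.44 MB vs 7822.92 MB) easy to confirm mechanically.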
What you expected to happen
VRAM usage to stay below 16 GB, with [DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM)
What actually happened
VRAM usage spiked to 21 GB with [DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM) on the same command
Logs / error messages / stack trace
No response
Additional context / environment details
No response