Description
Git commit
$ git rev-parse HEAD
d05e46c
Operating System & Version
Ubuntu 22.04
GGML backends
Vulkan
Command-line arguments used
./bin/sd --diffusion-model "/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf" --vae "/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors" --qwen2vl "/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf" --lora-model-dir "/home/user/ComfyUI-0.3.60/models/loras/qwen-image" -p "high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign "DRUNKARD" on bar with retrowave font. long pink nailpolish <lora:Samsung:1>" --cfg-scale 3.5 --steps 20 --sampling-method euler -v --offload-to-cpu -H 1024 -W 1024 --diffusion-fa --flow-shift 3 --vae-on-cpu --seed 244727409499015
Steps to reproduce
- Use the following models:
  - UNet: Qwen_Image-Q5_0.gguf
  - VAE: qwen_image_vae.safetensors
  - CLIP: Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf
  - LoRA: Samsung.safetensors (Link: https://civitai.com/models/1551668/samsungcam-ultrareal)
- Run the first command (without the LoRA) using --seed 244727409499015 and save the resulting output.png.
- Run the second command (with <lora:Samsung:1>) using the exact same --seed 244727409499015 and save the resulting output.png.
- Compare the two output images. They will be identical.
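The comparison in the last step can be made reproducible with a small script. The sketch below is not part of the original report and assumes the two results were saved under the hypothetical names no_lora.png and with_lora.png (both runs otherwise write to output.png). Hashing the whole file could mislead if the PNGs embed text metadata such as the prompt, so the sketch hashes only the chunks that carry pixel data (IHDR, PLTE, IDAT). Matching digests mean the compressed pixel streams are byte-identical. A mismatch alone does not prove the pixels differ, because the same pixels can be compressed in different ways.

```python
import hashlib
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def pixel_stream_digest(png_bytes):
    """Return a SHA-256 digest over the IHDR, PLTE and IDAT chunk payloads only."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    header, palette, idat_parts = b"", b"", []
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
    while pos + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            header = data
        elif chunk_type == b"PLTE":
            palette = data
        elif chunk_type == b"IDAT":
            idat_parts.append(data)  # IDAT payloads form one stream when joined
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length, type, payload and CRC
    digest = hashlib.sha256()
    for part in (header, palette, b"".join(idat_parts)):
        digest.update(struct.pack(">I", len(part)))  # length prefix avoids ambiguity
        digest.update(part)
    return digest.hexdigest()


def same_pixels(path_a, path_b):
    """True if both PNG files carry byte-identical pixel streams."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return pixel_stream_digest(fa.read()) == pixel_stream_digest(fb.read())
```

Calling same_pixels("no_lora.png", "with_lora.png") and getting True would support the observation that the LoRA had no effect on the result.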
What you expected to happen
The image generated with the LoRA (<lora:Samsung:1>) and seed 244727409499015 should be noticeably different from the image generated without the LoRA using the same seed.
The LoRA weights should be applied during the sampling process, altering the final image.
What actually happened
The LoRA appears to load correctly. The log output confirms it was found and "applied":
[INFO ] stable-diffusion.cpp:929 - lora 'Samsung' applied, taking 1.87s
[INFO ] stable-diffusion.cpp:969 - apply_loras completed, taking 1.87s
However, the final output.png generated with the LoRA is identical to the output.png generated without it, even though the only difference between the two runs is the LoRA (both use the same fixed seed).
This suggests the LoRA weights are not actually being used during the sampling steps.
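The LoRA-related log lines can be pulled out of a verbose run to confirm what the program reported. The sketch below is a hypothetical helper that is not part of stable-diffusion.cpp. Its regular expressions are based only on the message formats visible in the logs of this report and may not match other versions.

```python
import re

# Patterns copied from the message formats seen in this report's logs.
REQUESTED_RE = re.compile(r"attempting to apply (\d+) LoRAs")
TENSORS_RE = re.compile(r"\((\d+) / (\d+)\) LoRA tensors will be applied")
APPLIED_RE = re.compile(r"lora '([^']+)' applied")


def summarize_lora_log(log_text):
    """Collect what a verbose log reports about LoRA loading."""
    summary = {"requested": 0, "applied_names": [], "tensor_counts": []}
    for line in log_text.splitlines():
        match = REQUESTED_RE.search(line)
        if match:
            summary["requested"] = int(match.group(1))
        match = TENSORS_RE.search(line)
        if match:
            summary["tensor_counts"].append((int(match.group(1)), int(match.group(2))))
        match = APPLIED_RE.search(line)
        if match:
            summary["applied_names"].append(match.group(1))
    return summary
```

For the run with the LoRA, such a summary would show one requested LoRA, the name Samsung, and 480 of 480 tensors matched, so the log gives no hint that anything was skipped.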
Logs / error messages / stack trace
- output without lora:
./bin/sd --diffusion-model "/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf" --vae "/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors" --qwen2vl "/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf" -p "high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign "DRUNKARD" on bar with retrowave font. long pink nailpolish" --cfg-scale 3.5 --steps 20 --sampling-method euler -v --offload-to-cpu -H 1024 -W 1024 --diffusion-fa --flow-shift 3 --vae-on-cpu --seed 244727409499015
Option:
n_threads: 14
mode: img_gen
model_path:
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
qwen2vl_path: /home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf
qwen2vl_vision_path:
diffusion_model_path: /home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf
high_noise_diffusion_model_path:
vae_path: /home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
auto_resize_ref_image: true
increase_ref_index: false
offload_params_to_cpu: true
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: true
diffusion flash attention: true
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 3.50, img_cfg: 3.50, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
flow_shift: 3.00
strength(img2img): 0.75
rng: cuda
seed: 244727409499015
batch_count: 1
vae_tiling: false
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:151 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 0 = Radeon RX 7900 XT (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:207 - loading diffusion model from '/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf'
[INFO ] model.cpp:1097 - load /home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf using gguf format
[DEBUG] model.cpp:1114 - init from '/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf'
[INFO ] stable-diffusion.cpp:254 - loading qwen2vl from '/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf'
[INFO ] model.cpp:1097 - load /home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf using gguf format
[DEBUG] model.cpp:1114 - init from '/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf'
[INFO ] stable-diffusion.cpp:268 - loading vae from '/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors'
[INFO ] model.cpp:1100 - load /home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors using safetensors format
[DEBUG] model.cpp:1207 - init from '/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:289 - Version: Qwen Image
[INFO ] stable-diffusion.cpp:316 - Weight type stat: f32: 1422 | q5_0: 720 | q5_1: 120 | q4_K: 111 | q5_K: 25 | q6_K: 41 | iq4_xs: 20 | bf16: 6
[INFO ] stable-diffusion.cpp:317 - Conditioner weight type stat: f32: 141 | q4_K: 111 | q5_K: 25 | q6_K: 41 | iq4_xs: 20
[INFO ] stable-diffusion.cpp:318 - Diffusion model weight type stat: f32: 1087 | q5_0: 720 | q5_1: 120 | bf16: 6
[INFO ] stable-diffusion.cpp:319 - VAE weight type stat: f32: 194
[DEBUG] stable-diffusion.cpp:321 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:348 - Using flash attention in the diffusion model
[DEBUG] qwenvl.hpp:141 - merges size 151387
[DEBUG] qwenvl.hpp:163 - vocab size: 151665
[INFO ] qwen_image.hpp:539 - qwen_image_params.num_layers: 60
[DEBUG] ggml_extend.hpp:1754 - qwenvl2.5 params backend buffer size = 5918.09 MB(RAM) (338 tensors)
[DEBUG] ggml_extend.hpp:1754 - qwen_image params backend buffer size = 13733.54 MB(RAM) (1933 tensors)
[INFO ] stable-diffusion.cpp:478 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1754 - wan_vae params backend buffer size = 139.84 MB(RAM) (108 tensors)
[DEBUG] stable-diffusion.cpp:596 - loading weights
[DEBUG] model.cpp:2030 - using 14 threads for model loading
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf
|=======================================> | 1933/2465 - 1177.94it/s
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf
|==============================================> | 2271/2465 - 131.23it/s
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors
|==================================================| 2465/2465 - 137.66it/s
[INFO ] model.cpp:2351 - loading tensors completed, taking 17.92s (process: 0.02s, read: 17.18s, memcpy: 0.00s, convert: 0.09s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:679 - total params memory size = 19791.47MB (VRAM 19651.64MB, RAM 139.84MB): text_encoders 5918.09MB(VRAM), diffusion_model 13733.55MB(VRAM), vae 139.84MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:778 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:803 - finished loaded file
[DEBUG] stable-diffusion.cpp:2470 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2597 - TXT2IMG
[INFO ] stable-diffusion.cpp:949 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:969 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:970 - prompt after extract and remove lora: "high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish"
[DEBUG] conditioner.hpp:1432 - parse '<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish<|im_end|>
<|im_start|>assistant
', 1], ]
[INFO ] ggml_extend.hpp:1677 - qwenvl2.5 offload params (5918.09 MB, 338 tensors) to runtime backend (Vulkan0), taking 2.20s
[DEBUG] ggml_extend.hpp:1579 - qwenvl2.5 compute buffer size: 38.04 MB(VRAM)
[DEBUG] conditioner.hpp:1572 - computing condition graph completed, taking 2429 ms
[DEBUG] conditioner.hpp:1432 - parse '<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
<|im_end|>
<|im_start|>assistant
', 1], ]
[INFO ] ggml_extend.hpp:1677 - qwenvl2.5 offload params (5918.09 MB, 338 tensors) to runtime backend (Vulkan0), taking 2.20s
[DEBUG] ggml_extend.hpp:1579 - qwenvl2.5 compute buffer size: 7.24 MB(VRAM)
[DEBUG] conditioner.hpp:1572 - computing condition graph completed, taking 2312 ms
[INFO ] stable-diffusion.cpp:2208 - get_learned_condition completed, taking 4745 ms
[INFO ] stable-diffusion.cpp:2233 - sampling using Euler method
[INFO ] stable-diffusion.cpp:2327 - generating image: 1/1 - seed 244727409499015
[INFO ] ggml_extend.hpp:1677 - qwen_image offload params (13733.54 MB, 1933 tensors) to runtime backend (Vulkan0), taking 4.89s
[DEBUG] ggml_extend.hpp:1579 - qwen_image compute buffer size: 590.52 MB(VRAM)
|==================================================| 20/20 - 8.49s/it
[INFO ] stable-diffusion.cpp:2364 - sampling completed, taking 169.98s
[INFO ] stable-diffusion.cpp:2372 - generating 1 latent images completed, taking 170.96s
[INFO ] stable-diffusion.cpp:2375 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1579 - wan_vae compute buffer size: 7492.50 MB(RAM)
[DEBUG] stable-diffusion.cpp:1666 - computing vae decode graph completed, taking 75.65s
[INFO ] stable-diffusion.cpp:2385 - latent 1 decoded, taking 75.65s
[INFO ] stable-diffusion.cpp:2389 - decode_first_stage completed, taking 75.65s
[INFO ] stable-diffusion.cpp:2709 - generate_image completed in 251.37s
save result PNG image to 'output.png'
- output with lora:
./bin/sd --diffusion-model "/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf" --vae "/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors" --qwen2vl "/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf" --lora-model-dir "/home/user/ComfyUI-0.3.60/models/loras/qwen-image" -p "high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign "DRUNKARD" on bar with retrowave font. long pink nailpolish <lora:Samsung:1>" --cfg-scale 3.5 --steps 20 --sampling-method euler -v --offload-to-cpu -H 1024 -W 1024 --diffusion-fa --flow-shift 3 --vae-on-cpu --seed 244727409499015
Option:
n_threads: 14
mode: img_gen
model_path:
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
qwen2vl_path: /home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf
qwen2vl_vision_path:
diffusion_model_path: /home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf
high_noise_diffusion_model_path:
vae_path: /home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
auto_resize_ref_image: true
increase_ref_index: false
offload_params_to_cpu: true
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: true
diffusion flash attention: true
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish <lora:Samsung:1>
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 3.50, img_cfg: 3.50, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
flow_shift: 3.00
strength(img2img): 0.75
rng: cuda
seed: 244727409499015
batch_count: 1
vae_tiling: false
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:151 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 0 = Radeon RX 7900 XT (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:207 - loading diffusion model from '/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf'
[INFO ] model.cpp:1097 - load /home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf using gguf format
[DEBUG] model.cpp:1114 - init from '/home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf'
[INFO ] stable-diffusion.cpp:254 - loading qwen2vl from '/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf'
[INFO ] model.cpp:1097 - load /home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf using gguf format
[DEBUG] model.cpp:1114 - init from '/home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf'
[INFO ] stable-diffusion.cpp:268 - loading vae from '/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors'
[INFO ] model.cpp:1100 - load /home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors using safetensors format
[DEBUG] model.cpp:1207 - init from '/home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:289 - Version: Qwen Image
[INFO ] stable-diffusion.cpp:316 - Weight type stat: f32: 1422 | q5_0: 720 | q5_1: 120 | q4_K: 111 | q5_K: 25 | q6_K: 41 | iq4_xs: 20 | bf16: 6
[INFO ] stable-diffusion.cpp:317 - Conditioner weight type stat: f32: 141 | q4_K: 111 | q5_K: 25 | q6_K: 41 | iq4_xs: 20
[INFO ] stable-diffusion.cpp:318 - Diffusion model weight type stat: f32: 1087 | q5_0: 720 | q5_1: 120 | bf16: 6
[INFO ] stable-diffusion.cpp:319 - VAE weight type stat: f32: 194
[DEBUG] stable-diffusion.cpp:321 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:348 - Using flash attention in the diffusion model
[DEBUG] qwenvl.hpp:141 - merges size 151387
[DEBUG] qwenvl.hpp:163 - vocab size: 151665
[INFO ] qwen_image.hpp:539 - qwen_image_params.num_layers: 60
[DEBUG] ggml_extend.hpp:1754 - qwenvl2.5 params backend buffer size = 5918.09 MB(RAM) (338 tensors)
[DEBUG] ggml_extend.hpp:1754 - qwen_image params backend buffer size = 13733.54 MB(RAM) (1933 tensors)
[INFO ] stable-diffusion.cpp:478 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1754 - wan_vae params backend buffer size = 139.84 MB(RAM) (108 tensors)
[DEBUG] stable-diffusion.cpp:596 - loading weights
[DEBUG] model.cpp:2030 - using 14 threads for model loading
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/unet/Qwen_Image-Q5_0.gguf
|=======================================> | 1933/2465 - 33.63it/s
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/clip/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf
|==============================================> | 2271/2465 - 38.52it/s
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/vae/qwen_image_vae.safetensors
|==================================================| 2465/2465 - 41.67it/s
[INFO ] model.cpp:2351 - loading tensors completed, taking 59.17s (process: 0.02s, read: 57.54s, memcpy: 0.00s, convert: 0.11s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:679 - total params memory size = 19791.47MB (VRAM 19651.64MB, RAM 139.84MB): text_encoders 5918.09MB(VRAM), diffusion_model 13733.55MB(VRAM), vae 139.84MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:778 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:803 - finished loaded file
[DEBUG] stable-diffusion.cpp:2470 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2597 - TXT2IMG
[DEBUG] stable-diffusion.cpp:964 - lora Samsung:1.00
[INFO ] stable-diffusion.cpp:949 - attempting to apply 1 LoRAs
[INFO ] model.cpp:1100 - load /home/user/ComfyUI-0.3.60/models/loras/qwen-image/Samsung.safetensors using safetensors format
[DEBUG] model.cpp:1207 - init from '/home/user/ComfyUI-0.3.60/models/loras/qwen-image/Samsung.safetensors', prefix = ''
[INFO ] lora.hpp:120 - loading LoRA from '/home/user/ComfyUI-0.3.60/models/loras/qwen-image/Samsung.safetensors'
[DEBUG] model.cpp:2030 - using 14 threads for model loading
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/loras/qwen-image/Samsung.safetensors
|==================================================| 480/480 - 480000.00it/s
[INFO ] model.cpp:2351 - loading tensors completed, taking 0.03s (process: 0.03s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] ggml_extend.hpp:1754 - lora params backend buffer size = 90.00 MB(VRAM) (480 tensors)
[DEBUG] model.cpp:2030 - using 14 threads for model loading
[DEBUG] model.cpp:2113 - loading tensors from /home/user/ComfyUI-0.3.60/models/loras/qwen-image/Samsung.safetensors
|==================================================| 480/480 - 2388.06it/s
[INFO ] model.cpp:2351 - loading tensors completed, taking 0.26s (process: 0.06s, read: 0.18s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] lora.hpp:175 - lora type: ".lora_down"/".lora_up"
[DEBUG] lora.hpp:177 - finished loaded lora
[DEBUG] lora.hpp:874 - (480 / 480) LoRA tensors will be applied
[DEBUG] ggml_extend.hpp:1579 - lora compute buffer size: 1593.56 MB(VRAM)
[DEBUG] lora.hpp:874 - (480 / 480) LoRA tensors will be applied
[INFO ] stable-diffusion.cpp:929 - lora 'Samsung' applied, taking 1.87s
[INFO ] stable-diffusion.cpp:969 - apply_loras completed, taking 1.87s
[DEBUG] stable-diffusion.cpp:970 - prompt after extract and remove lora: "high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish "
[DEBUG] conditioner.hpp:1432 - parse '<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish <|im_end|>
<|im_start|>assistant
' to [['<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
high quality raw candid photo. eastern european young woman, late teens, slim. detailed skin texture, detailed blue eyes. pinterest style. natural late afternoon sunlight, on an outdoor rooftop bar patio, thick eyebrows, long wavy dark brown hair, wearing a pink pvc crop top, a black skirt with a white floral pattern, visible makeup, black sunglasses on her head, a silver moon pendant on a black cord, a silver chain-link bracelet, sitting at a wooden table and holding cup of coffee in her right hand, her body angled towards the camera while her head is turned to her right, with a thinking expression with left hand near chin, gazing off-camera towards the city view in the background where a bar area is also visible, neon sign DRUNKARD on bar with retrowave font. long pink nailpolish <|im_end|>
<|im_start|>assistant
', 1], ]
[INFO ] ggml_extend.hpp:1677 - qwenvl2.5 offload params (5918.09 MB, 338 tensors) to runtime backend (Vulkan0), taking 2.13s
[DEBUG] ggml_extend.hpp:1579 - qwenvl2.5 compute buffer size: 38.23 MB(VRAM)
[DEBUG] conditioner.hpp:1572 - computing condition graph completed, taking 2354 ms
[DEBUG] conditioner.hpp:1432 - parse '<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
<|im_end|>
<|im_start|>assistant
', 1], ]
[INFO ] ggml_extend.hpp:1677 - qwenvl2.5 offload params (5918.09 MB, 338 tensors) to runtime backend (Vulkan0), taking 2.13s
[DEBUG] ggml_extend.hpp:1579 - qwenvl2.5 compute buffer size: 7.24 MB(VRAM)
[DEBUG] conditioner.hpp:1572 - computing condition graph completed, taking 2247 ms
[INFO ] stable-diffusion.cpp:2208 - get_learned_condition completed, taking 6473 ms
[INFO ] stable-diffusion.cpp:2233 - sampling using Euler method
[INFO ] stable-diffusion.cpp:2327 - generating image: 1/1 - seed 244727409499015
[INFO ] ggml_extend.hpp:1677 - qwen_image offload params (13733.54 MB, 1933 tensors) to runtime backend (Vulkan0), taking 4.80s
[DEBUG] ggml_extend.hpp:1579 - qwen_image compute buffer size: 590.57 MB(VRAM)
|==================================================| 20/20 - 8.48s/it
[INFO ] stable-diffusion.cpp:2364 - sampling completed, taking 169.71s
[INFO ] stable-diffusion.cpp:2372 - generating 1 latent images completed, taking 170.67s
[INFO ] stable-diffusion.cpp:2375 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1579 - wan_vae compute buffer size: 7492.50 MB(RAM)
[DEBUG] stable-diffusion.cpp:1666 - computing vae decode graph completed, taking 75.84s
[INFO ] stable-diffusion.cpp:2385 - latent 1 decoded, taking 75.85s
[INFO ] stable-diffusion.cpp:2389 - decode_first_stage completed, taking 75.85s
[INFO ] stable-diffusion.cpp:2709 - generate_image completed in 253.00s
save result PNG image to 'output.png'
Additional context / environment details
- OS: Ubuntu 22.04 (Kernel 6.8.0-85-generic)
- CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
- GPU: AMD Radeon RX 7900 XT/XTX (Navi 31)
- Driver: amdgpu
- Backend: Vulkan
- RAM: 48GiB DDR4 (2x8GB @ 2400MHz, 2x16GB @ 3200MHz)