Hello!
Thanks for making this amazing project!
So I'm running this with the Realistic Vision v1.5 checkpoint. I'm getting ~75 s/it with OpenBLAS enabled.
Any idea how to speed that up significantly?
With ComfyUI on the same machine in CPU mode, I'm getting ~30 s/it and it takes 10 min to generate 512x512 image with 20 steps.
What's causing such a large performance difference?
Do you know if there's a way to get some basic OpenGL 3 acceleration for some of the tensor ops?
Replies: 5 comments
-
My observation is that ggml does not achieve maximally optimized convolutions. My workaround is to use a turbo model or to configure quantization, such as q8_0.
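For reference, q8_0 quantizes weights in blocks of 32 values, each block storing one scale plus 32 int8 values, which roughly quarters memory traffic versus f32. A rough Python sketch of the scheme (illustrative only; ggml's actual C layout packs the scale as fp16 alongside the int8 data):

```python
import numpy as np

QK8_0 = 32  # q8_0 block size in ggml

def quantize_q8_0(x):
    """Quantize a float array (length divisible by 32) to int8 blocks + per-block scales."""
    blocks = x.reshape(-1, QK8_0)
    scales = np.abs(blocks).max(axis=1) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(blocks / scales[:, None]).astype(np.int8)
    return q, scales

def dequantize_q8_0(q, scales):
    """Reconstruct approximate float values from int8 blocks and scales."""
    return (q.astype(np.float32) * scales[:, None]).ravel()
```

The per-block scale keeps the quantization error proportional to each block's largest magnitude, which is why q8_0 tends to be nearly lossless in practice.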
-
Use the LCM LoRA for SD v1.5 with 4 steps and TAESD; it will be much faster.
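Back-of-envelope, dropping from 20 steps to 4 cuts sampling wall time roughly 5×, even at the same per-step cost (numbers taken from this thread; this ignores the extra decode-time savings from TAESD replacing the full VAE):

```python
def estimated_sampling_time(seconds_per_step, steps):
    """Rough wall-time estimate: sampling cost scales linearly with step count."""
    return seconds_per_step * steps

baseline = estimated_sampling_time(75, 20)  # 1500 s (25 min) at 20 steps
lcm = estimated_sampling_time(75, 4)        # 300 s (5 min) at 4 LCM steps
```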
-
> ~75 it/s

Pretty sure you meant s/it.
-
Yeah, s/it...
Please keep in mind that ComfyUI is using f32 (not any lower quantization) with 20 steps and is still more than 2× faster.
Anyone know what PyTorch's CPU backend is doing that's so much faster than ggml?
-
@RogerDass It's just that PyTorch implements more optimized convolution algorithms that are too complex to implement in ggml. That's why PyTorch is quite heavy; instead of reinventing the wheel, they reuse existing code to avoid unnecessary complications.
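To give a flavor of those optimized convolution algorithms: one standard trick is im2col, which lowers convolution into a single large matrix multiply so it can run through a highly tuned GEMM. A simplified single-channel sketch (illustrative only, not PyTorch's or ggml's actual implementation; function names are made up):

```python
import numpy as np

def conv2d_naive(x, k):
    """Direct 2D convolution (cross-correlation, as in deep learning), valid padding."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_im2col(x, k):
    """Same result via im2col: gather every patch into a row, then do one GEMM."""
    H, W = x.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[idx] = x[i:i + kh, j:j + kw].ravel()
            idx += 1
    # The whole convolution becomes a single matrix-vector product.
    return (cols @ k.ravel()).reshape(oh, ow)
```

The payoff is that the inner loop disappears into a BLAS call, which exploits SIMD and cache blocking; real libraries go further with Winograd and FFT variants, which is exactly the complexity being referred to above.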