Hello!
Thanks for making this amazing project!
So I'm running this with the Realistic Vision v1.5 checkpoint. I'm getting ~75 s/it with OpenBLAS enabled.
Any idea how to speed that up significantly?
With ComfyUI on the same machine in CPU mode, I'm getting ~30 s/it and it takes 10 min to generate 512x512 image with 20 steps.
What's causing such a large performance difference?
Do you know if there's a way to get some basic OpenGL 3 acceleration for some of the tensor ops?
Replies: 5 comments
-
My observation is that ggml does not achieve maximally optimized convolutions. My workaround is to use a turbo model or to configure quantization, such as q8_0.
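For reference, q8_0 quantizes weights in blocks of 32 values, each block storing one scale plus 32 int8 values, which roughly quarters memory traffic versus f32. A rough Python sketch of the scheme (illustrative only; ggml's actual C layout packs the scale as fp16 alongside the int8 data):

```python
import numpy as np

QK8_0 = 32  # q8_0 block size in ggml

def quantize_q8_0(x):
    """Quantize a float array (length divisible by 32) to int8 blocks + per-block scales."""
    blocks = x.reshape(-1, QK8_0)
    scales = np.abs(blocks).max(axis=1) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(blocks / scales[:, None]).astype(np.int8)
    return q, scales

def dequantize_q8_0(q, scales):
    """Reconstruct approximate float values from int8 blocks and scales."""
    return (q.astype(np.float32) * scales[:, None]).ravel()
```

The per-block scale keeps the quantization error proportional to each block's largest magnitude, which is why q8_0 tends to be nearly lossless in practice.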
-
Use the LCM LoRA for SD v1.5 with 4 steps and TAESD; it will be much faster.
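Back-of-envelope, dropping from 20 steps to 4 cuts sampling wall time roughly 5×, even at the same per-step cost (numbers taken from this thread; this ignores the extra decode-time savings from TAESD replacing the full VAE):

```python
def estimated_sampling_time(seconds_per_step, steps):
    """Rough wall-time estimate: sampling cost scales linearly with step count."""
    return seconds_per_step * steps

baseline = estimated_sampling_time(75, 20)  # 1500 s (25 min) at 20 steps
lcm = estimated_sampling_time(75, 4)        # 300 s (5 min) at 4 LCM steps
```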
-
> ~75 it/s

Pretty sure you meant s/it.
-
Yeah, s/it...
Please keep in mind that ComfyUI is using f32 (not any lower quantization) with 20 steps and is still more than 2× faster.
Anyone know what PyTorch's CPU backend is doing that's so much faster than ggml?
-
@RogerDass It's just that PyTorch implements more optimized convolution algorithms that are too complex to implement in ggml. That's why PyTorch is quite heavy; instead of reinventing the wheel, they reuse existing code to avoid unnecessary complications.
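To give a flavor of those optimized convolution algorithms: one standard trick is im2col, which lowers convolution into a single large matrix multiply so it can run through a highly tuned GEMM. A simplified single-channel sketch (illustrative only, not PyTorch's or ggml's actual implementation; function names are made up):

```python
import numpy as np

def conv2d_naive(x, k):
    """Direct 2D convolution (cross-correlation, as in deep learning), valid padding."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_im2col(x, k):
    """Same result via im2col: gather every patch into a row, then do one GEMM."""
    H, W = x.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[idx] = x[i:i + kh, j:j + kw].ravel()
            idx += 1
    # The whole convolution becomes a single matrix-vector product.
    return (cols @ k.ravel()).reshape(oh, ow)
```

The payoff is that the inner loop disappears into a BLAS call, which exploits SIMD and cache blocking; real libraries go further with Winograd and FFT variants, which is exactly the complexity being referred to above.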