Effect of tile size for VAE tiling performance #701

stduhpf started this conversation in Benchmark

On Vulkan with RX6800 (Windows driver), x axis is the tile size:

[Image: Tile size vulkan]

Similar results with a RX 5700XT.

According to my tests, the optimal tile size for VAE decoding on the Vulkan backend seems to be 26. But I'm really confused by the huge performance drop that happens for tile sizes between 10 and 25 for no obvious reason. I'm not noticing the same thing on CPU so far (CPU performance seems pretty consistent, with a very slight performance advantage for smaller tile sizes that should not matter in practice).


Huh. If you need a few more weird results:

ggml_vulkan: 0 = AMD Radeon RX 7600 XT (RADV NAVI33) (radv) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

768x960 image, SDXL:

| tile size | total VAE time | iteration speed | number of iterations |
|---|---|---|---|
| 10 | 9.11s | 41.67it/s | 414 |
| 11 | 7.46s | 31.25it/s | 285 |
| 12 | 9.98s | 29.41it/s | 285 |
| 13 | 7.92s | 23.26it/s | 192 |
| 14 | 8.86s | 20.41it/s | 192 |
| 15 | 9.24s | 14.08it/s | 154 |
| 16 | 12.44s | 12.66it/s | 154 |
| 17 | 11.27s | 9.62it/s | 108 |
| 18 | 11.75s | 9.17it/s | 108 |
| 19 | 12.55s | 6.99it/s | 88 |
| 20 | 12.56s | 6.99it/s | 88 |
| 21 | 13.43s | 5.24it/s | 70 |
| 22 | 11.83s | 5.35it/s | 63 |
| 23 | 16.00s | 3.95it/s | 63 |
| 24 | 15.11s | 4.18it/s | 63 |
| 25 | 14.98s | 3.21it/s | 48 |
| 26 | 14.05s | 3.44it/s | 48 |
| 27 | 13.25s | 2.66it/s | 35 |
| 28 | 12.15s | 2.89it/s | 35 |
| 29 | 17.02s | 2.06it/s | 35 |
| 30 | 15.98s | 2.19it/s | 35 |
| 31 | 18.53s | 1.62it/s | 30 |
| 32 | 16.89s | 1.78it/s | 30 |
| 33 | 18.22s | 1.32it/s | 24 |
| 34 | 16.01s | 1.50it/s | 24 |
| 35 | 17.91s | 1.12it/s | 20 |
| 36 | 15.05s | 1.33it/s | 20 |
| 38 | 18.13s | 1.11it/s | 20 |
| 39 | 18.21s | 1.21s/it | 15 |
| 40 | 14.64s | 1.03it/s | 15 |
| 41 | 16.43s | 1.37s/it | 12 |
| 42 | 13.85s | 1.15s/it | 12 |
| 48 | 19.47s | 1.62s/it | 12 |
| 49 | 13.63s | 2.27s/it | 6 |
| 50 | 11.73s | 1.95s/it | 6 |
| 59 | 23.37s | 3.89s/it | 6 |
| 62 | 13.75s | 3.43s/it | 4 |
| 64 | 14.05s | 3.51s/it | 4 |

(rev e767be7 )
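The iteration speed column switches between it/s and s/it partway down, which makes rows awkward to compare by eye. A minimal sketch of putting both forms on one scale (seconds per iteration) is below. The helper name `seconds_per_iteration` is made up for illustration, and only a handful of rows are copied from the table.

```python
import re

def seconds_per_iteration(speed: str) -> float:
    """Convert a speed such as '41.67it/s' or '1.21s/it' to seconds per iteration."""
    match = re.fullmatch(r"([0-9.]+)(it/s|s/it)", speed.strip())
    if match is None:
        raise ValueError(f"unrecognized speed: {speed!r}")
    value, unit = float(match.group(1)), match.group(2)
    return 1.0 / value if unit == "it/s" else value

# A few rows copied from the table:
# (tile size, total VAE time in seconds, iteration speed, number of iterations)
rows = [
    (10, 9.11, "41.67it/s", 414),
    (26, 14.05, "3.44it/s", 48),
    (39, 18.21, "1.21s/it", 15),
    (64, 14.05, "3.51s/it", 4),
]

for tile, total_s, speed, iterations in rows:
    reported = seconds_per_iteration(speed)
    average = total_s / iterations  # average over the whole run
    print(f"tile {tile:2d}: reported {reported:.3f} s/it, "
          f"total/iterations {average:.3f} s/it")
```

The two figures need not agree exactly, since the reported speed and the total time may be measured differently.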

So... looks like I don't have that same drop on tiles/centisecond. On the other hand, my card seems to dislike odd-sized tiles. And those iteration numbers drop very abruptly at some points... possibly because the number of needed tiles changes on both axes at the same time?
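The "both axes at once" idea can be checked with a small sketch. Everything in it is an assumption inferred from the numbers rather than taken from the source at that revision: a latent of 96x120 (768x960 divided by the VAE's factor of 8), a 50% tile overlap, and a per-axis count of `(length - overlap) // stride`. The function names are hypothetical. That said, this formula does reproduce every iteration count in the table above.

```python
def tiles_per_axis(length: int, tile: int, overlap_factor: float = 0.5) -> int:
    """Tile count along one axis (assumed formula, inferred from the table)."""
    overlap = int(tile * overlap_factor)
    stride = tile - overlap
    return (length - overlap) // stride

def tile_grid(tile: int, latent_w: int = 96, latent_h: int = 120) -> tuple[int, int]:
    """Tiles along (width, height) for an assumed 96x120 latent."""
    return tiles_per_axis(latent_w, tile), tiles_per_axis(latent_h, tile)

# Iteration counts copied from the table for a few tile sizes.
reported = {10: 414, 12: 285, 13: 192, 26: 48, 27: 35, 48: 12, 49: 6, 64: 4}

for tile, count in reported.items():
    nx, ny = tile_grid(tile)
    print(f"tile {tile}: {nx} x {ny} = {nx * ny} (table: {count})")

# Tile sizes at which the count changes on both axes simultaneously.
both_axes = [
    tile for tile in range(11, 65)
    if tile_grid(tile)[0] != tile_grid(tile - 1)[0]
    and tile_grid(tile)[1] != tile_grid(tile - 1)[1]
]
print("both axes change at tile sizes:", both_axes)
```

Under these assumptions, going from tile size 48 to 49 takes the grid from 3 x 4 to 2 x 3, so both axes lose a tile in the same step, which would explain a drop from 12 to 6 iterations.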

Artifacts are very noticeable at 14 and below, BTW. But above 17 or so, I can hardly notice anything.

Interesting results. Maybe the huge drop of performance in the 10-25 tile size range I'm seeing is driver-related?
