@robbie01 force-pushed the groestl-avx512-gfni branch 2 times, most recently from 7220730 to 6dc67f7 (August 13, 2025 15:23)
robbie01 (Author) commented Aug 13, 2025
Performance (with -C target-cpu=native, Ryzen 9 7900X, x86_64-pc-windows-msvc):
soft backend:
test groestl256_10 ... bench: 62.62 ns/iter (+/- 1.15) = 161 MB/s
test groestl256_100 ... bench: 604.86 ns/iter (+/- 7.71) = 165 MB/s
test groestl256_1000 ... bench: 5,930.86 ns/iter (+/- 83.92) = 168 MB/s
test groestl256_10000 ... bench: 59,241.11 ns/iter (+/- 535.22) = 168 MB/s
avx512_gfni backend:
test groestl256_10 ... bench: 15.39 ns/iter (+/- 0.42) = 666 MB/s
test groestl256_100 ... bench: 148.98 ns/iter (+/- 5.03) = 675 MB/s
test groestl256_1000 ... bench: 1,402.30 ns/iter (+/- 27.58) = 713 MB/s
test groestl256_10000 ... bench: 13,936.83 ns/iter (+/- 608.29) = 717 MB/s
@robbie01 force-pushed the groestl-avx512-gfni branch from 6dc67f7 to ff01186 (August 13, 2025 16:49)
#718
I took a conservative approach here and kept the same in-memory representation as the original (now soft) backend. This results in an extra two `vpermb`s (`_mm512_permutexvar_epi8`) per call to `compress` and `p`. (Note: this is not a per-block overhead, as `compress` now works on a slice of blocks per @newpavlov's recommendation.)

If it's acceptable, I can modify the code to use the same state representation in memory as it does in the register. It should be risk-free as it would be absurd for CPU features to change during execution.
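
To illustrate the layout conversion described above, here is a minimal portable sketch of what a `vpermb` does, written as a scalar model rather than with the real intrinsic. The index table is an assumption for illustration only (an 8x8 byte transpose); the permutation actually used in this PR may differ, and `permutexvar_epi8` and `transpose_indices` are hypothetical helper names, not functions from the crate.

```rust
/// Scalar model of `vpermb` (`_mm512_permutexvar_epi8`): each of the 64
/// output bytes is the source byte selected by the low 6 bits of the
/// corresponding index byte.
fn permutexvar_epi8(idx: &[u8; 64], src: &[u8; 64]) -> [u8; 64] {
    let mut dst = [0u8; 64];
    for i in 0..64 {
        dst[i] = src[(idx[i] & 63) as usize];
    }
    dst
}

/// Hypothetical index table (an assumption, not taken from the PR):
/// transposes the 64-byte state viewed as an 8x8 byte matrix.
fn transpose_indices() -> [u8; 64] {
    let mut idx = [0u8; 64];
    for r in 0..8 {
        for c in 0..8 {
            idx[r * 8 + c] = (c * 8 + r) as u8;
        }
    }
    idx
}

fn main() {
    // In-memory state, filled with distinct bytes so moves are visible.
    let mut state = [0u8; 64];
    for (i, b) in state.iter_mut().enumerate() {
        *b = i as u8;
    }

    let idx = transpose_indices();

    // First permute: memory layout -> register layout (on entry).
    let in_register = permutexvar_epi8(&idx, &state);
    // Second permute: register layout -> memory layout (on exit).
    // A transpose is its own inverse, so the same table is reused here.
    let back_in_memory = permutexvar_epi8(&idx, &in_register);

    assert_eq!(back_in_memory, state);
    println!("round trip through the register layout preserved the state");
}
```

In this model the two permutes per call exist only to convert between layouts. Storing the state in the register layout directly would make both unnecessary, which is the change proposed above.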