groestl: add AVX-512/GFNI backend #720

Draft
robbie01 wants to merge 1 commit into
RustCrypto:master from
robbie01:groestl-avx512-gfni

@robbie01 robbie01 commented Aug 13, 2025

#718

I took a conservative approach here and kept the same in-memory state representation as the original (now soft) backend. This costs two extra vpermb instructions (_mm512_permutexvar_epi8) per call to compress and p. (Note: this is not per-block overhead, since compress now works on a slice of blocks, per @newpavlov's recommendation.)
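
For readers unfamiliar with vpermb, here is a scalar sketch of what one such byte permute does: every output byte is selected from a 64-byte source by the low 6 bits of the corresponding index byte. The function names and the 8x8 transpose index table are hypothetical and only illustrate a layout conversion; the tables the PR actually uses are not shown in this thread. Presumably one permute converts the in-memory layout on the way in and a second converts it back on the way out.

```rust
/// Scalar model of `vpermb` (`_mm512_permutexvar_epi8`): each output byte
/// is picked from `src` using the low 6 bits of the matching index byte.
fn permute_bytes(idx: &[u8; 64], src: &[u8; 64]) -> [u8; 64] {
    let mut dst = [0u8; 64];
    for i in 0..64 {
        dst[i] = src[(idx[i] & 0x3f) as usize];
    }
    dst
}

/// Hypothetical index table that transposes an 8x8 byte matrix.
/// This is an illustration only, not the table used by the PR.
fn transpose_8x8_indices() -> [u8; 64] {
    let mut idx = [0u8; 64];
    for row in 0..8 {
        for col in 0..8 {
            idx[row * 8 + col] = (col * 8 + row) as u8;
        }
    }
    idx
}
```

Because a transpose is its own inverse, applying `permute_bytes` twice with this table returns the original bytes, which mirrors the convert-in, convert-out pattern described above.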

If it's acceptable, I can change the code to keep the state in memory in the same representation it uses in registers. That should be risk-free, since it would be absurd for CPU features to change during execution.
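
The "CPU features don't change during execution" argument can be sketched as a detect-once check: the result is computed on first use and reused afterwards, so every later call picks the same backend and therefore sees the same state layout. This is a minimal sketch using only the standard library. The function names are hypothetical, the feature list (avx512f, avx512vbmi, gfni) is an assumption about what such a backend would need, and the PR itself may use a different detection mechanism.

```rust
use std::sync::OnceLock;

/// Asks the CPU once whether the assumed feature set is present.
/// The feature list is an assumption, not taken from the PR.
#[cfg(target_arch = "x86_64")]
fn detect_avx512_gfni() -> bool {
    std::is_x86_feature_detected!("avx512f")
        && std::is_x86_feature_detected!("avx512vbmi")
        && std::is_x86_feature_detected!("gfni")
}

/// On other architectures the backend is never available.
#[cfg(not(target_arch = "x86_64"))]
fn detect_avx512_gfni() -> bool {
    false
}

/// Cached answer: detection runs at most once per process, so the
/// choice of backend (and of state layout) cannot flip between calls.
fn has_avx512_gfni() -> bool {
    static CACHE: OnceLock<bool> = OnceLock::new();
    *CACHE.get_or_init(detect_avx512_gfni)
}
```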

@robbie01 robbie01 force-pushed the groestl-avx512-gfni branch 2 times, most recently from 7220730 to 6dc67f7 Compare August 13, 2025 15:23

@robbie01 (Author) commented

Performance (with -C target-cpu=native, Ryzen 9 7900X, x86_64-pc-windows-msvc):

soft backend:

test groestl256_10 ... bench: 62.62 ns/iter (+/- 1.15) = 161 MB/s
test groestl256_100 ... bench: 604.86 ns/iter (+/- 7.71) = 165 MB/s
test groestl256_1000 ... bench: 5,930.86 ns/iter (+/- 83.92) = 168 MB/s
test groestl256_10000 ... bench: 59,241.11 ns/iter (+/- 535.22) = 168 MB/s

avx512_gfni backend:

test groestl256_10 ... bench: 15.39 ns/iter (+/- 0.42) = 666 MB/s
test groestl256_100 ... bench: 148.98 ns/iter (+/- 5.03) = 675 MB/s
test groestl256_1000 ... bench: 1,402.30 ns/iter (+/- 27.58) = 713 MB/s
test groestl256_10000 ... bench: 13,936.83 ns/iter (+/- 608.29) = 717 MB/s
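
To put the two runs side by side, the speedup at each input size is the soft ns/iter divided by the avx512_gfni ns/iter. The sketch below copies the timings from the output above (the constant and function names are just for illustration); the ratio comes out a little over 4x at every size.

```rust
/// (input size in bytes, soft ns/iter, avx512_gfni ns/iter),
/// copied from the benchmark output above.
const RESULTS: [(u32, f64, f64); 4] = [
    (10, 62.62, 15.39),
    (100, 604.86, 148.98),
    (1000, 5930.86, 1402.30),
    (10000, 59241.11, 13936.83),
];

/// Speedup of the avx512_gfni backend over the soft backend per input size.
fn speedups() -> Vec<(u32, f64)> {
    RESULTS
        .iter()
        .map(|&(size, soft_ns, simd_ns)| (size, soft_ns / simd_ns))
        .collect()
}
```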

@robbie01 robbie01 marked this pull request as draft August 13, 2025 19:27