[feature] integrate zeknox GPU-acceleration library into gnark #1332
dloghin wants to merge 68 commits into Consensys:master from
Conversation
doutv
commented
Dec 9, 2024
@ivokub need your help and review!
ivokub
commented
Dec 9, 2024
> @ivokub need your help and review!
On it. Would it be possible to allow adding commits directly to the branch for easier review?
doutv
commented
Dec 10, 2024
> > @ivokub need your help and review!
> On it. Would it be possible to allow adding commits directly to the branch for easier review?
Sure, I've granted you push permission via https://github.com/okx/gnark/invitations
Let me delete those examples to keep the PR clean.
ivokub
commented
Dec 13, 2024
I'm not able to create a proof for now. In the debug logs, the last action I see is:
14:05:05 DBG Bs.MultiExp done MSMG2 5 took=0.86421 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=6
I guess there is probably a deadlock somewhere. Have you been able to run the end-to-end prover?
dloghin
commented
Dec 16, 2024
Hi Ivo,
May I check: if you use the precompiled zeknox libraries, does your GPU have compute capability 8.6 or 8.9? (only these two are supported by our precompiled libraries).
On our systems, the end-to-end example (go run -tags=zeknox examples/zeknox/main.go) is working.
ivokub
commented
Dec 16, 2024
> Hi Ivo,
> May I check: if you use the precompiled zeknox libraries, does your GPU have compute capability 8.6 or 8.9? (only these two are supported by our precompiled libraries).
> On our systems, the end-to-end example (go run -tags=zeknox examples/zeknox/main.go) is working.
I'm using an AWS g4dn.xlarge instance, which according to the documentation has a T4 GPU, and that appears to be compute capability 7.5.
Should it work if I compile the libraries myself? I started compiling them, but it took quite a bit of time and I didn't let it finish. When I benchmarked previously, g4dn was quite a good balance between performance and $-per-proof cost.
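The compatibility question above can be sketched as a small check. This is only an illustration: the supported set (8.6 and 8.9) is taken from the comment in this thread, while `prebuiltCaps` and `needsSourceBuild` are hypothetical names that are not part of zeknox or gnark.

```go
package main

import (
	"fmt"
	"strings"
)

// prebuiltCaps lists the compute capabilities that, according to this thread,
// the precompiled zeknox libraries support. Hypothetical helper, not a zeknox API.
var prebuiltCaps = map[string]bool{"8.6": true, "8.9": true}

// needsSourceBuild reports whether a GPU with the given compute capability
// (for example "7.5" for a T4) falls outside the precompiled set.
func needsSourceBuild(computeCap string) bool {
	return !prebuiltCaps[strings.TrimSpace(computeCap)]
}

func main() {
	for _, c := range []string{"7.5", "8.6", "8.9"} {
		fmt.Printf("compute capability %s: build from source = %v\n", c, needsSourceBuild(c))
	}
}
```

Under these assumptions, a T4 (7.5) falls outside the precompiled set, so the libraries would have to be built locally.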
doutv
commented
Dec 16, 2024
Yes, compiling it yourself should work. Compiling BN254 MSM G2 takes ~5 minutes on our device, so expect a long compile time.
doutv
commented
Dec 16, 2024
ivokub
commented
Dec 16, 2024
Use this script https://github.com/okx/zeknox/blob/main/native/build-release-msm-bn254.sh
Indeed I got it working and the speedup is similar to the one claimed in the PR (1.6x). I also had to build libblst.
But now there seems to be an issue with the proof. I get an invalid proof:
panic: points in the proof are not in the correct subgroup
I could try looking into it, but it would probably take a bit of time to compare the computed values against CPU execution. Would it be possible to try it with another GPU and see if you hit the same problem?
This is an edge case. We found this bug, tried many methods to fix it, but it still happens...
I will look into it.
dloghin
commented
Feb 20, 2025
Hi @ivokub, my latest commit fixes (temporarily) the issue with invalid proof. We observe that this issue appears in multi-GPU environments with relatively low frequency but we did not find the reason. If the proof is invalid, we recompute only the invalid points on CPU. We still observe 25-50% speedup even when this issue appears. Please review. Thank you.
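The fallback described above (validate the GPU results, then recompute only the invalid ones on the CPU) could be sketched as follows. This is not the code from the PR: `point`, its `valid` flag, and `repairInvalid` are hypothetical stand-ins, and a real prover would use the curve library's point type and subgroup check.

```go
package main

import "fmt"

// point is a hypothetical stand-in for a curve point. The valid flag
// replaces the real subgroup check for the purpose of this sketch.
type point struct {
	value int
	valid bool
}

// repairInvalid walks the GPU results, and for every entry rejected by
// isValid it overwrites that entry with cpuCompute(i). It returns the
// indices that had to be recomputed.
func repairInvalid[T any](results []T, isValid func(T) bool, cpuCompute func(i int) T) []int {
	var repaired []int
	for i, r := range results {
		if !isValid(r) {
			results[i] = cpuCompute(i)
			repaired = append(repaired, i)
		}
	}
	return repaired
}

func main() {
	// Pretend the GPU returned three results and the second one is invalid.
	gpu := []point{{1, true}, {0, false}, {3, true}}
	repaired := repairInvalid(gpu,
		func(p point) bool { return p.valid },
		func(i int) point { return point{i + 1, true} }, // assumed CPU path
	)
	fmt.Println("recomputed indices:", repaired)
	fmt.Println("final results:", gpu)
}
```

Only the rejected entries pay the CPU cost, which is consistent with the report that a speedup remains even when the issue appears.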
ivokub
commented
Feb 28, 2025
> Hi @ivokub, my latest commit fixes (temporarily) the issue with invalid proof. We observe that this issue appears in multi-GPU environments with relatively low frequency but we did not find the reason. If the proof is invalid, we recompute only the invalid points on CPU. We still observe 25-50% speedup even when this issue appears. Please review. Thank you.
Thanks for the update. It will take a bit of time to review further. We actually intend to make support for different proving backends more modular, so I would like to get that done first.
Description
This PR aims to integrate zeknox GPU-acceleration library into gnark. Specifically, this PR targets the GPU (NVIDIA CUDA) acceleration of groth16 backend over BN254. In addition, this PR adds a new example consisting of proving/verifying a batch of secp256r1 (P256) signatures. Our benchmarking shows 1.54-1.57X speedup of the CPU+GPU execution (with zeknox) compared to the default CPU-only execution.
In summary, we made the following additions:
- backend/groth16/bn254/zeknox folder.
- backend/groth16/bn254/prove.go printed in debug mode.
- examples/p256.
- README.md on how to run gnark with zeknox.
Type of change
How has this been tested?
We wrote new tests under backend/groth16/bn254/zeknox and examples/p256. In addition, we also ran the tests under backend/groth16/bn254.
How has this been benchmarked?
We ran the P256 example to prove/verify a batch of 10 secp256r1 keys. The steps to run:
cd examples
go build -tags zeknox
./examples
Results
The times below represent the proving time (in milliseconds) for 10 secp256r1 keys.
Checklist:
golangci-lint does not output errors locally