@kkoogqw

What version of Go are you using (`go version`)?

`$ go version 1.14.4`
While profiling AES-CBC on amd64 and arm64, I found that on arm64 the functions `func xorBytes(dst, a, b []byte) int` and `func safeXORBytes(dst, a, b []byte, n int)` (in `crypto/cipher/xor_generic.go`) always appear in the top 15 of the pprof list. On amd64, by contrast, this work is done with SSE2 SIMD instructions in `func xorBytesSSE2(dst, a, b *byte, n int)`.
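For context, the generic fallback that shows up in this profile, `safeXORBytes`, is assumed here to XOR one byte per loop iteration. The sketch below contrasts such a byte loop with a portable loop that handles 8 bytes per iteration. A NEON implementation would widen this further to 16-byte vector registers. Both function names are hypothetical and are not the actual `crypto/cipher` internals. The word loop only illustrates how cutting per-byte loop overhead helps; it is not a proposed patch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBytesByteLoop (hypothetical) XORs a and b into dst one byte per
// iteration, in the spirit of a generic byte-at-a-time fallback.
// dst must be at least min(len(a), len(b)) bytes long.
// It returns the number of bytes written.
func xorBytesByteLoop(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// xorBytesWordLoop (hypothetical) XORs 8 bytes per iteration using
// 64-bit loads and stores from encoding/binary, then finishes the
// remaining tail (fewer than 8 bytes) one byte at a time.
func xorBytesWordLoop(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x)
	}
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := []byte("0123456789abcdefXYZ") // 19 bytes: two 8-byte words plus a 3-byte tail
	b := make([]byte, len(a))
	for i := range b {
		b[i] = 0x20
	}
	d1 := make([]byte, len(a))
	d2 := make([]byte, len(a))
	n1 := xorBytesByteLoop(d1, a, b)
	n2 := xorBytesWordLoop(d2, a, b)
	// Both variants must produce identical output.
	fmt.Println(n1, n2, string(d1) == string(d2))
}
```

Running it prints `19 19 true`. Any SIMD version, in Go assembly or otherwise, should be checked against a simple byte loop in the same way, including inputs whose length is not a multiple of the vector width.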
```
(pprof) top10
Showing nodes accounting for 700ms, 55.12% of 1270ms total
Showing top 10 nodes out of 113
      flat  flat%   sum%        cum   cum%
     170ms 13.39% 13.39%      530ms 41.73%  runtime.mallocgc
      90ms  7.09% 20.47%       90ms  7.09%  crypto/cipher.safeXORBytes
      90ms  7.09% 27.56%      130ms 10.24%  syscall.Syscall
      80ms  6.30% 33.86%       80ms  6.30%  runtime.nextFreeFast (inline)
      60ms  4.72% 38.58%       60ms  4.72%  runtime.publicationBarrier
      50ms  3.94% 42.52%       50ms  3.94%  crypto/aes.expandKeyAsm
      50ms  3.94% 46.46%      140ms 11.02%  crypto/cipher.xorBytes
      40ms  3.15% 49.61%       40ms  3.15%  runtime.acquirem (inline)
      40ms  3.15% 52.76%       40ms  3.15%  runtime.memclrNoHeapPointers
      30ms  2.36% 55.12%       30ms  2.36%  crypto/internal/subtle.InexactOverlap
```
Could we use arm64 SIMD (NEON) instructions to optimize this function, as amd64 does with SSE2?