WARNING: this is for educational purposes only. This code may contain bugs and is not secure (against timing attacks, for example).
| Name | Optimized? |
|---|---|
| Shake128 | Yes |
| Poly1305 | Mostly |
Note: generated using hyperfine on a ~78 MB file; see below the table for the full output.
| Implementation | Time consumed (absolute) |
|---|---|
| OpenSSL | 198 ms |
| Python (hashlib) | 209 ms |
| Rust (tiny-keccak, listed on the Keccak website) | 198 ms |
| My implementation | 208 ms |
The idea is to split the 130-bit field integer into 5 separate 26-bit limbs, each stored in a u64. This makes it possible to handle such integers without any dependency and to propagate the carry more efficiently. I implemented a naive addition on top of that. It might be more efficient to split the 130 bits differently, using u128 integers instead of u64 to reduce the number of limbs, but I did not try it. This first "naive" implementation, focused on arithmetic optimization, gave a throughput of approx. 3.7 cycles/byte.
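
To make the limb layout concrete, here is a minimal sketch of the splitting and of a naive addition with carry propagation. It is an illustration under assumptions, not the exact code of this repository: the names `MASK26`, `to_limbs` and `add` are hypothetical, and the block is loaded through a `u128` only to keep the example short.

```rust
/// Mask selecting the low 26 bits of a limb (hypothetical name).
const MASK26: u64 = (1 << 26) - 1;

/// Split a 16-byte little-endian block into five 26-bit limbs.
/// `hibit` is the extra bit Poly1305 sets just above the block
/// (bit 128), which lands at bit 24 of the top limb.
fn to_limbs(block: &[u8; 16], hibit: u64) -> [u64; 5] {
    // Assumption: loading via u128 for brevity; a real implementation
    // may read overlapping 32-bit words instead.
    let v = u128::from_le_bytes(*block);
    [
        (v & MASK26 as u128) as u64,
        ((v >> 26) & MASK26 as u128) as u64,
        ((v >> 52) & MASK26 as u128) as u64,
        ((v >> 78) & MASK26 as u128) as u64,
        ((v >> 104) as u64) | (hibit << 24),
    ]
}

/// Naive limb-wise addition modulo 2^130 - 5 (partially reduced).
/// Inputs are expected to have limbs below 2^26.
fn add(a: &[u64; 5], b: &[u64; 5]) -> [u64; 5] {
    let mut out = [0u64; 5];
    let mut carry = 0u64;
    for i in 0..5 {
        // Each limb uses 26 of 64 bits, so this sum cannot overflow.
        let s = a[i] + b[i] + carry;
        out[i] = s & MASK26;
        carry = s >> 26;
    }
    // 2^130 is congruent to 5 modulo 2^130 - 5, so the carry out of
    // the top limb is folded back into limb 0 multiplied by 5.
    // Limb 0 may then slightly exceed 26 bits (lazy reduction).
    out[0] += carry * 5;
    out
}
```

Because each limb leaves 38 spare bits in its u64, several additions can be chained before a full carry pass is needed, which is the headroom the limb representation is meant to provide.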