The comments on SO comments on SO give some good hints. The most critical part is obviously blockIterator
, so let's make it a separate function:
The comments on SO give some good hints. The most critical part is obviously blockIterator
, so let's make it a separate function:
The comments on SO give some good hints. The most critical part is obviously blockIterator
, so let's make it a separate function:
The comments on SO give some good hints. The most critical part is obviously blockIterator
, so let's make it a separate function:
blockIterator
:: ([Octet] -> [Octet] -> [Octet]) -- ^ pseudo random function (HMAC)
-> [Octet] -- ^ password octets
-> Int -- ^ iterations
-> [Octet] -- ^ m
-> [Octet] -- ^ a
-> [Octet]
The implementation using folds on a list is nice and concise, but processing the list adds some unnecessary overhead, and prevents possible optimizations. If we instead rewrite it as a recursive function
blockIterator prf passwordOctets = loop
where
loop i m a | i == 0 = a
| otherwise = let m' = prf passwordOctets m
a' = zipWith xor a m'
in a' `deepseq`
m' `deepseq`
loop (i - 1) m' a'
it gets somewhat faster. I also added deepseq
on the octet lists so that they're fully evaluated at each round. Introducing the recursion inside loop
, instead of making the whole blockIterator
recursive allows GHC to inline it.
Measuring performance can be done using the criterion package:
import Criterion.Main
-- ...
main = do
let stdtest n = pbkdf2HmacSha1 "password" "salt" n 20
defaultMain [ bgroup "stdtest" $
map (\n -> bench (show n) (nf stdtest n))
[1, 2, 4096, 65536 ]
]
However, the far most time-consuming part is still processing lists inside blockIterator
. Using unboxed ST arrays would make the code way faster. There would be no memory allocation in the loop, no need to force evaluation. The problem is, hmac_sha1
is implemented using list, so we'd need another optimized, ST
-based implementation.