Return to Answer

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 12:40

The comments on SO comments on SO give some good hints. The most critical part is obviously blockIterator, so let's make it a separate function:

The comments on SO give some good hints. The most critical part is obviously blockIterator, so let's make it a separate function:

Source Link

answered Sep 11, 2013 at 16:03

Petr

answered Sep 11, 2013 at 16:03

Petr

3.1k
19
33

The comments on SO give some good hints. The most critical part is obviously blockIterator, so let's make it a separate function:

blockIterator
 :: ([Octet] -> [Octet] -> [Octet]) -- ^ pseudo random function (HMAC)
 -> [Octet] -- ^ password octets
 -> Int -- ^ iterations
 -> [Octet] -- ^ m
 -> [Octet] -- ^ a
 -> [Octet]

The implementation using folds on a list is nice and concise, but processing the list adds some unnecessary overhead, and prevents possible optimizations. If we instead rewrite it as a recursive function

blockIterator prf passwordOctets = loop
 where
 loop i m a | i == 0 = a
 | otherwise = let m' = prf passwordOctets m
 a' = zipWith xor a m'
 in a' `deepseq` 
 m' `deepseq`
 loop (i - 1) m' a'

it gets somewhat faster. I also added deepseq on the octet lists so that they're fully evaluated at each round. Introducing the recursion inside loop, instead of making the whole blockIterator recursive allows GHC to inline it.

Measuring performance can be done using the criterion package:

import Criterion.Main
-- ...
main = do
 let stdtest n = pbkdf2HmacSha1 "password" "salt" n 20
 defaultMain [ bgroup "stdtest" $
 map (\n -> bench (show n) (nf stdtest n))
 [1, 2, 4096, 65536 ]
 ]

However, the far most time-consuming part is still processing lists inside blockIterator. Using unboxed ST arrays would make the code way faster. There would be no memory allocation in the loop, no need to force evaluation. The problem is, hmac_sha1 is implemented using list, so we'd need another optimized, ST-based implementation.

lang-hs