Code Review of Haskell PBKDF2

Question 1

Moved from Programmers.SE.

I have written a new version of the PBKDF2 algorithm in Haskell. It passes all of the HMAC-SHA-1 test vectors listed in RFC 6070, but it is not very efficient. How can I improve the code? I plan to upload it to Github (and maybe Hackage) once it is improved.

{-# LANGUAGE BangPatterns #-}
{- Copyright 2013, G. Ralph Kuntz, MD. All rights reserved. LGPL License. -}
module Crypto where
import Codec.Utils (Octet)
import qualified Data.Binary as B (encode)
import Data.Bits (xor)
import qualified Data.ByteString.Lazy.Char8 as C (pack)
import qualified Data.ByteString.Lazy as L (unpack)
import Data.List (foldl')
import Data.HMAC (hmac_sha1)
import Text.Bytedump (dumpRaw)
-- Calculate the PBKDF2 as a hexadecimal string
pbkdf2
 :: ([Octet] -> [Octet] -> [Octet]) -- pseudo random function (HMAC)
 -> Int -- hash length in bytes
 -> String -- password
 -> String -- salt
 -> Int -- iterations
 -> Int -- derived key length in bytes
 -> String
pbkdf2 prf hashLength password salt iterations keyLength =
 let
 passwordOctets = stringToOctets password
 saltOctets = stringToOctets salt
 totalBlocks =
 ceiling $ (fromIntegral keyLength :: Double) / fromIntegral hashLength
 blockIterator message acc =
 foldl' (\(a, m) _ ->
 let !m' = prf passwordOctets m
 in (zipWith xor a m', m')) (acc, message) [1..iterations]
 in
 dumpRaw $ take keyLength $ foldl' (\acc block ->
 acc ++ fst (blockIterator (saltOctets ++ intToOctets block)
 (replicate hashLength 0))) [] [1..totalBlocks]
 where
 intToOctets :: Int -> [Octet]
 intToOctets i =
 let a = L.unpack . B.encode $ i
 in drop (length a - 4) a
 stringToOctets :: String -> [Octet]
 stringToOctets = L.unpack . C.pack
-- Calculate the PBKDF2 as a hexadecimal string using HMAC and SHA-1
pbkdf2HmacSha1
 :: String -- password
 -> String -- salt
 -> Int -- iterations
 -> Int -- derived key length in bytes
 -> String
pbkdf2HmacSha1 =
 pbkdf2 hmac_sha1 20

Question 2

Do you have a test suite that measures the algorithm's performance? It'd be much easier to test ideas for improvements.

Question 3

I have a test suite that confirms that it passes the test vectors. My suite does not currently check performance. Still learning testing in Haskell.

Question 4

The comments on SO give some good hints. The most critical part is obviously blockIterator, so let's make it a separate function:

blockIterator
 :: ([Octet] -> [Octet] -> [Octet]) -- ^ pseudo random function (HMAC)
 -> [Octet] -- ^ password octets
 -> Int -- ^ iterations
 -> [Octet] -- ^ m
 -> [Octet] -- ^ a
 -> [Octet]

The implementation using folds on a list is nice and concise, but processing the list adds some unnecessary overhead, and prevents possible optimizations. If we instead rewrite it as a recursive function

blockIterator prf passwordOctets = loop
 where
 loop i m a | i == 0 = a
 | otherwise = let m' = prf passwordOctets m
 a' = zipWith xor a m'
 in a' `deepseq` 
 m' `deepseq`
 loop (i - 1) m' a'

it gets somewhat faster. I also added deepseq on the octet lists so that they're fully evaluated at each round. Introducing the recursion inside loop, instead of making the whole blockIterator recursive allows GHC to inline it.

Measuring performance can be done using the criterion package:

import Criterion.Main
-- ...
main = do
 let stdtest n = pbkdf2HmacSha1 "password" "salt" n 20
 defaultMain [ bgroup "stdtest" $
 map (\n -> bench (show n) (nf stdtest n))
 [1, 2, 4096, 65536 ]
 ]

However, the far most time-consuming part is still processing lists inside blockIterator. Using unboxed ST arrays would make the code way faster. There would be no memory allocation in the loop, no need to force evaluation. The problem is, hmac_sha1 is implemented using list, so we'd need another optimized, ST-based implementation.

Petr Petr 3,10019 silver badges33 bronze badges · Accepted Answer · 2013-09-11 16:03:19Z

The comments on SO give some good hints. The most critical part is obviously blockIterator, so let's make it a separate function:

blockIterator
 :: ([Octet] -> [Octet] -> [Octet]) -- ^ pseudo random function (HMAC)
 -> [Octet] -- ^ password octets
 -> Int -- ^ iterations
 -> [Octet] -- ^ m
 -> [Octet] -- ^ a
 -> [Octet]

The implementation using folds on a list is nice and concise, but processing the list adds some unnecessary overhead, and prevents possible optimizations. If we instead rewrite it as a recursive function

blockIterator prf passwordOctets = loop
 where
 loop i m a | i == 0 = a
 | otherwise = let m' = prf passwordOctets m
 a' = zipWith xor a m'
 in a' `deepseq` 
 m' `deepseq`
 loop (i - 1) m' a'

it gets somewhat faster. I also added deepseq on the octet lists so that they're fully evaluated at each round. Introducing the recursion inside loop, instead of making the whole blockIterator recursive allows GHC to inline it.

Measuring performance can be done using the criterion package:

import Criterion.Main
-- ...
main = do
 let stdtest n = pbkdf2HmacSha1 "password" "salt" n 20
 defaultMain [ bgroup "stdtest" $
 map (\n -> bench (show n) (nf stdtest n))
 [1, 2, 4096, 65536 ]
 ]

However, the far most time-consuming part is still processing lists inside blockIterator. Using unboxed ST arrays would make the code way faster. There would be no memory allocation in the loop, no need to force evaluation. The problem is, hmac_sha1 is implemented using list, so we'd need another optimized, ST-based implementation.

Stack Exchange Network

Code Review of Haskell PBKDF2

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Code Review of Haskell PBKDF2

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions