HashCash Algorithm header generation in Haskell

Question 1

A thought exercise on my part as I'm relatively new to Haskell. I wanted an interesting project to work on so I decided to implement the Hashcash Algorithm, which is most commonly used as the basis of Bitcoin Proof of Work scheme. I am implementing the original specification that utilizes SHA1 and the description of the algorithmic steps are described well in the above Wikipedia article.

This appears to function correctly to the best of my knowledge, however I feel it is somewhat slower than it should be. Any potential suggestions for performance improvements are welcome here. Furthermore, as I am new to writing Haskell, if I am violating common expected conventions here then please feel free to point out how I can write more readable and standard code here.

{-# LANGUAGE BangPatterns #-}
module HashCash where
import Data.Int
import Data.List
import Data.List.Split (splitOn)
import Data.Char
import Data.Function
import System.Random
import Data.Bits
import Data.Either
import Data.Binary.Strict.Get
import System.IO as SIO
import Data.Word (Word32)
import Data.ByteString as B
import Data.ByteString.Char8 as BC
import Data.ByteString.UTF8 as BU
import Data.ByteString.Base64 as B64
import Data.ByteString.Conversion as BCON
import Data.ByteArray as BA
import Crypto.Random
import Crypto.Hash
startingCounter :: Int32
startingCounter = 1
difficulty :: Int
difficulty = 20
headerPrefix = "X-Hashcash: "
template = "1:{:{:{::{:{"
dateTemplate = "YYMMDDhhmmss"
address = "a@a"
-- example date because I dont want to mess with date formatting just now
exampleDate = "150320112233"
convertToString :: ByteString -> String
convertToString b = BU.toString b
convertFromString :: String -> ByteString
convertFromString s = BU.fromString s
convertIntToString :: Int -> String
convertIntToString a = convertToString . BCON.toByteString' $ a
encodeInt32 :: Int32 -> ByteString
encodeInt32 a = B64.encode . BCON.toByteString' $ a
mahDecoder :: Get Word32
mahDecoder = do
 first32Bits <- getWord32be
 return first32Bits
firstBitsZero :: (Bits a) => a -> Bool
firstBitsZero val = Data.List.foldr (\x acc -> ((not $ testBit val x) && acc)) True [0..(difficulty - 1)]
formatTemplate :: String -> [String] -> String
formatTemplate base [] = base
formatTemplate base (x:xs) = 
 let splix = (Data.List.Split.splitOn "{" base) :: [String]
 splixHead = Data.List.head splix ++ x
 splixTail = Data.List.tail splix
 concatSplitTail = Data.List.init $ Data.List.concatMap (++ "{") splixTail
 in formatTemplate (splixHead ++ concatSplitTail) xs
get16RandomBytes :: (DRG g) => g -> IO (ByteString, g)
get16RandomBytes gen = do
 let a = randomBytesGenerate 16 gen
 return $ a
getBaseString :: ByteString -> Int32 -> String
getBaseString bs counter = 
 let encodedVal = B64.encode bs
 encodedCounter = encodeInt32 counter
 baseParams = [(convertIntToString difficulty), exampleDate, address, (convertToString encodedVal), (convertToString encodedCounter)]
 in formatTemplate template baseParams
hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs =
 let hashDigest = hash bs :: Digest SHA1
 byteString = B.pack . BA.unpack $ hashDigest
 in byteString
-- Pass a counter and if the first 20 bits are zero then return the same counter value else increment it
-- signifying it is time to test the next number (NOTE: recursive style, may overflow stack)
testCounter :: ByteString -> Int32 -> Int32
testCounter rb !counter = 
 let baseString = getBaseString rb counter
 hashedString = hashSHA1Encoded $ convertFromString baseString
 !eitherFirst32 = runGet mahDecoder hashedString
 incCounter = counter + 1
 in case eitherFirst32 of
 (Left first32, _) -> testCounter rb incCounter
 (Right first32, _) -> if (firstBitsZero first32)
 then counter
 else testCounter rb incCounter
testCounterBool :: ByteString -> Int32 -> Bool
testCounterBool rb counter =
 let baseString = getBaseString rb counter
 hashedString = hashSHA1Encoded $ convertFromString baseString
 eitherFirst32 = runGet mahDecoder hashedString
 in case eitherFirst32 of
 (Left first32, _) -> False
 (Right first32, _) -> firstBitsZero first32
-- Keep taking incrementing counters from an infinite list and testing them until we find a counter 
-- that generates a valid header
findValidCounter :: ByteString -> Int32
findValidCounter ran = Data.List.last $ Data.List.takeWhile (not . testCounterBool ran) [1..]
generateHeader :: IO String
generateHeader = do
 g <- getSystemDRG
 (ran, _) <- get16RandomBytes g
 let validCounter = findValidCounter ran
 let validHeader = getBaseString ran validCounter
 return $ headerPrefix ++ validHeader
main :: IO ()
main = do 
 header <- generateHeader
 SIO.putStrLn header
 return ()

Question 2

That's a lot of imports, and several seem strange. For example Data.List, since you only use head, tail, foldr or other Prelude functions. The problem is that you've use as, but left the qualified. This should be:

import Crypto.Hash
import Crypto.Random
import Data.Binary.Strict.Get
import Data.Bits
import Data.ByteString (ByteString)
import qualified Data.ByteArray as BA
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Conversion as BCON
import qualified Data.ByteString.UTF8 as BU
import Data.Either (either)
import Data.Int (Int32)
import Data.List.Split (splitOn)
import Data.Word (Word32)
import System.Random

qualified will prevent names like head from Data.ByteString getting imported into the global namespace. See "import" for more information. The whitespace is purely custom. I like to sort the modules by name, but that's up to you.

Next, I would make clear that your convert* functions are only new names, e.g.:

convertToString :: ByteString -> String
convertToString = BU.toString
convertFromString :: String -> ByteString
convertFromString = BU.fromString
convertIntToString :: Int -> String
convertIntToString = convertToString . BCON.toByteString' 
encodeInt32 :: Int32 -> ByteString
encodeInt32 = B64.encode . BCON.toByteString' 
mahDecoder :: Get Word32
mahDecoder = getWord32be

The mahDecoder change is a little bit different, since you've used originally something like

foo = do
 x <- func
 return x

However, this is fine due to the right-identity monad law:

func >>= return === func

Next, firstBitsZero can be rewritten using all:

firstBitsZero :: (Bits a) => a -> Bool
firstBitsZero val = all (\x -> not $ testBit val x) [0..(difficulty - 1)]

Note that a bit mask would be faster would make it faster, e.g.:

bitMask :: Num a => a
bitMask = (2 ^ difficulty) - 1
firstBitsZero :: Int32 -> Bool 
firstBitsZero val = bitMask .&. val == zeroBits

That way you only have to create the mask once and then only use bitwise AND, which usually gets compiled into a single CPU instruction.

When working with list, you want to add elements at the front, not at the back. Instead of

init (concatMap (++ "{") splixTail)

you should use

tail (concatMap ('{' :) splixTail)

The problem is that ++ is linear in its first argument:

(x:xs) ++ ys = x : (xs ++ ys)

(:) on the other hand is constant in terms of time.

We end up with:

formatTemplate :: String -> [String] -> String
formatTemplate base [] = base
formatTemplate base (x:xs) = 
 let (splixHead:splixTail) = splitOn "{" base
 concatSplitTail = tail $ concatMap ("{" :) splixTail
 in formatTemplate (splixHead ++ (x : concatSplitTail)) xs

Your next function doesn't need to be in IO:

get16RandomBytes :: (DRG g) => g -> (ByteString, g)
get16RandomBytes = randomBytesGenerate 16

In your following functions, I would prefer to use where, but that's completely up to personal preference. Also, if you use an expression only once, it might make sense to get rid of its binding if the code stays readable:

hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs = B.pack . BA.unpack $ (hash bs :: Digest SHA1)

or

hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs = B.pack . BA.unpack $ hashDigest
 where
 hashDigest = hash bs :: Digest SHA1

Both testCounter and testCounterBool use some duplicate code, which should be placed in its own function:

decodeFirst32 :: ByteString -> Int32 -> Either String Word32
decodeFirst32 rb = fst . runGet mahDecoder . hashSHA1Encoded . convertFromString . getBaseString rb

This makes testCounter and testCounterBool a lot simpler:

testCounter :: ByteString -> Int32 -> Int32
testCounter rb !counter = 
 case decodeFirst32 rb counter of
 Right f32 | firstBitsZero f32 -> counter
 _ -> testCounter rb (counter + 1)
testCounterBool :: ByteString -> Int32 -> Bool
testCounterBool rb counter = either (const False) firstBitsZero $ decodeFirst32 rb counter

With findValidCounter, I'm somewhat sure that your logic isn't 100% correct. According to the documentation, you check with increasing counters, but last . takeWhile p will take the last element for which p holds. Since p = not . testCounterBool ran, you will end up with the last element for which testCounterBool ran does not hold.

Either way, if you look for the first element that holds a predicate, you can use find from Data.List instead:

findValidCounter ran = fromJust $ find (testCounterBool ran) [1..]

Given the changes with get16RandomBytes, we end up with

generateHeader :: IO String
generateHeader = do
 (ran, _) <- fmap get16RandomBytes getSystemDRG
 let validCounter = findValidCounter ran
 let validHeader = getBaseString ran validCounter
 return $ headerPrefix ++ validHeader

Question 3

I'll have to stop the review there, I'll try to review the rest on Saturday.

Question 4

This is such an informative answer. Thank you so much! I knew I could solve this with a bit mask but for some reason I wasnt sure if it would be faster but it makes sense when you explain it in terms of assembly.

Question 5

I only understand the do syntax for monadic actions. Looks like I need to do more reading.

Question 6

@maple_shaft: basically do { x <- foo; bar} is the same as foo >>= (\x -> bar), which sheds some light on the error you'll get if you use x <- foo as the last expression in a do block. I'll elaborate on Saturday.

Question 7

Small bug in your optimized bitMask expression, you need to swap the x and n arguments: bitMask = foldr (\x n -> setBit n x) zeroBits [0..(difficulty - 1)]

Zeta Zeta 19.6k2 gold badges57 silver badges90 bronze badges · Accepted Answer · 2017-01-12 21:25:33Z

That's a lot of imports, and several seem strange. For example Data.List, since you only use head, tail, foldr or other Prelude functions. The problem is that you've use as, but left the qualified. This should be:

import Crypto.Hash
import Crypto.Random
import Data.Binary.Strict.Get
import Data.Bits
import Data.ByteString (ByteString)
import qualified Data.ByteArray as BA
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Conversion as BCON
import qualified Data.ByteString.UTF8 as BU
import Data.Either (either)
import Data.Int (Int32)
import Data.List.Split (splitOn)
import Data.Word (Word32)
import System.Random

qualified will prevent names like head from Data.ByteString getting imported into the global namespace. See "import" for more information. The whitespace is purely custom. I like to sort the modules by name, but that's up to you.

Next, I would make clear that your convert* functions are only new names, e.g.:

convertToString :: ByteString -> String
convertToString = BU.toString
convertFromString :: String -> ByteString
convertFromString = BU.fromString
convertIntToString :: Int -> String
convertIntToString = convertToString . BCON.toByteString' 
encodeInt32 :: Int32 -> ByteString
encodeInt32 = B64.encode . BCON.toByteString' 
mahDecoder :: Get Word32
mahDecoder = getWord32be

The mahDecoder change is a little bit different, since you've used originally something like

foo = do
 x <- func
 return x

However, this is fine due to the right-identity monad law:

func >>= return === func

Next, firstBitsZero can be rewritten using all:

firstBitsZero :: (Bits a) => a -> Bool
firstBitsZero val = all (\x -> not $ testBit val x) [0..(difficulty - 1)]

Note that a bit mask would be faster would make it faster, e.g.:

bitMask :: Num a => a
bitMask = (2 ^ difficulty) - 1
firstBitsZero :: Int32 -> Bool 
firstBitsZero val = bitMask .&. val == zeroBits

That way you only have to create the mask once and then only use bitwise AND, which usually gets compiled into a single CPU instruction.

When working with list, you want to add elements at the front, not at the back. Instead of

init (concatMap (++ "{") splixTail)

you should use

tail (concatMap ('{' :) splixTail)

The problem is that ++ is linear in its first argument:

(x:xs) ++ ys = x : (xs ++ ys)

(:) on the other hand is constant in terms of time.

We end up with:

formatTemplate :: String -> [String] -> String
formatTemplate base [] = base
formatTemplate base (x:xs) = 
 let (splixHead:splixTail) = splitOn "{" base
 concatSplitTail = tail $ concatMap ("{" :) splixTail
 in formatTemplate (splixHead ++ (x : concatSplitTail)) xs

Your next function doesn't need to be in IO:

get16RandomBytes :: (DRG g) => g -> (ByteString, g)
get16RandomBytes = randomBytesGenerate 16

In your following functions, I would prefer to use where, but that's completely up to personal preference. Also, if you use an expression only once, it might make sense to get rid of its binding if the code stays readable:

hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs = B.pack . BA.unpack $ (hash bs :: Digest SHA1)

or

hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs = B.pack . BA.unpack $ hashDigest
 where
 hashDigest = hash bs :: Digest SHA1

Both testCounter and testCounterBool use some duplicate code, which should be placed in its own function:

decodeFirst32 :: ByteString -> Int32 -> Either String Word32
decodeFirst32 rb = fst . runGet mahDecoder . hashSHA1Encoded . convertFromString . getBaseString rb

This makes testCounter and testCounterBool a lot simpler:

testCounter :: ByteString -> Int32 -> Int32
testCounter rb !counter = 
 case decodeFirst32 rb counter of
 Right f32 | firstBitsZero f32 -> counter
 _ -> testCounter rb (counter + 1)
testCounterBool :: ByteString -> Int32 -> Bool
testCounterBool rb counter = either (const False) firstBitsZero $ decodeFirst32 rb counter

With findValidCounter, I'm somewhat sure that your logic isn't 100% correct. According to the documentation, you check with increasing counters, but last . takeWhile p will take the last element for which p holds. Since p = not . testCounterBool ran, you will end up with the last element for which testCounterBool ran does not hold.

Either way, if you look for the first element that holds a predicate, you can use find from Data.List instead:

findValidCounter ran = fromJust $ find (testCounterBool ran) [1..]

Given the changes with get16RandomBytes, we end up with

generateHeader :: IO String
generateHeader = do
 (ran, _) <- fmap get16RandomBytes getSystemDRG
 let validCounter = findValidCounter ran
 let validHeader = getBaseString ran validCounter
 return $ headerPrefix ++ validHeader

I'll have to stop the review there, I'll try to review the rest on Saturday.
This is such an informative answer. Thank you so much! I knew I could solve this with a bit mask but for some reason I wasnt sure if it would be faster but it makes sense when you explain it in terms of assembly.
I only understand the do syntax for monadic actions. Looks like I need to do more reading.
@maple_shaft: basically do { x <- foo; bar} is the same as foo >>= (\x -> bar), which sheds some light on the error you'll get if you use x <- foo as the last expression in a do block. I'll elaborate on Saturday.
Small bug in your optimized bitMask expression, you need to swap the x and n arguments: bitMask = foldr (\x n -> setBit n x) zeroBits [0..(difficulty - 1)]

Stack Exchange Network

HashCash Algorithm header generation in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

HashCash Algorithm header generation in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions