A thought exercise on my part as I'm relatively new to Haskell. I wanted an interesting project to work on so I decided to implement the Hashcash Algorithm, which is most commonly used as the basis of Bitcoin Proof of Work scheme. I am implementing the original specification that utilizes SHA1 and the description of the algorithmic steps are described well in the above Wikipedia article.
This appears to function correctly to the best of my knowledge, however I feel it is somewhat slower than it should be. Any potential suggestions for performance improvements are welcome here. Furthermore, as I am new to writing Haskell, if I am violating common expected conventions here then please feel free to point out how I can write more readable and standard code here.
{-# LANGUAGE BangPatterns #-}
module HashCash where
import Data.Int
import Data.List
import Data.List.Split (splitOn)
import Data.Char
import Data.Function
import System.Random
import Data.Bits
import Data.Either
import Data.Binary.Strict.Get
import System.IO as SIO
import Data.Word (Word32)
import Data.ByteString as B
import Data.ByteString.Char8 as BC
import Data.ByteString.UTF8 as BU
import Data.ByteString.Base64 as B64
import Data.ByteString.Conversion as BCON
import Data.ByteArray as BA
import Crypto.Random
import Crypto.Hash
startingCounter :: Int32
startingCounter = 1
difficulty :: Int
difficulty = 20
headerPrefix = "X-Hashcash: "
template = "1:{:{:{::{:{"
dateTemplate = "YYMMDDhhmmss"
address = "a@a"
-- example date because I dont want to mess with date formatting just now
exampleDate = "150320112233"
convertToString :: ByteString -> String
convertToString b = BU.toString b
convertFromString :: String -> ByteString
convertFromString s = BU.fromString s
convertIntToString :: Int -> String
convertIntToString a = convertToString . BCON.toByteString' $ a
encodeInt32 :: Int32 -> ByteString
encodeInt32 a = B64.encode . BCON.toByteString' $ a
mahDecoder :: Get Word32
mahDecoder = do
first32Bits <- getWord32be
return first32Bits
firstBitsZero :: (Bits a) => a -> Bool
firstBitsZero val = Data.List.foldr (\x acc -> ((not $ testBit val x) && acc)) True [0..(difficulty - 1)]
formatTemplate :: String -> [String] -> String
formatTemplate base [] = base
formatTemplate base (x:xs) =
let splix = (Data.List.Split.splitOn "{" base) :: [String]
splixHead = Data.List.head splix ++ x
splixTail = Data.List.tail splix
concatSplitTail = Data.List.init $ Data.List.concatMap (++ "{") splixTail
in formatTemplate (splixHead ++ concatSplitTail) xs
get16RandomBytes :: (DRG g) => g -> IO (ByteString, g)
get16RandomBytes gen = do
let a = randomBytesGenerate 16 gen
return $ a
getBaseString :: ByteString -> Int32 -> String
getBaseString bs counter =
let encodedVal = B64.encode bs
encodedCounter = encodeInt32 counter
baseParams = [(convertIntToString difficulty), exampleDate, address, (convertToString encodedVal), (convertToString encodedCounter)]
in formatTemplate template baseParams
hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs =
let hashDigest = hash bs :: Digest SHA1
byteString = B.pack . BA.unpack $ hashDigest
in byteString
-- Pass a counter and if the first 20 bits are zero then return the same counter value else increment it
-- signifying it is time to test the next number (NOTE: recursive style, may overflow stack)
testCounter :: ByteString -> Int32 -> Int32
testCounter rb !counter =
let baseString = getBaseString rb counter
hashedString = hashSHA1Encoded $ convertFromString baseString
!eitherFirst32 = runGet mahDecoder hashedString
incCounter = counter + 1
in case eitherFirst32 of
(Left first32, _) -> testCounter rb incCounter
(Right first32, _) -> if (firstBitsZero first32)
then counter
else testCounter rb incCounter
testCounterBool :: ByteString -> Int32 -> Bool
testCounterBool rb counter =
let baseString = getBaseString rb counter
hashedString = hashSHA1Encoded $ convertFromString baseString
eitherFirst32 = runGet mahDecoder hashedString
in case eitherFirst32 of
(Left first32, _) -> False
(Right first32, _) -> firstBitsZero first32
-- Keep taking incrementing counters from an infinite list and testing them until we find a counter
-- that generates a valid header
findValidCounter :: ByteString -> Int32
findValidCounter ran = Data.List.last $ Data.List.takeWhile (not . testCounterBool ran) [1..]
generateHeader :: IO String
generateHeader = do
g <- getSystemDRG
(ran, _) <- get16RandomBytes g
let validCounter = findValidCounter ran
let validHeader = getBaseString ran validCounter
return $ headerPrefix ++ validHeader
main :: IO ()
main = do
header <- generateHeader
SIO.putStrLn header
return ()
1 Answer 1
That's a lot of imports, and several seem strange. For example Data.List
, since you only use head
, tail
, foldr
or other Prelude
functions. The problem is that you've use as
, but left the qualified
. This should be:
import Crypto.Hash
import Crypto.Random
import Data.Binary.Strict.Get
import Data.Bits
import Data.ByteString (ByteString)
import qualified Data.ByteArray as BA
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Conversion as BCON
import qualified Data.ByteString.UTF8 as BU
import Data.Either (either)
import Data.Int (Int32)
import Data.List.Split (splitOn)
import Data.Word (Word32)
import System.Random
qualified
will prevent names like head
from Data.ByteString
getting imported into the global namespace. See "import" for more information. The whitespace is purely custom. I like to sort the modules by name, but that's up to you.
Next, I would make clear that your convert*
functions are only new names, e.g.:
convertToString :: ByteString -> String
convertToString = BU.toString
convertFromString :: String -> ByteString
convertFromString = BU.fromString
convertIntToString :: Int -> String
convertIntToString = convertToString . BCON.toByteString'
encodeInt32 :: Int32 -> ByteString
encodeInt32 = B64.encode . BCON.toByteString'
mahDecoder :: Get Word32
mahDecoder = getWord32be
The mahDecoder
change is a little bit different, since you've used originally something like
foo = do
x <- func
return x
However, this is fine due to the right-identity monad law:
func >>= return === func
Next, firstBitsZero
can be rewritten using all
:
firstBitsZero :: (Bits a) => a -> Bool
firstBitsZero val = all (\x -> not $ testBit val x) [0..(difficulty - 1)]
Note that a bit mask would be faster would make it faster, e.g.:
bitMask :: Num a => a
bitMask = (2 ^ difficulty) - 1
firstBitsZero :: Int32 -> Bool
firstBitsZero val = bitMask .&. val == zeroBits
That way you only have to create the mask once and then only use bitwise AND, which usually gets compiled into a single CPU instruction.
When working with list, you want to add elements at the front, not at the back. Instead of
init (concatMap (++ "{") splixTail)
you should use
tail (concatMap ('{' :) splixTail)
The problem is that ++
is linear in its first argument:
(x:xs) ++ ys = x : (xs ++ ys)
(:)
on the other hand is constant in terms of time.
We end up with:
formatTemplate :: String -> [String] -> String
formatTemplate base [] = base
formatTemplate base (x:xs) =
let (splixHead:splixTail) = splitOn "{" base
concatSplitTail = tail $ concatMap ("{" :) splixTail
in formatTemplate (splixHead ++ (x : concatSplitTail)) xs
Your next function doesn't need to be in IO
:
get16RandomBytes :: (DRG g) => g -> (ByteString, g)
get16RandomBytes = randomBytesGenerate 16
In your following functions, I would prefer to use where
, but that's completely up to personal preference. Also, if you use an expression only once, it might make sense to get rid of its binding if the code stays readable:
hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs = B.pack . BA.unpack $ (hash bs :: Digest SHA1)
or
hashSHA1Encoded :: ByteString -> ByteString
hashSHA1Encoded bs = B.pack . BA.unpack $ hashDigest
where
hashDigest = hash bs :: Digest SHA1
Both testCounter
and testCounterBool
use some duplicate code, which should be placed in its own function:
decodeFirst32 :: ByteString -> Int32 -> Either String Word32
decodeFirst32 rb = fst . runGet mahDecoder . hashSHA1Encoded . convertFromString . getBaseString rb
This makes testCounter
and testCounterBool
a lot simpler:
testCounter :: ByteString -> Int32 -> Int32
testCounter rb !counter =
case decodeFirst32 rb counter of
Right f32 | firstBitsZero f32 -> counter
_ -> testCounter rb (counter + 1)
testCounterBool :: ByteString -> Int32 -> Bool
testCounterBool rb counter = either (const False) firstBitsZero $ decodeFirst32 rb counter
With findValidCounter
, I'm somewhat sure that your logic isn't 100% correct. According to the documentation, you check with increasing counters, but last . takeWhile p
will take the last element for which p
holds. Since p = not . testCounterBool ran
, you will end up with the last element for which testCounterBool ran
does not hold.
Either way, if you look for the first element that holds a predicate, you can use find
from Data.List
instead:
findValidCounter ran = fromJust $ find (testCounterBool ran) [1..]
Given the changes with get16RandomBytes
, we end up with
generateHeader :: IO String
generateHeader = do
(ran, _) <- fmap get16RandomBytes getSystemDRG
let validCounter = findValidCounter ran
let validHeader = getBaseString ran validCounter
return $ headerPrefix ++ validHeader
-
1\$\begingroup\$ I'll have to stop the review there, I'll try to review the rest on Saturday. \$\endgroup\$Zeta– Zeta2017年01月12日 21:26:20 +00:00Commented Jan 12, 2017 at 21:26
-
\$\begingroup\$ This is such an informative answer. Thank you so much! I knew I could solve this with a bit mask but for some reason I wasnt sure if it would be faster but it makes sense when you explain it in terms of assembly. \$\endgroup\$maple_shaft– maple_shaft2017年01月12日 22:58:43 +00:00Commented Jan 12, 2017 at 22:58
-
\$\begingroup\$ I only understand the do syntax for monadic actions. Looks like I need to do more reading. \$\endgroup\$maple_shaft– maple_shaft2017年01月12日 23:45:13 +00:00Commented Jan 12, 2017 at 23:45
-
\$\begingroup\$ @maple_shaft: basically
do { x <- foo; bar}
is the same asfoo >>= (\x -> bar)
, which sheds some light on the error you'll get if you usex <- foo
as the last expression in ado
block. I'll elaborate on Saturday. \$\endgroup\$Zeta– Zeta2017年01月13日 05:49:32 +00:00Commented Jan 13, 2017 at 5:49 -
\$\begingroup\$ Small bug in your optimized bitMask expression, you need to swap the x and n arguments:
bitMask = foldr (\x n -> setBit n x) zeroBits [0..(difficulty - 1)]
\$\endgroup\$maple_shaft– maple_shaft2017年01月13日 13:52:09 +00:00Commented Jan 13, 2017 at 13:52