2
\$\begingroup\$

I have been working on a Bencoded string parser in order to improve my knowledge of Haskell. After running the code through hlint, I still have a few questions:

  1. As I noted in the comments, the key value in the definition of BMapT should always be a Bstr. Is there any way to enforce a particular data constructor when pattern matching an algrebraic datatype?

  2. Since Bencoded dictionaries are generally stored as human readable text, I made an instance of the Show typeclass so that I could take Bencode structures and turn them into text. I was wondering if this is the right thing to do, or if Haskell has another typeclass for storage/serialization a la Python's repr.

  3. This question is kind of nebulous, but would you consider this code 'idiomatic'? Code without style isn't worth writing :)

Code:

-- This is an implementation of Bencoding for bittorrent as described at 
-- http://www.bittorrent.org/beps/bep_0003.html
module Bencode where
import Text.Parsec.ByteString
import Text.Parsec.Char
import Text.Parsec.Prim
import Text.Parsec.Combinator
import qualified Text.Parsec.Error as PE
import Data.Char
import qualified Data.Map as M
import qualified Control.Monad as Mon
-- Technically this first argument can only be a Bstr, but I don't know
-- how to express that
type BMapT = M.Map Bencode Bencode
-- I have no idea why Bstr is okay as a String when it's a ByteString
data Bencode = Bint Integer
 | Bstr String
 | Blist [Bencode]
 | Bmap BMapT
 deriving (Eq, Ord)
instance Show Bencode where
 show (Bint i) = "i" ++ show i ++ "e"
 show (Bstr s) = (show . length) s ++ ":" ++ s
 show (Blist bs) = 'l':concatMap show bs ++ "e"
 show (Bmap bm) = M.foldlWithKey (\a k b -> a ++ show k ++ show b) "d" bm ++ "e"
-- Parse a Bencoded Integer
bInt :: Parser Bencode
bInt = do char 'i'
 num <- validNum
 char 'e'
 return $ Bint num
 -- This parser parses valid integers in Bencodings 
 where validNum = do neg <- option ' ' (char '-')
 d <- digit
 case digitToInt d of
 -- Only time the first digit == 0 is "i0e"
 0 -> if neg == ' ' then 
 -- "i0e" allowed but NOT "i-0e" or zero padded integer
 lookAhead (char 'e') >> return 0 
 else
 parserFail "Can't have a negative zero"
 _ -> many digit >>= \xs -> return $ read (neg:d:xs)
-- Parse a Bencoded String
bString :: Parser Bencode
bString = do ss <- many1 digit
 char ':'
 let size = read ss
 Mon.liftM Bstr $ count size anyChar
bList :: Parser Bencode
bList = do char 'l' 
 ls <- many (bInt <|> bString <|> bList <|> bMap)
 char 'e'
 return $ Blist ls
-- A parser which parses bencoded dictionaries 
bMap :: Parser Bencode
bMap = do char 'd'
 entries <- many dictEntry
 char 'e'
 return $ Bmap $ M.fromList entries
-- This parser will parse a key-value pair
dictEntry :: Parser (Bencode, Bencode)
dictEntry = do key <- bString
 value <- bString <|> bList <|> bInt <|> bMap
 return (key, value)
-- This function reads a torrent file. readBencodedFile "filename" reads
-- that filename and returns the parsed bencoded dictionary
readBencodedFile :: String -> IO (Either PE.ParseError Bencode)
readBencodedFile = parseFromFile bMap
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jul 27, 2013 at 6:33
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

Below is your code with some minor tweaks and the comments moved into standard Haddock form.

module Bencode (readBencodedFile, BencodedValue(..), bencode, bencodeBS) where
import qualified Control.Monad as Mon
import Data.ByteString (ByteString)
import Data.ByteString.Char8 (pack)
import qualified Data.Map as M
import Text.Parsec.ByteString as ParseBS
import Text.Parsec.Char (anyChar, char, digit)
import Text.Parsec.Prim ((<|>), parserFail, runParser)
import Text.Parsec.Combinator (count, many1, option)
import qualified Text.Parsec.Error as PE
type BMapT = M.Map BencodedValue BencodedValue
data BencodedValue = Bint Integer
 | Bstr String
 | Blist [BencodedValue]
 | Bmap BMapT
 deriving (Eq, Ord, Show)
-- |Transform a 'BencodedValue' back to an encoded string
bencode :: BencodedValue -> String
bencode (Bint i) = "i" ++ show i ++ "e"
bencode (Bstr s) = (show . length) s ++ ":" ++ s
bencode (Blist bs) = 'l':concatMap bencode bs ++ "e"
bencode (Bmap bm) = M.foldlWithKey (\a k b -> a ++ bencode k ++ bencode b) "d" bm ++ "e"
-- |Return a 'BencodedValue' value to it original encoded 'ByteString' form.
bencodeBS :: BencodedValue -> ByteString
bencodeBS = pack . bencode
{-|
 'readBencodedFile' reads data from a torrent file and returns the parsed
 result as a dictionary in a 'BencodedValue'.
 The 'String' is the filename of the torrent to read.
 -}
readBencodedFile :: String -> IO (Either PE.ParseError BencodedValue)
readBencodedFile = parseFromFile bMap
------------------------
-- Internal functions --
------------------------
-- |Parse a Bencoded Integer
bInt :: ParseBS.Parser BencodedValue
bInt = do char 'i'
 neg <- option ' ' (char '-')
 nums <- many1 digit
 char 'e'
 mkBint neg nums
 where
 mkBint ' ' ['0'] = return $ Bint 0
 mkBint '-' ('0':ns) = parserFail "Can't have a negative zero"
 mkBint _ ('0':ns) = parserFail "Invalid number with a leading zero"
 mkBint '-' nums = return $ Bint $ read ('-':nums)
 mkBint _ nums = return $ Bint $ read nums
-- |Parse a Bencoded String
bString :: ParseBS.Parser BencodedValue
bString = do nums <- many1 digit
 char ':'
 mkBstring (read nums)
 where
 mkBstring size = do cs <- count size anyChar
 return $ Bstr cs
-- |Parse a Bencoded List
bList :: ParseBS.Parser BencodedValue
bList = do char 'l'
 ls <- many1 (bInt <|> bString <|> bList <|> bMap)
 char 'e'
 mkBlist ls
 where
 mkBlist = return . Blist
-- |Parse a Bencoded dictionary
bMap :: ParseBS.Parser BencodedValue
bMap = do char 'd'
 entries <- many1 dictEntry
 char 'e'
 mkBmap entries
 where
 mkBmap = return . Bmap . M.fromList
-- |Parse a Bencoded dictionary entry.
-- This is an internal function declared outside of 'bMap' to
-- facilitate testing.
dictEntry :: ParseBS.Parser (BencodedValue, BencodedValue)
dictEntry = do key <- bString
 value <- bString <|> bList <|> bInt <|> bMap
 return (key, value)

Answers to your questions.

  1. Without getting REAL fancy there is no good way limit the BMap argument to a Bstr while it is an member of algebraic type.
  2. Your analogy to repr from Python is a good one. Typically if there is a Haskell Show instance there is also a Read instance so show . read . show is valid. In your case the result of show needs to be parsed again and you do not expose a parse from string/bytestring. I added a bencode and bencodeBS which takes a BencodedValue and outputs a String or ByteString.
  3. Look at where I simplified or renamed your items to see suggestions on idiomatic usage. On the whole you did pretty well. If I had more time I would have liked to transform your parser functions into Applicative style. By transitioning to the mkFoo functions I started that process for you if you like.
  4. In the code you ask about why this is a String when the ByteString is the input. It is because the parsers are returning Strings parsed from the ByteString. If the performance really matters you could transition over to Attoparsec.
answered Mar 29, 2014 at 0:40
\$\endgroup\$
1
  • \$\begingroup\$ I apologize for not accepting this earlier! Thank you so much for this review. I like your addition of the mkBInt function, its definitely better than the repeated ifs I was using. You also inspired me to read the typeclassopedia which has made me a more effective Haskeller! \$\endgroup\$ Commented May 1, 2014 at 18:12

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.