7
\$\begingroup\$

I'm learning Haskell at the moment, and have just written my first useful module - a parser fo INI files. I used Parsec. I'd like to know what can be improved here - or maybe I did some things completely wrong and there is a better way. Thanks.

module IniFile (iniFileToMap) where
import Text.ParserCombinators.Parsec
import qualified Data.Map as Map
import Char
parseIniFile :: Parser ([(String, [(String, String)])])
parseSection :: Parser (String, [(String, String)])
parseSectionHeader :: Parser String
parseNameValue :: Parser (String, String)
parseComment :: Parser ()
parseNormalValue :: Parser String
parseQuotedValue :: Parser String
parseValue :: Parser String
parseName :: Parser String
parseInsideQuotes :: Parser String
parseCmtOrSpace :: Parser ()
iniFileToMap :: FilePath -> IO (Map.Map String (Map.Map String String))
parseCmtOrSpace = do{
 spaces;
 skipMany parseComment;
}
parseInsideQuotes = many (noneOf "\"")
parseQuotedValue = between (char '"') (char '"') parseInsideQuotes 
parseNormalValue = many1 (satisfy (\c -> isPrint c && not (isSpace c)))
parseValue = parseQuotedValue <|> parseNormalValue
parseName = many1 (satisfy (\c -> isAlpha c || isDigit c || c == '_' || c == '.'));
parseComment = do{
 skipMany1 (char ';');
 skipMany (noneOf "\n");
 spaces;
}
parseNameValue = do{
 name <- parseName;
 between (skipMany (oneOf " \t")) (skipMany (oneOf " \t")) (char '=');
 value <- parseValue;
 return (name, value);
}
parseSectionHeader = do{
 name <- between (char '[') (char ']') parseName;
 return name;
}
parseSection = do{
 name <- between parseCmtOrSpace parseCmtOrSpace parseSectionHeader;
 values <- endBy1 (parseNameValue) (parseCmtOrSpace);
 return (name, values);
}
parseIniFile = do{
 result <- many1 (parseSection);
 return result;
}
list2Map list = Map.fromList (map (\e -> (fst e, Map.fromList (snd e))) list)
iniFileToMap path = do{
 result <- parseFromFile parseIniFile path;
 case (result) of
 Left err -> error (show err)
 Right xs -> return (list2Map(xs));
}
asked May 5, 2011 at 12:29
\$\endgroup\$

2 Answers 2

7
\$\begingroup\$

The first thing I noticed is that you have grouped all the type signatures together, away from the function bodies. This seems very strange to me. It's way more common (and as far as I'm concerned also way more readable) to have a value's type directly before its definition.

The second thing I noticed were all the braces and semicolons. Unless you have a specific reason to use them, I'd strongly recommend making use of Haskell's significant indentation instead. Much less visual noise. I'm especially confused by the semicolon at the end of parseName's definition as parseName doesn't use do-notation and you didn't use a semicolon on any other definition which didn't use do.


You sometimes surround variables with parentheses (e.g. in values <- endBy1 (parseNameValue) (parseCmtOrSpace), but not in the line name <- between parseCmtOrSpace ... directly above it). I can't see any system to when you do this and when you don't, so it just seems random and inconsistent. So I'd recommend to get rid of the needless parentheses.

Speaking of parentheses, you might also want to get rid of some more parentheses by sprinkling a couple of $ signs in your code, to give it a more "haskellish" feel.


parseSectionHeader = do{
 name <- between (char '[') (char ']') parseName;
 return name;
}
parseIniFile = do{
 result <- many1 (parseSection);
 return result;
}

Whenever you have something of the form:

foo <- bar
return foo

You can simply just write bar. I.e.:

parseSectionHeader = between (char '[') (char ']') parseName
parseIniFile = many1 parseSection

iniFileToMap path = do{
 result <- parseFromFile parseIniFile path;
 case (result) of
 Left err -> error (show err)
 Right xs -> return (list2Map(xs));
}

Here the indentation is off - case should not be indented further than result <-. It should be:

iniFileToMap path = do
 result <- parseFromFile parseIniFile path
 case result of
 Left err -> error $ show err
 Right xs -> return $ list2Map xs
answered May 6, 2011 at 16:34
\$\endgroup\$
3
\$\begingroup\$

Formatting/Conventions

Firstly, the usual convention is to alias Data.Map as M. Usually qualified imports are one letter, unless you need two because of a collision.

Don't use braces except for record syntax.

Don't use semicolons.

Monads

Alot of your do notation doesn't need do notation. To add to sepp2k's examples, we also have stuff like:

parseCmtOrSpace = do
 spaces
 skipMany parseComment

which should be written much more nicely as

parseCmtOrSpace = spaces >> skipMany parseComment

since do notation is usually not necessary for a two line do.

As another example of cleaning up do notation: (This one is significantly more to personal case than my other example. However it also makes some other changes that you should look at, even if you want to keep your monadic do notation, such as refactoring out the tabs)

parseNameValue = do
 name <- parseName
 between (skipMany (oneOf " \t")) (skipMany (oneOf " \t")) (char '=')
 value <- parseValue
 return (name, value)

We can instead write: (I apologize if the following is incorrect, I don't have a compiler available)

parseNameValue = liftM3 g parseName eq parseValue
 where eq = between ts ts $ char '='
 ts = skipMany $ oneOf " \t"
 g x _ y = (x, y)

We can do the same for parseSection:

parseSection = do
 name <- between parseCmtOrSpace parseCmtOrSpace parseSectionHeader
 values <- endBy1 (parseNameValue) (parseCmtOrSpace)
 return (name, values)

becomes

parseSection = liftM2 g names values
 where names = between parseCmtOrSpace parseCmtOrSpace parseSectionHeader
 values = endBy1 parseNameValue parseCmtOrSpace
 g x y = (x, y)

Parsec

You should define a way to get an Either ParseError (M.Map String String) instead of erroring in the library. Erroring should be left to the client. Although you may want to provide one that provides an Either, and one function that errors automatically.

It seems to be convention that the monads that build up the parser should not begin with parse. We see this in the Parsec library (its char, not parseChar). The functions don't parse; they return data that can be used for parsing. At the very least do nameValueParser, but theres nothing wrong with just nameValue. Just don't export the functions and there won't be any namespacing issues most of the time.

answered May 7, 2011 at 12:37
\$\endgroup\$
1
  • \$\begingroup\$ You can create stunningly readable Parsec code by using its Applicative instance. For example, parseNameValue then becomes parseNameValue = (,) <$> parseName <*> (ws *> char '=' <* ws) *> parseValue (where ws = skipMany $ oneOf " \t") \$\endgroup\$ Commented May 26, 2011 at 6:18

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.