I'm learning Haskell at the moment, and have just written my first useful module - a parser fo INI files. I used Parsec. I'd like to know what can be improved here - or maybe I did some things completely wrong and there is a better way. Thanks.
module IniFile (iniFileToMap) where
import Text.ParserCombinators.Parsec
import qualified Data.Map as Map
import Char
parseIniFile :: Parser ([(String, [(String, String)])])
parseSection :: Parser (String, [(String, String)])
parseSectionHeader :: Parser String
parseNameValue :: Parser (String, String)
parseComment :: Parser ()
parseNormalValue :: Parser String
parseQuotedValue :: Parser String
parseValue :: Parser String
parseName :: Parser String
parseInsideQuotes :: Parser String
parseCmtOrSpace :: Parser ()
iniFileToMap :: FilePath -> IO (Map.Map String (Map.Map String String))
parseCmtOrSpace = do{
spaces;
skipMany parseComment;
}
parseInsideQuotes = many (noneOf "\"")
parseQuotedValue = between (char '"') (char '"') parseInsideQuotes
parseNormalValue = many1 (satisfy (\c -> isPrint c && not (isSpace c)))
parseValue = parseQuotedValue <|> parseNormalValue
parseName = many1 (satisfy (\c -> isAlpha c || isDigit c || c == '_' || c == '.'));
parseComment = do{
skipMany1 (char ';');
skipMany (noneOf "\n");
spaces;
}
parseNameValue = do{
name <- parseName;
between (skipMany (oneOf " \t")) (skipMany (oneOf " \t")) (char '=');
value <- parseValue;
return (name, value);
}
parseSectionHeader = do{
name <- between (char '[') (char ']') parseName;
return name;
}
parseSection = do{
name <- between parseCmtOrSpace parseCmtOrSpace parseSectionHeader;
values <- endBy1 (parseNameValue) (parseCmtOrSpace);
return (name, values);
}
parseIniFile = do{
result <- many1 (parseSection);
return result;
}
list2Map list = Map.fromList (map (\e -> (fst e, Map.fromList (snd e))) list)
iniFileToMap path = do{
result <- parseFromFile parseIniFile path;
case (result) of
Left err -> error (show err)
Right xs -> return (list2Map(xs));
}
2 Answers 2
The first thing I noticed is that you have grouped all the type signatures together, away from the function bodies. This seems very strange to me. It's way more common (and as far as I'm concerned also way more readable) to have a value's type directly before its definition.
The second thing I noticed were all the braces and semicolons. Unless you have a specific reason to use them, I'd strongly recommend making use of Haskell's significant indentation instead. Much less visual noise. I'm especially confused by the semicolon at the end of parseName
's definition as parseName
doesn't use do
-notation and you didn't use a semicolon on any other definition which didn't use do
.
You sometimes surround variables with parentheses (e.g. in values <- endBy1 (parseNameValue) (parseCmtOrSpace)
, but not in the line name <- between parseCmtOrSpace ...
directly above it). I can't see any system to when you do this and when you don't, so it just seems random and inconsistent. So I'd recommend to get rid of the needless parentheses.
Speaking of parentheses, you might also want to get rid of some more parentheses by sprinkling a couple of $
signs in your code, to give it a more "haskellish" feel.
parseSectionHeader = do{
name <- between (char '[') (char ']') parseName;
return name;
}
parseIniFile = do{
result <- many1 (parseSection);
return result;
}
Whenever you have something of the form:
foo <- bar
return foo
You can simply just write bar
. I.e.:
parseSectionHeader = between (char '[') (char ']') parseName
parseIniFile = many1 parseSection
iniFileToMap path = do{
result <- parseFromFile parseIniFile path;
case (result) of
Left err -> error (show err)
Right xs -> return (list2Map(xs));
}
Here the indentation is off - case
should not be indented further than result <-
. It should be:
iniFileToMap path = do
result <- parseFromFile parseIniFile path
case result of
Left err -> error $ show err
Right xs -> return $ list2Map xs
Formatting/Conventions
Firstly, the usual convention is to alias Data.Map as M. Usually qualified imports are one letter, unless you need two because of a collision.
Don't use braces except for record syntax.
Don't use semicolons.
Monads
Alot of your do notation doesn't need do notation. To add to sepp2k's examples, we also have stuff like:
parseCmtOrSpace = do
spaces
skipMany parseComment
which should be written much more nicely as
parseCmtOrSpace = spaces >> skipMany parseComment
since do notation is usually not necessary for a two line do.
As another example of cleaning up do notation: (This one is significantly more to personal case than my other example. However it also makes some other changes that you should look at, even if you want to keep your monadic do notation, such as refactoring out the tabs)
parseNameValue = do
name <- parseName
between (skipMany (oneOf " \t")) (skipMany (oneOf " \t")) (char '=')
value <- parseValue
return (name, value)
We can instead write: (I apologize if the following is incorrect, I don't have a compiler available)
parseNameValue = liftM3 g parseName eq parseValue
where eq = between ts ts $ char '='
ts = skipMany $ oneOf " \t"
g x _ y = (x, y)
We can do the same for parseSection
:
parseSection = do
name <- between parseCmtOrSpace parseCmtOrSpace parseSectionHeader
values <- endBy1 (parseNameValue) (parseCmtOrSpace)
return (name, values)
becomes
parseSection = liftM2 g names values
where names = between parseCmtOrSpace parseCmtOrSpace parseSectionHeader
values = endBy1 parseNameValue parseCmtOrSpace
g x y = (x, y)
Parsec
You should define a way to get an Either ParseError (M.Map String String)
instead of erroring in the library. Erroring should be left to the client. Although you may want to provide one that provides an Either
, and one function that errors automatically.
It seems to be convention that the monads that build up the parser should not begin with parse. We see this in the Parsec library (its char
, not parseChar
). The functions don't parse; they return data that can be used for parsing. At the very least do nameValueParser
, but theres nothing wrong with just nameValue
. Just don't export the functions and there won't be any namespacing issues most of the time.
-
\$\begingroup\$ You can create stunningly readable Parsec code by using its Applicative instance. For example,
parseNameValue
then becomesparseNameValue = (,) <$> parseName <*> (ws *> char '=' <* ws) *> parseValue
(wherews = skipMany $ oneOf " \t"
) \$\endgroup\$Cactus– Cactus2011年05月26日 06:18:25 +00:00Commented May 26, 2011 at 6:18