I've written a rudimentary parser for INI files:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as M
import Data.Maybe (fromMaybe)
import qualified Data.Text as T
type Ini = M.Map T.Text Section
data Section =
Section
{ name :: T.Text
, properties :: M.Map T.Text T.Text
}
deriving (Show)
main :: IO ()
main = parseIni iniFilePath >>= \ini -> putStrLn $ "Parsed INI: " ++ show ini
where
iniFilePath = "/home/me/test.ini"
parseIni :: FilePath -> IO Ini
parseIni iniFilePath = parseToIni . T.pack <$> readFile iniFilePath
parseToIni :: T.Text -> Ini
parseToIni stringToParse =
-- We return the parsed Ini, not the helper values
firstOfTriple $
foldr
(\line (ini, currentSectionMaybe, lineIndex) ->
-- We're at a new section start or the end of the file → add the previous section
-- to the parsed Ini value and create a new section
if isSectionHeader line || lineIndex >= length lines - 1
then let updatedIni = addSection ini currentSectionMaybe
in (updatedIni, Just $ Section (getSectionName line) M.empty, 1 + lineIndex)
else (ini, updateSection currentSectionMaybe line, 1 + lineIndex))
(M.empty, Nothing, 0) $
-- Since foldr is right associative we would process the lines starting with the last one, that's
-- why we reverse the list of lines
reverse lines
where
lines :: [T.Text]
lines = T.splitOn "\n" stringToParse
firstOfTriple :: (a, b, c) -> a
firstOfTriple (x, _, _) = x
parseProperty :: T.Text -> Maybe (T.Text, T.Text)
parseProperty line =
case T.splitOn "=" line of
[name, value] -> Just (T.strip name, T.strip value)
_ -> Nothing
updateSection :: Maybe Section -> T.Text -> Maybe Section
updateSection sectionMaybe line = fmap updateSection' sectionMaybe
where
updateSection' :: Section -> Section
updateSection' section =
-- Add the property to the section if the property can be parsed.
-- Otherwise, leave the section as it were
maybe
section
(\(propName, value) -> Section (name section) (M.insert propName value (properties section)))
(parseProperty line)
getSectionName :: T.Text -> T.Text
getSectionName line = fromMaybe line headerWithoutBracketsMaybe
where
headerWithoutBracketsMaybe = T.stripPrefix "[" line >>= T.stripSuffix "]"
isSectionHeader :: T.Text -> Bool
isSectionHeader line = T.isPrefixOf "[" strippedLine && T.isSuffixOf "]" strippedLine
where
strippedLine = T.strip line
addSection :: Ini -> Maybe Section -> Ini
addSection ini sectionMaybe = maybe ini (\section -> M.insert (name section) section ini) sectionMaybe
I'd love to get feedback on how to simplify the code and/or make it more readable.
Things I'm aware of and ok with at the moment:
- The parser doesn't support comments
addSection
could be eta-reduced- I don't use a parsing library like Parsec
- I don't use lenses
2 Answers 2
Just a few comments – not a full review:
parseToIni :: T.Text -> Ini
indicates that from any random string,parseToIni
can produce anIni
. This makes me wonder how it would handle an invalid.ini
file, or e.g. the stringfoo
.In
updateSection :: Maybe Section -> T.Text -> Maybe Section
theMaybe
s obfuscate what the function is supposed to do. Can the function produce aNothing
if the first argument is aJust
? Better remove theMaybe
s andfmap
the whole thing if needed.addSection
is similar.main
would IMHO be more readable if it simply useddo
-notation.In
parseToIni
, the worker function forfoldr
is complex enough that it should have a type annotation. It's ok to to simply call inner worker functionsf
IMHO.I think it's a bit confusing that the section names appear both as the keys of
Ini
and inSection
'sname
field. I'd probably remove thename
field.A few type synonyms for keys, values, section names etc might help with readability.
The order of parameters in
addSection
andupdateSection
is a bit unconventional. The usuala -> b -> b
ordering is a bit nicer for partial applications.IMHO,
maybe
(and similar functions likeeither
) don't aid readability. If you don't want to come up with a variable name, try theLambdaCase
extension.Instead of
getSectionName
andisSectionName
, have a single function of typeText
->Maybe SectionName
.
All in all I think your code is pretty readable. It's mostly the types that could be a bit better.
I know you said you were fine with not using a parser combinator library like parsec, but I thought you might like to see the how the same thing might look using one, so I wrote an Attoparsec based parser for your data types:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
module IniParser where
import Control.Monad ( void )
import Data.Attoparsec.Text ( Parser
, char
, endOfInput
, endOfLine
, many'
, many1
, notInClass
, parseOnly
, satisfy
, space
)
import Data.Map.Strict ( Map )
import qualified Data.Map.Strict as Map
import Data.Text ( Text
, pack
)
import System.Environment ( getArgs )
type Ini = Map Text Section
data Section = Section
{ name :: Text
, properties :: Map Text Text
} deriving (Show)
main :: IO ()
main = do
[path] <- getArgs
parseIniFile path >>= \case
Right ini -> putStrLn $ "Parsed INI: " ++ show ini
Left err -> putStrLn $ "ERROR parsing ini: " ++ err
parseIniFile :: FilePath -> IO (Either String Ini)
parseIniFile iniFilePath = parseIni . pack <$> readFile iniFilePath
parseIni :: Text -> Either String Ini
parseIni = parseOnly ini
ini :: Parser Ini
ini = do
defaultSection <- lexeme (Section "" <$> (Map.fromList <$> many' property))
namedSections <- lexeme (many' section)
void $ endOfInput
let allSections | null (properties defaultSection) = namedSections
| otherwise = defaultSection:namedSections
pure . Map.fromList . map (\section -> (name section, section))
$ allSections
section :: Parser Section
section = Section <$> sectionName <*> (Map.fromList <$> many' (lexeme property))
sectionName :: Parser Text
sectionName = char '[' *> sectionNameChars <* char ']' <* endOfLine
sectionNameChars :: Parser Text
sectionNameChars = pack <$> many' (satisfy $ notInClass "]\r\n")
property :: Parser (Text, Text)
property = (,) <$> propertyName <*> (lexeme (char '=') *> propertyValue)
propertyName :: Parser Text
propertyName = pack <$> many' (satisfy $ notInClass "=\r\n\t ")
propertyValue :: Parser Text
propertyValue = pack <$> many' (satisfy $ notInClass "\r\n")
lexeme :: Parser a -> Parser a
lexeme p = whitespace *> p <* whitespace
whitespace :: Parser String
whitespace = many' space
I think the main strength of the approach is self pretty self-evident. It eliminates all the multi-line lambdas, the entire foldr, etc. which (at least IMHO) really obscures the essence of what the code is expressing.
Additionally I've restricted the use of qualified imports to a single one usage where I think it makes the code more readable, though your taste may vary.
You can see the whole stack based project here if you're interested.