I figured it's about time to jump into some Haskell, so here's a first attempt at an oddly specific SVG parser:
import System.Environment
import Text.ParserCombinators.Parsec

parseFile fname = parseFromFile (manyTill (try tag) (try readEnd)) fname

tag = do manyTill anyChar . try $ lookAhead tagStart
         char '<'
         name <- many $ noneOf " "
         props <- tagContents
         char '>'
         junk
         return (name, props)

tagContents = do props <- manyTill property . try . lookAhead $ char '>'
                 junk
                 return props

property = do spaces
              name <- many1 $ noneOf "="
              string "=\""
              val <- manyTill anyChar $ char '"'
              junk
              return (name, val)

junk = optional . many $ oneOf "\n\r\t\\/"

readEnd = do optional $ string "</svg>"
             junk
             eof

tagStart = do char '<'
              tagName

tagName = string "rect"
          <|> string "polygon"
          <|> string "polyline"
          <|> string "circle"
          <|> string "path"
          <|> string "g"
          <|> string "svg"
Before anyone asks, one of the objectives for this program (other than learning) is that it should accept invalid SVGs (for example, a partial SVG or one that has been saved as an `rtf`), which is why I'm making liberal use of `try` and cherry-picking tags.
As implied by the title, this is my first non-tutorial attempt at Haskell, so give me all the style pointers you can muster (and highlight anything that signals a broken understanding on my part).
I'd also appreciate pointers on how to cherry-pick properties (and not just tags). I've tried writing `property` as
property = do manyTill anyChar . try $ lookAhead tagName
              name <- many1 $ noneOf "="
              string "=\""
              val <- manyTill anyChar $ char '"'
              junk
              return (name, val)
and defining `propName` as
propName = string "points"
           <|> string "x"
           <|> string "y"
           <|> string "r"
           <|> string "d"
           <|> string "cx"
           <|> string "cy"
           <|> string "width"
           <|> string "height"
           <|> string "transform"
This seems like it should work since it's basically how I got `tag` jumping to the next desired tag, but it gives me unexpected input errors.
Edit The Second:
import System.Environment
import Text.ParserCombinators.Parsec

type TagName = String
type Property = (PropName, Value)
type PropName = String
type Value = String

parseFile fname = parseFromFile (manyTill (try tag) (try readEnd)) fname

tag :: GenParser Char st (TagName, [Property])
tag = do manyTill anyChar . try $ lookAhead tagStart
         name <- tagStart
         props <- tagContents
         char '>'
         junk
         return (name, props)

tagContents :: GenParser Char st [Property]
tagContents = do props <- manyTill property . try . lookAhead $ char '>'
                 junk
                 return props

property :: GenParser Char st (PropName, Value)
property = do manyTill anyChar . try $ lookAhead propName
              name <- many1 $ noneOf "="
              string "=\""
              val <- manyTill anyChar $ char '"'
              junk
              return (name, val)

junk = many $ oneOf "\n\r\t\\/{} "

tagStart :: GenParser Char st TagName
tagStart = do char '<'
              name <- tagName
              return name

readEnd = do optional $ string "</svg>"
             junk
             eof

tagName = oneStringOf ["rect", "polygon", "polyline", "circle", "path", "g", "svg"]
propName = oneStringOf ["points", "x", "y", "r", "d", "cx", "cy", "width", "height", "transform"]

oneStringOf :: [String] -> GenParser Char st String
oneStringOf = choice . map (try . string)
- Added `try` to the definition of `oneStringOf`
1 Answer
Ok, here are some things I noticed:
First of all, while the code seems easy enough to follow as it is, a couple of comments here and there certainly couldn't hurt.
tag = do manyTill anyChar . try $ lookAhead tagStart
         char '<'
         name <- many $ noneOf " "
It seems weird that you first use `tagStart` to find where the next tag begins, but then don't use it to actually match the tag start.
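A minimal sketch of what that could look like, assuming a shortened tag list and a hypothetical `nextTagName` helper (neither is from the original code): the same `tagStart` parser is used both to locate the next tag and to consume its start.

```haskell
import Text.ParserCombinators.Parsec

-- A shortened tag list, for illustration only.
tagName :: GenParser Char st String
tagName = choice $ map (try . string) ["rect", "circle", "path"]

-- Matches '<' followed by a known tag name and returns that name.
tagStart :: GenParser Char st String
tagStart = char '<' >> tagName

-- Hypothetical helper: skip ahead to the next known tag, then consume
-- its start with the same parser that was used to find it.
nextTagName :: GenParser Char st String
nextTagName = do
  _ <- manyTill anyChar . try $ lookAhead tagStart
  tagStart
```

Because `lookAhead` consumes nothing on success, `tagStart` can then run a second time at the same position and return the name directly, so the separate `char '<'` and `many $ noneOf " "` steps are no longer needed.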
junk = optional . many $ oneOf "\n\r\t\\/"
Since `many` can already match the empty string, making it `optional` doesn't change anything.
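A small sketch to check this, assuming the same junk characters as in the question; `junkOptional`, `junkPlain`, and `leftover` are names made up for the comparison. Both variants should leave the same input behind, whether or not any junk is present.

```haskell
import Text.ParserCombinators.Parsec

junkChars :: String
junkChars = "\n\r\t\\/"

-- The original definition: optional wrapped around many.
junkOptional :: GenParser Char st ()
junkOptional = optional . many $ oneOf junkChars

-- many on its own already succeeds when there are zero matches.
junkPlain :: GenParser Char st String
junkPlain = many $ oneOf junkChars

-- Helper for this example only: run a parser and return the input it
-- left unconsumed, or Nothing if the parser failed.
leftover :: GenParser Char () a -> String -> Maybe String
leftover p = either (const Nothing) Just . parse (p >> getInput) ""
```

The only visible difference is the result type: `optional` discards the matched characters and returns `()`, which does not matter here because the result of `junk` is never used.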
tagName = string "rect"
          <|> string "polygon"
          <|> string "polyline"
          <|> string "circle"
          <|> string "path"
          <|> string "g"
          <|> string "svg"
That looks a bit repetitive. I'd define a helper function which matches one of a list of strings:
-- Takes a list of strings and returns a Parser which matches any of those strings
oneStringOf = choice . map (try . string)
tagName = oneStringOf ["rect", "polygon", "polyline"] -- etc
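By adding the `try` we also made it so that it actually works correctly now (thanks to Joey Adams for pointing out that fix). Without it, `string "polygon"` consumes the leading "poly" of the input "polyline" before failing, and `<|>` does not try the next alternative once input has been consumed.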
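To see the difference `try` makes, here is a small comparison, reduced to the two tag names that share a prefix. The names `withoutTry`, `withTry`, and `accepts` are invented for this sketch.

```haskell
import Text.ParserCombinators.Parsec

-- Without try: string "polygon" consumes "poly" before failing on
-- "polyline", so <|> does not attempt the second alternative.
withoutTry :: GenParser Char st String
withoutTry = string "polygon" <|> string "polyline"

-- With try: a failed alternative rewinds to where it started.
oneStringOf :: [String] -> GenParser Char st String
oneStringOf = choice . map (try . string)

withTry :: GenParser Char st String
withTry = oneStringOf ["polygon", "polyline"]

-- Helper for this example only: did the parser accept the input?
accepts :: GenParser Char () a -> String -> Bool
accepts p = either (const False) (const True) . parse p ""
```

With these definitions, `accepts withoutTry "polyline"` is `False` while `accepts withTry "polyline"` is `True`; both accept "polygon".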
As for why your alternative definition for `property` doesn't work: I'm not sure whether it's the only thing that keeps it from working, but one mistake is that you wrote `tagName` when you meant `propName`.
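The effect of that mix-up can be reproduced with a sketch like the following. It is the `property` parser from the question without the trailing `junk` step, and the lookahead target is passed in as a parameter so the two variants can be compared. The `propertyWith` and `run` names and that parameter are additions for this example, not part of the original code.

```haskell
import Text.ParserCombinators.Parsec

oneStringOf :: [String] -> GenParser Char st String
oneStringOf = choice . map (try . string)

tagName, propName :: GenParser Char st String
tagName  = oneStringOf ["rect", "polygon", "polyline", "circle", "path", "g", "svg"]
propName = oneStringOf ["points", "x", "y", "r", "d", "cx", "cy", "width", "height", "transform"]

-- Skip input until the target parser would match, then read name="value".
propertyWith :: GenParser Char st String -> GenParser Char st (String, String)
propertyWith target = do
  _ <- manyTill anyChar . try $ lookAhead target
  name <- many1 $ noneOf "="
  _ <- string "=\""
  val <- manyTill anyChar $ char '"'
  return (name, val)

-- Helper for this example only: Nothing on a parse error.
run :: GenParser Char () a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

On the input ` x="10">`, `run (propertyWith propName)` gives `Just ("x","10")`, whereas `run (propertyWith tagName)` gives `Nothing`: the skip loop keeps looking for a tag name and runs off the end of the input. One caveat with this approach is that the lookahead matches a property name anywhere, so a single-letter name such as `r` would also match in the middle of a longer attribute name.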