\$\begingroup\$

I figured it's about time to jump into some Haskell, so here's a first attempt at an oddly specific SVG parser:

import System.Environment
import Text.ParserCombinators.Parsec

parseFile fname = parseFromFile (manyTill (try tag) (try readEnd)) fname

tag = do manyTill anyChar . try $ lookAhead tagStart
         char '<'
         name <- many $ noneOf " "
         props <- tagContents
         char '>'
         junk
         return (name, props)

tagContents = do props <- manyTill property . try . lookAhead $ char '>'
                 junk
                 return props

property = do spaces
              name <- many1 $ noneOf "="
              string "=\""
              val <- manyTill anyChar $ char '"'
              junk
              return (name, val)

junk = optional . many $ oneOf "\n\r\t\\/"

readEnd = do optional $ string "</svg>"
             junk
             eof

tagStart = do char '<'
              tagName

tagName = string "rect"
          <|> string "polygon"
          <|> string "polyline"
          <|> string "circle"
          <|> string "path"
          <|> string "g"
          <|> string "svg"

Before anyone asks, one of the objectives for this program (other than learning) is that it should accept invalid SVGs (for example, a partial SVG or one that has been saved as an rtf), which is why I'm making liberal use of try, and cherry-picking tags.

As implied by the title, this is my first non-tutorial attempt at Haskell, so give me all the style pointers you can muster (and highlight anything that signals a broken understanding on my part).

I'd also appreciate pointers on how to cherry-pick properties (and not just tags). I've tried writing property as

property = do manyTill anyChar . try $ lookAhead tagName
              name <- many1 $ noneOf "="
              string "=\""
              val <- manyTill anyChar $ char '"'
              junk
              return (name, val)

and defining propName as

propName = string "points"
           <|> string "x"
           <|> string "y"
           <|> string "r"
           <|> string "d"
           <|> string "cx"
           <|> string "cy"
           <|> string "width"
           <|> string "height"
           <|> string "transform"

This seems like it should work, since it's basically how I got tag to jump to the next desired tag, but it gives me "unexpected input" errors.


Edit The Second:

import System.Environment
import Text.ParserCombinators.Parsec

type TagName = String
type Property = (PropName, Value)
type PropName = String
type Value = String

parseFile fname = parseFromFile (manyTill (try tag) (try readEnd)) fname

tag :: GenParser Char st (TagName, [Property])
tag = do manyTill anyChar . try $ lookAhead tagStart
         name <- tagStart
         props <- tagContents
         char '>'
         junk
         return (name, props)

tagContents :: GenParser Char st [Property]
tagContents = do props <- manyTill property . try . lookAhead $ char '>'
                 junk
                 return props

property :: GenParser Char st (PropName, Value)
property = do manyTill anyChar . try $ lookAhead propName
              name <- many1 $ noneOf "="
              string "=\""
              val <- manyTill anyChar $ char '"'
              junk
              return (name, val)

junk = many $ oneOf "\n\r\t\\/{} "

tagStart :: GenParser Char st TagName
tagStart = do char '<'
              name <- tagName
              return name

readEnd = do optional $ string "</svg>"
             junk
             eof

tagName = oneStringOf ["rect", "polygon", "polyline", "circle", "path", "g", "svg"]
propName = oneStringOf ["points", "x", "y", "r", "d", "cx", "cy", "width", "height", "transform"]

oneStringOf :: [String] -> GenParser Char st String
oneStringOf = choice . map (try . string)

  • Added try to the definition of oneStringOf
Jamal
asked Mar 19, 2011 at 5:53
\$\endgroup\$

1 Answer

\$\begingroup\$

Ok, here are some things I noticed:

First of all, while the code seems easy enough to follow as it is, a couple of comments here and there certainly couldn't hurt.


tag = do manyTill anyChar . try $ lookAhead tagStart
         char '<'
         name <- many $ noneOf " "

It seems odd that you use tagStart to find where the next tag begins, but then don't use it to actually consume the tag start.
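To illustrate, here is a minimal sketch of using the same parser both for the lookahead and for the actual match. The name skipToTag is made up for this example, and tagName is written with try around each alternative so that a failed alternative gives its input back:

```haskell
import Text.ParserCombinators.Parsec

-- The tag names from the question; `try` lets a failed alternative back off.
tagName :: Parser String
tagName = choice (map (try . string)
                      ["rect", "polygon", "polyline", "circle", "path", "g", "svg"])

-- '<' followed by one of the wanted tag names; returns the name.
tagStart :: Parser String
tagStart = char '<' >> tagName

-- Hypothetical helper: skip anything up to the next wanted tag, then
-- consume the tag start with the very parser used to find it.
skipToTag :: Parser String
skipToTag = do
    _ <- manyTill anyChar (try (lookAhead tagStart))
    tagStart
```

In GHCi, parse skipToTag "" "junk<foo><rect x=\"1\">" should skip "junk<foo>" and give Right "rect", so the tag name comes straight out of tagStart instead of being re-read with many $ noneOf " ".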


junk = optional . many $ oneOf "\n\r\t\\/"

Since many already succeeds on empty input (returning an empty list), wrapping it in optional doesn't change what gets parsed; it only discards the result.
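A small sketch of that point, using the character set from the question's first junk. The names noJunk, someJunk and junk' are only for illustration:

```haskell
import Text.ParserCombinators.Parsec

-- Same character set as the question's first `junk`, without `optional`.
junk :: Parser String
junk = many (oneOf "\n\r\t\\/")

-- If the matched characters are never used, `skipMany` states that directly.
junk' :: Parser ()
junk' = skipMany (oneOf "\n\r\t\\/")

-- No junk at the front: `many` still succeeds and returns the empty string.
noJunk :: Either ParseError String
noJunk = parse junk "" "rect"          -- Right ""

-- Leading junk is consumed up to the first non-junk character.
someJunk :: Either ParseError String
someJunk = parse junk "" "\n\t/rect"   -- Right "\n\t/"
```

Both calls succeed, so there is no failure case left for optional to cover.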


tagName = string "rect"
          <|> string "polygon"
          <|> string "polyline"
          <|> string "circle"
          <|> string "path"
          <|> string "g"
          <|> string "svg"

That looks a bit repetitive. I'd define a helper function which matches one of a list of strings:

-- Takes a list of strings and returns a Parser which matches any of those strings
oneStringOf = choice . map (try . string)
tagName = oneStringOf ["rect", "polygon", "polyline"] -- etc

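Adding the try also makes the parser actually work: without it, string "polygon" consumes "poly" before failing on input like "polyline", and <|> won't try the next alternative once input has been consumed (thanks to Joey Adams for pointing out that fix).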
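The difference can be seen on the two tag names that share a prefix. This is a sketch with illustrative names (noBacktrack, withBacktrack and the two result values are not from the question):

```haskell
import Text.ParserCombinators.Parsec

-- Without `try`: `string "polygon"` consumes "poly" before it fails, and
-- (<|>) only tries its right-hand side if the left failed without consuming.
noBacktrack :: Parser String
noBacktrack = string "polygon" <|> string "polyline"

-- With `try` around each alternative, a failed match rewinds the input.
oneStringOf :: [String] -> Parser String
oneStringOf = choice . map (try . string)

withBacktrack :: Parser String
withBacktrack = oneStringOf ["polygon", "polyline"]

failsOnPolyline :: Either ParseError String
failsOnPolyline = parse noBacktrack "" "polyline"     -- Left (a parse error)

worksOnPolyline :: Either ParseError String
worksOnPolyline = parse withBacktrack "" "polyline"   -- Right "polyline"
```

One thing to keep in mind with this helper: alternatives are tried in list order, so if one string were a prefix of another, the longer one would have to come first.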


As for why your alternative definition for property doesn't work: I'm not sure whether it's the only thing that keeps it from working, but one mistake is that you wrote tagName when you meant propName.
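If skipping ahead with lookAhead keeps causing trouble, one alternative (a different technique from the one in the question) is to parse every property and filter by name afterwards. A minimal sketch, assuming attribute values are always double-quoted; anyProperty, keptProperties and wantedProps are names made up for this example:

```haskell
import Text.ParserCombinators.Parsec

type Property = (String, String)

-- Whitelist copied from the question's propName list.
wantedProps :: [String]
wantedProps = ["points", "x", "y", "r", "d", "cx", "cy", "width", "height", "transform"]

-- Parse one name="value" pair, whatever the name is.
anyProperty :: Parser Property
anyProperty = do
    spaces
    name <- many1 (noneOf "= \t\r\n/>")
    _ <- string "=\""
    val <- manyTill anyChar (char '"')
    spaces
    return (name, val)

-- Parse properties until something that is not a property shows up
-- (for example '>' or "/>"), then keep only the wanted names.
keptProperties :: Parser [Property]
keptProperties = do
    props <- many (try anyProperty)
    return (filter ((`elem` wantedProps) . fst) props)
```

For example, parse keptProperties "" " x=\"1\" fill=\"red\" width=\"10\"/>" should give Right [("x","1"),("width","10")]. Because names are compared whole, "x" can no longer match in the middle of some other attribute name.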

answered Mar 19, 2011 at 7:59
\$\endgroup\$