In Haskell, there are a few different options for parsing text. I know of Alex & Happy, Parsec, and Attoparsec; there are probably others.
I'd like to put together a library where the user can input pieces of a URL (scheme, e.g. HTTP; hostname; username; port; path; query; etc.) and have each piece validated according to the ABNF specified in RFC 3986.
In other words, I'd like to put together a set of functions such as:
validateScheme :: String -> Bool
validateUsername :: String -> Bool
validatePassword :: String -> Bool
validateAuthority :: String -> Bool
validatePath :: String -> Bool
validateQuery :: String -> Bool
What is the most appropriate tool to use to write these functions?
Alex's regexps are very concise, but Alex is a tokenizer and doesn't straightforwardly let you parse using specific rules. It's not quite what I'm looking for, though perhaps it can be wrangled into doing this easily.
I've written Parsec code that does some of the above, but it looks very different from the original ABNF and is unnecessarily long.
So, there must be an easier and/or more appropriate way. Recommendations?
-
If you want to take a BNF grammar and use it directly as your parsing code, you want a parser generator that takes the grammar and emits code. I don't know offhand which parser generators work directly from a BNF like this, because I don't favor that approach: the generated code is always larger, and maintenance is awkward because it isn't something you can easily dig into. The answer below is a clear, concise approach that needs little code and would be easy to maintain, though I agree it's not the BNF-to-validation mapping you're looking for. – Jimmy Hoffa, Nov 7, 2013 at 15:38
1 Answer
I'm curious why the Parsec version was so long. I'd write it something like this:
data URL = URL { scheme, hostname, username, port, path, query :: String}
parser :: String -> Either ParseError URL
parser = flip parse "Foo" $ URL <$> parseScheme <*> parseHostname <*> parseUsername ...
validators = [validateScheme . scheme, validateHostname . hostname, .....]
validate :: URL -> Bool
validate url = all ($ url) validators
This looks nice and concise to me. As for how to implement the parser, it's straightforward to parse each section and then apply URL to each piece.
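As a rough sketch of how the piece parsers could stay close to the RFC 3986 ABNF, here is one way to write a few of them in Parsec. The names `alpha`, `digit`, `unreserved`, `schemeP`, `hostP`, `portP`, `matches`, `MiniURL` and `parseMiniURL` are made up for illustration. The host rule is deliberately simplified to unreserved characters only, and the three-field record with a mandatory `:port` is an assumption to keep the example short, not the full URL grammar.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import Data.Either (isRight)
import Text.Parsec (ParseError, char, eof, many, oneOf, parse, satisfy, string, (<|>))
import Text.Parsec.String (Parser)

-- ALPHA and DIGIT (ABNF core rules), restricted to ASCII.
alpha, digit :: Parser Char
alpha = satisfy (\c -> isAsciiLower c || isAsciiUpper c)
digit = satisfy isDigit

-- scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
schemeP :: Parser String
schemeP = (:) <$> alpha <*> many (alpha <|> digit <|> oneOf "+-.")

-- unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
unreserved :: Parser Char
unreserved = alpha <|> digit <|> oneOf "-._~"

-- Simplified host (assumption): the RFC's reg-name also allows
-- pct-encoded and sub-delims, which are left out here.
hostP :: Parser String
hostP = many unreserved

-- port = *DIGIT
portP :: Parser String
portP = many digit

-- True only if the parser accepts the entire input.
matches :: Parser a -> String -> Bool
matches p = isRight . parse (p <* eof) "piece"

validateScheme, validatePort :: String -> Bool
validateScheme = matches schemeP
validatePort = matches portP

-- Hypothetical cut-down record, built by applying the constructor
-- to each parsed piece in order.
data MiniURL = MiniURL { miniScheme, miniHost, miniPort :: String }
  deriving (Eq, Show)

-- Accepts only the shape scheme "://" host ":" port (an assumption).
parseMiniURL :: String -> Either ParseError MiniURL
parseMiniURL =
  parse (MiniURL <$> schemeP <* string "://" <*> hostP <* char ':' <*> portP <* eof) "url"
```

Each ABNF rule becomes a one-line parser with the rule itself as the comment above it, and the same parsers back both the `String -> Bool` validators and the record construction.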
-
For me the issue isn't whether it's straightforward; it looks much further removed from the original specification. – Ana, Nov 6, 2013 at 19:54
-
@Ana Define removed? I think it'd be a misfeature to mix your parsing and validation. Parse a well-formed URL, then check whether it fits your specification. – daniel gratzer, Nov 6, 2013 at 19:55
-
Your user shouldn't have to care about or see exactly what goes on when parsing a URL; they just care about making sure it fits XYZ specifications. Dealing with it as a data structure makes this much more feasible. – daniel gratzer, Nov 6, 2013 at 19:56
-
I would make the validators
Either SomeError URL -> Either SomeError URL
and compose them into the applicative parser, so you have:
URL <$> (validateScheme . parseScheme) <*> (validateHostname . parseHostname) <*> (validateUsername . parseUsername)
That gives you validation at parse/construction time. The intermediate step, while good design for a larger piece, seems unnecessary given what the OP is asking for. If this is part of a greater whole, though, your approach is better, because the separation of concerns makes for more reusable components. – Jimmy Hoffa, Nov 7, 2013 at 16:12
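One possible reading of the validate-at-construction idea in the comment above, sketched with only `base` and the `Either` applicative. Everything here is hypothetical: the two-field `URL` record, the `SomeError` constructors, and the names `checkScheme`, `checkPort` and `mkURL`. Per-field checks of type `String -> Either SomeError String` are used instead of the `Either SomeError URL -> Either SomeError URL` shape from the comment, since that makes the applicative composition typecheck directly.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- Hypothetical error type: one constructor per piece that can fail.
data SomeError = BadScheme String | BadPort String
  deriving (Eq, Show)

-- Hypothetical cut-down record with only two of the pieces.
data URL = URL { scheme :: String, port :: String }
  deriving (Eq, Show)

isAsciiAlpha :: Char -> Bool
isAsciiAlpha c = isAsciiLower c || isAsciiUpper c

-- scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
checkScheme :: String -> Either SomeError String
checkScheme s@(c : cs)
  | isAsciiAlpha c && all schemeChar cs = Right s
  where
    schemeChar x = isAsciiAlpha x || isDigit x || x `elem` "+-."
checkScheme s = Left (BadScheme s)

-- port = *DIGIT
checkPort :: String -> Either SomeError String
checkPort p
  | all isDigit p = Right p
  | otherwise = Left (BadPort p)

-- Construction fails with the first error encountered, so an invalid
-- URL value is never built.
mkURL :: String -> String -> Either SomeError URL
mkURL s p = URL <$> checkScheme s <*> checkPort p
```

For example, `mkURL "http" "80"` yields `Right` with the record, while `mkURL "1http" "80"` yields `Left (BadScheme "1http")`. Note that `Either` stops at the first failure; collecting every error would need a different applicative.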