
In Haskell, there are a few different options for parsing text. I know of Alex & Happy, Parsec and Attoparsec, and there are probably others.

I'd like to put together a library where the user can input pieces of a URL (scheme, e.g. HTTP, hostname, username, port, path, query, etc.). I'd like to validate the pieces according to the ABNF specified in RFC 3986.

In other words, I'd like to put together a set of functions such as:

validateScheme :: String -> Bool
validateUsername :: String -> Bool
validatePassword :: String -> Bool
validateAuthority :: String -> Bool
validatePath :: String -> Bool
validateQuery :: String -> Bool
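
For illustration, a hand-written version of the first function might look like the sketch below. It assumes the RFC 3986 rule scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ); the helper name isAlphaAscii is made up for this example, and only functions from base are used.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- ALPHA in ABNF: ASCII letters only (hypothetical helper name).
isAlphaAscii :: Char -> Bool
isAlphaAscii c = isAsciiUpper c || isAsciiLower c

-- scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
validateScheme :: String -> Bool
validateScheme []       = False  -- at least one ALPHA is required
validateScheme (c : cs) = isAlphaAscii c && all rest cs
  where
    -- Data.Char.isDigit accepts only the ASCII digits '0'..'9'.
    rest x = isAlphaAscii x || isDigit x || x `elem` "+-."
```

This stays close to the rule for something as small as scheme, but the same style gets long quickly for rules such as authority or path.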

What is the most appropriate tool to use to write these functions?

Alex's regexps are very concise, but Alex is a tokenizer and doesn't straightforwardly let you parse against specific rules. So it's not quite what I'm looking for, though perhaps it can be wrangled into doing this easily.

I've written Parsec code that does some of the above, but it looks very different from the original ABNF and unnecessarily long.

So, there must be an easier and/or more appropriate way. Recommendations?

asked Nov 6, 2013 at 18:56
  • If you want to take a BNF grammar and use that as your code to parse, you want a parser generator that takes the grammar and spits out code. I don't know offhand which parser generators might work directly off a BNF like this, but that's because I don't prefer this type of approach. The resulting code is always larger, and maintenance is funky since it's nothing you can easily dig into directly. The answer below is a nice, clear, concise approach which takes a minimum of code and would be easy to maintain, though I agree it's not the BNF->validation you're looking for. Commented Nov 7, 2013 at 15:38

1 Answer


I'm curious why the Parsec version was so long. I'd write it something like this:

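import Text.Parsec (ParseError, parse)

data URL = URL { scheme, hostname, username, port, path, query :: String }

-- Parse raw text into a URL record; "Foo" is only the source name shown in error messages.
parser :: String -> Either ParseError URL
parser = flip parse "Foo" $ URL <$> parseScheme <*> parseHostname <*> parseUsername ...

-- One predicate per field, each applied to the matching record field.
validators :: [URL -> Bool]
validators = [validateScheme . scheme, validateHostname . hostname, .....]

validate :: URL -> Bool
validate url = all ($ url) validators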

This looks nice and concise to me. As for how to implement the parser, it's straightforward: parse each section, then apply the URL constructor to the pieces.
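
To make the applicative style concrete, here is a minimal sketch using Parsec. It is not a full RFC 3986 parser: the cut-down SimpleURL record, its field names, and the parsers schemeP, hostP and pathP are all hypothetical, and each piece is parsed loosely on the assumption that strict checking happens afterwards in the validators.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical cut-down record with three of the six fields.
data SimpleURL = SimpleURL
  { sScheme :: String
  , sHost   :: String
  , sPath   :: String
  } deriving (Show, Eq)

-- Everything up to "://" is taken as the scheme.
schemeP :: Parser String
schemeP = many1 (noneOf ":/?#") <* string "://"

-- The host runs until the path, query or fragment starts.
hostP :: Parser String
hostP = many1 (noneOf "/?#")

-- The path may be empty; this sketch stops at '?' or '#'.
pathP :: Parser String
pathP = many (noneOf "?#")

-- Apply the constructor to the parsed pieces, in order.
simpleURL :: Parser SimpleURL
simpleURL = SimpleURL <$> schemeP <*> hostP <*> pathP <* eof

parseSimpleURL :: String -> Either ParseError SimpleURL
parseSimpleURL = parse simpleURL "url"
```

Because the sketch ends with eof and does not handle queries or fragments, an input such as "http://example.com?x=1" is rejected rather than partially parsed.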

answered Nov 6, 2013 at 19:40
  • To me it's not whether it's straightforward. To me, it looks much more removed from the original specification. Commented Nov 6, 2013 at 19:54
  • @Ana Define removed? I think it'd be a misfeature to mix your parsing and validation. Parse a well-formed URL, then check whether it fits your specifications. Commented Nov 6, 2013 at 19:55
  • Your user shouldn't have to care about or see exactly what goes on to parse a URL; they just care about making sure it fits XYZ specifications. Dealing with it as a data structure makes this much more feasible. Commented Nov 6, 2013 at 19:56
  • I would make the validators Either SomeError URL -> Either SomeError URL and compose them into the applicative parser, so you have: URL <$> (validateScheme . parseScheme) <*> (validateHostname . parseHostname) <*> (validateUsername . parseUsername). That gets you validation at parse/construction time. The intermediate step, while good design for a larger piece, seems unnecessary given what the OP is asking for. If this is part of a greater whole, though, your approach is better, because the separation of concerns makes for more reusable components. Commented Nov 7, 2013 at 16:12
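
The validate-at-construction idea from the last comment can be sketched with the Either applicative from base alone. This is a simplification, not the commenter's exact types: it takes pieces that are already split (as in the question) instead of running a parser, and the names Origin, check, validScheme, validPort and mkOrigin are all hypothetical. The port rule assumed here is RFC 3986's port = *DIGIT.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- Hypothetical two-field record, to keep the example short.
data Origin = Origin { oScheme :: String, oPort :: String }
  deriving (Show, Eq)

-- Lift a Bool predicate into an Either-returning check with an error message.
check :: String -> (String -> Bool) -> String -> Either String String
check label ok s
  | ok s      = Right s
  | otherwise = Left ("invalid " ++ label ++ ": " ++ show s)

-- scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
validScheme :: String -> Bool
validScheme []       = False
validScheme (c : cs) = alpha c && all (\x -> alpha x || isDigit x || x `elem` "+-.") cs
  where alpha x = isAsciiUpper x || isAsciiLower x

-- port = *DIGIT (the empty string is allowed by the grammar)
validPort :: String -> Bool
validPort = all isDigit

-- The record is only built if every piece passes; the first failure is returned.
mkOrigin :: String -> String -> Either String Origin
mkOrigin s p = Origin <$> check "scheme" validScheme s <*> check "port" validPort p
```

With this shape an invalid Origin value can never be constructed through mkOrigin, at the cost of no longer having a standalone "parsed but unvalidated" value to inspect.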
