1
\$\begingroup\$

In a parser written with Parsec 3.1.11, I have a performance issue with parsing (or better, skipping) the unreachable branch of a if-then-else statement.

The syntax to be parsed is as follows:

_if (condition) {
 # block of statements
} _else {
 # block of statements
}

The _else branch is optional and only statically resolvable conditions are allowed.

The parser for the if-then-else block:

type MyParser a = ParsecT T.Text UserState (StateT UserState Identity) a
ifelseStmt :: MyParser -> [Statement]
 = do reserved "_if"
 e <- parens expression
 if evalBoolExpr e
 then do b <- block
 option [] $
 do reserved "_else"
 unparsedBlock
 return b
 else do unparsedBlock
 option [] $
 do reserved "_else"
 block

Now, parser unparsedBlock is meant to throw away the entire {...} block of the unreachable branch. Obviously this can in turn contain {..} blocks, which means curly brackets have to be matched.

This is the parser I wrote for it:

unparsedBlock :: MyParser -> [Statement]
 = do braces (many (do { noneOf "{}"; return []} <|> unparsedBlock))
 return []

It works well, but the profiler shows that way too much time is spent on it, while I'd expect it to be rather efficient.

Can anyone see how I could implement unparsedBlock more efficiently?

asked Sep 7, 2016 at 13:42
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

First of all, you can batch the skipping of braces (and return ()):

unparsedBlock :: Parser ()
unparsedBlock = braces (skipMany (skipMany (noneOf "{}") <|> unparsedBlock))

Unfortunately, this is also probably quite inefficient, as this still works on a character-by-character basis, and uses stuff like manyAccum under the hood.

If you really need more speed, then you will need to write your own Parser function that does the skipping on the source, while at the same time updating the position in the stream.

That is (one of) the reason(s) attoparsec is much faster : it doesn't keep track of the position in the input stream, and has combinators that work directly on the source (such as skipWhile).

answered Sep 19, 2016 at 13:20
\$\endgroup\$
1
  • \$\begingroup\$ The character by character parsing seems the real issue indeed. Somewhat related to this discussion btw: github.com/mrkkrp/megaparsec/issues/106 \$\endgroup\$ Commented Oct 1, 2016 at 9:05

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.