Inefficient Parsec parser to skip unreachable block

Question 1

In a parser written with Parsec 3.1.11, I have a performance issue with parsing (or rather, skipping) the unreachable branch of an if-then-else statement.

The syntax to be parsed is as follows:

_if (condition) {
 # block of statements
} _else {
 # block of statements
}

The _else branch is optional and only statically resolvable conditions are allowed.

The parser for the if-then-else block:

type MyParser a = ParsecT T.Text UserState (StateT UserState Identity) a

ifelseStmt :: MyParser [Statement]
ifelseStmt = do reserved "_if"
                e <- parens expression
                if evalBoolExpr e
                  -- condition holds: parse the "then" block, skip any "_else" block
                  then do b <- block
                          option [] $
                            do reserved "_else"
                               unparsedBlock
                          return b
                  -- condition fails: skip the "then" block, parse any "_else" block
                  else do unparsedBlock
                          option [] $
                            do reserved "_else"
                               block

The unparsedBlock parser is meant to throw away the entire {...} block of the unreachable branch. That block can in turn contain nested {...} blocks, which means curly brackets have to be matched.

This is the parser I wrote for it:

unparsedBlock :: MyParser [Statement]
unparsedBlock = do braces (many (do { noneOf "{}"; return []} <|> unparsedBlock))
                   return []

It works well, but the profiler shows that way too much time is spent on it, while I'd expect it to be rather efficient.

Can anyone see how I could implement unparsedBlock more efficiently?

Answer 1

First of all, you can batch the skipping of non-brace characters (and return () instead of a list):

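-- skipMany1 rather than skipMany for the inner parser: Parsec's skipMany
-- raises a runtime error when its argument succeeds without consuming input.
unparsedBlock :: Parser ()
unparsedBlock = braces (skipMany (skipMany1 (noneOf "{}") <|> unparsedBlock))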

Unfortunately, this is probably still quite inefficient: it works on a character-by-character basis, and skipMany is built on manyAccum under the hood.

If you really need more speed, then you will need to write your own Parser function that does the skipping on the source, while at the same time updating the position in the stream.
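A minimal sketch of such a hand-written skipper, assuming the input is a strict Data.Text and that braces inside string literals or comments do not need special treatment (the original parser does not treat them specially either). The helper name closingOffset is made up for this illustration. It matches a literal '{' instead of using the token parser's braces, so trailing whitespace after the closing brace is not skipped here.

```haskell
import qualified Data.Text as T
import Text.Parsec
import Text.Parsec.Pos (updatePosChar)

-- | Skip a balanced {...} block by scanning the remaining input directly,
-- then fix up Parsec's input and source position by hand.
-- Polymorphic in the user state and the underlying monad.
unparsedBlock :: Monad m => ParsecT T.Text u m ()
unparsedBlock = do
  _   <- char '{'                      -- let Parsec consume the opening brace
  inp <- getInput                      -- everything after the opening brace
  case closingOffset inp of
    Nothing -> parserFail "unbalanced braces in skipped block"
    Just n  -> do
      let (skipped, rest) = T.splitAt n inp
      pos <- getPosition
      -- advance line/column over the skipped text in one strict fold
      setPosition (T.foldl' updatePosChar pos skipped)
      setInput rest

-- | Hypothetical helper: number of characters up to and including the
-- brace that closes a block whose opening brace was already consumed.
closingOffset :: T.Text -> Maybe Int
closingOffset = go 1 0
  where
    go :: Int -> Int -> T.Text -> Maybe Int
    go depth off t =
      let (plain, rest) = T.break isBrace t   -- bulk scan over non-brace text
          off'          = off + T.length plain
      in case T.uncons rest of
           Nothing           -> Nothing       -- input ended before the block closed
           Just ('{', rest') -> go (depth + 1) (off' + 1) rest'
           Just (_, rest')                    -- the only other option is '}'
             | depth == 1 -> Just (off' + 1)
             | otherwise  -> go (depth - 1) (off' + 1) rest'
    isBrace c = c == '{' || c == '}'
```

Because the result type is (), a caller that expects a list of statements would use something like unparsedBlock >> return []. Whether this is actually faster on a given input should be checked with the profiler.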

One of the reasons attoparsec is much faster is that it doesn't track source positions (line and column) in the input stream, and it has combinators that work directly on the source (such as skipWhile).
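For comparison, a rough sketch of the same block skipper written against attoparsec's Data.Attoparsec.Text, using skipWhile to consume runs of non-brace characters in bulk. It is an illustration only, not a drop-in replacement for the Parsec parser in the question: it reports no source positions and knows nothing about UserState. The helper skipThenRest is a hypothetical name used only to show a call.

```haskell
import Data.Attoparsec.Text (Parser, char, parseOnly, skipMany, skipWhile, takeText)
import qualified Data.Text as T

-- | Skip a balanced {...} block, discarding its contents.
unparsedBlock :: Parser ()
unparsedBlock = do
  _ <- char '{'
  skipWhile notBrace                               -- bulk skip of plain text
  skipMany (unparsedBlock *> skipWhile notBrace)   -- nested blocks, each followed by plain text
  _ <- char '}'
  return ()
  where
    notBrace c = c /= '{' && c /= '}'

-- | Hypothetical helper: skip one block and return whatever follows it.
skipThenRest :: T.Text -> Either String T.Text
skipThenRest = parseOnly (unparsedBlock *> takeText)
```

The argument to skipMany always consumes at least an opening brace when it succeeds, so the loop cannot spin on empty matches.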

Comment

The character-by-character parsing does seem to be the real issue. Somewhat related to this discussion, by the way: github.com/mrkkrp/megaparsec/issues/106

bartavelle · Answer 1 · 2016-09-19 13:20:48Z
