In a parser written with Parsec 3.1.11, I have a performance issue with parsing (or better, skipping) the unreachable branch of a if-then-else statement.
The syntax to be parsed is as follows:
_if (condition) {
# block of statements
} _else {
# block of statements
}
The _else
branch is optional and only statically resolvable conditions are allowed.
The parser for the if-then-else
block:
type MyParser a = ParsecT T.Text UserState (StateT UserState Identity) a
ifelseStmt :: MyParser -> [Statement]
= do reserved "_if"
e <- parens expression
if evalBoolExpr e
then do b <- block
option [] $
do reserved "_else"
unparsedBlock
return b
else do unparsedBlock
option [] $
do reserved "_else"
block
Now, parser unparsedBlock
is meant to throw away the entire {...}
block of the unreachable branch. Obviously this can in turn contain {..}
blocks, which means curly brackets have to be matched.
This is the parser I wrote for it:
unparsedBlock :: MyParser -> [Statement]
= do braces (many (do { noneOf "{}"; return []} <|> unparsedBlock))
return []
It works well, but the profiler shows that way too much time is spent on it, while I'd expect it to be rather efficient.
Can anyone see how I could implement unparsedBlock
more efficiently?
1 Answer 1
First of all, you can batch the skipping of braces (and return ()
):
unparsedBlock :: Parser ()
unparsedBlock = braces (skipMany (skipMany (noneOf "{}") <|> unparsedBlock))
Unfortunately, this is also probably quite inefficient, as this still works on a character-by-character basis, and uses stuff like manyAccum under the hood.
If you really need more speed, then you will need to write your own Parser
function that does the skipping on the source, while at the same time updating the position in the stream.
That is (one of) the reason(s) attoparsec
is much faster : it doesn't keep track of the position in the input stream, and has combinators that work directly on the source (such as skipWhile).
-
\$\begingroup\$ The character by character parsing seems the real issue indeed. Somewhat related to this discussion btw: github.com/mrkkrp/megaparsec/issues/106 \$\endgroup\$swingbit– swingbit2016年10月01日 09:05:58 +00:00Commented Oct 1, 2016 at 9:05