I am looking for the source code of parsers and/or parser generators that I could study to further develop the skills I acquired in a school course. Can you recommend any parsers, of any type?
- Parsers or parser generators? – Dervall, Feb 25, 2012 at 21:34
- I also struggled with that, and I've worked on Masala Parser (github.com/masala-oss/masala-parser) with the aim of making a parser generator that is easier to use than bison/yacc. – Nicolas Zozol, Jun 5, 2019 at 9:09
5 Answers
You should know how to build recursive descent parsers by hand. Here's an SO link to a quick lesson on how to do this: https://stackoverflow.com/a/2336769/120163
If you want to understand how recursive descent parsers can be constructed automatically, you can read a paper (and see a tutorial) on MetaII at this link: https://stackoverflow.com/a/1142034/120163
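To give a feel for what "by hand" means here, this is a minimal sketch of a recursive descent parser in Python. The arithmetic grammar, the `tokenize` helper, and the `Parser` and `evaluate` names are my own assumptions for illustration; they are not taken from the linked answers. Each grammar rule becomes one method, and the methods call each other the same way the rules refer to each other.

```python
import re

# Grammar (an assumption chosen for illustration):
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr ::= term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term ::= factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor ::= NUMBER | "(" expr ")"
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Calling `evaluate("(1 + 2) * 3")` walks `expr`, `term`, and `factor` in turn. This sketch evaluates as it parses; a real parser would more likely build a syntax tree in the same places.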
- Bison is a classical example (C/C++).
- Pyparsing is a great module, and it is very easy to use (Python).
- Lemon is very easy to use (it generates C, which can also be used from C++).
Check the examples, and good luck.
Edit:
I guess I should elaborate. A parser is a program that processes an input and "understands" its structure. A parser generator is a tool that produces a parser from a description of a grammar. I guess you mean that you want to learn more about generating parsers, in which case you should refer to the documentation of the parser generators (all of the above).
Parsers themselves are usually not that interesting; parser generators are the more rewarding subject of study.
- ANTLR generates LL parsers, which are easy to read once generated. (Java)
- Bison generates table-driven LALR(1) parsers, which are very hard to read. (C)
If LALR(1) interests you, I have a library up on GitHub that tries to put a new spin on LALR parsing. Feel free to take a look. It's in C#, and I've done my best to make the code comprehensible. It has been a learning project for me, but it's smaller than the big tools and a bit easier to get into. And definitely feel free to contribute; there are still lots of features to add.
Otherwise, take a look at the code these tools generate to see how they build the actual parsers that do the work.
I would suggest this book: http://www.cs.nott.ac.uk/~gmh/book.html. It's a good way to get started with Haskell, and it has an entire chapter on parsers.
If you can understand that, creating a parser in Haskell is pretty straightforward. Also consider that Haskell is quite fast and well suited to multi-core programming, so it may be the future.
In addition, here is a parser generator for Haskell: Happy - http://www.haskell.org/happy/.
I just went through the same battles and finally feel like I have a good handle on your options.
Lots of parsers are built for context-free grammars. You can read the formal definition of context-free, but my intuition is that it basically means that the way a rule expands cannot change based on the surrounding context. Note that this is separate from look-ahead: how many tokens a parser peeks at is a property of the parsing algorithm (for example, the 1 in LALR(1)), not of the grammar being context-free.
For example, Markdown is not context-free, and I think pretty much any indentation-based language is not context-free unless you do some preprocessing to wrap blocks in start and end tokens. C is close to context-free and is usually parsed with context-free tools, although typedef names mean the lexer needs some feedback from the parser.
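The preprocessing step for indentation-based languages can be sketched roughly like this. It is a minimal illustration under my own assumptions: the `add_block_tokens` name and the `INDENT`, `DEDENT`, and `LINE` token names are hypothetical, only spaces count as indentation, and blank lines are ignored. Once blocks are wrapped in explicit tokens, a context-free grammar can treat them like braces.

```python
def add_block_tokens(source):
    """Turn leading-space indentation into explicit INDENT/DEDENT tokens."""
    tokens = []
    levels = [0]  # stack of currently open indentation widths
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > levels[-1]:
            # Deeper than the current block: open a new one.
            levels.append(width)
            tokens.append("INDENT")
        while width < levels[-1]:
            # Shallower: close blocks until we are back at a known level.
            levels.pop()
            tokens.append("DEDENT")
        if width != levels[-1]:
            raise IndentationError(f"inconsistent dedent: {line!r}")
        tokens.append(("LINE", line.strip()))
    # Close any blocks still open at end of input.
    tokens.extend("DEDENT" for _ in levels[1:])
    return tokens
```

The stack of widths is the piece of "context" that a context-free grammar cannot track on its own, which is why it lives in this separate pass rather than in the grammar.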
If you're dealing with a context-free grammar, BNF is a formal way of specifying the grammar. This was an immensely helpful article explaining how BNF grammars work, how they're performant, and common extensions to BNF grammars.
Some of your options in this category are ANTLR, Bison, Yacc, Jison, and Peg.js.
However, after battling against ANTLR for a while, I found what I think is the best solution: "parser combinators". It's basically regex on steroids, and it's very popular in the functional programming world.
I don't have any good learning resources for you yet, but Google around and you'll find them for pretty much any language. I come from the JavaScript world and peeking through the source code of this very small library really helped me understand what they're all about.
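To make the parser-combinator idea a bit more concrete, here is a minimal sketch in Python rather than JavaScript. It is not the API of any particular library; the names `regex`, `seq`, `alt`, `many`, and `fmap` are my own choices for illustration. The assumption is that a parser is just a function taking `(text, pos)` and returning `(value, new_pos)` on success or `None` on failure, and combinators build bigger parsers out of smaller ones.

```python
import re

def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        match = compiled.match(text, pos)
        return (match.group(0), match.end()) if match else None
    return parse

def seq(*parsers):
    """Run parsers one after another; succeed only if all succeed."""
    def parse(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Try each parser in order and return the first success."""
    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(parser):
    """Apply a parser zero or more times, collecting the results."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            # Stop on failure, or on an empty match that would loop forever.
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def fmap(parser, func):
    """Transform the value produced by a successful parse."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return func(value), new_pos
    return parse

# Example: a comma-separated list of integers such as "1, 22,3".
number = fmap(regex(r"\d+"), int)
comma = regex(r"\s*,\s*")
number_list = fmap(
    seq(number, many(fmap(seq(comma, number), lambda pair: pair[1]))),
    lambda parts: [parts[0]] + parts[1],
)
```

With these definitions, `number_list("1, 22,3", 0)` returns the parsed list together with the position where parsing stopped. The grammar is ordinary code, so you can name, reuse, and test each piece on its own, which is the main appeal over a separate grammar file.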