My idea is to write a superset of C# (though the question is not language-specific) that source-to-source compiles (transcompiles) to C# itself (fall-through switch clauses, default method parameters, etc.; nothing that is impossible in C#).
My first idea was to parse it, build syntax trees, abstract syntax trees, etc., but that seems like a bit of overkill to me, mostly because large portions of the code will remain the same.
My question: Is there a simpler way to do this?
One of my ideas was to search for tokens that need modifying (e.g. switch in the case of fall-through) and then rewrite the code (add goto case NEXT_CASE where needed), but is there a better and cleaner way to do this?
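To make the intended rewrite concrete, here is what the generated C# would look like for a fall-through case (the labels and statements are just illustrative; in the superset source the programmer would simply omit the goto):

```csharp
using System;

// Generated C# for a superset switch whose "case 1" falls through to "case 2".
// In the superset source the "goto case 2;" line would not be written by the
// programmer; the transcompiler would insert it.
class FallThroughExample
{
    static void Main()
    {
        int n = 1;
        switch (n)
        {
            case 1:
                Console.WriteLine(1);
                goto case 2;   // inserted by the transcompiler
            case 2:
                Console.WriteLine(2);
                break;
        }
    }
}
```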
-
Look at roslyn.codeplex.com; a lot of it may already have been done for you. – Daniel Little, Aug 11, 2014 at 3:43
2 Answers
If you want this to be maintainable, then not really. I've seen a compiler that was literally an overgrown sed script. It worked of course, but then we decided we wanted to add something to the language...
However, if you take the more or less standard route of
1. Lex
2. Parse
3. Compile superset to vanilla C# AST
4. Pretty print AST
you can almost certainly use an existing library for step 4, and if you decide to grow your compiler you'll have a far easier time. If you want to do anything vaguely serious with this compiler, then the initial overhead is well worth it.
It might be worth your time to look into some nicer tools for parsing/lexing. I don't think it'd be impossible to find/modify an existing C# grammar to deal with steps 1 and 2.
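For step 4, the Roslyn library mentioned in the comment above can already do the pretty printing for you. A minimal sketch, assuming the Microsoft.CodeAnalysis.CSharp package (here the tree is obtained by parsing plain C# just to have something to print; your transcompiler would instead build or rewrite the tree from the superset parser's output):

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class PrettyPrintStep
{
    static void Main()
    {
        // Stand-in for "compile superset to vanilla C# AST": here we just
        // parse existing C# so there is a tree to print.
        SyntaxTree tree = CSharpSyntaxTree.ParseText(
            "class C{static int M(int x){switch(x){case 1:goto case 2;case 2:return 2;default:return 0;}}}");

        // Step 4: pretty-print the vanilla C# AST.
        Console.WriteLine(tree.GetRoot().NormalizeWhitespace().ToFullString());
    }
}
```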
-
Thanks for the reply and advice. Mono has an open source C# compiler, so I might be able to change the lexer and parser. – SpelingMistake, Aug 10, 2014 at 12:24
-
@StarGateTABC If you're really lucky they'll have a grammar for something like yacc or bison and then you can just modify that :) – daniel gratzer, Aug 10, 2014 at 12:29
-
Yes, they do have it for JAY, which is also some grammar tool thing. Actually it's their own port of jay to C#. – SpelingMistake, Aug 10, 2014 at 12:37
-
This is exactly how C++ got started. Stroustrup's first C++ "compiler" (called CFront) did a full syntactic analysis of the input C++ program and produced an equivalent output C program. – Ross Patterson, Aug 10, 2014 at 13:31
One simpler way, which actually matches your idea of searching for tokens etc., is called a preprocessor.
It is a kind of "transcompiler" which is quite easy to write, as it does not understand the language semantics, and sometimes it does not even understand the language syntax beyond a few basics (like tokenizing the input source). It operates purely on the text level. In order to "compile" into valid target code, the programmers usually have to obey some strict syntax rules.
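A minimal sketch of such a text-level pass, assuming (purely for illustration) that the superset marks intended fall-through with a hypothetical fallthrough; statement. Note that it never parses C#; it only scans the text and relies on the programmer following the convention:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Illustrative text-level preprocessor: replaces a hypothetical "fallthrough;"
// marker with "goto case <next label>;" by scanning ahead for the next case
// label. No syntax tree, no semantics, just text.
class FallthroughPreprocessor
{
    static readonly Regex CaseLabel = new Regex(@"case\s+([^:]+):");

    public static string Rewrite(string source)
    {
        var result = new StringBuilder();
        int pos = 0;
        foreach (Match m in Regex.Matches(source, @"fallthrough\s*;"))
        {
            // Copy everything before the marker verbatim.
            result.Append(source, pos, m.Index - pos);

            // Find the next case label after the marker and jump to it.
            Match next = CaseLabel.Match(source, m.Index + m.Length);
            result.Append(next.Success
                ? "goto case " + next.Groups[1].Value.Trim() + ";"
                : m.Value); // no following case label: leave the marker alone
            pos = m.Index + m.Length;
        }
        result.Append(source, pos, source.Length - pos);
        return result.ToString();
    }

    static void Main()
    {
        string input =
            "switch (x) { case 1: Log(1); fallthrough; case 2: Log(2); break; }";
        Console.WriteLine(Rewrite(input));
        // -> switch (x) { case 1: Log(1); goto case 2; case 2: Log(2); break; }
    }
}
```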
There have been quite a few preprocessors in history, e.g. macro assemblers or, perhaps the best known, the C preprocessor.
Among the less known and very powerful ones, I have quite admired the one used by Alaska, the successor of the Clipper programming language.
The preprocessor was able to handle most of the syntactic sugar and most of the Clipper-compatibility things.
It is definitely easier to implement, and as long as the language superset will be used only by programmers who can be made to follow some rules, you should get the thing done (or at least have a first working prototype) in a reasonable time.
-
Actually I'm doing a combination of both answers: I'm writing a lexer/parser, but not parsing everything down to the very end (identifiers and constants); I copy currently unchanged parts verbatim (whole expressions), because I don't need to change them. As the language additions expand, I'll expand the definitions (right now an expression is a leaf, but I can expand it to add support for more operators, for example). – SpelingMistake, Aug 12, 2014 at 6:27
-
@StarGateTABC The preprocessors I've seen were usually driven by a set of rules. There was a sort of "Data Definition Language" and "Data Manipulation Language". In your case you can either hard-code the rules or try to come up with a rule definition language enabling rules like "if you see the keyword switch and ... then emit the original keyword and all tokens you see until the default keyword... and everything till the end bracket...". The ability to define rules and then interpret them was the not-so-trivial part of preprocessors. I would think about drawing a state machine. Good luck :+1: – xmojmr, Aug 12, 2014 at 10:43
-
@Bacco While harbour may be another Clipper successor, its "preprocessor" looks fairly normal (see harbor.y for the YACC grammar). On the other hand, Alaska is also a Clipper successor with a very capable macro compiler (the preprocessor). – xmojmr, Sep 21, 2014 at 5:03
-
@xmojmr There's some confusion here. Harbour's preprocessor is far more capable and complex than Clipper's, and it is not a synonym for the macro compiler (which Harbour also has), nor for the compiler itself. I strongly suggest you take a broader look at the sources. By the way, look here too: github.com/harbour/core/tree/master/src/pp – Largato, Sep 21, 2014 at 9:26