I am slowly creating a simple programming language (a bit like Lua).
The interpreter has two important methods, `exec` and `evaluate`.
`exec` reads the tokens one by one and does what they say, such as creating new variables.
`evaluate` interprets tokens a bit differently.
It understands `==`, number literals (`5.3`), the operators `+-*/^%`, and string literals written with `""`.
It also understands variables and substitutes their values.
At the end of `evaluate`, it returns one value for `exec` to use.
A ginormous design hole in this interpreter is that you cannot create new strings in `exec` without creating a variable.
Meaning:
string a = "some string";
a.someStringMethod();
Works, but this:
"some string".someStringMethod();
does not.
This also means multidimensional arrays do not work, although I plan to use `.` instead of `[` and `]`.
If it is still unclear how the interpreter currently works, here is the source on GitHub:
https://github.com/lvivtotoro/mau/blob/master/Mau/src/org/midnightas/mau/Mau.java#L56
So the overall question is: How would I merge these 2 methods?
1 Answer
First of all, your parsers are not handling every token, but rather skipping over an arbitrary number of tokens until recognizing something.
Every parser I've seen or written does something with each and every token, even if it is something as simple as pushing the token onto an appropriate stack. Handling each token matters because it is how a parser recognizes the language's syntax; in other words, it is how the parser knows the syntactic state of the input at any point as it accepts tokens.
Unless you plan to develop your own parser methodology, I'd suggest you adopt one of the common ones. There are several things you might consider. First, you should define a specification for the syntax of your language. A good approach is to use a grammar (e.g. EBNF). Next, consider some well-understood technologies, such as:
Recursive Descent. As the name suggests, it uses a recursive algorithm to handle arbitrarily complex expressions. A recursive descent parser can be written so that it almost directly reflects the grammar of the language, which is pretty neat: you write a set of recursive routines where each function reflects a production rule in the grammar. Such a parser is not necessarily the most efficient, especially at expression parsing, because it expends effort repeatedly looking at the same tokens through the eyes of different levels of the grammar; still, it is easy to write, easy to understand, and works. It can seamlessly integrate statement parsing (if, while, for, function, let, with, etc.) with expression parsing (a+b*c[d]), which might otherwise seem to require wholly different parsers.
Use a parser generator like ANTLR. This is a sophisticated tool chain that supports complex grammars. You input a grammar and out pops some code to parse the language. Voilà!
There are a myriad of other techniques as well, for instance, Operator Precedence.
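As a rough illustration of the recursive descent option, here is a minimal Java sketch. It is not the Mau code. It assumes a toy grammar (given in the comments), evaluates while it parses instead of building a tree, and every name in it (`MiniInterpreter`, `statement`, `postfix`, the string methods `length` and `upper`, and so on) is made up for the example. Arithmetic is only defined for numbers, and characters the tokenizer does not recognize are silently skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one recursive descent parser for statements and expressions. */
final class MiniInterpreter {
    // A token is a number, a string literal, an identifier, or one punctuation character.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(\\d+(?:\\.\\d+)?|\"[^\"]*\"|[A-Za-z_]\\w*|[-+*/().=;])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;
    final Map<String, Object> variables = new HashMap<>();

    MiniInterpreter(String source) {
        Matcher m = TOKEN.matcher(source);
        while (m.find()) tokens.add(m.group(1));
    }

    /** Runs every statement and returns the value of the last one. */
    Object run() {
        Object last = null;
        while (pos < tokens.size()) last = statement();
        return last;
    }

    /** statement := TYPE IDENT '=' expression ';' | expression ';' */
    Object statement() {
        Object value;
        // Lookahead: IDENT IDENT '=' marks a declaration such as: string a = "x";
        if (isIdent(peek(0)) && isIdent(peek(1)) && "=".equals(peek(2))) {
            next();                          // type name, not checked in this sketch
            String name = next();
            expect("=");
            value = expression();
            variables.put(name, value);
        } else {
            value = expression();            // a bare expression is also a statement
        }
        expect(";");
        return value;
    }

    /** expression := term (('+' | '-') term)* */
    Object expression() {
        Object left = term();
        while ("+".equals(peek(0)) || "-".equals(peek(0))) {
            boolean plus = next().equals("+");
            double l = (Double) left, r = (Double) term();
            left = plus ? l + r : l - r;
        }
        return left;
    }

    /** term := postfix (('*' | '/') postfix)* */
    Object term() {
        Object left = postfix();
        while ("*".equals(peek(0)) || "/".equals(peek(0))) {
            boolean times = next().equals("*");
            double l = (Double) left, r = (Double) postfix();
            left = times ? l * r : l / r;
        }
        return left;
    }

    /** postfix := primary ('.' IDENT '(' ')')* */
    Object postfix() {
        Object value = primary();
        while (".".equals(peek(0))) {
            next();
            String method = next();
            expect("(");
            expect(")");
            value = call(value, method);
        }
        return value;
    }

    /** primary := NUMBER | STRING | IDENT | '(' expression ')' */
    Object primary() {
        String t = next();
        if (t.equals("(")) { Object v = expression(); expect(")"); return v; }
        if (t.startsWith("\"")) return t.substring(1, t.length() - 1);
        if (Character.isDigit(t.charAt(0))) return Double.parseDouble(t);
        if (!variables.containsKey(t)) throw new IllegalArgumentException("undefined: " + t);
        return variables.get(t);
    }

    // Made-up string methods, only here to show method calls on any value.
    private static Object call(Object target, String method) {
        if (target instanceof String) {
            String s = (String) target;
            if (method.equals("length")) return (double) s.length();
            if (method.equals("upper")) return s.toUpperCase();
        }
        throw new IllegalArgumentException("unknown method: " + method);
    }

    private String peek(int k) { return pos + k < tokens.size() ? tokens.get(pos + k) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String t) {
        String got = next();
        if (!t.equals(got)) throw new IllegalStateException("expected " + t + " but got " + got);
    }

    private static boolean isIdent(String t) {
        return t != null && (Character.isLetter(t.charAt(0)) || t.charAt(0) == '_');
    }

    public static void main(String[] args) {
        System.out.println(new MiniInterpreter("string a = \"some string\"; a.upper();").run());
        System.out.println(new MiniInterpreter("\"some string\".length() + 1;").run());
    }
}
```

Method calls are handled in `postfix`, which wraps `primary`, so a string literal and a variable go through exactly the same path. Because a statement may be a bare expression, the statement level and the expression level are parts of one parser and there are no separate methods left to merge.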
-
I don't see how this answers my question. You told me about making the tokenizing process better, but in the end the problem will still be there. – mid, Aug 12, 2016 at 15:17
-
@Midnightas, if you follow one of the common methodologies, such as those mentioned above, you won't have the problem of needing to integrate two separate parsers. The problem is that you are not using a methodical approach to parsing. When you do, these problems will go away. You can, of course, choose to create your own or use one of the common ones. If you like writing your own code, try recursive descent. Even if you eventually want to roll your own parsing methodology, having written a recursive descent parser will help your understanding. – Erik Eidt, Aug 20, 2016 at 18:44
-
I started using ANTLR4 with visitors, but I have run into the same problem again. – mid, Sep 8, 2016 at 9:15
-
Should I describe my problem here or create a new question? – mid, Sep 8, 2016 at 10:34
-
@Midnightas, you've defined a single grammar for your language? – Erik Eidt, Sep 8, 2016 at 14:27
`exec` is for interpreting; `evaluate` is for mathematical expressions and returns a value for `exec` to use. If `exec` ever found `number a = 5 + 5;`, it would ask `evaluate` what `5 + 5` is, then set `a` to the result.