I'm currently writing a JavaScript compiler in ANTLR+Java.
I've read questions here on Stack Overflow on how to proceed with the execution - and the answer is always that it would be way too hard to do a static compilation (without JIT-information) of a dynamic language - but why is that exactly? There are of course the obvious "type resolving" problem and in JavaScript maybe a problem with the eval function - but are there other reasons? (because they don't seem too hard to overcome pure statically (no JITS))
I'm excluding JIT-based compilation because I figure it would be too hard for me to implement.
I have some experience in writing static compilers with a byte-code execution.
UPDATE:
All your answers are really helpfull understanding the problem. To clarify does this mean that JavaScript is harder to implement than other dynamic languages?
And does this also means that im better of using a Tree-based interpreter than e.g. Byte-code (if we forget about the property that JS always is shipped in raw source code - hence adding extra time for generating and IR and afterwards execute it)? - or should they be about equally easy / hard to do?
(Im new to SOF; dont know if this is the preferred way to update a question?)
3 Answers 3
There are lots of ways this conversation could go. Here's one direction. In javascript, nearly everything is an object and properties or methods can be added to any object at run-time. As such, you don't know at compile time what methods or properties will or won't be attached to an object. As such, everything has to be looked up at run-time.
For example:
var myObj = {};
function configureObject() {
if (something in the environment) {
myObj.myfunc = function () {alert("Hi");}
} else {
myObj.myfunc = function () {document.write("Hello");}
}
}
Now, sometime later in the code you call myObj.myfunc(); It is not known at compile time what myfunc is or whether it's even an attribute of myObj. It has to be a run-time lookup.
In another example, take this line of code:
var c = a + b;
What his means depends entirely upon the types of a and b and those types are not known at compile time.
If a and b are both numbers, then this is an addition statement and c will be a number.
If either a or b is a string, then the other will be coerced to a string and c will be a string.
You can't precompile this kind of logic into native code. The execution environment has to record that this is a request for the addition operator between these two operands and it has to (at runtime) examine the types of the two operands and decide what to do.
The challenge with writing a static JavaScript compiler is that it is in general undecidably hard to determine what objects are being referenced at any program point or what functions are being called. I could use the fact that JavaScript is dynamic to decide which function to call based on the output of some Turing machine. For example:
var functionName = RunTuringMachineAndReportOutputOnTape(myTM, myInput);
eval(functionName + "();");
At this point, unless you have advance knowledge about what myTM and myInput are, it is provably impossible to decide what function will be invoked by the call to eval, since it's undecidable to determine what is on a Turing machine's tape if it halts (you can reduce the halting problem to this problem). Consequently, no matter how clever you are, and no matter how good of a static analyzer you build, you will never be able to correctly statically resolve all function calls. You can't even bound the set of functions that might be called here, since the Turing machine's output might define some function that is then executed by the above code.
What you can do is compile code that, whenever a function is called, includes extra logic to resolve the call, and possibly uses techniques like inline caching to speed things up. Additionally, in some cases you might be able to prove that a certain function is being called (or that one of a small number of functions will be called) and can then hardcode in those calls. You could also compile multiple versions of a piece of code, one for each common type (object, numeric, etc.), then emit code to jump to the appropriate compiled trace based on the dynamic type.
3 Comments
V8 does that. See Compile JavaScript to Native Code with V8
With EcmaScript 3 and 5 non-strict there are a number of wrinkles around scopes which you don't run into in other dynamic languages. You might think that it is easy to do compiler optimizations on local variables, but there are edge cases in the language when it is not, even ignoring eval's scope introspection.
Consider
function f(o, x, y) {
with (o) { return x + y + z; }
}
when called with
o = {};
o = { z: 3 };
o = { x: 1, z: 2 };
Object.prototype.z = 3, o = {};
and according to EcmaScript 3,
x = (function () { return toString(); })()
should produce quite a different result from
x = toString();
because EcmaScript 3 defines an activation record as an object with a prototype chain.
Comments
Explore related questions
See similar questions with these tags.