I have a personal project: I want to write a JIT compiler/runtime in Rust (though the language is not that relevant). I'm thinking about using a technique where code is interpreted first and JITted only when the runtime decides it is time to JIT (whatever that means).
This poses a certain problem: how exactly do interpreted and JITted code interact with each other? Here's what I understand so far:
An interpreted function would look like this:
fn handle_call(context: &mut Context) {
    let current_fn = context.current_fn();
    for instr in current_fn.instructions() {
        match instr {
            // dispatch on the instruction and execute it
            _ => todo!(),
        }
    }
}
So in Rust I have a single function that is called with a different context depending on which interpreted function we are running at the moment.
It is easy to imagine that if I find a CALL instruction, I retrieve the corresponding data, create a new Context object and make a (recursive) call to handle_call.
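Spelled out with some invented minimal types (Instr, FnId and the context_for/instructions helpers are placeholders, not a real API), that recursion could look like this:

// Invented types, just enough to make the dispatch concrete.
struct FnId(usize);

enum Instr {
    Call(FnId), // call the function identified by FnId
    Ret,        // return to the caller
}

struct Context { /* locals, operand stack, current function, ... */ }

impl Context {
    // Assumed helper: pops the callee's arguments off this frame
    // and packs them into a fresh Context for the callee.
    fn context_for(&mut self, callee: FnId) -> Context { unimplemented!() }
    // Assumed helper: the current function's instruction stream.
    fn instructions(&self) -> Vec<Instr> { unimplemented!() }
}

fn handle_call(context: &mut Context) {
    for instr in context.instructions() {
        match instr {
            Instr::Call(callee) => {
                let mut callee_ctx = context.context_for(callee);
                handle_call(&mut callee_ctx); // the recursive interpreter call
            }
            Instr::Ret => return,
        }
    }
}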
But how do I call a JITted function? And how will a JITted function call an interpreted function? Here are two options I can see:
1. JITted functions follow the same signature: they accept &mut Context objects. This makes things very simple, since calls between all functions are just calls through pointers (a code sketch of this option follows after this list). But I don't like this solution, since it involves the potentially big overhead of building, passing around and using Context objects. For example, if our function accepts an i32, it could be passed in registers. Worse, the arguments have to be packed into the Context object, and since we don't know how many of them there will be, this can mean a potential heap allocation per call. And that is a huge overhead.

2. All functions, JITted or not, are treated the same way as low-level, machine-code functions. JITted code simply emits instructions corresponding to some calling convention, while interpreted code now has to transform the recursive call to handle_call into an appropriate machine-code call. But how do I do that? I don't know the signature at compile time (as in: the compiler's compile time), so how can I generate an arbitrary call? The only thing I can think of is to generate machine code on the fly when a CALL instruction is seen. On the other hand, how do I call an interpreted function from a JITted one? I cannot use a single handle_call to handle all calls. Now I need to generate a wrapper for each function in my language that does the opposite of what I mentioned earlier: it takes arguments passed through some low-level calling convention, packs them into a Context object and then calls handle_call with it.

I'm not worried about the performance of this process; after all, it is the JITted code that is supposed to be optimized, not the interpreted code. What worries me is that this already requires JITting at least some code. Originally I thought I could separate the interpreter from the JITter, but with this approach I have to use the JITter at least partially, which is a significant complication. And not only a complication: I thought I could use the interpreter on platforms where I don't have a JITter implemented; with this approach I cannot.
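Here is the sketch of option 1 promised above, reusing the made-up Context and FnId types from earlier (FnEntry and the table layout are likewise invented):

// Option 1: one uniform signature for every function, interpreted or JITted.
type FnEntry = fn(&mut Context);

struct Runtime {
    // Indexed by FnId. Initially every slot points at a small shim that
    // drives the interpreter; when a function gets JITted, its slot is
    // overwritten with (a transmuted pointer to) the emitted code.
    entries: Vec<FnEntry>,
}

impl Runtime {
    fn call(&self, callee: FnId, ctx: &mut Context) {
        // The caller has already packed all arguments into `ctx`;
        // that packing is exactly the per-call overhead described above.
        (self.entries[callee.0])(ctx);
    }
}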
So am I forced to sacrifice performance here? Or to JIT (at least partially) from the beginning and sacrifice simplicity and modularity? How do tiered JITters like Java's (see here and here) do that? What am I missing?
- You can make this slightly less complicated by imposing the requirement that JIT code cannot call non-JIT code, except through function-as-variable (closures, C#'s delegates, etc.). – pjc50, Oct 24, 2024
- I don't know how it is solved in the Java runtime, but my guess would be that when the JIT compiles a function, it also provides an "interpreter-friendly" adapter as a second entry point. In the other direction, for each interpreted function called from a compiled function, the JIT may provide a "compiled-code-friendly" adapter as well, which gets replaced as soon as the interpreted function gets compiled. But when crossing the border between interpreted and compiled code, I would always expect some marshalling overhead. – Doc Brown, Oct 24, 2024
- @JimmyJames of course they mean it interprets the bytecode. That's exactly how I use the term here. What Doc Brown says is that C# always JITs a function; it doesn't have the interpretation step. – freakish, Oct 24, 2024
- @JimmyJames both articles that I linked earlier (from this year and the previous one) claim the opposite about Java: it does not JIT right from the beginning. And none of the references you provided support what you just said. – freakish, Oct 24, 2024
2 Answers
I don't know how it is solved in the Java environment, but here is an idea for an approach which might work for you.
When the JIT compiler compiles a function, it could provide two entry points:
- one standard entry point for other compiled code - so compiled functions can call other compiled functions without overhead
- an "interpreter-friendly" adapter as a second entry point, which provides the unmarshaling of arguments from the context and then calls the first entry point.
In the other direction, for each (potentially) interpreted function called from a compiled function, the JIT may provide an adapter for the compiled code which does the specific marshaling. That adapter can be replaced as soon as the interpreted function gets compiled. For this, the JIT has to do some book-keeping of where it has placed such adapter calls in the JIT-compiled code, and it needs a mechanism to modify those adapter calls later, or it can simply recompile the affected functions.
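One way to implement that replacement without patching machine code is a single level of indirection per callee, e.g. an atomic slot that compiled callers load before each call (again just a sketch):

use std::sync::atomic::{AtomicPtr, Ordering};

// One slot per (potentially interpreted) callee. Compiled callers are
// emitted as "load the slot, then call through it", so swapping the
// pointer retargets every call site at once, with no code patching.
struct CallSlot {
    target: AtomicPtr<u8>, // starts as the marshaling adapter's address
}

impl CallSlot {
    // Invoked by the JIT once the callee itself has been compiled.
    fn promote(&self, native_entry: *mut u8) {
        self.target.store(native_entry, Ordering::Release);
    }
}

The price is an extra indirect load per call, which is why the book-keeping-plus-patching approach above may still be preferable for hot call sites.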
When crossing the border between interpreted and compiled code, in either direction, I would always expect some marshalling overhead; I think that is unavoidable. Still, what I wrote above lets you develop the interpreter independently of the JIT compiler (just not the other way round).
- FYI, while this answers my question, it introduces another issue: how do I pass around references to functions? As two pointers plus a flag? Together with an atomic check? Eh, that doesn't sound too good, and it is a mandatory feature. Maybe JITting everything from the start (like C# does) is actually simpler and better. – freakish, Oct 25, 2024
- @freakish: well, I have been programming in C# since 2003, and as far as I remember, C# programs never had the kind of startup-time issues early Java had. So I guess a bytecode interpreter may be expendable as far as performance is concerned. Still, the link in my second comment under your question gives a different motivation: getting the CLR quickly available on new platforms for which a specific JIT is not (yet) available. – Doc Brown, Oct 25, 2024
- @freakish compiled code may be given a native wrapper over an interpreted function, until the latter is compiled. – Basilevs, Oct 25, 2024
- @Basilevs consider what happens when A takes a reference to C and passes it to B, so that B calls it. B doesn't know whether this pointer is compiled or not, so I need to pass more information plus a runtime check. Unless I JIT everything from the beginning with some standard calling convention. – freakish, Oct 25, 2024
- If I go down the interpreter + JITter road, then I think that gluing the interpreter to some assembly is unavoidable, unfortunately. It seems that Java has an interpreter per architecture; I read that somewhere. That would explain a lot... – freakish, Oct 25, 2024
For example: JavaScript in Safari is run a few times in the interpreter to avoid expensive translation. If a loop or a function has run a dozen times, it is compiled by a baseline compiler that produces unoptimised code with a very fast compile time. After about 100 repeats it is compiled again with a much better but slower compiler. Then, after 1000 or so executions, the heaviest optimizing tier (historically LLVM-based in JavaScriptCore) produces the fastest possible code.
Now let's say you have a variable that could in principle have any type, but it was an int in 990 of 1000 observed cases. The optimizing compiler will then produce code like "if the type is Int, run the fully optimised code for the int case; otherwise, fall back to the code produced by the lightweight compiler".
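Written out, the emitted code conceptually has this guard-plus-fallback shape (Rust standing in for machine code; Value and generic_add_one are invented names):

// A dynamically typed value, as the engine sees it.
enum Value {
    Int(i64),
    Double(f64),
}

// The slow, type-dispatching version from the lightweight tier.
fn generic_add_one(v: Value) -> Value {
    unimplemented!()
}

// What the top tier conceptually emits for `x + 1` after profiling
// showed x was an Int in 990 of 1000 samples:
fn add_one_speculative(x: Value) -> Value {
    match x {
        // Guard: one cheap type check, then the fully optimised path.
        Value::Int(i) => Value::Int(i + 1),
        // Guard failed: bail out to the generic version.
        other => generic_add_one(other),
    }
}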
Every time the system decides to update the compiled code, it stops execution, compiles, then replaces the old code with the new code. To avoid pauses, the compilation can be done on a separate thread.
- Are you sure you read the question in full, not just the title? – Doc Brown, Oct 24, 2024