I have a little pet compiler project that generates bytecode interpreted by a virtual machine. The language is fairly low-level, as it allows the user to manually allocate memory and dereference any pointer as they see fit. This can of course lead to bugs which crash the interpreted program. When the VM runtime is used in a host environment though, I would like to prevent the host application from crashing if a script has memory bugs in it. What I thought I could do is install a signal handler that catches SIGSEGV (and perhaps other signals) and longjmps back into the runtime. The runtime could clean up after the script, as it can track all resource allocations made by the user through the language facilities. I tested it and it works nicely for simple cases, but please correct me if I'm wrong on this.
What makes things complicated though is that it is possible for the environment to install callbacks that the script can call. Then those callbacks can execute other script functions in the VM runtime. So basically the host program can have callstacks that look like this:
Host -> ScriptFunctionA -> HostCallback -> ScriptFunctionB -> AnotherHostCallback -> ...
So I would install the signal handler when the VM runtime is constructed, call setjmp whenever a script function is called and keep a stack of jmp_bufs so the signal handler can jump to the script invocation at the top of the callstack. What is not considered in this design is that the host might install other signal handlers for SIGSEGV, overriding mine.
Here is a slightly simplified code example:

    Runtime runtime;
    runtime.installCallback(HostFuncA); // HostFuncA calls bar() defined in the script below, actual design is a bit more complicated than this
    runtime.installCallback(HostFuncB);
    runtime.compile(R"(
        func foo() {
            HostFuncA();
        }
        func bar() {
            var baz = *roguePointer; // Segfault
            HostFuncB();
        }
    )");
    auto hostFoo = runtime.getFunction("foo");
    hostFoo();

The constructor of Runtime installs a signal handler via

    sigaction(SIGSEGV, &sa, nullptr); // Or perhaps store already installed handler and reinstall it later...

The signal handler calls

    longjmp(jmpBufStack.top(), 1); // second argument is the value setjmp returns after the jump

And the function hostFoo() defined by the runtime looks like this

    if (setjmp(jmpBufStack.push())) {
        jmpBufStack.pop(); // discard the buffer we just jumped to
        cleanup();
        throw /*...*/;
    }
    executeScript();
    jmpBufStack.pop();
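
For reference, the pieces above could be put together roughly as follows. This is a minimal sketch under assumptions, not the actual runtime: the names `installHandler`, `runGuarded`, `g_jumpStack` and `g_depth` are hypothetical, it is POSIX-only, it uses `sigsetjmp`/`siglongjmp` so the signal mask is restored after the jump, and it reports a fault by returning `false` instead of throwing. It also keeps the previously installed handler so it can fall back to it when no script is running. Jumping out of a handler skips the destructors of every frame in between, so the guarded code must not rely on them.

```cpp
#include <cassert>
#include <csetjmp>
#include <csignal>

// Hypothetical globals: one recovery point per active script invocation.
// Fixed-size storage so the signal handler never has to allocate.
constexpr int kMaxDepth = 64;
static sigjmp_buf g_jumpStack[kMaxDepth];
static volatile sig_atomic_t g_depth = 0;
static struct sigaction g_previous;  // whatever handler was installed before ours

static void onSegv(int sig) {
    if (g_depth > 0) {
        // A script invocation is active: unwind to the innermost one.
        siglongjmp(g_jumpStack[g_depth - 1], 1);
    }
    // Not inside a script: hand the signal back to the previous handler.
    sigaction(sig, &g_previous, nullptr);
    raise(sig);
}

void installHandler() {
    struct sigaction sa {};
    sa.sa_handler = onSegv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &g_previous);
}

// Runs `body` as one script invocation.
// Returns true on normal completion, false if SIGSEGV was caught.
template <typename F>
bool runGuarded(F&& body) {
    assert(g_depth < kMaxDepth);
    const int slot = g_depth;
    if (sigsetjmp(g_jumpStack[slot], 1) != 0) {
        g_depth = slot;  // drop this recovery point and any deeper ones
        return false;    // the caller would clean up script resources here
    }
    g_depth = slot + 1;
    body();
    g_depth = slot;
    return true;
}
```

A nested chain such as Host -> ScriptFunctionA -> HostCallback -> ScriptFunctionB then corresponds to a `runGuarded` call whose body ends up calling `runGuarded` again; a fault in the inner call only unwinds to the inner recovery point. None of this addresses a host callback that installs its own SIGSEGV handler, nor faults caused by stack overflow, which would additionally need an alternate signal stack.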
So my question is this: Is this design sound? Can I even handle segfaults in user code reliably? Would I (as the maintainer of the runtime) have to reinstall the handler every time a host callback returns, in case it installed another signal handler?
2 Answers
It seems you have two conflicting requirements.
On one hand, you want direct, low level memory access from your language's byte code to the host.
On the other hand, you want to prevent the host application from crashing by something which happens in the byte code.
In reality, it is hard to get both. If you effectively want to prevent your VM interpreter from crashing the host application, you need to isolate the interpreter execution in a separate process, and that forbids most kinds of low-level memory access. Shared memory might be an option, but that will undoubtedly make the memory interface more complex. Callback functions to the host will become quite a challenge. If you want to go this route, you had better rethink your whole execution model and switch to an asynchronous event-based approach.
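
To illustrate only the crash-containment part of the separate-process route, here is a minimal POSIX sketch. The names `runIsolated` and `ScriptResult` are hypothetical, and the "script" is just a callable returning an exit code; passing data or host callbacks across the process boundary, which is the hard part mentioned above, is not shown.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class ScriptResult { Ok, Failed, Crashed };

// Runs `script` (a callable returning an int exit code) in a child process,
// so that a fatal signal in the script cannot take the host down with it.
template <typename F>
ScriptResult runIsolated(F&& script) {
    const pid_t pid = fork();
    if (pid < 0) {
        return ScriptResult::Failed;  // could not create the child
    }
    if (pid == 0) {
        // Child: run the script, then exit immediately without running
        // the host's atexit handlers or flushing its stdio buffers.
        _exit(script());
    }
    // Parent: pid is the child's process id.
    assert(pid > 0);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return ScriptResult::Failed;
    }
    if (WIFSIGNALED(status)) {
        return ScriptResult::Crashed;  // e.g. killed by SIGSEGV
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return ScriptResult::Ok;
    }
    return ScriptResult::Failed;
}
```

The host only ever sees an exit status, so a segfault in the child shows up as `ScriptResult::Crashed` rather than as a signal in the host process.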
If you just want to reduce the probability of crashing to a reasonable degree, but still prefer an in-process solution, you may consider not passing any data by pointer reference, but only "by value", or by some kind of "managed" or "smart" pointer capsule. The goal should be to avoid the occurrence of error signals like a segmentation fault, so you can leave the implementation of such handlers to the host process.
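
One possible shape for such a "managed" capsule is a handle table: the script only ever holds an index plus a generation counter, and the runtime validates both on every access. The sketch below is an assumption about how this could look, not a prescribed design; `Handle` and `HandleTable` are hypothetical names, and values are copied out "by value" as suggested above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// What the script sees instead of a raw pointer.
struct Handle {
    std::uint32_t index;
    std::uint32_t generation;
};

template <typename T>
class HandleTable {
public:
    // Stores a value and returns a handle to it, reusing freed slots.
    Handle insert(T value) {
        std::uint32_t index;
        if (!freeList_.empty()) {
            index = freeList_.back();
            freeList_.pop_back();
        } else {
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        Slot& slot = slots_[index];
        assert(!slot.value.has_value());
        slot.value = std::move(value);
        return Handle{index, slot.generation};
    }

    // Frees the slot; bumping the generation invalidates old handles.
    bool remove(Handle h) {
        if (!isLive(h)) return false;
        Slot& slot = slots_[h.index];
        slot.value.reset();
        ++slot.generation;
        freeList_.push_back(h.index);
        return true;
    }

    // Returns a copy of the value, or nullopt for stale or forged handles.
    std::optional<T> read(Handle h) const {
        if (!isLive(h)) return std::nullopt;
        return slots_[h.index].value;
    }

private:
    struct Slot {
        std::optional<T> value;
        std::uint32_t generation = 0;
    };

    bool isLive(Handle h) const {
        return h.index < slots_.size() &&
               slots_[h.index].generation == h.generation &&
               slots_[h.index].value.has_value();
    }

    std::vector<Slot> slots_;
    std::vector<std::uint32_t> freeList_;
};
```

A use-after-free in the script then turns into a failed lookup that the VM can report as a script error, instead of a wild memory access.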
Can I even handle segfaults in user code reliably?
No, because the worst case is that it doesn't segfault but instead overwrites part of the runtime's state which is in the same process.
You can't achieve sandboxing this way. Have you considered targeting e.g. the WASM runtime? It manages this by having fake "pointers" that are just offsets into a block of memory that can be bounds-checked by the runtime.
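
As a rough illustration of the "offsets into a block of memory" idea (a sketch only, not how any particular WASM runtime is implemented), the VM could route every guest load and store through a bounds check. `LinearMemory`, `load32` and `store32` are hypothetical names, and guest addresses are assumed to be 32-bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// The guest's whole address space: one byte buffer owned by the runtime.
// A guest "pointer" is just a 32-bit offset into this buffer.
class LinearMemory {
public:
    explicit LinearMemory(std::size_t size) : bytes_(size, 0) {
        assert(size <= 0xFFFFFFFFu && "guest addresses are 32-bit");
    }

    // Reads 4 bytes at `addr`, or returns nullopt if the access is out of range.
    std::optional<std::uint32_t> load32(std::uint32_t addr) const {
        if (!inBounds(addr, sizeof(std::uint32_t))) return std::nullopt;
        std::uint32_t value;
        std::memcpy(&value, bytes_.data() + addr, sizeof value);
        return value;
    }

    // Writes 4 bytes at `addr`; returns false if the access is out of range.
    bool store32(std::uint32_t addr, std::uint32_t value) {
        if (!inBounds(addr, sizeof value)) return false;
        std::memcpy(bytes_.data() + addr, &value, sizeof value);
        return true;
    }

private:
    // Written so that addr + len cannot overflow.
    bool inBounds(std::uint64_t addr, std::uint64_t len) const {
        return len <= bytes_.size() && addr <= bytes_.size() - len;
    }

    std::vector<std::uint8_t> bytes_;
};
```

With this layout a rogue guest pointer can at worst corrupt the guest's own data; the interpreter turns an out-of-range access into a script error and its own state stays out of reach.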
Yes, I haven't really considered that possibility. I think I will do just that, have a big block of 'virtual memory' and share data between guest and host by value only. – chrysante, Sep 27, 2023 at 9:09
malloc and free, but the language exposes them in a RAII-like fashion so 'destructors' of owning pointers call free. But I will probably change this to sandboxed arena allocations in the future. It could still dereference pointers that it receives from the host environment, but I guess in that case it is the responsibility of the host to only pass valid pointers to the script functions. What is not possible though is pointer arithmetic (beyond array indexing, which is always bounds checked)