Re: Computed goto optimization of vanilla Lua
[
Date Prev][
Date Next][
Thread Prev][
Thread Next]
[
Date Index]
[
Thread Index]
- Subject: Re: Computed goto optimization of vanilla Lua
- From: "Soni L." <fakedme@...>
- Date: Wed, 3 Feb 2016 20:47:05 -0200
On 03/02/16 08:28 PM, Nagaev Boris wrote:
Hi,
computed gotos are a feature of modern compilers [1]. They can be used
as a faster replacement for switch-based VM [2]. Many programming
languages have VMs implemented in computed gotos (at least, Ruby and
Python).
The computed goto version is faster because of two reasons [2]:
* The switch does a bit more per iteration because of bounds checking.
* The effects of hardware branch prediction.
I have applied this optimization to VM of Lua 5.3.2, file src/lvm.c
[3]. It was very easy, because VM uses macros vmdispatch, vmcase and
vmbreak. I have redefined these macros and created a dispatch table.
It passes Lua basic tests (path/lua -e"_U=true" all.lua) [4].
My benchmark [5] shows speedup of 1.12:
$ time ./src/lua.orig ~/lang-bench/f3/test.lua
499500
real 0m28.208s
user 0m28.166s
sys 0m0.004s
$ time ./src/lua ~/lang-bench/f3/test.lua
499500
real 0m25.066s
user 0m25.030s
sys 0m0.000s
This commit is a draft, not a final contribution. If a compiler
doesn't support computed gotos, the switch based implementation should
be used, as before.
[1] https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
[2] http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables
[3] https://github.com/starius/lua/commit/b10eb122ad1e6f94e3ebf100179515ed66921756
[4] http://www.lua.org/tests/#basic
[5] https://github.com/starius/lang-bench/blob/master/f3/test.lua
So the C equivalent of NoVM[1]? (NoVM, in Lua and functional languages,
basically builds a linked list of functions, which use tail-calling to
go to the next instruction)
[1] It's a name I came up with. Basically means a VM without what we
consider "the VM", that is a VM without an interpreter loop. (Physical
machines use what's equivalent to an interpreter loop, so it makes sense
for VMs to use interpreter loops. A loop-less VM doesn't sound much like
a real machine.) This is actually much better on JITs, yielding HUGE
performance improvements compared to interpreter loops. Tested with
LuaJIT. It also lets you add features you couldn't otherwise.
--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.