
Re: [ANN] llvm-lua 1.0



Robert G. Jakabosky wrote:
> One way llvm-lua could improve the speed of statically compiled code would
> be to add support for recording hints during runtime and writing those
> hints to a file that would then be passed to the static compiler.
Sure, feedback-directed optimization (FDO) would probably help a
lot. But static compilers face some inherent difficulties when
compiling a dynamic language:
- Dynamic languages generate tons of code paths, even for simple
  operations (see the sketch after this list). A static compiler
  has to compile *all* code paths. Apart from the overhead, this
  pessimizes the fast code paths, too (esp. PRE and register
  allocation). A JIT compiler only generates code for paths that
  are actually executed (and these may differ between runs).
- Even with FDO, static compilers have to be very conservative
  about inlining and specialization to limit code growth. A JIT
  compiler can be more aggressive, because it has to deal with a
  much lower number of code paths.
- Not even the best C compilers are able to hoist complex control
  structures out of loops. Once you lower (say) a load from a hash
  table (a loop following the hash chain) to the LLVM IR, you lose
  the original semantics. This usually means you cannot optimize
  it or eliminate it, either.
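Concretely, here is a simplified, runnable plain-Lua sketch of the
paths hiding behind a single addition (the real rules also fix the
metamethod lookup order; this is only meant to show the path
explosion a static compiler must cover at every '+'):

 local function generic_add(x, y)
   if type(x) == "number" and type(y) == "number" then
     return x + y                       -- fast path: numeric add
   end
   local nx, ny = tonumber(x), tonumber(y)
   if nx and ny then
     return nx + ny                     -- string->number coercion path
   end
   local mt = getmetatable(x) or getmetatable(y)  -- simplified lookup
   if mt and mt.__add then
     return mt.__add(x, y)              -- metamethod dispatch path
   end
   error("attempt to perform arithmetic on a "..type(x).." value")
 end

 print(generic_add(1, "2"))  --> 3 (via the coercion path)

A trace JIT records and compiles only the branch a hot trace actually
takes, guarded by a cheap type check.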
That's not to discourage you from your work -- just to point out
some difficulties you'll be facing with llvm-lua.
The following trivial Lua function is a nice optimization challenge:
 function f(x, n)
   for i=1,n do x = math.abs(x) end
   return x
 end
The inner loop has two hash table loads (_G["math"], math["abs"]),
a function dispatch and a call to a standard library function.
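To see why the last point above bites here: each of those loads
lowers to roughly the following chain-following loop. This is a
runnable toy model with a deliberately crude hash function, not
Lua's actual C internals, but the shape of the loop is the point:

 local function hash_get(t, key)
   local node = t.slots[(#key % t.size) + 1]  -- crude string hash
   while node do
     if node.key == key then return node.value end
     node = node.next                         -- follow collision chain
   end
   return nil
 end

 -- "sin" and "abs" both have length 3, so they collide in one slot:
 local t = { size = 4, slots = {} }
 t.slots[(#"sin" % 4) + 1] = { key = "sin", value = math.sin,
   next = { key = "abs", value = math.abs } }
 assert(hash_get(t, "abs") == math.abs)

Once this loop is expressed in IR, the compiler can no longer reason
about it as a single table load.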
Here's the corresponding checklist for a Lua compiler in increasing
order of complexity (a sketch of the guarded dispatch follows the table):
 LJ1  LJ2
 yes  yes  Type-specializes the hash table loads (object and key type).
 yes* yes  Inlines both hash table loads (* partially inlined by LJ1).
 no   yes  Specializes the loads to the hash slot (omits the chain loop).
 yes  yes  Specializes to a monomorphic function dispatch for abs().
 yes  yes  Inlines the abs() function.
 no   yes  Hoists both hash table loads out of the loop.
 no   yes  Hoists the abs() out of the loop (abs(abs(x)) ==> abs(x)).
 no   yes  Emits an empty loop (could be optimized away, too).
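The monomorphic dispatch and inlining rows, for example, boil down to
a guarded, specialized call site. A minimal runnable sketch (the names
are mine, not LuaJIT's machinery; a real JIT side-exits to the
interpreter instead of taking an in-line slow path):

 local recorded_abs = math.abs        -- the target seen at record time
 local function specialized_abs(x)
   if math.abs == recorded_abs then   -- guard: is it still the same?
     return x < 0 and -x or x         -- inlined abs() body (fast path)
   end
   return math.abs(x)                 -- guard failed: generic call
 end

 assert(specialized_abs(-5) == 5)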
Now try to get 8x yes with a static compiler, while preserving the
full Lua semantics (e.g. somebody might do math.abs = math.sin or
_G might have a metatable etc.). Good luck! :-)
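For reference, here is what all eight optimizations amount to when
applied by hand. Note that it hard-codes exactly the assumptions
(math.abs is never replaced, _G has no metatable) that a compiler may
only make behind runtime guards:

 local abs = math.abs                -- both hash table loads hoisted
 function f(x, n)
   -- abs() is idempotent on numbers, so one call replaces the loop
   if n >= 1 then x = abs(x) end
   return x
 end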
> Below are the results of running scimark-2008-01-22.lua [1] for a
> performance comparison.
> 
> # taskset -c 1 /usr/bin/lua scimark-2008-01-22.lua large
The large dataset is an out-of-cache problem and cache thrashing
(esp. for FFT) dominates the timings. IMHO the small dataset is
better suited to compare the performance of the different VMs.
Anyway, for the large dataset LuaJIT 1.1.x gets 89.80 and Lua 5.1.4
gets 13.87 on my box. And here's a preliminary result for LJ2:
FFT       85.91  [1048576]
SOR      821.95  [1000]
MC        73.14
SPARSE   324.05  [100000, 1000000]
LU       820.51  [1000]
SciMark  425.11  [large problem sizes]
(i.e. 31x faster than plain Lua)
For comparison, GCC 4.3.2 -m32 -march=core2 -O3 -fomit-frame-pointer:
FFT       93.00  [1048576]
SOR      883.53  [1000]
MC       180.16
SPARSE   716.08  [100000, 1000000]
LU      1149.43  [1000]
SciMark  604.44  [large problem sizes]
(I've reformatted the C program output to match the Lua output.)
Ok, so LJ2 is getting closer to C's performance. It's already on
par with C for simpler benchmarks (like mandelbrot or spectralnorm).
But there still remains a lot to do ...
--Mike
