lua-l archive

CSE chains in LuaJIT 2.0



Hi,
I read http://article.gmane.org/gmane.comp.lang.lua.general/58908 and was
very interested to read this:
- Skip-list chains: The IR is threaded with segregated, per-opcode
 skip-list chains. The links are stored in a multi-purpose 16 bit
 field in the instruction. This facilitates low-overhead lookup
 for CSE, DSE and alias analysis. Back-linking enables short-cut
 searches (average overhead is less than 1 lookup). Incremental
 build-up is trivial. No hashes, no sets, no complex updates.
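For reference, here is my current mental model of those per-opcode chains as a small self-contained C sketch (all names here -- Ins, chain, emit_cse, the array sizes -- are mine, not LuaJIT's): each instruction stores a prev ref pointing at the previous instruction with the same opcode, chain[op] holds the most recent ref for that opcode, and a CSE lookup walks backwards through only the instructions of one opcode.

```c
#include <assert.h>
#include <stdint.h>

enum { MAXINS = 1024, NOPCODES = 8 };

typedef struct Ins {
  uint8_t op;         /* opcode */
  uint16_t op1, op2;  /* operand refs */
  uint16_t prev;      /* ref of previous instruction with the same opcode */
} Ins;

static Ins ins[MAXINS];
static uint16_t chain[NOPCODES]; /* head (latest ref) of each per-opcode chain */
static uint16_t nextref = 1;     /* ref 0 is reserved to mean "no ref" */

/* Emit an instruction, reusing an identical earlier one if the chain has it. */
static uint16_t emit_cse(uint8_t op, uint16_t op1, uint16_t op2)
{
  uint16_t lim = op1 > op2 ? op1 : op2; /* operands must precede any match */
  uint16_t ref = chain[op];
  while (ref > lim) {  /* stop once we pass the newest operand */
    if (ins[ref].op1 == op1 && ins[ref].op2 == op2)
      return ref;      /* common subexpression found */
    ref = ins[ref].prev;
  }
  ref = nextref++;
  ins[ref].op = op; ins[ref].op1 = op1; ins[ref].op2 = op2;
  ins[ref].prev = chain[op];  /* thread into this opcode's chain */
  chain[op] = ref;
  return ref;
}
```

With this, emitting the same (op, op1, op2) twice yields the same ref, and the backwards walk never visits instructions of other opcodes.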
I looked at the LuaJIT-2.0.0-beta4 code; this function (from lj_opt_fold.c) seems
to be the relevant one:
  TRef LJ_FASTCALL lj_opt_cse(jit_State *J)
  {
    /* Avoid narrow to wide store-to-load forwarding stall */
    IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
    IROp op = fins->o;
    if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
      /* Limited search for same operands in per-opcode chain. */
      IRRef ref = J->chain[op];
      IRRef lim = fins->op1;
      if (fins->op2 > lim) lim = fins->op2;  /* Relies on lit < REF_BIAS. */
      while (ref > lim) {
        if (IR(ref)->op12 == op12)
          return TREF(ref, irt_t(IR(ref)->t));  /* Common subexpression found. */
        ref = IR(ref)->prev;
      }
    }
    /* Otherwise emit IR (inlined for speed). */
    {
      IRRef ref = lj_ir_nextins(J);
      IRIns *ir = IR(ref);
      ir->prev = J->chain[op];
      ir->op12 = op12;
      J->chain[op] = (IRRef1)ref;
      ir->o = fins->o;
      J->guardemit.irt |= fins->t.irt;
      return TREF(ref, irt_t((ir->t = fins->t)));
    }
  }
I have several questions about all this...
- The quoted paragraph above mentions "skip-list chains" but AFAICT the code
 is just using a normal list, not a skip-list
 (http://en.wikipedia.org/wiki/Skip_list). Have I misunderstood something?
- I don't understand the "while (ref > lim)" condition; it looks like the
  list search is terminated early in some cases, but I don't understand what
  those cases are. The IR is very dense, and pithy code like that is hard
  for newcomers to understand.
- I tried implementing a simple list in Nanojit (used by Mozilla and Adobe),
 which currently uses a hash table for CSE. I tried it only on 32-bit
 immediates, and the performance was awful -- there can be dozens of
 different ones in a single block so searching through them linearly was
 much worse than doing a hashtable lookup, which usually hits in about 1.5
 lookups (albeit using a not-cheap hash function). But the quoted
 paragraph above says "average overhead is less than 1 lookup". What am I
 missing?
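To make the comparison concrete, here is a toy sketch of the two lookups I was comparing for 32-bit immediates (this is my own illustration, not Nanojit's actual code; all the names and sizes are made up): a linear scan over previously seen values versus an open-addressing hash table, both counting probes.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

enum { NIMM = 256, HASHSIZE = 512 };  /* HASHSIZE is a power of two */

/* (a) Linear list: scan every previously seen immediate. */
static int32_t imms[NIMM];
static size_t nimms;

static size_t find_linear(int32_t v, size_t *probes)
{
  for (size_t i = 0; i < nimms; i++) {
    (*probes)++;
    if (imms[i] == v) return i;
  }
  imms[nimms] = v;       /* not found: record it */
  return nimms++;
}

/* (b) Open-addressing hash table keyed on the immediate. */
static struct { int32_t v; int used; } table[HASHSIZE];

static uint32_t hash32(uint32_t v)   /* a not-cheap mixing hash */
{
  v *= 2654435761u;
  v ^= v >> 16;
  return v;
}

static size_t find_hash(int32_t v, size_t *probes)
{
  size_t i = hash32((uint32_t)v) & (HASHSIZE - 1);
  for (;;) {
    (*probes)++;
    if (!table[i].used) { table[i].v = v; table[i].used = 1; return i; }
    if (table[i].v == v) return i;
    i = (i + 1) & (HASHSIZE - 1);    /* linear probing */
  }
}
```

With dozens of distinct immediates in a block, the linear scan averages on the order of n/2 probes per lookup, while the hash table usually hits in one or two probes, which matches the ~1.5-lookup figure I measured.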
Any extra info about this would be much appreciated! Thanks.
Nick
