Re: [Benchmark] Chain calls

On Mon, Nov 17, 2008 at 3:38 PM, Peter Cawley <lua@corsix.org> wrote:
> It may be worth looking at the generated Lua opcodes for these benchmarks in
> order to easier see the differences in what is happening in each. For
> example, return true v.s return nil are loadbool,return vs. loadnil,return.
> Then looking at the VM code for these operations, either in C or as the
> assembled output of the C, might make it clearer. Of course, this won't help
> with explaining the luajit results, as it skips the VM when JITing.
Sorry for the late reply.
The opcode listing (via luac -l -l) is indeed very helpful. Chained
calls execute fewer instructions, since they do not require extra MOVE
opcodes:
local function chain_local()
 local chain = chain
 chain () () () () () () () () () () -- 10 calls
end
function <chaincallbench2.lua:9,12> (13 instructions, 52 bytes at 0x100fb0)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[10]	GETUPVAL 	0 0	; chain
	2	[11]	MOVE 	1 0
	3	[11]	CALL 	1 1 2
	4	[11]	CALL 	1 1 2
	5	[11]	CALL 	1 1 2
	6	[11]	CALL 	1 1 2
	7	[11]	CALL 	1 1 2
	8	[11]	CALL 	1 1 2
	9	[11]	CALL 	1 1 2
	10	[11]	CALL 	1 1 2
	11	[11]	CALL 	1 1 2
	12	[11]	CALL 	1 1 1
	13	[12]	RETURN 	0 1
Whereas plain_local and plain_chain_local both require a MOVE before
each call to fetch the function to call:
local function plain_local()
 local plain = plain
 plain ()
 ...
 plain () -- 10 calls
end
local function plain_chain_local()
 local chain = chain
 chain ()
 ...
 chain () -- 10 calls
end
function <chaincallbench2.lua:14,26> (22 instructions, 88 bytes at 0x101190)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[15]	GETUPVAL 	0 0	; plain
	2	[16]	MOVE 	1 0
	3	[16]	CALL 	1 1 1
	4	[17]	MOVE 	1 0
	5	[17]	CALL 	1 1 1
	6	[18]	MOVE 	1 0
	7	[18]	CALL 	1 1 1
	8	[19]	MOVE 	1 0
	9	[19]	CALL 	1 1 1
	10	[20]	MOVE 	1 0
	11	[20]	CALL 	1 1 1
	12	[21]	MOVE 	1 0
	13	[21]	CALL 	1 1 1
	14	[22]	MOVE 	1 0
	15	[22]	CALL 	1 1 1
	16	[23]	MOVE 	1 0
	17	[23]	CALL 	1 1 1
	18	[24]	MOVE 	1 0
	19	[24]	CALL 	1 1 1
	20	[25]	MOVE 	1 0
	21	[25]	CALL 	1 1 1
	22	[26]	RETURN 	0 1
function <chaincallbench2.lua:28,40> (22 instructions, 88 bytes at 0x101460)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[29]	GETUPVAL 	0 0	; chain
	2	[30]	MOVE 	1 0
	3	[30]	CALL 	1 1 1
	4	[31]	MOVE 	1 0
	5	[31]	CALL 	1 1 1
	6	[32]	MOVE 	1 0
	7	[32]	CALL 	1 1 1
	8	[33]	MOVE 	1 0
	9	[33]	CALL 	1 1 1
	10	[34]	MOVE 	1 0
	11	[34]	CALL 	1 1 1
	12	[35]	MOVE 	1 0
	13	[35]	CALL 	1 1 1
	14	[36]	MOVE 	1 0
	15	[36]	CALL 	1 1 1
	16	[37]	MOVE 	1 0
	17	[37]	CALL 	1 1 1
	18	[38]	MOVE 	1 0
	19	[38]	CALL 	1 1 1
	20	[39]	MOVE 	1 0
	21	[39]	CALL 	1 1 1
	22	[40]	RETURN 	0 1
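As a sanity check that the two listings do the same work, here is a
minimal sketch. It assumes chain simply returns itself; the actual
definition from the benchmark is not quoted in this message, and
call_count is a hypothetical counter added only for illustration.

```lua
-- Assumed definition: chain returns itself so the result can be
-- called again. call_count is hypothetical, for illustration only.
local call_count = 0
local function chain()
  call_count = call_count + 1
  return chain
end

-- Chained form: one MOVE, then ten CALLs on the same register.
local function chain_local()
  local chain = chain
  chain () () () () () () () () () () -- 10 calls
end

-- Plain form: each statement needs its own MOVE before the CALL.
local function plain_chain_local()
  local chain = chain
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain () -- 10 calls
end

chain_local()
local chained_calls = call_count
call_count = 0
plain_chain_local()
local plain_calls = call_count
print(chained_calls, plain_calls) --> 10	10
```

Both forms invoke chain ten times; only the number of MOVE
instructions around the calls differs.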
Note that in the versions without upvalue caching (no local copy of
the function), MOVE is replaced with GETUPVAL. From a quick look at
the Lua source, MOVE *looks* a bit faster because it needs fewer
lookups:
      case OP_MOVE: {
        setobjs2s(L, ra, RB(i));
        continue;
      }
      case OP_GETUPVAL: {
        int b = GETARG_B(i);
        setobj2s(L, ra, cl->upvals[b]->v);
        continue;
      }
Still, the difference is in tenths of microseconds, and it looks like
both of my benchmark runs used too few iterations to be trusted (only
seconds of total run time)...
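A rough sketch of how a longer run could be timed, so that a
difference this small is not lost in the noise. The bench helper, the
noop function and the iteration count are all hypothetical and not
taken from the original benchmark; os.clock reports CPU time, which
is only an approximation of what a proper harness would measure.

```lua
-- Hypothetical timing helper: runs fn the given number of times and
-- returns the total CPU time plus the average time per iteration.
local function bench(fn, iterations)
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  local elapsed = os.clock() - start
  return elapsed, elapsed / iterations
end

-- Placeholder workload; substitute chain_local, plain_local, etc.
local function noop() end

local iterations = 1e6 -- assumed value; raise it until totals are stable
local total, per_iteration = bench(noop, iterations)
print(string.format("total %.3f s, %.3g s per iteration",
                    total, per_iteration))
```

Repeating each measurement several times and comparing the spread
between runs would show whether the iteration count is high enough.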
Alexander.
