Re: [Benchmark] Chain calls


Subject: Re: [Benchmark] Chain calls
From: "Alexander Gladysh" <agladysh@...>
Date: 24 Nov 2008 23:13:38 +0300

On Mon, Nov 17, 2008 at 3:38 PM, Peter Cawley <lua@corsix.org> wrote:
> It may be worth looking at the generated Lua opcodes for these benchmarks in
> order to easier see the differences in what is happening in each. For
> example, return true v.s return nil are loadbool,return vs. loadnil,return.
> Then looking at the VM code for these operations, either in C or as the
> assembled output of the C, might make it clearer. Of course, this won't help
> with explaining the luajit results, as it skips the VM when JITing.
Sorry for the late reply.
Opcode listing (via luac -l -l) is indeed very helpful. Chained calls
execute fewer instructions, since they do not need an extra MOVE opcode
before each call:
local function chain_local()
 local chain = chain
 chain () () () () () () () () () () -- 10 calls
end
function <chaincallbench2.lua:9,12> (13 instructions, 52 bytes at 0x100fb0)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[10]	GETUPVAL 	0 0	; chain
	2	[11]	MOVE 	1 0
	3	[11]	CALL 	1 1 2
	4	[11]	CALL 	1 1 2
	5	[11]	CALL 	1 1 2
	6	[11]	CALL 	1 1 2
	7	[11]	CALL 	1 1 2
	8	[11]	CALL 	1 1 2
	9	[11]	CALL 	1 1 2
	10	[11]	CALL 	1 1 2
	11	[11]	CALL 	1 1 2
	12	[11]	CALL 	1 1 1
	13	[12]	RETURN 	0 1
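For the chained form to run at all, chain has to return something
callable. The benchmark file itself is not included in this message, so
the definition below is only my guess at a minimal one (a function that
returns itself), not necessarily what chaincallbench2.lua contains:

```lua
-- Assumed definition: the real benchmark source is not shown in this message.
local function chain()
  return chain   -- return itself so that the result can be called again
end

local function chain_local()
  local chain = chain                 -- copy the upvalue into a local once
  chain () () () () () () () () () () -- 10 calls
end

chain_local()
```

With such a definition each CALL leaves its result in the same register
that held the function, so the next CALL can use it directly.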
By contrast, plain_local and plain_chain_local both require a MOVE
before every call to fetch the function to call:
local function plain_local()
 local plain = plain
 plain ()
 ...
 plain () -- 10 calls
end
local function plain_chain_local()
 local chain = chain
 chain ()
 ...
 chain () -- 10 calls
end
function <chaincallbench2.lua:14,26> (22 instructions, 88 bytes at 0x101190)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[15]	GETUPVAL 	0 0	; plain
	2	[16]	MOVE 	1 0
	3	[16]	CALL 	1 1 1
	4	[17]	MOVE 	1 0
	5	[17]	CALL 	1 1 1
	6	[18]	MOVE 	1 0
	7	[18]	CALL 	1 1 1
	8	[19]	MOVE 	1 0
	9	[19]	CALL 	1 1 1
	10	[20]	MOVE 	1 0
	11	[20]	CALL 	1 1 1
	12	[21]	MOVE 	1 0
	13	[21]	CALL 	1 1 1
	14	[22]	MOVE 	1 0
	15	[22]	CALL 	1 1 1
	16	[23]	MOVE 	1 0
	17	[23]	CALL 	1 1 1
	18	[24]	MOVE 	1 0
	19	[24]	CALL 	1 1 1
	20	[25]	MOVE 	1 0
	21	[25]	CALL 	1 1 1
	22	[26]	RETURN 	0 1
function <chaincallbench2.lua:28,40> (22 instructions, 88 bytes at 0x101460)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[29]	GETUPVAL 	0 0	; chain
	2	[30]	MOVE 	1 0
	3	[30]	CALL 	1 1 1
	4	[31]	MOVE 	1 0
	5	[31]	CALL 	1 1 1
	6	[32]	MOVE 	1 0
	7	[32]	CALL 	1 1 1
	8	[33]	MOVE 	1 0
	9	[33]	CALL 	1 1 1
	10	[34]	MOVE 	1 0
	11	[34]	CALL 	1 1 1
	12	[35]	MOVE 	1 0
	13	[35]	CALL 	1 1 1
	14	[36]	MOVE 	1 0
	15	[36]	CALL 	1 1 1
	16	[37]	MOVE 	1 0
	17	[37]	CALL 	1 1 1
	18	[38]	MOVE 	1 0
	19	[38]	CALL 	1 1 1
	20	[39]	MOVE 	1 0
	21	[39]	CALL 	1 1 1
	22	[40]	RETURN 	0 1
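The instruction counts in the three listings follow a simple pattern. As
a small sketch (the helper names are made up for illustration), for n
calls inside one function:

```lua
-- Instruction counts for n calls, read off the luac listings above.
-- Both helper names are hypothetical.
local function chained_instructions(n)
  return 1 + 1 + n + 1   -- GETUPVAL, a single MOVE, n CALLs, RETURN
end

local function separate_instructions(n)
  return 1 + 2 * n + 1   -- GETUPVAL, n (MOVE, CALL) pairs, RETURN
end

print(chained_instructions(10))   -- 13, as in the chain_local listing
print(separate_instructions(10))  -- 22, as in the other two listings
```

So the chained form saves n - 1 MOVE instructions per function body.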
Note that in versions without upvalue caching, MOVE is replaced with
GETUPVAL. From a quick look at the Lua source, MOVE *looks* a bit faster
because it involves fewer lookups:
 case OP_MOVE: {
   setobjs2s(L, ra, RB(i));
   continue;
 }
 case OP_GETUPVAL: {
   int b = GETARG_B(i);
   setobj2s(L, ra, cl->upvals[b]->v);
   continue;
 }
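For reference, a version without upvalue caching would look roughly like
this. The name plain_upvalue and the empty plain are assumptions for
illustration only; running luac -l -l on it should show whether each
call is preceded by GETUPVAL rather than MOVE:

```lua
-- Assumed no-op callee; the original definition is not shown in this message.
local function plain() end

-- No "local plain = plain" here, so plain stays an upvalue of this
-- function and has to be fetched again before every call.
local function plain_upvalue()   -- hypothetical name
  plain ()
  plain ()
  plain ()
end

plain_upvalue()
```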
Still, the difference is in tenths of microseconds, and it looks like
both of my benchmark runs used too few iterations to be trusted (only
seconds in total time)...
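A longer run could be set up along these lines. This is only a sketch:
bench, ITERATIONS and the self-returning chain are my assumptions here,
not the code that produced the earlier numbers.

```lua
-- Assumed stand-in for the benchmarked function (not the original file).
local function chain() return chain end

local function chain_local()
  local chain = chain
  chain () () () () () () () () () () -- 10 calls
end

local function plain_chain_local()
  local chain = chain
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain ()
  chain () -- 10 calls
end

-- Hypothetical helper: run fn the given number of times and report
-- CPU time as measured by os.clock.
local function bench(name, fn, iterations)
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  local elapsed = os.clock() - start
  print(string.format("%-18s %8.3f s total  %8.4f us/iteration",
                      name, elapsed, elapsed / iterations * 1e6))
  return elapsed
end

local ITERATIONS = 1e6  -- raise this until the totals are long enough to trust
bench("chain_local", chain_local, ITERATIONS)
bench("plain_chain_local", plain_chain_local, ITERATIONS)
```

Repeating each run several times and comparing the spread between runs
would show whether a difference this small is above the noise.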
Alexander.
