Object Benchmark Tests

This is an evaluation of object access time for various approaches to object orientation in Lua. The primary concern was raw execution speed, not memory use or object creation time.

The Code

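-- Benchmarking support.
do
  -- Lua 5.2 and later replaced loadstring with load, which accepts strings.
  local loadstring = loadstring or load

  -- Compile the test snippet into its own chunk, so that count and ob are
  -- locals there, then print the CPU time spent running it count times.
  local function runbenchmark(name, code, count, ob)
    local f = assert(loadstring([[
      local count, ob = ...
      local clock = os.clock
      local start = clock()
      for i = 1, count do ]] .. code .. [[ end
      return clock() - start
    ]]))
    io.write(f(count, ob), "\t", name, "\n")
  end

  -- Registered tests, kept in insertion order. One record per test
  -- (rather than tables keyed by object) lets several tests share an object.
  local tests = {}
  function addbenchmark(name, code, ob)
    tests[#tests + 1] = {name = name, code = code, ob = ob}
  end
  function runbenchmarks(count)
    for _, t in ipairs(tests) do
      runbenchmark(t.name, t.code, count, t.ob)
    end
  end
end

-- Method stored in the object itself: each object gets its own test closure.
function makeob1()
  local self = {data = 0}
  function self:test() self.data = self.data + 1 end
  return self
end
addbenchmark("Standard (solid)", "ob:test()", makeob1())

-- Methods shared through the metatable's __index.
local ob2mt = {}
ob2mt.__index = ob2mt
function ob2mt:test() self.data = self.data + 1 end
function makeob2()
  return setmetatable({data = 0}, ob2mt)
end
addbenchmark("Standard (metatable)", "ob:test()", makeob2())

-- PiL 16.4: the method is a closure over self, called without the colon.
function makeob3()
  local self = {data = 0}
  function self.test() self.data = self.data + 1 end
  return self
end
addbenchmark("Object using closures (PiL 16.4)", "ob.test()", makeob3())

-- No self at all: the state is a private upvalue, only accessors are public.
function makeob4()
  local public = {}
  local data = 0
  function public.test() data = data + 1 end
  function public.getdata() return data end
  function public.setdata(d) data = d end
  return public
end
addbenchmark("Object using closures (noself)", "ob.test()", makeob4())

-- Baselines: field access without a call, and a plain local number.
addbenchmark("Direct Access", "ob.data = ob.data + 1", makeob1())
addbenchmark("Local Variable", "ob = ob + 1", 0)

-- Iteration count from the command line; defaults to 1e8.
runbenchmarks(tonumber((...)) or 100000000)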

The Results (current version)

These are the results for the current version. All times are CPU time in seconds, as reported by os.clock() (with sub-second resolution where the OS supports it), for 100 million iterations (1e8, the default).
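To put these totals in perspective, dividing by the iteration count gives the cost of a single iteration. A minimal helper for that conversion (the name ns_per_iteration is ours, not part of the benchmark), applied to the 2010 Lua 5.1.4 "Standard (solid)" result below:

```lua
-- Convert a total benchmark time (seconds) into nanoseconds per iteration.
-- count defaults to 1e8, the benchmark's default iteration count.
local function ns_per_iteration(seconds, count)
  count = count or 1e8
  return seconds / count * 1e9
end

-- 2010 Lua 5.1.4 "Standard (solid)": 14.08 s for 1e8 calls.
print(string.format("%.1f ns", ns_per_iteration(14.08)))  --> 140.8 ns
```

The figure includes the loop overhead itself, which is why the Local Variable row is useful as a baseline.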

| 2010-01-15 MikePall
| Intel Core2 Duo E8400 3.00GHz
| Linux x86, GCC 4.3.3 (-O2 -fomit-frame-pointer for both Lua and LuaJIT)
| Lua 5.1.4 (lua objbench.lua) vs.
| LuaJIT 1.1.5 (luajit -O objbench.lua) vs.
| LuaJIT 2.0.0-beta2 (lj2 objbench.lua)
  Lua    LJ1    LJ2
-----------------------------------------------------
14.08   2.16    0.1   Standard (solid)
14.92   4.62    0.1   Standard (metatable)
14.28   2.66    0.1   Object using closures (PiL 16.4)
 9.14   1.68    0.1   Object using closures (noself)
 7.30   1.10    0.1   Direct Access
 1.22   0.34    0.1   Local Variable
| 2008-04-16 MikePall
| Intel Core2 Duo E6420 2.13GHz
| Linux x86, GCC 4.1.2 (-O3 -fomit-frame-pointer for both Lua and LuaJIT)
| Lua 5.1.3 (lua objbench.lua) vs. LuaJIT 1.1.4 (luajit -O objbench.lua)
  Lua    LJ1
-----------------------------------------------------
17.93   3.11   Standard (solid)
20.36   6.25   Standard (metatable)
19.34   3.73   Object using closures (PiL 16.4)
12.76   2.23   Object using closures (noself)
 7.53   1.55   Direct Access
 2.59   0.47   Local Variable
| 2008-04-17 LeonardoMaciel
| Intel Core 2 Duo T7200 2.00 GHz
| WinXP, MSVC9 (VS 2008)
| Lua 5.1.3 (using luavs.bat) vs. LuaJIT 1.1.4 (using luavs.bat)
| [NOTE: this measurement probably didn't use luajit -O]
  Lua    LJ1
-----------------------------------------------------
17.52  10.78   Standard (solid)
19.74  12.55   Standard (metatable)
18.31  10.88   Object using closures (PiL 16.4)
14.20   5.09   Object using closures (noself)
 7.99   5.94   Direct Access
 1.70   0.41   Local Variable
| 2008-04-19 DougCurrie
| Pentium M 2.00 GHz
| WinXP, GCC 3.4.5 (mingw special)
| Lua 5.1.3 (from wxLua build) vs. LuaJIT 1.1.4 (luajit -O objbench.lua)
  Lua    LJ1
-----------------------------------------------------
28.68   4.76   Standard (solid)
31.23   9.49   Standard (metatable)
30.32   5.38   Object using closures (PiL 16.4)
19.60   3.27   Object using closures (noself)
12.47   2.26   Direct Access
 2.72   0.51   Local Variable
<--- Add your results here if you like.
<--- Please indicate the date and your name or wiki page.
<--- Add your CPU, OS, compiler and Lua and/or LuaJIT version.

The Results (old version)

These are the results for the old version of the benchmark, which lacked the "Local Variable" test and measured elapsed (wall-clock) time in seconds rather than CPU time.
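The difference between the two kinds of measurement can be seen by timing the same loop both ways. The source does not show how the old version took its timings; this sketch merely assumes os.time()/os.difftime() for elapsed time, whose resolution is usually one second, against os.clock() for CPU time:

```lua
-- Assumption: elapsed time via os.time/os.difftime (usually whole seconds),
-- CPU time via os.clock (finer resolution). Same loop as "Direct Access".
local count = 10000000
local ob = {data = 0}

local wall0, cpu0 = os.time(), os.clock()
for i = 1, count do ob.data = ob.data + 1 end
local elapsed = os.difftime(os.time(), wall0)  -- wall-clock seconds
local cpu = os.clock() - cpu0                  -- CPU seconds used by this process

print(string.format("elapsed %.0f s, CPU %.2f s", elapsed, cpu))
```

Elapsed time also counts time the process spent waiting for the CPU, so on a busy machine it can exceed the CPU time noticeably.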

Windows XP SP2, Intel P4 1.8A
Standard (solid) Time: 34
Standard (metatable) Time: 37
Object using closures (PiL 16.4) Time: 40
Object using closures (noself) Time: 29
Direct Access Time: 19

Windows XP x64 SP1 AMD Athlon64 3500+ (64-bit Lua)
Standard (solid) Time: 22
Standard (metatable) Time: 23
Object using closures (PiL 16.4) Time: 25
Object using closures (noself) Time: 18
Direct Access Time: 11

Windows Vista Ultimate (32-bit), AMD Athlon X2 4200+ (Vanilla Lua 5.1.1 / LuaJIT 1.1.2)
Standard (solid) Time: 26 / 11
Standard (metatable) Time: 29 / 15
Object using closures (PiL 16.4) Time: 30 / 12
Object using closures (noself) Time: 20 / 6
Direct Access Time: 13 / 8

Linux Xubuntu Feisty Fawn (32-bit), Intel P4 Celeron 2.4 GHz (Vanilla Lua 5.1.1)
Standard (solid) Time: 34
Standard (metatable) Time: 38
Object using closures (PiL 16.4) Time: 40
Object using closures (noself) Time: 25
Direct Access Time: 20

Windows XP Prof. SP2, Intel PIII 500 MHz (Vanilla Lua 5.1.1 / LuaJIT 1.1.2)
Standard (solid) Time: 133 / 60
Standard (metatable) Time: 146 / 76
Object using closures (PiL 16.4) Time: 147 / 64
Object using closures (noself) Time: 99 / 32
Direct Access Time: 67 / 36

Conclusion

Apart from the Local Variable baseline, Direct Access through a local reference to the table is by far the fastest approach, as expected. It serves as the reference point for the others.

The noself method is the second fastest, after Direct Access (ignoring the Local Variable baseline). It relies solely on closures and on locals declared inside the constructor, and returns nothing but a public interface. If the tests are modified to perform ten additions per method call, it can even beat Direct Access, because the function-call overhead is then spread over more work and each addition updates an upvalue rather than a table field. The noself and PiL 16.4 methods are the only two designs that can keep variables private.
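A hypothetical variant of makeob4 for the ten-additions experiment might look like the sketch below. The source does not show how that modified test was written, so unrolling the additions by hand, and the name makeob4x10, are assumptions. The sketch also shows the privacy property: data is reachable only through the closures, not as a table field.

```lua
-- Hypothetical makeob4 variant: ten additions per call (unrolled by hand).
local function makeob4x10()
  local public = {}
  local data = 0  -- private: only the closures below can reach it
  function public.test()
    data = data + 1; data = data + 1; data = data + 1; data = data + 1; data = data + 1
    data = data + 1; data = data + 1; data = data + 1; data = data + 1; data = data + 1
  end
  function public.getdata() return data end
  return public
end

local ob = makeob4x10()
ob.test()
print(ob.getdata())  --> 10
print(ob.data)       --> nil (the counter is not a table field)
```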

The remaining methods are all slower than Direct Access. The metatable method adds further overhead, since every call needs an extra lookup through the metatable's __index, but it remains comparable in speed to the PiL 16.4 method.
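The extra lookup can be made visible with rawget, which bypasses metatables. This sketch rebuilds the ob2mt setup from the benchmark code alongside a "solid" object like the one makeob1 returns:

```lua
-- Metatable-based object, set up as in the benchmark's makeob2.
local ob2mt = {}
ob2mt.__index = ob2mt
function ob2mt:test() self.data = self.data + 1 end
local ob = setmetatable({data = 0}, ob2mt)

-- "Solid" object, as in makeob1: the method lives in the object itself.
local solid = {data = 0}
function solid:test() self.data = self.data + 1 end

print(rawget(solid, "test") ~= nil)  --> true: found by the first lookup
print(rawget(ob, "test"))            --> nil: not stored in the object...
print(ob.test == ob2mt.test)         --> true: ...so Lua falls back to __index
ob:test()
print(ob.data)                       --> 1
```

Every ob:test() call on the metatable object therefore misses once in the object before finding the method in ob2mt.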

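The method from PiL 16.4 adds the advantage of private scope and uses a closure to capture self, but it is slower than a standard method call with a proper self (the colon syntax), which Lua optimizes. The gap is large in the old results (40 vs. 34 seconds on the P4) and small in the current Lua 5.1.4 run (14.28 vs. 14.08 seconds).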
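The two designs obtain self in different ways, which this sketch shows directly by rebuilding makeob1 and makeob3 from the benchmark code as locals. It illustrates the calling convention only and makes no timing claim.

```lua
-- Standard object, as in makeob1: test receives the object as self.
local function makeob1()
  local self = {data = 0}
  function self:test() self.data = self.data + 1 end
  return self
end

-- PiL 16.4 style, as in makeob3: test is a closure that captures self.
local function makeob3()
  local self = {data = 0}
  function self.test() self.data = self.data + 1 end
  return self
end

local a, b = makeob1(), makeob3()
a:test()   -- sugar for a.test(a): the object is passed as an argument
b.test()   -- no argument: self is an upvalue of the closure

-- A detached closure still reaches its object...
local f = b.test
f()
print(b.data)   --> 2

-- ...but a colon-style method called without an object gets self == nil.
local g = a.test
local ok = pcall(g)
print(ok)       --> false
print(a.data)   --> 1
```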

One last thing to note: an early version of the benchmarking code did not take a local reference to the objects, but looked them up in the global table on every iteration. This added a few seconds to every test and hurt Direct Access most: its time doubled.
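The effect can be reproduced with a small standalone comparison. This is a sketch under the assumption that the early version referred to the object through a global named ob; the bench helper is hypothetical and not part of the benchmark code, though it prints in the same "time, tab, name" format.

```lua
-- Assumption: the early version referenced the object through a global.
ob = {data = 0}  -- global variable: every use is a lookup in the globals table

local function bench(name, loop, count)
  local start = os.clock()
  loop(count)
  print(string.format("%.2f\t%s", os.clock() - start, name))
end

local count = 10000000

bench("Direct Access (global ob)", function(n)
  for i = 1, n do ob.data = ob.data + 1 end
end, count)

bench("Direct Access (local ob)", function(n)
  local ob = ob  -- fetch the global once; the loop then uses a local
  for i = 1, n do ob.data = ob.data + 1 end
end, count)
```

Timings depend on the machine and Lua version, so no figures are given here; the point is that the second loop performs no global lookups at all.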

Hope this helps -- AA

Feel free to add your own results above if they would be useful. Please run the benchmark three times and report the average.

See Also