lua-users home
lua-l archive

Re: LuaJIT2 performance for number crunching

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Hi all,
I have some new very interesting results. I've implemented a pure Lua
adaptive integration routine for numerical integration, the QAG
routine.
I've made an adaptation of the same routine implemented in the GSL
library which is based itself of the quadpack implementation. The
translation in Lua was quite straightforward. I've used also the
template engine but in this case it was not really essential.
For information, I've made also an adaptation of the non-adaptive
Gauss-Kronrod integration routine but for this latter I don't have any
benchmark.
Here the result of the benchmark for the QAG adaptive routine:
C code / GSL (-O2 -mfpmath=sse -march=native) : 0m0.891s
LuaJIT2 (git HEAD): 0m0.438
The results obtained with LuaJIT2 are also completely accurate.
So, once again LuaJIT2 confirms à factor 2x or even slightly more over
a conventional optimised C implementation. For the other side on my
Linux box the results are a little bit different (same benchmark as
before but with 8 times more points):
C code / GSL (-O2 -mfpmath=sse -march=native) : 0m1.281s
LuaJIT2 (git HEAD): 0m1.310
So on Linux the C optimized code is performing slightly better. May be
it is because the gsl library have better optimization or may be
because the GCC with ubuntu is better ?
In any case I consider the results completely satisfying because in
the worst case LuaJIT2 is performing almost at well as C optimized
code. It is also true that for the QAG algorithm I was not able to do
any significant optimization over the C implementations like I've done
with the ODE RKF45 algorithm.
Note also that in this case the routine was performing well since the
beginning. This is probably because the routine is simpler then the
ODE one in array form. I've also respected Mike's guidelines to help
the JIT compiler to produce optimal code.
I'm very happy of this result as this confirm yet again that LuaJIT2
is a viable and performant alternative for numerical computations.
I include the files in attachement for the curious, they work with a
plain LuaJIT2. The implementation files for the QAG algorithm are
qag.lua.in, err.lua.in and gauss-kronrod-x-wgs.lua.in . Then I include
also an updated version of the template module and the benchmark I've
used in Lua and C.
I will take this opportunity to talk a little bit about GSL shell, if
someone is interested. The latest fixes of LuaJIT2 are already
included in GSL shell. With GSL Shell you have also the realine
support plus some other goodies like a graphical systems, for example
;)
There is also another major improvements in GSL Shell because now it
is 100% compatible with Lua 5.1. I've managed to get rid of the LNUM
patch so now it can works both with standard Lua or with LuaJIT2. I
don't rely anymore in the axact behaviour of the GC for the weak
tables so I don't need any special patch or workaround. I've also
enabled a more clean modular architecture, now you can choose
LUA_BUILD in the makeconfig file and every functions will go in a
specific namespace without polluting the global namespace. I have a
namespace for GSL function, one for complex number and another one for
graphical functions. If you choose the LUA_BUILD option the 'short
function syntax' is not enabled so that Lua purists are happy :-)
For the complex number I've a userdata implementation based on LHF's
implementation (heavily modified). Complex number specific function
goes is a specific namespace to not pollute the global namespace.
In addition I've made some important improvements is the arithmetic of
matrix and complex numbers. Now the '*' operator perform the matrix
multiplication, instead of the elemet-wise multiplication. I think
that this latter behaviour is much more useful. The matrices are now
automatically promoted to complex when needed and the matrix functions
handle transparently both real and complex matrix. So now you can also
mix freely real and complex numbers with real and complex matrix and
GSL shell will do 'the right thing (R)' for you :-)
I will not probably release a new version of GSL Shell very soon
because there is a lot of work in progress with LuaJIT2 but if you
checkout the 'luajit2' branch it is already there and fully
functional. In the 'rk4' branch I have also all the numerical
algorithm I'm implementing right now.
Francesco
--
-- A Lua preprocessor for template code specialization.
-- Adapted by Steve Donovan, based on original code of Rici Lake.
--
local M = {}
-------------------------------------------------------------------------------
local function preprocess(chunk, name, defs)
 local function parseDollarParen(pieces, chunk, s, e)
 local append, format = table.insert, string.format
 local s = 1
 for term, executed, e in chunk:gmatch("()$(%b())()") do
	 append(pieces,
		format("%q..(%s or '')..", chunk:sub(s, term - 1), executed))
	 s = e
 end
 append(pieces, format("%q", chunk:sub(s)))
 end
 local function parseHashLines(chunk)
 local append = table.insert
 local pieces, s, args = chunk:find("^\n*#ARGS%s*(%b())[ \t]*\n")
 if not args or find(args, "^%(%s*%)$") then
	 pieces, s = {"return function(_put) ", n = 1}, s or 1
 else
	 pieces = {"return function(_put, ", args:sub(2), n = 2}
 end
 while true do
	 local ss, e, lua = chunk:find("^#+([^\n]*\n?)", s)
	 if not e then
	 ss, e, lua = chunk:find("\n#+([^\n]*\n?)", s)
	 append(pieces, "_put(")
	 parseDollarParen(pieces, chunk:sub(s, ss))
	 append(pieces, ")")
	 if not e then break end
	 end
	 append(pieces, lua)
	 s = e + 1
 end
 append(pieces, " end")
 return table.concat(pieces)
 end
 local ppenv
 if defs._self then
 ppenv = defs._self
 else
 ppenv = {string= string, table= table, template= M}
 for k, v in pairs(defs) do ppenv[k] = v end
 ppenv._self = ppenv
 local include = function(filename)
			 return M.process(filename, ppenv)
		 end
 setfenv(include, ppenv)
 ppenv.include = include
 end
 local code = parseHashLines(chunk)
 local fcode = loadstring(code, name)
 if fcode then
 setfenv(fcode, ppenv)
 return fcode()
 end
end
local function read_file(filename)
 local f = io.open(filename)
 if not f then 
 error(string.format('error opening template file %s', filename))
 end
 local content = f:read('*a')
 f:close()
 return content
end
local function process(filename, defs)
 local template = read_file(filename)
 local codegen = preprocess(template, 'template_gen', defs)
 local code = {}
 local add = function(s) code[#code+1] = s end
 codegen(add)
 return table.concat(code)
end
local function require(filename)
 local f = loadstring(process(filename .. '.lua.in', {}), filename)
 if not f then error(string.format('error loading module %s', filename)) end
 return f()
end
local function load(filename, defs)
 local f = loadstring(process(filename, defs), filename)
 if not f then error(string.format('error loading module %s', filename)) end
 return f()
end
M.process = process
M.require = require
M.load = load
return M
local template = require 'template'
local q = template.load('num/qag.lua.in', {limit=1000, order=21})
local sin, cos, pi = math.sin, math.cos, math.pi
local epsabs, epsrel = 1e-6, 0.01
function bessel_gen(n)
 local xs
 local fint = function(t) return cos(n*t - xs*sin(t)) end
 return function(x)
	 xs = x
	 return q(fint, 0, pi, epsabs, epsrel)
	 end
end
local J12 = bessel_gen(12)
-- p = graph.fxplot(J12, 0, 30*pi)
local xold, xsmp = -100, 10
for k = 1, 4096*8 do
 local x = (k-1) * 30 * math.pi / (4096*8)
 local y = J12(x);
 if x - xold > xsmp then
 print(x, y)
 xold = x
 end
end
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_integration.h>
struct bessel_param {
 double x;
 int n;
};
double f (double t, void * params) {
 struct bessel_param *p = (struct bessel_param *) params;
 double x = p->x;
 int n = p->n;
 return cos(n * t - x * sin(t));
}
int
main (void)
{
 gsl_integration_workspace * ws = gsl_integration_workspace_alloc (1000);
 struct bessel_param param = {0.0, 12};
 double result, error;
 double xold = -100, xsmp = 10;
 int k;
 gsl_function F;
 F.function = &f;
 F.params = &param;
 for (k = 0; k < 4096 * 8; k++)
 {
 param.x = (k * 30 * M_PI) / (4096 * 8);
 gsl_integration_qag (&F, 0, M_PI, 1e-6, 1e-4, 1000, GSL_INTEG_GAUSS21, ws, &result, &error);
 if (param.x - xold > xsmp)
	{
	 xold = param.x;
	 printf ("%.18f %.18f\n", param.x, result);
	}
 }
 gsl_integration_workspace_free (ws);
 return 0;
}

Attachment: qag.lua.in
Description: Binary data

Attachment: err.lua.in
Description: Binary data

Attachment: gauss-kronrod-x-wgs.lua.in
Description: Binary data


AltStyle によって変換されたページ (->オリジナル) /