Re: LuaJIT2 performance for number crunching

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: LuaJIT2 performance for number crunching
From: T T <t34www@...>
Date: 2011年2月17日 14:02:51 +0000

On 17 February 2011 10:52, steve donovan <steve.j.donovan@gmail.com> wrote:
> On Thu, Feb 17, 2011 at 12:26 PM, T T <t34www@googlemail.com> wrote:
>> I can't understand why so many people here are so fixated on this
>> templating idea without exploring simpler possibilities.  Have you
>> missed my example with unrolling?
>
> Sorry, I did look at it now.  So the idea is that there are a few
> predefined dimensions, which are 'conditionally' unrolled.
>
> So the dim >= 2  etc tests are pretty efficient? That is cool...
>From what I understood, the problem really comes down to small loops,
which are not jitted at all. Since, conditionals will get jitted, you
won't get that much of a performance hit with them.
> Naturally, I think in terms of program textual transformation, as one
> way to bridge the gap between desired code style and the current needs
> of the optimizer.  That manual unrolling seems like a good candidate
> for automatic transformation. As Agent Smith once said, don't send in
> a human to do a machine's job.
My thoughts exactly, but maybe it's not that simple. Anyway, I didn't
do this unrolling completely manually. I used luapp.lua (
http://lua-users.org/wiki/SlightlyLessSimpleLuaPreprocessor ) with a
template (attached, if anyone wants to play with it further; I think
it makes for a nice test case).
> OK, it could be argued that writing good numerical code is hard, but
> only needs to be written once, and then put into a library... so I do
> see the point.  Text templates with [[...]] are not ideal, you lose
> syntax highlighting and compile-time checking.
Not to mention that it takes more brain power to develop such a
templated code (it took me way more time for this simple example than
I expected) and, once you hand over your code, others have to follow
all these idiosyncrasies.
Cheers,
Tomek

# MAXUNROLL = tonumber(os.getenv('MAXUNROLL')) or 4
#
# LPP = require('luapp').preprocess
#
# function loop_unroll(CODE)
# local prefix = string.match(CODE, "%s+") or ""
# local _L = {MAXUNROLL=MAXUNROLL}
# CODE = string.gsub(CODE, "%[i%]", "[$(i)]")
# _L.i = 'i'
# _put(
# prefix, "", LPP{ input="if dim > $(MAXUNROLL) then", output='string', lookup=_L }, "\n",
# prefix, " ", LPP{ input= "for $(i) = 1,dim do", output='string', lookup=_L }, "\n",
# prefix, " ", LPP{ input=CODE, output='string', lookup=_L }, "\n",
# prefix, " end", "\n",
# prefix, "else", "\n")
# for i = 1,MAXUNROLL do
# _L.i = i
# _put(
# prefix, " ", LPP{ input="if dim >= $(i) then", output='string', lookup=_L }, "\n",
# prefix, " ", LPP{ input=CODE, output='string', lookup=_L }, "\n",
# prefix, " end", "\n")
# end
# _put(
# prefix, "end", "\n")
# end
use = 'Lua'
if use == 'FFI' then
 ffi = require 'ffi'
 darray = ffi.typeof("double[?]")
elseif use == 'GSL' then
 darray = function(n) return new(n, 1) end
else
 darray = function(n) local t = {}; for k = 1,n do t[k] = 0; end; return t end
end
rk4 = {}
function rk4.new(n)
 local s = {
 k= darray(n+1), 
 k1= darray(n+1),
 y0= darray(n+1),
 ytmp= darray(n+1),
 y_onestep= darray(n+1),
 dim = n
 }
 return s
end
function rk4.step(y, state, h, t, sys)
 -- Makes a Runge-Kutta 4th order advance with step size h.
 local dim = state.dim
 local f = sys.f
 -- initial values of variables y.
 local y0 = state.y0
 
 -- work space 
 local ytmp = state.ytmp
 -- Runge-Kutta coefficients. Contains values of coefficient k1
 -- in the beginning 
 local k = state.k
 -- k1 step 
# loop_unroll( " y[i] = y[i] + h / 6 * k[i]; ytmp[i] = y0[i] + 0.5 * h * k[i]" )
 
 -- k2 step
 f(t + 0.5 * h, ytmp, k)
# loop_unroll( " y[i] = y[i] + h / 3 * k[i]; ytmp[i] = y0[i] + 0.5 * h * k[i]" )
 -- k3 step 
 f(t + 0.5 * h, ytmp, k)
# loop_unroll( " y[i] = y[i] + h / 3 * k[i]; ytmp[i] = y0[i] + 0.5 * h * k[i]" )
 -- k4 step 
 f(t + h, ytmp, k)
# loop_unroll( " y[i] = y[i] + h / 6 * k[i]" )
end
function rk4.apply(state, t, h, y, yerr, dydt_in, dydt_out, sys)
 local f, dim = sys.f, state.dim
 local k, k1, y0, y_onestep = state.k, state.k1, state.y0, state.y_onestep
# loop_unroll( " y0[i] = y[i]" )
 if dydt_in then 
# loop_unroll( " k[i] = dydt_in[i]" )
 else 
 f(t, y0, k)
 end
 -- Error estimation is done by step doubling procedure 
 -- Save first point derivatives
# loop_unroll( " k1[i] = k[i]" )
 -- First traverse h with one step (save to y_onestep) 
# loop_unroll( " y_onestep[i] = y[i]" )
 rk4.step (y_onestep, state, h, t, sys)
 -- Then with two steps with half step length (save to y) 
# loop_unroll( " k[i] = k1[i]" )
 rk4.step(y, state, h/2, t, sys)
 -- Update before second step 
 f(t + h/2, y, k)
 
 -- Save original y0 to k1 for possible failures 
# loop_unroll( " k1[i] = y0[i]" )
 -- Update y0 for second step 
# loop_unroll( " y0[i] = y[i]" )
 rk4.step(y, state, h/2, t + h/2, sys)
 -- Derivatives at output
 if dydt_out then f(t + h, y, dydt_out) end
 
 -- Error estimation
 --
 -- yerr = C * 0.5 * | y(onestep) - y(twosteps) | / (2^order - 1)
 --
 -- constant C is approximately 8.0 to ensure 90% of samples lie within
 -- the error (assuming a gaussian distribution with prior p(sigma)=1/sigma.)
# loop_unroll( " yerr[i] = 4 * (y[i] - y_onestep[i]) / 15" )
end
function f_ode1(t, y, dydt)
 local p, q = y[1], y[2]
 dydt[1] = - q - p^2
 dydt[2] = 2*p - q^3
end
t0, t1, h0 = 0, 200, 0.001
function do_rk(p0, q0, sample, dim)
-- local dim = tonumber(os.getenv('dim') or 2)
 local state = rk4.new(dim)
 local y, dydt, yerr = darray(dim+1), darray(dim+1), darray(dim+1)
 local sys = {f = f_ode1}
 y[1], y[2] = p0, q0
 local t = t0
 local tsamp = t0
 rk4.apply(state, t, h0, y, yerr, nil, dydt, sys)
 t = t + h0
 while t < t1 do
 rk4.apply(state, t, h0, y, yerr, dydt, dydt, sys)
 t = t + h0
 if sample and t - tsamp > sample then
 print(t, y[1], y[2])
 tsamp = t
 end
 end
 print(t, y[1], y[2])
end
for k=1, 10 do
 local th = math.pi/4 -- *(k-1)/5
 local p0, q0 = math.cos(th), math.sin(th)
 local dim = tonumber(os.getenv('dim') or 2)
 do_rk(p0, q0, sample, dim)
end

References:
- Re: LuaJIT2 performance for number crunching, Florian Weimer
- Re: LuaJIT2 performance for number crunching, Francesco Abbate
- Re: LuaJIT2 performance for number crunching, Leo Razoumov
- Re: LuaJIT2 performance for number crunching, Francesco Abbate
- Re: LuaJIT2 performance for number crunching, Leo Razoumov
- Re: LuaJIT2 performance for number crunching, Francesco Abbate
- Re: LuaJIT2 performance for number crunching, Leo Razoumov
- Re: LuaJIT2 performance for number crunching, Francesco Abbate
- Re: LuaJIT2 performance for number crunching, T T
- Re: LuaJIT2 performance for number crunching, Daurnimator
- Re: LuaJIT2 performance for number crunching, Mike Pall
- Re: LuaJIT2 performance for number crunching, Philippe Lhoste
- Re: LuaJIT2 performance for number crunching, Alexander Gladysh
- Re: LuaJIT2 performance for number crunching, Miles Bader
- Re: LuaJIT2 performance for number crunching, T T
- Re: LuaJIT2 performance for number crunching, steve donovan
- Re: LuaJIT2 performance for number crunching, T T
- Re: LuaJIT2 performance for number crunching, steve donovan

Prev by Date: Re: [5.1.4] Can't navigate in interpreter
Next by Date: Re: Lua Workshop 2011
Previous by thread: Re: LuaJIT2 performance for number crunching
Next by thread: Re: LuaJIT2 performance for number crunching
Index(es):
- Date
- Thread