Re: LuaJIT2 performance for number crunching


Subject: Re: LuaJIT2 performance for number crunching
From: KHMan <keinhong@...>
Date: 16 Feb 2011 22:01:19 +0800

On 2/16/2011 9:36 PM, Francesco Abbate wrote:

2011/2/16 KHMan <keinhong@gmail.com>:

Sorry to barge in, it is a worrying difference. One thing is bugging me: Is
the C code running SSE2? IIRC gcc -O2 does not normally enable SSE2.

Hmmm, I have to confess that I don't have a very deep knowledge of
SSE-related optimization flags. My approach was quite naive: I use
standard optimization flags like "-O2" or "-O2 -fomit-frame-pointer"
and I let gcc do its work. My idea is quite simple: I want to
compare optimized C code with LuaJIT2, and by "optimized" I just mean
"standard optimizations".
On the other hand, I guess your remark is good: to be completely fair
the benchmark should include the best possible optimization flags.
Probably I should use "-march=native"; I believe this is activated by
default in Ubuntu.

Always check gcc -v

Ubuntu is *very* conservative. My Ubuntu 8.04 vanilla gcc installation is saying something about i486... I don't think they will ever err on the side of native processor checks.
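
Besides reading the output of gcc -v, one can ask the preprocessor what the compiler is actually targeting. The following is only a sketch in C, assuming GCC's predefined macros __SSE2_MATH__ (defined when double arithmetic is done in SSE registers) and __i386__, plus the C99 macro FLT_EVAL_METHOD from <float.h>. The function names fp_math_backend and report_fp_setup are made up for this illustration.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: reports which unit the compiler was told to use
   for double arithmetic, based on GCC's predefined macros. */
const char *fp_math_backend(void)
{
#if defined(__SSE2_MATH__)
    return "sse2";      /* e.g. x86-64 default, or -msse2 -mfpmath=sse */
#elif defined(__i386__)
    return "x87";       /* 32-bit x86 without SSE2 math */
#else
    return "other";     /* not x86, or a compiler without these macros */
#endif
}

/* Prints the backend and the C99 evaluation method:
   0 = evaluate in the declared type, 2 = evaluate in long double. */
void report_fp_setup(void)
{
    printf("fp math backend: %s\n", fp_math_backend());
#ifdef FLT_EVAL_METHOD
    printf("FLT_EVAL_METHOD: %d\n", (int)FLT_EVAL_METHOD);
#endif
}
```

Calling report_fp_setup() from main and building once with plain -O2 and once with -O2 -msse2 -mfpmath=sse should show whether the flags change anything on a given installation.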

Otherwise there are some flags that maybe you
should not activate with GSL, so as not to degrade the accuracy. For example
I know that you cannot use -ffast-math, and I don't know if you can use
-mfpmath=sse because, if I understood correctly, with SSE you don't
have the extra precision of 80-bit wide numbers and this can
potentially degrade the accuracy.

I am not familiar with GSL or the thing being benchmarked so I can't comment on that. I only hope you can avoid making it fragile if you can. Anyhow, lots of supercomputer people run BLAS or the Goto stuff and they seem to be happy with SSE*.

IIRC, wasn't LuaJIT using SSE2 for floating point? (I haven't checked the sources, I'm not totally sure of this but I believe I've read it before.)
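
The 80-bit question can be made concrete with a small probe. This sketch is not taken from GSL or from the benchmark; it only illustrates how x87 extended intermediates and SSE2 double arithmetic can give different answers for the same expression. The function name intermediate_has_extra_range is invented here, and the behaviour described in the comments assumes IEEE doubles and no -ffast-math.

```c
#include <float.h>

/* Returns 1 if x * 10.0 / 10.0 survives for x == DBL_MAX, which can
   happen when the intermediate product is kept in a wider format
   (as x87 registers can do), and 0 if the product overflows to
   infinity in plain double arithmetic (as with SSE2 math). */
int intermediate_has_extra_range(void)
{
    volatile double x = DBL_MAX;   /* volatile blocks constant folding */
    volatile double ten = 10.0;
    double y = x * ten / ten;
    return y <= DBL_MAX;           /* false when y is +infinity */
}
```

A result that differs between an x87 build and an SSE2 build is the kind of effect that makes the two sets of numbers not bit-for-bit comparable, independent of speed.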

I can run some more tests to get a fairer benchmark, but this is a
little bit outside the scope of my simple benchmark.

It might mean that it is hard to draw conclusions from an apples-to-oranges comparison. If something more comparable is running, such as SSE2 against SSE2, there is less variation to consider when drawing any useful conclusions from the exercise.

Well, gcc 4.5.x has autovectorization and all that, but you'll never get its benefits if you use the default i387. Granted, gcc won't be the greatest at those things compared to the Intel compiler, but without enabling SSE2, I suspect you will get a wide chasm if the library has to be hobbled with the x87 float instructions.
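
To illustrate the kind of code where this matters, here is a plain loop over doubles of the sort gcc's autovectorizer can handle. It is a sketch, not code from the benchmark or from GSL; the file name benchmark.c and the command lines in the comment are assumptions, shown only to suggest which build variants would make the comparison more like-for-like.

```c
#include <stddef.h>

/* y[i] += a * x[i]. A simple loop that gcc may turn into packed
   SSE2 instructions when vectorization and SSE2 are enabled.
   'restrict' promises that x and y do not overlap. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Build variants one might compare (assumed command lines):
     gcc -std=c99 -O2 -c benchmark.c                      default target
     gcc -std=c99 -O2 -msse2 -mfpmath=sse -c benchmark.c  scalar SSE2
     gcc -std=c99 -O3 -msse2 -mfpmath=sse -c benchmark.c  -O3 turns on
                                                          -ftree-vectorize
*/
```

Timing the same kernel under each variant would separate the effect of x87 versus SSE2 from the effect of vectorization.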

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

