Re: LuaJIT2 performance for number crunching
- Subject: Re: LuaJIT2 performance for number crunching
- From: KHMan <keinhong@...>
- Date: 16 Feb 2011 22:01:19 +0800
On 2/16/2011 9:36 PM, Francesco Abbate wrote:
On 16 Feb 2011, KHMan <keinhong@gmail.com> wrote:
Sorry to barge in, but it is a worrying difference. One thing is bugging me: is
the C code using SSE2? IIRC gcc -O2 does not normally enable SSE2 for floating point on 32-bit x86.
Hmmm, I have to confess that I don't have a very deep knowledge of
SSE-related optimization flags. My approach was quite naive: I use
standard optimization flags like "-O2" or "-O2 -fomit-frame-pointer"
and I let gcc do its work. My idea is quite simple: I want to
compare optimized C code with LuaJIT2, and by "optimized" I just mean
"standard optimizations".
On the other hand, I guess your remark is good: to be completely fair,
the benchmark should use the best possible optimization flags.
Probably I should use "-march=native"; I believe this is activated by
default in Ubuntu.
Always check gcc -v.
Ubuntu is *very* conservative. My Ubuntu 8.04 vanilla gcc
installation says something about i486... I don't think they
will ever err on the side of native processor checks.
Otherwise, there are some flags that maybe you
should not activate with GSL, so as not to degrade the accuracy. For example,
I know that you cannot use -ffast-math, and I don't know if you can use
-mfpmath=sse because, if I understood correctly, with SSE you don't
have the extra precision of 80-bit wide numbers, and this can
potentially degrade the accuracy.
I am not familiar with GSL or the thing being benchmarked, so I can't
comment on that. I only hope you can avoid making it fragile. Anyhow,
lots of supercomputer people run BLAS or GotoBLAS and they seem to be
happy with SSE*.
IIRC, isn't LuaJIT using SSE2 for floating point? (I haven't
checked the sources, so I'm not totally sure of this, but I believe
I've read it before.)
I can do some more tests to get a fairer benchmark, but this is a
little outside the scope of my simple benchmark.
It might mean that it would be hard to draw conclusions from an
apples-to-oranges comparison. If something more comparable is
running, such as SSE2 against SSE2, then there is less variation to
consider when drawing useful conclusions from the exercise.
Well, gcc 4.5.x has autovectorization and all that, but you'll
never get its benefits if you use the default 387 floating-point code. Granted, gcc
won't be the greatest at those things compared to the Intel
compiler, but without enabling SSE2, I suspect you will see a wide
chasm if the library has to be hobbled with x87 floating-point
instructions.
--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia