Stack ILP issues (Was: Slow recursive functions)
Ian Rogers
ian.rogers@manchester.ac.uk
Wed Aug 17 09:06:00 GMT 2005
Hi,
The best places to read up on this are the Intel and AMD optimization
guides. Intel can complete two reads and one write per clock cycle. The
Jikes RVM (on which I'm working) has a naive baseline compiler that
uses stack operations extensively. The sequence iload, iload, iadd,
istore becomes 9 memory operations, and the one-write-per-cycle completion
rule means that the baseline compiler may effectively be achieving only one
bytecode (multiple x86 instructions) per clock tick. Intel has store
forwarding, so a push followed by a pop needn't block the pipelines as
much as it might. Anyway, the optimization guides are at:
http://www.intel.com/design/pentium4/manuals/248966.htm
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
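As a rough illustration of the arithmetic above, here is a minimal sketch in Java. It is not Jikes RVM code: the class StackTraffic, the enum Op, and the per-bytecode read/write split are assumptions chosen so that the four bytecodes add up to the 9 memory operations mentioned. The limits of two reads and one write per cycle are the figures quoted above, not measured values.

```java
import java.util.List;

// Hypothetical model, not Jikes RVM code: counts the memory reads and writes
// that a naive stack-based baseline compiler might emit for each bytecode.
final class StackTraffic {
    enum Op {
        ILOAD(1, 1),   // assumed: read the local, write (push) it to the operand stack
        IADD(2, 1),    // assumed: read (pop) two operands, write (push) the sum
        ISTORE(1, 1);  // assumed: read (pop) the result, write it to the local

        final int reads;
        final int writes;

        Op(int reads, int writes) {
            this.reads = reads;
            this.writes = writes;
        }
    }

    // Total memory reads for a bytecode sequence.
    static int reads(List<Op> code) {
        return code.stream().mapToInt(op -> op.reads).sum();
    }

    // Total memory writes for a bytecode sequence.
    static int writes(List<Op> code) {
        return code.stream().mapToInt(op -> op.writes).sum();
    }

    // Lower bound on cycles when at most readsPerCycle reads and
    // writesPerCycle writes can complete per cycle (ceiling division).
    static int minCycles(List<Op> code, int readsPerCycle, int writesPerCycle) {
        int readCycles = (reads(code) + readsPerCycle - 1) / readsPerCycle;
        int writeCycles = (writes(code) + writesPerCycle - 1) / writesPerCycle;
        return Math.max(readCycles, writeCycles);
    }

    public static void main(String[] args) {
        List<Op> code = List.of(Op.ILOAD, Op.ILOAD, Op.IADD, Op.ISTORE);
        // Prints: reads=5 writes=4 minCycles=4
        System.out.println("reads=" + reads(code)
                + " writes=" + writes(code)
                + " minCycles=" + minCycles(code, 2, 1));
    }
}
```

Under these assumptions the four writes are the bottleneck: four bytecodes need at least four cycles, i.e. roughly one bytecode per clock tick. The model ignores store forwarding and everything else in the pipeline.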
Regards,
Ian Rogers
Mladen Adamovic wrote:
>Andrew Haley wrote on 15 Apr 2005 11:43:39 +0100:
>
>(the original thread was about bad performance in recursive functions -
>the Ackermann test ran slow)
>
>Andrew> We use the x86 system calling
>Andrew> convention throughout gcj, and this is slower than passing args in
>Andrew> registers. There's also the possibility that some JITs might be
>Andrew> optimized for this kind of benchmark.
>
>I guess this means that stack instructions (push, pop, etc.) don't exploit ILP.
>I found that it might be true:
>http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/dsd/2004/2203/00/2203toc.xml&DOI=10.1109/DSD.2004.1333267
>
>But the real performance issue is that stack instructions account for, as far
>as I remember, 8% in the SPEC2000 tests. I don't have the book "Modern computer
>design" at hand to check the real percentage.
>
>I think that for compilers the easiest way to compile expressions like
>(expr1 * expr2) / expr3
>is to use the stack extensively.
>
>So finding a way to avoid using the stack might be an important performance issue.
>MOV is a better idea because it can exploit ILP better.
>Somebody might check the stack performance on x86-64.
>
>Anyway, can any of the developers say which techniques they used to
>exploit ILP in GCJ?
>
>Also, maybe it would be a good idea to ask on the gcc mailing list about stack ILP issues?
>
>The speed of gcj might be important because JVMs have awful performance in matrix
>multiplication and nested loops. Probably they don't apply compiler techniques to
>exploit ILP for loops. The JVM language was designed in 1995, so maybe it will have
>a lot of problems with ILP and TLP in the future, because in 1995 only a few people
>thought about that.
>
>I'm new here; I'm a graduate student somewhat involved in ILP and compiler issues.
>If you think that I can help somehow in gcj development on these issues,
>please let me know.
>
>
>Mladen Adamovic
>home page: http://home.blic.net/adamm