Link to the compiled function to improve performance #12182
This patch has no conflict with #12079
Benchmarks show a 1.59% regression for Zend/bench.php under JIT. That benchmark is generally the most stable, so I would consider the regression legitimate. Symfony Demo and WordPress show improvements (-0.65% and -0.07%, respectively).
Tracing over the already compiled function was done on purpose: it opens possibilities for new specializations and optimizations (similar to LuaJIT).
I'll take a look a bit later (probably next week). I think the patch may be improved with a smarter heuristic: link to the previous trace only if the trace of the inlined function becomes too long.
Maybe we can add a parameter to link to the previous trace only if the trace of the inlined function becomes too long.
Or maybe we can add a switch for this patch?
Yeah, of course you can. You may add something like opcache.jit_trace_inline_limit or opcache.jit_inline_over_link_limit.
(force-pushed from 98d7480 to da96e5f)
Yeah, I tried it in my experiments with a smaller opcache.jit_trace_inline_limit value.
I updated this.
(force-pushed from 2da8d10 to 8ffa0e9)
I think you check idx improperly and in the wrong place. It should be checked in the next chunk, like:
} else if (backtrack_link_to_inline_func > 0 && idx - backtrack_link_to_inline_func > JIT_G(jit_trace_inline_func_limit)) { ...
We also don't use backslashes in multi-line if conditions.
Also, if we successfully inlined a function into the trace, we should reset backtrack_link_to_inline_func.
My idea is that when the trace is too long and idx exceeds the limit value, we check at the start of the inline function whether it has been compiled.
Do you mean to judge just the length of inline functions?
My idea is that when the trace is too long and idx exceeds the limit value

Then the name opcache.jit_trace_inline_func_limit doesn't reflect what you are doing, and you might stop tracing directly without "backtracking". I think your idea is less obvious and less efficient. We should be able to form quite long traces with many short getters and setters inlined.

Do you mean to judge just the length of inline functions?

Yes.
Dear maintainer, I hope to get your reply.
} else if (ZEND_OP_TRACE_INFO(opline, offset)->trace_flags & ZEND_JIT_TRACE_JITED) {
    backtrack_link_to_inline_func = idx;
    link_to_inline_func_opline = opline;
}
if (backtrack_link_to_inline_func > 0 && idx - backtrack_link_to_inline_func > JIT_G(jit_trace_inline_func_limit)) {
    break;
}

It's hard to say without the full patch. Something similar, but you do the break without setting end_opline and stop. Am I missing something?
Okay, let me update it. Each time we enter a function when recording, we judge the length of the inline function. I get a 1% TPS gain on the WordPress benchmark. I hope to get your review.
I have updated the patch; how about that?
I'll be able to review this only on Monday.
When JIT is recording, backtrack the trace if encountering a compiled inline function and link to this function later. This reduces the runtime compilation overhead and duplicated JITted code. Smaller code size has better cache efficiency, which brings a 1.0% performance gain in our benchmark on x86.

Signed-off-by: Wang, Xue <xue1.wang@intel.com>
Signed-off-by: Yang, Lin A <lin.a.yang@intel.com>
Signed-off-by: Su, Tao <tao.su@intel.com>
(force-pushed from 8ffa0e9 to 72f219b)
@wxue1, could you please test the behaviour of your patch?
test.php
<?php
class Foo {
    private $x = 0, $y = 0;
    function getX() {
        return $this->x * $this->x + $this->y * $this->y;
    }
}
$o = new Foo();
for ($i = 0; $i < 10; $i++) {
    $o->getX($i);
}
?>
$ sapi/cli/php -d opcache.jit=1254 -d opcache.jit_hot_func=2 -d opcache.jit_hot_loop=2 -d opcache.jit_trace_inline_func_limit=3 -d opcache.jit_debug=0x80000 test.php
---- TRACE 1 TSSA start (loop) $main() /home/dmitry/php/php-master/CGI-RELEASE-64/test.php:9
;#0.CV0($o) [!undef, ref, rc1, rcn, any]
;#1.CV1($i) [undef, ref, rc1, rcn, any]
LOOP:
;#3.CV1($i) [!undef, ref, rc1, rcn, any] = Phi(#1.CV1($i) [undef, ref, rc1, rcn, any], #13.CV1($i) [undef, ref, rc1, rcn, any])
0009 #4.T2 [bool] = IS_SMALLER #3.CV1($i) [!undef, ref, rc1, rcn, any] int(10) ; op1(int)
0010 ;JMPNZ #4.T2 [bool] 0005
0005 INIT_METHOD_CALL 1 #0.CV0($o) [!undef, ref, rc1, rcn, any] string("getX") ; op1(object of class Foo)
>init Foo::getX
0006 SEND_VAR_EX #3.CV1($i) [!undef, ref, rc1, rcn, any] -> #5.CV1($i) [!undef, ref, rc1, rcn, any] 1 ; op1(int)
0007 DO_FCALL
>enter Foo::getX
0000 #6.T0 [!long] = FETCH_OBJ_R THIS string("x") ; val(int)
0001 #7.T2 [!long] = FETCH_OBJ_R THIS string("x") ; val(int)
0002 #8.T1 [!long] = MUL #6.T0 [!long] #7.T2 [!long] ; op1(int) op2(int)
0003 #9.T0 [!long] = FETCH_OBJ_R THIS string("y") ; val(int)
0004 #10.T3 [!long] = FETCH_OBJ_R THIS string("y") ; val(int)
0005 #11.T2 [!long] = MUL #9.T0 [!long] #10.T3 [!long] ; op1(int) op2(int)
0006 #12.T0 [!long] = ADD #8.T1 [!long] #11.T2 [!long] ; op1(int) op2(int)
0007 RETURN #12.T0 [!long] ; op1(int)
<back /home/dmitry/php/php-master/CGI-RELEASE-64/test.php
0008 PRE_INC #5.CV1($i) [!undef, ref, rc1, rcn, any] -> #13.CV1($i) [undef, ref, rc1, rcn, any] ; op1(int)
---- TRACE 1 TSSA stop (loop)
Your patch is intended to limit inlining of functions above the specified length (3), but it doesn't do it (a function of length 8 is inlined). What is wrong?
Since you propose this as a performance improvement, it would be great to see some benchmark results. I'll need to repeat those benchmarks and rerun my own ones to confirm the improvement.
For this case, where the function is only inlined once, this patch allows inlining.
When more functions are inlined, e.g. a JIT-compiled FuncA calls FuncB, this patch splits the trace and links to FuncA.
I know you want to backtrack to FuncA as soon as FuncB is too long, whether or not the function has been JITted.
I have tried that; the code is here: patch1.
Patch1 has some bugs when JITted code calls other JITted code, and it is a little hard to debug. Could you help take a look?
Or maybe we could return to the original, simpler code? patch2
Actually, this patch, "Link to the compiled function to improve performance", is different from the previous patch about JIT long inline functions (PR #10897). WordPress JIT memory: 1212 KB -> 1019 KB.
This patch aims to fix the duplicated compiled inline function problem. In the picture, apply_filters has been JITted before, but it is still inlined.
I found many duplicated apply_filters inline functions in our WordPress workload. What do you think about that?
When JIT is recording, backtrack the trace if encountering a compiled inline function and link to this function later. This reduces the runtime compilation overhead and duplicated JITted code. Smaller code size has better cache efficiency, which brings a 1.7% performance gain in our benchmark on x86.