-
Notifications
You must be signed in to change notification settings - Fork 6.4k
8303762: Optimize vector slice operation with constant index using VPALIGNR instruction#24104
8303762: Optimize vector slice operation with constant index using VPALIGNR instruction #24104jatin-bhateja wants to merge 25 commits into
Conversation
...ALIGNR instruction
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@jatin-bhateja This change is no longer ready for integration - check the PR body for details.
@jatin-bhateja The following labels will be automatically applied to this pull request:
core-libsgraalhotspot
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/label add hotspot-compiler-dev
@jatin-bhateja
The hotspot-compiler label was successfully added.
@jatin-bhateja This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
/keepalive
@jatin-bhateja The pull request is being re-evaluated and the inactivity timeout has been reset.
@jatin-bhateja this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8303762 git fetch https://git.openjdk.org/jdk.git master git merge FETCH_HEAD # resolve conflicts and follow the instructions given by git merge git commit -m "Merge master" git push
@jatin-bhateja This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
3d09134 to
c0b9eea
Compare
c0b9eea to
607a8fc
Compare
jatin-bhateja
commented
Jul 25, 2025
Performance after AVX2 backend modifications
Benchmark (size) Mode Cnt Score Error Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 51644.530 ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 48171.079 ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9662.306 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 14358.347 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 14619.920 ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6675.824 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 818.911 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 4778.321 ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1612.264 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 35961.146 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 39072.170 ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 thrpt 2 11209.685 ops/ms
Webrevs
- 21: Full - Incremental (46fcc9ac)
- 20: Full - Incremental (c5950031)
- 19: Full - Incremental (2834a027)
- 18: Full - Incremental (ad7151e9)
- 17: Full - Incremental (121c40a8)
- 16: Full (bde0c216)
- 15: Full - Incremental (2b8f0b48)
- 14: Full - Incremental (9625b04e)
- 13: Full - Incremental (444a3568)
- 12: Full (1dfff558)
- 11: Full - Incremental (ae242926)
- 10: Full - Incremental (2c7eb96d)
- 09: Full (4b807102)
- 08: Full (9da1f862)
- 07: Full - Incremental (340f1849)
- 06: Full - Incremental (70c22932)
- 05: Full - Incremental (d55fa594)
- 04: Full - Incremental (f36ae6dd)
- 03: Full - Incremental (405de56f)
- 02: Full (e7c7374b)
- 01: Full - Incremental (04be59a6)
- 00: Full (b2e93434)
sviswa7
commented
Mar 31, 2026
@jatin-bhateja I agree with @iwanowww that the PR could be split into two: One handling the intrinsification failure/fallback handling and other with vector slice optimization for x86. That might help you to get reviews on this work. I volunteer to review the x86 PR. Order wise, the fallback PR would need to get in first though.
@jatin-bhateja I agree with @iwanowww that the PR could be split into two: One handling the intrinsification failure/fallback handling and other with vector slice optimization for x86. That might help you to get reviews on this work. I volunteer to review the x86 PR. Order wise, the fallback PR would need to get in first though.
Hi @sviswa7 ,
Almost all the fallback code apart from few (unslice, slice etc) use scalar operation loop to compute the result, a box created on caller side on account of failed intrinsic will not be unboxed on callee side i.e. fall back implementation. In this context, inlining the fallback will save the call overhead but not prevent boxing penalty or code bloating on callee side which may have other side effects. Which is why this PR selectively enables in lining of slice fallback which is composed of vector APIs and code for that is part of this pull request.
May I request you to kindly review the x86 backend implementation part of this pull request and share your feedback.
Best Regards
jatin-bhateja
commented
Apr 6, 2026
Hi @iwanowww , kindly let us know your comments on current implementation.
Hi @sviswa7 , your comments have been addressed, kindly verify
jatin-bhateja
commented
Apr 14, 2026
/template append
@jatin-bhateja The pull request template has been appended to the pull request body
@sviswa7
sviswa7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x86 changes look good to me. You will need another review from compiler folks for the changes in the call generator to handle fallback.
@iwanowww
iwanowww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be much simpler to review inlining-related and VectorSlice-related parts separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for a separate list? Why don't you perform fallback inlining when intrinsification attempt fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to give intrinsic another chance to succeed if it fails due to non-constant context on first attempt,
#24104 (comment)
Currently, if intrinsification fails then we set the generator for CallStaticJavaNode in Compile::inline_incrementally_one
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2108
Compile::inline_incrementally_cleanup called after Compile::inline_incrementally_one internally calls IGVN optimizations
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2213
CallStaticJavaNode idealization then re-injects the the failed intrinsic call node to _late_inlines list for another intrinsification attempt.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/callnode.cpp#L1175
If we inline the fallback on first intrinsification failure then we loose another opportunity to intrinsify, _vector_late_inlines collects such callgenerators and then once we are through with intrinsification attempts it inline the failed intrinsic calls towards the end on the lines of _string_late_inlines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but you could delay vector operation intrinsification until a full round of late inlining is over and then dispatch between intrinsic and fallback implementation.
Overall, I'm not fully satisfied with current implementation. Please, extract it in a separate PR and let's discuss it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @iwanowww
This pull request performs partial intrinsification of slice API and if it does not succeed then we attempt inlining vector API based fallback implementation. moving compiler side change into a new PR will also involve factoring out Java side changes related to slice.
I agree with you that existing handling in CallGenerator::do_late_inline_helper is somewhat messy, I have cleaned up the handling for populating _vector_late_lines in the latest patch. Request your to kindly have a re-look at the change and let me know if this looks fine now.
Best Regards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to fix the handling of vector operation intrinsification failure first. Then vector slice intrinsic implementation can follow.
It's not specific to vector slice and all vector intrinsics are susceptible to the very same problem: when intrinsification fails, fallback implementation is not inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The total number of required reviews for this PR has been set to 2 based on the presence of these labels: hotspot, hotspot-compiler. This can be overridden with the /reviewers command.
@jatin-bhateja This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@jatin-bhateja This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.
Uh oh!
There was an error while loading. Please reload this page.
Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction.
It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails.
Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java).
Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
Performance numbers:
Please share your feedback.
Best Regards,
Jatin
Progress
Warning
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104$ git checkout pull/24104Update a local copy of the PR:
$ git checkout pull/24104$ git pull https://git.openjdk.org/jdk.git pull/24104/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24104View PR using the GUI difftool:
$ git pr show -t 24104Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24104.diff
Using Webrev
Link to Webrev Comment