Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

8303762: Optimize vector slice operation with constant index using VPALIGNR instruction#24104

Closed
jatin-bhateja wants to merge 25 commits into
openjdk:master from
jatin-bhateja:JDK-8303762
Closed

8303762: Optimize vector slice operation with constant index using VPALIGNR instruction #24104
jatin-bhateja wants to merge 25 commits into
openjdk:master from
jatin-bhateja:JDK-8303762

Conversation

@jatin-bhateja

@jatin-bhateja jatin-bhateja commented Mar 18, 2025
edited by openjdk Bot
Loading

Copy link
Copy Markdown
Member

Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction.
It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails.

Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java).

Vector API jtreg tests pass at AVX level 2, remaining validation in progress.

Performance numbers:


System : 13th Gen Intel(R) Core(TM) i3-1315U
Baseline:
Benchmark (size) Mode Cnt Score Error Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 thrpt 2 9756.573 ops/ms
With opt:
Benchmark (size) Mode Cnt Score Error Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 34122.435 ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 33281.868 ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9345.154 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 8283.247 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 8510.695 ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 5626.367 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 960.958 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 4155.801 ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1465.953 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 32748.061 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 33674.408 ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 thrpt 2 9346.148 ops/ms

Please share your feedback.

Best Regards,
Jatin



Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Warning

⚠️ Patch contains a binary file (test/jdk/java/util/jar/JarEntry/test.jar)

Issue

  • JDK-8303762: Optimize vector slice operation with constant index using VPALIGNR instruction (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104
$ git checkout pull/24104

Update a local copy of the PR:
$ git checkout pull/24104
$ git pull https://git.openjdk.org/jdk.git pull/24104/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24104

View PR using the GUI difftool:
$ git pr show -t 24104

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24104.diff

Using Webrev

Link to Webrev Comment

bridgekeeper Bot commented Mar 18, 2025

Copy link
Copy Markdown

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk Bot commented Mar 18, 2025
edited
Loading

Copy link
Copy Markdown

@jatin-bhateja This change is no longer ready for integration - check the PR body for details.

openjdk Bot commented Mar 18, 2025
edited
Loading

Copy link
Copy Markdown

@jatin-bhateja The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk Bot added graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Mar 18, 2025

jatin-bhateja commented Mar 18, 2025
edited by bridgekeeper Bot
Loading

Copy link
Copy Markdown
Member Author

/label add hotspot-compiler-dev

@openjdk openjdk Bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 18, 2025

openjdk Bot commented Mar 18, 2025

Copy link
Copy Markdown

@jatin-bhateja
The hotspot-compiler label was successfully added.

bridgekeeper Bot commented May 13, 2025

Copy link
Copy Markdown

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

jatin-bhateja commented May 18, 2025
edited by bridgekeeper Bot
Loading

Copy link
Copy Markdown
Member Author

/keepalive

openjdk Bot commented May 18, 2025

Copy link
Copy Markdown

@jatin-bhateja The pull request is being re-evaluated and the inactivity timeout has been reset.

openjdk Bot commented May 18, 2025

Copy link
Copy Markdown

@jatin-bhateja this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8303762
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk Bot added the merge-conflict Pull request has merge conflict with target branch label May 18, 2025

bridgekeeper Bot commented Jul 13, 2025

Copy link
Copy Markdown

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper bridgekeeper Bot added oca Needs verification of OCA signatory status and removed oca Needs verification of OCA signatory status labels Jul 15, 2025
@openjdk openjdk Bot removed the merge-conflict Pull request has merge conflict with target branch label Jul 24, 2025

Copy link
Copy Markdown
Member Author

Performance after AVX2 backend modifications

Benchmark (size) Mode Cnt Score Error Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 51644.530 ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 48171.079 ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9662.306 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 14358.347 ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 14619.920 ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6675.824 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 818.911 ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 4778.321 ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1612.264 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 35961.146 ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 39072.170 ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 thrpt 2 11209.685 ops/ms

@jatin-bhateja jatin-bhateja marked this pull request as ready for review July 25, 2025 13:40
@openjdk openjdk Bot added the rfr Pull request is ready for review label Jul 25, 2025

mlbridge Bot commented Jul 25, 2025
edited
Loading

Copy link
Copy Markdown

sviswa7 commented Mar 31, 2026

Copy link
Copy Markdown

@jatin-bhateja I agree with @iwanowww that the PR could be split into two: One handling the intrinsification failure/fallback handling and other with vector slice optimization for x86. That might help you to get reviews on this work. I volunteer to review the x86 PR. Order wise, the fallback PR would need to get in first though.

jatin-bhateja commented Apr 1, 2026
edited
Loading

Copy link
Copy Markdown
Member Author

@jatin-bhateja I agree with @iwanowww that the PR could be split into two: One handling the intrinsification failure/fallback handling and other with vector slice optimization for x86. That might help you to get reviews on this work. I volunteer to review the x86 PR. Order wise, the fallback PR would need to get in first though.

Hi @sviswa7 ,
Almost all the fallback code apart from few (unslice, slice etc) use scalar operation loop to compute the result, a box created on caller side on account of failed intrinsic will not be unboxed on callee side i.e. fall back implementation. In this context, inlining the fallback will save the call overhead but not prevent boxing penalty or code bloating on callee side which may have other side effects. Which is why this PR selectively enables in lining of slice fallback which is composed of vector APIs and code for that is part of this pull request.

May I request you to kindly review the x86 backend implementation part of this pull request and share your feedback.

Best Regards

Copy link
Copy Markdown
Member Author

Hi @iwanowww , kindly let us know your comments on current implementation.

Comment thread src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp Outdated
Comment thread src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp Outdated
Comment thread src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp Outdated
Comment thread src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp Outdated
Comment thread src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp Outdated
Comment thread src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp Outdated
@openjdk openjdk Bot added the merge-conflict Pull request has merge conflict with target branch label Apr 7, 2026
@openjdk openjdk Bot removed the merge-conflict Pull request has merge conflict with target branch label Apr 7, 2026

jatin-bhateja commented Apr 7, 2026
edited
Loading

Copy link
Copy Markdown
Member Author

Hi @sviswa7 , your comments have been addressed, kindly verify

Comment thread src/hotspot/cpu/x86/x86.ad Outdated
Comment thread src/hotspot/cpu/x86/x86.ad Outdated
Comment thread src/hotspot/cpu/x86/assembler_x86.cpp Outdated
Comment thread src/hotspot/cpu/x86/x86.ad Outdated
@openjdk openjdk Bot removed the rfr Pull request is ready for review label Apr 14, 2026

Copy link
Copy Markdown
Member Author

/template append

openjdk Bot commented Apr 14, 2026

Copy link
Copy Markdown

@jatin-bhateja The pull request template has been appended to the pull request body

@openjdk openjdk Bot added the rfr Pull request is ready for review label Apr 14, 2026

@sviswa7 sviswa7 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x86 changes look good to me. You will need another review from compiler folks for the changes in the call generator to handle fallback.

jatin-bhateja reacted with thumbs up emoji
@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Apr 15, 2026

@iwanowww iwanowww left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much simpler to review inlining-related and VectorSlice-related parts separately.

GrowableArray<CallGenerator*> _boxing_late_inlines; // same but for boxing operations

GrowableArray<CallGenerator*> _vector_reboxing_late_inlines; // same but for vector reboxing operations
GrowableArray<CallGenerator*> _vector_late_inlines; // inline fallback implementation for failed intrinsics

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for a separate list? Why don't you perform fallback inlining when intrinsification attempt fails?

@jatin-bhateja jatin-bhateja Apr 16, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to give intrinsic another chance to succeed if it fails due to non-constant context on first attempt,
#24104 (comment)

Currently, if intrinsification fails then we set the generator for CallStaticJavaNode in Compile::inline_incrementally_one
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2108

Compile::inline_incrementally_cleanup called after Compile::inline_incrementally_one internally calls IGVN optimizations
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2213

CallStaticJavaNode idealization then re-injects the the failed intrinsic call node to _late_inlines list for another intrinsification attempt.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/callnode.cpp#L1175

If we inline the fallback on first intrinsification failure then we loose another opportunity to intrinsify, _vector_late_inlines collects such callgenerators and then once we are through with intrinsification attempts it inline the failed intrinsic calls towards the end on the lines of _string_late_inlines.

https://github.com/openjdk/jdk/pull/24104/changes#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaR2252

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but you could delay vector operation intrinsification until a full round of late inlining is over and then dispatch between intrinsic and fallback implementation.

Overall, I'm not fully satisfied with current implementation. Please, extract it in a separate PR and let's discuss it there.

@jatin-bhateja jatin-bhateja Apr 17, 2026
edited
Loading

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @iwanowww
This pull request performs partial intrinsification of slice API and if it does not succeed then we attempt inlining vector API based fallback implementation. moving compiler side change into a new PR will also involve factoring out Java side changes related to slice.

I agree with you that existing handling in CallGenerator::do_late_inline_helper is somewhat messy, I have cleaned up the handling for populating _vector_late_lines in the latest patch. Request your to kindly have a re-look at the change and let me know if this looks fine now.

Best Regards

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to fix the handling of vector operation intrinsification failure first. Then vector slice intrinsic implementation can follow.

It's not specific to vector slice and all vector intrinsics are susceptible to the very same problem: when intrinsification fails, fallback implementation is not inlined.

@jatin-bhateja jatin-bhateja Apr 22, 2026
edited
Loading

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @iwanowww , As suggested I have created a new pull request #30876 for inlining on intrinsic failure.

openjdk Bot commented Apr 16, 2026

Copy link
Copy Markdown

The total number of required reviews for this PR has been set to 2 based on the presence of these labels: hotspot, hotspot-compiler. This can be overridden with the /reviewers command.

@openjdk openjdk Bot removed the ready Pull request is ready to be integrated label Apr 16, 2026

bridgekeeper Bot commented May 21, 2026

Copy link
Copy Markdown

@jatin-bhateja This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk openjdk Bot added the merge-conflict Pull request has merge conflict with target branch label Jun 4, 2026

bridgekeeper Bot commented Jun 19, 2026

Copy link
Copy Markdown

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

5 more reviewers

@iwanowww iwanowww iwanowww left review comments

@erifan erifan erifan approved these changes

@merykitty merykitty merykitty left review comments

@XiaohongGong XiaohongGong XiaohongGong approved these changes

@sviswa7 sviswa7 sviswa7 approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org merge-conflict Pull request has merge conflict with target branch rfr Pull request is ready for review

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /