Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Optimizes reducelanes in diversityCalculation of PQVectors, for Euclidean function#514

Open
shyla226 wants to merge 2 commits intodatastax:main from
intel-staging:reduce_lanes
Open

Optimizes reducelanes in diversityCalculation of PQVectors, for Euclidean function #514
shyla226 wants to merge 2 commits intodatastax:main from
intel-staging:reduce_lanes

Conversation

@shyla226
Copy link

@shyla226 shyla226 commented Aug 29, 2025
edited
Loading

This pull request introduces improvement to Euclidean similarity function in PQVectors.diversityFunctionFor. From flamegraph, it is observed that considerable amount of time is spent in jdk/incubator/vector/FloatVector.reducelanesTemplate. This is mainly because FloatVector.reducelanes() is expensive and it is being called inside a for loop (via VectorUtil.squareL2Distance). Modification in this pull request moves call to reduceLanes() outside the for loop.

Change proposed here was tested with the benchmark, PQDistanceCalculationBenchmark.diversityCalculation
With this benchmark, ~18% reduction in time was observed when M=64 and ~22% when M=192.

Code modifications:

  1. Added a new function pqDiversityEuclidean in VectorUtilSupport and its corresponding implementations
  2. Removed for loop in PQVectors.diversityFunctionFor and moved it into pqDiversityEuclidean
  3. Moved FloatVector.reducelanes() outside the for loop

Test setup:
Jvector version : main branch (as of 2025年08月28日)
JDK version : openjdk version "24.0.2" 2025年07月15日
Platform : INTEL(R) XEON(R) PLATINUM 8592+
Benchmark : PQDistanceCalculationBenchmark.diversityCalculation

New changes
I have modified the code to include dot product & cosine functions and implemented similar changes for scoreFunctionFor.
With the changes applied to scoreFunctionFor, when M=64: dot product shows ~30% reduction in latency & cosine shows ~43% reduction.
I can add data points for other subspace counts, if required.

Code modifications:

  1. Added new functions pqScoreEuclidean, pqScoreDotProduct, pqScoreCosine in VectorUtilSupport and its corresponding implementations for diversityFunctionFor
  2. Added overloaded version of above functions for scoreFunctionFor
  3. Removed for loop in PQVectors.diversityFunctionFor and PQVectors.scoreFunctionFor and moved them into respective functions in PanamaVectorUtilSupport
  4. Moved FloatVector.reducelanes() outside the for loop
  5. Added a new benchmark which uses MutablePQVectors to test this

Test setup:
Jvector version : main branch (as of 2025年10月22日)
JDK version : openjdk version "25.0.1" 2025年10月21日
Platform : Intel(R) Xeon(R) 6979P
Benchmark : PQDistanceCalculationMutableVectorBenchmark (Added this by replicating PQDistanceCalculationBenchmark to measure performance for MutablePQVectors)

Copy link
Contributor

This PR does successfully remove the calls to reduceLanes from the inside of the loop iterating over the subspaces the Euclidean case. I do have a concern about applying this pattern to one specific case but leaving the handling of other cases as is, leaving the code with 2 different implementations for the same problem. Some questions:

  1. Why is the optimization only implemented for the Euclidean case? In PanamaVectorUtilSupport, where the call to reduceLanes is made from the squareDistance methods which are called in a loop for the subspace count, likewise reduceLanes is called from the dotProduct methods. Why only fix one?

  2. Why is the optimization only applied to diversityFunctionFor(...) and not scoreFunctionFor(...) which uses the same pattern of calling squareL2Distance(...) inside of a loop for each subspace?

Copy link
Author

shyla226 commented Oct 6, 2025

@MarkWolters, thank you very much for the feedback. I can make the changes to dotProduct methods and to function calls in scoreFunctionFor()

Copy link
Contributor

@shyla226 please be aware that in order to pass the automated GitHub action regression test any pull request must come from a branch inside the datastax/jvector repository and not a fork of the repository. So while I am willing to review the PR from a fork and provide commentary, in order for it to pass the requirements for merging it will have to be a branch and not a fork.

Copy link
Author

@MarkWolters, Sounds good! I will create a branch

Copy link
Contributor

@shyla226 I don't think you'll have sufficient permissions to push a new branch to the repository. For now you can keep the changes on a fork and they can be reviewed there while we come up with a solution to this issue.

Copy link
Author

@MarkWolters , I have added new functions to include dotproduct & cosine for diversityFunction.
I have also added overloaded function for scoreFunction, for all the 3 cases

Copy link
Contributor

Thanks @shyla226, I will review the updated code asap

Copy link
Author

Thank you very much @MarkWolters . With the changes, when M=64, dot product shows ~30% reduction in latency & COSINE shows ~43% reduction

Copy link
Author

@MarkWolters sorry, accidentally closed it! I will update the test setup used in the description above.

Copy link
Author

@MarkWolters, I have addressed the issue with the 2 failed checks and tested the code changes on Windows server

Copy link
Contributor

@shyla226 apologies for not addressing this sooner. I have created a branch named reduce_lanes based off of the latest main. Please rebase your change on current main and open a pull request from your fork to the reduce_lanes branch. From the Github UI you should be able to select Contribute --> Open Pull Request and in the resulting screen select the branch you wish to merge to. From there we can run the internal github actions and regression tests and get your changes merged.

Copy link
Author

shyla226 commented Feb 3, 2026

@MarkWolters Thank you very much. I will rebase the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@MarkWolters MarkWolters Awaiting requested review from MarkWolters MarkWolters is a code owner

@marianotepper marianotepper Awaiting requested review from marianotepper

@jshook jshook Awaiting requested review from jshook jshook is a code owner

@tlwillke tlwillke Awaiting requested review from tlwillke tlwillke is a code owner

At least 2 approving reviews are required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /