[WIP] Add the streaming response to prefix cache #1730


Open
zetxqx wants to merge 5 commits into kubernetes-sigs:main
base: main
from zetxqx:prefixcachestreaming

zetxqx (Contributor) commented Oct 15, 2025 (edited)

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds the ability to process streaming responses from the model server and add them to the approximate prefix cache. It introduces a new parser for streaming responses that handles both the chat and the legacy completion formats.

  • Implemented a new streaming response parser in pkg/epp/scheduling/types/llmresponse.go.
  • Updated pkg/epp/handlers/response.go and pkg/epp/handlers/server.go to handle streaming responses.
  • Updated pkg/epp/requestcontrol/director.go to use the parsed LLMResponse.
  • Added and updated unit tests for the new functionality.

Which issue(s) this PR fixes:

Fixes #971

Does this PR introduce a user-facing change?:

feat: improve the approximate prefix cache by taking the response into account, which especially benefits multi-turn use cases.

k8s-ci-robot added the do-not-merge/work-in-progress and kind/feature labels Oct 15, 2025
k8s-ci-robot added the cncf-cla: yes label Oct 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zetxqx
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the needs-rebase label Oct 15, 2025
netlify bot commented Oct 15, 2025 (edited)

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 368d54f
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69054718c172c20008233b87
😎 Deploy Preview: https://deploy-preview-1730--gateway-api-inference-extension.netlify.app
To edit notification comments on pull requests, go to your Netlify project configuration.

k8s-ci-robot added the size/XXL label Oct 15, 2025
zetxqx force-pushed the prefixcachestreaming branch 2 times, most recently from 4857c71 to 1af87ef, on October 16, 2025 05:53
k8s-ci-robot removed the needs-rebase label Oct 16, 2025
zetxqx force-pushed the prefixcachestreaming branch 5 times, most recently from f0393b3 to 07d2e2a, on October 17, 2025 06:40
zetxqx (Contributor, Author) commented Oct 18, 2025

/retest

k8s-ci-robot added the needs-rebase label Oct 22, 2025
k8s-ci-robot removed the needs-rebase label Oct 28, 2025

Reviewers

  • ahg-g (awaiting requested review)
  • kfswain (awaiting requested review)

Assignees

No one assigned

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
  • kind/feature (Categorizes issue or PR as related to a new feature.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Prefix cache plugin should also add response to the cache
