v0.1.0 #296
kfswain announced in Announcements
API version: v1alpha1
We are excited to announce the v0.1.0 release of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs. Please note the following:
- Not for Production: This release candidate is provided solely for evaluation, testing, and feedback. We advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
- Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.
Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!
What's Changed
- Owners addition by @kfswain in #2
- proposed repo structure + copy of initial proposal by @kfswain in #1
- Repo structure by @kfswain in #3
- Update OWNERS by @smarterclayton in #6
- PoC implementation by @kfswain in #4
- Fix build for ext-proc example by @terrytangyuan in #7
- Simplify POC installation by @liu-cong in #8
- docs: poc markdown improvements by @Xunzhuo in #9
- fix: inconsistent secret key with deployment by @Xunzhuo in #11
- Updating top level README by @kfswain in #13
- API Proposal by @kfswain in #5
- Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
- Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
- Envoy update by @kfswain in #18
- CRD implementation by @kfswain in #20
- Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
- Add priority based scheduling by @liu-cong in #25
- Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
- Add a hermetic e2e test with fake backend pods by @liu-cong in #29
- Fix mutierr appending; add a unit test. by @liu-cong in #33
- Some minor fixes in Envoy setup by @liu-cong in #35
- Update targetModel in request body by @liu-cong in #37
- Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
- Simulation code for llm inference gateway by @kaushikmitr in #15
- Add myself to approvers by @kfswain in #42
- Dynamic lora load/unload sidecar by @coolkp in #31
- LLMServerPool Implementation by @kfswain in #36
- Repo cleanup by @kfswain in #46
- Updating API and generating code by @kfswain in #47
- Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
- llmservice reconciler implementation by @kfswain in #48
- Update README.md by @BenTheElder in #52
- Fixing hermetic_test, small formatting changes by @kfswain in #53
- Add myself to reviewers by @liu-cong in #40
- Add dependency updates by @robert-cronin in #57
- Bump the kubernetes group with 4 updates by @dependabot in #58
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
- Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
- Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
- Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
- Fixing Groupversion by @kfswain in #63
- Integrating LLMService with weight splitting by @kfswain in #64
- Fix build and test by @liu-cong in #65
- Makefile fixes with generated output by @kfswain in #67
- Manifest updates by @kaushikmitr in #81
- Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
- Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
- Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
- Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
- Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
- Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
- Proposal update for the API names and latency objective by @ahg-g in #91
- Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
- switch to using upstream vllm with new metric by @coolkp in #54
- Updating cloudbuild to have image name by @kfswain in #106
- Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
- Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
- Bump the kubernetes group with 4 updates by @dependabot in #101
- Bump google.golang.org/protobuf from 1.35.2 to 1.36.0 by @dependabot in #104
- Change from SIG Apps to SIG Network by @terrytangyuan in #92
- Add response body handler by @liu-cong in #90
- API Shift/Refactor by @kfswain in #93
- API compliance fix and build fixes by @kfswain in #114
- Added a verify rule to Makefile by @ahg-g in #122
- update the linter version by @ahg-g in #123
- Disable response body processing by @liu-cong in #121
- Adding initial docs infra by @robscott in #118
- Lint fixes/updating .golangci to not use deprecated linter by @kfswain in #125
- fixing some lint errors by @ahg-g in #126
- Fix the make build command and add main tag to the latest image by @ahg-g in #127
- Fixing the build command by @ahg-g in #128
- Fixed the rest of the lint errors and updating the linters by @ahg-g in #134
- Remove outdated configurations and ensure the tutorial runs smoothly by @Jeffwan in #136
- Update README: Add Hugging Face Token Setup Instructions and Improve Deployment Instructions by @yankay in #139
- Updates APIs Based on Kubernetes API Conventions by @danehans in #143
- Fix InferencePoolReconciler by @MaYuan-02 in #147
- Updating the boilerplate template and regenerating by @kfswain in #156
- Bump github.com/envoyproxy/go-control-plane from 0.13.1 to 0.13.3 by @dependabot in #155
- Updating non-generated docs/ minor formatting by @kfswain in #160
- Bump github.com/onsi/ginkgo/v2 from 2.22.0 to 2.22.2 by @dependabot in #138
- Bump google.golang.org/grpc from 1.69.0 to 1.69.2 by @dependabot in #133
- .*: change llm-instance-gateway -> gateway-api-inference-extension by @MadhavJivrajani in #161
- Changes InferencePool EPP Flags by @danehans in #152
- ext-proc: remove unused fields from EndpointSliceReconciler by @MadhavJivrajani in #165
- ext-proc/backend: add unit test for InferencePoolReconciler by @MadhavJivrajani in #168
- ext-proc: change Inference* APIs to use NamespacedName by @MadhavJivrajani in #172
- manifests: remove unused curl image from EPP manifest by @MadhavJivrajani in #180
- Add a few debug logs by @liu-cong in #179
- Adds Health gRPC Server and Refactors Main() by @danehans in #148
- Adding some initial docs content and diagrams by @robscott in #129
- Fixing Netlify builds by @robscott in #195
- Bump google.golang.org/grpc from 1.69.2 to 1.69.4 by @dependabot in #193
- Bump github.com/envoyproxy/go-control-plane/envoy from 1.32.2 to 1.32.3 by @dependabot in #194
- Bump sigs.k8s.io/controller-runtime from 0.19.3 to 0.19.4 by @dependabot in #191
- Bump google.golang.org/protobuf from 1.36.1 to 1.36.2 by @dependabot in #192
- [v0.1 API Review] Grammatical fixes and TypedCondition creation/defaulting by @kfswain in #186
- Add logging guidelines by @liu-cong in #182
- dev: respect the IMAGE args in Makefile by @spacewander in #205
- fix small typo by @LiorLieberman in #206
- Adding metrics for request total, latency and size by @courageJ in #177
- Bump google.golang.org/protobuf from 1.36.2 to 1.36.3 by @dependabot in #209
- Bump github.com/prometheus/common from 0.61.0 to 0.62.0 by @dependabot in #211
- Bump the kubernetes group across 1 directory with 5 updates by @dependabot in #212
- [v0.1 API Review] Cleaning up optional fields/clearer wording by @kfswain in #185
- Add link to meeting notes by @terrytangyuan in #215
- Adding new maintainers by @robscott in #203
- Adding myself & ahg-g to owners directly while we figure out alias bug by @kfswain in #218
- only print out pods & metrics when log level is DEBUG by @spacewander in #216
- Alias fix by @kfswain in #220
- Separates EnvoyExtensionPolicy from Ext Proc by @danehans in #200
- [Metrics] Add input/output token and request size metrics by @JeffLuoo in #214
- Update to k8s v0.32.0 and runtime to v0.20.0 by @ahg-g in #226
- Slight cleanup of some of our readmes by @kfswain in #221
- Bump google.golang.org/grpc from 1.69.4 to 1.70.0 by @dependabot in #231
- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 by @dependabot in #232
- More Getting Started updates by @kfswain in #233
- Bump the kubernetes group with 5 updates by @dependabot in #228
- Bump sigs.k8s.io/controller-runtime from 0.20.0 to 0.20.1 by @dependabot in #229
- Bump google.golang.org/protobuf from 1.36.3 to 1.36.4 by @dependabot in #230
- Adding logging const and updating usage by @kfswain in #236
- Adds Initial e2e Tests and Tooling by @danehans in #217
- Fixes gomega.Eventually() in e2e Test by @danehans in #241
- Add Endpoint Picker Protocol Proposal by @liu-cong in #164
- Docker modification to support simple `docker build` by @kfswain in #242
- Refactor ext-proc Main with Server Package Add Hermetic Test with k8s API Client for EPP by @BenjaminBraunDev in #222
- Requeue reconcile requests for endpointslice until the inferencepool is available by @ahg-g in #248
- Repo cleanup by @ahg-g in #255
- Fix e2e test by @ahg-g in #246
- inferencemodel_reconciler: Fix a log message by @tchap in #261
- Populating api-types & concepts by @kfswain in #254
- Proposals cleanup by @ahg-g in #266
- InferencePool config proposal for API review by @ahg-g in #162
- Add TRACE log level for the metric refresh loop by @liu-cong in #275
- Using constants for the repeated values by @adarshagrawal38 in #273
- Update default target-pod and inject it into response metadata by @ahg-g in #270
- [Metrics] Add grafana dashboard for Inference extension and vLLM metrics by @JeffLuoo in #237
- Release setup by @ahg-g in #274
- Updating to the new gcr repo by @kfswain in #279
- Replace EndpointSlice reconciler with pod list backed by informer by @ahg-g in #271
- Bump github.com/envoyproxy/go-control-plane/envoy from 1.32.3 to 1.32.4 by @dependabot in #277
- feat: adds initial release automation by @danehans in #291
New Contributors
- @kfswain made their first contribution in Owners addition #2
- @smarterclayton made their first contribution in Update OWNERS #6
- @terrytangyuan made their first contribution in Fix build for ext-proc example #7
- @liu-cong made their first contribution in Simplify POC installation #8
- @Xunzhuo made their first contribution in docs: poc markdown improvements #9
- @kaushikmitr made their first contribution in Simulation code for llm inference gateway #15
- @coolkp made their first contribution in Dynamic lora load/unload sidecar #31
- @BenTheElder made their first contribution in Update README.md #52
- @robert-cronin made their first contribution in Add dependency updates #57
- @dependabot made their first contribution in Bump the kubernetes group with 4 updates #58
- @ahg-g made their first contribution in Proposal update for the API names and latency objective #91
- @robscott made their first contribution in Adding initial docs infra #118
- @Jeffwan made their first contribution in Remove outdated configurations and ensure the tutorial runs smoothly #136
- @yankay made their first contribution in Update README: Add Hugging Face Token Setup Instructions and Improve Deployment Instructions #139
- @MaYuan-02 made their first contribution in Fix InferencePoolReconciler #147
- @MadhavJivrajani made their first contribution in .*: change llm-instance-gateway -> gateway-api-inference-extension #161
- @spacewander made their first contribution in dev: respect the IMAGE args in Makefile #205
- @LiorLieberman made their first contribution in fix small typo #206
- @courageJ made their first contribution in Adding metrics for request total, latency and size #177
- @JeffLuoo made their first contribution in [Metrics] Add input/output token and request size metrics #214
- @BenjaminBraunDev made their first contribution in Refactor ext-proc Main with Server Package Add Hermetic Test with k8s API Client for EPP #222
- @tchap made their first contribution in inferencemodel_reconciler: Fix a log message #261
- @adarshagrawal38 made their first contribution in Using constants for the repeated values #273
Full Changelog: https://github.com/kubernetes-sigs/gateway-api-inference-extension/commits/v0.1.0
This discussion was created from the release v0.1.0.