Pull requests: triton-inference-server/server
 Removing unused values for TensorRT-LLM container build
 
 
 
 #8472
 opened Oct 23, 2025  by
 mc-nv
 
 
 
 
 
 
 ci: Add support for max_inflight_requests parameter to prevent unbounded memory growth in ensemble models
 
 
 PR: ci
 Changes to our CI configuration files and scripts 
 
 
 #8458
 opened Oct 13, 2025  by
 pskiran1
 
 
 
 
 
 
 7 of 20 tasks
 
 
 
 feat: Add Hermes tool call parser for OpenAI API
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8456
 opened Oct 12, 2025  by
 amit-timalsina
 
 
 
 
 
 
 11 of 12 tasks
 
 
 
 Feat: revamp build.py CLI to improve usability and maintainability
 
 
 #8437
 opened Oct 2, 2025  by
 kpedro88
 
 
 
 
 
 
 9 of 22 tasks
 
 
 
 feat: Minor improvements to build.py
 
 
 build
 Issues pertaining to builds 
 
 enhancement
 New feature or request 
 
 
 
 #8362
 opened Aug 19, 2025  by
 kpedro88
 
 
 
 
 
 
 6 of 22 tasks
 
 
 
 fix: WAR for Python CUDA library unknown race condition
 
 
 PR: fix
 A bug fix 
 
 
 
 
 
 
 #8360
 opened Aug 19, 2025  by
 GuanLuo
 
 
 
 
 
 
 feat: Added --build_variant flag for cpu only build.
 
 
 #8329
 opened Aug 5, 2025  by
 Sunidhi-Gaonkar1
 
 
 
 
 
 
 4 of 22 tasks
 
 
 
 feat: add parameters in onprem k8s chart (volume, resources & env. variables)
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8324
 opened Aug 1, 2025  by
 vladmirtxrx
 
 
 
 
 
 
 3 of 22 tasks
 
 
 
 Support tokenizer override per model for multi-model Triton + vLLM serving with OpenAI-Compatible
 
 
 #8321
 opened Jul 31, 2025  by
 JunmooByun
 
 
 
 
 
 
 11 of 13 tasks
 
 
 
 docs: Fix typos and grammar issues in markdown files
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8306
 opened Jul 23, 2025  by
 cluster2600
 
 
 
 
 
 
 12 of 13 tasks
 
 
 
 fix: Fix the server runtime errors on cpu only platform and with pytorch backend
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8272
 opened Jun 27, 2025  by
 snadampal
 
 
 
 
 
 
 6 of 21 tasks
 
 
 
 docs: fix capitalization of Triton Inference Server
 
 
 #8252
 opened Jun 13, 2025  by
 ShriyashP
 
 
 
 
 
 
 5 of 13 tasks
 
 
 
 feat: Add guided decoding support to OpenAI frontend
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8245
 opened Jun 11, 2025  by
 pei0033
 
 
 
 
 
 
 7 of 22 tasks
 
 
 
 docs: update the link formats for additional security networking guides
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8229
 opened Jun 2, 2025  by
 xander-aphe-hatschi
 
 
 
 
 
 
 22 tasks
 
 
 
 refactor: replace tf model with onnx model for L0_response_cache
 
 
 
 #8114
 opened Apr 2, 2025  by
 ziqifan617
 
 
 
 
 Draft
 
 
 
 
 
 
 [build]: Add rt_base_image parameter to differentiate triton build base image and runtime base image
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8064
 opened Mar 12, 2025  by
 nv-tusharma
 
 
 
 
 Draft
 
 
 
  
 
 
 
 
 
 
 
 5 of 20 tasks
 
 
 
 feat: GRPC Callback API migration for Non Inference
 
 
 
 
 
 
 
 
 
 
 
 
 
 #8062
 opened Mar 11, 2025  by
 indrajit96
 
 
 
 
 
 
 7 of 20 tasks
 
 
 
 Build: Build using the PA binaries and whl if available. 
 
 
 #8043
 opened Feb 27, 2025  by
 pvijayakrish
 
 
 
 
 
 
 8 of 20 tasks
 
 
 
 