Support tokenizer override per model for multi-model Triton + vLLM serving with OpenAI-Compatible #8321


Open
JunmooByun wants to merge 2 commits into triton-inference-server:main
base: main
from JunmooByun:feat/tokenizer-override

@JunmooByun JunmooByun commented Jul 31, 2025

What does the PR do?

Support per-model tokenizer override when using Triton + vLLM in OpenAI-compatible mode.

This PR introduces HF_MODEL_NAME_MAP to associate custom model names with their corresponding Hugging Face model identifiers. During model registration, if a mapping is found, the tokenizer is loaded accordingly; otherwise, the system falls back to the default tokenizer.

This enables true multi-model serving when each model requires a different tokenizer, which was not possible with the previous global --tokenizer option.
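
The lookup-with-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `resolve_tokenizer_id`, the dictionary shape of `HF_MODEL_NAME_MAP`, and the example model names are all assumptions made for the sketch.

```python
from typing import Mapping, Optional

# Assumed shape: served (custom) model name -> Hugging Face model identifier.
# The entries below are illustrative only and are not taken from the PR.
HF_MODEL_NAME_MAP = {
    "opt-small": "facebook/opt-125m",
    "gpt2-demo": "openai-community/gpt2",
}


def resolve_tokenizer_id(
    model_name: str,
    name_map: Mapping[str, str],
    default_tokenizer: Optional[str],
) -> Optional[str]:
    """Return the tokenizer identifier to load for a served model.

    Hypothetical helper: if the model has an entry in the map, use it;
    otherwise fall back to the global default (which may be None when
    no default tokenizer was configured).
    """
    return name_map.get(model_name, default_tokenizer)


# A mapped model gets its own tokenizer identifier.
mapped = resolve_tokenizer_id("opt-small", HF_MODEL_NAME_MAP, "default-tokenizer")

# An unmapped model falls back to the global default.
fallback = resolve_tokenizer_id("other-model", HF_MODEL_NAME_MAP, "default-tokenizer")
```

In a real frontend the resolved identifier would then be passed to whatever tokenizer loader is in use (for example `transformers.AutoTokenizer.from_pretrained`); that step is left out here so the sketch runs without network access.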


Checklist

  • I have read the Contribution guidelines and signed the Contributor License Agreement
  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • I ran pre-commit locally (pre-commit install, pre-commit run --all)
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

  • feat

Related PRs:


Where should the reviewer start?

  • python/openai/openai_frontend/engine/triton_engine.py: Tokenizer override logic introduced here.

Test plan:

  • Ran frontend with:
    python3 openai_frontend/main.py --model-repository tests/vllm_models

@JunmooByun JunmooByun marked this pull request as draft August 4, 2025 01:19
@JunmooByun JunmooByun marked this pull request as ready for review August 4, 2025 01:29
Author

This PR was created from a forked repository.

  • The branch has been updated to the latest main.
  • Currently, workflow approval and a code review are required.

Could you please:

  1. Approve and run the workflows
  2. Review and approve the PR

Thanks for your time!


