triton-inference-server / server Public


Support tokenizer override per model for multi-model Triton + vLLM serving with OpenAI-Compatible #8321


Open


JunmooByun wants to merge 2 commits into triton-inference-server:main from JunmooByun:feat/tokenizer-override

+15 −2

Conversation 1 Commits 2 Checks 0 Files changed 1


JunmooByun commented Jul 31, 2025

What does the PR do?

Support per-model tokenizer override when using Triton + vLLM in OpenAI-compatible mode.

This PR introduces HF_MODEL_NAME_MAP to associate custom model names with their corresponding Hugging Face model identifiers. During model registration, if a mapping is found, the tokenizer is loaded accordingly; otherwise, the system falls back to the default tokenizer.

This enables multi-model serving in scenarios where each model requires a different tokenizer, which was not possible with the previous global --tokenizer option because it applied a single tokenizer to every model.
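For reviewers, the lookup-with-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the map contents, the function names `resolve_tokenizer_id` and `load_tokenizer`, and the injected `loader` callable are all assumptions made for the example. How `HF_MODEL_NAME_MAP` is actually populated is not shown here.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical contents: served model name -> Hugging Face model identifier.
HF_MODEL_NAME_MAP: Dict[str, str] = {
    "model-a": "example-org/model-a",
}


def resolve_tokenizer_id(
    model_name: str,
    default_tokenizer: Optional[str],
    name_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the mapped Hugging Face ID for model_name, else the default."""
    mapping = HF_MODEL_NAME_MAP if name_map is None else name_map
    return mapping.get(model_name, default_tokenizer)


def load_tokenizer(
    model_name: str,
    default_tokenizer: Optional[str],
    loader: Callable[[str], Any],
) -> Optional[Any]:
    """Load the per-model tokenizer if mapped, otherwise the default one."""
    tokenizer_id = resolve_tokenizer_id(model_name, default_tokenizer)
    # No mapping and no default: leave the model without a tokenizer.
    return loader(tokenizer_id) if tokenizer_id else None
```

In a real deployment the `loader` argument would presumably be something like `transformers.AutoTokenizer.from_pretrained`; it is injected here only so the sketch runs without downloading anything.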

Checklist

I have read the Contribution guidelines and signed the Contributor License Agreement
PR title reflects the change and is of format <commit_type>: <Title>
Changes are described in the pull request.
Related issues are referenced.
Populated GitHub labels field
Added test plan and verified test passes.
Verified that the PR passes existing CI.
I ran pre-commit locally (pre-commit install, pre-commit run --all)
Verified copyright is correct on all changed files.
Added succinct git squash message before merging ref.
All template sections are filled out.
Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

feat

Related PRs:

Where should the reviewer start?

python/openai/openai_frontend/engine/triton_engine.py: Tokenizer override logic introduced here.

Test plan:

Ran frontend with:

python3 openai_frontend/main.py --model-repository tests/vllm_models


JunmooByun added 2 commits July 31, 2025 13:20

cde2484 Add tokenizer override logic for mapped HF model names
36d7b43 Merge branch 'main' into feat/tokenizer-override

JunmooByun marked this pull request as draft August 4, 2025 01:19

JunmooByun marked this pull request as ready for review August 4, 2025 01:29

JunmooByun (Author) commented Aug 4, 2025

This PR was created from a forked repository.

The branch has been updated to the latest main.
Currently, workflow approval and a code review are required.

Could you please:

Approve and run the workflows
Review and approve the PR

Thanks for your time!


Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Milestone

No milestone


1 participant
