GitHub - vllm-project/semantic-router: Intelligent Mixture-of-Models Router for Efficient LLM Inference

vllm-project/semantic-router

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 240 Commits
.github		.github
bench		bench
candle-binding		candle-binding
config		config
deploy		deploy
docker		docker
e2e-tests		e2e-tests
src		src
website		website
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODEOWNERS		CODEOWNERS
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Dockerfile.extproc		Dockerfile.extproc
LICENSE		LICENSE
Makefile		Makefile
OWNER		OWNER
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Repository files navigation

vLLM Semantic Router

Documentation Hugging Face License Crates.io

📚 Complete Documentation | 🚀 Quick Start | 🏗️ Architecture | 📖 API Reference

Innovations ✨

Intelligent Routing 🧠

Auto-Reasoning and Auto-Selection of Models

An Mixture-of-Models (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on Semantic Understanding of the request's intent (Complexity, Task, Tools).

This is achieved using BERT classification. Conceptually similar to Mixture-of-Experts (MoE) which lives within a model, this system selects the best entire model for the nature of the task.

As such, the overall inference accuracy is improved by using a pool of models that are better suited for different types of tasks:

Model Accuracy

The screenshot below shows the LLM Router dashboard in Grafana.

LLM Router Dashboard

The router is implemented in two ways: Golang (with Rust FFI based on Candle) and Python. Benchmarking will be conducted to determine the best implementation.

Auto-Selection of Tools

Select the tools to use based on the prompt, avoiding the use of tools that are not relevant to the prompt so as to reduce the number of prompt tokens and improve tool selection accuracy by the LLM.

Enterprise Security 🔒

PII detection

Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the privacy of the user.

Prompt guard

Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.

Similarity Caching ⚡️

Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.

Documentation 📖

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

👉 Complete Documentation at Read the Docs

The documentation includes:

Installation Guide - Complete setup instructions
System Architecture - Technical deep dive
Model Training - How classification models work
API Reference - Complete API documentation

Community 👋

For questions, feedback, or to contribute, please join #semantic-router channel in vLLM Slack.

Citation

If you find Semantic Router helpful in your research or projects, please consider citing it:

@misc{semanticrouter2025,
 title={vLLM Semantic Router},
 author={vLLM Semantic Router Team},
 year={2025},
 howpublished={\url{https://github.com/vllm-project/semantic-router}},
}

About

Intelligent Mixture-of-Models Router for Efficient LLM Inference

vllm-semantic-router.com

Code of conduct

Contributing

Activity

Custom properties

Stars

464 stars

Watchers

6 watching

Forks

40 forks

Report repository

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

vllm-project/semantic-router

Folders and files

Latest commit

History

Repository files navigation

Innovations ✨

Intelligent Routing 🧠

Auto-Reasoning and Auto-Selection of Models

Auto-Selection of Tools

Enterprise Security 🔒

PII detection

Prompt guard

Similarity Caching ⚡️

Documentation 📖

Community 👋

Citation

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Uh oh!

Contributors 10

Uh oh!

Languages

License

vllm-project/semantic-router

Folders and files

Latest commit

History

Repository files navigation

Innovations ✨

Intelligent Routing 🧠

Auto-Reasoning and Auto-Selection of Models

Auto-Selection of Tools

Enterprise Security 🔒

PII detection

Prompt guard

Similarity Caching ⚡️

Documentation 📖

Community 👋

Citation

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 10

Uh oh!

Languages

Packages