Documentation Hugging Face License Crates.io
π Complete Documentation | π Quick Start | ποΈ Architecture | π API Reference
An Mixture-of-Models (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on Semantic Understanding of the request's intent (Complexity, Task, Tools).
This is achieved using BERT classification. Conceptually similar to Mixture-of-Experts (MoE) which lives within a model, this system selects the best entire model for the nature of the task.
As such, the overall inference accuracy is improved by using a pool of models that are better suited for different types of tasks:
The screenshot below shows the LLM Router dashboard in Grafana.
The router is implemented in two ways: Golang (with Rust FFI based on Candle) and Python. Benchmarking will be conducted to determine the best implementation.
Select the tools to use based on the prompt, avoiding the use of tools that are not relevant to the prompt so as to reduce the number of prompt tokens and improve tool selection accuracy by the LLM.
Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the privacy of the user.
Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.
Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.
For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
π Complete Documentation at Read the Docs
The documentation includes:
- Installation Guide - Complete setup instructions
- System Architecture - Technical deep dive
- Model Training - How classification models work
- API Reference - Complete API documentation
For questions, feedback, or to contribute, please join #semantic-router
channel in vLLM Slack.
If you find Semantic Router helpful in your research or projects, please consider citing it:
@misc{semanticrouter2025,
title={vLLM Semantic Router},
author={vLLM Semantic Router Team},
year={2025},
howpublished={\url{https://github.com/vllm-project/semantic-router}},
}