

Welcome to llm-d: a Kubernetes-native high-performance distributed LLM inference framework


Community: Slack · X (formerly Twitter) · Bluesky · LinkedIn · Reddit · YouTube

llm-d is a Kubernetes-native high-performance distributed LLM inference framework that provides the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d offers modular solutions for distributed inference with features like KV-cache aware routing and disaggregated serving.

🚀 Quick Start Guide

New to llm-d? Here's how to get started:

  1. Join our Slack 💬: get your invite and visit llm-d.slack.com
  2. Explore our code 📂: browse the GitHub Organization
  3. Join a meeting 📅: add the community calendar
  4. Pick your area 🎯: browse the Special Interest Groups (SIGs)

📚 Key Resources

💬 Communication Channels

🗓️ Regular Meetings

All meetings are open to the public! 🌟

  • 📅 Weekly Standup: Every Wednesday at 12:30pm ET - Project updates and open discussion
  • 🎯 SIG Meetings: Various times throughout the week - See SIG details for schedules

Join to participate, ask questions, or just listen and learn!

🎯 Special Interest Groups (SIGs)

Want to dive deeper into specific areas? Our Special Interest Groups are focused teams working on different aspects of llm-d:

  • Inference Scheduler - Intelligent request routing and load balancing
  • Benchmarking - Performance testing and optimization
  • PD-Disaggregation - Prefill/decode separation patterns
  • KV-Disaggregation - KV caching and distributed storage
  • Installation - Kubernetes integration and deployment
  • Autoscaling - Traffic-aware autoscaling and resource management
  • Observability - Monitoring, logging, and metrics

View more SIG Details →

🤝 How to Contribute

Getting Involved

Contributing Code

  1. Read Guidelines: Review our Code of Conduct and contribution process
  2. Sign Commits: All commits require DCO sign-off (git commit -s); see the example below
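
A minimal shell sketch of a signed-off contribution is shown below. The repository URL assumes the main llm-d repo under the llm-d organization, and the branch name, file, and commit message are purely illustrative; only the `-s` flag is the required part.

```shell
# Assumes the main llm-d repository; any component repo in the org works the same way.
git clone https://github.com/llm-d/llm-d.git
cd llm-d
git checkout -b my-fix            # branch name is illustrative

# ...edit files, then stage them...
git add README.md

# -s appends a "Signed-off-by: Your Name <you@example.com>" trailer
# taken from your git user.name / user.email configuration.
git commit -s -m "Fix typo in README"

# Verify the trailer is present before opening a PR.
git log -1 --format=%B
```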

Ways to Contribute

  • 🐛 Bug fixes and small features - Submit PRs directly to component repos
  • 🚀 New features with APIs - Require project proposals
  • 📚 Documentation - Help improve guides and examples
  • 🧪 Testing & Benchmarking - Contribute to our test coverage
  • 💡 Experimental features - Start in llm-d-incubation org

🔒 Security & Safety

🌐 Connect With Us

Follow llm-d across social platforms for updates, discussions, and community highlights:

❓ Need Help?

Questions? Ideas? Just want to chat? We're here to help! The llm-d community team is friendly and responsive.


License: Apache 2.0

Pinned repositories

  1. llm-d: Achieve state-of-the-art inference performance with modern accelerators on Kubernetes (Shell · 2.5k stars · 312 forks)
  2. llm-d-inference-scheduler: Inference scheduler for llm-d (Go · 128 stars · 122 forks)
  3. llm-d-kv-cache: Distributed KV cache scheduling & offloading libraries (Go · 104 stars · 80 forks)
  4. llm-d-benchmark: llm-d benchmark scripts and tooling (Jupyter Notebook · 47 stars · 47 forks)
