- Large-scale Inference Scheduling: Optimizing GPU utilization across clusters.
- Multi-model Routing: Intelligent traffic governance for heterogeneous models.
- Production LLMOps: Moving from "demo" to "day-2" operations.
- Cloud-Native Abstractions: Defining standard APIs for AI workloads.
- semantic-router (Committer, Current Focus): Defining the decision-making layer for multi-model LLM serving.
- llm-d : Building the cloud-native infrastructure for disaggregated LLM inference.
- HAMi & Kueue : Kubernetes-native GPU virtualization and batch scheduling.
- Istio : Standardizing traffic governance for service mesh.
- Karmada & Kubernetes : Essential for multi-cluster orchestration at scale.
I build tools to fix my own problems.
- Chrome TabBoost : Browser tab overload is a bug. I patched it with an extension.
- MacMusicPlayer : A minimalist, clean music player for macOS.
- ConfigForge : Manage `~/.ssh/config` and `kubeconfig` without the headache.
- gmc : AI-powered Git commit messages.
- mdctl : AI-powered Markdown workflow automation.
- hf-model-downloader : Painless Hugging Face model downloads.
- LogoWallpaper : Generating brand assets shouldn't take 30 minutes.
- SaveEye : A minimalist eye care reminder that doesn't annoy you.
- homebrew-tap : I deliver binaries. I don't just dump code. (`brew tap samzong/homebrew-tap`)
- mirrormate : Docker pulls failing? I fixed it with magic.
- swagger-online : Unified Swagger UI. No more tab chaos.
- ai-icon-generator : I needed icons, so I built a generator.