Physics-based LLM capacity planner - size a deployment before touching hardware. Roofline model → replica estimate + cost + confidence. Benchmark harness validates against real endpoints. Compares on latency, cost, and quality.
benchmark nextjs capacity-planning mlops roofline-model fastapi openai-api llm llm-serving vllm llm-inference sglang gpu-sizing
Updated Jun 15, 2026 - Python