SGLang

Open-source framework for large language model inference

SGLang

Developer	LMSYS
Initial release	January 17, 2024; 2 years ago (2024年01月17日)
Written in	Python, Rust, CUDA, C++
Type	Large language model inference engine
License	Apache License 2.0
Website	sglang.io
Repository	github.com/sgl-project/sglang

SGLang (short for Structured Generation Language) is an open-source framework for programming and serving large language models and multimodal models. It was introduced by researchers affiliated with LMSYS^[1] and other institutions as a system combining a Python-embedded language for structured generation with a runtime for high-throughput inference.^[2]^[3]^[4]

The project is designed for low latency and high-throughput inference workloads, and its documentation describes support for features such as structured outputs, speculative decoding, continuous batching, quantization, and compatibility with OpenAI-style APIs.^[5]

History

[edit ]

SGLang was publicly introduced in January 2024 by researchers affiliated with Stanford, UC Berkeley, Texas A&M, and Shanghai Jiao Tong University.^[2] Its academic description later appeared in the proceedings of NeurIPS 2024.^[3] In January 2026, TechCrunch reported that contributors associated with the project had formed the startup RadixArk to commercialize services around SGLang while continuing its open-source development.^[6]^[7]

Architecture

[edit ]

According to the NeurIPS paper, SGLang consists of two main components: a front-end language embedded in Python and a back-end runtime for executing language model programs efficiently.^[3] The front end provides primitives for generation, selection, and parallel control flow, while the runtime uses a set of optimizations intended to reduce repeated computation and improve throughput.^[3]

Among the techniques described by the project are RadixAttention for reusing key–value cache state across multiple generation calls, compressed finite-state machines for faster constrained decoding, and speculative execution for API-based models.^[3] The current documentation also describes support for serving both language models and multimodal models across a range of hardware back ends.^[5]

References

[edit ]

^ "LMSYS". GitHub. GitHub, Inc. Retrieved April 22, 2026.
^ ^a ^b "Fast and Expressive LLM Inference with RadixAttention and SGLang". LMSYS Org. January 17, 2024. Retrieved April 19, 2026.
^ ^a ^b ^c ^d ^e Zheng, Lianmin; Yin, Liangsheng; Xie, Zhiqiang; Sun, Chuyue; Huang, Jeff; Yu, Cody Hao; Cao, Shiyi; Kozyrakis, Christos; Stoica, Ion; Gonzalez, Joseph E.; Barrett, Clark; Sheng, Ying (2024). SGLang: Efficient Execution of Structured Language Model Programs (PDF). Advances in Neural Information Processing Systems 37. Retrieved April 19, 2026.
^ "SGLang". UC Berkeley Sky Computing Lab. April 25, 2024. Retrieved April 22, 2026.
^ ^a ^b "SGLang Documentation". SGLang. Retrieved April 19, 2026.
^ Hu, Krystal (January 21, 2026). "Sources: Project SGLang spins out as RadixArk with 400ドルM valuation as inference market explodes". TechCrunch. Retrieved April 19, 2026.
^ R, Vignesh (January 23, 2026). "From Berkeley lab to 400ドルM startup: SGLang becomes RadixArk". TFN. Retrieved April 22, 2026.

External links

[edit ]

v
t
e

Large language models (LLMs)

Concepts

Training, prompting, and alignment

Models

Early and encoder models	Word2vec Seq2seq GloVe BERT XLNet T5
GPT series	GPT GPT-1 GPT-2 GPT-3 GPT-4 GPT-4.1 GPT-4.5 GPT-4o GPT-5 GPT-5.5 Codex OpenAI o1 OpenAI o3 OpenAI o4-mini
Other model families	BLOOM Chinchilla Claude DBRX Gemini Gemma GPT-J PanGu Granite Jais LaMDA Llama Vicuna Minerva Mistral and Mixtral Nemotron PaLM Phi Qwen Xiaomi MiMo

Chatbots and assistants

Agents, coding, and applications

Software

Hardware and infrastructure

Benchmarks, evaluation, and detection

Datasets and data

Organizations

People

Social, economic, and governance

Category:Large language models

Retrieved from "https://en.wikipedia.org/w/index.php?title=SGLang&oldid=1351485050"

History

Architecture

See also

References

External links