Efficient Inference for LLMs & MLLMs
An open-source research project from Alibaba Cloud dedicated to efficient large language model inference.
- Key Features
- Latest Updates
- Installation
- Quick Start
- Benchmarks
- Publications
- Contributing
- License
- Contact
EfficientAI focuses on inference-time optimizations for LLMs and MLLMs:
| Feature | Description | Status |
|---|---|---|
| Activation Sparsity | Dynamic sparsity methods for faster inference | LaRoSa (ICML 2025) |
| Quantization | Post-training and quantization-aware techniques for MLLMs | MASQuant (CVPR 2026) |
| Agentic Reasoning | Efficient tool-use and reasoning frameworks | D-CORE (ICML 2026) |
| Reproducible Benchmarks | Standardized evaluation pipelines for research and production | In Progress |
Changelog
- [2026-05] D-CORE accepted to ICML 2026: efficient tool-use reasoning via dynamic computation routing
  Paper | Code | Demo
- [2026-03] MASQuant accepted to CVPR 2026: multimodal LLM post-training quantization (PTQ) algorithm with a SOTA accuracy-efficiency tradeoff
  Paper | Code
- [2026-01] LaRoSa accepted to ICML 2025: training-free activation sparsity for LLM acceleration
  Paper | Code
```bash
# Clone the repository
git clone https://github.com/alibaba/EfficientAI.git
cd EfficientAI

# Install dependencies (recommended: use conda)
pip install -r requirements.txt

# Optional: Install with specific module support
# pip install -e ".[larosa]"    # for LaRoSa
# pip install -e ".[masquant]"  # for MASQuant
```