-
Notifications
You must be signed in to change notification settings - Fork 578
Releases: google-ai-edge/LiteRT-LM
v0.13.1
v0.13.0
🔥 What's New (v0.13.0)
- 🚀 Agent skill support: Support creating a standalone LiteRT-LM Android demo app with backend selection and multi-modality support. See README.md for the example prompt.
- LiteRT LM CLI update: Support OpenAI API Compatible server (doc)
- Swift package for MacOS: Swift package supports MacOS besides iOS now.
Assets 4
v0.12.0
🔥 What's New (v0.12.0)
- 🚀 Swift APIs: Natively integrate LiteRT-LM into iOS applications with Metal GPU acceleration.
- 🚀 Web JavaScript APIs: Run models inside web browsers with high performance via web GPU/CPU.
- LiteRT-LM CLI Update: The command-line interface now supports NPU, besides CPU and GPU backends across Linux, macOS, and Windows.
- 🚀 Community-Maintained Flutter APIs: Build cross-platform Flutter applications using the community flutter_gemma package.
Features and bug fixes:
CLI
- [feature] NPU support for Intel OpenVINO with --backend=npu.
- [feature] Add --max-num-tokens (context length) to benchmark
- [bugfix] Pin CLI version with API version. (0.12.0 CLI uses 0.12.0 API)
Python API
- [feature] NPU support for Intel OpenVINO.
- [feature] New API to construct Message object.
- [bugfix] Correct the GPU activation type. Prefill speed back to normal (was limited to 50%).
- [bugfix] Propagate cache_dir to vision and audio backend.
Assets 4
v0.11.0
🔥 What's New: v0.11.0
-
Gemma 4 Multi-token Prediction (MTP) Support: Supercharge Gemma 4 on-device inference with Single Position Multi Token Prediction (MTP), delivering >2x faster decode speeds on mobile GPUs with zero quality degradation (blog, documentation).
-
Windows Native Support: The LiteRT-LM CLI now runs natively on Windows with both CPU and GPU backend support.
Assets 7
v0.11.0-rc.1
Release candidate for 0.11.0
Bug fixes
Assets 8
v0.10.2
- Various Bug fixes
- Improve the UI smoothness
Assets 2
v0.10.1
🔥 Gemma 4 support
Deploy Gemma 4 across a broad range of hardware with stellar performance (blog).
👉 Try on Linux, macOS, Windows (WSL) or Raspberry Pi with the
LiteRT-LM CLI:
litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--prompt="What is the capital of France?"Release Notes
- CLI Enhancements & Migration: Migrated the CLI from
firetoclick, adding features like--verbose,--version, improved help formatting, and enhanced terminal output styling (#1784, #1733, #1791, #1792). - Hugging Face Integration: Added support for importing models directly from Hugging Face and implemented auto-conversion for missing models during "run" commands (#1797, #1735).
- Core Performance & Features: Introduced a LiteRT-based KV cache implementation, speculative decoding support, and improved context merging for conversation history (#1601, #1793, #1742).
- Platform & Build Improvements: Refactored CMake for better Android/cross-compilation support, updated the Windows build with a CPU sampler workaround, and transitioned nightly releases to Ubuntu-22.04 (#1741, #1734, #1772).
- API & Documentation: Expanded the Kotlin API for response channel configuration and launched new Python API resources, including a "Getting Started" guide and a Colab notebook (#1724, #1737, #1757).
Assets 9
v0.9.0
Android & iOS Update
-
Performance Optimizations: Significant improvements to app initialization speed and memory management.
-
Bug Fixes: General stability enhancements for a smoother user experience.
Assets 7
v0.9.0-rc
Android / iOS release
With many bug fixes and performance improvements.
Assets 7
v0.9.0-beta
Beta release for v0.9.0.