
On-device Inference with LiteRT

The LiteRT CompiledModel API is the modern standard for on-device ML inference, offering streamlined hardware acceleration that significantly outperforms the Interpreter API. It simplifies deploying .tflite models across a wide range of edge platforms by providing a unified developer experience and advanced features designed for maximum hardware efficiency.
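For orientation, here is a minimal Kotlin sketch of the typical flow on Android: compile a model, allocate I/O buffers, run inference, and read back the result. The asset name and input array are placeholders, and the method names follow the LiteRT Next Kotlin API as documented at the time of writing, so treat this as illustrative rather than canonical.

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Minimal sketch of the CompiledModel flow on Android. "model.tflite" and the
// FloatArray input are placeholders; see the platform guides for full setup.
fun runOnce(context: android.content.Context, input: FloatArray): FloatArray {
    // Compile the .tflite model for the CPU backend.
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.CPU),
    )

    // Allocate TensorBuffers sized from the model's signature.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    // Write the input, run inference, and read the first output back.
    inputBuffers[0].writeFloat(input)
    model.run(inputBuffers, outputBuffers)
    val output = outputBuffers[0].readFloat()

    // Release native buffer and model resources.
    inputBuffers.forEach { it.close() }
    outputBuffers.forEach { it.close() }
    model.close()
    return output
}
```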

Why Choose the CompiledModel API?

While the Interpreter API remains available for backward compatibility, the CompiledModel API is where new performance and accelerator features are prioritized. It is the recommended choice for these reasons:

  • Best-in-class GPU acceleration: Leverages ML Drift, the state-of-the-art GPU acceleration library, to deliver reliable GPU inference across mobile, web, desktop, and IoT devices. See GPU acceleration with LiteRT.

  • Unified NPU access: Provides a single, consistent developer experience for accessing NPUs from providers such as Google Tensor, Qualcomm, and MediaTek, abstracting away vendor-specific compilers and runtime complexities. See NPU acceleration with LiteRT.

  • Automated hardware selection: Automatically selects the optimal backend among CPU, GPU, and NPU based on available hardware and internal priority logic, eliminating the need for manual delegate configuration (see the sketch after this list).

  • Asynchronous execution: Uses OS-level mechanisms such as sync fences so that hardware accelerators can start work as soon as earlier tasks complete, without involving the CPU. This can reduce latency by up to 2x and delivers a smoother, more interactive AI experience.

  • Efficient I/O buffer management: Leverages the TensorBuffer API to manage high-performance data flow between accelerators. This includes zero-copy buffer interop across AHardwareBuffer, OpenCL, and OpenGL, eliminating costly data copies between preprocessing, inference, and post-processing stages.
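To make the selection behavior above concrete, the sketch below contrasts the automatic default with an explicit accelerator request. The Accelerator enum values appear in the LiteRT Next Kotlin API, but the no-options overload of create() is an assumption on my part; verify both against the current reference.

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Default: let the runtime pick the backend via its internal priority logic.
// (Assumes an overload of create() without an Options argument.)
val autoModel = CompiledModel.create(context.assets, "model.tflite")

// Explicit: request the GPU backend, falling back to CPU if compilation fails.
val model = try {
    CompiledModel.create(
        context.assets,
        "model.tflite",                          // placeholder asset name
        CompiledModel.Options(Accelerator.GPU),
    )
} catch (e: Exception) {
    // GPU backend unavailable on this device; fall back to the CPU path.
    CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.CPU),
    )
}
```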

Get Started with the CompiledModel API

Supported platforms

The LiteRT CompiledModel API supports high-performance inference across Android, iOS, Web, IoT, and desktop devices. See the platform-specific guides.
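On Android, for example, the runtime ships as a Maven artifact added through Gradle. The coordinates below match the published LiteRT artifact, but the version string is illustrative; pin it to the latest release named in the documentation.

```kotlin
// build.gradle.kts (module level). The version is a placeholder; use the
// current release listed in the LiteRT documentation.
dependencies {
    implementation("com.google.ai.edge.litert:litert:2.0.0-alpha")
}
```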
