GPU acceleration delegate with Interpreter API
Using graphics processing units (GPUs) to run your machine learning (ML) models can dramatically improve the performance and the user experience of your ML-enabled applications. On Android devices, you can enable GPU-accelerated execution of your models using a delegate and one of the following APIs:
- Interpreter API - this guide
- Native (C/C++) API - guide
This page describes how to enable GPU acceleration for LiteRT models in Android apps using the Interpreter API. For more information about using the GPU delegate for LiteRT, including best practices and advanced techniques, see the GPU delegates page.
Use GPU with LiteRT with Google Play services
The LiteRT Interpreter API provides a set of general purpose APIs for building machine learning applications. This section describes how to use the GPU accelerator delegate with these APIs through LiteRT with Google Play services.
LiteRT with Google Play services is the recommended path to use LiteRT on Android. If your application is targeting devices not running Google Play, see the Use GPU with standalone LiteRT section.
Add project dependencies (with .toml version catalog)
- Update your project's libs.versions.toml file:
[libraries]
...
tflite-gpu = { module = "com.google.ai.edge.litert:litert-gpu", version = "2.X.Y" }
tflite-gpu-api = { module = "com.google.ai.edge.litert:litert-gpu-api", version = "2.X.Y" }
...
- Add project dependencies in the app's build.gradle.kts:
dependencies {
    ...
    implementation(libs.tflite.gpu)
    implementation(libs.tflite.gpu.api)
    ...
}
Add project dependencies
To enable access to the GPU delegate, add
com.google.android.gms:play-services-tflite-gpu to your app's build.gradle
file:
dependencies {
    ...
    implementation 'com.google.android.gms:play-services-tflite-java:16.4.0'
    implementation 'com.google.android.gms:play-services-tflite-gpu:16.4.0'
}
Enable GPU acceleration
Then initialize LiteRT with Google Play services with GPU support:
Kotlin
val useGpuTask = TfLiteGpu.isGpuDelegateAvailable(context)

val initializeTask = useGpuTask.continueWithTask { task ->
    TfLite.initialize(context,
        TfLiteInitializationOptions.builder()
            .setEnableGpuDelegateSupport(task.result)
            .build())
}
Java
Task<Boolean> useGpuTask = TfLiteGpu.isGpuDelegateAvailable(context);

Task<Void> initializeTask = useGpuTask.continueWithTask(task ->
    TfLite.initialize(context,
        TfLiteInitializationOptions.builder()
            .setEnableGpuDelegateSupport(task.getResult())
            .build()));
Finally, you can create the interpreter, passing a GpuDelegateFactory
through InterpreterApi.Options:
Kotlin
val options = InterpreterApi.Options()
    .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
    .addDelegateFactory(GpuDelegateFactory())

val interpreter = InterpreterApi.create(model, options)

// Run inference
writeToInput(input)
interpreter.run(input, output)
readFromOutput(output)
Java
InterpreterApi.Options options = new InterpreterApi.Options()
    .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
    .addDelegateFactory(new GpuDelegateFactory());

InterpreterApi interpreter = InterpreterApi.create(model, options);

// Run inference
writeToInput(input);
interpreter.run(input, output);
readFromOutput(output);
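Note that TfLite.initialize() is asynchronous: it returns a Play services Task, and the interpreter should only be created after that task has completed. Below is a minimal Kotlin sketch, not part of the original guide, of one way to chain initialization and interpreter creation; model is assumed to already hold your model file or buffer:
Kotlin
TfLiteGpu.isGpuDelegateAvailable(context)
    .continueWithTask { task ->
        // Initialize LiteRT in Play services, enabling GPU support if available
        TfLite.initialize(context,
            TfLiteInitializationOptions.builder()
                .setEnableGpuDelegateSupport(task.result)
                .build())
    }
    .addOnSuccessListener {
        // Initialization has finished; it is now safe to create the interpreter.
        // `model` is assumed to be your model File or ByteBuffer.
        val options = InterpreterApi.Options()
            .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
            .addDelegateFactory(GpuDelegateFactory())
        val interpreter = InterpreterApi.create(model, options)
        // ... run inference with the interpreter ...
    }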
The GPU delegate can also be used with ML model binding in Android Studio. For more information, see Generate model interfaces using metadata.
Use GPU with standalone LiteRT
If your application targets devices which are not running Google Play, it is possible to bundle the GPU delegate with your application and use it with the standalone version of LiteRT.
Add project dependencies
To enable access to the GPU delegate, add the
com.google.ai.edge.litert:litert-gpu and com.google.ai.edge.litert:litert-gpu-api
packages to your app's build.gradle file:
dependencies {
    ...
    implementation 'com.google.ai.edge.litert:litert'
    implementation 'com.google.ai.edge.litert:litert-gpu'
    implementation 'com.google.ai.edge.litert:litert-gpu-api'
}
Enable GPU acceleration
Then run LiteRT on GPU with TfLiteDelegate. In Java, you can specify
the GpuDelegate through Interpreter.Options.
Kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate

val compatList = CompatibilityList()

val options = Interpreter.Options().apply {
    if (compatList.isDelegateSupportedOnThisDevice) {
        // if the device has a supported GPU, add the GPU delegate
        val delegateOptions = compatList.bestOptionsForThisDevice
        this.addDelegate(GpuDelegate(delegateOptions))
    } else {
        // if the GPU is not supported, run on 4 threads
        this.setNumThreads(4)
    }
}

val interpreter = Interpreter(model, options)

// Run inference
writeToInput(input)
interpreter.run(input, output)
readFromOutput(output)
Java
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.CompatibilityList;
import org.tensorflow.lite.gpu.GpuDelegate;

// Initialize interpreter with GPU delegate
Interpreter.Options options = new Interpreter.Options();
CompatibilityList compatList = new CompatibilityList();

if (compatList.isDelegateSupportedOnThisDevice()) {
    // if the device has a supported GPU, add the GPU delegate
    GpuDelegate.Options delegateOptions = compatList.getBestOptionsForThisDevice();
    GpuDelegate gpuDelegate = new GpuDelegate(delegateOptions);
    options.addDelegate(gpuDelegate);
} else {
    // if the GPU is not supported, run on 4 threads
    options.setNumThreads(4);
}

Interpreter interpreter = new Interpreter(model, options);

// Run inference
writeToInput(input);
interpreter.run(input, output);
readFromOutput(output);
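Because the standalone delegate is bundled with your app, you are also responsible for releasing its native resources once inference is finished. The following Kotlin sketch, not part of the original guide, assumes you keep a reference to the GpuDelegate rather than creating it inline as above; both Interpreter and GpuDelegate expose a close() method:
Kotlin
var gpuDelegate: GpuDelegate? = null
val compatList = CompatibilityList()

val options = Interpreter.Options().apply {
    if (compatList.isDelegateSupportedOnThisDevice) {
        // keep a reference to the delegate so it can be closed later
        val delegate = GpuDelegate(compatList.bestOptionsForThisDevice)
        gpuDelegate = delegate
        addDelegate(delegate)
    } else {
        setNumThreads(4)
    }
}

// `model` is assumed to be your model File or ByteBuffer
val interpreter = Interpreter(model, options)

// ... run inference ...

// Close the interpreter first, then the delegate it was using
interpreter.close()
gpuDelegate?.close()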
Quantized models
Android GPU delegate libraries support quantized models by default. You do not have to make any code changes to use quantized models with the GPU delegate. The following section explains how to disable quantized model support for testing or experimental purposes.
Disable quantized model support
The following code shows how to disable support for quantized models.
Java
GpuDelegate delegate = new GpuDelegate(new GpuDelegate.Options().setQuantizedModelsAllowed(false));

Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
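The same setting translates directly to Kotlin; a minimal sketch:
Kotlin
// Sketch: disable quantized model support on the GPU delegate
val delegate = GpuDelegate(GpuDelegate.Options().setQuantizedModelsAllowed(false))
val options = Interpreter.Options().addDelegate(delegate)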
For more information about running quantized models with GPU acceleration, see GPU delegate overview.