RecurrentGemma

RecurrentGemma is an open model based on Griffin, a hybrid model architecture that mixes gated linear recurrences with local sliding window attention.
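
To make the core idea concrete, here is a minimal, illustrative sketch of a gated linear recurrence in JAX. It is not the Griffin block itself: the gating scheme, shapes, and function names are assumptions made for illustration, and the real model combines such recurrences with local sliding window attention and learned projections.

```python
# Minimal sketch (not the Griffin implementation) of a gated linear
# recurrence: h_t = a_t * h_{t-1} + b_t * x_t, where the gates a_t and b_t
# would normally be computed from the input. The carried state has a fixed
# size, so memory does not grow with sequence length. All shapes and the
# gating scheme here are illustrative assumptions.
import jax
import jax.numpy as jnp


def gated_linear_recurrence(x, a, b):
    """x, a, b: arrays of shape (seq_len, hidden); returns all hidden states."""

    def step(h_prev, inputs):
        x_t, a_t, b_t = inputs
        h_t = a_t * h_prev + b_t * x_t  # element-wise gated update
        return h_t, h_t

    h0 = jnp.zeros_like(x[0])
    _, hs = jax.lax.scan(step, h0, (x, a, b))
    return hs


key = jax.random.PRNGKey(0)
seq_len, hidden = 8, 4
x = jax.random.normal(key, (seq_len, hidden))
# Gates in (0, 1), e.g. produced by a sigmoid over learned projections.
a = jax.nn.sigmoid(jax.random.normal(key, (seq_len, hidden)))
b = 1.0 - a
print(gated_linear_recurrence(x, a, b).shape)  # (8, 4)
```

Because the carried state has a fixed size, the memory needed while processing or generating a sequence does not grow with its length, which is what drives the advantages listed below.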

Like Gemma, RecurrentGemma is well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. However, RecurrentGemma's unique architecture offers the following additional advantages:

  • Reduced memory usage

    Lower memory requirements allow for the generation of longer samples on devices with limited memory, such as single GPUs or CPUs (see the usage sketch after this list).
  • Higher throughput

    RecurrentGemma can perform inference at significantly higher batch sizes, meaning it can generate substantially more tokens per second, especially when generating long sequences.
  • High performance

    RecurrentGemma matches Gemma's performance while requiring less memory and achieving faster inference.
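
As a rough illustration of how this plays out in practice, the sketch below loads a RecurrentGemma checkpoint through the Hugging Face Transformers library and generates a long completion on a single accelerator. The checkpoint name, generation settings, and the use of Transformers (rather than the reference JAX or PyTorch notebooks linked below) are assumptions for illustration; consult the model card and Kaggle page for the officially supported workflows.

```python
# Illustrative sketch (assumed checkpoint name and settings): generate a long
# completion with a RecurrentGemma model via the Hugging Face Transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b-it"  # assumed; check Kaggle / model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain why recurrent architectures can be memory-efficient."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The fixed-size recurrent state (plus a bounded local-attention window) keeps
# memory use roughly flat as max_new_tokens grows, enabling long generations
# on devices with limited memory.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same property is what allows the batch size in a loop like this to be raised further than with a comparable attention-only model, which is where the throughput advantage comes from.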

More resources

View the model card

RecurrentGemma's model card contains detailed information about the model, its implementation, evaluations, usage and limitations, and more.

View on Kaggle

View more code, Colab notebooks, information, and discussions about RecurrentGemma on Kaggle.

Run on GitHub

Run example Colab notebooks for JAX and PyTorch on GitHub.
