Store LLM responses for AI applications in Redis Cloud.
LangCache on Redis Cloud is currently available as a public preview. Features and behavior are subject to change.
LangCache is a semantic caching service, available as a REST API, that stores LLM responses for faster and cheaper retrieval. It is built on the Redis vector database. By using semantic caching, you can significantly reduce API costs and lower the average latency of your generative AI applications.
For more information about how LangCache works, see the LangCache overview.
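To get a feel for how an application talks to the service, the sketch below searches a cache over HTTP with Python's `requests` library. The environment variables, endpoint path, and request fields shown here are illustrative assumptions; check the LangCache API reference for the exact contract.

```python
import os

import requests

# Assumed configuration: the service URL, cache ID, and API key come from your
# LangCache service in Redis Cloud. The endpoint path and JSON fields below are
# illustrative; see the LangCache API reference for the exact contract.
BASE_URL = os.environ["LANGCACHE_URL"] # e.g. "https://<your-langcache-host>"
CACHE_ID = os.environ["LANGCACHE_CACHE_ID"]
API_KEY = os.environ["LANGCACHE_API_KEY"]

def search_cache(prompt: str):
 """Ask LangCache for cached responses that are semantically similar to the prompt."""
 resp = requests.post(
 f"{BASE_URL}/v1/caches/{CACHE_ID}/entries/search",
 headers={"Authorization": f"Bearer {API_KEY}"},
 json={"prompt": prompt},
 timeout=10,
 )
 resp.raise_for_status()
 return resp.json() # inspect the matches; an empty result means a cache miss

print(search_cache("How do I reset my password?"))
```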
LangCache reduces your LLM costs by caching responses and avoiding repeated API calls. When a response is served from cache, you don't pay for output tokens. Input token savings are typically offset by embedding and storage costs, which is why the estimate below counts only output tokens.
For every cached response, you'll save the output token cost. To calculate your monthly savings with LangCache, you can use the following formula:
Est. monthly savings with LangCache =
(Monthly output token costs) × (Cache hit rate)
The more requests you serve from LangCache, the more you save, because you’re not paying to regenerate the output.
Here’s an example:
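Suppose your application spends 200ドル per month on output tokens and the cache serves 30% of requests (both figures are purely illustrative). Your estimated monthly savings would be 200ドル × 0.30 = 60,ドル or about 720ドル per year. The same arithmetic scales with whatever output token spend and cache hit rate you observe.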
You can also use the LangCache savings calculator to estimate your annual savings with LangCache.
To set up LangCache on Redis Cloud:
After you set up LangCache, you can view and edit the cache and monitor the cache's performance.
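Once the cache is set up, applications typically follow a cache-aside pattern: search the cache first, and only call the LLM and store the new prompt/response pair on a miss. The sketch below illustrates that flow; the endpoint paths, field names, response shape, and the `call_llm` placeholder are assumptions, so adapt them to the LangCache API reference.

```python
import os

import requests

BASE_URL = os.environ["LANGCACHE_URL"]
CACHE_ID = os.environ["LANGCACHE_CACHE_ID"]
HEADERS = {"Authorization": f"Bearer {os.environ['LANGCACHE_API_KEY']}"}

def call_llm(prompt: str) -> str:
 # Placeholder: replace with your actual LLM client call.
 return f"(generated answer for: {prompt})"

def answer(prompt: str) -> str:
 # 1. Check the cache for a semantically similar prompt.
 search = requests.post(
 f"{BASE_URL}/v1/caches/{CACHE_ID}/entries/search",
 headers=HEADERS,
 json={"prompt": prompt},
 timeout=10,
 )
 search.raise_for_status()
 matches = search.json() # assumed shape: a list of matching entries
 if matches:
 return matches[0]["response"]

 # 2. Cache miss: generate with the LLM, then store the pair for next time.
 response = call_llm(prompt)
 requests.post(
 f"{BASE_URL}/v1/caches/{CACHE_ID}/entries",
 headers=HEADERS,
 json={"prompt": prompt, "response": response},
 timeout=10,
 ).raise_for_status()
 return response
```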