semantic-caching

Here are 31 public repositories matching this topic...

Languages: Python 20, JavaScript 4, Go 2, Rust 2, Java 1, Jupyter Notebook 1, TypeScript 1

Sorted by: Most stars

zzbright1998 / SentenceKV

Star 14

Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache compression method that organizes cache at sentence level using semantic similarity.

natural-language-processing transformers memory-efficiency efficient-inference inference-optimization kv-cache llm semantic-caching colm2025

Updated Sep 29, 2025
Python

Hyperion-HQ / Hyperion

Star 14

Ultra-low-latency LLM gateway with microsecond caching, dynamic routing, budgets, analytics, and forecasting.

redis distributed-systems api-gateway rate-limiting low-latency observability cost-optimization multi-provider vector-search pii-redaction qdrant llmops llm-proxy ai-gateway ai-infrastructure llm-gateway llm-routing openai-compatible semantic-caching model-routing

Updated Apr 2, 2026
Go

renswickd / semantic-prompt-cache

Star 9

This app leverages Semantic Caching to minimize inference latency and reduce API costs by reusing semantically similar prompt responses.

optimization ttl-cache rag mistral-api semantic-caching

Updated Jul 4, 2025
Python

AzureManagedRedis / semantic-caching-demo-and-calculator

Star 7

Semantic caching demo with real-time streaming and a cost & sizing calculator, powered by Azure Managed Redis and Azure OpenAI.

demo azure-managed-redis semantic-caching cost-modeling

Updated Nov 12, 2025
Python

Chief-Strategist-J / llm-observability-platform

Star 3

High-performance LLM observability and evaluation platform with automated instrumentation, stateful chat orchestration, semantic vector memory caching, and scheduled Temporal workers for cost anomaly detection.

python go clickhouse semantic-search temporal rag opentelemetry vector-database llm-observability semantic-caching llmops-prompt-engineering

Updated Jun 13, 2026
Python

redislabsdev / langcache-customer-data-eval

Star 3

Evaluate how a semantic cache performs on your dataset by computing key KPIs over a threshold sweep and producing plots and CSVs.

redis evaluation vector-database semantic-caching

Updated Mar 11, 2026
Python

AP3008 / Janus

Star 3

Rust Local Token Compression Proxy for coding agents, built solo for GenAI Genesis 2026. 🏆 1st Google Sustainability Hack

rust redis local proxy-server tui tokio deduplication ratatui axum-framework token-compression semantic-caching

Updated Mar 16, 2026
Rust

sunilp303 / claude-cost-gateway

Star 2

An intelligent gateway for Claude APIs that dynamically routes requests to the most cost-efficient model, caches responses, and escalates based on confidence signals — reducing LLM spend without sacrificing quality.

api-proxy observability claude cost-optimization generative-ai llmops prompt-caching claude-api ai-gateway agentic-ai llm-gateway llm-routing inference-gateway semantic-caching llm-infrastructure

Updated May 6, 2026
Python

Pralishatripathy000 / Semantic-cache-llm

Star 2

Semantic caching system using Meta Llama 3.3, Groq, FAISS, and Sentence Transformers to reduce LLM latency and API costs through meaning-based query matching.

machine-learning transformers vector-search groq faiss-vector-database llama3 semantic-caching

Updated Jun 4, 2026
Python

Clement-Okolo / Semantic-Cache

Star 2

Semantic caching for LLM responses using Redis Vector DB, LangChain, and HuggingFace embeddings. Parses PDFs, generates FAQs with Groq, and serves similarity-based answers without redundant LLM calls.

chunking long-term-memory vector-database semantic-caching llamacloud live-caching batch-caching

Updated Feb 28, 2026
Jupyter Notebook

leisurelyleon / ragline

Star 2

A retrieval-augmented generation pipeline in Python with a rigorous offline evaluation harness. Chunks and embeds documents, retrieves by vector similarity, and generates grounded answers — with pluggable LLM providers (including a deterministic local fake for tests) and metrics for retrieval quality and answer faithfulness. No API key required.

microsoft python google-ai fastapi token-cost rag-system semantic-caching zoo-design-studio golden-dataset

Updated Jun 1, 2026
Python

sensoris / semcache-python

Star 2

Python library for the Semcache API

python ai openai llm anthropic semantic-caching

Updated Jun 9, 2025
Python

awesome-pro / smartmemo

Star 2

Semantic memory and caching for LLM agents with classifier-validated equivalence instead of naive cosine thresholds.

python machine-learning sqlite pytorch embeddings developer-tools ai-agents cost-optimization faiss vector-search sentence-transformers semantic-memory llm llmops semantic-cache semantic-caching

Updated Jun 10, 2026
Python

manishklach / semantic-kv-control-plane

Star 1

A systems research platform for semantic KV-cache orchestration, topology-aware memory placement, distributed prefix reuse, and rack-scale inference memory simulation.

distributed-systems simulation gpu inference hbm distributed-cache memory-systems prefetching ai-systems kv-cache topology-aware memory-tiering runtime-systems cxl llm-inference ai-infrastructure semantic-caching systems-research memory-orchestration rack-scale

Updated May 25, 2026
Python

sensoris / semcache-node

Star 1

Node SDK for the Semcache API

node js openai llm semantic-caching

Updated Jun 18, 2025
JavaScript

maichanks / llm-cost-optimizer

Star 1

LLM cost monitoring and optimization toolkit

redis monitoring budget cost-optimization llm openrouter prompt-compression semantic-caching token-tracking ai-cost openclaw api-cost-management

Updated Mar 16, 2026
JavaScript

developertogo / velo-sentinel

Star 1

Production-grade Java 25 Virtual Thread inference gateway bridging NVIDIA Triton → Dynamo with Earliest Deadline First (EDF) priority queuing, adaptive batching, and async shadow validation.

redis distributed-systems grpc priority-queues load-balancing model-serving triton-inference-server virtual-threads inference-gateway semantic-caching nvidia-dynamo disaggregated-serving

Updated Jun 13, 2026
Java

Nagpal45 / memoria

Star 1

Semantic LLM Gateway featuring intelligent prompt routing (basic MoE), L1/L2 semantic caching (Redis + pgvector), fault-tolerant model fallbacks, and real-time streaming telemetry. Built to reduce AI inference latency and optimize API compute costs.

redis devops typescript mongodb nextjs telemetry expressjs sse system-design pgvector ai-gateway llm-orchestration semantic-caching

Updated Mar 11, 2026
TypeScript

dchukkapalli-dev / semantic-caching-llm-companion

Star 0

Machine-readable companion to the IEEE OJ-CS survey 'Semantic Caching and Response Reuse for Large Language Model Services: A Survey' (Chukkapalli, Mishra, Naik, 2026): 21-work evidence matrix, systematic-search log, proposed benchmark trace schema, stdlib-only contract validator, and CPU pilot. Code MIT; data CC-BY-4.0.

benchmark survey prisma inference-serving llm semantic-caching response-reuse cache-correctness

Updated Jun 5, 2026
Python

nunoferna / aegis-llm

Star 0

LLMOps API Gateway in Go. Optimizes GenAI workloads with Qdrant semantic caching, Redis rate-limiting, and OpenTelemetry metrics.

docker kubernetes redis golang api-gateway proxy rate-limiting gemini openai cloud-native helm-chart opentelemetry qdrant llm anthropic semantic-caching

Updated Mar 15, 2026
Go
