A Go library for proxying requests to upstream LLM providers with a pluggable, composable architecture.
```bash
go get github.com/agentuity/llmproxy
```
```go
package main

import (
	"context"
	"io"
	"net/http"

	"github.com/agentuity/llmproxy"
	"github.com/agentuity/llmproxy/interceptors"
	"github.com/agentuity/llmproxy/providers/openai"
)

func main() {
	ctx := context.Background()

	provider, _ := openai.New("sk-your-key")
	proxy := llmproxy.NewProxy(provider,
		llmproxy.WithInterceptor(interceptors.NewLogging(nil)),
	)

	http.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		resp, meta, err := proxy.Forward(ctx, r)
		if err != nil {
			http.Error(w, err.Error(), 500)
			return
		}
		defer resp.Body.Close()

		// Response includes token usage
		_ = meta.Usage.PromptTokens
		_ = meta.Usage.CompletionTokens

		io.Copy(w, resp.Body)
	})

	http.ListenAndServe(":8080", nil)
}
```
Single endpoint that auto-detects provider and API type:
```go
package main

import (
	"net/http"

	"github.com/agentuity/llmproxy"
	"github.com/agentuity/llmproxy/providers/anthropic"
	"github.com/agentuity/llmproxy/providers/openai"
)

func main() {
	openaiProvider, _ := openai.New("sk-openai-key")
	anthropicProvider, _ := anthropic.New("sk-ant-key")

	router := llmproxy.NewAutoRouter(
		llmproxy.WithAutoRouterFallbackProvider(openaiProvider),
	)
	router.RegisterProvider(openaiProvider)
	router.RegisterProvider(anthropicProvider)

	// Single endpoint handles all providers and APIs
	http.Handle("/", router)
	http.ListenAndServe(":8080", nil)
}
```
POST to `/` with any model; the provider and API are auto-detected:
```bash
# Auto-detect OpenAI from gpt-4 model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect Anthropic from claude model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect Responses API from input field
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4o","input":"Hello"}'
```
- 9 Provider Implementations: OpenAI, Anthropic, Groq, Fireworks, x.AI, Google AI, AWS Bedrock, Azure OpenAI, OpenAI-compatible base
- AutoRouter: Single endpoint with automatic provider/API detection
- Responses API: Full support for OpenAI's Responses API (HTTP streaming and WebSocket mode)
- WebSocket Mode: Persistent connections for multi-turn Responses API workflows with per-turn billing
- SSE Streaming: Full streaming support with efficient token usage extraction
- 8 Built-in Interceptors: Logging, Metrics, Retry, Billing, Tracing (OTel), HeaderBan, AddHeader, PromptCaching
- Pricing Integration: models.dev adapter with markup support
- Prompt Caching: Supported for Anthropic, OpenAI, xAI, Fireworks, and Bedrock
- Raw Body Preservation: Custom JSON fields pass through unchanged
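
One way to picture raw body preservation is to decode only the field the proxy needs to touch and keep everything else as the raw bytes the client sent. The following is a minimal standard-library sketch under that assumption, not the library's actual implementation; `stripModelPrefix` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stripModelPrefix is a hypothetical helper, not part of llmproxy's API.
// It rewrites only the "model" field and keeps every other top-level
// field as the raw JSON the client sent.
func stripModelPrefix(body []byte) ([]byte, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return nil, err
	}
	var model string
	if raw, ok := fields["model"]; ok {
		if err := json.Unmarshal(raw, &model); err != nil {
			return nil, err
		}
	}
	// Drop a "provider/" prefix such as "openai/" if one is present.
	if i := strings.Index(model, "/"); i >= 0 {
		stripped, err := json.Marshal(model[i+1:])
		if err != nil {
			return nil, err
		}
		fields["model"] = stripped
	}
	return json.Marshal(fields)
}

func main() {
	in := []byte(`{"model":"openai/gpt-4","x_custom":{"trace":[1,2,3]}}`)
	out, err := stripModelPrefix(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the unknown fields are never decoded into Go types, their contents survive the round trip. Re-marshalling a map does sort the top-level keys, so key order may differ from the original request.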
The AutoRouter provides automatic routing from a single endpoint:
- Path-based - `/v1/messages` → Messages API, `/v1/responses` → Responses API
- Body + Provider - When path is `/` or unknown:
  - `input` field → Responses API
  - `prompt` field → Completions API
  - `contents` field → GenerateContent API
  - `messages` + Anthropic → Messages API
  - `messages` + other → Chat Completions
- X-Provider header - Explicit override
- Model prefix - `openai/gpt-4` → OpenAI (strips prefix before forwarding)
- Model pattern - `gpt-*` → OpenAI, `claude-*` → Anthropic, etc.
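
The rules above can be read as two small decision functions. The sketch below is an illustration only: `detectAPI` and `detectProvider` are hypothetical names, the returned labels are made up, and the precedence (header, then model prefix, then model pattern) is assumed from the order of the list rather than taken from the library's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// detectAPI applies the path rule first, then falls back to body fields.
func detectAPI(path string, body []byte, provider string) string {
	switch path {
	case "/v1/messages":
		return "messages"
	case "/v1/responses":
		return "responses"
	}
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return "unknown"
	}
	has := func(k string) bool { _, ok := fields[k]; return ok }
	switch {
	case has("input"):
		return "responses"
	case has("prompt"):
		return "completions"
	case has("contents"):
		return "generate_content"
	case has("messages") && provider == "anthropic":
		return "messages"
	case has("messages"):
		return "chat_completions"
	}
	return "unknown"
}

// detectProvider returns the provider and the model name to forward.
// Assumption: an explicit header wins and leaves the model untouched.
func detectProvider(xProvider, model string) (provider, forwardModel string) {
	if xProvider != "" {
		return strings.ToLower(xProvider), model
	}
	if prefix, rest, ok := strings.Cut(model, "/"); ok {
		return prefix, rest // "openai/gpt-4" becomes "openai", "gpt-4"
	}
	switch {
	case strings.HasPrefix(model, "gpt-"):
		return "openai", model
	case strings.HasPrefix(model, "claude-"):
		return "anthropic", model
	}
	return "", model // caller would use the fallback provider
}

func main() {
	body := []byte(`{"model":"anthropic/claude-3-opus","max_tokens":1024,"messages":[]}`)
	provider, model := detectProvider("", "anthropic/claude-3-opus")
	fmt.Println(provider, model, detectAPI("/", body, provider))
}
```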
```bash
# Explicit provider via header
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -H 'X-Provider: anthropic' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Provider prefix in model (gets stripped)
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"anthropic/claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Traditional path still works
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'
```
SSE streaming is fully supported with automatic token usage extraction for billing:
```bash
# Streaming with automatic usage extraction
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","stream":true,"messages":[{"role":"user","content":"Hello"}]}'
```
Key Features:
- Efficient flushing: Uses `http.ResponseController` for immediate SSE delivery
- Token extraction: Extracts usage from streaming responses for billing
- Auto stream_options: Automatically injects `stream_options.include_usage` when billing is configured
- Works with billing: Billing is calculated after stream completes
Example with billing:
```go
adapter, _ := modelsdev.LoadFromURL()

billingCallback := func(r llmproxy.BillingResult) {
	log.Printf("Cost: $%.6f (tokens: %d/%d)", r.TotalCost, r.PromptTokens, r.CompletionTokens)
}

router := llmproxy.NewAutoRouter(
	llmproxy.WithAutoRouterBillingCalculator(llmproxy.NewBillingCalculator(adapter.GetCostLookup(), billingCallback)),
)
```
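
For intuition about what a billing result might contain, the arithmetic can be sketched as below. This is an assumption-laden illustration, not llmproxy's billing code: the `price` type, the `cost` function, the per-million-token rates, and the way markup is applied are all hypothetical.

```go
package main

import "fmt"

// price holds hypothetical per-million-token rates in USD.
type price struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

// cost multiplies token counts by their rates and applies a markup.
// markup is a fraction: 0.25 adds 25% on top of the provider cost.
func cost(p price, promptTokens, completionTokens int, markup float64) float64 {
	base := float64(promptTokens)*p.InputPerMTok/1e6 +
		float64(completionTokens)*p.OutputPerMTok/1e6
	return base * (1 + markup)
}

func main() {
	p := price{InputPerMTok: 2, OutputPerMTok: 8} // made-up rates
	fmt.Printf("Cost: $%.6f\n", cost(p, 1000, 500, 0.25))
}
```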
The Responses API supports persistent WebSocket connections for multi-turn, tool-call-heavy workflows. WebSocket support is opt-in with a zero-dependency adapter pattern — bring your own WebSocket library.
```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/agentuity/llmproxy"
	"github.com/agentuity/llmproxy/providers/openai"
	"github.com/gorilla/websocket"
)

// Origins allowed to open WebSocket connections.
var trustedOrigins = []string{"https://myapp.example.com"}

// Thin adapters: gorilla's *Conn already satisfies llmproxy.WSConn.
type gorillaUpgrader struct{ websocket.Upgrader }

func (u *gorillaUpgrader) Upgrade(w http.ResponseWriter, r *http.Request, h http.Header) (llmproxy.WSConn, error) {
	conn, err := u.Upgrader.Upgrade(w, r, h)
	if err != nil {
		// Return a plain nil so callers never get a non-nil interface wrapping a nil *Conn.
		return nil, err
	}
	return conn, nil
}

type gorillaDialer struct{ websocket.Dialer }

func (d *gorillaDialer) DialContext(ctx context.Context, urlStr string, h http.Header) (llmproxy.WSConn, *http.Response, error) {
	conn, resp, err := d.Dialer.DialContext(ctx, urlStr, h)
	if err != nil {
		return nil, resp, err
	}
	return conn, resp, nil
}

func main() {
	// Only accept upgrades whose Origin header matches a trusted origin.
	upgrader := websocket.Upgrader{
		CheckOrigin: func(r *http.Request) bool {
			origin := r.Header.Get("Origin")
			for _, allowed := range trustedOrigins {
				if origin == allowed {
					return true
				}
			}
			return false
		},
	}

	provider, _ := openai.New("sk-your-key")

	router := llmproxy.NewAutoRouter(
		llmproxy.WithAutoRouterFallbackProvider(provider),
		llmproxy.WithAutoRouterWebSocket(
			&gorillaUpgrader{upgrader},
			&gorillaDialer{websocket.Dialer{}},
		),
		llmproxy.WithAutoRouterWSBillingCallback(func(turn int, meta llmproxy.ResponseMetadata, billing *llmproxy.BillingResult) {
			log.Printf("Turn %d: %d prompt + %d completion tokens",
				turn, meta.Usage.PromptTokens, meta.Usage.CompletionTokens)
		}),
	)
	router.RegisterProvider(provider)

	http.Handle("/", router)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
Clients connect with any WebSocket library:
```python
from websocket import create_connection
import json

ws = create_connection(
    "ws://localhost:8080/v1/responses",
    header=["Authorization: Bearer sk-your-key"],
)
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-4o",
    "input": [{"type": "message", "role": "user", "content": [{"type": "input_text", "text": "Hello!"}]}],
}))

for msg in ws:
    event = json.loads(msg)
    print(event["type"], event.get("delta", ""))
    if event["type"] == "response.completed":
        break
```
The proxy handles model prefix stripping, auth header forwarding, usage extraction, and per-turn billing automatically. See DESIGN.md for full protocol details.
| Provider | Auth | API Format | Notes |
|---|---|---|---|
| OpenAI | Bearer token | Chat completions, Responses, WebSocket | HTTP + WebSocket for /v1/responses |
| Anthropic | `x-api-key` | Messages API | |
| Groq | Bearer token | OpenAI-compatible | |
| Fireworks | Bearer token | OpenAI-compatible | |
| x.AI | Bearer token | OpenAI-compatible | |
| Google AI | API key query param | Gemini generateContent | |
| AWS Bedrock | AWS Signature V4 | Converse API | |
| Azure OpenAI | `api-key` or Azure AD | Chat completions (deployments) | |
```go
// Logging
llmproxy.WithInterceptor(interceptors.NewLogging(logger))

// Metrics (thread-safe)
metrics := &interceptors.Metrics{}
llmproxy.WithInterceptor(interceptors.NewMetrics(metrics))

// Retry on 429/5xx
llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second))

// Billing with models.dev pricing
adapter, _ := modelsdev.LoadFromURL()
llmproxy.WithInterceptor(interceptors.NewBilling(adapter.GetCostLookup(), func(r llmproxy.BillingResult) {
	log.Printf("Cost: $%.6f", r.TotalCost)
}))

// OTel tracing
llmproxy.WithInterceptor(interceptors.NewTracing(otelExtractor))

// Strip sensitive headers
llmproxy.WithInterceptor(interceptors.NewResponseHeaderBan("Openai-Organization"))

// Add custom headers
llmproxy.WithInterceptor(interceptors.NewAddResponseHeader(
	interceptors.NewHeader("X-Gateway", "llmproxy"),
))

// Anthropic prompt caching (default 5 min, free)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetentionDefault))

// Anthropic prompt caching with 1h retention (costs more)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetention1h))

// OpenAI prompt caching with explicit cache key
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCaching(interceptors.CacheRetention24h, "my-cache-key"))

// OpenAI prompt caching with auto-derived key and tenant namespace
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCachingAuto("tenant-123", interceptors.CacheRetentionDefault))

// xAI/Grok prompt caching (uses x-grok-conv-id header)
llmproxy.WithInterceptor(interceptors.NewXAIPromptCaching("conv-abc123"))

// Fireworks prompt caching (uses x-session-affinity and x-prompt-cache-isolation-key headers)
llmproxy.WithInterceptor(interceptors.NewFireworksPromptCaching("session-123"))
```
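
As a rough picture of what "retry on 429/5xx" involves, the sketch below wraps an `http.RoundTripper` from the standard library. It deliberately does not use llmproxy's Interceptor interface, whose signature is not shown here; `retryTransport` and `demo` are hypothetical, and treating the count as the total number of attempts with a fixed delay is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// retryTransport retries responses with status 429 or 5xx.
type retryTransport struct {
	next     http.RoundTripper
	attempts int           // total tries, including the first
	delay    time.Duration // fixed wait between tries
}

func (t retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	n := t.attempts
	if n < 1 {
		n = 1
	}
	for i := 0; ; i++ {
		r := req
		if i > 0 {
			time.Sleep(t.delay)
			if req.GetBody != nil { // replay the body on a fresh clone
				body, err := req.GetBody()
				if err != nil {
					return nil, err
				}
				r = req.Clone(req.Context())
				r.Body = body
			}
		}
		resp, err := t.next.RoundTrip(r)
		if err != nil {
			return nil, err
		}
		retryable := resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500
		if !retryable || i == n-1 {
			return resp, nil
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
	}
}

// demo runs the transport against a test server that fails twice.
func demo() (status int, calls int32, body string) {
	var n atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if n.Add(1) < 3 {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		io.Copy(w, r.Body) // echo the body to show it was replayed
	}))
	defer srv.Close()

	client := &http.Client{Transport: retryTransport{next: http.DefaultTransport, attempts: 3, delay: time.Millisecond}}
	resp, err := client.Post(srv.URL, "application/json", strings.NewReader(`{"model":"gpt-4"}`))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, n.Load(), string(b)
}

func main() {
	fmt.Println(demo())
}
```

A production retry policy would usually add backoff with jitter and honor a `Retry-After` header; the fixed delay here keeps the sketch short.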
The library uses small, focused interfaces that compose into providers:
Parse → Enrich → Resolve → Forward → Extract
- BodyParser — Extract metadata from request body
- RequestEnricher — Add auth headers
- URLResolver — Determine upstream URL
- ResponseExtractor — Parse response metadata
- Provider — Composes the above
- Interceptor — Wrap request/response for cross-cutting concerns
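
To show how small interfaces like these might compose, here is a self-contained sketch. Every signature in it is an assumption made for illustration (DESIGN.md describes the real ones), and the toy types `jsonParser`, `bearerAuth`, and `fixedURL` are hypothetical. The sketch resolves the URL before enriching because `http.NewRequest` needs a URL up front, and it omits the Forward and Extract steps since they need a live upstream.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Metadata is the information pulled from the request body.
type Metadata struct {
	Model string `json:"model"`
}

// Assumed shapes for three of the interfaces listed above.
type BodyParser interface {
	Parse(body []byte) (Metadata, error)
}
type RequestEnricher interface{ Enrich(req *http.Request) }
type URLResolver interface{ Resolve(meta Metadata) string }

// Provider composes the smaller interfaces by embedding them.
type Provider struct {
	BodyParser
	RequestEnricher
	URLResolver
}

// prepare builds the upstream request from a client body.
func (p Provider) prepare(body []byte) (*http.Request, error) {
	meta, err := p.Parse(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, p.Resolve(meta), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	p.Enrich(req)
	return req, nil
}

// Toy implementations used only for this example.
type jsonParser struct{}

func (jsonParser) Parse(body []byte) (Metadata, error) {
	var m Metadata
	err := json.Unmarshal(body, &m)
	return m, err
}

type bearerAuth struct{ key string }

func (a bearerAuth) Enrich(req *http.Request) {
	req.Header.Set("Authorization", "Bearer "+a.key)
}

type fixedURL string

func (u fixedURL) Resolve(Metadata) string { return string(u) }

// newToyProvider wires the toy parts together.
func newToyProvider() Provider {
	return Provider{jsonParser{}, bearerAuth{"sk-your-key"}, fixedURL("https://upstream.example/v1/chat/completions")}
}

func main() {
	req, err := newToyProvider().prepare([]byte(`{"model":"gpt-4"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Authorization"))
}
```

Swapping one embedded part, for example a different `RequestEnricher`, changes a single concern without touching the others, which is the appeal of composing providers from focused interfaces.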
See DESIGN.md for full architecture details.
A complete multi-provider proxy server:
```bash
cd examples/basic
go run main.go
```

Environment variables:
| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GROQ_API_KEY` | Groq |
| `FIREWORKS_API_KEY` | Fireworks |
| `XAI_API_KEY` | x.AI |
| `GOOGLE_AI_API_KEY` | Google AI |
| `AZURE_OPENAI_RESOURCE` | Azure OpenAI |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI |
| `AWS_REGION` + `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` | AWS Bedrock |