unillm is a unified LLM interface for edge computing. It provides a consistent, type-safe API across multiple LLM providers with minimal dependencies and optimized memory usage for edge environments.
- Edge-First: ~50KB bundle size, ~10ms cold start, optimized for edge runtimes
- Unified Interface: Single API for Anthropic, OpenAI, Groq, Gemini, Cloudflare, and more
- Streaming Native: Built on the Web Streams API with nagare integration
- Type-Safe: Full TypeScript support with Zod schema validation
- Minimal Dependencies: Only Zod (~11KB) required
- Memory Optimized: Automatic chunking and backpressure handling
```bash
npm install @aid-on/unillm
# or
yarn add @aid-on/unillm
# or
pnpm add @aid-on/unillm
```
```typescript
import { unillm } from "@aid-on/unillm";

// Fluent API with type safety
const response = await unillm()
  .model("openai:gpt-4o-mini")
  .credentials({ openaiApiKey: process.env.OPENAI_API_KEY })
  .temperature(0.7)
  .generate("Explain quantum computing in simple terms");

console.log(response.text);
```
unillm returns an `@aid-on/nagare` `Stream<T>` for reactive stream processing:
```typescript
import { unillm } from "@aid-on/unillm";
import type { Stream } from "@aid-on/nagare";

const stream: Stream<string> = await unillm()
  .model("groq:llama-3.3-70b-versatile")
  .credentials({ groqApiKey: "..." })
  .stream("Write a story about AI");

// Use nagare's reactive operators
const enhanced = stream
  .map(chunk => chunk.trim())
  .filter(chunk => chunk.length > 0)
  .throttle(16)                       // ~60fps for UI updates
  .tap(chunk => console.log(chunk))
  .toSSE();                           // Convert to Server-Sent Events
```
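The stream can also be consumed chunk by chunk with `for await`, the same pattern the React example further down relies on. A minimal sketch:

```typescript
import { unillm } from "@aid-on/unillm";

// Minimal consumption sketch: accumulate streamed chunks into a string.
const out = await unillm()
  .model("groq:llama-3.3-70b-versatile")
  .credentials({ groqApiKey: "..." })
  .stream("Summarize the Web Streams API in two sentences");

let text = "";
for await (const chunk of out) {
  text += chunk; // each chunk is a string fragment of the response
}
console.log(text);
```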
Generate type-safe structured data with Zod schemas:
```typescript
import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  skills: z.array(z.string())
});

const result = await unillm()
  .model("groq:llama-3.1-8b-instant")
  .credentials({ groqApiKey: "..." })
  .schema(PersonSchema)
  .generate("Generate a software engineer profile");

// Type-safe access
console.log(result.object.name);   // string
console.log(result.object.skills); // string[]
```
Ultra-concise syntax for common models:
```typescript
import { anthropic, openai, groq, gemini, cloudflare } from "@aid-on/unillm";

// One-liners for quick prototyping
await anthropic.sonnet("sk-ant-...").generate("Hello");
await openai.mini("sk-...").generate("Hello");
await groq.instant("gsk_...").generate("Hello");
await gemini.flash("AIza...").generate("Hello");
await cloudflare.llama({ accountId: "...", apiToken: "..." }).generate("Hello");
```
- `anthropic:claude-opus-4-5-20251101` - Claude Opus 4.5 (Most Intelligent)
- `anthropic:claude-haiku-4-5-20251001` - Claude Haiku 4.5 (Ultra Fast)
- `anthropic:claude-sonnet-4-5-20250929` - Claude Sonnet 4.5 (Best for Coding)
- `anthropic:claude-opus-4-1-20250805` - Claude Opus 4.1
- `anthropic:claude-opus-4-20250514` - Claude Opus 4
- `anthropic:claude-sonnet-4-20250514` - Claude Sonnet 4
- `anthropic:claude-3-5-haiku-20241022` - Claude 3.5 Haiku
- `anthropic:claude-3-haiku-20240307` - Claude 3 Haiku
- `openai:gpt-4o` - GPT-4o (Latest, fastest GPT-4)
- `openai:gpt-4o-mini` - GPT-4o Mini (Cost-effective)
- `openai:gpt-4o-2024-11-20` - GPT-4o November snapshot
- `openai:gpt-4o-2024-08-06` - GPT-4o August snapshot
- `openai:gpt-4-turbo` - GPT-4 Turbo (High capability)
- `openai:gpt-4-turbo-preview` - GPT-4 Turbo Preview
- `openai:gpt-4` - GPT-4 (Original)
- `openai:gpt-3.5-turbo` - GPT-3.5 Turbo (Fast & cheap)
- `openai:gpt-3.5-turbo-0125` - GPT-3.5 Turbo Latest
- `groq:llama-3.3-70b-versatile` - Llama 3.3 70B Versatile
- `groq:llama-3.1-8b-instant` - Llama 3.1 8B Instant
- `groq:meta-llama/llama-guard-4-12b` - Llama Guard 4 12B
- `groq:openai/gpt-oss-120b` - GPT-OSS 120B
- `groq:openai/gpt-oss-20b` - GPT-OSS 20B
- `groq:groq/compound` - Groq Compound
- `groq:groq/compound-mini` - Groq Compound Mini
- `gemini:gemini-3-pro-preview` - Gemini 3 Pro Preview
- `gemini:gemini-3-flash-preview` - Gemini 3 Flash Preview
- `gemini:gemini-2.5-pro` - Gemini 2.5 Pro
- `gemini:gemini-2.5-flash` - Gemini 2.5 Flash
- `gemini:gemini-2.0-flash` - Gemini 2.0 Flash
- `gemini:gemini-2.0-flash-lite` - Gemini 2.0 Flash Lite
- `gemini:gemini-1.5-pro-002` - Gemini 1.5 Pro 002
- `gemini:gemini-1.5-flash-002` - Gemini 1.5 Flash 002
- `cloudflare:@cf/meta/llama-4-scout-17b-16e-instruct` - Llama 4 Scout
- `cloudflare:@cf/meta/llama-3.3-70b-instruct-fp8-fast` - Llama 3.3 70B FP8
- `cloudflare:@cf/meta/llama-3.1-70b-instruct` - Llama 3.1 70B
- `cloudflare:@cf/meta/llama-3.1-8b-instruct-fast` - Llama 3.1 8B Fast
- `cloudflare:@cf/meta/llama-3.1-8b-instruct` - Llama 3.1 8B
- `cloudflare:@cf/openai/gpt-oss-120b` - GPT-OSS 120B
- `cloudflare:@cf/openai/gpt-oss-20b` - GPT-OSS 20B
- `cloudflare:@cf/ibm/granite-4.0-h-micro` - IBM Granite 4.0
- `cloudflare:@cf/mistralai/mistral-small-3.1-24b-instruct` - Mistral Small 3.1
- `cloudflare:@cf/mistralai/mistral-7b-instruct-v0.2` - Mistral 7B
- `cloudflare:@cf/google/gemma-3-12b-it` - Gemma 3 12B
- `cloudflare:@cf/qwen/qwq-32b` - QwQ 32B
- `cloudflare:@cf/qwen/qwen2.5-coder-32b-instruct` - Qwen 2.5 Coder
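Any of these IDs can be passed to `.model()` together with that provider's credentials. A minimal sketch, reusing the Cloudflare credential shape from the Workers example further down:

```typescript
import { unillm } from "@aid-on/unillm";

// Provider-prefixed model ID from the list above, paired with the
// matching provider credentials (accountId/apiToken for Cloudflare).
const response = await unillm()
  .model("cloudflare:@cf/meta/llama-3.3-70b-instruct-fp8-fast")
  .credentials({ accountId: "...", apiToken: "..." })
  .generate("Hello from Workers AI");

console.log(response.text);
```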
```typescript
const builder = unillm()
  .model("groq:llama-3.3-70b-versatile")
  .credentials({ groqApiKey: "..." })
  .temperature(0.7)
  .maxTokens(1000)
  .topP(0.9)
  .system("You are a helpful assistant")
  .messages([
    { role: "user", content: "Previous question..." },
    { role: "assistant", content: "Previous answer..." }
  ]);

// Reusable configuration
const response1 = await builder.generate("New question");
const response2 = await builder.stream("Another question");
```
Automatic memory management for edge environments:
```typescript
import { createMemoryOptimizedStream } from "@aid-on/unillm";

const stream = await createMemoryOptimizedStream(
  largeResponse,
  {
    maxMemory: 1024 * 1024, // 1MB limit
    chunkSize: 512          // Optimal chunk size
  }
);
```
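The exact interface of the returned stream isn't spelled out here; assuming it iterates chunk by chunk like the provider streams in the earlier examples (an assumption), consumption would look like:

```typescript
// Sketch only: assumes the memory-optimized stream is async-iterable,
// like the provider streams in the examples above.
let total = 0;
for await (const chunk of stream) {
  total += chunk.length; // each chunk stays within the configured budget
}
console.log(`Processed ${total} characters`);
```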
```typescript
import { UnillmError, RateLimitError } from "@aid-on/unillm";

try {
  const response = await unillm()
    .model("groq:llama-3.3-70b-versatile")
    .credentials({ groqApiKey: "..." })
    .generate("Hello");
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log(`Rate limited. Retry after ${error.retryAfter}ms`);
  } else if (error instanceof UnillmError) {
    console.log(`LLM error: ${error.message}`);
  }
}
```
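Since `retryAfter` is reported in milliseconds, it lends itself to a simple retry wrapper. A sketch, not a unillm export:

```typescript
import { unillm, RateLimitError } from "@aid-on/unillm";

// Illustrative retry helper built on the RateLimitError shown above.
async function retryGenerate<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error instanceof RateLimitError && i < attempts - 1) {
        // Wait for the provider-suggested delay before retrying
        await new Promise(resolve => setTimeout(resolve, error.retryAfter));
        continue;
      }
      throw error;
    }
  }
  throw new Error("unreachable");
}

const response = await retryGenerate(() =>
  unillm()
    .model("groq:llama-3.3-70b-versatile")
    .credentials({ groqApiKey: "..." })
    .generate("Hello")
);
```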
```tsx
import { useState } from "react";
import { unillm } from "@aid-on/unillm";

export default function ChatComponent() {
  const [response, setResponse] = useState("");
  const [loading, setLoading] = useState(false);

  const handleGenerate = async () => {
    setLoading(true);
    const stream = await unillm()
      .model("groq:llama-3.1-8b-instant")
      .credentials({ groqApiKey: import.meta.env.VITE_GROQ_API_KEY })
      .stream("Write a haiku");

    for await (const chunk of stream) {
      setResponse(prev => prev + chunk);
    }
    setLoading(false);
  };

  return (
    <div>
      <button onClick={handleGenerate} disabled={loading}>
        {loading ? "Generating..." : "Generate"}
      </button>
      <p>{response}</p>
    </div>
  );
}
```
```typescript
import { unillm } from "@aid-on/unillm";

export default {
  async fetch(request: Request, env: Env) {
    const stream = await unillm()
      .model("cloudflare:@cf/meta/llama-3.1-8b-instruct")
      .credentials({
        accountId: env.CF_ACCOUNT_ID,
        apiToken: env.CF_API_TOKEN
      })
      .stream("Hello from the edge!");

    return new Response(stream.toReadableStream(), {
      headers: { "Content-Type": "text/event-stream" }
    });
  }
};
```
| Method | Description | Example |
|---|---|---|
| `model(id)` | Set the model ID | `model("groq:llama-3.3-70b-versatile")` |
| `credentials(creds)` | Set API credentials | `credentials({ groqApiKey: "..." })` |
| `temperature(n)` | Set temperature (0-1) | `temperature(0.7)` |
| `maxTokens(n)` | Set max tokens | `maxTokens(1000)` |
| `topP(n)` | Set top-p sampling | `topP(0.9)` |
| `schema(zod)` | Set output schema | `schema(PersonSchema)` |
| `system(text)` | Set system prompt | `system("You are...")` |
| `messages(msgs)` | Set message history | `messages([...])` |
| `generate(prompt)` | Generate response | `await generate("Hello")` |
| `stream(prompt)` | Stream response | `await stream("Hello")` |
MIT