Run AI models locally on your machine
Pre-built bindings are provided, with a fallback to building from source with cmake
✨ gpt-oss is here! ✨
- Run LLMs locally on your machine
- Metal, CUDA and Vulkan support
- Pre-built binaries are provided, with a fallback to building from source without node-gyp or Python
- Adapts to your hardware automatically, no need to configure anything
- A complete suite of everything you need to use LLMs in your projects
- Use the CLI to chat with a model without writing any code
- Up-to-date with the latest llama.cpp. Download and compile the latest release with a single CLI command
- Force a model to generate output in a parseable format, such as JSON, or even to follow a specific JSON schema
- Provide a model with functions it can call on demand to retrieve information or perform actions
- Embedding and reranking support
- Safe against special token injection attacks
- Great developer experience with full TypeScript support, and complete documentation
- Much more
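As a rough illustration of the JSON schema feature listed above, the sketch below constrains a reply to a small schema and parses it. It assumes the `createGrammarForJsonSchema` and `grammar.parse` methods behave as described in the package documentation; the schema fields, the `classifyReview` helper, and the `modelPath` argument are hypothetical names chosen for this example.

```typescript
// Illustrative schema: the field names are assumptions made for this example.
const reviewSchema = {
    type: "object",
    properties: {
        summary: {type: "string"},
        keywords: {type: "array", items: {type: "string"}}
    }
} as const;

// Hypothetical helper: prompts a model and returns an object shaped by reviewSchema.
async function classifyReview(modelPath: string, review: string) {
    // Imported lazily so the native bindings load only when this function is called.
    const {getLlama, LlamaChatSession} = await import("node-llama-cpp");

    const llama = await getLlama();
    const model = await llama.loadModel({modelPath});
    const context = await model.createContext();
    const session = new LlamaChatSession({contextSequence: context.getSequence()});

    // The grammar constrains generation so the reply follows the schema.
    const grammar = await llama.createGrammarForJsonSchema(reviewSchema);
    const response = await session.prompt("Summarize this review: " + review, {grammar});

    // parse() turns the raw reply text into an object.
    return grammar.parse(response);
}
```

Calling `await classifyReview("path/to/model.gguf", "Great battery life, weak camera")` should then resolve to an object with `summary` and `keywords` fields, provided a compatible model file exists at that path.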
Chat with a model in your terminal using a single command:
npx -y node-llama-cpp chat
Install the package:
npm install node-llama-cpp
This package comes with pre-built binaries for macOS, Linux and Windows.
If binaries are not available for your platform, it falls back to downloading a release of llama.cpp and building it from source with cmake.
To disable this behavior, set the environment variable NODE_LLAMA_CPP_SKIP_DOWNLOAD to true.
    import {fileURLToPath} from "url";
    import path from "path";
    import {getLlama, LlamaChatSession} from "node-llama-cpp";

    // ES modules have no __dirname, so derive it from the module URL
    const __dirname = path.dirname(fileURLToPath(import.meta.url));

    // Load the model and create a chat session on a context sequence
    const llama = await getLlama();
    const model = await llama.loadModel({
        modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
    });
    const context = await model.createContext();
    const session = new LlamaChatSession({
        contextSequence: context.getSequence()
    });

    const q1 = "Hi there, how are you?";
    console.log("User: " + q1);

    const a1 = await session.prompt(q1);
    console.log("AI: " + a1);

    // The session keeps the chat history, so the follow-up can refer to the first answer
    const q2 = "Summarize what you said";
    console.log("User: " + q2);

    const a2 = await session.prompt(q2);
    console.log("AI: " + a2);
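The function calling feature mentioned in the feature list can be sketched along the same lines. This is a minimal example, assuming `defineChatSessionFunction` and the `functions` prompt option work as the package documentation describes; the `fruitPrices` data, the `lookupFruitPrice` and `askAboutFruit` helpers, and the `modelPath` argument are hypothetical.

```typescript
// Illustrative in-memory data the model can query through the function below.
const fruitPrices: Record<string, string> = {
    apple: "$6",
    banana: "$4"
};

// Plain lookup, kept separate from the model wiring so it can be tested on its own.
function lookupFruitPrice(name: string) {
    const price = fruitPrices[name.toLowerCase()];
    return price == null
        ? `Unrecognized fruit "${name}"`
        : {name, price};
}

// Hypothetical helper: lets the model call getFruitPrice while answering a question.
async function askAboutFruit(modelPath: string, question: string) {
    // Imported lazily so the native bindings load only when this function is called.
    const {getLlama, LlamaChatSession, defineChatSessionFunction} = await import("node-llama-cpp");

    const llama = await getLlama();
    const model = await llama.loadModel({modelPath});
    const context = await model.createContext();
    const session = new LlamaChatSession({contextSequence: context.getSequence()});

    const functions = {
        getFruitPrice: defineChatSessionFunction({
            description: "Get the price of a fruit",
            params: {
                type: "object",
                properties: {
                    name: {type: "string"}
                }
            },
            // Runs when the model decides to call the function.
            async handler(params) {
                return lookupFruitPrice(params.name);
            }
        })
    };

    return await session.prompt(question, {functions});
}
```

Keeping the lookup in an ordinary function means the handler stays a thin wrapper, so the data access can be checked without loading a model.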
For more examples, see the getting started guide.
To contribute to node-llama-cpp, read the contribution guide.
- llama.cpp: ggml-org/llama.cpp