Another Node binding of llama.cpp, aiming to match the llama.rn API as closely as possible.
- macOS
  - arm64: CPU and Metal GPU acceleration
  - x86_64: CPU only
- Windows (x86_64 and arm64)
  - CPU
  - GPU acceleration via Vulkan
  - GPU acceleration via CUDA (x86_64)
- Linux (x86_64 and arm64)
  - CPU
  - GPU acceleration via Vulkan
  - GPU acceleration via CUDA
```bash
npm install @fugood/llama.node
```
```js
import { loadModel } from '@fugood/llama.node'

// Initialize a Llama context with the model (may take a while)
const context = await loadModel({
  model: 'path/to/gguf/model',
  n_ctx: 2048,
  n_gpu_layers: 99, // > 0: enable GPU
  // lib_variant: 'vulkan', // Change backend
})

// Do completion
const { text } = await context.completion(
  {
    prompt:
      'This is a conversation between user and llama, a friendly chatbot. respond in simple markdown.\n\nUser: Hello!\nLlama:',
    n_predict: 100,
    stop: ['</s>', 'Llama:', 'User:'],
    // n_threads: 4,
  },
  (data) => {
    // This is a partial completion callback
    const { token } = data
  },
)

console.log('Result:', text)
```
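The partial-completion callback receives each token as it is generated, so streaming output to the terminal is straightforward. A minimal sketch reusing only the `loadModel` and `completion` calls shown above (writing to `process.stdout` is an illustrative choice, not library behavior):

```js
import { loadModel } from '@fugood/llama.node'

const context = await loadModel({
  model: 'path/to/gguf/model',
  n_ctx: 2048,
})

// Print each token as it arrives instead of waiting for the final result
const { text } = await context.completion(
  {
    prompt: 'User: Write a haiku about llamas.\nLlama:',
    n_predict: 64,
    stop: ['User:'],
  },
  ({ token }) => process.stdout.write(token),
)

console.log('\nFull result:', text)
```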
The `lib_variant` option selects the backend build:

- `default`: General usage; no GPU support except on macOS (Metal)
- `vulkan`: GPU support via Vulkan (Windows/Linux), but may be unstable in some scenarios
- `cuda`: GPU support via CUDA (Windows/Linux), built only for limited CUDA compute capability (Linux: x86_64 8.9, arm64 8.7; Windows: x86_64 12.0)
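Since the Vulkan and CUDA builds may be unavailable or unstable on some machines, one option is to try a GPU variant first and fall back to the default build. A sketch of that pattern, using only the `lib_variant` option documented above (the try/catch fallback is an illustrative pattern, not something the library does for you):

```js
import { loadModel } from '@fugood/llama.node'

let context
try {
  // Prefer the Vulkan build for GPU acceleration on Windows/Linux
  context = await loadModel({
    model: 'path/to/gguf/model',
    n_gpu_layers: 99,
    lib_variant: 'vulkan',
  })
} catch (err) {
  // Fall back to the default (CPU) build if the Vulkan backend fails
  console.warn('Vulkan backend failed, using default build:', err)
  context = await loadModel({ model: 'path/to/gguf/model' })
}
```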
License: MIT
Built and maintained by BRICKS.