v0.3.1

github-actions released this 04 Jun 21:21 · 6 commits to main since this release
Changelog
  • Merged latest upstream llama.cpp master. This pulls in:
      • Gemma 4 12B and Gemma 4 unified multimodal support fixes, including non-causal vision, unified audio/vision projector handling, and FPE fixes
      • Qwen3.5 post-norm hidden-state behavior for MTP
      • CUDA KV-cache quantization preallocation and PDL race fixes
      • WebGPU FlashAttention refactoring with standardized quantization support
      • CPU backend improvements for RVV/SVE
      • lower-latency Metal command-buffer status polling
      • Mermaid diagram rendering and preview support in tools/ui
      • updated BoringSSL, SYCL documentation, save/load-state tests, Docker docs, and small CI/release maintenance
  • Repaired CUDA fused TurboQuant FlashAttention for same-type turbo2, turbo3, and turbo4 K/V caches. The fused MMA path now loads each supported format correctly, while mixed TurboQuant and TCQ pairs stay on the established non-fused paths; TurboQuant/TCQ partial KV offload now fails early instead of falling back to an incompatible CPU cache and reaching a scheduler crash. Added GGML_TURBO_FA_DEBUG=1 path diagnostics and regression coverage for the supported dispatch matrix.
  • Updated release packaging and documentation. HIP/ROCm builds now include all quantized FlashAttention combinations, and the prebuilt binary and Docker image lists reflect the current release outputs.
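
The changelog above names GGML_TURBO_FA_DEBUG=1 as the switch for FlashAttention path diagnostics. A minimal sketch of enabling it for a single command follows. The helper name `run_with_turbo_fa_debug` is hypothetical, and these notes do not say where the diagnostics are written or what they look like.

```shell
# Hypothetical helper: run any command with TurboQuant FlashAttention
# path diagnostics enabled for that command only.
# GGML_TURBO_FA_DEBUG=1 is taken from the changelog; everything else
# here is an illustrative assumption.
run_with_turbo_fa_debug() {
  # env sets the variable only for the child process,
  # so the calling shell's environment is left unchanged.
  env GGML_TURBO_FA_DEBUG=1 "$@"
}
```

For example, `run_with_turbo_fa_debug ./llama-server -m model.gguf` would start a server with diagnostics on, assuming the build ships a `llama-server` binary with the upstream llama.cpp `-m` option. When running a container instead, the same variable can be passed with `docker run -e GGML_TURBO_FA_DEBUG=1 ...`.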

macOS:

Linux:

Windows:

Docker:

  • CPU: docker pull ghcr.io/anbeeld/beellama.cpp:server-cpu-v0.3.1
  • CUDA: docker pull ghcr.io/anbeeld/beellama.cpp:server-cuda-v0.3.1
  • CUDA 12: docker pull ghcr.io/anbeeld/beellama.cpp:server-cuda12-v0.3.1
  • CUDA 13: docker pull ghcr.io/anbeeld/beellama.cpp:server-cuda13-v0.3.1
  • ROCm: docker pull ghcr.io/anbeeld/beellama.cpp:server-rocm-v0.3.1
  • Vulkan: docker pull ghcr.io/anbeeld/beellama.cpp:server-vulkan-v0.3.1
  • SYCL: docker pull ghcr.io/anbeeld/beellama.cpp:server-sycl-v0.3.1
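
The image names above follow one pattern, `server-<backend>-<version>`. As a small sketch, a shell function can build the reference for a chosen backend. The function name `beellama_image` is hypothetical and not part of the release; the backend list and tag format are copied from the list above.

```shell
# Hypothetical helper: print the container image reference for a backend.
# Backends and tag pattern come from the Docker list in these release notes.
beellama_image() {
  backend="$1"
  version="${2:-v0.3.1}"   # defaults to this release
  case "$backend" in
    cpu|cuda|cuda12|cuda13|rocm|vulkan|sycl)
      printf 'ghcr.io/anbeeld/beellama.cpp:server-%s-%s\n' "$backend" "$version"
      ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1
      ;;
  esac
}
```

Usage: `docker pull "$(beellama_image cuda12)"` pulls the CUDA 12 server image for v0.3.1. An unlisted backend name returns a non-zero status instead of producing a tag that may not exist.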
