forked from spiritbuun/buun-llama-cpp
@github-actions released this 04 Jun 21:21 · 6 commits to main since this release
Changelog
- Merged the latest upstream llama.cpp master. This pulls in Gemma 4 12B and Gemma 4 unified multimodal support fixes, including non-causal vision, unified audio/vision projector handling, and FPE fixes; Qwen3.5 post-norm hidden-state behavior for MTP; CUDA KV-cache quantization preallocation and PDL race fixes; WebGPU FlashAttention refactoring with standardized quantization support; CPU backend improvements for RVV/SVE; lower-latency Metal command-buffer status polling; Mermaid diagram rendering and preview support in `tools/ui`; updated BoringSSL, SYCL documentation, save/load-state tests, and Docker docs; and small CI/release maintenance.
- Repaired CUDA fused TurboQuant FlashAttention for same-type `turbo2`, `turbo3`, and `turbo4` K/V caches. The fused MMA path now loads each supported format correctly, while mixed TurboQuant and TCQ pairs stay on the established non-fused paths. TurboQuant/TCQ partial KV offload now fails early instead of falling back to an incompatible CPU cache and hitting a scheduler crash. Added `GGML_TURBO_FA_DEBUG=1` path diagnostics and regression coverage for the supported dispatch matrix.
- Updated release packaging and documentation. HIP/ROCm builds now include all quantized FlashAttention combinations, and the prebuilt binary and Docker image lists reflect the current release outputs.
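The FlashAttention dispatch rules in the TurboQuant entry can be summarized as a small decision function. This is a minimal Python model of the behavior described in these notes, not code from the project: the `FaPath` enum, the function names, and the `tcq` name prefix used to recognize TCQ cache types are hypothetical (the notes do not list TCQ type names), and routing mixed TurboQuant pairs such as `turbo2` K with `turbo4` V to the non-fused path is an assumption.

```python
from enum import Enum

# TurboQuant cache type names as listed in the release notes.
TURBO_TYPES = {"turbo2", "turbo3", "turbo4"}


class FaPath(Enum):
    FUSED_MMA = "fused MMA FlashAttention"
    NON_FUSED = "established non-fused path"
    UNCHANGED = "ordinary cache types, not covered by this sketch"


def is_turbo_or_tcq(cache_type: str) -> bool:
    # Assumption: TCQ types are recognized by a hypothetical "tcq" prefix.
    return cache_type in TURBO_TYPES or cache_type.startswith("tcq")


def select_fa_path(k_type: str, v_type: str, kv_fully_offloaded: bool) -> FaPath:
    """Model of the dispatch matrix described in the changelog."""
    if not (is_turbo_or_tcq(k_type) or is_turbo_or_tcq(v_type)):
        return FaPath.UNCHANGED
    # Partial KV offload with TurboQuant/TCQ fails early rather than
    # falling back to an incompatible CPU cache.
    if not kv_fully_offloaded:
        raise ValueError(
            f"partial KV offload is not supported for {k_type}/{v_type} caches"
        )
    # Same-type TurboQuant K/V pairs take the fused MMA kernel.
    if k_type == v_type and k_type in TURBO_TYPES:
        return FaPath.FUSED_MMA
    # Mixed TurboQuant/TCQ pairs (and, by assumption, any other
    # combination of these types) stay on the non-fused paths.
    return FaPath.NON_FUSED


if __name__ == "__main__":
    print(select_fa_path("turbo3", "turbo3", True).value)
    print(select_fa_path("turbo3", "tcq3", True).value)  # "tcq3" is hypothetical
```

To check which path a real build actually takes, the notes point to running it with `GGML_TURBO_FA_DEBUG=1` set in the environment; the format of that diagnostic output is not described here.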
macOS:
Linux:
- Ubuntu x64 CPU
- Ubuntu arm64 CPU
- Ubuntu x64 CUDA 12.4
- Ubuntu x64 CUDA 13.1
- Ubuntu x64 Vulkan
- Ubuntu x64 ROCm 7.2
- Ubuntu x64 SYCL
Windows:
- Windows x64 CPU
- Windows x64 SYCL
- Windows x64 CUDA 12.4 - DLLs
- Windows x64 CUDA 13.1 - DLLs
- Windows x64 HIP
Docker:
- CPU: `docker pull ghcr.io/anbeeld/beellama.cpp:server-cpu-v0.3.1`
- CUDA: `docker pull ghcr.io/anbeeld/beellama.cpp:server-cuda-v0.3.1`
- CUDA 12: `docker pull ghcr.io/anbeeld/beellama.cpp:server-cuda12-v0.3.1`
- CUDA 13: `docker pull ghcr.io/anbeeld/beellama.cpp:server-cuda13-v0.3.1`
- ROCm: `docker pull ghcr.io/anbeeld/beellama.cpp:server-rocm-v0.3.1`
- Vulkan: `docker pull ghcr.io/anbeeld/beellama.cpp:server-vulkan-v0.3.1`
- SYCL: `docker pull ghcr.io/anbeeld/beellama.cpp:server-sycl-v0.3.1`