Fast LLM speculative inference server for consumer hardware.
kernel cuda cuda-kernels nvidia-cuda luce rtx3090 llama-cpp local-ai qwen speculative-decoding dflash megakernel speculative-prefill pflash lucebox
Updated Jun 13, 2026 - C++