🤗 Hugging Face | 🤖 ModelScope | 💬 WeChat (微信) | 🎮 Discord
ComfyUI plugin for QuantFunc, the fastest diffusion model inference engine. Run quantized text-to-image and image-editing models at 2x-11x speed with zero Python model dependencies.
Key features:
- Native C++/CUDA acceleration via `libquantfunc.so` / `quantfunc.dll`
- SVDQ (offline quantization) + Lighting (runtime quantization) dual engine
- Zero-cost LoRA stacking
- Image editing with reference images
- Export runtime-quantized models with LoRA fusion support
- Auto-update from ModelScope
| Plugin (`comfy`) | Engine (`lib`) | Summary |
|---|---|---|
| 0.0.02 (current) | 0.0.07 | v2 loader architecture · inpainting · full GPU coverage · faster editing (details below) |
| 0.0.01 | 0.0.01 to 0.0.06 | Base release: runtime/offline quantization · model & LoRA loaders · reference-image editing · export · auto-update |
🎯 Ease of Use
- v2 loaders: separate `MODEL/CLIP/VAE` sockets feed a Build Pipeline node, so models wire up the ComfyUI-native way instead of through one monolithic loader.
- Universal format adapters: load diffusers / BFL (Flux) / nunchaku SVDQ / bundled-checkpoint / HF layouts automatically, with no manual conversion.
- Base Model Auto Loader with one-click download; the plugin also auto-pulls the matching engine on first startup.
🧩 Model Support
- SVDQ (offline quantization) + Lighting (runtime BF16/FP16 to 4-bit) dual engine.
- Pipelines: Z-Image · QwenImage · QwenImage-Edit · Flux.2 Klein.
- Full GPU coverage (engine 0.0.07): consumer RTX 20 / 30 / 40 / 50-series, datacenter A100 / H100 / H200 / B100 / B200 / GB300, workstation RTX 6000 Ada / RTX PRO 6000 Blackwell, across CUDA 12 and 13.
⚡ Performance
- Consumer GPUs run native SASS: no first-run JIT compile stall on 20/30/40/50-series (datacenter/workstation cards JIT once, then cache).
- Native FP4 (NVFP4) on Blackwell (SM120), the fastest 4-bit path.
- QFRAW raw staging for reference images and masks skips the PNG/BMP encode (~80 ms saved per reference image).
- Multi-pipeline CPU/GPU coexistence: swap pipelines without a full reload; idle workers auto-free VRAM.
✨ New Features
- Inpainting: `MASK` input plus Mask Config and Mask Scale By nodes (white = regenerate, black = preserve), mirroring ComfyUI's SetLatentNoiseMask.
- Build Pipeline node (v2 assembly) with per-component precision control.
- Robust worker-process architecture: CPU/GPU model swap plus zombie-worker cleanup.
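
The mask convention above (white = regenerate, black = preserve) can be pictured as a per-pixel blend. The following is only a conceptual sketch in pure Python, not the plugin's or ComfyUI's implementation; `blend_with_mask` is a hypothetical name, and all values are assumed to be floats in the range 0.0 to 1.0.

```python
def blend_with_mask(original, generated, mask):
    """Blend two same-sized 2D grids of floats using a mask in [0.0, 1.0].

    mask = 1.0 (white) takes the generated value,
    mask = 0.0 (black) keeps the original value.
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("all inputs must have the same number of rows")
    blended = []
    for row_o, row_g, row_m in zip(original, generated, mask):
        if not (len(row_o) == len(row_g) == len(row_m)):
            raise ValueError("all rows must have the same length")
        blended.append(
            [m * g + (1.0 - m) * o for o, g, m in zip(row_o, row_g, row_m)]
        )
    return blended
```

Intermediate mask values (soft edges) mix both sources proportionally, which is one reason a mask may be scaled or feathered before use.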
🛡️ Stability & Security
- Fixed a `/dev/shm` RAM leak: edit/inpaint staging files are now always cleaned up.
- Zip-slip guard on dependency-archive extraction.
- IPC bounds check on the worker-to-host image transfer.
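
For readers unfamiliar with the term, a zip-slip guard rejects archive members whose paths would escape the extraction directory (for example `../evil.txt`). The sketch below shows the general technique with Python's standard `zipfile` module; it is an illustration under that assumption, not the plugin's actual code, and `safe_extract` is a hypothetical name.

```python
import os
import zipfile


def safe_extract(archive_path, dest_dir):
    """Extract a zip archive, refusing members that would land outside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        # Validate every member before writing anything to disk.
        for member in zf.namelist():
            target = os.path.realpath(os.path.join(dest_root, member))
            # commonpath equals dest_root only when target lies inside it.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member!r}")
        zf.extractall(dest_root)
```

Checking all members first means a malicious archive is rejected as a whole, rather than being partially extracted.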
The plugin auto-pulls the matching engine on startup: bumping `comfy` to 0.0.02 lets the updater fetch engine 0.0.07 from ModelScope (older `comfy` stays capped at engine 0.0.06).
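
The capping rule can be sketched as a small lookup plus a version comparison. This is a minimal illustration, assuming the caps from the release table (plugin 0.0.01 up to engine 0.0.06, plugin 0.0.02 up to engine 0.0.07); `ENGINE_CAP`, `parse_version`, and `pick_engine` are hypothetical names, not the updater's real API.

```python
# Assumed caps, taken from the release table: plugin version -> newest allowed engine.
ENGINE_CAP = {"0.0.01": "0.0.06", "0.0.02": "0.0.07"}


def parse_version(text):
    """Turn '0.0.07' into (0, 0, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_engine(plugin_version, available):
    """Return the newest available engine version not above the plugin's cap."""
    cap = parse_version(ENGINE_CAP[plugin_version])
    candidates = [v for v in available if parse_version(v) <= cap]
    return max(candidates, key=parse_version) if candidates else None
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes once a version component reaches two digits with different padding.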

    cd ComfyUI/custom_nodes
    git clone https://github.com/QuantFunc/ComfyUI-QuantFunc.git

The plugin automatically downloads the latest compatible `libquantfunc.so` (Linux) or `quantfunc.dll` (Windows) from ModelScope on first startup. No manual binary download is needed.
- Download or clone this repository into `ComfyUI/custom_nodes/`:

      ComfyUI/
      └── custom_nodes/
          └── ComfyUI-QuantFunc/
              ├── __init__.py
              ├── nodes.py
              ├── worker.py
              ├── auto_update.py
              └── bin/
                  ├── linux/
                  │   └── version.json
                  └── windows/
                      └── version.json

- Start ComfyUI. The plugin auto-downloads the library binary on first run.
- (Optional) To skip auto-download, manually place the binary:
  - Linux: download `libquantfunc.so` into `bin/linux/`
  - Windows: download `quantfunc.dll` into `bin/windows/`
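
To confirm that a manually placed binary is where the plugin expects it, a short check along these lines may help. It is a sketch based only on the directory layout described above; `expected_library_path` is a hypothetical helper, not part of the plugin.

```python
import sys
from pathlib import Path


def expected_library_path(plugin_dir, platform=None):
    """Return where the engine binary should sit for the given platform."""
    platform = sys.platform if platform is None else platform
    plugin_dir = Path(plugin_dir)
    if platform.startswith("win"):
        return plugin_dir / "bin" / "windows" / "quantfunc.dll"
    if platform.startswith("linux"):
        return plugin_dir / "bin" / "linux" / "libquantfunc.so"
    raise RuntimeError(f"unsupported platform: {platform}")


if __name__ == "__main__":
    path = expected_library_path("ComfyUI/custom_nodes/ComfyUI-QuantFunc")
    print(path, "found" if path.is_file() else "missing")
```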
| Requirement | Minimum |
|---|---|
| GPU | NVIDIA RTX 20 series or newer (CC 7.5+) |
| VRAM | 8 GB |
| Driver | NVIDIA ≥ 560 |
| CUDA Runtime | 13.0+ |
| cuDNN | 9.x |
| OS | Linux (glibc 2.31+) or Windows 10/11 |
| Python | 3.9+ (ComfyUI's embedded Python) |
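
The GPU, VRAM, and driver rows can be checked from `nvidia-smi` output. The sketch below is a hedged example: it assumes a driver recent enough to support the `compute_cap` query field, uses a VRAM threshold slightly under 8192 MiB as a tolerance of my own choosing, and does not check the CUDA runtime or cuDNN. The function names are hypothetical.

```python
import subprocess

QUERY = "compute_cap,memory.total,driver_version"
MIN_COMPUTE_CAP = (7, 5)   # RTX 20 series or newer
MIN_VRAM_MIB = 7680        # assumption: small tolerance below a nominal 8 GB
MIN_DRIVER_MAJOR = 560


def check_gpu_line(line):
    """Return a list of problems for one CSV line of nvidia-smi output."""
    cap_text, vram_text, driver_text = [f.strip() for f in line.split(",")]
    major, minor = (int(part) for part in cap_text.split("."))
    problems = []
    if (major, minor) < MIN_COMPUTE_CAP:
        problems.append(f"compute capability {cap_text} is below 7.5")
    if int(vram_text) < MIN_VRAM_MIB:
        problems.append(f"VRAM {vram_text} MiB is below the 8 GB minimum")
    if int(driver_text.split(".")[0]) < MIN_DRIVER_MAJOR:
        problems.append(f"driver {driver_text} is older than 560")
    return problems


def query_gpus():
    """Run nvidia-smi and return one CSV line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for gpu_line in query_gpus():
        print(gpu_line, "->", check_gpu_line(gpu_line) or "OK")
```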

Linux (Debian/Ubuntu, via `apt`):

    # CUDA 12 runtime libraries
    sudo apt install cuda-libraries-12-8
    # or individual packages:
    sudo apt install libcublas-12-8 libcurand-12-8 libcusolver-12-8 libcusparse-12-8 libnvjitlink-12-8
    # cuDNN 9
    sudo apt install libcudnn9-cuda-12

    # --- OR ---

    # CUDA 13 runtime libraries
    sudo apt install cuda-libraries-13-0
    # or individual packages:
    sudo apt install libcublas-13-0 libcurand-13-0 libcusolver-13-0 libcusparse-13-0 libnvjitlink-13-0
    # cuDNN 9
    sudo apt install libcudnn9-cuda-13

Windows:
- NVIDIA Driver ≥ 560 (provides CUDA runtime DLLs)
- Visual C++ Redistributable 2015-2022
- cuDNN 9.x
Auto-update requires the `modelscope` Python package:
pip install modelscope
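
To check whether the package is importable from the Python environment that runs ComfyUI, something like the following can be used. It relies only on the standard library; `has_package` is a hypothetical helper name.

```python
import importlib.util


def has_package(name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_package("modelscope"):
        print("modelscope is installed; auto-update can run.")
    else:
        print("modelscope is missing; run: pip install modelscope")
```

Run it with the same interpreter ComfyUI uses (for example the embedded Python on Windows), since a package installed elsewhere will not be visible to the plugin.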
If `modelscope` is not installed, auto-update is silently skipped. In that case, download the binaries manually from ModelScope.
After starting ComfyUI, check the console for:
[QuantFunc] Checking for updates (plugin v0.0.01, lib v0.0.01)...
[QuantFunc] Library is up to date (v0.0.01)
If the library was not found:
[QuantFunc] No library found, checking ModelScope for download (plugin v0.0.01)...
[QuantFunc] Downloading libquantfunc.so v0.0.01 from ModelScope...
[QuantFunc] Updated libquantfunc.so to v0.0.01. Restart ComfyUI to use the new version.
See `doc/` for detailed tutorials and `workflow_sample/README.md` for the node reference.
The easiest way to get started: import the Easy Gen workflow and pick a model from the dropdown, and the plugin auto-downloads everything. No manual model downloads or path configuration are needed.
The Lighting backend provides runtime quantization: it quantizes any diffusers-format BF16/FP16 model (e.g., Qwen/Qwen-Image-Edit-2511) to 4-bit at load time for accelerated inference. Set `model_backend` to `lighting` and leave `transformer_path` empty; no pre-quantized model download is needed.
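
As background on what "quantizing to 4-bit" means, here is a generic symmetric 4-bit scheme for one group of weights. This is a textbook-style sketch for intuition only; it is not the algorithm used by the Lighting or SVDQ engines, and the function names are hypothetical.

```python
def quantize_int4(weights):
    """Symmetric 4-bit quantization of one non-empty group: returns (codes, scale).

    Codes are integers in [-8, 7]; the largest magnitude maps to +/-7.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [code * scale for code in codes]
```

Each weight is stored as a 4-bit code plus a shared scale per group, which is where the memory and bandwidth savings come from; the rounding error per weight is at most half the scale.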
Lighting export saves runtime-quantized models to disk, so you don't need to re-quantize on every startup. If you have also stacked LoRAs, they are permanently fused into the exported weights: no LoRA nodes, no re-quantization, just load and go.
QuantFunc provides pre-exported versions of commonly used models (runtime-quantized and ready to use). Download them directly from ModelScope or Hugging Face: they give the same 2x-11x inference speedup as runtime quantization, but load faster because the quantization step is skipped.
Import from `workflow_sample/`:

| File | Use Case |
|---|---|
| `QuantFunc-Easy-Gen.json` | Beginners: 3-node auto-download workflow |
| `QuantFunc-Text-to-Image-Workflow.json` | Text-to-image (SVDQ + Lighting side by side) |
| `QuantFunc-Image-to-Image-Workflow.json` | Image editing with reference images |
| `QuantFunc-Model-Export.json` | Export runtime-quantized models (supports LoRA fusion) |
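
Before importing a workflow, it can be useful to see which node types it uses. The sketch below assumes the sample files follow ComfyUI's usual UI export layout, with a top-level `nodes` list whose entries carry a `type` field; that layout is an assumption about these particular files, and the helper names are hypothetical.

```python
import json
from collections import Counter


def load_workflow(path):
    """Read a workflow JSON file exported from the ComfyUI interface."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_node_types(workflow):
    """Count how many times each node type appears in the workflow."""
    nodes = workflow.get("nodes", [])
    return Counter(node.get("type", "<unknown>") for node in nodes)


if __name__ == "__main__":
    workflow = load_workflow("workflow_sample/QuantFunc-Easy-Gen.json")
    for node_type, count in sorted(count_node_types(workflow).items()):
        print(f"{count:3d}  {node_type}")
```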
| Issue | Solution |
|---|---|
| Worker failed to start | Check that the NVIDIA driver is ≥ 560 and that the CUDA runtime libraries are installed |
| DLL/SO not found | Check bin/linux/ or bin/windows/ contains the library; restart ComfyUI to trigger auto-download |
| No log output | Update to latest library version (requires stderr log support) |
| cuDNN BAD_PARAM | Delete cuDNN algo cache and retry |
| Noisy output | Ensure model backend matches transformer weights (svdq vs lighting) |
| Auto-update fails | Install modelscope package, or manually download from ModelScope |
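
For the "DLL/SO not found" case, the operating system's loader message often names the missing dependency. A minimal way to surface it, assuming you run this in a separate Python session rather than inside ComfyUI, is to try loading the binary with `ctypes`; `try_load` is a hypothetical helper.

```python
import ctypes


def try_load(library_path):
    """Return (True, "") if the shared library loads, else (False, error text)."""
    try:
        ctypes.CDLL(str(library_path))
    except OSError as exc:
        # The message usually names the file or dependency that was not found.
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, message = try_load("bin/linux/libquantfunc.so")
    print("loaded" if ok else f"failed: {message}")
```

A failure that mentions a CUDA or cuDNN library rather than the engine binary itself points to the runtime packages listed in the requirements.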
Join our community for support, updates, and discussions:
- 🎮 Discord server
- 💬 Scan the QR code below to join our WeChat group: