Best Quantization for Stable Diffusion & Flux

DEV Community

Quality impact: Negligible for 90%+ of prompts. Side-by-side comparisons show differences only in very fine details at high magnification. Community blind tests consistently fail to distinguish FP8 from FP16 outputs.

When to use FP8:

Running Flux on 12GB GPUs (RTX 3060 12GB, RTX 5070); see "Can the RTX 3060 run Stable Diffusion?" for the full picture on what this card handles
Running Flux on 16GB GPUs and wanting headroom for ControlNet or batch generation
Running SD XL on 8GB GPUs

Hardware requirement: RTX 40/50 series GPUs have native FP8 tensor cores. RTX 30 series cards can still load FP8 models, but the weights are upcast and the computation falls back to FP16 math, so you save VRAM but don't get the speed benefit.
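One rough way to check which side of that line a card falls on is its CUDA compute capability: RTX 30 (Ampere) cards report 8.6, RTX 40 (Ada) report 8.9, and RTX 50 (Blackwell) report 12.0. The sketch below assumes PyTorch for the live query; the helper name `has_native_fp8` is made up for illustration and is not part of any library.

```python
def has_native_fp8(major: int, minor: int) -> bool:
    """Return True if the compute capability implies FP8 tensor cores.

    Assumption: FP8 tensor cores arrive with compute capability 8.9 (Ada)
    and are present on everything newer (Hopper 9.0, Blackwell 10.x/12.x).
    """
    return (major, minor) >= (8, 9)


if __name__ == "__main__":
    # Optional live check; skipped quietly if PyTorch or a GPU is missing.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        name = torch.cuda.get_device_name(0)
        if has_native_fp8(major, minor):
            print(f"{name}: native FP8 math (sm_{major}{minor})")
        else:
            print(f"{name}: FP8 saves VRAM only, math runs in FP16 (sm_{major}{minor})")
```

On an RTX 3060 this should land in the second branch: the FP8 checkpoint still loads and still saves memory, it just won't run faster.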

NF4: when VRAM is desperate

NF4 (4-bit NormalFloat) quantization via ComfyUI nodes compresses model weights to roughly a quarter of their FP16 size. This is how people run Flux on 8GB GPUs and SD XL on 6GB GPUs.

Quality impact: Visible. Fine details soften, color accuracy drifts slightly, and complex compositions show more artifacts. For quick drafts and iteration, it's usable. For final output quality, it's a compromise.
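To see where the softening comes from, here is a minimal pure-Python sketch of the idea behind NF4: each block of weights is scaled by its largest absolute value, then every weight is snapped to the nearest of 16 fixed levels. The level values are the NormalFloat-4 code points as published with QLoRA/bitsandbytes, rounded to four decimals here. Real implementations work on tensors in blocks (commonly 64 weights) and pack two codes per byte; the function names below are illustrative, not any library's API.

```python
# The 16 NF4 code points (rounded); denser near zero, where most weights live.
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_block(weights):
    """Quantize one block of floats to (absmax scale, list of 4-bit codes)."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    codes = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return absmax, codes


def dequantize_block(absmax, codes):
    """Rebuild approximate weights; this step runs on every forward pass."""
    return [NF4_LEVELS[c] * absmax for c in codes]


block = [0.5, -0.25, 0.0, 0.1]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
errors = [abs(a - b) for a, b in zip(block, restored)]
print(codes, restored, max(errors))
```

Only the largest weight and zero survive exactly; everything in between is rounded to a nearby level. That rounding is the detail loss, and the dequantize step is the extra work behind NF4's slower generation.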

Quality aspect	FP16	FP8	NF4
Fine detail	Excellent	Excellent	Reduced
Color accuracy	Reference	Near-reference	Slight drift
Complex scenes	Full fidelity	Full fidelity	Occasional artifacts
Text rendering	Best available	Same as FP16	Degraded
Speed (RTX 40/50)	Baseline	Similar	Slower (dequant overhead)

When to use NF4: Only when FP8 doesn't fit, for example running Flux on 8GB cards or SD XL on 6GB cards. If your GPU has 12GB+, FP8 is almost always the better tradeoff.

T5 text encoder quantization (Flux-specific)

Flux uses a large T5-XXL text encoder alongside its image model. At FP16, this encoder alone uses ~10GB of VRAM. Quantizing the T5 encoder to INT8 or INT4 reduces this to ~5GB or ~3GB respectively, with minimal impact on prompt understanding.
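Those figures follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below assumes about 4.7 billion parameters for the T5-XXL encoder, which is an approximation rather than a number from this article. Quantized formats also store per-block scales, and activations need room too, so real usage lands a little above the raw figure.

```python
T5_XXL_ENCODER_PARAMS = 4.7e9  # assumed approximate parameter count


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Raw weight storage in GB (10^9 bytes), ignoring scales and activations."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"T5-XXL at {label}: ~{weight_gb(T5_XXL_ENCODER_PARAMS, bits):.2f} GB")
# Raw weights come to about 9.40, 4.70 and 2.35 GB, in line with the
# ~10GB / ~5GB / ~3GB figures above once overhead is added.
```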

This often matters more than quantizing the image model itself. On a 16GB card, quantizing T5 to INT8 and keeping the DiT at FP8 gives noticeably better image quality than keeping T5 at FP16 and aggressively quantizing the DiT.

For a deeper look at Flux VRAM requirements, see the Flux GPU guide and VRAM requirements for Flux.

Which quantization should you use?

Your GPU VRAM	SD 1.5	SD XL	Flux
6GB	FP16	NF4	Not viable
8GB	FP16	FP16	NF4 (barely)
12GB	FP16	FP16	FP8
16GB	FP16	FP16	FP8 (with headroom)
24GB+	FP16	FP16	FP16
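If you want to bake this table into a launcher script, it reduces to a small lookup. The sketch below simply mirrors the rows above; the `recommend` function and the `sd15` / `sdxl` / `flux` keys are names invented for this example, and cards that fall between tiers are treated as the next tier down.

```python
# (minimum VRAM in GB, recommendation per model), highest tier first.
TIERS = [
    (24, {"sd15": "FP16", "sdxl": "FP16", "flux": "FP16"}),
    (16, {"sd15": "FP16", "sdxl": "FP16", "flux": "FP8 (with headroom)"}),
    (12, {"sd15": "FP16", "sdxl": "FP16", "flux": "FP8"}),
    (8, {"sd15": "FP16", "sdxl": "FP16", "flux": "NF4 (barely)"}),
    (6, {"sd15": "FP16", "sdxl": "NF4", "flux": "Not viable"}),
]


def recommend(vram_gb: float, model: str) -> str:
    """Return the table's suggestion for a card with vram_gb of memory."""
    for min_gb, row in TIERS:
        if vram_gb >= min_gb:
            return row[model]
    return "Below the smallest tier in the table"


print(recommend(12, "flux"))  # RTX 3060 12GB running Flux
print(recommend(10, "flux"))  # a 10GB card is treated as the 8GB tier
```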

For GPU recommendations matched to these VRAM tiers, see the Stable Diffusion GPU guide and the ComfyUI GPU guide.


The full version lives on Best GPU for AI — VRAM calculator, GPU comparison table, and live Amazon pricing.