Quality impact: Negligible for 90%+ of prompts. Side-by-side comparisons show differences only in very fine details at high magnification. Community blind tests consistently fail to distinguish FP8 from FP16 outputs.
When to use FP8:
- Running Flux on 12GB GPUs (RTX 3060 12GB, RTX 5070); see "Can the RTX 3060 run Stable Diffusion?" for the full picture on what this card handles
- Running Flux on 16GB GPUs and wanting headroom for ControlNet or batch generation
- Running SD XL on 8GB GPUs
Hardware requirement: RTX 40/50 series GPUs have native FP8 tensor cores. RTX 30 series cards can use FP8 models but the computation falls back to FP16 math — you save VRAM but don't get the speed benefit.
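The hardware split above can be checked from a card's CUDA compute capability. The following is a minimal sketch, not a definitive detection routine: `has_native_fp8` is a hypothetical helper name, and the threshold assumes that RTX 40 series cards report compute capability 8.9 while RTX 30 series cards report 8.6.

```python
def has_native_fp8(compute_capability):
    """Return True if the compute capability implies FP8 tensor cores.

    Assumption: capability (8, 9) and above covers RTX 40/50 series cards;
    RTX 30 series cards report (8, 6) and fall below the threshold.
    """
    major, minor = compute_capability
    return (major, minor) >= (8, 9)


# Illustrative capability values, hard-coded so the sketch runs without a GPU.
for name, cc in [("RTX 3060", (8, 6)), ("RTX 4070", (8, 9))]:
    mode = "native FP8 math" if has_native_fp8(cc) else "FP8 storage, FP16 math"
    print(f"{name}: {mode}")
```

On a machine with PyTorch and a CUDA GPU, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed to the helper directly.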
NF4: when VRAM is desperate
NF4 (4-bit NormalFloat) quantization via ComfyUI nodes compresses models aggressively. This is how people run Flux on 8GB GPUs and SD XL on 6GB GPUs.
Quality impact: Visible. Fine details soften, color accuracy drifts slightly, and complex compositions show more artifacts. For quick drafts and iteration, it's usable. For final output quality, it's a compromise.
| Quality aspect | FP16 | FP8 | NF4 |
| --- | --- | --- | --- |
| Fine detail | Excellent | Excellent | Reduced |
| Color accuracy | Reference | Near-reference | Slight drift |
| Complex scenes | Full fidelity | Full fidelity | Occasional artifacts |
| Text rendering | Best available | Same as FP16 | Degraded |
| Speed (RTX 40/50) | Baseline | Similar | Slower (dequant overhead) |
When to use NF4: Only when FP8 doesn't fit, for example when running Flux on 8GB cards or SD XL on 6GB cards. If your GPU has 12GB+, FP8 is almost always the better tradeoff.
T5 text encoder quantization (Flux-specific)
Flux uses a large T5-XXL text encoder alongside its image model. At FP16, this encoder alone uses ~10GB of VRAM. Quantizing the T5 encoder to INT8 or INT4 reduces this to ~5GB or ~3GB respectively, with minimal impact on prompt understanding.
This is often more impactful than quantizing the image model itself. On a 16GB card, quantizing T5 to INT8 and keeping the DiT at FP8 gives a much better quality result than keeping T5 at FP16 and aggressively quantizing the DiT.
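The T5 figures quoted above follow from simple arithmetic: parameter count times bits per weight. The sketch below illustrates that calculation; the parameter count of roughly 4.7 billion for the T5-XXL encoder is an assumption rather than a figure from this guide, and `weight_size_gb` is a hypothetical helper.

```python
def weight_size_gb(num_params, bits_per_weight):
    """Approximate size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 4.7 billion parameters in the T5-XXL encoder.
T5_XXL_ENCODER_PARAMS = 4.7e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    size = weight_size_gb(T5_XXL_ENCODER_PARAMS, bits)
    print(f"{label}: ~{size:.1f} GB of weights")
```

Under that assumption the raw weights come to about 9.4GB, 4.7GB, and 2.35GB. Real usage runs somewhat higher because quantized formats also store scale factors and the encoder needs working memory, which is in line with the ~10GB, ~5GB, and ~3GB figures above.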
For a deeper look at Flux VRAM requirements, see the Flux GPU guide and VRAM requirements for Flux.
Which quantization should you use?
| Your GPU VRAM | SD 1.5 | SD XL | Flux |
| --- | --- | --- | --- |
| 6GB | FP16 | NF4 | Not viable |
| 8GB | FP16 | FP16 | NF4 (barely) |
| 12GB | FP16 | FP16 | FP8 |
| 16GB | FP16 | FP16 | FP8 (with headroom) |
| 24GB+ | FP16 | FP16 | FP16 |
For GPU recommendations matched to these VRAM tiers, see the Stable Diffusion GPU guide and the ComfyUI GPU guide.
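The VRAM-tier recommendations can also be expressed as a small lookup, which is handy if you script your model downloads. This is a sketch that simply encodes the tiers listed in this section; `RECOMMENDED` and `recommend` are hypothetical names, and treating an in-between VRAM size (say 10GB) as the next tier down is an assumption made here, not a rule from the guide.

```python
# Tiers copied from the recommendation table; None means "not viable".
RECOMMENDED = {
    6:  {"SD 1.5": "FP16", "SD XL": "NF4",  "Flux": None},
    8:  {"SD 1.5": "FP16", "SD XL": "FP16", "Flux": "NF4"},
    12: {"SD 1.5": "FP16", "SD XL": "FP16", "Flux": "FP8"},
    16: {"SD 1.5": "FP16", "SD XL": "FP16", "Flux": "FP8"},
    24: {"SD 1.5": "FP16", "SD XL": "FP16", "Flux": "FP16"},
}


def recommend(vram_gb, model):
    """Return the suggested format for the largest tier that fits vram_gb.

    Returns None when the model is not viable at that tier, or when
    vram_gb is below the smallest tier listed (6GB).
    """
    fitting = [tier for tier in sorted(RECOMMENDED) if tier <= vram_gb]
    if not fitting:
        return None
    return RECOMMENDED[fitting[-1]][model]


print(recommend(12, "Flux"))   # 12GB tier
print(recommend(10, "Flux"))   # falls back to the 8GB tier
print(recommend(6, "Flux"))    # not viable, so None
```

Rounding down to the nearest tier is the conservative choice: a 10GB card is treated like an 8GB card rather than a 12GB one.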
The full version lives on Best GPU for AI — VRAM calculator, GPU comparison table, and live Amazon pricing.