Multi-GPU for Local AI in 2026: NVLink vs PCIe and When a Second Card Actually Helps

DEV Community

Here is the full consumer NVLink support table:

GPU	Architecture	NVLink support	Bandwidth
RTX 2080 Ti	Turing	Yes (NVLink 2.0)	100 GB/s
RTX 3090	Ampere	Yes (NVLink 3.0)	112.5 GB/s
RTX 3090 Ti	Ampere	No	—
RTX 4070 Ti / 4080 / 4090	Ada Lovelace	No	—
RTX 5060 Ti / 5070 / 5080 / 5090	Blackwell	No	—
RTX PRO 6000 Blackwell	Blackwell (workstation)	Yes (NVLink 5.0)	1,800 GB/s

The RTX 3090 Ti, announced the same generation, did not include the NVLink connector — making the base RTX 3090 the last consumer card with it. The RTX 4090 dropped NVLink entirely; NVIDIA stated it used the freed space for additional AI processing circuitry. The RTX 5090 and the rest of the 50-series continue that pattern.

What this means practically: if you want NVLink in a home lab, your only realistic option is a pair of used RTX 3090s with an NVLink bridge. Everything else is PCIe.

The bandwidth reality

To see what this costs in performance, compare the numbers:

Interconnect	Bandwidth (bidirectional)	Typical home-lab hardware
PCIe 4.0 x16	64 GB/s	Most AMD and Intel desktop platforms
PCIe 5.0 x16	128 GB/s	Z790, X670E, AM5 with Gen 5 slot
NVLink 3.0 (RTX 3090 pair)	112.5 GB/s	RTX 3090 + NVLink bridge
NVLink 3.0 (A100 pair)	600 GB/s	Data center, out of home-lab budget
NVLink 4.0 (H100 pair)	900 GB/s	Data center

One important detail for dual-GPU desktop builds: when you install two cards in a typical consumer motherboard, each card gets x8 PCIe lanes rather than x16, because the CPU's PCIe lanes are split between slots. On PCIe 4.0, x8 = 32 GB/s bidirectional. On PCIe 5.0, x8 = 64 GB/s bidirectional.
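The lane arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming the rounded nominal per-lane rates implied by the table (about 2 GB/s per direction for PCIe 4.0, doubling with each generation). The function name is illustrative, and measured throughput will be somewhat lower once protocol overhead is included.

```python
# Nominal per-lane, per-direction PCIe bandwidth in GB/s. These are rounded
# figures chosen to match the 64 GB/s and 128 GB/s x16 numbers in the table.
PCIE_GBPS_PER_LANE = {3: 1.0, 4: 2.0, 5: 4.0}


def pcie_bidirectional_gbps(generation: int, lanes: int) -> float:
    """Return the nominal bidirectional bandwidth of a PCIe link in GB/s."""
    per_direction = PCIE_GBPS_PER_LANE[generation] * lanes
    return per_direction * 2  # both directions combined


for gen, lanes in [(4, 16), (4, 8), (5, 16), (5, 8)]:
    print(f"PCIe {gen}.0 x{lanes}: {pcie_bidirectional_gbps(gen, lanes):.0f} GB/s")
```

Dividing the RTX 3090 bridge's 112.5 GB/s by the x8 results gives a quick sense of how large the interconnect gap is on a given platform.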

GPU-to-GPU communication over PCIe also routes through the CPU memory controller — data moves from GPU 0 → CPU → GPU 1 — which adds latency that direct NVLink connections avoid entirely. The RTX 3090's NVLink bridge is a direct GPU-to-GPU connection at 112.5 GB/s with no CPU hop.
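To check which path your own cards use, `nvidia-smi topo -m` prints a matrix of link codes, where `NV#` indicates a bonded set of NVLinks and codes such as `PHB` or `SYS` indicate PCIe paths. The following is a rough sketch of parsing that text; the helper names are hypothetical, and it assumes the matrix header is the first non-empty line of the output.

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # some versions underline the header
GPU_NAME = re.compile(r"^GPU\d+$")


def parse_topology(topo_text: str) -> dict:
    """Map (gpu_a, gpu_b) pairs to link codes from `nvidia-smi topo -m` text."""
    lines = [ANSI_ESCAPE.sub("", ln) for ln in topo_text.splitlines()]
    lines = [ln for ln in lines if ln.strip()]
    if not lines:
        return {}
    # Header row: the leading tokens named GPU0, GPU1, ... are matrix columns.
    columns = []
    for token in lines[0].split():
        if not GPU_NAME.match(token):
            break
        columns.append(token)
    links = {}
    for line in lines[1:]:
        tokens = line.split()
        if tokens and tokens[0] in columns:
            row = tokens[0]
            for col, code in zip(columns, tokens[1:1 + len(columns)]):
                if col != row:  # skip the "X" self entry on the diagonal
                    links[(row, col)] = code
    return links


def describe_link(code: str) -> str:
    """Classify a link code: NV# means NVLink, anything else is a PCIe path."""
    return "NVLink" if code.startswith("NV") else "PCIe"
```

On a machine with the NVIDIA driver installed, the input text can be captured with `subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True).stdout`.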

For tensor-parallel inference, where each token processed requires all-reduce operations between GPUs, that bandwidth gap translates directly into throughput. Benchmarks from a 4x RTX 3090 cluster found NVLink improves inference throughput by approximately 50% for 2-GPU tensor-parallel pairs, and around 10% for 4-GPU setups where only half of GPU pairs are bridged and the rest communicate over PCIe.

When a second GPU actually helps — and when it makes things worse

Adding a second GPU is not always an upgrade. The outcome depends entirely on the relationship between model size and your GPU's VRAM.

Scenario 1: Model doesn't fit on one card. If you are trying to run Llama 3.3 70B Q4 (requires ~42 GB) on a single RTX 4090 (24 GB), the model simply cannot load. A second 4090 brings you to 48 GB total and the model runs. In this case, the second card is not optional — it is a requirement.
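The ~42 GB figure can be approximated from parameter count and quantization width. This is a back-of-the-envelope sketch: the 4.8 effective bits per weight is an assumption for a Q4-style quantization, the function names are illustrative, and the check ignores KV cache and runtime overhead, which need additional VRAM.

```python
def quantized_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(model_gb: float, gpu_vram_gb: float, num_gpus: int = 1) -> bool:
    """True if pooled VRAM covers the weights; ignores KV cache and overhead."""
    return model_gb <= gpu_vram_gb * num_gpus


# Assumption: ~4.8 effective bits per weight for a Q4-style quantization.
llama_70b_q4 = quantized_weights_gb(70, 4.8)
print(f"Estimated weights: {llama_70b_q4:.0f} GB")
print("One 24 GB card:", fits(llama_70b_q4, 24))
print("Two 24 GB cards:", fits(llama_70b_q4, 24, num_gpus=2))
```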

Scenario 2: Model fits on one card, you add a second anyway. This is where people get surprised. If you are running Ollama with a 14B model that fits comfortably in 24 GB of VRAM, Ollama will automatically detect your second GPU and split layers across both cards. The result, counterintuitively, is slower inference, because every token now requires PCIe data transfers between cards that were not necessary when the model lived on one GPU. Ollama's official documentation confirms this behavior: a second GPU accelerates large models that require VRAM pooling, but it hurts small models that would otherwise run fully on one card.
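One way to keep a small model on a single card is to hide the second GPU from the server process. The sketch below uses `CUDA_VISIBLE_DEVICES`, the standard CUDA environment variable for restricting which GPUs a process sees; the helper names are hypothetical, and it assumes `ollama` is on your PATH.

```python
import os
import subprocess


def single_gpu_env(gpu_index: int = 0, base_env=None) -> dict:
    """Copy the environment and expose only one NVIDIA GPU to child processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def serve_on_single_gpu(gpu_index: int = 0) -> subprocess.Popen:
    """Start `ollama serve` with only the chosen GPU visible to it."""
    return subprocess.Popen(["ollama", "serve"], env=single_gpu_env(gpu_index))
```

Calling `serve_on_single_gpu(0)` starts the server pinned to the first card; if Ollama runs as a system service on your machine, the same variable would instead need to be set in the service's environment.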

Scenario 3: High-concurrency serving. If you are running vLLM and serving 10+ simultaneous users, tensor parallelism across two GPUs can roughly double throughput compared to a single-GPU setup, because both GPUs work on each request in parallel. The PCIe overhead is amortized across many concurrent requests. This is the use case where PCIe multi-GPU genuinely earns its keep even without NVLink.
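In vLLM, tensor parallelism is enabled through the `tensor_parallel_size` argument of the `LLM` class. The sketch below shows the shape of that configuration; the helper names are illustrative, and the model identifier in the usage note is a placeholder rather than a real repository.

```python
def tensor_parallel_kwargs(model: str, num_gpus: int) -> dict:
    """Build keyword arguments for vllm.LLM that shard the model across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {"model": model, "tensor_parallel_size": num_gpus}


def load_engine(model: str, num_gpus: int = 2):
    """Create a vLLM engine sharded across num_gpus cards."""
    # Imported lazily so this file loads on machines without vLLM installed.
    from vllm import LLM  # requires vLLM and CUDA GPUs at call time

    return LLM(**tensor_parallel_kwargs(model, num_gpus))
```

For example, `load_engine("your-org/your-quantized-70b-model", num_gpus=2)` would split each layer's weights across both cards, so every request exercises both GPUs.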

The decision matrix:

Situation	Add second GPU?	Reasoning
70B+ model, single GPU too small	Yes, required	VRAM pooling is the only path
Personal use, <14B models	No — makes it slower	PCIe overhead > compute gain
vLLM serving, 10+ concurrent users	Yes	Throughput scales well
Fine-tuning / QLoRA	Cloud instead	See cloud GPU math
Ollama, model fits on one card	No	Ollama adds overhead, not speed
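The matrix can also be written as a small function, which makes the order of the checks explicit. This is only an illustrative encoding of the table above: the function name and return strings are made up for this sketch, and the 10-user threshold is taken directly from the table rather than from any measurement.

```python
def second_gpu_advice(model_gb: float, gpu_vram_gb: float,
                      concurrent_users: int = 1,
                      fine_tuning: bool = False) -> str:
    """Return the decision-matrix recommendation for adding a second GPU."""
    if fine_tuning:
        return "cloud instead"   # fine-tuning / QLoRA row
    if model_gb > gpu_vram_gb:
        return "yes, required"   # VRAM pooling is the only path
    if concurrent_users >= 10:
        return "yes"             # high-concurrency serving scales well
    return "no"                  # model fits on one card: PCIe overhead wins


print(second_gpu_advice(model_gb=42, gpu_vram_gb=24))
print(second_gpu_advice(model_gb=9, gpu_vram_gb=24))
print(second_gpu_advice(model_gb=9, gpu_vram_gb=24, concurrent_users=12))
```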

The RTX 3090 NVLink setup: what it actually buys you

For home-lab users who specifically want NVLink, this is the only practical path. Two used RTX 3090s connected with an NVLink bridge give you:

48 GB combined VRAM — enough for Llama 3.3 70B at Q4_K_M with context headroom
112.5 GB/s GPU-to-GPU bandwidth: roughly 3.5× the 32 GB/s of PCIe 4.0 x8
Approximately 50% higher throughput than the same two 3090s without NVLink in tensor-parallel configurations

Hardware required:

Two RTX 3090 cards (NOT 3090 Ti — that card has no NVLink connector)
One NVIDIA NVLink Bridge 4-slot (ASIN B08S1RYPP6 on Amazon, also available from Newegg). Originally $79 MSRP; as of May 2026, available on Amazon and eBay in the $50–80 range
A motherboard with two PCIe x16/x8 slots with sufficient slot spacing for the 4-slot bridge

The thermal reality: Two RTX 3090s at full inference load draw approximately 350W each, putting the combined GPU power draw at ~700W. The NVLink bridge sits between the cards, blocking airflow between them. A dual-3090 NVLink rig almost always requires aftermarket solutions — open-air cases, additional case fans directly above the GPU stack, or liquid cooling. The dual RTX 3090 cooling problem is well-documented and not optional to address. Plan power supply accordingly: a 1200W+ PSU is prudent.
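The PSU sizing can be sanity-checked with simple arithmetic. In this sketch the 250 W allowance for CPU, drives, and fans, the 25% headroom factor, and the rounding to 50 W steps are all assumptions chosen for illustration; only the 350 W per-GPU figure comes from the text above.

```python
import math


def recommended_psu_watts(num_gpus: int, gpu_watts: float,
                          rest_of_system_watts: float = 250,
                          headroom: float = 1.25, step: int = 50) -> int:
    """Estimate a PSU rating: sustained load plus headroom, rounded up."""
    load = num_gpus * gpu_watts + rest_of_system_watts
    return math.ceil(load * headroom / step) * step


print("GPU draw alone:", 2 * 350, "W")
print("Suggested PSU:", recommended_psu_watts(2, 350), "W")
```

With these assumed values the estimate lands at 1200 W, in line with the recommendation above; transient power spikes on the 3090 are a further reason not to size the supply any tighter.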

For more context on the RTX 3090's value proposition individually, see Used RTX 3090 in 2026: Still the AI Value King?

Multi-GPU over PCIe: dual RTX 4090 and beyond

For the majority of multi-GPU home-lab builds in 2026 — dual RTX 4090, dual RTX 5090, any combination without NVLink — PCIe is the interconnect. Here is what to expect.

Dual RTX 4090 running Llama 3.3 70B Q4: approximately 25–30 tokens/sec generation speed with vLLM tensor parallelism. A single RTX 4090 cannot run this model at all (insufficient VRAM), so the comparison i