Consider a topology in which two QEMU VMs running Ubuntu 16.04 (Linux kernel 4.4.0-210) both have virtio-net interfaces with TAP backends connected to the same (host) Linux bridge, with an SSH connection between them.
ubuntu@VM1:~$ uname -a
Linux VM1 4.4.0-210-generic #242-Ubuntu SMP Fri Apr 16 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@VM1:~$
Both VMs use paravirtualized virtio-net interfaces, which default to TX and RX checksum offloading.
ubuntu@VM1:~$ ethtool -i eth0
driver: virtio_net
version: 1.0.0
firmware-version:
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
ubuntu@VM1:~$
ubuntu@VM1:~$ ethtool -k eth0 | grep -i sum
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
ubuntu@VM1:~$
ubuntu@VM2:~$ ethtool -k eth0 | grep -i sum
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
ubuntu@VM2:~$
That actually means:
- the kernel network stack sends out SSH/TCP packets without computing and filling in the relevant TCP checksum field (i.e. the TCP checksum inside the packets sent is basically either zeroed out or incorrect);
- the kernel network stack assumes the virtio-net interface has already checked/verified the TCP checksum of received SSH/TCP packets and is therefore allowed to skip that check.
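To make "computing and filling in the TCP checksum" concrete: the TCP checksum is the 16-bit one's complement of the one's-complement sum over an IPv4 pseudo-header (source address, destination address, a zero byte, protocol 6, TCP length) followed by the TCP segment itself (RFC 793, RFC 1071). The Python sketch below is an illustration only, not kernel code; the function names and the example addresses/ports are made up. It shows the sender's computation and the check a validating receiver performs, which is the check the second point above lets the guest skip.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of `data` (RFC 1071), folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_pseudo_header(src: bytes, dst: bytes, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 6 (TCP), TCP length."""
    return src + dst + struct.pack("!BBH", 0, 6, tcp_len)


def tcp_checksum(src: bytes, dst: bytes, segment: bytes) -> int:
    """Sender side: checksum for a segment whose checksum field (bytes 16-17) is zero."""
    pseudo = ipv4_pseudo_header(src, dst, len(segment))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def tcp_checksum_ok(src: bytes, dst: bytes, segment: bytes) -> bool:
    """Receiver side: summing everything, stored checksum included, must give 0xFFFF."""
    pseudo = ipv4_pseudo_header(src, dst, len(segment))
    return ones_complement_sum(pseudo + segment) == 0xFFFF


if __name__ == "__main__":
    src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])  # made-up addresses
    # Minimal 20-byte TCP SYN header with the checksum field (offset 16) zeroed.
    seg = bytearray(struct.pack("!HHIIBBHHH", 40000, 22, 1, 0, 5 << 4, 0x02, 64240, 0, 0))
    print("checksum field still zero, valid?", tcp_checksum_ok(src, dst, bytes(seg)))
    seg[16:18] = struct.pack("!H", tcp_checksum(src, dst, bytes(seg)))
    print("checksum filled in, valid?       ", tcp_checksum_ok(src, dst, bytes(seg)))
```

A packet sent with offload enabled fails `tcp_checksum_ok` until someone fills in the field, which is consistent with what tcpdump reports inside the VMs.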
Hence the SSH connection works even though the SSH/TCP packets in transit carry an incorrect TCP checksum (tcpdump run inside both VMs confirms this).
Later, I changed the topology by connecting each VM to a different Linux bridge with a virtual router in the middle, and suddenly the SSH connection stopped working. I double-checked that the virtual router actually forwards TCP/SSH packets as-is from one bridge to the other (in both directions), so I don't understand why the SSH connection stopped working this time.
What is going on in the latter case? Thanks.
Answer
Well, let's see. With virtio-net, the Linux kernel does "checksum off-load in software".
The guest TCP stack builds the packet but does not complete the TCP checksum field, and tags the skb as CHECKSUM_PARTIAL. When the packet is finally about to leave the host, the device model (tap/virtio backend or a real NIC) is supposed to finish the checksum. As long as the frame travels entirely inside the same host (VM1 → host bridge → VM2), nobody ever needs the real checksum: every hop on that path trusts the CHECKSUM_PARTIAL tag and simply forwards the skb.
VM2’s virtio driver receives the tag and tells its TCP stack "checksum is already OK", so SSH works even though the checksum field was never completed.
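One detail worth adding: on Linux the unfinished field is not necessarily zero. With TX offload, the TCP code seeds the checksum field with the folded (uncomplemented) pseudo-header sum and records where checksumming starts and where the result goes (the kernel keeps these as skb->csum_start and skb->csum_offset; for TCP the field sits 16 bytes into the header). Finishing the checksum, whether a NIC does it or the kernel falls back to software, then reduces to "sum from csum_start to the end, store the complement". The Python model below illustrates that arithmetic under these assumptions; `seed_partial` and `finish_partial` are hypothetical names, not kernel functions, and the 20-byte IPv4 header is a zero-filled placeholder.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071), folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def seed_partial(src: bytes, dst: bytes, segment: bytearray) -> None:
    """Sending stack with TX offload (model): put only the pseudo-header sum in the field."""
    pseudo = src + dst + struct.pack("!BBH", 0, 6, len(segment))
    segment[16:18] = struct.pack("!H", ones_complement_sum(pseudo))


def finish_partial(packet: bytearray, csum_start: int, csum_offset: int) -> None:
    """'Device' side (model): sum from csum_start to the end, store the complement."""
    total = ones_complement_sum(bytes(packet[csum_start:]))
    pos = csum_start + csum_offset
    packet[pos:pos + 2] = struct.pack("!H", ~total & 0xFFFF)


if __name__ == "__main__":
    src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])  # made-up addresses
    seg = bytearray(struct.pack("!HHIIBBHHH", 40000, 22, 1, 0, 5 << 4, 0x02, 64240, 0, 0))
    seed_partial(src, dst, seg)
    print("field as sent by the guest: 0x%04x" % struct.unpack("!H", seg[16:18])[0])
    packet = bytearray(20) + seg  # placeholder IPv4 header, then the TCP segment
    finish_partial(packet, csum_start=20, csum_offset=16)
    print("field after completion:     0x%04x" % struct.unpack("!H", packet[36:38])[0])
```

Because the pseudo-header sum is already in the field, the finishing step does not need to know the IP addresses at all, which is what makes this split between stack and device practical.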
What changed when you inserted the virtual router
VM1 ─tap0─ bridgeA ─ vethA ↔ vethB ─ Linux-router ─ bridgeB ─ tap1 ─ VM2
• tap0 hands an skb that still carries CHECKSUM_PARTIAL to bridgeA.
• bridgeA forwards it to vethA without touching the checksum.
• vethA delivers the same skb to the router’s IP layer still tagged as CHECKSUM_UNNECESSARY (meaning "checksum already verified").
• The router therefore never validates or rewrites the checksum; it routes the packet and transmits it on vethB/tap1 with the checksum field still incomplete.
Now the frame really leaves the network stack and lands in VM2 with an incorrect checksum but without the magic kernel tag. VM2’s TCP stack calculates the checksum, sees that it is wrong and drops the segment → the SSH session dies.
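The receive-side rule this explanation relies on can be written down as a toy model. The enum names mirror the kernel's skb->ip_summed constants, but the function is a hypothetical simplification (it omits CHECKSUM_COMPLETE, for instance), not the kernel's actual code: a tagged skb is trusted, while an untagged one is verified and dropped on mismatch.

```python
from enum import Enum, auto


class IpSummed(Enum):
    """Subset of the kernel's skb->ip_summed states (CHECKSUM_COMPLETE omitted)."""
    CHECKSUM_NONE = auto()         # nothing known: the stack must verify in software
    CHECKSUM_UNNECESSARY = auto()  # driver/device says the checksum was verified
    CHECKSUM_PARTIAL = auto()      # locally generated, checksum still to be finished


def tcp_accepts(ip_summed: IpSummed, checksum_field_valid: bool) -> bool:
    """Toy model: would the receiving TCP stack keep this segment?"""
    if ip_summed in (IpSummed.CHECKSUM_UNNECESSARY, IpSummed.CHECKSUM_PARTIAL):
        return True  # trusted tag: verification is skipped entirely
    return checksum_field_valid  # untagged: verify, drop on mismatch


if __name__ == "__main__":
    # Same-bridge case: VM2 receives the segment with a trusted tag attached.
    print("same bridge:", tcp_accepts(IpSummed.CHECKSUM_PARTIAL, checksum_field_valid=False))
    # Routed case as described above: tag lost, field never completed.
    print("via router: ", tcp_accepts(IpSummed.CHECKSUM_NONE, checksum_field_valid=False))
```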
How to make it work
You can simply turn off TX checksum off-load inside the two VMs:
sudo ethtool -K eth0 tx off
(or disable it on the router's veth interfaces so that the router recomputes the checksum). At least it helped me in an almost identical case several years ago.
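To confirm the change took effect, the `ethtool -k` output shown in the question can be checked programmatically. A minimal Python sketch, assuming the `feature: on|off [fixed]` line format seen above; `parse_offloads` and `tx_checksum_enabled` are hypothetical helper names, and the second one needs `ethtool` installed and on the PATH.

```python
import subprocess


def parse_offloads(text: str) -> dict[str, tuple[bool, bool]]:
    """Map feature name -> (enabled, fixed) from `ethtool -k IFACE` output."""
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        words = rest.split()
        if not sep or not words or words[0] not in ("on", "off"):
            continue  # e.g. the "Features for eth0:" header or blank lines
        features[name.strip()] = (words[0] == "on", "[fixed]" in words)
    return features


def tx_checksum_enabled(iface: str) -> bool:
    """Run `ethtool -k IFACE` and report whether tx-checksumming is on."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_offloads(out)["tx-checksumming"][0]


if __name__ == "__main__":
    sample = "\n".join([
        "Features for eth0:",
        "rx-checksumming: on [fixed]",
        "tx-checksumming: off",
        "\ttx-checksum-ipv4: off [fixed]",
    ])
    print(parse_offloads(sample)["tx-checksumming"])  # (False, False)
```

Inside each VM, `tx_checksum_enabled("eth0")` should return False after the command above has been run.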
-
Thanks for replying. Maybe it was unclear, but the SSH connection's TCP packets never leave the Linux host. Even in the second case, VM1, bridgeA, the virtual router and bridgeB are all on the same Linux host. P.S. Note that the correct topology is VM1 ─tap0─ bridgeA ─ vethA ↔ Linux-virtual-router ↔ vethB ─ bridgeB ─ tap1 ─ VM2 – CarloC, commented Jul 7, 2025 at 10:00 UTC
-
@CarloC, on a broader level, checksum offloading, by its very design, cannot work on virtual interfaces plugged into the same host. The "offloading" leaves the calculation of the checksum to the hardware, but packets on a virtual network within the same host never touch the hardware NIC. With or without routing, since they don't leave the host OS, they have no business on the hardware NIC. That's why, as Groovy explained, the checksum never actually gets calculated. The real fix would be to tell the kernel to ignore checksums completely, but I don't know if that can be done. – Mike, commented Jul 7, 2025 at 15:48 UTC
-
@CarloC, yes, you can force the checksum to be computed in software, and this eliminates the errors. My point is that computing the checksum at all is useless, precisely because the packets never leave the host OS. These checksums are intended to detect transmission errors when packets are traveling a long distance over a wide variety of network infrastructure. On a virtual network, all packets stay in memory, so you're checksumming your own RAM. It would be nice to be able to tell the kernel to ignore checksums completely: not compute them, not check them, just pretend they're not there. – Mike, commented Jul 8, 2025 at 15:52 UTC
-
@CarloC, yes, checksumming (just like any other part of managing packets) is traditionally done in the kernel's network stack. Since checksumming is easy to do in hardware (you don't need a full-blown CPU, just a simple dedicated chip), people decided to leave it to the NIC to fill in the checksum. How exactly to do this is highly device-dependent, so it must be handled by the interface driver. From the kernel's point of view, it leaves the checksum to the driver, but the driver does not compute it itself (that would make no difference compared to the kernel doing it); it hands it to the NIC to do the math. – Mike, commented Jul 8, 2025 at 19:00 UTC
-
@CarloC Well, yes and no: both the protocol (IP/TCP/UDP/etc.) code and the device-driver code touch the skb->ip_summed field, but they do so at different moments and for different reasons... CHECKSUM_PARTIAL is set by the protocol layer when it wants the driver/NIC to finish the checksum... And CHECKSUM_UNNECESSARY (and the other RX values) are set by the driver on the way back up to tell the stack whether that checksum work was already done. – Groovy, commented Jul 8, 2025 at 22:26 UTC