TCP checksum offloading on virtio-net paravirtualized interfaces

Question 1

Consider a topology in which two QEMU VMs running Ubuntu 16.04 (kernel 4.4.0-210) each have a virtio-net interface with a TAP backend, both connected to the same Linux bridge on the host, with an SSH connection between them.

ubuntu@VM1:~$ uname -a
Linux VM1 4.4.0-210-generic #242-Ubuntu SMP Fri Apr 16 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@VM1:~$

Both VMs use paravirtualized virtio-net interfaces defaulting to TX and RX checksum offloading.

ubuntu@VM1:~$ ethtool -i eth0
driver: virtio_net
version: 1.0.0
firmware-version:
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
ubuntu@VM1:~$
ubuntu@VM1:~$ ethtool -k eth0 | grep -i sum
rx-checksumming: on [fixed]
tx-checksumming: on
 tx-checksum-ipv4: off [fixed]
 tx-checksum-ip-generic: on
 tx-checksum-ipv6: off [fixed]
 tx-checksum-fcoe-crc: off [fixed]
 tx-checksum-sctp: off [fixed]
ubuntu@VM1:~$
ubuntu@VM2:~$ ethtool -k eth0 | grep -i sum
rx-checksumming: on [fixed]
tx-checksumming: on
 tx-checksum-ipv4: off [fixed]
 tx-checksum-ip-generic: on
 tx-checksum-ipv6: off [fixed]
 tx-checksum-fcoe-crc: off [fixed]
 tx-checksum-sctp: off [fixed]
ubuntu@VM2:~$

That actually means:

• the kernel network stack sends out SSH/TCP packets without computing and filling in the TCP checksum field (i.e. the TCP checksum inside the sent packets is either zeroed out or incorrect)
• the kernel network stack assumes the virtio-net interface has already verified the TCP checksum of received SSH/TCP packets and therefore skips the verification itself
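
To make the two points above concrete, here is a minimal Python sketch of the standard Internet checksum (RFC 1071) applied to a TCP segment over an IPv4 pseudo-header. The addresses, ports and payload are made up for illustration, and the function names are hypothetical; it only shows why a receiver that does verify would reject a segment whose checksum field was never filled in.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol, TCP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len))


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Value to store in the header; segment must have a zeroed checksum field."""
    covered = pseudo_header(src, dst, len(segment)) + segment
    return ~ones_complement_sum(covered) & 0xFFFF


def verify(src: str, dst: str, segment: bytes) -> bool:
    """Receiver check: summing everything, checksum included, gives 0xFFFF."""
    covered = pseudo_header(src, dst, len(segment)) + segment
    return ones_complement_sum(covered) == 0xFFFF


def build_segment(payload: bytes, checksum: int = 0) -> bytes:
    """Toy 20-byte TCP header (ports, seq, ack, flags, window) plus payload."""
    header = struct.pack("!HHIIHHHH", 40000, 22, 1, 1,
                         (5 << 12) | 0x18, 65535, checksum, 0)
    return header + payload


SRC, DST = "192.0.2.1", "192.0.2.2"  # documentation addresses, made up
unfilled = build_segment(b"SSH-2.0-demo\r\n")
filled = build_segment(b"SSH-2.0-demo\r\n", tcp_checksum(SRC, DST, unfilled))

print(verify(SRC, DST, filled))    # True
print(verify(SRC, DST, unfilled))  # False
```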

Hence the SSH connection works even though the SSH/TCP packets in transit carry an incorrect TCP checksum (tcpdump run inside both VMs confirms this).

Later, after I changed the topology by connecting each VM to a different Linux bridge with a virtual router in the middle, the SSH connection suddenly stopped working. I double-checked that the virtual router forwards the TCP/SSH packets as-is from one bridge to the other (in both directions), so I don't understand why the SSH connection stopped working this time.

What is going on in the latter case? Thanks.

Answer

Well, let's see. With virtio-net, the Linux kernel does "checksum offload in software".

The guest TCP stack builds the packet but leaves the TCP checksum field empty and tags the skb as CHECKSUM_PARTIAL. When the packet is finally about to leave the host, the device-model (tap/virtio backend or a real NIC) is supposed to finish the checksum. As long as the frame travels completely inside the same host (VM1 → host bridge → VM2) nobody ever needs the real checksum – every hop in that path trusts the CHECKSUM_PARTIAL tag and simply forwards the skb.

VM2’s virtio driver receives the tag and tells its TCP stack "checksum is already OK", so SSH works even though the field is still zero.
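
What "finishing the checksum" involves can be sketched as follows. As far as I know, with CHECKSUM_PARTIAL Linux does not leave the field literally zero: it seeds it with the pseudo-header sum and records where summing starts (csum_start) and where the result goes (csum_offset), and virtio-net hands the same two values to the host side. The function names below are hypothetical and the packet is a toy (a blank 20-byte placeholder stands in for the IP header); this is a model of the idea, not kernel code.

```python
import socket
import struct


def csum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header for TCP (protocol number 6)."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 6, tcp_len))


def stack_prepare_partial(src: str, dst: str, segment: bytearray) -> None:
    """Sender's stack: store only the pseudo-header sum in the checksum field
    (byte offset 16 of the TCP header)."""
    seed = csum(pseudo_header(src, dst, len(segment)))
    struct.pack_into("!H", segment, 16, seed)


def finish_checksum(packet: bytearray, csum_start: int, csum_offset: int) -> None:
    """Whoever completes the offload: sum from csum_start to the end of the
    packet (seed included), complement, store at csum_start + csum_offset."""
    value = ~csum(bytes(packet[csum_start:])) & 0xFFFF
    struct.pack_into("!H", packet, csum_start + csum_offset, value)


def receiver_accepts(src: str, dst: str, segment: bytes) -> bool:
    """A receiver that really verifies expects the total to fold to 0xFFFF."""
    return csum(pseudo_header(src, dst, len(segment)) + bytes(segment)) == 0xFFFF


SRC, DST = "192.0.2.1", "192.0.2.2"  # documentation addresses, made up
IP_HLEN = 20                         # blank placeholder for the IP header

tcp_header = struct.pack("!HHIIHHHH", 40000, 22, 1, 1,
                         (5 << 12) | 0x18, 65535, 0, 0)
segment = bytearray(tcp_header + b"SSH-2.0-demo\r\n")
stack_prepare_partial(SRC, DST, segment)

packet = bytearray(IP_HLEN) + segment
seeded_segment = bytes(packet[IP_HLEN:])          # what tcpdump in the guest sees
finish_checksum(packet, csum_start=IP_HLEN, csum_offset=16)
finished_segment = bytes(packet[IP_HLEN:])        # after the offload is completed

print(receiver_accepts(SRC, DST, seeded_segment))    # False
print(receiver_accepts(SRC, DST, finished_segment))  # True
```

The seeded segment is what a capture taken before the offload is completed would show, which would be consistent with tcpdump reporting an incorrect checksum inside the VMs.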

What changed when you inserted the virtual router

VM1 ─tap0─ bridgeA ─ vethA ↔ vethB ─ Linux-router ─ bridgeB ─ tap1 ─ VM2

• tap0 hands an skb that still carries CHECKSUM_PARTIAL to bridgeA.

• bridgeA forwards it to vethA without touching the checksum.

• vethA delivers the same skb to the router’s IP layer tagged as CHECKSUM_UNNECESSARY (meaning "checksum already verified").

• The router therefore never validates or rewrites the checksum, routes the packet, and transmits it on vethB/tap1 with the checksum field still zero.

Now the frame really leaves the network stack and lands in VM2 with an incorrect checksum but without the magic kernel tag. VM2’s TCP stack calculates the checksum, sees that it is wrong and drops the segment → the SSH session dies.
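
The explanation above can be condensed into a toy model. It is only a sketch of the reasoning in this answer, not kernel code: the class and function names are hypothetical, and which hop drops the tag in the routed path (`router` here) is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Summed(Enum):
    NONE = "CHECKSUM_NONE"                # receiver must verify in software
    UNNECESSARY = "CHECKSUM_UNNECESSARY"  # "already verified"
    PARTIAL = "CHECKSUM_PARTIAL"          # "someone downstream will finish it"


@dataclass
class Skb:
    checksum_valid: bool  # is the TCP checksum field in the bytes correct?
    ip_summed: Summed     # out-of-band tag travelling with the packet


def guest_receive(skb: Skb) -> bool:
    """Receiver decision: trust the tag if there is one, otherwise verify."""
    if skb.ip_summed in (Summed.PARTIAL, Summed.UNNECESSARY):
        return True
    return skb.checksum_valid


def deliver(path, tx_offload: bool = True) -> bool:
    """Send one segment along path; each hop is (name, keeps_tag)."""
    if tx_offload:
        skb = Skb(checksum_valid=False, ip_summed=Summed.PARTIAL)
    else:
        skb = Skb(checksum_valid=True, ip_summed=Summed.NONE)
    for _name, keeps_tag in path:
        if not keeps_tag:
            # Assumption: the bytes are forwarded untouched, the tag is lost.
            skb = Skb(skb.checksum_valid, Summed.NONE)
    return guest_receive(skb)


SAME_BRIDGE = [("tap0", True), ("bridge", True), ("tap1", True)]
ROUTED = [("tap0", True), ("bridgeA", True), ("vethA", True),
          ("router", False), ("vethB", True), ("bridgeB", True), ("tap1", True)]

print(deliver(SAME_BRIDGE))               # True:  SSH works
print(deliver(ROUTED))                    # False: segment dropped by VM2
print(deliver(ROUTED, tx_offload=False))  # True:  checksum computed by sender
```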

How to make it work

You can simply turn off TX checksum offload inside the two VMs:

sudo ethtool -K eth0 tx off

(or disable it on the router’s veth interfaces so the router recomputes the checksum). At least it helped me in an almost identical case several years ago.
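
To confirm that the setting took effect, the output of `ethtool -k` can be parsed. The helper below is a small sketch with hypothetical function names; the sample text is the grep output quoted in the question, and `read_offload_features` assumes `ethtool` is installed on the machine where it runs.

```python
import subprocess


def parse_offload_features(ethtool_k_output: str) -> dict:
    """Turn `ethtool -k` lines such as 'tx-checksumming: on' into a dict."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip headers such as 'Features for eth0:' and blanks
        state = value.split()
        features[name] = {"on": state[0] == "on", "fixed": "[fixed]" in state}
    return features


def read_offload_features(iface: str) -> dict:
    """Run `ethtool -k IFACE` and parse the result."""
    result = subprocess.run(["ethtool", "-k", iface],
                            capture_output=True, text=True, check=True)
    return parse_offload_features(result.stdout)


SAMPLE = """\
rx-checksumming: on [fixed]
tx-checksumming: on
 tx-checksum-ipv4: off [fixed]
 tx-checksum-ip-generic: on
 tx-checksum-ipv6: off [fixed]
 tx-checksum-fcoe-crc: off [fixed]
 tx-checksum-sctp: off [fixed]
"""

features = parse_offload_features(SAMPLE)
print(features["tx-checksumming"])  # {'on': True, 'fixed': False}
print(features["rx-checksumming"])  # {'on': True, 'fixed': True}
```

After `sudo ethtool -K eth0 tx off`, `read_offload_features("eth0")["tx-checksumming"]["on"]` should come back `False`. Note that in the output from the question rx-checksumming is marked [fixed], so only the TX side can be changed from inside the guest.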

Comment

Thanks for replying. Maybe it was unclear, but the SSH connection's TCP packets never leave the Linux host. Even in the second case, VM1, bridgeA, the virtual router and bridgeB are all on the same Linux host. P.S. Note that the correct topology is VM1 ─tap0─ bridgeA ─ vethA ↔ Linux-virtual-router ↔ vethB ─ bridgeB ─ tap1 ─ VM2

Comment

@CarloC, on a broader level, checksum offloading, by its very design, cannot work on virtual interfaces plugged into the same host. The "offloading" leaves the calculation of the checksum to the hardware, but packets on a virtual network within the same host never touch the hardware NIC. With or without routing, since they don't leave the host OS, they have no business on the hardware NIC. That's why, as Groovy explained, the checksum never actually gets calculated. The real fix would be to tell the kernel to ignore checksums completely, but I don't know if that can be done.

Comment

@CarloC, yes, you can force the checksum to be computed in software, and this eliminates the errors. My point is that computing the checksum at all is useless, precisely because the packets never leave the host OS. These checksums are intended to detect transmission errors when packets are traveling a long distance over a wide variety of network infrastructures. On a virtual network, all packets stay in memory, so you're checksumming your own RAM. It would be nice to be able to tell the kernel to ignore checksums completely: not compute them, not check them, just pretend they're not there.

Comment

@CarloC, yes, the checksumming (just like any other part of managing packets) is traditionally done in the kernel's network stack. Since checksumming is easy to do in hardware (you don't need a full-blown CPU, just a simple dedicated chip), people decided to leave it to the NIC to fill in the checksum. Exactly how this is done is highly device dependent, so it must be handled by the interface driver. From the kernel's view, it leaves the checksum to the driver, but the driver does not compute it itself (that would make no difference compared with the kernel doing it); it hands it to the NIC to do the math.

Comment

@CarloC Well, yes and no: both the protocol (IP/TCP/UDP/etc.) code and the device-driver code touch the skb->ip_summed field, but they do so at different moments and for different reasons. CHECKSUM_PARTIAL is set by the protocol layer when it wants the driver/NIC to finish the checksum, and CHECKSUM_UNNECESSARY (and the other RX values) are set by the driver on the way back up to tell the stack whether that checksum work was already done.

Groovy · Accepted Answer · 2025-07-07 09:01:52Z

