IBM: z/VM Performance Report

Dedicated OSA vs. Vswitch Update

Abstract

To connect to an external network, z/VM guests can use a dedicated OSA or a vswitch. This chapter compares how that choice affects the transaction rate of request-response (RR) workloads and the outbound data rate of streaming (STR) workloads across a variety of configurations.

Introduction

The Dedicated OSA vs. VSWITCH chapter of the z/VM 5.2 Performance Report compared two connectivity options available for guests running under z/VM: direct connection to OSA and vswitch.

This chapter updates the z/VM 5.2 information. It compares key measurement points for the two options and lists some of the reasons for choosing one over the other. Customer results will vary according to system configuration and workload.

Method

Application Workload Modeler (AWM), a Linux network benchmarking application, was used to drive network traffic between one client Linux guest and one server Linux guest, each in its own dedicated LPAR. Both dedicated OSA and vswitch configurations were evaluated, using both request-response (RR) and streaming (STR) workloads. In the RR workload, the client sends 200 bytes to the server and the server responds with 1000 bytes. In the STR workload, the client sends 20 bytes to the server and the server responds with 20 MB. Each measurement ran for 600 seconds. The workloads were run in 12 configurations, which varied by maximum transmission unit (MTU) size, SMT mode, and transport mode. Table 1 shows the combinations of workloads and configurations used.

Table 1. Combination of workloads and configurations

Workload  MTU size  SMT mode  Transport mode
RR        1492      SMT-1     Layer 2
RR        1492      SMT-1     Layer 3
RR        1492      SMT-2     Layer 2
RR        1492      SMT-2     Layer 3
STR       1492      SMT-1     Layer 2
STR       1492      SMT-1     Layer 3
STR       1492      SMT-2     Layer 2
STR       1492      SMT-2     Layer 3
STR       8992      SMT-1     Layer 2
STR       8992      SMT-1     Layer 3
STR       8992      SMT-2     Layer 2
STR       8992      SMT-2     Layer 3

Note: See Layer 2 and Layer 3 for more details about transport modes.
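To make the RR transaction concrete, the following Python sketch models one connection of the RR pattern: the client sends a 200-byte request, the server answers with a 1000-byte response, and the transaction rate is the number of completed exchanges per second. It is not AWM and does not reproduce how these measurements were taken. As an assumption made so the sketch is self-contained, both ends run in one process over a local socket pair, and the names `run_rr`, `serve`, and `recv_exact` are hypothetical. An STR connection follows the same shape, except that the server returns 20 MB for each 20-byte request.

```python
import socket
import threading
import time

REQUEST_BYTES = 200    # client -> server, per transaction
RESPONSE_BYTES = 1000  # server -> client, per transaction


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def serve(sock: socket.socket, transactions: int) -> None:
    """Server side: answer each 200-byte request with a 1000-byte response."""
    response = b"r" * RESPONSE_BYTES
    for _ in range(transactions):
        recv_exact(sock, REQUEST_BYTES)
        sock.sendall(response)


def run_rr(transactions: int = 10_000) -> float:
    """Drive one RR connection and return completed transactions per second."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server, transactions))
    worker.start()
    request = b"q" * REQUEST_BYTES
    start = time.perf_counter()
    for _ in range(transactions):
        client.sendall(request)             # send the request,
        recv_exact(client, RESPONSE_BYTES)  # then wait for the full response
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return transactions / elapsed


if __name__ == "__main__":
    print(f"{run_rr():,.0f} transactions/sec on this host")
```

Rates from a loopback sketch like this are not comparable to the ETR values in the tables, which were measured between two LPARs over an OSA-Express6 card.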

Each combination from Table 1 was run three times: once using one socket connection, once using 10 concurrent socket connections, and once using 50 concurrent socket connections. The results tables report these as the number of clients.
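The run matrix can be enumerated mechanically. This Python sketch, whose constant names are illustrative rather than taken from any tool used in the study, rebuilds the 12 rows of Table 1 in the same order and expands them by the three connection counts and the two connectivity options. The result is the 72 runs reported in Tables 2 through 7 (six tables, each with six vswitch columns and six OSA columns).

```python
from itertools import product

# Table 1: RR runs used only MTU 1492; STR runs used MTU 1492 and 8992.
MTU_BY_WORKLOAD = {"RR": (1492,), "STR": (1492, 8992)}
SMT_MODES = ("SMT-1", "SMT-2")
TRANSPORT_MODES = ("Layer 2", "Layer 3")
CONNECTIONS = (1, 10, 50)          # concurrent socket connections per combination
ENVIRONMENTS = ("vswitch", "OSA")  # both connectivity options were measured

# One tuple per row of Table 1.
configs = [
    (workload, mtu, smt, transport)
    for workload, mtus in MTU_BY_WORKLOAD.items()
    for mtu, smt, transport in product(mtus, SMT_MODES, TRANSPORT_MODES)
]
# Every configuration, for every connection count, in both environments.
runs = list(product(ENVIRONMENTS, configs, CONNECTIONS))

print(len(configs))  # 12
print(len(runs))     # 72
```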

The measurements were done on a z15 8561-T01 using two dedicated LPARs. For SMT-1 runs, each LPAR used two logical IFL cores. For SMT-2 runs, each LPAR used one logical IFL core. Connectivity between the two LPARs was over an OSA-Express6 10GbE card. The software used included z/VM 7.2 and Linux SLES 12 SP1.

Figure 1. Vswitch Environment (image not available): use of a vswitch to connect the client guest to the server guest.

Figure 2. OSA Environment (image not available): use of a dedicated OSA to connect the client guest to the server guest.

In both environments, the server Linux guest ran in LPAR 1 and the client Linux guest ran in LPAR 2. Each LPAR had 512 GB of central storage. CP monitor data was captured for LPAR 1 (server side) during each measurement and reduced using Performance Toolkit for VM (Perfkit).

The z/VM 5.2 measurements captured data on the client side. For this study, data was captured on the server side instead, which more closely matches the role a Linux guest typically plays.

Results and Discussion

The following tables show the average of selected metrics for each run. For RR runs, the focus is on transaction rate (ETR); for STR runs, the focus is on outbound data rate. Each table also shows the difference in these metrics between the OSA and vswitch runs. The % difference values are the percent change from the vswitch result to the OSA result, computed as (OSA - vswitch) / vswitch × 100. A positive value means OSA was that percent greater than vswitch; a negative value means OSA was that percent lower.
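As a worked example of the % difference calculation, the following Python sketch applies it to the Layer 3, 1-client column of Table 2. The function name `pct_diff` is illustrative.

```python
def pct_diff(osa: float, vswitch: float) -> float:
    """Percent change from the vswitch value to the OSA value.

    Positive: OSA exceeded vswitch by that percent; negative: OSA was lower.
    """
    return (osa - vswitch) / vswitch * 100.0


# Table 2, Layer 3, 1 client
print(f"ETR:       {pct_diff(9950.98, 5754.34):.2f}%")  # ETR:       72.93%
print(f"Total CPU: {pct_diff(0.01025, 0.00855):.2f}%")  # Total CPU: 19.88%
```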

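In general, a Linux guest using a dedicated OSA gets higher throughput and uses far less CP CPU time than a Linux guest connected through a vswitch. Total CPU time per unit of work is consistently lower with OSA for the streaming workloads, but for the RR workloads it is higher in most runs with 1 or 10 connections. These gains must be balanced against the advantages of the vswitch, such as: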

Ease of network design
Ability to share network resources (OSA card)
Management of the network including security and capabilities available to the z/VM guest on the LAN
Measurement of the network via z/VM monitor records
Layer 3 bridge
Less overhead than using a router stack

Table 2. Results of RR runs with MTU size of 1492 and using SMT-1

Transport mode                  Layer 3    Layer 3    Layer 3    Layer 2    Layer 2    Layer 2
Number of clients                     1         10         50          1         10         50
Workload                             RR         RR         RR         RR         RR         RR
MTU size                           1492       1492       1492       1492       1492       1492
SMT mode                          SMT-1      SMT-1      SMT-1      SMT-1      SMT-1      SMT-1
Vswitch
Runid                          NVS1L301   NVS1L310   NVS1L350   NVS1L201   NVS1L210   NVS1L250
ETR                            5,754.34  35,214.88  89,560.30   5,766.87  35,484.94  90,498.04
Total CPU msec/transaction      0.00855    0.00481    0.00507    0.00902    0.00474    0.00495
Emul CPU msec/transaction       0.00589    0.00356    0.00402    0.00617    0.00350    0.00388
CP CPU msec/transaction         0.00266    0.00125    0.00105    0.00285    0.00124    0.00107
OSA
Runid                          NOS1L301   NOS1L310   NOS1L350   NOS1L201   NOS1L210   NOS1L250
ETR                            9,950.98  59,350.26 160,225.18  10,026.60  59,397.52 163,282.63
Total CPU msec/transaction      0.01025    0.00696    0.00475    0.01015    0.00686    0.00464
Emul CPU msec/transaction       0.00927    0.00657    0.00465    0.00920    0.00648    0.00454
CP CPU msec/transaction         0.00098    0.00039    0.00010    0.00095    0.00038    0.00010
% difference
ETR                              72.93%     68.54%     78.90%     73.87%     67.39%     80.43%
Total CPU msec/transaction       19.88%     44.70%     -6.31%     12.53%     44.73%     -6.26%
Emul CPU msec/transaction        57.39%     84.55%     15.67%     49.11%     85.14%     17.01%
CP CPU msec/transaction         -63.16%    -68.80%    -90.48%    -66.67%    -69.35%    -90.65%

Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

When running the RR workload in an SMT-1 configuration with an MTU size of 1492, the ETR of the OSA runs was 67.39% to 80.43% higher than that of the equivalent vswitch runs. The total CPU per transaction of the OSA runs ranged from 6.31% lower to 44.73% higher than that of the equivalent vswitch runs.
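This Python sketch recomputes the ETR % difference row of Table 2 from the raw vswitch and OSA values and reports its range. The constant and function names are illustrative.

```python
# Table 2 ETR values. Column order: Layer 3 with 1, 10, and 50 clients,
# then Layer 2 with 1, 10, and 50 clients.
VSWITCH_ETR = [5754.34, 35214.88, 89560.30, 5766.87, 35484.94, 90498.04]
OSA_ETR = [9950.98, 59350.26, 160225.18, 10026.60, 59397.52, 163282.63]


def pct_diff(osa: float, vswitch: float) -> float:
    """Percent change from the vswitch value to the OSA value."""
    return (osa - vswitch) / vswitch * 100.0


diffs = [round(pct_diff(o, v), 2) for o, v in zip(OSA_ETR, VSWITCH_ETR)]
print(diffs)                                      # [72.93, 68.54, 78.9, 73.87, 67.39, 80.43]
print(f"{min(diffs):.2f}% to {max(diffs):.2f}%")  # 67.39% to 80.43%
```

The smallest gain comes from the Layer 2, 10-client column, not from a Layer 3 column.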

Table 3. Results of RR runs with MTU size of 1492 and using SMT-2

Transport mode                  Layer 3    Layer 3    Layer 3    Layer 2    Layer 2    Layer 2
Number of clients                     1         10         50          1         10         50
Workload                             RR         RR         RR         RR         RR         RR
MTU size                           1492       1492       1492       1492       1492       1492
SMT mode                          SMT-2      SMT-2      SMT-2      SMT-2      SMT-2      SMT-2
Vswitch
Runid                          NVS2L301   NVS2L310   NVS2L350   NVS2L201   NVS2L210   NVS2L250
ETR                            5,705.34  34,202.31  75,775.96   5,675.12  34,485.39  75,912.95
Total CPU msec/transaction      0.01059    0.00571    0.00670    0.01256    0.00560    0.00658
Emul CPU msec/transaction       0.00787    0.00438    0.00532    0.00913    0.00428    0.00520
CP CPU msec/transaction         0.00272    0.00133    0.00138    0.00343    0.00132    0.00138
OSA
Runid                          NOS2L301   NOS2L310   NOS2L350   NOS2L201   NOS2L210   NOS2L250
ETR                            9,721.54  58,886.62 157,694.74   9,776.24  58,482.06 159,551.13
Total CPU msec/transaction      0.01192    0.00802    0.00586    0.01177    0.00782    0.00576
Emul CPU msec/transaction       0.01101    0.00762    0.00573    0.01086    0.00743    0.00564
CP CPU msec/transaction         0.00091    0.00040    0.00013    0.00091    0.00039    0.00012
% difference
ETR                              70.39%     72.17%    108.11%     72.26%     69.59%    110.18%
Total CPU msec/transaction       12.56%     40.46%    -12.54%     -6.29%     39.64%    -12.46%
Emul CPU msec/transaction        39.90%     73.97%      7.71%     18.95%     73.60%      8.46%
CP CPU msec/transaction         -66.54%    -69.92%    -90.58%    -73.47%    -70.45%    -91.30%

Notes: 8561-T01, 1 dedicated IFL core, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

When running the RR workload in an SMT-2 configuration with an MTU size of 1492, the ETR of the OSA runs was 69.59% to 110.18% higher than that of the equivalent vswitch runs. The total CPU per transaction of the OSA runs ranged from 12.54% lower to 40.46% higher than that of the equivalent vswitch runs.
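Per-transaction figures do not show how much processor time each option consumed overall. Multiplying total CPU msec/transaction by the ETR gives an approximate CPU consumption rate in milliseconds per second. This derived metric is not reported in the tables; the Python sketch below computes it for Table 3, assuming the per-transaction value is the total measured CPU time divided by the number of transactions. Because the inputs are rounded to five decimals, the results are estimates only, and the names are illustrative. In every column, OSA's estimate is higher: even where it uses less CPU per transaction (the 50-client columns), it completes roughly twice as many transactions per second.

```python
# Table 3 (RR, MTU 1492, SMT-2). Column order: Layer 3 with 1, 10, 50 clients,
# then Layer 2 with 1, 10, 50 clients.
VSWITCH = {
    "etr": [5705.34, 34202.31, 75775.96, 5675.12, 34485.39, 75912.95],
    "cpu_ms_per_tx": [0.01059, 0.00571, 0.00670, 0.01256, 0.00560, 0.00658],
}
OSA = {
    "etr": [9721.54, 58886.62, 157694.74, 9776.24, 58482.06, 159551.13],
    "cpu_ms_per_tx": [0.01192, 0.00802, 0.00586, 0.01177, 0.00782, 0.00576],
}


def cpu_ms_per_second(run: dict) -> list[float]:
    """Estimate CPU ms consumed per elapsed second as
    (CPU msec/transaction) x (transactions/sec)."""
    return [c * e for c, e in zip(run["cpu_ms_per_tx"], run["etr"])]


for name, run in (("vswitch", VSWITCH), ("OSA", OSA)):
    print(name, [round(x, 1) for x in cpu_ms_per_second(run)])
```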

Table 4. Results of STR runs with MTU size of 1492 and using SMT-1

Transport mode                  Layer 3    Layer 3    Layer 3    Layer 2    Layer 2    Layer 2
Number of clients                     1         10         50          1         10         50
Workload                            STR        STR        STR        STR        STR        STR
MTU size                           1492       1492       1492       1492       1492       1492
SMT mode                          SMT-1      SMT-1      SMT-1      SMT-1      SMT-1      SMT-1
Vswitch
Runid                          NVM1L301   NVM1L310   NVM1L350   NVM1L201   NVM1L210   NVM1L250
Outbound MB/sec                     481        913        997        450      1,042      1,036
Total CPU msec/Outbound MB      2.00728    1.62651    1.40020    1.99556    1.59693    1.45753
Emul CPU msec/Outbound MB       1.15904    1.26725    1.06018    1.20244    1.25432    1.12548
CP CPU msec/Outbound MB         0.84824    0.35926    0.34002    0.79312    0.34261    0.33205
OSA
Runid                          NOM1L301   NOM1L310   NOM1L350   NOM1L201   NOM1L210   NOM1L250
Outbound MB/sec                     935      1,131      1,136        785      1,159      1,155
Total CPU msec/Outbound MB      0.98599    1.03802    1.09419    1.07975    1.26488    1.26320
Emul CPU msec/Outbound MB       0.98513    1.03271    1.08363    1.07873    1.25626    1.25887
CP CPU msec/Outbound MB         0.00086    0.00531    0.01056    0.00102    0.00862    0.00433
% difference
Outbound MB/sec                  94.39%     23.88%     13.94%     74.44%     11.23%     11.49%
Total CPU msec/Outbound MB      -50.88%    -36.18%    -21.85%    -45.89%    -20.79%    -13.33%
Emul CPU msec/Outbound MB       -15.00%    -18.51%      2.21%    -10.29%      0.15%     11.85%
CP CPU msec/Outbound MB         -99.90%    -98.52%    -96.89%    -99.87%    -97.48%    -98.70%

Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

When running the STR workload in an SMT-1 configuration with an MTU size of 1492, the outbound data rate of the OSA runs was 11.23% to 94.39% higher than that of the equivalent vswitch runs. The total CPU per outbound MB of the OSA runs was 13.33% to 50.88% lower than that of the equivalent vswitch runs.
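In each results table, Total CPU equals Emul CPU (time the guest's virtual processors spend running in emulation) plus CP CPU (time the z/VM Control Program spends on the guest's behalf), to within rounding. The Python sketch below checks that relationship for Table 4 and expresses CP CPU as a share of the total, which shows where the vswitch overhead lies: roughly a fifth to two fifths of vswitch CPU time is CP time, versus under 1% for OSA. The names are illustrative.

```python
import math

# Table 4 (STR, MTU 1492, SMT-1). Column order: Layer 3 with 1, 10, 50
# clients, then Layer 2 with 1, 10, 50 clients. Units: CPU msec/outbound MB.
TABLE4 = {
    "vswitch": {
        "total": [2.00728, 1.62651, 1.40020, 1.99556, 1.59693, 1.45753],
        "emul": [1.15904, 1.26725, 1.06018, 1.20244, 1.25432, 1.12548],
        "cp": [0.84824, 0.35926, 0.34002, 0.79312, 0.34261, 0.33205],
    },
    "osa": {
        "total": [0.98599, 1.03802, 1.09419, 1.07975, 1.26488, 1.26320],
        "emul": [0.98513, 1.03271, 1.08363, 1.07873, 1.25626, 1.25887],
        "cp": [0.00086, 0.00531, 0.01056, 0.00102, 0.00862, 0.00433],
    },
}


def check_and_cp_share(rows: dict) -> list[float]:
    """Verify Total == Emul + CP per column; return CP as a percent of Total."""
    shares = []
    for total, emul, cp in zip(rows["total"], rows["emul"], rows["cp"]):
        # Values are printed to 5 decimals, so allow a little rounding slack.
        if not math.isclose(total, emul + cp, abs_tol=2e-5):
            raise ValueError(f"{total} != {emul} + {cp}")
        shares.append(100.0 * cp / total)
    return shares


for env, rows in TABLE4.items():
    print(env, " ".join(f"{s:5.1f}%" for s in check_and_cp_share(rows)))
```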

Table 5. Results of STR runs with MTU size of 1492 and using SMT-2

Transport mode                  Layer 3    Layer 3    Layer 3    Layer 2    Layer 2    Layer 2
Number of clients                     1         10         50          1         10         50
Workload                            STR        STR        STR        STR        STR        STR
MTU size                           1492       1492       1492       1492       1492       1492
SMT mode                          SMT-2      SMT-2      SMT-2      SMT-2      SMT-2      SMT-2
Vswitch
Runid                          NVM2L301   NVM2L310   NVM2L350   NVM2L201   NVM2L210   NVM2L250
Outbound MB/sec                     448        812        885        380        876        843
Total CPU msec/Outbound MB      2.21920    1.81527    1.82260    2.23289    1.92123    1.85647
Emul CPU msec/Outbound MB       1.38638    1.40640    1.40904    1.39026    1.48630    1.43416
CP CPU msec/Outbound MB         0.83282    0.40887    0.41356    0.84263    0.43493    0.42231
OSA
Runid                          NOM2L301   NOM2L310   NOM2L350   NOM2L201   NOM2L210   NOM2L250
Outbound MB/sec                     875      1,129      1,121        761      1,075      1,072
Total CPU msec/Outbound MB      1.05371    1.23738    1.41659    1.12431    1.54884    1.51026
Emul CPU msec/Outbound MB       1.05269    1.23206    1.40856    1.12326    1.53395    1.50466
CP CPU msec/Outbound MB         0.00102    0.00532    0.00803    0.00105    0.01489    0.00560
% difference
Outbound MB/sec                  95.31%     39.04%     26.67%    100.26%     22.72%     27.16%
Total CPU msec/Outbound MB      -52.52%    -31.83%    -22.28%    -49.65%    -19.38%    -18.65%
Emul CPU msec/Outbound MB       -24.07%    -12.40%     -0.03%    -19.21%      3.21%      4.92%
CP CPU msec/Outbound MB         -99.88%    -98.70%    -98.06%    -99.88%    -96.58%    -98.67%

Notes: 8561-T01, 1 dedicated IFL core, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

When running the STR workload in an SMT-2 configuration with an MTU size of 1492, the outbound data rate of the OSA runs was 22.72% to 100.26% higher than that of the equivalent vswitch runs. The total CPU per outbound MB of the OSA runs was 18.65% to 52.52% lower than that of the equivalent vswitch runs.

Table 6. Results of STR runs with MTU size of 8992 and using SMT-1

Transport mode                  Layer 3    Layer 3    Layer 3    Layer 2    Layer 2    Layer 2
Number of clients                     1         10         50          1         10         50
Workload                            STR        STR        STR        STR        STR        STR
MTU size                           8992       8992       8992       8992       8992       8992
SMT mode                          SMT-1      SMT-1      SMT-1      SMT-1      SMT-1      SMT-1
Vswitch
Runid                          NVL1L301   NVL1L310   NVL1L350   NVL1L201   NVL1L210   NVL1L250
Outbound MB/sec                   1,011      1,156      1,154        747      1,158      1,156
Total CPU msec/Outbound MB      0.64857    0.61808    0.63648    0.63387    0.58109    0.60580
Emul CPU msec/Outbound MB       0.38586    0.38901    0.39636    0.38541    0.37332    0.39273
CP CPU msec/Outbound MB         0.26271    0.22907    0.24012    0.24846    0.20777    0.21307
OSA
Runid                          NOL1L301   NOL1L310   NOL1L350   NOL1L201   NOL1L210   NOL1L250
Outbound MB/sec                   1,121      1,153      1,154      1,113      1,157      1,156
Total CPU msec/Outbound MB      0.50696    0.54293    0.56205    0.54403    0.56206    0.56522
Emul CPU msec/Outbound MB       0.50000    0.53374    0.54948    0.53504    0.55315    0.55303
CP CPU msec/Outbound MB         0.00696    0.00919    0.01257    0.00899    0.00891    0.01219
% difference
Outbound MB/sec                  10.88%     -0.26%      0.00%     49.00%     -0.09%      0.00%
Total CPU msec/Outbound MB      -21.83%    -12.16%    -11.69%    -14.17%     -3.27%     -6.70%
Emul CPU msec/Outbound MB        29.58%     37.20%     38.63%     38.82%     48.17%     40.82%
CP CPU msec/Outbound MB         -97.35%    -95.99%    -94.77%    -96.38%    -95.71%    -94.28%

Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

When running the STR workload in an SMT-1 configuration with an MTU size of 8992, the outbound data rate of the OSA runs ranged from 0.26% lower to 49.00% higher than that of the equivalent vswitch runs. The total CPU per outbound MB of the OSA runs was 3.27% to 21.83% lower than that of the equivalent vswitch runs.
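With an MTU of 8992 and 10 or 50 connections, both options reach about 1,156 MB/sec, which is why the % difference in those columns is near zero. One plausible reading, offered as an assumption rather than a conclusion drawn in the measurements, is that these runs approach the nominal 10 Gbit/s rate of the OSA-Express6 10GbE card. The report does not state whether MB means 10**6 or 2**20 bytes, so the Python sketch below converts the rate both ways. It ignores Ethernet and TCP/IP header overhead, which also consumes part of the link, and its names are illustrative.

```python
LINK_GBIT_PER_SEC = 10.0  # nominal rate of a 10GbE port


def mb_per_sec_to_gbit(mb_per_sec: float, bytes_per_mb: int) -> float:
    """Convert a payload rate in MB/sec to Gbit/s."""
    return mb_per_sec * bytes_per_mb * 8 / 1e9


for label, bytes_per_mb in (("MB = 10**6 bytes", 10**6), ("MB = 2**20 bytes", 2**20)):
    gbit = mb_per_sec_to_gbit(1156, bytes_per_mb)
    print(f"{label}: {gbit:.2f} Gbit/s ({gbit / LINK_GBIT_PER_SEC:.0%} of line rate)")
# MB = 10**6 bytes: 9.25 Gbit/s (92% of line rate)
# MB = 2**20 bytes: 9.70 Gbit/s (97% of line rate)
```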

Table 7. Results of STR runs with MTU size of 8992 and using SMT-2

Transport mode                  Layer 3    Layer 3    Layer 3    Layer 2    Layer 2    Layer 2
Number of clients                     1         10         50          1         10         50
Workload                            STR        STR        STR        STR        STR        STR
MTU size                           8992       8992       8992       8992       8992       8992
SMT mode                          SMT-2      SMT-2      SMT-2      SMT-2      SMT-2      SMT-2
Vswitch
Runid                          NVL2L301   NVL2L310   NVL2L350   NVL2L201   NVL2L210   NVL2L250
Outbound MB/sec                     916      1,156      1,155        598      1,157      1,156
Total CPU msec/Outbound MB      0.73362    0.71678    0.73792    0.70084    0.68634    0.74334
Emul CPU msec/Outbound MB       0.45382    0.46557    0.47506    0.43779    0.45748    0.50389
CP CPU msec/Outbound MB         0.27980    0.25121    0.26286    0.26305    0.22886    0.23945
OSA
Runid                          NOL2L301   NOL2L310   NOL2L350   NOL2L201   NOL2L210   NOL2L250
Outbound MB/sec                   1,137      1,145      1,154      1,123      1,156      1,156
Total CPU msec/Outbound MB      0.53369    0.59712    0.63527    0.57106    0.61090    0.63503
Emul CPU msec/Outbound MB       0.52647    0.58777    0.62227    0.56215    0.60156    0.62301
CP CPU msec/Outbound MB         0.00722    0.00935    0.01300    0.00891    0.00934    0.01202
% difference
Outbound MB/sec                  24.13%     -0.95%     -0.09%     87.79%     -0.09%      0.00%
Total CPU msec/Outbound MB      -27.25%    -16.69%    -13.91%    -18.52%    -10.99%    -14.57%
Emul CPU msec/Outbound MB        16.01%     26.25%     30.99%     28.41%     31.49%     23.64%
CP CPU msec/Outbound MB         -97.42%    -96.28%    -95.05%    -96.61%    -95.92%    -94.98%

Notes: 8561-T01, 1 dedicated IFL core, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

When running the STR workload in an SMT-2 configuration with an MTU size of 8992, the outbound data rate of the OSA runs ranged from 0.95% lower to 87.79% higher than that of the equivalent vswitch runs. The total CPU per outbound MB of the OSA runs was 10.99% to 27.25% lower than that of the equivalent vswitch runs.

Summary

The results of the experiments conducted for this report indicate that for a request-response (RR) workload, Linux guests using a dedicated OSA achieve a higher ETR than Linux guests using a vswitch. For a streaming (STR) workload, Linux guests using a dedicated OSA achieve an equal (within 1%) or greater outbound data rate than Linux guests using a vswitch. The degree of improvement varies with the number of concurrent connections between the two guests, especially for the streaming workload: the largest gains occur with a single connection, and with an MTU size of 8992 the difference largely disappears at 10 and 50 connections.