Guest Support for FICON CTCA
z/VM (at the appropriate service level) supports FICON Channel-to-Channel communications between an IBM zSeries 900 and another z900 or an S/390 Parallel Enterprise Server G5 or G6. This enables more reliable and higher-bandwidth host-to-host communication than is available with ESCON channels. Note that there are two types of FICON channels, referred to as FICON and FICON Express; the latter has higher throughput and maximum bandwidth capability. We did not have access to FICON Express, so all references to FICON in this section refer to the former.
Methodology: This section presents and discusses measurement results that assess the performance of the FICON adapter using the support included in z/VM 4.2.0 CP with APAR VM62906 applied, comparing it with existing ESCON support.
The workload driver is an internal tool that can simulate bulk data transfers such as FTP, as well as primitive benchmarks such as streaming and request-response. The data are driven from the application layer of the TCP/IP protocol stack, so the entire networking infrastructure, including the adapter and the TCP/IP protocol code, is measured. The driver moves data between client-side memory and server-side memory, eliminating all outside bottlenecks such as DASD or tape.
A client-server pair was used in which the client sent one byte and received 20MB of data (streaming workload), or in which the client sent 200 bytes and received 1000 bytes (request-response workload). Additional client-server pairs were added to determine whether throughput would vary with an increasing number of connections.
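The internal workload driver is not publicly available, but the request-response pattern it simulates can be sketched with ordinary sockets. The following Python sketch is illustrative only (the host, port, and iteration count are hypothetical); the real driver is a separate internal tool:

```python
import socket

def rr_client(host, port, iterations):
    """One request-response connection: send a 200-byte request,
    read the full 1000-byte reply, and repeat.  Returns the number
    of completed transactions."""
    request = b"R" * 200
    done = 0
    with socket.create_connection((host, port)) as conn:
        for _ in range(iterations):
            conn.sendall(request)
            reply = b""
            while len(reply) < 1000:      # read until the whole reply arrives
                chunk = conn.recv(1000 - len(reply))
                if not chunk:
                    raise ConnectionError("server closed the connection")
                reply += chunk
            done += 1
    return done
```

The streaming workload follows the same loop with a one-byte request and a 20MB reply; additional client-server pairs are simply more such connections running concurrently.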
While collecting the performance data, it was determined that optimum streaming workload results were achieved when TCP/IP was configured with DATABUFFERPOOLSIZE set to 32760 and DATABUFFERLIMITS set to 10 for both the outbound buffer limit and the inbound buffer limit. These parameters are used to determine the number and size of buffers that may be allocated for a TCP connection that is using window scaling.
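As a sketch, the corresponding PROFILE TCPIP statements would look like the following. The statement names and values come from the text above, but the operand order is an assumption and should be verified against the TCP/IP Planning and Customization documentation for the service level in use:

```
; TCP/IP configuration used for the streaming measurements (sketch;
; verify operand order against TCP/IP Planning and Customization)
DATABUFFERPOOLSIZE 32760       ; data buffer size
DATABUFFERLIMITS   10 10       ; outbound limit, inbound limit
```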
It should be noted that it is possible for monitor data to not reflect that a device is a FICON device. This can happen if the device goes offline (for example, if the adapter card is pulled) and comes back online without a VARY ONLINE command being issued. If this situation is encountered, issuing VARY OFFLINE followed by VARY ONLINE will correct it.
Each performance run, starting with 1 client-server pair and progressing to 10 client-server pairs, consisted of starting the server(s) on VM_s and then starting the client(s) on VM_c. The client(s) received data for 400 seconds. Monitor data were collected for 330 seconds of that time period. Data were collected only on the client machine.
At least 3 measurement trials were taken for each case, and a representative trial was chosen to show in the results. A complete set of runs was done with the maximum transmission unit (MTU) set to 32760 for streaming, and to 1500 for both streaming and request-response (RR). The CP monitor data for each measurement were reduced by VMPRF.
There are multiple devices associated with a FICON channel, and TCPIP can be configured to use one device or several. Measurements were done with just one device configured for the 1-, 3-, 5-, and 10-client-server-pair runs. For comparison, measurements were then repeated with one device per client-server pair, by specifying a unique device number on each of 10 device statements and associating each with a unique IP address. Note that ESCON does not have the same multiplexing capability that FICON does and therefore does not benefit from this technique.
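A sketch of the one-device-per-connection-pair configuration follows. All device numbers, link names, and IP addresses here are hypothetical, and the DEVICE/LINK/HOME statement syntax should be verified against the TCP/IP Planning and Customization documentation:

```
; One CTC device and IP address per client-server pair (sketch;
; device numbers, link names, and addresses are illustrative)
DEVICE CTC1  CTC 0600          ; first device number on the FICON channel
LINK   CTCL1 CTC 0 CTC1
DEVICE CTC2  CTC 0602          ; second device number
LINK   CTCL2 CTC 0 CTC2
; ... eight more DEVICE/LINK pairs ...
HOME
  10.1.1.1  CTCL1              ; unique IP address per link
  10.1.1.2  CTCL2
```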
The following charts show, for both ESCON and FICON, throughput and CPU time for the streaming and RR workloads. Each chart has a bar for each connectivity/MTU pair measured. Specific details are mentioned for each workload after the charts for that workload.
Figure 2. Throughput - Streaming
For the streaming workload, both ESCON and FICON with a single device maintained, as connections were added, the same throughput rate they achieved with one connection; FICON moved about twice as much data as ESCON. When one device was used per connection, however, FICON throughput was much better for both MTU sizes.
Figure 3. CPU Time - Streaming
The corresponding CPU time generally shows the same pattern, with CPU time increasing with each additional client-server pair. FICON and ESCON used approximately the same CPU msec/MB in the 32K MTU case. The 1500 MTU cases showed higher CPU msec/MB, with the FICON multiple-device case being the most efficient of them.
For the RR workload, throughput in all cases shows the same trend of increasing as additional connections are made. ESCON leads in throughput until 10 connections, where FICON with a single device does better. Note that using multiple devices (one per connection pair) yielded poorer results than either ESCON or FICON with a single device defined.
CPU time decreases slightly as the workload increases and the system becomes more efficient, for both ESCON and FICON with a single device. This was not true for FICON with multiple devices.
Results: The results are summarized in the following tables. MB/sec (megabytes per second) or trans/sec (transactions per second) was supplied by the workload driver and shows the throughput rate. All other values are from CP monitor data or derived from CP monitor data.
Total_cpu_util
This field was obtained from the SYSTEM_SUMMARY_BY_TIME VMPRF report that shows the average of both processors out of 100%.
tcpip_tot_cpu_util
This field is calculated from the USER_RESOURCE_UTILIZATION VMPRF report (CPU seconds, total) for the client stack (tcpip). 100% is the equivalent of one fully utilized processor.
cpu_msec/MB
This field was calculated from total_cpu_util (converted to total CPU milliseconds used per second across both processors) divided by the number of megabytes per second, giving milliseconds of CPU time per megabyte.
cpu_msec/trans
This field was calculated in the same way, divided by the number of transactions per second, giving milliseconds of CPU time per transaction.
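The derivation above can be checked against the tables. Assuming total_cpu_util is the two-processor average reported by VMPRF, the calculation is:

```python
def cpu_msec_per_unit(total_cpu_util, units_per_sec, n_processors=2):
    """CPU milliseconds consumed per unit of work (MB or transaction).

    total_cpu_util is the average utilization of all processors (out
    of 100%), so multiplying by the processor count gives total CPU
    msec used per elapsed second; dividing by the work rate gives
    msec per unit.
    """
    return total_cpu_util / 100.0 * n_processors * 1000.0 / units_per_sec

# Table 1, 1 client: 9.20% average utilization at 12.88 MB/sec
print(round(cpu_msec_per_unit(9.20, 12.88), 2))    # 14.29 cpu_msec/MB
# Table 7, 1 client: 12.20% at 476.89 trans/sec
print(round(cpu_msec_per_unit(12.20, 476.89), 2))  # 0.51 cpu_msec/trans
```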
Table 1. ESCON - Streaming 32K

Number of clients          01        03        05        10
runid                ensf0103  ensf0302  ensf0502  ensf1002
MB/sec                  12.88     12.90     12.90     12.90
elapsed time (sec)     330.00    330.00    330.00    330.00
total_cpu_util           9.20      9.30      9.40      9.80
tcpip_tot_cpu_util      10.60     10.60     10.90     11.20
tcpip_virt_cpu_util      5.20      5.20      5.20      5.50
cpu_msec/MB             14.29     14.42     14.57     15.19
emul_msec/MB             8.85      8.99      9.15      9.46
cp_msec/MB               5.43      5.43      5.43      5.74
tcpip_cpu_msec/MB        8.23      8.22      8.46      8.69
tcpip_vcpu_msec/MB       4.00      3.99      3.99      4.23
tcpip_ccpu_msec/MB       4.23      4.23      4.46      4.46
Table 2. FICON - Streaming 32K - Single Device

Number of clients          01        03        05        10
runid                fnsf0103  fnsf0301  fnsf0502  fnsf1003
MB/sec                  28.23     28.26     28.25     28.00
elapsed time (sec)     330.00    330.00    330.00    330.00
total_cpu_util          20.30     20.80     20.80     21.10
tcpip_tot_cpu_util      23.90     24.20     24.20     24.50
tcpip_virt_cpu_util     11.50     11.80     11.80     11.80
cpu_msec/MB             14.38     14.72     14.73     15.07
emul_msec/MB             9.07      9.20      9.27      9.50
cp_msec/MB               5.31      5.52      5.45      5.57
tcpip_cpu_msec/MB        8.48      8.58      8.58      8.77
tcpip_vcpu_msec/MB       4.08      4.18      4.18      4.22
tcpip_ccpu_msec/MB       4.40      4.40      4.40      4.55
Table 3. FICON - Streaming 32K - Multiple Devices

Number of clients          01        03        05        10
runid                fjsx0102  fjsx0301  fjsx0501  fjsx1003
MB/sec                  28.25     44.93     45.78     47.02
elapsed time (sec)     330.00    330.00    330.00    330.00
total_cpu_util          19.80     32.30     33.40     36.80
tcpip_tot_cpu_util      22.70     37.30     38.80     43.00
tcpip_virt_cpu_util     11.20     18.20     18.80     20.70
cpu_msec/MB             14.02     14.38     14.59     15.65
emul_msec/MB             8.78      9.04      9.17      9.83
cp_msec/MB               5.24      5.34      5.42      5.83
tcpip_cpu_msec/MB        8.05      8.30      8.47      9.15
tcpip_vcpu_msec/MB       3.97      4.05      4.10      4.40
tcpip_ccpu_msec/MB       4.08      4.25      4.37      4.75
FICON with a single device shows more than twice the throughput of ESCON. When multiple devices are defined, FICON shows more than three and a half times the ESCON throughput.
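The comparison can be computed directly from the 10-client-server-pair streaming 32K figures in Tables 1 through 3:

```python
# MB/sec at 10 client-server pairs (streaming, 32K MTU)
escon        = 12.90   # Table 1: ESCON
ficon_single = 28.00   # Table 2: FICON, single device
ficon_multi  = 47.02   # Table 3: FICON, one device per connection

print(round(ficon_single / escon, 2))   # 2.17x ESCON
print(round(ficon_multi / escon, 2))    # 3.64x ESCON
```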
Table 4. ESCON - Streaming 1500

Number of clients          01        03        05        10
runid                essf0103  essf0303  essf0503  essf1001
MB/sec                  11.78     11.78     11.77     11.87
elapsed time (sec)     330.00    330.00    330.00    330.00
total_cpu_util          12.80     14.60     16.30     20.30
tcpip_tot_cpu_util      17.30     18.50     20.00     22.70
tcpip_virt_cpu_util     11.50     12.40     13.00     14.50
cpu_msec/MB             21.73     24.79     27.70     34.20
emul_msec/MB            15.79     18.00     20.05     24.77
cp_msec/MB               5.94      6.79      7.65      9.44
tcpip_cpu_msec/MB       14.66     15.69     16.99     19.15
tcpip_vcpu_msec/MB       9.78     10.55     11.07     12.25
tcpip_ccpu_msec/MB       4.89      5.14      5.92      6.89
Table 5. FICON - Streaming 1500 - Single Device

Number of clients          01        03        05        10
runid                fssf0101  fssf0302  fssf0503  fssf1003
MB/sec                  25.35     25.69     26.07     26.25
elapsed time (sec)     300.00    300.00    330.00    330.00
total_cpu_util          25.00     31.50     35.10     42.10
tcpip_tot_cpu_util      34.30     39.70     42.40     47.90
tcpip_virt_cpu_util     23.70     26.70     28.20     30.90
cpu_msec/MB             19.72     24.52     26.93     32.08
emul_msec/MB            14.44     17.98     19.79     23.39
cp_msec/MB               5.29      6.54      7.13      8.69
tcpip_cpu_msec/MB       13.54     15.45     16.27     18.24
tcpip_vcpu_msec/MB       9.34     10.38     10.81     11.77
tcpip_ccpu_msec/MB       4.21      5.07      5.46      6.46
Table 6. FICON - Streaming 1500 - Multiple Devices

Number of clients          01        03        05        10
runid                fssx0102  fssx0301  fssx0502  fssx1002
MB/sec                  25.44     41.85     41.46     40.92
elapsed time (sec)     330.00    330.00    330.00    330.00
total_cpu_util          26.10     43.50     44.80     46.40
tcpip_tot_cpu_util      35.80     60.00     61.20     62.70
tcpip_virt_cpu_util     24.50     40.70     40.90     41.50
cpu_msec/MB             20.52     20.79     21.61     22.68
emul_msec/MB            14.86     15.01     15.53     16.18
cp_msec/MB               5.66      5.78      6.08      6.50
tcpip_cpu_msec/MB       14.17     14.34     14.76     15.33
tcpip_vcpu_msec/MB       9.65      9.72      9.87     10.15
tcpip_ccpu_msec/MB       4.53      4.62      4.90      5.18
Table 7. ESCON - RR 1500

Number of clients          01        03        05        10
runid                esrf0101  esrf0301  esrf0503  esrf1003
trans/sec              476.89   1096.80   1505.14   1824.48
elapsed time (sec)     330.00    330.00    330.00    330.00
total_cpu_util          12.20     27.20     35.60     42.80
tcpip_tot_cpu_util      12.40     25.80     32.40     37.30
tcpip_virt_cpu_util      6.40     13.00     16.70     19.40
cpu_msec/trans           0.51      0.50      0.47      0.47
emul_msec/trans          0.34      0.33      0.32      0.33
cp_msec/trans            0.18      0.16      0.15      0.14
tcpip_cpu_msec/trans     0.26      0.23      0.22      0.20
tcpip_vcpu_msec/trans    0.13      0.12      0.11      0.11
tcpip_ccpu_msec/trans    0.13      0.12      0.10      0.10
Table 8. FICON - RR 1500 - Single Device

Number of clients          01        03        05        10
runid                fsrf0102  fsrf0301  fsrf0502  fsrf1002
trans/sec              437.71   1042.70   1412.47   1881.99
elapsed time (sec)     330.00    330.00    300.00    330.00
total_cpu_util          11.40     25.30     33.20     43.20
tcpip_tot_cpu_util      11.50     24.20     30.30     37.90
tcpip_virt_cpu_util      5.80     12.40     15.70     19.70
cpu_msec/trans           0.52      0.49      0.47      0.46
emul_msec/trans          0.33      0.33      0.32      0.32
cp_msec/trans            0.19      0.16      0.15      0.14
tcpip_cpu_msec/trans     0.26      0.23      0.21      0.20
tcpip_vcpu_msec/trans    0.13      0.12      0.11      0.10
tcpip_ccpu_msec/trans    0.13      0.11      0.10      0.10
Table 9. FICON - RR 1500 - Multiple Devices

Number of clients          01        03        05        10
runid                fsrx0102  fsrx0301  fsrx0502  fsrx1001
trans/sec              437.07    841.23    956.42    978.30
elapsed time (sec)     300.00    330.00    330.00    330.00
total_cpu_util          11.90     24.30     26.30     27.80
tcpip_tot_cpu_util      11.70     23.60     26.30     27.30
tcpip_virt_cpu_util      6.00     12.80     13.30     13.90
cpu_msec/trans           0.54      0.58      0.55      0.57
emul_msec/trans          0.35      0.39      0.36      0.38
cp_msec/trans            0.20      0.19      0.19      0.19
tcpip_cpu_msec/trans     0.28      0.28      0.28      0.28
tcpip_vcpu_msec/trans    0.14      0.14      0.14      0.14
tcpip_ccpu_msec/trans    0.14      0.14      0.14      0.14
With the smaller amounts of data being transferred, the RR workload favors ESCON, with FICON single device performing similarly. TCPIP uses a 32K buffer when transferring data over CTC, and this is most likely the reason the RR workload did not benefit from using multiple devices.