Single Core CPU Tests - comprising Whetstone, Dhrystone, Linpack and Livermore Loops Classic Benchmarks. Compared with a Pi 4B/Pi 3B+ CPU MHz ratio of 1.07, the overall performance gains for these four programs increased to around 1.8, 2.0, 4.0 and 2.8 times, with some further improvements between 1.05 and 1.26 from gcc 8 compilations.
Single Core Memory Benchmarks - measuring performance using data from caches and RAM. These include eight different measurements of FFTs, at 11 increasing sizes, with average Pi 4B speed gains of 3.26 times. BusSpeed was intended to identify maximum reading speeds, where there was not much difference from L1 cache, some gain via L2 cache and 80% from RAM, increasing by a further 25% using the gcc 8 compilation. MemSpeed and NeonSpeed carry out floating point and integer calculations, providing Pi 4B speed gains at all levels, best with double precision floating point calculations at greater than five times.
Multithreading Benchmarks - Most of the multithreading benchmarks execute the same calculations using 1, 2, 4 and 8 threads. The first are for Whetstone, Dhrystone and Linpack benchmarks, providing similar Pi 4B gains as the single core versions, with only Whetstones providing effective four core performance.
Various multithreaded and OpenMP cache/RAM benchmarks were also run, mainly to demonstrate the sort of code that is good and bad for efficient MP utilisation. Most showed Pi 4B performance gains in line with the single core results, but some other relationships were difficult to explain.
Finally, a number of benchmarks attempt to measure maximum MFLOPS floating point speed, using the same series of calculations, with variants covering single and double precision (SP and DP), vector intrinsic functions and OpenMP. Best DP performance was 10.4 GFLOPS with SP at 19.9 GFLOPS. Highest Pi 4B/Pi 3B+ gains were 6.69 times DP and 5.15 times SP. The gcc 8 compilations provided some improvement in speed.
Java and OpenGL Benchmarks - A Java Whetstone benchmark is provided, along with one using JavaDraw procedures. Test functions of the former were more than twice as fast on the Pi 4B as on the 3B+, with similar gains via JavaDraw on the more demanding tests and on many of the 25 OpenGL test routines. Initially Oracle Java 8 was used, but later tests were via OpenJDK 11.
Drive, LAN and WiFi Benchmarks - Variations of the same program are provided to benchmark internal and USB drives or LAN and WiFi connections, measuring performance using large files, small files and random access. Considering large files, the Pi 4B performance improvements shown were up to four times via LAN and over five times via USB 3, with similar scores using WiFi.
Stress Tests - These have also been run and will be covered in a later report. Default mode provides useful benchmarking information, as shown below. Pi 4B/Pi 3B+ performance ratios are shown to be up to 4.23 for cache based data and 2.09 using RAM.
I have run my benchmarks on the new system, where more descriptions and earlier results can be found in Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and stress tests.htm. The early opportunity to run the programs was due to my acceptance of the request for me to become a volunteer consultant, exercising the system prior to launch.
The programs and source codes used are available for downloading in Raspberry-Pi-4-Benchmarks.tar.gz.
My most recent benchmarks were compiled for the Raspberry Pi 2, using gcc 4.8. I tried others later, but they did not seem to make much difference. I thought that using a Cortex A72 might, so I have compiled the programs using gcc 8. The first step was to change the functions used to identify the hardware, where the existing procedures replicate information for each core (even four lots were too much). I noted that the lscpu command now provides adequate detail, so I use this now. The Raspbian release is also provided. RPi 3B+ and RPi 4B details are as follows:
Pi 3B+

 Architecture:          armv7l
 Byte Order:            Little Endian
 CPU(s):                4
 On-line CPU(s) list:   0-3
 Thread(s) per core:    1
 Core(s) per socket:    4
 Socket(s):             1
 Model:                 4
 Model name:            ARMv7 Processor rev 4 (v7l)
 CPU max MHz:           1400.0000
 CPU min MHz:           600.0000
 BogoMIPS:              89.60
 Flags:                 half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32

 Raspberry Pi reference 2018-04-18

Pi 4B

 Architecture:          armv7l
 Byte Order:            Little Endian
 CPU(s):                4
 On-line CPU(s) list:   0-3
 Thread(s) per core:    1
 Core(s) per socket:    4
 Socket(s):             1
 Vendor ID:             ARM
 Model:                 3
 Model name:            Cortex-A72
 Stepping:              r0p3
 CPU max MHz:           1500.0000
 CPU min MHz:           600.0000
 BogoMIPS:              270.00
 Flags:                 half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32

 Raspberry Pi reference 2019-05-13
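The lscpu approach can be illustrated with a short Python sketch. This is not the code used in the benchmarks; the function names parse_lscpu and read_lscpu are hypothetical, and the sample text is simply a few lines copied from the Pi 4B listing above so that the parser can be tried without running lscpu.

```python
import subprocess


def parse_lscpu(text):
    """Turn 'Key: value' lines, as printed by lscpu, into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that carry no colon
            info[key.strip()] = value.strip()
    return info


def read_lscpu():
    """Run lscpu (Linux only) and parse what it prints."""
    result = subprocess.run(["lscpu"], capture_output=True, text=True, check=True)
    return parse_lscpu(result.stdout)


# Sample lines copied from the Pi 4B details above (assumption: one
# "Key: value" pair per line, which is how lscpu normally prints them).
SAMPLE = """\
Architecture: armv7l
On-line CPU(s) list: 0-3
Model name: Cortex-A72
CPU max MHz: 1500.0000
CPU min MHz: 600.0000
"""

info = parse_lscpu(SAMPLE)
print(info["Model name"], float(info["CPU max MHz"]))
```

On a Linux system, read_lscpu() would return the same kind of dictionary for the machine it runs on.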
In this case, the overall MWIPS comparison ratios provide valid comparisons, the Pi 4B being between 1.76 and 1.87 times faster than the 3B+. The gcc 8 compilation provided no real improvement.
 System     MHz  MWIPS  ------MFLOPS------  ------------MOPS---------------
                            1     2     3    COS   EXP  FIXPT    IF  EQUAL

 ARM V6
 Pi 3B+    1400   1094    391   407   348   21.7  12.3   1740  2084   1391
 Pi 4B     1500   2048    520   473   389   53.8  27.1   2497  2245   2246
 4B/3B+    1.07   1.87   1.33  1.16  1.12   2.47  2.20   1.44  1.08   1.61

 ARM V7
 Pi 3B+    1400   1060    391   383   298   21.7  12.3   1740  2083   1392
 Pi 4B     1500   1884    516   478   310   54.7  27.1   2498  2247    999
 4B/3B+    1.07   1.78   1.32  1.25  1.04   2.52  2.21   1.44  1.08   0.72

 gcc 8
 Pi 3B+    1400   1063    393   373   300   21.8  12.3   1748  2097   1398
 Pi 4B     1500   1883    522   471   313   54.9  26.4   2496  3178    998
 4B/3B+    1.07   1.76   1.33  1.26  1.05   2.51  2.09   1.43  1.52   0.71

 gcc 8/V7
 Pi 4B     1.00   1.00   1.01  0.99  1.01   1.00  0.97   1.00  1.41   1.00
The Pi 4B was shown to be around twice as fast as the 3B+ and gcc 8 performance was similar to the ARM V7 compilation.
                                                  Best
                 ----- Compiler -----            DMIPS
 System    MHz   ARM V6  ARM V7   gcc 8   G8/V7   /MHz

 Pi 3B+   1400     2520    2825    2838    1.00   2.03
 Pi 4B    1500     5077    5366    5646    1.05   3.76
 4B/3B+   1.07     2.01    1.90    1.99           1.86
All measurements demonstrate that the Pi 4B was between 3.6 and 4.7 times faster than the Pi 3B+.
                    ARM V6          ARM V7           gcc 8       gcc8/ARMV7
 System    MHz     DP      SP      DP      SP      DP      SP     DP    SP

 Pi 3B+   1400   206.0   220.2   210.5   225.2   224.8   227.3   1.00  1.01
 Pi 4B    1500   764.7   880.6   760.2   921.6   957.1  1068.8   1.04  1.12
 4B/3B+   1.07    3.71    4.00    3.61    4.09    4.26    4.70
Based on Geomean results, Pi 4B is shown as being 2.36 times faster than the 3B+, and even more so via the gcc 8 compilation, where the gcc8/V7 performance ratio identified is 1.31.
 MFLOPS for 24 loops

 Pi 3B+
   225   266   465   394   147   196   411   449   408   207   155    87
   100   125   263   258   359   335   236   248   133    93   339   199
 Pi 4B
   746   964   988   943   212   538  1169  1800  1032   469   214   186
   159   335   778   623   732  1034   320   350   489   360   749   187
 Pi 3B+ gcc 8
   330   262   459   407   231   198   538   542   462   247   174   198
   122   123   281   240   394   325   275   294   213    94   354   198
 Pi 4B gcc 8
  1480  1017   974   930   383   657  1624  1861  1664   617   498   741
   221   320   803   640   737  1003   451   378  1047   411   763   187

 Comparisons

 System    MHz  Maximum  Average  Geomean  Harmean  Minimum

 ARM V7
 Pi 3B+   1400    464.8    246.7    220.1    193.9     78.3
 Pi 4B    1500   1800.2    635.1    519.0    416.1    155.3
 4B/3B+   1.07     3.87     2.57     2.36     2.15     1.98

 gcc 8
 Pi 3B+   1400    541.7    283.4    257.4    231.5     92.7
 Pi 4B    1500   1860.8    800.4    679.0    564.1    179.5
 4B/3B+   1.07     3.40     2.80     2.61     2.41     1.90

 g8/V7    1.00     1.03     1.26     1.31     1.36     1.16
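For readers unfamiliar with the summary columns, the following Python sketch shows how maximum, average, geometric mean, harmonic mean and minimum are conventionally calculated from a list of MFLOPS values. The function name summarise is hypothetical and this is not the benchmark's own code; the published summary rows come from the benchmark itself, so a recalculation from the rounded per-loop figures need not match them exactly. The last two lines only repeat the Geomean ratios quoted in the text.

```python
import math


def summarise(mflops):
    """Conventional summary statistics for a list of positive speeds."""
    n = len(mflops)
    return {
        "maximum": max(mflops),
        "average": sum(mflops) / n,
        # geometric mean: exponential of the mean of the logarithms
        "geomean": math.exp(sum(math.log(v) for v in mflops) / n),
        # harmonic mean: count divided by the sum of reciprocals
        "harmean": n / sum(1.0 / v for v in mflops),
        "minimum": min(mflops),
    }


# Small made-up input, chosen so the results are easy to check by hand.
demo = summarise([1.0, 4.0])
print(demo)

# Ratios of the published Geomean values (ARM V7 and gcc 8 rows).
print(round(519.0 / 220.1, 2))  # Pi 4B / Pi 3B+, ARM V7
print(round(679.0 / 519.0, 2))  # Pi 4B gcc 8 / Pi 4B ARM V7
```

The geometric mean is less influenced by a few very fast loops than the average, which is why it is the figure usually quoted for this benchmark.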
Following are average running times from the three passes, then RPi 4B performance gains (fewer milliseconds). All gains for the optimised version were greater than three times, as were many from the original benchmark. Most gcc 8 running times on the Pi 4B were slightly faster than those produced by the older version.
                         Time in milliseconds

          Raspberry Pi 3B+ FFT 1              Raspberry Pi 3B+ FFT 3
 Size      ARM V7            gcc 8             ARM V7           gcc 8
    K      SP       DP       SP       DP       SP      DP      SP      DP

    1     0.14     0.14     0.16     0.17     0.18    0.14    0.15    0.14
    2     0.31     0.36     0.35     0.48     0.39    0.32    0.33    0.32
    4     0.78     0.92     0.91     1.32     1.05    0.77    0.78    0.75
    8     1.92     2.17     3.02     3.36     2.14    1.76    1.84    1.76
   16     4.67     5.28     5.09     5.99     4.71    5.46    4.27    4.89
   32    10.95    20.57    12.31    20.62    10.71   15.03    9.55   13.65
   64    34.54   128.96    37.33   130.93    28.94   36.78   26.09   33.23
  128   246.04   308.67   254.23   320.44    70.03   84.44   64.74   76.98
  256   586.84   638.88   620.49   734.14   157.29  196.35  145.14  180.66
  512  1232.41  1374.18  1235.39  1447.85   363.61  434.28  336.57  405.09
 1024  2759.71  2993.38  2779.37  3094.66   806.78  975.33  736.46  912.78

 Size     Raspberry Pi 4B FFT 1               Raspberry Pi 4B FFT 3
    K

    1     0.04     0.04     0.04     0.04     0.06    0.05    0.05    0.04
    2     0.08     0.12     0.08     0.13     0.13    0.11    0.10    0.10
    4     0.32     0.37     0.29     0.34     0.27    0.24    0.24    0.23
    8     0.77     0.97     0.79     0.82     0.58    0.55    0.57    0.51
   16     1.69     2.01     1.65     1.85     1.49    1.35    1.32    1.19
   32     4.37     4.89     3.76     4.71     2.96    3.63    2.69    3.30
   64     9.12    26.55     8.82    30.64     7.46   10.75    6.60    9.47
  128    55.52   160.11    58.54   132.41    17.93   26.03   16.92   23.85
  256   305.92   423.06   275.44   373.12    41.16   55.06   37.61   55.97
  512   833.10   854.88   780.89   751.27    86.93  120.53   81.54  128.13
 1024  1617.49  1875.52  1578.70  1812.20   190.28  266.60  186.45  288.27

 Size     RPi 4B Gains (>1.0 4B running time is less)
    K

    1     3.45     3.46     4.02     3.94     3.06    2.66    2.88    3.45
    2     3.79     3.14     4.27     3.84     3.10    2.93    3.28    3.29
    4     2.46     2.50     3.19     3.84     3.86    3.23    3.24    3.22
    8     2.51     2.24     3.82     4.12     3.67    3.18    3.21    3.44
   16     2.76     2.62     3.08     3.23     3.17    4.06    3.25    4.10
   32     2.51     4.21     3.27     4.38     3.62    4.14    3.55    4.13
   64     3.79     4.86     4.23     4.27     3.88    3.42    3.95    3.51
  128     4.43     1.93     4.34     2.42     3.91    3.24    3.83    3.23
  256     1.92     1.51     2.25     1.97     3.82    3.57    3.86    3.23
  512     1.48     1.61     1.58     1.93     4.18    3.60    4.13    3.16
 1024     1.71     1.60     1.76     1.71     4.24    3.66    3.95    3.17
The speed via these increments can vary considerably, so comparisons are provided for the Read All column. Both the Pi 4B hardware and the gcc 8 compilation contribute to the performance gains of the new system, particularly the highest ratio of 2.81, which reflects the impact of the larger L2 cache.
 Pi 3B+ ARM V7

   BusSpeed vfpv4 32b V1 Fri Apr 12 21:39:00 2019

    Reading Speed 4 Byte Words in MBytes/Second

  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3885   4365   4755   5013   5078   5118
      32   1688   1765   2513   3489   4279   4737
      64    716    720   1315   2268   3399   4147
     128    665    668   1206   2137   3281   4085
     256    632    635   1160   2053   3195   4032
     512    268    277    550   1058   1925   3088
    1024    140    153    296    581   1115   2199
    4096    120    131    257    498   1001   1777
   16384    126    132    256    496    991   1677
   65536    128    132    256    491    991   1950

 Pi 4B ARM V7

  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All   Gain

      16   3836   4049   4467   5885   4641   5858   1.14
      32    761   1473   2594   3216   3960   4780   1.01
      64    409    801   1684   2422   3745   3940   0.95
     128    406    803   1202   1914   3037   5377   1.32
     256    415    700   1165   2481   4789   5137   1.27
     512    392    760   1243   2455   3764   4264   1.38
    1024    230    256    623   1061   2455   3501   1.59
    4096    197    214    454    938   1852   3195   1.80
   16384    138    215    445    897   1724   3210   1.91
   65536    174    215    398    744   1655   3130   1.61

 Pi 3B+ gcc 8

   BusSpeed vfpv4 32b gcc 8 Wed May 15 09:51:20 2019

    Reading Speed 4 Byte Words in MBytes/Second

  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3833   4346   4729   5002   5046   5069
      32   2435   2532   3152   4860   4949   4999
      64    696    705   1313   2213   3278   3983
     128    651    662   1227   2077   3207   3950
     256    620    630   1183   2007   3152   3925
     512    481    503    955   1641   2618   3318
    1024    133    145    286    506   1012   1694
    4096    117    130    249    453    915   1476
   16384    124    129    247    455    910   1415
   65536    124    108    251    453    905   1445

 Pi 4B gcc 8

  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read  Pi 4B  gcc 8
  KBytes  Words  Words  Words  Words  Words    All   Gain   Gain

      16   4880   5075   5612   5852   5877   5864   1.16   1.00
      32    846   1138   2153   3229   4908   5300   0.99   1.11
      64    746   1019   2035   3027   4910   5360   1.50   1.36
     128    728    983   1952   2908   4888   5389   1.52   1.00
     256    683    934   1901   2794   4874   5431   1.55   1.06
     512    656    900   1760   2625   4585   5259   1.75   1.23
    1024    301    410    870   1356   2846   4238   2.81   1.21
    4096    233    248    531    996   2151   4045   2.35   1.27
   16384    236    258    511    891   2143   4011   2.35   1.25
   65536    237    257    508    881   2172   4015   2.40   1.28
Using the original ARM V7 versions, the Pi 4B is indicated as faster on all test functions, with the best case on double precision calculations using cached data, being between three and six times faster. Similar gains are also shown in the gcc 8 comparisons. The gcc8/V7 compiler comparisons show gains with floating point, but the old compiler produced some faster speeds using integers. Maximum MFLOPS performance is shown for the calculations in the first two columns, rising from 237 DP and 532 SP on the 3B+ to 1485 DP and 2740 SP on the Pi 4B, using gcc 8, improvements of 6.27 times DP and 5.15 times SP.
 Pi 3B+ ARM V7

  Pi 3B+ Memory Reading Speed Test vfpv4 32 Bit Version 1 by Roy Longbottom

  Start of test Fri Apr 12 21:39:51 2019

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       8    1896   2125   4046   2784   2624   4448   3165   3694   3693
      16    1900   2129   4058   2791   2627   4462   3181   3711   3711
      32    1821   2000   3664   2602   2426   3965   3187   3719   3717
      64    1807   1974   3625   2567   2369   3923   3057   3615   3599
     128    1792   1959   3620   2545   2364   3906   3079   3544   3544
     256    1738   1914   3472   2468   2291   3719   3064   3545   3553
     512    1380   1493   2199   1769   1715   2331   2192   2522   2383
    1024    1003   1138   1319   1250   1219   1298   1487   1324   1324
    2048     925   1001   1104   1065   1049   1103   1093   1032   1035
    4096     901    972   1073   1037   1005   1081   1002    968    973
    8192     852    948   1076   1041   1021   1080   1009    977    975

  Max MFLOPS 237    532

 Pi 4B ARM V7

       8    8459   4766  13344   8303   4768  15553   7806   9926   9927
      16    7142   3918   8649   7103   4094   9309   7899  10086  10056
      32    7969   4490  10339   7941   4532  11627   7758  10070  10048
      64    8126   4602   9909   8114   4617  11069   7425   8021   8070
     128    8302   4651   9623   8311   4657  10836   7374   8049   7934
     256    8319   4663   9627   8360   4666  10768   7530   7922   7925
     512    8088   4629   9453   8239   4650  10696   5023   7904   7949
    1024    3581   3113   3618   3577   3150   3675   5358   2431   1560
    2048    1338   1808   1780   1811   1832   1773   2131    950    956
    4096    1881   1880   1852   1879   1664   1336   1988    984   1054
    8192    1890   1901   1884   1729   1319   1367   2252   1018   1021

  Max MFLOPS 1057  1192

 Pi 4B/3B+

       8    4.46   2.24   3.30   2.98   1.82   3.50   2.47   2.69   2.69
      16    3.76   1.84   2.13   2.54   1.56   2.09   2.48   2.72   2.71
      32    4.38   2.25   2.82   3.05   1.87   2.93   2.43   2.71   2.70
      64    4.50   2.33   2.73   3.16   1.95   2.82   2.43   2.22   2.24
     128    4.63   2.37   2.66   3.27   1.97   2.77   2.39   2.27   2.24
     256    4.79   2.44   2.77   3.39   2.04   2.90   2.46   2.23   2.23
     512    5.86   3.10   4.30   4.66   2.71   4.59   2.29   3.13   3.34
    1024    3.57   2.74   2.74   2.86   2.58   2.83   3.60   1.84   1.18
    2048    1.45   1.81   1.61   1.70   1.75   1.61   1.95   0.92   0.92
    4096    2.09   1.93   1.73   1.81   1.66   1.24   1.98   1.02   1.08
    8192    2.22   2.01   1.75   1.66   1.29   1.27   2.23   1.04   1.05

 Pi 3B+ gcc 8

       8    2024   3191   1931   2973   4464   2077   3415   4426   4426
      16    2031   3194   1933   2977   4470   2078   3430   4451   4451
      32    1972   3111   1902   2842   4291   2059   3433   4455   4451
      64    1932   3042   1875   2752   4121   2008   3240   4223   4223
     128    1972   3083   1888   2825   4163   2012   3281   4272   4276
     256    1980   3089   1888   2851   4177   2013   3312   4244   4239
     512    1750   2778   1739   2460   3711   1846   3106   4029   4096
    1024     979   1862   1390   1213   2230   1463   1463   1225   1220
    2048     979   1858   1379   1137   2111   1442    859    828    828
    4096     975   1809   1363   1136   2091   1428    944    924    920
    8192     976   1788   1364   1139   2053   1409    802    792    733

  Max MFLOPS 254    799

 Pi 4B gcc 8

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       8   11768   9844   3841  11787   9934   4351  10309   7816   7804
      16   11880   9880   3822  11886  10043   4363  10484   7902   7892
      32    9539   8528   3678   9517   8661   4098  10564   7948   7945
      64    9952   9310   3733   9997   9470   4160   8452   7717   7732
     128    9947   9591   3757   9990   9757   4178   8205   7680   7753
     256   10015   9604   3758  10030   9781   4186   8120   7734   7707
     512    9073   9300   3751   9472   9526   4175   7995   7709   7602
    1024    2681   5303   3594   2664   4965   3760   4828   3592   3569
    2048    1671   3488   3242   1757   3635   3540   2882   1036   1023
    4096    1777   3700   3283   1827   3627   3555   2433   1052   1054
    8192    1931   3805   3420   1933   3815   3629   2465    980    971

  Max MFLOPS 1485  2740

 Pi 4B/3B+

       8    5.81   3.08   1.99   3.96   2.23   2.09   3.02   1.77   1.76
      16    5.85   3.09   1.98   3.99   2.25   2.10   3.06   1.78   1.77
      32    4.84   2.74   1.93   3.35   2.02   1.99   3.08   1.78   1.78
      64    5.15   3.06   1.99   3.63   2.30   2.07   2.61   1.83   1.83
     128    5.04   3.11   1.99   3.54   2.34   2.08   2.50   1.80   1.81
     256    5.06   3.11   1.99   3.52   2.34   2.08   2.45   1.82   1.82
     512    5.18   3.35   2.16   3.85   2.57   2.26   2.57   1.91   1.86
    1024    2.74   2.85   2.59   2.20   2.23   2.57   3.30   2.93   2.93
    2048    1.71   1.88   2.35   1.55   1.72   2.45   3.36   1.25   1.24
    4096    1.82   2.05   2.41   1.61   1.73   2.49   2.58   1.14   1.15
    8192    1.98   2.13   2.51   1.70   1.86   2.58   3.07   1.24   1.32

 4B gcc 8 gains

       8    1.39   2.07   0.29   1.42   2.08   0.28   1.32   0.79   0.79
      16    1.66   2.52   0.44   1.67   2.45   0.47   1.33   0.78   0.78
      32    1.20   1.90   0.36   1.20   1.91   0.35   1.36   0.79   0.79
      64    1.22   2.02   0.38   1.23   2.05   0.38   1.14   0.96   0.96
     128    1.20   2.06   0.39   1.20   2.10   0.39   1.11   0.95   0.98
     256    1.20   2.06   0.39   1.20   2.10   0.39   1.08   0.98   0.97
     512    1.12   2.01   0.40   1.15   2.05   0.39   1.59   0.98   0.96
    1024    0.75   1.70   0.99   0.74   1.58   1.02   0.90   1.48   2.29
    2048    1.25   1.93   1.82   0.97   1.98   2.00   1.35   1.09   1.07
    4096    0.94   1.97   1.77   0.97   2.18   2.66   1.22   1.07   1.00
    8192    1.02   2.00   1.82   1.12   2.89   2.65   1.09   0.96   0.95
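The Max MFLOPS figures for the ARM V7 runs appear to follow from the best MB/second results in the first two columns. The Python sketch below rests on an assumption that is inferred from the tables rather than stated by the benchmark: that x[m]=x[m]+s*y[m] performs two floating point operations for each pair of words counted in the MB/second figure, that is, one operation per word of data. The function name mflops_from_mbps is hypothetical.

```python
def mflops_from_mbps(mb_per_sec, bytes_per_word):
    """Assumed relationship: one floating point operation per word moved."""
    return mb_per_sec / bytes_per_word


# (description, best MB/S in the column, bytes per word, published Max MFLOPS)
# All values are copied from the ARM V7 tables above.
CASES = [
    ("Pi 3B+ double precision", 1900, 8, 237),
    ("Pi 3B+ single precision", 2129, 4, 532),
    ("Pi 4B double precision", 8459, 8, 1057),
    ("Pi 4B single precision", 4766, 4, 1192),
]

for name, mbps, width, published in CASES:
    print(name, mflops_from_mbps(mbps, width), published)

# Improvement ratios quoted in the text, from the published Max MFLOPS values.
dp_gain = round(1485 / 237, 2)
sp_gain = round(2740 / 532, 2)
print(dp_gain, sp_gain)
```

Under that assumption each calculated value lands within one MFLOPS of the published figure, the difference being rounding.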
 Pi 3B+

  NEON Speed Test V 1.0 Fri Apr 12 22:11:38 2019

       Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3170   4669   4037   4930   5220   5545
      32   3119   4531   3952   4780   5071   5374
      64   2845   3920   3558   4075   4235   4438
     128   2873   3954   3626   4095   4227   4484
     256   2917   4027   3705   4184   4313   4563
     512   2271   2923   2777   3000   3075   3127
    1024   1181   1209   1221   1201   1163   1198
    4096   1062   1077   1071   1050   1073   1076
   16384   1087   1115   1111   1043   1094   1086
   65536   1125   1144   1139    851   1126   1110

 Pi 4B

      16   9677  10072   8905   9358   9776  10473
      32  10149  10330   9364   9539   9988  10543
      64  10948  11708  10466  10568  11318  11994
     128  10484  11232  10410  10104  11200  11792
     256  10509  11369  10428  10264  11273  11842
     512  10406  11066  10134  10054  11075  11467
    1024   3069   3202   3159   3166   3204   3203
    4096   1721   1910   1908   1882   1903   1900
   16384   2023   2009   2008   1965   2032   2013
   65536   2073   2074   2074   2073   2068   2064

 Pi 4B/3B+ Comparisons

  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3.05   2.16   2.21   1.90   1.87   1.89
     512   4.58   3.79   3.65   3.35   3.60   3.67
    1024   2.60   2.65   2.59   2.64   2.75   2.67
   16384   1.86   1.80   1.81   1.88   1.86   1.85

 Pi 3B+ gcc 8

  NEON Speed Test gcc 8 Wed May 15 09:57:18 2019

       Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3289   5377   2010   5076   5731   5732
      32   3280   5341   1995   5043   5706   5706
      64   3115   4547   1923   4348   4771   4771
     128   3145   4683   1927   4482   4886   4888
     256   3146   4698   1926   4500   4906   4908
     512   2666   3762   1779   3527   3903   3915
    1024   1879   1228   1395   1225   1238   1238
    4096   1792   1151   1373   1144   1164   1162
   16384   1698   1167   1353   1119   1167   1170
   65536   1229   1157   1328    874   1165   1166

 Pi 4B gcc 8

      16   9884  12882   3910  12773  13090  15133
      32   9904  13061   3916  13002  13162  15239
      64   9029  11526   3450  10704  11708  12084
     128   9242  11784   3391  11016  11816  12179
     256   9283  11890   3396  11215  11929  12284
     512   9043  10680   3413  10211  10925  11241
    1024   5818   3310   3507   3288   3239   2902
    4096   4060   1994   3497   1991   2009   2011
   16384   4030   2063   3445   2068   2072   2067
   65536   3936   2109   3391   1858   2122   2121

 Pi 4B/3B+ Comparisons

  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3.01   2.40   1.95   2.52   2.28   2.64
     512   3.39   2.84   1.92   2.90   2.80   2.87
    1024   3.10   2.70   2.51   2.68   2.62   2.34
   16384   2.37   1.77   2.55   1.85   1.78   1.77

 4B gcc 8 gains and losses

      16   1.02   1.28   0.44   1.36   1.34   1.44
     512   0.87   0.97   0.34   1.02   0.99   0.98
   16384   1.99   1.03   1.72   1.05   1.02   1.03
Based on the 4 thread MWIPS rating, both compilations indicate the same Pi 4B performance improvement, but there are variations on the individual test functions.
 Pi 3B+ ARM V7

  MP-Whetstone Benchmark Linux/ARM V7A v1.0 Wed Apr 24 22:48:42 2019

                    Using 1, 2, 4 and 8 Threads

      MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp    Fixpt      If   Equal
                 1      2      3   MOPS   MOPS     MOPS    MOPS    MOPS

  1T 1116.9  582.4  603.6  299.7   21.7   13.3   6969.0  1364.0  1398.5
  2T 2226.5 1167.8 1181.0  593.5   43.4   26.4  12545.8  2789.0  2794.1
  4T 4436.8 2354.9 2387.3 1190.1   86.3   52.5  27429.4  5539.7  5546.8
  8T 4614.6 3174.1 3140.6 1250.0   88.1   54.7  36555.2  6409.9  6051.1

  Overall Seconds 4.99 1T, 5.02 2T, 5.10 4T, 10.20 8T

 Pi 4B ARM V7

  1T 2059.3  672.8  680.1  310.6   55.6   33.1   7461.6  2244.6   995.2
  2T 4117.1 1341.7 1390.7  624.2  110.7   65.9  14887.3  4466.5  1986.2
  4T 7910.0 2652.0 2722.2 1180.0  208.5  132.6  29291.2  8952.4  3832.3
  8T 8651.6 3057.1 2971.1 1268.3  233.2  149.6  38367.5 11922.5  3941.7

  Overall Seconds 4.99 1T, 5.01 2T, 5.29 4T, 10.71 8T

 Pi 3B+ gcc 8

  MP-Whetstone Benchmark Linux/ARM gcc 8 Fri Jun 14 14:25:28 2019

      MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp    Fixpt      If   Equal
                 1      2      3   MOPS   MOPS     MOPS    MOPS    MOPS

  1T 1057.5  390.9  392.6  298.1   21.0   12.3   5227.8  1363.1  1399.4
  2T 2121.8  777.4  778.5  598.3   42.3   24.6  10185.9  2769.0  2762.9
  4T 4225.9 1509.6 1532.2 1192.3   84.7   48.8  19273.0  5326.5  5552.9
  8T 4419.6 1914.9 2041.9 1260.8   86.0   51.3  27645.3  7213.5  6031.5

  Overall Seconds 4.98 1T, 5.00 2T, 5.11 4T, 10.09 8T

 Pi 4B gcc 8

  1T 1889.5  538.7  537.6  311.4   56.3   26.1   7450.5  2243.2   659.9
  2T 3782.7 1065.5 1071.2  627.1  112.3   52.0  14525.7  4460.9  1327.3
  4T 7564.1 2101.0 2145.9 1250.4  225.0  104.1  29430.5  8944.2  2660.8
  8T 8003.6 2598.8 2797.0 1313.0  233.2  110.4  37906.3 10786.7  2799.4

  Overall Seconds 4.99 1T, 5.00 2T, 5.03 4T, 10.06 8T

 4 Thread 4B/3B+ Performance ratios

  V7     1.78   1.13   1.14   0.99   2.42   2.53     1.07    1.62    0.69
  gcc8   1.79   1.39   1.40   1.05   2.66   2.13     1.53    1.68    0.48
 Pi 3B+ ARM V7

  MP-Dhrystone Benchmark Linux/ARM V7A v1.0 Wed Apr 24 22:57:46 2019

                    Using 1, 2, 4 and 8 Threads

  Threads                        1         2         4         8

  Seconds                     0.85      0.96      1.36      2.71
  Dhrystones per Second    4733611   8295393  11750518  11789451
  VAX MIPS rating             2694      4721      6688      6710

 Pi 4B ARM V7

  Seconds                     0.82      1.59      2.70      5.04
  Dhrystones per Second    9731507  10082787  11833655  12706636
  VAX MIPS rating             5539      5739      6735      7232

 Pi 3B+ gcc 8

  Threads                        1         2         4         8

  Seconds                     0.79      0.92      1.23      2.46
  Dhrystones per Second    5035879   8678942  13020489  13028455
  VAX MIPS rating             2866      4940      7411      7415

 Pi 4B gcc 8

  Threads                        1         2         4         8

  Seconds                     0.79      1.21      2.62      4.88
  Dhrystones per Second   10126308  13262168  12230188  13106002
  VAX MIPS rating             5763      7548      6961      7459
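The difference between the MP-Whetstone and MP-Dhrystone results is easier to see as a speedup (four thread rate divided by one thread rate) and an efficiency (speedup divided by the number of cores). The Python sketch below uses Pi 4B ARM V7 figures copied from the tables above; the function name scaling is hypothetical.

```python
def scaling(rate_1t, rate_4t, cores=4):
    """Return (speedup, efficiency) for a four thread run on `cores` cores."""
    speedup = rate_4t / rate_1t
    return speedup, speedup / cores


# Pi 4B ARM V7, MWIPS at 1 and 4 threads (MP-Whetstone).
whet_speedup, whet_eff = scaling(2059.3, 7910.0)

# Pi 4B ARM V7, Dhrystones per second at 1 and 4 threads (MP-Dhrystone).
dhry_speedup, dhry_eff = scaling(9731507, 11833655)

print(f"Whetstone: speedup {whet_speedup:.2f}, efficiency {whet_eff:.0%}")
print(f"Dhrystone: speedup {dhry_speedup:.2f}, efficiency {dhry_eff:.0%}")
```

A speedup close to four indicates effective use of all four cores, which is the case for Whetstone here but not for Dhrystone.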
Single thread performance was slowest when accessing the larger data arrays (higher N values), where it was also more constant across the four sets of results. Fastest Pi 4B improvements were at N = 100, at around three times.
The programs produce the sumchecks, as shown below, with the four sets of calculations producing identical numeric results (as they should).
 Pi 3B+ ARM V7

  Linpack Single Precision MultiThreaded Benchmark
   Using NEON Intrinsics, Wed Apr 24 23:03:08 2019

     MFLOPS 0 to 4 Threads, N 100, 500, 1000

  Threads      None        1        2        4

  N  100     627.07    66.31    64.79    64.14
  N  500     465.16   293.95   292.37   293.76
  N 1000     346.63   311.81   309.19   311.76

 Pi 4B ARM V7

  N  100    1921.53   108.66   101.88   102.46
  N  500    1548.81   530.23   714.37   733.09
  N 1000     399.94   378.11   364.78   398.21

 Pi 3B+ gcc 8

  N  100     638.49    66.92    66.23    66.14
  N  500     471.71   304.69   297.05   305.51
  N 1000     356.13   317.22   316.88   316.33

 Pi 4B gcc 8

  N  100    2007.38   112.55   107.85   106.98
  N  500    1332.24   686.10   686.11   689.02
  N 1000     402.61   435.26   432.21   432.01

 Sumchecks

  N                100              500             1000

  NR              2.17             5.42             9.50
  RE    5.16722466e-05   6.46698638e-04   2.26586126e-03
  MA    1.19209290e-07   1.19209290e-07   1.19209290e-07
  X0   -2.38418579e-07  -5.54323196e-05  -1.26898289e-04
  XN   -5.06639481e-06  -4.70876694e-06   1.41978264e-04
Comparisons are provided for RdAll, at 1, 2 and 4 threads. These are subject to multiprocessing peculiarities, but Pi 4B/Pi 3B+ performance gains were indicated as being around 2.5, using L1 cache data, and twice as fast, via L2 cache and RAM, with the gcc 8 produced version little different from the earlier compilations.
 Pi 3B+ ARM V7

  MP-BusSpd ARM V7A v2 Wed Apr 24 22:58:50 2019

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
   Staggered starting addresses to avoid caching

      KB       Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

    12.3 1T     3470   4390   4408   4760   5138   4926
         2T     6272   7807   8321   9131   9780   9599
         4T     9867  13732  15514  17568  19512  18209
         8T     7385  10918  12320  14591  17357  16462
   122.9 1T      662    648   1253   2129   3291   4475
         2T     1044   1032   2003   3611   6135   8931
         4T     1068   1085   2180   4354   8409  16053
         8T     1057   1078   2124   4247   8227  15070
   12288 1T      125    131    252    494   1009   1996
         2T      195    136    272    501   1088   2121
         4T      126    135    263    515   1017   1922
         8T      114    136    305    545    994   2076

 Pi 4B ARM V7                                            Pi 4B/3B+

    12.3 1T     5263   5637   5809   5894   5936  13445    2.73
         2T     9412  10020  10567  11454  11604  24980    2.60
         4T    16282  15577  16418  21222  20000  45530    2.50
         8T    11600  13285  16070  18579  20593  36837
   122.9 1T      739    956   1888   3153   5008   9527    2.13
         2T      629   1158   1568   5058   9509  16489    1.85
         4T      600   1093   2134   4527   8732  16816    1.05
         8T      593   1104   2121   4382   8629  17158
   12288 1T      238    258    518   1005   2001   4029    2.02
         2T      278    228    453   1690   1826   3628    1.71
         4T      269    257    740   1019   1790   4145    2.16
         8T      233    292    532    926   2186   3581

 Pi 3B+ gcc 8

  MP-BusSpd ARM V7A gcc 8 Wed May 15 10:06:27 2019

      KB       Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

    12.3 1T     3555   4451   4382   4788   5124   5205
         2T     6515   8132   8332   9016   9793  10100
         4T    10667  14186  15956  17529  19228  16522
         8T     7463  10987  13299  14948  17756  16781
   122.9 1T      681    683   1211   2133   3280   4713
         2T     1049   1057   2009   3848   6155   9293
         4T     1049   1085   2191   4360   7921  16268
         8T     1072   1092   2180   4303   8156  15722
   12288 1T      125    131    256    495   1005   1970
         2T      135    133    273    505   1100   2110
         4T      116    130    243    511   1009   2059
         8T      126    138    260    532   1061   2017

 Pi 4B gcc 8                                             Pi 4B/3B+

    12.3 1T     5310   5616   5801   5898   5940  13425    2.54
         2T     9393  10008  11293  11293  11368  24932    2.47
         4T    15781  15015  17606  19034  22279  40736    2.47
         8T     8465   9599  14580  18465  20034  36831
   122.9 1T      664    930   1861   3191   5017  10281    2.18
         2T      564    726   1523   5376   9387  18985    2.04
         4T      486    919   1886   4289   8337  16979    1.04
         8T      487    912   1854   4275   8271  16826
   12288 1T      225    258    514   1010   1992   3975    2.02
         2T      202    421    450   1765   3307   7396    3.51
         4T      261    288    825   1332   1772   5014    2.44
         8T      218    273    496   1041   2571   4021
Besides the full results, comparisons of the four thread results are shown below for Pi 4B/3B+ performance ratios. The Pi 3B+ appears to be faster reading data from the shared L2 cache, with 4 threads only, otherwise, the average performance of the new processor was indicated as 80% faster.
 Pi 3B+ ARM V7

  MP-RandMem Linux/ARM V7A v1.0 Wed Apr 24 22:54:55 2019

      KB        SerRD SerRDWR   RndRD RndRDWR

    12.3 1T      3419    4333    3420    4422
         2T      6531    4397    6515    4397
         4T     12814    4308   12896    4303
         8T     12922    4289   12561    4244
   122.9 1T      3133    3959     800    1041
         2T      5992    3959    1469    1040
         4T     11584    3913    2322    1025
         8T     11417    3895    2288    1028
   12288 1T      2034     795      48      62
         2T      2176     799      93      63
         4T      3183     790     128      63
         8T      2008     788     130      62

 Pi 4B ARM V7

    12.3 1T      5860    7905    5927    7657
         2T     11747    7908   11182    7746
         4T     21416    7626   17382    7731
         8T     20649    7528   20431    7378
   122.9 1T      5479    7269    1826    1923
         2T     10355    6964    1667    1920
         4T      9808    7177    1715    1908
         8T     11677    7058    1697    1919
   12288 1T      3438    1271     179     152
         2T      4176    1204     213     167
         4T      4227    1117     337     161
         8T      3479    1093     287     168

 Pi 4B/3B+

    12.3 4T      1.67    1.77    1.35    1.80
   122.9 4T      0.85    1.83    0.74    1.86
   12288 4T      1.33    1.41    2.63    2.56

 Pi 3B+ gcc 8

    12.3 1T      4362    4386    4363    4386
         2T      8222    4308    8132    4311
         4T     16391    4268   16396    4286
         8T     16297    4244   15510    4228
   122.9 1T      3643    3879     925    1025
         2T      7008    3873    1692    1040
         4T     12553    3877    2373    1038
         8T     12000    3881    2330    1043
   12288 1T      1848     833      67      62
         2T      2183     829     119      63
         4T      3672     825     135      63
         8T      2608     826     136      63

 Pi 4B gcc 8

    12.3 1T      5950    7903    5945    7896
         2T     11849    7923   11887    7917
         4T     23404    7785   23395    7761
         8T     21903    7669   23104    7655
   122.9 1T      5670    7309    2002    1924
         2T     10682    7285    1648    1923
         4T      9944    7266    1813    1927
         8T      9896    7216    1812    1919
   12288 1T      3904    1075     179     164
         2T      7317    1055     215     164
         4T      3398    1063     343     165
         8T      4156    1062     350     165

 Pi 4B/3B+ gcc 8

    12.3 4T      1.43    1.82    1.43    1.81
   122.9 4T      0.79    1.87    0.76    1.86
   12288 4T      0.93    1.29    2.54    2.62
Note the across the board Pi 4B performance gains on all programs, with maximum speeds of 17.2 GFLOPS for single precision calculations and 10.4 GFLOPS using double precision.
 Single Precision Version

 Pi 3B+ ARM V7

  MP-MFLOPS Linux/ARM V7A v1.0 Wed Apr 24 23:08:19 2019

              2 Ops/Word              32 Ops/Word
  KB        12.8     128   12800    12.8     128   12800
  MFLOPS
  1T         214     212     189     813     812     797
  2T         403     427     354    1613    1587    1573
  4T         717     811     372    3044    3027    2982
  8T         756     777     388    3005    3101    3064

 Pi 4B ARM V7

  1T         987     993     606    2816    2794    2804
  2T        1823    1837     567    5610    5541    5497
  4T        2119    3349     647    9884   10702    9081
  8T        3136    3783     609   10230   10504    9240

  Max
  4B/3B+    4.15    4.66    1.67    3.36    3.45    3.02

 Pi 3B+ gcc 8

  1T         214     212     189     799     784     781
  2T         417     417     365    1568    1583    1540
  4T         754     683     385    3026    3017    2919
  8T         738     761     401    3053    2997    2866

 Pi 4B gcc 8

  1T        1224    1257     520    2814    2800    2803
  2T        2485    2257     525    5608    5575    5576
  4T        4119    3243     534   11018   10645    8358
  8T        4131    4618     541    9941   10339    8165

  Max
  4B/3B+    5.48    6.07    1.35    3.61    3.53    2.86

 ###################################################

 NEON Intrinsic Functions Version

 Pi 3B+ ARM V7

  MP-MFLOPS NEON Intrinsics v1.0 Wed Apr 24 22:41:38 2019

              2 Ops/Word              32 Ops/Word
  KB        12.8     128   12800    12.8     128   12800
  MFLOPS
  1T         692     685     393    2052    2017    1887
  2T        1126    1358     403    4096    3924    3697
  4T        2434    2030     405    7848    7740    5547
  8T        2363    2095     407    7584    7609    6097

 Pi 4B ARM V7

  1T        2491    2399     615    4325    4285    4261
  2T        5629    5520     591    8602    8463    8308
  4T       10580    5594     553   16991   16493    9124
  8T        7047   10785     513   14325   16219    8867

  Max
  4B/3B+    4.35    5.15    1.36    2.17    2.13    1.50

 Pi 3B+ gcc 8

              2 Ops/Word              32 Ops/Word
  KB        12.8     128   12800    12.8     128   12800
  MFLOPS
  1T         691     684     407    1910    1874    1828
  2T        1214    1306     410    3746    3747    3392
  4T        1943    2568     410    7403    7435    5913
  8T        2093    2233     411    7217    7087    6044

 Pi 4B gcc 8

  1T        2797    2870     641    4422    4454    4405
  2T        3217    5601     569    8587    8800    8377
  4T        7902    9864     611   17061   17215    9704
  8T        7070   10562     603   15531   16203    9516

  Max
  4B/3B+    3.78    4.13    1.49    2.30    2.32    1.61

 ###################################################

 Double Precision Version

 Pi 3B+ ARM V7

  MP-MFLOPS Double Precision v1.0 Sat Jun 15 12:07:33 2019

              2 Ops/Word              32 Ops/Word
  KB        12.8     128   12800    12.8     128   12800
  MFLOPS
  1T         209     206     166     782     797     747
  2T         415     416     198    1566    1590    1462
  4T         663     801     198    3125    3122    2770
  8T         746     729     199    3061    2909    2745

 Pi 4B ARM V7

  1T        1187    1220     309    2682    2714    2701
  2T        2420    2416     282    5379    5415    4780
  4T        4665    2381     317   10256   10336    5242
  8T        4385    3114     310    9721   10340    5131

  Max
  4B/3B+    6.25    3.89    1.59    3.28    3.31    1.89

 Pi 3B+ gcc 8

  1T         214     213     168     798     797     776
  2T         409     416     194    1567    1590    1466
  4T         694     675     195    3122    3120    2751
  8T         698     797     198    3055    3005    2779

 Pi 4B gcc 8

  1T        1203    1211     315    2675    2719    2674
  2T        2291    2441     293    5406    5421    4907
  4T        4673    2501     309   10313   10393    5256
  8T        4394    3550     265    8782   10110    5197

  Max
  4B/3B+    6.69    4.45    1.56    3.30    3.33    1.89

 Sumchecks

  SP       76406   97075   99969   66015   95363   99951
  NEON     76406   97075   99969   66014   95363   99951
  DP       76384   97072   99969   66065   95370   99951
The final data values are checked for consistency. Different compilers or different CPUs could involve using alternative instructions or rounding effects, with variable accuracy. OpenMP sumchecks could then be expected to be the same as those from the NotOpenMP single core values. However, this is not always the case. The double precision gcc 8 benchmarks appear to be consistent, but only single precision sumchecks are provided.
This benchmark was a compilation of code used for desktop PCs, starting at 100 KB, then 1 MB and 10 MB.
  OpenMP MFLOPS Benchmark 1 Wed Apr 24 22:51:10 2019

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS     First   All
                    Words  Word   Passes                       Results  Same

 Data in & out     100000     2     2500   0.281575     1776  0.929538   Yes
 Data in & out    1000000     2      250   1.265817      395  0.992550   Yes
 Data in & out   10000000     2       25   1.222289      409  0.999250   Yes

 Data in & out     100000     8     2500   0.376635     5310  0.957126   Yes
 Data in & out    1000000     8      250   1.305504     1532  0.995524   Yes
 Data in & out   10000000     8       25   1.267736     1578  0.999550   Yes

 Data in & out     100000    32     2500   3.285631     2435  0.890232   Yes
 Data in & out    1000000    32      250   3.351830     2387  0.988068   Yes
 Data in & out   10000000    32       25   3.329400     2403  0.998785   Yes

  End of test Wed Apr 24 22:51:26 2019

 SumChecks

 V7A OMP 3B+ 4B
   0.929538  0.992550  0.999250
   0.957126  0.995524  0.999550
   0.890232  0.988068  0.998785

 V7A Not 3B+ 4B
   0.929538  0.992550  0.999250
   0.957126  0.995524  0.999550
   0.890268  0.988078  0.998806

 gcc 8 OMP 3B+, Not 3B+ 4B
   0.929538  0.992550  0.999250
   0.957126  0.995524  0.999550
   0.890282  0.988096  0.998806

 gcc 8 4B OMP
   0.098043  0.810084  0.922891
   0.144870  0.922568  0.918226
   0.401577  0.935064  0.916277

 gcc 8 DP OMP 3B+ 4B, Not 3B+ 4B
   0.929474  0.992543  0.999249
   0.957164  0.995525  0.999550
   0.890377  0.988101  0.998799
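The MFLOPS column in the log above is consistent with words multiplied by operations per word and repeat passes, divided by the running time. The Python sketch below checks that arithmetic against four of the logged rows; the function name mflops is hypothetical and the formula is inferred from the logged values rather than taken from the benchmark source.

```python
def mflops(words, ops_per_word, passes, seconds):
    """Millions of floating point operations per second for one test."""
    return words * ops_per_word * passes / seconds / 1e6


# (words, ops per word, repeat passes, seconds, MFLOPS as logged)
ROWS = [
    (100_000, 2, 2500, 0.281575, 1776),
    (1_000_000, 2, 250, 1.265817, 395),
    (100_000, 8, 2500, 0.376635, 5310),
    (100_000, 32, 2500, 3.285631, 2435),
]

for words, ops, passes, seconds, logged in ROWS:
    print(words, ops, round(mflops(words, ops, passes, seconds)), logged)
```

Note that the repeat passes are reduced as the word count increases, so each test carries out the same total number of operations.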
The other comparisons identify Pi 4B performance gains over the Pi 3B+, with those applying to single core use being better than those via OpenMP. The highest OpenMP improvement was 4.5 times, via gcc 8 and double precision operation. Maximum demonstrated Pi 4B speeds were 19.9 GFLOPS single precision and 9.3 GFLOPS double precision.
 V7A Compiler

                  Pi 3B+                 Pi 4B            Pi4 Gains
  KB+Ops       4      1  4 core       4      1  4 core      4     1
   /Word   Cores   Core    Gain   Cores   Core    Gain  Cores  Core

    100- 2  1776    831    2.14    4716   2850    1.65   2.66  3.43
   1000- 2   395    391    1.01     556    429    1.30   1.41  1.10
  10000- 2   409    409    1.00     544    632    0.86   1.33  1.55

    100- 8  5310   2009    2.64    7981   5191    1.54   1.50  2.58
   1000- 8  1532   1445    1.06    2389   2082    1.15   1.56  1.44
  10000- 8  1578   1478    1.07    2199   2003    1.10   1.39  1.36

    100-32  2435   1855    1.31    8147   5449    1.50   3.35  2.94
   1000-32  2387   1733    1.38    7951   5385    1.48   3.33  3.11
  10000-32  2403   1736    1.38    8030   5379    1.49   3.34  3.10

 gcc 8 Compiler

                  Pi 3B+                 Pi 4B            Pi4 Gains
  KB+Ops       4      1  4 core       4      1  4 core      4     1
   /Word   Cores   Core    Gain   Cores   Core    Gain  Cores  Core

    100- 2  2139    778    2.75    5100   2270    2.25   2.38  2.92
   1000- 2   398    403    0.99     617    632    0.98   1.55  1.57
  10000- 2   412    415    0.99     542    631    0.86   1.32  1.52

    100- 8  7348   1919    3.83   13805   5511    2.50   1.88  2.87
   1000- 8  1597   1448    1.10    2168   2217    0.98   1.36  1.53
  10000- 8  1635   1444    1.13    2178   2542    0.86   1.33  1.76

    100-32  8497   2023    4.20   19921   5341    3.73   2.34  2.64
   1000-32  5997   1903    3.15    8556   5267    1.62   1.43  2.77
  10000-32  6057   1914    3.16    8731   5276    1.65   1.44  2.76

 gcc 8 Double Precision

                  Pi 3B+                 Pi 4B            Pi4 Gains
  KB+Ops       4      1  4 core       4      1  4 core      4     1
   /Word   Cores   Core    Gain   Cores   Core    Gain  Cores  Core

    100- 2   711    203    3.50    3200    977    3.28   4.50  4.81
   1000- 2   193    168    1.15     274    295    0.93   1.42  1.76
  10000- 2   199    172    1.16     273    307    0.89   1.37  1.78

    100- 8  1898    503    3.77    6771   2440    2.78   3.57  4.85
   1000- 8   730    434    1.68    1102   1072    1.03   1.51  2.47
  10000- 8   755    435    1.74    1108   1255    0.88   1.47  2.89

    100-32  3072    793    3.87    9229   2725    3.39   3.00  3.44
   1000-32  2695    765    3.52    4256   2674    1.59   1.58  3.50
  10000-32  2719    765    3.55    4469   2677    1.67   1.64  3.50
In my MP MFLOPS programs, for the routines that execute 32 double precision floating point operations per data word read, disassembly indicates that the following instructions are used, with 64 bit d registers, where the maximum measured speed was just over 10 GFLOPS.
.L18:
vldr.64 d17, [r1]
vadd.f64 d16, d17, d4
vadd.f64 d18, d17, d0
vadd.f64 d25, d17, d15
vadd.f64 d24, d17, d11
vmul.f64 d16, d16, d5
vadd.f64 d23, d17, d31
vadd.f64 d22, d17, d27
vadd.f64 d21, d17, d2
vadd.f64 d20, d17, d6
vadd.f64 d19, d17, d13
vfma.f64 d16, d18, d1
vadd.f64 d18, d17, d9
vadd.f64 d17, d17, d29
vfma.f64 d16, d25, d14
vfma.f64 d16, d24, d10
vfma.f64 d16, d23, d30
vfma.f64 d16, d22, d28
vfms.f64 d16, d21, d3
vfms.f64 d16, d20, d7
vfms.f64 d16, d19, d12
vfms.f64 d16, d18, d8
vfms.f64 d16, d17, d26
vstmia.64 r1!, {d16}
cmp r0, r1
bne .L18
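As a check on the 32 operations per word, the instructions in the loop above can be tallied. The Python sketch below assumes the usual counting convention, where a fused multiply-add or multiply-subtract (vfma, vfms) counts as two floating point operations, and vadd and vmul count as one each. The names `FLOPS_PER_INSTRUCTION` and `count_flops` are illustrative only.

```python
# Assumed convention: fused multiply-add/subtract = 2 operations, others = 1.
FLOPS_PER_INSTRUCTION = {"vadd": 1, "vmul": 1, "vfma": 2, "vfms": 2}

# Loop body copied from the disassembly above.
LOOP = """
vldr.64 d17, [r1]
vadd.f64 d16, d17, d4
vadd.f64 d18, d17, d0
vadd.f64 d25, d17, d15
vadd.f64 d24, d17, d11
vmul.f64 d16, d16, d5
vadd.f64 d23, d17, d31
vadd.f64 d22, d17, d27
vadd.f64 d21, d17, d2
vadd.f64 d20, d17, d6
vadd.f64 d19, d17, d13
vfma.f64 d16, d18, d1
vadd.f64 d18, d17, d9
vadd.f64 d17, d17, d29
vfma.f64 d16, d25, d14
vfma.f64 d16, d24, d10
vfma.f64 d16, d23, d30
vfma.f64 d16, d22, d28
vfms.f64 d16, d21, d3
vfms.f64 d16, d20, d7
vfms.f64 d16, d19, d12
vfms.f64 d16, d18, d8
vfms.f64 d16, d17, d26
vstmia.64 r1!, {d16}
cmp r0, r1
bne .L18
"""


def count_flops(listing):
    """Sum floating point operations over one pass of the loop."""
    total = 0
    for line in listing.strip().splitlines():
        mnemonic = line.split()[0]      # e.g. "vfma.f64"
        base = mnemonic.split(".")[0]   # e.g. "vfma"
        # Loads, stores, compare and branch contribute nothing.
        total += FLOPS_PER_INSTRUCTION.get(base, 0)
    return total


print(count_flops(LOOP))
```

On that convention, the 11 adds, one multiply and 10 fused instructions account for the 32 operations, carried out for each 64 bit word loaded and stored.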
                               Pi 3B+

   ARM V7 Memory Reading Speed Test OpenMP Version 2 by Roy Longbottom

               Start of test Wed Apr 24 22:45:07 2019

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    6432   3483   1646  10276   5514   1770  18468   9721   1534
       8    7041   3603   1651  11747   5783   1788  19068  10085   1538
      16    7023   3606   1557  11694   5839   1672  19316   9528   1469
      32    6983   3600   1525  11413   5915   1656  19385   9532   1442
      64    6283   3554   1584  10861   5751   1621  14307   9466   1443
     128    6828   3578   1580  11074   5828   1659  10791   8935   1490
     256    5384   3365   1521  11216   5166   1687   9806   8148   1519
     512    5371   3253   1511   8917   4858   1412   7752   4363   1365
    1024    3084   2643   1066   3772   3504   1314   1450   1403   1136
    2048    3345   2087   1086   4148   3589   1471   1052   1063   1139
    4096     915   2648    894   4143   2456   1655    984    987   1190
    8192    3644   2504   1124   4183   3530   1496    903    909   1074
   16384     963   2050    922   3867   3154   1478    752    849   1156
   32768    3889   2467   1179   3562   3328   1667    838    833   1150
   65536    3902   2009   1109   3843   1437   1596    917    917    927
  131072     986    667    819   1145    904    820    858    865    584

 Not OMP
       8    1860   2972   4449   2787   4039   4449   3168   3164   3170
     256    1810   2791   4137   2655   3860   4135   3126   3065   3066
   65536     960   1121   1109   1100   1120   1115    901    793    844

                               Pi 4B

   ARM V7 Memory Reading Speed Test OpenMP Version 2 by Roy Longbottom

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    7732   8092   1266   7627   8431   1616  31436  15892    889
       8    7546   8158   1284   7925   8537   1597  30383  16635    884
      16    7695   8198   1261   7854   8549   1598  27037  15644    896
      32    7773   7808   1255   8036   7727   1612  29621  16928    897
      64    9728   9094   1233   9355   9028   1602  16855  13297    867
     128   11296  10068   1002  11342  10813   1686  13594  15106    794
     256   13987  11677   1231  15357  13496   1732  12707  10415    878
     512   17763   8841   1170  10023  13404   1529  12655   9137    693
    1024    6070   6553   1262  10196  10069   1455   5405   5027    670
    2048    3858   6609   1343   6440   6643   1657   2234   2324    877
    4096    6055   6743    989   6608   6568   1664   2114   2369    777
    8192    1669   2047   1126   7071   6894   1581   2532   2569    857
   16384    1974   1953   1385   6748   4399   1763   2643   1845    753
   32768    1594   3482   1115   7680   7494   1814   1739   1908   1147
   65536    2630   7446   1320   1632   1826   1651   2061   2920    904
  131072    1438   1540   1249   1714   1694   1244   1760   2011    856

 Not OMP
       8    8602  11536  13324   8607  11756  13378   7826   7689   7670
     256    8319   9856  10030   8338   8984   9308   5800   7510   7535
   65536    1373   1725   2071   2059   2072   2044   2170    912    900

                            Pi 3B+ gcc 8

       Memory Reading Speed Test OpenMP gcc 8 by Roy Longbottom

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    7065   3661   8370  10058   5245   9260  18199   9342   9242
       8    7350   3854   9338  11747   5786  10201  19108   9663   9412
      16    7444   3955   9543  11918   5961  10696  19339   9854   9831
      32    7198   3953   9537   9783   5908  10683  19075   9958   9971
      64    6848   3901   9057  11146   5168   9187  10408   9399   9440
     128    7655   3916   9113  11204   5785  10073  10315   9185   9191
     256    7044   3921   9154  11263   5785  10114   9601   9002   9019
     512    6662   3579   7738   9326   5206   7931   8313   7911   7903
    1024    4050   2892   4167   3997   3674   4318   1437   1422   1435
    2048    3996   2879   4134   4038   3624   4325   1042   1012    999
    4096    3909   2803   4078   3981   3591   4223   1047    988   1044
    8192    3880   2871   3805   4196   3555   4117    935    948    940
   16384    1366   2193   3757   4058   3178   3895    902    894    843
   32768    2202   2138   3428   3577   3335   3559    871    793    893
   65536    1180   1119   1696   1447   1178   1721    853    874    868
  131072    1016    688   1096   1133    893   1141    844   1141   1080

 Not OMP
       8    2020   1878   2056   2959   2018   2068   3398   4406   4406
     256    1973   1833   1990   2845   1966   1993   3306   4215   4215
   65536    1016   1248   1287   1130   1302   1301   1005    928    915

                            Pi 4B gcc 8

       Memory Reading Speed Test OpenMP gcc 8 by Roy Longbottom

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    8097   8322   8641   8020   8436   8384  39701  19701  19712
       8    7814   8555   8756   8321   8548   8526  39042  19984  19996
      16    8149   7738   7742   8303   7779   8192  37995  19883  19984
      32    8969   8769   8799   9040   8759   8743  37737  20133  20130
      64    7617   7457   7437   7575   7380   7422  17770  15332  14248
     128   11221  10936  11003  11105  11011  10986  13650  13910  13881
     256   17883  18144  18036  17691  18094  17844  13073  12465  12535
     512   18001  18468  19675  17075  18221  19264  13511  13895  12008
    1024    9532  10590   9772  11842  11282  11277   7173   9473   9496
    2048    7095   7025   6866   7117   7043   6946   2914   3475   3468
    4096    7244   6927   7036   5951   7054   6531   2582   3130   3122
    8192    4578   7173   7025   6322   7078   7182   2504   3127   3115
   16384    5470   7043   7067   7103   7052   7020   2557   3093   3088
   32768    7359   7817   7766   7158   7078   7757   2618   3066   3094
   65536    7810   7268   7266   3824   7478   5164   2486   3016   2931
  131072    2460   2655   7224   7513   7308   7339   2540   2944   2940

 Not OMP
       8   11775   3895   4342  11787   4325   4354  10334   7806   7816
     256   10032   3699   4223   9978   4289   4185   7105   7612   7621
   65536    2099   2587   3033   2103   3021   3001   2585   1105   1101
 ******************** Pi 3B+ 2.4 GHz ********************

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8     5.71     6.07     5.96     5.69     5.46     4.76
      16     6.14     6.38     6.47     6.14     6.15     5.91

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs       2.94    3.081    3.185     3.04     2.89      3.7

 200 Files          Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      0.16     0.57     0.96     0.36     0.63     1.17
 ms/file     25.3    14.31     17.1    11.46    13.04    14.06   2.138

 ********************* Pi 3B+ 5 GHz *********************

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    12.82    14.52    14.00    10.98    11.09     8.94
      16    11.60    12.91     4.48     9.16     8.19     7.69

 200 Files          Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      0.41     0.76     1.46     0.41     0.74     1.46
 ms/file     9.96    10.83    11.19    10.11    11.02    11.23   1.990

 Random similar to 2.4 GHz

 ********************* Pi 4B 2.4 GHz ********************

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8     6.35     6.33     6.38     7.05     6.98     7.10
      16     6.70     6.82     6.76     7.19     6.53     7.22

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs      2.691    2.875    3.048     3.13     2.93     2.84

 200 Files          Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      0.34     0.44     1.04     0.37     0.37     1.26
 ms/file    12.14    18.59     15.7     11.1     22.2    12.99   2.153

 ********************** Pi 4B 5 GHz *********************

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    11.90    12.96    13.16    10.11     9.55     9.66
      16    11.50    13.93    14.13     9.91     8.88     9.92

 200 Files          Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      0.13     0.46     0.91     0.25     0.55     1.02
 ms/file    30.85    17.83    18.10    16.62    14.93    16.01   3.361

 Random similar to 2.4 GHz
 ************************ Pi 3B+ ************************

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    31.17    31.62    31.61     13.5    26.19    26.38
      16    31.62    31.89    31.76     26.7    26.94    27.01

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.007     1.09    0.688     1.16     1.04     1.08

 200 Files          Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      1.15     2.26     4.18     1.73     3.18     5.66
 ms/file     3.57     3.62     3.92     2.36     2.58     2.89   0.511

 Larger Files           MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

      32    31.99    31.61    32.13    21.39    27.09    26.87
      64    32.33    32.37    32.35    26.94    26.98     26.7

 ************************ Pi 4B ************************

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    67.82    12.97    90.19    99.84    93.49    96.83
      16    92.25    92.66    92.96    103.9   105.28    91.17

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.007     0.01     0.04     1.01     0.85     0.91

 200 Files          Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      1.47      2.8     5.14     2.47     4.71     8.61
 ms/file     2.78     2.92     3.19     1.66     1.74      1.9   0.256

 Larger Files           MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

      32     78.2    34.46    80.71    84.94    87.11    84.97
      64    88.18    87.52    87.03   111.34   109.58   107.28
     128    98.84    99.24    96.58   110.99   110.57    87.43
     256   106.75   105.43    106.4    85.78   108.99   106.29
On the largest files shown, Pi 4B performance gains were 2.2 times on writing and 5.3 times on reading. Unlike LanSpeed, DriveSpeed uses Direct I/O, leading to an extra entry for cached files, where reading is mainly influenced by RAM speeds. Results can be too variable to provide meaningful comparisons.
Random access speeds were quite similar. On small files, relative reading speed was indicated as up to five times faster on the Pi 4B, but the 3B+ appeared to be nearly 30 times faster on writing.
For the Pi 4B, additional large file results are included for a Patriot Rage 2 USB 3 stick, rated as reading at up to 400 MB/second, with nearly 300 MB/second demonstrated using a Windows version of DriveSpeed. In this case, it appeared to be slightly slower than the first drive on reading, but faster on writing, at 80 MB/second. This second drive also produced those painfully slow speeds on writing small files.
 ********************* Pi 3B+ USB 2 ********************

 DriveSpeed RasPi 1.1 Wed Apr 24 22:09:09 2019

 /media/pi/REMIX_OS/
 Total MB 9017, Free MB 7486, Used MB 1531

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    27.71    27.35    27.13    30.72     30.9    31.31
      16    27.21    27.54    23.69    29.89    31.34    31.27
 Cached
       8    52.24    59.57    46.88   333.08   741.57   780.68

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.403    0.403    0.404     0.74     0.85     0.59

 200 File           Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      1.10     2.12     3.82     6.04     9.17    14.01
 ms/file     3.71     3.86     4.28     0.68     0.89     1.17   0.123

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

    1000    27.25    27.25    27.19    31.23    31.27    31.27
    2000    27.30    27.07    27.32    31.32    31.26    31.26

 ********************* Pi 4B USB 3 *********************

 DriveSpeed RasPi 1.1 Fri Apr 26 17:21:56 2019

 /media/pi/REMIXOSSYS//
 Total MB 5108, Free MB 3982, Used MB 1126

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    33.28    32.27    32.28   161.34   162.25   163.85
      16    39.85    41.95    43.02   164.07   165.53   165.84
 Cached
       8    33.32    34.96    34.96   593.94   582.25   589.22

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.383    0.372    0.371     0.77     0.83     0.63

 200 File           Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      0.04     0.07     0.15    20.64    41.04    70.01
 ms/file   110.04   109.97   110.01     0.20     0.20     0.23   0.089

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

     500    56.36    58.13    55.25   166.31   165.46   165.43
    1000    59.56    61.46    60.54   161.69   165.97   166.49

 /media/pi/PATRIOT/
 Total MB 120832, Free MB 120832, Used MB 0

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

    1000    80.87    80.23    81.92   131.41   130.72   130.39
    2000    83.67    81.82    82.14   130.85   131.29   131.36
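The gains quoted for the first USB drive can be checked from the results above. The Python sketch below assumes the large file gains are the ratio of the mean of the three passes at 1000 MB, that being the largest size run on both systems, and that the small file comparisons use the 16 KB read MB/sec and 4 KB write ms/file entries. The helper name `gain` is illustrative only.

```python
from statistics import mean


def gain(pi4_values, pi3_values):
    """Ratio of the mean Pi 4B result to the mean Pi 3B+ result."""
    return mean(pi4_values) / mean(pi3_values)


# 1000 MB files, MBytes/second, three passes each
write_gain = gain([59.56, 61.46, 60.54], [27.25, 27.25, 27.19])
read_gain = gain([161.69, 165.97, 166.49], [31.23, 31.27, 31.27])

# 200 small files: 16 KB reads in MB/sec (Pi 4B over Pi 3B+)
small_read_gain = 70.01 / 14.01

# 200 small files: 4 KB writes in ms per file, where higher is slower,
# so this is how many times faster the Pi 3B+ appeared
small_write_slowdown = 110.04 / 3.71

print(round(write_gain, 1), round(read_gain, 1))
print(round(small_read_gain, 1), round(small_write_slowdown, 1))
```

With these inputs the large file ratios agree with the 2.2 and 5.3 times quoted, and the small file write ratio is just under 30.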
 DriveSpeed RasPi 1.1 Mon Apr 29 10:20:57 2019

 Current Directory Path: /home/pi/Raspberry_Pi_Benchmarks/DriveSpeed/drive1
 Total MB 14845, Free MB 8198, Used MB 6646

                        MBytes/Second
      MB   Write1   Write2   Write3    Read1    Read2    Read3

       8    16.41    11.21    12.27    39.81    40.10    40.39
      16    11.79    21.10    34.05    40.18    40.19    40.33
 Cached
       8   137.47   156.43   285.59   580.73   598.66   587.97

 Random             Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.371    0.371    0.363     1.28     1.53     1.30

 200 File           Write                      Read             Delete
 File KB        4        8       16        4        8       16    secs
 MB/sec      3.49     6.41     8.26     7.67    11.68    17.51
 ms/file     1.17     1.28     1.98     0.53     0.70     0.94   0.014
Pi 4B performance was nearly as good as the compiled C version. However, there can be wide variations involving new Java versions. Here, the Pi 3B+ overall MWIPS rating was particularly slow, entirely due to the time taken by the sin,cos and exp,sqrt tests. Other than these, the Pi 4B was three to four times faster.
 ************************ Pi 3B+ ************************

 Whetstone Benchmark Java Version, May 14 2019, 15:02:11

                                                     1 Pass
  Test                  Result       MFLOPS     MOPS  millisecs

  N1 floating point  -1.124750137    215.20              0.0892
  N2 floating point  -1.131330490    208.76              0.6438
  N3 if then else     1.000000000             103.58     0.9992
  N4 fixed point     12.000000000             538.09     0.5854
  N5 sin,cos etc.     0.499110103               7.04    11.8100
  N6 floating point   0.999999821    106.22              5.0780
  N7 assignments      3.000000000             322.85     0.5724
  N8 exp,sqrt etc.    0.751108646               1.38    26.9200

  MWIPS                              214.14             46.6980

  Operating System    Linux, Arch. arm, Version 4.14.70-v7+
  Java Vendor         Oracle Corporation, Version 1.8.0_212

 ************************ Pi 4B ************************

 Whetstone Benchmark Java Version, May 14 2019, 14:16:44

                                                     1 Pass
  Test                  Result       MFLOPS     MOPS  millisecs

  N1 floating point  -1.124750137    503.94              0.0381
  N2 floating point  -1.131330490    488.37              0.2752
  N3 if then else     1.000000000             332.80     0.3110
  N4 fixed point     12.000000000             881.37     0.3574
  N5 sin,cos etc.     0.499110132              42.92     1.9384
  N6 floating point   0.999999821    345.77              1.5600
  N7 assignments      3.000000000             332.97     0.5550
  N8 exp,sqrt etc.    0.825148463              25.00     1.4880

  MWIPS                             1533.01              6.5231

  Operating System    Linux, Arch. arm, Version 4.19.29-v7l+
  Java Vendor         Oracle Corporation, Version 1.8.0_212

 ******************* Pi 4B OpenJDK11 *******************

 Whetstone Benchmark OpenJDK11 Java Version, May 15 2019, 18:48:20

                                                     1 Pass
  Test                  Result       MFLOPS     MOPS  millisecs

  N1 floating point  -1.124750137    524.02              0.0366
  N2 floating point  -1.131330490    494.12              0.2720
  N3 if then else     1.000000000             289.92     0.3570
  N4 fixed point     12.000000000            1092.99     0.2882
  N5 sin,cos etc.     0.499110132              59.86     1.3900
  N6 floating point   0.999999821    345.95              1.5592
  N7 assignments      3.000000000             331.54     0.5574
  N8 exp,sqrt etc.    0.825148463              25.41     1.4640

  MWIPS                             1687.92              5.9244

  Operating System    Linux, Arch. arm, Version 4.19.37-v7l+
  Java Vendor         BellSoft, Version 11.0.2-BellSoft
Pi 4B performance gains were best on the most complex test function.
A later version was produced and run via OpenJDK11.
 ************************ Pi 3B+ ************************

   Java Drawing Benchmark, May 14 2019, 15:32:06
            Produced by javac 1.6.0_27

  Test                              Frames      FPS

  Display PNG Bitmap Twice Pass 1      566    56.55
  Display PNG Bitmap Twice Pass 2      651    65.00
  Plus 2 SweepGradient Circles         665    66.45
  Plus 200 Random Small Circles        660    65.93
  Plus 320 Long Lines                  442    44.16
  Plus 4000 Random Small Circles       334    33.30

         Total Elapsed Time  60.1 seconds

  Operating System    Linux, Arch. arm, Version 4.14.70-v7+
  Java Vendor         Oracle Corporation, Version 1.8.0_212

 ************************ Pi 4B ************************

   Java Drawing Benchmark, May 14 2019, 14:33:58
            Produced by javac 1.7.0_02

  Test                              Frames      FPS

  Display PNG Bitmap Twice Pass 1      791    79.05
  Display PNG Bitmap Twice Pass 2      932    93.11
  Plus 2 SweepGradient Circles        1152   115.17
  Plus 200 Random Small Circles       1200   119.98
  Plus 320 Long Lines                  784    78.31
  Plus 4000 Random Small Circles       621    62.03

         Total Elapsed Time  60.1 seconds

  Operating System    Linux, Arch. arm, Version 4.19.29-v7l+
  Java Vendor         Oracle Corporation, Version 1.8.0_212

 ******************* Pi 4B OpenJDK11 *******************

   Java Drawing Benchmark, May 15 2019, 18:55:41
            Produced by OpenJDK 11 javac

  Test                              Frames      FPS

  Display PNG Bitmap Twice Pass 1      877    87.65
  Display PNG Bitmap Twice Pass 2     1042   104.18
  Plus 2 SweepGradient Circles        1015   101.47
  Plus 200 Random Small Circles        779    77.85
  Plus 320 Long Lines                  336    33.52
  Plus 4000 Random Small Circles        83     8.25

         Total Elapsed Time  60.1 seconds

  Operating System    Linux, Arch. arm, Version 4.19.37-v7l+
  Java Vendor         BellSoft, Version 11.0.2-BellSoft
After installing freeglut3, the benchmark ran as before. The benchmark measures graphics speed in terms of Frames Per Second (FPS) via six tests, ranging from simple to more complex. The first four tests portray moving up and down a tunnel including various independently moving objects, with and without texturing. The last two tests represent a real application for designing kitchens. The first is in wireframe format, drawn with 23,000 straight lines. The second has colours and textures applied to the surfaces.
As a benchmark, it was run using the following script file
 export vblank_mode=0
 ./videogl32 Width 320, Height 240, NoEnd
 ./videogl32 Width 640, Height 480, NoHeading, NoEnd
 ./videogl32 Width 1024, Height 768, NoHeading, NoEnd
 ./videogl32 NoHeading
 ************************ Pi 3B+ ************************

 GLUT OpenGL Benchmark 32 Bit Version 1, Fri Apr 12 22:21:35 2019

          Running Time Approximately 5 Seconds Each Test

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    343.8    208.3     88.4     56.6     24.3     15.5
   640   480    243.0    170.3     82.8     54.5     24.2     15.5
  1024   768    110.6    101.2     63.6     47.8     24.1     15.4
  1920  1080     49.5     47.3     36.8     32.9     23.4     14.9

 ************************ Pi 4B ************************

 GLUT OpenGL Benchmark 32 Bit Version 1, Thu May 2 19:01:05 2019

          Running Time Approximately 5 Seconds Each Test

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    766.7    371.4    230.6    130.2     32.5     22.7
   640   480    427.3    276.5    206.0    121.8     31.7     22.2
  1024   768    193.1    178.8    150.5    110.4     31.9     21.5
  1920  1080     81.4     79.4     74.6     68.3     30.8     20.0
 ************************ Pi 3B+ ************************

 MP-Integer-Test 32 Bit v1.0 Fri Jun 21 15:09:22 2019

      Benchmark 1, 2, 4, 8, 16 and 32 Threads

                   MB/second
                 KB     KB     MB            Same All
   Secs Thrds    16    160     16  Sumcheck Tests

    9.4    1   3497   3284   1813  00000000   Yes
    6.3    2   6994   6505   2123  FFFFFFFF   Yes
    5.6    4  13839  12528   1882  5A5A5A5A   Yes
    5.6    8  13723  13780   1872  AAAAAAAA   Yes
    5.6   16  13734  14049   1857  CCCCCCCC   Yes
    5.6   32  13499  13881   1879  0F0F0F0F   Yes

 ************************ Pi 4B ************************

 MP-Integer-Test 32 Bit v1.0 Fri Jun 21 15:39:57 2019

    4.9    1   5956   5754   3977  00000000   Yes
    3.6    2  11861  11429   3763  FFFFFFFF   Yes
    3.1    4  22998  21799   3464  5A5A5A5A   Yes
    3.1    8  22695  21128   3490  AAAAAAAA   Yes
    3.1   16  22835  23491   3485  CCCCCCCC   Yes
    3.0   32  22593  23485   3591  0F0F0F0F   Yes

          Average Gains Caches 1.68, RAM 1.91

 ************************ Pi 3B+ ************************

 MP-Threaded-MFLOPS 32 Bit v1.0 Fri Jun 21 15:10:28 2019

           Benchmark 1, 2, 4 and 8 Threads

                        MFLOPS            Numeric Results
              Ops/     KB     KB     MB     KB     KB     MB
   Secs  Thrd Word   12.8    128   12.8   12.8    128   12.8

    3.1   T1     2    857    849    414  40392  76406  99700
    5.5   T2     2   1661   1678    411  40392  76406  99700
    7.4   T4     2   3086   3336    413  40392  76406  99700
    9.4   T8     2   3194   3168    414  40392  76406  99700
   13.8   T1     8   1942   1935   1495  54756  85091  99820
   16.7   T2     8   3756   3824   1659  54756  85091  99820
   19.0   T4     8   7209   7528   1643  54756  85091  99820
   21.3   T8     8   6978   7341   1657  54756  85091  99820
   36.8   T1    32   2019   2050   1915  35296  66020  99519
   44.6   T2    32   4078   4031   3757  35296  66020  99519
   48.9   T4    32   7927   7910   6095  35296  66020  99519
   53.1   T8    32   7919   8141   6336  35296  66020  99519

 ************************ Pi 4B ************************

 MP-Threaded-MFLOPS 32 Bit v1.0 Sun May 26 21:23:49 2019

    1.6   T1     2   2134   2607    656  40392  76406  99700
    2.9   T2     2   5048   5156    621  40392  76406  99700
    4.0   T4     2   7536   9939    681  40392  76406  99700
    5.2   T8     2   7934   9839    639  40392  76406  99700
    7.2   T1     8   5535   5420   2569  54756  85091  99820
    8.7   T2     8  10757  10732   2454  54756  85091  99820
   10.1   T4     8  18108  20703   2444  54756  85091  99820
   11.5   T8     8  19236  20286   2245  54756  85091  99820
   17.4   T1    32   5309   5270   5262  35296  66020  99519
   20.4   T2    32  10551  10528   9753  35296  66020  99519
   22.4   T4    32  20120  20886  11064  35296  66020  99519
   24.5   T8    32  19415  20464   9929  35296  66020  99519

          Average Gains Caches 2.72, RAM 1.75

 ************************ Pi 3B+ ************************

 MP-Threaded-MFLOPS 32 Bit v1.0 Fri Jun 21 15:11:41 2019

   Double Precision Benchmark 1, 2, 4 and 8 Threads

                        MFLOPS            Numeric Results
              Ops/     KB     KB     MB     KB     KB     MB
   Secs  Thrd Word   12.8    128   12.8   12.8    128   12.8

    9.7   T1     2    215    213    173  40395  76384  99700
   15.9   T2     2    420    426    206  40395  76384  99700
   20.6   T4     2    819    830    205  40395  76384  99700
   25.3   T8     2    807    823    205  40395  76384  99700
   41.4   T1     8    508    502    437  54805  85108  99820
   49.8   T2     8   1002   1008    778  54805  85108  99820
   55.8   T4     8   1985   1955    768  54805  85108  99820
   61.6   T8     8   1974   1958    817  54805  85108  99820
  100.5   T1    32    799    794    775  35159  66065  99521
  120.1   T2    32   1595   1588   1533  35159  66065  99521
  130.5   T4    32   3115   3087   2731  35159  66065  99521
  140.7   T8    32   3154   3126   2821  35159  66065  99521

 ************************ Pi 4B ************************

 MP-Threaded-MFLOPS 32 Bit v1.0 Sun May 26 21:26:37 2019

   Double Precision Benchmark 1, 2, 4 and 8 Threads

    3.4   T1     2    921    998    326  40395  76384  99700
    6.1   T2     2   1968   1995    308  40395  76384  99700
    8.4   T4     2   3465   3925    342  40395  76384  99700
   10.9   T8     2   3646   3702    301  40395  76384  99700
   15.1   T1     8   2377   2446   1283  54805  85108  99820
   18.1   T2     8   4916   4860   1326  54805  85108  99820
   20.5   T4     8   9202   9510   1391  54805  85108  99820
   23.1   T8     8   9090   9006   1298  54805  85108  99820
   34.5   T1    32   2695   2725   2707  35159  66065  99521
   40.3   T2    32   5416   5441   5121  35159  66065  99521
   44.1   T4    32  10666  10831   5275  35159  66065  99521
   48.3   T8    32  10427  10602   4832  35159  66065  99521

          Average Gains Caches 4.23, RAM 2.09
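The average gains quoted for the MP-Integer-Test can be reproduced from the two sets of results. The Python sketch below assumes each average is the mean of the individual Pi 4B/Pi 3B+ ratios, with the 16 KB and 160 KB columns taken as the cache results and the 16 MB column as RAM. The names `PI_3B_PLUS`, `PI_4B` and `average_gain` are illustrative only.

```python
from statistics import mean

# MB/second for 1, 2, 4, 8, 16 and 32 threads: (16 KB, 160 KB, 16 MB)
PI_3B_PLUS = [
    (3497, 3284, 1813),
    (6994, 6505, 2123),
    (13839, 12528, 1882),
    (13723, 13780, 1872),
    (13734, 14049, 1857),
    (13499, 13881, 1879),
]
PI_4B = [
    (5956, 5754, 3977),
    (11861, 11429, 3763),
    (22998, 21799, 3464),
    (22695, 21128, 3490),
    (22835, 23491, 3485),
    (22593, 23485, 3591),
]


def average_gain(columns):
    """Mean Pi 4B / Pi 3B+ ratio over the given column indexes, all thread counts."""
    ratios = [
        new[c] / old[c]
        for old, new in zip(PI_3B_PLUS, PI_4B)
        for c in columns
    ]
    return round(mean(ratios), 2)


cache_gain = average_gain([0, 1])  # 16 KB and 160 KB columns
ram_gain = average_gain([2])       # 16 MB column

print(cache_gain, ram_gain)
```

On this assumption the calculation agrees with the Caches 1.68, RAM 1.91 figures shown for the integer test. The same approach should apply to the MFLOPS tables, using the three MFLOPS columns.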