Raspberry Pi 4B Stress Tests Including High Performance Linpack

Roy Longbottom


Contents


Introduction
Test Results


Summary

These stress tests are a continuation of activity also covered in Raspberry Pi 4B 32 Bit Benchmarks.htm

This report contains details and results of the programs used for stress testing the Raspberry Pi 4B. They cover multi core CPU integer and floating point tests with data covering caches and RAM, Input/Output exercisers for the main, USB and network connected drives, and graphics activity via OpenGL. The programs used, or new test functions, are ones that are known to have caused errors, system failures or overheating issues on earlier Raspberry Pi systems and PCs. Those reported here represent a small sample of the tests carried out.

When running the stress tests, the environment was monitored using system utilities and a program that measures CPU MHz, voltage and temperature. The specially written stress tests mainly provide, display and log average ongoing measurements of performance over sections of the testing period, which is more useful than a single report at the end.

An important observation is that the processor runs at full speed, 1500 MHz, until the temperature reaches 80°C, when throttling starts, firstly at 1000 MHz, then 750, 600 and lower. The first tests are intended to show that, using a single core, the CPU can run at full speed with temperatures below 75°C (room up to 25°C). This applied to a one hour OpenGL session and a five minute integer stress test, whereas with four cores throttling started after less than a minute.

4 Core Integer Tests - These were run for 15 minutes without any cooling, with a copper heatsink, using an official Power Over Ethernet Hat/fan, and with an inexpensive case/fan. The first started throttling after a minute, reaching 85°C, finally running at 56% of initial speed. The second was slightly better at 63%. Both with fans ran at full speed, at up to 61°C and 66°C respectively.

4 Core Floating Point Tests - These included runs with no cooling and using the case/fan enclosure, covering L1 cache and L2 cache based data, with the same thermal behaviour as the integer tests. The single precision version ran continuously at more than 20 GFLOPS with the fan, reducing slowly to 10.9 GFLOPS with no cooling. The double precision version obtained up to 10.7 GFLOPS, down to 5.2 GFLOPS.

High Performance Linpack - This double precision benchmark was included as it led to wrong results and all sorts of other failures on the original Pi 3. The version using ATLAS, with the alternative Basic Linear Algebra Subprograms, was built on the Pi 4 and run with increasing data array sizes up to 20000 x 20000 words, occupying over 3 GB. All runs completed successfully, with and without cooling. At the largest size, the fan cooled setup obtained 10.8 GFLOPS at up to 71°C in just over 8 minutes. The one without cooling took over 14 minutes, down to 6.2 GFLOPS, reaching 87°C.

Livermore Loops and OpenGL Benchmarks - Three copies of the former were run along with the most demanding test function from the latter at 1920 x 1080 pixels. The CPU program executes 24 different double precision calculations using three different data sizes for a minimum of nearly 15 minutes. With the case/fan combination all ran at near full speed at an average of 60°C. There was an overall reduction in performance of up to 40% with no cooling, with temperature up to 85°C. The latter was repeated using dual monitors, effectively at 3840 x 1080 pixels, with all programs running somewhat slower.

Input/Output Stress Tests - The main I/O program writes four 164 MB files comprising numerous different hexadecimal data patterns, reads them on a random selection basis for a specified time, then repetitively reads each 64 KB data block numerous times. Three copies were run for nearly 15 minutes, accessing the main drive, a USB 3 stick and a remote PC via a 1 Gbps LAN, all at the same time, along with a copy of the 4 threaded integer testing program, with no cooling attachment. There was the usual throttling at temperatures up to 86°C, with the CPU test starting at 58% of possible maximum, reducing to 44%. The LAN test appeared to run continuously at over 32 MB/second, the main drive at 85% of maximum expectation and the USB 3 drive slower, at down to 64%. No data comparison failures were detected in handling all that data.

Introduction below or Go To Start


Introduction

My original Raspberry Pi Stress Tests were reported in Raspberry Pi 32 Bit and 64 Bit Benchmarks and Stress Tests.htm (September 2017, for up to the Pi 3B), then Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and stress tests.htm (September 2018), and later Raspberry Pi 3B and 3B+ High Performance Linpack and Error Tests.htm (March 2019), the last covering High Performance Linpack errors.

Two new versions of CPU stress tests were introduced, one using single precision floating point multiply and add calculations and the other with integer add and subtract instructions. Stress tests comprised running multiple copies of the programs, in different terminal windows, along with another program that measures CPU MHz, voltage and temperature. The latter tests were carried out following earlier reports that the Linpack High Performance Benchmark could produce wrong numeric results, or cause a system crash, on the older Raspberry Pi 3B. The stress tests could reproduce the same sort of failures on the Pi 3B but, as with HPL, not on the Pi 3B+.

Analysing results of the earlier tests was complicated when the Operating System did not assign resources evenly between the programs, and the multiple program approach would become more restrictive with CPUs containing more than four cores.

On producing benchmarks to run on Android based systems, the multiple program approach to stress testing was not really applicable. So, I produced Android App versions as single programs using multiple threads. I have now converted these to run on Raspberry Pi systems. Examples of Android tests are in android benchmarks.htm. These include MP behaviour and performance of more advanced ARM processors, up to an 8 core Cortex-A73.

The new Raspberry Pi versions are initially available as 32 bit programs, comprising tests using single precision floating point, double precision floating point and integer calculations. For these programs, each thread uses dedicated segments of memory based data. There is also a slightly modified MHz/volts/temperature application.

The older OpenGL program is also considered here. All these have run time options to select a particular test function and running time. The CPU tests can be run in a benchmarking mode, the default without any command line parameters, to indicate which function is most appropriate and, at run time, the data size can be specified.

These stress tests are a continuation of activity covered in Raspberry Pi 4B 32 Bit Benchmarks.htm with programs and source codes in Raspberry-Pi-4-Benchmarks.tar.gz that also includes those used for this exercise.

Integer test next or Go To Start


Integer Stress Test - MP-IntStress

The integer program test loop comprises 32 add or subtract instructions, operating on hexadecimal data patterns, with sequences of 8 subtracts then 8 adds to restore the original pattern. The benchmarking mode uses 1, 2, 4, 8, 16 and 32 threads, with data sizes of 16 KB, 160 KB and 16 MB. Below is the log file from running the 32 bit benchmark on a Raspberry Pi 4B via Raspbian Buster. Disassembly shows that the test loop in fact used 68 instructions, most of the additional ones being load register type, giving 68/32 instructions per 4 byte word. At the maximum of 1489M words per second, using a single core, the resultant execution speed was 3164 MIPS, with nearly four times more using all cores.

 MP-Integer-Test 32 Bit v1.0 Fri Jun 21 15:39:57 2019
 Benchmark 1, 2, 4, 8, 16 and 32 Threads
                 MB/second
 Secs Thrds 16KB 160KB 16MB Sumcheck Same All Tests

 4.9 1 5956 5754 3977 00000000 Yes
 3.6 2 11861 11429 3763 FFFFFFFF Yes
 3.1 4 22998 21799 3464 5A5A5A5A Yes
 3.1 8 22695 21128 3490 AAAAAAAA Yes
 3.1 16 22835 23491 3485 CCCCCCCC Yes
 3.0 32 22593 23485 3591 0F0F0F0F Yes
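 
 The loop described above can be illustrated with a minimal C sketch. This is not the actual benchmark source; the constant is hypothetical and the real program is arranged so that the compiler cannot optimise the balanced sequence away.

 /* Minimal sketch of the 32 operation integer loop - two sequences
    of 8 subtracts then 8 adds leave each word unchanged, so the
    hexadecimal pattern is preserved for the sumcheck. */
 static void int_pass(unsigned int *data, int words)
 {
     const unsigned int k = 0x01010101;      /* hypothetical constant */
     for (int i = 0; i < words; i++)
     {
         unsigned int x = data[i];
         x -= k; x -= k; x -= k; x -= k;     /* 8 subtracts */
         x -= k; x -= k; x -= k; x -= k;
         x += k; x += k; x += k; x += k;     /* 8 adds restore pattern */
         x += k; x += k; x += k; x += k;
         x -= k; x -= k; x -= k; x -= k;     /* repeated for 32 in all */
         x -= k; x -= k; x -= k; x -= k;
         x += k; x += k; x += k; x += k;
         x += k; x += k; x += k; x += k;
         data[i] = x;
     }
 }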
 

Stress Testing Mode

The following shows the run time command and available parameters.

./MP_IntStress Threads tt, Minutes mm, KB kk, Log ll 
tt = 1, 2, 4, 8, 16, 32 
mm = greater than 0 
kk = between 12 and 15624 
ll = number added to log file name between 0 and 99 
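 
For example, a 15 minute run with 1280 KB data and 8 threads, as used in the stress tests later, would be started with a command of the form (log file number arbitrary):

 ./MP_IntStress Threads 8, Minutes 15, KB 1280, Log 1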
 
Floating Point Stress Tests below or Go To Start


Floating Point Stress Tests - MP-FPUStress, MP-FPUStressDP

The floating point programs use functions containing 2, 8 or 32 multiply and add operations, to exploit the availability of instructions that can fuse them together for increased performance. The benchmark uses data sizes of 12.8 KB, 128 KB and 12.8 MB with calculations via 1, 2, 4 and 8 threads. Each word is initialised with the same value of 0.99999 that calculations slowly reduce, the final one being multiplied by 100000 for a sumcheck. Each word is then checked to confirm that all results are identical.
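
The calculation and checking can be illustrated with a minimal C sketch of the 2 operations-per-word case. The multiply and add constants here are assumptions, chosen so that results reduce slowly from the initial 0.99999, and the compiler can fuse the pair into a single FMA instruction.

 static void fp_pass2(float *x, int words, float xmult, float xadd)
 {
     for (int i = 0; i < words; i++)
         x[i] = x[i] * xmult + xadd;        /* fusable multiply and add */
 }

 /* Sumcheck - every word saw identical calculations, so the first
    result, multiplied by 100000, must match all the others. */
 static int fp_check(const float *x, int words)
 {
     long check = (long)(x[0] * 100000.0f);
     for (int i = 1; i < words; i++)
         if ((long)(x[i] * 100000.0f) != check)
             return 0;                      /* error detected */
     return 1;
 }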

Results are provided below, showing that sumchecks vary by data size and operations per word, due to variations in the number of calculations, but are constant across different thread counts, as the same calculations are executed.

Disassembly shows that the Double Precision (DP) compilation produced fused multiply and add instructions, such as vfma.f64 d16, d25, operating on one DP word per register. Information available indicates that this is the best possible performance option, producing two operation results per clock cycle, or 3.0 GFLOPS per core in this case. Single Precision (SP) code was vfma.f32 q8, q2, q13, with 4 words in quad registers, where eight results per cycle might be expected, or 12 GFLOPS per core. Actual maximum SP speeds look as though they could be about half of that.


 MP-Threaded-MFLOPS 32 Bit v1.0 Sun May 26 21:23:49 2019
 Benchmark 1, 2, 4 and 8 Threads
                     ------- MFLOPS -------   --- Numeric Results ---
 Secs Thrd Ops/Word  12.8KB  128KB  12.8MB    12.8KB  128KB  12.8MB

 1.6 T1 2 2134 2607 656 40392 76406 99700
 2.9 T2 2 5048 5156 621 40392 76406 99700
 4.0 T4 2 7536 9939 681 40392 76406 99700
 5.2 T8 2 7934 9839 639 40392 76406 99700
 7.2 T1 8 5535 5420 2569 54756 85091 99820
 8.7 T2 8 10757 10732 2454 54756 85091 99820
 10.1 T4 8 18108 20703 2444 54756 85091 99820
 11.5 T8 8 19236 20286 2245 54756 85091 99820
 17.4 T1 32 5309 5270 5262 35296 66020 99519
 20.4 T2 32 10551 10528 9753 35296 66020 99519
 22.4 T4 32 20120 20886 11064 35296 66020 99519
 24.5 T8 32 19415 20464 9929 35296 66020 99519

 MP-Threaded-MFLOPS 32 Bit v1.0 Sun May 26 21:26:37 2019
 Double Precision Benchmark 1, 2, 4 and 8 Threads
                     ------- MFLOPS -------   --- Numeric Results ---
 Secs Thrd Ops/Word  12.8KB  128KB  12.8MB    12.8KB  128KB  12.8MB

 3.4 T1 2 921 998 326 40395 76384 99700
 6.1 T2 2 1968 1995 308 40395 76384 99700
 8.4 T4 2 3465 3925 342 40395 76384 99700
 10.9 T8 2 3646 3702 301 40395 76384 99700
 15.1 T1 8 2377 2446 1283 54805 85108 99820
 18.1 T2 8 4916 4860 1326 54805 85108 99820
 20.5 T4 8 9202 9510 1391 54805 85108 99820
 23.1 T8 8 9090 9006 1298 54805 85108 99820
 34.5 T1 32 2695 2725 2707 35159 66065 99521
 40.3 T2 32 5416 5441 5121 35159 66065 99521
 44.1 T4 32 10666 10831 5275 35159 66065 99521
 48.3 T8 32 10427 10602 4832 35159 66065 99521
 

Stress Testing Mode

The following shows the run time command and available parameters.

./MP_FPUStress Threads tt, Minutes mm, KB kk, Ops oo, Log ll
or ./MP_FPUStressDP 
tt = 1, 2, 4, 8, 16, 32, 64 
mm = greater than 0 
kk = between 12 and 15624 
oo = 2, 8 or 32 operations per word 
ll = number added to log file name between 0 and 99 
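 
For example, the later double precision stress tests used 8 threads, 1280 KB and 8 operations per word, for which the command would be of the form (log file number arbitrary):

 ./MP_FPUStressDP Threads 8, Minutes 15, KB 1280, Ops 8, Log 1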
 
Environment Monitors below or Go To Start


Environment Monitors - RPiHeatMHzVolts2 vmstat, sar

A new version of RPiHeatMHzVolts2 was produced to incorporate the temperature of the Power Measurement Integrated Circuit (PMIC). The following shows the run time command and available parameters for the program, and an example of logged output. Note that the details are instantaneous samples. This is fine for temperature measurements, which change relatively slowly, but when CPU temperature reaches a critical level, 80°C in this case, MHz throttling comes into play, and CPU MHz can step down and up quite rapidly. My CPU stress test programs repetitively report average performance over a number of seconds, carrying out the same calculations, providing a better indication of the amount of throttling.

 ./RPiHeatMHzVolts2 Passes pp, Seconds ss, Log ll
 pp = number of passes at ss intervals
 ss = sampling intervals 
 ll = number added to log file name between 0 and 99 

 Temperature and CPU MHz Measurement

 Temperature and CPU MHz Measurement
 Start at Sun Jun 30 14:53:16 2019
 Using 11 samples at 30 second intervals
 Seconds
 0.0 ARM MHz=1500, core volt=0.8912V, CPU temp=60.0'C, pmic temp=54.3'C
 30.0 ARM MHz=1500, core volt=0.8859V, CPU temp=74.0'C, pmic temp=62.8'C
 60.7 ARM MHz=1500, core volt=0.8859V, CPU temp=78.0'C, pmic temp=68.4'C
 91.3 ARM MHz=1500, core volt=0.8859V, CPU temp=82.0'C, pmic temp=70.3'C
 122.0 ARM MHz=1500, core volt=0.8859V, CPU temp=81.0'C, pmic temp=70.3'C
 152.8 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=70.3'C
 183.5 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=70.3'C
 214.4 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=72.2'C
 245.1 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=72.2'C
 276.0 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=72.2'C
 306.9 ARM MHz=1000, core volt=0.8859V, CPU temp=81.0'C, pmic temp=71.2'C
 337.6 ARM MHz=1500, core volt=0.8859V, CPU temp=71.0'C, pmic temp=65.6'C
 End at Sun Jun 30 14:58:54 2019
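 
The kind of readings logged above can be obtained from standard Raspberry Pi OS sysfs files. Below is a minimal C sketch of one sample, for illustration only - the actual RPiHeatMHzVolts2 source is in the tar.gz file mentioned earlier, and the core voltage and PMIC temperature, which come from the firmware, are not shown.

 #include <stdio.h>

 int main(void)
 {
     long milli_c = 0, khz = 0;
     FILE *f;

     /* CPU temperature in thousandths of a degree C */
     f = fopen("/sys/class/thermal/thermal_zone0/temp", "r");
     if (f) { fscanf(f, "%ld", &milli_c); fclose(f); }

     /* current ARM clock in kHz */
     f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
     if (f) { fscanf(f, "%ld", &khz); fclose(f); }

     printf("ARM MHz=%ld, CPU temp=%.1f'C\n", khz / 1000, milli_c / 1000.0);
     return 0;
 }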
 
vmstat - This is used when running stress tests, to indicate system utilisation and to confirm speeds measured by the tests. The main columns used are free memory, I/O bytes in and out, and user plus system CPU utilisation, where 25% is equivalent to 100% of one core.
pi@raspberrypi:~ $ vmstat 10 6 - for 6 measurements at 10 second intervals
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 3654628 20884 196956 0 0 26 1 136 223 3 1 96 0 0
1 0 0 3613900 20888 213328 0 0 0 6 1249 2143 13 2 85 0 0
1 0 0 3612044 20904 214660 0 0 0 12 991 1650 24 3 73 0 0
1 0 0 3609776 20904 216944 0 0 0 3 935 1556 25 2 73 0 0
1 0 0 3604040 20912 222448 0 0 0 12 1025 1653 25 3 73 0 0
1 0 0 3602588 20920 224852 0 0 0 6 946 1548 25 2 73 0 0
sar -n DEV - This utility can be used to measure network traffic after installing Sysstat.
sar -n DEV 30 25 > sar.txt - for 25 measurements over 30 second periods
IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Example Write
wlan0 1190.20 2527.47 65.99 3744.17 0.00 0.00 2.93 0.00
Example Read
wlan0 2340.90 1059.03 3378.11 98.89 0.00 0.00 1.60 0.00
OpenGL Stress Tests below or Go To Start


OpenGL Stress Tests - videogl32

The OpenGL benchmark can also be run as a stress test. As a benchmark, it has six tests, the first four portraying movement up and down a tunnel containing various independently moving objects, with and without texturing. The last two tests represent a real application for designing kitchens, the first in wireframe format, drawn with 23,000 straight lines, the second having colours and textures applied to the surfaces.

The program has options to specify window sizes and to avoid excessive logging, for use in a script file, as in the example below. Starting with export vblank_mode=0 turns off VSYNC, identifying where speeds greater than 60 FPS are possible. Following is a script file and a sample Pi 4 log. Default running time is 5 seconds for each test, and full screen where no sizes are specified. The time can be changed by adding, for example, Seconds 20 to the commands.

 export vblank_mode=0 
 ./videogl32 Width 320, Height 240, NoEnd 
 ./videogl32 Width 640, Height 480, NoHeading, NoEnd 
 ./videogl32 Width 1024, Height 768, NoHeading, NoEnd 
 ./videogl32 NoHeading 
###################################################################

 GLUT OpenGL Benchmark 32 Bit Version 1, Thu May 2 19:01:05 2019
 Running Time Approximately 5 Seconds Each Test
 Window Size Coloured Objects Textured Objects WireFrm Texture
 Pixels Few All Few All Kitchen Kitchen
 Wide High FPS FPS FPS FPS FPS FPS

 320 240 766.7 371.4 230.6 130.2 32.5 22.7
 640 480 427.3 276.5 206.0 121.8 31.7 22.2
 1024 768 193.1 178.8 150.5 110.4 31.9 21.5
 1920 1080 81.4 79.4 74.6 68.3 30.8 20.0
 

Stress Tests

It is more appropriate to produce a script file to run stress tests and to include that export function. The run command needs a minutes parameter and an optional test number, the default being Test 4 (for example, ./videogl32 Minutes 60 would run Test 4 for an hour on a full screen).

Below is an indication of CPU utilisation during the six tests. This is followed by results of a short stress test, where average speed over each 30 seconds is reported.


 GLUT OpenGL Benchmark 32 Bit Version 1, Mon Jul 1 16:10:02 2019
 Running Time Approximately 5 Seconds Each Test
 Window Size Coloured Objects Textured Objects WireFrm Texture
 Pixels Few All Few All Kitchen Kitchen
 Wide High FPS FPS FPS FPS FPS FPS

 1920 1080 57.3 56.2 53.4 49.9 30.7 19.9
 1 Core CPU %     20    28    40    68   104   100
 ###################################################################
 Run Commands - export vblank_mode=0
 ./videogl32 Test 4, Mins 1, Log 7

 OpenGL Reliability Test 32 Bit Version 1, Wed Jul 3 17:28:02 2019

 Display 1920 x 1080 All Objects, With Textures, Test for 1 minutes
 Test 4 All Objects, With Textures, 30 seconds, 47 FPS
 Test 4 All Objects, With Textures, 30 seconds, 46 FPS
 End at Wed Jul 3 17:29:03 2019
 
Livermore Loops Stress below or Go To Start


Livermore Loops Stress Test - liverloopsPiA7R

The Livermore Loops benchmark was converted to act as a stress test, following wrong numeric results being produced on an overclocked PC using a Pentium Pro CPU. The Loops comprise 24 double precision floating point kernels, with performance measured in Millions of Floating Point Operations Per Second (MFLOPS). The kernel tests are repeated three times, with different data sizes. Extending the running time of each loop converts the benchmark into a stress test, whereby numeric results of calculations are checked for correctness after each of the numerous passes, with errors being logged, along with performance details. Detailed results are displayed continuously as the tests are running. There is too much detail for logging, so, as shown below, the start times of each section are reported.
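
The checking principle can be shown with a minimal C sketch (names are assumptions, not the benchmark source). Each kernel repeats identical calculations, so a pass result must match the expected value exactly, and any difference is logged rather than stopping the test.

 static int check_loop(int loop, double result, double expected, FILE *log)
 {
     if (result != expected)     /* identical passes must match exactly */
     {
         fprintf(log, "Loop %2d ERROR was %.17e expected %.17e\n",
                 loop, result, expected);
         return 1;
     }
     return 0;
 }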

Below is an example command to run each test for approximately 12 seconds and save the results in LoopsLog1.txt. Total time should be around 24 x 3 x 12 = 864 seconds, or longer with CPU MHz throttling. This is followed by an example of results for a short run.

 Run command - ./liverloopsPiA7R Seconds 12 Log 1
 #####################################################

 Livermore Loops Benchmark vfpv4 32 Bit via C/C++ Wed Jul 3 15:11:50 2019

 Reliability test 2 seconds each loop x 24 x 3
 Part 1 of 3 start at Wed Jul 3 15:11:50 2019
 Part 2 of 3 start at Wed Jul 3 15:12:38 2019
 Part 3 of 3 start at Wed Jul 3 15:13:27 2019
 Numeric results were as expected
 MFLOPS for 24 loops
 745.8 955.8 988.7 942.6 209.0 769.8 1194.1 1792.5 1254.6 447.9 213.2 186.3
 150.7 349.9 778.3 623.3 734.2 1035.4 322.9 350.0 435.8 352.9 746.1 187.3
 Overall Ratings
 Maximum Average Geomean Harmean Minimum
 1793.5 641.2 520.2 412.7 140.3
 End of test Wed Jul 3 15:14:16 2019
 
Input/Output Stress Test below or Go To Start


Input/Output Stress Test - burnindrive2

This is essentially the same as my program used during hundreds of UK Government and University computer acceptance trials during the 1970s and 1980s, with some significant achievements. Burnindrive writes four files, using 164 blocks of 64 KB repeated 16 times (164.0 MB each), with each block containing a unique data pattern. The files are then read for two minutes in a pseudo-random sequence, with data and file identity checked for correct values. Then each block (unique pattern) is read numerous times over one second, again with checking for correct values. Total time is normally about 5 minutes for all tests, with default parameters. The data patterns are shown below, followed by run time parameters, then examples of results (see later detailed results).

Patterns
 No. Hex No. Hex No. Hex No. Hex No. Hex No. Hex No. Hex

 1 0 25 800000 49 3 73 FF 97 FFFFDFFF 121 FFFFEAAA 145 FFFFF0F0
 2 1 26 1000000 50 33 74 FF00FF 98 FFFFBFFF 122 FFFFAAAA 146 FFF0F0F0
 3 2 27 2000000 51 333 75 1FF 99 FFFF7FFF 123 FFFEAAAA 147 F0F0F0F0
 4 4 28 4000000 52 3333 76 3FF 100 FFFEFFFF 124 FFFAAAAA 148 FFFFFFE0
 5 8 29 8000000 53 33333 77 7FF 101 FFFDFFFF 125 FFEAAAAA 149 FFFF83E0
 6 10 30 10000000 54 333333 78 FFF 102 FFFBFFFF 126 FFAAAAAA 150 FE0F83E0
 7 20 31 20000000 55 3333333 79 1FFF 103 FFF7FFFF 127 FEAAAAAA 151 FFFFFFC0
 8 40 32 40000000 56 33333333 80 3FFF 104 FFEFFFFF 128 FAAAAAAA 152 FFFC0FC0
 9 80 33 1 57 7 81 7FFF 105 FFDFFFFF 129 EAAAAAAA 153 FFFFFF80
 10 100 34 5 58 1C7 82 FFFF 106 FFBFFFFF 130 AAAAAAAA 154 FFE03F80
 11 200 35 15 59 71C7 83 FFFFFFFF 107 FF7FFFFF 131 FFFFFFFC 155 FFFFFF00
 12 400 36 55 60 1C71C7 84 FFFFFFFE 108 FEFFFFFF 132 FFFFFFCC 156 FF00FF00
 13 800 37 155 61 71C71C7 85 FFFFFFFD 109 FDFFFFFF 133 FFFFFCCC 157 FFFFFE00
 14 1000 38 555 62 F 86 FFFFFFFB 110 FBFFFFFF 134 FFFFCCCC 158 FFFFFC00
 15 2000 39 1555 63 F0F 87 FFFFFFF7 111 F7FFFFFF 135 FFFCCCCC 159 FFFFF800
 16 4000 40 5555 64 F0F0F 88 FFFFFFEF 112 EFFFFFFF 136 FFCCCCCC 160 FFFFF000
 17 8000 41 15555 65 F0F0F0F 89 FFFFFFDF 113 DFFFFFFF 137 FCCCCCCC 161 FFFFE000
 18 10000 42 55555 66 1F 90 FFFFFFBF 114 BFFFFFFF 138 CCCCCCCC 162 FFFFC000
 19 20000 43 155555 67 7C1F 91 FFFFFF7F 115 FFFFFFFE 139 FFFFFFF8 163 FFFF8000
 20 40000 44 555555 68 1F07C1F 92 FFFFFEFF 116 FFFFFFFA 140 FFFFFE38 164 FFFF0000
 21 80000 45 1555555 69 3F 93 FFFFFDFF 117 FFFFFFEA 141 FFFF8E38
 22 100000 46 5555555 70 3F03F 94 FFFFFBFF 118 FFFFFFAA 142 FFE38E38
 23 200000 47 15555555 71 7F 95 FFFFF7FF 119 FFFFFEAA 143 F8E38E38
 24 400000 48 55555555 72 1FC07F 96 FFFFEFFF 120 FFFFFAAA 144 FFFFFFF0

 Sequences - First 16
 No. File No. File No. File No. File

 1 0 1 2 3 5 0 2 1 3 9 0 3 1 2 13 0 1 2 3
 2 1 2 3 0 6 1 3 2 0 10 1 0 3 2 14 1 2 3 0
 3 2 3 0 1 7 2 0 1 3 11 2 1 0 3 15 2 3 0 1
 4 3 0 2 1 8 3 1 2 0 12 3 2 1 0 16 3 0 2 1
 ###########################################################################

Run Time Parameters - Upper or Lower Case
                                                            Default
 R or Repeats            Data size, multiplier of 10.25 MB, more or less  16
 P or Patterns           Number of patterns for smaller files < 164       164
 M or Minutes            Large file reading time                          2
 L or Log                Log file name extension 0 to 99                  0
 S or Seconds            Time to read each block, last section            1
 F or FilePath           For other than SD card, or SD card directory
 C or CacheData          Omit O_DIRECT on opening files to allow caching  No
 O or OutputPatterns     Log patterns and file sequences used as above    No
 D or DontRunReadTests   Or only run write tests                          No

 Format ./burnindrive2 Repeats 16, Minutes 2, Log 0, Seconds 1
 or     ./burnindrive2 R 16, M 2, L 0, S 1
 ###########################################################################
Examples of Results

 File 1 164.00 MB written in 12.79 seconds 
 File 2 164.00 MB written in 11.93 seconds 
 Read passes 1 x 4 Files x 164.00 MB in 0.31 minutes
 Read passes 2 x 4 Files x 164.00 MB in 0.63 minutes
 Passes in 1 second(s) for each of 164 blocks of 64KB:
 580 580 580 580 580 580 580 580 580 580 580
 580 580 580 580 580 580 580 580 580 580 580
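 
The verification principle can be shown with a minimal C sketch - not the actual burnindrive2 source, where blocks also carry file and block identifiers. Each word of a 64 KB block read back is compared with the pattern written, and failures are logged.

 #define BLOCK_WORDS (65536 / 4)    /* 64 KB block as 4 byte words */

 static int check_block(const unsigned int *block, unsigned int pattern,
                        int file_id, int block_no, FILE *log)
 {
     for (int i = 0; i < BLOCK_WORDS; i++)
     {
         if (block[i] != pattern)
         {
             fprintf(log, "File %d block %d word %d was %08X expected %08X\n",
                     file_id, block_no, i, block[i], pattern);
             return 1;
         }
     }
     return 0;
 }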
 
High Performance Linpack Benchmark below or Go To Start


High Performance Linpack Benchmark - xhpl

It was previously found that a precompiled version of High Performance Linpack (HPL) could produce wrong and inconsistent numeric results, and also system crashes. For more information see Raspberry Pi 3B and 3B+ High Performance Linpack and Error Tests.htm. That report includes the behaviour of another version, compiled to use ATLAS, the alternative Basic Linear Algebra Subprograms. This took 14 hours to build, and was slower than the earlier one, but still produced the same failures. As indicated in the report, my stress tests could be arranged to produce similar problems. There were no sumcheck failures or system crashes using the Pi 3B+.

The original precompiled version would not run on the Pi 4, but I rebuilt ATLAS on the new system, this time taking 8 hours. An example of the output for a quick test is shown below:


HPLinpack 2.2 -- High-Performance Linpack benchmark -- February 24, 2016
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 1000 
NB : 128 
PMAP : Row-major process mapping
P : 2 
Q : 2 
PFACT : Right 
NBMIN : 4 
NDIV : 2 
RFACT : Crout 
BCAST : 1ringM 
DEPTH : 1 
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
 ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 1000 128 2 2 0.17 4.048e+00
HPL_pdgesv() start time Sun May 26 08:44:56 2019
HPL_pdgesv() end time Sun May 26 08:44:56 2019
================================================================================
 
Unstressed Tests below or Go To Start


Unstressed Tests

It is quite easy to produce programs that run at high speeds on all cores of a modern computer, be it a PC, tablet, phone or a small board system like the Raspberry Pi. These programs are likely to lead to increased CPU temperatures. Given insufficient cooling arrangements, the systems are likely to continuously reduce CPU MHz (throttling) in order to continue operation, and eventually power down. Before examining the results of stress testing, it is useful to consider what can be run without throttling occurring, in this case, on a Raspberry Pi 4, without any cooling.


Single Core and Multi-Core CPU Tests

Below are various results from running five minute MP-Integer-Tests on a Raspberry Pi 4B, out of the case, with no cooling attachment. As indicated earlier, ongoing speed measurements by a benchmark provide a better understanding of behaviour than samples of CPU MHz, which can vary rapidly.

Starting and ending recorded temperatures are shown, along with the time when 80°C was reached, if at all, at which point throttling starts. The first column is for a run using a single thread, where CPU MHz, and effectively measured speeds, were constant over the whole period. The second column provides details when using four threads, with data in L1 caches. The next two made use of data in L2 cache, starting throttling after one minute, worse than the L1 results, but starting at a higher temperature. The last column provides results when data was in RAM, running at full speed for over four and a half minutes.


 MB/second
 Cache/RAM L2 L1 L2 L2 RAM

 KB 512 64 640 1536 15624
 Threads 1 4 4 8 4
 Start °C 62 60 62 64 61
 10 5718 23631 22628 20177 3445
 20 5717 23603 22634 18329 3443
 30 5640 23416 22670 18756 3405
 40 5735 23613 22045 17737 3440
 50 5740 23618 22636 18456 3444
 60 5652 23244 22069 19059 3410
 70 5707 23483 19864 17648 3437
 80 5736 23360 18639 16017 3445
 90 5683 21552 17986 16654 3447
 100 5695 20867 17383 14864 3395
 110 5719 20218 16475 14805 3437
 120 5672 19017 16207 15128 3443
 130 5727 18871 15165 13328 3401
 140 5735 18888 14773 12638 3437
 150 5732 18460 14979 12780 3443
 160 5677 17799 14780 13086 3440
 170 5719 17976 14313 13221 3404
 180 5711 18005 14391 12618 3443
 190 5650 17745 14018 12185 3440
 200 5738 17312 14120 13267 3397
 210 5709 17241 14062 11916 3442
 220 5678 17124 14004 11866 3441
 230 5719 17392 13467 12018 3397
 240 5720 16990 13728 11825 3440
 250 5651 17289 13372 12011 3434
 260 5714 17135 13683 11596 3442
 270 5717 16891 13584 11481 3398
 280 5657 16505 13055 11781 3442
 290 5725 17049 13396 11550 3445
 300 5713 16578 12957 11666 3402
 Max 5740 23631 22670 20177 3447
 Min 5640 16505 12957 11481 3395
 %Min/Max 98 70 57 57 98
 Max °C 72 82 84 85 80
 Secs to 80°C N/A 90 60 60 280
 
OpenGL Test below or Go To Start


OpenGL Test No Cooling

Earlier, I connected the Pi 4 system to BBC iPlayer, via WiFi, and displayed programmes for more than two hours on a full screen 1920x1080 display (not a hot day). With CPU utilisation around 100% of one core, maximum temperature was 70°C, with the CPU at 1500 MHz all the time.

For this exercise, I ran the OpenGL Textured Kitchen test for an hour, with a full screen display (hotter day than above). Following is a summary of results recorded by the program, the environmental monitor and vmstat. The program ran at 22 FPS over the whole period, with CPU at a constant 1500 MHz, recording slightly more than 100% utilisation of one core, with maximum temperature reaching 73°C.


 ------ Monitors ------ --------- vmstat --------- Videogl32
 Seconds MHz Volts CPU°C PMIC°C free User System Idle FPS

 0 1500 0.8894 61 54 3589900 0 0 100
 120 1500 0.8841 69 59 3523336 25 2 73 22
 240 1500 0.8841 71 62 3520464 25 2 73 22
 360 1500 0.8841 71 63 3522848 25 2 73 22
 480 1500 0.8841 73 63 3522292 25 2 73 22
 600 1500 0.8841 72 63 3522284 25 2 73 22
 720 1500 0.8841 72 63 3521780 24 2 74 22
 840 1500 0.8841 73 63 3520640 25 2 73 22
 960 1500 0.8841 72 63 3520884 25 2 73 22
 1080 1500 0.8841 72 63 3520140 25 2 73 22
 1200 1500 0.8841 73 63 3519864 24 2 73 22
 1320 1500 0.8841 73 63 3519892 25 2 73 22
 1440 1500 0.8841 73 63 3519892 25 2 73 22
 1560 1500 0.8841 73 63 3518880 25 2 73 22
 1680 1500 0.8841 72 63 3519264 25 2 73 22
 1800 1500 0.8841 73 63 3517976 25 2 73 22
 1920 1500 0.8841 73 63 3518616 25 2 73 22
 2040 1500 0.8841 72 63 3517984 25 2 73 22
 2160 1500 0.8841 72 63 3518604 24 2 73 22
 2280 1500 0.8841 73 63 3518496 25 2 73 22
 2400 1500 0.8841 73 63 3518868 25 2 73 22
 2520 1500 0.8841 72 63 3518488 25 2 73 22
 2640 1500 0.8841 73 63 3518212 25 2 73 22
 2760 1500 0.8841 73 63 3520008 25 2 73 22
 2880 1500 0.8841 73 63 3519756 25 2 73 22
 3000 1500 0.8841 73 63 3516752 25 3 72 22
 3120 1500 0.8841 73 63 3518132 25 2 73 22
 3240 1500 0.8841 73 63 3518132 25 2 73 22
 3360 1500 0.8841 73 63 3517620 24 2 73 22
 3480 1500 0.8841 73 63 3517428 25 2 73 22
 3600 1500 0.8841 73 63 3517656 25 2 73 22
 
Integer Stress Tests below or Go To Start


Integer Stress Tests - MP-IntStress

The following are results of 15 minute stress tests, using 1280 KB data and 8 threads. The total data is greater than the L2 cache but, with only four threads executing at a time on the four cores, the active data was cache resident. The test then ran at full speed, with additional swapping of cached data.

Four tests were carried out with no added cooling on a bare board, fitted with a copper heatsink, then with the official, and expensive, Power Over Ethernet fan and, finally, using an inexpensive case with a fitted fan (GeeekPi Acrylic Case). The changing CPU MHz measurements show that throttling is occurring but, with coarse sampling, they do not reflect real performance, unlike the MB/second details.

With no cooling, throttling started after a minute, reaching 85°C to 86°C, slowly reducing performance to almost half speed. The copper heatsink produced a small improvement. During the two tests where fans were used, the processor ran continuously at 1500 MHz, with throughput effectively constant in MB/second terms. The POE fan appeared to be slightly more efficient.


 No Cooling Copper Heatsink Official POE Hat Case With Fan
 
Seconds MB/sec MHz °C MB/sec MHz °C MB/sec MHz °C MB/sec MHz °C

 0 1500 60 1500 60 1500 47 1500 41
 20 21651 1500 73 21381 1500 71 21770 1500 56 22018 1500 54
 40 21892 1500 79 20517 1500 74 21767 1500 57 21979 1500 56
 60 20919 1500 81 21407 1500 77 22234 1500 57 22076 1500 58
 80 17174 1000 81 21153 1500 79 22035 1500 58 22248 1500 60
 100 15643 1000 81 20960 1500 81 21920 1500 59 22153 1500 61
 120 15163 1000 82 18967 1500 82 22184 1500 60 22239 1500 63
 140 14756 1000 81 16828 1000 81 21941 1500 60 22037 1500 64
 160 14491 1000 83 15892 1500 83 21863 1500 60 22231 1500 65
 180 14492 1000 83 16157 1000 82 21753 1500 60 22130 1500 64
 200 14283 1000 84 15039 1000 82 21921 1500 60 22050 1500 65
 220 14386 1000 83 15438 1000 82 21656 1500 60 22210 1500 66
 240 14101 1000 83 14905 1000 82 21908 1500 60 22132 1500 65
 260 13574 1000 84 14597 1000 83 21983 1500 60 22298 1500 65
 280 13763 1000 83 14703 1000 83 21701 1500 60 22031 1500 66
 300 13179 1000 84 14519 1000 82 21857 1500 60 22285 1500 65
 320 13566 1000 84 14204 1000 84 21791 1500 60 22009 1500 65
 340 13368 750 84 14139 750 83 21468 1500 60 22101 1500 65
 360 13530 1000 84 14249 1000 84 22162 1500 60 22166 1500 65
 380 13190 1000 85 14457 1000 82 21819 1500 61 22163 1500 66
 400 13215 1000 84 14395 1000 83 21800 1500 60 22243 1500 65
 420 13021 750 85 14365 1000 83 22083 1500 61 22115 1500 64
 440 13127 1000 84 14214 1000 83 21780 1500 60 22172 1500 64
 460 12933 1000 85 14152 1000 83 21902 1500 60 22138 1500 64
 480 12658 1000 85 14090 1000 84 21964 1500 60 22220 1500 64
 500 12981 750 83 14199 1000 84 22026 1500 61 22061 1500 65
 520 12699 1000 85 14005 1000 83 21661 1500 61 22027 1500 64
 540 12622 1000 84 13987 1000 84 21684 1500 60 22281 1500 65
 560 12761 1000 84 14222 1000 84 22071 1500 59 22097 1500 64
 580 13408 1000 84 13845 1000 84 21728 1500 58 22225 1500 64
 600 13878 1000 85 13945 1000 84 21981 1500 59 22091 1500 62
 620 13893 1000 83 13877 1000 84 21704 1500 58 22203 1500 62
 640 13717 1000 86 13844 1000 84 21935 1500 58 22133 1500 62
 660 13321 1000 85 13774 1000 83 21816 1500 61 22075 1500 62
 680 13154 1000 85 13500 1000 83 21827 1500 61 22229 1500 63
 700 12663 1000 85 13926 1000 83 21995 1500 60 22007 1500 63
 720 12504 1000 85 13722 1000 83 22004 1500 60 22279 1500 64
 740 12501 750 85 13778 750 84 21954 1500 60 22020 1500 65
 760 12227 1000 85 13564 1000 83 21848 1500 60 22270 1500 65
 780 12199 750 85 13755 1000 82 21840 1500 61 22129 1500 65
 800 12505 1000 85 13451 1500 82 22137 1500 59 22175 1500 64
 820 12268 750 85 13587 1000 83 21876 1500 60 22210 1500 64
 840 12322 1500 85 13610 1000 82 21685 1500 61 22041 1500 65
 860 12312 1500 85 14411 1500 82 22077 1500 61 22192 1500 65
 880 12306 1500 85 14380 1500 83 21842 1500 61 22109 1500 65
 900 12305 1500 85 14345 1500 83 21883 1500 61 22199 1500 65
 Max 21892 86 21407 84 22234 61 22298 66
 Min 12199 750 13451 750 21468 1500 21979 1500
%Min/Max 56 63 97 99
 
Floating Point Stress Tests below or Go To Start


Single Precision Floating Point Stress Tests - MP-FPUStress

The table below covers the first 10 minutes of tests on the three cooling configurations. This time, the rather meaningless variations in recorded CPU MHz are not included. Again the tests used 1280 KB data (320K words) and 8 threads, with 8 floating point operations per word. Maximum temperatures and associated performance degradations were similar to those during the integer tests.

The following graphs provide a more meaningful indication of the effects of the adequate cooling that is needed for this kind of CPU utilisation (confirmed during running by vmstat as 100% of four cores).


 No Cooling Copper HS Case+Fan
Seconds °C GFLOPS °C GFLOPS °C GFLOPS

 0 61 59 40
 20 76 19.2 73 19.6 55 20.7
 40 81 19.0 78 19.4 61 20.3
 60 82 17.8 80 19.6 62 20.2
 80 83 15.5 82 17.2 64 20.7
 100 84 15.0 82 15.6 65 20.2
 120 83 14.0 82 14.5 66 20.3
 140 84 13.3 81 13.9 65 20.3
 160 84 13.3 83 13.9 66 20.7
 180 86 12.9 83 13.5 67 20.3
 200 85 13.0 83 13.6 67 20.3
 220 84 12.8 84 13.4 66 20.4
 240 84 12.6 83 13.3 67 20.6
 260 83 12.6 84 13.3 67 20.3
 280 85 12.2 84 13.3 67 20.4
 300 84 12.1 83 13.0 67 20.3
 320 85 12.0 84 13.0 67 20.8
 340 84 11.6 85 12.8 67 20.3
 360 85 11.6 84 13.0 67 20.2
 380 85 11.3 83 12.7 67 20.7
 400 85 11.6 84 12.8 67 20.5
 420 84 11.6 84 12.5 68 20.2
 440 85 11.5 84 12.7 67 20.4
 460 84 11.5 85 12.6 67 20.4
 480 85 11.5 84 12.3 66 20.2
 500 84 11.1 85 12.4 67 20.3
 520 85 11.3 83 12.4 67 20.2
 540 84 11.4 85 12.4 68 20.5
 560 84 11.3 84 12.3 67 20.2
 580 85 11.3 83 12.3 67 20.4
 600 85 11.3 84 12.3 67 20.2
 900 85 10.9 84 12.2 67 20.3
 Max 19.2 19.6 20.8
 Min 10.9 12.2 20.3
%Min/Max 57 62 98
 
 [Graph: GFLOPS against running time for the three cooling configurations]
Double Precision Floating Point Stress Tests below or Go To Start


Double Precision Floating Point Stress Tests - MP-FPUStressDP

Four sets of results are below, again excluding the CPU MHz figures, but including PMIC temperatures. They are without and with the case/fan, using 8 threads, one pair with 1280 KB data at 8 operations per word, and the other with 128 KB at 32 operations per word.

The second one runs at a higher speed and lower temperature, using data in L1 caches, compared with the other via L2 cache. Maximum temperature and performance degradation of the latter were similar to the earlier examples.


 1280 KB, 8 Threads, 8 Ops/Word 128 KB, 8 Threads, 32 Ops/Word
 No Fan CPU PMIC Fan CPU PMIC No Fan CPU PMIC Fan CPU PMIC
 Seconds GFLOPS °C °C GFLOPS °C °C GFLOPS °C °C GFLOPS °C °C

 0 48 42.0 45 42.0 54 47.7 39 35.4
 20 9.3 64 55.2 9.1 61 55.2 10.7 70 57.1 10.7 39 35.4
 40 9.2 73 62.8 9.0 65 59.0 10.6 73 61.8 10.7 53 43.9
 60 9.2 79 68.4 9.1 67 61.8 10.7 75 64.6 10.6 56 48.6
 80 8.8 80 70.3 9.3 66 62.8 10.7 78 67.5 10.6 57 50.5
 100 7.8 81 70.3 9.1 67 62.8 10.7 80 69.4 10.7 58 51.4
 120 7.2 82 70.3 9.2 67 62.8 10.1 82 70.3 10.7 59 53.3
 140 6.8 82 70.3 9.3 67 62.8 9.5 81 70.3 10.7 59 53.3
 160 6.5 82 70.3 9.1 68 62.8 9.1 80 70.3 10.6 59 53.3
 180 6.3 82 70.3 9.1 68 62.8 8.7 82 70.3 10.7 60 53.3
 200 6.1 81 70.3 9.3 68 64.6 8.5 81 70.3 10.7 59 54.3
 220 6.2 82 70.3 9.1 69 62.8 8.5 82 70.3 10.7 59 54.3
 240 6.2 83 72.2 9.1 68 62.8 8.3 81 70.3 10.6 60 54.3
 260 6.1 83 72.2 9.3 68 62.8 8.3 81 70.3 10.7 59 54.3
 280 6.1 84 72.2 9.1 67 64.6 8.0 83 70.3 10.7 61 54.3
 300 6.1 83 70.3 9.1 68 64.6 8.0 81 70.3 10.6 60 54.3
 320 6.0 84 72.2 9.1 68 64.6 7.9 82 70.3 10.7 61 54.3
 340 5.9 85 72.2 9.2 68 64.6 7.6 82 71.2 10.8 61 53.3
 360 5.8 85 72.2 9.1 68 62.8 7.7 82 70.3 10.7 60 54.3
 380 5.8 84 72.2 9.2 68 64.6 7.8 83 70.3 10.6 60 54.3
 400 5.7 84 72.2 9.1 68 62.8 7.7 83 70.3 10.6 61 54.3
 420 5.7 84 72.2 9.2 68 62.8 7.7 82 70.3 10.6 60 54.3
 440 5.6 84 72.2 9.1 68 64.6 7.6 82 70.3 10.7 60 54.3
 460 5.7 84 72.2 9.1 68 62.8 7.6 83 70.3 10.6 61 54.3
 480 5.6 84 72.2 9.1 69 64.6 7.5 82 70.3 10.7 60 54.3
 500 5.6 84 72.2 9.1 69 62.8 7.5 82 71.2 10.6 60 54.3
 520 5.5 85 72.2 9.1 68 62.8 7.4 81 70.3 10.7 60 54.3
 540 5.5 84 74.1 9.3 67 64.6 7.4 82 70.3 10.7 60 54.3
 560 5.5 84 72.2 9.1 69 62.8 7.4 82 70.3 10.8 59 54.3
 580 5.4 84 74.1 9.1 67 64.6 7.3 82 70.3 10.7 60 55.2
 600 5.5 84 74.1 9.2 68 62.8 7.3 81 70.3 10.7 60 54.3
 620 5.4 85 74.1 9.2 68 62.8 7.3 82 70.3 10.6 61 54.3
 640 5.4 84 74.1 9.2 69 62.8 7.3 83 70.3 10.6 62 55.2
 660 5.4 85 74.1 9.3 68 62.8 7.3 83 70.3 10.7 60 54.3
 680 5.5 85 72.2 9.0 67 62.8 7.3 83 70.3 10.7 60 54.3
 700 5.4 85 74.1 9.1 69 62.8 7.3 81 70.3 10.7 60 54.3
 720 5.4 85 72.2 9.2 68 64.6 7.3 84 70.3 10.7 60 54.3
 740 5.4 84 72.2 9.1 68 62.8 7.3 82 70.3 10.7 60 55.2
 760 5.3 85 74.1 9.1 68 62.8 7.3 81 70.3 10.7 60 54.3
 780 5.4 85 74.1 9.3 67 62.8 7.3 83 70.3 10.7 59 54.3
 800 5.4 84 74.1 9.1 69 64.6 7.3 81 70.3 10.7 60 54.3
 820 5.3 85 72.2 9.1 68 62.8 7.3 82 70.3 10.7 60 54.3
 840 5.3 84 72.2 9.2 68 62.8 7.2 82 70.3 10.7 60 54.3
 860 5.2 85 74.1 9.1 69 64.6 7.2 81 70.3 10.6 60 54.3
 880 5.2 85 74.1 9.1 68 62.8 7.2 82 70.3 10.6 60 54.3
 900 5.3 84 74.1 9.1 68 62.8 7.2 81 70.3 10.6 60 54.3
 Max 9.3 85 74.1 9.3 69 64.6 10.7 84 71.2 10.8 62 55.2
 Min 5.2 9.0 7.2 10.6
%Min/Max 57 97 67 98
 
High Performance Linpack below or Go To Start


High Performance Linpack Tests - xhpl

Parameter sizes (as set in HPL.dat) were the same as in the introductory description, except for the one for data size (N). The programs were run on a bare board Pi 4 and one in the inexpensive case with a fan. No data errors or system freezes/crashes were encountered over these and many more runs.
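
For reference, the following is a sketch of the HPL.dat layout, in the standard HPL input file format, matching the parameter values reported in the introductory description; only the Ns line was changed for the larger runs.

 HPLinpack benchmark input file
 Innovative Computing Laboratory, University of Tennessee
 HPL.out      output file name (if any)
 6            device out (6=stdout,7=stderr,file)
 1            # of problems sizes (N)
 1000         Ns
 1            # of NBs
 128          NBs
 0            PMAP process mapping (0=Row-,1=Column-major)
 1            # of process grids (P x Q)
 2            Ps
 2            Qs
 16.0         threshold
 1            # of panel fact
 2            PFACTs (0=left, 1=Crout, 2=Right)
 1            # of recursive stopping criterium
 4            NBMINs (>= 1)
 1            # of panels in recursion
 2            NDIVs
 1            # of recursive panel fact.
 1            RFACTs (0=left, 1=Crout, 2=Right)
 1            # of broadcast
 1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
 1            # of lookahead depth
 1            DEPTHs (>=0)
 2            SWAP (0=bin-exch,1=long,2=mix)
 64           swapping threshold
 0            L1 in (0=transposed,1=no-transposed) form
 0            U  in (0=transposed,1=no-transposed) form
 1            Equilibration (1=yes,0=no)
 8            memory alignment in double (> 0)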

Following is a summary of four tests on each of the test beds. The bare board arrangement performs relatively well for short duration tests, but the long ones are needed to demonstrate maximum performance. That was 10.8 Double Precision GFLOPS, similar to my MP-FPUStressDP program, and the relative efficiency of the uncooled processor, at 58%, was also similar. As it should be, the sumchecks of hot and cold systems were identical at a given data size.

Assuming similarity with the original scalar Linpack benchmark, data size would be N x N x 8 bytes for double precision operation, or 3.2 GB at N = 20000, as approximately confirmed by the vmstat memory details provided below. The latter also indicate that the four core CPU utilisation was 100%.

Below the table is a graph of the worst case uncooled scenario, demonstrating CPU MHz throttling and temperature (°C times 10), based on samples every 10 seconds.


Cooling N Seconds GFLOPS SumCheck Max °C Av MHz

None 4000 5.7 7.4 0.002398 71 1500
Fan 4000 5.2 8.2 0.002398 54 1500
None 8000 39.9 8.6 0.001675 81 1500
Fan 8000 36.7 9.3 0.001675 61 1500
None 16000 404.3 6.8 0.001126 86 919
Fan 16000 263.0 10.4 0.001126 70 1500
None 20000 856.0 6.2 0.001019 87 828
Fan 20000 494.3 10.8 0.001019 71 1500
%None/Fan 20000 58 58 Same 55

procs -----------memory---------- ---swap-- -----io---- -system- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st

0 0 0 3510712 30172 276440 0 0 17 1 90 111 16 1 83 0 0
4 0 0 3097880 30180 277088 0 0 0 6 526 515 52 3 45 0 0
4 0 0 2357404 30188 276492 0 0 0 6 620 344 95 5 0 0 0
4 0 0 1615192 30196 276976 0 0 0 11 586 289 95 5 0 0 0
5 0 0 871872 30204 271032 0 0 0 5 490 75 96 4 0 0 0
4 0 768 282692 26828 241092 0 34 20 40 604 307 95 4 0 0 0
4 0 768 276088 26968 250344 6 0 118 12 591 288 99 1 0 0 0
 
[Graph: HPL at N = 20000, no cooling - CPU MHz and temperature (°C x 10) from 10 second samples]

Livermore loops/OpenGL Tests below or Go To Start


Livermore Loops/OpenGL Tests

Three copies of the Livermore Loops stress test were run along with the OpenGL Textured Kitchen section, on a Pi 4 without any cooling, then in the case with a fan. The former program was arranged to have a nominal duration of 864 seconds (72 x 12). When running, the CPU load is continuously changing, and that is reflected in the ongoing temperature and OpenGL Frames Per Second. The tests make use of six terminal windows and a full screen display, run by the commands shown below. This is followed by the results.

With no cooling, there were the usual increases in temperature and performance degradation, but not as severe as in some of the earlier tests. With cooling, performance was effectively constant. Averages at the end reflect the differences. There were no reports of errors or any signs of system failure.

Dual Monitors - The benchmarks, with no cooling, were repeated using two monitors, providing a screen area of 3840 x 1080 pixels, the results being included below. Performance was only between 7% and 15% slower than the single monitor example. Benchmark results of all OpenGL tests are provided at the end of the table, showing that those more dependent on graphics speed were affected by the number of pixels displayed.

Run Commands
Terminal 1
vmstat 10 100

Terminal 2 script file
lxterminal -e ./RPiHeatMHzVolts2 Passes 120 Seconds 10 Log 20
lxterminal -e ./liverloopsPiA7R Seconds 12 Log 20
lxterminal -e ./liverloopsPiA7R Seconds 12 Log 21
lxterminal -e ./liverloopsPiA7R Seconds 12 Log 22

Terminal 3
./videogl32 Test 6, Mins 16, Log 20
 Dual Monitors
 No Cooling Case + Fan No Cooling
Seconds MHz °C FPS MHz °C FPS MHz °C FPS

 0 1500 64 1500 42 1500 69
 30 1000 82 19 1500 57 20 1000 82 13
 60 1000 82 16 1500 62 21 750 84 13
 90 1500 83 15 1500 66 20 1000 83 12
 120 750 85 13 1500 64 21 1000 85 11
 150 1000 84 13 1500 62 20 600 84 10
 180 1000 83 14 1500 60 22 750 85 10
 210 1000 84 15 1500 62 21 1000 85 12
 240 1000 83 14 1500 61 19 750 84 12
 270 1000 84 14 1500 63 21 1000 85 11
 300 1000 84 14 1500 61 21 750 84 12
 330 750 84 14 1500 64 21 1000 85 12
 360 1000 82 14 1500 64 21 750 84 11
 390 1000 83 12 1500 66 21 750 84 12
 420 1000 84 13 1500 63 21 750 84 12
 450 1000 84 14 1500 62 20 750 85 11
 480 750 84 12 1500 63 21 750 85 12
 510 750 85 13 1500 61 21 1000 84 12
 540 750 84 11 1500 59 21 750 84 11
 570 1000 84 12 1500 62 21 1000 85 11
 600 1000 84 14 1500 62 22 750 83 10
 630 1000 84 13 1500 66 19 750 84 11
 660 750 84 14 1500 60 21 750 85 12
 690 750 86 13 1500 65 21 1000 85 12
 720 1000 84 13 1500 63 21 600 83 11
 750 1000 83 13 1500 62 21 1000 84 12
 780 750 84 12 1500 61 21 1000 85 11
 810 750 85 12 1500 62 21 1000 84 11
 840 1000 85 12 1500 58 21 750 86 10
 870 750 85 12 1500 58 21 750 85 11
 900 1000 84 13 1500 54 21 1000 85 10
 930 1000 85 13 1500 50 21 1000 85 11
 960 1000 84 13 1500 49 21 750 85 11
 990 1000 85 14 1500 45 21 750 85 12
Average 956 83 13 1500 60 21 866 84 11
%Fan 64 139 64
MFLOPS 916 1502 854
%Fan 61

 OpenGL Benchmark Single and Dual Monitors
 Window Size Coloured Objects Textured Objects WireFrm Texture
 Pixels Few All Few All Kitchen Kitchen
 Wide High FPS FPS FPS FPS FPS FPS

 1920 1080 58.2 56.7 54.5 49.9 31.0 20.7
 3840 1080 27.9 26.5 26.0 25.2 25.7 16.3
 
Input/Output Stress Tests below or Go To Start


Input/Output Stress Tests - burnindrive2

For this test, three copies of burnindrive2 were run, accessing the main drive, a USB 3 stick and a remote PC via a 1 Gbps LAN, along with MP-IntStress using four threads. The environment was monitored using RPiHeatMHzVolts2, vmstat for drive activity and CPU MHz, and sar -n DEV for network traffic. Commands used and results are provided below. Stress tests are generally based on executing a fixed set of activities, where completion times can vary. Hence, the provided results are extrapolated approximations, with drive speeds being the average for a particular activity.

All stress tests ran to completion without detecting any errors. CPU utilisation was around 90% of four cores, but CPU throttling still occurred, with temperatures up to 86°C (and possibly not enough throttling). Performance measured by the stress tests was broadly in line with the system vmstat and sar measurements. In order to indicate which activity suffered the most degradation, performance of standalone runs is also provided. It seems that LAN traffic was given a higher priority, with no speed reduction, followed by the main SD drive. Worst was the CPU bound program, probably suffering from a lower priority besides throttling.


 ------ MB/second ------ 
 Secs Main USB 3 1Gbps MP-Int MHz °C
 Drive Drive LAN Stress 

 0 1500 55
 30 11.9 38.0 42.3 13116 1500 66
 60 11.9 44.1 32.8 13063 1500 73
 90 28.1 44.1 32.8 13615 1500 75
 120 28.1 44.1 32.8 13734 1500 81
 150 28.1 44.1 32.8 13370 1500 83
 180 28.1 44.1 32.8 13555 1000 82
 210 28.1 44.1 32.8 13285 1000 82
 240 28.1 44.1 32.8 13194 1000 82
 270 28.1 44.1 32.8 13022 1000 83
 300 28.1 44.1 32.8 13316 1000 82
 330 28.1 44.1 32.8 13615 1000 82
 360 28.1 44.1 32.8 13677 1000 84
 390 28.1 44.1 32.8 13315 1000 83
 420 28.1 44.1 32.8 13273 1000 82
 450 28.1 44.1 32.8 13117 1000 83
 480 28.1 44.1 32.8 12860 1000 83
 510 28.1 44.1 32.8 12370 1000 83
 540 28.1 44.1 32.8 11863 1000 84
 570 28.1 44.1 32.8 11550 1000 84
 600 28.1 44.1 32.8 11312 1000 82
 630 28.1 44.1 32.7 10895 1000 83
 660 28.1 54.0 32.7 10696 1000 83
 690 29.7 54.0 32.7 10479 1000 84
 720 29.7 54.0 32.7 10223 750 84
 750 29.7 54.0 32.7 10227 1000 85
 780 29.7 54.0 32.7 10413 750 84
 810 29.7 54.0 10090 750 86
 840 29.7 9952 1000 84
 Stand Alone
 Max 33.4 68.6 32.3 22664

vmstat
procs -----------memory---------- --swap-- -----io---- -system- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
Start
 6 2 0 3499820 45700 271552 0 0 12409 32193 16450 13425 54 24 20 2 0
 2 2 0 3503956 45776 264632 0 0 46811 12381 27174 16714 68 23 3 5 0
 4 2 0 3506080 45816 264348 0 0 76271 248 25885 16188 64 22 7 7 0
Read 1
 5 2 0 3502984 45992 264844 0 0 75473 5 18777 14118 67 24 3 6 0
 5 2 0 3504888 46032 264884 0 0 74726 7 18907 14631 66 25 4 5 0
Read 2
 6 2 0 3503236 46544 265452 0 0 86628 7 17180 15114 62 28 4 6 0
 4 2 0 3501964 46592 265452 0 0 80815 6 15395 14321 68 28 2 2 0

 Ethernet Read sar -n DEV 
 rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil

 24841.37 6883.90 36206.23 505.50 0.00 0.00 0.03 29.66
 
Go To Start
