Raspberry Pi 4 CPU MHz Throttling Performance Effects

Roy Longbottom


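Contents

Summary
Introduction
Video Playback
OpenGL Benchmark
Main Drive Benchmark
LAN Benchmark
LAN and CPU Benchmarks
Copying Pi 4 USB 3 Files To Windows PC Via LAN
Core Utilisation Variations
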
Summary

On running stress tests on a Pi 4 without a cooling fan attached, CPU temperature can increase, leading to clock speed throttling in stages, normally between 1500, 1000, 750 and 600 MHz. In turn, this leads to slower performance, proportional to the clock speed reduction, for programs limited by processor speed. The following series of tests was run at the two extremes of 1500 and 600 MHz, via Raspbian, measuring performance while monitoring CPU MHz, voltage and temperature.

Video Playback - These tests were run using BBC iPlayer with data transfers via LAN. Unlike with a WiFi connection, no buffering was indicated at either MHz setting but, at 600 MHz, picture resolution (pixel dimensions) was lower when viewing complex images, though the same with plain backgrounds.

OpenGL Benchmark - Performance at 600 MHz was the same or worse, depending on whether graphics or CPU speed was the limiting factor.

Main Drive Benchmark - Writing and reading large files, average data transfer speed was around 6% faster at the higher MHz setting.

LAN Benchmark - Again transferring large files, as for the drive benchmark, but with greater CPU utilisation. Gigabit speeds were demonstrated at the higher MHz, some 25% faster than at 600 MHz.

LAN Plus CPU Benchmarks - Running the same LAN benchmark alongside a single-threaded processor test, network speeds were much the same as before, but CPU benchmark performance was proportional to the MHz setting.

Copying Files From Pi 4 USB 3 Drive Via LAN To Windows PC - Transferring 1.1 GB files at three quarters of gigabit speed at 1500 MHz, data transfers were 70% faster than at 600 MHz, where CPU time was particularly important.

Remember - The measurements of performance at 600 MHz represent the extreme deviations from unthrottled operation, unlikely to be seen in most environments, running the applications considered here.


Introduction

On running stress tests on a Pi 4 without a cooling fan attached, CPU temperature can increase, leading to clock speed throttling in stages, normally between 1500, 1000, 750 and 600 MHz. In turn, this leads to slower performance, proportional to the clock speed reduction, for programs limited by processor speed.

I decided that it would be useful to obtain some idea of the effects on other activities that have different workload profiles. The first problem was to find a way of running continuously at a constant low speed. Initially, I used the uncontrollable hair dryer treatment, where the CPU clock was throttled down to 429 MHz at 88°C, with the remarkable Pi 4 continuing its processing activity.

Fortunately, I found that setting the frequency scaling governor to powersave resulted in a constant 600 MHz. Using this, along with the performance setting for 1500 MHz, I ran the following tests at both frequencies to determine speed or throughput changes. These were run in conjunction with the bcmstat performance monitor, particularly to identify CPU utilisation of individual cores.
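As a minimal sketch, assuming the standard Linux cpufreq sysfs files (scaling_governor and scaling_cur_freq under /sys/devices/system/cpu/cpu0/cpufreq) and root access, the governor can be switched and the resulting clock read back from Python. The helper names are hypothetical, not part of bcmstat or Raspbian, and the assumption that cpu0's policy covers all four Pi 4 cores should be checked on the system in use.

```python
from pathlib import Path

# Assumed location of the cpufreq policy; on the Pi 4 all four cores are
# expected to share this policy, but check related_cpus on your system.
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def khz_to_mhz(text: str) -> int:
    """scaling_cur_freq holds the clock in kHz as text, e.g. '600000'."""
    return int(text.strip()) // 1000


def set_governor(name: str, base: Path = CPUFREQ) -> None:
    """Select a governor such as 'powersave' or 'performance' (needs root)."""
    (base / "scaling_governor").write_text(name + "\n")


def current_mhz(base: Path = CPUFREQ) -> int:
    """Read the current ARM clock in MHz."""
    return khz_to_mhz((base / "scaling_cur_freq").read_text())
```

On the setup described above, set_governor("powersave") would be expected to leave current_mhz() reporting 600, and set_governor("performance") 1500.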

Following are examples of the main CPU details reported, with an added average for CPU 0 to 3, which is the same as total CPU utilisation. Then %idle = 100 - %total. A complication is that adding the percentages in the first seven columns, excluding %idle, does not always give %total.


  %user  %nice   %sys  %idle  %iowt   %irq %s/irq %total   cpu0   cpu1   cpu2   cpu3 av 0to3

   1.86      0  11.00  58.97  12.56      0   3.75  41.03  60.87  10.61  34.25  58.29    41.01
   0.96      0   2.65  73.89  23.11      0      0  26.11   1.80    100   1.80   0.84    26.11
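The relationships described above can be checked with a few lines of Python using the two sample rows. This is only an illustration; the function names are hypothetical, and treating %total as exactly 100 - %idle is the assumption stated in the text.

```python
def core_average(cores):
    """Mean of the per-core figures, i.e. the added 'av 0to3' column."""
    return sum(cores) / len(cores)


def total_from_idle(idle):
    """bcmstat %total, taken as 100 - %idle."""
    return 100 - idle


def non_idle_sum(user, nice, sys_, iowt, irq, sirq):
    """Sum of the first seven columns excluding %idle."""
    return user + nice + sys_ + iowt + irq + sirq


# First sample row from the table above.
row = dict(user=1.86, nice=0, sys_=11.00, iowt=12.56, irq=0, sirq=3.75)
print("total from idle:", round(total_from_idle(58.97), 2))
print("core average:   ", round(core_average([60.87, 10.61, 34.25, 58.29]), 2))
print("non-idle sum:   ", round(non_idle_sum(**row), 2))
```

For the first row, %idle and the core average both agree with %total to within rounding (about 41.0), while the non-idle columns add up to only 29.17, which is the complication mentioned.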

During the tests, CPU temperature, MHz and voltage were also noted, the temperature not increasing that much and the others remaining at constant values.
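For anyone wanting to log temperature alongside such tests, here is a sketch assuming the generic Linux thermal interface, where /sys/class/thermal/thermal_zone0/temp holds the SoC temperature in millidegrees Celsius. The zone number is an assumption to verify, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed sensor file: the generic Linux thermal zone 0, which on the Pi
# is normally the SoC sensor; verify the zone's 'type' file before use.
THERMAL = Path("/sys/class/thermal/thermal_zone0/temp")


def millideg_to_c(text: str) -> float:
    """The kernel reports millidegrees Celsius, e.g. '88000' -> 88.0."""
    return int(text.strip()) / 1000


def soc_temp_c(path: Path = THERMAL) -> float:
    """Read the current temperature in degrees Celsius."""
    return millideg_to_c(path.read_text())
```

Calling soc_temp_c() in a loop with time.sleep(1) gives a one-second log comparable with bcmstat's sampling period.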

Note - The measurements of performance at 600 MHz represent the extreme deviations from unthrottled operation, unlikely to be seen in most environments, running the applications considered here.



Video Playback

When using BBC iPlayer with a LAN connection, and displaying a TV programme with complex images (lions and grass), the player indicated a data transfer speed of 3900 kbps and an image size of 960 x 540, at a CPU frequency of 1500 MHz. Here, CPU utilisation of all cores approached 50%, and the received bytes per second (RX B/s), once converted to bits, was consistent with the indicated kbps.

My main HD TV played the complex programme at 1920 x 1080 pixels, but reverted to 960 x 540 with input from the Pi 4.

Inferior quality was indicated at 600 MHz, at 1700 kbps and size 704 x 396, with nearly double the CPU utilisation. The performance statistics were somewhat strange, in that the measured data receiving speed (RX B/s) was about twice that at 1500 MHz. Was it errors causing retransmission?

Another programme, with a snow background, appeared to run at the same quality at 1500 and 600 MHz. I think that this illustrates the claim that the same performance can be obtained at a lower clock speed. In this case, the performance requirements would be the same frames per second and image quality. Then, with slower instruction speeds, the execution time per frame still needs to be less than the frame time.
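The condition in the last sentence can be written as a one-line test. This is a sketch under two stated assumptions: CPU time per frame scales inversely with clock speed, and the work per frame is treated as one sequential chunk. The function name and the example figures are hypothetical.

```python
def fits_frame_budget(cpu_ms_per_frame: float, fps: float,
                      mhz_high: float = 1500, mhz_low: float = 600) -> bool:
    """Scale per-frame CPU time measured at mhz_high to mhz_low, assuming
    time is inversely proportional to clock, and compare with frame time."""
    frame_ms = 1000 / fps
    scaled_ms = cpu_ms_per_frame * mhz_high / mhz_low
    return scaled_ms <= frame_ms


# Hypothetical figures: at 25 FPS each frame lasts 40 ms.
print(fits_frame_budget(12, 25))  # 12 ms becomes 30 ms at 600 MHz
print(fits_frame_budget(20, 25))  # 20 ms becomes 50 ms at 600 MHz
```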


 Average Values from bcmstat

  ARM     -- Bytes Per Second --
  MHz       RX B/s       TX B/s  %user   %sys  %idle  %iowt %s/irq %total  cpu0  cpu1  cpu2  cpu3

 1500      463,603        8,933     29     14     53      0      0     47    52    46    46    45
  600      921,810       11,166     48     27     19      0      0     81    83    77    81    82
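To compare the bcmstat RX B/s figures in the table above with the kbps shown by the player, bytes must be converted to bits. A minimal sketch, assuming the player's kbps means 1000 bits per second:

```python
def bytes_to_kbps(bytes_per_s: float) -> float:
    """Bytes per second to kilobits per second (1 kbit taken as 1000 bits)."""
    return bytes_per_s * 8 / 1000


# (ARM MHz, bcmstat RX B/s, rate reported by the player in kbps)
samples = [(1500, 463_603, 3900), (600, 921_810, 1700)]

for mhz, rx, player_kbps in samples:
    print(f"{mhz} MHz: measured {bytes_to_kbps(rx):.0f} kbps, "
          f"player {player_kbps} kbps")
```

At 1500 MHz this gives about 3709 kbps, close to the 3900 reported, whereas at 600 MHz it gives about 7374 kbps against 1700, which is the oddity noted above.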
 


OpenGL Benchmark

Below are frames per second (FPS) speeds measured by the benchmark, with VSYNC disabled to avoid clamping the maximum display rate at 60 FPS. The results at this window size indicate effectively the same performance at 1500 and 600 MHz for the first four test functions, but 1500 MHz more than twice as fast for the more complex kitchen displays.

At first sight, the total utilisation figures can suggest the opposite effects, being higher at 600 MHz for the first batch and similar for the others. In fact, the benchmark program has no built-in multithreading, so most of the processing time is spent on a single core, though not the same one over a period. Most of the value obtained by multiplying %total by 4 therefore represents utilisation of that core. With the first tests, the time to display the images is far greater than this CPU time, so graphics speed is the limiting factor. Then, with the CPU-limited kitchen displays, both configurations were effectively running at 100% CPU utilisation (of one core), leading to the frame time being much longer on the 600 MHz setup.


       Window Size   Coloured Objects  Textured Objects   WireFrm  Texture
  CPU    Pixels          Few      All      Few      All   Kitchen  Kitchen
  MHz    Wide  High      FPS      FPS      FPS      FPS       FPS      FPS

 1500    1920  1080     56.9     55.4     52.5     48.8      30.6     20.2
  600    1920  1080     55.9     54.6     51.5     48.4      12.9      9.0

        ARM
 Test   MHz  %user   %sys  %idle  %iowt %s/irq %total  cpu0  cpu1  cpu2  cpu3

    1  1500      4      3     92      0      0      8    11     9     9     4
    6  1500     24      3     73      0      0     27     6    23    76     2
    1   600      8      5     86      0      0     14    15    18    13    12
    6   600     26      4     70      0      0     30     7    54    55     6
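The two calculations used in the explanation, frame time from FPS and %total multiplied by 4 to express utilisation as a share of one core, are sketched below. It assumes that test 6 is the textured kitchen display (the sixth FPS column); the function names are hypothetical.

```python
def frame_ms(fps: float) -> float:
    """Time per displayed frame in milliseconds."""
    return 1000 / fps


def one_core_pct(total_pct: float, cores: int = 4) -> float:
    """bcmstat %total is averaged over the cores; multiplying by the core
    count expresses it as a percentage of a single core."""
    return total_pct * cores


# (ARM MHz, textured kitchen FPS, %total for test 1, %total for test 6)
rows = [(1500, 20.2, 8, 27), (600, 9.0, 14, 30)]

for mhz, kitchen_fps, test1, test6 in rows:
    print(f"{mhz} MHz: kitchen frame {frame_ms(kitchen_fps):.1f} ms; "
          f"one-core use test 1 {one_core_pct(test1)}%, "
          f"test 6 {one_core_pct(test6)}%")
```

Test 6 comes out at 108% and 120% of one core, consistent with one core fully occupied, while test 1 uses only 32% and 56%.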
 


Main Drive Benchmark

With large files selected, most of the time is spent on sequential writing and reading, so just this section is considered. Performance measured by the benchmark shows that the higher clock speed produced slightly faster results.

This time, utilisation details are averages over three sample 30 second periods. Note that the relatively high values for single core activity are due to time spent waiting for I/O. Real utilisation (user plus system) is quite low, but roughly doubles at 600 MHz, reflecting the difference in CPU MHz.


     ----------------- MBytes/Second ----------------
   MB  Write1  Write2  Write3   Read1   Read2   Read3

 1500 MHz
  512   18.64   18.86   18.39   42.71   42.73   42.67
 1024   18.59   18.59   18.60   42.65   42.67   42.67
  600 MHz
  512   17.81   17.86   17.97   40.02   39.90   39.82
 1024   18.04   18.07   18.11   39.90   40.10   40.02

  MHz  %user   %sys  %idle  %iowt %s/irq %total  cpu0  cpu1  cpu2  cpu3

 1500    0.9    1.7     73     24      0     27     2    93     9     2
 1500    0.9    1.8     73     24      0     27    49     2    54     1
 1500    0.8    1.8     73     23      0     27     7    28    71     2
  600    1.9    3.5     71     22      0     29    30    75     8     2
  600    1.8    4.2     71     21      0     29    34    32    20    29
  600    1.8    3.7     71     21      0     29    21    61    23     9
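The "slightly faster" result can be quantified from the MBytes/Second table. This sketch compares the mean of all twelve figures at each clock speed; that simple averaging is an assumption made here, not a figure the benchmark itself reports.

```python
from statistics import mean

# MB/s from the table above: Write1-3, Read1-3 for 512 MB, then 1024 MB.
fast = [18.64, 18.86, 18.39, 42.71, 42.73, 42.67,
        18.59, 18.59, 18.60, 42.65, 42.67, 42.67]
slow = [17.81, 17.86, 17.97, 40.02, 39.90, 39.82,
        18.04, 18.07, 18.11, 39.90, 40.10, 40.02]


def pct_faster(high, low):
    """How much faster the high-clock mean is, as a percentage."""
    return (mean(high) / mean(low) - 1) * 100


writes = (fast[0:3] + fast[6:9], slow[0:3] + slow[6:9])
reads = (fast[3:6] + fast[9:12], slow[3:6] + slow[9:12])

print(f"overall {pct_faster(fast, slow):.1f}% faster at 1500 MHz")
print(f"writes  {pct_faster(*writes):.1f}%")
print(f"reads   {pct_faster(*reads):.1f}%")
```

This gives about 5.8% overall (3.5% for writing, 6.8% for reading), in line with the "around 6%" in the Summary.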
 


LAN Benchmark

The LAN benchmark is the same as that used for the drive tests above, writing and reading large files, here accessing a Windows-based PC. The bcmstat TX B/s and RX B/s measurements and Windows Task Manager reports confirmed the writing and reading speeds provided below. The bcmstat CPU utilisation results that follow are averages for writing and reading over all large files (as they were quite similar).

In this case, with higher transfer speeds and CPU utilisation than during the drive tests, performance degradation at 600 MHz was more significant, estimated at an average of around 20%. At 600 MHz, 60% total utilisation indicates the equivalent of more than two cores in continuous use.


     ----------------- MBytes/Second ----------------
   MB  Write1  Write2  Write3   Read1   Read2   Read3

 1500 MHz
  512  110.32   91.41  110.53  107.83   99.65  107.70
 1024  112.08  111.89  111.59  109.38  104.58  108.37
  600 MHz
  512   70.35   51.76   79.38   92.76  100.62   95.26
 1024   84.79   83.52   81.84   97.44   96.58   93.85

  MHz  %user   %sys  %idle  %iowt %s/irq %total  cpu0  cpu1  cpu2  cpu3

 1500    1.4   18.3   53.2    6.9   11.8     47    82    28    41    36
  600    2.5   25.6   40.2    6.4   19.6     60    90    60    48    41
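The "around 20%" estimate and the "more than two cores" reading can both be reproduced from the tables above. This sketch assumes a plain mean over all twelve speeds and that %total is an average over four cores; the function names are hypothetical.

```python
from statistics import mean

# MB/s from the table above: Write1-3 and Read1-3, 512 MB then 1024 MB.
mhz1500 = [110.32, 91.41, 110.53, 107.83, 99.65, 107.70,
           112.08, 111.89, 111.59, 109.38, 104.58, 108.37]
mhz600 = [70.35, 51.76, 79.38, 92.76, 100.62, 95.26,
          84.79, 83.52, 81.84, 97.44, 96.58, 93.85]


def degradation_pct(high, low):
    """Percentage loss in mean speed going from the high to the low clock."""
    return (1 - mean(low) / mean(high)) * 100


def cores_busy(total_pct, cores=4):
    """Convert bcmstat's per-core average %total into busy-core equivalents."""
    return total_pct * cores / 100


print(f"degradation at 600 MHz: {degradation_pct(mhz1500, mhz600):.1f}%")
print(f"gain at 1500 MHz: {(mean(mhz1500) / mean(mhz600) - 1) * 100:.1f}%")
print("cores busy:", cores_busy(47), "at 1500 MHz,", cores_busy(60), "at 600 MHz")
```

On these figures the loss is about 20% or, put the other way round, 1500 MHz is about 25% faster, matching the Summary; 60% total corresponds to 2.4 cores.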
 


LAN and CPU Benchmarks

This was a repeat of the LAN tests, with a single-threaded CPU benchmark running at the same time. LAN performance was again degraded, by an average of around 20%, while that of the CPU benchmark was in line with the clock speed difference.


     ----------------- MBytes/Second ----------------
   MB  Write1  Write2  Write3   Read1   Read2   Read3  CPU Test

 1500 MHz
  512  110.48  111.43  111.18  109.97   94.91   97.20      5950
 1024  111.62  112.28  114.25  107.49  101.86  111.01
  600 MHz
  512   52.44   57.23   40.36   90.69   92.02   95.19      2364
 1024   84.80   71.79   98.81   98.19  102.36  101.84
 
  MHz  %user   %sys  %idle  %iowt %s/irq %total  cpu0  cpu1  cpu2  cpu3

 1500   23.9   18.1   29.9    7.9    9.9     70    82    77    57    64
  600   26.4   25.9   19.4    5.5   18.7     81    89    74    79    80
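The claim that the CPU benchmark result was in line with clock speed can be checked from the CPU Test column. The units of that column are not given here, so this sketch only compares ratios; the function name is hypothetical.

```python
def scaling_vs_clock(score_low, score_high, mhz_low=600, mhz_high=1500):
    """Return (score ratio, clock ratio); equal values would mean the
    benchmark result scales exactly with MHz."""
    return score_low / score_high, mhz_low / mhz_high


score_ratio, clock_ratio = scaling_vs_clock(2364, 5950)
print(f"CPU test ratio {score_ratio:.3f}, clock ratio {clock_ratio:.3f}")
```

This gives 0.397 against 0.400, so the single-threaded test lost almost exactly the clock speed difference, even with the LAN transfers running.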
 


Copying Pi 4 USB 3 Files To Windows PC Via LAN

Tests were carried out copying 1.1 GB files from a USB 3 flash drive on the Pi 4, via LAN, to a Windows-based PC, to see if the performance implications differed from those of the LAN benchmarks. Below are average bcmstat results at 1500 and 600 MHz, the copying time being based on the number of one second samples when data was being transmitted. Allowing for overheads, the time and MB/second figures were consistent with the data volumes.

Based on MB/second copying speed, performance degradation at 600 MHz was 40%, compared with a 60% reduction in MHz. CPU utilisation and data transfer speeds were lower than those for the LAN benchmark.


  MHz  Secs  MB/sec  %user   %sys  %idle  %iowt %s/irq %total  cpu0  cpu1  cpu2  cpu3

 1500    17    71.9    1.7   11.2   74.0    4.4    2.5     26    71     6    10    18
  600    29    42.4    2.9   18.8   66.7    4.4    1.5     33    73    17    27    16
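The volume check and the 40% versus 60% comparison follow from simple arithmetic on the two rows above. A short sketch, using the averaged figures exactly as tabulated:

```python
# Average figures from the table above: (ARM MHz, seconds, MB/second).
runs = [(1500, 17, 71.9), (600, 29, 42.4)]

for mhz, secs, rate in runs:
    print(f"{mhz} MHz: about {secs * rate:.0f} MB in {secs} s")

speed_loss = 1 - runs[1][2] / runs[0][2]   # drop in MB/second
clock_loss = 1 - runs[1][0] / runs[0][0]   # drop in MHz
print(f"speed loss {speed_loss:.0%}, clock loss {clock_loss:.0%}")
```

The products, about 1222 and 1230 MB, are consistent with the 1.1 GB files plus overheads, and the speed loss works out at about 41% against a 60% clock reduction.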
 


Core Utilisation Variations

With bcmstat, the minimum sampling period is one second. As this is a relatively long time, it is not possible to determine whether cores are executing instructions at the same time. For example, four cores each at 25% utilisation could represent only one core being used continuously, with the Operating System switching it between cores to share the load. The %total provided by bcmstat represents the average of the figures for all cores.
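To illustrate why one-second averages are ambiguous, the sketch below models a hypothetical one-second sample as 1000 one-millisecond slices, each listing the cores busy in that slice. This is a toy model with made-up schedules, not how bcmstat gathers its figures.

```python
def per_core_utilisation(timeline, cores=4):
    """timeline: one entry per millisecond, the set of cores busy in it.
    Returns the % utilisation of each core over the whole sample."""
    busy = [0] * cores
    for active in timeline:
        for core in active:
            busy[core] += 1
    return [100 * b / len(timeline) for b in busy]


# A: a single thread moved to a different core every 250 ms.
migrating = [{ms // 250} for ms in range(1000)]
# B: all four cores busy together for the first 250 ms, then idle.
parallel = [{0, 1, 2, 3} if ms < 250 else set() for ms in range(1000)]

print(per_core_utilisation(migrating))
print(per_core_utilisation(parallel))
```

Both schedules report 25% on every core, although only the second ever has more than one core executing at once.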

Below are all the bcmstat utilisation records for the file copying tests, at one second intervals, with values rounded down to make variations clearer. This time, the Total shown is the sum for the four cores, not the average. At 1500 MHz, it looks as though nearly 100% of one CPU is used continuously. At 600 MHz, the same single-core load leads to a much longer time to copy the files, and there is also other activity, meaning that more than one core is in use at the same time.


        ----------- 1500 MHz ----------  ----------- 600 MHz -----------
 Seconds  cpu0  cpu1  cpu2  cpu3  Total    cpu0  cpu1  cpu2  cpu3  Total

       1    60    28    24    10    123      61    65    40    39    205
       2    87     6    14    13    120      75    47    11    25    158
       3    83     9     4     9    105      78     9    14    28    129
       4    86     4     5     4    100      95    16    24     8    143
       5    86     6     5     2    100      55     9    10    55    130
       6    86     3     8     1     99      53    57     9     8    128
       7    86     5     7     1    100      95    10     5    12    122
       8    86     4     8     2    100      54     6    14    59    133
       9    86     2     9     2    100      61    50    11     6    129
      10    85     4     8     2     99      97     5    14     4    120
      11    64     5     7    27    102      52    10    62     8    132
      12    51     1    14    35    101      57     6    59    10    133
      13    48     3     7    46    104      56     6    60    13    135
      14    48     2     7    47    104      52     9    61    13    135
      15    50     4     8    46    108      87     8    16    13    124
      16    41     4    16    37     98      97     7    10     7    121
      17                                     97     8    10     6    121
      18                                     95     9    12     8    123
      19                                     80     9    17    24    131
      20                                     95    14     8     5    123
      21                                     77    22    21     9    129
      22                                     54    12    62     7    135
      23                                     54    11    55    11    131
      24                                     61    10    55     6    133
      25                                     74    10    32    11    128
      26                                     82    19    17    13    131
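Averaging the Total column gives a rough measure of how many cores were busy at once. This sketch uses the rounded-down values from the table, so the results are approximate.

```python
from statistics import mean

# 'Total' column from the table above (sum of four cores, % of one core).
total_1500 = [123, 120, 105, 100, 100, 99, 100, 100, 100, 99, 102, 101,
              104, 104, 108, 98]
total_600 = [205, 158, 129, 143, 130, 128, 122, 133, 129, 120, 132, 133,
             135, 135, 124, 121, 121, 123, 131, 123, 129, 135, 131, 133,
             128, 131]

for mhz, totals in [(1500, total_1500), (600, total_600)]:
    print(f"{mhz} MHz: {len(totals)} s, {mean(totals) / 100:.2f} cores busy "
          f"on average, {sum(totals) / 100:.1f} CPU seconds in all")
```

This gives about 1.04 cores busy on average at 1500 MHz and about 1.33 at 600 MHz, or roughly 16.6 and 34.6 CPU seconds in total.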
 
The other extreme, from the earlier tests, is video playback, with sample periods shown below. In this case, it could be expected that displayed frames per second were the same at both CPU MHz settings. The 600 MHz results indicate that more than two cores were in use at the same time. At 1500 MHz, average CPU time used is 1.7 CPU seconds per displayed second. At 600 MHz (times 15/6), this suggests 4.25 CPU seconds per second, impossible with four cores, indicating that there must be some performance degradation. This appeared to be in the form of poorer quality of displayed images (which might not be noticed).

        ----------- 1500 MHz ----------  ----------- 600 MHz -----------
 Seconds  cpu0  cpu1  cpu2  cpu3  Total    cpu0  cpu1  cpu2  cpu3  Total

       1    41    45    36    42    165      79    77    80    72    308
       2    38    43    33    44    159      73    53    75    68    269
       3    40    38    34    44    156      62    48    64    58    232
       4    47    55    42    48    192      81    69    84    84    319
       5    45    54    38    45    181      86    70    88    85    329
       6    39    49    32    40    160      84    67    80    80    312
       7    40    44    36    41    161      76    60    78    77    290
       8    49    57    40    45    191      74    66    76    65    281
       9    44    46    40    47    176      67    54    63    60    244
      10    38    45    31    41    154      71    48    61    59    239
 Average                            170                              282
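The capacity argument above can be written out as a short calculation. This is a sketch assuming, as the text does, that CPU time needed scales in proportion to the clock ratio (15/6); the function name is hypothetical.

```python
from statistics import mean

# 'Total' columns from the video table above (percent of one core).
total_1500 = [165, 159, 156, 192, 181, 160, 161, 191, 176, 154]
total_600 = [308, 269, 232, 319, 329, 312, 290, 281, 244, 239]


def scaled_demand(cpu_s_per_s, mhz_from=1500, mhz_to=600):
    """CPU seconds per second needed at mhz_to, if work scales with clock."""
    return cpu_s_per_s * mhz_from / mhz_to


need = scaled_demand(mean(total_1500) / 100)
print(f"needed at 600 MHz: {need:.2f} CPU s/s (4 cores available)")
print(f"measured at 600 MHz: {mean(total_600) / 100:.2f} CPU s/s")
print("same work possible" if need <= 4 else "work must be reduced")
```

Using the unrounded average this gives about 4.24 CPU seconds per second (4.25 with the rounded 1.7), more than four cores can supply, whereas only about 2.82 was measured.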
 