bleep42
Posts: 275
Joined: Wed Mar 07, 2012 12:43 pm

Re: NUMA Testing

Fri Nov 01, 2024 3:13 pm

Hi Dom,
I'm using a 2GB Pi 4 for a 7-minute compile using all 4 cores, and am not seeing any improvement. Is this to be expected with only 2GB to play with? As far as I can tell NUMA is active, and numa=fake=1 was automatically added to my /proc/cmdline. I have also tried changing it to numa=fake=2 but got exactly the same compile/build time.
Any thoughts?
Regards Kevin

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Nov 01, 2024 3:48 pm

bleep42 wrote:
Fri Nov 01, 2024 3:13 pm
I'm using a 2GB Pi 4 for a 7-minute compile using all 4 cores, and am not seeing any improvement. Is this to be expected with only 2GB to play with? As far as I can tell NUMA is active, and numa=fake=1 was automatically added to my /proc/cmdline. I have also tried changing it to numa=fake=2 but got exactly the same compile/build time.
1GB and 2GB Pi4s are single rank, and we don't recommend changing BANKLOW on Pi4 because, without IOMMUs on the codec path, it hurts video decode performance (H.264/HEVC) and possibly camera.

Hence the default of 1 NUMA region, as there is no benefit available. A 4GB or 8GB Pi4 is dual rank and would benefit from 2 NUMA regions.

If you don't use the video hardware, you could try running with SDRAM_BANKLOW=1, which will default to 4 NUMA regions and may show a benefit.
But I'd describe this configuration as experimental, and not generally recommended.

bleep42
Posts: 275
Joined: Wed Mar 07, 2012 12:43 pm

Re: NUMA Testing

Fri Nov 01, 2024 5:23 pm

Hi Dom,
Thanks for your reply. For the sake of experimentation, I tried SDRAM_BANKLOW=1, which did indeed give me numa=fake=4. However, the output from numactl --hardware and lscpu | grep -i numa looked to me as though something had fouled up, and my compile time actually got worse. So I tried numa=fake=2, and that showed a speed-up from 6:55 to 6:32. Counting the other memory tuning you said had been done, my compile time has gone from 7:01 to 6:32 with the settings indicated, so a roughly 7% speed-up from some under-the-hood optimisations. I tried a couple of videos in VLC and they seemed to play back OK, so it's looking good to me. :-) Maybe now I'll have to get a new Pi5.
Regards, Kevin.
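For anyone checking Kevin's figures: the percentages follow from converting the minutes:seconds times to seconds. A minimal Python sketch (the helper names are illustrative, not from any tool):

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'M:SS' string such as '6:32' into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_faster(before: str, after: str) -> float:
    """Reduction in wall-clock time, as a percentage of the 'before' time."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100.0


# Build times reported in the post above.
print(f"{percent_faster('7:01', '6:32'):.1f}%")  # all tuning combined; prints 6.9%
print(f"{percent_faster('6:55', '6:32'):.1f}%")  # numa=fake=2 step alone; prints 5.5%
```

On these numbers the full 7:01 to 6:32 change is about 6.9%, and the numa=fake=2 step on its own about 5.5%.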

bensimmo
Posts: 8140
Joined: Sun Dec 28, 2014 3:02 pm

Re: NUMA Testing

Mon Nov 04, 2024 4:59 pm

Just to say it hasn't broken my Jellyfin.org setup on a Pi5 8GB.
No idea if it is faster or slower. Just that it still works.

fik
Posts: 99
Joined: Thu Jan 17, 2013 1:34 pm

Re: NUMA Testing

Thu Nov 28, 2024 8:42 pm

Has the NUMA-enabled kernel 6.6.62 just been released as a package update to all users? So one no longer needs to run rpi-update, just set SDRAM_BANKLOW=, right?

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing

Fri Nov 29, 2024 2:12 am

fik wrote:
Thu Nov 28, 2024 8:42 pm
Has the NUMA-enabled kernel 6.6.62 just been released as a package update to all users? So one no longer needs to run rpi-update, just set SDRAM_BANKLOW=, right?
From what I understand, the 6.12.x kernel uses a completely different NUMA patch set.

viewtopic.php?p=2270416#p2270416

This would appear to imply that testing needs to start all over again to verify video and everything else works properly.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Nov 29, 2024 10:48 am

fik wrote:
Thu Nov 28, 2024 8:42 pm
Has the NUMA-enabled kernel 6.6.62 just been released as a package update to all users? So one no longer needs to run rpi-update, just set SDRAM_BANKLOW=, right?
Can you check "vcgencmd bootloader_version" (and also "vcgencmd version" if on a Pi4)? You want a version newer than 23 Oct.
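If you want to script dom's check, the `timestamp` line of `vcgencmd bootloader_version` is a Unix time, so it can be compared with a cutoff directly. A minimal Python sketch, assuming the output format shown elsewhere in this thread and taking "newer than 23 Oct" to mean after 2024-10-23 00:00 UTC. It does not handle `vcgencmd version` output, which has no timestamp line, and the helper names are hypothetical:

```python
from datetime import datetime, timezone

# Assumption: "newer than 23 Oct" is read as a build time after 2024-10-23 00:00 UTC.
CUTOFF = datetime(2024, 10, 23, tzinfo=timezone.utc)


def bootloader_build_time(output: str) -> datetime:
    """Extract the 'timestamp' field (Unix seconds) from `vcgencmd bootloader_version` output."""
    for line in output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key == "timestamp":
            return datetime.fromtimestamp(int(value), tz=timezone.utc)
    raise ValueError("no 'timestamp' line found")


def newer_than_cutoff(output: str) -> bool:
    """True if the bootloader build time is after the assumed cutoff."""
    return bootloader_build_time(output) > CUTOFF


# Output as posted later in this thread (Pi 5, Nov 2024 bootloader).
sample = """2024/11/12 16:10:44
version 4b019946a06ea87c36d14fd6d702cc65fb458b9e (release)
timestamp 1731427844
update-time 1732995212
capabilities 0x0000007f"""

print(bootloader_build_time(sample).isoformat(), newer_than_cutoff(sample))
```

Fed the timestamp from an older bootloader quoted in this thread (1729520869, i.e. 21 Oct) it returns False.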

Mikael
Posts: 127
Joined: Wed Feb 11, 2015 12:35 pm

Re: NUMA Testing

Fri Nov 29, 2024 3:42 pm

Seems NUMA is active now for me with a regular apt update/full-upgrade:

https://browser.geekbench.com/v6/cpu/9140585

solaris33
Posts: 16
Joined: Wed Jan 06, 2021 2:58 pm

Re: NUMA Testing. Is this way safe enough?

Sat Nov 30, 2024 10:14 pm

Hello everyone,
I enabled NUMA following the instructions in this topic, and got a high score in Geekbench.
I'm using an 8GB Pi 5. Here is my configuration:
===========================================================
pi@Pi-5:~ $ vcgencmd version
2024/11/12 16:10:44
Copyright (c) 2012 Broadcom
version 4b019946 (release) (embedded)
pi@Pi-5:~ $ vcgencmd bootloader_version
2024/11/12 16:10:44
version 4b019946a06ea87c36d14fd6d702cc65fb458b9e (release)
timestamp 1731427844
update-time 1732995212
capabilities 0x0000007f

pi@Pi-5:~ $ vcgencmd bootloader_config
[all]
BOOT_UART=1
BOOT_ORDER=0xf461
NET_INSTALL_AT_POWER_ON=1
SDRAM_BANKLOW=1

pi@Pi-5:~ $ cat /proc/cmdline
reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave numa=fake=8 system_heap.max_order=0 smsc95xx.macaddr=2C:CF:67:3C:6C:FD vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000 console=ttyAMA10,115200 console=tty1 root=PARTUUID=f580ed4f-02 rootfstype=ext4 fsck.repair=yes rootwait video=HDMI-A-1:1440x900M@60D plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=CN
pi@Pi-5:~ $
pi@Pi-5:/sys/class/gpio $ uname -a
Linux Pi-5 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
===========================================================
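To confirm which NUMA options actually reached the kernel, the command line can be scanned for `numa=` and `numa_policy=`. A minimal Python sketch using an excerpt of the command line above; the function names are hypothetical, and a missing `numa=fake=N` is simply treated as a single node:

```python
def numa_settings(cmdline: str) -> dict[str, str]:
    """Return the NUMA-related key=value options found on a kernel command line."""
    settings = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if sep and key in ("numa", "numa_policy"):
            settings[key] = value
    return settings


def fake_node_count(cmdline: str) -> int:
    """Number of emulated nodes requested via numa=fake=N (treated as 1 if absent)."""
    value = numa_settings(cmdline).get("numa", "")
    if value.startswith("fake="):
        return int(value.removeprefix("fake="))
    return 1


excerpt = "reboot=w coherent_pool=1M numa_policy=interleave numa=fake=8 system_heap.max_order=0"
print(numa_settings(excerpt), fake_node_count(excerpt))
```

On a live system, pass it the contents of /proc/cmdline instead of the excerpt.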

To test memory stability, I ran 4 memtester processes simultaneously. One of the processes reported a FAILURE on the first run.
==================================================
pi@Pi-5:~ $ sudo memtester 1600M 2
memtester version 4.6.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 16384
pagesizemask is 0xffffffffffffc000
want 1600MB (1677721600 bytes)
got 1600MB (1677721600 bytes), trying mlock ...locked.
Loop 1/2:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : |FAILURE: 0x57bd735cdc7eda8f != 0x57bd735ddc7eda8f at offset 0x000000002a229478.
16-bit Writes : ok

Loop 2/2:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.
==================================================
With 4 processes running simultaneously, the SoC temperature rises to 67°C.
==================================================
pi@Pi-5:/sys/class/gpio $ sensors
pwmfan-isa-0000
Adapter: ISA adapter
fan1: 4697 RPM

cpu_thermal-virtual-0
Adapter: Virtual device
temp1: +65.0°C

rpi_volt-isa-0000
Adapter: ISA adapter
in0: N/A

rp1_adc-isa-0000
Adapter: ISA adapter
in1: 1.49 V
in2: 2.52 V
in3: 1.40 V
in4: 1.42 V
temp1: +51.4°C
==================================================
I have overclocked my Pi 5 to 2.9GHz without overvolting. Maybe that's the reason for the FAILURE? But my Pi 5 has run well for several weeks.
I may test more (e.g. no overclocking, NUMA disabled) to find the reason.
Attachment: memtester.png
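The failing word can be examined bit by bit: XORing the two values memtester printed leaves a 1 only where they differ. A small Python sketch, with the values copied from the FAILURE line above:

```python
# The two 64-bit values from the memtester "8-bit Writes" FAILURE line.
# memtester compares two buffers that should hold identical data.
word_a = 0x57BD735CDC7EDA8F
word_b = 0x57BD735DDC7EDA8F

diff = word_a ^ word_b  # set bits mark the positions where the values differ
flipped = [bit for bit in range(64) if (diff >> bit) & 1]
print(f"xor = {diff:#018x}, flipped bits: {flipped}")
# prints: xor = 0x0000000100000000, flipped bits: [32]
```

Only bit 32 differs, i.e. a single flipped bit. That pattern alone does not identify the cause; it is just a quick way to see how many bits were affected.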

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing. Is this way safe enough?

Sat Nov 30, 2024 10:22 pm

solaris33 wrote:
Sat Nov 30, 2024 10:14 pm
I have overclocked my Pi 5 to 2.9GHz without overvolting. Maybe that's the reason for the FAILURE? But my Pi 5 has run well for several weeks.
I may test more (e.g. no overclocking, NUMA disabled) to find the reason.
If memtester reports failure, then something is definitely broken. I agree it's likely the overclock went wrong.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing. Is this way safe enough?

Sun Dec 01, 2024 6:33 pm

solaris33 wrote:
Sat Nov 30, 2024 10:14 pm
I have overclocked my Pi 5 to 2.9GHz without overvolting. Maybe that's the reason for the FAILURE? But my Pi 5 has run well for several weeks.
I may test more (e.g. no overclocking, NUMA disabled) to find the reason.
Yes, I'm sure the memtester failure is due to overclock and not NUMA.
Reduce the overclock, or add an over_voltage_delta setting until memtester passes.

cjan
Posts: 1192
Joined: Sun May 06, 2012 12:00 am

Re: NUMA Testing

Sun Dec 01, 2024 9:06 pm

does NUMA against sched_ext, got system a little bit freeze.
Pi4/kernel-6.12/SDRAM_BANKLOW=3/scx_lavd.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Mon Dec 02, 2024 10:31 am

cjan wrote:
Sun Dec 01, 2024 9:06 pm
does NUMA against sched_ext, got system a little bit freeze.
Pi4/kernel-6.12/SDRAM_BANKLOW=3/scx_lavd.
Sorry. Unable to parse. More words needed.

geerlingguy
Posts: 585
Joined: Sun Feb 15, 2015 3:43 am

Re: NUMA Testing

Mon Dec 02, 2024 7:02 pm

I got stable results across different benchmarks, testing on a couple of different Pi 5s. I also tested overclocking at various frequencies, and saw a nice speed boost there too (taking back my single-core record for Geekbench 6, of course, though I left some room for someone else to push it a bit further!):

https://www.jeffgeerling.com/blog/2024/ ... ram-tuning
The question is not whether something should be done on a Raspberry Pi, it is whether it can be done on a Raspberry Pi.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Mon Dec 02, 2024 7:46 pm

I believe rpi-update is no longer needed. An apt full-upgrade should give you a NUMA-supporting bootloader/firmware/kernel.

DirkS
Posts: 11516
Joined: Tue Jun 19, 2012 9:46 pm

Re: NUMA Testing

Mon Dec 02, 2024 7:59 pm

dom wrote:
Mon Dec 02, 2024 7:46 pm
I believe rpi-update is no longer needed. An apt full-upgrade should give you a NUMA-supporting bootloader/firmware/kernel.
I get

pi@pi5crow:~$ dmesg | grep NUMA 
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: NODE_DATA [mem 0x3f7fd2c0-0x3f7fffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7fffd2c0-0x7fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xbfffd2c0-0xbfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xffffd2c0-0xffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x13fffd2c0-0x13fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x17fffd2c0-0x17fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x1bfffd2c0-0x1bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x1ffb892c0-0x1ffb8bfff]
[ 0.000000] mempolicy: NUMA default policy overridden to 'interleave:0-7'
[ 1.886532] pci_bus 0000:00: Unknown NUMA node; performance will be reduced
That's without rpi-update, so it seems to confirm it, if I understand correctly.
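The NODE_DATA addresses above can be mapped to 1 GiB slices of memory by integer division. A minimal Python sketch over the same log lines: one node structure lands in each of the eight slices, which fits numa=fake=8 dividing the 8GB into eight roughly equal nodes (an inference from the addresses, not something the log states). The function name is hypothetical:

```python
import re

GIB = 1 << 30

# NODE_DATA lines copied from the dmesg output above.
DMESG = """\
[ 0.000000] NUMA: NODE_DATA [mem 0x3f7fd2c0-0x3f7fffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7fffd2c0-0x7fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xbfffd2c0-0xbfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xffffd2c0-0xffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x13fffd2c0-0x13fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x17fffd2c0-0x17fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x1bfffd2c0-0x1bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x1ffb892c0-0x1ffb8bfff]
"""


def node_data_slices(dmesg: str) -> list[int]:
    """For each NODE_DATA line, return which 1 GiB slice its start address falls in."""
    starts = re.findall(r"NODE_DATA \[mem (0x[0-9a-f]+)-0x[0-9a-f]+\]", dmesg)
    return [int(addr, 16) // GIB for addr in starts]


print(node_data_slices(DMESG))
```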

geerlingguy
Posts: 585
Joined: Sun Feb 15, 2015 3:43 am

Re: NUMA Testing

Mon Dec 02, 2024 8:50 pm

Indeed! I've updated the blog post to note that.

geerlingguy
Posts: 585
Joined: Sun Feb 15, 2015 3:43 am

Re: NUMA Testing

Mon Dec 02, 2024 9:33 pm

Mikael wrote:
Wed Oct 23, 2024 8:03 pm
Ran some tests and things seem to be working well overall. Geekbench 6 is indeed faster than ever before. So is Google Octane v2. Passmark performs extremely good as well, except for the "Memory Write" sub test, which regresses from ~11500 MB/s to 8000 something. Sysbench memory read performance is also better than ever (very nice latency, both average and max), while the write test corroborates Passmark's result, showing a severe regression in write performance.
Seeing your post, I also re-ran tinymembench, and found a similar regression—in my case memset going from 13+ GB/sec to 9.3 GB/sec. See my full test results here.

@dom - Here's how I'm running `tinymembench` for a reproducer:

git clone https://github.com/rojaster/tinymembench.git
cd tinymembench && make
./tinymembench
The lower synthetic speeds don't result in other benchmarks running any slower though. It's just sad that not every number goes up and to the right :D

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing

Mon Dec 02, 2024 11:22 pm

geerlingguy wrote:
Mon Dec 02, 2024 9:33 pm
Mikael wrote:
Wed Oct 23, 2024 8:03 pm
Ran some tests and things seem to be working well overall. Geekbench 6 is indeed faster than ever before. So is Google Octane v2. Passmark performs extremely good as well, except for the "Memory Write" sub test, which regresses from ~11500 MB/s to 8000 something. Sysbench memory read performance is also better than ever (very nice latency, both average and max), while the write test corroborates Passmark's result, showing a severe regression in write performance.
Seeing your post, I also re-ran tinymembench, and found a similar regression—in my case memset going from 13+ GB/sec to 9.3 GB/sec. See my full test results here.

@dom - Here's how I'm running `tinymembench` for a reproducer:

git clone https://github.com/rojaster/tinymembench.git
cd tinymembench && make
./tinymembench
The lower synthetic speeds don't result in other benchmarks running any slower though. It's just sad that not every number goes up and to the right :D
Can numactl be used to change the memory allocation strategy program by program?

helpmepi
Posts: 16
Joined: Thu Jan 23, 2020 6:51 am

Re: NUMA Testing

Tue Dec 03, 2024 8:53 am

I've got an 8GB Pi 5 and haven't done any testing for this so far, so I haven't made any manual changes or installed anything other than keeping my Pi up to date. I'm running the 12 Nov bootloader.

When I run

dmesg | grep NUMA

I get:

[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x00000001ffffffff]
.
[ 0.000000] mempolicy: NUMA default policy overridden to 'interleave:0'

Does this mean it's working, or do I need to do something else? I've tried searching for 'interleave:0', but I could only find posts where it says 'interleave:0-7'.

Thanks.

DirkS
Posts: 11516
Joined: Tue Jun 19, 2012 9:46 pm

Re: NUMA Testing

Tue Dec 03, 2024 12:40 pm

helpmepi wrote:
Tue Dec 03, 2024 8:53 am
Does this mean it's working, or do I need to do something else? I've tried searching for 'interleave:0', but I could only find posts where it says 'interleave:0-7'.

Thanks.
You still need to activate by adding a setting in the eeprom config. See e.g. Jeff Geerling's blog https://www.jeffgeerling.com/blog/2024/ ... ram-tuning
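The part after `interleave:` is a node list, so 'interleave:0' means interleaving over node 0 only, which has no effect, while 'interleave:0-7' spreads pages over eight nodes. A minimal Python sketch that expands such lists, assuming only the simple comma/range syntax seen in this thread; the helper names are hypothetical:

```python
def expand_nodelist(spec: str) -> list[int]:
    """Expand a node list such as '0-7' or '0,2-3' into individual node numbers."""
    nodes = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes


def interleave_nodes(dmesg_line: str) -> list[int]:
    """Nodes named in a "NUMA default policy overridden to 'interleave:...'" message."""
    policy = dmesg_line.split("'")[1]  # e.g. "interleave:0-7"
    return expand_nodelist(policy.partition(":")[2])


line = "[ 0.000000] mempolicy: NUMA default policy overridden to 'interleave:0'"
print(interleave_nodes(line))  # a single node: nothing to interleave across
```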

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Tue Dec 03, 2024 12:48 pm

DirkS wrote:
Tue Dec 03, 2024 12:40 pm
You still need to activate by adding a setting in the eeprom config. See e.g. Jeff Geerling's blog https://www.jeffgeerling.com/blog/2024/ ... ram-tuning
Or even the first post of this thread!

helpmepi
Posts: 16
Joined: Thu Jan 23, 2020 6:51 am

Re: NUMA Testing

Tue Dec 03, 2024 2:45 pm

dom wrote:
Tue Dec 03, 2024 12:48 pm
DirkS wrote:
Tue Dec 03, 2024 12:40 pm
You still need to activate by adding a setting in the eeprom config. See e.g. Jeff Geerling's blog https://www.jeffgeerling.com/blog/2024/ ... ram-tuning
Or even the first post of this thread!
Apologies, I was just a little confused about what was still needed now that the changes have been incorporated into the bootloader etc.

I'll add the line

SDRAM_BANKLOW=1 
and test.

Thanks for the fast replies.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Tue Dec 03, 2024 8:30 pm

ejolson wrote:
Mon Dec 02, 2024 11:22 pm
Can numactl be used to change the memory allocation strategy program by program?
Yes. If it's not enabled on command line you can enable it for one process with:

pi@pios:~ $ numactl --interleave=all head -1 /proc/self/numa_maps
5570550000 interleave:0-7 file=/usr/bin/head mapped=9 active=0 N0=2 N1=2 N2=1 N3=1 N4=1 N6=1 N7=1 kernelpagesize_kB=4
And if it's enabled you can disable it for one process:

pi@pios:~ $ numactl -l head -1 /proc/self/numa_maps
555f8b0000 local file=/usr/bin/head mapped=9 active=0 N0=2 N1=2 N2=1 N3=1 N4=1 N6=1 N7=1 kernelpagesize_kB=4
Be aware, though, that if you've enabled SDRAM_BANKLOW in the EEPROM config, this won't disable that, so you'll probably harm performance by disabling it this way.
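The numa_maps lines above can be pulled apart to see where a mapping's pages live. A minimal Python sketch that handles only the fields appearing in dom's output (address, policy, key=value fields and per-node `N<node>=<pages>` counts); the function name is hypothetical:

```python
def parse_numa_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/numa_maps line into address, policy and per-node page counts."""
    fields = line.split()
    entry = {"address": int(fields[0], 16), "policy": fields[1], "pages_per_node": {}}
    for field in fields[2:]:
        key, sep, value = field.partition("=")
        if not sep:
            continue  # skip bare flag fields without a value
        if key.startswith("N") and key[1:].isdigit():
            entry["pages_per_node"][int(key[1:])] = int(value)
        else:
            entry[key] = value  # e.g. file, mapped, active, kernelpagesize_kB
    return entry


interleaved = ("5570550000 interleave:0-7 file=/usr/bin/head mapped=9 active=0 "
               "N0=2 N1=2 N2=1 N3=1 N4=1 N6=1 N7=1 kernelpagesize_kB=4")
e = parse_numa_maps_line(interleaved)
print(e["policy"], e["pages_per_node"], e["mapped"])
```

For the interleaved `head` mapping, the per-node counts add up to the `mapped=9` total, spread over seven of the eight nodes.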

bytter
Posts: 5
Joined: Fri Dec 06, 2024 11:45 pm

Re: NUMA Testing

Sat Dec 07, 2024 12:00 am

I believe I'm one of the unlucky ones observing a regression here. Here's the before:

sysbench memory --memory-block-size=1G --memory-total-size=20G --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
 block size: 1048576KiB
 total size: 20480MiB
 operation: write
 scope: global
Initializing worker threads...
Threads started!
Total operations: 20 ( 12.02 per second)
20480.00 MiB transferred (12305.51 MiB/sec)
General statistics:
 total time: 1.6630s
 total number of events: 20
Latency (ms):
 min: 82.77
 avg: 83.14
 max: 83.63
 95th percentile: 82.96
 sum: 1662.71
Threads fairness:
 events (avg/stddev): 20.0000/0.00
 execution time (avg/stddev): 1.6627/0.00
Here's the system description:

> uname -a
Linux raspberrypi01 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux
> vcgencmd bootloader_version
2024/10/21 15:27:49
version 951e1cc9d8b1d81c0ca1783a0634605616970bc3 (release)
timestamp 1729520869
update-time 1730230594
capabilities 0x0000007f
Now, let's perform a simple update to the kernel and EEPROM (without setting the SDRAM_BANKLOW):

sudo apt update && sudo apt full-upgrade
So now we are in:

> uname -a
Linux raspberrypi01 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
> vcgencmd bootloader_version
2024/11/12 16:10:44
version 4b019946a06ea87c36d14fd6d702cc65fb458b9e (release)
timestamp 1731427844
update-time 1733529134
capabilities 0x0000007f
Running the benchmark again:

sysbench memory --memory-block-size=1G --memory-total-size=20G --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
 block size: 1048576KiB
 total size: 20480MiB
 operation: write
 scope: global
Initializing worker threads...
Threads started!
Total operations: 20 ( 12.03 per second)
20480.00 MiB transferred (12319.23 MiB/sec)
General statistics:
 total time: 1.6611s
 total number of events: 20
Latency (ms):
 min: 82.39
 avg: 83.04
 max: 83.79
 95th percentile: 82.96
 sum: 1660.77
Threads fairness:
 events (avg/stddev): 20.0000/0.00
 execution time (avg/stddev): 1.6608/0.00
Still consistent. Now let's update the flag...

> sudo rpi-eeprom-config -e
SDRAM_BANKLOW=1
Reboot. Check for NUMA:

> cat /proc/cmdline
... numa=fake=8 ...
Run the benchmark:

sysbench memory --memory-block-size=1G --memory-total-size=20G --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
 block size: 1048576KiB
 total size: 20480MiB
 operation: write
 scope: global
Initializing worker threads...
Threads started!
Total operations: 20 ( 8.37 per second)
20480.00 MiB transferred (8575.85 MiB/sec)
General statistics:
 total time: 2.3869s
 total number of events: 20
Latency (ms):
 min: 118.76
 avg: 119.33
 max: 120.26
 95th percentile: 118.92
 sum: 2386.66
Threads fairness:
 events (avg/stddev): 20.0000/0.00
 execution time (avg/stddev): 2.3867/0.00
Thoughts?
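For comparison, the drop can be quantified and cross-checked against the per-block latency: with a 1 GiB block, throughput is roughly 1024 MiB divided by the average latency. A minimal Python sketch using the figures from the runs above (helper names are hypothetical):

```python
BLOCK_MIB = 1024  # --memory-block-size=1G


def mib_per_sec_from_latency(avg_latency_ms: float) -> float:
    """Throughput implied by the average time taken to write one block."""
    return BLOCK_MIB / (avg_latency_ms / 1000.0)


def percent_change(before: float, after: float) -> float:
    """Signed change relative to the 'before' value, in percent."""
    return (after - before) / before * 100.0


before = 12319.23  # MiB/sec after apt full-upgrade, SDRAM_BANKLOW not set
after = 8575.85    # MiB/sec with SDRAM_BANKLOW=1 (numa=fake=8)

print(f"change: {percent_change(before, after):.1f}%")  # prints change: -30.4%
print(f"from latency: {mib_per_sec_from_latency(83.04):.0f} vs "
      f"{mib_per_sec_from_latency(119.33):.0f} MiB/sec")
```

The latency-derived figures (roughly 12,300 and 8,600 MiB/sec) agree with sysbench's own numbers, so this single-threaded write test shows a consistent drop of about 30%.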
