143 posts
dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Oct 25, 2024 9:50 am

mby wrote:
Fri Oct 25, 2024 9:12 am
Could you apply the NUMA commits to 6.12 as well, please? – Thank you!
Interestingly, in parallel to the "fake" NUMA patches we developed (with Igalia), which are on the 6.6 kernel,
there was similar work from Microsoft.

And the good news is that patch set is (at least partly) upstream in 6.12.

We did try to adapt the Microsoft patches to 6.6, but there had been so much rework in mm between 6.6 and 6.12 that it wasn't feasible,
so 6.6 uses Igalia's version and later kernels will use the upstream version.

I did test the Microsoft patches (from mailing list, rather than merged version) on our 6.11 kernel a while back, and (with a little massaging) we got the same positive results.

Let me have a go at 6.12.

fguerraz
Posts: 14
Joined: Mon Jun 03, 2024 9:15 am

Re: NUMA Testing

Fri Oct 25, 2024 9:52 am

dom wrote:
Thu Oct 24, 2024 12:50 pm
Don't update vcgencmd. Update start4.elf (correctly - because you are not running the version you think you are).
I'm supposed to get the latest firmware from https://github.com/raspberrypi/rpi-eeprom/ right? master branch?
I tried everything, it was stuck on that stupid version from February (which is not even in that repo???)

I ended up booting on an external drive on rpi-os, doing the update, copying the files from the /boot/firmware/ directory to my normal install, and now I have the right version, and yes it automatically adds the 2 fake numa nodes.

First question: do you have any idea why I wasn't able to update the firmware just by doing

Code: Select all

sudo rpi-eeprom-config -e pieeprom-2024-10-21.bin
on my mongrel Ubuntu RPi install?

Second far less important question: why 2 fake numa nodes if the optimal number is 8 according to your previous comment?

mby
Posts: 116
Joined: Sat Dec 15, 2018 3:05 pm

Re: NUMA Testing

Fri Oct 25, 2024 9:56 am

dom wrote:
Fri Oct 25, 2024 9:50 am
(...) Let me have a go at 6.12.
Thank you, @dom!

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Oct 25, 2024 10:03 am

xeny wrote:
Fri Oct 25, 2024 9:48 am
Am I right in thinking that in "typical" workloads, reads significantly outnumber writes (I dimly remember a ratio of 3:1)? If that is the case, worse write performance as a trade for better read performance would be an overall win.
Yes. And I'll also say that many write dominated benchmarks do go faster with NUMA.
So far all real world workloads I've measured have been better with NUMA, so I think for users it's a big win overall.

I need to look more closely at the two benchmarks that are bad.
I think Passmark is closed source, so that may be trickier to analyse, but sysbench is open source, so should be more amenable.

You can write a multi-core memset many ways, and we have seen some that are pathologically bad.
e.g. OpenMP parallelises a memset loop by having each core write alternate words.

That is a terrible thing to do architecturally (the cores will get out of sync with each other and thrash caches and sdram pages).
Giving each core a contiguous quarter of the buffer would be massively more efficient.

NUMA isn't necessarily worse for that workload, but we've found that pathological workloads tend to be unstable, and small changes can have big effects.
For example running the arms a little faster may make the test run much slower (as the thrashing starts to beat in a worse way).
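
As a rough illustration of the two partitionings described above (a sketch only, not OpenMP's actual scheduling code), the Python below counts how many cache lines end up being written by more than one core. The function names and the figure of 8 words per cache line (64-byte lines, 8-byte words) are assumptions made for the example.

```python
def interleaved_indices(n_words, n_cores, core):
    """Word indices written by `core` when cores take alternate words."""
    return range(core, n_words, n_cores)


def contiguous_indices(n_words, n_cores, core):
    """Word indices written by `core` when each core gets one contiguous chunk."""
    chunk = (n_words + n_cores - 1) // n_cores  # ceiling division
    start = min(core * chunk, n_words)
    return range(start, min(start + chunk, n_words))


def shared_lines(partition, n_words, n_cores, words_per_line=8):
    """Count cache lines that are written by more than one core.

    words_per_line=8 is an assumption (64-byte line, 8-byte word).
    """
    owners = {}
    for core in range(n_cores):
        for index in partition(n_words, n_cores, core):
            owners.setdefault(index // words_per_line, set()).add(core)
    return sum(1 for cores in owners.values() if len(cores) > 1)


if __name__ == "__main__":
    for name, partition in (("interleaved", interleaved_indices),
                            ("contiguous", contiguous_indices)):
        print(name, shared_lines(partition, n_words=64, n_cores=4))
```

With the interleaved split every cache line is contended by every core, while the contiguous split leaves each line with a single writer, which is the architectural point being made.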

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Oct 25, 2024 10:18 am

fguerraz wrote:
Fri Oct 25, 2024 9:52 am
I'm supposed to get the latest firmware from https://github.com/raspberrypi/rpi-eeprom/ right? master branch?
No - that gets the latest bootloader (reported by "vcgencmd bootloader_version", which you had updated correctly).
The firmware (start*.elf/fixup*.dat) comes from here. (And its version is reported by "vcgencmd version").

Note: Pi4 has bootloader and firmware. Pi5 only has bootloader.
rpi-update on RPiOS updates both.
Second far less important question: why 2 fake numa nodes if the optimal number is 8 according to your previous comment?
Read this post.
Default numbers for Pi4 and Pi5 (and 1GB/2GB vs 4GB/8GB) are different.

fguerraz
Posts: 14
Joined: Mon Jun 03, 2024 9:15 am

Re: NUMA Testing

Fri Oct 25, 2024 10:30 am

dom wrote:
Fri Oct 25, 2024 10:18 am
No - that gets the latest bootloader (reported by "vcgencmd bootloader_version", which you had updated correctly).
The firmware (start*.elf/fixup*.dat) comes from here. (And its version is reported by "vcgencmd version").

Note: Pi4 has bootloader and firmware. Pi5 only has bootloader.
rpi-update on RPiOS updates both.
That makes sense now! Thank you so much
dom wrote:
Fri Oct 25, 2024 10:18 am
Second far less important question: why 2 fake numa nodes if the optimal number is 8 according to your previous comment?

Read this post.
Default numbers for Pi4 and Pi5 (and 1GB/2GB vs 4GB/8GB) are different.
I did the maths and it checks out, all clear now.

As a side note for me 8 is too many, I've got memory allocation failures when attempting to do nfs mounts. So I would say it's probably a bad default.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Oct 25, 2024 10:59 am

fguerraz wrote:
Fri Oct 25, 2024 10:30 am
As a side note for me 8 is too many, I've got memory allocation failures when attempting to do nfs mounts. So I would say it's probably a bad default.
It is not the default for Pi4 (which is 2). If you have a large cma region configured you may hit problems (cma must fit into a single numa region).

In general significant cma (e.g. more than 64M) is not needed on Pi5, where all hardware blocks have iommus and can use non-contiguous system memory.

Significant cma may be needed on earlier Pis if you are using the hardware blocks (camera and hw video decode).
3D did have an iommu on Pi4 (but not earlier), so again it doesn't need significant cma.
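
A back-of-the-envelope check of the "cma must fit into a single numa region" constraint might look like the sketch below. It assumes the fake nodes split memory into equal parts, which is a simplification (the real layout depends on the kernel, on memory holes and on where the cma region is reserved). The function names are hypothetical.

```python
def node_size_mb(total_mb, fake_nodes):
    """Approximate size of one fake NUMA node, assuming an equal split."""
    return total_mb // fake_nodes


def cma_fits_in_one_node(total_mb, fake_nodes, cma_mb):
    """True if a cma region of cma_mb could fit inside a single fake node."""
    return cma_mb <= node_size_mb(total_mb, fake_nodes)


if __name__ == "__main__":
    # Example figures only: an 8GB board with 8 fake nodes and 512MB of cma,
    # then a 2GB board with the same node count and cma size.
    print(cma_fits_in_one_node(8192, 8, 512))
    print(cma_fits_in_one_node(2048, 8, 512))
```

Under that equal-split assumption, a large cma region on a small-memory board with many fake nodes is the combination most likely to hit the limit.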

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Fri Oct 25, 2024 2:35 pm

mby wrote:
Fri Oct 25, 2024 9:56 am
dom wrote:
Fri Oct 25, 2024 9:50 am
(...) Let me have a go at 6.12.
Thank you, @dom!
See https://github.com/raspberrypi/linux/pull/6443

mby
Posts: 116
Joined: Sat Dec 15, 2018 3:05 pm

Re: NUMA Testing

Fri Oct 25, 2024 4:31 pm

Thank you so much, @dom, terrific!

Initial results: compiles great, cat /proc/cmdline produces numa=fake=8, as expected, and performance has definitely improved (need to establish a comparable system, yet).
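
For anyone wanting to script the same check, here is a minimal sketch that pulls the node count out of a kernel command line. It only handles the plain integer form seen in this thread (numa=fake=8); the function name is made up for the example.

```python
import re


def fake_numa_nodes(cmdline):
    """Return N from a 'numa=fake=N' argument, or None if it is absent."""
    match = re.search(r"(?:^|\s)numa=fake=(\d+)(?=\s|$)", cmdline)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # /proc/cmdline exists on Linux; elsewhere this will raise.
    with open("/proc/cmdline") as handle:
        print(fake_numa_nodes(handle.read()))
```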

Craig1234
Posts: 49
Joined: Sun May 30, 2021 5:24 pm

Re: NUMA Testing

Sat Oct 26, 2024 6:16 pm

Thank you.
I have been using it for a few days. It works really well: no problems, and very fast.

Mikael
Posts: 127
Joined: Wed Feb 11, 2015 12:35 pm

Re: NUMA Testing

Mon Oct 28, 2024 8:03 am

I've done some additional testing on my 8GB board. I've been comparing results from my initial testing of this board back when it was released, trying to replicate conditions as best as I could. I've seen no regressions so far, except for the previously mentioned synthetic memory write tests. The performance uplift compared to the original state back in November last year is really quite impressive. Browser tests and compilation seem to have received very healthy improvements (often >10%), although I guess this could be in part due to browser/compiler updates as well. And Geekbench scores are looking good:

Geekbench 5:
ST: 617 -> 670
MT: 1734 -> 1996

Geekbench 6:
ST: 753 -> 893
MT: 1507 -> 2126

EDIT: Yamagi Quake 2 performance is also up by 6-9 %, but that included using Mesa 24.3.0-devel, so not sure yet what contributes most to the performance improvement there.
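
To put the Geekbench figures above on a common scale, the relative gains can be worked out as in this small sketch. The scores are the ones quoted in this post; the helper name is just for illustration.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# (before, after) scores as quoted in the post above.
scores = {
    "Geekbench 5 ST": (617, 670),
    "Geekbench 5 MT": (1734, 1996),
    "Geekbench 6 ST": (753, 893),
    "Geekbench 6 MT": (1507, 2126),
}

if __name__ == "__main__":
    for name, (before, after) in scores.items():
        print(f"{name}: {percent_gain(before, after):+.1f}%")
```

The multi-threaded results gain the most, with Geekbench 6 MT up by roughly 41%.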

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing

Mon Oct 28, 2024 1:53 pm

Mikael wrote:
Mon Oct 28, 2024 8:03 am
I've done some additional testing on my 8GB board. I've been comparing results from my initial testing of this board back when it was released, trying to replicate conditions as best as I could. I've seen no regressions so far, except for the previously mentioned synthetic memory write tests. The performance uplift compared to the original state back in November last year is really quite impressive. Browser tests and compilation seem to have received very healthy improvements (often >10%), although I guess this could be in part due to browser/compiler updates as well. And Geekbench scores are looking good:

Geekbench 5:
ST: 617 -> 670
MT: 1734 -> 1996

Geekbench 6:
ST: 753 -> 893
MT: 1507 -> 2126

EDIT: Yamagi Quake 2 performance is also up by 6-9 %, but that included using Mesa 24.3.0-devel, so not sure yet what contributes most to the performance improvement there.
Do you still have output for the individual tests that comprise the Geekbench averages?

Mikael
Posts: 127
Joined: Wed Feb 11, 2015 12:35 pm

Re: NUMA Testing

Mon Oct 28, 2024 6:35 pm

ejolson wrote:
Mon Oct 28, 2024 1:53 pm
Do you still have output for the individual tests that comprise the Geekbench averages?
Sure, here are comparison links:

GB5: https://browser.geekbench.com/v5/cpu/co ... e=21923528

GB6: https://browser.geekbench.com/v6/cpu/co ... ne=3448484

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Mon Oct 28, 2024 6:43 pm

Mikael wrote:
Mon Oct 28, 2024 6:35 pm
Sure, here are comparison links:

GB5: https://browser.geekbench.com/v5/cpu/co ... e=21923528

GB6: https://browser.geekbench.com/v6/cpu/co ... ne=3448484
Some hefty boosts there. Like:
HTML5 Browser +207.0%

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing

Tue Oct 29, 2024 6:47 am

dom wrote:
Mon Oct 28, 2024 6:43 pm
Mikael wrote:
Mon Oct 28, 2024 6:35 pm
Sure, here are comparison links:

GB5: https://browser.geekbench.com/v5/cpu/co ... e=21923528

GB6: https://browser.geekbench.com/v6/cpu/co ... ne=3448484
Some hefty boosts there. Like:
HTML5 Browser +207.0%
The average 41 percent improvement seen by

2126/1507=1.41

for GeekBench 6 obtained without changing the executable or hardware is astonishing.

The canine coder growled, it's not a fake NUMA allocator but instead fake UMA hardware. Even so, tails did not stop wagging all day.

While Linux on Xeon and EPYC servers is already NUMA aware due to the sockets and chiplets, I wonder if additional considerations related to the physical layout of the RAM would also lead to better performance. For x86 just 10 percent is equivalent to a generational improvement.

Mikael
Posts: 127
Joined: Wed Feb 11, 2015 12:35 pm

Re: NUMA Testing

Tue Oct 29, 2024 7:31 am

ejolson wrote:
Tue Oct 29, 2024 6:47 am
The average 41 percent improvement seen by

2126/1507=1.41

for GeekBench 6 obtained without changing the executable or hardware is astonishing.
It is indeed a very nice improvement. I will point out that the earlier GB6 test uses version 6.2.1 and the recent one uses 6.3.0. However, according to Primate Labs, the results are comparable as long as the CPU does not have Scalable Matrix Extensions support (which the Pi 5 does not). See https://www.geekbench.com/blog/2024/04/geekbench-63/

For the tests themselves, they were both run on fresh Bookworm installations with force_turbo=1, so the results should be trustworthy. :)

Solskogen
Posts: 236
Joined: Tue Sep 27, 2016 6:07 am

Re: NUMA Testing

Tue Oct 29, 2024 8:26 am

I haven't seen anyone report problems with the NUMA changes. Is it perhaps time to release an official kernel? :-)

fik
Posts: 99
Joined: Thu Jan 17, 2013 1:34 pm

Re: NUMA Testing

Tue Oct 29, 2024 9:18 am

I observe very nice gains on both RPi5 8GB (default numa=8) and RPi4 8GB (default numa=2) for both single and multi-threaded benchmarks:

Code: Select all

benchmark		threads		RPi5	RPi4
Geekbench 6.3.0		1		+14%	+1%
Geekbench 6.3.0		4		+46%	+8%
Stream 5.10 add		1		-2%	-1%
Stream 5.10 add		4		+27%	+7%
bzip2			1		+16%	+6%
bzip3			4		+96%	+12%
gzip			1		+0%	+0%
pigz			4		+28%	-6%
zstd -T0		4		+164%	+36%
xz -T0			4		+21%	+7%
kernel compile		4		+11%	+4%
EDIT: by mistake I had numa=fake=4 for RPi4, re-running the benchmarks now.
Last edited by fik on Wed Oct 30, 2024 12:46 pm, edited 6 times in total.

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing

Tue Oct 29, 2024 3:54 pm

Solskogen wrote:
Tue Oct 29, 2024 8:26 am
I haven't seen anyone report problems with the NUMA changes. Is it perhaps time to release an official kernel? :-)
The idea seems sound but there could be bugs. For example, does dividing up memory using fake NUMA over time lead to not being able to allocate needed DMA buffers? Probably not, but there are many things to test.

As far as I can tell, the main reason Intel went with overclock settings by default on their 13th and 14th generation CPUs is because journalists generally test with default settings and faster is better publicity. The instabilities resulting from electromigration were not such good publicity.

https://www.theverge.com/2024/10/4/2426 ... -cause-fix

As the time of journalists testing the Pi 4 and 5 with default settings is long past, I think it's better to focus on stability and be conservative with changes to the Linux kernel. At the same time, since fake NUMA does not involve any voltage or clocking changes, damage to the Pi is unlikely.

At this point I'd delay using fake NUMA on a NAS or embedded device that is already working well but definitely turn it on for a desktop.
Last edited by ejolson on Tue Oct 29, 2024 4:20 pm, edited 2 times in total.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Tue Oct 29, 2024 4:15 pm

Assuming no serious regressions are reported, the next step will be to enable it by default in rpi-update firmware (i.e. SDRAM_BANKLOW will default to recommended settings) which will get a slightly larger test group.

If there are still no regressions it will go to apt.

It will probably be several weeks before it hits apt - hopefully enough time for any common issues to be reported by rpi-update testers.

cjan
Posts: 1192
Joined: Sun May 06, 2012 12:00 am

Re: NUMA Testing

Tue Oct 29, 2024 6:56 pm

so, what about scx-scheds according to this vlog?
https://www.phoronix.com/news/sched_ext-NUMA-Awareness

ps. Pi4-8G, numa=fake=2.

shi.siudk
Posts: 14
Joined: Thu Apr 06, 2023 8:14 pm

Re: NUMA Testing

Tue Oct 29, 2024 9:49 pm

Hi! Great job!

Tho, I tried the

Code: Select all

SDRAM_BANKLOW=3
out on a Pi 4, rev 1.4, 8GB version (that one is my only 8GB Pi 4). While I got roughly 6-7% higher Geekbench 6 scores, I got some weird stuff going on with the Chromium browser: random black or noisy boxes appeared while I was browsing/scrolling. Without this line, everything works just fine, though with this line I got a ~1.4% performance boost on Geekbench 6.

Here is some other information about the Pi I am currently testing on.

Code: Select all

2024/10/21 15:24:54
version 951e1cc9d8b1d81c0ca1783a0634605616970bc3 (release)
timestamp 1729520694
update-time 1730237791
capabilities 0x0000007f
Linux swayberry 6.6.58-v8+ #1809 SMP PREEMPT Wed Oct 23 11:53:53 BST 2024 aarch64 GNU/Linux
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
DHCP_TIMEOUT=45000
DHCP_REQ_TIMEOUT=4000
TFTP_FILE_TIMEOUT=30000
ENABLE_SELF_UPDATE=1
DISABLE_HDMI=0
BOOT_ORDER=0xf41
coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_headphones=0 numa_policy=interleave snd_bcm2835.enable_headphones=1 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_hdmi=0 smsc95xx.macaddr=E4:5F:01:0A:A7:15 vc_mem.mem_base=0x3eb00000 vc_mem.mem_size=0x3ff00000 console=tty1 root=PARTUUID=43fa19c6-02 rootfstype=ext4 fsck.repair=yes rootwait cfg80211.ieee80211_regdom=CH
Also I overclocked the pi to core 2147MHz and gpu 750MHz. In case that matters.

ejolson
Posts: 13865
Joined: Tue Mar 18, 2014 11:47 am

Re: NUMA Testing

Tue Oct 29, 2024 10:42 pm

shi.siudk wrote:
Tue Oct 29, 2024 9:49 pm
Hi! Great job!

Tho, I tried the

Code: Select all

SDRAM_BANKLOW=3
out on a Pi 4, rev 1.4, 8GB version (that one is my only 8GB Pi 4). While I got roughly 6-7% higher Geekbench 6 scores, I got some weird stuff going on with the Chromium browser: random black or noisy boxes appeared while I was browsing/scrolling. Without this line, everything works just fine, though with this line I got a ~1.4% performance boost on Geekbench 6.

Here is some other information about the Pi I am currently testing on.

Code: Select all

2024/10/21 15:24:54
version 951e1cc9d8b1d81c0ca1783a0634605616970bc3 (release)
timestamp 1729520694
update-time 1730237791
capabilities 0x0000007f
Linux swayberry 6.6.58-v8+ #1809 SMP PREEMPT Wed Oct 23 11:53:53 BST 2024 aarch64 GNU/Linux
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
DHCP_TIMEOUT=45000
DHCP_REQ_TIMEOUT=4000
TFTP_FILE_TIMEOUT=30000
ENABLE_SELF_UPDATE=1
DISABLE_HDMI=0
BOOT_ORDER=0xf41
coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_headphones=0 numa_policy=interleave snd_bcm2835.enable_headphones=1 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_hdmi=0 smsc95xx.macaddr=E4:5F:01:0A:A7:15 vc_mem.mem_base=0x3eb00000 vc_mem.mem_size=0x3ff00000 console=tty1 root=PARTUUID=43fa19c6-02 rootfstype=ext4 fsck.repair=yes rootwait cfg80211.ieee80211_regdom=CH
Also I overclocked the pi to core 2147MHz and gpu 750MHz. In case that matters.
If you turn off the overclock do things run faster?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: NUMA Testing

Wed Oct 30, 2024 10:39 am

shi.siudk wrote:
Tue Oct 29, 2024 9:49 pm
Tho, I tried the

Code: Select all

SDRAM_BANKLOW=3
out on a Pi 4, rev 1.4, 8GB version (that one is my only 8GB Pi 4). While I got roughly 6-7% higher Geekbench 6 scores, I got some weird stuff going on with the Chromium browser: random black or noisy boxes appeared while I was browsing/scrolling. Without this line, everything works just fine, though with this line I got a ~1.4% performance boost on Geekbench 6.
...
Also I overclocked the pi to core 2147MHz and gpu 750MHz. In case that matters.
Can you suggest a web page that reliably gives issues when browsed?
Does just visiting that page (after a reboot) show the issue, or does it require something more elaborate?
Can you disable the overclock and check you still see it?
After seeing it, are there any new messages in dmesg (e.g. allocation failures?)
Report

Code: Select all

cat /proc/meminfo
after seeing the issue.

shi.siudk
Posts: 14
Joined: Thu Apr 06, 2023 8:14 pm

Re: NUMA Testing

Wed Oct 30, 2024 6:26 pm

dom wrote:
Wed Oct 30, 2024 10:39 am
Can you suggest a web page that reliably gives issues when browsed?
Does just visiting that page (after a reboot) show the issue, or does it require something more elaborate?
Can you disable the overclock and check you still see it?
After seeing it, are there any new messages in dmesg (e.g. allocation failures?)
Report

Code: Select all

cat /proc/meminfo
after seeing the issue.
Ugh... So this is kinda really weird... I did an apt upgrade today before I read this reply, and I cannot replicate the problem anymore after changing the parameter back to SDRAM_BANKLOW=3.

But I remember that I checked meminfo when the issue occurred, especially the CMA part, and it was normal as usual. Below is the current meminfo output with almost identical tabs open. The black/noisy boxes issue is gone.

One thing I think I have to mention: I use zram for swap, and I am not sure whether the swap was occupied when the problem occurred.

But to be honest, I now think that the behavior was not caused by the NUMA patch. Sorry for not doing an extensive test before I replied.

Code: Select all

cat /proc/meminfo
MemTotal: 8008644 kB
MemFree: 5436236 kB
MemAvailable: 6325684 kB
Buffers: 54160 kB
Cached: 1162392 kB
SwapCached: 0 kB
Active: 1812344 kB
Inactive: 370132 kB
Active(anon): 1246260 kB
Inactive(anon): 0 kB
Active(file): 566084 kB
Inactive(file): 370132 kB
Unevictable: 157844 kB
Mlocked: 136 kB
SwapTotal: 1806524 kB
SwapFree: 1806524 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 432 kB
Writeback: 0 kB
AnonPages: 1123812 kB
Mapped: 445004 kB
Shmem: 280336 kB
KReclaimable: 44024 kB
Slab: 95048 kB
SReclaimable: 44024 kB
SUnreclaim: 51024 kB
KernelStack: 10080 kB
PageTables: 31124 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5810844 kB
Committed_AS: 11359092 kB
VmallocTotal: 257687552 kB
VmallocUsed: 42752 kB
VmallocChunk: 0 kB
Percpu: 720 kB
CmaTotal: 524288 kB
CmaFree: 478340 kB
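
When comparing dumps like the one above before and after an issue, a small parser can help. This is a minimal sketch that assumes the usual "Key: value kB" layout of /proc/meminfo; the function name and the choice to skip non-matching lines are illustrative, not part of any existing tool.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a colon, e.g. a pasted shell command
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    # /proc/meminfo exists on Linux; elsewhere this will raise.
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    if "CmaTotal" in info and "CmaFree" in info:
        print("CMA in use (kB):", info["CmaTotal"] - info["CmaFree"])
```

Applied to the figures posted above, CmaTotal minus CmaFree gives 45948 kB of cma in use.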
