rpi4's bare-metal hashing performance is poor without caching #155
-
Disclaimer: I'm assuming this topic can be discussed here. If not, please let me know and I will remove it.
Question: I've run into an odd issue. I'm working on a secure bootloader that's written entirely in Rust. Most of the boot code for the rpi4 is from this repo. I managed to get all the pieces working, but I've run into a strange performance issue. The gist of it is:
- When I compute the hash of a large file (around 30MB) on a Raspberry Pi 4, it takes far too long: I'm expecting a 30MB file to be hashed in about 3 seconds, but it takes about 36-40 seconds.
- My bare-metal bootloader's results are way off when compared with OpenSSL and the sha2 crate running on a standard Linux OS on a Raspberry Pi 4: the hashing speed is 121 MiB/s for OpenSSL and 82 MiB/s for sha2, which translates to less than 3 seconds for a 30MB file.
- My suspicion is that it's some kind of hardware misconfiguration, but I can't seem to figure it out.
I'm hoping folks here who have more experience with the rpi can offer some insight into what's probably missing or wrong.
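To make the gap concrete, here is a minimal sketch of the arithmetic behind the numbers above. It assumes "30MB" means 30 * 10^6 bytes and uses 36 seconds (the lower end of the observed range) for the bare-metal run; the function names are illustrative only.

```rust
/// Seconds needed to hash `size_bytes` at a sustained `mib_per_s` throughput.
fn expected_secs(size_bytes: u64, mib_per_s: f64) -> f64 {
    size_bytes as f64 / (mib_per_s * 1024.0 * 1024.0)
}

/// Effective throughput in MiB/s when `size_bytes` are hashed in `secs` seconds.
fn observed_mib_per_s(size_bytes: u64, secs: f64) -> f64 {
    size_bytes as f64 / (1024.0 * 1024.0) / secs
}

fn main() {
    // Assumption: "30MB" is taken as 30 * 10^6 bytes.
    let size: u64 = 30_000_000;

    // Throughput figures quoted for Linux on the same board.
    println!("OpenSSL @ 121 MiB/s: {:.2} s", expected_secs(size, 121.0));
    println!("sha2    @  82 MiB/s: {:.2} s", expected_secs(size, 82.0));

    // Bare-metal run: 36 s is the lower end of the observed 36-40 s range.
    println!("bare metal, 36 s: {:.2} MiB/s", observed_mib_per_s(size, 36.0));
}
```

Under these assumptions the bare-metal run is hashing at well under 1 MiB/s, which points at memory access cost rather than raw core frequency.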
A link to the implementation. The boot code is in /boards/bootloaders/rpi4/src/boot.rs
Serial output from an rpi4: as you can see from the logs below, computing the hashes of the kernel and ramdisk takes an additional 80 seconds (give or take).
boards\bootloaders\rpi4 on main is 📦 v0.1.0 via 🦀 v1.61.0-nightly
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
[ 2.170921] EMMC2 driver initialized...
....
....
....
[ 42.699906] loaded fit: 62202019 bytes, starting at addr: 0x200000
[ 42.703127] authenticating fit-image...
[ 42.712671] [INFO] computing "kernel" hash
[ 42.714672] - rustBoot::dt::fit @ line:289
[ 78.644641] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 78.652289] - rustBoot::dt::fit @ line:294
[ 78.657293] [INFO] kernel integrity consistent with supplied itb...
[ 78.664885] - rustBoot::dt::fit @ line:306
[ 78.670539] [INFO] computing "fdt" hash
[ 78.674268] - rustBoot::dt::fit @ line:289
[ 78.710473] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 78.717861] - rustBoot::dt::fit @ line:294
[ 78.722847] [INFO] fdt integrity consistent with supplied itb...
[ 78.730197] - rustBoot::dt::fit @ line:306
[ 78.735997] [INFO] computing "ramdisk" hash
[ 78.739927] - rustBoot::dt::fit @ line:289
[ 119.074666] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 119.082401] - rustBoot::dt::fit @ line:294
[ 119.087369] [INFO] ramdisk integrity consistent with supplied itb...
[ 119.095084] - rustBoot::dt::fit @ line:306
[ 119.101018] [INFO] computing "rbconfig" hash
[ 119.104902] - rustBoot::dt::fit @ line:289
[ 119.110001] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 119.120365] - rustBoot::dt::fit @ line:294
[ 119.125330] [INFO] rbconfig integrity consistent with supplied itb...
[ 119.133135] - rustBoot::dt::fit @ line:306
######## ecdsa signature checks out, image is authentic ########
[ 120.415416] relocating kernel to addr: 0x4200000
[ 121.660402] relocating initrd to addr: 0x6200000
[ 121.662056] load rbconfig...
[ 121.666328] patching dtb...
[ 121.671186] relocating dtb to addr: 0x6000000
***************************************** Starting kernel ********************************************
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
-
I presume the file is already entirely copied to RAM when your loader does computation on it?
Do you have virtual memory and caching enabled?
-
Yes, the file (to be hashed) is loaded into RAM.
The MMU is disabled, so no virtual memory. I assume by caching you mean the d-cache. If so, that's not enabled either (one of the goals is to ensure that the bootloader has the smallest possible trusted computing base).
But that's an interesting point. I assumed the only variable to consider was the single-core frequency. Would enabling them improve performance?
If so, I'd be curious to know why.
-
Then you have your case, I'd say.
Your hashing code will inevitably use some temporary storage (on the stack) when doing its computation. Having that readily available in the cache will boost performance.
Caches are filled in units of the cache-line size (usually 64 bytes on AArch64 CPUs). So for every load, you get the next few bytes "for free".
Also, when you operate on a file that is laid out sequentially in memory, the CPU's prefetchers will most likely kick in and preload even more of the upcoming data in the background.
The I-cache will help for similar reasons.
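The "for free" point can be put into numbers with a rough model. The sketch below assumes 64-byte cache lines, 8-byte loads and a buffer starting at a line-aligned address; it ignores prefetching and write traffic, so it only illustrates the ratio of loads to line fills, not real timings.

```rust
/// Assumed cache-line size, per the "usually 64 bytes" figure above.
const CACHE_LINE_BYTES: u64 = 64;

/// Number of cache-line fills needed to stream `len` bytes starting at `addr`.
fn line_fills(addr: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first_line = addr / CACHE_LINE_BYTES;
    let last_line = (addr + len - 1) / CACHE_LINE_BYTES;
    last_line - first_line + 1
}

/// Number of loads issued when reading `len` bytes, `word` bytes at a time.
fn loads(len: u64, word: u64) -> u64 {
    (len + word - 1) / word
}

fn main() {
    // Assumption: a 30 * 10^6 byte buffer at 0x200000, read with 8-byte loads.
    let (addr, len) = (0x20_0000u64, 30_000_000u64);
    let n_loads = loads(len, 8);
    let n_fills = line_fills(addr, len);
    println!("loads issued:     {n_loads}");
    println!("cache-line fills: {n_fills}");
    println!("loads served per fill: {}", n_loads / n_fills);
}
```

With the data cache off, every one of those loads has to go out to memory; with it on, only the line fills do, and the rest hit in the cache.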
-
Makes sense. I'll test this and report back. Thank you!
-
Thank you, this helps. I'm not aware of identity maps. Would you know of any reading material that I could use to learn more?
-
yeah, was just looking at this. thanks again.
-
So, I tried this. It kind of works, but I'm a bit stuck.
I added the MMU-specific code from exercise-10 to my bootloader. It seemed to crash right away, so I moved the snippet of code that enables the MMU and caching to right after we acquire logging capabilities.
The output below indicates that attempting to modify the SCTLR_EL1 register crashes the entire system. The odd thing is that it doesn't panic either. The red status LED turns on and stays on (until we perform a hard reset).
PS: I captured the register's value just before we try to modify it: 0xc50838. I cross-checked it against ARM's register docs and couldn't find anything wrong with it.
boards\bootloaders\rpi4 on main [✘!?] is 📦 v0.1.0 via 🦀 v1.61.0-nightly
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
[ 1.696665] EMMC: reset card.
[ 1.696758] control1: 16143
[ 1.699378] Divisor = 63, Freq Set = 396825
[ 2.106809] CSD Contents : 00 40 0e 00 32 5b 59 00 00ed c8 7f 80 0a 40 40
[ 2.110637] cemmc_structure=1, spec_vers=0, taac=0x0E, nsac=0x00, tran_speed=0x32,ccc=0x05B5, read_bl_len=0x09, read_bl_partial=0b, write_blk_misalign=0b,read_blk_misalign=0b, dsr_imp=0b, sector_size =0x7F, erase_blk_en=1b
[ 2.130268] CSD 2.0: ver2_c_size = 0xEFFC, card capacity: 31914459136 bytes or 31.91GiB
[ 2.138174] wp_grp_size=0x0000000b, wp_grp_enable=0b, default_ecc=00b, r2w_factor=010b, write_bl_len=0x09, write_bl_partial=0b, file_format_grp=0, copy=1b, perm_write_protect=0b, tmp_write_protect=0b, file_format=0b ecc=00b
[ 2.157897] control1: 271
[ 2.160414] Divisor = 1, Freq Set = 25000000
[ 2.166935] EMMC: Bus width set to 4
[ 2.168068] EMMC: SD Card Type 2 HC, 30436Mb, mfr_id: 3, 'SD:ACLCD', r8.0, mfr_date: 1/2017, serial: 0xbbce119c, RCA: 0xaaaa
[ 2.179179] EMMC2 driver initialized...
[ 2.183002] mmu not enabled check
[ 2.186216] translation granularity supported
[ 2.190473] MAIR_EL1 set
[ 2.473125] translation tables populated
[ 2.474084] TTBR0_EL1 SET
[ 2.476603] TCR SET
[ 2.478601] first isb passed
[ 2.481381] SCTLR_EL1: c50838
-
OK, I compiled exercise-10 and flashed the (kernel8) image onto my rpi4. It works as expected.
Note: I moved the MMU activation code so that we're able to log the activation flow.
[ 0.007482] MAIR_EL1: 0xff04
[ 0.075640] Special regions:
[ 0.075698] 0x00080000 - 0x0008ffff | 64 KiB | C RO PX | Kernel code and RO data
[ 0.076652] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 0.077638] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 0.078527] BASE ADDR: 0x120000
[ 0.078905] TTBR0_EL1: 0x120000
[ 0.079285] TCR_EL1: 0x200807520
[ 0.079675] SCTLR_EL1: 0xc50838
[ 0.080054] After enabling MMU, SCTLR_EL1: 0xc5183d
[ 0.080648] mingo version 0.10.0
[ 0.081038] Booting on: Raspberry Pi 4
[ 0.081493] MMU online. Special regions:
[ 0.081970] 0x00080000 - 0x0008ffff | 64 KiB | C RO PX | Kernel code and RO data
[ 0.082988] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 0.083974] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 0.084862] Current privilege level: EL1
[ 0.085339] Exception handling state:
[ 0.085783] Debug: Masked
[ 0.086173] SError: Masked
[ 0.086563] IRQ: Masked
[ 0.086953] FIQ: Masked
[ 0.087343] Architectural timer resolution: 18 ns
[ 0.087917] Drivers loaded:
[ 0.088253] 1. BCM GPIO
[ 0.088610] 2. BCM PL011 UART
[ 0.089033] Timer test, spinning for 1 second
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 1.089900] Echoing input now
However, when I copy and paste the same MMU code from exercise-10 into my bootloader, it ends up crashing the entire system (perplexing).
Note:
- All MMU-related code has been pulled into a single folder called memory.
- I think I've tried every possible way of setting the relevant fields in the SCTLR_EL1 register (i.e. set, write, modify, modify_on_read) and even tried writing the raw value into the register, but I can't seem to get it to work.
I plan on getting a hardware debugger.
But in the meantime, any thoughts on what I'm doing wrong here?
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
......
......
[ 2.211136] MAIR_EL1: 0xff04
[ 2.485412] translation tables populated
[ 2.486370] Special regions:
[ 2.489151] 0x00080000 - 0x000a2fff | 140 KiB | C RO PX | Kernel code and RO data
[ 2.497317] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.505223] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.512347] BASE ADDR: 0x280000
[ 2.515387] TTBR0_EL1: 0x280000
[ 2.518427] TCR_EL1: 0x200807520
[ 2.521555] first isb passed
[ 2.524334] SCTLR_EL1: 0xc50838
[ 2.527375] new SCTLR_EL1: 0xc5183d
[ ---- crashes ----- a red led turns on and stays on
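For readers following the register values in these logs, a small sketch that works out which SCTLR_EL1 bits differ between 0xc50838 and 0xc5183d. The bit positions for M (bit 0), C (bit 2) and I (bit 12) are from the Arm architecture; the constant and function names are illustrative, not taken from the bootloader.

```rust
const SCTLR_M: u64 = 1 << 0; // MMU enable
const SCTLR_C: u64 = 1 << 2; // data cacheability control
const SCTLR_I: u64 = 1 << 12; // instruction cacheability control

/// Bits that differ between two register snapshots.
fn changed_bits(before: u64, after: u64) -> u64 {
    before ^ after
}

fn main() {
    // Values as printed in the serial logs.
    let before: u64 = 0xc50838;
    let after: u64 = 0xc5183d;

    let diff = changed_bits(before, after);
    println!("changed bits: {diff:#x}");
    println!("M set: {}", after & SCTLR_M != 0);
    println!("C set: {}", after & SCTLR_C != 0);
    println!("I set: {}", after & SCTLR_I != 0);
}
```

The only bits that change are M, C and I, so the value being written is the intended one.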
-
I objdump'd my ELF binary and I think I understand the cause of the error.
As suggested, I examined the contents of address 0x09fa6c (which holds the faulting instruction) and observed the following:
- The instruction is part of the write_str subroutine.
- It attempts to store the contents of the x10 register to the address [x8 + 8].
The value in x8 at the time of the exception is 0xad018, which happens to be an address in the .data section (it contains the memory-mapped address of the PL011_UART peripheral). However, as we're adding an offset of 8 to x8, the faulting instruction attempts to store the contents of x10 to address 0xad020, which results in a (level-3 table) permission fault.
- Note: FAR_EL1 also contains 0xad020.
So my previous suspicion that it had something to do with writing to SCTLR_EL1 turned out to be wrong. The relevant SCTLR_EL1 bits are set and the MMU is enabled, but later on, when we try to log or print anything to serial output, we get the above (bad-write) exception.
A couple of things that I haven't figured out:
- Why does printing fail only after enabling the MMU? I noticed we're able to log a single character, [, just before the panic.
- I have not been able to figure out the exact execution path in kernel_main. I see that the execution flow passes from kernel_init to kernel_main, but how we end up in write_str is still a mystery, or at least I'm not sure we can answer that with just static code analysis.
- I've attached the full bootloader_objdump.s file, in case it is required.
- The other thing is: why do we get a write-permission fault? Address 0x0ad020 is in the .data section, which should be writable, right?
Would adding an extra table descriptor to the LAYOUT for the .data section and making it ReadWrite-able solve this?
[ 2.130593] mmu not enabled check
[ 2.130994] translation granularity supported
[ 2.131525] MAIR_EL1 set
[ 2.131828] MAIR_EL1: 0xff04
[ 2.226713] translation tables populated
[ 2.226837] Special regions:
[ 2.227183] 0x00080000 - 0x000acfff | 180 KiB | C RO PX | Kernel code and RO data
[ 2.228202] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.229187] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.230076] BASE ADDR: 0x280000
[ 2.230455] TTBR0_EL1: 0x280000
[ 2.230834] TCR_EL1: 0x200807520
[ 2.231224] first isb passed
[ 2.231571] SCTLR_EL1: 0xc50838
[ 2.231950] new SCTLR_EL1: 0xc5183d
[[ 2.232384] Kernel panic!
Panic location: File 'hal\src\rpi\rpi4\exception\exception.rs', line 64, column 5
CPU Exception!
ESR_EL1: 0x9600004f
  Exception Class (EC) : 0x25 - Data Abort, current EL
  Instr Specific Syndrome (ISS): 0x4f
FAR_EL1: 0x00000000000ad020
SPSR_EL1: 0x600003c5
  Flags:
    Negative (N): Not set
    Zero (Z): Set
    Carry (C): Set
    Overflow (V): Not set
  Exception handling state:
    Debug (D): Masked
    SError (A): Masked
    IRQ (I): Masked
    FIQ (F): Masked
  Illegal Execution State (IL): Not set
ELR_EL1: 0x000000000009fa6c
General purpose register:
  x0 : 0x000000000007fbf0 x1 : 0x00000000000ac111
  x2 : 0x0000000000000003 x3 : 0x000000000009fa50
  x4 : 0x0000000000000006 x5 : 0x000000000007ff44
  x6 : 0x000000000007ff48 x7 : 0x000000000007ff4c
  x8 : 0x00000000000ad018 x9 : 0x00000000000ac113
  x10: 0x00000000000006bb x11: 0x00000000fe201000
  x12: 0x0000000000000009 x13: 0x00000000000a6bf8
  x14: 0x0000000000000006 x15: 0x0000000000000057
  x16: 0x000000000007fc8b x17: 0x0000000000000005
  x18: 0x0000000000000002 x19: 0x000000000007f540
  x20: 0x0000000000000005 x21: 0x0000000000000118
  x22: 0x00000000000a6b88 x23: 0x00000000000aab70
  x24: 0x0000000000081d88 x25: 0x00000000000abea0
  x26: 0x0000000100000000 x27: 0x000000000007f7e0
  x28: 0x0000000000081480 x29: 0x0000000000081ddc
  lr : 0x0000000000082c14
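The ESR_EL1 value in the dump can be decoded by hand. The sketch below follows the architectural field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits 5:0). The struct and function names are made up for illustration and are not the bootloader's own exception code.

```rust
#[derive(Debug, PartialEq)]
struct Esr {
    ec: u8,   // Exception Class, bits [31:26]
    il: bool, // Instruction Length, bit 25
    iss: u32, // Instruction Specific Syndrome, bits [24:0]
}

fn decode_esr(esr: u32) -> Esr {
    Esr {
        ec: ((esr >> 26) & 0x3f) as u8,
        il: (esr >> 25) & 1 == 1,
        iss: esr & 0x01ff_ffff,
    }
}

/// For data aborts: (was it a write?, fault status code).
fn data_abort_info(iss: u32) -> (bool, u8) {
    ((iss >> 6) & 1 == 1, (iss & 0x3f) as u8)
}

fn main() {
    let esr = decode_esr(0x9600_004f); // value from the panic output
    let (is_write, fsc) = data_abort_info(esr.iss);
    println!("EC = {:#x}, ISS = {:#x}", esr.ec, esr.iss);
    // A fault status code of 0b001111 is "permission fault, level 3".
    println!("write access: {is_write}, fault status code: {fsc:#08b}");
}
```

Decoded this way, the syndrome agrees with the analysis above: a write access that hit a level-3 permission fault.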
-
I think I spotted it. Your end of code section and start of data section is not 64KiB aligned, but that is the paging granularity. You get the permission fault because start of your data is still covered by the last code page, which is mapped RO.
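A quick way to check this against the numbers in the log is sketched below. It assumes the 64 KiB granule and takes 0xacfff as the last byte of the read-only region and 0xad020 as the faulting address; the helper names are illustrative.

```rust
const GRANULE: u64 = 64 * 1024; // 64 KiB translation granule

/// Start address of the page containing `addr`.
fn align_down(addr: u64) -> u64 {
    addr & !(GRANULE - 1)
}

/// Smallest page-aligned address that is >= `addr`.
fn align_up(addr: u64) -> u64 {
    (addr + GRANULE - 1) & !(GRANULE - 1)
}

/// True if both addresses fall into the same 64 KiB page.
fn same_page(a: u64, b: u64) -> bool {
    align_down(a) == align_down(b)
}

fn main() {
    let code_end: u64 = 0xacfff; // last byte of "Kernel code and RO data"
    let fault_addr: u64 = 0xad020; // FAR_EL1 from the panic output

    println!("page of code end:   {:#x}", align_down(code_end));
    println!("page of fault addr: {:#x}", align_down(fault_addr));
    println!("same page: {}", same_page(code_end, fault_addr));
    println!("first page-aligned start for .data: {:#x}", align_up(code_end + 1));
}
```

Both addresses land in the page starting at 0xa0000, so the write to .data inherits the read-only mapping of the code region.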
-
Ah, makes sense. I'll change that and report back.
-
Yep, that works 🙌🏾. I added a (64KiB) alignment constraint to the .data section of the linker script.
.data : ALIGN(65536) { *(.data*) } :segment_data
As for the original issue, performance is way better than I could have hoped for. What took 100 seconds before now completes in less than 1.5 seconds, and that includes
- hashing and validating the integrity of 4 files (with a total size of 62MB) along with verifying an ECC signature.
- I guess caching is a wondrous thing (until it is not).
I run into an instruction abort exception at the end. The faulting instruction is at address 0x4600000, located in the .bss section (which is where I've loaded the Linux kernel). It's another permission fault. I guess the fix here is to mark the kernel load range as a special region with read + execute permissions, right?
[ 1.714893] EMMC: reset card.
[ 1.714982] control1: 16143
[ 1.715250] Divisor = 63, Freq Set = 396825
[ 2.119087] CSD Contents : 00 40 0e 00 32 5b 59 00 00ed c8 7f 80 0a 40 40
[ 2.119571] cemmc_structure=1, spec_vers=0, taac=0x0E, nsac=0x00, tran_speed=0x32,ccc=0x05B5, read_bl_len=0x09, read_bl_partial=0b, write_blk_misalign=0b,read_blk_misalign=0b, dsr_imp=0b, sector_size =0x7F, erase_blk_en=1b
[ 2.122018] CSD 2.0: ver2_c_size = 0xEFFC, card capacity: 31914459136 bytes or 31.91GiB
[ 2.123004] wp_grp_size=0x0000000b, wp_grp_enable=0b, default_ecc=00b, r2w_factor=010b, write_bl_len=0x09, write_bl_partial=0b, file_format_grp=0, copy=1b, perm_write_protect=0b, tmp_write_protect=0b, file_format=0b ecc=00b
[ 2.125465] control1: 271
[ 2.125778] Divisor = 1, Freq Set = 25000000
[ 2.128635] EMMC: Bus width set to 4
[ 2.128721] EMMC: SD Card Type 2 HC, 30436Mb, mfr_id: 3, 'SD:ACLCD', r8.0, mfr_date: 1/2017, serial: 0xbbce119c, RCA: 0xaaaa
[ 2.130102] EMMC2 driver initialized...
[ 2.232355] rpi4 version 0.1.0
[ 2.232721] Booting on: Raspberry Pi 4
[ 2.233176] MMU online. Special regions:
[ 2.233653] 0x00080000 - 0x000acfff | 180 KiB | C RO PX | Kernel code and RO data
[ 2.234671] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.235657] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.236546] Current privilege level: EL1
[ 2.237022] Exception handling state:
[ 2.237466] Debug: Masked
[ 2.237856] SError: Masked
[ 2.238246] IRQ: Masked
[ 2.238636] FIQ: Masked
[ 2.239026] Architectural timer resolution: 18 ns
[ 2.239600] Drivers loaded:
[ 2.239936] 1. BCM GPIO
[ 2.240294] 2. BCM PL011 UART
[ 2.240716] Chars written: 2494
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 2.241790] [INFO] create new emmc-fat controller...
[ 2.242504] - rustBoot::fs::controller @ line:200
[ 2.247239] Listing root directory:
[ 2.250831] - Found: SIGNED~1.ITB
[ 2.251027] loading fit-image...
[ 33.920214] loaded fit: 62202019 bytes, starting at addr: 0x600000
[ 33.920617] authenticating fit-image...
[ 33.921360] [INFO] computing "kernel" hash
[ 33.921830] - rustBoot::dt::fit @ line:289
[ 34.612911] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 34.613864] - rustBoot::dt::fit @ line:294
[ 34.614467] [INFO] kernel integrity consistent with supplied itb...
[ 34.615435] - rustBoot::dt::fit @ line:308
[ 34.616054] [INFO] computing "fdt" hash
[ 34.616605] - rustBoot::dt::fit @ line:289
[ 34.617811] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 34.618732] - rustBoot::dt::fit @ line:294
[ 34.619333] [INFO] fdt integrity consistent with supplied itb...
[ 34.620270] - rustBoot::dt::fit @ line:308
[ 34.620891] [INFO] computing "ramdisk" hash
[ 34.621484] - rustBoot::dt::fit @ line:289
[ 35.398004] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 35.398968] - rustBoot::dt::fit @ line:294
[ 35.399570] [INFO] ramdisk integrity consistent with supplied itb...
[ 35.400550] - rustBoot::dt::fit @ line:308
[ 35.401174] [INFO] computing "rbconfig" hash
[ 35.401774] - rustBoot::dt::fit @ line:289
[ 35.402376] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 35.403702] - rustBoot::dt::fit @ line:294
[ 35.404303] [INFO] rbconfig integrity consistent with supplied itb...
[ 35.405295] - rustBoot::dt::fit @ line:308
######## ecdsa signature checks out, image is authentic ########
[ 35.434296] relocating kernel to addr: 0x4600000
[ 35.456677] relocating initrd to addr: 0x6400000
[ 35.456885] load rbconfig...
[ 35.457266] patching dtb...
[ 35.457772] relocating dtb to addr: 0x400000
***************************************** Starting kernel ********************************************
[ 35.459487] Kernel panic!
Panic location: File 'hal\src\rpi\rpi4\exception\exception.rs', line 64, column 5
CPU Exception!
ESR_EL1: 0x8600000f
  Exception Class (EC) : 0x21 - N/A
  Instr Specific Syndrome (ISS): 0xf
FAR_EL1: 0x0000000004600000
SPSR_EL1: 0x600003c5
  Flags:
    Negative (N): Not set
    Zero (Z): Set
    Carry (C): Set
    Overflow (V): Not set
  Exception handling state:
    Debug (D): Masked
    SError (A): Masked
    IRQ (I): Masked
    FIQ (F): Masked
  Illegal Execution State (IL): Not set
ELR_EL1: 0x0000000004600000
General purpose register:
  x0 : 0x0000000000400000 x1 : 0x0000000000000000
  x2 : 0x0000000000000000 x3 : 0x0000000000000000
  x4 : 0x0000000000000006 x5 : 0x0000000000005ea8
  x6 : 0x0000000000000001 x7 : 0x0000000000000000
  x8 : 0x0000000004600000 x9 : 0x00000000000a7014
  x10: 0x00000000000013de x11: 0x00000000fe201000
  x12: 0x0000000000000019 x13: 0x000000000007f810
  x14: 0x0000000000000000 x15: 0x0000000000000000
  x16: 0x0000000000000030 x17: 0x0000000000000078
  x18: 0x0000000000400000 x19: 0x00000000000b0018
  x20: 0x000000004e650000 x21: 0x0000000000083bec
  x22: 0x00000000000000bc x23: 0x000000003b9aca00
  x24: 0x0000000000000244 x25: 0x00000000000f4240
  x26: 0x00000000000abea0 x27: 0x0000000000006521
  x28: 0x0000000000000264 x29: 0x0000000000081ddc
  lr : 0x000000000008bd5c
-
Well, the first thing that Linux will do is to set up its own page tables. I don’t know by heart what the expectation from a previous boot loader stage is with respect to the architectural state of the memory subsystem.
For starters, I would probably just disable the MMU again before jumping to Linux.
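As a rough illustration of what "disable the MMU again" means at the register level, the sketch below computes the SCTLR_EL1 value with the M and C bits cleared. The Linux arm64 boot protocol expects the MMU and data cache to be off on entry, while the instruction cache may stay on. This is only the bit arithmetic: a real hand-off also needs cache maintenance and barriers around the register write, which are deliberately not shown, and the function name is hypothetical.

```rust
const SCTLR_M: u64 = 1 << 0; // MMU enable
const SCTLR_C: u64 = 1 << 2; // data cacheability control

/// SCTLR_EL1 value with the MMU and data cache switched off.
/// Hypothetical helper: it only computes the value, it does not write the register.
fn sctlr_for_handoff(current: u64) -> u64 {
    current & !(SCTLR_M | SCTLR_C)
}

fn main() {
    let with_mmu: u64 = 0xc5183d; // value logged after enabling MMU and caches
    println!("hand-off value: {:#x}", sctlr_for_handoff(with_mmu));
}
```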
-
Yeah, I completely forgot about this (got lost in MMU translation 😁). I'll need to reset most of the hardware to an early state for Linux to boot.
-
@nihalpasham can you do me a favor and check what the speedup is with instruction caching alone?
Would be a nice datapoint to have.
-
enabled instruction-caching alone.
// Enable the MMU and turn on instruction caching alone.
SCTLR_EL1.modify(SCTLR_EL1::M::Enable + SCTLR_EL1::I::Cacheable);
Results: for the same set of operations
- approx time: 47.5 seconds or half the original amount of time.
.....
.....
[ 2.399815] rpi4 version 0.1.0
[ 2.400183] Booting on: Raspberry Pi 4
[ 2.400638] MMU online. Special regions:
[ 2.401115] 0x00080000 - 0x000a4fff | 148 KiB | C RO PX | Kernel code and RO data
[ 2.402134] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.403119] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.404008] Current privilege level: EL1
[ 2.404484] Exception handling state:
[ 2.404928] Debug: Masked
[ 2.405318] SError: Masked
[ 2.405708] IRQ: Masked
[ 2.406098] FIQ: Masked
[ 2.406488] Architectural timer resolution: 18 ns
[ 2.407062] Drivers loaded:
[ 2.407398] 1. BCM GPIO
[ 2.407756] 2. BCM PL011 UART
[ 2.408178] Chars written: 1793
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 2.409253] [INFO] create new emmc-fat controller...
[ 2.409966] - rustBoot::fs::controller @ line:200
[ 2.414702] Listing root directory:
[ 2.418335] - Found: SIGNED~1.ITB
[ 2.418538] loading fit-image...
[ 34.053223] loaded fit: 62202019 bytes, starting at addr: 0x290000
[ 34.053627] authenticating fit-image...
[ 34.055365] [INFO] computing "kernel" hash
[ 34.055617] - rustBoot::dt::fit @ line:289
[ 56.309267] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 56.310223] - rustBoot::dt::fit @ line:294
[ 56.310875] [INFO] kernel integrity consistent with supplied itb...
[ 56.311793] - rustBoot::dt::fit @ line:308
[ 56.312622] [INFO] computing "fdt" hash
[ 56.312963] - rustBoot::dt::fit @ line:289
[ 56.333355] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 56.334278] - rustBoot::dt::fit @ line:294
[ 56.334926] [INFO] fdt integrity consistent with supplied itb...
[ 56.335816] - rustBoot::dt::fit @ line:308
[ 56.336685] [INFO] computing "ramdisk" hash
[ 56.337030] - rustBoot::dt::fit @ line:289
[ 81.346679] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 81.347646] - rustBoot::dt::fit @ line:294
[ 81.348290] [INFO] ramdisk integrity consistent with supplied itb...
[ 81.349227] - rustBoot::dt::fit @ line:308
[ 81.350135] [INFO] computing "rbconfig" hash
[ 81.350451] - rustBoot::dt::fit @ line:289
[ 81.351205] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 81.352380] - rustBoot::dt::fit @ line:294
[ 81.353024] [INFO] rbconfig integrity consistent with supplied itb...
[ 81.353972] - rustBoot::dt::fit @ line:308
######## ecdsa signature checks out, image is authentic ########
[ 81.556150] relocating kernel to addr: 0x4200000
....
....
enabled data-caching alone.
// Enable the MMU and turn on data caching alone.
SCTLR_EL1.modify(SCTLR_EL1::M::Enable + SCTLR_EL1::C::Cacheable);
Results: for the same set of operations
- approx time: 61.15 seconds
....
....
[ 2.399946] rpi4 version 0.1.0
[ 2.400313] Booting on: Raspberry Pi 4
[ 2.400767] MMU online. Special regions:
[ 2.401245] 0x00080000 - 0x000a4fff | 148 KiB | C RO PX | Kernel code and RO data
[ 2.402263] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.403249] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.404138] Current privilege level: EL1
[ 2.404614] Exception handling state:
[ 2.405058] Debug: Masked
[ 2.405448] SError: Masked
[ 2.405838] IRQ: Masked
[ 2.406228] FIQ: Masked
[ 2.406618] Architectural timer resolution: 18 ns
[ 2.407192] Drivers loaded:
[ 2.407528] 1. BCM GPIO
[ 2.407885] 2. BCM PL011 UART
[ 2.408308] Chars written: 1793
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 2.409383] [INFO] create new emmc-fat controller...
[ 2.410096] - rustBoot::fs::controller @ line:200
[ 2.414937] Listing root directory:
[ 2.419248] - Found: SIGNED~1.ITB
[ 2.419470] loading fit-image...
[ 42.972299] loaded fit: 62202019 bytes, starting at addr: 0x290000
[ 42.972711] authenticating fit-image...
[ 42.977009] [INFO] computing "kernel" hash
[ 42.977267] - rustBoot::dt::fit @ line:289
[ 71.962372] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 71.963334] - rustBoot::dt::fit @ line:294
[ 71.964113] [INFO] kernel integrity consistent with supplied itb...
[ 71.964905] - rustBoot::dt::fit @ line:308
[ 71.966198] [INFO] computing "fdt" hash
[ 71.966423] - rustBoot::dt::fit @ line:289
[ 71.992601] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 71.993531] - rustBoot::dt::fit @ line:294
[ 71.994300] [INFO] fdt integrity consistent with supplied itb...
[ 71.995069] - rustBoot::dt::fit @ line:308
[ 71.996478] [INFO] computing "ramdisk" hash
[ 71.996746] - rustBoot::dt::fit @ line:289
[ 104.578328] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 104.579301] - rustBoot::dt::fit @ line:294
[ 104.580056] [INFO] ramdisk integrity consistent with supplied itb...
[ 104.580881] - rustBoot::dt::fit @ line:308
[ 104.582422] [INFO] computing "rbconfig" hash
[ 104.582700] - rustBoot::dt::fit @ line:289
[ 104.583560] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 104.584629] - rustBoot::dt::fit @ line:294
[ 104.585383] [INFO] rbconfig integrity consistent with supplied itb...
[ 104.586221] - rustBoot::dt::fit @ line:308
######## ecdsa signature checks out, image is authentic ########
[ 106.124374] relocating kernel to addr: 0x4200000
Conclusions:
- For the above set of operations, instruction caching alone contributes roughly a 50% speed-up.
- For the same set of operations, data caching alone contributes roughly a 40% speed-up.
- Cumulatively, i.e. with both instruction and data caching enabled, we get a massive speed-up of around 100x.
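For anyone who wants to recompute these ratios, here is a sketch that derives them from the serial logs in this thread. It measures each run from the "authenticating fit-image..." timestamp to the "relocating kernel" timestamp, which is one possible choice of window; the uncached run comes from an earlier build, so the exact ratios depend on the baseline used and will differ somewhat from the rounded figures in the list.

```rust
// (start, end) timestamps in seconds, copied from the logs in this thread:
// "authenticating fit-image..." to "relocating kernel to addr".
const UNCACHED: (f64, f64) = (42.703127, 120.415416);
const I_CACHE_ONLY: (f64, f64) = (34.053627, 81.556150);
const D_CACHE_ONLY: (f64, f64) = (42.972711, 106.124374);
const BOTH_CACHES: (f64, f64) = (33.920617, 35.434296);

/// Length of a (start, end) window in seconds.
fn elapsed(window: (f64, f64)) -> f64 {
    window.1 - window.0
}

/// How many times faster `new_secs` is than `baseline_secs`.
fn speedup(baseline_secs: f64, new_secs: f64) -> f64 {
    baseline_secs / new_secs
}

fn main() {
    let base = elapsed(UNCACHED);
    for (name, window) in [
        ("I-cache only", I_CACHE_ONLY),
        ("D-cache only", D_CACHE_ONLY),
        ("both caches", BOTH_CACHES),
    ] {
        let t = elapsed(window);
        println!("{name}: {t:.2} s, {:.2}x vs uncached", speedup(base, t));
    }
}
```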
The results kind of make sense, as hashing algorithms are typically implemented in 3 steps: init, update and finalize. The bulk of the work is performed in the update step, where we apply the same operations to new chunks of data, repeatedly.
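The init/update/finalize shape can be sketched with a toy streaming hash. The example below uses 64-bit FNV-1a purely as a stand-in because it fits in a few lines; it is not SHA-256 and not what the bootloader uses. The point is the structure: a small piece of state plus a tight loop in update that runs the same instructions over every chunk, which is exactly the pattern that benefits from both caches.

```rust
/// Toy streaming hasher (64-bit FNV-1a), used only to show the
/// init / update / finalize structure. Not a cryptographic hash.
struct Fnv1a64 {
    state: u64,
}

impl Fnv1a64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    /// init: set up the small, fixed-size state.
    fn new() -> Self {
        Self { state: Self::OFFSET_BASIS }
    }

    /// update: the hot loop, the same operations applied to every byte.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= byte as u64;
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    /// finalize: produce the digest from the state.
    fn finalize(self) -> u64 {
        self.state
    }
}

/// Hash `data` by feeding it to the hasher `chunk` bytes at a time (`chunk` > 0).
fn hash_in_chunks(data: &[u8], chunk: usize) -> u64 {
    let mut hasher = Fnv1a64::new();
    for piece in data.chunks(chunk) {
        hasher.update(piece);
    }
    hasher.finalize()
}

fn main() {
    let data = vec![0xabu8; 1 << 20]; // 1 MiB of sample data
    println!("digest: {:#018x}", hash_in_chunks(&data, 4096));
}
```

Feeding the data in different chunk sizes gives the same digest, since only the state carries over between update calls.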