rpi4's bare-metal hashing performance is poor without caching #155

nihalpasham started this conversation in General

Disclaimer: - I'm assuming this topic can be discussed here. If not, please let me know and I will remove this topic.

Question: I've run into an odd issue. I'm working on a secure bootloader that's written entirely in Rust. Most of the boot code for the rpi4 is from this repo. I managed to get all the pieces working, but I've hit a strange performance issue. The gist of it is:

  • when I compute the hash of a large file (around 30MB) on a Raspberry Pi 4, it takes way too long (i.e. I'm expecting a 30MB file to be hashed in about 3 seconds, but it takes about 36-40 seconds).
  • my bare-metal bootloader's results are way off compared with OpenSSL and the sha2 crate running on a standard Linux OS on the same Raspberry Pi 4: the hashing speed for OpenSSL is 121 MiB/s and for sha2 is 82 MiB/s, which translates to less than 3 seconds for a 30MB file.
  • my suspicion is that it's some kind of hardware misconfiguration, but I can't seem to figure it out.
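
For reference, here is a quick back-of-the-envelope check of those numbers. This is only a sketch: it assumes "30MB" means 30,000,000 bytes, and the helper names are made up for illustration.

```rust
/// Seconds needed to hash `bytes` at a throughput of `mib_per_s` MiB/s.
fn hash_seconds(bytes: u64, mib_per_s: f64) -> f64 {
    bytes as f64 / (mib_per_s * 1024.0 * 1024.0)
}

/// Throughput in MiB/s implied by hashing `bytes` in `seconds`.
fn throughput_mib_per_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / seconds
}
```

With these, `hash_seconds(30_000_000, 121.0)` and `hash_seconds(30_000_000, 82.0)` both come out well under half a second, while `throughput_mib_per_s(30_000_000, 36.0)` is below 1 MiB/s. So the gap to the Linux numbers is around two orders of magnitude, which is more than clock frequency alone would plausibly explain.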

I'm hoping folks here who have more experience with a rpi can offer some insight into what's probably missing/wrong.

A link to the implementation. The boot code is present in /boards/bootloaders/rpi4/src/boot.rs

Serial output from an rpi4: as you can see from the logs below, computing the hashes of the kernel and ramdisk takes an additional 80 secs (give or take).

boards\bootloaders\rpi4 on  main is 📦 v0.1.0 via 🦀 v1.61.0-nightly
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
[ 2.170921] EMMC2 driver initialized...
....
....
....
[ 42.699906] loaded fit: 62202019 bytes, starting at addr: 0x200000
[ 42.703127] authenticating fit-image...
[ 42.712671] [INFO] computing "kernel" hash
[ 42.714672] - rustBoot::dt::fit @ line:289
[ 78.644641] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 78.652289] - rustBoot::dt::fit @ line:294
[ 78.657293] [INFO] kernel integrity consistent with supplied itb...
[ 78.664885] - rustBoot::dt::fit @ line:306
[ 78.670539] [INFO] computing "fdt" hash
[ 78.674268] - rustBoot::dt::fit @ line:289
[ 78.710473] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 78.717861] - rustBoot::dt::fit @ line:294
[ 78.722847] [INFO] fdt integrity consistent with supplied itb...
[ 78.730197] - rustBoot::dt::fit @ line:306
[ 78.735997] [INFO] computing "ramdisk" hash
[ 78.739927] - rustBoot::dt::fit @ line:289
[ 119.074666] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 119.082401] - rustBoot::dt::fit @ line:294
[ 119.087369] [INFO] ramdisk integrity consistent with supplied itb...
[ 119.095084] - rustBoot::dt::fit @ line:306
[ 119.101018] [INFO] computing "rbconfig" hash
[ 119.104902] - rustBoot::dt::fit @ line:289
[ 119.110001] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 119.120365] - rustBoot::dt::fit @ line:294
[ 119.125330] [INFO] rbconfig integrity consistent with supplied itb...
[ 119.133135] - rustBoot::dt::fit @ line:306
######## ecdsa signature checks out, image is authentic ########
[ 120.415416] relocating kernel to addr: 0x4200000
[ 121.660402] relocating initrd to addr: 0x6200000
[ 121.662056] load rbconfig...
[ 121.666328] patching dtb...
[ 121.671186] relocating dtb to addr: 0x6000000
***************************************** Starting kernel ********************************************
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]

Replies: 5 comments 17 replies

Comment options

I presume the file is already entirely copied to RAM when your loader does computation on it?

Do you have virtual memory and caching enabled?

Comment options

Yes, the file (to be hashed) is loaded into RAM.

The MMU is disabled, so no virtual memory. I assume by caching, you mean d-cache. If yes, that's not enabled either. (one of the goals is to ensure that the bootloader has the smallest possible trusted computing base)

But that's an interesting point. I assumed the only variable to consider was the single-core frequency. Would enabling them improve performance?

If yes, I'd be curious to know why?

Comment options

Then you have your case, I'd say.

Your hashing code will inevitably use some temporary storage (on the stack) when doing its computation. Having that readily available in the cache will boost performance.

Caches are filled in units of the cache-line size (usually 64 bytes on AArch64 CPUs). So for every load, you get the next few bytes "for free".

Also, when you operate on a file that is laid out sequentially in memory, the CPU's prefetchers will most likely kick in and pre-load even more of the upcoming data in the background.

The I-cache will help for similar reasons.
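
To put a rough number on the "for free" part, here is a small sketch that compares how many cache-line fills a sequential pass over a buffer needs with how many 8-byte loads the CPU issues. The 64-byte line size is the assumption stated above, the function names are hypothetical, and the model ignores prefetching and the cache hierarchy entirely.

```rust
/// Assumed cache-line size in bytes (typical for AArch64 cores, not verified here).
const CACHE_LINE: u64 = 64;

/// Number of distinct cache lines touched by a sequential pass over `len` bytes,
/// assuming the buffer starts on a line boundary.
fn line_fills(len: u64) -> u64 {
    (len + CACHE_LINE - 1) / CACHE_LINE
}

/// Number of 8-byte loads needed to read `len` bytes.
fn word_loads(len: u64) -> u64 {
    (len + 7) / 8
}
```

For the 62,202,019-byte image in the logs, `line_fills` gives 971,907 while `word_loads` gives 7,775,253. With the D-cache on, only about one load in eight has to go out to DRAM; with it off, every one of them does.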

Comment options

Makes sense. I'll test this and report back. Thank you!

Comment options

Just in case this wasn't already discussed: caching is predicated on the MMU being enabled on Armv7-A and Armv8-A. Cacheability has to be expressed via page table descriptors, irrespective of whether address translation is actually required. A typical setup for such early boot code is to set up identity maps via a suitable set of page table entries.
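
As a rough illustration of what an identity map means at the descriptor level, here is a host-side sketch that builds level-3 page descriptors for a 64 KiB granule, where each page's output address equals its input address. It is not the tutorial's code: the names are hypothetical, the attribute indices assume MAIR_EL1 holds a Device attribute at index 0 and a Normal cacheable attribute at index 1, and a real bootloader would put the table in a suitably aligned static array rather than a `Vec`.

```rust
/// Translation granule assumed here: 64 KiB pages.
const GRANULE: u64 = 64 * 1024;

// Bits of a stage-1 level-3 page descriptor (AArch64 long-descriptor format).
const VALID: u64 = 1 << 0;
const PAGE: u64 = 1 << 1; // together with VALID: a page entry at level 3
const AP_READ_ONLY: u64 = 1 << 7; // AP[2] = 1: no writes
const SH_INNER_SHAREABLE: u64 = 0b11 << 8;
const ACCESS_FLAG: u64 = 1 << 10;
const PXN: u64 = 1 << 53; // privileged execute-never
const UXN: u64 = 1 << 54; // unprivileged execute-never
/// Output address field for a 64 KiB granule: bits [47:16].
const OUTPUT_ADDR_MASK: u64 = 0x0000_FFFF_FFFF_0000;

#[derive(Clone, Copy, PartialEq)]
enum MemKind {
    Device,          // assumed to be MAIR_EL1 attribute index 0
    NormalCacheable, // assumed to be MAIR_EL1 attribute index 1
}

/// Build one page descriptor that maps a page onto the physical page `phys`.
fn page_descriptor(phys: u64, kind: MemKind, writable: bool, executable: bool) -> u64 {
    assert_eq!(phys & !OUTPUT_ADDR_MASK, 0, "address must be 64 KiB aligned and below 2^48");
    let attr_index: u64 = match kind {
        MemKind::Device => 0,
        MemKind::NormalCacheable => 1,
    };
    let mut desc = phys | VALID | PAGE | ACCESS_FLAG | UXN | (attr_index << 2);
    if kind == MemKind::NormalCacheable {
        desc |= SH_INNER_SHAREABLE;
    }
    if !writable {
        desc |= AP_READ_ONLY;
    }
    if !executable {
        desc |= PXN;
    }
    desc
}

/// Identity map: entry `i` covers virtual page `first_page + i` and points at
/// the physical page with the same number.
fn identity_map(first_page: u64, count: u64, kind: MemKind, writable: bool, executable: bool) -> Vec<u64> {
    (0..count)
        .map(|i| page_descriptor((first_page + i) * GRANULE, kind, writable, executable))
        .collect()
}
```

The point is that the cacheability of a region comes from the attribute index in each descriptor, so even a 1:1 mapping needs a fully populated table before the caches can do anything useful for data accesses.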
Comment options

Thank you, this helps. I'm not aware of identity maps. Would you know of any reading material that I could use to learn more?

Comment options

yeah, was just looking at this. thanks again.

Comment options

So, I tried this. It kind of works, but I'm a bit stuck.

I added the MMU-specific code from exercise-10 to my bootloader. It seems to crash right away. So, I moved the snippet of code for enabling the MMU + caching to right after we acquire logging capabilities.

The output below indicates that attempting to modify the SCTLR_EL1 register simply crashes the entire system. The odd thing is that it doesn't panic either. The red status LED turns on and stays on (until we perform a hard reset).

PS: I've captured the register's value just before we try to modify it: 0xc50838. I cross-checked it with Arm's register docs and couldn't find anything wrong with it.

boards\bootloaders\rpi4 on  main [✘!?] is 📦 v0.1.0 via 🦀 v1.61.0-nightly
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
[ 1.696665] EMMC: reset card.
[ 1.696758] control1: 16143
[ 1.699378] Divisor = 63, Freq Set = 396825
[ 2.106809] CSD Contents : 00 40 0e 00 32 5b 59 00 00ed c8 7f 80 0a 40 40
[ 2.110637] cemmc_structure=1, spec_vers=0, taac=0x0E, nsac=0x00, tran_speed=0x32,ccc=0x05B5, read_bl_len=0x09, read_bl_partial=0b, write_blk_misalign=0b,read_blk_misalign=0b, dsr_imp=0b, sector_size =0x7F, erase_blk_en=1b
[ 2.130268] CSD 2.0: ver2_c_size = 0xEFFC, card capacity: 31914459136 bytes or 31.91GiB
[ 2.138174] wp_grp_size=0x0000000b, wp_grp_enable=0b, default_ecc=00b, r2w_factor=010b, write_bl_len=0x09, write_bl_partial=0b, file_format_grp=0, copy=1b, perm_write_protect=0b, tmp_write_protect=0b, file_format=0b ecc=00b
[ 2.157897] control1: 271
[ 2.160414] Divisor = 1, Freq Set = 25000000
[ 2.166935] EMMC: Bus width set to 4
[ 2.168068] EMMC: SD Card Type 2 HC, 30436Mb, mfr_id: 3, 'SD:ACLCD', r8.0, mfr_date: 1/2017, serial: 0xbbce119c, RCA: 0xaaaa
[ 2.179179] EMMC2 driver initialized...
[ 2.183002] mmu not enabled check
[ 2.186216] translation granularity supported
[ 2.190473] MAIR_EL1 set
[ 2.473125] translation tables populated
[ 2.474084] TTBR0_EL1 SET
[ 2.476603] TCR SET
[ 2.478601] first isb passed
[ 2.481381] SCTLR_EL1: c50838
Comment options

Ok, I compiled exercise-10 and flashed the (kernel8) image onto my rpi4. It works as expected.

Note: I moved the MMU activation code so that we're able to log the activation flow.

[ 0.007482] MAIR_EL1: 0xff04
[ 0.075640] Special regions:
[ 0.075698] 0x00080000 - 0x0008ffff | 64 KiB | C RO PX | Kernel code and RO data
[ 0.076652] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 0.077638] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 0.078527] BASE ADDR: 0x120000
[ 0.078905] TTBR0_EL1: 0x120000
[ 0.079285] TCR_EL1: 0x200807520
[ 0.079675] SCTLR_EL1: 0xc50838
[ 0.080054] After enabling MMU, SCTLR_EL1: 0xc5183d
[ 0.080648] mingo version 0.10.0
[ 0.081038] Booting on: Raspberry Pi 4
[ 0.081493] MMU online. Special regions:
[ 0.081970] 0x00080000 - 0x0008ffff | 64 KiB | C RO PX | Kernel code and RO data
[ 0.082988] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 0.083974] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 0.084862] Current privilege level: EL1
[ 0.085339] Exception handling state:
[ 0.085783] Debug: Masked
[ 0.086173] SError: Masked
[ 0.086563] IRQ: Masked
[ 0.086953] FIQ: Masked
[ 0.087343] Architectural timer resolution: 18 ns
[ 0.087917] Drivers loaded:
[ 0.088253] 1. BCM GPIO
[ 0.088610] 2. BCM PL011 UART
[ 0.089033] Timer test, spinning for 1 second
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 1.089900] Echoing input now

However, when I copy and paste the (same) MMU code from exercise-10 into my bootloader, it ends up crashing the entire system (...perplexing).

Note:

  • all MMU-related code has been pulled into a single folder called memory, and
  • I think I've tried every possible way of setting the relevant fields in the SCTLR_EL1 register (i.e. set, write, modify, modify_on_read) and even tried writing the raw value into the register, but I can't seem to get it to work.

I plan on getting a hardware debugger.

But in the meantime, any thoughts on what I'm doing wrong here?

❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
......
......
[ 2.211136] MAIR_EL1: 0xff04
[ 2.485412] translation tables populated
[ 2.486370] Special regions:
[ 2.489151] 0x00080000 - 0x000a2fff | 140 KiB | C RO PX | Kernel code and RO data
[ 2.497317] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.505223] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.512347] BASE ADDR: 0x280000
[ 2.515387] TTBR0_EL1: 0x280000
[ 2.518427] TCR_EL1: 0x200807520
[ 2.521555] first isb passed
[ 2.524334] SCTLR_EL1: 0xc50838
[ 2.527375] new SCTLR_EL1: 0xc5183d
[ 
---- crashes ----- a red led turns on and stays on
Comment options

I objdump'd my ELF binary and I think I understand the cause of the error.

As suggested, I examined the contents of address 0x09fa6c (which contains the faulting instruction) and observed the following:

  • the instruction is part of the write_str subroutine, and
  • it attempts to store the contents of the x10 register to the address [x8 + 8]

The value in x8 at the time of the exception is 0xad018, which happens to be an address in the .data section (it contains the memory-mapped address of the PL011_UART peripheral). However, as we're adding an offset of 8 to x8, the faulting instruction attempts to store the contents of x10 to address 0xad020, which results in a (level-3 table) permission fault.

  • note: FAR_EL1 also contains 0xad020.

So, my previous suspicion that it had something to do with writing to SCTLR_EL1 turned out to be wrong. The relevant SCTLR_EL1 bits are set and the MMU is enabled, but later on, when we try to log/print anything to the serial output, we get the above (bad-write) exception.

A couple of things that I haven't figured out:

  • why does printing fail only after enabling the MMU? I noticed we're able to log a single character, [, just before the panic.
  • I have not been able to figure out the exact execution path in kernel_main. I see that the execution flow passes from kernel_init to kernel_main, but how we end up in write_str is still a mystery; at least, I'm not sure we can answer that with static code analysis alone.
  • why do we get a write-permission fault at all? Address 0x0ad020 is in the .data section, which should be writeable, right?

Would adding an extra table descriptor to the LAYOUT for the .data section and making it ReadWrite solve this?

[ 2.130593] mmu not enabled check
[ 2.130994] translation granularity supported
[ 2.131525] MAIR_EL1 set
[ 2.131828] MAIR_EL1: 0xff04
[ 2.226713] translation tables populated
[ 2.226837] Special regions:
[ 2.227183] 0x00080000 - 0x000acfff | 180 KiB | C RO PX | Kernel code and RO data
[ 2.228202] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.229187] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.230076] BASE ADDR: 0x280000
[ 2.230455] TTBR0_EL1: 0x280000
[ 2.230834] TCR_EL1: 0x200807520
[ 2.231224] first isb passed
[ 2.231571] SCTLR_EL1: 0xc50838
[ 2.231950] new SCTLR_EL1: 0xc5183d
[[ 2.232384] Kernel panic!
Panic location:
 File 'hal\src\rpi\rpi4\exception\exception.rs', line 64, column 5
CPU Exception!
ESR_EL1: 0x9600004f
 Exception Class (EC) : 0x25 - Data Abort, current EL
 Instr Specific Syndrome (ISS): 0x4f
FAR_EL1: 0x00000000000ad020
SPSR_EL1: 0x600003c5
 Flags:
 Negative (N): Not set
 Zero (Z): Set
 Carry (C): Set
 Overflow (V): Not set
 Exception handling state:
 Debug (D): Masked
 SError (A): Masked
 IRQ (I): Masked
 FIQ (F): Masked
 Illegal Execution State (IL): Not set
ELR_EL1: 0x000000000009fa6c
General purpose register:
 x0 : 0x000000000007fbf0 x1 : 0x00000000000ac111
 x2 : 0x0000000000000003 x3 : 0x000000000009fa50
 x4 : 0x0000000000000006 x5 : 0x000000000007ff44
 x6 : 0x000000000007ff48 x7 : 0x000000000007ff4c
 x8 : 0x00000000000ad018 x9 : 0x00000000000ac113
 x10: 0x00000000000006bb x11: 0x00000000fe201000
 x12: 0x0000000000000009 x13: 0x00000000000a6bf8
 x14: 0x0000000000000006 x15: 0x0000000000000057
 x16: 0x000000000007fc8b x17: 0x0000000000000005
 x18: 0x0000000000000002 x19: 0x000000000007f540
 x20: 0x0000000000000005 x21: 0x0000000000000118
 x22: 0x00000000000a6b88 x23: 0x00000000000aab70
 x24: 0x0000000000081d88 x25: 0x00000000000abea0
 x26: 0x0000000100000000 x27: 0x000000000007f7e0
 x28: 0x0000000000081480 x29: 0x0000000000081ddc
 lr : 0x0000000000082c14
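
For anyone reading the dump above, here is a minimal sketch of how ESR_EL1 can be decoded by hand. The field positions follow the Arm architecture (EC in bits [31:26], ISS in bits [24:0], the fault status code in ISS[5:0], WnR in ISS[6] for data aborts). The function names are made up for this example, and only the handful of codes relevant to this thread are named.

```rust
/// Exception Class: bits [31:26] of ESR_EL1.
fn exception_class(esr: u64) -> u64 {
    (esr >> 26) & 0x3f
}

/// Instruction Specific Syndrome: bits [24:0].
fn iss(esr: u64) -> u64 {
    esr & 0x01ff_ffff
}

/// Fault status code (DFSC/IFSC): bits [5:0] of the ISS for aborts.
fn fault_status(esr: u64) -> u64 {
    esr & 0x3f
}

/// WnR (write, not read): bit 6 of the ISS. Only meaningful for data aborts.
fn is_write(esr: u64) -> bool {
    (esr >> 6) & 1 == 1
}

fn class_name(ec: u64) -> &'static str {
    match ec {
        0x20 => "Instruction Abort, lower EL",
        0x21 => "Instruction Abort, current EL",
        0x24 => "Data Abort, lower EL",
        0x25 => "Data Abort, current EL",
        _ => "other (not decoded in this sketch)",
    }
}

/// Returns the kind of fault and the translation table level it was reported at.
fn fault_kind(fsc: u64) -> (&'static str, u64) {
    let level = fsc & 0b11;
    match fsc {
        0b000100..=0b000111 => ("translation fault", level),
        0b001001..=0b001011 => ("access flag fault", level),
        0b001101..=0b001111 => ("permission fault", level),
        _ => ("other (not decoded in this sketch)", level),
    }
}
```

Applied to 0x9600004f, this gives EC 0x25 (Data Abort, current EL), a fault status of 0x0f (permission fault, level 3) and WnR set, i.e. a write that was rejected by the permissions of a level-3 page entry. That matches the store to 0xad020 reported in FAR_EL1.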
Comment options

I think I spotted it. The end of your code section and the start of your data section are not 64 KiB aligned, but 64 KiB is the paging granularity. You get the permission fault because the start of your data is still covered by the last code page, which is mapped RO.
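
The overlap can be checked with a little address arithmetic. This is just a sketch with made-up helper names; the addresses are the ones from the log above (code region ending at 0xacfff, faulting address 0xad020).

```rust
/// Paging granule used by the translation tables: 64 KiB.
const GRANULE: u64 = 64 * 1024;

/// Base address of the 64 KiB page that contains `addr`.
fn page_base(addr: u64) -> u64 {
    addr & !(GRANULE - 1)
}

/// Smallest 64 KiB-aligned address that is >= `addr`.
fn align_up(addr: u64) -> u64 {
    (addr + GRANULE - 1) & !(GRANULE - 1)
}
```

Both `page_base(0xacfff)` and `page_base(0xad020)` evaluate to 0xa0000, so the last code byte and the faulting data address share one page and therefore one set of permissions. `align_up(0xad000)` is 0xb0000, which is where the data would have to start to get a page of its own.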

Comment options

Ah, makes sense. I'll change that and report back.

Comment options

Yep, that works 🙌🏾. I added a (64KiB) alignment constraint to the .data section of the linker script.

 .data : ALIGN(65536) { *(.data*) } :segment_data

As for the original issue: performance is way better than I could have hoped for. What took roughly 80 seconds before now completes in about 1.5 seconds, and that includes

  • hashing and validating the integrity of 4 files (with a total size of 62MB) along with verifying an ECC signature.
  • guess caching is a wondrous thing (until it is not).

I now run into an instruction abort exception at the end. The faulting instruction is at address 0x4600000, located in the .bss section (which is where I've loaded the Linux kernel). It's another permission fault. I guess the fix here is to mark the kernel load range as a special region with read + execute permissions, right?

[ 1.714893] EMMC: reset card.
[ 1.714982] control1: 16143
[ 1.715250] Divisor = 63, Freq Set = 396825
[ 2.119087] CSD Contents : 00 40 0e 00 32 5b 59 00 00ed c8 7f 80 0a 40 40
[ 2.119571] cemmc_structure=1, spec_vers=0, taac=0x0E, nsac=0x00, tran_speed=0x32,ccc=0x05B5, read_bl_len=0x09, read_bl_partial=0b, write_blk_misalign=0b,read_blk_misalign=0b, dsr_imp=0b, sector_size =0x7F, erase_blk_en=1b
[ 2.122018] CSD 2.0: ver2_c_size = 0xEFFC, card capacity: 31914459136 bytes or 31.91GiB
[ 2.123004] wp_grp_size=0x0000000b, wp_grp_enable=0b, default_ecc=00b, r2w_factor=010b, write_bl_len=0x09, write_bl_partial=0b, file_format_grp=0, copy=1b, perm_write_protect=0b, tmp_write_protect=0b, file_format=0b ecc=00b
[ 2.125465] control1: 271
[ 2.125778] Divisor = 1, Freq Set = 25000000
[ 2.128635] EMMC: Bus width set to 4
[ 2.128721] EMMC: SD Card Type 2 HC, 30436Mb, mfr_id: 3, 'SD:ACLCD', r8.0, mfr_date: 1/2017, serial: 0xbbce119c, RCA: 0xaaaa
[ 2.130102] EMMC2 driver initialized...
[ 2.232355] rpi4 version 0.1.0
[ 2.232721] Booting on: Raspberry Pi 4
[ 2.233176] MMU online. Special regions:
[ 2.233653] 0x00080000 - 0x000acfff | 180 KiB | C RO PX | Kernel code and RO data
[ 2.234671] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.235657] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.236546] Current privilege level: EL1
[ 2.237022] Exception handling state:
[ 2.237466] Debug: Masked
[ 2.237856] SError: Masked
[ 2.238246] IRQ: Masked
[ 2.238636] FIQ: Masked
[ 2.239026] Architectural timer resolution: 18 ns
[ 2.239600] Drivers loaded:
[ 2.239936] 1. BCM GPIO
[ 2.240294] 2. BCM PL011 UART
[ 2.240716] Chars written: 2494
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 2.241790] [INFO] create new emmc-fat controller...
[ 2.242504] - rustBoot::fs::controller @ line:200
[ 2.247239] Listing root directory:
[ 2.250831] - Found: SIGNED~1.ITB
[ 2.251027] loading fit-image...
[ 33.920214] loaded fit: 62202019 bytes, starting at addr: 0x600000
[ 33.920617] authenticating fit-image...
[ 33.921360] [INFO] computing "kernel" hash
[ 33.921830] - rustBoot::dt::fit @ line:289
[ 34.612911] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 34.613864] - rustBoot::dt::fit @ line:294
[ 34.614467] [INFO] kernel integrity consistent with supplied itb...
[ 34.615435] - rustBoot::dt::fit @ line:308
[ 34.616054] [INFO] computing "fdt" hash
[ 34.616605] - rustBoot::dt::fit @ line:289
[ 34.617811] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 34.618732] - rustBoot::dt::fit @ line:294
[ 34.619333] [INFO] fdt integrity consistent with supplied itb...
[ 34.620270] - rustBoot::dt::fit @ line:308
[ 34.620891] [INFO] computing "ramdisk" hash
[ 34.621484] - rustBoot::dt::fit @ line:289
[ 35.398004] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 35.398968] - rustBoot::dt::fit @ line:294
[ 35.399570] [INFO] ramdisk integrity consistent with supplied itb...
[ 35.400550] - rustBoot::dt::fit @ line:308
[ 35.401174] [INFO] computing "rbconfig" hash
[ 35.401774] - rustBoot::dt::fit @ line:289
[ 35.402376] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 35.403702] - rustBoot::dt::fit @ line:294
[ 35.404303] [INFO] rbconfig integrity consistent with supplied itb...
[ 35.405295] - rustBoot::dt::fit @ line:308
######## ecdsa signature checks out, image is authentic ########
[ 35.434296] relocating kernel to addr: 0x4600000
[ 35.456677] relocating initrd to addr: 0x6400000
[ 35.456885] load rbconfig...
[ 35.457266] patching dtb...
[ 35.457772] relocating dtb to addr: 0x400000
***************************************** Starting kernel ********************************************
[ 35.459487] Kernel panic!
Panic location:
 File 'hal\src\rpi\rpi4\exception\exception.rs', line 64, column 5
CPU Exception!
ESR_EL1: 0x8600000f
 Exception Class (EC) : 0x21 - N/A
 Instr Specific Syndrome (ISS): 0xf
FAR_EL1: 0x0000000004600000
SPSR_EL1: 0x600003c5
 Flags:
 Negative (N): Not set
 Zero (Z): Set
 Carry (C): Set
 Overflow (V): Not set
 Exception handling state:
 Debug (D): Masked
 SError (A): Masked
 IRQ (I): Masked
 FIQ (F): Masked
 Illegal Execution State (IL): Not set
ELR_EL1: 0x0000000004600000
General purpose register:
 x0 : 0x0000000000400000 x1 : 0x0000000000000000
 x2 : 0x0000000000000000 x3 : 0x0000000000000000
 x4 : 0x0000000000000006 x5 : 0x0000000000005ea8
 x6 : 0x0000000000000001 x7 : 0x0000000000000000
 x8 : 0x0000000004600000 x9 : 0x00000000000a7014
 x10: 0x00000000000013de x11: 0x00000000fe201000
 x12: 0x0000000000000019 x13: 0x000000000007f810
 x14: 0x0000000000000000 x15: 0x0000000000000000
 x16: 0x0000000000000030 x17: 0x0000000000000078
 x18: 0x0000000000400000 x19: 0x00000000000b0018
 x20: 0x000000004e650000 x21: 0x0000000000083bec
 x22: 0x00000000000000bc x23: 0x000000003b9aca00
 x24: 0x0000000000000244 x25: 0x00000000000f4240
 x26: 0x00000000000abea0 x27: 0x0000000000006521
 x28: 0x0000000000000264 x29: 0x0000000000081ddc
 lr : 0x000000000008bd5c
Comment options

Well, the first thing that Linux will do is set up its own page tables. I don't know by heart what it expects from a previous boot loader stage with respect to the architectural state of the memory subsystem.

For starters, I would probably just disable the MMU again before jumping to Linux.

Comment options

See https://www.kernel.org/doc/Documentation/arm64/booting.txt for the AArch64 Linux boot protocol and its architectural/micro-architectural expectations. The MMU needs to be off, and caching needs to be explicitly disabled as well.
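
In terms of register bits, "off" means clearing the same three SCTLR_EL1 bits that were set earlier. Here is a sketch of just the bit manipulation, using the values from the logs in this thread. The helper names are hypothetical, and on real hardware the write to SCTLR_EL1 also needs barriers and cache maintenance (cleaning the loaded kernel image out of the D-cache first), which this sketch does not show.

```rust
const SCTLR_M: u64 = 1 << 0; // MMU enable
const SCTLR_C: u64 = 1 << 2; // data cacheability
const SCTLR_I: u64 = 1 << 12; // instruction cacheability

/// Value to write in order to enable the MMU plus both caches.
fn with_mmu_and_caches(sctlr: u64) -> u64 {
    sctlr | SCTLR_M | SCTLR_C | SCTLR_I
}

/// Value to write before handing over to Linux: MMU and both caches off.
fn without_mmu_and_caches(sctlr: u64) -> u64 {
    sctlr & !(SCTLR_M | SCTLR_C | SCTLR_I)
}
```

`with_mmu_and_caches(0xc50838)` gives 0xc5183d, the "new SCTLR_EL1" value in the logs, and `without_mmu_and_caches(0xc5183d)` gives back 0xc50838.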
Comment options

Yeah, completely forgot about this (got lost in MMU translation - 😁). I'll need to reset most of the hardware to an early state for Linux to boot.

Comment options

@nihalpasham can you do me a favor and check what the speedup is with instruction caching alone?

Would be a nice datapoint to have.

Comment options

enabled instruction-caching alone.

 // Enable the MMU and turn on instruction caching alone.
 SCTLR_EL1.modify(SCTLR_EL1::M::Enable + SCTLR_EL1::I::Cacheable);

Results: for the same set of operations

  • approx. time: 47.5 seconds, or roughly 60% of the original time.
.....
.....
[ 2.399815] rpi4 version 0.1.0
[ 2.400183] Booting on: Raspberry Pi 4
[ 2.400638] MMU online. Special regions:
[ 2.401115] 0x00080000 - 0x000a4fff | 148 KiB | C RO PX | Kernel code and RO data
[ 2.402134] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.403119] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.404008] Current privilege level: EL1
[ 2.404484] Exception handling state:
[ 2.404928] Debug: Masked
[ 2.405318] SError: Masked
[ 2.405708] IRQ: Masked
[ 2.406098] FIQ: Masked
[ 2.406488] Architectural timer resolution: 18 ns
[ 2.407062] Drivers loaded:
[ 2.407398] 1. BCM GPIO
[ 2.407756] 2. BCM PL011 UART
[ 2.408178] Chars written: 1793
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 2.409253] [INFO] create new emmc-fat controller...
[ 2.409966] - rustBoot::fs::controller @ line:200
[ 2.414702] Listing root directory:
[ 2.418335] - Found: SIGNED~1.ITB
[ 2.418538] loading fit-image...
[ 34.053223] loaded fit: 62202019 bytes, starting at addr: 0x290000
[ 34.053627] authenticating fit-image...
[ 34.055365] [INFO] computing "kernel" hash
[ 34.055617] - rustBoot::dt::fit @ line:289
[ 56.309267] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 56.310223] - rustBoot::dt::fit @ line:294
[ 56.310875] [INFO] kernel integrity consistent with supplied itb...
[ 56.311793] - rustBoot::dt::fit @ line:308
[ 56.312622] [INFO] computing "fdt" hash
[ 56.312963] - rustBoot::dt::fit @ line:289
[ 56.333355] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 56.334278] - rustBoot::dt::fit @ line:294
[ 56.334926] [INFO] fdt integrity consistent with supplied itb...
[ 56.335816] - rustBoot::dt::fit @ line:308
[ 56.336685] [INFO] computing "ramdisk" hash
[ 56.337030] - rustBoot::dt::fit @ line:289
[ 81.346679] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 81.347646] - rustBoot::dt::fit @ line:294
[ 81.348290] [INFO] ramdisk integrity consistent with supplied itb...
[ 81.349227] - rustBoot::dt::fit @ line:308
[ 81.350135] [INFO] computing "rbconfig" hash
[ 81.350451] - rustBoot::dt::fit @ line:289
[ 81.351205] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 81.352380] - rustBoot::dt::fit @ line:294
[ 81.353024] [INFO] rbconfig integrity consistent with supplied itb...
[ 81.353972] - rustBoot::dt::fit @ line:308
######## ecdsa signature checks out, image is authentic ########
[ 81.556150] relocating kernel to addr: 0x4200000
....
....

enabled data-caching alone.

 // Enable the MMU and turn on data caching alone.
 SCTLR_EL1.modify(SCTLR_EL1::M::Enable + SCTLR_EL1::C::Cacheable);

Results: for the same set of operations

  • approx time: 61.15 seconds
....
....
[ 2.399946] rpi4 version 0.1.0
[ 2.400313] Booting on: Raspberry Pi 4
[ 2.400767] MMU online. Special regions:
[ 2.401245] 0x00080000 - 0x000a4fff | 148 KiB | C RO PX | Kernel code and RO data
[ 2.402263] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.403249] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.404138] Current privilege level: EL1
[ 2.404614] Exception handling state:
[ 2.405058] Debug: Masked
[ 2.405448] SError: Masked
[ 2.405838] IRQ: Masked
[ 2.406228] FIQ: Masked
[ 2.406618] Architectural timer resolution: 18 ns
[ 2.407192] Drivers loaded:
[ 2.407528] 1. BCM GPIO
[ 2.407885] 2. BCM PL011 UART
[ 2.408308] Chars written: 1793
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 2.409383] [INFO] create new emmc-fat controller...
[ 2.410096] - rustBoot::fs::controller @ line:200
[ 2.414937] Listing root directory:
[ 2.419248] - Found: SIGNED~1.ITB
[ 2.419470] loading fit-image...
[ 42.972299] loaded fit: 62202019 bytes, starting at addr: 0x290000
[ 42.972711] authenticating fit-image...
[ 42.977009] [INFO] computing "kernel" hash
[ 42.977267] - rustBoot::dt::fit @ line:289
[ 71.962372] [INFO] computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
[ 71.963334] - rustBoot::dt::fit @ line:294
[ 71.964113] [INFO] kernel integrity consistent with supplied itb...
[ 71.964905] - rustBoot::dt::fit @ line:308
[ 71.966198] [INFO] computing "fdt" hash
[ 71.966423] - rustBoot::dt::fit @ line:289
[ 71.992601] [INFO] computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
[ 71.993531] - rustBoot::dt::fit @ line:294
[ 71.994300] [INFO] fdt integrity consistent with supplied itb...
[ 71.995069] - rustBoot::dt::fit @ line:308
[ 71.996478] [INFO] computing "ramdisk" hash
[ 71.996746] - rustBoot::dt::fit @ line:289
[ 104.578328] [INFO] computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
[ 104.579301] - rustBoot::dt::fit @ line:294
[ 104.580056] [INFO] ramdisk integrity consistent with supplied itb...
[ 104.580881] - rustBoot::dt::fit @ line:308
[ 104.582422] [INFO] computing "rbconfig" hash
[ 104.582700] - rustBoot::dt::fit @ line:289
[ 104.583560] [INFO] computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
[ 104.584629] - rustBoot::dt::fit @ line:294
[ 104.585383] [INFO] rbconfig integrity consistent with supplied itb...
[ 104.586221] - rustBoot::dt::fit @ line:308
######## ecdsa signature checks out, image is authentic ########
[ 106.124374] relocating kernel to addr: 0x4200000

Conclusions:

  • for the above set of operations, instruction caching alone gives roughly a 1.6x speed-up (about 47 s instead of about 76 s, going by the log timestamps)
  • for the same set of operations, data caching alone gives roughly a 1.2x speed-up (about 62 s)
  • cumulatively though, i.e. with both instruction + data caching enabled, we get a massive speed-up of roughly 50x (about 1.5 s).
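
The speed-ups can be recomputed from the log timestamps. In the sketch below, each pair is the timestamp of the first `computing "kernel" hash` line and the last `rbconfig integrity consistent` line of the corresponding log in this thread. The constant and function names are made up, these are single runs, and the baseline run had the MMU off entirely, so treat the ratios as rough.

```rust
/// (start, end) timestamps in seconds, copied from the serial logs.
const BASELINE: (f64, f64) = (42.712671, 119.125330); // no MMU, no caches
const ICACHE_ONLY: (f64, f64) = (34.055365, 81.353024); // MMU + I-cache
const DCACHE_ONLY: (f64, f64) = (42.977009, 104.585383); // MMU + D-cache
const BOTH_CACHES: (f64, f64) = (33.921360, 35.404303); // MMU + I-cache + D-cache

/// Time spent hashing and checking all four images, in seconds.
fn elapsed(run: (f64, f64)) -> f64 {
    run.1 - run.0
}

/// How many times faster `run` is than `baseline`.
fn speedup(baseline: (f64, f64), run: (f64, f64)) -> f64 {
    elapsed(baseline) / elapsed(run)
}
```

With these inputs, the baseline takes about 76.4 s, and `speedup` comes out at roughly 1.6 for the I-cache alone, 1.2 for the D-cache alone and 51.5 with both enabled.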

The results kind of make sense, as hashing algorithms are typically implemented in 3 steps: init, update and finalize. The bulk of the work is performed in the update step, where we apply the same operations to new chunks of data, repeatedly, so the same small set of instructions and working state is reused for every chunk.
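
To make the init/update/finalize shape concrete, here is a tiny streaming hasher. Note that it uses 64-bit FNV-1a purely as a stand-in because it fits in a few lines; it is not SHA-256, not cryptographically secure, and not what the bootloader uses. The type and function names are made up for this example.

```rust
/// Streaming 64-bit FNV-1a hasher (a stand-in for a real hash like SHA-256).
struct Fnv1a {
    state: u64,
}

impl Fnv1a {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    /// init: set up the small, fixed-size working state.
    fn new() -> Self {
        Fnv1a { state: Self::OFFSET_BASIS }
    }

    /// update: the hot loop. The same few instructions run for every byte,
    /// and the input is read strictly sequentially.
    fn update(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            self.state ^= byte as u64;
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    /// finalize: produce the digest from the working state.
    fn finalize(self) -> u64 {
        self.state
    }
}

/// Hash `data` by feeding it to the hasher `chunk_size` bytes at a time.
/// `chunk_size` must be non-zero.
fn hash_in_chunks(data: &[u8], chunk_size: usize) -> u64 {
    let mut hasher = Fnv1a::new();
    for chunk in data.chunks(chunk_size) {
        hasher.update(chunk);
    }
    hasher.finalize()
}
```

The digest does not depend on how the input is split into chunks. The `update` loop is a tight, repeated instruction sequence (which is where the I-cache helps) walking linearly through a large buffer (which is where the D-cache and prefetchers help).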
