BIOS disk read performance testing
- zerodivision
- Posts: 12
- Joined: Tue Sep 16, 2025 10:25 am
BIOS disk read performance testing
Post by zerodivision »
Hello again!
In a previous topic I mentioned that I do multisector reads using the BIOS in the bootloader, because it's more efficient than reading files sector-by-sector. This idea was met with some criticism: the bootloader runs only once during the boot process, so efficiency isn't a top priority, and the additional code for handling different read sizes needs to be maintained too.
Given that I had already implemented multisector reads (except for handling partial reads, which isn't necessary, or even reliably possible, after all) and that the bootloader is expected to load more than just the kernel, it made no sense to me to remove the feature. I would have simply trusted my judgement and left it at that, but then I got intrigued.
Octocontrabass wrote: ↑ Wed Sep 24, 2025 12:40 am
How much faster is it, though? Have you measured? Is the difference really worth writing and maintaining all the extra code? Is there any difference at all?
At this point, I'd like to sincerely thank Octocontrabass for the idea. I went ahead and wrote a test that can be used as the VBR of a partition without a filesystem, or chainloaded from a file by e.g. GRUB. The test measures the PIT ticks (approx. 18.2 per second) that elapse while reading 128 MiB of consecutive sectors in each of the following patterns:
- 127 sectors at once, starting at LBA 0
- 120 sectors at once, starting at LBA 0
- 120 sectors at once, starting at LBA 1
- 8 sectors at once, starting at LBA 0
- 8 sectors at once, starting at LBA 1
- Sector-by-sector, starting at LBA 0
The sizes and starting LBAs were chosen for the following reasons:
- 127 sectors is the maximum transfer size the BIOS can be assumed to support
- 8 sectors is a common filesystem block size, and 8 logical sectors also make up one physical sector on 512e drives
- 120 is the largest multiple of 8 that doesn't exceed 127
- Starting from LBA 1 causes unaligned accesses on 512e drives
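For reference, here's how the ticks in the tables below translate to seconds and throughput (a minimal C sketch, just for illustration; it assumes the nominal 18.2065 ticks per second, while the tables divide 128 MiB by the already-rounded seconds, so the last digits can differ slightly, e.g. 145.65 vs. 145.45 in the first row):
Code:
/* Illustration only: derive the result-table figures from raw PIT ticks. */
#include <stdio.h>

int main(void) {
    const double ticks_per_second = 18.2065; /* BIOS tick rate read via INT 1Ah */
    const double data_mib = 128.0;           /* every pattern reads 128 MiB */
    unsigned ticks = 16;                     /* e.g. HP 250 G6 SSD, 127-sector reads */
    double seconds = ticks / ticks_per_second;
    printf("%u ticks = %.2f s = %.2f MB/s\n", ticks, seconds, data_mib / seconds);
    return 0;
}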
Still, each file installed to a freshly formatted filesystem is expected to occupy consecutive sectors or filesystem blocks, which is why I consider this test valuable. Directories, however, aren't expected to occupy consecutive sectors, because they grow dynamically block-by-block while blocks are being allocated to new files in the meantime.
I ran the test on 3 computers and 5 drives in total:
Code:
== HP 250 G6 (CSM) ==
Internal SSD
size@align ticks seconds MB/s
127 16 0.88 145.45
120@0 17 0.93 137.63
120@1 17 0.93 137.63
8@0 179 9.83 13.02
8@1 180 9.89 12.94
1 1436 78.87 1.62
USB 3.0 External HDD
size@align ticks seconds MB/s
127 26 1.43 89.51
120@0 28 1.54 83.12
120@1 28 1.54 83.12
8@0 221 12.14 10.54
8@1 221 12.14 10.54
1 1735 95.30 1.34
USB 3.0 Flash Drive
size@align ticks seconds MB/s
127 26 1.43 89.51
120@0 27 1.48 86.49
120@1 27 1.48 86.49
8@0 416 22.85 5.60
8@1 428 23.51 5.44
1 3283 180.32 0.71
== Dell Latitude D505 ==
Internal IDE HDD
size@align ticks seconds MB/s
127 96 5.27 24.29
120@0 97 5.33 24.02
120@1 97 5.33 24.02
8@0 109 5.99 21.37
8@1 109 5.99 21.37
1 729 40.04 3.20
== HP Compaq Presario CQ56 ==
Internal SATA HDD
size@align ticks seconds MB/s
127 30 1.65 77.58
120@0 31 1.70 75.30
120@1 31 1.70 75.30
8@0 101 5.55 23.06
8@1 102 5.60 22.86
1 679 37.29 3.43
Honestly, even I didn't expect the differences to be this big. Especially on the HP 250 G6 machine, each read request seems to come with an unreasonably large overhead, making smaller reads even less efficient.
Unfortunately, I only possess 512n drives, so I wasn't able to test unaligned reads against aligned ones, nor aligned 120-sector reads against 127-sector reads (which are unaligned on both ends 6 times out of 8). I'd be very grateful for any results from 512e drives.
Here is the full source of the test:
Code:
org 0x7C00
use16
cli
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
sti
jmp 0x0000:enforce_cs
enforce_cs:
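; The BIOS passes the boot drive number in DL; save it for the INT 13h calls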
mov [bootdev], dl
mov si, str_size_127
call print_string
mov eax, 262144 ; 262144 sectors x 512 bytes = 128 MiB
xor ebx, ebx
xor ecx, ecx
mov cl, 127
call read_test
jc .error_read
mov si, str_size_120_0
call print_string
mov eax, 262144
xor ebx, ebx
xor ecx, ecx
mov cl, 120
call read_test
jc .error_read
mov si, str_size_120_1
call print_string
mov eax, 262144
xor ebx, ebx
xor ecx, ecx
mov bl, 1
mov cl, 120
call read_test
jc .error_read
mov si, str_size_8_0
call print_string
mov eax, 262144
xor ebx, ebx
xor ecx, ecx
mov cl, 8
call read_test
jc .error_read
mov si, str_size_8_1
call print_string
mov eax, 262144
xor ebx, ebx
xor ecx, ecx
mov bl, 1
mov cl, 8
call read_test
jc .error_read
mov si, str_size_1
call print_string
mov eax, 262144
xor ebx, ebx
xor ecx, ecx
mov cl, 1
call read_test
jc .error_read
jmp .halt
.error_read:
mov si, str_error_read
call print_string
.halt:
cli
hlt
jmp .halt
; In:
; EAX = Sectors to read total
; EBX = Start LBA to read
; ECX = Sectors to read at once (max 127)
; Out:
; AX = PIT ticks elapsed
read_test:
push bx
push dx
push si
mov [.remaining], eax
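; Fill in the Disk Address Packet (DAP) used by INT 13h AH=42h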
mov si, dap
mov word [si + 2], cx
mov word [si + 4], 0
mov word [si + 6], 0x1000
mov dword [si + 8], ebx
mov dword [si + 12], 0
call read_pit_count
mov [.pit_count], ax
.iterate_read:
cmp [.remaining], 0
je .ok
cmp [.remaining], ecx
jae .read
mov ecx, [.remaining]
mov word [si + 2], cx
.read:
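; Extended read (INT 13h AH=42h) from drive DL using the DAP at DS:SI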
mov ah, 0x42
mov dl, [bootdev]
int 0x13
jc .return
; Advance segment
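; (one 512-byte sector = 32 paragraphs, hence the shift left by 5)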
mov ax, cx
shl ax, 5
add word [si + 6], ax
; Check if next read fits in the rest of the buffer
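; (the buffer spans segments 0x1000-0x1FFF, i.e. 64 KiB at linear 0x10000)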
mov bx, [si + 6]
add bx, ax
cmp bx, 0x2000
jb .continue
; Reset segment
mov word [si + 6], 0x1000
.continue:
; Advance LBA
add dword [si + 8], ecx
; Decrease remaining
sub [.remaining], ecx
jmp .iterate_read
.ok:
call read_pit_count
sub ax, [.pit_count]
call print_word
call print_newline
clc
.return:
pop si
pop dx
pop bx
ret
align 4
.remaining dd 0
.pit_count dw 0
; Out:
; AX = Current count
read_pit_count:
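; INT 1Ah AH=00h returns the BIOS tick count (~18.2 Hz) in CX:DX;
; the low word in DX suffices for runs far shorter than an hour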
push cx
push dx
mov ah, 0x00
int 0x1A
mov ax, dx
pop dx
pop cx
ret
print_newline:
push ax
push bx
mov bx, 0x0007
mov ax, 0x0E0D
int 0x10
mov ax, 0x0E0A
int 0x10
pop bx
pop ax
ret
; In:
; DS:Si -> String to print
print_string:
pusha
mov bx, 0x0007
.iterate_char:
lodsb
test al, al
jz .return
mov ah, 0x0E
int 0x10
jmp .iterate_char
.return:
popa
ret
; In:
; AX = Word to print
print_word:
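; Converts AX to decimal and prints it (leading zeros suppressed, max 5 digits)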
pusha
mov bx, 5
mov cx, 10
.iterate_digit:
xor dx, dx
div cx
mov bp, dx
or bp, ax
jz .print
dec bx
add dl, '0'
mov [.buffer + bx], dl
jmp .iterate_digit
.print:
cmp bx, 5
je .print_0
mov si, .buffer
add si, bx
call print_string
jmp .return
.print_0:
mov ax, 0x0E30
mov bx, 0x0007
int 0x10
.return:
popa
ret
.buffer rb 5
.nul db 0
str_error_read db "E:Read", 0
str_size_1 db "1: ", 0
str_size_8_0 db "8@0: ", 0
str_size_8_1 db "8@1: ", 0
str_size_120_0 db "120@0: ", 0
str_size_120_1 db "120@1: ", 0
str_size_127 db "127: ", 0
align 4
dap:
.size db 16 ; size of this packet in bytes
db 0 ; reserved
.sectors db 0 ; sectors to transfer (max 127; written as a word by read_test)
db 0 ; reserved (high byte of the sector count)
.offset dw 0 ; destination buffer offset
.segment dw 0 ; destination buffer segment
.start_lba dq 0 ; starting LBA
times 510 - ($ - $$) nop
dw 0xAA55
bootdev rb 1 ; boot drive number; lies past the 512-byte sector, so it exists in RAM only
I'm looking forward to your input.
Thank you in advance!
Last edited by zerodivision on Wed Oct 15, 2025 1:31 pm, edited 1 time in total.
- Octocontrabass
- Member
- Posts: 6011
- Joined: Mon Mar 25, 2013 7:01 pm
Re: BIOS disk read performance testing
Post by Octocontrabass »
zerodivision wrote: Unfortunately, I only possess 512n drives [...]
Are you sure? Sometimes it's hard to tell.
zerodivision wrote: Honestly, even I didn't expect the differences to be this big.
It's a bigger difference than I was expecting, but it also confirms that one sector at a time can be fast enough if you aren't loading a ridiculous amount of data.
Also, depending on the drive you're using, reading "empty" sectors may be a lot faster than reading sectors with actual data. I don't know if it would be a big enough difference to skew your results, though.
- zerodivision
- Posts: 12
- Joined: Tue Sep 16, 2025 10:25 am
Re: BIOS disk read performance testing
Post by zerodivision »
Thank you for your reply!
Octocontrabass wrote: Are you sure? Sometimes it's hard to tell.
Oops, you're right. The "USB 3.0 External HDD" listed above is actually a 512e drive.
I considered that reading 120 aligned sectors at a time could possibly be faster than reading 127 sectors at a time, since 127-sector reads are start-aligned only 1 time out of 8, end-aligned 1 time out of 8, and unaligned on both ends 6 times out of 8. But the disk buffer seems to be doing wonders, and the main cause of the overhead seems instead to be the BIOS and/or the communication with the respective controller.
I don't know whether unaligned reads on other 512e drives would incur bigger latencies, or whether the latencies on this disk would be a more significant factor if not for the BIOS and/or communication overhead. I don't think I can reliably test this disk on another computer, because this is the only computer available to me that supports USB 3.0 speeds. Would you or anyone else be willing to test another 512e drive?
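To illustrate the 1-in-8 / 1-in-8 / 6-in-8 split, here's a minimal C sketch (illustration only, not part of the test) that classifies consecutive 127-sector reads by their alignment to 8-sector physical sectors:
Code:
/* Classify consecutive 127-sector reads by 512e physical-sector alignment. */
#include <stdio.h>

int main(void) {
    unsigned start_aligned = 0, end_aligned = 0, unaligned = 0;
    unsigned long long lba = 0;
    /* 127 mod 8 = 7, so the starting LBA cycles through every residue mod 8
     * and the pattern repeats after 8 reads. */
    for (int i = 0; i < 8; i++) {
        if (lba % 8 == 0)
            start_aligned++;            /* read begins on a physical sector */
        else if ((lba + 127) % 8 == 0)
            end_aligned++;              /* read ends on a physical sector */
        else
            unaligned++;                /* unaligned on both ends */
        lba += 127;
    }
    printf("%u/8 start-aligned, %u/8 end-aligned, %u/8 both-ends unaligned\n",
           start_aligned, end_aligned, unaligned);
    return 0;
}
This prints "1/8 start-aligned, 1/8 end-aligned, 6/8 both-ends unaligned".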
Octocontrabass wrote: It's a bigger difference than I was expecting
Same for me. I was expecting a factor of around 2 or 3, not one between 7.59 (Dell Latitude D505, internal IDE HDD) and 126.07 (HP 250 G6 (CSM), USB 3.0 flash drive).
Interestingly, the worst overhead occurs on the newest of the computers tested. I suppose that, due to this exact overhead, the "Internal SSD" listed above reaches only a fraction of its potential throughput even when reading the maximum of 127 sectors at a time.
Octocontrabass wrote: but it also confirms that one sector at a time can be fast enough if you aren't loading a ridiculous amount of data.
When reading relatively small amounts of data, 3.20 MB/s or even 0.71 MB/s isn't bad throughput. However, I expect to read several files before starting the kernel, including the bootloader itself, the kernel, the VFS, the console font, and at least the drivers for the boot device and for the filesystem on the boot partition. The font alone might exceed 1 MB once I hopefully implement most of Unicode sometime in the future. Reading all of this (roughly 1.5 to 2 MB in total) at 0.71 MB/s would cause a slowdown of 2 to 3 seconds; even at 3.20 MB/s it would be around 0.5 seconds.
If there are more slowdowns later in the boot process, they all add up. This is why most popular Linux distributions need over a minute to boot from an IDE drive. And, without having checked the source code, I doubt GRUB reads the kernel and initrd sector-by-sector; otherwise, booting from the SSD on the HP 250 G6 would take at least 40 seconds instead of the 10 it actually does. Besides, the slowdown from reading sector-by-sector is probably among the easier ones to avoid.
Octocontrabass wrote: Also, depending on the drive you're using, reading "empty" sectors may be a lot faster than reading sectors with actual data.
I wasn't aware of that. But except for the USB flash drive, the drives tested have a partition with files starting at 1 MiB, so it's rather unlikely that they contain disproportionately more "empty" sectors than a bootloader would be reading (e.g. in the last block of a file).