BIOS disk read performance testing

zerodivision
Posts: 12
Joined: Tue Sep 16, 2025 10:25 am

BIOS disk read performance testing

Post by zerodivision »

Hello again!

In a previous topic I mentioned that I do multisector reads using the BIOS in the bootloader, because it's more efficient than reading files sector by sector. This idea was met with some criticism: the bootloader runs only once during the boot process, so efficiency isn't a top priority, and the additional code for handling different read sizes has to be maintained too.
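
For context, the multisector read itself is little more than a single INT 13h AH=42h call with the sector-count field of the disk address packet set to something larger than 1. A minimal sketch (not my actual loader code; the buffer address and LBA are placeholders):

Code: Select all

	use16
; Read 127 sectors starting at LBA 0 into 0x1000:0x0000 with one call.
; Assumes DL already holds the BIOS drive number and DS = 0.
read_many:
	mov si, dap             ; DS:SI -> disk address packet
	mov ah, 0x42            ; extended read
	int 0x13                ; CF set on error
	ret
	align 4
dap:
	db 16, 0                ; packet size, reserved
	dw 127                  ; sectors to transfer in this call
	dw 0x0000, 0x1000       ; buffer offset:segment (linear 0x10000)
	dq 0                    ; start LBA
The extra code the criticism refers to is mostly the loop around this call that splits a larger request into chunks of at most 127 sectors, which is what the test program below does.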

Given that I had already implemented multisector reads (except for handling partial reads, which turns out to be neither necessary nor reliably possible) and that the bootloader is expected to load more than just the kernel, it made no sense to me to remove this feature. I would have simply trusted my judgement and done nothing more with it, but then I got intrigued.
Octocontrabass wrote: Wed Sep 24, 2025 12:40 am
How much faster is it, though? Have you measured? Is the difference really worth writing and maintaining all the extra code? Is there any difference at all?
At this point, I'd like to sincerely thank Octocontrabass for the idea. So I went ahead and wrote a test that can be used as the VBR of a partition without a filesystem, or chainloaded from a file by e.g. GRUB. The test measures the PIT ticks (approx. 18.2 per second) that elapse while reading 128 MiB of consecutive sectors in the following patterns:
  • 127 sectors at once, starting at LBA 0
  • 120 sectors at once, starting at LBA 0
  • 120 sectors at once, starting at LBA 1
  • 8 sectors at once, starting at LBA 0
  • 8 sectors at once, starting at LBA 1
  • Sector-by-sector, starting at LBA 0
Some explanations about why I chose exactly these values:
  • 127 sectors is the maximum transfer size the BIOS can be assumed to support
  • 8 sectors is a common filesystem block size, and also 8 logical sectors constitute a physical sector in 512e drives
  • 120 is the largest multiple of 8 that doesn't exceed 127
  • Starting from LBA 1 causes unaligned accesses on 512e drives
Of course, probably no bootloader will ever read 128 MiB of consecutive sectors. However, the total size had to be large enough to eliminate any possibility of the disk buffer retaining data from previous iterations and to make the measured time reasonably reliable despite the coarse resolution of the PIT.

Still, each file installed to a freshly formatted filesystem is expected to occupy consecutive sectors or filesystem blocks, and for this reason I still consider this test valuable. Directories, however, aren't expected to occupy consecutive sectors, because they grow dynamically block by block while blocks are being allocated to new files in the meantime.

I ran the test on 3 computers and 5 drives in total:

Code: Select all

== HP 250 G6 (CSM) ==
Internal SSD
size@align  ticks  seconds    MB/s
127            16     0.88  145.45
120@0          17     0.93  137.63
120@1          17     0.93  137.63
8@0           179     9.83   13.02
8@1           180     9.89   12.94
1            1436    78.87    1.62

USB 3.0 External HDD
size@align  ticks  seconds    MB/s
127            26     1.43   89.51
120@0          28     1.54   83.12
120@1          28     1.54   83.12
8@0           221    12.14   10.54
8@1           221    12.14   10.54
1            1735    95.30    1.34

USB 3.0 Flash Drive
size@align  ticks  seconds    MB/s
127            26     1.43   89.51
120@0          27     1.48   86.49
120@1          27     1.48   86.49
8@0           416    22.85    5.60
8@1           428    23.51    5.44
1            3283   180.32    0.71

== Dell Latitude D505 ==
Internal IDE HDD
size@align  ticks  seconds    MB/s
127            96     5.27   24.29
120@0          97     5.33   24.02
120@1          97     5.33   24.02
8@0           109     5.99   21.37
8@1           109     5.99   21.37
1             729    40.04    3.20

== HP Compaq Presario CQ56 ==
Internal SATA HDD
size@align  ticks  seconds    MB/s
127            30     1.65   77.58
120@0          31     1.70   75.30
120@1          31     1.70   75.30
8@0           101     5.55   23.06
8@1           102     5.60   22.86
1             679    37.29    3.43
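
For clarity, the seconds and MB/s columns are derived from the measured tick counts (the only value actually measured), using the nominal PIT rate of 18.2065 ticks per second; strictly speaking the throughput is MiB/s, since 128 MiB are read:

Code: Select all

seconds = ticks / 18.2065
MB/s    = 128 / seconds
Example (Internal SSD, 127 sectors): 16 / 18.2065 = 0.88 s, 128 / 0.88 = 145.45 MB/s
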
Honestly, even I didn't expect the differences to be this big. Especially on the HP 250 G6, each read request seems to come with an unreasonably large overhead, making smaller reads even less efficient.

Unfortunately, I only possess 512n drives, so I wasn't able to test unaligned reads against aligned ones, nor aligned 120-sector reads against 127-sector reads (which are unaligned on both ends 6 times out of 8). I'd be very grateful for any results with 512e drives.

Code: Select all

	org 0x7C00
	use16
	cli
	xor ax, ax
	mov ds, ax
	mov es, ax
	mov ss, ax
	mov sp, 0x7C00
	sti
	jmp 0x0000:enforce_cs
enforce_cs:
	mov [bootdev], dl
	mov si, str_size_127
	call print_string
	mov eax, 262144
	xor ebx, ebx
	xor ecx, ecx
	mov cl, 127
	call read_test
	jc .error_read
	mov si, str_size_120_0
	call print_string
	mov eax, 262144
	xor ebx, ebx
	xor ecx, ecx
	mov cl, 120
	call read_test
	jc .error_read
	mov si, str_size_120_1
	call print_string
	mov eax, 262144
	xor ebx, ebx
	xor ecx, ecx
	mov bl, 1
	mov cl, 120
	call read_test
	jc .error_read
	mov si, str_size_8_0
	call print_string
	mov eax, 262144
	xor ebx, ebx
	xor ecx, ecx
	mov cl, 8
	call read_test
	jc .error_read
	mov si, str_size_8_1
	call print_string
	mov eax, 262144
	xor ebx, ebx
	xor ecx, ecx
	mov bl, 1
	mov cl, 8
	call read_test
	jc .error_read
	mov si, str_size_1
	call print_string
	mov eax, 262144
	xor ebx, ebx
	xor ecx, ecx
	mov cl, 1
	call read_test
	jc .error_read
	jmp .halt
.error_read:
	mov si, str_error_read
	call print_string
.halt:
	cli
	hlt
	jmp .halt
; In:
; EAX = Sectors to read total
; EBX = Start LBA to read
; ECX = Sectors to read at once (max 127)
; Out:
; AX = PIT ticks elapsed
read_test:
	push bx
	push dx
	push si
	mov [.remaining], eax
	mov si, dap
	mov word [si + 2], cx
	mov word [si + 4], 0
	mov word [si + 6], 0x1000
	mov dword [si + 8], ebx
	mov dword [si + 12], 0
	call read_pit_count
	mov [.pit_count], ax
.iterate_read:
	cmp [.remaining], 0
	je .ok
	cmp [.remaining], ecx
	jae .read
	mov ecx, [.remaining]
	mov word [si + 2], cx
.read:
	mov ah, 0x42
	mov dl, [bootdev]
	int 0x13
	jc .return
	; Advance segment
	mov ax, cx
	shl ax, 5
	add word [si + 6], ax
	; Check if next read fits in the rest of the buffer
	mov bx, [si + 6]
	add bx, ax
	cmp bx, 0x2000
	jb .continue
	; Reset segment
	mov word [si + 6], 0x1000
.continue:
	; Advance LBA
	add dword [si + 8], ecx
	; Decrease remaining
	sub [.remaining], ecx
	jmp .iterate_read
.ok:
	call read_pit_count
	sub ax, [.pit_count]
	call print_word
	call print_newline
	clc
.return:
	pop si
	pop dx
	pop bx
	ret
	align 4
.remaining dd 0
.pit_count dw 0
; Out:
; AX = Current count
read_pit_count:
	push cx
	push dx
	mov ah, 0x00
	int 0x1A
	mov ax, dx
	pop dx
	pop cx
	ret
print_newline:
	push ax
	push bx
	mov bx, 0x0007
	mov ax, 0x0E0D
	int 0x10
	mov ax, 0x0E0A
	int 0x10
	pop bx
	pop ax
	ret
; In:
; DS:Si -> String to print
print_string:
	pusha
	mov bx, 0x0007
.iterate_char:
	lodsb
	test al, al
	jz .return
	mov ah, 0x0E
	int 0x10
	jmp .iterate_char
.return:
	popa
	ret
; In:
; AX = Word to print
print_word:
	pusha
	mov bx, 5
	mov cx, 10
.iterate_digit:
	xor dx, dx
	div cx
	mov bp, dx
	or bp, ax
	jz .print
	dec bx
	add dl, '0'
	mov [.buffer + bx], dl
	jmp .iterate_digit
.print:
	cmp bx, 5
	je .print_0
	mov si, .buffer
	add si, bx
	call print_string
	jmp .return
.print_0:
	mov ax, 0x0E30
	mov bx, 0x0007
	int 0x10
.return:
	popa
	ret
.buffer rb 5
.nul db 0
str_error_read db "E:Read", 0
str_size_1 db "1: ", 0
str_size_8_0 db "8@0: ", 0
str_size_8_1 db "8@1: ", 0
str_size_120_0 db "120@0: ", 0
str_size_120_1 db "120@1: ", 0
str_size_127 db "127: ", 0
	align 4
dap:
.size db 16
 db 0
.sectors db 0
 db 0
.offset dw 0
.segment dw 0
.start_lba dq 0
	times 510 - ($ - $$) nop
	dw 0xAA55
bootdev rb 1
I'm looking forward to your input.

Thank you in advance!
Last edited by zerodivision on Wed Oct 15, 2025 1:31 pm, edited 1 time in total.
Octocontrabass
Member
Posts: 6011
Joined: Mon Mar 25, 2013 7:01 pm

Re: BIOS disk read performance testing

Post by Octocontrabass »

zerodivision wrote: Sat Oct 11, 2025 1:42 pm
Unfortunately, I only possess 512n drives
Are you sure? Sometimes it's hard to tell.
zerodivision wrote: Sat Oct 11, 2025 1:42 pm
I'm looking forward to your input.
It's a bigger difference than I was expecting, but it also confirms that one sector at a time can be fast enough if you aren't loading a ridiculous amount of data.

Also, depending on the drive you're using, reading "empty" sectors may be a lot faster than reading sectors with actual data. I don't know if it would be a big enough difference to skew your results, though.
zerodivision
Posts: 12
Joined: Tue Sep 16, 2025 10:25 am

Re: BIOS disk read performance testing

Post by zerodivision »

Thank you for your reply!
Octocontrabass wrote: Are you sure? Sometimes it's hard to tell.
Oops, you're right. The "USB 3.0 External HDD" listed above is a 512e drive.

I considered that reading 120 aligned sectors at a time could possibly be faster than reading 127 at a time, which are start-aligned 1 time out of 8, end-aligned 1 time out of 8, and unaligned on both ends 6 times out of 8. But the disk buffer seems to be doing wonders, and the main cause of the overhead instead appears to be the BIOS and/or the communication with the respective controller.

I don't know whether unaligned reads on other 512e drives would incur bigger latencies, or whether the latencies on this disk would be a more significant factor if not for the BIOS and/or communication overhead. I don't think I can reliably test this disk on another computer, because this is the only computer I can test on that supports USB 3.0 speeds. Would you or anyone else be willing to test another 512e drive?
Octocontrabass wrote: It's a bigger difference than I was expecting
Same for me. I was expecting a factor of around 2 or 3, not one between 7.59 (Dell Latitude D505, Internal IDE HDD) and 126.07 (HP 250 G6 (CSM), USB 3.0 Flash Drive).

Interestingly, the worst overhead occurs on the newest of the computers tested. I suppose that, due to this exact overhead, the "Internal SSD" listed above reaches only a fraction of its throughput even when reading the maximum of 127 sectors at a time.
Octocontrabass wrote: but it also confirms that one sector at a time can be fast enough if you aren't loading a ridiculous amount of data.
When reading relatively small amounts of data, 3.20 MB/s or even 0.71 MB/s isn't bad throughput. However, I expect to be reading several files before starting the kernel, including the bootloader, the kernel, the VFS, the console font, and at least the drivers for the boot device and for the filesystem on the boot partition. The font alone might exceed 1 MB once I hopefully implement coverage of most of Unicode sometime in the future. Reading all of this at 0.71 MB/s would cause a slowdown of 2 to 3 seconds; even at 3.20 MB/s it would be around 0.5 seconds.
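
To put rough numbers on that, using only the figures from the tables above:

Code: Select all

2-3 s at 0.71 MB/s        ->  roughly 1.4-2.1 MB of boot-time data
1.4-2.1 MB at 3.20 MB/s   ->  about 0.45-0.65 s
1.4-2.1 MB at 24.29 MB/s  ->  under 0.1 s (slowest multisector figure, Internal IDE HDD)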

If there are more slowdowns later in the boot process, they all add up. This is why most popular Linux distributions need over a minute to boot from an IDE drive. Without having checked the source code, I doubt GRUB reads the kernel and initrd sector by sector; otherwise it would be safe to assume that the SSD boot time on the HP 250 G6 would be at least 40 seconds instead of the 10 it actually takes. Besides, the slowdown from reading sector by sector is probably among the easier ones to avoid.
Octocontrabass wrote: Also, depending on the drive you're using, reading "empty" sectors may be a lot faster than reading sectors with actual data.
I wasn't aware of that. But except for the USB flash drive, the drives tested all have a partition with files starting at 1 MiB, so it's rather unlikely that they contain disproportionately more "empty" sectors than a bootloader would be reading (e.g. in the last block of a file).