I have just started learning the x86 architecture and assembly in depth, so I decided to start from bare metal and write my own bootloader. To add some "functionality", I made it a quine (from a binary perspective), and I have some questions:
1. Could this be considered a valid quine?
2. Are the mechanics and design of my code acceptable?
3. Are there asm patterns that should generally be followed? (I come from a Java background, where patterns are everywhere.)
4. How to properly format asm code?
My code:
start:
mov si, 7F00h ;; set stack pointer after our bootloader
mov ax, 0h
mov ds, ax ;; set DS to 0;
mov di, 7C00h ;; set Data pointer to memory location where is our bootloader loaded
.printMemoryValue:
mov al, 0 ;; Using int15 to simulate pause for real time output, ax,ah,cx,dx dictate pause length
mov ah, 86h
mov cx, 0006H
mov dx, 8480H
int 15h
mov ah, 0Eh ;; Ah to 0eh setting for teletype output ( int 10h)
mov dx, [ds:di] ;; moving content of first memory location in dx (7c00h)
push dx ;; save dx in our stack
xor cx, cx ;; for (int i = 0; ;; for loop used to output 16 bit string (0 & 1)
.loopstart:
cmp cx, 00010h ;; i < 16;
je .loopend ;; break if i >= 16
pop dx
rol dx, 1 ;; rotating left 1 bit so we can extract MSF bit with our mask. because we write on screen from right to left
push dx ;; save our curent dx to stack
and dx, 0000000000000001 ;; mask
add dx, 30h ;; adding (30h) for ASCII ( 0 or 1)
push cx ;; saving our counter in stack because cx is volotile register
mov al, dl ;; moving low 8 bits from dx (dl) to our teletype output register Al for calling int 10h
int 10h ;; int 10h ( al = character to output if ah = 0eH, teletype )
pop cx
add cx, 1 ;; i++
jmp .loopstart
.loopend:
mov al, 000Ah ;; new Row
int 10h
mov al, 000Dh ;; carriage return
int 10h
add di, 2h ;; adding 2 to our di pointer ( because we are in 16bit mode)
cmp di, 7E00h ;; are we at the end of our bootloader (7c00h + 200h )?
je .hlt ;; if yes halt
jmp .printMemoryValue ;; print next memory location ( di is increased by 2 )
.hlt:
hlt ;; ende
times 510-($-$$) db 0
dw 0xAA55 ; => 0x55 0xAA (little endian byte order)
You can try the code with the mountable image available here.
Any suggestions are welcome.
First correct some bugs.
start:
mov si, 7F00h ;; set stack pointer after our bootloader
mov ax, 0h
mov ds, ax ;; set DS to 0;
mov di, 7C00h ;; set Data pointer to memory location where is our bootloader loaded
The stack pointer on x86 is held in the SP register. You've loaded SI, a general-purpose register that your program doesn't use at all hereafter.
Your bootloader program sits in memory from 7C00h to 7DFFh. If you put your stack behind the program code and start with a stack pointer at 7F00h, you will have 256 bytes of stack (from 7E00h to 7EFFh) before you run into the program itself. If the aforementioned error were corrected, this is exactly what would happen because of a second error further down in the program.
Each word you read from the program memory is pushed on the stack and left there. The program reads 256 words, so 512 bytes end up on a stack that only has room for 256, and the stack fills up to the point that you're pushing on top of the program memory!
Either pop dx to balance the stack on each iteration of the program (write it just below .loopend:, since the loop exits by jumping to that label), or, much better, don't use the stack this way, since there are enough registers free to store that particular value.
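The first option can be sketched as follows, assuming NASM syntax and that DI already points at the word to print. The label printWord and the countdown loop are illustrative choices, not the original code:

```nasm
        bits 16
printWord:
        mov  dx, [di]        ; DI is assumed to point at the current word
        push dx              ; the word goes on the stack once...
        mov  bx, 0007h       ; BH = display page 0, BL = color (graphics modes)
        mov  cx, 16          ; 16 bits to print
.loopstart:
        pop  dx
        rol  dx, 1           ; bring the next bit (MSB first) into bit 0
        push dx              ; keep the rotated word for the next pass
        and  dl, 1           ; isolate that bit
        add  dl, "0"         ; turn it into the character "0" or "1"
        mov  al, dl
        mov  ah, 0Eh         ; BIOS.Teletype
        int  10h
        dec  cx
        jnz  .loopstart
        pop  dx              ; ...and comes off once, so SP ends where it began
```

Every push now has a matching pop, so the stack pointer has the same value at the start of each word and the stack can no longer grow down into the program.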
Given that your program is a bootloader, there are no segment registers that you can trust to have a defined value! If you're going to set up a stack, you also need to initialize the SS segment register. It's important to do that in the instruction directly above the one that initializes the SP register: loading SS holds off interrupts until after the next instruction, so no interrupt can arrive while SS:SP is only half updated.
start:
xor ax, ax ; This is shorter/faster than "mov ax, 0h"
mov ds, ax
mov ss, ax
mov sp, 7F00h ; set stack pointer after our bootloader
mov di, 7C00h ; set Data pointer to memory location where is our bootloader loaded
Keep things together.
You've put the setup of the BIOS.Teletype function number outside of the loop. Although of course this is not wrong, it does diminish the readability of the program. I prefer to always have the mov ah, 0Eh just above the int 10h.
Get rid of redundant things.
The mov al, 0 instruction doesn't serve any purpose for the BIOS delay function.
The ds: segment override prefix in mov dx, [ds:di] is redundant since DS is the segment that will be used by default. Writing it increases the code size by 1 byte.
The BIOS.Teletype function neither requires nor clobbers the CX register. You don't have to push cx ... pop cx. Your mention of CX being a volatile register does not apply when dealing with an API like BIOS. Just look at the registers in and registers out.
If you look up the Teletype function you will notice that it also requires you to set up the BH register with the DisplayPage and, if on a graphics screen, the BL register with the Color.
Use the best loop construct.
Currently your program uses a While construct that requires 2 jumps on each iteration. Jumps are expensive in terms of execution time, so we always try to have as few as possible.
If you know that the loop is going to be run at least once then a Repeat-Until construct is better suited.
xor cx, cx
.loopstart:
... ; Counting upward [0,15]
inc cx
cmp cx, 16
jb .loopstart
If you know that the body of this loop does not depend on the actual value in the loop counter then a slightly better version will be:
mov cx, 16
.loopstart:
... ; Counting downward [16,1]
dec cx
jnz .loopstart
See the opportunities to write more compact code.
pop dx
rol dx, 1
push dx
and dx, 0000000000000001
add dx, 30h
push cx
mov al, dl
int 10h
pop cx
When you know that the BIOS.Teletype function expects the character in the AL register, you should strive to do these calculations straight on the accumulator, which additionally uses shorter encodings.
pop ax
rol ax, 1
push ax
and al, 1
add al, "0"
mov ah, 0Eh
int 10h
We can go a bit further here. Instead of placing the data on the stack (push dx), we can hold it in the BP register (mov bp, [di]). Use the registers that are at your disposal!
With some more clever re-arranging the above snippet becomes:
rol bp, 1 ; Produces a CF
mov ax, 0E00h ; Function number in AH, zeroing AL
adc al, "0" ; 0 + "0" + CF=0 ==> "0"
int 10h ; 0 + "0" + CF=1 ==> "1"
The conditional jumps can jump 128 bytes backwards (x86-16).
add di, 2h
cmp di, 7E00h ;; are we at the end of our bootloader (7c00h + 200h )?
je .hlt ;; if yes halt
jmp .printMemoryValue ;; print next memory location ( di is increased by 2 )
.hlt:
hlt ;; ende
Invert the condition code and you'll no longer need the unconditional jump nor the extra label.
add di, 2
cmp di, 7E00h
jne .printMemoryValue
hlt
For robustness the jne should become jb. Sometimes things go wrong, and it could be that the expected value of 7E00h never occurs, producing an infinite loop! That's why prudent programmers prefer testing for less/below and greater/above conditions.
This is the complete code with all of the above applied:
start:
xor ax, ax ; This is shorter/faster than "mov ax, 0h"
mov ds, ax
mov ss, ax
mov sp, 7F00h ; set stack pointer after our bootloader
mov di, 7C00h ; set Data pointer to where is our bootloader loaded
.printMemoryValue:
mov cx, 0006H ; CX:DX = 00068480h Pause for about 0.4 sec
mov dx, 8480H
mov ah, 86h ; BIOS.Delay
int 15h
mov bp, [di] ; Moving content of memory location in BP
mov bx, 0007h ; Display page 0 in BH, Attribute WhiteOnBlack in BL
mov cx, 16
.loopstart:
rol bp, 1 ; Produces a CF
mov ax, 0E00h ; Function number in AH, zeroing AL
adc al, "0" ; 0 + "0" + CF=0 ==> "0"
int 10h ; 0 + "0" + CF=1 ==> "1"
dec cx
jnz .loopstart
mov ax, 0E0Dh ; Newline is carriage return plus linefeed
int 10h
mov ax, 0E0Ah
int 10h
add di, 2
cmp di, 7E00h
jb .printMemoryValue
hlt ; ende
times 510-($-$$) db 0
dw 0AA55h ; => 55h 0AAh (little endian byte order)
4. How to properly format asm code?
Everybody has a personal style. Most people, however, like to use the nice tabular format that you see in my code examples.
What you should do is be consistent when it comes to number representations.
Choose between the hex prefix 0x or the hex suffix h, but try not to mix both in the same program. This is especially true in a short program.
Don't forget to write the affix for numbers that need it.
and dx, 0000000000000001 --> and dx, 0000000000000001b
Write as many hex digits as the register can take.
mov ax, 0h --> mov ax, 0000h
cmp cx, 00010h --> cmp cx, 0010h
add dx, 30h --> add dx, 0030h
mov al, 000Ah --> mov al, 0Ah
mov al, 000Dh --> mov al, 0Dh
add di, 2h --> add di, 0002h
Don't express numbers that don't really need it in hexadecimal.
mov ax, 0h --> mov ax, 0
cmp cx, 00010h --> cmp cx, 16
add di, 2h --> add di, 2
Sometimes expressing a number as a character improves readability.
add dl, 30h --> add dl, "0"
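To illustrate why the affix matters, here is a minimal sketch assuming NASM syntax (the question never names its assembler, although the times 510-($-$$) idiom suggests NASM). The same digits assemble to different values depending on the prefix or suffix:

```nasm
        bits 16
        mov  ax, 10          ; no affix: decimal, AX = 10
        mov  ax, 10h         ; h suffix: hexadecimal, AX = 16
        mov  ax, 0x10        ; 0x prefix: the same hexadecimal value, AX = 16
        mov  ax, 10b         ; b suffix: binary, AX = 2
        mov  al, "0"         ; character constant, AL = 30h
```

The original mask 0000000000000001 only works because decimal 1 and binary 1 are the same number; a mask such as 0000000000000010 without the b suffix would be read as decimal ten.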
Since cx is not used other than as a loop counter, you may consider making a countdown loop. That would spare 2 instructions:
mov cx, 16
.loopstart:
pop dx
....
dec cx
jnz .loopstart
The MSB operations can also be shortened:
xor al,al ; Clear al
sal dx,1 ; MSB lands in CF
adc al,30h ; Add with carry. al becomes 30h + MSB
Yet another jump could be spared by jne .printMemoryValue.
As for code formatting, as pointed out by @Sep Roland, there is no definitive standard other than tabulating each instruction on a single line. I've gone a step further, and you may have noticed how I nest my comments and group things that logically go together.
It may seem like a lot of work, but describe amply what your program is doing. I still struggle to come up with meaningful statements, but you'd be surprised how hard it is to decipher your own code a year or two later without a detailed description. I've found that this method also solidifies concepts and makes debugging much easier.
; Read E820 map into a temporary buffer just above boot sector @ 7E0:0
mov ax, BOOT_SEG + 32 ; So MSB of EAX is nullified
mov es, ax
mov ds, ax ; So segment overrides are not required
xor di, di ; ES:DI = Pointer to base address of map
mov bx, di ; Initial continuation value
mov edx, 'PAMS' ; Function signature
push edx
; Top of loop to read first or next map entry
ReadNext: inc byte [MAP_ENTR] ; Bump number of map entries = 0 first iteration.
.skip: mov cl, 48 ; Let function call know how big entry can be.
mov ax, 0xe820 ; System Service function number.
int SYS_SERVICE
; Assert the possible error and termination conditions
jc .done ; CY = 1 can happen in all cases
cmp bl, 1 ; Is this the first entry
jb .done ; If zero, no more entries
ja .J0 ; Next code only needs happen on first iteration
; This need only happen on first iteration
pop edx
sub eax, edx ; Does BIOS even support this function
jz .J0 - 3
dec byte [MAP_ENTR] ; Bump value back to -1
jmp .done
mov [MAP_SIZE], cl ; Save actual size of entry returned by function.
.J0: jcxz .skip ; Ignore any null length entries
cmp cl, 20
jbe .J1
test byte [di + 20], 1 ; Ignore ACPI entries
jz .skip
; Test 64 bit value representing length for zero
.J1: mov eax, [di + 8] ; Get low order DWORD of length
or eax, [di + 12] ; Determine if QWORD value is zero
jz .skip
; Bump ES:DI pointer to next entry
add di, cx
jmp ReadNext
.done: or di, di ; Was a map even created
jnz .movemap
; Screen is completely blank now, so to indicate there was a problem with E820
mov ax, 0xb800 ; Point to video
mov es, ax
mov di, 0x7CE ; Offset to vertical & horizontal center of screen
; This will display flashing "[ ]" in yellow with "E" between in high intensity white
mov eax, 0xf458e5b
stosd
inc al
inc al ; AL = "]"
stosw
push ss ; Define upper for calculating total sectors
jmp MoveBlock ; Dont need to move map as it doesnt exist
; Move E820 entries immediately below bottom of stack frame.
.movemap: mov cx, di ; Get copy of total bytes in E820 map
shr cx, 2 ; CX / 4 = Total DWORDS to move
; Only every 4th entry is segment aligned (16 bytes), so offset in DI needs to be
; calculated so last entry of map terminates at bottom of stack.
mov ax, ss ; Get base of stack frame
sub bx, di
and bx, 15 ; BL = 0, 4, 8, 12
jz $ + 3
; Because BL <> 0 segment has to be skewed by one
dec ax ; Bump back one more segment
; Now offset can be saved and moved into DI
mov [MAP_ADDR], bx ; Lower half of long pointer
xchg di, bx ; Move offset into index
shr bx, 4 ; BL = Total # of 16 byte segments
sub ax, bx
mov [MAP_ADDR+2], ax ; Upper half of long pointer
mov es, ax ; ES:DI = Destination buffer
push ax ; Define upper for calculating total sectors
; Establish source pointer and then move CX DWORDS
xor si, si ; DS:SI = Source
rep movsd