9
\$\begingroup\$

I just started learning the x86 architecture and assembly in depth, so I decided to start from bare metal and write my own bootloader. To add some "functionality", I decided to make it a quine (from a binary perspective), and I have some questions:

  1. Could this be considered a valid quine?

  2. Are the mechanics and design of my code acceptable?

  3. Are there asm patterns that should generally be followed? (I come from a Java background, where patterns are everywhere.)

  4. How should asm code be properly formatted?

My code:

start:
mov si, 7F00h ;; set stack pointer after our bootloader 
mov ax, 0h 
mov ds, ax ;; set DS to 0;
mov di, 7C00h ;; set Data pointer to memory location where is our bootloader loaded 
.printMemoryValue:
mov al, 0 ;; Using int15 to simulate pause for real time output, ax,ah,cx,dx dictate pause length 
mov ah, 86h 
mov cx, 0006H
mov dx, 8480H
int 15h
mov ah, 0Eh ;; Ah to 0eh setting for teletype output ( int 10h)
mov dx, [ds:di] ;; moving content of first memory location in dx (7c00h)
push dx ;; save dx in our stack
xor cx, cx ;; for (int i = 0; ;; for loop used to output 16 bit string (0 & 1)
.loopstart:
cmp cx, 00010h ;; i < 16;
je .loopend ;; break if i >= 16 
pop dx 
rol dx, 1 ;; rotating left 1 bit so we can extract MSF bit with our mask. because we write on screen from right to left
push dx ;; save our curent dx to stack
and dx, 0000000000000001 ;; mask 
add dx, 30h ;; adding (30h) for ASCII ( 0 or 1)
push cx ;; saving our counter in stack because cx is volotile register 
mov al, dl ;; moving low 8 bits from dx (dl) to our teletype output register Al for calling int 10h
int 10h ;; int 10h ( al = character to output if ah = 0eH, teletype )
pop cx
add cx, 1 ;; i++
jmp .loopstart 
.loopend:
mov al, 000Ah ;; new Row
int 10h
mov al, 000Dh ;; carriage return
int 10h
add di, 2h ;; adding 2 to our di pointer ( because we are in 16bit mode)
cmp di, 7E00h ;; are we at the end of our bootloader (7c00h + 200h )?
je .hlt ;; if yes halt
jmp .printMemoryValue ;; print next memory location ( di is increased by 2 )
.hlt:
hlt ;; ende 
times 510-($-$$) db 0 
dw 0xAA55 ; => 0x55 0xAA (little endian byte order)

You can try the code with the mountable image available here

Any suggestions are welcome.

asked Jan 9, 2018 at 16:09
\$\endgroup\$

3 Answers

6
\$\begingroup\$

First correct some bugs.

start:
 mov si, 7F00h ;; set stack pointer after our bootloader 
 mov ax, 0h 
 mov ds, ax ;; set DS to 0;
 mov di, 7C00h ;; set Data pointer to memory location where is our bootloader loaded 

The stack pointer on x86 is held in the SP register. You've loaded SI, a general-purpose register that your program doesn't use at all hereafter.

Your bootloader sits in memory from 7C00h to 7DFFh. If you put your stack behind the program code and start with a stack pointer at 7F00h, you will have 256 bytes of stack (from 7E00h to 7EFFh) before you run into the program itself. Had the aforementioned error been corrected, this is exactly what would happen, because of a second error further down in the program.

Each word you read from program memory gets pushed on the stack and left there, so the stack fills up to the point where you're pushing on top of the program memory! Either pop dx to balance the stack on each iteration of the program (write it just below .loopend:, since anything placed between jmp .loopstart and that label is never executed), or, much better, don't use the stack this way, since there are enough free registers to store that particular value.
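A minimal sketch of the first option, keeping the question's push/pop scheme but with one matching pop right after the .loopend: label (code placed between jmp .loopstart and the label would never be reached). The printWord label and the ret are hypothetical additions so that the fragment assembles on its own. It assumes DS = 0, a valid SS:SP, DI pointing at the word to print, and a BIOS that leaves CX intact across int 10h.

```nasm
        bits 16
printWord:                      ; hypothetical wrapper label, not in the question
        mov     bx, 0007h       ; BH = display page 0, BL = color 7
        mov     dx, [di]        ; word to print
        push    dx              ; exactly one push per word ...
        xor     cx, cx          ; i = 0
.loopstart:
        cmp     cx, 16          ; i < 16 ?
        je      .loopend
        pop     dx
        rol     dx, 1           ; bring the next bit into bit 0
        push    dx              ; pop/push inside the loop stay balanced
        and     dx, 1           ; isolate that bit
        add     dx, 30h         ; turn it into "0" or "1"
        mov     al, dl
        mov     ah, 0Eh         ; BIOS.Teletype
        int     10h
        inc     cx              ; i++
        jmp     .loopstart
.loopend:
        pop     dx              ; ... and one matching pop, so SP ends where it started
        ret
```

After 16 rotations DX holds the original word again, and SP has the same value on exit as on entry, so the stack no longer creeps toward the program.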

Given that your program is a bootloader, there are no segment registers that you can trust to have a defined value! If you're going to set up a stack, you'll also need to initialize the SS segment register. It's important to do that in the instruction directly above the one that initializes the SP register: the CPU holds off interrupts for one instruction after a write to SS, so no interrupt can arrive while SS:SP is only half updated.

start:
 xor ax, ax ; This is shorter/faster than "mov ax, 0h"
 mov ds, ax
 mov ss, ax
 mov sp, 7F00h ; set stack pointer after our bootloader 
 mov di, 7C00h ; set Data pointer to memory location where is our bootloader loaded 

Keep things together.

You've put the setup of the BIOS.Teletype function number outside of the loop. Although this is not wrong, it does diminish the readability of the program. I prefer to always have the mov ah, 0Eh just above the int 10h.

Get rid of redundant things.

The mov al, 0 instruction doesn't serve any purpose for the BIOS delay function.

The ds: segment override prefix in mov dx, [ds:di] is redundant since DS is the segment that will be used by default. Writing it increases the code size by 1 byte.

The BIOS.Teletype function neither requires nor clobbers the CX register, so you don't need the push cx ... pop cx. Your mention of CX being a volatile register does not apply when dealing with an API like the BIOS. Just look at the registers in and the registers out.
If you look up the Teletype function you will notice that it also requires you to set up the BH register with the DisplayPage and, if on a graphics screen, the BL register with the Color.
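As a small illustration of "registers in", a Teletype call with all of its inputs set up could be wrapped as below. The putChar name and the decision to preserve the caller's BX are hypothetical choices for this sketch, not something the question's code contains.

```nasm
        bits 16
; putChar: hypothetical helper, prints the character in AL via BIOS.Teletype.
; In:  AL = character
; Out: AH is overwritten with the function number; BX is preserved.
putChar:
        push    bx              ; keep the caller's BX
        mov     bx, 0007h       ; BH = display page 0, BL = color 7 (only used on graphics screens)
        mov     ah, 0Eh         ; BIOS.Teletype function number
        int     10h
        pop     bx
        ret
```

A caller would then write, for example, mov al, "1" followed by call putChar.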

Use the best loop construct.

Currently your program uses a While construct that requires 2 jumps on each iteration. Jumps are expensive in terms of execution time, so we always try to have as few as possible.

If you know that the loop is going to be run at least once then a Repeat-Until construct is better suited.

 xor cx, cx
.loopstart:
 ... ; Counting upward [0,15]
 inc cx
 cmp cx, 16
 jb .loopstart

If you know that the body of this loop does not depend on the actual value in the loop counter then a slightly better version will be:

 mov cx, 16
.loopstart:
 ... ; Counting downward [16,1]
 dec cx
 jnz .loopstart

See the opportunities to write more compact code.

pop dx
rol dx, 1
push dx
and dx, 0000000000000001
add dx, 30h
push cx
mov al, dl
int 10h
pop cx

When you know that the BIOS.Teletype function expects the character in the AL register, you should strive to do these calculations directly on the accumulator, which additionally uses shorter encodings.

pop ax
rol ax, 1
push ax
and al, 1
add al, "0"
mov ah, 0Eh
int 10h

We can go a bit further here. Instead of placing the data on the stack (push dx), we can hold it in the BP register (mov bp, [di]). Use the registers that are at your disposal!
With some more clever re-arranging the above snippet becomes:

rol bp, 1 ; Produces a CF
mov ax, 0E00h ; Function number in AH, zeroing AL
adc al, "0" ; 0 + "0" + CF=0 ==> "0"
int 10h ; 0 + "0" + CF=1 ==> "1"

The conditional jumps can jump 128 bytes backwards (x86-16).

add di, 2h
cmp di, 7E00h ;; are we at the end of our bootloader (7c00h + 200h )?
je .hlt ;; if yes halt
jmp .printMemoryValue ;; print next memory location ( di is increased by 2 )
.hlt:
hlt ;; ende 

Invert the condition code and you'll no longer need the direct jump nor the extra label.

add di, 2
cmp di, 7E00h
jne .printMemoryValue
hlt

For robustness the jne should become jb. Sometimes things go wrong, and it could be that the expected value of 7E00h never occurs, producing an infinite loop! That's why prudent programmers prefer testing for less/below and greater/above conditions.


This is the complete code with all of the above applied:

start:
 xor ax, ax ; This is shorter/faster than "mov ax, 0h"
 mov ds, ax
 mov ss, ax
 mov sp, 7F00h ; set stack pointer after our bootloader 
 mov di, 7C00h ; set Data pointer to where is our bootloader loaded 
.printMemoryValue:
 mov cx, 0006H ; CX:DX = 00068480h Pause for about 0.4 sec
 mov dx, 8480H
 mov ah, 86h ; BIOS.Delay
 int 15h
 mov bp, [di] ; Moving content of memory location in BP
 mov bx, 0007h ; Display page 0 in BH, Attribute WhiteOnBlack in BL 
 mov cx, 16
.loopstart:
 rol bp, 1 ; Produces a CF
 mov ax, 0E00h ; Function number in AH, zeroing AL
 adc al, "0" ; 0 + "0" + CF=0 ==> "0"
 int 10h ; 0 + "0" + CF=1 ==> "1"
 dec cx
 jnz .loopstart
 mov ax, 0E0Dh ; Newline is carriage return plus linefeed
 int 10h
 mov ax, 0E0Ah
 int 10h
 add di, 2
 cmp di, 7E00h
 jb .printMemoryValue
 hlt ; ende
 times 510-($-$$) db 0
 dw 0AA55h ; => 55h 0AAh (little endian byte order)

4. How to properly format asm code?

Everybody has a personal style. Most people however like to use the nice tabular format that you see in my code examples.

What you should do is be consistent when it comes to number representations.

  • Choose between the hex prefix 0x or the hex suffix h, but try not to mix both in the same program. This is especially true in a short program.

  • Don't forget to write the affix for numbers that need it.

    and dx, 0000000000000001 --> and dx, 0000000000000001b
    
  • Write as many hex digits as the register can take.

    mov ax, 0h --> mov ax, 0000h
    cmp cx, 00010h --> cmp cx, 0010h
    add dx, 30h --> add dx, 0030h
    mov al, 000Ah --> mov al, 0Ah
    mov al, 000Dh --> mov al, 0Dh
    add di, 2h --> add di, 0002h
    
  • Don't express numbers that don't really need it in hexadecimal.

    mov ax, 0h --> mov ax, 0
    cmp cx, 00010h --> cmp cx, 16
    add di, 2h --> add di, 2
    
  • Sometimes expressing a number as a character improves readability.

    add dl, 30h --> add dl, "0"
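
    Applied together, these guidelines might give a fragment like the one below. It is only an illustration built from a few of the question's instructions (the printBit label is hypothetical and is there so the fragment assembles on its own), using the tabular layout and the h suffix throughout.

```nasm
        bits 16
printBit:                               ; hypothetical label
        and     dx, 0000000000000001b   ; binary mask carries its b suffix
        add     dl, "0"                 ; character constant instead of 30h
        mov     al, dl
        mov     ah, 0Eh                 ; two hex digits for an 8-bit register
        int     10h
        add     di, 2                   ; small step written in decimal
        cmp     di, 7E00h               ; four hex digits for a 16-bit register
```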
    
answered Jan 14, 2018 at 18:31
\$\endgroup\$
6
\$\begingroup\$
  • Since cx is not used other than as a loop counter, you may consider making a countdown loop. That would spare 2 instructions:

     mov cx, 16
    .loopstart:
     pop dx
     ....
     dec cx
     jnz .loopstart
    
  • The MSB operations can also be shortened:

     xor al,al ; Clear al
     sal dx,1 ; MSB lands in CF
     adc al,30h ; Add with carry. al becomes 30h + MSB
    
  • Yet another jump could be spared by jne .printMemoryValue.
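
Taken together, the three points might look like the sketch below. It departs from the snippets above in one respect, which is an assumption of this sketch: the word stays in DX for the whole inner loop instead of being popped and pushed each round, on the premise that int 10h leaves DX and CX intact. The dumpWords label is hypothetical, DS is assumed to be 0, and the pause via int 15h is left out for brevity.

```nasm
        bits 16
dumpWords:                      ; hypothetical label wrapping the loops
        mov     bx, 0007h       ; BH = display page 0, BL = color 7
        mov     di, 7C00h       ; start of the boot sector
.printMemoryValue:
        mov     dx, [di]        ; next word to print
        mov     cx, 16          ; countdown loop: 16 bits per word
.loopstart:
        xor     al, al          ; clear AL
        sal     dx, 1           ; MSB lands in CF
        adc     al, 30h         ; AL = 30h + MSB, i.e. "0" or "1"
        mov     ah, 0Eh         ; BIOS.Teletype
        int     10h
        dec     cx
        jnz     .loopstart
        mov     ax, 0E0Dh       ; carriage return
        int     10h
        mov     ax, 0E0Ah       ; line feed
        int     10h
        add     di, 2
        cmp     di, 7E00h       ; end of the boot sector (7C00h + 200h)
        jne     .printMemoryValue
        hlt
```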

answered Jan 9, 2018 at 18:55
\$\endgroup\$
2
\$\begingroup\$

As for code formatting, and as pointed out by @Sep Roland, there is no definitive standard other than putting each instruction on its own line in tabular form. I've gone a step further, and you may notice how I nest my comments and group things that logically go together.

It may seem like a lot of work, but describe amply what your program is doing. I still struggle to come up with meaningful statements, but you'd be surprised how hard it is to decipher your own code a year or two later without a detailed description. I've also found that this method solidifies concepts and makes debugging much easier.

; Read E820 map into a temporary buffer just above boot sector @ 7E0:0
 mov ax, BOOT_SEG + 32 ; So MSB of EAX is nullified
 mov es, ax
 mov ds, ax ; So segment overrides are not required
 xor di, di ; ES:DI = Pointer to base address of map
 mov bx, di ; Initial continuation value
 mov edx, 'PAMS' ; Function signature
 push edx
 ; Top of loop to read first or next map entry
ReadNext: inc byte [MAP_ENTR] ; Bump number of map entries = 0 first iteration.
 .skip: mov cl, 48 ; Let function call know how big entry can be.
 mov ax, 0xe820 ; System Service function number.
 int SYS_SERVICE
 ; Assert the possible error and termination conditions
 jc .done ; CY = 1 can happen in all cases
 cmp bl, 1 ; Is this the first entry
 jb .done ; If zero, no more entries
 ja .J0 ; Next code only needs happen on first iteration
 ; This need only happen on first iteration
 pop edx
 sub eax, edx ; Does BIOS even support this function
 jz .J0 - 3
 dec byte [MAP_ENTR] ; Bump value back to -1
 jmp .done
 mov [MAP_SIZE], cl ; Save actual size of entry returned by function.
 .J0: jcxz .skip ; Ignore any null length entries
 cmp cl, 20
 jbe .J1
 test byte [di + 20], 1 ; Ignore ACPI entries
 jz .skip
 ; Test 64 bit value representing length for zero
 .J1: mov eax, [di + 8] ; Get low order DWORD of length
 or eax, [di + 12] ; Determine if QWORD value is zero
 jz .skip
 ; Bump ES:DI pointer to next entry
 add di, cx
 jmp ReadNext
 .done: or di, di ; Was a map even created
 jnz .movemap
 ; Screen is completely blank now, so to indicate there was a problem with E820
 mov ax, 0xb800 ; Point to video
 mov es, ax
 mov di, 0x7CE ; Offset to vertical & horizontal center of screen
 ; This will display flashing "[ ]" in yellow with "E" between in high intensity white
 mov eax, 0xf458e5b
 stosd
 inc al
 inc al ; AL = "]"
 stosw
 push ss ; Define upper for calculating total sectors
 jmp MoveBlock ; Dont need to move map as it doesnt exist
; Move E820 entries immediately below bottom of stack frame.
.movemap: mov cx, di ; Get copy of total bytes in E820 map
 shr cx, 2 ; CX / 4 = Total DWORDS to move
 ; Only every 4th entry is segment aligned (16 bytes), so offset in DI needs to be
 ; calculated so last entry of map terminates at bottom of stack.
 mov ax, ss ; Get base of stack frame
 sub bx, di
 and bx, 15 ; BL = 0, 4, 8, 12
 jz $ + 3
 ; Because BL <> 0 segment has to be skewed by one
 dec ax ; Bump back one more segment
 ; Now offset can be saved and moved into DI
 mov [MAP_ADDR], bx ; Lower half of long pointer
 xchg di, bx ; Move offset into index
 shr bx, 4 ; BL = Total # of 16 byte segments
 sub ax, bx
 mov [MAP_ADDR+2], ax ; Upper half of long pointer
 mov es, ax ; ES:DI = Destination buffer
 push ax ; Define upper for calculating total sectors
 ; Establish source pointer and then move CX DWORDS
 xor si, si ; DS:SI = Source
 rep movsd
answered Jan 18, 2018 at 17:40
\$\endgroup\$
