6
\$\begingroup\$

Environment

NASM is required to build this program, and DOSBox is required to run it. I'd recommend installing these using the Scoop package manager. Feel free to skip the install commands for any programs you already have installed.

iwr -useb get.scoop.sh | iex
scoop install git
scoop install dosbox
scoop install nasm

Building

nasm -f bin -o helper.com helper.asm

Running

Load DOSBox, then mount the path where helper.com resides to any available drive letter (anything in the A-Z range).

mount H: C:\Users\T145\Desktop\
H:
dir
helper.com

helper.asm

bits 16
org 0x100
section .text
_main:
 lea di, [prompt]
 call putstring
 lea di, [string]
 call getstring
 lea di, [hello]
 call putstring
 lea di, [string]
 call putstring
 mov ah, 0x4c ; standard exit code
 mov al, 0
 int 0x21
; no parameters
; returns a char in ax
getchar:
 mov ah, 0 ; call interrupt x16 sub interrupt 0
 int 0x16
 mov ah, 0
 ret
; takes a char to print in dx
; no return value
putchar:
 mov ax, dx ; call interrupt x10 sub interrupt xE
 mov ah, 0x0E
 mov cx, 1
 int 0x10
 ret
; takes an address to write to in di
; writes to address until a newline is encountered
; returns nothing
getstring:
 call getchar ; read a character
 cmp ax, 13 ; dos has two ascii characters for new lines 13 then 10
 je .done ; its not a 13, whew...
 cmp ax, 10 ; check for 10 now
 je .done ; its not a 10, whew...
 mov [di], al ; write the character to the current byte
 inc di ; move to the next address
 mov dx, ax ; dos doesn't print as it reads like windows, let's fix that
 call putchar
 jmp getstring
.done:
 mov dx, 13 ; write a newline for sanity
 call putchar
 mov dx, 10
 call putchar
 ret
; takes an address to write to in di
; writes to address until a newline is encountered
; returns nothing
putstring:
 cmp byte [di], 0 ; see if the current byte is a null terminator
 je .done ; nope keep printing
.continue:
 mov dl, [di] ; grab the next character of the string
 mov dh, 0 ; print it
 call putchar
 inc di ; move to the next character
 jmp putstring
.done:
 ret
section .data
 prompt: db "Please enter your first name: ", 0
 string: times 20 db 0
 hello: db "Hello, ", 0

Output

(screenshot of the program's output)

Jamal
asked Nov 12, 2019 at 1:05
\$\endgroup\$
  • \$\begingroup\$ Is there a particular reason you're using BIOS interrupts instead of DOS? \$\endgroup\$ Commented Nov 13, 2019 at 6:41
  • \$\begingroup\$ @Shift_Left No, not really. Idk wym by usage of DOS over BIOS interrupts tbh. This is my first time messing around w/ 16-bit assembly. I've read through this official documentation page on the topic, but that's about it. \$\endgroup\$ Commented Nov 13, 2019 at 11:57
  • \$\begingroup\$ This is not Win16, it is 8086 DOS. \$\endgroup\$ Commented Dec 12, 2019 at 11:57
  • \$\begingroup\$ @ecm Ty for that clarification; I'll be sure to use it in future questions relating to 16 bit NASM running on DOSBOX \$\endgroup\$ Commented Dec 13, 2019 at 1:30

2 Answers

2
\$\begingroup\$

In the absence of anything else, the assembler assumes a 16-bit flat binary, so all that is required is:

~$ nasm ?.asm -o?.com

Although it isn't wrong, even bits 16 is redundant. In operating system development you might specify use32 or use64 to select those instruction sets, but the result would still be a flat binary file. Otherwise, the only thing that makes this type of executable unique is:

 org 0x100

This tells the assembler that the program is loaded at offset 0x100 of its segment, which is where DOS starts executing a .COM file. A label like main is therefore unnecessary unless the program needs to branch back to its beginning.

As to the question I asked in your original post, knowing what resources you have to work with is monumentally important. DOS provides a lot of functionality, which can be found here, so this

 mov dx, Prompt
 mov ah, WRITE
 int DOS

replaces all of this

 putstring:
 cmp byte [di], 0 ; see if the current byte is a null terminator
 je .done ; nope keep printing
 .continue:
 mov dl, [di] ; grab the next character of the string
 mov dh, 0 ; print it
 call putchar
 inc di ; move to the next character
 jmp putstring
 .done:
 ret

by terminating the string with what DOS expects, like so

 Prompt db 13, 10, 13, 10, 'Please enter your first name: $'

and because CR/LF is now embedded in the string, this can be eliminated.

 mov dx, 13 ; write a newline for sanity
 call putchar
 mov dx, 10
 call putchar

Input works like this

; Read string from operator
 mov dx, InpBuff
 mov ah, READ
 int DOS
; Into a buffer whose first byte gives its capacity (128 bytes, including the
; final CR). -1 is just a placeholder that DOS replaces with the number of
; characters entered.
 InpBuff: db 128, -1 

DOS terminates the input with 0x0D, which must be replaced with '$' before the string can be printed with function 9. This little snippet does that.

; Terminate this input with '$'
 mov bx, dx
 movzx ax, byte [bx+1]
 inc al
 inc al
 add bx, ax
 mov byte [bx], '$'

replaces these

 ; no parameters
 ; returns a char in ax
 getchar:
 mov ah, 0 ; call interrupt x16 sub interrupt 0
 int 0x16
 mov ah, 0
 ret
 ; takes an address to write to in di
 ; writes to address until a newline is encountered
 ; returns nothing
 getstring:
 call getchar ; read a character
 cmp ax, 13 ; dos has two ascii characters for new lines 13 then 10
 je .done ; its not a 13, whew...
 cmp ax, 10 ; check for 10 now
 je .done ; its not a 10, whew...
 mov [di], al ; write the character to the current byte
 inc di ; move to the next address
 mov dx, ax ; dos doesn't print as it reads like windows, let's fix that
 call putchar
 jmp getstring

So all in all this code is about 44% smaller (91 bytes vs 163), and only because I've used what DOS provides. Had I used BIOS calls instead, my code would not have been that much smaller, maybe 5-10%.

 org 0x100
 DOS equ 33 ; = 21H
 WRITE equ 9
 READ equ 10
 ; Display initial prompting
 mov dx, Prompt
 mov ah, WRITE
 int DOS
 ; Read string from operator
 mov dx, InpBuff
 mov ah, READ
 int DOS
 ; Terminate this input with '$'
 mov bx, dx
 movzx ax, byte [bx+1]
 inc al
 inc al
 add bx, ax
 mov byte [bx], '$'
 ; Display next prompting
 push dx ; We will want this pointer again
 mov dx, hello
 mov ah, WRITE
 int DOS
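 pop dx
 inc dx ; Bump over max and actual lengths
 inc dx
 int DOS ; AH is assumed to still hold WRITE from the previous call
 ret ; In a .COM program this returns to offset 0 of the PSP, where an INT 20h ends the program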
 Prompt db 13, 10, 13, 10, 'Please enter your first name: $'
 hello db 10, 10, 9, 'Hello, $'
 InpBuff: db 128, -1 

I changed the formatting of hello slightly just so you can see the difference. Experiment a little: replace the 10's with 13's in hello and watch what happens.
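
A note on the input buffer: DOS function 0Ah expects byte 0 to hold the buffer's capacity (counting the final carriage return), writes the number of characters typed (not counting the carriage return) into byte 1, and stores the text from byte 2 onward. The listing above declares only the first two bytes and relies on the free memory that follows the end of the .COM image. A minimal sketch that reserves the storage explicitly might look like this; MAXLEN and crlf are hypothetical names, and the 20-character limit is an assumption borrowed from the question's buffer size.

```nasm
; input.asm - build with: nasm -f bin -o input.com input.asm
bits 16
org 0x100

MAXLEN equ 20                       ; hypothetical limit, counts the final CR

    mov dx, InpBuff
    mov ah, 0x0A                    ; DOS.BufferedInput
    int 0x21

    mov bl, [InpBuff + 1]           ; characters typed, CR not counted
    mov bh, 0
    mov byte [InpBuff + 2 + bx], '$' ; overwrite the CR with the DOS terminator

    mov dx, crlf                    ; move to a fresh line
    mov ah, 0x09                    ; DOS.PrintString
    int 0x21
    mov dx, InpBuff + 2             ; skip the capacity and length bytes
    mov ah, 0x09
    int 0x21

    mov ax, 0x4C00                  ; DOS.Terminate, exit code 0
    int 0x21

crlf:    db 13, 10, '$'
InpBuff: db MAXLEN                  ; byte 0: capacity, set by the program
         db 0                       ; byte 1: length, filled in by DOS
         times MAXLEN db 0          ; bytes 2 and up: the text itself
```

With the storage declared this way the buffer can sit anywhere in the program, not just at the very end.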

answered Nov 13, 2019 at 16:02
\$\endgroup\$
  • \$\begingroup\$ Wow, that's amazing! Just out of curiosity, why do you inc dx twice after popping, and have -1 on the tail of InpBuff? To the last point, there's no ASCII code for it. \$\endgroup\$ Commented Nov 13, 2019 at 18:39
  • \$\begingroup\$ DX is pointing to the very beginning of InpBuff, so it has to be incremented twice to point to the actual text. -1 is just a placeholder so that when I'm looking for the string or that section of the buffer in debug, it's easy to identify; otherwise it doesn't need to be there. I used inc twice as it's one byte shorter than add dx,2. The second byte of InpBuff will be replaced with the number of characters entered. \$\endgroup\$ Commented Nov 13, 2019 at 19:02
3
\$\begingroup\$

There's plenty to optimize here!

In NASM you get the address simply by writing mov di, prompt. This has a shorter encoding than lea di, [prompt]. (In MASM this would be mov di, offset prompt giving the same benefit over the lea form).

Instead of writing the pair mov ah, 0x4c mov al, 0, you could combine these in 1 instruction as mov ax, 0x4C00. This shaves off 1 byte from the program.
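
As a rough sketch, the two substitutions above look like this in a cut-down version of the program. The byte counts in the comments are for 16-bit code, and the body of the program is left out.

```nasm
; sizes.asm - build with: nasm -f bin -o sizes.com sizes.asm
bits 16
org 0x100

    mov di, prompt      ; 3 bytes (BF lo hi); lea di, [prompt] takes 4 (8D 3E lo hi)
    ; ... the rest of the program would go here ...
    mov ax, 0x4C00      ; 3 bytes: AH = 0x4C (DOS.Terminate), AL = 0 (exit code)
    int 0x21

prompt: db "Please enter your first name: ", 0
```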

Your getchar returns a byte in AX and your putchar expects a byte in DX. You would be better off if you used AL and DL. This would avoid those several mov ah, 0 and mov dh, 0 instructions.

Your putchar code uses the BIOS.Teletype function 0x0E. This function does not expect anything in the CX register. What it does require is that you specify the display page in the BH register. Simply add mov bh, 0 here. And if there's any chance that your program has to run in a graphics video mode, then it makes sense to write mov bx, 0x0007, because the color for the character is then taken from the BL register.

I see that the getstring code also checks for the linefeed code 10. No one does that. If the user presses the Enter key, you'll receive the carriage return code 13 and that's the only code that you need to check. The linefeed code only comes into play when outputting.

The pair of instructions mov [di], al inc di (3 bytes) can be replaced by the 1-byte instruction stosb. Given that your program is in the .COM file format, we have DS=ES, and the direction flag is almost certainly going to be clear. That is ideal for using the string instructions. It also means that your putstring routine could use lodsb if you're willing to trade in DI for SI as the input parameter.
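
Here is one way the string instructions might slot into the two routines. It is a sketch under a changed calling convention, not a drop-in replacement: putstring takes its pointer in SI, and putchar takes its character in AL. The explicit cld is a precaution I've added, and there is no length check yet.

```nasm
; strings.asm - build with: nasm -f bin -o strings.com strings.asm
bits 16
org 0x100

    cld                     ; direction flag clear: lodsb/stosb count upwards
    mov si, prompt
    call putstring
    mov di, string
    call getstring
    mov si, string
    call putstring
    mov ax, 0x4C00          ; DOS.Terminate, exit code 0
    int 0x21

; IN (si) address of a zero-terminated string
putstring:
    lodsb                   ; AL = [DS:SI], then SI = SI + 1
    cmp al, 0
    je .done
    call putchar
    jmp putstring
.done:
    ret

; IN (di) address of the buffer; no length check in this sketch
getstring:
    mov ah, 0x00            ; BIOS.GetKey
    int 0x16                ; -> AL = ASCII code
    cmp al, 13              ; Enter?
    je .done
    stosb                   ; [ES:DI] = AL, then DI = DI + 1
    call putchar            ; echo the character (AL is untouched by stosb)
    jmp getstring
.done:
    mov al, 13              ; carriage return + linefeed on the screen
    call putchar
    mov al, 10
    call putchar
    ret

; IN (al) character to print
putchar:
    mov ah, 0x0E            ; BIOS.Teletype
    mov bx, 0x0007          ; display page 0, color 7
    int 0x10
    ret

prompt: db "Name: ", 0
string: times 20 db 0       ; zero-filled, so short input stays terminated
```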

An interesting optimization comes from eliminating a tail call. You wrote call putchar directly followed by ret. This is equivalent to writing jmp putchar. Both shorter and faster this way!
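
A small sketch of that tail-call elimination, using a hypothetical newline routine built on a putchar that takes its character in AL:

```nasm
; tail.asm - build with: nasm -f bin -o tail.com tail.asm
bits 16
org 0x100

    mov al, 'A'
    call putchar
    call newline
    mov ax, 0x4C00          ; DOS.Terminate, exit code 0
    int 0x21

; hypothetical helper: moves the cursor to the start of the next line
newline:
    mov al, 13
    call putchar
    mov al, 10
    jmp putchar             ; instead of call putchar / ret:
                            ; putchar's own ret returns to newline's caller

; IN (al) character to print
putchar:
    mov ah, 0x0E            ; BIOS.Teletype
    mov bx, 0x0007          ; display page 0, color 7
    int 0x10
    ret
```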

Make it better

  • Your getstring procedure must not allow the user to input more than 19 characters. Anything more would overflow the 20-byte buffer.

  • Your getstring procedure should store (in the buffer) a terminating zero when the finishing Enter key arrives. This way the buffer can be used repeatedly and not just this one time.

  • In assembly we want to avoid all kinds of jumping because those are more time consuming than many other instructions.
    Your putstring code uses a je and a jmp instruction on each iteration of the loop. The code below only uses the jne instruction on each iteration.

    ; IN (di)
    putstring:
     jmp .first
    .continue:
     call putchar
     inc di ; move to the next character
    .first:
     mov al, [di] ; grab the next character of the string
     cmp al, 0
     jne .continue
     ret
    ; IN (al)
    putchar:
     mov ah, 0x0E ; BIOS.Teletype
     mov bx, 0x0007
     int 0x10
     ret
    

    Using DX as the input for putchar is a poor choice, not only because DL would be enough, but especially because you need the character in AL anyway. So why not move it there in the first place?
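
The first two points could be combined along these lines. This is only a sketch: BUFSIZE is a hypothetical name, CX is my choice of counter for the remaining room, and like the original it has no backspace handling.

```nasm
; bounded.asm - build with: nasm -f bin -o bounded.com bounded.asm
bits 16
org 0x100

BUFSIZE equ 20              ; hypothetical name; matches the 20-byte buffer

    cld                     ; stosb must count upwards
    mov di, string
    call getstring
    mov di, string
    call putstring
    mov ax, 0x4C00          ; DOS.Terminate, exit code 0
    int 0x21

; IN (di) address of a buffer of BUFSIZE bytes
getstring:
    mov cx, BUFSIZE - 1     ; room left, keeping 1 byte for the terminator
.next:
    mov ah, 0x00            ; BIOS.GetKey
    int 0x16                ; -> AL = ASCII code
    cmp al, 13              ; Enter finishes the input
    je .done
    jcxz .next              ; buffer full: ignore the key, wait for Enter
    stosb                   ; store the character and advance DI
    dec cx
    call putchar            ; echo
    jmp .next
.done:
    mov byte [di], 0        ; terminating zero, so the buffer can be reused
    mov al, 13
    call putchar
    mov al, 10
    jmp putchar             ; tail call

; IN (di) address of a zero-terminated string
putstring:
    jmp .first
.continue:
    call putchar
    inc di
.first:
    mov al, [di]
    cmp al, 0
    jne .continue
    ret

; IN (al) character to print
putchar:
    mov ah, 0x0E            ; BIOS.Teletype
    mov bx, 0x0007          ; display page 0, color 7
    int 0x10
    ret

string: times BUFSIZE db 0
```

Silently ignoring extra keys is just one possible policy; sounding the bell (character 7) on overflow would be another.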

Be consistent

Always write your numbers the same way. You wrote mov ah, 0x4c and also mov ah, 0x0E.
I suggest you use capitals for the hexadecimal digits and always write as many digits as will fit in the destination. So don't write stuff like mov ah, 0xE.
In case you're wondering why I make this suggestion: using uppercase hexadecimal digits enhances the contrast with the lowercase 0x prefix or lowercase h suffix. Readability is very important in a program.

mov ah, 0x4C
mov ah, 0x0E

or

mov ah, 4Ch
mov ah, 0Eh

For many programmers, function numbers are most easily recognized when expressed in hexadecimal. You could thus write mov ah, 0x00 int 0x16 in your getchar routine.


As a final note, your labels are well chosen and the comments that you've added are all to the point. Congrats...

answered Nov 13, 2019 at 15:02
\$\endgroup\$