Environment
NASM is required to build this program, and DOSBox is required to run it. I recommend installing both with the Scoop package manager. Skip the install command for any program you already have.
iwr -useb get.scoop.sh | iex
scoop install git
scoop install dosbox
scoop install nasm
Building
nasm -f bin -o helper.com helper.asm
Running
Load DOSBox, then mount the directory where helper.com resides to any available drive. For those unfamiliar, it can be any drive letter in the A-Z range.
mount H: C:\Users\T145\Desktop\
H:
dir
helper.com
helper.asm
bits 16
org 0x100
section .text
_main:
lea di, [prompt]
call putstring
lea di, [string]
call getstring
lea di, [hello]
call putstring
lea di, [string]
call putstring
mov ah, 0x4c ; standard exit code
mov al, 0
int 0x21
; no parameters
; returns a char in ax
getchar:
mov ah, 0 ; call interrupt x16 sub interrupt 0
int 0x16
mov ah, 0
ret
; takes a char to print in dx
; no return value
putchar:
mov ax, dx ; call interrupt x10 sub interrupt xE
mov ah, 0x0E
mov cx, 1
int 0x10
ret
; takes an address to write to in di
; writes to address until a newline is encountered
; returns nothing
getstring:
call getchar ; read a character
cmp ax, 13 ; dos has two ascii characters for new lines 13 then 10
je .done ; its not a 13, whew...
cmp ax, 10 ; check for 10 now
je .done ; its not a 10, whew...
mov [di], al ; write the character to the current byte
inc di ; move to the next address
mov dx, ax ; dos doesn't print as it reads like windows, let's fix that
call putchar
jmp getstring
.done:
mov dx, 13 ; write a newline for sanity
call putchar
mov dx, 10
call putchar
ret
; takes an address to write to in di
; writes to address until a newline is encountered
; returns nothing
putstring:
cmp byte [di], 0 ; see if the current byte is a null terminator
je .done ; nope keep printing
.continue:
mov dl, [di] ; grab the next character of the string
mov dh, 0 ; print it
call putchar
inc di ; move to the next character
jmp putstring
.done:
ret
section .data
prompt: db "Please enter your first name: ", 0
string: times 20 db 0
hello: db "Hello, ", 0
Output
- Is there a particular reason you're using BIOS interrupts instead of DOS? – Shift_Left, Nov 13, 2019 at 6:41
- @Shift_Left No, not really. Idk wym by usage of DOS over BIOS interrupts tbh. This is my first time messing around w/ 16-bit assembly. I've read through this official documentation page on the topic, but that's about it. – T145, Nov 13, 2019 at 11:57
- This is not Win16, it is 8086 DOS. – ecm, Dec 12, 2019 at 11:57
- @ecm Ty for that clarification; I'll be sure to use it in future questions relating to 16 bit NASM running on DOSBOX – T145, Dec 13, 2019 at 1:30
2 Answers
Absent any other options, the assembler defaults to a 16-bit flat binary, so all that is required is:
~$ nasm ?.asm -o?.com
Although it is not wrong, even bits 16 is redundant. In operating system development you might specify use32 or use64 to select those instruction sets, but the output would still be a flat binary file. Otherwise, the only thing that makes this type of executable unique is:
org 0x100
This tells the assembler that the program is loaded at offset 0x100, which is where DOS starts executing a .COM file, so a label like main is unnecessary unless the code needs to branch back to the beginning of the application.
As to the question I asked in your original post, knowing what resources you have to deal with is monumentally important. DOS provides a lot of utility that can be found here, therefore this
mov dx, Prompt
mov ah, WRITE
int DOS
replaces all of this
putstring:
cmp byte [di], 0 ; see if the current byte is a null terminator
je .done ; nope keep printing
.continue:
mov dl, [di] ; grab the next character of the string
mov dh, 0 ; print it
call putchar
inc di ; move to the next character
jmp putstring
.done:
ret
by terminating the string with the '$' that DOS expects, like so
Prompt db 13, 10, 13, 10, 'Please enter your first name: $'
and because CR/LF is now embedded in the string, this can be eliminated.
mov dx, 13 ; write a newline for sanity
call putchar
mov dx, 10
call putchar
Input as such
; Read string from operator
mov dx, InpBuff
mov ah, READ
int DOS
; To a buffer specified with a max input of 128 chars. -1 is just a placeholder
; which will be replaced by the number of characters entered.
InpBuff: db 128, -1
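As a side note on the buffer layout: DOS function 0x0A expects the first byte to hold the buffer size (including the final carriage return), stores the number of characters read in the second byte, and writes the text after that. The declaration above reserves only the two header bytes and relies on the free memory that follows the program in a .COM segment. A minimal sketch that reserves the storage explicitly could look like this; MAXLEN is a hypothetical name introduced only for illustration.

```nasm
; Sketch: a DOS function 0x0A input buffer with its storage reserved.
; MAXLEN is a hypothetical name; 128 matches the value used above.
MAXLEN  equ 128

InpBuff:
        db MAXLEN           ; byte 0: buffer size, including the final CR
        db 0                ; byte 1: DOS stores the number of characters read here
        times MAXLEN db 0   ; bytes 2 and up: the characters, ending with CR (13)
```

This costs MAXLEN extra bytes in the file, so it trades size for a buffer that stays safe if more data is ever placed after it.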
The input is terminated with 0x0D and must be replaced with '$'. This little snippet does that.
; Terminate this input with '$'
mov bx, dx
movzx ax, byte [bx+1]
inc al
inc al
add bx, ax
mov byte [bx], '$'
replaces these
; no parameters
; returns a char in ax
getchar:
mov ah, 0 ; call interrupt x16 sub interrupt 0
int 0x16
mov ah, 0
ret
; takes an address to write to in di
; writes to address until a newline is encountered
; returns nothing
getstring:
call getchar ; read a character
cmp ax, 13 ; dos has two ascii characters for new lines 13 then 10
je .done ; its not a 13, whew...
cmp ax, 10 ; check for 10 now
je .done ; its not a 10, whew...
mov [di], al ; write the character to the current byte
inc di ; move to the next address
mov dx, ax ; dos doesn't print as it reads like windows, let's fix that
call putchar
jmp getstring
So all in all this code is almost 50% smaller (91 bytes vs 163) and only because I've utilized what DOS provides. If I was to have utilized BIOS calls, then my code would not have been that much smaller, maybe 5-10 %.
org 0x100
DOS equ 33 ; = 21H
WRITE equ 9
READ equ 10
; Display initial prompting
mov dx, Prompt
mov ah, WRITE
int DOS
; Read string from operator
mov dx, InpBuff
mov ah, READ
int DOS
; Terminate this input with '$'
mov bx, dx
movzx ax, byte [bx+1]
inc al
inc al
add bx, ax
mov byte [bx], '$'
; Display next prompting
push dx ; We will want this pointer again
mov dx, hello
mov ah, WRITE
int DOS
pop dx
inc dx ; Bump over max and actual lengths
inc dx
int DOS
ret
Prompt db 13, 10, 13, 10, 'Please enter your first name: $'
hello db 10, 10, 9, 'Hello, $'
InpBuff: db 128, -1
I changed the formatting of hello slightly, just so you can see the difference. Experiment a little: replace the 10's with 13's at hello and watch what happens.
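One caveat about the termination snippet: movzx needs a 386 or later. That is fine under DOSBox, but if strict 8086 compatibility matters, a sketch along these lines should do the same job. It assumes the InpBuff label from the listing above and that BX is free to use.

```nasm
; Replace the trailing CR with '$' without MOVZX.
; Assumes InpBuff was just filled by DOS function 0x0A.
        mov bl, [InpBuff+1]            ; BL = number of characters DOS read
        mov bh, 0                      ; zero-extend the count into BX
        mov byte [InpBuff+2+bx], '$'   ; the CR sits right after the text
```

Unlike the original snippet, this variant does not depend on DX still pointing at the buffer.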
- DX is pointing to the very beginning of InpBuff, so it has to be incremented twice to point to the actual text. -1 is just a placeholder so when I'm looking for the string or that section of the buffer in debug, it's easy to identify; otherwise it doesn't need to be there. I used inc twice as it's one byte shorter than add dx,2. The second byte of InpBuff will be replaced with the number of characters entered. – Shift_Left, Nov 13, 2019 at 19:02
There's plenty to optimize here!
In NASM you get the address simply by writing mov di, prompt. This has a shorter encoding than lea di, [prompt]. (In MASM this would be mov di, offset prompt, giving the same benefit over the lea form.)
Instead of writing the pair mov ah, 0x4c and mov al, 0, you could combine these in 1 instruction as mov ax, 0x4C00. This shaves off 1 byte from the program.
Your getchar returns a byte in AX and your putchar expects a byte in DX. You would be better off if you used AL and DL. This would avoid those several mov ah, 0 and mov dh, 0 instructions.
Your putchar code uses the BIOS.Teletype function 0x0E. This function does not expect anything in the CX register. What it does require is that you specify the display page in the BH register. Simply add mov bh, 0 here. And if it's even possible that your program has to run in a graphical video mode, then it would make sense to write mov bx, 0x0007, because then the color for the character is taken from the BL register.
I see that the getstring code also checks for the linefeed code 10. No one does that. If the user presses the Enter key, you'll receive the carriage return code 13 and that's the only code that you need to check. The linefeed code only comes into play when outputting.
The pair of instructions mov [di], al and inc di (3 bytes) can be replaced by the 1-byte instruction stosb. Given that your program is in the .COM file format, we have DS=ES and the direction flag is almost certainly going to be clear. That is ideal for using the string primitive assembly instructions. This also means that your putstring routine could use lodsb if you're willing to trade in DI for SI as the input parameter.
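A minimal sketch of what a lodsb-based putstring could look like. It assumes the pointer arrives in SI instead of DI, that putchar has been changed to take its character in AL, and that the BIOS call inside putchar leaves SI untouched.

```nasm
; IN (si) address of a zero-terminated string
; Assumes DS addresses the string and the direction flag is clear.
putstring:
        lodsb               ; AL = [DS:SI], then SI = SI + 1
        cmp al, 0           ; terminator reached?
        je .done
        call putchar        ; assumed to take the character in AL
        jmp putstring
.done:
        ret
```

Callers would then load the string with mov si, prompt instead of using DI.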
An interesting optimization comes from eliminating a tail call. You wrote call putchar directly followed by ret. This is equivalent to writing jmp putchar. Both shorter and faster this way!
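Applied to the end of the question's getstring, the tail call might look like the sketch below. It keeps the question's convention of passing the character in DX, and the reading loop is left out.

```nasm
getstring:
        ; ... reading loop as in the question ...
.done:
        mov dx, 13          ; carriage return
        call putchar
        mov dx, 10          ; linefeed
        jmp putchar         ; tail call: putchar's own ret returns to getstring's caller
```

The final ret disappears because putchar's ret pops the return address that getstring's caller pushed.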
Make it better
Your getstring procedure must not allow the user to input more than 19 characters. Anything more would overflow the 20-byte buffer.
Your getstring procedure should store (in the buffer) a terminating zero when the finishing Enter key arrives. This way the buffer can be used repeatedly and not just this one time.
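Putting these two points together with the earlier suggestions, a bounded getstring could be sketched as follows. BUFSIZE is a hypothetical name, the BIOS calls are inlined so that nothing clobbers the counter in CX, and the sketch assumes that the BIOS preserves CX and DI. It does not handle Backspace or extended keys.

```nasm
BUFSIZE equ 20              ; hypothetical name; matches "string: times 20 db 0"

; IN (di) address of a BUFSIZE-byte buffer
; Assumes ES = DS and a clear direction flag (normal for a .COM program).
getstring:
        mov cx, BUFSIZE - 1 ; room left, keeping one byte for the terminator
.next:
        mov ah, 0x00        ; BIOS.GetKey
        int 0x16            ; -> AL = ASCII code
        cmp al, 13          ; Enter pressed?
        je .done
        jcxz .next          ; buffer full: ignore the key
        stosb               ; store AL at [ES:DI], then DI = DI + 1
        dec cx
        call .echo          ; show the character that was typed
        jmp .next
.done:
        mov byte [di], 0    ; terminating zero
        mov al, 13          ; echo carriage return ...
        call .echo
        mov al, 10          ; ... and linefeed, falling through into .echo
.echo:
        mov ah, 0x0E        ; BIOS.Teletype
        mov bx, 0x0007      ; display page 0, color 7
        int 0x10
        ret
```

Falling through into .echo for the final linefeed is the same tail-call idea described above, taken one step further.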
In assembly we want to avoid all kinds of jumping because those are more time consuming than many other instructions.
Your putstring code uses a je and a jmp instruction on each iteration of the loop. The code below only uses the jne instruction on each iteration.
; IN (di)
putstring:
jmp .first
.continue:
call putchar
inc di ; move to the next character
.first:
mov al, [di] ; grab the next character of the string
cmp al, 0
jne .continue
ret
; IN (al)
putchar:
mov ah, 0x0E ; BIOS.Teletype
mov bx, 0x0007
int 0x10
ret
Using DX as the input for putchar is a poor choice, not only because DL would be enough, but especially because you need the character in AL anyway. So why not move it there in the first place?
Be consistent
Always write your numbers the same way. You wrote mov ah, 0x4c and also mov ah, 0x0E.
I suggest you use capitals for the hexadecimal digits and always write as many digits as will fit in the destination. So don't write stuff like mov ah, 0xE.
In case you're wondering why I make this suggestion: using uppercase hexadecimal digits enhances the contrast with the lowercase 0x prefix or lowercase h suffix. Readability is very important in a program.
mov ah, 0x4C
mov ah, 0x0E
or
mov ah, 4Ch
mov ah, 0Eh
For many programmers function numbers are most easily recognized when expressed in hexadecimal. You could thus write mov ah, 0x00 followed by int 0x16 in your getchar routine.
As a final note, your labels are well chosen and the comments that you've added are all to the point. Congrats...