I started learning assembly some days ago and I wrote my first program today. Nothing exceptional here: some user inputs, strings manipulations, integer to string conversion etc... The only purpose was to test some things. I would like to have your reviews and advices to improve my code. Also if i didn't make big mistakes.
I'm looking for advice about instructions alignment in loops for example. Except inserting nops, I don't have other idea. Or know the size of all instructions and predict the good alignment on 16 or 32 bytes boundaries.
SYS_READ equ 3
SYS_WRITE equ 4
SYS_EXIT equ 1
STDIN equ 0
STDOUT equ 1
%macro printm 2
mov eax, SYS_WRITE
mov ebx, STDOUT
mov ecx, %1
mov edx, %2
int 0x80
%endmacro
%macro readm 2
mov eax, SYS_READ
mov ebx, STDIN
mov ecx, %1
mov edx, %2
int 0x80
%endmacro
%macro prolog 0
push ebp,
mov ebp, esp
%endmacro
%macro epilog 0
mov esp, ebp
pop ebp
%endmacro
%use smartalign
section .text
global _start
_start:
push ebp
mov ebp, esp
and esp, 0xFFFFFFF0
; first check if our strlen proc works
push dword msgbegin
call strlen
add esp, byte 4
cmp eax, lenbegin
jne .exit ; it works, we continue
; after we copy the msgbegin in string strdst
mov ecx, lenbegin
mov esi, msgbegin
mov edi, strdst
rep movsb
; we print the string to check if the memcpy is good
printm strdst, lenbegin
; after we ask for user to enter two number (1 digit each)
printm msgbinp1, leninp1
readm num1, 2
printm msgbinp2, leninp2
readm num2, 2
; user input to enter a number greater than one digit
printm msgbinp3, leninp3
readm bignum, 4
; we convert this string number to an (dword) integer value
mov edx, bignum
call atoi
; we compare it with 123 to check if atoi worked
cmp eax, dword 123
jne .exit ; exit if bignum != 123
; need to strip line feed from bignum
printm bignum, 4
printm msgoutp, lenoutp
; now we compute the sum with the first two digits
mov al, byte [num1]
sub al, '0'
mov bl, byte [num2]
sub bl, '0'
add al, bl
add al, '0'
mov [sum], al
; we put the string msgres to uppercase
mov esi, msgres
.next_char:
lodsb
test al, al ; check for end of string
jz .done
cmp al, 'a' ; ignore unless in range
jl .next_char
cmp al, 'z'
jg .next_char
sub al, 0x20 ; convert to upper case
mov [esi-1], al ; write back to string
jmp .next_char
.done:
printm msgres, lenres
; we print the sum
printm sum, 1
.exit:
; exiting the programm
mov eax, SYS_EXIT
int 0x80
strlen:
push edi
mov edi, [esp + 8]
sub ecx, ecx
sub al, al
mov ecx, -1
cld
repne scasb
not ecx
mov eax, ecx ; keep null term in size
pop edi
ret
atoi:
xor eax, eax ; zero a "result so far"
.top:
movzx ecx, byte [edx] ; get a character
inc edx ; ready for next one
cmp ecx, '0' ; valid?
jb .done
cmp ecx, '9'
ja .done
sub ecx, '0' ; "convert" character to number
imul eax, 10 ; multiply "result so far" by ten
add eax, ecx ; add in current digit
jmp .top ; until done
.done:
ret
; not implemented yet
rand:
push ebp
mov ebp, esp
rdtsc
xor edx, edx ; to always fit
div dword [ebp + 8] ; range
mov eax, edx
mov esp, ebp
pop ebp
ret
section .data
msgbegin db "hello everyone !", 0xa, 0
lenbegin equ $ - msgbegin
msgbinp1 db "Enter a digit : ", 0xa, 0
leninp1 equ $ - msgbinp1
msgbinp2 db "Enter second digit : ", 0xa, 0
leninp2 equ $ - msgbinp2
msgbinp3 db "Enter third digit : ", 0xa, 0
leninp3 equ $ - msgbinp3
msgoutp db "is equal to 123 !", 0xa, 0
lenoutp equ $ - msgoutp
msgres db "sum of x and y is ", 0xa, 0
lenres equ $ - msgres
strdst times lenbegin db 0
segment .bss
sum resb 1
num1 resb 2
num2 resb 2
bignum resd 1
1 Answer 1
I think you can do better with aligning the text. You got comments that start all over the place and instead of separating the instruction from its operands with a single space character, you could adopt the more 'assembly way of doing things' and align the operands (and anything else) tabularly. See my snippets below.
The mention %use smartalign
only enables the feature of the multibyte nop
's. You still have to use the normal align 16
macro to get the desired padding.
You have included a cld
instruction in the strlen subroutine. Normally you can trust the direction flag to be clear (unless of course you yourself toggled it for some reason) and only issue a single cld
when your program begins.
Although the byte tag in add esp, byte 4
is fine in order to avoid the longer dword form that NASM would generate without optimizations, I see no reason to write cmp eax, dword 123
as the dword form would be the default here.
I find NASM is confusing in how it treats these immediate operands, so do yourself a favor and don't write these size tags but always assemble with the -O2
option.
For your strlen you used a parameter on the stack and for your atoi you used a parameter in a register. Both methods are fine and especially in the context of exploring the assembly language. However for a more serious program I would be inclined to say that the author should make their mind up.
sum resb 1 num1 resb 2 num2 resb 2 bignum resd 1
Data alignment is important. Since the .bss section is dword-aligned by default, you should have put the dword-sized bignum on top. At least that was what I first read. As it turns out, bignum is not meant to be a dword at all! It is a string that happens to hold 4 bytes. You should write this as bignum resb 4
. (It can stay put).
cmp eax, lenbegin jne .exit mov ecx, lenbegin
This is a missed opportunity to write the shortest code. If the branch wasn't taken you know that EAX is equal to lenbegin. Then best setup ECX from EAX (mov ecx, eax
). It will shave off 3 bytes.
About strlen
The preferred way to zero a register is through xor
ing the register with itself.
If you are going to load the ECX register whole with mov ecx, -1
, then it's useless to also zero ECX beforehand.
Because you chose to use repne scasb
and apparently want to follow some register convention, you had to push EDI and allow ECX to get clobbered.
Below is my version of this code using a simple loop and nothing more than the EAX register:
; IN (stack) OUT (eax)
strlen:
mov eax, [esp+4] ; Address of string
.a: cmp byte [eax], 0
lea eax, [eax+1] ; EAX++ without disturbing flags
jne .a
sub eax, [esp+4] ; Length is 'zero-terminator-included'
ret
About atoi
You can condense the part that checks the validity of the character and that converts the character to number, to just 3 instructions: sub ecx, '0'
cmp cl, 9
ja .done
.
You can speedup the part that multiplies the current result and that adds the newest digit with a couple of lea
instructions: lea eax, [eax + eax * 4]
lea eax, [ecx + eax * 2]
.
You can avoid the unconditional branch jmp .top
simply by re-arranging the code. Having jumps that execute repeatedly is costly.
; IN (edx) OUT (eax) MOD (ecx,edx)
atoi:
xor eax, eax ; Result = 0
jmp .b
.a: lea eax, [eax + eax * 4] ; Result * 10 + Digit
lea eax, [ecx + eax * 2]
.b: movzx ecx, byte [edx] ; Fetch character
inc edx
sub ecx, '0' ; Convert ["0","9"] to [0,9]
cmp cl, 10
jb .a
ret
About ucase
You have inlined the uppercase conversion. This actively harms the readability of your program. And anyway, it's a conversion that deserves its own subroutine.
The ASCII codes are unsigned quantities. It's best to use the unsigned conditional branches on them, so jb
and ja
.
Just like was the case with atoi, you can avoid the unconditional branch jmp .next_char
by re-arranging the code.
; IN (esi) OUT () MOD (eax)
ucase:
push esi
lodsb
test al, al ; Check end-of-string
jz .c
.a: cmp al, 'a' ; Ignore if not in ["a","z"]
jb .b
cmp al, 'z'
ja .b
sub al, 32 ; Convert to upper case
mov [esi-1], al ; Update string
.b: lodsb
test al, al ; Check end-of-string
jnz .a
.c: pop esi
ret
About code alignment
Consider the example of the strlen subroutine. It's not really the strlen label itself that you would want to align but rather the .a label inside because it gets repeatedly jumped to.
Method 1 - on the execution path
; IN (stack) OUT (eax)
strlen:
mov eax, [esp+4] ; Address of string
ALIGN 16 ; (*)
.a: cmp byte [eax], 0
lea eax, [eax+1] ; EAX++ without disturbing flags
jne .a
sub eax, [esp+4] ; Length is 'zero-terminator-included'
ret
(*) Because 'smartalign' was enabled, NASM will pad with some multibyte nop instruction. This nop
gets executed, and under circumstances it could even be another jmp
. Read about this in the NASM manual chapter 5 paragraph 2.
Method 2 - outside the execution path
times (15-($-$$+4+15) % 16) db 0 ; (**)
; IN (stack) OUT (eax)
strlen:
mov eax, [esp+4] ; Address of string
.a: cmp byte [eax], 0
lea eax, [eax+1] ; EAX++ without disturbing flags
jne .a
sub eax, [esp+4] ; Length is 'zero-terminator-included'
ret
(**) This times
prefix inserts just the right amount of padding bytes to have the local .a label paragraph-aligned. The number 4 in the expression is the length of the instruction mov eax, [esp+4]
. In this case there's nothing additional that gets executed.
-
\$\begingroup\$ only issue a single cld when your program begins. - the i386 System V ABI guarantees that
cld
is clear on function call/return (thus at the top ofmain
), and in the initial process environment (thus at the top of_start
). So you never needcld
in a Linux program unless your own code usedstd
somewhere. \$\endgroup\$Peter Cordes– Peter Cordes2023年04月20日 10:44:28 +00:00Commented Apr 20, 2023 at 10:44
scasb
: What is the best way to set a register to zero in x86 assembly: xor, mov or and? - usexor eax,eax
. Also,repne scasb
is fairly low-performance; SSE2 can go much faster even for small strings, especially if your buffer is known to be aligned. (Only rep movs / rep stos have "fast strings" microcode) re: atoi: some more optimization is possible with LEA, and to detect ASCII '0'..'9' inputs more efficiently as part of converting to integer 0..9: NASM Assembly convert input to integer? \$\endgroup\$perf stat -all-user -etask-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,uops_executed.thread,idq.mite_uops
or similar to look for high counts of legacy decode (MITE uops) \$\endgroup\$