First assembly program: working with strings and integers

Question 1

I started learning assembly some days ago and I wrote my first program today. Nothing exceptional here: some user inputs, strings manipulations, integer to string conversion etc... The only purpose was to test some things. I would like to have your reviews and advices to improve my code. Also if i didn't make big mistakes.

I'm looking for advice about instructions alignment in loops for example. Except inserting nops, I don't have other idea. Or know the size of all instructions and predict the good alignment on 16 or 32 bytes boundaries.

SYS_READ equ 3
SYS_WRITE equ 4
SYS_EXIT equ 1
STDIN equ 0
STDOUT equ 1
%macro printm 2
 mov eax, SYS_WRITE
 mov ebx, STDOUT
 mov ecx, %1
 mov edx, %2
 int 0x80
%endmacro
%macro readm 2
 mov eax, SYS_READ
 mov ebx, STDIN
 mov ecx, %1
 mov edx, %2
 int 0x80
%endmacro
%macro prolog 0
 push ebp,
 mov ebp, esp
%endmacro
%macro epilog 0 
 mov esp, ebp
 pop ebp
%endmacro
%use smartalign
section .text
global _start
_start:
 push ebp
 mov ebp, esp
 and esp, 0xFFFFFFF0
 
 ; first check if our strlen proc works
 push dword msgbegin
 call strlen
 add esp, byte 4
 cmp eax, lenbegin
 jne .exit ; it works, we continue
 
 ; after we copy the msgbegin in string strdst
 mov ecx, lenbegin
 mov esi, msgbegin
 mov edi, strdst
 rep movsb
 ; we print the string to check if the memcpy is good 
 printm strdst, lenbegin
 ; after we ask for user to enter two number (1 digit each)
 printm msgbinp1, leninp1
 readm num1, 2
 printm msgbinp2, leninp2
 readm num2, 2
 ; user input to enter a number greater than one digit
 printm msgbinp3, leninp3
 readm bignum, 4
 ; we convert this string number to an (dword) integer value
 mov edx, bignum
 call atoi
 ; we compare it with 123 to check if atoi worked
 cmp eax, dword 123
 jne .exit ; exit if bignum != 123
 ; need to strip line feed from bignum
 printm bignum, 4
 printm msgoutp, lenoutp
 ; now we compute the sum with the first two digits
 mov al, byte [num1]
 sub al, '0'
 mov bl, byte [num2]
 sub bl, '0'
 add al, bl
 add al, '0'
 mov [sum], al
 
 ; we put the string msgres to uppercase
 mov esi, msgres
.next_char:
 lodsb
 test al, al ; check for end of string
 jz .done 
 cmp al, 'a' ; ignore unless in range
 jl .next_char
 cmp al, 'z'
 jg .next_char
 sub al, 0x20 ; convert to upper case 
 mov [esi-1], al ; write back to string
 jmp .next_char
.done:
 printm msgres, lenres
 ; we print the sum
 printm sum, 1
.exit: 
 ; exiting the programm
 mov eax, SYS_EXIT
 int 0x80
strlen:
 push edi
 mov edi, [esp + 8]
 sub ecx, ecx
 sub al, al
 mov ecx, -1
 cld
 repne scasb
 not ecx
 mov eax, ecx ; keep null term in size
 pop edi
 ret
atoi:
 xor eax, eax ; zero a "result so far"
.top:
 movzx ecx, byte [edx] ; get a character
 inc edx ; ready for next one
 cmp ecx, '0' ; valid?
 jb .done
 cmp ecx, '9'
 ja .done
 sub ecx, '0' ; "convert" character to number
 imul eax, 10 ; multiply "result so far" by ten
 add eax, ecx ; add in current digit
 jmp .top ; until done
.done:
 ret
 ; not implemented yet
rand:
 push ebp
 mov ebp, esp
 rdtsc
 xor edx, edx ; to always fit
 div dword [ebp + 8] ; range
 mov eax, edx
 mov esp, ebp
 pop ebp
 ret 
section .data
 msgbegin db "hello everyone !", 0xa, 0
 lenbegin equ $ - msgbegin
 msgbinp1 db "Enter a digit : ", 0xa, 0
 leninp1 equ $ - msgbinp1
 msgbinp2 db "Enter second digit : ", 0xa, 0
 leninp2 equ $ - msgbinp2
 msgbinp3 db "Enter third digit : ", 0xa, 0
 leninp3 equ $ - msgbinp3
 msgoutp db "is equal to 123 !", 0xa, 0
 lenoutp equ $ - msgoutp
 msgres db "sum of x and y is ", 0xa, 0
 lenres equ $ - msgres
 strdst times lenbegin db 0
segment .bss
 sum resb 1
 num1 resb 2
 num2 resb 2
 bignum resd 1

Question 2

Welcome to Code Review! I see that it prints messages like "hello everyone !", "Enter a digit :" and "Enter second digit :" but It would benefit reviewers to have a bit more information about what the code achieves in the description. From the help center page How to ask: "You will get more insightful reviews if you not only provide your code, but also give an explanation of what it does. The more detail, the better."

Question 3

I edited my code with more comments. It implements strlen, print, user input, a toupper sequence. I hope it would help. I would like to have reviews about : mistakes and major errors and code improvisation. For example, in my atoi function, I have to use imul for the conversion. I know that imul is really time consuming. Maybe it would be faster with bitshift instructions.

Question 4

re: zeroing AL for scasb: What is the best way to set a register to zero in x86 assembly: xor, mov or and? - use xor eax,eax. Also, repne scasb is fairly low-performance; SSE2 can go much faster even for small strings, especially if your buffer is known to be aligned. (Only rep movs / rep stos have "fast strings" microcode) re: atoi: some more optimization is possible with LEA, and to detect ASCII '0'..'9' inputs more efficiently as part of converting to integer 0..9: NASM Assembly convert input to integer?

Question 5

Code alignment often isn't a huge deal these days, with CPUs running from a uop cache. The JCC erratum performance pothole for Skylake CPUs is a bigger deal for code alignment: How can I mitigate the impact of the Intel jcc erratum on gcc?. Usually just assemble and look at the output to see if there's anything problematic, and/or use

perf stat -all-user -etask-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,uops_executed.thread,idq.mite_uops

or similar to look for high counts of legacy decode (MITE uops)

Question 6

I think you can do better with aligning the text. You got comments that start all over the place and instead of separating the instruction from its operands with a single space character, you could adopt the more 'assembly way of doing things' and align the operands (and anything else) tabularly. See my snippets below.

The mention %use smartalign only enables the feature of the multibyte nop's. You still have to use the normal align 16 macro to get the desired padding.

You have included a cld instruction in the strlen subroutine. Normally you can trust the direction flag to be clear (unless of course you yourself toggled it for some reason) and only issue a single cld when your program begins.

Although the byte tag in add esp, byte 4 is fine in order to avoid the longer dword form that NASM would generate without optimizations, I see no reason to write cmp eax, dword 123 as the dword form would be the default here.
I find NASM is confusing in how it treats these immediate operands, so do yourself a favor and don't write these size tags but always assemble with the -O2 option.

For your strlen you used a parameter on the stack and for your atoi you used a parameter in a register. Both methods are fine and especially in the context of exploring the assembly language. However for a more serious program I would be inclined to say that the author should make their mind up.

sum resb 1
num1 resb 2
num2 resb 2
bignum resd 1

Data alignment is important. Since the .bss section is dword-aligned by default, you should have put the dword-sized bignum on top. At least that was what I first read. As it turns out, bignum is not meant to be a dword at all! It is a string that happens to hold 4 bytes. You should write this as bignum resb 4. (It can stay put).

cmp eax, lenbegin
jne .exit
mov ecx, lenbegin

This is a missed opportunity to write the shortest code. If the branch wasn't taken you know that EAX is equal to lenbegin. Then best setup ECX from EAX (mov ecx, eax). It will shave off 3 bytes.

About strlen

The preferred way to zero a register is through xoring the register with itself.
If you are going to load the ECX register whole with mov ecx, -1, then it's useless to also zero ECX beforehand.
Because you chose to use repne scasb and apparently want to follow some register convention, you had to push EDI and allow ECX to get clobbered.
Below is my version of this code using a simple loop and nothing more than the EAX register:

; IN (stack) OUT (eax)
strlen:
 mov eax, [esp+4] ; Address of string
.a: cmp byte [eax], 0
 lea eax, [eax+1] ; EAX++ without disturbing flags
 jne .a
 sub eax, [esp+4] ; Length is 'zero-terminator-included'
 ret

About atoi

You can condense the part that checks the validity of the character and that converts the character to number, to just 3 instructions: sub ecx, '0' cmp cl, 9 ja .done.
You can speedup the part that multiplies the current result and that adds the newest digit with a couple of lea instructions: lea eax, [eax + eax * 4] lea eax, [ecx + eax * 2].
You can avoid the unconditional branch jmp .top simply by re-arranging the code. Having jumps that execute repeatedly is costly.

; IN (edx) OUT (eax) MOD (ecx,edx)
atoi:
 xor eax, eax ; Result = 0
 jmp .b
.a: lea eax, [eax + eax * 4] ; Result * 10 + Digit
 lea eax, [ecx + eax * 2]
.b: movzx ecx, byte [edx] ; Fetch character
 inc edx
 sub ecx, '0' ; Convert ["0","9"] to [0,9]
 cmp cl, 10
 jb .a
 ret

About ucase

You have inlined the uppercase conversion. This actively harms the readability of your program. And anyway, it's a conversion that deserves its own subroutine.
The ASCII codes are unsigned quantities. It's best to use the unsigned conditional branches on them, so jb and ja.
Just like was the case with atoi, you can avoid the unconditional branch jmp .next_char by re-arranging the code.

; IN (esi) OUT () MOD (eax)
ucase:
 push esi
 lodsb
 test al, al ; Check end-of-string
 jz .c 
.a: cmp al, 'a' ; Ignore if not in ["a","z"]
 jb .b
 cmp al, 'z'
 ja .b
 sub al, 32 ; Convert to upper case 
 mov [esi-1], al ; Update string
.b: lodsb
 test al, al ; Check end-of-string
 jnz .a
.c: pop esi
 ret

About code alignment

Consider the example of the strlen subroutine. It's not really the strlen label itself that you would want to align but rather the .a label inside because it gets repeatedly jumped to.

Method 1 - on the execution path

; IN (stack) OUT (eax)
strlen:
 mov eax, [esp+4] ; Address of string
 ALIGN 16 ; (*)
.a: cmp byte [eax], 0
 lea eax, [eax+1] ; EAX++ without disturbing flags
 jne .a
 sub eax, [esp+4] ; Length is 'zero-terminator-included'
 ret

(*) Because 'smartalign' was enabled, NASM will pad with some multibyte nop instruction. This nop gets executed, and under circumstances it could even be another jmp. Read about this in the NASM manual chapter 5 paragraph 2.

Method 2 - outside the execution path

 times (15-($-$$+4+15) % 16) db 0 ; (**)
; IN (stack) OUT (eax)
strlen:
 mov eax, [esp+4] ; Address of string
.a: cmp byte [eax], 0
 lea eax, [eax+1] ; EAX++ without disturbing flags
 jne .a
 sub eax, [esp+4] ; Length is 'zero-terminator-included'
 ret

(**) This times prefix inserts just the right amount of padding bytes to have the local .a label paragraph-aligned. The number 4 in the expression is the length of the instruction mov eax, [esp+4]. In this case there's nothing additional that gets executed.

Question 7

only issue a single cld when your program begins. - the i386 System V ABI guarantees that cld is clear on function call/return (thus at the top of main), and in the initial process environment (thus at the top of _start). So you never need cld in a Linux program unless your own code used std somewhere.

Sep Roland Sep Roland 4,78317 silver badges28 bronze badges · Answer 1 · 2022-07-01 18:21:47Z

I think you can do better with aligning the text. You got comments that start all over the place and instead of separating the instruction from its operands with a single space character, you could adopt the more 'assembly way of doing things' and align the operands (and anything else) tabularly. See my snippets below.

The mention %use smartalign only enables the feature of the multibyte nop's. You still have to use the normal align 16 macro to get the desired padding.

You have included a cld instruction in the strlen subroutine. Normally you can trust the direction flag to be clear (unless of course you yourself toggled it for some reason) and only issue a single cld when your program begins.

Although the byte tag in add esp, byte 4 is fine in order to avoid the longer dword form that NASM would generate without optimizations, I see no reason to write cmp eax, dword 123 as the dword form would be the default here.
I find NASM is confusing in how it treats these immediate operands, so do yourself a favor and don't write these size tags but always assemble with the -O2 option.

For your strlen you used a parameter on the stack and for your atoi you used a parameter in a register. Both methods are fine and especially in the context of exploring the assembly language. However for a more serious program I would be inclined to say that the author should make their mind up.

sum resb 1
num1 resb 2
num2 resb 2
bignum resd 1

Data alignment is important. Since the .bss section is dword-aligned by default, you should have put the dword-sized bignum on top. At least that was what I first read. As it turns out, bignum is not meant to be a dword at all! It is a string that happens to hold 4 bytes. You should write this as bignum resb 4. (It can stay put).

cmp eax, lenbegin
jne .exit
mov ecx, lenbegin

This is a missed opportunity to write the shortest code. If the branch wasn't taken you know that EAX is equal to lenbegin. Then best setup ECX from EAX (mov ecx, eax). It will shave off 3 bytes.

About strlen

The preferred way to zero a register is through xoring the register with itself.
If you are going to load the ECX register whole with mov ecx, -1, then it's useless to also zero ECX beforehand.
Because you chose to use repne scasb and apparently want to follow some register convention, you had to push EDI and allow ECX to get clobbered.
Below is my version of this code using a simple loop and nothing more than the EAX register:

; IN (stack) OUT (eax)
strlen:
 mov eax, [esp+4] ; Address of string
.a: cmp byte [eax], 0
 lea eax, [eax+1] ; EAX++ without disturbing flags
 jne .a
 sub eax, [esp+4] ; Length is 'zero-terminator-included'
 ret

About atoi

You can condense the part that checks the validity of the character and that converts the character to number, to just 3 instructions: sub ecx, '0' cmp cl, 9 ja .done.
You can speedup the part that multiplies the current result and that adds the newest digit with a couple of lea instructions: lea eax, [eax + eax * 4] lea eax, [ecx + eax * 2].
You can avoid the unconditional branch jmp .top simply by re-arranging the code. Having jumps that execute repeatedly is costly.

; IN (edx) OUT (eax) MOD (ecx,edx)
atoi:
 xor eax, eax ; Result = 0
 jmp .b
.a: lea eax, [eax + eax * 4] ; Result * 10 + Digit
 lea eax, [ecx + eax * 2]
.b: movzx ecx, byte [edx] ; Fetch character
 inc edx
 sub ecx, '0' ; Convert ["0","9"] to [0,9]
 cmp cl, 10
 jb .a
 ret

About ucase

You have inlined the uppercase conversion. This actively harms the readability of your program. And anyway, it's a conversion that deserves its own subroutine.
The ASCII codes are unsigned quantities. It's best to use the unsigned conditional branches on them, so jb and ja.
Just like was the case with atoi, you can avoid the unconditional branch jmp .next_char by re-arranging the code.

; IN (esi) OUT () MOD (eax)
ucase:
 push esi
 lodsb
 test al, al ; Check end-of-string
 jz .c 
.a: cmp al, 'a' ; Ignore if not in ["a","z"]
 jb .b
 cmp al, 'z'
 ja .b
 sub al, 32 ; Convert to upper case 
 mov [esi-1], al ; Update string
.b: lodsb
 test al, al ; Check end-of-string
 jnz .a
.c: pop esi
 ret

About code alignment

Consider the example of the strlen subroutine. It's not really the strlen label itself that you would want to align but rather the .a label inside because it gets repeatedly jumped to.

Method 1 - on the execution path

; IN (stack) OUT (eax)
strlen:
 mov eax, [esp+4] ; Address of string
 ALIGN 16 ; (*)
.a: cmp byte [eax], 0
 lea eax, [eax+1] ; EAX++ without disturbing flags
 jne .a
 sub eax, [esp+4] ; Length is 'zero-terminator-included'
 ret

(*) Because 'smartalign' was enabled, NASM will pad with some multibyte nop instruction. This nop gets executed, and under circumstances it could even be another jmp. Read about this in the NASM manual chapter 5 paragraph 2.

Method 2 - outside the execution path

 times (15-($-$$+4+15) % 16) db 0 ; (**)
; IN (stack) OUT (eax)
strlen:
 mov eax, [esp+4] ; Address of string
.a: cmp byte [eax], 0
 lea eax, [eax+1] ; EAX++ without disturbing flags
 jne .a
 sub eax, [esp+4] ; Length is 'zero-terminator-included'
 ret

(**) This times prefix inserts just the right amount of padding bytes to have the local .a label paragraph-aligned. The number 4 in the expression is the length of the instruction mov eax, [esp+4]. In this case there's nothing additional that gets executed.

only issue a single cld when your program begins. - the i386 System V ABI guarantees that cld is clear on function call/return (thus at the top of main), and in the initial process environment (thus at the top of _start). So you never need cld in a Linux program unless your own code used std somewhere.

Stack Exchange Network

First assembly program: working with strings and integers

1 Answer 1

About strlen

About atoi

About ucase

About code alignment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

First assembly program: working with strings and integers

1 Answer 1

About strlen

About atoi

About ucase

About code alignment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions