I wrote cat
program in x86 NASM and I would like to know if there is anything that can be improved.
This implementation of cat
supports:
- Reading from stdin.
- Multiple arguments.
I checked few cases and program behaves exactly like original cat
command.
I was thinking if I should add labels like write
even if I wouldn't jmp
to them, so code would be cleaner.
Code:
section .text
global _start
_start:
;esi=1
mov esi, 1
;if(argc==1)
;goto open
cmp dword [esp], 1
jnz open
;else
;fd=STDIN_FILENO
mov dword [fd], 0
jmp read
open:
;fd=open(path,oflag)
mov eax, 5
mov ebx, [esp+esi*4+4]
mov ecx, 0
int 0x80
mov [fd], eax
read:
;bytes_read=read(fd,buf,BUFSIZE)
mov eax, 3
mov ebx, [fd]
mov ecx, buf
mov edx, BUFSIZE
int 0x80
;write(STDOUT_FILENO,buf,bytes_read)
mov edx, eax
mov eax, 4
mov ebx, 1
mov ecx, buf
int 0x80
;if(bytes_read!=0)
;goto read
cmp edx, 0
jnz read
;close(fd)
mov eax, 6
mov ebx, [fd]
int 0x80
;esi++
inc esi
;if(esi<argc)
cmp dword esi, [esp]
jl open
exit:
mov eax, 1
mov ebx, 0
int 0x80
section .data
BUFSIZE equ 1024
section .bss
buf resb BUFSIZE
fd resd 1
1 Answer 1
I was thinking if I should add labels like write even if I wouldn't jmp to them, so code would be cleaner.
Writing labels that aren't jumped to is fine. Provided they have meaningful names, it can help to understand the program.
My comments (not in a particular order)
;esi=1 mov esi, 1
This is a redundant comment! I already can see what the instruction does. What I can't know is what the ESI
register will be used for.
;if(argc==1) ;goto open cmp dword [esp], 1 jnz open
Here the comment does not correspond to the code. The code actually does if(argc!=1)
.
cmp edx, 0 jnz read
The more usual way to test for zero would be test edx, edx
. It has a smaller encoding and is generally a bit faster.
mov ebx, 0
Similarly, zeroing a register is best done using the xor
instruction (xor ebx, ebx
). This is both faster and shorter.
mov dword [fd], 0
Your program would not need this instruction at all, if you would move the fd variable from the .bss
section to the .data
section using fd dd 0
.
This in turn opens up an opportunity to shorter the code from:
cmp dword [esp], 1
jnz open
mov dword [fd], 0
jmp read
open:
to:
cmp dword [esp], 1
je read
open:
Also note that in conjunction with the cmp
instruction, it's better to use je
and jne
.
In conjunction with the test
instruction, it's preferable to use jz
and jnz
.
je
and jz
have an identical encoding but choosing the right instruction mnemonic better expresses what the intention of the code is.
;if(esi<argc) cmp dword esi, [esp] jl open
The argc is by its very nature an unsigned number. After all it's just a count. You should use the conditional jumps that are provided to deal with the unsigned conditions. So better use jb open
(JumpIfBelow).
Also there's no point in writing the dword
size-tag. The mention of the dword register ESI
already dictates the size.
mov edx, eax mov eax, 4 mov ebx, 1 mov ecx, buf int 0x80
I always prefer to write the function number directly above the int 0x80
instruction. That way it's immediately clear what function is getting invoked. And for perfection I even assign the registers in a sorted manner:
mov edx, eax
mov ecx, buf
mov ebx, 1
mov eax, 4
int 0x80
Putting it all together
- not being afraid to write redundant labels
- putting the comments in a separate column for readability
- shaving off one more byte by using the fact that
ESI==1
whencmp dword [esp], 1
is executed.
section .text
global _start
_start:
mov esi, 1
cmp [esp], esi ;if(argc==1) goto read
je read
open:
xor ecx, ecx ;fd=open(path,oflag)
mov ebx, [esp+esi*4+4]
mov eax, 5
int 0x80
mov [fd], eax
read:
mov edx, BUFSIZE ;bytes_read=read(fd,buf,BUFSIZE)
mov ecx, buf
mov ebx, [fd]
mov eax, 3
int 0x80
write:
mov edx, eax ;write(STDOUT_FILENO,buf,bytes_read)
mov ecx, buf
mov ebx, 1
mov eax, 4
int 0x80
more:
test edx, edx ;if(bytes_read!=0) goto read
jnz read
close:
mov ebx, [fd] ;close(fd)
mov eax, 6
int 0x80
next:
inc esi
cmp esi, [esp] ;if(esi<argc)
jb open
exit:
xor ebx, ebx
mov eax, 1
int 0x80
section .data
fd dd 0 ;fd=STDIN_FILENO
section .bss
BUFSIZE equ 1024
buf resb BUFSIZE
Concern
Nowhere in your code do you check for any errors from api calls like open
or read
. Imagine what might happen in these cases?
-
\$\begingroup\$ Thanks for pointing out these comments. Can you tell me how do you know that
test
is faster (same withxor
)? Or how can I check myself is something is faster in NASM? I forgot I can define this variable, so it will be0
at start and assign other value later. I was always assigning registers "for perfection" starting witheax
, so it looks like in most programming languages i.e. <function name>(<args>). I'm not checking for errors, cause writing this code took me bit of time and I wanted to know if I'm on right track. I will add error handling later. Thanks for that answer. \$\endgroup\$DeBos– DeBos2019年10月31日 17:05:56 +00:00Commented Oct 31, 2019 at 17:05 -
\$\begingroup\$ By the way, is there any difference if constant is defined in
.bss
section, not in.data
? \$\endgroup\$DeBos– DeBos2019年10月31日 17:07:30 +00:00Commented Oct 31, 2019 at 17:07 -
\$\begingroup\$ Can you also tell me why are you comparing
[esp]
withesi
, not just1
? Is it faster when you are using register for that? \$\endgroup\$DeBos– DeBos2019年10月31日 17:10:09 +00:00Commented Oct 31, 2019 at 17:10 -
1\$\begingroup\$ I did everything you described in answer and comments and code looks much cleaner, but for some reason executable file that is created is 2 times bigger. Is it cause I used some instruction I didn't use before or I did something wrong? \$\endgroup\$DeBos– DeBos2019年10月31日 17:30:05 +00:00Commented Oct 31, 2019 at 17:30
-
1\$\begingroup\$ @DeBos99 I could not suggest to you a tutorial mainly because I've never used one. When writing code for the x86 architecture, you should download the relevant Intel manuals from their website. You could also peek at the Intel Optimization Manual. And where NASM is concerned, I think the accompanying manual should suffice. Good hunting! \$\endgroup\$Sep Roland– Sep Roland2019年10月31日 17:58:39 +00:00Commented Oct 31, 2019 at 17:58