I wrote a stack implementation using a singly linked list in x86_64 assembly. This stack supports the usual push
/pop
operations as well as first/next for iterating over each element.
I'm looking for general feedback.
Here are the stack subroutines:
; Stack Structure
; Pointer Head
; Pointer Current
; Stack Node
; Pointer Next
; Pointer Value
; input
; void
; output
; stack or 0 on error
StackCreate:
mov rdi, 24
call malloc
test rax,rax
jz StackCreate_end
xor rdi, rdi
mov [rax], rdi
mov [rax+8], rdi
ret
StackCreate_end:
add rsp, 8
ret
; input
; stack
; output
; void
StackDestroy:
call free
ret
; input
; stack
; value
; output
; 0 on error
StackPush:
push rdi
push rsi
mov rdi, 16
call malloc
test rax,rax
jz StackPush_end
pop rsi
pop rdi
mov rdx, [rdi]
mov [rax], rdx
mov [rax+8], rsi
mov [rdi], rax
ret
StackPush_end:
add rsp, 16
ret
; input
; stack
; output
; value or 0 if empty
StackPop:
mov rsi, rdi
mov rdi, [rdi]
test rdi, rdi
jz StackPop_end
mov rax, [rdi]
mov [rsi], rax
mov rax, [rdi+8]
push rax
call free
pop rax
ret
StackPop_end:
xor rax, rax
ret
nop;e
; input
; stack
; output
; value or 0 if empty
StackFirst:
mov rax, [rdi]
test rax, rax
jz StackFirst_end
mov rsi, [rax]
mov [rdi+8], rsi
mov rax, [rax+8]
StackFirst_end:
ret
; input
; stack
; output
; value or 0 if end
StackNext:
mov rsi, [rdi+8]
test rsi, rsi
jz StackNext_end
mov rax, [rsi]
mov [rdi+8], rax
mov rax, [rsi+8]
ret
StackNext_end:
xor rax, rax
ret
Here is a driver:
[extern puts]
[extern malloc]
[extern free]
[SECTION .data]
arguments:
db "Incorrect arguments.",10,"Expected: stackme <string1> <string2> ...",0
[SECTION .text]
BITS 64
global main
wrong_arguments:
mov rdi, arguments
call puts
jmp main_end
main:
cmp rdi, 1
jle wrong_arguments
lea r12, [rsi+8]
lea r13, [rdi-1]
call StackCreate
test rax, rax
jz main_end
mov r14, rax
pushing_loop:
test r13, r13
jz pushing_done
mov rdi, r14
mov rsi, [r12]
call StackPush
add r12, 8
dec r13
jmp pushing_loop
pushing_done:
mov rdi, r14
call StackFirst
print_loop:
test rax, rax
jz print_done
mov rdi, rax
call puts
mov rdi, r14
call StackNext
jmp print_loop
print_done:
stack_delete:
mov rdi, r14
call StackPop
test rax, rax
jnz stack_delete
stack_delete_done:
mov rdi, r14
call StackDestroy
main_end:
xor eax, eax
ret
Assembling/Linking
nasm -f elf64 -l stackme.lst stackme.asm
gcc -o stackme stackme.o
1 Answer 1
The code is generally well written and easy to understand, but I have a few comments on it that could help improve it.
Eliminate "magic numbers"
In the StackCreate
routine, the first instruction is mov rdi,24
but it's not clear what 24 signifies in this context. Either a comment or a named constant or both would help with that.
Add more meaningful comments
The stack structure is not very clear from the comments. We can infer that the stack consists of nodes, but it's not easy to tell from the code or the comments.
Also, consider the user of the stack code. There isn't currently enough within the code to understand the calling sequence, register usage or return values. All of those would be useful to have in the comments. If you're intending to use a standard ABI, it would be useful to say which one.
Use rep ret
as appropriate
The gcc compiler emits rep ret
when the ret
could be the target of a jump. This seems weird (and it is!) but the reason is that the branch prediction logic on both AMD and Intel processors works better when the ret
is not the target of branch. So this means the StackFirst_end
label in your code should actually have a rep
prefix just before the ret
.
Use struc
as appropriate
Because your stack uses two structures, it would benefit from using NASM's struc/endstruc
macros. This would both make it more clear when the code is manipulating data structures and also eliminate quite a few of the "magic numbers" I mentioned earlier.
Consider refactoring StackFirst
and StackNext
Other than minor differences in specific register selection, the StackFirst
and StackNext
routines are very similar. It may be possible to combine them to eliminate some code.
Reduce use of malloc
and free
The malloc
and free
calls tend to be relatively computationally expensive, especially relative to the assembly code you've got. For that reason, it would probably make more sense to either allocate and manage a block rather than calling malloc
for every stack push, or to replace calls to malloc
and free
with some other memory allocation scheme that might be user-replaceable for speed.
Rearrange the loop to avoid unconditional jumps
The current pushing_loop
looks like this:
pushing_loop:
test r13, r13
jz pushing_done
mov rdi, r14
mov rsi, [r12]
call StackPush
add r12, 8
dec r13
jmp pushing_loop
pushing_done:
However, it could be slightly rearranged to eliminate one of the jumps:
inc r13
jmp push_test
pushing_loop:
mov rdi, r14
mov rsi, [r12]
call StackPush
add r12, 8
push_test:
dec r13
jnz pushing_loop
The unconditional jump is only made once at the top of the loop and all other iterations use only the conditional test at the bottom of the loop. There are a few other places in the code this could be applied.