Linux NASM assembly linked list implementation

Question 1

Inspired by all of the lovely linked lists lately, I decided to implement one in assembly language. This code maintains two linked lists - one is a free store and the other is the active linked list. Rather than allocating memory whenever a new node is needed, the Node is pulled from the free list and all of the pointers updated. The data is then copied into the newly "allocated" Node.

This program includes a test driver which reads a series of NUL terminated strings (C strings) from memory, calculates their lengths and then puts a pointer to the string and the length into each Node until the end of list is found.

If there are not enough free nodes to hold all of the found strings, the program silently exits with an error code of 1. If there is enough and there are no other errors, the program prints each of the found strings on a separate line and exits with a status code of 0.

The output is this:

Alfred
Barbara
Carlos
Delores
Edward
Frances

My Makefile looks like this:

%.o : %.asm
 nasm -f elf64 -l foo.lst -g -F stabs -o $@ $<
% : %.o
 ld -g -o $@ $<

The Makefile is invoked as make linky.

The program, which I've place in a file named linky.asm is below:

; 64-bit linked list implementation in Linux NASM
global _start ; global entry point export for ld
MAX_NODES equ 300 ; this can easily be expanded
section .text
_start:
 ; first initialize the free node list
 mov rcx, MAX_NODES-1 
 mov rdi, [freenode]
 mov rsi, linkedlist
 mov rax, Node_size
nodeclear:
 add rsi, rax ; point to next node
 mov [rdi + Node.next], rsi
 add rdi, rax ; advance pointer
 loop nodeclear
 ; set the last link to NULL
 mov qword [rdi + Node.next], 0
 ; start out pointing to our rootnode pointer
 mov rdi, rootnode
 ; our string pointer starts at the beginning
 mov rbx, namestrings
looptop:
 ; fetch the next string
 call FetchString
 ; skip to print function if we are at end of list
 jrcxz finish
 ; insert the string into new node
 call NodeInsert
 jmp looptop
finish:
 ; print all strings in the linked list
 mov rdi, rootnode
morenodes:
 call NodePrint
 or rdi,rdi
 jnz morenodes
 ; (rdi is already 0 here) 
 ; normal exit with status 0
exit:
 mov rax, 60 ; sys_exit
 syscall
errorExit:
 mov rdi, 1 ; return 1 (error)
 jmp exit
; NodePrint:
; advances to next node and prints if non-null
; Entry:
; rdi points to memory location containing pointer
; to next node
; Exit:
; rsi points to string just printed 
; rdx is the length of the string just printed 
; rdi = 0 if we are at the end of the linked list
NodePrint:
 mov rsi, [rdi]
 mov rdi, rsi
 or rdi, rdi
 jz retnow
 mov rsi, [rdi + Node.mydata + mydata.msgptr]
 mov rdx, [rdi + Node.mydata + mydata.msglen]
 push rdi
 call printstr
 call printeol
 pop rdi
retnow:
 ret
 ; rsi = addr, rdx= len
; NodeInsert:
; creates and populates new node in linked list
; Entry:
; rdi points to memory location to which pointer
; to new node should be written
; rsi points to a string to be placed in the new Node
; rcx is the length of the string pointed to by rsi
NodeInsert:
 push rax
 push rbx
 ; steal a node from the freenode list
 ; [rootnode] = [freenode]
 mov rbx, freenode ; rbx -> freenode
 cmp qword [rbx], 0 ; Q: nodes available?
 je errorExit ; N: nope, so leave
 mov rax, [rbx] ; 
 mov qword [rdi], rax ; 
 mov rdi, rax ; rdi -> node
 ; [free] = [root.next]
 mov rax, [rdi + Node.next] 
 mov [rbx], rax
 ; [root.next] = 0
 mov qword [rdi + Node.next], 0
 pop rbx
 pop rax
 ;; fall through to NodeAddData
; NodeAddData:
; add string data to existing node
; rdi points to an existing Node
; rsi points to a string to be placed in the new Node
; rcx is the length of the string pointed to by rsi
NodeAddData:
 mov [rdi + Node.mydata + mydata.msglen], rcx
 mov [rdi + Node.mydata + mydata.msgptr], rsi
 ret
; FetchString:
; fetch NUL terminated string and calculate len
; Entry:
; rbx points to start of NUL terminated string
; Exit:
; rsi points to string start
; rcx contains length of string
; rbx points to next string (if any)
FetchString:
 push rax
 push rdi
 mov rdi, rbx ; copy pointer
 mov rcx, -1 ; max count in ecx
 mov al,0 ; look for NUL byte
 cld ; count forward
 repne scasb ; look for NUL
 not rcx ; tricky math to get len
 dec rcx
 mov rsi, rbx
 mov rbx, rdi ; recover original pointer
 pop rdi
 pop rax
 ret
printeol:
 mov rsi, eol
 mov edx, 1
printstr:
 mov rax, 1 ; sys_write
 mov rdi, 1 ; stdout
 syscall
 ret
; ==== data section ====
section .data
; this is the data that goes into a node
struc mydata
 .msglen resq 1
 .msgptr resq 1
endstruc
; this is a node of the linked list
struc Node
 .next resq 1
 .mydata resb mydata_size
endstruc 
 eol: db 0x0a
 namestrings: db 'Alfred', 0 
 db 'Barbara', 0 
 db 'Carlos', 0 
 db 'Delores', 0
 db 'Edward', 0
 db 'Frances', 0
 db 0,0
rootnode dq 0
freenode dq linkedlist
section .bss
linkedlist resb MAX_NODES * Node_size

Question 2

Just a few notes, mostly in micro-optimizations that even assembly language programmers probably ignore nowadays.

 ; steal a node from the freenode list
 ; [rootnode] = [freenode]
 mov rbx, freenode ; rbx -> freenode
 cmp qword [rbx], 0 ; Q: nodes available?
 je errorExit ; N: nope, so leave
 mov rax, [rbx] ;

To compare to 0, I prefer to use a test instruction:

mov rbx, freenode
mov rax, [rbx]
test rax, rax
jz errorExit
; ...

Likewise, in NodePrint you use an or to determine whether a value is 0:

 or rdi, rdi
 jz retnow

Again, a test is sufficient here. The difference is that or will normally affect not only the flags (which you want) but the register (which you don't). In this case, you haven't actually changed the value in the register, but some CPUs will assume you did, which can prevent some instruction level parallelism.

When you're getting ready for the scasb in FetchString, you load a 0 into AL:

 mov rcx, -1 ; max count in ecx
 mov al,0 ; look for NUL byte
 cld ; count forward
 repne scasb ; look for NUL

At least if memory serves, you'll get marginally smaller code with something like:

xor rax, rax

This zeros the entirety of rax instead of just AL, but you don't seem to need/use the rest of rax afterward. Some processors also have a partial register stall that makes essentially any operation on AL relatively slow (though I don't remember if this applies to any 64-bit processors, so it may not matter in this case).

Although it's minutely more fragile, I'd consider whether it's worth replacing this:

 loop nodeclear
 ; set the last link to NULL
 mov qword [rdi + Node.next], 0

...with code like:

loop nodeclear
mov qword [rdi+Node.next], rcx

Taking advantage of the fact that immediately after executing a loop, we know rcx contains 0.

I think it's worth noting, however, that all of these are really very minor. Overall, the code strikes me as quite clear and nicely written.

Question 3

With respect to xor rax, rax versus mov al, 0, xor rax, rax assembles to 48 31 C0, whereas mov al, 0 assembles to B0 00, so mov al, 0 is shorter by one byte.

Jerry Coffin Jerry Coffin 34.1k4 gold badges77 silver badges145 bronze badges · Accepted Answer · 2014-04-27 00:04:04Z

Just a few notes, mostly in micro-optimizations that even assembly language programmers probably ignore nowadays.

 ; steal a node from the freenode list
 ; [rootnode] = [freenode]
 mov rbx, freenode ; rbx -> freenode
 cmp qword [rbx], 0 ; Q: nodes available?
 je errorExit ; N: nope, so leave
 mov rax, [rbx] ;

To compare to 0, I prefer to use a test instruction:

mov rbx, freenode
mov rax, [rbx]
test rax, rax
jz errorExit
; ...

Likewise, in NodePrint you use an or to determine whether a value is 0:

 or rdi, rdi
 jz retnow

Again, a test is sufficient here. The difference is that or will normally affect not only the flags (which you want) but the register (which you don't). In this case, you haven't actually changed the value in the register, but some CPUs will assume you did, which can prevent some instruction level parallelism.

When you're getting ready for the scasb in FetchString, you load a 0 into AL:

 mov rcx, -1 ; max count in ecx
 mov al,0 ; look for NUL byte
 cld ; count forward
 repne scasb ; look for NUL

At least if memory serves, you'll get marginally smaller code with something like:

xor rax, rax

This zeros the entirety of rax instead of just AL, but you don't seem to need/use the rest of rax afterward. Some processors also have a partial register stall that makes essentially any operation on AL relatively slow (though I don't remember if this applies to any 64-bit processors, so it may not matter in this case).

Although it's minutely more fragile, I'd consider whether it's worth replacing this:

 loop nodeclear
 ; set the last link to NULL
 mov qword [rdi + Node.next], 0

...with code like:

loop nodeclear
mov qword [rdi+Node.next], rcx

Taking advantage of the fact that immediately after executing a loop, we know rcx contains 0.

I think it's worth noting, however, that all of these are really very minor. Overall, the code strikes me as quite clear and nicely written.

With respect to xor rax, rax versus mov al, 0, xor rax, rax assembles to 48 31 C0, whereas mov al, 0 assembles to B0 00, so mov al, 0 is shorter by one byte.

Stack Exchange Network

Linux NASM assembly linked list implementation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Linux NASM assembly linked list implementation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions