Selection sort algorithm in x86_64 Yasm assembler

Question 1

I returned to study assembly language. And this is actually my first function written in Yasm. Implementing this function is a suggested project from this book. I slightly modified the pseudo code presented in that book:

input:
 an array of integers 'array'
 length of 'array' 'len'
algorithm:
 for i := 0 to len-1
 min := array[i]
 i_min := i
 for j := i+1 to len-1
 if array[j] < min then
 min := array[j]
 i_min := j
 swap array[i_min] and array[i]

NOTE: The inner loop starts from i+1 so we need the outer loop only up to len-2. However, it is inconvenient because we can't just compare a counter with a decremented variable in a single instruction (as I understand). That is why I just left the outer loop up to len-1 and seemingly it overflows but actually it is not a problem, and as a result a dummy swap (the last element with itself) is made as a last step. In the original code the inner loop starts from i (not i+1) which is not necessary, of course, but then the inner loop doesn't overflow, however, len extra operations are performed.

Sorting function

I think the code is well commented (maybe even overcommented (: ) so I won't explain it. The only thing I want to highlight is the use of registers instead of stack for local variables.

section .text
 global ssort
 ; Selection sorting algorithm
 ; Arguments:
 ; rdi : address of the array (the first element)
 ; rsi : value of the length
 ; Local variables:
 ; registers :
 ; r10 : counter for the outer loop (i)
 ; r11 : counter for the inner loop (j)
 ; r12 : min (minimal element found in the inner loop)
 ; rbx : i_min (position of min)
 ; rcx : temporary variable for swapping
 ssort:
 prologue:
 ; save registers' values
 push r12
 push rbx
 push rcx
 mov r10, 0 ; i = 0
 outer_loop:
 ; for ( i = 0; i < length; i++ )
 cmp r10, rsi ; compare i and length
 jb continue_outer_loop ; if i < length (unsigned) then continue
 jmp epilogue ; else end
 continue_outer_loop:
 mov r12, qword [rdi + (r10 * 8)] ; min = list[i]
 mov rbx, r10 ; i_min = i
 mov r11, r10 ; j = i 
 inner_loop:
 ; for( j = i+1; j < length; j++ )
 inc r11 ; j++
 cmp r11, rsi ; compare j and length 
 jb continue_inner_loop ; ( j < length (unsigned) ) conditional jump (distance limit) 
 jmp swap_elements ; ( else ) unconditional jump (no distance limit)
 continue_inner_loop:
 cmp r12, qword [rdi + (r11 * 8)] ; compare min and list[j]
 jg update_min ; if min > list[j] then update min
 jmp inner_loop ; else check next element 
 update_min:
 mov r12, qword [rdi + (r11 * 8)] ; min = list[j]
 mov rbx, r11 ; i_min = j
 jmp inner_loop
 swap_elements:
 ; swap min and list[i]
 mov rcx, qword [rdi + (r10 * 8)] ; rcx = list[i], use rcx as a temporary variable
 mov qword [rdi + (rbx * 8)], rcx ; list[i_min] = list[i]
 mov qword [rdi + (r10 * 8)], r12 ; list[i] = min
 inc r10 ; i++
 jmp outer_loop
 epilogue:
 ; restore initial registers' values
 pop rcx
 pop rbx
 pop r12
 ret

Test

I have tested the algorithm on four different arrays : random, one-element, two-element, and sorted (the labels one, two, three and four are for debugging purposes):

section .data
 list dq 4, 24, 17, 135, -4, 450, 324, 0, 3
 len dq 9
 list2 dq 1 
 len2 dq 1
 list3 dq 4, 3 
 len3 dq 2
 list4 dq -1, 0, 1, 2 
 len4 dq 4
secion .text
 global _start
 _start:
 one:
 mov rdi, list 
 mov rsi, qword [len] 
 call ssort
 two:
 mov rdi, list2 
 mov rsi, qword [len2] 
 call ssort
 three:
 mov rdi, list3 
 mov rsi, qword [len3] 
 call ssort
 four:
 mov rdi, list4 
 mov rsi, qword [len4] 
 call ssort
 _end:
 mov rax, sys_exit
 mov rdi, EXIT_SUCCESS
 syscall

What do you think?

Question 2

I understand that you've written this code staying close to the high level example, but assembly code is typically not written that way. To me at least this code is less readable than it could be.
The code that you have is of course a good starting point, but in my opinion it should not stay the final version.

A selection of improvements

To clear a register instead of using mov r10, 0, you should write xor r10d, r10d. This is both faster and shorter code.

In a snippet like:

cmp r10, rsi
jb continue_outer_loop
jmp epilogue
continue_outer_loop:

you can save yourself from writing the extra label and remove one of the jumps, if you simply reverse the condition:

cmp r10, rsi
jnb epilogue

This is something that you can apply 3 times in your code.

The only thing I want to highlight is the use of registers instead of stack for local variables.

It's certainly a good idea to use registers whenever you can, but here it makes for less readable text. Perhaps you could use the EQU directive to makes things clearer.

i equ r10 ; counter for the outer loop
j equ r11 ; counter for the inner loop
min equ r12 ; minimal element found in the inner loop
i_min equ rbx ; position of min
temp equ rcx ; temporary variable for swapping

I agree that you've slightly over-commented the source. Some comments were redundant.

mov r12, qword [rdi + (r10 * 8)] ; min = list[i]

I don't know YASM, but I think you can drop the qword tag in many instructions where the size is clear from the other operands:

mov r12, [rdi + (r10 * 8)] ; min = list[i]

r12 is a qword so the mention of the tag is redundant.

My rewrite, more the assembly way

See what you do with the EQU idea!

ssort:
 push r12
 push rbx
 push rcx
 xor r10d, r10d ; i = 0
 outer_loop: ; for ( i = 0; i < length; i++ )
 cmp r10, rsi ; compare i and length
 jnb epilogue ; if i >= length (unsigned) thenend
 mov r12, [rdi + (r10 * 8)] ; min = list[i]
 mov rbx, r10 ; i_min = i
 mov r11, r10 ; j = i 
 inner_loop: ; for( j = i+1; j < length; j++ )
 inc r11 ; j++
 cmp r11, rsi ; compare j and length 
 jnb swap_elements ; ( j >= length (unsigned) ) unconditional jump (no distance limit)
 cmp r12, [rdi + (r11 * 8)] ; compare min and list[j]
 jng inner_loop ; if min <= list[j] then check next element 
 mov r12, [rdi + (r11 * 8)] ; min = list[j]
 mov rbx, r11 ; i_min = j
 jmp inner_loop
 swap_elements: ; swap min and list[i]
 mov rcx, [rdi + (r10 * 8)] ; rcx = list[i], use rcx as a temporary variable
 mov [rdi + (rbx * 8)], rcx ; list[i_min] = list[i]
 mov [rdi + (r10 * 8)], r12 ; list[i] = min
 inc r10 ; i++
 jmp outer_loop
 epilogue:
 pop rcx
 pop rbx
 pop r12
 ret

Question 3

OK. Thank you for answer. One comment per a recommendation. About jump instructions. I do it in that way because of (from the book I referred to) "...the target label must be within +/- 128 bytes from the conditional jump instruction" (see comments in the code also). I think that way is safer. Though, probably the linker would throw an error if there was a problem. I mean I did it on purpose.

Question 4

@LRDPRDX That's a very old recommendation dating back to the era of the 8086 pc. Since then we have the full range of conditional jumps that can jump all the distance you need. I don't think it's safer using that trick. A good compiler/assembler will choose the optimal form for you automatically, using the shortest encoding possible.

Sep Roland Sep Roland 4,78317 silver badges28 bronze badges · Answer 1 · 2020-06-14 15:49:22Z

I understand that you've written this code staying close to the high level example, but assembly code is typically not written that way. To me at least this code is less readable than it could be.
The code that you have is of course a good starting point, but in my opinion it should not stay the final version.

A selection of improvements

To clear a register instead of using mov r10, 0, you should write xor r10d, r10d. This is both faster and shorter code.

In a snippet like:

cmp r10, rsi
jb continue_outer_loop
jmp epilogue
continue_outer_loop:

you can save yourself from writing the extra label and remove one of the jumps, if you simply reverse the condition:

cmp r10, rsi
jnb epilogue

This is something that you can apply 3 times in your code.

The only thing I want to highlight is the use of registers instead of stack for local variables.

It's certainly a good idea to use registers whenever you can, but here it makes for less readable text. Perhaps you could use the EQU directive to makes things clearer.

i equ r10 ; counter for the outer loop
j equ r11 ; counter for the inner loop
min equ r12 ; minimal element found in the inner loop
i_min equ rbx ; position of min
temp equ rcx ; temporary variable for swapping

I agree that you've slightly over-commented the source. Some comments were redundant.

mov r12, qword [rdi + (r10 * 8)] ; min = list[i]

I don't know YASM, but I think you can drop the qword tag in many instructions where the size is clear from the other operands:

mov r12, [rdi + (r10 * 8)] ; min = list[i]

r12 is a qword so the mention of the tag is redundant.

My rewrite, more the assembly way

See what you do with the EQU idea!

ssort:
 push r12
 push rbx
 push rcx
 xor r10d, r10d ; i = 0
 outer_loop: ; for ( i = 0; i < length; i++ )
 cmp r10, rsi ; compare i and length
 jnb epilogue ; if i >= length (unsigned) thenend
 mov r12, [rdi + (r10 * 8)] ; min = list[i]
 mov rbx, r10 ; i_min = i
 mov r11, r10 ; j = i 
 inner_loop: ; for( j = i+1; j < length; j++ )
 inc r11 ; j++
 cmp r11, rsi ; compare j and length 
 jnb swap_elements ; ( j >= length (unsigned) ) unconditional jump (no distance limit)
 cmp r12, [rdi + (r11 * 8)] ; compare min and list[j]
 jng inner_loop ; if min <= list[j] then check next element 
 mov r12, [rdi + (r11 * 8)] ; min = list[j]
 mov rbx, r11 ; i_min = j
 jmp inner_loop
 swap_elements: ; swap min and list[i]
 mov rcx, [rdi + (r10 * 8)] ; rcx = list[i], use rcx as a temporary variable
 mov [rdi + (rbx * 8)], rcx ; list[i_min] = list[i]
 mov [rdi + (r10 * 8)], r12 ; list[i] = min
 inc r10 ; i++
 jmp outer_loop
 epilogue:
 pop rcx
 pop rbx
 pop r12
 ret

OK. Thank you for answer. One comment per a recommendation. About jump instructions. I do it in that way because of (from the book I referred to) "...the target label must be within +/- 128 bytes from the conditional jump instruction" (see comments in the code also). I think that way is safer. Though, probably the linker would throw an error if there was a problem. I mean I did it on purpose.
@LRDPRDX That's a very old recommendation dating back to the era of the 8086 pc. Since then we have the full range of conditional jumps that can jump all the distance you need. I don't think it's safer using that trick. A good compiler/assembler will choose the optimal form for you automatically, using the shortest encoding possible.

Stack Exchange Network

Selection sort algorithm in x86_64 Yasm assembler

Sorting function

Test

1 Answer 1

A selection of improvements

My rewrite, more the assembly way

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Selection sort algorithm in x86_64 Yasm assembler

Sorting function

Test

1 Answer 1

A selection of improvements

My rewrite, more the assembly way

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions