Finding the maximum of a given list of data in GNU Assembly x86 (32 bit)

Question 1

I am following the book Programming from the Ground Up and as an answer to a question in the Use the Concepts section of chapter 4:

Convert the maximum program given in the Section called Finding a Maximum Value in Chapter 3 so that it is a function which takes a pointer to several values and returns their maximum. Write a program that calls maximum with 3 different lists, and returns the result of the last one as the program’s exit status code.

I wrote the following code:

maxfunc.s

# NAME: maxfunc.s
# PURPOSE: A modular approach of finding the maximum of lists
 .section .data
 # The data section has three lists for testing, change the code
 # highlighted in the text section to change the list passed.
data_1:
 .long 1, 2, 3, 4, 5, 6, 7, 8, 9, 0
data_2:
 .long 23, 45, 62, 13, 87, 54, 0
data_3:
 .long 1, 2, 3, 3, 3, 7, 6, 5, 8, 1, 1, 0
# VARIABLES:
# 1. %edi - Will be our index variable, to access the items
# 2. %ebx - The element we're looking at currently
# 3. %eax - The maximum value we've found so far
# 4. %ecx - To store the address of the list
 .section .text
 .globl _start
_start:
 # Push the address of the list we want to find the maximum of
 pushl $data_3
 # ^
 # +---> Change this to select your list
 call maximum
 addl 4,ドル %esp # Reset the stack
 movl %eax, %ebx
 movl 1,ドル %eax
 int 0ドルx80
 .type maximum, @function
maximum:
 # Setup
 pushl %ebp
 movl %esp, %ebp
 # The initial setup:
 # Get the address of the list
 # Set the index to 0
 # Get the first item from the list
 # The first item is currently the largest
 movl 0,ドル %edi
 movl 8(%ebp), %ecx
 movl (%ecx, %edi, 4), %ebx
 movl %ebx, %eax
max_loop:
 cmpl 0,ドル %ebx
 je exit_loop
 incl %edi
 movl (%ecx, %edi, 4), %ebx
 cmpl %eax, %ebx
 jle max_loop
 # %ebx is greater than %eax, therefore, update max
 movl %ebx, %eax
 jmp max_loop
exit_loop:
 # Tear down
 movl %ebp, %esp
 popl %ebp
 ret

It compiles and works as intended. I'm compiling it as a 32 bit binary on my 64 bit machine, so that I can follow the book.

I'd like suggestions on the proper use of conventions, and everything in general.

Question 2

Looking at Rafael's answer, there are a few things I would change:

To see if a register is zero, I'd use the test instruction rather than cmp. This is slightly more efficient.
I'd avoid using the index register and increment ebx directly. I believe this produces (slightly) faster code (I know, not really a consideration here, but a habit worth acquiring). This also 'saves' a register that can be used for other purposes, doesn't have to be preserved, etc.
I usually try to structure my loops to avoid a double jump (jmp max_loop; je exit_loop).

Warning: I don't have a setup to test this, I'm composing this in the browser. But I'd picture something more like:

# maximum - Find the maximum value of the integers in a list
# A value of zero marks the end of the list.
#
# On input:
# ebx - pointer to list
#
# On output:
# eax - largest value in list
# ebx - scratch (points to last entry in list, but don't depend on this)
# edx - scratch (technically zero, but don't depend on this)
maximum:
 movl (%ebx), %eax # Set return register to first value
 test %eax, %eax # Check for end of list marker
 jz exit_loop
 lea 4(%ebx), %ebx # Increment pointer
 movl (%ebx), %edx # Read next value
max_loop:
 lea 4(%ebx), %ebx # Increment pointer
 cmp %edx, %eax # Check for greater value
 cmovg %edx, %eax # Update return if needed
 movl (%ebx), %edx # Read next value
 test %edx, %edx # Check for end of list marker
 jnz max_loop
exit_loop:
 ret

Note the heavier use of comments. This is a style thing, but I believe it makes the code MUCH easier to read quickly. And if there is a flaw (which there probably is, since I haven't even assembled this, let alone tested it), at least the intent of the code is clear. This makes things easier for the guy who (years from now) needs to maintain this code. At least he can see what you meant to do.

I acknowledge that the extra code outside my loop makes things a bit clunkier. But as a general rule, I'm willing to trade some "one-time clunky" for a more efficient loop.

And you can see that this different arrangement lets the flow just drop out of the bottom of the loop when the end of the list is encountered (ie there is only 1 jump inside the loop).

I should also mention that I use the lea to increment ebx (rather than add) since it doesn't affect the flags. Presumably this allows the cpu to optimize things a bit more since the instructions will 'fight' less, but who knows for sure anymore.

As for why I describe ebx and edx as 'scratch': Imagine this code is actually being used somewhere. Now, some day somebody figures out a more efficient way to calculate this that no longer sets these exact values in those specific registers. But they can't know for sure whether some code somewhere might be depending on this, so they would have to 'force' them to still have the same value.

I would normally just doc them as 'scratch,' but for learning purposes, I included the extra text.

Question 3

"..., but who knows for sure anymore." So true!

Question 4

cmp %edx, %eax cmovg %edx, %eax Since greater means that the DST (%eax) is greater than the SRC (%edx), shouldn't the conditional move be cmovng ?

Question 5

great optimizations

Question 6

In general I try to stick to the naming conventions of the registers themselves. Although, I have been criticized for it because many programmers think of these conventions as archaic. But I think it helps reading the code.

General-Purpose Registers (GPR) - 16-bit naming conventions[edit] The 8 GPRs are:

Accumulator register (AX). Used in arithmetic operations.

Counter register (CX). Used in shift/rotate instructions and loops.

Data register (DX). Used in arithmetic operations and I/O operations.

Base register (BX). Used as a pointer to data (located in segment register DS, when in segmented mode).

Stack Pointer register (SP). Pointer to the top of the stack.

Stack Base Pointer register (BP). Used to point to the base of the stack.

Source Index register (SI). Used as a pointer to a source in stream operations.

Destination Index register (DI). Used as a pointer to a destination in stream operations.

Cited from: https://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture

.section .data
data_1:
 .long 1, 3, 4, 5, 6, 7, 8, 9, 0
data_2:
 .long 23, 45, 62, 13, 87, 54, 0
data_3:
 .long 1, 3, 7, 6, 5, 8, 1, 1, 0
.section .text
.globl _start
.type maximum, @function
.type exit, @function
_start:
 movl $data_3, %ebx
 call maximum
exit: 
 movl %eax, %ebx
 movl 1,ドル %eax
 int 0ドルx80
maximum:
 movl (%ebx), %edx # notice i did not index address here.
 movl %edx, %eax
 movl 1,ドル %esi
max_loop:
 cmpl 0,ドル %edx
 je exit_loop
 movl (%ebx, %esi, 4), %edx
 cmpl %edx, %eax
 cmovng %edx, %eax
 incl %esi
 jmp max_loop
exit_loop:
 ret

Instead of passing the args on the stack, I opted to pass the address in the base register. AMD64 defines way different calling conventions than the original 32bit stuff. If this were a non trivial program we'd definitely want to adhere to the sysv calling conventions which define which registers may be overwritten or must be preserved between calls, stack alignment, etc. In the maximum function we could use a cmov__ instruction if we wanted to conditionally move the data register to the accumulator (current max) if it was greater.

The next book to get if you want to learn a great deal more about assembly AFTER programming from the ground up is Professional Assembly Language.

Linus Torvalds has criticized cmov instructions because of the out of order execution nature of modern processors. Which the Professional Assembly Language book explains in detail.

http://yarchive.net/comp/linux/cmov.html

Question 7

Welcome to Code Review! Glad I brought some one on board. :-) And nice answer. I suppose the movq is necessary because we're passing the address of a stream of longs? Or is it because long is 64 bits?

Question 8

that was a mistake :)

Question 9

I believe you can omit the size specifiers on the instructions as long as the assembler can figure it. Usually you only have to specify sizes when you are moving to / from memory. The register names themselves specify size. I hate seeing the size specifiers when they are not needed...I find that it obscures the instruction names.

Question 10

I suppose mov and add do seem more readable compared to movl and addl. I checked it and it works. That's a relief tbh.

Question 11

cmp %edx, %eax cmovg %edx, %eax Since greater means that the DST (%eax) is greater than the SRC (%edx), shouldn't the conditional move be cmovng ?

Question 12

David, nice changes. I don't disagree with anything in your answer.

I was curious about how an optimizing compiler like GCC would handle our function. So I wrote a test C program and output it with gcc -O2 max.c -S.

max.c

int findmax(int num[])
{
 int max = *num;
 num++;
 while (*num) {
 if (*num > max) {
 max = *num;
 }
 num++;
 }
 return max;
}
int main(int argc, char const* argv[])
{
 int num[] = { 32 , 9, 8, 7, 6, 66, 128, 89, 6, 54, 3, 0 };
 return findmax(num);
}

Here is what the compiler generated, I changed the labels and removed the alignment directives for brevity.

rdi has our num pointer in it as expected (sysv calling conventions). rax has our max.

GCC

findmax:
 .cfi_startproc # set stack frame?
 movl 4(%rdi), %edx # move num[1] to edx
 movl (%rdi), %eax # move num[0] to eax
 leaq 4(%rdi), %rcx # set rcx to &num[1]
 testl %edx, %edx # num[1] == 0?
 je .exit_loop
.loop:
 cmpl %edx, %eax # compare num[1] to num[0]
 cmovl %edx, %eax # if num[0] < num[1] then move num[1] to eax (our max) 
 addq 4,ドル %rcx # set rcx to next element address
 movl (%rcx), %edx # move next element to edx
 testl %edx, %edx # is element 0?
 jne .loop # repeat loop if element not 0, indexes get updated.
.exit_loop:
 rep ret

CLANG

findmax:
 .cfi_startproc
 movl (%rdi), %eax
 movl 4(%rdi), %ecx
 testl %ecx, %ecx
 je .exit
 addq 8,ドル %rdi
.loop: 
 cmpl %eax, %ecx
 cmovgel %ecx, %eax
 movl (%rdi), %ecx
 addq 4,ドル %rdi
 testl %ecx, %ecx
 jne .loop
.exit:
 retq

The compilers do a fantastic job at generating optimized code.

The only difference I would make to GCC's code is to move the lea under the je resulting in faster termination.

Question 13

Not all that different than what I came up with. It seems they used both lea and add so there's that. I note that you used x64, while OP explicitly referenced x86.

Question 14

Nice to see an extensive interaction of Reviewers. :-) Usually we see a reference to other answers. Also, i meant to ask... is it a good practice to add the suffixes to the operations? After all it's optional.

Question 15

I'm probably not the best person to ask about these suffixes, since I'm not a big fan of ATT syntax (which is what this is). But as you can see from Rafael's example, gcc's output always uses them. Not everyone would agree that gcc sets the standard here, but it's suggestive.

Question 16

Perhaps the compiler writers took a shortcut? When you program intel syntax, you never use size specifiers unless you are referencing memory. I would not use them unless you need them. When you read the code you can clearly see what instructions are manipulating memory because they stand out. Look at David's example compared to the GCC output, his is much easier to read imo.

David Wohlferd David Wohlferd 1,5181 gold badge8 silver badges17 bronze badges · Answer 1 · 2017-03-05 01:11:10Z

Looking at Rafael's answer, there are a few things I would change:

To see if a register is zero, I'd use the test instruction rather than cmp. This is slightly more efficient.
I'd avoid using the index register and increment ebx directly. I believe this produces (slightly) faster code (I know, not really a consideration here, but a habit worth acquiring). This also 'saves' a register that can be used for other purposes, doesn't have to be preserved, etc.
I usually try to structure my loops to avoid a double jump (jmp max_loop; je exit_loop).

Warning: I don't have a setup to test this, I'm composing this in the browser. But I'd picture something more like:

# maximum - Find the maximum value of the integers in a list
# A value of zero marks the end of the list.
#
# On input:
# ebx - pointer to list
#
# On output:
# eax - largest value in list
# ebx - scratch (points to last entry in list, but don't depend on this)
# edx - scratch (technically zero, but don't depend on this)
maximum:
 movl (%ebx), %eax # Set return register to first value
 test %eax, %eax # Check for end of list marker
 jz exit_loop
 lea 4(%ebx), %ebx # Increment pointer
 movl (%ebx), %edx # Read next value
max_loop:
 lea 4(%ebx), %ebx # Increment pointer
 cmp %edx, %eax # Check for greater value
 cmovg %edx, %eax # Update return if needed
 movl (%ebx), %edx # Read next value
 test %edx, %edx # Check for end of list marker
 jnz max_loop
exit_loop:
 ret

Note the heavier use of comments. This is a style thing, but I believe it makes the code MUCH easier to read quickly. And if there is a flaw (which there probably is, since I haven't even assembled this, let alone tested it), at least the intent of the code is clear. This makes things easier for the guy who (years from now) needs to maintain this code. At least he can see what you meant to do.

I acknowledge that the extra code outside my loop makes things a bit clunkier. But as a general rule, I'm willing to trade some "one-time clunky" for a more efficient loop.

And you can see that this different arrangement lets the flow just drop out of the bottom of the loop when the end of the list is encountered (ie there is only 1 jump inside the loop).

I should also mention that I use the lea to increment ebx (rather than add) since it doesn't affect the flags. Presumably this allows the cpu to optimize things a bit more since the instructions will 'fight' less, but who knows for sure anymore.

As for why I describe ebx and edx as 'scratch': Imagine this code is actually being used somewhere. Now, some day somebody figures out a more efficient way to calculate this that no longer sets these exact values in those specific registers. But they can't know for sure whether some code somewhere might be depending on this, so they would have to 'force' them to still have the same value.

I would normally just doc them as 'scratch,' but for learning purposes, I included the extra text.

cmp %edx, %eax cmovg %edx, %eax Since greater means that the DST (%eax) is greater than the SRC (%edx), shouldn't the conditional move be cmovng ?

Rafael Rafael 1714 bronze badges · Answer 2 · 2017-03-04 17:09:52Z

In general I try to stick to the naming conventions of the registers themselves. Although, I have been criticized for it because many programmers think of these conventions as archaic. But I think it helps reading the code.

General-Purpose Registers (GPR) - 16-bit naming conventions[edit] The 8 GPRs are:

Accumulator register (AX). Used in arithmetic operations.

Counter register (CX). Used in shift/rotate instructions and loops.

Data register (DX). Used in arithmetic operations and I/O operations.

Base register (BX). Used as a pointer to data (located in segment register DS, when in segmented mode).

Stack Pointer register (SP). Pointer to the top of the stack.

Stack Base Pointer register (BP). Used to point to the base of the stack.

Source Index register (SI). Used as a pointer to a source in stream operations.

Destination Index register (DI). Used as a pointer to a destination in stream operations.

Cited from: https://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture

.section .data
data_1:
 .long 1, 3, 4, 5, 6, 7, 8, 9, 0
data_2:
 .long 23, 45, 62, 13, 87, 54, 0
data_3:
 .long 1, 3, 7, 6, 5, 8, 1, 1, 0
.section .text
.globl _start
.type maximum, @function
.type exit, @function
_start:
 movl $data_3, %ebx
 call maximum
exit: 
 movl %eax, %ebx
 movl 1,ドル %eax
 int 0ドルx80
maximum:
 movl (%ebx), %edx # notice i did not index address here.
 movl %edx, %eax
 movl 1,ドル %esi
max_loop:
 cmpl 0,ドル %edx
 je exit_loop
 movl (%ebx, %esi, 4), %edx
 cmpl %edx, %eax
 cmovng %edx, %eax
 incl %esi
 jmp max_loop
exit_loop:
 ret

Instead of passing the args on the stack, I opted to pass the address in the base register. AMD64 defines way different calling conventions than the original 32bit stuff. If this were a non trivial program we'd definitely want to adhere to the sysv calling conventions which define which registers may be overwritten or must be preserved between calls, stack alignment, etc. In the maximum function we could use a cmov__ instruction if we wanted to conditionally move the data register to the accumulator (current max) if it was greater.

The next book to get if you want to learn a great deal more about assembly AFTER programming from the ground up is Professional Assembly Language.

Linus Torvalds has criticized cmov instructions because of the out of order execution nature of modern processors. Which the Professional Assembly Language book explains in detail.

http://yarchive.net/comp/linux/cmov.html

Welcome to Code Review! Glad I brought some one on board. :-) And nice answer. I suppose the movq is necessary because we're passing the address of a stream of longs? Or is it because long is 64 bits?
I believe you can omit the size specifiers on the instructions as long as the assembler can figure it. Usually you only have to specify sizes when you are moving to / from memory. The register names themselves specify size. I hate seeing the size specifiers when they are not needed...I find that it obscures the instruction names.
I suppose mov and add do seem more readable compared to movl and addl. I checked it and it works. That's a relief tbh.
cmp %edx, %eax cmovg %edx, %eax Since greater means that the DST (%eax) is greater than the SRC (%edx), shouldn't the conditional move be cmovng ?

Rafael Rafael 1714 bronze badges · Answer 3 · 2017-03-06 00:26:27Z

David, nice changes. I don't disagree with anything in your answer.

I was curious about how an optimizing compiler like GCC would handle our function. So I wrote a test C program and output it with gcc -O2 max.c -S.

max.c

int findmax(int num[])
{
 int max = *num;
 num++;
 while (*num) {
 if (*num > max) {
 max = *num;
 }
 num++;
 }
 return max;
}
int main(int argc, char const* argv[])
{
 int num[] = { 32 , 9, 8, 7, 6, 66, 128, 89, 6, 54, 3, 0 };
 return findmax(num);
}

Here is what the compiler generated, I changed the labels and removed the alignment directives for brevity.

rdi has our num pointer in it as expected (sysv calling conventions). rax has our max.

GCC

findmax:
 .cfi_startproc # set stack frame?
 movl 4(%rdi), %edx # move num[1] to edx
 movl (%rdi), %eax # move num[0] to eax
 leaq 4(%rdi), %rcx # set rcx to &num[1]
 testl %edx, %edx # num[1] == 0?
 je .exit_loop
.loop:
 cmpl %edx, %eax # compare num[1] to num[0]
 cmovl %edx, %eax # if num[0] < num[1] then move num[1] to eax (our max) 
 addq 4,ドル %rcx # set rcx to next element address
 movl (%rcx), %edx # move next element to edx
 testl %edx, %edx # is element 0?
 jne .loop # repeat loop if element not 0, indexes get updated.
.exit_loop:
 rep ret

CLANG

findmax:
 .cfi_startproc
 movl (%rdi), %eax
 movl 4(%rdi), %ecx
 testl %ecx, %ecx
 je .exit
 addq 8,ドル %rdi
.loop: 
 cmpl %eax, %ecx
 cmovgel %ecx, %eax
 movl (%rdi), %ecx
 addq 4,ドル %rdi
 testl %ecx, %ecx
 jne .loop
.exit:
 retq

The compilers do a fantastic job at generating optimized code.

The only difference I would make to GCC's code is to move the lea under the je resulting in faster termination.

Not all that different than what I came up with. It seems they used both lea and add so there's that. I note that you used x64, while OP explicitly referenced x86.
Nice to see an extensive interaction of Reviewers. :-) Usually we see a reference to other answers. Also, i meant to ask... is it a good practice to add the suffixes to the operations? After all it's optional.
I'm probably not the best person to ask about these suffixes, since I'm not a big fan of ATT syntax (which is what this is). But as you can see from Rafael's example, gcc's output always uses them. Not everyone would agree that gcc sets the standard here, but it's suggestive.
Perhaps the compiler writers took a shortcut? When you program intel syntax, you never use size specifiers unless you are referencing memory. I would not use them unless you need them. When you read the code you can clearly see what instructions are manipulating memory because they stand out. Look at David's example compared to the GCC output, his is much easier to read imo.

Stack Exchange Network

Finding the maximum of a given list of data in GNU Assembly x86 (32 bit)

maxfunc.s

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Finding the maximum of a given list of data in GNU Assembly x86 (32 bit)

maxfunc.s

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions