I am following the book Programming from the Ground Up and as an answer to a question in the Use the Concepts section of chapter 4:
- Convert the maximum program given in the Section called Finding a Maximum Value in Chapter 3 so that it is a function which takes a pointer to several values and returns their maximum. Write a program that calls maximum with 3 different lists, and returns the result of the last one as the program’s exit status code.
I wrote the following code:
maxfunc.s
# NAME: maxfunc.s
# PURPOSE: A modular approach of finding the maximum of lists
.section .data
# The data section has three lists for testing, change the code
# highlighted in the text section to change the list passed.
data_1:
.long 1, 2, 3, 4, 5, 6, 7, 8, 9, 0
data_2:
.long 23, 45, 62, 13, 87, 54, 0
data_3:
.long 1, 2, 3, 3, 3, 7, 6, 5, 8, 1, 1, 0
# VARIABLES:
# 1. %edi - Will be our index variable, to access the items
# 2. %ebx - The element we're looking at currently
# 3. %eax - The maximum value we've found so far
# 4. %ecx - To store the address of the list
.section .text
.globl _start
_start:
# Push the address of the list we want to find the maximum of
pushl $data_3
# ^
# +---> Change this to select your list
call maximum
addl 4,ドル %esp # Reset the stack
movl %eax, %ebx
movl 1,ドル %eax
int 0ドルx80
.type maximum, @function
maximum:
# Setup
pushl %ebp
movl %esp, %ebp
# The initial setup:
# Get the address of the list
# Set the index to 0
# Get the first item from the list
# The first item is currently the largest
movl 0,ドル %edi
movl 8(%ebp), %ecx
movl (%ecx, %edi, 4), %ebx
movl %ebx, %eax
max_loop:
cmpl 0,ドル %ebx
je exit_loop
incl %edi
movl (%ecx, %edi, 4), %ebx
cmpl %eax, %ebx
jle max_loop
# %ebx is greater than %eax, therefore, update max
movl %ebx, %eax
jmp max_loop
exit_loop:
# Tear down
movl %ebp, %esp
popl %ebp
ret
It compiles and works as intended. I'm compiling it as a 32 bit binary on my 64 bit machine, so that I can follow the book.
I'd like suggestions on the proper use of conventions, and everything in general.
3 Answers 3
Looking at Rafael's answer, there are a few things I would change:
- To see if a register is zero, I'd use the
test
instruction rather thancmp
. This is slightly more efficient. - I'd avoid using the index register and increment ebx directly. I believe this produces (slightly) faster code (I know, not really a consideration here, but a habit worth acquiring). This also 'saves' a register that can be used for other purposes, doesn't have to be preserved, etc.
- I usually try to structure my loops to avoid a double jump (
jmp max_loop; je exit_loop
).
Warning: I don't have a setup to test this, I'm composing this in the browser. But I'd picture something more like:
# maximum - Find the maximum value of the integers in a list
# A value of zero marks the end of the list.
#
# On input:
# ebx - pointer to list
#
# On output:
# eax - largest value in list
# ebx - scratch (points to last entry in list, but don't depend on this)
# edx - scratch (technically zero, but don't depend on this)
maximum:
movl (%ebx), %eax # Set return register to first value
test %eax, %eax # Check for end of list marker
jz exit_loop
lea 4(%ebx), %ebx # Increment pointer
movl (%ebx), %edx # Read next value
max_loop:
lea 4(%ebx), %ebx # Increment pointer
cmp %edx, %eax # Check for greater value
cmovg %edx, %eax # Update return if needed
movl (%ebx), %edx # Read next value
test %edx, %edx # Check for end of list marker
jnz max_loop
exit_loop:
ret
Note the heavier use of comments. This is a style thing, but I believe it makes the code MUCH easier to read quickly. And if there is a flaw (which there probably is, since I haven't even assembled this, let alone tested it), at least the intent of the code is clear. This makes things easier for the guy who (years from now) needs to maintain this code. At least he can see what you meant to do.
I acknowledge that the extra code outside my loop makes things a bit clunkier. But as a general rule, I'm willing to trade some "one-time clunky" for a more efficient loop.
And you can see that this different arrangement lets the flow just drop out of the bottom of the loop when the end of the list is encountered (ie there is only 1 jump inside the loop).
I should also mention that I use the lea
to increment ebx (rather than add
) since it doesn't affect the flags. Presumably this allows the cpu to optimize things a bit more since the instructions will 'fight' less, but who knows for sure anymore.
As for why I describe ebx and edx as 'scratch': Imagine this code is actually being used somewhere. Now, some day somebody figures out a more efficient way to calculate this that no longer sets these exact values in those specific registers. But they can't know for sure whether some code somewhere might be depending on this, so they would have to 'force' them to still have the same value.
I would normally just doc them as 'scratch,' but for learning purposes, I included the extra text.
-
1\$\begingroup\$ "..., but who knows for sure anymore." So true! \$\endgroup\$Sep Roland– Sep Roland2017年03月05日 20:53:10 +00:00Commented Mar 5, 2017 at 20:53
-
2\$\begingroup\$
cmp %edx, %eax
cmovg %edx, %eax
Since greater means that the DST (%eax) is greater than the SRC (%edx), shouldn't the conditional move becmovng
? \$\endgroup\$Sep Roland– Sep Roland2017年03月05日 21:26:18 +00:00Commented Mar 5, 2017 at 21:26 -
\$\begingroup\$ great optimizations \$\endgroup\$Rafael– Rafael2017年03月06日 00:47:55 +00:00Commented Mar 6, 2017 at 0:47
In general I try to stick to the naming conventions of the registers themselves. Although, I have been criticized for it because many programmers think of these conventions as archaic. But I think it helps reading the code.
General-Purpose Registers (GPR) - 16-bit naming conventions[edit] The 8 GPRs are:
- Accumulator register (AX). Used in arithmetic operations.
- Counter register (CX). Used in shift/rotate instructions and loops.
- Data register (DX). Used in arithmetic operations and I/O operations.
- Base register (BX). Used as a pointer to data (located in segment register DS, when in segmented mode).
- Stack Pointer register (SP). Pointer to the top of the stack.
- Stack Base Pointer register (BP). Used to point to the base of the stack.
- Source Index register (SI). Used as a pointer to a source in stream operations.
- Destination Index register (DI). Used as a pointer to a destination in stream operations.
Cited from: https://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture
.section .data
data_1:
.long 1, 3, 4, 5, 6, 7, 8, 9, 0
data_2:
.long 23, 45, 62, 13, 87, 54, 0
data_3:
.long 1, 3, 7, 6, 5, 8, 1, 1, 0
.section .text
.globl _start
.type maximum, @function
.type exit, @function
_start:
movl $data_3, %ebx
call maximum
exit:
movl %eax, %ebx
movl 1,ドル %eax
int 0ドルx80
maximum:
movl (%ebx), %edx # notice i did not index address here.
movl %edx, %eax
movl 1,ドル %esi
max_loop:
cmpl 0,ドル %edx
je exit_loop
movl (%ebx, %esi, 4), %edx
cmpl %edx, %eax
cmovng %edx, %eax
incl %esi
jmp max_loop
exit_loop:
ret
Instead of passing the args on the stack, I opted to pass the address in the base register. AMD64 defines way different calling conventions than the original 32bit stuff. If this were a non trivial program we'd definitely want to adhere to the sysv calling conventions which define which registers may be overwritten or must be preserved between calls, stack alignment, etc. In the maximum
function we could use a cmov__
instruction if we wanted to conditionally move the data register to the accumulator (current max) if it was greater.
The next book to get if you want to learn a great deal more about assembly AFTER programming from the ground up is Professional Assembly Language.
Linus Torvalds has criticized cmov
instructions because of the out of order execution nature of modern processors. Which the Professional Assembly Language book explains in detail.
-
\$\begingroup\$ Welcome to Code Review! Glad I brought some one on board. :-) And nice answer. I suppose the
movq
is necessary because we're passing the address of a stream oflong
s? Or is it becauselong
is 64 bits? \$\endgroup\$Hungry Blue Dev– Hungry Blue Dev2017年03月04日 17:29:06 +00:00Commented Mar 4, 2017 at 17:29 -
\$\begingroup\$ that was a mistake :) \$\endgroup\$Rafael– Rafael2017年03月04日 17:31:01 +00:00Commented Mar 4, 2017 at 17:31
-
2\$\begingroup\$ I believe you can omit the size specifiers on the instructions as long as the assembler can figure it. Usually you only have to specify sizes when you are moving to / from memory. The register names themselves specify size. I hate seeing the size specifiers when they are not needed...I find that it obscures the instruction names. \$\endgroup\$Rafael– Rafael2017年03月04日 17:33:15 +00:00Commented Mar 4, 2017 at 17:33
-
\$\begingroup\$ I suppose
mov
andadd
do seem more readable compared tomovl
andaddl
. I checked it and it works. That's a relief tbh. \$\endgroup\$Hungry Blue Dev– Hungry Blue Dev2017年03月04日 17:36:46 +00:00Commented Mar 4, 2017 at 17:36 -
1\$\begingroup\$
cmp %edx, %eax
cmovg %edx, %eax
Since greater means that the DST (%eax) is greater than the SRC (%edx), shouldn't the conditional move becmovng
? \$\endgroup\$Sep Roland– Sep Roland2017年03月05日 21:26:54 +00:00Commented Mar 5, 2017 at 21:26
David, nice changes. I don't disagree with anything in your answer.
I was curious about how an optimizing compiler like GCC would handle our function. So I wrote a test C program and output it with gcc -O2 max.c -S
.
max.c
int findmax(int num[])
{
int max = *num;
num++;
while (*num) {
if (*num > max) {
max = *num;
}
num++;
}
return max;
}
int main(int argc, char const* argv[])
{
int num[] = { 32 , 9, 8, 7, 6, 66, 128, 89, 6, 54, 3, 0 };
return findmax(num);
}
Here is what the compiler generated, I changed the labels and removed the alignment directives for brevity.
rdi
has our num
pointer in it as expected (sysv calling conventions).
rax
has our max.
GCC
findmax:
.cfi_startproc # set stack frame?
movl 4(%rdi), %edx # move num[1] to edx
movl (%rdi), %eax # move num[0] to eax
leaq 4(%rdi), %rcx # set rcx to &num[1]
testl %edx, %edx # num[1] == 0?
je .exit_loop
.loop:
cmpl %edx, %eax # compare num[1] to num[0]
cmovl %edx, %eax # if num[0] < num[1] then move num[1] to eax (our max)
addq 4,ドル %rcx # set rcx to next element address
movl (%rcx), %edx # move next element to edx
testl %edx, %edx # is element 0?
jne .loop # repeat loop if element not 0, indexes get updated.
.exit_loop:
rep ret
CLANG
findmax:
.cfi_startproc
movl (%rdi), %eax
movl 4(%rdi), %ecx
testl %ecx, %ecx
je .exit
addq 8,ドル %rdi
.loop:
cmpl %eax, %ecx
cmovgel %ecx, %eax
movl (%rdi), %ecx
addq 4,ドル %rdi
testl %ecx, %ecx
jne .loop
.exit:
retq
The compilers do a fantastic job at generating optimized code.
The only difference I would make to GCC's code is to move the lea
under the je
resulting in faster termination.
-
\$\begingroup\$ Not all that different than what I came up with. It seems they used both
lea
andadd
so there's that. I note that you used x64, while OP explicitly referenced x86. \$\endgroup\$David Wohlferd– David Wohlferd2017年03月06日 02:13:48 +00:00Commented Mar 6, 2017 at 2:13 -
\$\begingroup\$ Nice to see an extensive interaction of Reviewers. :-) Usually we see a reference to other answers. Also, i meant to ask... is it a good practice to add the suffixes to the operations? After all it's optional. \$\endgroup\$Hungry Blue Dev– Hungry Blue Dev2017年03月06日 05:10:10 +00:00Commented Mar 6, 2017 at 5:10
-
\$\begingroup\$ I'm probably not the best person to ask about these suffixes, since I'm not a big fan of ATT syntax (which is what this is). But as you can see from Rafael's example, gcc's output always uses them. Not everyone would agree that gcc sets the standard here, but it's suggestive. \$\endgroup\$David Wohlferd– David Wohlferd2017年03月06日 11:07:38 +00:00Commented Mar 6, 2017 at 11:07
-
1\$\begingroup\$ Perhaps the compiler writers took a shortcut? When you program intel syntax, you never use size specifiers unless you are referencing memory. I would not use them unless you need them. When you read the code you can clearly see what instructions are manipulating memory because they stand out. Look at David's example compared to the GCC output, his is much easier to read imo. \$\endgroup\$Rafael– Rafael2017年03月06日 13:21:14 +00:00Commented Mar 6, 2017 at 13:21