Example x86-64 program

Question 1

I wrote the following x86 program to make sure I'm following the correct practices in calling a function and then exiting to the OS:

.globl _start
_start:
 # Calculate 2*3 + 7*9 = 6 + 63 = 69
 # The multiplication will be done with a separate function call
 # Parameters passed in System V ABI
 # The first 6 integer/pointer arguments are passed in:
 # %rdi, %rsi, %rdx, %rcx, %r8, and %r9
 # The return value is passed in %rax
 # multiply(2, 3)
 # Part 1 --> Call the parameters
 mov $2, %rdi
 mov $3, %rsi
 # Part 2 --> Call the function (`push` return address onto stack and `jmp` to function label)
 call multiply
 # Part 3 --> Handle the return value from %rax (here we'll just push it to the stack as a test)
 push %rax
 # multiply(7, 9)
 mov $7, %rdi
 mov $9, %rsi
 call multiply
 # Add the two together
 # Restore from stack onto rdi for the first function
 pop %rdi
 # The value from multiply(7,9) is already in rax, so just add it to rdi
 add %rax, %rdi
 # for the 64-bit calling convention, do syscall instead of int 0x80
 # use %rdi instead of %rbx for the exit arg
 # use $60 instead of $1 for the exit syscall number
 movq $60, %rax # use the `_exit` [fast] syscall
 # rdi contains our exit code
 syscall # make syscall
multiply:
 mov %rdi, %rax
 imul %rsi, %rax
 ret

Does the above follow the x86-64 conventions properly? I know it is probably as basic as it comes, but what can be improved here?

Answer

To elaborate on some comments you got on the SO version of this question, the main thing you're missing is stack alignment, a requirement of the SysV ABI calling conventions that's often overlooked by beginners.

The requirement is (ABI 3.2.2):

The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary.

So that means that, at the instant before you execute a call instruction, the stack pointer %rsp needs to be a multiple of 16. In your case you have a push of 8 bytes without a pop in between your two calls to multiply, so they can't both have correct alignment.

Some wrinkles are introduced here by the fact that your parent function is _start instead of main or another function called by C code:

The conditions on entry to _start are described in 3.4 of the ABI. In particular, the stack is aligned to 16 bytes at the instant _start gets control. Also, since you cannot return from _start (there is no return address on the stack), you have to exit with a system call as you do, and so there is no need to save any registers for the caller.
For main or any other function, the stack would have been aligned to 16 bytes before your function was called, so the extra 8 bytes for the return address mean that on entry to your function, the stack is now "misaligned", i.e. the value of %rsp is 8 more or less than a multiple of 16. (Since one would normally only manipulate the stack in 8-byte increments, it's only really ever in two possible states, which I'll call "aligned" and "misaligned".) Also, in such functions, you would need to preserve the contents of the callee-saved registers %rbx, %rbp, %r12-%r15.
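To make the second case concrete, here is a minimal sketch of the same computation written as main rather than _start. It assumes the file is linked against the C runtime (for example by building it with gcc), which is not something the original question does; the choice of %rbx as the scratch register is likewise just for illustration.

```asm
# Sketch: the same computation as a C-callable main.
# Assumption: linked with the C runtime, so main's return value
# becomes the process exit status.
        .text
        .globl main
main:
        push %rbx            # entry: %rsp is 8 off a multiple of 16.
                             # Saving callee-saved %rbx also realigns %rsp.
        mov  $2, %edi
        mov  $3, %esi
        call multiply        # %rsp is a multiple of 16 here
        mov  %rax, %rbx      # keep the first product in the saved register
        mov  $7, %edi
        mov  $9, %esi
        call multiply        # still aligned: nothing was pushed in between
        add  %rbx, %rax      # %rax = 6 + 63 = 69, returned from main
        pop  %rbx            # restore the caller's %rbx
        ret

multiply:
        mov  %rdi, %rax
        imul %rsi, %rax
        ret

# Marks the stack as non-executable; avoids a linker warning on newer binutils.
        .section .note.GNU-stack,"",@progbits
```

The single push does double duty here: it preserves %rbx for the caller and supplies the 8 bytes needed to bring the stack back to a 16-byte boundary before the calls.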

So as it stands, your first call to multiply has correct stack alignment, but your second does not. Of course, it's only of academic interest in this case, because multiply doesn't do anything that needs stack alignment (it doesn't even use the stack at all), but it's good practice to do it right.

One way to fix it would be to subtract another 8 bytes from the stack pointer before the second call, either with sub $8, %rsp or (more efficiently) by simply pushing an arbitrary 64-bit register. But why should we bother to use the stack at all to save this value? We could simply put it in one of the callee-saved registers, say %rbx, which we know multiply must preserve. Normally this would require us to save and restore the contents of this register, but since we are in the special case of being _start, we don't have to.
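For comparison, the stack-based fix might look roughly like the sketch below. It keeps the push/pop structure of the original program and only adds padding; the alignment values in the comments assume %rsp is a multiple of 16 on entry to _start, as the ABI states.

```asm
# Sketch: keep the value on the stack, but pad so both calls are aligned.
        .text
        .globl _start
_start:
        mov  $2, %edi
        mov  $3, %esi
        call multiply        # %rsp % 16 == 0: aligned
        push %rax            # save first product; %rsp % 16 == 8
        sub  $8, %rsp        # 8 bytes of padding; %rsp % 16 == 0 again
                             # (a second push of any register would also work)
        mov  $7, %edi
        mov  $9, %esi
        call multiply        # aligned
        add  $8, %rsp        # drop the padding
        pop  %rdi            # %rdi = 6
        add  %rax, %rdi      # %rdi = 6 + 63 = 69
        mov  $60, %eax       # exit syscall number
        syscall

multiply:
        mov  %rdi, %rax
        imul %rsi, %rax
        ret
```

Built with something like `as prog.s -o prog.o && ld prog.o -o prog`, this exits with status 69, the same as the original, at the cost of two extra instructions compared to the register-based version.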

A separate comment is that you have a lot of instructions like mov $7, %rdi where you operate on 64-bit registers. This would be better to write as mov $7, %edi. Recall that every write to a 32-bit register will zero the upper half of the corresponding 64-bit register, so the effect is the same as long as your constant fits in an unsigned 32-bit value, and the encoding of mov $7, %edi is shorter since it doesn't need a REX prefix (5 bytes rather than 7 as GNU as encodes the two forms).
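A small experiment can confirm the zero-extension behaviour. The sketch below is not part of the original program; it deliberately fills %rdi with all ones first, then checks that a 32-bit write cleared the upper half. The byte sequences in the comments are what GNU as is expected to emit and can be checked with `objdump -d`.

```asm
# Sketch: a 32-bit register write zeroes the upper 32 bits of the 64-bit register.
        .text
        .globl _start
_start:
        mov  $-1, %rdi       # %rdi = 0xffffffffffffffff
        mov  $7, %edi        # bf 07 00 00 00 (5 bytes); %rdi is now 7
                             # (mov $7, %rdi would be 48 c7 c7 07 00 00 00)
        shr  $32, %rdi       # move the upper half down: 0 if it was cleared
        mov  $60, %eax       # exit syscall number
        syscall              # exit status 0 means the upper half was zero
```

An exit status of 0 shows that no trace of the earlier all-ones value survived the write to %edi.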

So I'd revise your code as

.globl _start
_start:
 # Calculate 2*3 + 7*9 = 6 + 63 = 69
 # The multiplication will be done with a separate function call
 # Parameters passed in System V ABI
 # The first 6 integer/pointer arguments are passed in:
 # %rdi, %rsi, %rdx, %rcx, %r8, and %r9
 # The return value is passed in %rax
 # multiply(2, 3)
 # Part 1 --> Load the parameters
 mov $2, %edi
 mov $3, %esi
 # Part 2 --> Call the function (`push` return address onto stack and `jmp` to function label)
 call multiply
 # Part 3 --> Save the return value
 mov %rax, %rbx # could also do mov %eax, %ebx if you know the result fits in 32 bits
 # multiply(7, 9)
 mov $7, %edi
 mov $9, %esi
 call multiply
 # Add the two together
 add %rbx, %rax
 mov %rax, %rdi
 # for the 64-bit calling convention, do syscall instead of int 0x80
 # use %rdi instead of %rbx for the exit arg
 # use $60 instead of $1 for the exit syscall number
 mov $60, %eax # use the `_exit` [fast] syscall
 # rdi contains our exit code
 syscall # make syscall
multiply:
 mov %rdi, %rax
 imul %rsi, %rax
 ret

If you want to rely on the result of multiply fitting in 32 bits, you could replace mov %rax, %rbx with mov %eax, %ebx to save one byte. And likewise, the "Add the two together" could use 32-bit instructions instead to save two more bytes.
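As a sketch of that all-32-bit variant, assuming both products and their sum fit in 32 bits (true for 6, 63 and 69), the program could look like this. The encodings in the comments are the ones GNU as is expected to produce.

```asm
# Sketch: 32-bit operand sizes wherever the values are known to fit.
        .text
        .globl _start
_start:
        mov  $2, %edi
        mov  $3, %esi
        call multiply
        mov  %eax, %ebx      # 89 c3  (48 89 c3 for mov %rax, %rbx)
        mov  $7, %edi
        mov  $9, %esi
        call multiply
        add  %ebx, %eax      # 01 d8  (48 01 d8 for add %rbx, %rax)
        mov  %eax, %edi      # 89 c7  (48 89 c7 for mov %rax, %rdi)
        mov  $60, %eax       # exit syscall number
        syscall              # exit status 69

multiply:
        mov  %rdi, %rax
        imul %rsi, %rax
        ret
```

Each 32-bit form drops one REX prefix byte, for three bytes saved in total across the three instructions.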

Finally, there's a stylistic point on whether to use the AT&T-syntax operand size suffixes, like addq versus add. They are optional when one operand is a register, since the operand size can be deduced from the size of that register (e.g. 32 bits for %eax, 64 bits for %rax, etc). My personal preference is to always use them, as a little extra verification that you're really writing what you mean, but omitting them as you (mostly) did is also common and fine; just be consistent. You did have one instance of movq $60, %rax where it wasn't needed, so for consistency I omitted the suffix there. (I also changed it to %eax for the reasons noted above.)

Comments

Thanks for your thorough response and all the suggestions. A few questions: (1) What is the main reason for the 16-byte alignment requirement: is it mainly to support packed types? You mentioned that my code doesn't need that, but where might a function need stack alignment? (2) Is there another place to view the ABI, for example as a saved PDF? (That GitHub link requires me to build the LaTeX files with a few additional libraries I don't have.)

(2) There's a copy here which is not the latest but should be close enough. stackoverflow.com/questions/18133812/… may eventually have better links.

(1) Sort of. There are SSE and later instructions which require aligned data and fault otherwise. Most of them are intended for packed data but are often used for other purposes, e.g. fast memory copying. But in order to use them on stack data, you need to know how the stack is aligned. See also stackoverflow.com/questions/63440410/….
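As an illustration of where this matters, here is a minimal sketch of a function that stores an SSE register to a local stack slot with movaps. The function name fill16 and its body are hypothetical, made up for this example; the point is only that the function relies on the caller having aligned the stack before the call.

```asm
# Sketch: a callee that depends on the ABI's stack alignment guarantee.
        .text
        .globl _start
_start:
        call fill16          # %rsp % 16 == 0 before the call, as required
        mov  %eax, %edi      # exit status = value read back (0)
        mov  $60, %eax       # exit syscall number
        syscall

# Hypothetical leaf function with a 16-byte-aligned local buffer.
fill16:
        sub  $24, %rsp       # entry: %rsp % 16 == 8; after this, % 16 == 0
        pxor %xmm0, %xmm0    # %xmm0 = 0
        movaps %xmm0, (%rsp) # aligned 16-byte store; faults if (%rsp) is misaligned
        mov  (%rsp), %eax    # read back the low 32 bits: 0
        add  $24, %rsp
        ret
```

If the caller executed a single extra push before the call, the buffer address inside fill16 would be 8 off a 16-byte boundary, and movaps would raise a general-protection fault, which Linux reports as a segmentation fault.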

If interested, I posted a continuation question from the above: codereview.stackexchange.com/questions/249198/…

Answer by Nate Eldredge · Accepted Answer · 2020-08-31 14:40:38Z
