I wrote the following x86 program to make sure I'm following the correct practices in calling a function and then exiting to the OS:
.globl _start
_start:
# Calculate 2*3 + 7*9 = 6 + 63 = 69
# The multiplication will be done with a separate function call
# Parameters passed in System V ABI
# The first 6 integer/pointer arguments are passed in:
# %rdi, %rsi, %rdx, %rcx, %r8, and %r9
# The return value is passed in %rax
# multiply(2, 3)
# Part 1 --> Call the parameters
mov $2, %rdi
mov $3, %rsi
# Part 2 --> Call the function (`push` return address onto stack and `jmp` to function label)
call multiply
# Part 3 --> Handle the return value from %rax (here we'll just push it to the stack as a test)
push %rax
# multiply(7, 9)
mov $7, %rdi
mov $9, %rsi
call multiply
# Add the two together
# Restore from stack onto rdi for the first function
pop %rdi
# The value from multiply(7,9) is already in rax, so just add it to rdi
add %rax, %rdi
# for the 64-bit calling convention, do syscall instead of int 0x80
# use %rdi instead of %rbx for the exit arg
# use $60 instead of 1 for the exit code
movq $60, %rax # use the `_exit` [fast] syscall
# rdi contains our exit code
syscall # make syscall
multiply:
mov %rdi, %rax
imul %rsi, %rax
ret
Does the above follow the x86-64 conventions properly? I know it is probably as basic as it comes, but what can be improved here?
1 Answer
To elaborate on some comments you got on the SO version of this question, the main thing you're missing is stack alignment, a requirement of the SysV ABI calling conventions that's often overlooked by beginners.
The requirement is (ABI 3.2.2):
"The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary."
So that means that, at the instant before you execute a call instruction, the stack pointer %rsp needs to be a multiple of 16. In your case you have a push of 8 bytes without a pop in between your two calls to multiply, so they can't both have correct alignment.
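To make the bookkeeping concrete, here is the original call sequence annotated with the value of %rsp at each step. It is only a sketch: A is a name introduced here for the value of %rsp on entry to _start, assumed to be a multiple of 16 as section 3.4 of the ABI specifies.

```asm
# Original sequence, annotated. "A" is a label used only in these
# comments for the (16-byte aligned) value of %rsp on entry to _start.
.globl _start
_start:
    mov  $2, %edi
    mov  $3, %esi
    call multiply        # %rsp = A at the call: aligned
    push %rax            # %rsp = A - 8
    mov  $7, %edi
    mov  $9, %esi
    call multiply        # %rsp = A - 8 at the call: misaligned
    pop  %rdi            # %rsp = A again
    add  %rax, %rdi
    mov  $60, %eax       # exit syscall number
    syscall

multiply:
    mov  %rdi, %rax
    imul %rsi, %rax
    ret
```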
Some wrinkles are introduced here by the fact that your parent function is _start instead of main or another function called by C code:
- The conditions on entry to _start are described in 3.4 of the ABI. In particular, the stack is aligned to 16 bytes at the instant _start gets control. Also, since you cannot return from _start (there is no return address on the stack), you have to exit with a system call as you do, and so there is no need to save any registers for the caller.
- For main or any other function, the stack would have been aligned to 16 bytes before your function was called, so the extra 8 bytes for the return address mean that on entry to your function, the stack is now "misaligned", i.e. the value of %rsp is 8 more or less than a multiple of 16. (Since one would normally only manipulate the stack in 8-byte increments, it's only really ever in two possible states, which I'll call "aligned" and "misaligned".) Also, in such functions, you would need to preserve the contents of the callee-saved registers %rbx, %rbp, %r12-%r15.
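As an illustration of the second case, here is a sketch of the same computation written as a function meant to be called from C. The name compute and its assumed prototype, long compute(void), are hypothetical and not from the question. The single push of %rbx does two jobs: it preserves a callee-saved register, and it moves the stack from the misaligned entry state back to the aligned one.

```asm
# Hypothetical function, callable from C as: long compute(void);
.text
.globl compute
compute:
    push %rbx            # entry: %rsp = 16n + 8; after this push it is 16n
    mov  $2, %edi
    mov  $3, %esi
    call multiply        # %rsp is a multiple of 16 here
    mov  %rax, %rbx      # keep the first product in the register we saved
    mov  $7, %edi
    mov  $9, %esi
    call multiply        # still aligned
    add  %rbx, %rax      # return value in %rax: 6 + 63
    pop  %rbx            # restore the caller's %rbx
    ret

multiply:
    mov  %rdi, %rax
    imul %rsi, %rax
    ret
```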
So as it stands, your first call to multiply has correct stack alignment, but your second does not. Of course, it's only of academic interest in this case, because multiply doesn't do anything that needs stack alignment (it doesn't even use the stack at all), but it's good practice to do it right.
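For contrast, a callee that does depend on the alignment might look like the sketch below. The function needs_alignment is hypothetical; it stores an SSE register to its stack frame with movaps, which requires a 16-byte aligned address. With an aligned caller this works. If the caller made the call with a misaligned stack (for example after one extra 8-byte push), the store would fault, which on Linux shows up as a segmentation fault.

```asm
.globl _start
_start:
    # %rsp is 16-byte aligned here (ABI section 3.4)
    call needs_alignment     # aligned at the call, as the ABI requires
    xor  %edi, %edi          # exit status 0
    mov  $60, %eax           # exit syscall number
    syscall

# Hypothetical callee that relies on the caller's stack alignment.
needs_alignment:
    # On entry %rsp = 16n + 8, because call pushed the return address.
    sub    $24, %rsp         # 8 + 24 = 32, so %rsp is a multiple of 16 again
    pxor   %xmm0, %xmm0      # zero %xmm0
    movaps %xmm0, (%rsp)     # aligned 16-byte store; faults if misaligned
    add    $24, %rsp         # release the frame
    ret
```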
One way to fix it would be to subtract another 8 bytes from the stack pointer before the second call, either with sub $8, %rsp or (more efficiently) by simply pushing any 64-bit register. But why should we bother to use the stack at all to save this value? We could simply put it in one of the callee-saved registers, say %rbx, which we know multiply must preserve. Normally this would require us to save and restore the contents of this register, but since we are in the special case of being _start, we don't have to.
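For comparison, the stack-based fix could be sketched as follows. This is only an illustration of the padding variant, not the recommended version; the extra 8 bytes exist purely to restore alignment and are discarded after the call.

```asm
.globl _start
_start:
    mov  $2, %edi
    mov  $3, %esi
    call multiply        # %rsp aligned
    push %rax            # save the first product; %rsp now off by 8
    sub  $8, %rsp        # padding; pushing any register would also work
    mov  $7, %edi
    mov  $9, %esi
    call multiply        # %rsp aligned again
    add  $8, %rsp        # drop the padding
    pop  %rdi            # first product
    add  %rax, %rdi      # exit status: 6 + 63
    mov  $60, %eax       # exit syscall number
    syscall

multiply:
    mov  %rdi, %rax
    imul %rsi, %rax
    ret
```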
A separate comment is that you have a lot of instructions like mov $7, %rdi where you operate on 64-bit registers. This would be better written as mov $7, %edi. Recall that every write to a 32-bit register will zero the upper half of the corresponding 64-bit register, so the effect is the same as long as your constant fits in unsigned 32 bits, and the encoding of mov $7, %edi is shorter: it doesn't need a REX prefix, and the assembler can use the compact opcode form (5 bytes instead of the 7 that GNU as emits for mov $7, %rdi).
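The size difference can be checked by assembling a fragment like the one below and disassembling it with objdump -d. The byte sequences in the comments are what I would expect GNU as to emit without optimization options; treat them as an illustration to verify rather than a guarantee. The REX prefix accounts for one byte of the difference, and the shorter opcode form chosen for the 32-bit destination accounts for the other.

```asm
# For inspecting encodings only; not meant to be linked and run.
.text
    mov  $7, %rdi        # 48 c7 c7 07 00 00 00   REX.W + C7 /0 + imm32 (7 bytes)
    mov  $7, %edi        # bf 07 00 00 00         B8+r + imm32 (5 bytes)
```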
So I'd revise your code as
.globl _start
_start:
# Calculate 2*3 + 7*9 = 6 + 63 = 69
# The multiplication will be done with a separate function call
# Parameters passed in System V ABI
# The first 6 integer/pointer arguments are passed in:
# %rdi, %rsi, %rdx, %rcx, %r8, and %r9
# The return value is passed in %rax
# multiply(2, 3)
# Part 1 --> Load the parameters
mov $2, %edi
mov $3, %esi
# Part 2 --> Call the function (`push` return address onto stack and `jmp` to function label)
call multiply
# Part 3 --> Save the return value
mov %rax, %rbx # could also do mov %eax, %ebx if you know the result fits in 32 bits
# multiply(7, 9)
mov $7, %edi
mov $9, %esi
call multiply
# Add the two together
add %rbx, %rax
mov %rax, %rdi
# for the 64-bit calling convention, do syscall instead of int 0x80
# use %rdi instead of %rbx for the exit arg
# use $60 instead of 1 for the exit code
mov $60, %eax # use the `_exit` [fast] syscall
# rdi contains our exit code
syscall # make syscall
multiply:
mov %rdi, %rax
imul %rsi, %rax
ret
If you want to rely on the result of multiply fitting in 32 bits, you could replace mov %rax, %rbx with mov %eax, %ebx to save one byte. And likewise, the "Add the two together" step could use 32-bit instructions instead to save two more bytes.
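The savings come from dropping the one-byte REX.W prefix on each instruction. A fragment like the following can be assembled and disassembled with objdump -d to confirm; the bytes in the comments are the encodings I would expect from GNU as and are worth checking on your own toolchain.

```asm
# For inspecting encodings only; not meant to be linked and run.
.text
    mov  %rax, %rbx      # 48 89 c3
    mov  %eax, %ebx      # 89 c3
    add  %rbx, %rax      # 48 01 d8
    add  %ebx, %eax      # 01 d8
    mov  %rax, %rdi      # 48 89 c7
    mov  %eax, %edi      # 89 c7
```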
Finally, there's a stylistic point on whether to use the AT&T-syntax operand size suffixes, like addq versus add. They are optional when one operand is a register, since the operand size can be deduced from the size of that register (e.g. 32 bits for %eax, 64 bits for %rax, etc). My personal preference is to always use them, as a little extra verification that you're really writing what you mean, but omitting them as you (mostly) did is also common and fine; just be consistent. You did have one instance of movq $60, %rax where it wasn't needed, so for consistency I omitted the suffix there. (I also changed it to %eax for the reasons noted above.)
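A small sketch of where the suffix is optional and where it is not. The program is made up for illustration: it stores a zero to scratch stack space and exits with it as the status. With an immediate source and a memory destination there is no register to infer the size from, so the suffix has to be written. A mismatched pair, such as movw with %eax, should be rejected by the assembler, which is the extra verification mentioned above.

```asm
.globl _start
_start:
    sub  $16, %rsp       # scratch space; keeps %rsp 16-byte aligned
    mov  $60, %eax       # size inferred from %eax
    movl $60, %eax       # same instruction with the size spelled out
    movq $0, (%rsp)      # suffix required: no register operand to infer from
    mov  (%rsp), %rdi    # size inferred from %rdi; exit status 0
    syscall
```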
- thanks for your thorough response and all the suggestions. A few questions: (1) what is the main reason for the 16-byte alignment requirement: is it mainly to support packed types? You mentioned that my code doesn't need that, but where might a function need stack alignment? (2) Is there another place to view the ABI, for example as a saved pdf (that github link requires me to build the latex files with a few additional libraries I don't have)? – samuelbrody1249, Sep 1, 2020 at 3:36
- (2) There's a copy here which is not the latest but should be close enough. stackoverflow.com/questions/18133812/… may eventually have better links. – Nate Eldredge, Sep 1, 2020 at 3:43
- (1) Sort of. There are SSE and later instructions which require aligned data and fault otherwise. Most of them are intended for packed data but are often used for other purposes, e.g. fast memory copying. But in order to use them with stack data, you need to know how the stack is aligned. See also stackoverflow.com/questions/63440410/…. – Nate Eldredge, Sep 1, 2020 at 3:47
- if interested, I posted a continuation question from the above: codereview.stackexchange.com/questions/249198/… – samuelbrody1249, Sep 11, 2020 at 5:24