Writing a recursive factorial function in x86-64

Question 1

The following assembly program works to print the factorial of a number:

SYS_EXIT = 60
.globl _start
_start:
 # run 4! --> 4*3*2*1 = 24
 mov 4,ドル %edi 
 call factorial
 mov %eax, %edi
 mov $SYS_EXIT, %eax
 syscall
.type factorial @function
factorial:
 # if it is the base case, return 1 and exit
 cmp 1,ドル %edi 
 jne _factorial
 mov 1,ドル %eax
 ret
_factorial:
 # if not the base case, set up the stack frame
 push %rbp
 mov %rsp, %rbp
 push %rdi # move the current value or n to the stack, 
 dec %rdi # so we can pop it later and multiple by the factorial(n-1) function
 call factorial
 pop %rbx
 mul %rbx # multiples eax (return of factorial) by previoud rdi (n)
 # clean up the stack frame
 mov %rbp, %rsp
 pop %rbp
 ret

Here is an example output:

$ as factorial.s -o factorial.o && ld factorial.o -o factorial && ./factorial; echo $?
24

How does the program look? Any feedback would be greatly appreciated!

Question 2

Input Handling

At least at first glance, it looks like this doesn't handle the factorial of zero correctly. 0! is equal to 1, so fixing it is pretty trivial, by changing jne _factorial to ja _factorial:

# if it is the base case, return 1 and exit
cmp 1,ドル %edi 
ja _factorial
mov 1,ドル %eax
ret

Since factorial isn't (at least normally) defined for negative numbers, I've treated the input as unsigned. If you want to treat it as signed, you'd use jg instead of ja.

Register Usage

mul produces a result in edx:eax, not just in eax, so you normally want to clear edx before you start doing your multiplications.

Stack Frame

I would rewrite the function a bit to use an internal function for the recursion. That internal function would use a purely register-based calling convention to avoid setting up a stack frame for every invocation.

Using Intel syntax, I'd write the code something on this general order:

; when first called, input value in edi
; and edx:eax containing 0:1
; result: factorial in edx:eax
;
internal_factorial:
 mul eax, edi
 sub edi, 1
 jz done
 call internal_factorial
done:
 ret

Then the main routine would be something on this general order:

factorial:
 mov eax, 1 ; prepare our 64-bit result value in edx:eax
 xor edx, edx
 cmp edi, eax ; check for (either) base case
 jbe do_ret ; if it's not a base case, compute the factorial
 call internal_factorial
do_ret:
 ret

Question 3

Why are you zeroing edx? It is never read, only written to (by the mul), in the OP, and is otherwise completely unused in your version.

Question 4

@1201ProgramAlarm: Clearing EDX allows the caller to actually get a result in edx:eax instead of only in eax. Depending on the prior state of the processor, this can also actually improve speed. The processor detects clearing a register (such as with xor reg, reg or sub reg, reg) so afterwards it knows operations that affect that register don't depend on its prior state, so it can execute the two sets of operations in parallel.

Question 5

One-operand mul overwrites rdx, the previous state of it doesn't matter. It's div before which rdx should usually be zeroed

Jerry Coffin Jerry Coffin 34.1k4 gold badges77 silver badges144 bronze badges · Answer 1 · 2021-01-31 23:56:52Z

Input Handling

At least at first glance, it looks like this doesn't handle the factorial of zero correctly. 0! is equal to 1, so fixing it is pretty trivial, by changing jne _factorial to ja _factorial:

# if it is the base case, return 1 and exit
cmp 1,ドル %edi 
ja _factorial
mov 1,ドル %eax
ret

Since factorial isn't (at least normally) defined for negative numbers, I've treated the input as unsigned. If you want to treat it as signed, you'd use jg instead of ja.

Register Usage

mul produces a result in edx:eax, not just in eax, so you normally want to clear edx before you start doing your multiplications.

Stack Frame

I would rewrite the function a bit to use an internal function for the recursion. That internal function would use a purely register-based calling convention to avoid setting up a stack frame for every invocation.

Using Intel syntax, I'd write the code something on this general order:

; when first called, input value in edi
; and edx:eax containing 0:1
; result: factorial in edx:eax
;
internal_factorial:
 mul eax, edi
 sub edi, 1
 jz done
 call internal_factorial
done:
 ret

Then the main routine would be something on this general order:

factorial:
 mov eax, 1 ; prepare our 64-bit result value in edx:eax
 xor edx, edx
 cmp edi, eax ; check for (either) base case
 jbe do_ret ; if it's not a base case, compute the factorial
 call internal_factorial
do_ret:
 ret

Why are you zeroing edx? It is never read, only written to (by the mul), in the OP, and is otherwise completely unused in your version.
@1201ProgramAlarm: Clearing EDX allows the caller to actually get a result in edx:eax instead of only in eax. Depending on the prior state of the processor, this can also actually improve speed. The processor detects clearing a register (such as with xor reg, reg or sub reg, reg) so afterwards it knows operations that affect that register don't depend on its prior state, so it can execute the two sets of operations in parallel.
One-operand mul overwrites rdx, the previous state of it doesn't matter. It's div before which rdx should usually be zeroed

Stack Exchange Network

Writing a recursive factorial function in x86-64

1 Answer 1

Input Handling

Register Usage

Stack Frame

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Writing a recursive factorial function in x86-64

1 Answer 1

Input Handling

Register Usage

Stack Frame

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions