MASM) assembly implementation of recursive Fibonacci

Question 1

I'm trying to teach myself assembly via MASM, below is my first implementation of the recursive form of the fibonacci sequence. I timed it against an equivalent C implementation in release mode and the C implementation was much better optimized. What can I do better? I haven't gotten the hang of how the various side effects of instructions impact the state of registers or flags yet, or, well, most things in assembly.

Assembly:

;extern "C" unsigned long fibonacci_asm_canonical(unsigned long number)
fibonacci_asm_canonical proc number: dword
 LOCAL fib_lhs:DWORD
 mov eax, number ; store number parameter in eax register (also used as return value).
 .IF eax > 1
 ; recursively call this function, retrieving the value for the left-hand side.
 dec eax
 mov edx, eax
 invoke fibonacci_asm_canonical, edx
 mov fib_lhs, eax ; store the result of fibonacci_asm_canonical(number - 1) into fib_lhs.
 mov eax, number ; store number parameter in eax register to set up right-hand side of addition.
 ; recursively call this function, retrieving the value for the right-hand side.
 sub eax, 2 
 invoke fibonacci_asm_canonical, eax
 ;eax now contains result of fibonacci_asm_canonical(number - 2), following the invocation,
 ;so add it with the result of fibonacci_asm_canonical(number - 1) which is in fib_lhs.
 add eax, fib_lhs
 .ENDIF
 ret
fibonacci_asm_canonical endp

C version:

unsigned long fib_canonical(unsigned long number)
{
 if (number <= 1)
 return number;
 return fib_canonical(number - 1) + fib_canonical(number - 2);
}
```

Question 2

There is not much to micro-optimize. Function calls are relatively expensive, same as lots of loads/stores into the stack. But calls & stack usage are result of choosing recursive solution. Number of iterations is exponential, for n=20 it performs in the ballpark of a million iterations. If you were to implement iterative solution it would only need 20 iterations (for n=20). Iterative solution may also be more susceptible to micro-optimization.

Question 3

dec eax
mov edx, eax

This you can write as the single instruction lea edx, [eax-1], but your code doesn't need to use the extra register EDX at all. The single dec eax is enough.

If you really want to optimize code then don't use constructs like .IF eax > 1 .ENDIF (and many others) for which you don't necessarily know how they get encoded.

invoke, proc, local, and endp will very probably entail that prologue and epilogue codes are inserted. This is fine for the interfacing with the main program but is demanding on the recursion.

push ebp ; Prologue
mov ebp, esp
sub esp, 4 ; Local
...
mov esp, ebp ; Epilogue
pop ebp
ret 4 ; Parameter removal

With a separate recursive routine DoFib, you can:

avoid using the above and rely on the faster [esp] relative addressing.
reuse the argument from the first recursive call. All it takes is a second dec on the already decremented value.
pass the single dword argument via the EAX register instead of on the stack.

;extern "C" unsigned long fibonacci_asm_canonical(unsigned long number)
fibonacci_asm_canonical proc number: dword
 mov eax, number
 call DoFib ; -> EAX
 ret
fibonacci_asm_canonical endp
; IN (eax) OUT (eax)
DoFib:
 cmp eax, 1
 jbe .done
 sub esp, 8 ; Room for local storage
 dec eax
 mov [esp], eax ; (number - 1)
 call DoFib ; -> EAX
 mov [esp+4], eax ; Store lhs result
 mov eax, [esp] ; (number - 1)
 dec eax ; (number - 2)
 call DoFib ; -> EAX
 add eax, [esp+4] ; rhs + lhs
 add esp, 8
 .done:
 ret

Sep Roland Sep Roland 4,78317 silver badges28 bronze badges · Answer 1 · 2021-03-07 20:13:51Z

dec eax
mov edx, eax

This you can write as the single instruction lea edx, [eax-1], but your code doesn't need to use the extra register EDX at all. The single dec eax is enough.

If you really want to optimize code then don't use constructs like .IF eax > 1 .ENDIF (and many others) for which you don't necessarily know how they get encoded.

invoke, proc, local, and endp will very probably entail that prologue and epilogue codes are inserted. This is fine for the interfacing with the main program but is demanding on the recursion.

push ebp ; Prologue
mov ebp, esp
sub esp, 4 ; Local
...
mov esp, ebp ; Epilogue
pop ebp
ret 4 ; Parameter removal

With a separate recursive routine DoFib, you can:

avoid using the above and rely on the faster [esp] relative addressing.
reuse the argument from the first recursive call. All it takes is a second dec on the already decremented value.
pass the single dword argument via the EAX register instead of on the stack.

;extern "C" unsigned long fibonacci_asm_canonical(unsigned long number)
fibonacci_asm_canonical proc number: dword
 mov eax, number
 call DoFib ; -> EAX
 ret
fibonacci_asm_canonical endp
; IN (eax) OUT (eax)
DoFib:
 cmp eax, 1
 jbe .done
 sub esp, 8 ; Room for local storage
 dec eax
 mov [esp], eax ; (number - 1)
 call DoFib ; -> EAX
 mov [esp+4], eax ; Store lhs result
 mov eax, [esp] ; (number - 1)
 dec eax ; (number - 2)
 call DoFib ; -> EAX
 add eax, [esp+4] ; rhs + lhs
 add esp, 8
 .done:
 ret

Stack Exchange Network

MASM) assembly implementation of recursive Fibonacci

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

MASM) assembly implementation of recursive Fibonacci

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions