I'm trying to teach myself assembly via MASM, below is my first implementation of the recursive form of the fibonacci sequence. I timed it against an equivalent C implementation in release mode and the C implementation was much better optimized. What can I do better? I haven't gotten the hang of how the various side effects of instructions impact the state of registers or flags yet, or, well, most things in assembly.
Assembly:
;extern "C" unsigned long fibonacci_asm_canonical(unsigned long number)
fibonacci_asm_canonical proc number: dword
LOCAL fib_lhs:DWORD
mov eax, number ; store number parameter in eax register (also used as return value).
.IF eax > 1
; recursively call this function, retrieving the value for the left-hand side.
dec eax
mov edx, eax
invoke fibonacci_asm_canonical, edx
mov fib_lhs, eax ; store the result of fibonacci_asm_canonical(number - 1) into fib_lhs.
mov eax, number ; store number parameter in eax register to set up right-hand side of addition.
; recursively call this function, retrieving the value for the right-hand side.
sub eax, 2
invoke fibonacci_asm_canonical, eax
;eax now contains result of fibonacci_asm_canonical(number - 2), following the invocation,
;so add it with the result of fibonacci_asm_canonical(number - 1) which is in fib_lhs.
add eax, fib_lhs
.ENDIF
ret
fibonacci_asm_canonical endp
C version:
unsigned long fib_canonical(unsigned long number)
{
if (number <= 1)
return number;
return fib_canonical(number - 1) + fib_canonical(number - 2);
}
```
-
1\$\begingroup\$ There is not much to micro-optimize. Function calls are relatively expensive, same as lots of loads/stores into the stack. But calls & stack usage are result of choosing recursive solution. Number of iterations is exponential, for n=20 it performs in the ballpark of a million iterations. If you were to implement iterative solution it would only need 20 iterations (for n=20). Iterative solution may also be more susceptible to micro-optimization. \$\endgroup\$stepan– stepan2021年02月18日 16:35:33 +00:00Commented Feb 18, 2021 at 16:35
1 Answer 1
dec eax mov edx, eax
This you can write as the single instruction lea edx, [eax-1]
, but your code doesn't need to use the extra register EDX
at all. The single dec eax
is enough.
If you really want to optimize code then don't use constructs like .IF eax > 1
.ENDIF
(and many others) for which you don't necessarily know how they get encoded.
invoke
, proc
, local
, and endp
will very probably entail that prologue and epilogue codes are inserted. This is fine for the interfacing with the main program but is demanding on the recursion.
push ebp ; Prologue
mov ebp, esp
sub esp, 4 ; Local
...
mov esp, ebp ; Epilogue
pop ebp
ret 4 ; Parameter removal
With a separate recursive routine DoFib, you can:
- avoid using the above and rely on the faster
[esp]
relative addressing. - reuse the argument from the first recursive call. All it takes is a second
dec
on the already decremented value. - pass the single dword argument via the
EAX
register instead of on the stack.
;extern "C" unsigned long fibonacci_asm_canonical(unsigned long number)
fibonacci_asm_canonical proc number: dword
mov eax, number
call DoFib ; -> EAX
ret
fibonacci_asm_canonical endp
; IN (eax) OUT (eax)
DoFib:
cmp eax, 1
jbe .done
sub esp, 8 ; Room for local storage
dec eax
mov [esp], eax ; (number - 1)
call DoFib ; -> EAX
mov [esp+4], eax ; Store lhs result
mov eax, [esp] ; (number - 1)
dec eax ; (number - 2)
call DoFib ; -> EAX
add eax, [esp+4] ; rhs + lhs
add esp, 8
.done:
ret