Dynamic Fibonacci algorithm in x86 (MASM)

Question 1

My task was to implement this algorithm that uses memoization to calculate fibonnaci numbers:

int fib(int n)
{
 int f[n+2]; // 1 extra to handle case, n = 0
 int i;
 /* 0th and 1st number of the series are 0 and 1*/
 f[0] = 0;
 f[1] = 1;
 for (i = 2; i <= n; i++)
 {
 f[i] = f[i-1] + f[i-2];
 }
 return f[n];
}

This proved to be more challenging because it involves local variables. What's more, it's a local array with a variable initial size.

;*******************************************************************************
; This program calculates fibonacci numbers with a dynamic programming
; algorithm.
;
; Author: William 
 .386 
 .model flat, stdcall
 .stack 4096
ExitProcess PROTO, dwExitCode:DWORD 
.code
;-------------------------------------------------------------------------------
; FibAsm( int n )
;
; Calculates fibonacci numbers using memoizaiton
;
; Entry: n @ ebp + 8 ; DWORD unsigned int up to 47
; 
; Local: f[] ; array to store fibs
;
; Exit: the fib number is returned in eax
;-------------------------------------------------------------------------------
FibAsm:
 ; set up stack frame
 push ebp
 mov ebp, esp
 push ebx
 push ecx
 push esi
 push edi
 mov ebx, [ebp + 8] ; n stored in ebx
 ; new f[] of size n + 2 
 lea edi, [ebx + 2] ; save size of array in edi
 shl edi, 2 ; mul by 4, size of DWORD int
 sub esp, edi ; allocate space for f[]
 ; pointer to f[] in esp !
 ; f[0] = 0 and f[1] = 1 
 mov DWORD PTR[esp], 0
 mov DWORD PTR[esp + 1 * 4], 1 
 ; for (i = 2; i <=n; i++)
 mov ecx, 2 ;loop count i set to 2
L1: cmp ecx, ebx
 jg endL1
 ; f[i] = f[i - 1] + f[i - 2]
 lea eax, [ecx - 1]
 mov esi, DWORD PTR[esp + eax * 4]
 dec eax
 add esi, DWORD PTR[esp + eax * 4]
 mov [esp + ecx * 4], esi
 inc ecx ;inc i
 jmp L1
endL1: 
 ; return f[n]
 mov eax, DWORD PTR[esp + ebx * 4]
 ;clean up stack, restore regs 
 add esp, edi ; add array size to esp
 pop edi
 pop esi
 pop ecx
 pop ebx
 pop ebp
 ret 4
main:
 push 47
 call FibAsm
 INVOKE ExitProcess, 0 
 END main

What I ended up doing after a long time playing with the stack frame was to calculate the size first and save that in a register in order to clean up the stack later. Also I used the esp register as the start of the array instead of an offset from ebp. That seems counter intuitive, but it was simpler than calculating some distance from ebp (which would have to account for all the registers than were pushed to the stack after ebp).

Anyway, I'm wondering how this is normally done and looking for any other criticism as well.

Question 2

I'm wondering how this is normally done

Yours is as good a solution as anyone else's.
That said, a solution that pushes the elements would not need to address these elements relative to EBP at all.

; new f[] of size n + 2 
lea edi, [ebx + 2] ; save size of array in edi
shl edi, 2 ; mul by 4, size of DWORD int

You can calculate the required stackspace in a single LEA instruction:

; new f[] of size n + 2 
lea edi, [ebx * 4 + 8]

; f[0] = 0 and f[1] = 1 
mov DWORD PTR[esp], 0
mov DWORD PTR[esp + 1 * 4], 1 
; for (i = 2; i <=n; i++)
mov ecx, 2 ; loop count i set to 2

All of these assignments require a 32-bit immediate value. That's a lot of bytes! Because later we need ECX=2, we can start with clearing ECX and incrementing twice towards 2, storing the intermediate results in memory:

; f[0] = 0 and f[1] = 1 
xor ecx, ecx
mov [esp + 0 * 4], ecx ; f[0] = 0
inc ecx
mov [esp + 1 * 4], ecx ; f[1] = 1
inc ecx ; loop count i set to 2

But wait, there still room for improvement!

The address [esp + 0 * 4] requires a modR/M-byte, sib-byte and an 8-bit displacement.
The address [esp + ecx * 4] requires a modR/M-byte and a sib-byte.

Final draft:

; f[0] = 0 and f[1] = 1 
xor ecx, ecx
mov [esp + ecx * 4], ecx ; f[0] = 0
inc ecx
mov [esp + ecx * 4], ecx ; f[1] = 1
inc ecx ; loop count i set to 2

; f[i] = f[i - 1] + f[i - 2]
lea eax, [ecx - 1]
mov esi, DWORD PTR[esp + eax * 4]
dec eax
add esi, DWORD PTR[esp + eax * 4]
mov [esp + ecx * 4], esi

You can easily address the 2 previous array elements by adding offsets -4 and -8.
This liberates the ESI register that also no longer needs to be preserved.

; f[i] = f[i - 1] + f[i - 2]
mov eax, [esp + ecx * 4 - 8] ; f[i - 2]
add eax, [esp + ecx * 4 - 4] ; f[i - 1]
mov [esp + ecx * 4], eax ; f[i]

 mov ecx, 2
L1: cmp ecx, ebx
 jg endL1
 ...
 inc ecx ; inc i
 jmp L1
endL1:

A loop like this is wasteful because of the uncondional jump that gets executed on every iteration, on top of the conditional jump that needs to be processed anyway. You can make the conditional jump the only jump that gets executed repeatedly.

 mov ecx, 2
 jmps L2 ; once
L1: ...
 inc ecx ; inc i
L2: cmp ecx, ebx
 jbe L1 ; repeatedly

Sep Roland Sep Roland 4,78317 silver badges28 bronze badges · Accepted Answer · 2018-04-15 15:24:50Z

I'm wondering how this is normally done

Yours is as good a solution as anyone else's.
That said, a solution that pushes the elements would not need to address these elements relative to EBP at all.

; new f[] of size n + 2 
lea edi, [ebx + 2] ; save size of array in edi
shl edi, 2 ; mul by 4, size of DWORD int

You can calculate the required stackspace in a single LEA instruction:

; new f[] of size n + 2 
lea edi, [ebx * 4 + 8]

; f[0] = 0 and f[1] = 1 
mov DWORD PTR[esp], 0
mov DWORD PTR[esp + 1 * 4], 1 
; for (i = 2; i <=n; i++)
mov ecx, 2 ; loop count i set to 2

All of these assignments require a 32-bit immediate value. That's a lot of bytes! Because later we need ECX=2, we can start with clearing ECX and incrementing twice towards 2, storing the intermediate results in memory:

; f[0] = 0 and f[1] = 1 
xor ecx, ecx
mov [esp + 0 * 4], ecx ; f[0] = 0
inc ecx
mov [esp + 1 * 4], ecx ; f[1] = 1
inc ecx ; loop count i set to 2

But wait, there still room for improvement!

The address [esp + 0 * 4] requires a modR/M-byte, sib-byte and an 8-bit displacement.
The address [esp + ecx * 4] requires a modR/M-byte and a sib-byte.

Final draft:

; f[0] = 0 and f[1] = 1 
xor ecx, ecx
mov [esp + ecx * 4], ecx ; f[0] = 0
inc ecx
mov [esp + ecx * 4], ecx ; f[1] = 1
inc ecx ; loop count i set to 2

; f[i] = f[i - 1] + f[i - 2]
lea eax, [ecx - 1]
mov esi, DWORD PTR[esp + eax * 4]
dec eax
add esi, DWORD PTR[esp + eax * 4]
mov [esp + ecx * 4], esi

You can easily address the 2 previous array elements by adding offsets -4 and -8.
This liberates the ESI register that also no longer needs to be preserved.

; f[i] = f[i - 1] + f[i - 2]
mov eax, [esp + ecx * 4 - 8] ; f[i - 2]
add eax, [esp + ecx * 4 - 4] ; f[i - 1]
mov [esp + ecx * 4], eax ; f[i]

 mov ecx, 2
L1: cmp ecx, ebx
 jg endL1
 ...
 inc ecx ; inc i
 jmp L1
endL1:

A loop like this is wasteful because of the uncondional jump that gets executed on every iteration, on top of the conditional jump that needs to be processed anyway. You can make the conditional jump the only jump that gets executed repeatedly.

 mov ecx, 2
 jmps L2 ; once
L1: ...
 inc ecx ; inc i
L2: cmp ecx, ebx
 jbe L1 ; repeatedly

Stack Exchange Network

Dynamic Fibonacci algorithm in x86 (MASM)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Dynamic Fibonacci algorithm in x86 (MASM)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions