My task was to implement this algorithm that uses memoization to calculate fibonnaci numbers:
int fib(int n)
{
int f[n+2]; // 1 extra to handle case, n = 0
int i;
/* 0th and 1st number of the series are 0 and 1*/
f[0] = 0;
f[1] = 1;
for (i = 2; i <= n; i++)
{
f[i] = f[i-1] + f[i-2];
}
return f[n];
}
This proved to be more challenging because it involves local variables. What's more, it's a local array with a variable initial size.
;*******************************************************************************
; This program calculates fibonacci numbers with a dynamic programming
; algorithm.
;
; Author: William
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD
.code
;-------------------------------------------------------------------------------
; FibAsm( int n )
;
; Calculates fibonacci numbers using memoizaiton
;
; Entry: n @ ebp + 8 ; DWORD unsigned int up to 47
;
; Local: f[] ; array to store fibs
;
; Exit: the fib number is returned in eax
;-------------------------------------------------------------------------------
FibAsm:
; set up stack frame
push ebp
mov ebp, esp
push ebx
push ecx
push esi
push edi
mov ebx, [ebp + 8] ; n stored in ebx
; new f[] of size n + 2
lea edi, [ebx + 2] ; save size of array in edi
shl edi, 2 ; mul by 4, size of DWORD int
sub esp, edi ; allocate space for f[]
; pointer to f[] in esp !
; f[0] = 0 and f[1] = 1
mov DWORD PTR[esp], 0
mov DWORD PTR[esp + 1 * 4], 1
; for (i = 2; i <=n; i++)
mov ecx, 2 ;loop count i set to 2
L1: cmp ecx, ebx
jg endL1
; f[i] = f[i - 1] + f[i - 2]
lea eax, [ecx - 1]
mov esi, DWORD PTR[esp + eax * 4]
dec eax
add esi, DWORD PTR[esp + eax * 4]
mov [esp + ecx * 4], esi
inc ecx ;inc i
jmp L1
endL1:
; return f[n]
mov eax, DWORD PTR[esp + ebx * 4]
;clean up stack, restore regs
add esp, edi ; add array size to esp
pop edi
pop esi
pop ecx
pop ebx
pop ebp
ret 4
main:
push 47
call FibAsm
INVOKE ExitProcess, 0
END main
What I ended up doing after a long time playing with the stack frame was to calculate the size first and save that in a register in order to clean up the stack later. Also I used the esp
register as the start of the array instead of an offset from ebp
. That seems counter intuitive, but it was simpler than calculating some distance from ebp
(which would have to account for all the registers than were pushed to the stack after ebp
).
Anyway, I'm wondering how this is normally done and looking for any other criticism as well.
1 Answer 1
I'm wondering how this is normally done
Yours is as good a solution as anyone else's.
That said, a solution that pushes the elements would not need to address these elements relative to EBP
at all.
; new f[] of size n + 2 lea edi, [ebx + 2] ; save size of array in edi shl edi, 2 ; mul by 4, size of DWORD int
You can calculate the required stackspace in a single LEA
instruction:
; new f[] of size n + 2
lea edi, [ebx * 4 + 8]
; f[0] = 0 and f[1] = 1 mov DWORD PTR[esp], 0 mov DWORD PTR[esp + 1 * 4], 1 ; for (i = 2; i <=n; i++) mov ecx, 2 ; loop count i set to 2
All of these assignments require a 32-bit immediate value. That's a lot of bytes! Because later we need ECX=2
, we can start with clearing ECX
and incrementing twice towards 2, storing the intermediate results in memory:
; f[0] = 0 and f[1] = 1
xor ecx, ecx
mov [esp + 0 * 4], ecx ; f[0] = 0
inc ecx
mov [esp + 1 * 4], ecx ; f[1] = 1
inc ecx ; loop count i set to 2
But wait, there still room for improvement!
- The address
[esp + 0 * 4]
requires a modR/M-byte, sib-byte and an 8-bit displacement. - The address
[esp + ecx * 4]
requires a modR/M-byte and a sib-byte.
Final draft:
; f[0] = 0 and f[1] = 1
xor ecx, ecx
mov [esp + ecx * 4], ecx ; f[0] = 0
inc ecx
mov [esp + ecx * 4], ecx ; f[1] = 1
inc ecx ; loop count i set to 2
; f[i] = f[i - 1] + f[i - 2] lea eax, [ecx - 1] mov esi, DWORD PTR[esp + eax * 4] dec eax add esi, DWORD PTR[esp + eax * 4] mov [esp + ecx * 4], esi
You can easily address the 2 previous array elements by adding offsets -4 and -8.
This liberates the ESI
register that also no longer needs to be preserved.
; f[i] = f[i - 1] + f[i - 2]
mov eax, [esp + ecx * 4 - 8] ; f[i - 2]
add eax, [esp + ecx * 4 - 4] ; f[i - 1]
mov [esp + ecx * 4], eax ; f[i]
mov ecx, 2 L1: cmp ecx, ebx jg endL1 ... inc ecx ; inc i jmp L1 endL1:
A loop like this is wasteful because of the uncondional jump that gets executed on every iteration, on top of the conditional jump that needs to be processed anyway. You can make the conditional jump the only jump that gets executed repeatedly.
mov ecx, 2
jmps L2 ; once
L1: ...
inc ecx ; inc i
L2: cmp ecx, ebx
jbe L1 ; repeatedly
Explore related questions
See similar questions with these tags.