I have the following function which I wrote in NASM to merge two sorted arrays. Its not a standalone, but its supposed to be compiled into a .o
file and then linked to a .c
file from which the function can be called with the following signature:
void merge(int* arr1, int size1, int* arr2, int size2, int* merged_array)
I have tested it and my code works as intended. However, I am somewhat new to assembly and I don't know if I'm following best practices with this code. Also, I timed it against a function that I wrote in C to do the same thing, and the C function consistently outperforms it for large arrays. So I'm wondering if there's any obvious sections of it I can rewrite to improve performance.
global merge
merge: ; entry point for function merge
push ebp
mov ebp, esp
mov eax,[esp+8] ; pointer to array 1
mov esi,[esp+12] ; length of array 1
mov ebx,[esp+16] ; pointer to array 2
mov edi,[esp+20] ; length of array 2
mov edx,[esp+24] ; pointer to destination array (pre-allocated)
begin_merge:
test esi,esi ; if we've run out of array 1
jz copy2 ; just copy over array 2
test edi,edi ; if we've run out of array 2
jz copy1 ; just copy over array 1
;otherwise, compare the first elements
mov ecx,[eax]
cmp ecx,[ebx]
jl merge_from_1 ; if eax > ebx, merge from array 1
jmp merge_from_2; otherwise keep going for the merge from array 2
; merge a single element from an array
merge_from_2:
mov ecx,[ebx]
mov [edx],ecx
add edx,4
add ebx,4
dec edi
jmp begin_merge
merge_from_1:
mov ecx,[eax]
mov [edx],ecx
add edx,4
add eax,4
dec esi
jmp begin_merge
; copy over an entire array
copy1:
test esi,esi ; if we've run out of array 1
jz finish ; end
; otherwise, copy an element from array 1
jmp merge_from_1 ; this will return to begin_merge, but thats okay
; because begin_merge will just forward it on to here again
copy2:
test edi,edi ; if we've run out of array 2
jz finish ; end
; otherwise, copy an element from array 2
jmp merge_from_2 ; this will return to begin_merge, but thats okay
; because begin_merge will just forward it on to here again
finish:
leave
ret
-
\$\begingroup\$ I have rolled back Rev 6 → 4. Code in the question should not be modified after answers have been posted. Please see What to do when someone answers . \$\endgroup\$200_success– 200_success2017年06月16日 18:14:55 +00:00Commented Jun 16, 2017 at 18:14
4 Answers 4
Firstly, the way you are addressing the stack (cdecl calling convention) is wrong. Now because your creating an empty procedure frame, the way you've done it works, but if you were to allocate space on the stack which is usually the reason for a procedure frame, then your method would fail.
push ebp
mov ebp, esp
add esp, -512
or
enter 512, 0
Your method would require
mov edx, [esp+536]
After
void merge(int* arr1, int size1, int* arr2, int size2, int* merged_array)
stacks contents would be as follows
EBP + 24 = arr1
EBP + 20 = size1
EBP + 16 = arr2
EBP + 12 = size2
EBP + 8 = merged_array
EBP + 4 = Return address to calling application
EBP + 0 = Previous contents of EBP
Use the registers that are intended for a particular purpose. Yes, all 8 aregeneral purpose, but they also facilitate tighter code when used correctly.
mov edi, [ebp+8] ; Destination buffer
mov esi, [ebp+24] ; Pntr to arr1
mov ecx, [ebp+20] ; size1
Then it is a simple matter of
repnz movsd
Load parameters into ESI & ECX for array 2 and your done and repeat this instruction and your done.
MOVS(?)
This instruction moves a byte (b) word (w) dword (d) or if your working in 64 bit qword (q) increments EDI & ESI by 1, 2, 4 or 8 respectively and decrements ECX by 1 and continues until ECX = 0. Traversing memory regions can be reversed by setting DF (Direction flag) in EFLAGS.
-
\$\begingroup\$ I see what you are saying, and I understand the movsd instruction now as well, thank you. I didn't know about the calling convention. But should I not move the parameter values from [EBP+??] at all except at the end where I copy over the entire array? Also whats the difference between
EBP
andESP
? \$\endgroup\$Raghav Malik– Raghav Malik2017年06月16日 18:42:35 +00:00Commented Jun 16, 2017 at 18:42 -
\$\begingroup\$ Registers only have to be loaded with values when needed. After the first invocation of MOVSD, EDI is already pointing to next part of the destination buffer, so all that's required is to load ESI & ECX with next set of values. \$\endgroup\$Shift_Left– Shift_Left2017年06月16日 19:16:02 +00:00Commented Jun 16, 2017 at 19:16
-
\$\begingroup\$ Oh I think you misunderstand what I am trying to do in my code... I'm not concatenating the two arrays, I'm merging two sorted arrays so as to maintain the sorted order. But the
movsd
trick is still good, so thank you \$\endgroup\$Raghav Malik– Raghav Malik2017年06月16日 19:19:35 +00:00Commented Jun 16, 2017 at 19:19 -
1\$\begingroup\$ LODSD copies the contents of the memory location pointed to by ESI into EAX. STOSD copies the contents of EAX into the memory location pointed to by EDI. MOVSD does exactly the same thing, but without altering the contents of EAX. MOVS(?) without the REP qualifier does not alter ECX. \$\endgroup\$Shift_Left– Shift_Left2017年06月16日 19:37:07 +00:00Commented Jun 16, 2017 at 19:37
-
1\$\begingroup\$ Let us continue this discussion in chat. \$\endgroup\$Shift_Left– Shift_Left2017年06月16日 19:53:37 +00:00Commented Jun 16, 2017 at 19:53
While OP seemed to find the comments useful, we're supposed to put this stuff in answers. Seems about as sensible as Jeopardy's "put your response in the form of a question." Still, that's the rule, so:
- Why are you doing
jmp merge_from_2
? Where do you think the execution is going to go if you don't do that jmp? - And in copy2, why have 2 jumps, instead of just
jnz merge_from_2
? - Which brings us to
jz copy1
. In copy1, you are going to check if esi is zero. But you just checked that before thejz
. If there no way to get to copy1 with esi zero, then checking for zero there is redundant. And if you remove the test/jz, that just leaves jmp merge_from_1. Why have a jump to a jump?
And one more:
- When you do
dec edi
ordec esi
, thedec
sets the flags. You can do the conditional jump right there. Since you are only decrementing one or the other, jumping back to a place where you test both again seems redundant. If you check the results of the decrement, you don't need to do this. Maybe move thebegin_merge
label down so you only do the 2 tests on entry.
mergesort is supposed to be a stable sorting algorithm. Therefore, it is customary to implement the merge operation such that if an element from array 1 is equal to an element from array 2, then it will prefer to take the element from array 1 first. (If you're comparing integers, then stability doesn't matter, since the result will be indistinguishable. But it's a matter of principle that you should implement the conventional merging algorithm.)
Therefore, I recommend changing
jl merge_from_1
to
jle merge_from_1
Also, the following line (jmp merge_from_2
) is superfluous and can be eliminated.
-
\$\begingroup\$ Thank you. I missed the stability thing. Is there any way I can optimize the code I've written further to improve performance? After making those changes it still runs about twice as slow as the C version. \$\endgroup\$Raghav Malik– Raghav Malik2017年06月16日 17:19:08 +00:00Commented Jun 16, 2017 at 17:19
The copy mechanism is too clever. It adds an overhead of 2 tests and 4 jumps to copy a single unit of data. The right way is to setup ecx
according to how much data left, and rep movsd
them.
-
\$\begingroup\$ Could you please explain that further? I get what you mean about how inefficiently I copy over arrays, but like I said I am somewhat new to NASM and I'm not sure how to do what you described syntactically \$\endgroup\$Raghav Malik– Raghav Malik2017年06月16日 18:08:22 +00:00Commented Jun 16, 2017 at 18:08
Explore related questions
See similar questions with these tags.