Merge two sorted arrays in NASM

Question 1

I have the following function which I wrote in NASM to merge two sorted arrays. Its not a standalone, but its supposed to be compiled into a .o file and then linked to a .c file from which the function can be called with the following signature:

void merge(int* arr1, int size1, int* arr2, int size2, int* merged_array)

I have tested it and my code works as intended. However, I am somewhat new to assembly and I don't know if I'm following best practices with this code. Also, I timed it against a function that I wrote in C to do the same thing, and the C function consistently outperforms it for large arrays. So I'm wondering if there's any obvious sections of it I can rewrite to improve performance.

global merge
merge: ; entry point for function merge
 push ebp
 mov ebp, esp
 mov eax,[esp+8] ; pointer to array 1
 mov esi,[esp+12] ; length of array 1
 mov ebx,[esp+16] ; pointer to array 2
 mov edi,[esp+20] ; length of array 2
 mov edx,[esp+24] ; pointer to destination array (pre-allocated)
begin_merge:
 test esi,esi ; if we've run out of array 1
 jz copy2 ; just copy over array 2
 test edi,edi ; if we've run out of array 2
 jz copy1 ; just copy over array 1
 ;otherwise, compare the first elements
 mov ecx,[eax]
 cmp ecx,[ebx]
 jl merge_from_1 ; if eax > ebx, merge from array 1
 jmp merge_from_2; otherwise keep going for the merge from array 2
; merge a single element from an array
merge_from_2:
 mov ecx,[ebx]
 mov [edx],ecx
 add edx,4
 add ebx,4
 dec edi
 jmp begin_merge
merge_from_1:
 mov ecx,[eax]
 mov [edx],ecx
 add edx,4
 add eax,4
 dec esi
 jmp begin_merge
; copy over an entire array
copy1:
 test esi,esi ; if we've run out of array 1
 jz finish ; end
 ; otherwise, copy an element from array 1
 jmp merge_from_1 ; this will return to begin_merge, but thats okay
 ; because begin_merge will just forward it on to here again
copy2:
 test edi,edi ; if we've run out of array 2
 jz finish ; end
 ; otherwise, copy an element from array 2
 jmp merge_from_2 ; this will return to begin_merge, but thats okay
 ; because begin_merge will just forward it on to here again
finish:
 leave
 ret

Question 2

I have rolled back Rev 6 → 4. Code in the question should not be modified after answers have been posted. Please see What to do when someone answers .

Question 3

Firstly, the way you are addressing the stack (cdecl calling convention) is wrong. Now because your creating an empty procedure frame, the way you've done it works, but if you were to allocate space on the stack which is usually the reason for a procedure frame, then your method would fail.

 push ebp
 mov ebp, esp
 add esp, -512

or

 enter 512, 0

Your method would require

 mov edx, [esp+536]

After

void merge(int* arr1, int size1, int* arr2, int size2, int* merged_array)

stacks contents would be as follows

EBP + 24 = arr1
EBP + 20 = size1
EBP + 16 = arr2
EBP + 12 = size2
EBP + 8 = merged_array
EBP + 4 = Return address to calling application
EBP + 0 = Previous contents of EBP

Use the registers that are intended for a particular purpose. Yes, all 8 aregeneral purpose, but they also facilitate tighter code when used correctly.

 mov edi, [ebp+8] ; Destination buffer
 mov esi, [ebp+24] ; Pntr to arr1
 mov ecx, [ebp+20] ; size1

Then it is a simple matter of

 repnz movsd

Load parameters into ESI & ECX for array 2 and your done and repeat this instruction and your done.

MOVS(?)

This instruction moves a byte (b) word (w) dword (d) or if your working in 64 bit qword (q) increments EDI & ESI by 1, 2, 4 or 8 respectively and decrements ECX by 1 and continues until ECX = 0. Traversing memory regions can be reversed by setting DF (Direction flag) in EFLAGS.

Question 4

I see what you are saying, and I understand the movsd instruction now as well, thank you. I didn't know about the calling convention. But should I not move the parameter values from [EBP+??] at all except at the end where I copy over the entire array? Also whats the difference between EBP and ESP?

Question 5

Registers only have to be loaded with values when needed. After the first invocation of MOVSD, EDI is already pointing to next part of the destination buffer, so all that's required is to load ESI & ECX with next set of values.

Question 6

Oh I think you misunderstand what I am trying to do in my code... I'm not concatenating the two arrays, I'm merging two sorted arrays so as to maintain the sorted order. But the movsd trick is still good, so thank you

Question 7

LODSD copies the contents of the memory location pointed to by ESI into EAX. STOSD copies the contents of EAX into the memory location pointed to by EDI. MOVSD does exactly the same thing, but without altering the contents of EAX. MOVS(?) without the REP qualifier does not alter ECX.

Question 8

Let us continue this discussion in chat.

Question 9

While OP seemed to find the comments useful, we're supposed to put this stuff in answers. Seems about as sensible as Jeopardy's "put your response in the form of a question." Still, that's the rule, so:

Why are you doing jmp merge_from_2? Where do you think the execution is going to go if you don't do that jmp?
And in copy2, why have 2 jumps, instead of just jnz merge_from_2?
Which brings us to jz copy1. In copy1, you are going to check if esi is zero. But you just checked that before the jz. If there no way to get to copy1 with esi zero, then checking for zero there is redundant. And if you remove the test/jz, that just leaves jmp merge_from_1. Why have a jump to a jump?

And one more:

When you do dec edi or dec esi, the dec sets the flags. You can do the conditional jump right there. Since you are only decrementing one or the other, jumping back to a place where you test both again seems redundant. If you check the results of the decrement, you don't need to do this. Maybe move the begin_merge label down so you only do the 2 tests on entry.

Question 10

mergesort is supposed to be a stable sorting algorithm. Therefore, it is customary to implement the merge operation such that if an element from array 1 is equal to an element from array 2, then it will prefer to take the element from array 1 first. (If you're comparing integers, then stability doesn't matter, since the result will be indistinguishable. But it's a matter of principle that you should implement the conventional merging algorithm.)

Therefore, I recommend changing

jl merge_from_1

to

jle merge_from_1

Also, the following line (jmp merge_from_2) is superfluous and can be eliminated.

Question 11

Thank you. I missed the stability thing. Is there any way I can optimize the code I've written further to improve performance? After making those changes it still runs about twice as slow as the C version.

Question 12

The copy mechanism is too clever. It adds an overhead of 2 tests and 4 jumps to copy a single unit of data. The right way is to setup ecx according to how much data left, and rep movsd them.

Question 13

Could you please explain that further? I get what you mean about how inefficiently I copy over arrays, but like I said I am somewhat new to NASM and I'm not sure how to do what you described syntactically

Shift_Left Shift_Left 9905 silver badges11 bronze badges · Accepted Answer · 2017-06-16 18:19:59Z

Firstly, the way you are addressing the stack (cdecl calling convention) is wrong. Now because your creating an empty procedure frame, the way you've done it works, but if you were to allocate space on the stack which is usually the reason for a procedure frame, then your method would fail.

 push ebp
 mov ebp, esp
 add esp, -512

or

 enter 512, 0

Your method would require

 mov edx, [esp+536]

After

void merge(int* arr1, int size1, int* arr2, int size2, int* merged_array)

stacks contents would be as follows

EBP + 24 = arr1
EBP + 20 = size1
EBP + 16 = arr2
EBP + 12 = size2
EBP + 8 = merged_array
EBP + 4 = Return address to calling application
EBP + 0 = Previous contents of EBP

Use the registers that are intended for a particular purpose. Yes, all 8 aregeneral purpose, but they also facilitate tighter code when used correctly.

 mov edi, [ebp+8] ; Destination buffer
 mov esi, [ebp+24] ; Pntr to arr1
 mov ecx, [ebp+20] ; size1

Then it is a simple matter of

 repnz movsd

Load parameters into ESI & ECX for array 2 and your done and repeat this instruction and your done.

MOVS(?)

This instruction moves a byte (b) word (w) dword (d) or if your working in 64 bit qword (q) increments EDI & ESI by 1, 2, 4 or 8 respectively and decrements ECX by 1 and continues until ECX = 0. Traversing memory regions can be reversed by setting DF (Direction flag) in EFLAGS.

I see what you are saying, and I understand the movsd instruction now as well, thank you. I didn't know about the calling convention. But should I not move the parameter values from [EBP+??] at all except at the end where I copy over the entire array? Also whats the difference between EBP and ESP?
Registers only have to be loaded with values when needed. After the first invocation of MOVSD, EDI is already pointing to next part of the destination buffer, so all that's required is to load ESI & ECX with next set of values.
Oh I think you misunderstand what I am trying to do in my code... I'm not concatenating the two arrays, I'm merging two sorted arrays so as to maintain the sorted order. But the movsd trick is still good, so thank you
LODSD copies the contents of the memory location pointed to by ESI into EAX. STOSD copies the contents of EAX into the memory location pointed to by EDI. MOVSD does exactly the same thing, but without altering the contents of EAX. MOVS(?) without the REP qualifier does not alter ECX.

Stack Exchange Network

Merge two sorted arrays in NASM

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Merge two sorted arrays in NASM

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions