Classic merge sort (part 2)

Question 1

(Initial discussion from Classic merge sort, since it is new code, I start a new thread)

Post my code below, my major question is, I have to create another array result to hold sub-parts merge sort result. Is there a way I can just use original number to save additional space in result?

Any other comments on code bugs, performance (in terms of algorithm time complexity), code style, etc. are appreciated.

Code written in Python 2.7.

def merge_sort(numbers, start, end):
 if start == end:
 return
 pivot_index = start + (end-start)//2
 merge_sort(numbers, start, pivot_index)
 merge_sort(numbers, pivot_index+1, end)
 i = start
 j = pivot_index+1
 result = []
 while i <= pivot_index and j <= end:
 if numbers[i] <= numbers[j]:
 result.append(numbers[i])
 i+=1
 else:
 result.append(numbers[j])
 j+=1
 if i <= pivot_index:
 result.extend(numbers[i:pivot_index+1])
 if j <= end:
 result.extend(numbers[j:end+1])
 k=0
 for i in range(start, end+1):
 numbers[i] = result[k]
 k+=1
if __name__ == "__main__":
 numbers = [1,4,2,5,6,8,3,4,0]
 merge_sort(numbers, 0, len(numbers)-1)
 print numbers

Question 2

Is in-place merge sorting arrays possible without at least doubling run time? (in your wording, number lacks "the plural-s") Are you aware of Ford-Johnson merge-insertion sort (and improvements, e.g. by T. D. Bui & Mai Thanh), "Practical in-place mergesort"s by Katajainen, Pasanen & Teuhola (based on Kronrod) or Huang & Langston, and the relatively new, non-stable QuickMergeSort?

Question 3

@greybeard, nice post and did some study, but sure if merge-insertion sort from time complexity perspective, less efficient than the solution coderodde posted -- which only use one additional copy? Thanks.

Question 4

sure if merge-insertion sort from time complexity perspective, less efficient than [merge sort with a single buffer allocation]

well, the attempts at in-place merge sort before "Practical in-place mergesort" were not practical due to increased run time, "the practical ones" have been complicated, QuickMergeSort may not be a merge sort in everybody's book.

Question 5

Thanks greybeard, in what scenarios do you think merge quick sort is useful than other merge sort implementation mechanisms, like what we discussed below?

Question 6

BTW greybeard, how do you think Ford-Johnson merge-insertion sort? Do you think it is more practical comparing to quick merge sort?

Question 7

You can make your merge sort run 35% faster by allocating the auxiliary memory only once and reusing it throughout the algorithm:

def coderodde_merge(source,
 target,
 source_offset,
 target_offset,
 left_run_length,
 right_run_length):
 left_run_index = source_offset
 right_run_index = source_offset + left_run_length
 left_run_index_bound = right_run_index
 right_run_index_bound = right_run_index + right_run_length
 while left_run_index != left_run_index_bound and right_run_index != right_run_index_bound:
 if source[right_run_index] < source[left_run_index]:
 target[target_offset] = source[right_run_index]
 right_run_index += 1
 else:
 target[target_offset] = source[left_run_index]
 left_run_index += 1
 target_offset += 1
 while left_run_index != left_run_index_bound:
 target[target_offset] = source[left_run_index]
 target_offset += 1
 left_run_index += 1
 while right_run_index != right_run_index_bound:
 target[target_offset] = source[right_run_index]
 target_offset += 1
 right_run_index += 1
def coderodde_mergesort_impl(source,
 target,
 source_offset,
 target_offset,
 range_length):
 if range_length < 2:
 return
 left_run_length = range_length // 2
 right_run_length = range_length - left_run_length
 coderodde_mergesort_impl(target,
 source,
 target_offset,
 source_offset,
 left_run_length)
 coderodde_mergesort_impl(target,
 source,
 target_offset + left_run_length,
 source_offset + left_run_length,
 right_run_length)
 coderodde_merge(source,
 target,
 source_offset,
 target_offset,
 left_run_length,
 right_run_length)
def coderodde_mergesort(array, start, end):
 range_length = end - start
 if range_length < 2:
 return
 aux = [array[index] for index in range(start, end)]
 coderodde_mergesort_impl(aux, array, 0, start, range_length)
def coderodde_mergesort_all(array):
 coderodde_mergesort(array, 0, len(array))

When running this demonstration, I get the following results:

OP mergesort in 17628 milliseconds.
coderodde mergesort in 11411 milliseconds.
Algorithms agree: True

Also, as an additional nitpick, by PEP 8 you should separate binary operators by a space before and after, i.e., not i+=1, but i += 1.

Check you range

If I do something as crazy as

ar = [1, 2, 3]
merge_sort(ar, 0, -1)

you will get a stack overflow.

Question 8

Sorry, I cannot help you with merge-insertion sort. Ask, say, Morwenn.

Question 9

@LinMa If you replace the word 'target' with 'destination' I hope you'll find a correlation of 'using destination as source, and source as destination' with variable names S & D here...

Question 10

@LinMa Precisely, except this implementation is a bit more general: it assumes the part of the array to be sorted needn't start at index 0. That's why it needs additional start parameter to coderodde_mergesortroutine and separate starting indices source_offset and target_offset in each recursive call to a helper routine coderodde_mergesort_impl (implementing the algorithm I called double-array-merge-sort).

Question 11

@LinMa Morwenn can tell you about merge-insertion sort. :)

Question 12

@greybeard No, the copying is unnecessary. The second array is created as a copy of the subarray to be sorted: aux = [array[index] for index in range(start, end)] and no data is modified during stepping down the recursion, so when it comes to one-item range, the corresponding data in both arrays are the same, whether you land in the original array or in a copy.

coderodde coderodde 31.8k15 gold badges77 silver badges202 bronze badges · Accepted Answer · 2016-12-13 07:52:21Z

You can make your merge sort run 35% faster by allocating the auxiliary memory only once and reusing it throughout the algorithm:

def coderodde_merge(source,
 target,
 source_offset,
 target_offset,
 left_run_length,
 right_run_length):
 left_run_index = source_offset
 right_run_index = source_offset + left_run_length
 left_run_index_bound = right_run_index
 right_run_index_bound = right_run_index + right_run_length
 while left_run_index != left_run_index_bound and right_run_index != right_run_index_bound:
 if source[right_run_index] < source[left_run_index]:
 target[target_offset] = source[right_run_index]
 right_run_index += 1
 else:
 target[target_offset] = source[left_run_index]
 left_run_index += 1
 target_offset += 1
 while left_run_index != left_run_index_bound:
 target[target_offset] = source[left_run_index]
 target_offset += 1
 left_run_index += 1
 while right_run_index != right_run_index_bound:
 target[target_offset] = source[right_run_index]
 target_offset += 1
 right_run_index += 1
def coderodde_mergesort_impl(source,
 target,
 source_offset,
 target_offset,
 range_length):
 if range_length < 2:
 return
 left_run_length = range_length // 2
 right_run_length = range_length - left_run_length
 coderodde_mergesort_impl(target,
 source,
 target_offset,
 source_offset,
 left_run_length)
 coderodde_mergesort_impl(target,
 source,
 target_offset + left_run_length,
 source_offset + left_run_length,
 right_run_length)
 coderodde_merge(source,
 target,
 source_offset,
 target_offset,
 left_run_length,
 right_run_length)
def coderodde_mergesort(array, start, end):
 range_length = end - start
 if range_length < 2:
 return
 aux = [array[index] for index in range(start, end)]
 coderodde_mergesort_impl(aux, array, 0, start, range_length)
def coderodde_mergesort_all(array):
 coderodde_mergesort(array, 0, len(array))

When running this demonstration, I get the following results:

OP mergesort in 17628 milliseconds.
coderodde mergesort in 11411 milliseconds.
Algorithms agree: True

Also, as an additional nitpick, by PEP 8 you should separate binary operators by a space before and after, i.e., not i+=1, but i += 1.

Check you range

If I do something as crazy as

ar = [1, 2, 3]
merge_sort(ar, 0, -1)

you will get a stack overflow.

Sorry, I cannot help you with merge-insertion sort. Ask, say, Morwenn.
@LinMa If you replace the word 'target' with 'destination' I hope you'll find a correlation of 'using destination as source, and source as destination' with variable names S & D here...
@LinMa Precisely, except this implementation is a bit more general: it assumes the part of the array to be sorted needn't start at index 0. That's why it needs additional start parameter to coderodde_mergesortroutine and separate starting indices source_offset and target_offset in each recursive call to a helper routine coderodde_mergesort_impl (implementing the algorithm I called double-array-merge-sort).
@greybeard No, the copying is unnecessary. The second array is created as a copy of the subarray to be sorted: aux = [array[index] for index in range(start, end)] and no data is modified during stepping down the recursion, so when it comes to one-item range, the corresponding data in both arrays are the same, whether you land in the original array or in a copy.

Stack Exchange Network

Classic merge sort (part 2)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Classic merge sort (part 2)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions