(Initial discussion from Classic merge sort, since it is new code, I start a new thread)
Post my code below, my major question is, I have to create another array result
to hold sub-parts merge sort result. Is there a way I can just use original number
to save additional space in result
?
Any other comments on code bugs, performance (in terms of algorithm time complexity), code style, etc. are appreciated.
Code written in Python 2.7.
def merge_sort(numbers, start, end):
if start == end:
return
pivot_index = start + (end-start)//2
merge_sort(numbers, start, pivot_index)
merge_sort(numbers, pivot_index+1, end)
i = start
j = pivot_index+1
result = []
while i <= pivot_index and j <= end:
if numbers[i] <= numbers[j]:
result.append(numbers[i])
i+=1
else:
result.append(numbers[j])
j+=1
if i <= pivot_index:
result.extend(numbers[i:pivot_index+1])
if j <= end:
result.extend(numbers[j:end+1])
k=0
for i in range(start, end+1):
numbers[i] = result[k]
k+=1
if __name__ == "__main__":
numbers = [1,4,2,5,6,8,3,4,0]
merge_sort(numbers, 0, len(numbers)-1)
print numbers
1 Answer 1
You can make your merge sort run 35% faster by allocating the auxiliary memory only once and reusing it throughout the algorithm:
def coderodde_merge(source,
target,
source_offset,
target_offset,
left_run_length,
right_run_length):
left_run_index = source_offset
right_run_index = source_offset + left_run_length
left_run_index_bound = right_run_index
right_run_index_bound = right_run_index + right_run_length
while left_run_index != left_run_index_bound and right_run_index != right_run_index_bound:
if source[right_run_index] < source[left_run_index]:
target[target_offset] = source[right_run_index]
right_run_index += 1
else:
target[target_offset] = source[left_run_index]
left_run_index += 1
target_offset += 1
while left_run_index != left_run_index_bound:
target[target_offset] = source[left_run_index]
target_offset += 1
left_run_index += 1
while right_run_index != right_run_index_bound:
target[target_offset] = source[right_run_index]
target_offset += 1
right_run_index += 1
def coderodde_mergesort_impl(source,
target,
source_offset,
target_offset,
range_length):
if range_length < 2:
return
left_run_length = range_length // 2
right_run_length = range_length - left_run_length
coderodde_mergesort_impl(target,
source,
target_offset,
source_offset,
left_run_length)
coderodde_mergesort_impl(target,
source,
target_offset + left_run_length,
source_offset + left_run_length,
right_run_length)
coderodde_merge(source,
target,
source_offset,
target_offset,
left_run_length,
right_run_length)
def coderodde_mergesort(array, start, end):
range_length = end - start
if range_length < 2:
return
aux = [array[index] for index in range(start, end)]
coderodde_mergesort_impl(aux, array, 0, start, range_length)
def coderodde_mergesort_all(array):
coderodde_mergesort(array, 0, len(array))
When running this demonstration, I get the following results:
OP mergesort in 17628 milliseconds. coderodde mergesort in 11411 milliseconds. Algorithms agree: True
Also, as an additional nitpick, by PEP 8 you should separate binary operators by a space before and after, i.e., not i+=1
, but i += 1
.
Check you range
If I do something as crazy as
ar = [1, 2, 3]
merge_sort(ar, 0, -1)
you will get a stack overflow.
-
1
-
1
-
1\$\begingroup\$ @LinMa Precisely, except this implementation is a bit more general: it assumes the part of the array to be sorted needn't start at index 0. That's why it needs additional
start
parameter tocoderodde_mergesort
routine and separate starting indicessource_offset
andtarget_offset
in each recursive call to a helper routinecoderodde_mergesort_impl
(implementing the algorithm I calleddouble-array-merge-sort
). \$\endgroup\$CiaPan– CiaPan2016年12月15日 09:58:35 +00:00Commented Dec 15, 2016 at 9:58 -
1\$\begingroup\$ @LinMa Morwenn can tell you about merge-insertion sort. :) \$\endgroup\$coderodde– coderodde2016年12月15日 10:19:22 +00:00Commented Dec 15, 2016 at 10:19
-
2\$\begingroup\$ @greybeard No, the copying is unnecessary. The second array is created as a copy of the subarray to be sorted:
aux = [array[index] for index in range(start, end)]
and no data is modified during stepping down the recursion, so when it comes to one-item range, the corresponding data in both arrays are the same, whether you land in the original array or in a copy. \$\endgroup\$CiaPan– CiaPan2016年12月16日 07:07:20 +00:00Commented Dec 16, 2016 at 7:07
number
lacks "the plural-s
") Are you aware of Ford-Johnson merge-insertion sort (and improvements, e.g. by T. D. Bui & Mai Thanh), "Practical in-place mergesort"s by Katajainen, Pasanen & Teuhola (based on Kronrod) or Huang & Langston, and the relatively new, non-stable QuickMergeSort? \$\endgroup\$sure if merge-insertion sort from time complexity perspective, less efficient than [merge sort with a single buffer allocation]
well, the attempts at in-place merge sort before "Practical in-place mergesort" were not practical due to increased run time, "the practical ones" have been complicated, QuickMergeSort may not be a merge sort in everybody's book. \$\endgroup\$