2
\$\begingroup\$

Is this efficient?

"""
merge sort algorithm in python
"""
def merge_sort(array):
 if len(array) == 1 or len(array) < 1: # base case
 return array
 int_mid = len(array) / 2
 left = merge_sort(array[:int_mid])
 right = merge_sort(array[int_mid:])
 return merge_arrays(left, right)
def merge_arrays(left, right):
 left_index = 0
 right_index = 0
 new_array = []
 size = (len(left) + len(right))
 for i in range(size):
 if left[left_index] <= right[right_index]:
 new_array.append(left[left_index])
 left_index += 1
 else:
 new_array.append(right[right_index])
 right_index += 1
 if right_index >= len(right):
 for i in range(left_index, len(left)):
 new_array.append(left[i])
 break
 elif left_index >= len(left):
 for i in range(right_index, len(right)):
 new_array.append(right[i])
 break
 return new_array
if __name__ == '__main__':
 print merge_sort([8, 5, 3, 4, 6])
 print merge_sort([10, 9, 8, 7, 6, 5, 4])
 print merge_sort([1, 2, 3, 4, 5])
 print merge_sort(['b', 'c', 'a'])
 print merge_sort([54, 26, 93, 17, 77, 31, 44, 55, 20])
 print merge_sort([])
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jun 17, 2016 at 23:00
\$\endgroup\$
1
  • \$\begingroup\$ If you're using python2 you may want to switch the ranges to xrange. Probably will speed things up a bit. \$\endgroup\$ Commented Jun 18, 2016 at 2:32

2 Answers 2

2
\$\begingroup\$

Code

What comes to your merge_sort, you can rewrite the very first check succintly as follows:

if len(array) < 2: # base case

Algorithm

You could try something like this:

def coderodde_mergesort_impl(source,
 target,
 source_offset,
 target_offset,
 range_length):
 if range_length < 2:
 return
 half_range_length = range_length // 2
 coderodde_mergesort_impl(target,
 source,
 target_offset,
 source_offset,
 half_range_length)
 coderodde_mergesort_impl(target,
 source,
 target_offset + half_range_length,
 source_offset + half_range_length,
 range_length - half_range_length)
 left_run_index = source_offset
 right_run_index = source_offset + half_range_length
 left_run_bound = right_run_index
 right_run_bound = source_offset + range_length
 target_index = target_offset
 while left_run_index < left_run_bound and right_run_index < right_run_bound:
 if source[right_run_index] < source[left_run_index]:
 target[target_index] = source[right_run_index]
 right_run_index += 1
 else:
 target[target_index] = source[left_run_index]
 left_run_index += 1
 target_index += 1
 while left_run_index < left_run_bound:
 target[target_index] = source[left_run_index]
 target_index += 1
 left_run_index += 1
 while right_run_index < right_run_bound:
 target[target_index] = source[right_run_index]
 target_index += 1
 right_run_index += 1
def coderodde_mergesort_ext(array, from_index, to_index):
 range_length = to_index - from_index
 aux = []
 i = from_index
 while i < to_index:
 aux.append(array[i])
 i += 1
 coderodde_mergesort_impl(aux, array, 0, from_index, range_length)
def coderodde_mergesort(array):
 coderodde_mergesort_ext(array, 0, len(array))

The above gives me the following performance figures:


Jaime_mergesort in 18399 milliseconds.
coderodde_mergesort in 16601 milliseconds.
op_merge_sort in 22787 milliseconds.

Hope that helps.

answered Jun 18, 2016 at 10:17
\$\endgroup\$
2
  • \$\begingroup\$ "/" is fine because I am using python 2.7. \$\endgroup\$ Commented Jun 18, 2016 at 12:12
  • \$\begingroup\$ @Orestis I updated the answer. \$\endgroup\$ Commented Jun 18, 2016 at 12:17
2
\$\begingroup\$

No, it's not very efficient... There are operations you typically want to minimize if you want a performing solution, and memory allocation is one of the top ones. But your merge_arrays function is creating a new list every time it gets called, which is not a good thing. An efficient implementation of mergesort will do the sorting in-place (with a copy of the list before if the original has to be kept unchanged), and will reuse the merge buffer for all merging operations.

def mergesort(array, left=0, right=None):
 if right is None:
 right = len(array)
 if right - left <= 1:
 return
 mid = left + (right - left) // 2
 mergesort(array, left, mid)
 mergesort(array, mid, right)
 merge(array, left, mid, right)
def merge(array, left, mid, right):
 buffer = array[left:mid]
 read_left = 0
 read_right = mid
 write = left
 while read_left < len(buffer) and read_right < right:
 if array[read_right] < buffer[read_left]:
 array[write] = array[read_right]
 read_right += 1
 else:
 array[write] = buffer[read_left]
 read_left += 1
 write += 1
 while read_left < len(buffer):
 array[write] = buffer[read_left]
 read_left += 1
 write += 1

Notice how the left part is the only one copied out into the buffer, as that is all that is needed. Notice also the loop structure, which is substantially different from yours. It is a matter of taste, but the single while loop for the merging, with an extra loop to copy items that could be left in the buffer (notice that any item left on the right side is already in the right place) seems to be preferred in most production implementations I've seen, and I find it more clear, but YMMV.

The above code is creating a new buffer for every merge operation, which makes the code a little simpler. A possible way of reusing a buffer is to have it be a static variable of merge. Static variables are a little funny in Python, but it could look something like this:

def merge(array, left, mid, right):
 left_len = mid - left
 if len(merge.buffer) < left_len:
 merge.buffer.extend([None] * (len(buffer) - left_len))
 for write, read in enumerate(range(left, mid)):
 merge.buffer[write] = array[read]
 read_left = 0
 read_right = mid
 write = left
 while read_left < left_len and read_right < right:
 if array[read_right] < buffer[read_left]:
 array[write] = array[read_right]
 read_right += 1
 else:
 array[write] = buffer[read_left]
 read_left += 1
 write += 1
 while read_left < len(buffer):
 array[write] = buffer[read_left]
 read_left += 1
 write += 1
merge.buffer = []

I wouldn't be surprised though if, this being Python, the explicit looping took longer than the previous version. But this would be preferred in a lower level language.

Another possible optimization is to scan the left half of the array prior to the copying into the buffer, to avoid moving around items that are already in the right place:

def merge(array, left, mid, right):
 while array[left] < array[mid]:
 left += 1
 ...

and the rest of the code would stay the same.

answered Jun 18, 2016 at 8:15
\$\endgroup\$
2
  • \$\begingroup\$ From what I know, correct me if I am wrong, in-place merge sort is of theoretical interest and in practice might me slower than then typical implementation. Anyway, thanks for your answer, it is useful. \$\endgroup\$ Commented Jun 18, 2016 at 8:30
  • 1
    \$\begingroup\$ That would be a bufferless mergesort that you have in mind. Doesn't apply to this code, which is how most real world mergesort are implemented. \$\endgroup\$ Commented Jun 18, 2016 at 8:33

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.