Sorting almost sorted array with a minimum heap

Question 1

I'm currently looking into the quickest way to sort an almost sorted array:

Given an array of \$n\$ elements, where each element is at most \$k\$ away from its target position, devise an algorithm that sorts in \$O(n \log k)\$ time.

I've implemented a sorting function which "slides" through an input list, pushes the element from a sliding window into a min heap (using the heapq built-in module), pops the minimum element which is collected in the result list:

from typing import List
from heapq import heappush, heappop
def sort_almost_sorted(a: List[int], k: int) -> List[int]:
 if not a:
 return []
 length = len(a)
 if k >= length:
 return sorted(a) # apply built-in "timsort", designed to be quick on almost sorted lists
 result = []
 min_heap = [] # maintain a min heap for a sliding window
 for slice_start in range(0, length, k + 1): # sliding window
 # push the next slice to min heap
 for index in range(slice_start, min(slice_start + k + 1, length)):
 heappush(min_heap, a[index])
 result.append(heappop(min_heap))
 # put the rest of the heap into the result
 for _ in range(len(min_heap)):
 result.append(heappop(min_heap))
 return result

It works on my sample inputs.

Do you think I'm using the minimum heap appropriately and this implementation is \$O(n \log k)\$? What would you improve code-quality or code-organization wise?

Question 2

I'm not sure I understand the logic for which this is \$O(n log k)\$ instead of \$O(k^2)\,ドル can you elaborate?

Question 3

@ChatterOne the idea was based on the min heap's "log k" complexities for pushing and popping the min element, but I think I am overcomplicating a bit, looks like I can simply push the next element to the heap instead of using this "sliding window"..and I may be not using the heapq correctly, cause the built-in sorted outperforms this approach dramatically on the artificial almost sorted lists I've prepared. Well, feedback would be definitely helpful. Thanks.

Question 4

If they're at most k away, you can probably split the list in sublists of k elements and sort those. I'm not sure how to exploit this fact though.

Question 5

I think you've almost got it sorted: you need a sliding window, but you don't need to rebuild it all the time. First read the first 2k items from the list into a heap. Then, iterative over the remaining elements, each time adding the next element to the heap, and then removing the minimum of that element from the heap and putting it in a new array. When you run out of elements, you just heap the heap out into the output. This means that your heap is never larger than 2k+1 (giving O(log(k)) add and pop), and you always consider each element in the context of those k either side.

Question 6

Algorithm complexity

I am not 100% sure, but it looks like a fancy heapsort which runs (after all) in \$O(n \log n)\$. In order to get this intuition, consider

for slice_start in range(0, length, k + 1): # sliding window
 # push the next slice to min heap
 for index in range(slice_start, min(slice_start + k + 1, length)):
 heappush(min_heap, a[index])
 result.append(heappop(min_heap))

You operate in chunks of length k + 1 array components (save for the very last chunk that may be shorter). For each chunk, you load it entirely to the heap, and then pop only one element from it to the result array. Finally, in

for _ in range(len(min_heap)):
 result.append(heappop(min_heap))

you handle the majority of elements via heap sort, and that's linearithmic worst case running time.

Perhaps this one?

# This sort runs in O(n log k + k log k),
# which simplifies to O(n log k) whenever k = o(n)
# (k is sublinear in n).
def window_heapsort(a, k):
 if k > len(a):
 return sorted(a)
 window_width = k + 1
 window_heap = []
 result_array = []
 index = 0
 # Initialize the window heap:
 # Runs in O(k log k)
 for i in range(window_width):
 heappush(window_heap, a[i])
 # Keep sliding the window over the array:
 # Here, the size of the heap alternates between
 # 'window_width' and 'window_width - 1', which is
 # O(k), and thus pops and pushes run in O(log k).
 # Since we march over (at most) n elements, the
 # complexity of this loop is O(n log k).
 while index + window_width < len(a):
 result_array.append(heappop(window_heap))
 heappush(window_heap, a[window_width + index])
 index += 1
 # Dump the leftovers to the end of the result array:
 # Runs in O(k log k)
 while window_heap:
 result_array.append(heappop(window_heap))
 return result_array

You can find the entire test snippet here. Hope that helps.

coderodde 32.4k15 gold badges79 silver badges205 bronze badges · Accepted Answer · 2017-03-28 17:27:20Z

Algorithm complexity

I am not 100% sure, but it looks like a fancy heapsort which runs (after all) in \$O(n \log n)\$. In order to get this intuition, consider

for slice_start in range(0, length, k + 1): # sliding window
 # push the next slice to min heap
 for index in range(slice_start, min(slice_start + k + 1, length)):
 heappush(min_heap, a[index])
 result.append(heappop(min_heap))

You operate in chunks of length k + 1 array components (save for the very last chunk that may be shorter). For each chunk, you load it entirely to the heap, and then pop only one element from it to the result array. Finally, in

for _ in range(len(min_heap)):
 result.append(heappop(min_heap))

you handle the majority of elements via heap sort, and that's linearithmic worst case running time.

Perhaps this one?

# This sort runs in O(n log k + k log k),
# which simplifies to O(n log k) whenever k = o(n)
# (k is sublinear in n).
def window_heapsort(a, k):
 if k > len(a):
 return sorted(a)
 window_width = k + 1
 window_heap = []
 result_array = []
 index = 0
 # Initialize the window heap:
 # Runs in O(k log k)
 for i in range(window_width):
 heappush(window_heap, a[i])
 # Keep sliding the window over the array:
 # Here, the size of the heap alternates between
 # 'window_width' and 'window_width - 1', which is
 # O(k), and thus pops and pushes run in O(log k).
 # Since we march over (at most) n elements, the
 # complexity of this loop is O(n log k).
 while index + window_width < len(a):
 result_array.append(heappop(window_heap))
 heappush(window_heap, a[window_width + index])
 index += 1
 # Dump the leftovers to the end of the result array:
 # Runs in O(k log k)
 while window_heap:
 result_array.append(heappop(window_heap))
 return result_array

You can find the entire test snippet here. Hope that helps.

Stack Exchange Network

Sorting almost sorted array with a minimum heap

1 Answer 1

Algorithm complexity

Perhaps this one?

You must log in to answer this question.

Hot Network Questions

Sorting almost sorted array with a minimum heap

1 Answer 1

Algorithm complexity

Perhaps this one?

You must log in to answer this question.

Related

Hot Network Questions