Please critique my implementation of Quicksort in python 3.8:
import random
from typing import List
def quicksort(A: List[int], left: int, right: int):
"""
Sort the array in-place in the range [left, right(
"""
if left >= right:
return
pivot_index = random.randint(left, right - 1)
swap(A, left, pivot_index)
pivot_new_index = partition(A, left, right)
quicksort(A, left, pivot_new_index)
quicksort(A, pivot_new_index + 1, right)
def partition(A: List[int], left: int, right: int) -> int:
"""
in-place partition around first item
"""
if left >= right:
return left
pivot = A[left]
# initialize i, j pointers
j = left + 1
while j < right and A[j] <= pivot:
j += 1
i = j
# invariant :
# A[left+1 ... i-1] <= pivot
# A[i ... j-1] > pivot
# A[j ... right( unexplored
while j < right:
if A[j] <= pivot:
swap(A, i, j)
i += 1
j += 1
swap(A, left, i - 1)
return i - 1
def swap(A: List[int], i: int, j: int):
A[i], A[j] = A[j], A[i]
I followed Tim Roughgarden's explanations in his MOOC on algorithms. But he assumes the keys are all distinct and says that having Quicksort work efficiently on many duplicated keys is trickier than it seems and point to Robert Sedgewick and Kevin Wayne book on algorithms. I checked it and the partition method looks indeed way different.
My current code seems to work even with duplicated keys but I think if all the keys are the same, I will get a O(n2) complexity (partitioned arrays will be very unbalanced). How can I update my current code to address this issue?
-
\$\begingroup\$ (Please do not change in your code (after the first CR answer): when using round and square brackets for the notation of intervals, they are in standard orientation. I've seen mirrored ones when using (round) parentheses exclusively.) \$\endgroup\$greybeard– greybeard2021年09月09日 06:21:48 +00:00Commented Sep 9, 2021 at 6:21
-
\$\begingroup\$ (Dual pivot is superior to three way (<,=,>) partition.) \$\endgroup\$greybeard– greybeard2021年09月09日 06:23:33 +00:00Commented Sep 9, 2021 at 6:23
2 Answers 2
Performance problems with quicksort are due to one single partition getting almost all of the items, incurring repeated "partition cost".
In the extreme (undetected, if for sake of argument) case of all equal keys, the code presented (implementing Lomuto's partition scheme) puts all items equal to pivot in the left partition, returning the index of the rightmost one, leading to quadratic time.
Careful implementations of Hoare partition move values equal to pivot to "the other partition", resulting in a balanced split here.
(When implementing dual pivot, checking they differ seems prudent.)
The minimally invasive change would seem to be to change the way values equal to pivot are handled - either
- Keep a toggle indicating where to put the next one - or
- Put the item in the partition currently smaller.
if A[j] < pivot or ( A[j] == pivot and i-left < j-i):
- Why restrict items to
int
? - Most of the implementation looks minimalistic (down to function docstrings, only - without full-stops).
The special handling of values not exceeding pivot before the main loop ofpartition()
doesn't quite seem in line. - While you can put pivot selection in
partition()
just as well as inquicksort()
, you could put it in a function of its own to emphasize mutual independance.
One possible solution to handle duplicates without a big performance degrade is to randomly shuffle the input list before doing the sort. A specific algorithm to do so is a Fisher Yates Shuffle.
-
\$\begingroup\$ What does shuffling duplicates achieve? \$\endgroup\$no comment– no comment2021年09月08日 20:19:07 +00:00Commented Sep 8, 2021 at 20:19
-
\$\begingroup\$ This allows expected complexity to be stated which does not depend on the algorithms input \$\endgroup\$Akash Patel– Akash Patel2021年09月08日 20:21:10 +00:00Commented Sep 8, 2021 at 20:21
-
\$\begingroup\$ (When some information has not been just overlooked by a commenter, add the elaboration to your post rather than commenting a comment.) \$\endgroup\$greybeard– greybeard2021年09月09日 21:28:11 +00:00Commented Sep 9, 2021 at 21:28
-
\$\begingroup\$ Did you notice the
quicksort()
presented already tries to evade systematic bad choice of pivot value:pivot_index = random.randint(left, right - 1)
? This doesn't help coping with significantly fewer key values than items - neither does shuffling. \$\endgroup\$greybeard– greybeard2021年10月09日 21:32:55 +00:00Commented Oct 9, 2021 at 21:32
Explore related questions
See similar questions with these tags.