Python Quicksort Implementation with duplicates

Question 1

Please critique my implementation of Quicksort in python 3.8:

import random
from typing import List
def quicksort(A: List[int], left: int, right: int):
 """
 Sort the array in-place in the range [left, right(
 """
 if left >= right:
 return
 pivot_index = random.randint(left, right - 1)
 swap(A, left, pivot_index)
 pivot_new_index = partition(A, left, right)
 quicksort(A, left, pivot_new_index)
 quicksort(A, pivot_new_index + 1, right)
def partition(A: List[int], left: int, right: int) -> int:
 """
 in-place partition around first item
 """
 if left >= right:
 return left
 pivot = A[left]
 # initialize i, j pointers
 j = left + 1
 while j < right and A[j] <= pivot:
 j += 1
 i = j
 # invariant :
 # A[left+1 ... i-1] <= pivot
 # A[i ... j-1] > pivot
 # A[j ... right( unexplored
 while j < right:
 if A[j] <= pivot:
 swap(A, i, j)
 i += 1
 j += 1
 swap(A, left, i - 1)
 return i - 1
def swap(A: List[int], i: int, j: int):
 A[i], A[j] = A[j], A[i]

I followed Tim Roughgarden's explanations in his MOOC on algorithms. But he assumes the keys are all distinct and says that having Quicksort work efficiently on many duplicated keys is trickier than it seems and point to Robert Sedgewick and Kevin Wayne book on algorithms. I checked it and the partition method looks indeed way different.

My current code seems to work even with duplicated keys but I think if all the keys are the same, I will get a O(n2) complexity (partitioned arrays will be very unbalanced). How can I update my current code to address this issue?

Question 2

(Please do not change in your code (after the first CR answer): when using round and square brackets for the notation of intervals, they are in standard orientation. I've seen mirrored ones when using (round) parentheses exclusively.)

Question 3

(Dual pivot is superior to three way (<,=,>) partition.)

Question 4

Performance problems with quicksort are due to one single partition getting almost all of the items, incurring repeated "partition cost".
In the extreme (undetected, if for sake of argument) case of all equal keys, the code presented (implementing Lomuto's partition scheme) puts all items equal to pivot in the left partition, returning the index of the rightmost one, leading to quadratic time.
Careful implementations of Hoare partition move values equal to pivot to "the other partition", resulting in a balanced split here.
(When implementing dual pivot, checking they differ seems prudent.)

The minimally invasive change would seem to be to change the way values equal to pivot are handled - either

Keep a toggle indicating where to put the next one - or

Put the item in the partition currently smaller.

if A[j] < pivot or (
 A[j] == pivot and i-left < j-i):

Why restrict items to int?
Most of the implementation looks minimalistic (down to function docstrings, only - without full-stops).
The special handling of values not exceeding pivot before the main loop of partition() doesn't quite seem in line.
While you can put pivot selection in partition() just as well as in quicksort(), you could put it in a function of its own to emphasize mutual independance.

Question 5

One possible solution to handle duplicates without a big performance degrade is to randomly shuffle the input list before doing the sort. A specific algorithm to do so is a Fisher Yates Shuffle.

Question 6

What does shuffling duplicates achieve?

Question 7

This allows expected complexity to be stated which does not depend on the algorithms input

Question 8

(When some information has not been just overlooked by a commenter, add the elaboration to your post rather than commenting a comment.)

Question 9

Did you notice the quicksort() presented already tries to evade systematic bad choice of pivot value: pivot_index = random.randint(left, right - 1)? This doesn't help coping with significantly fewer key values than items - neither does shuffling.

greybeard greybeard 7,3813 gold badges21 silver badges55 bronze badges · Answer 1 · 2021-10-10 07:39:00Z

Performance problems with quicksort are due to one single partition getting almost all of the items, incurring repeated "partition cost".
In the extreme (undetected, if for sake of argument) case of all equal keys, the code presented (implementing Lomuto's partition scheme) puts all items equal to pivot in the left partition, returning the index of the rightmost one, leading to quadratic time.
Careful implementations of Hoare partition move values equal to pivot to "the other partition", resulting in a balanced split here.
(When implementing dual pivot, checking they differ seems prudent.)

The minimally invasive change would seem to be to change the way values equal to pivot are handled - either

Keep a toggle indicating where to put the next one - or

Put the item in the partition currently smaller.

if A[j] < pivot or (
 A[j] == pivot and i-left < j-i):

Why restrict items to int?
Most of the implementation looks minimalistic (down to function docstrings, only - without full-stops).
The special handling of values not exceeding pivot before the main loop of partition() doesn't quite seem in line.
While you can put pivot selection in partition() just as well as in quicksort(), you could put it in a function of its own to emphasize mutual independance.

Akash Patel Akash Patel 3701 silver badge8 bronze badges · Answer 2 · 2021-09-08 19:38:10Z

1

\$\begingroup\$

One possible solution to handle duplicates without a big performance degrade is to randomly shuffle the input list before doing the sort. A specific algorithm to do so is a Fisher Yates Shuffle.

Share

edited Oct 10, 2021 at 10:47

coderodde's user avatar

coderodde

31.6k15 gold badges77 silver badges201 bronze badges

answered Sep 8, 2021 at 19:38

Akash Patel's user avatar

Akash Patel Akash Patel

3701 silver badge8 bronze badges

\$\endgroup\$

4

\$\begingroup\$ What does shuffling duplicates achieve? \$\endgroup\$

no comment
– no comment

2021年09月08日 20:19:07 +00:00
Commented Sep 8, 2021 at 20:19
\$\begingroup\$ This allows expected complexity to be stated which does not depend on the algorithms input \$\endgroup\$

Akash Patel
– Akash Patel

2021年09月08日 20:21:10 +00:00
Commented Sep 8, 2021 at 20:21
\$\begingroup\$ (When some information has not been just overlooked by a commenter, add the elaboration to your post rather than commenting a comment.) \$\endgroup\$

greybeard
– greybeard

2021年09月09日 21:28:11 +00:00
Commented Sep 9, 2021 at 21:28
\$\begingroup\$ Did you notice the quicksort() presented already tries to evade systematic bad choice of pivot value: pivot_index = random.randint(left, right - 1)? This doesn't help coping with significantly fewer key values than items - neither does shuffling. \$\endgroup\$

greybeard
– greybeard

2021年10月09日 21:32:55 +00:00
Commented Oct 9, 2021 at 21:32

Add a comment |

Stack Exchange Network

Python Quicksort Implementation with duplicates

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python Quicksort Implementation with duplicates

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions