There are many sorting algorithms, so how about unsorting algorithms?
But how could one go about creating a random list of unique elements while not using shuffle, NumPy or any other ready made tools to do it?
def get_un_sorted(limit):
numbers = list(range(limit))
shuffled = [0] * limit
for i in range(limit):
index = randint(0, len(numbers) - 1)
shuffled[i] = numbers[index]
numbers.pop(index)
return shuffled
def main():
not_sorted = get_un_sorted(2 * 10 ** 4)
if __name__ == '__main__':
main()
It doesn't seem so nifty to me and it is very slow, but it gets the work done.
2 Answers 2
Your question reminds me of a beautiful page that visually explains the efficacy of shuffling.
But here are the two main problems with your algorithm:
Your memory usage is bad, this is as you're not performing the shuffle in-place. As you have to make a new list, to keep the old intact and to make the new you need \$O(n)\$ memory. If you instead performed the shuffle in-place then you'd only need \$O(1)\$ memory.
You're removing from the middle of an array. As Pythons lists are internally arrays, when you remove anything that is not the item at the end, you have to shift all the previous data. This leads to
list.pop
having a worst case of \$O(n)\$ performance.If you take this with the fact that you have to perform this action \$n\$ times, then you can quickly see this algorithm is \$O(n^2)\$.
Both the problems above have the solution of performing the shuffle in-place. To do this you want to swap the chosen item with the current index. So if we're on the first item, and we randomly pick the index four, then we'll swap the item at index zero and four. Or in Python:
i, j = 0, 4
numbers[i], numbers[j] = numbers[j], numbers[i]
Now that we can perform the swapping efficiently,
you'd have to change what indexes you pick, when using randint
.
If we're on index 4 of 8, then you'd not want to pick an index below four.
I'd change your program to have a function shuffle
that takes an array as input, swaps it in-place and returns None
.
And for you to use this rather than get_un_sorted
.
And so you should be able to get something like, the Fisher–Yates shuffle:
def shuffle(l):
for i in range(len(l) - 1):
j = randint(i, len(l) - 1)
l[i], l[j] = l[j], l[i]
Finally you could use randrange
rather than randint
as it's a short hand for having the upper bound be non-inclusive.
- You don't have to construct
shuffled
up front using[0] * limit
. Instead, you can initializeshuffled
to the empty list, and thenappend
to it. You use
pop
to remove an element from an arbitrary position in the list. I believe this takes O(n) time with respect to the length of the list. Thus,get_un_sorted
requires O(n^2) time. That is probably why you are seeing slow performance.Unfortunately, there isn't any easy way to get around this using Python's builtin types. However, the blist package provides a list-like datastructure with better asymptotic performance (O(log(n))) in this case.
Using a list-comprehension might increase the "niftyness factor".
def get_un_sorted(limit):
numbers = list(range(limit))
def pick_one():
index = randint(0, len(numbers) - 1)
return numbers.pop(index)
return [pick_one() for i in range(limit)]
Though I wouldn't particularly recommend it -- performing mutations within a list comprehension is generally frowned upon.
-
\$\begingroup\$ You can use
randrange(a, b)
instead ofrandint(a, b-1)
. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年11月30日 08:15:52 +00:00Commented Nov 30, 2016 at 8:15
randint
"? The code above uses other functions too, likerange
,list
, etc? What are the other allowed and not allowed functions? \$\endgroup\$random.shuffle()
, and you're making a copy rather than working in-place, but that's still a shuffle. \$\endgroup\$