I wrote the following function to implement counting sort:
def CountSort( Arr, Max ):
counter, retVal = [0] * (Max + 1), []
for _ in Arr:
counter[_] += 1
for i in xrange( Max + 1 ):
retVal += [i] * counter[i]
return retVal
NOTE
The function takes the array of integer/list of integer as its first argument, and the maximum value it stores as the second. The only restriction is that the data in array should be distributed for \0ドル \le x \le Max\$.
I'm not happy with it since I have to maintain two lists in memory now, namely Arr
and retVal
. I could, obviously, use the following:
# indentation preserved
pos = 0
for i in xrange( Max + 1 ):
while counter[i] > 0:
Arr[pos] = i
pos += 1
counter[i] -= 1
but it won't be as pythonic as the first one. An alternate approach is to delete the input Arr
, and recreate it:
# indentation preserved
del Arr
Arr = []
for i in xrange( Max + 1 ):
Arr += [i] * counter[i]
but this is not dissimilar to the second approach.
I'm open to suggestions on improvement of the first (or any of the latter).
4 Answers 4
I'm going to review your code for clarity, as I had trouble reading it at all initially.
First, you're using PascalCase for naming where you're supposed to use snake_case. Python's style guide (PEP0008) says that functions and variables should use snake_case. PascalCase is reserved for classes. Particularly Arr
looks like it's a class of yours, rather than a simple list. As for the actual name, use numbers
. It gives a clue to the type of list and doesn't confuse people about whether or not it's a plain list.
def count_sort(numbers, max_value):
Just because Python allows multiple line assignment doesn't mean it should be used all the time. It's best used when the two values being assigned have some relationship, like initialising co-ordinates x, y = 0, 0
or swapping values a, b = b, a
. In this case, it's just confusing. Again about names, I'd change retVal
to results
.
results = []
counter = [0] * (max_value + 1)
Using for _ in
is the common style for when you don't actually need the value you're calling, but you use it to assign to counter
. So instead you should use i
which is Python style for a temporary loop integer (Python wont care that you reuse it later).
for i in numbers:
counter[i] += 1
Instead of looping to get i
and then access counter
with that, you can use Python's enumerate
function. It will allow you to simultaneously get the index and value from each element of a list, like this:
for i, count in enumerate(counter):
results += [i] * count
Now that I've gone line by line, I can actually see how it works but that was a lot of work because of unfamiliar style and no comments. A docstring in particular will make it easier to understand the point of the parameters and what is returned.
Even now, I'm confused why max
is necessary since it adds confusion and potentially generates uncaught errors. You could call max(numbers)
and get the highest number from the list that way. But maybe you wanted to avoid using too many builtin functions? Of course as @Barry notes this adds execution time, but the current method doesn't handle errors or explain how to pass max_value
or why it's necessary. If you're leaving it out to optimise then you should catch index errors and provide a more relevant error (like ValueError
) and note the point of the value in your script.
-
\$\begingroup\$ That's the counting sorts' contract. For ease we can compute max while calling function itself.
res = count_sort(ary, max(ary))
\$\endgroup\$CodeYogi– CodeYogi2015年09月24日 12:05:52 +00:00Commented Sep 24, 2015 at 12:05 -
3\$\begingroup\$ "You could call
max(numbers)
..." No you shouldn't! The point of counting sort is to sort large arrays of integers that have very small ranges very efficiently. Having to make an extra pass through the array is bad. \$\endgroup\$Barry– Barry2015年09月24日 12:40:07 +00:00Commented Sep 24, 2015 at 12:40 -
\$\begingroup\$ @Barry That does make sense, but is entirely undocumented and in its current state can raise confusing errors. I'll amend a note about it. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年09月24日 13:11:27 +00:00Commented Sep 24, 2015 at 13:11
Well, you kind of have to have two arrays. You need your counts array and you need your output array. If you want to sort the input array in place, you can do that, that's just a completely different semantic for your algorithm. So you'll just need to figure that out up front: do you want to sort the input list (a la list.sort()
) or do you want to return a new list that is sorted (a la sorted()
).
That said, here's some comments on your initial solution. To start with:
counter, retVal = [0] * (Max + 1), []
There is no reason to do such a thing. That is unnecessary terse and you're just making it hard to see what's going on. Furthermore, you don't even need retVal
for now. So let's just start with the counter
by itself:
counter = [0] * (Max + 1)
for _ in Arr:
counter[_] += 1
Next, we have this loop:
for i in xrange( Max + 1 ):
retVal += [i] * counter[i]
You can actually do a little better in terms of code appearance. We have our counter
array, which has all of those indices... what we want to do is iterate over the counter
array while also having access to the indices. That's what enumerate()
is for:
retVal = []
for i, count in enumerate(counter):
retVal += [i] * count
return retVal
Lastly, if you want to follow PEP-8, your function name should really be counting_sort()
. And _
should probably be replaced with some single-letter name (x
, a
, elem
, what have you).
It would be helpful if you could provide more details about your use case. For example, why is the presence of two lists so objectionable? And are you intentionally avoiding built in functions and packages such as itertools
?
Style
- I would change
_
toi
infor _ in Arr:
. The_
is used for a throwaway term, but you actually do use_
. Also thisi
is related to thei
in your secondfor loop
so I think this would enhance understanding.
Efficiency
The rule of thumb is to avoid
for loops
when possible. It seems to me you could replace the firstfor
loop with a dictionary comprehension and the secondfor loop
with 1 or 2 list comprehrensions.d = {x:arr.count(x) for x in a} l = [[x]*d[x] for x in d.keys()] return [item for sublist in l for item in sublist]
We can take this a bit further with generator expressions to keep the number of lists down by using a generator expression
for l
rather than a list comprehension. We could even fit it into the second list comprehension, to give you the following 2-line function:
d = {x:arr.count(x) for x in a}
return [item for sublist in ([x]*d[x] for x in d.keys()) for item in sublist]
You will notice I incorporated @SuperBiasedMan's comment about snake_case, which I agree with. Also I assume that the dictionary keys are stored in order, and this seems to be empirically true when I test the code, but you'd want to confirm that this is guaranteed in the docs.
-
1\$\begingroup\$ ""Also I assume that the dictionary keys are stored in order - That's actually not true, dictionaries have arbitrary order and can't be relied on. You can use an
OrderedDict
if you want order (you have to import it fromcollections
). \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年09月24日 08:43:01 +00:00Commented Sep 24, 2015 at 8:43 -
1\$\begingroup\$ @JoeWallis I actually didn't know that! Interesting. @.sunny Though this apparently works it still isn't reliable to trust as the order will vary in most dictionary uses. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年09月24日 09:58:39 +00:00Commented Sep 24, 2015 at 9:58
-
\$\begingroup\$ @SuperBiasedMan I specifically said in my post "you'd want to confirm" and did not guarantee it worked, only that it seemed to work. \$\endgroup\$sunny– sunny2015年09月24日 13:02:52 +00:00Commented Sep 24, 2015 at 13:02
-
\$\begingroup\$ I know, that's why I was explaining it for you and the OP. I still upvoted your answer but wanted to add the info you weren't sure about. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年09月24日 13:08:12 +00:00Commented Sep 24, 2015 at 13:08
First example
You should not use _
in counter[_] += 1
.
_
is normally used to void output.
And the behaviour changes in the command line utility.
>>> _ = 1
>>> 2
2
>>> _
2
Some of your style are un-Pythonic,
CountSort( Arr, Max )
should be count_sort(arr, max_)
.
Camels are classy, in Python.
Where, almost, everything else is snake-case.
max_
isn't really Pythonic.
Python implements a function max
,
and so to not overwrite this function it is better to use max_
than max
.
But in PEP8, it's explicitly stated that using a synonym is better.
domain
, max_number
or length
may be better.
So in short overwriting builtins is bad, unsurprisingly, and so use synonyms.
You can use a generator if using two lists bugs you a lot.
for number, amount in enumerate(counter):
for _ in xrange(amount):
yield number
The way you use it is like you do xrange
, and so is only good if you only increment through the list.
Overall it's pretty solid.
Second example
You can change the second example to use two for loops, rather than one.
The first that iterates through counter
,
and the second that iterates through an xrange
of the amount returned.
This portrays what you want to achieve better in Python.
def count_sort(arr, max_):
counter = [0] * (max_ + 1)
for i in arr:
counter[i] += 1
pos = 0
for amount in counter:
for _ in xrange(amount):
arr[pos] = amount
pos += 1
my_list = [3, 2, 1]
my_new_list = count_sort(my_list, 3)
print my_list
# [1, 2, 3]
print my_new_list
# None
So, if you didn't know about shallow copy's this would be strange.
This happens as arr
is a shallow copy, and so any changes to it change the original.
Third example
It's very uncommon to use del
in Python.
And IIRC there are some quirks with it.
Lets go through what happens if you use del
in this instance.
I also use enumerate
as explained before.
def count_sort(arr, max_):
counter = [0] * (max_ + 1)
for i in arr:
counter[i] += 1
del arr
arr = []
for number, amount in enumerate(counter):
arr += [number] * amount
return arr
my_list = [3, 2, 1]
my_new_list = count_sort(my_list, 3)
print my_list
# [3, 2, 1]
print my_new_list
# [1, 2, 3]
In short it's a long-winded way of doing what you were doing before.
But it also scares the reader that my_list
would be gone,
if they are unfamiliar with it.
-
\$\begingroup\$ You could probably explain why you used
max_
instead of justmax
, OP might not know about shadowing builtins. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年09月23日 16:08:21 +00:00Commented Sep 23, 2015 at 16:08 -
\$\begingroup\$ Where does
i
get its value from in the 2nd example? In the statement:arr[pos] = i
\$\endgroup\$hjpotter92– hjpotter922015年09月24日 04:11:20 +00:00Commented Sep 24, 2015 at 4:11 -
\$\begingroup\$ Liked the third solution. Good to declare variable where its used instead of declaring all the variables at one place. But deleting array seems unnecessary, overall a good design. \$\endgroup\$CodeYogi– CodeYogi2015年09月24日 12:04:48 +00:00Commented Sep 24, 2015 at 12:04
collections.Counter
? \$\endgroup\$non-comparision
sorting algorithm forIntegers
. \$\endgroup\$