I'm trying to make sorting of objects in my program as fast as possible. I know that sorting with a cmp function is deprecated in python 3.x and that using keys is faster, but I'm not sure how to get the same functionality without using a cmp function.
Here's the definition of the class:
class Topic:
def __init__(self, x, y, val):
self.id = val
self.x = x
self.y = y
I have a dictionary full of Topic
to float key, value pairs and a list of Topics to be sorted. Each topic in the list of Topic
s to be sorted has an entry in this dictionary. I need to sort the list of topics by the value in the dictionary. If two topics have a difference in value <= .001, the topic with higher ID should come first.
Here's the current code I'm using:
topicToDistance = {}
# ...
# topicToDistance populated
# ...
topics = topicToDistance.keys()
def firstGreaterCmp(a, b):
if abs(topicToDistance[a]-topicToDistance[b]) <= .001:
if a.id < b.id:
return 1
if topicToDistance[a] > topicToDistance[b]:
return 1
return -1
# sorting by key may be faster than using a cmp function
topics.sort(cmp = firstGreaterCmp)
Any help making this a bit more efficient would be greatly appreciated.
-
\$\begingroup\$ Could you explain what problem you are trying to solve by sorting this way? \$\endgroup\$Janne Karila– Janne Karila2014年11月02日 13:44:02 +00:00Commented Nov 2, 2014 at 13:44
2 Answers 2
Note that comparison doesn't have to return 1
or -1
. For example returning 123
would have the same effect as returning 1
. With this in mind, the firstGreaterCmp
function could be somewhat simpler:
def firstGreaterCmp(a, b):
difference = topicToDistance[a] - topicToDistance[b]
if abs(difference) <= .001:
if a.id < b.id:
return 1
return int(difference)
On the other hand, the comparison function must return an int
, and since your value is a float
, I had to cast it. I don't really know if that has a non-negligible cost or not.
UPDATE
Also, as @janne-karila pointed out in a comment,
int(difference)
is 0 for a range of values (due to integer truncation),
while your original code never returns 0.
It depends on your use case whether this is acceptable or not.
In the Topic
class, I find it somewhat strange that the id
parameter comes last in the parameter list.
Normally I would expect an id field to be the first.
Your code doesn't follow PEP8. It would be better this way:
def firstGreaterCmp(a, b):
if abs(topicToDistance[a] - topicToDistance[b]) <= .001:
if a.id < b.id:
return 1
if topicToDistance[a] > topicToDistance[b]:
return 1
return -1
# sorting by key may be faster than using a cmp function
topics.sort(cmp=firstGreaterCmp)
-
\$\begingroup\$ Thanks for pointing out the styling guidelines. Do you think there's a way to keep from having to access the dictionary for every comparison? Right now topicToDistance is accessed 2 times for every comparison. Considering the worst case access time for dictionaries is O(n), this could be very costly. Do you see a key I could potentially use in this scenario? \$\endgroup\$maxbart– maxbart2014年11月02日 00:01:26 +00:00Commented Nov 2, 2014 at 0:01
-
1\$\begingroup\$ In the first proposal
int(difference)
is 0 for a range of values, while OP's code never returns 0. \$\endgroup\$Janne Karila– Janne Karila2014年11月02日 11:16:34 +00:00Commented Nov 2, 2014 at 11:16 -
\$\begingroup\$ Thanks @JanneKarila, well spotted, I added a note on that. \$\endgroup\$janos– janos2014年11月02日 13:08:52 +00:00Commented Nov 2, 2014 at 13:08
Sorting algorithms only give consistent results when the comparison function is transitive, meaning that whenever A> B and B> C, then also A> C. Your comparison function does not have that property, because you apply a different rule according to the distance between the objects.
This is also why it is not possible to get the same results using a key function (at least not a useful one. Technically you can always convert a comparison function by functools.cmp_to_key
, but that will not make the sorting faster or consistent.)
If want to do this anyway, hoping that the data you will be given won't contain any groups of items whose order is ambiguos, you could try an approach where you sort two times: first by key (distance, -id), then by your comparison function. Python's Timsort algorithm is optimized to deal efficiently with partially sorted data. Therefore the first sorting step should minimize the number of comparisons in the second step.
-
\$\begingroup\$ I agree the function is not transitive. The rule is not something I can change though. This program is for a coding challenge. quora.com/challenges#nearby if you are interested. \$\endgroup\$maxbart– maxbart2014年11月02日 14:19:42 +00:00Commented Nov 2, 2014 at 14:19
Explore related questions
See similar questions with these tags.