My aim is to sort a list of strings where words have to be sorted alphabetically. Words starting with s
should be at the start of the list (they should be sorted as well), followed by the other words.
The below function does that for me.
def mysort(words):
mylist1 = sorted([i for i in words if i[:1] == "s"])
mylist2 = sorted([i for i in words if i[:1] != "s"])
list = mylist1 + mylist2
return list
I am looking for better ways to achieve this, or if anyone can find any issues with the code above.
4 Answers 4
Some notes about your code:
sorted([i for i in words if i[:1] == "s"])
: No need to generate an intermediate list, drop those[
]
and you'll get a lazy generator expression instead.mylist1
. Thismy
prefix looks pretty bad. Do other languages use them?list
: Red flag. Don't overshadow a built-in (at least not one so important) with a variable.return list
. You can directly writereturn mylist1 + mylist2
, no gain in naming the value you're about to return (the name of the function should inform about that).i[:1] == "s"
->i.startswith("s")
.
Now what I'd write. Using sorted with the argument key
(Schwartzian transform) and taking advantage of the lexicographical order defined by tuples:
def mysort(words):
return sorted(words, key=lambda word: (0 if word.startswith("s") else 1, word))
If you are familiar with the fact that False < True
, it can be simplified a bit more:
sorted(words, key=lambda word: (not word.startswith("s"), word))
Here's another way using the abstraction partition
:
words_with_s, other_words = partition(words, lambda word: word.startswith("s"))
sorted(words_with_s) + sorted(other_words)
Writing partition
is left as an exercise for the reader :-)
-
\$\begingroup\$ Dang, I was just writing up the same thing! +1 :-) \$\endgroup\$tobias_k– tobias_k2013年07月12日 14:00:34 +00:00Commented Jul 12, 2013 at 14:00
-
\$\begingroup\$ @tobias_k: sorry ;-) I was a bit suprised no one had proposed something similar yet, this is pretty standard stuff. \$\endgroup\$tokland– tokland2013年07月12日 14:01:15 +00:00Commented Jul 12, 2013 at 14:01
-
1\$\begingroup\$ No problem, but two times the same answer within 2 minutes for a 8 hours old question... how probable is this? \$\endgroup\$tobias_k– tobias_k2013年07月12日 14:02:16 +00:00Commented Jul 12, 2013 at 14:02
For this snippet, I would have probably written the more compact:
def mysort(words):
return (sorted([i for i in words if i[0] == "s"]) +
sorted([i for i in words if i[0] != "s"]))
(You're not supposed to align code like this in Python, but I tend to do it anyway.) Note that I wrote i[0]
and not i[:1]
- I think the former is clearer. Using str.startswith()
is even better.
Also, it is generally considered bad practice to use list
as a variable name.
However, your algorithm iterates the list at least three times: Once to look for the words starting with s
, once to look for the words not starting with s
and then finally O(n log n)
times more to sort the two lists. If necessary, you can improve the algorithm to do just one pass before sorting, by populating both lists simultaneously:
def prefix_priority_sort(words, special_prefix = "s"):
begins_with_special = []
not_begin_with_special = []
for word in words:
if word.startswith(special_prefix):
begins_with_special.append(word)
else:
not_begin_with_special.append(word)
return sorted(begins_with_special) + sorted(not_begin_with_special)
However, the best way is to define your own comparator and pass it to the sorting function, like mariosangiorgio
suggests. In Python 3, you need to pass in a key function, see the docs for details, or this article.
Depending on execution speed requirements, list sizes, available memory and so on, you might want to pre-allocate memory for the lists using [None] * size
. In your case it is probably premature optimization, though.
-
5\$\begingroup\$ Note that
i[0]
will fail for empty strings, whilei[:1]
will not. Agreed thatstartswith
is best. \$\endgroup\$tobias_k– tobias_k2013年07月12日 14:04:16 +00:00Commented Jul 12, 2013 at 14:04 -
\$\begingroup\$ Your
mysort
needs another pair of parentheses, or a backslash before the newline. \$\endgroup\$Gareth Rees– Gareth Rees2013年07月12日 14:48:22 +00:00Commented Jul 12, 2013 at 14:48 -
\$\begingroup\$ @tobias_k Ah, good point! \$\endgroup\$Lstor– Lstor2013年07月12日 16:48:37 +00:00Commented Jul 12, 2013 at 16:48
-
\$\begingroup\$ What makes you say you're not supposed to align code like that? PEP-8 does it. \$\endgroup\$flornquake– flornquake2013年07月16日 00:53:50 +00:00Commented Jul 16, 2013 at 0:53
Lastor already said what I was going to point out so I am not going to repeat that. I'll just add some other things.
I tried timing your solution with a bunch of other solutions I came up with. Among these the one with best time-memory combination should be the function mysort3
as it gave me best timing in nearly all cases. I am still looking about proper timing in Python. You can try putting in different test cases in the function tests
to test the timing for yourself.
def mysort(words):
mylist1 = sorted([i for i in words if i[:1] == "s"])
mylist2 = sorted([i for i in words if i[:1] != "s"])
list = mylist1 + mylist2
return list
def mysort3(words):
ans = []
p = ans.append
q = words.remove
words.sort()
for i in words[:]:
if i[0] == 's':
p(i)
q(i)
return ans + words
def mysort4(words):
ans1 = []
ans2 = []
p = ans1.append
q = ans2.append
for i in words:
if i[0] == 's':
p(i)
else:
q(i)
ans1.sort()
ans2.sort()
return ans1 + ans2
def mysort6(words):
return ( sorted([i for i in words if i[:1] == "s"]) +
sorted([i for i in words if i[:1] != "s"])
)
if __name__ == "__main__":
from timeit import Timer
def test(f):
f(['a','b','c','abcd','s','se', 'ee', 'as'])
print Timer(lambda: test(mysort)).timeit(number = 10000)
print Timer(lambda: test(mysort3)).timeit(number = 10000)
print Timer(lambda: test(mysort4)).timeit(number = 10000)
print Timer(lambda: test(mysort6)).timeit(number = 10000)
If you want to sort the list according to your own criterion probably the best choice is to write your own comparator.
A comparator function takes as parameters two values and has to return -1 if the first one is the smaller, 0 if they are equal and +1 if the second is the bigger.
In your case you should first check wether one of the two parameters starts with 's'. If it does manage it properly, otherwise fall back to standard string ordering.
This will turn your code in a single call to the sort function plus the definition of the comparison function.
If you want an example of how comparators can be implemented have a look at the answers of this StackOVerflow Q&A
-
\$\begingroup\$ Are you sure? IMO
sorted
along thekey
argument is on a higher level of abstraction than defining a comparator thus should be generally preferred. en.wikipedia.org/wiki/Schwartzian_transform \$\endgroup\$tokland– tokland2013年07月12日 14:24:58 +00:00Commented Jul 12, 2013 at 14:24 -
\$\begingroup\$ @tokland Python 2 takes a comparator function, Python 3 takes a key function, if that's what you mean? \$\endgroup\$Lstor– Lstor2013年07月12日 16:50:54 +00:00Commented Jul 12, 2013 at 16:50
-
\$\begingroup\$ AFAIK
sort
(in-place) has always taken a comparator function andsorted
(not in-place) a key function, but I am not 100% sure. \$\endgroup\$tokland– tokland2013年07月12日 18:08:00 +00:00Commented Jul 12, 2013 at 18:08 -
\$\begingroup\$ @tokland Ah, you are more or less right regarding
sorted
. In Python 2.7, it takes a comparator function and a key function: docs.python.org/2.7/library/… \$\endgroup\$Lstor– Lstor2013年07月12日 18:16:24 +00:00Commented Jul 12, 2013 at 18:16