Sort characters in a Python string by frequency

Question 1

I wrote a solution to the leetcode problem Sort characters by frequency and the solution passes 34/35 test cases. On the last test case, there is a "time limit exceeded" error with an input string of 100000+ "a" and "b" characters. How can the following code be optimized to handle an input of that size?

 def frequencySort(s):
 """
 :type s: str
 :rtype: str
 """
 freq = {}
 for i in s:
 if i in freq:
 freq[i] +=1
 else:
 freq[i] = 1
 output = ""
 newDict = sorted(freq.items(), key=lambda kv: kv[1],reverse = True)
 for k,v in newDict:
 for i in range(v):
 output += k
 return output

Question 2

The key issue is the line output += k --- this rebuilds the string for each additional character as strings are immutable in Python (see Shlemiel the painter's algorithm). You're better off using ''.join(chars), where chars is a list or generator.

Question 3

The "set or increment dictionary value" logic

freq = {}
for i in s:
 if i in freq:
 freq[i] +=1
 else:
 freq[i] = 1

can be simplified using a defaultdict:

from collections import defaultdict
# ...
freq = defaultdict(int)
for i in s:
 freq[i] += 1

But instead of creating, updating and sorting your own dictionary you can use the built-in Counter class:

A counter tool is provided to support convenient and rapid tallies.

and its most_common() method:

Return a list of the n most common elements and their counts from the most common to the least.

That simplifies the code to

from collections import Counter
def frequencySort(s):
 """
 :type s: str
 :rtype: str
 """
 counter = Counter(s)
 output = "".join(char * freq for char, freq in counter.most_common())
 return output

Question 4

I'm new here, and don't yet know the local customs of codereview.SE. Would you normally also make PEP8 improvements by putting the function name in lower case and adding another newline before the def?

Question 5

@Oddthinking: Yes, and PEP8 improvements are in fact part of many answers to Python questions on CR. In this particular case, the function name is given by the LeetCode submission template, so that OP cannot change it. (But it would still be a valid point for a review. You can write an answer if you want.) – The missing newline before the def was my fault (there is no import statement in the original code).

Question 6

join is a bit faster when provided with a list rather than a generator, so I would write "".join([...). Other than that, excellent answer, +1.

Question 7

If you prefer functional style you can also use itertools.starmap and operator.mul (or str.__mul__): return "".join(starmap(operator.mul, Counter(s).most_common())).

Question 8

The section:

for i in range(v):
 output += k

Can be rewritten as:

output += k*v

So that characters can be appended to the output in chunks this is much faster then doing it character by character

Question 9

I think your code is getting a timeout because of an O(n^2) on the last part.

for k,v in newDict:
 for i in range(v):
 output += k

The solution below maps every item in the ordered dict object to an ordered tuple, which is then much easier to iterate:

def solution(st):
 dict_ = {}
 for s in st:
 if s in dict_:
 dict_[s] += 1
 else:
 dict_[s] = 1
 items = sorted(dict_.items(), key=lambda kv: kv[1], reverse=True)
 string = ""
 for item in items:
 If item[1] != 0:
 string = string + item[0]*item[1]
 return string

Question 10

Welcome to CodeReview@SE. I agree with your problem analysis. Did you digest the older answers?

Martin R Martin R 24.2k2 gold badges38 silver badges96 bronze badges · Answer 1 · 2018-09-03 19:49:28Z

10

\$\begingroup\$

The "set or increment dictionary value" logic

freq = {}
for i in s:
 if i in freq:
 freq[i] +=1
 else:
 freq[i] = 1

can be simplified using a defaultdict:

from collections import defaultdict
# ...
freq = defaultdict(int)
for i in s:
 freq[i] += 1

But instead of creating, updating and sorting your own dictionary you can use the built-in Counter class:

A counter tool is provided to support convenient and rapid tallies.

and its most_common() method:

Return a list of the n most common elements and their counts from the most common to the least.

That simplifies the code to

from collections import Counter
def frequencySort(s):
 """
 :type s: str
 :rtype: str
 """
 counter = Counter(s)
 output = "".join(char * freq for char, freq in counter.most_common())
 return output

Share

edited Sep 4, 2018 at 6:24

answered Sep 3, 2018 at 19:49

Martin R's user avatar

Martin R Martin R

24.2k2 gold badges38 silver badges96 bronze badges

\$\endgroup\$

4

1

\$\begingroup\$ I'm new here, and don't yet know the local customs of codereview.SE. Would you normally also make PEP8 improvements by putting the function name in lower case and adding another newline before the def? \$\endgroup\$

Oddthinking
– Oddthinking

2018年09月04日 02:31:15 +00:00
Commented Sep 4, 2018 at 2:31
2

\$\begingroup\$ @Oddthinking: Yes, and PEP8 improvements are in fact part of many answers to Python questions on CR. In this particular case, the function name is given by the LeetCode submission template, so that OP cannot change it. (But it would still be a valid point for a review. You can write an answer if you want.) – The missing newline before the def was my fault (there is no import statement in the original code). \$\endgroup\$

Martin R
– Martin R

2018年09月04日 06:24:08 +00:00
Commented Sep 4, 2018 at 6:24
\$\begingroup\$ join is a bit faster when provided with a list rather than a generator, so I would write "".join([...). Other than that, excellent answer, +1. \$\endgroup\$

Ma0
– Ma0

2018年09月04日 07:33:58 +00:00
Commented Sep 4, 2018 at 7:33
\$\begingroup\$ If you prefer functional style you can also use itertools.starmap and operator.mul (or str.__mul__): return "".join(starmap(operator.mul, Counter(s).most_common())). \$\endgroup\$

David Foerster
– David Foerster

2018年09月04日 09:26:39 +00:00
Commented Sep 4, 2018 at 9:26

Add a comment |

3xi 3xi 1214 bronze badges · Answer 2 · 2018-09-03 19:44:42Z

The section:

for i in range(v):
 output += k

Can be rewritten as:

output += k*v

So that characters can be appended to the output in chunks this is much faster then doing it character by character

mike mutoro mike mutoro 111 bronze badge · Answer 3 · 2020-05-18 05:38:16Z

I think your code is getting a timeout because of an O(n^2) on the last part.

for k,v in newDict:
 for i in range(v):
 output += k

The solution below maps every item in the ordered dict object to an ordered tuple, which is then much easier to iterate:

def solution(st):
 dict_ = {}
 for s in st:
 if s in dict_:
 dict_[s] += 1
 else:
 dict_[s] = 1
 items = sorted(dict_.items(), key=lambda kv: kv[1], reverse=True)
 string = ""
 for item in items:
 If item[1] != 0:
 string = string + item[0]*item[1]
 return string

Welcome to CodeReview@SE. I agree with your problem analysis. Did you digest the older answers?

Stack Exchange Network

Sort characters in a Python string by frequency

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Sort characters in a Python string by frequency

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions