10
\$\begingroup\$

I wrote a code that given a string (txt), will print a histogram that contains the number of occurrences of each word in it and the word itself.

A word is defined to be a sequence of chars separated by one space or more.

The code is working, I just want to see if I can improve it.

def PrintWordsOccurence(txt):
 lst = [x for x in txt.split(' ') if x != ' ' and x != '']
 newlst = list(dict.fromkeys(lst))
 for item in newlst:
 print("[{0}] {1}".format(lst.count(item),item))

An input-output example:

>> PrintWordsOccurence("hello code review! this is my code")
[1] hello
[2] code
[1] review!
[1] this
[1] is
[1] my

The first line creates a list with all words, there will be duplicates in it, so to remove them, I turn the list into a dictionary and then I turn it back to a list. In the end, for each item in the list, I print the number of its occurrences in the original string and the item itself.

The time complexity of the code is \$\Theta(n^2)\$

Any suggestions to improve the code?

Toby Speight
87.9k14 gold badges104 silver badges325 bronze badges
asked Jul 8, 2021 at 9:23
\$\endgroup\$

2 Answers 2

15
\$\begingroup\$

str.split method without arguments

It does the same filtering as you do. Just skip ' ' argument.

list(dict.fromkeys(list comprehension))

It looks overburdened, right? You can create a set of list comprehension as well:

newlst = list({x for x in txt.split()})

or just

newlst = list(set(txt.split()))

but the best way is

collections.Counter

You're reinventing it with worse time complexity.

Format strings

f-strings look better (but it depends on your task).

Function name

PEP8 advices to use lower case with underscores: print_words_occurence

All together

from collections import Counter
def print_words_occurence(txt):
 for word, count in Counter(txt.split()).items():
 print(f"[{count}] {word}")

Also consider dividing an algorithmic part and input/output - like yielding a pair (word, count) from the function and outputting it somewhere else.

answered Jul 8, 2021 at 10:34
\$\endgroup\$
2
\$\begingroup\$

This can be improved by implementing a two-line, basic counting sort in core python:

def print_words_occurence(txt):
 wdict = dict()
 for w in txt.strip().split(): 
 wdict[w] = wdict.get(w, 0) + 1
 # just to print results
 [print(f"[{wdict[k]}] {k}") for k in wdict.keys()]

The run time is O(n) if the final dict isn't sorted. [If it is sorted, it's still O(n+k).] I don't think anything can be more efficient.

answered Jul 9, 2021 at 2:40
\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.