Create a words histogram of a given string in python

Question 1

I wrote a code that given a string (txt), will print a histogram that contains the number of occurrences of each word in it and the word itself.

A word is defined to be a sequence of chars separated by one space or more.

The code is working, I just want to see if I can improve it.

def PrintWordsOccurence(txt):
 lst = [x for x in txt.split(' ') if x != ' ' and x != '']
 newlst = list(dict.fromkeys(lst))
 for item in newlst:
 print("[{0}] {1}".format(lst.count(item),item))

An input-output example:

>> PrintWordsOccurence("hello code review! this is my code")
[1] hello
[2] code
[1] review!
[1] this
[1] is
[1] my

The first line creates a list with all words, there will be duplicates in it, so to remove them, I turn the list into a dictionary and then I turn it back to a list. In the end, for each item in the list, I print the number of its occurrences in the original string and the item itself.

The time complexity of the code is \$\Theta(n^2)\$

Any suggestions to improve the code?

Question 2

str.split method without arguments

It does the same filtering as you do. Just skip ' ' argument.

list(dict.fromkeys(list comprehension))

It looks overburdened, right? You can create a set of list comprehension as well:

newlst = list({x for x in txt.split()})

or just

newlst = list(set(txt.split()))

but the best way is

collections.Counter

You're reinventing it with worse time complexity.

Format strings

f-strings look better (but it depends on your task).

Function name

PEP8 advices to use lower case with underscores: print_words_occurence

All together

from collections import Counter
def print_words_occurence(txt):
 for word, count in Counter(txt.split()).items():
 print(f"[{count}] {word}")

Also consider dividing an algorithmic part and input/output - like yielding a pair (word, count) from the function and outputting it somewhere else.

Question 3

This can be improved by implementing a two-line, basic counting sort in core python:

def print_words_occurence(txt):
 wdict = dict()
 for w in txt.strip().split(): 
 wdict[w] = wdict.get(w, 0) + 1
 # just to print results
 [print(f"[{wdict[k]}] {k}") for k in wdict.keys()]

The run time is O(n) if the final dict isn't sorted. [If it is sorted, it's still O(n+k).] I don't think anything can be more efficient.

Pavlo Slavynskyy Pavlo Slavynskyy 2,3846 silver badges11 bronze badges · Answer 1 · 2021-07-08 10:34:27Z

str.split method without arguments

It does the same filtering as you do. Just skip ' ' argument.

list(dict.fromkeys(list comprehension))

It looks overburdened, right? You can create a set of list comprehension as well:

newlst = list({x for x in txt.split()})

or just

newlst = list(set(txt.split()))

but the best way is

collections.Counter

You're reinventing it with worse time complexity.

Format strings

f-strings look better (but it depends on your task).

Function name

PEP8 advices to use lower case with underscores: print_words_occurence

All together

from collections import Counter
def print_words_occurence(txt):
 for word, count in Counter(txt.split()).items():
 print(f"[{count}] {word}")

Also consider dividing an algorithmic part and input/output - like yielding a pair (word, count) from the function and outputting it somewhere else.

ingyhere ingyhere 1294 bronze badges · Answer 2 · 2021-07-09 02:40:59Z

This can be improved by implementing a two-line, basic counting sort in core python:

def print_words_occurence(txt):
 wdict = dict()
 for w in txt.strip().split(): 
 wdict[w] = wdict.get(w, 0) + 1
 # just to print results
 [print(f"[{wdict[k]}] {k}") for k in wdict.keys()]

The run time is O(n) if the final dict isn't sorted. [If it is sorted, it's still O(n+k).] I don't think anything can be more efficient.

Stack Exchange Network

Create a words histogram of a given string in python

2 Answers 2

str.split method without arguments

list(dict.fromkeys(list comprehension))

collections.Counter

Format strings

Function name

All together

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Create a words histogram of a given string in python

2 Answers 2

str.split method without arguments

list(dict.fromkeys(list comprehension))

collections.Counter

Format strings

Function name

All together

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions