6
\$\begingroup\$

Please critique my word frequency generator. I am a beginner programmer so any criticism are welcome.

Original Code: http://pastebin.com/rSRfbnCt

Here is my revised code after the feedback:

import string, time, webbrowser, collections, urllib2
def timefunc(function):
 def wrapped(*args):
 start = time.time()
 data = function(*args)
 end = time.time()
 timetaken = end - start
 print "Function: "+function.__name__+"\nTime taken:",timetaken
 return data
 return wrapped
@timefunc
def process_text(text_file):
 words = text_file.read().lower().split()
 words = [word.strip(string.punctuation+string.whitespace) for word in words]
 words = [word for word in words if word]#skips ''(None) elements
 return words
@timefunc
def create_freq_dict(wordlist):
 freq_dict = collections.Counter(wordlist)
 return freq_dict
@timefunc
def create_sorted_list(freqdict):
 sorted_list = [(value,key) for key,value in list(freqdict.items())]#list() provides python 3 compatibility
 sorted_list.sort(reverse=True)
 return sorted_list
@timefunc
def write_results(sorted_list):
 text_file = open('wordfreq.txt','w')
 text_file.write('Word Frequency List\n\n')
 rank = 0
 for word in sorted_list:
 rank += 1
 write_str = "[{0}] {1:-<10}{2:->10}\n".format(rank, word[1],word[0])
 text_file.write(write_str)
 text_file.close()
## The Brothers Grimm
## This file can be obtained from Project Gutenberg:
## http://www.gutenberg.org/cache/epub/5314/pg5314.txt
web_file = urllib2.urlopen('http://www.gutenberg.org/cache/epub/5314/pg5314.txt')
wordlist = process_text(web_file)
freqdict = create_freq_dict(wordlist)
sorted_list = create_sorted_list(freqdict)
results = write_results(sorted_list)
webbrowser.open('wordfreq.txt')
print "END"
asked Feb 4, 2012 at 8:57
\$\endgroup\$

2 Answers 2

7
\$\begingroup\$
import string, time, math, webbrowser
def timefunc(function, *args):
 start = time.time()
 data = function(*args)
 end = time.time()
 timetaken = end - start

I'd recommend calling this time_taken as its slightly easier to read.

 print "Function: "+function.__name__+"\nTime taken:",timetaken

Print already introduces newlines and combines different pieces. Take advantage of that.

 print "Function: ", function.__name__
 print "Time Taken: ", time_taken

That's easier to follow

 return data
def process_text(filename):

You never use filename in here, but you do use fin which is the same thing. typo?

 t = []

Not a very descriptive name. I suggest coming up with something clearer

 for line in fin:
 for i in line.split():

i usually means index which its not here

 word = i.lower()
 word = word.strip(string.punctuation)
 if word != '':
 t.append(word)
 return t

I'd do this as

 words = fin.read().lower().split()
 words = [word.strip() for word in words]
 words = [word for word in words if word]
 return words

I think its easier to follow and probably more efficient

def create_freq_dict(wordlist):
 d = dict()

d is not a very good name. Usually dicts are created with {} not dict(). No difference, but the first is generally preffered

 for word in wordlist:
 if word not in d.keys():
 d[word] = 1
 else:
 d[word] += 1

Use d = collections.defaultdict(int) or d = collections.Counter(). Both will make it easier to count up like this. See the python documentation for collections. You should actually be able to write this function in one line

 return d
def sort_dict(in_dict):
 t = []
 for key,value in in_dict.items():
 t.append((value, key))

in_dict.items() is a list already, there is no reason to copy the elements into the list. (NOTE: in Python 3.x in_dict.items() is no longer a list). Even if it wasn't a list you could do:

 t = list(in_dict.items())

Which would do the same thing your code does

 t.sort(reverse=True)
 return t

I'd implement this function as

return sorted(in_dict.items())

The sorted function takes anything sufficiently list-like and produces a sorted list from it.

def write_results(sorted_list):

sorted_list isn't a great name. It would be better to give an indication of what's in the list.

 fout = open('wordfreq.txt','w')
 fout.write('Word Frequency List\n\n')
 r = 0
 for i in sorted_list:
 r += 1

Use for r, (key, value) in enumerate(sortedlist): That way you don't need to manage r yourself, and you can refer the value as value rather then the harder to read i[1].

 fillamount = 20 - (len(i[1]) + len(str(r)))

Some of the those parens are unnecessary.

 write_str = str(r)+': '+i[1]+' '+('-' * (fillamount-2))+' '+str(i[0])+'\n'

Multiplication has precedence, you don't need the parens to make that happen. Also, python has a method ljust which does this for you

 write_str = (str(r) + ': ' + i[1]).ljust('-', 20) + str(i[0]) + '\n'

You may also want to consider using string formatting rather then adding strings

 write_str = ('%d: %s' % (r, i[1])).just('-', 20) + '%d\n' % i[0]

I think its easier to follow, although I'd probably split across several lines

 fout.write(write_str)
 fout.close()
## The Brothers Grimm
## This file can be obtained from Project Gutenberg:
## http://www.gutenberg.org/cache/epub/5314/pg5314.txt
fin = open('c:\Python27\My Programs\wx\grimm.txt')

fin presumable stands for file in. Give a name that indicates what's actually in it ike grim_text

wordlist = timefunc(process_text,fin)
freqdict = timefunc(create_freq_dict,wordlist)
sorted_list = timefunc(sort_dict,freqdict)
results = timefunc(write_results,sorted_list)

Very nice

webbrowser.open('wordfreq.txt')

That's a slightly unusual use of a webbrowser

print "END"
answered Feb 4, 2012 at 21:04
\$\endgroup\$
3
  • \$\begingroup\$ Thanks for the great feedback. I have added my revised code. \$\endgroup\$ Commented Feb 8, 2012 at 2:42
  • \$\begingroup\$ @talloaktrees, nice work. Two things: 1. the list in created_sorted_list isn't necessary for Python 3 compatibility. The python 3 iterator and python 2 list are similar enough that both work in that situation. In write_results, use for rank, word in enumerate(sorted_list): then you can avoid having to keep track of rank yourself. \$\endgroup\$ Commented Feb 8, 2012 at 5:21
  • \$\begingroup\$ Yes I accidentally skipped over enumerate(sorted_list) when I was rewriting my code. Also, I forgot to mention that webbrowser.open('wordfreq.txt') was a quick way to open up a text file in the default text editor I found somewhere. I imagine it's a bit hackish. \$\endgroup\$ Commented Feb 8, 2012 at 14:58
3
\$\begingroup\$

You could turn the timefunc function into a decorator, like this:

def timed(function):
 def wrapped(*args):
 start = time.time()
 data = function(*args)
 end = time.time()
 timetaken = end - start
 print "Function: "+function.__name__+"\nTime taken:",timetaken
 return data
 return wrapped

And then:

@timed
def process_text(filename):
 ...
@timed
def create_freq_dict(wordlist):
 ...

Etc., which would let you call your functions like this:

wordlist = process_text(fin)
freqdict = create_freq_dict(wordlist)
sorted_list = sort_dict(freqdict)
results = write_results(sorted_list)
answered Feb 6, 2012 at 12:05
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.