Please critique my word frequency generator. I am a beginner programmer so any criticism are welcome.
Original Code: http://pastebin.com/rSRfbnCt
Here is my revised code after the feedback:
import string, time, webbrowser, collections, urllib2
def timefunc(function):
def wrapped(*args):
start = time.time()
data = function(*args)
end = time.time()
timetaken = end - start
print "Function: "+function.__name__+"\nTime taken:",timetaken
return data
return wrapped
@timefunc
def process_text(text_file):
words = text_file.read().lower().split()
words = [word.strip(string.punctuation+string.whitespace) for word in words]
words = [word for word in words if word]#skips ''(None) elements
return words
@timefunc
def create_freq_dict(wordlist):
freq_dict = collections.Counter(wordlist)
return freq_dict
@timefunc
def create_sorted_list(freqdict):
sorted_list = [(value,key) for key,value in list(freqdict.items())]#list() provides python 3 compatibility
sorted_list.sort(reverse=True)
return sorted_list
@timefunc
def write_results(sorted_list):
text_file = open('wordfreq.txt','w')
text_file.write('Word Frequency List\n\n')
rank = 0
for word in sorted_list:
rank += 1
write_str = "[{0}] {1:-<10}{2:->10}\n".format(rank, word[1],word[0])
text_file.write(write_str)
text_file.close()
## The Brothers Grimm
## This file can be obtained from Project Gutenberg:
## http://www.gutenberg.org/cache/epub/5314/pg5314.txt
web_file = urllib2.urlopen('http://www.gutenberg.org/cache/epub/5314/pg5314.txt')
wordlist = process_text(web_file)
freqdict = create_freq_dict(wordlist)
sorted_list = create_sorted_list(freqdict)
results = write_results(sorted_list)
webbrowser.open('wordfreq.txt')
print "END"
2 Answers 2
import string, time, math, webbrowser
def timefunc(function, *args):
start = time.time()
data = function(*args)
end = time.time()
timetaken = end - start
I'd recommend calling this time_taken
as its slightly easier to read.
print "Function: "+function.__name__+"\nTime taken:",timetaken
Print already introduces newlines and combines different pieces. Take advantage of that.
print "Function: ", function.__name__
print "Time Taken: ", time_taken
That's easier to follow
return data
def process_text(filename):
You never use filename in here, but you do use fin which is the same thing. typo?
t = []
Not a very descriptive name. I suggest coming up with something clearer
for line in fin:
for i in line.split():
i
usually means index
which its not here
word = i.lower()
word = word.strip(string.punctuation)
if word != '':
t.append(word)
return t
I'd do this as
words = fin.read().lower().split()
words = [word.strip() for word in words]
words = [word for word in words if word]
return words
I think its easier to follow and probably more efficient
def create_freq_dict(wordlist):
d = dict()
d
is not a very good name. Usually dicts are created with {}
not dict()
. No difference, but the first is generally preffered
for word in wordlist:
if word not in d.keys():
d[word] = 1
else:
d[word] += 1
Use d = collections.defaultdict(int)
or d = collections.Counter()
. Both will make it easier to count up like this. See the python documentation for collections. You should actually be able to write this function in one line
return d
def sort_dict(in_dict):
t = []
for key,value in in_dict.items():
t.append((value, key))
in_dict.items()
is a list already, there is no reason to copy the elements into the list. (NOTE: in Python 3.x in_dict.items() is no longer a list). Even if it wasn't a list you could do:
t = list(in_dict.items())
Which would do the same thing your code does
t.sort(reverse=True)
return t
I'd implement this function as
return sorted(in_dict.items())
The sorted function takes anything sufficiently list-like and produces a sorted list from it.
def write_results(sorted_list):
sorted_list isn't a great name. It would be better to give an indication of what's in the list.
fout = open('wordfreq.txt','w')
fout.write('Word Frequency List\n\n')
r = 0
for i in sorted_list:
r += 1
Use for r, (key, value) in enumerate(sortedlist):
That way you don't need to manage r
yourself, and you can refer the value as value
rather then the harder to read i[1]
.
fillamount = 20 - (len(i[1]) + len(str(r)))
Some of the those parens are unnecessary.
write_str = str(r)+': '+i[1]+' '+('-' * (fillamount-2))+' '+str(i[0])+'\n'
Multiplication has precedence, you don't need the parens to make that happen. Also, python has a method ljust which does this for you
write_str = (str(r) + ': ' + i[1]).ljust('-', 20) + str(i[0]) + '\n'
You may also want to consider using string formatting rather then adding strings
write_str = ('%d: %s' % (r, i[1])).just('-', 20) + '%d\n' % i[0]
I think its easier to follow, although I'd probably split across several lines
fout.write(write_str)
fout.close()
## The Brothers Grimm
## This file can be obtained from Project Gutenberg:
## http://www.gutenberg.org/cache/epub/5314/pg5314.txt
fin = open('c:\Python27\My Programs\wx\grimm.txt')
fin presumable stands for file in. Give a name that indicates what's actually in it ike grim_text
wordlist = timefunc(process_text,fin)
freqdict = timefunc(create_freq_dict,wordlist)
sorted_list = timefunc(sort_dict,freqdict)
results = timefunc(write_results,sorted_list)
Very nice
webbrowser.open('wordfreq.txt')
That's a slightly unusual use of a webbrowser
print "END"
-
\$\begingroup\$ Thanks for the great feedback. I have added my revised code. \$\endgroup\$talloaktrees– talloaktrees2012年02月08日 02:42:22 +00:00Commented Feb 8, 2012 at 2:42
-
\$\begingroup\$ @talloaktrees, nice work. Two things: 1. the
list
increated_sorted_list
isn't necessary for Python 3 compatibility. The python 3 iterator and python 2 list are similar enough that both work in that situation. Inwrite_results
, usefor rank, word in enumerate(sorted_list):
then you can avoid having to keep track of rank yourself. \$\endgroup\$Winston Ewert– Winston Ewert2012年02月08日 05:21:32 +00:00Commented Feb 8, 2012 at 5:21 -
\$\begingroup\$ Yes I accidentally skipped over
enumerate(sorted_list)
when I was rewriting my code. Also, I forgot to mention thatwebbrowser.open('wordfreq.txt')
was a quick way to open up a text file in the default text editor I found somewhere. I imagine it's a bit hackish. \$\endgroup\$talloaktrees– talloaktrees2012年02月08日 14:58:58 +00:00Commented Feb 8, 2012 at 14:58
You could turn the timefunc
function into a decorator, like this:
def timed(function):
def wrapped(*args):
start = time.time()
data = function(*args)
end = time.time()
timetaken = end - start
print "Function: "+function.__name__+"\nTime taken:",timetaken
return data
return wrapped
And then:
@timed
def process_text(filename):
...
@timed
def create_freq_dict(wordlist):
...
Etc., which would let you call your functions like this:
wordlist = process_text(fin)
freqdict = create_freq_dict(wordlist)
sorted_list = sort_dict(freqdict)
results = write_results(sorted_list)