Python Dictionary Manipulation

Question 1

I just finished working on a coding challenge for learning Python 2.7. Essentially, I wrote a function that, if it is fed a string such as:

"The man drank the drink and ate the bread and forgot the drink"

I get in return:

{'and': {'ate': 1, 'forgot': 1},
 'ate': {'the': 1},
 'bread': {'and': 1},
 'drank': {'the': 1},
 'drink': {'and': 1},
 'forgot': {'the': 1},
 'man': {'drank': 1},
 'the': {'bread': 1, 'drink': 2, 'man': 1}}

In other words, each word (that has a word following it) is a key, and the value is a dictionary of the words that come right after it, along with the number of times each one does. (drink follows the twice in the string, hence the 2 value in its dictionary.)

Here's the function I wrote to accomplish this end:

import string

def word_counts(f):
    #Function to remove punctuation, change to lowercase, etc. from incoming string
    def string_clean(file_content):
        fc_new = "".join([i.lower() for i in file_content if i not in string.punctuation])
        fc_new = fc_new.split()
        return fc_new
    f = string_clean(f)
    unique_f = f[:]
    #For next part of function, get the unique words found in string.
    #We'll then run each through the string and find words that follow
    #Pop() the last word, since nothing follows it
    unique_f.pop()
    unique_f = list(set(unique_f))
    result = {}
    for word in unique_f:
        next_word_keeper = {}
        for _ in range(0, len(f)-1):
            if word == f[_]:
                if f[_+1] in next_word_keeper.keys():
                    next_word_keeper[f[_+1]] = next_word_keeper[f[_+1]] + 1
                else:
                    next_word_keeper[f[_+1]] = 1
        result[word] = next_word_keeper
    return result

Feedback appreciated, thanks.

Answer

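string.punctuation == string.punctuation.lower(), so you can lowercase the whole string at once instead of one character at a time.
You don't need string_clean to be a function, as you only use it once.
Don't use _ as a variable name, and definitely not as a loop index, as most people use it as a 'garbage' (throwaway) variable.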
You can use f[:-1] to get the same as u = f[:];u.pop()
Your algorithm is ok, but can be a bit odd to read.
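The first and fourth points above can be checked with a quick sketch. The sample word list here is made up purely for illustration:

```python
import string

# Lowercasing punctuation is a no-op, so lower() can be applied to the
# whole string at once instead of character by character.
assert string.punctuation == string.punctuation.lower()

# Illustrative word list, not taken from the question's data.
words = "the man drank the drink".split()

# Copy-then-pop, as in the question...
u = words[:]
u.pop()

# ...gives the same list as slicing off the last element.
assert u == words[:-1]
```

Slicing also avoids the IndexError that pop() raises on an empty list.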

To improve your code I'd add collections.defaultdict. This will allow you to remove the innermost if/else, because when a key isn't in the dictionary yet, defaultdict creates a default value for it (0 in the case of int).

>>> from collections import defaultdict
>>> next_word_keeper = defaultdict(int)
>>> next_word_keeper['test'] += 1
>>> next_word_keeper
defaultdict(<type 'int'>, {'test': 1})
>>> next_word_keeper['test'] += 1
>>> next_word_keeper
defaultdict(<type 'int'>, {'test': 2})
>>> next_word_keeper['test2'] += 1
>>> next_word_keeper
defaultdict(<type 'int'>, {'test': 2, 'test2': 1})

Using the above should get you:

from collections import defaultdict

def word_counts(f):
    f = f.lower().split()
    unique_f = list(set(f[:-1]))
    result = {}
    for word in unique_f:
        next_word_keeper = defaultdict(int)
        for i in range(len(f)-1):
            if word == f[i]:
                next_word_keeper[f[i + 1]] += 1
        result[word] = next_word_keeper
    return result

But this code is still not the best when it comes to readability and performance, since it scans the whole list once for every unique word!

Instead of going through the list multiple times, you can go through it once. Using enumerate we get the current index, which we can then use to get the next word. And by nesting two defaultdicts we can simplify the function to six lines:

def word_counts(line):
    line = line.lower().split()
    results = defaultdict(lambda: defaultdict(int))
    for i, value in enumerate(line[:-1]):
        results[value][line[i + 1]] += 1
    return results
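As a rough check, the single-pass version can be run against the sentence from the question. The conversion to plain dicts at the end is an addition for comparison and printing, not part of the answer's function:

```python
from collections import defaultdict


def word_counts(line):
    line = line.lower().split()
    results = defaultdict(lambda: defaultdict(int))
    for i, value in enumerate(line[:-1]):
        results[value][line[i + 1]] += 1
    return results


sentence = "The man drank the drink and ate the bread and forgot the drink"
counts = word_counts(sentence)

# Convert the nested defaultdicts to plain dicts (an illustrative extra step).
plain = {word: dict(followers) for word, followers in counts.items()}
```

Two things to keep in mind: unlike the question's code, this version does not strip punctuation, so 'drink.' and 'drink' would be counted as different words; and looking up a missing key on the returned defaultdict silently inserts it, which is one reason to convert to plain dicts before handing the result on.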

You can also go on to use the itertools pairwise recipe to further simplify the code.
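A minimal sketch of what that might look like, assuming the pairwise recipe from the itertools documentation. It uses zip so that it runs on both Python 2 and 3; the Python 2 recipe uses izip instead, and Python 3.10+ ships itertools.pairwise directly:

```python
from collections import defaultdict
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)


def word_counts(line):
    results = defaultdict(lambda: defaultdict(int))
    # Each pair is (word, the word that immediately follows it).
    for word, next_word in pairwise(line.lower().split()):
        results[word][next_word] += 1
    return results
```

This removes the index arithmetic entirely: the loop body only ever sees a word and its successor.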

Peilonrayz · Accepted Answer · 2016-08-24 15:54:37Z
