Use up all characters to form words

Question 1

Task:

Given a dictionary of words, and a set of characters, judge if all the characters can form the words from the dictionary, without any characters left. For example, given the dictionary {hello, world, is, my, first, program}, if the characters set is "iiifrssst", you should return 'true' because you can form {is, is, first} from the set; if the character set is "eiifrsst", you should return 'false' because you cannot use all the characters from the set.

P.S. there may be tens of thousands of words in the dictionary, and the chars set length could be up to hundreds, so I really need some efficient algorithm.

Source: http://www.careercup.com/question?id=5841660122497024

Here is my solution in Python:

from collections import Counter
from itertools import combinations_with_replacement
import copy
# Check if word can be made up with the characters in char_count
def valid_word(word):
 # build character count for word
 word_char_count = Counter(word)
 # check if character count for word
 for char in word_char_count.keys():
 if char not in char_count or word_char_count[char] > char_count[char]:
 return False
 return True
# Only return words that can be made up from characters with char_set
def prune_words(words):
 res = []
 for word in words:
 if valid_word(word):
 res.append(word)
 return res
def main(words, char_set):
 global char_count
 char_count = Counter(char_set) # count character occurrences
 # Prune words that cannot possible be made from the set of characters
 words = prune_words(words)
 words_len = [len(word) for word in words]
 # Maximum number of r with generating combinations
 max_r = len(char_set) / min(words_len)
 # Generate permutations with the valid words
 for r in range(1, max_r + 1):
 for comb in combinations_with_replacement(words, r):
 # If the total length of the combination matches length of char_set,
 # do further checks
 if sum(map(len, comb)) == len(char_set):
 temp_count = copy.deepcopy(char_count)
 # Subtract char count of each word in combination from char_set
 for word in comb:
 temp_count.subtract(word)
 # If the remaining count are all zero, then we have found an
 # answer
 if all([count == 0 for count in temp_count.values()]):
 print comb
 return True
 return False
if __name__ == '__main__':
 print main(['hello', 'world', 'is', 'my', 'first', 'program'],
 'iiifrssst')

What improvements can be made to my code (e.g. algorithmic speed, make it more 'Python', style, etc etc)? Happy to hear any constructive comments!

Question 2

My initial impression is that this sounds like a Set Cover problem, which would be NP-complete.

Question 3

Since "the chars set length could be up to hundreds", it can easily contain enough letters to form any word in an English dictionary. In such a case prune_words would not prune any words. This makes me think the pruning is an example of premature optimization.
Instead of using Counter to check a match, I believe it would be faster to compare sorted lists of characters, i.e. compare sorted("".join(comb)) to a pre-computed sorted(char_set).
Copying a Counter by Counter(char_count) is faster than by copy.deepcopy.
To check if all counts are zeros you don't need the list comprehension as you can simply use if not any(temp_count.itervalues())
You check lots of combinations that are the wrong length. Even though the length check is the first thing inside the loop, it would be better to generate the combinations in a way that allows you to back off when the combined length exceeds the length of char_set. I would use recursion to explore the combinations.
A more clever approach might be found from representing the dictionary as a prefix tree.

Question 4

Replace `prune_words` with a filter.

Replace words = prune_words(words) with words = filter(valid_word, words).
You can now remove prune_words.

Avoid Deepcopy

There doesn't appear to be any reason to preserve char_sets, so why not subtract directly from it?

Question 5

Thanks. Just a question regarding "filter" - why do I need to change valid_word to return None instead of False? I read the docs but it's not clear to me what they mean.

Question 6

It is not necessary to change valid_word to return None.

Question 7

Because you don't really care about the order of the letters in the word, you could perform some simple preprocessing in order to deal with fewer words :

>>> with open('/etc/dictionaries-common/words', 'r') as f:
... words = {w.strip().lower() for w in f}
... 
>>> words_anagram = { ''.join(sorted(w)) for w in words}
>>> len(words)
97723
>>> len(words_anagram)
90635

It might not help much but it's so simple that it cannot hurt much neither.

Janne Karila Janne Karila 10.6k21 silver badges34 bronze badges · Accepted Answer · 2014-03-31 16:01:52Z

Since "the chars set length could be up to hundreds", it can easily contain enough letters to form any word in an English dictionary. In such a case prune_words would not prune any words. This makes me think the pruning is an example of premature optimization.
Instead of using Counter to check a match, I believe it would be faster to compare sorted lists of characters, i.e. compare sorted("".join(comb)) to a pre-computed sorted(char_set).
Copying a Counter by Counter(char_count) is faster than by copy.deepcopy.
To check if all counts are zeros you don't need the list comprehension as you can simply use if not any(temp_count.itervalues())
You check lots of combinations that are the wrong length. Even though the length check is the first thing inside the loop, it would be better to generate the combinations in a way that allows you to back off when the combined length exceeds the length of char_set. I would use recursion to explore the combinations.
A more clever approach might be found from representing the dictionary as a prefix tree.

Stack Exchange Network

Use up all characters to form words

3 Answers 3

Replace `prune_words` with a filter.

Avoid Deepcopy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Use up all characters to form words

3 Answers 3

Replace prune_words with a filter.

Avoid Deepcopy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Replace `prune_words` with a filter.