I am having a problem with my program: it does as required but largest_blanagram()
is too slow. I'm not sure whether the best way to speed it up is (1) try to speed up blanagram()
, which is being called a lot (2) figure out a way to call blanagram()
less. I'm not sure how to go about either of these things particularly.
Things I've been thinking: is there a way to use another dictionary more efficiently, such as using map()
?
# CW Q2 blanagram.py
import string
from Q1_anagram import readfile, make_anagram_dict # change
def blanagram(word, anagramdict): # split up into smaller functions?
"""
"""
word = sorted(word) # sort alphabetically
alphabet = list(string.ascii_lowercase)
index = 0
blanagrams = []
for char in word:
original_char = char # save original char, so can revert back to it later
for letter in alphabet:
word[index] = letter
word_copy = word # as order messes up when order alphabetically, so save current state and revert back to it after the iteration
word = ''.join(sorted(word)) # list -> string to be able to check dict and sort alphabetically
if word in anagramdict.keys() and anagramdict[word] not in blanagrams:
for anagramdict[word] in anagramdict[word]:
blanagrams.append(anagramdict[word])# change name?
word = word_copy
# next character in word
word = list(word) # string -> list in order to index
word[index] = original_char # set back to original char, as only 1 char can be different from original word
index += 1 # to change following char
return blanagrams # get rid of anagrams?
def largest_blanagram(words, word_length): # greatest number of blanagram variants
"""
"""
blanagrams = {}
length = 0
for word in words:
if len(word) == word_length:
variants = blanagram(word, make_anagram_dict(words))
#print(variants)
if len(variants) > length:
blanagrams[word] = variants
length = len(variants)
return blanagrams
if __name__ == "__main__":
# print(blanagram('ghost', make_anagram_dict(readfile('words.txt'))))
import sys
print(largest_blanagram(readfile('words.txt'), 13)) #sys.argv[1]
Imports -
def readfile(filename):
"""
"""
file = open(filename,'r').readlines() # look into WITH open file
words = []
for line in file:
line = line.lower().split() # orang and Orang? unique? or is asteer another answer
for word in line:
words.append(word)
return words
def make_anagram_dict(words):
"""
"""
assert len(words) != 0
anagram_dict = {}
for word in words:
alpha = ''.join(sorted(word)) # Order word alphabetically
# Key is alphabetically ordered word, value is list of anagrams of the key
anagram_dict.setdefault(alpha, []).append(word) # So not appending NoneType (as list.append returns None)
return anagram_dict
And here is a pastebin equivalent (pastebin has file size limit) link to the words.txt file that I use.
-
2\$\begingroup\$ Can you give inputs / outputs to play with? \$\endgroup\$FunkySayu– FunkySayu2016年12月24日 18:31:26 +00:00Commented Dec 24, 2016 at 18:31
-
3\$\begingroup\$ Welcome to CodeReview.SE! Your question title (and your comment) should tell us more about the problem you are trying to solve. Also, it seems like we are missing some code from a different file... \$\endgroup\$SylvainD– SylvainD2016年12月24日 18:32:19 +00:00Commented Dec 24, 2016 at 18:32
-
\$\begingroup\$ You have lots of loops on the data. So some options to speed it up:1) use pandas and use column-based formulas (Will use fast C code of pandas and numpy) 2) using Cython can increase the loop speed about 100 times. \$\endgroup\$keiv.fly– keiv.fly2016年12月24日 18:55:27 +00:00Commented Dec 24, 2016 at 18:55
-
1\$\begingroup\$ The indentation in the second piece of code seems wrong. Please have a look at at. \$\endgroup\$SylvainD– SylvainD2016年12月24日 20:47:31 +00:00Commented Dec 24, 2016 at 20:47
-
1\$\begingroup\$ So, what exactly is a blanagram? \$\endgroup\$Graipher– Graipher2016年12月24日 20:52:29 +00:00Commented Dec 24, 2016 at 20:52
1 Answer 1
Do not compute things more often than you need to
The point of make_anagram_dict
is to have a cache to perform some anagram check very efficiently. However, you rebuild it for each and every words which misses the point of having a cache.
By writing:
ana_dict = make_anagram_dict(words)
length = 0
for word in words:
if len(word) == word_length:
variants = blanagram(word, ana_dict)
things should be much faster already.
Strange loop
I must confess that for anagramdict[word] in anagramdict[word]
really surprised me. What are you expecting from this loop?