I have written this function that takes two strings and checks whether they are anagrams:
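```python
def anagram_check(str_x, str_y):
    # Remove spaces so that multi-word phrases can be compared
    x = str_x.replace(" ", "")
    y = str_y.replace(" ", "")
    # Ignore case
    lower1 = x.lower()
    lower2 = y.lower()
    # Anagrams contain the same letters, so their sorted letter lists are equal
    sorted1 = sorted(lower1)
    sorted2 = sorted(lower2)
    if sorted1 == sorted2:
        return True
    else:
        return False
```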
This function works fine. The problem is that I now need to use it inside another function to find anagrams in a text file. I want to print a list of tuples containing all the anagrams. This is what I have done so far:
```python
def anagrams_finder(words_num):
    anagrams = []
    with open("words.txt") as f:
        a = list(f)
    list1 = [s.replace('\n', '') for s in a]
    list2 = [i.lower() for i in list1]
    list3 = list2[0:words_num]  # number of words from the text that need to be checked
    for i in list3:
        ...  # loop body not written yet
```
I tried using for loops, while loops, append... but nothing seems to work. How can I use the first function to help me with the second? Please help.
2 Answers
As you say, your anagram_check() function works correctly. However, for this problem a more useful function would be one that converts the given word into its "canonical" form, such that two anagrams would have the same canonical form.
One such function is:
```python
def canonical(word):
    return ''.join(sorted(word.lower()))
```
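A quick sketch of the idea in action: two words are anagrams exactly when their canonical forms are equal. The example words are arbitrary.

```python
def canonical(word):
    # Lowercase the word and sort its letters, so every anagram
    # of the word produces exactly the same string.
    return ''.join(sorted(word.lower()))

print(canonical("Listen"))                            # eilnst
print(canonical("Listen") == canonical("Silent"))     # True
print(canonical("Listen") == canonical("Listens"))    # False
```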
Now all you need is a dictionary that maps each canonical form to the list of words that share it. You can populate this dictionary in a single pass over the text file, and producing the required list of tuples from it is then straightforward.
Since this is homework, I leave the remaining details for you to figure out.
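For reference, one possible way to fill in those details, sketched under a few assumptions: the words are already in a Python list (the sample words are made up), only canonical forms shared by at least two words count as anagrams, and `find_anagram_groups` is a hypothetical name.

```python
from collections import defaultdict

def canonical(word):
    # Sorted, lowercased letters: identical for all anagrams of a word.
    return ''.join(sorted(word.lower()))

def find_anagram_groups(words):
    # One pass: map each canonical form to the words that produce it.
    groups = defaultdict(list)
    for word in words:
        groups[canonical(word)].append(word)
    # Keep only canonical forms shared by two or more words, as tuples.
    return [tuple(group) for group in groups.values() if len(group) > 1]

words = ["listen", "google", "silent", "enlist", "rat", "tar", "cat"]
print(find_anagram_groups(words))
# [('listen', 'silent', 'enlist'), ('rat', 'tar')]
```

With the question's file, `words` could simply be the `list3` list built in `anagrams_finder`. Because each word is looked up in the dictionary once, the work grows roughly linearly with the number of words.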
Yup, I came here to say the same thing as aix:
It would take a long time to call the anagram-checking function on every possible pair of words in a large chunk of text (n words means n(n-1)/2 comparisons), so you need a decent hash function to find anagrams.
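For comparison, a sketch of the pairwise approach this answer warns against: it reuses the question's `anagram_check` (with the variable names fixed) on every unordered pair of words via `itertools.combinations`. The function name `find_anagram_pairs` and the sample words are hypothetical.

```python
from itertools import combinations

def anagram_check(str_x, str_y):
    # Ignore spaces and case, then compare the sorted letters.
    x = str_x.replace(" ", "").lower()
    y = str_y.replace(" ", "").lower()
    return sorted(x) == sorted(y)

def find_anagram_pairs(words):
    # Every unordered pair is checked once: n words -> n*(n-1)/2 calls,
    # so the running time grows quadratically with the size of the text.
    return [(a, b) for a, b in combinations(words, 2) if anagram_check(a, b)]

words = ["listen", "google", "silent", "enlist", "rat", "tar"]
print(find_anagram_pairs(words))
# [('listen', 'silent'), ('listen', 'enlist'), ('silent', 'enlist'), ('rat', 'tar')]
```

Six words already need 15 calls; 10,000 words would need about 50 million.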
A hash function for finding anagrams should have two properties:
- it returns the same value for two words that are anagrams;
- if it returns the same value for two different words, those words are good candidates for being anagrams (they may be anagrams, but not necessarily).
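To illustrate the second property, here is a sketch with a deliberately weak, hypothetical hash (`coarse_key`: word length plus the sum of character codes). All anagrams share a key, but some non-anagrams collide too, so words in the same bucket are only candidates and still need an exact check. All names here are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def coarse_key(word):
    # Hypothetical weak hash: anagrams always share it (same letters give
    # the same length and code sum), but unrelated words can share it too.
    word = word.lower()
    return (len(word), sum(ord(c) for c in word))

def is_anagram(a, b):
    # Exact check used to confirm candidates from the same bucket.
    return sorted(a.lower()) == sorted(b.lower())

def find_anagram_pairs_bucketed(words):
    buckets = defaultdict(list)
    for word in words:
        buckets[coarse_key(word)].append(word)
    pairs = []
    for bucket in buckets.values():
        # Only words inside one bucket are compared with each other.
        for a, b in combinations(bucket, 2):
            if is_anagram(a, b):
                pairs.append((a, b))
    return pairs

print(coarse_key("ad") == coarse_key("bc"))            # True: a false positive
print(find_anagram_pairs_bucketed(["ad", "bc", "da"]))  # [('ad', 'da')]
```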
The hash function proposed by aix (sorting the letters of the word) is perfectly adequate for finding anagrams. I have used it for the same purpose on fairly large chunks of text (about the size of a book), and it was fast.
1 Comment
Words that have the same canonical form are always anagrams, so this function is better than your proposed hash.
You can return sorted1 == sorted2 directly instead of using an extra if/else.
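A sketch of the question's `anagram_check` with that simplification applied (variable names fixed, same behavior otherwise); the test phrases are arbitrary.

```python
def anagram_check(str_x, str_y):
    # Drop spaces and ignore case, then compare the sorted letters.
    # The comparison already evaluates to True or False, so return it as-is.
    x = str_x.replace(" ", "").lower()
    y = str_y.replace(" ", "").lower()
    return sorted(x) == sorted(y)

print(anagram_check("Dormitory", "Dirty room"))  # True
print(anagram_check("hello", "world"))           # False
```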