Given a string such as 'helloyellowellow', parse all the valid strings from the given string. (Eg: [[hell,hello,yellow],[low, low]........]
I am looking for the most optimized way to write the code. Here is mine but I am not sure if this is the best way.
Full disclosure - This was an interview question
master = []
# Dictionary for us to look up words
def is_word(inputstr):
#returns True/False
def processstring(fstr,secstr,li):
if is_word(fstr):
li.append(fstr)
if len(secstr) == 0:
if len(li) != 0:
master.append(li)
return
processstring(fstr+secstr[0], secstr[1:len(secstr)],li)
def wrapperprocess(inpstr):
li = []
if len(inpstr) == 0:
return
processstring('',inpstr,li)
wrapperprocess(inpstr[1:len(inpstr)])
wrapperprocess('helloyellowellow')
print master
3 Answers 3
Since you mentioned you're looking for an efficient algorithm, and assuming you get the dictionary in advance (and not just as a callable predicate), you can use the Aho–Corasick algorithm.
Of course, if the input text is short, a more naive algorithm will be faster, for avoiding the "expensive" pre-processing of the dictionary.
Plus, an alternative python-answer: here's a simple way to simply check each substring:
def gen_words(txt):
n = len(txt)
for i in range(n):
for j in range(i+1, n+1):
subtxt = txt[i:j]
if is_word(subtxt):
yield subtxt
For uniqueness, do:
all_words = set(gen_words(txt))
Comments
You could do something like:
tgt='helloyellowellow'
with open('/usr/share/dict/words') as f:
for word in f:
word=word.strip()
if word in tgt and len(word)>1:
print word
Prints:
el
ell
he
hell
hello
lo
low
loy
ow
owe
we
well
ye
yell
yellow
If you are just looking for the function is_word that you have undefined, you can play with something like this:
def is_word(word, dic='/usr/share/dict/words'):
if not hasattr(is_word, 'words'):
with open(dic) as f:
is_word.words={word.strip() for word in f}
return word in is_word.words and len(word)>1
As a default data structure, Python sets have an average look-up time of O(1). You are very unlikely to write something on your own that is faster.
2 Comments
It's good problem to solve with,
Use Wordnet package,
While parsing your given string start with some index and keep tormenting your index value for every incremental on the index, check the existence of the same word using wordnet, it will tell you weather the particular sub-string is a meaningful or not !
To install wordnet:
https://pypi.python.org/pypi/Wordnet-bn/1.0
return li. A better approach is toyieldthe matched words, instead of maintaining a list, appending to it, and returning it.