2

I am creating an app where the user enters 8 characters. After he enters the string I have to see if it is an eight letter word. If not, check if contains a seven letter word etc.

I am checking against a given pool of 150k+ words. I only care about the longest possible match. Is there a better way then this one:

  • WordPool.Contains(word.substring(0,8))
  • WordPool.Contains(word.substring(0,7)),
    WordPool.Contains(word.substirng(1,7))
  • WordPool.Contains(word.substring(0,6)),
    WordPool.Contains(word.substirng(1,6)),
    WordPool.Contains(word.substirng(2,6))
  • etc...

Edit

I forgot to add that I am checking against an english dictionary.

So far I am using this:

for(i = 8; i >= 3; i--)
 for(j = 0; j <= 8 - i; j++)
 if(words.contains(word.substring(j, i))
 //do something

Edit 2 I have been using the approach defined above just with a minor change. I am using a few background agents which all search for a word of a certain length. They all then return a result and I just pick the one which gives the user the highest score.

asked Nov 26, 2013 at 19:44
10
  • While Searching integer sequences is about integers, would a modification on that work for you? Commented Nov 26, 2013 at 19:51
  • You also have to check all possible sub strings? What language is this? Do you have any performance guarantees about the Contains method? Commented Nov 26, 2013 at 19:55
  • @Ampt I need to find the longest possible match. For example if they enter the word "Carpooler" I want to stop right there despite having some smaller substrings like car and pool. Commented Nov 26, 2013 at 20:03
  • But would matching pool to ajfpoolz be allowed? Commented Nov 26, 2013 at 20:07
  • 1
    @IvanCrojachKaračić that still doesn't answer my question. If the user types in garbage like afjpoolz should it match to pool? Commented Nov 26, 2013 at 20:10

1 Answer 1

2

Preprocessing your word pool into a trie would make the longest search easy. Just traverse the trie until you can't go any further. You'd still have to try it for each starting position, though. For example:

wordPool.longestMatch("deadbeef");
wordPool.longestMatch("eadbeef");
wordPool.longestMatch("adbeef");
wordPool.longestMatch("dbeef");
wordPool.longestMatch("beef");
wordPool.longestMatch("eef");
wordPool.longestMatch("ef");
wordPool.longestMatch("f");

You could also short-circuit if you already found a match longer than the length of the remaining subsequences. The example would find "dead" on the first line, and "beef" on the fifth line, so you could automatically skip the last three subsequences.

answered Nov 26, 2013 at 20:00

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.