2
\$\begingroup\$

I have a set of names and a request with asterisks. An asterisk can be replaced with any letter.

For example, there is a set of names: hasad, ahmed, fizo. And there are a few sample requests: hasad, has**, fizoo. The first one and the second one should return YES, and the third one should return NO.

Here is the source of the question.

package careepcup.fb;
import java.util.HashSet;
import java.util.Set;
public class Main {
 /*
 * This class offers constant time performance for the basic operations (add, remove, contains
 * and size), assuming the hash function disperses the elements properly among the buckets.
 */
 private static final Set<String> NAMES = new HashSet<String>();
 private static final String YES = "YES";
 private static final String NO = "NO";
 static {
 NAMES.add("hasad");
 NAMES.add("ahmed");
 NAMES.add("moustafa");
 NAMES.add("fizo");
 }
 public static void main(String[] args) {
 System.out.println(hasName("has**"));
 }
 static boolean generate(String query, int length, String prefix) {
 for (char letter = 'a'; letter <= 'z'; letter++) {
 if (length > 1) {
 boolean result = generate(query, length - 1, prefix + letter);
 if (result) {
 return true;
 }
 }
 else {
 char[] generated = (prefix + letter).toCharArray();
 char[] queryAsArray = query.toCharArray();
 for (int i = 0, j = 0; i < queryAsArray.length; i++) {
 if (queryAsArray[i] == '*') {
 queryAsArray[i] = generated[j];
 j++;
 }
 }
 if (NAMES.contains(new String(queryAsArray))) {
 return true;
 }
 }
 }
 return false;
 }
 static String hasName(String query) {
 int fromIndex = 0;
 int asteriskCount = 0;
 while ((fromIndex = query.indexOf("*", fromIndex)) != -1) {
 asteriskCount++;
 fromIndex++;
 }
 if (asteriskCount == 0) {
 return NAMES.contains(query) ? YES : NO;
 } else {
 return generate(query, asteriskCount, "") ? YES : NO;
 }
 }
}

When I was writing my code, I saw this solution. I personally like it because it seems to be simpler than mine. You can find this answer here.

public class Searcher {
 private final String data;
 public Searcher(String[] data) {
 if (data == null || data.length == 0) {
 throw new IllegalArgumentException("bad data");
 }
 StringBuilder builder = new StringBuilder();
 for (String part : data) {
 builder.append(part).append("\n");
 }
 this.data = builder.toString();
 }
 public String checkValuePresent(String value) {
 boolean result = search(value);
 return result ? "YES" : "NO";
 }
 public boolean search(String input) {
 if (input == null) {
 throw new IllegalArgumentException("input should not be NULL");
 }
 String patternSource = input.replace('*', '.');
 Pattern pattern = Pattern.compile("^" + patternSource + "$", Pattern.MULTILINE);
 Matcher matcher = pattern.matcher(data);
 boolean result = matcher.find();
 return result;
 }
 public static void main(String[] args) {
 String[] data = { "hazem", "ahmed", "moustafa", "fizo" };
 Searcher searcher = new Searcher(data);
 System.out.println("data = " + Arrays.toString(data));
 String result = null;
 String input = "ahmed";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "m**stafa";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "fizzo";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "fizd";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "*****";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "****";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "**";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 }
}
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Feb 28, 2015 at 18:03
\$\endgroup\$
9
  • 1
    \$\begingroup\$ If the Searcher class was not written by you, I'd suggest removing it from your question and only link to it instead. This is because of various reasons. Also, is there anything specific you are concerned about? Do you want a general review? Add a few words about why you are here. \$\endgroup\$ Commented Feb 28, 2015 at 18:17
  • \$\begingroup\$ @SimonAndréForsberg, I'd do that, but it's impossible to give a link to an answer on careercup.com \$\endgroup\$ Commented Feb 28, 2015 at 18:20
  • 1
    \$\begingroup\$ Actually, it is not possible to answer a query in O(1). Reading the query string already requires O(string lenght) time and we cannot answer a query without reading the entire input string in the worst case. \$\endgroup\$ Commented Feb 28, 2015 at 21:38
  • 1
    \$\begingroup\$ @SimonAndréForsberg, I'd like this code to be reviewed because I need some comments about memory allocation, the algorithm itself, etc. For example, I allocate a new char array in the recursive function. Is it OK? There may be other drawbacks in my solution. \$\endgroup\$ Commented Mar 1, 2015 at 10:36
  • 1
    \$\begingroup\$ @MaksimDmitriev I believe janos just answered that below. \$\endgroup\$ Commented Mar 1, 2015 at 11:45

2 Answers 2

5
\$\begingroup\$

Both solutions are inefficient:

  • Your solution tries to fill in all combinations of 'a' to 'z' for every * in the input until it finds a match in the dictionary. That's \26ドル^k\$ combinations where \$k\$ is the number of * in the query. Consider for example the query **********, the result of which will be NO, but your solution will check all those non-working combinations. Pretty much run forever on such input. And it will do this for each query.

  • The other solution converts the query to a regex and tries it on all the words in the dictionary. This is better, but still inefficient, because in the worst case it will check all the entries in the dictionary. It probably also defeats the purpose of the interview question.

A more efficient solution will be to pre-fill the dictionary with all possible positions of *. For a word with \$n\$ letters, there will be \$\sum\binom{n}{k}\$ combinations, which happens to be \2ドル^n\$. For a word of 8 letters, that's 256 combinations. Yes, this will be a big tradeoff between time and space. While this approach ensures lookups in constant time, the space used grows exponentially with the size of the dictionary.

Given the example data, there are 336 total theoretical combinations, in reality 334 of them are unique thanks to some overlaps, for example ***e* matches both "hazem" and "ahmed". With a bigger dictionary, the space can become unmanageable, while lookups remain super-fast.

answered Mar 1, 2015 at 11:40
\$\endgroup\$
4
  • \$\begingroup\$ "there will be sum(n over k) combinations"... for all k from 0 to n? In that case, that's 2^n (yes, always, for all n). I'm not sure that would work, or at least I don't understand how it works. Remember that if the input is has*o and your matching against the examples given, it will match neither hasad nor fizoo, so that will return "NO". \$\endgroup\$ Commented Mar 1, 2015 at 11:50
  • 1
    \$\begingroup\$ Thanks @SimonAndréForsberg, I don't remember the math, but the values appear to be the same for a couple of examples I tried. It's a simpler way to write it and gives a better idea of the space complexity, so I updated my post, thanks for pointing out! \$\endgroup\$ Commented Mar 1, 2015 at 12:00
  • \$\begingroup\$ I wrote code. Please take a look at it. It's Update #1. Now I'm sure my code is far from perfect. So if you see any shortcomings, please let me know. \$\endgroup\$ Commented Mar 7, 2015 at 14:03
  • \$\begingroup\$ Hi @MaksimDmitriev, if you want another round of review of your revised solution, you might want to post that as a new question. Btw I was inspired by this to implement my own solution, and the reviews I got may be insightful for your too: codereview.stackexchange.com/q/83031/12390 \$\endgroup\$ Commented Mar 7, 2015 at 16:20
0
\$\begingroup\$

If you really want an efficient solution, you could sort the names list and then apply binary search. All you need is to define a proper Comparator that takes asterisks into account.

The comparator would be something like the following pseudocode

compare(String a, String b) {
 for i from 0 to min(a.length(), b.length()) {
 if a[i] == b[i] or a[i] == '*' or b[i] == '*' 
 next; 
 return the natural order of a[i] and b[i]
 }
 if they are unequal lengths, 
 then return -1 if a is shorter than b, otherwise 1
 return 0 // they matched
}

With this neat trick (if I can say so myself :-) you get speed = O(log N)

answered Mar 1, 2015 at 16:22
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.