A set of names and a request with asterisks

Question 1

I have a set of names and a request with asterisks. An asterisk can be replaced with any letter.

For example, there is a set of names: hasad, ahmed, fizo. And there are a few sample requests: hasad, has**, fizoo. The first one and the second one should return YES, and the third one should return NO.

Here is the source of the question.

package careepcup.fb;
import java.util.HashSet;
import java.util.Set;
public class Main {
 /*
 * This class offers constant time performance for the basic operations (add, remove, contains
 * and size), assuming the hash function disperses the elements properly among the buckets.
 */
 private static final Set<String> NAMES = new HashSet<String>();
 private static final String YES = "YES";
 private static final String NO = "NO";
 static {
 NAMES.add("hasad");
 NAMES.add("ahmed");
 NAMES.add("moustafa");
 NAMES.add("fizo");
 }
 public static void main(String[] args) {
 System.out.println(hasName("has**"));
 }
 static boolean generate(String query, int length, String prefix) {
 for (char letter = 'a'; letter <= 'z'; letter++) {
 if (length > 1) {
 boolean result = generate(query, length - 1, prefix + letter);
 if (result) {
 return true;
 }
 }
 else {
 char[] generated = (prefix + letter).toCharArray();
 char[] queryAsArray = query.toCharArray();
 for (int i = 0, j = 0; i < queryAsArray.length; i++) {
 if (queryAsArray[i] == '*') {
 queryAsArray[i] = generated[j];
 j++;
 }
 }
 if (NAMES.contains(new String(queryAsArray))) {
 return true;
 }
 }
 }
 return false;
 }
 static String hasName(String query) {
 int fromIndex = 0;
 int asteriskCount = 0;
 while ((fromIndex = query.indexOf("*", fromIndex)) != -1) {
 asteriskCount++;
 fromIndex++;
 }
 if (asteriskCount == 0) {
 return NAMES.contains(query) ? YES : NO;
 } else {
 return generate(query, asteriskCount, "") ? YES : NO;
 }
 }
}

When I was writing my code, I saw this solution. I personally like it because it seems to be simpler than mine. You can find this answer here.

public class Searcher {
 private final String data;
 public Searcher(String[] data) {
 if (data == null || data.length == 0) {
 throw new IllegalArgumentException("bad data");
 }
 StringBuilder builder = new StringBuilder();
 for (String part : data) {
 builder.append(part).append("\n");
 }
 this.data = builder.toString();
 }
 public String checkValuePresent(String value) {
 boolean result = search(value);
 return result ? "YES" : "NO";
 }
 public boolean search(String input) {
 if (input == null) {
 throw new IllegalArgumentException("input should not be NULL");
 }
 String patternSource = input.replace('*', '.');
 Pattern pattern = Pattern.compile("^" + patternSource + "$", Pattern.MULTILINE);
 Matcher matcher = pattern.matcher(data);
 boolean result = matcher.find();
 return result;
 }
 public static void main(String[] args) {
 String[] data = { "hazem", "ahmed", "moustafa", "fizo" };
 Searcher searcher = new Searcher(data);
 System.out.println("data = " + Arrays.toString(data));
 String result = null;
 String input = "ahmed";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "m**stafa";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "fizzo";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "fizd";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "*****";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "****";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 input = "**";
 result = searcher.checkValuePresent(input);
 System.out.println(input + ": " + result);
 }
}

Question 2

If the Searcher class was not written by you, I'd suggest removing it from your question and only link to it instead. This is because of various reasons. Also, is there anything specific you are concerned about? Do you want a general review? Add a few words about why you are here.

Question 3

@SimonAndréForsberg, I'd do that, but it's impossible to give a link to an answer on careercup.com

Question 4

Actually, it is not possible to answer a query in O(1). Reading the query string already requires O(string lenght) time and we cannot answer a query without reading the entire input string in the worst case.

Question 5

@SimonAndréForsberg, I'd like this code to be reviewed because I need some comments about memory allocation, the algorithm itself, etc. For example, I allocate a new char array in the recursive function. Is it OK? There may be other drawbacks in my solution.

Question 6

@MaksimDmitriev I believe janos just answered that below.

Question 7

Both solutions are inefficient:

Your solution tries to fill in all combinations of 'a' to 'z' for every * in the input until it finds a match in the dictionary. That's \26ドル^k\$ combinations where \$k\$ is the number of * in the query. Consider for example the query **********, the result of which will be NO, but your solution will check all those non-working combinations. Pretty much run forever on such input. And it will do this for each query.
The other solution converts the query to a regex and tries it on all the words in the dictionary. This is better, but still inefficient, because in the worst case it will check all the entries in the dictionary. It probably also defeats the purpose of the interview question.

A more efficient solution will be to pre-fill the dictionary with all possible positions of *. For a word with \$n\$ letters, there will be \$\sum\binom{n}{k}\$ combinations, which happens to be \2ドル^n\$. For a word of 8 letters, that's 256 combinations. Yes, this will be a big tradeoff between time and space. While this approach ensures lookups in constant time, the space used grows exponentially with the size of the dictionary.

Given the example data, there are 336 total theoretical combinations, in reality 334 of them are unique thanks to some overlaps, for example ***e* matches both "hazem" and "ahmed". With a bigger dictionary, the space can become unmanageable, while lookups remain super-fast.

Question 8

"there will be sum(n over k) combinations"... for all k from 0 to n? In that case, that's 2^n (yes, always, for all n). I'm not sure that would work, or at least I don't understand how it works. Remember that if the input is has*o and your matching against the examples given, it will match neither hasad nor fizoo, so that will return "NO".

Question 9

Thanks @SimonAndréForsberg, I don't remember the math, but the values appear to be the same for a couple of examples I tried. It's a simpler way to write it and gives a better idea of the space complexity, so I updated my post, thanks for pointing out!

Question 10

I wrote code. Please take a look at it. It's Update #1. Now I'm sure my code is far from perfect. So if you see any shortcomings, please let me know.

Question 11

Hi @MaksimDmitriev, if you want another round of review of your revised solution, you might want to post that as a new question. Btw I was inspired by this to implement my own solution, and the reviews I got may be insightful for your too: codereview.stackexchange.com/q/83031/12390

Question 12

If you really want an efficient solution, you could sort the names list and then apply binary search. All you need is to define a proper Comparator that takes asterisks into account.

The comparator would be something like the following pseudocode

compare(String a, String b) {
 for i from 0 to min(a.length(), b.length()) {
 if a[i] == b[i] or a[i] == '*' or b[i] == '*' 
 next; 
 return the natural order of a[i] and b[i]
 }
 if they are unequal lengths, 
 then return -1 if a is shorter than b, otherwise 1
 return 0 // they matched
}

With this neat trick (if I can say so myself :-) you get speed = O(log N)

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2015-03-01 11:40:34Z

Both solutions are inefficient:

Your solution tries to fill in all combinations of 'a' to 'z' for every * in the input until it finds a match in the dictionary. That's \26ドル^k\$ combinations where \$k\$ is the number of * in the query. Consider for example the query **********, the result of which will be NO, but your solution will check all those non-working combinations. Pretty much run forever on such input. And it will do this for each query.
The other solution converts the query to a regex and tries it on all the words in the dictionary. This is better, but still inefficient, because in the worst case it will check all the entries in the dictionary. It probably also defeats the purpose of the interview question.

A more efficient solution will be to pre-fill the dictionary with all possible positions of *. For a word with \$n\$ letters, there will be \$\sum\binom{n}{k}\$ combinations, which happens to be \2ドル^n\$. For a word of 8 letters, that's 256 combinations. Yes, this will be a big tradeoff between time and space. While this approach ensures lookups in constant time, the space used grows exponentially with the size of the dictionary.

Given the example data, there are 336 total theoretical combinations, in reality 334 of them are unique thanks to some overlaps, for example ***e* matches both "hazem" and "ahmed". With a bigger dictionary, the space can become unmanageable, while lookups remain super-fast.

"there will be sum(n over k) combinations"... for all k from 0 to n? In that case, that's 2^n (yes, always, for all n). I'm not sure that would work, or at least I don't understand how it works. Remember that if the input is has*o and your matching against the examples given, it will match neither hasad nor fizoo, so that will return "NO".
Thanks @SimonAndréForsberg, I don't remember the math, but the values appear to be the same for a couple of examples I tried. It's a simpler way to write it and gives a better idea of the space complexity, so I updated my post, thanks for pointing out!
I wrote code. Please take a look at it. It's Update #1. Now I'm sure my code is far from perfect. So if you see any shortcomings, please let me know.
Hi @MaksimDmitriev, if you want another round of review of your revised solution, you might want to post that as a new question. Btw I was inspired by this to implement my own solution, and the reviews I got may be insightful for your too: codereview.stackexchange.com/q/83031/12390

Stack Exchange Network

A set of names and a request with asterisks

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

A set of names and a request with asterisks

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions