I have a set of names and a request with asterisks. An asterisk can be replaced with any letter.
For example, there is a set of names: hasad, ahmed, fizo. And there are a few sample requests: hasad, has**, fizoo. The first one and the second one should return YES, and the third one should return NO.
Here is the source of the question.
package careepcup.fb;
import java.util.HashSet;
import java.util.Set;
public class Main {
/*
* This class offers constant time performance for the basic operations (add, remove, contains
* and size), assuming the hash function disperses the elements properly among the buckets.
*/
private static final Set<String> NAMES = new HashSet<String>();
private static final String YES = "YES";
private static final String NO = "NO";
static {
NAMES.add("hasad");
NAMES.add("ahmed");
NAMES.add("moustafa");
NAMES.add("fizo");
}
public static void main(String[] args) {
System.out.println(hasName("has**"));
}
static boolean generate(String query, int length, String prefix) {
for (char letter = 'a'; letter <= 'z'; letter++) {
if (length > 1) {
boolean result = generate(query, length - 1, prefix + letter);
if (result) {
return true;
}
}
else {
char[] generated = (prefix + letter).toCharArray();
char[] queryAsArray = query.toCharArray();
for (int i = 0, j = 0; i < queryAsArray.length; i++) {
if (queryAsArray[i] == '*') {
queryAsArray[i] = generated[j];
j++;
}
}
if (NAMES.contains(new String(queryAsArray))) {
return true;
}
}
}
return false;
}
static String hasName(String query) {
int fromIndex = 0;
int asteriskCount = 0;
while ((fromIndex = query.indexOf("*", fromIndex)) != -1) {
asteriskCount++;
fromIndex++;
}
if (asteriskCount == 0) {
return NAMES.contains(query) ? YES : NO;
} else {
return generate(query, asteriskCount, "") ? YES : NO;
}
}
}
When I was writing my code, I saw this solution. I personally like it because it seems to be simpler than mine. You can find this answer here.
public class Searcher {
private final String data;
public Searcher(String[] data) {
if (data == null || data.length == 0) {
throw new IllegalArgumentException("bad data");
}
StringBuilder builder = new StringBuilder();
for (String part : data) {
builder.append(part).append("\n");
}
this.data = builder.toString();
}
public String checkValuePresent(String value) {
boolean result = search(value);
return result ? "YES" : "NO";
}
public boolean search(String input) {
if (input == null) {
throw new IllegalArgumentException("input should not be NULL");
}
String patternSource = input.replace('*', '.');
Pattern pattern = Pattern.compile("^" + patternSource + "$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(data);
boolean result = matcher.find();
return result;
}
public static void main(String[] args) {
String[] data = { "hazem", "ahmed", "moustafa", "fizo" };
Searcher searcher = new Searcher(data);
System.out.println("data = " + Arrays.toString(data));
String result = null;
String input = "ahmed";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
input = "m**stafa";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
input = "fizzo";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
input = "fizd";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
input = "*****";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
input = "****";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
input = "**";
result = searcher.checkValuePresent(input);
System.out.println(input + ": " + result);
}
}
2 Answers 2
Both solutions are inefficient:
Your solution tries to fill in all combinations of
'a'
to'z'
for every*
in the input until it finds a match in the dictionary. That's \26ドル^k\$ combinations where \$k\$ is the number of*
in the query. Consider for example the query**********
, the result of which will beNO
, but your solution will check all those non-working combinations. Pretty much run forever on such input. And it will do this for each query.The other solution converts the query to a regex and tries it on all the words in the dictionary. This is better, but still inefficient, because in the worst case it will check all the entries in the dictionary. It probably also defeats the purpose of the interview question.
A more efficient solution will be to pre-fill the dictionary with all possible positions of *
. For a word with \$n\$ letters, there will be \$\sum\binom{n}{k}\$ combinations, which happens to be \2ドル^n\$. For a word of 8 letters, that's 256 combinations. Yes, this will be a big tradeoff between time and space. While this approach ensures lookups in constant time, the space used grows exponentially with the size of the dictionary.
Given the example data, there are 336 total theoretical combinations, in reality 334 of them are unique thanks to some overlaps, for example ***e*
matches both "hazem" and "ahmed".
With a bigger dictionary, the space can become unmanageable,
while lookups remain super-fast.
-
\$\begingroup\$ "there will be sum(n over k) combinations"... for all k from 0 to n? In that case, that's
2^n
(yes, always, for alln
). I'm not sure that would work, or at least I don't understand how it works. Remember that if the input ishas*o
and your matching against the examples given, it will match neitherhasad
norfizoo
, so that will return "NO". \$\endgroup\$Simon Forsberg– Simon Forsberg2015年03月01日 11:50:19 +00:00Commented Mar 1, 2015 at 11:50 -
1\$\begingroup\$ Thanks @SimonAndréForsberg, I don't remember the math, but the values appear to be the same for a couple of examples I tried. It's a simpler way to write it and gives a better idea of the space complexity, so I updated my post, thanks for pointing out! \$\endgroup\$janos– janos2015年03月01日 12:00:52 +00:00Commented Mar 1, 2015 at 12:00
-
\$\begingroup\$ I wrote code. Please take a look at it. It's Update #1. Now I'm sure my code is far from perfect. So if you see any shortcomings, please let me know. \$\endgroup\$Maksim Dmitriev– Maksim Dmitriev2015年03月07日 14:03:27 +00:00Commented Mar 7, 2015 at 14:03
-
\$\begingroup\$ Hi @MaksimDmitriev, if you want another round of review of your revised solution, you might want to post that as a new question. Btw I was inspired by this to implement my own solution, and the reviews I got may be insightful for your too: codereview.stackexchange.com/q/83031/12390 \$\endgroup\$janos– janos2015年03月07日 16:20:29 +00:00Commented Mar 7, 2015 at 16:20
If you really want an efficient solution, you could sort the names list and then apply binary search. All you need is to define a proper Comparator that takes asterisks into account.
The comparator would be something like the following pseudocode
compare(String a, String b) {
for i from 0 to min(a.length(), b.length()) {
if a[i] == b[i] or a[i] == '*' or b[i] == '*'
next;
return the natural order of a[i] and b[i]
}
if they are unequal lengths,
then return -1 if a is shorter than b, otherwise 1
return 0 // they matched
}
With this neat trick (if I can say so myself :-) you get speed = O(log N)
Searcher
class was not written by you, I'd suggest removing it from your question and only link to it instead. This is because of various reasons. Also, is there anything specific you are concerned about? Do you want a general review? Add a few words about why you are here. \$\endgroup\$