The other day I was intrigued by this question (originally from here):
Write a program that answers YES/NO search queries containing * placeholders. Example: if the data you have is (hazem, ahmed, moustafa, fizo), then you should answer as follows for:
ahmed: YES m**stafa: YES fizoo: NO fizd: NO *****: YES **: NO
Your program should be able to answer each search query in \$O(1)\$.
I came up with this implementation:
class MatcherWithPlaceholders {
protected final Set<String> index = new HashSet<>();
public MatcherWithPlaceholders(List<String> words) {
words.forEach(this::addPermutations);
}
private final void addPermutations(String word) {
char[] letters = word.toCharArray();
for (List<Integer> starPositions : enumerateStarPositions(word.length())) {
String wordWithStars = getWordWithStars(letters, starPositions);
index.add(wordWithStars);
}
}
private static class ListWithIndex {
final List<Integer> list;
final int index;
private ListWithIndex(List<Integer> list, int index) {
this.list = list;
this.index = index;
}
}
protected List<List<Integer>> enumerateStarPositions(int length) {
List<List<Integer>> results = new LinkedList<>();
results.add(Arrays.asList());
Queue<ListWithIndex> canGrow = new LinkedList<>();
canGrow.add(new ListWithIndex(new LinkedList<>(), 0));
while (!canGrow.isEmpty()) {
ListWithIndex listWithIndex = canGrow.poll();
for (int i = listWithIndex.index; i < length - 1; ++i) {
List<Integer> copy = new LinkedList<>(listWithIndex.list);
copy.add(i);
results.add(copy);
canGrow.add(new ListWithIndex(copy, i + 1));
}
List<Integer> copy = new LinkedList<>(listWithIndex.list);
copy.add(length - 1);
results.add(copy);
}
return results;
}
private String getWordWithStars(char[] letters, List<Integer> starPositions) {
char[] copy = letters.clone();
for (int pos : starPositions) {
copy[pos] = '*';
}
return new String(copy);
}
public boolean hasMatch(String word) {
return index.contains(word);
}
}
Unit tests
public class MatcherWithPlaceholdersTest {
private MatcherWithPlaceholders matcher =
new MatcherWithPlaceholders(Arrays.asList("hazem", "ahmed", "moustafa", "fizo"));
private boolean hasMatch(String query) {
return matcher.hasMatch(query);
}
@Test
public void test_enumerateStarPos_1() {
assertEquals("[[], [0]]", matcher.enumerateStarPositions(1).toString());
}
@Test
public void test_enumerateStarPos_2() {
assertEquals("[[], [0], [1], [0, 1]]",
matcher.enumerateStarPositions(2).toString());
}
@Test
public void test_enumerateStarPos_3() {
assertEquals("[[], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]",
matcher.enumerateStarPositions(3).toString());
}
@Test
public void test_ahmed() {
assertTrue(hasMatch("ahmed"));
}
@Test
public void test_m00stafa() {
assertTrue(hasMatch("m**stafa"));
}
@Test
public void test_00() {
assertFalse(hasMatch("**"));
}
@Test
public void test_0000() {
assertTrue(hasMatch("****"));
}
@Test
public void test_00000() {
assertTrue(hasMatch("*****"));
}
@Test
public void test_000000000000000() {
assertFalse(hasMatch("***************"));
}
@Test
public void test_fizoo() {
assertFalse(hasMatch("fizoo"));
}
@Test
public void test_fizd() {
assertFalse(hasMatch("fizd"));
}
}
The implementation works fine, unit tests passing, and it's \$O(1)\$ if you don't count the hashing of the input string to perform the lookup. Of course space is sacrificed here for speed.
I'm wondering if it's possible to improve the implementation. I find the permutation part a bit awkward, and have the feeling there's a cleaner, more elegant way to do that.
I'm also interested in other improvement ideas as well, including the main algorithm / approach.
2 Answers 2
I like the solution you use by filling a set with all possible matches. This does indeed make the lookup a \$O(1)\$ operation. The naming, style, and presentation is all fine.
I have a suggestion about the algorithm for computing the permutations of the starred words. When dealing with problems like this it is often most convenient to use bitwise manipulations.This will eliminate the cumbersome storage of the List<List<Integer>>
. To explain it better, consider the word fizo
. This has 4 letters:
fizo
0000 ****
0001 ***o
0010 **z*
0011 **zo
0100 *i**
....
1111 fizo
We can count from 0 to 15 and use the bits in each value to mask out the letters and replace them with the asterisks.
In code, this would look like:
private final void addPermutations(String word) {
int permutations = 1 << word.length();
IntStream.range(0,permutations).forEach(mask -> index.add(maskChars(mask, word)));
}
private final String maskChars(int mask, String word) {
char[] clone = word.toCharArray();
for (int i = 0; i < clone.length; i++) {
if ((mask & (1 << i)) == 0) {
clone[i] = '*';
}
}
return new String(clone);
}
In the event that you have word of more than 31 characters, you can swap to using long values. In the event that longs are not enough, use a bitset.... but you probably wold not have enough memory for that anyway ;-)
-
\$\begingroup\$ Yeay, that's pretty clever indeed! \$\endgroup\$janos– janos2015年03月02日 22:33:46 +00:00Commented Mar 2, 2015 at 22:33
Choice of datatypes
public MatcherWithPlaceholders(List<String> words)
Does it really matter that the words comes in a list? How about a Collection
? Or, actually, how about Iterable
? Iterable is after all what defines the forEach
method that you are using.
protected List<List<Integer>> enumerateStarPositions(int length)
Does the order of the result matter? Only for your unit tests. Change the tests! Don't be dependent on the implementation of toString
. Use assertArrayEquals
instead, or assertEquals(Collection, Collection)
.
In this case, I would return Set<Set<Integer>>
. No order of the integers really matters, because the integers themselves are the order (inside the String). It is important to know that there are no duplicates, so a Set
is perfect.
In fact, the inner Set<Integer>
might be better of as an array instead. So Set<int[]>
Permutations
@rolfl beat me to it on this one, but here's what I would do:
protected Set<int[]> enumerateStarPositions(int length) {
int highest = 1 << length;
Set<int[]> result = new HashSet<>();
for (int i = 0; i < highest; i++) {
int bitCount = Integer.bitCount(i);
int[] count = new int[bitCount];
int value = i;
int index = 0;
for (int bit = 0; bit < length; bit++) {
int bitValue = 1 << bit;
if ((value & bitValue) == bitValue) {
count[index++] = bit;
}
}
result.add(count);
}
return result;
}
This approach is essentially the same as @rolfl's, but his way of using streams totally beats this. (I think, until someone benchmarks)
Either way, using a Queue
and a ListWithIndex
inner class is a bit (ha-ha) overkill.
Back to the topic of...
My approach did cause a break in your unit tests though, which leads us back to the topic Choice of datatypes and Don't be dependent on the implementation of toString
. When I used Set<Set<Integer>>
:
org.junit.ComparisonFailure:
Expected :[[], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
Actual :[[], [0], [1], [0, 1], [2], [0, 2], [1, 2], [0, 1, 2]]
-
\$\begingroup\$ In practice,
assertEquals
on the string representation of arrays produces more readable failure messages in IntelliJ (maybe Eclipse too, I don't remember). I agree with the other points, nice answer, thanks! \$\endgroup\$janos– janos2015年03月02日 22:42:02 +00:00Commented Mar 2, 2015 at 22:42 -
\$\begingroup\$ @janos
assertEquals
callstoString
on the objects that is passed to it. So even if you pass a list or a set, you still get readable results. My test did break one of your unit tests, see edit. \$\endgroup\$Simon Forsberg– Simon Forsberg2015年03月02日 22:54:32 +00:00Commented Mar 2, 2015 at 22:54
Explore related questions
See similar questions with these tags.