12
\$\begingroup\$

The other day I was intrigued by this question (originally from here):

Write a program that answers YES/NO search queries containing * placeholders. Example: if the data you have is (hazem, ahmed, moustafa, fizo), then you should answer as follows for:

ahmed: YES 
m**stafa: YES 
fizoo: NO
fizd: NO
*****: YES
**: NO

Your program should be able to answer each search query in \$O(1)\$.

I came up with this implementation:

class MatcherWithPlaceholders {
 protected final Set<String> index = new HashSet<>();
 public MatcherWithPlaceholders(List<String> words) {
 words.forEach(this::addPermutations);
 }
 private final void addPermutations(String word) {
 char[] letters = word.toCharArray();
 for (List<Integer> starPositions : enumerateStarPositions(word.length())) {
 String wordWithStars = getWordWithStars(letters, starPositions);
 index.add(wordWithStars);
 }
 }
 private static class ListWithIndex {
 final List<Integer> list;
 final int index;
 private ListWithIndex(List<Integer> list, int index) {
 this.list = list;
 this.index = index;
 }
 }
 protected List<List<Integer>> enumerateStarPositions(int length) {
 List<List<Integer>> results = new LinkedList<>();
 results.add(Arrays.asList());
 Queue<ListWithIndex> canGrow = new LinkedList<>();
 canGrow.add(new ListWithIndex(new LinkedList<>(), 0));
 while (!canGrow.isEmpty()) {
 ListWithIndex listWithIndex = canGrow.poll();
 for (int i = listWithIndex.index; i < length - 1; ++i) {
 List<Integer> copy = new LinkedList<>(listWithIndex.list);
 copy.add(i);
 results.add(copy);
 canGrow.add(new ListWithIndex(copy, i + 1));
 }
 List<Integer> copy = new LinkedList<>(listWithIndex.list);
 copy.add(length - 1);
 results.add(copy);
 }
 return results;
 }
 private String getWordWithStars(char[] letters, List<Integer> starPositions) {
 char[] copy = letters.clone();
 for (int pos : starPositions) {
 copy[pos] = '*';
 }
 return new String(copy);
 }
 public boolean hasMatch(String word) {
 return index.contains(word);
 }
}

Unit tests

public class MatcherWithPlaceholdersTest {
 private MatcherWithPlaceholders matcher =
 new MatcherWithPlaceholders(Arrays.asList("hazem", "ahmed", "moustafa", "fizo"));
 private boolean hasMatch(String query) {
 return matcher.hasMatch(query);
 }
 @Test
 public void test_enumerateStarPos_1() {
 assertEquals("[[], [0]]", matcher.enumerateStarPositions(1).toString());
 }
 @Test
 public void test_enumerateStarPos_2() {
 assertEquals("[[], [0], [1], [0, 1]]",
 matcher.enumerateStarPositions(2).toString());
 }
 @Test
 public void test_enumerateStarPos_3() {
 assertEquals("[[], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]",
 matcher.enumerateStarPositions(3).toString());
 }
 @Test
 public void test_ahmed() {
 assertTrue(hasMatch("ahmed"));
 }
 @Test
 public void test_m00stafa() {
 assertTrue(hasMatch("m**stafa"));
 }
 @Test
 public void test_00() {
 assertFalse(hasMatch("**"));
 }
 @Test
 public void test_0000() {
 assertTrue(hasMatch("****"));
 }
 @Test
 public void test_00000() {
 assertTrue(hasMatch("*****"));
 }
 @Test
 public void test_000000000000000() {
 assertFalse(hasMatch("***************"));
 }
 @Test
 public void test_fizoo() {
 assertFalse(hasMatch("fizoo"));
 }
 @Test
 public void test_fizd() {
 assertFalse(hasMatch("fizd"));
 }
}

The implementation works fine, unit tests passing, and it's \$O(1)\$ if you don't count the hashing of the input string to perform the lookup. Of course space is sacrificed here for speed.

I'm wondering if it's possible to improve the implementation. I find the permutation part a bit awkward, and have the feeling there's a cleaner, more elegant way to do that.

I'm also interested in other improvement ideas as well, including the main algorithm / approach.

asked Mar 2, 2015 at 20:58
\$\endgroup\$
0

2 Answers 2

8
\$\begingroup\$

I like the solution you use by filling a set with all possible matches. This does indeed make the lookup a \$O(1)\$ operation. The naming, style, and presentation is all fine.

I have a suggestion about the algorithm for computing the permutations of the starred words. When dealing with problems like this it is often most convenient to use bitwise manipulations.This will eliminate the cumbersome storage of the List<List<Integer>>. To explain it better, consider the word fizo. This has 4 letters:

fizo
0000 ****
0001 ***o
0010 **z*
0011 **zo
0100 *i**
....
1111 fizo

We can count from 0 to 15 and use the bits in each value to mask out the letters and replace them with the asterisks.

In code, this would look like:

private final void addPermutations(String word) {
 int permutations = 1 << word.length();
 IntStream.range(0,permutations).forEach(mask -> index.add(maskChars(mask, word)));
}
private final String maskChars(int mask, String word) {
 char[] clone = word.toCharArray();
 for (int i = 0; i < clone.length; i++) {
 if ((mask & (1 << i)) == 0) {
 clone[i] = '*';
 }
 }
 return new String(clone);
}

In the event that you have word of more than 31 characters, you can swap to using long values. In the event that longs are not enough, use a bitset.... but you probably wold not have enough memory for that anyway ;-)

answered Mar 2, 2015 at 22:10
\$\endgroup\$
1
  • \$\begingroup\$ Yeay, that's pretty clever indeed! \$\endgroup\$ Commented Mar 2, 2015 at 22:33
3
\$\begingroup\$

Choice of datatypes

public MatcherWithPlaceholders(List<String> words)

Does it really matter that the words comes in a list? How about a Collection? Or, actually, how about Iterable? Iterable is after all what defines the forEach method that you are using.

protected List<List<Integer>> enumerateStarPositions(int length)

Does the order of the result matter? Only for your unit tests. Change the tests! Don't be dependent on the implementation of toString. Use assertArrayEquals instead, or assertEquals(Collection, Collection).

In this case, I would return Set<Set<Integer>>. No order of the integers really matters, because the integers themselves are the order (inside the String). It is important to know that there are no duplicates, so a Set is perfect.

In fact, the inner Set<Integer> might be better of as an array instead. So Set<int[]>

Permutations

@rolfl beat me to it on this one, but here's what I would do:

protected Set<int[]> enumerateStarPositions(int length) {
 int highest = 1 << length;
 Set<int[]> result = new HashSet<>();
 for (int i = 0; i < highest; i++) {
 int bitCount = Integer.bitCount(i);
 int[] count = new int[bitCount];
 int value = i;
 int index = 0;
 for (int bit = 0; bit < length; bit++) {
 int bitValue = 1 << bit;
 if ((value & bitValue) == bitValue) {
 count[index++] = bit;
 }
 }
 result.add(count);
 }
 return result;
}

This approach is essentially the same as @rolfl's, but his way of using streams totally beats this. (I think, until someone benchmarks)

Either way, using a Queue and a ListWithIndex inner class is a bit (ha-ha) overkill.

Back to the topic of...

My approach did cause a break in your unit tests though, which leads us back to the topic Choice of datatypes and Don't be dependent on the implementation of toString. When I used Set<Set<Integer>>:

org.junit.ComparisonFailure: 
Expected :[[], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
Actual :[[], [0], [1], [0, 1], [2], [0, 2], [1, 2], [0, 1, 2]]
answered Mar 2, 2015 at 22:32
\$\endgroup\$
2
  • \$\begingroup\$ In practice, assertEquals on the string representation of arrays produces more readable failure messages in IntelliJ (maybe Eclipse too, I don't remember). I agree with the other points, nice answer, thanks! \$\endgroup\$ Commented Mar 2, 2015 at 22:42
  • \$\begingroup\$ @janos assertEquals calls toString on the objects that is passed to it. So even if you pass a list or a set, you still get readable results. My test did break one of your unit tests, see edit. \$\endgroup\$ Commented Mar 2, 2015 at 22:54

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.