Given a suffix array for a word, check if a pattern (consecutive chars) exists.
Example:
A suffix array constructed for "ameya" should return true for "ame" but false for "foo" or false for "ameyp".
This algoritm is case sensitive.
I'm looking for code-review, best practices and optimizations.
public final class SuffixArrayPatternMatch {
private final String[] suffixArray;
public SuffixArrayPatternMatch(String[] suffixArray) {
this.suffixArray = suffixArray;
}
/**
* Returns 0, if the string starts with the pattern.
* Returns 1, if the string is greater than 1.
* Returns -1, if the string is smaller than pattern.
*
*
* @param str the string to be matched with the pattern.
* @param pattern the pattern to be searched in the string
* @return -1, 0, or 1.
*/
private int compare(String str, String pattern) {
int length = str.length() < pattern.length() ? str.length() : pattern.length();
for (int i = 0; i < length; i++) {
if (str.charAt(i) < pattern.charAt(i)) return -1;
if (str.charAt(i) > pattern.charAt(i)) return 1;
}
// string was greater than myself. thus return 0.
return str.length() < pattern.length() ? -1 : 0;
}
private boolean binarySearch(int lb, int hb, String pattern) {
if (lb > hb) {
return false;
}
int mid = (lb + hb) / 2;
int match = compare(suffixArray[mid], pattern);
if (match == 0) return true;
if (match > 0) {
return binarySearch(lb, mid -1, pattern);
} else {
return binarySearch(mid + 1, hb, pattern);
}
}
/**
* Checks the pattern if present in the string
*
* @param pattern the pattern to be searched in the string
* @return true if pattern is found else false.
*/
public boolean search (String pattern) {
return binarySearch(0, suffixArray.length - 1, pattern);
}
}
public class SuffixArrayPatternArrayTest {
@Test
public void testTrue() {
String[] suffix = {"a", "ameya", "eya", "meya", "ya"};
SuffixArrayPatternMatch sapm = new SuffixArrayPatternMatch(suffix);
for (int i = 0; i < suffix.length; i++) {
assertTrue(sapm.search(suffix[i]));
}
assertTrue(sapm.search("mey"));
assertTrue(sapm.search("ame"));
}
@Test
public void testFalse() {
String[] suffix = {"a", "ameya", "eya", "meya", "ya"};
SuffixArrayPatternMatch sapm = new SuffixArrayPatternMatch(suffix);
assertFalse(sapm.search("foo"));
assertFalse(sapm.search("yap"));
assertFalse(sapm.search("ameyap"));
}
}
2 Answers 2
Since you have a hidden implementation requirement that the suffix be sorted (i.e. binary search), you should sort the array before assigning it to the member variable.
Your compare
implementation differs from that of the String class in that it returns 0 if str
starts with pattern
. Why not code it that way?
private int compare(String str, String pattern) {
return str.startsWith(pattern) ? 0 : str.compareTo(pattern);
}
-
\$\begingroup\$ A suffix array is sorted by definition. Instead of sorting, I would throw an exception if the suffix array is not sorted. \$\endgroup\$mjolka– mjolka2014年07月03日 06:54:36 +00:00Commented Jul 3, 2014 at 6:54
-
\$\begingroup\$ @mjolka in the same boat as @rolfl, didn't realize this was an actual data structure. In light of such, an
exception
would be appropriate. \$\endgroup\$Origineil– Origineil2014年07月03日 18:23:38 +00:00Commented Jul 3, 2014 at 18:23
Right, talk about confusing. This code is written about 'suffix search', but all it does is prefix-search. Suffix happens at the end of the word, prefix happens at the beginning.
-
\$\begingroup\$ It's checking if the pattern is a prefix of any element in the suffix array.
p
is a substring ofs
if and only ifp
is a prefix of a suffix ofs
. \$\endgroup\$mjolka– mjolka2014年07月03日 07:13:03 +00:00Commented Jul 3, 2014 at 7:13 -
\$\begingroup\$ @mjolka - Oh, you're right. I have not heard of the concept of a "suffix array" before. I see it is a well-defined structure. This is a clear indication that the question does not contain enough detail about the problem being solved. I don't think I will delete my answer, it makes it clear that naming is a problem in the code. \$\endgroup\$rolfl– rolfl2014年07月03日 11:19:31 +00:00Commented Jul 3, 2014 at 11:19