Commit 0fd737e

github-actions authored and committed
Formatted with Google Java Formatter
1 parent 6cb034b commit 0fd737e

1 file changed, +155 -145 lines changed

strings/HorspoolSearch.java

Lines changed: 155 additions & 145 deletions
@@ -3,160 +3,170 @@
  import java.util.HashMap;

  /**
- * This class is not thread safe<br><br>
- * (From wikipedia)
- * In computer science, the Boyer–Moore–Horspool algorithm or Horspool's algorithm is an algorithm for finding
- * substrings in strings. It was published by Nigel Horspool in 1980. <br>
- * <a href=https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm>Wikipedia page</a><br><br>
- * <p>
- * An explanation:<br>
- * <p>
- * The Horspool algorithm is a simplification of the Boyer-Moore algorithm in that it uses only one of the two heuristic
- * methods for increasing the number of characters shifted when finding a bad match in the text. This method is usually
- * called the "bad symbol" or "bad character" shift. The bad symbol shift method is classified as an input enhancement
- * method in the theory of algorithms. Input enhancement is (from wikipedia) the principle that processing a given input
- * to a problem and altering it in a specific way will increase runtime efficiency or space efficiency, or both. Both
- * algorithms try to match the pattern and text comparing the pattern symbols to the text's from right to left.<br><br>
- * <p>
- * In the bad symbol shift method, a table is created prior to the search, called the "bad symbol table". The bad symbol
- * table contains the shift values for any symbol in the text and pattern. For these symbols, the value is the length of
- * the pattern, if the symbol is not in the first (length - 1) of the pattern. Else it is the distance from its
- * rightmost occurrence in the pattern to the last symbol of the pattern. In practice, we only calculate the values for
- * the ones that exist in the first (length - 1) of the pattern.<br><br>
- * <p>
- * For more details on the algorithm and the more advanced Boyer-Moore I recommend checking out the wikipedia page and
- * professor Anany Levitin's book: Introduction To The Design And Analysis Of Algorithms.
- * </p>
+ * This class is not thread safe<br>
+ * <br>
+ * (From wikipedia) In computer science, the Boyer–Moore–Horspool algorithm or Horspool's algorithm
+ * is an algorithm for finding substrings in strings. It was published by Nigel Horspool in 1980.
+ * <br>
+ * <a href=https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm>Wikipedia
+ * page</a><br>
+ * <br>
+ *
+ * <p>An explanation:<br>
+ *
+ * <p>The Horspool algorithm is a simplification of the Boyer-Moore algorithm in that it uses only
+ * one of the two heuristic methods for increasing the number of characters shifted when finding a
+ * bad match in the text. This method is usually called the "bad symbol" or "bad character" shift.
+ * The bad symbol shift method is classified as an input enhancement method in the theory of
+ * algorithms. Input enhancement is (from wikipedia) the principle that processing a given input to
+ * a problem and altering it in a specific way will increase runtime efficiency or space efficiency,
+ * or both. Both algorithms try to match the pattern and text comparing the pattern symbols to the
+ * text's from right to left.<br>
+ * <br>
+ *
+ * <p>In the bad symbol shift method, a table is created prior to the search, called the "bad symbol
+ * table". The bad symbol table contains the shift values for any symbol in the text and pattern.
+ * For these symbols, the value is the length of the pattern, if the symbol is not in the first
+ * (length - 1) of the pattern. Else it is the distance from its rightmost occurrence in the pattern
+ * to the last symbol of the pattern. In practice, we only calculate the values for the ones that
+ * exist in the first (length - 1) of the pattern.<br>
+ * <br>
+ *
+ * <p>For more details on the algorithm and the more advanced Boyer-Moore I recommend checking out
+ * the wikipedia page and professor Anany Levitin's book: Introduction To The Design And Analysis Of
+ * Algorithms.
  */
  public class HorspoolSearch {

- private static HashMap<Character, Integer> shiftValues; // bad symbol table
- private static Integer patternLength;
- private static int comparisons = 0; // total comparisons in the current/last search
-
- /**
- * Case sensitive version version of the algorithm
- *
- * @param pattern the pattern to be searched for (needle)
- * @param text the text being searched in (haystack)
- * @return -1 if not found or first index of the pattern in the text
- */
- public static int findFirst(String pattern, String text) {
- return firstOccurrence(pattern, text, true);
- }
-
- /**
- * Case insensitive version version of the algorithm
- *
- * @param pattern the pattern to be searched for (needle)
- * @param text the text being searched in (haystack)
- * @return -1 if not found or first index of the pattern in the text
- */
- public static int findFirstInsensitive(String pattern, String text) {
- return firstOccurrence(pattern, text, false);
- }
-
- /**
- * Utility method that returns comparisons made by last run (mainly for tests)
- *
- * @return number of character comparisons of the last search
- */
- public static Integer getLastComparisons() {
- return HorspoolSearch.comparisons;
- }
-
- /**
- * Fairly standard implementation of the Horspool algorithm. Only the index of the last character of the pattern on the
- * text is saved and shifted by the appropriate amount when a mismatch is found. The algorithm stops at the first
- * match or when the entire text has been exhausted.
- *
- * @param pattern String to be matched in the text
- * @param text text String
- * @return index of first occurrence of the pattern in the text
- */
- private static int firstOccurrence(String pattern, String text, boolean caseSensitive) {
- shiftValues = calcShiftValues(pattern); // build the bad symbol table
- comparisons = 0; // reset comparisons
-
- int textIndex = pattern.length() - 1; // align pattern with text start and get index of the last character
-
- // while pattern is not out of text bounds
- while (textIndex < text.length()) {
-
- // try to match pattern with current part of the text starting from last character
- int i = pattern.length() - 1;
- while (i >= 0) {
- comparisons++;
- char patternChar = pattern.charAt(i);
- char textChar = text.charAt(
- (textIndex + i) - (pattern.length() - 1)
- );
- if (!charEquals(patternChar, textChar, caseSensitive)) { // bad character, shift pattern
- textIndex += getShiftValue(text.charAt(textIndex));
- break;
- }
- i--;
- }
-
- // check for full match
- if (i == -1) {
- return textIndex - pattern.length() + 1;
- }
+ private static HashMap<Character, Integer> shiftValues; // bad symbol table
+ private static Integer patternLength;
+ private static int comparisons = 0; // total comparisons in the current/last search
+
+ /**
+ * Case sensitive version version of the algorithm
+ *
+ * @param pattern the pattern to be searched for (needle)
+ * @param text the text being searched in (haystack)
+ * @return -1 if not found or first index of the pattern in the text
+ */
+ public static int findFirst(String pattern, String text) {
+ return firstOccurrence(pattern, text, true);
+ }
+
+ /**
+ * Case insensitive version version of the algorithm
+ *
+ * @param pattern the pattern to be searched for (needle)
+ * @param text the text being searched in (haystack)
+ * @return -1 if not found or first index of the pattern in the text
+ */
+ public static int findFirstInsensitive(String pattern, String text) {
+ return firstOccurrence(pattern, text, false);
+ }
+
+ /**
+ * Utility method that returns comparisons made by last run (mainly for tests)
+ *
+ * @return number of character comparisons of the last search
+ */
+ public static Integer getLastComparisons() {
+ return HorspoolSearch.comparisons;
+ }
+
+ /**
+ * Fairly standard implementation of the Horspool algorithm. Only the index of the last character
+ * of the pattern on the text is saved and shifted by the appropriate amount when a mismatch is
+ * found. The algorithm stops at the first match or when the entire text has been exhausted.
+ *
+ * @param pattern String to be matched in the text
+ * @param text text String
+ * @return index of first occurrence of the pattern in the text
+ */
+ private static int firstOccurrence(String pattern, String text, boolean caseSensitive) {
+ shiftValues = calcShiftValues(pattern); // build the bad symbol table
+ comparisons = 0; // reset comparisons
+
+ int textIndex =
+ pattern.length() - 1; // align pattern with text start and get index of the last character
+
+ // while pattern is not out of text bounds
+ while (textIndex < text.length()) {
+
+ // try to match pattern with current part of the text starting from last character
+ int i = pattern.length() - 1;
+ while (i >= 0) {
+ comparisons++;
+ char patternChar = pattern.charAt(i);
+ char textChar = text.charAt((textIndex + i) - (pattern.length() - 1));
+ if (!charEquals(patternChar, textChar, caseSensitive)) { // bad character, shift pattern
+ textIndex += getShiftValue(text.charAt(textIndex));
+ break;
  }
+ i--;
+ }

- // text exhausted, return failure
- return -1;
+ // check for full match
+ if (i == -1) {
+ return textIndex - pattern.length() + 1;
+ }
  }

- /**
- * Compares the argument characters
- *
- * @param c1 first character
- * @param c2 second character
- * @param caseSensitive boolean determining case sensitivity of comparison
- * @return truth value of the equality comparison
- */
- private static boolean charEquals(char c1, char c2, boolean caseSensitive) {
- if (caseSensitive) {
- return c1 == c2;
- }
- return Character.toLowerCase(c1) == Character.toLowerCase(c2);
+ // text exhausted, return failure
+ return -1;
+ }
+
+ /**
+ * Compares the argument characters
+ *
+ * @param c1 first character
+ * @param c2 second character
+ * @param caseSensitive boolean determining case sensitivity of comparison
+ * @return truth value of the equality comparison
+ */
+ private static boolean charEquals(char c1, char c2, boolean caseSensitive) {
+ if (caseSensitive) {
+ return c1 == c2;
  }
-
- /**
- * Builds the bad symbol table required to run the algorithm. The method starts from the second to last character
- * of the pattern and moves to the left. When it meets a new character, it is by definition its rightmost occurrence
- * and therefore puts the distance from the current index to the index of the last character into the table. If the
- * character is already in the table, then it is not a rightmost occurrence, so it continues.
- *
- * @param pattern basis for the bad symbol table
- * @return the bad symbol table
- */
- private static HashMap<Character, Integer> calcShiftValues(String pattern) {
- patternLength = pattern.length();
- HashMap<Character, Integer> table = new HashMap<>();
-
- for (int i = pattern.length() - 2; i >= 0; i--) { // length - 2 is the index of the second to last character
- char c = pattern.charAt(i);
- int finalI = i;
- table.computeIfAbsent(c, k -> pattern.length() - 1 - finalI);
- }
-
- return table;
+ return Character.toLowerCase(c1) == Character.toLowerCase(c2);
+ }
+
+ /**
+ * Builds the bad symbol table required to run the algorithm. The method starts from the second to
+ * last character of the pattern and moves to the left. When it meets a new character, it is by
+ * definition its rightmost occurrence and therefore puts the distance from the current index to
+ * the index of the last character into the table. If the character is already in the table, then
+ * it is not a rightmost occurrence, so it continues.
+ *
+ * @param pattern basis for the bad symbol table
+ * @return the bad symbol table
+ */
+ private static HashMap<Character, Integer> calcShiftValues(String pattern) {
+ patternLength = pattern.length();
+ HashMap<Character, Integer> table = new HashMap<>();
+
+ for (int i = pattern.length() - 2;
+ i >= 0;
+ i--) { // length - 2 is the index of the second to last character
+ char c = pattern.charAt(i);
+ int finalI = i;
+ table.computeIfAbsent(c, k -> pattern.length() - 1 - finalI);
  }

- /**
- * Helper function that uses the bad symbol shift table to return the appropriate shift value for a given character
- *
- * @param c character
- * @return shift value that corresponds to the character argument
- */
- private static Integer getShiftValue(char c) {
- if (shiftValues.get(c) != null) {
- return shiftValues.get(c);
- } else {
- return patternLength;
- }
+ return table;
+ }
+
+ /**
+ * Helper function that uses the bad symbol shift table to return the appropriate shift value for
+ * a given character
+ *
+ * @param c character
+ * @return shift value that corresponds to the character argument
+ */
+ private static Integer getShiftValue(char c) {
+ if (shiftValues.get(c) != null) {
+ return shiftValues.get(c);
+ } else {
+ return patternLength;
  }
-
+ }
  }
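
The class Javadoc and the `calcShiftValues` comment describe how the bad symbol table is built. Below is a minimal standalone sketch of that table; the class name `BadSymbolTableSketch` and its methods `buildTable` and `shift` are hypothetical, not part of the repository. It scans the first (length - 1) pattern characters left to right and lets later occurrences overwrite earlier ones, which should yield the same rightmost-occurrence distances as the right-to-left `computeIfAbsent` loop in `calcShiftValues`. Characters missing from the table fall back to the pattern length, as `getShiftValue` does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical standalone sketch of the Horspool bad symbol (shift) table. */
public class BadSymbolTableSketch {

  /**
   * Maps every character in pattern[0 .. m-2] to the distance from its rightmost occurrence in that
   * range to the pattern's last index (m - 1).
   */
  public static Map<Character, Integer> buildTable(String pattern) {
    int m = pattern.length();
    Map<Character, Integer> table = new HashMap<>();
    // Left-to-right scan: a later (further right) occurrence overwrites an earlier one,
    // so each entry ends up holding the distance of the rightmost occurrence.
    for (int i = 0; i < m - 1; i++) {
      table.put(pattern.charAt(i), m - 1 - i);
    }
    return table;
  }

  /** Characters absent from the table (including one that only appears last) shift by m. */
  public static int shift(Map<Character, Integer> table, char c, int patternLength) {
    return table.getOrDefault(c, patternLength);
  }

  public static void main(String[] args) {
    String pattern = "barber";
    Map<Character, Integer> table = buildTable(pattern);
    int m = pattern.length();
    System.out.println(shift(table, 'b', m)); // 2
    System.out.println(shift(table, 'a', m)); // 4
    System.out.println(shift(table, 'r', m)); // 3
    System.out.println(shift(table, 'e', m)); // 1
    System.out.println(shift(table, 'x', m)); // 6
  }
}
```

For "barber", the final 'r' is excluded from the scan, so 'r' gets the distance of its earlier occurrence (3), and any character not in "barbe" shifts by the full length 6.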

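The `firstOccurrence` Javadoc describes the search loop: align the pattern's last character with a text index, compare right to left, and on a mismatch shift by the table value of the text character under the pattern's last position. A minimal self-contained sketch of that loop follows; `HorspoolSketch`, `indexOf`, and `norm` are hypothetical names. In this sketch, case-insensitive matching lowercases characters both when building the table and when looking up a shift, so the shift lookup uses the same notion of equality as the comparison. Returning 0 for an empty pattern mirrors `String.indexOf` and is an assumption, since the original does not specify that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical standalone sketch of Horspool search with optional case-insensitive matching. */
public class HorspoolSketch {

  // Per-character normalization; String.toLowerCase is avoided because it can change length.
  private static char norm(char c, boolean caseSensitive) {
    return caseSensitive ? c : Character.toLowerCase(c);
  }

  /** Returns the index of the first occurrence of pattern in text, or -1 if there is none. */
  public static int indexOf(String pattern, String text, boolean caseSensitive) {
    int m = pattern.length();
    int n = text.length();
    if (m == 0) {
      return 0; // assumed convention, same as String.indexOf("")
    }

    // Bad symbol table over the normalized characters of pattern[0 .. m-2].
    Map<Character, Integer> shift = new HashMap<>();
    for (int i = 0; i < m - 1; i++) {
      shift.put(norm(pattern.charAt(i), caseSensitive), m - 1 - i);
    }

    int end = m - 1; // text index currently aligned with the pattern's last character
    while (end < n) {
      // Compare right to left; k counts how many characters have matched so far.
      int k = 0;
      while (k < m
          && norm(pattern.charAt(m - 1 - k), caseSensitive)
              == norm(text.charAt(end - k), caseSensitive)) {
        k++;
      }
      if (k == m) {
        return end - m + 1; // full match: index of the pattern's first character
      }
      // Shift by the table entry of the text character under the pattern's last position,
      // regardless of where the mismatch happened.
      end += shift.getOrDefault(norm(text.charAt(end), caseSensitive), m);
    }
    return -1; // text exhausted
  }

  public static void main(String[] args) {
    String text = "jim saw me in a barbershop";
    System.out.println(indexOf("barber", text, true)); // 16
    System.out.println(indexOf("BARBER", text, true)); // -1
    System.out.println(indexOf("BARBER", text, false)); // 16
  }
}
```

Because the table is a local variable rather than a static field, this sketch keeps no shared mutable state between calls.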