I have invented the fastest string search algorithm. Can someone prove that it is not fastest?
I am not considering worst case scenarios.
The algorithm can be found here: https://sourceforge.net/projects/fastest-string-search-algo/files/
/*
* Choudhary algorithm
*/
public static int choudharyStringSearchAlgorithm(char[] text, char[] pattern) {
int i = 0;
int index = 0;
int end_index = 0;
boolean not_found = false;
int text_len = text.length;
int pattern_len = pattern.length;
int pi_44 = pattern_len - 1;
int pi_34 = (3 * pattern_len) / 4;
int pi_24 = pattern_len / 2;
int pi_14 = pattern_len / 4;
int[] skip_table = new int[ASIZE];
// preprocess pattern and fill skip_table
for (i = 0; i < pattern_len; i++) {
skip_table[pattern[i]] = pattern_len - 1 - i;
}
// now search
for (i = 0; i < text_len; i++) {
if ((text_len - i) < pattern_len) {
return -1;
}
if (pattern[pi_44] != text[i + pi_44]) {
if (skip_table[(int) (text[i + pi_44])] > 0) {
i = i + skip_table[(int) (text[i + pi_44])] - 1;
}
continue;
}
if (pattern[0] != text[i]) {
continue;
}
if (pattern[pi_24] != text[i + pi_24]) {
continue;
}
if (pattern[pi_34] != text[i + pi_34]) {
continue;
}
if (pattern[pi_14] != text[i + pi_14]) {
continue;
}
end_index = i + pi_44;
not_found = false;
for (index = i; index <= end_index; index++) {
if (text[index] != pattern[index - i]) {
not_found = true;
break;
}
} // end of inner for loop
if (not_found == false) { // match is found
return i;
}
} // end of outer for loop
return -1;
} // end of choudharyStringSearchAlgorithm
1 Answer 1
If you are searching for long patterns, your algorithm wins most of the time (long strings as far to the end of the text as possible are favored), but if you compare the algorithms on patterns up to 100 characters long (human search), Boyer-Moore smokes everyone and it is not even close.
Edit: After some testing I have found out that your algorithm is worse on long patterns than if you just implemented the skip table.
Edit 2: Actually here is the implementation that is consistently faster than every other algorithm in your collection more than 95% of the time:
public class MyTextSearch implements TextSearch
{
@Override
public int search(
char[] text,
char[] pattern
)
{
int[] skip_table = new int[256];
int patternLength = pattern.length - 1;
char lastPatternChar = pattern[patternLength];
Arrays.fill(skip_table, patternLength);
for (int i = 0; i < pattern.length; i++)
{
skip_table[pattern[i]] = patternLength - i - 1;
}
for (int i = 0; i < text.length; i++)
{
if (i + patternLength >= text.length)
{
return -1;
}
char lastTextChar = text[i + patternLength];
if (lastPatternChar != lastTextChar)
{
i += skip_table[lastTextChar];
continue;
}
if (isEqual(text, i, pattern))
{
return i;
}
}
return -1;
}
private boolean isEqual(
char[] text,
int at,
char[] pattern
)
{
for (int j = 0; j < pattern.length; j++)
{
if (text[at + j] != pattern[j])
{
return false;
}
}
return true;
}
@Override
public String name()
{
return "My Text Search";
}
}
I have refactored your code a bit and it is on https://github.com/blazmrak/fastest-string-search if you are interested.
-
\$\begingroup\$ I have modified my algorithm. I tested and it comes out to be faster than my previous version and beats all other algorithms (Boyer Moore, etc.). The pattern for searching is taken from the end of string. You can find my new algorithm here: codereview.stackexchange.com/questions/270565/… \$\endgroup\$user245050– user2450502021年12月07日 08:56:36 +00:00Commented Dec 7, 2021 at 8:56
-
\$\begingroup\$ Is it faster than my implementation? Because I don't see my implementation in the list there. \$\endgroup\$Blaž Mrak– Blaž Mrak2021年12月07日 09:57:01 +00:00Commented Dec 7, 2021 at 9:57
-
\$\begingroup\$ I actually didn't include your implementation. I will do it when I get time. \$\endgroup\$user245050– user2450502021年12月07日 11:48:04 +00:00Commented Dec 7, 2021 at 11:48
This [does not implement] Boyer-Moore-Horspool
I'd call it Raita². \$\endgroup\$