Fastest string search algorithm

Question 1

I have invented the fastest string search algorithm. Can someone prove that it is not fastest?

I am not considering worst case scenarios.

The algorithm can be found here: https://sourceforge.net/projects/fastest-string-search-algo/files/


 /*
 * Choudhary algorithm
 */
 public static int choudharyStringSearchAlgorithm(char[] text, char[] pattern) {
 int i = 0;
 int index = 0;
 int end_index = 0;
 boolean not_found = false;
 int text_len = text.length;
 int pattern_len = pattern.length;
 int pi_44 = pattern_len - 1;
 int pi_34 = (3 * pattern_len) / 4;
 int pi_24 = pattern_len / 2;
 int pi_14 = pattern_len / 4;
 int[] skip_table = new int[ASIZE];
 // preprocess pattern and fill skip_table
 for (i = 0; i < pattern_len; i++) {
 skip_table[pattern[i]] = pattern_len - 1 - i;
 }
 // now search
 for (i = 0; i < text_len; i++) {
 if ((text_len - i) < pattern_len) {
 return -1;
 }
 if (pattern[pi_44] != text[i + pi_44]) {
 if (skip_table[(int) (text[i + pi_44])] > 0) {
 i = i + skip_table[(int) (text[i + pi_44])] - 1;
 }
 continue;
 }
 if (pattern[0] != text[i]) {
 continue;
 }
 if (pattern[pi_24] != text[i + pi_24]) {
 continue;
 }
 if (pattern[pi_34] != text[i + pi_34]) {
 continue;
 }
 if (pattern[pi_14] != text[i + pi_14]) {
 continue;
 }
 end_index = i + pi_44;
 not_found = false;
 for (index = i; index <= end_index; index++) {
 if (text[index] != pattern[index - i]) {
 not_found = true;
 break;
 }
 } // end of inner for loop
 if (not_found == false) { // match is found
 return i;
 }
 } // end of outer for loop
 return -1;
 } // end of choudharyStringSearchAlgorithm

Question 2

Looks like you've re-invented Boyer-Moore-Horspool string searching. At least in theory, Boyer-Moore string searching is a bit faster (but honestly, the difference is quite small). Even though it's been invented before, independently re-inventing BMH is no mean feat.

Question 3

This algorithm is not Boyer-Moore-Horspool. Please see the algorithm carefully. It compares characters at indices pattern_len - 1, 0, (3 * pattern_len) / 4, pattern_len / 2, and pattern_len / 4 before going for full pattern search.

Question 4

This [does not implement] Boyer-Moore-Horspool I'd call it Raita².

Question 5

If you are searching for long patterns, your algorithm wins most of the time (long strings as far to the end of the text as possible are favored), but if you compare the algorithms on patterns up to 100 characters long (human search), Boyer-Moore smokes everyone and it is not even close.

Edit: After some testing I have found out that your algorithm is worse on long patterns than if you just implemented the skip table.

Edit 2: Actually here is the implementation that is consistently faster than every other algorithm in your collection more than 95% of the time:

public class MyTextSearch implements TextSearch
{
 @Override
 public int search(
 char[] text,
 char[] pattern
 )
 {
 int[] skip_table = new int[256];
 int patternLength = pattern.length - 1;
 char lastPatternChar = pattern[patternLength];
 Arrays.fill(skip_table, patternLength);
 for (int i = 0; i < pattern.length; i++)
 {
 skip_table[pattern[i]] = patternLength - i - 1;
 }
 for (int i = 0; i < text.length; i++)
 {
 if (i + patternLength >= text.length)
 {
 return -1;
 }
 char lastTextChar = text[i + patternLength];
 if (lastPatternChar != lastTextChar)
 {
 i += skip_table[lastTextChar];
 continue;
 }
 if (isEqual(text, i, pattern))
 {
 return i;
 }
 }
 return -1;
 }
 private boolean isEqual(
 char[] text,
 int at,
 char[] pattern
 )
 {
 for (int j = 0; j < pattern.length; j++)
 {
 if (text[at + j] != pattern[j])
 {
 return false;
 }
 }
 return true;
 }
 @Override
 public String name()
 {
 return "My Text Search";
 }
}

I have refactored your code a bit and it is on https://github.com/blazmrak/fastest-string-search if you are interested.

Question 6

I have modified my algorithm. I tested and it comes out to be faster than my previous version and beats all other algorithms (Boyer Moore, etc.). The pattern for searching is taken from the end of string. You can find my new algorithm here: codereview.stackexchange.com/questions/270565/…

Question 7

Is it faster than my implementation? Because I don't see my implementation in the list there.

Question 8

I actually didn't include your implementation. I will do it when I get time.

Blaž Mrak Blaž Mrak 8734 silver badges7 bronze badges · Answer 1 · 2021-07-17 19:50:57Z

If you are searching for long patterns, your algorithm wins most of the time (long strings as far to the end of the text as possible are favored), but if you compare the algorithms on patterns up to 100 characters long (human search), Boyer-Moore smokes everyone and it is not even close.

Edit: After some testing I have found out that your algorithm is worse on long patterns than if you just implemented the skip table.

Edit 2: Actually here is the implementation that is consistently faster than every other algorithm in your collection more than 95% of the time:

public class MyTextSearch implements TextSearch
{
 @Override
 public int search(
 char[] text,
 char[] pattern
 )
 {
 int[] skip_table = new int[256];
 int patternLength = pattern.length - 1;
 char lastPatternChar = pattern[patternLength];
 Arrays.fill(skip_table, patternLength);
 for (int i = 0; i < pattern.length; i++)
 {
 skip_table[pattern[i]] = patternLength - i - 1;
 }
 for (int i = 0; i < text.length; i++)
 {
 if (i + patternLength >= text.length)
 {
 return -1;
 }
 char lastTextChar = text[i + patternLength];
 if (lastPatternChar != lastTextChar)
 {
 i += skip_table[lastTextChar];
 continue;
 }
 if (isEqual(text, i, pattern))
 {
 return i;
 }
 }
 return -1;
 }
 private boolean isEqual(
 char[] text,
 int at,
 char[] pattern
 )
 {
 for (int j = 0; j < pattern.length; j++)
 {
 if (text[at + j] != pattern[j])
 {
 return false;
 }
 }
 return true;
 }
 @Override
 public String name()
 {
 return "My Text Search";
 }
}

I have refactored your code a bit and it is on https://github.com/blazmrak/fastest-string-search if you are interested.

I have modified my algorithm. I tested and it comes out to be faster than my previous version and beats all other algorithms (Boyer Moore, etc.). The pattern for searching is taken from the end of string. You can find my new algorithm here: codereview.stackexchange.com/questions/270565/…
Is it faster than my implementation? Because I don't see my implementation in the list there.
I actually didn't include your implementation. I will do it when I get time.

Stack Exchange Network

Fastest string search algorithm

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Fastest string search algorithm

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions