2
\$\begingroup\$

I am trying to implement Boyer Moore Algorithm for text searching and have come up with this implementation:

public class BoyerMooreStringSearching
{
 readonly Dictionary<char, LastIndexTable> _lastIndexTable = new Dictionary<char, LastIndexTable>();
 public string PatternToSearch;
 public List<int> GetStartingIndexsOfPatternInText(string textToSearchIn, string patternToSearch)
 {
 var list = new List<int>();
 PatternToSearch = patternToSearch;
 if (patternToSearch != null && !string.IsNullOrEmpty(textToSearchIn))
 {
 // Update Last Index Table
 UpdateLastIndexTable(patternToSearch);
 PatternToSearch = patternToSearch;
 var j = patternToSearch.Length - 1;
 // Main loop to iterate over whole text
 while (j <= textToSearchIn.Length - 1)
 {
 var lastCharOfPattern = patternToSearch[patternToSearch.Length - 1];
 if (textToSearchIn[j] != lastCharOfPattern)
 {
 // Heuristic 1 
 // If Last Char is not matched with the Last char in pattern and char is not present in the pattern
 // Then advance pointer 'j' to the length of the pattern in textToSearch.
 if (!_lastIndexTable.ContainsKey(textToSearchIn[j]))
 {
 j += patternToSearch.Length - 1;
 }
 // Heuristic 2 
 // Consult the lastIndex table to get the last index of current char in textToSearch
 // and advance pointer 'j' to the last index in textToSearch.
 if (j <= textToSearchIn.Length - 1 && _lastIndexTable.ContainsKey(textToSearchIn[j]))
 {
 var tempObj = _lastIndexTable[textToSearchIn[j]];
 if (tempObj != null) j += tempObj.LastIndex;
 }
 }
 int k = patternToSearch.Length - 1;
 int u = j;
 if (j <= textToSearchIn.Length - 1)
 {
 while (k >= 0)
 {
 // Heuristic (3a) 
 // If Last Char is matched with the Last char in pattern then back track in the text and pattern till 
 // either you got a complete match or a mismatched charecter.
 // Once you got the mismatched char and mismatched char is not present in the pattern then
 // advance j to the index of mismatched charecter in the pattern 
 if (textToSearchIn[u] == patternToSearch[k])
 {
 if (k == 0 && textToSearchIn[u] == patternToSearch[k])
 {
 list.Add(u);
 j += patternToSearch.Length - 1;
 }
 u--;
 k--;
 continue;
 }
 if (!_lastIndexTable.ContainsKey(textToSearchIn[u]))
 {
 // Heuristic (3b) 
 // If Last Char is matched with the Last char in pattern then back track in the text till 
 // either you got a complete match or a mismatched charecter.
 // Once you got the mismatched char and mismatched char is not present in the pattern then
 // advance j to the index of mismatched charecter in the pattern plus the number to char which matched.
 j += k + (j - u);
 break;
 }
 k--;
 }
 }
 j++;
 }
 }
 if (!list.Any())
 list.Add(-1);
 return list;
 }
 private void UpdateLastIndexTable(string patternToSearch)
 {
 _lastIndexTable.Clear();
 var i = patternToSearch.Length - 1;
 foreach (var charToSeach in patternToSearch)
 {
 if (_lastIndexTable.ContainsKey(charToSeach))
 {
 _lastIndexTable[charToSeach].LastIndex = i;
 }
 else
 {
 _lastIndexTable.Add(charToSeach, new LastIndexTable
 {
 CharSearched = charToSeach,
 LastIndex = i
 });
 }
 i--;
 }
 }
}
public class LastIndexTable
{
 public char CharSearched { get; set; }
 public int LastIndex { get; set; }
}

Please can you review and provide feedback?

Trevor Pilley
2,0812 gold badges17 silver badges29 bronze badges
asked Dec 28, 2012 at 7:03
\$\endgroup\$
1
  • 1
    \$\begingroup\$ The standard way to deal with indices is v < element.Length, which is mostly to help prevent off-by-one (fencepost) errors. Dijkstra has some notes on this. \$\endgroup\$ Commented Dec 28, 2012 at 22:02

1 Answer 1

4
\$\begingroup\$

I have checked your code and there are some things which I would change:

  • Create an interface, say, IStringSearcher with one method List<int> GetStartingIndexsOfPatternInText(string textToSearchIn, string patternToSearch). That will help you to change string searching algorithm implementation without client code changes.
  • GetStartingIndexsOfPatternInText method name seems to be a bit long and personally I would change the name.
  • I would either make BoyerMooreStringSearching class as read-only or utility class. That change will be very beneficial if you think about multi-thread applications.
    • read-only - text to search and pattern to search would be provided in the class constructor and could not be changed
    • OR utility class. In that case BoyerMooreStringSearching class and GetStartingIndexsOfPatternInText method would be static.
  • I would throw some exception if provided strings do not meet your requirements (you should document your requirements in the class documentation) e.g.
    • if one of the strings is null or empty or
    • if pattern string is longer than text to search
  • PatternToSearch = patternToSearch; statement is used twice in your code: one before and another after the first if statement.
  • public string PatternToSearch; I do not think it needs to be there as you do not read it.
  • I see that you use lots of variables like k, u, j etc. I think you should use more meaningful names. After couple of months the code will be more readable for you when you forgot what you had written.

I hope that will help.

answered Dec 28, 2012 at 20:14
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.