\$\begingroup\$
\$\endgroup\$
1
I am trying to implement Boyer Moore Algorithm for text searching and have come up with this implementation:
public class BoyerMooreStringSearching
{
readonly Dictionary<char, LastIndexTable> _lastIndexTable = new Dictionary<char, LastIndexTable>();
public string PatternToSearch;
public List<int> GetStartingIndexsOfPatternInText(string textToSearchIn, string patternToSearch)
{
var list = new List<int>();
PatternToSearch = patternToSearch;
if (patternToSearch != null && !string.IsNullOrEmpty(textToSearchIn))
{
// Update Last Index Table
UpdateLastIndexTable(patternToSearch);
PatternToSearch = patternToSearch;
var j = patternToSearch.Length - 1;
// Main loop to iterate over whole text
while (j <= textToSearchIn.Length - 1)
{
var lastCharOfPattern = patternToSearch[patternToSearch.Length - 1];
if (textToSearchIn[j] != lastCharOfPattern)
{
// Heuristic 1
// If Last Char is not matched with the Last char in pattern and char is not present in the pattern
// Then advance pointer 'j' to the length of the pattern in textToSearch.
if (!_lastIndexTable.ContainsKey(textToSearchIn[j]))
{
j += patternToSearch.Length - 1;
}
// Heuristic 2
// Consult the lastIndex table to get the last index of current char in textToSearch
// and advance pointer 'j' to the last index in textToSearch.
if (j <= textToSearchIn.Length - 1 && _lastIndexTable.ContainsKey(textToSearchIn[j]))
{
var tempObj = _lastIndexTable[textToSearchIn[j]];
if (tempObj != null) j += tempObj.LastIndex;
}
}
int k = patternToSearch.Length - 1;
int u = j;
if (j <= textToSearchIn.Length - 1)
{
while (k >= 0)
{
// Heuristic (3a)
// If Last Char is matched with the Last char in pattern then back track in the text and pattern till
// either you got a complete match or a mismatched charecter.
// Once you got the mismatched char and mismatched char is not present in the pattern then
// advance j to the index of mismatched charecter in the pattern
if (textToSearchIn[u] == patternToSearch[k])
{
if (k == 0 && textToSearchIn[u] == patternToSearch[k])
{
list.Add(u);
j += patternToSearch.Length - 1;
}
u--;
k--;
continue;
}
if (!_lastIndexTable.ContainsKey(textToSearchIn[u]))
{
// Heuristic (3b)
// If Last Char is matched with the Last char in pattern then back track in the text till
// either you got a complete match or a mismatched charecter.
// Once you got the mismatched char and mismatched char is not present in the pattern then
// advance j to the index of mismatched charecter in the pattern plus the number to char which matched.
j += k + (j - u);
break;
}
k--;
}
}
j++;
}
}
if (!list.Any())
list.Add(-1);
return list;
}
private void UpdateLastIndexTable(string patternToSearch)
{
_lastIndexTable.Clear();
var i = patternToSearch.Length - 1;
foreach (var charToSeach in patternToSearch)
{
if (_lastIndexTable.ContainsKey(charToSeach))
{
_lastIndexTable[charToSeach].LastIndex = i;
}
else
{
_lastIndexTable.Add(charToSeach, new LastIndexTable
{
CharSearched = charToSeach,
LastIndex = i
});
}
i--;
}
}
}
public class LastIndexTable
{
public char CharSearched { get; set; }
public int LastIndex { get; set; }
}
Please can you review and provide feedback?
Trevor Pilley
2,0812 gold badges17 silver badges29 bronze badges
1 Answer 1
\$\begingroup\$
\$\endgroup\$
I have checked your code and there are some things which I would change:
- Create an interface, say,
IStringSearcher
with one methodList<int> GetStartingIndexsOfPatternInText(string textToSearchIn, string patternToSearch)
. That will help you to change string searching algorithm implementation without client code changes. GetStartingIndexsOfPatternInText
method name seems to be a bit long and personally I would change the name.- I would either make
BoyerMooreStringSearching
class as read-only or utility class. That change will be very beneficial if you think about multi-thread applications.- read-only -
text to search
andpattern to search
would be provided in the class constructor and could not be changed - OR utility class. In that case
BoyerMooreStringSearching
class andGetStartingIndexsOfPatternInText
method would be static.
- read-only -
- I would throw some exception if provided strings do not meet your requirements (you should document your requirements in the class documentation) e.g.
- if one of the strings is null or empty or
- if
pattern string
is longer thantext to search
PatternToSearch = patternToSearch;
statement is used twice in your code: one before and another after the firstif
statement.public string PatternToSearch;
I do not think it needs to be there as you do not read it.- I see that you use lots of variables like
k
,u
,j
etc. I think you should use more meaningful names. After couple of months the code will be more readable for you when you forgot what you had written.
I hope that will help.
lang-cs
v < element.Length
, which is mostly to help prevent off-by-one (fencepost) errors. Dijkstra has some notes on this. \$\endgroup\$