I have the following function and I was wondering if anyone could come up with something more efficient.
It just has to split both the content and the search string and determine how many words are required to be matched before being true.
For example,
"A dog can run.".ContainsV2("Red Dog Run", 4) = false
"A dog can run.".ContainsV2("Red Dog Run", 2) = true
"A dog can run.".ContainsV2("Red Dog Run") = true
"A dog can run.".ContainsV2("Red Dog Run", MatchAllWords: true) = false
/// <summary>
/// Checks to see if any of the words are contained in the string.
/// </summary>
/// <param name="content"></param>
/// <param name="SearchString">Search string</param>
/// <param name="NumberOfMatches">Specify how many words have to match in the search string.
/// This value is automatically dropped when the user searches less words.
/// I.e. NumberOfMatches = 3 and the user searches 'Bob', NumberOfMatches gets set to 1.</param>
/// <param name="MatchAllWords">If set to true, all of the words in the search string must be contained in what's being checked</param>
/// <returns></returns>
public static bool ContainsV2(this string content, string SearchString, int NumberOfMatches = 1, bool MatchAllWords = false)
{
var matches = 0;
var numberOfWords = SearchString.Split(' ').Length;
// Update required number of matches
NumberOfMatches = NumberOfMatches > numberOfWords ?
numberOfWords :
NumberOfMatches;
if (!string.IsNullOrEmpty(content) && !string.IsNullOrEmpty(SearchString))
{
SearchString.Split(' ')
.ToList()
.ForEach(w => //Gets the index of each word after being split
matches += content
.IndexOf(
StripNonAlphaNumeric(w),
StringComparison.OrdinalIgnoreCase
) >= 0 ? 1 : 0
);
}
if (MatchAllWords)
{
return matches == numberOfWords;
}
return matches >= NumberOfMatches;
}
2 Answers 2
A few remarks about your code:
Methods should have meaningful names. Your extension method looks like a newer (better?) version of the already existing Contains
method but your extension method does something else than the original. A better, meaningful name would be ContainsWords
.
The same can be said about your arguments. In my opinion SearchString
does not cover the meaning of the argument. wordsToMatch
is more appropriate. I renamed the other arguments as well as you can see in my code.
Argument names should be camel cased.
You extension method only accepts a string containing words to match. For convenient usage I'd add an overload that takes an collections of words.
Here's what I made of it:
using System;
using System.Collections.Generic;
using System.Linq;
public static class StringExtensions
{
public static bool ContainsWords(this String source, IEnumerable<String> wordsToMatch, int matchAtLeast = 1, bool matchAll = false)
{
if(wordsToMatch == null)
{
throw new ArgumentNullException();
}
var sourceWords = GetWords(source);
var matches = sourceWords.Intersect(wordsToMatch, StringComparer.CurrentCultureIgnoreCase);
if(matchAll || wordsToMatch.Count() < matchAtLeast)
{
return matches.Count() == sourceWords.Count();
}
return matches.Count() >= matchAtLeast;
}
public static bool ContainsWords(this String source, String wordsToMatch, int matchAtLeast = 1, bool matchAll = false)
{
if(wordsToMatch == null)
{
throw new ArgumentNullException();
}
return source.ContainsWords(GetWords(wordsToMatch), matchAtLeast, matchAll);
}
// Splits string in individual words.
private static IEnumerable<String> GetWords(String source)
{
return source.Split(new [] {' '}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim('.'));
// Optionally: trim other punctuation.
}
}
Update concerning efficiency
In your code you are splitting the search string twice. That's not very efficient. Next, you are using IndexOf
on the source string for every word in the search string. That's not very efficient either. I'm not very good at algorithm analysis but my guess is that splitting the source and the words to match and then applying an Intersect
on the resulting arrays is more efficient.
-
\$\begingroup\$ I like it, but the question was on efficiency. I'll accept it for readability though. \$\endgroup\$Blue Eyed Behemoth– Blue Eyed Behemoth2016年04月05日 20:35:50 +00:00Commented Apr 5, 2016 at 20:35
Here is my first review. See what you can make of it! I left comments for you. They always discuss the code right below.
It is bad style to leave documentation tags empty. Either have no param tag or fill all with appropriate descriptions:
/// <param name="content"></param>
/// <param name="SearchString">Search string</param>
/// <param name="NumberOfMatches">Specify how many words have to match in the search string.
Too much implementation detail in my opinion:
/// This value is automatically dropped when the user searches less words.
/// I.e. NumberOfMatches = 3 and the user searches 'Bob', NumberOfMatches gets set to 1.</param>
Self explanatory name:
/// <param name="MatchAllWords">If set to true, all of the words in the search string must be contained in what's being checked</param>
This is self explanatory, remove the line:
/// <returns></returns>
Use a more telling name like ContainsWords:
public static bool ContainsV2(this string content, string SearchString, int NumberOfMatches = 1, bool MatchAllWords = false)
{
var matches = 0;
var numberOfWords = SearchString.Split(' ').Length;
Don't use empty comments like "Sets a variable" or "Updates value". Explain it better or leave out the comment:
// Update required number of matches
The ?: operator is hard to understand and not necessary here. Rather, use something like in my comment:
// if (NumberOfMatches > numberOfWords)
// {
// NumberOfMatches = numberOfWords;
// }
NumberOfMatches = NumberOfMatches > numberOfWords ?
numberOfWords :
NumberOfMatches;
You already split the search string above. Either don't check for null or do it in the beginning:
if (!string.IsNullOrEmpty(content) && !string.IsNullOrEmpty(SearchString))
{
This is confusing. Reduce the complexity and nesting depth:
SearchString.Split(' ')
.ToList()
.ForEach(w => //Gets the index of each word after being split
matches += content
.IndexOf(
StripNonAlphaNumeric(w),
StringComparison.OrdinalIgnoreCase
) >= 0 ? 1 : 0
);
}
Rather set this at the start of the function. It is confusing here:
if (MatchAllWords)
{
return matches == numberOfWords;
}
return matches >= NumberOfMatches;
}
-
2\$\begingroup\$ Your remarks seem reasonable, but this is not a very readable way of commenting on the pasted code \$\endgroup\$Konrad Morawski– Konrad Morawski2016年04月05日 20:31:48 +00:00Commented Apr 5, 2016 at 20:31
-
1\$\begingroup\$ @vauhochzett in my company as well, but this is a Q&A site ;) He could paste it, but there's other readers who probably won't. Look around the forum and see that people usually don't format their answers like that here. \$\endgroup\$Konrad Morawski– Konrad Morawski2016年04月05日 21:55:40 +00:00Commented Apr 5, 2016 at 21:55
-
1\$\begingroup\$ @ThomasWeller "describe" being the operational word ;) Empty tag doesn't describe anything, and "Search string" doesn't do anything to describe
searchString
either. If that was the case, this would be nothing else but gaming an imperfect system to adhere to certain rules on the surface, while disregarding their actual purpose \$\endgroup\$Konrad Morawski– Konrad Morawski2016年04月05日 21:58:55 +00:00Commented Apr 5, 2016 at 21:58 -
1\$\begingroup\$ @KonradMorawski Thanks for the explanation. You are completely correct. I will update my post accordingly. \$\endgroup\$vauhochzett– vauhochzett2016年04月06日 11:22:09 +00:00Commented Apr 6, 2016 at 11:22
-
1\$\begingroup\$ Sure thing, and grab an upvote - your suggestions are on the nose \$\endgroup\$Konrad Morawski– Konrad Morawski2016年04月06日 12:01:13 +00:00Commented Apr 6, 2016 at 12:01
StripNonAlphaNumeric()
did. \$\endgroup\$