Complex search program

Question 1

I am working on a fairly complex (at least it feels complex to me at the moment) search program and am looking to possibly increase the performance. Everything works exactly how I want it to, but I'm just wondering if there are any slight performance increases I could benefit from. In this code, even minor performance increases are essential. This is due to the extensive amount of operations being performed.

// Split search into multiple terms and check the {x} longest terms against the cache.
string[] word = searchTerm.Split(' ');
Array.Sort(word, (x, y) => y.Length.CompareTo(x.Length));
for (int i = 0; i < word.Length; i++)
{
 string searchValue = word[i];
 if (i <= MAX_WORD_ITERATIONS && (xmlsearchResults == BLANK_SEARCH_XML_SCHEMA || xmlsearchResults == string.Empty))
 {
 xmlsearchResults = GetCachedRecord(thisSearch, searchValue);
 }
 if (xmlsearchResults != BLANK_SEARCH_XML_SCHEMA && xmlsearchResults != string.Empty)
 {
 thisSearch.Value = searchValue;
 ignoreCache = true;
 break;
 }
}

xmlString.Append(string.Format(@"<{0}>", xmlHeader));
int objcount = 0;
string prevResultName = string.Empty;
try
{
 if (searchResults.hitCount > 0)
 {
 List<SearchItem> SearchResults = new List<SearchItem>();
 foreach (Node ResultNode in Results.Nodes)
 {
 string code = ResultNode.code.ToString();
 string result = ResultNode.id.ToString();
 string name = ResultNode.name_l.ToString();
 string image = string.Empty;
 if (!code.ToLower().Contains("ccb") && code.Length <= MAX_CODE_LENGTH &&
 name != prevResultName && objcount < maxResults)
 {
 Boolean addResult = true;
 if (thisSearch.FilterDescription)
 {
 if (!name.ToLower().Contains(thisSearch.Value.ToLower()))
 {
 addResult = false;
 }
 }
 if (addResult)
 {
 SearchItem oSearch = new SearchItem(name, code);
 SearchResults.Add(oSearch);
 prevResultName = name;
 }
 }
 }
 var SortedSearchResults = SearchResults.OrderByDescending(s => s.Downloads).ToList();
 for (int i = 0; i < SortedSearchResults.Count(); i++)
 {
 if ((objcount < maxResults))
 {
 string te = string.Format("<ResultName>{0}</ResultName>", SortedSearchResults[i].Description);
 if (!xmlString.ToString().Contains(te))
 {
 xmlString.Append(SortedSearchResults[i].GetXMLString());
 objcount += 1;
 }
 }
 }
 }
}
xmlString.Append(string.Format(@"</{0}>", xmlHeader));
return objcount;

Question 2

Your code is not complete. Can you please at least tell us the types of all the identifiers you use?

Question 3

Comparing strings case insensitively is done by using the proper overload:

string.Compare(str1, str2, StringComparison.OrdinalIgnoreCase)

By calling .ToLower() for each string to compare (4 times for each iteration), you create 4 new strings each time. If you have a lot of (long) strings, this will create a lot of garbage for no reason.

In your case you use .Contains() so you can mimic this by using .IndexOf():

str4.IndexOf("dyt2", StringComparison.OrdinalIgnoreCase) >= 0

It's prettier to use bool instead of Boolean, partly because it also keeps your code consistent.

Write your variable names in full: objectCount, previousResultName, etc.

Constants are PascalCase, not WHATEVER_THIS_IS as in Java.

The addResult variable seems a little wonky. Instead of this:

Boolean addResult = true;
if (thisSearch.FilterDescription)
{
 if (!name.ToLower().Contains(thisSearch.Value.ToLower()))
 {
 addResult = false;
 }
}
if (addResult)
{
 SearchItem oSearch = new SearchItem(name, code);
 SearchResults.Add(oSearch);
 prevResultName = name;
}

Why not this:

if (!(thisSearch.FilterDescription && !name.ToLower().Contains(thisSearch.Value.ToLower()))
{
 SearchItem oSearch = new SearchItem(name, code);
 SearchResults.Add(oSearch);
 prevResultName = name;
}

You can clean this up a bit more but I'm not in a boolean algebra mood.

Question 4

Okay thanks, so for the .ToLower().Contains(... it should just be string.Compare(code, "ccd", StringComparison.OrdinalIgnoreCase) instead?

Question 5

Oh, I didn't realize it was a .Contains call. In that case use .IndexOf() with the appropriate overload. I'll edit it.

Question 6

Okay, so I would actually want <, right? Because it's a Not operation.

Question 7

@Volearix: correct.

Question 8

THIS_IS_SNAKE_CASE! *Kicks you into well*

Question 9

One way to improve performance would be to not sort the whole list, since you're only concerned about the MAX_WORD_ITERATIONS longest items. Just do a partial sort.

At the very least, limit your for loop to only going MAX_WORD_ITERATIONS times. If it's not found by then, it won't be found.

It looks like the first if serves no purpose and can be eliminated.

Code segment 1

Array.PartialSort(word, (x, y) => y.Length.CompareTo(x.Length),MAX_WORD);
for (int i = 0; i <= MAX_WORD_ITERATIONS; i++)
{
 string searchValue = word[i];
 xmlsearchResults = GetCachedRecord(thisSearch, searchValue);
 if (xmlsearchResults != BLANK_SEARCH_XML_SCHEMA && xmlsearchResults != string.Empty)
 {
 thisSearch.Value = searchValue;
 ignoreCache = true;
 break;
 }
}

Code segment 2

xmlString.Append(string.Format(@"<{0}>", xmlHeader));

How about using AppendFormat()? No speed increase but easier-to-read code.

if (!code.ToLower().Contains("ccb")

Instead of creating a temporary lowercase string, just use the case-insensitive compare:

code.IndexOf("ccb", StringComparison.OrdinalIgnoreCase) < 0

See https://stackoverflow.com/questions/444798/case-insensitive-containsstring

 var SortedSearchResults = SearchResults.OrderByDescending(s => s.Downloads).ToList();

Why are you converting it to a List<>? You're not doing anything that requires a List<>. Leave it as an IEnumerable<> and use a foreach to iterate over it.

Question 10

One thing which might improve performance:

I'm not sure what xmlString is but given how it's used it's probably a StringBuilder. What you do in this loop:

 for (int i = 0; i < SortedSearchResults.Count(); i++)
 {
 if ((objcount < maxResults))
 {
 string te = string.Format("<ResultName>{0}</ResultName>", SortedSearchResults[i].Description);
 if (!xmlString.ToString().Contains(te))
 {
 xmlString.Append(SortedSearchResults[i].GetXMLString());
 objcount += 1;
 }
 }
 }

is that you convert the content of the StringBuilder into a string then do a string search on it (Contains()) and then append if it's no already present. So for every result you have to search through all previous results in the end making this an an O(n^2) operation. Using a Distinct filter should reduce the complexity to O(n). LINQ doesn't provide a Distinct for a specific property but it's easy to write a little extension method (courtesy of SO):

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
 (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
 HashSet<TKey> seenKeys = new HashSet<TKey>();
 foreach (TSource element in source)
 {
 if (seenKeys.Add(keySelector(element)))
 {
 yield return element;
 }
 }
}

HashSet<> has O(1) lookup hence this filter is O(n).

Then your loop becomes:

 var objCount = 0;
 foreach (var result in SortedSearchResults.DistinctBy(r => r.Description).Take(maxResults))
 {
 xmlString.Append(result.GetXMLString());
 ++objCount;
 }

Question 11

You shouldn't just say "An O(1) algorithm is better" unless you know the constant factors and the sample size. An O(log n) or O(n) algorithm with a smaller constant factor may be superior for this use case.

Question 12

@Snowbody: Hm, reading the answer I can't see where I just said "An O(1) algorithm is better". My answer says "It might be better to turn an O(n^2) algorithm into an O(n) algorithm by the use of a different data structure". Since the OP stated even minor performance improvements are important I'd say it's certainly worth a shot.

score 8 · Answer 1 · 2015-03-12 19:20:51Z

Comparing strings case insensitively is done by using the proper overload:

string.Compare(str1, str2, StringComparison.OrdinalIgnoreCase)

By calling .ToLower() for each string to compare (4 times for each iteration), you create 4 new strings each time. If you have a lot of (long) strings, this will create a lot of garbage for no reason.

In your case you use .Contains() so you can mimic this by using .IndexOf():

str4.IndexOf("dyt2", StringComparison.OrdinalIgnoreCase) >= 0

It's prettier to use bool instead of Boolean, partly because it also keeps your code consistent.

Write your variable names in full: objectCount, previousResultName, etc.

Constants are PascalCase, not WHATEVER_THIS_IS as in Java.

The addResult variable seems a little wonky. Instead of this:

Boolean addResult = true;
if (thisSearch.FilterDescription)
{
 if (!name.ToLower().Contains(thisSearch.Value.ToLower()))
 {
 addResult = false;
 }
}
if (addResult)
{
 SearchItem oSearch = new SearchItem(name, code);
 SearchResults.Add(oSearch);
 prevResultName = name;
}

Why not this:

if (!(thisSearch.FilterDescription && !name.ToLower().Contains(thisSearch.Value.ToLower()))
{
 SearchItem oSearch = new SearchItem(name, code);
 SearchResults.Add(oSearch);
 prevResultName = name;
}

You can clean this up a bit more but I'm not in a boolean algebra mood.

Okay thanks, so for the .ToLower().Contains(... it should just be string.Compare(code, "ccd", StringComparison.OrdinalIgnoreCase) instead?
Oh, I didn't realize it was a .Contains call. In that case use .IndexOf() with the appropriate overload. I'll edit it.
Okay, so I would actually want <, right? Because it's a Not operation.

Snowbody Snowbody 8,66225 silver badges50 bronze badges · Answer 2 · 2015-03-12 19:33:14Z

One way to improve performance would be to not sort the whole list, since you're only concerned about the MAX_WORD_ITERATIONS longest items. Just do a partial sort.

At the very least, limit your for loop to only going MAX_WORD_ITERATIONS times. If it's not found by then, it won't be found.

It looks like the first if serves no purpose and can be eliminated.

Code segment 1

Array.PartialSort(word, (x, y) => y.Length.CompareTo(x.Length),MAX_WORD);
for (int i = 0; i <= MAX_WORD_ITERATIONS; i++)
{
 string searchValue = word[i];
 xmlsearchResults = GetCachedRecord(thisSearch, searchValue);
 if (xmlsearchResults != BLANK_SEARCH_XML_SCHEMA && xmlsearchResults != string.Empty)
 {
 thisSearch.Value = searchValue;
 ignoreCache = true;
 break;
 }
}

Code segment 2

xmlString.Append(string.Format(@"<{0}>", xmlHeader));

How about using AppendFormat()? No speed increase but easier-to-read code.

if (!code.ToLower().Contains("ccb")

Instead of creating a temporary lowercase string, just use the case-insensitive compare:

code.IndexOf("ccb", StringComparison.OrdinalIgnoreCase) < 0

See https://stackoverflow.com/questions/444798/case-insensitive-containsstring

 var SortedSearchResults = SearchResults.OrderByDescending(s => s.Downloads).ToList();

Why are you converting it to a List<>? You're not doing anything that requires a List<>. Leave it as an IEnumerable<> and use a foreach to iterate over it.

ChrisWue ChrisWue 20.6k4 gold badges42 silver badges107 bronze badges · Answer 3 · 2015-03-12 19:12:47Z

One thing which might improve performance:

I'm not sure what xmlString is but given how it's used it's probably a StringBuilder. What you do in this loop:

 for (int i = 0; i < SortedSearchResults.Count(); i++)
 {
 if ((objcount < maxResults))
 {
 string te = string.Format("<ResultName>{0}</ResultName>", SortedSearchResults[i].Description);
 if (!xmlString.ToString().Contains(te))
 {
 xmlString.Append(SortedSearchResults[i].GetXMLString());
 objcount += 1;
 }
 }
 }

is that you convert the content of the StringBuilder into a string then do a string search on it (Contains()) and then append if it's no already present. So for every result you have to search through all previous results in the end making this an an O(n^2) operation. Using a Distinct filter should reduce the complexity to O(n). LINQ doesn't provide a Distinct for a specific property but it's easy to write a little extension method (courtesy of SO):

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
 (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
 HashSet<TKey> seenKeys = new HashSet<TKey>();
 foreach (TSource element in source)
 {
 if (seenKeys.Add(keySelector(element)))
 {
 yield return element;
 }
 }
}

HashSet<> has O(1) lookup hence this filter is O(n).

Then your loop becomes:

 var objCount = 0;
 foreach (var result in SortedSearchResults.DistinctBy(r => r.Description).Take(maxResults))
 {
 xmlString.Append(result.GetXMLString());
 ++objCount;
 }

You shouldn't just say "An O(1) algorithm is better" unless you know the constant factors and the sample size. An O(log n) or O(n) algorithm with a smaller constant factor may be superior for this use case.
@Snowbody: Hm, reading the answer I can't see where I just said "An O(1) algorithm is better". My answer says "It might be better to turn an O(n^2) algorithm into an O(n) algorithm by the use of a different data structure". Since the OP stated even minor performance improvements are important I'd say it's certainly worth a shot.

Stack Exchange Network

Complex search program

3 Answers 3

Code segment 1

Code segment 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Complex search program

3 Answers 3

Code segment 1

Code segment 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions