I have written the code below to search through a data table and retrieve data, and it is currently fully functional. However, the search takes a long time to return the data I need. I suspect the nested for loops are the problem. Is there a faster method of accomplishing what the loops are doing?
DataTable sqlData = GetConfiguration();
// var q = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5");
var w = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5")
               .Select(data => data.Field<String>("QuestionStartText")).Distinct();
List<String> queryResult = new List<String>();
foreach (var item in w.ToArray<string>())
{
    if (item != null)
    {
        String queryString = item;
        //queryResult.Clear();
        for (int i = 0; i < excelDataTable.Columns.Count; i++)
        {
            for (int k = 2; k < excelDataTable.Rows.Count; k++)
            {
                String row = "";
                bool check = excelDataTable.Rows[k][0].ToString().StartsWith(queryString);
                if (check)
                {
                    for (int j = 0; j < excelDataTable.Columns.Count; j++)
                    {
                        string value = excelDataTable.Rows[k][j].ToString() + ":";
                        row += value;
                    }
                    if (!queryResult.Contains(row))
                    {
                        queryResult.Add(row);
                    }
                }
            }
        }
    }
}
3 Answers
The most obvious problem is that when there is a large number of strings, string concatenation is inherently slow. Try using a StringBuilder instead, like this:
DataTable sqlData = GetConfiguration();
// var q = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5");
var w = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5")
               .Select(data => data.Field<String>("QuestionStartText")).Distinct();
List<String> queryResult = new List<String>();
StringBuilder sb = new StringBuilder();
foreach (var item in w.ToArray<string>())
{
    if (item != null)
    {
        String queryString = item;
        //queryResult.Clear();
        for (int i = 0; i < excelDataTable.Columns.Count; i++)
        {
            for (int k = 2; k < excelDataTable.Rows.Count; k++)
            {
                sb.Clear();
                bool check = excelDataTable.Rows[k][0].ToString().StartsWith(queryString);
                if (check)
                {
                    for (int j = 0; j < excelDataTable.Columns.Count; j++)
                    {
                        sb.Append(excelDataTable.Rows[k][j].ToString());
                        sb.Append(':');
                    }
                    string row = sb.ToString();
                    if (!queryResult.Contains(row))
                    {
                        queryResult.Add(row);
                    }
                }
            }
        }
    }
}
- ANeves (Nov 26, 2012): -1: while your premise is not wrong, your conclusion is. You should fix that particular problem by using string.Join instead of a for loop and a StringBuilder.
- resgh (Nov 27, 2012): I didn't know there was this method. Anyway, we don't have an IEnumerable with the appropriate elements.
- Cohen (Nov 27, 2012): @ANeves: I am not sure that string.Join is better in this case: their performance is very close (with StringBuilder's Append method, at least), and he is reusing the allocated memory (via sb.Clear()), which string.Join can't do. At least regarding memory allocation, StringBuilder is the winner. How this translates to performance, I wouldn't know without measuring.
- ANeves (Nov 27, 2012): @Cohen I don't believe the performance of string concatenation is the culprit in the issue in question. But you make good points.
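Since the thread ends on "I wouldn't know without measuring", here is a minimal micro-benchmark sketch one could run to compare the two approaches. The cell values, column count, and iteration count are invented for illustration, and note one behavioral difference: string.Join produces no trailing colon, unlike the Append loop.

using System;
using System.Diagnostics;
using System.Linq;
using System.Text;

class JoinVsBuilderBenchmark
{
    static void Main()
    {
        // Hypothetical row of 20 cell values.
        string[] cells = Enumerable.Range(0, 20).Select(i => "cell" + i).ToArray();
        const int iterations = 100000;

        // Measure string.Join.
        var sw = Stopwatch.StartNew();
        string joined = null;
        for (int i = 0; i < iterations; i++)
        {
            joined = string.Join(":", cells);
        }
        sw.Stop();
        Console.WriteLine("string.Join:   {0} ms (last length {1})", sw.ElapsedMilliseconds, joined.Length);

        // Measure a reused StringBuilder, cleared between rows as in the answer.
        var sb = new StringBuilder();
        sw.Restart();
        string built = null;
        for (int i = 0; i < iterations; i++)
        {
            sb.Clear();
            for (int j = 0; j < cells.Length; j++)
            {
                sb.Append(cells[j]);
                sb.Append(':');
            }
            built = sb.ToString();
        }
        sw.Stop();
        Console.WriteLine("StringBuilder: {0} ms (last length {1})", sw.ElapsedMilliseconds, built.Length);
    }
}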
Well, you can certainly make it easier to read by cleaning it up a little bit, getting rid of the extra (for i) loop, and substituting a string.Join and a Select for the inner (for j) loop's concatenation:
List<String> queryResult = new List<String>();
foreach (var queryString in w.ToArray<string>())
{
    if (string.IsNullOrEmpty(queryString)) continue;

    for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
    {
        var excelRow = excelDataTable.Rows[rowIndex];
        if (!excelRow[0].ToString().StartsWith(queryString)) continue;

        // ItemArray exposes the row's cells as an object[].
        var row = string.Join(":", excelRow.ItemArray.Select(c => c.ToString()).ToArray());
        if (!queryResult.Contains(row)) queryResult.Add(row);
    }
}
Now that we can see that we're looping over all the rows for each queryString, we can change to traversing the rows once and picking up any queryStrings along the way. This effectively flips the order of iteration (I'm assuming there are more rows than query strings). To get rid of the nested loops altogether, we'll switch to using Any to find matches. We'll also drop the queryResult.Contains check on each iteration in favour of a single Distinct call at the end. That should keep us from iterating queryResult multiple times.
var queryStrings = w.ToArray<string>();
List<String> queryResult = new List<String>();
for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
{
    var row = excelDataTable.Rows[rowIndex];
    var rowStart = row[0].ToString();
    if (!queryStrings.Any(q => rowStart.StartsWith(q))) continue;

    var s = string.Join(":", row.ItemArray.Select(c => c.ToString()).ToArray());
    queryResult.Add(s);
}
queryResult = queryResult.Distinct().ToList();
That's not going to do a whole lot for performance (it basically gets rid of one iteration over the rows), but everything else I see is really context-specific and may end up with worse performance, assuming your data looks like I expect it does (< 10 queryStrings, < 50 queryResults, and thousands of rows).
A couple of additional thoughts you can try, though, depending on your data:
- If queryStrings and/or queryResults are large, using a HashSet<string> may be a better choice. You'll take a hit on insert, but lookups (Any and Contains) will be faster (see the sketch after this list).
- The queryResult.Contains call can effectively invalidate all of our work in looking through queryStrings and concatenating row. Depending on how many duplicate results you expect vs. how many queryString matches, swapping that and the queryStrings.Any could help.
- You could partition the rows and then parallel-process them. This may help depending on the number of rows and CPUs available.
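To make the first bullet concrete, here is a minimal sketch of the deduplication side using a HashSet<string>. It reuses the names from the code above (w and excelDataTable are assumed to be in scope as in the question) and only changes the queryResult container; the StartsWith matching itself can't benefit from hashing, since it isn't an exact-match lookup. It requires System.Collections.Generic and System.Linq.

var queryStrings = w.ToArray<string>();
var queryResult = new HashSet<string>();
for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
{
    var row = excelDataTable.Rows[rowIndex];
    var rowStart = row[0].ToString();
    if (!queryStrings.Any(q => rowStart.StartsWith(q))) continue;

    // HashSet<string>.Add is an O(1) insert that silently rejects
    // duplicates, so no Contains check or final Distinct pass is needed.
    queryResult.Add(string.Join(":", row.ItemArray.Select(c => c.ToString()).ToArray()));
}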
From there, any further optimizations would be heavily dependent on your data and would need some example data of the correct relative sizes to profile and test.
Guessing at intent a little bit here, but this should simplify things and be more performant (I don't see the point in the outer loop over the columns). That being said, where is your bottleneck? I'd almost guess it's in the database operation GetConfiguration() more than in any of this code.
var sqlData = GetConfiguration();
////var q = sqlData.AsEnumerable().Where(data => data.Field<string>("slideNo") == "5");
var w = sqlData
    .AsEnumerable()
    .Where(data => data.Field<string>("slideNo") == "5")
    .Select(data => data.Field<string>("QuestionStartText")).Distinct();

var queryResult = new List<string>();
foreach (var queryString in w.Where(item => item != null))
{
    ////queryResult.Clear();
    for (var k = 2; k < excelDataTable.Rows.Count; k++)
    {
        if (!excelDataTable.Rows[k][0].ToString().StartsWith(queryString))
        {
            continue;
        }

        var rowSb = new StringBuilder();
        for (var j = 0; j < excelDataTable.Columns.Count; j++)
        {
            rowSb.Append(excelDataTable.Rows[k][j] + ":");
        }

        var row = rowSb.ToString();
        if (!queryResult.Contains(row))
        {
            queryResult.Add(row);
        }
    }
}
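On the bottleneck question, a quick way to confirm where the time goes is to time the two phases separately. Here is a minimal sketch using System.Diagnostics.Stopwatch; the GetConfiguration stub stands in for the questioner's existing database call, and the search loop from the answer would be pasted in where indicated.

using System;
using System.Data;
using System.Diagnostics;

class BottleneckCheck
{
    // Stand-in for the questioner's existing database call.
    static DataTable GetConfiguration()
    {
        return new DataTable(); // replace with the real implementation
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        var sqlData = GetConfiguration();
        sw.Stop();
        Console.WriteLine("GetConfiguration: {0} ms ({1} rows)", sw.ElapsedMilliseconds, sqlData.Rows.Count);

        sw.Restart();
        // ... run the search/concatenation loop from the answer here ...
        sw.Stop();
        Console.WriteLine("Search loop:      {0} ms", sw.ElapsedMilliseconds);
    }
}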