I have written the code below to search through a data table and retrieve data, and it is currently fully functional. However, the search takes a long time to return the data I need. I suspect the nested for loops are the problem. Is there a faster method of accomplishing what the loops are doing?
DataTable sqlData = GetConfiguration();
// var q = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5");
var w = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5")
               .Select(data => data.Field<String>("QuestionStartText")).Distinct();
List<String> queryResult = new List<String>();
foreach (var item in w.ToArray<string>())
{
    if (item != null)
    {
        String queryString = item;
        //queryResult.Clear();
        for (int i = 0; i < excelDataTable.Columns.Count; i++)
        {
            for (int k = 2; k < excelDataTable.Rows.Count; k++)
            {
                String row = "";
                bool check = excelDataTable.Rows[k][0].ToString().StartsWith(queryString);
                if (check)
                {
                    for (int j = 0; j < excelDataTable.Columns.Count; j++)
                    {
                        string value = excelDataTable.Rows[k][j].ToString() + ":";
                        row += value;
                    }
                    if (!queryResult.Contains(row))
                    {
                        queryResult.Add(row);
                    }
                }
            }
        }
    }
}
3 Answers
The most obvious problem is that when there is a large number of strings, string concatenation is inherently slow. Try using a StringBuilder instead, like this:
DataTable sqlData = GetConfiguration();
// var q = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5");
var w = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5")
               .Select(data => data.Field<String>("QuestionStartText")).Distinct();
List<String> queryResult = new List<String>();
StringBuilder sb = new StringBuilder();
foreach (var item in w.ToArray<string>())
{
    if (item != null)
    {
        String queryString = item;
        //queryResult.Clear();
        for (int i = 0; i < excelDataTable.Columns.Count; i++)
        {
            for (int k = 2; k < excelDataTable.Rows.Count; k++)
            {
                sb.Clear();
                bool check = excelDataTable.Rows[k][0].ToString().StartsWith(queryString);
                if (check)
                {
                    for (int j = 0; j < excelDataTable.Columns.Count; j++)
                    {
                        sb.Append(excelDataTable.Rows[k][j].ToString());
                        sb.Append(':');
                    }
                    string row = sb.ToString();
                    if (!queryResult.Contains(row))
                    {
                        queryResult.Add(row);
                    }
                }
            }
        }
    }
}
- ANeves (Nov 26, 2012): -1: while your premise is not wrong, your conclusion is. You should fix that particular problem by using string.Join instead of a for loop and a StringBuilder.
- resgh (Nov 27, 2012): I didn't know there was this method. Anyway, we don't have an IEnumerable with the appropriate elements.
- Cohen (Nov 27, 2012): @ANeves: I am not sure that string.Join is better in this case: their performance is very close (with StringBuilder's Append method, at least), and he is reusing the allocated memory (via sb.Clear()), which string.Join can't do. At least regarding memory allocation, StringBuilder is the winner. How this translates to performance, I wouldn't know without measuring.
- ANeves (Nov 27, 2012): @Cohen I don't believe the performance of string concatenation is the culprit in the issue in question. But you make good points.
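Since the thread ends on "I wouldn't know without measuring", here is a minimal micro-benchmark sketch one could run to compare the two approaches. The cell values, column count, and iteration count are invented for illustration, and note one behavioral difference: string.Join produces no trailing colon, unlike the Append loop.

using System;
using System.Diagnostics;
using System.Linq;
using System.Text;

class JoinVsBuilderBenchmark
{
    static void Main()
    {
        // Hypothetical row of 20 cell values.
        string[] cells = Enumerable.Range(0, 20).Select(i => "cell" + i).ToArray();
        const int iterations = 100000;

        // Measure string.Join.
        var sw = Stopwatch.StartNew();
        string joined = null;
        for (int i = 0; i < iterations; i++)
        {
            joined = string.Join(":", cells);
        }
        sw.Stop();
        Console.WriteLine("string.Join:   {0} ms (last length {1})", sw.ElapsedMilliseconds, joined.Length);

        // Measure a reused StringBuilder, cleared between rows as in the answer.
        var sb = new StringBuilder();
        sw.Restart();
        string built = null;
        for (int i = 0; i < iterations; i++)
        {
            sb.Clear();
            for (int j = 0; j < cells.Length; j++)
            {
                sb.Append(cells[j]);
                sb.Append(':');
            }
            built = sb.ToString();
        }
        sw.Stop();
        Console.WriteLine("StringBuilder: {0} ms (last length {1})", sw.ElapsedMilliseconds, built.Length);
    }
}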
Well, you can certainly make it easier to read by cleaning it up a little bit, getting rid of the extra (for i) loop, and substituting a string.Join and a Select for the inner (for j) loop's concatenation:
List<String> queryResult = new List<String>();
foreach (var queryString in w.ToArray<string>())
{
    if (string.IsNullOrEmpty(queryString)) continue;

    for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
    {
        var excelRow = excelDataTable.Rows[rowIndex];
        if (!excelRow[0].ToString().StartsWith(queryString)) continue;

        // ItemArray exposes the row's cells as an object[].
        var row = string.Join(":", excelRow.ItemArray.Select(c => c.ToString()).ToArray());
        if (!queryResult.Contains(row)) queryResult.Add(row);
    }
}
Now that we can see that we're looping over all the rows for each queryString, we can change to traversing the rows once and picking up any queryStrings along the way. This effectively flips the order of iteration (I'm assuming there are more rows than query strings). To get rid of the nested loops altogether, we'll switch to using Any to find matches. We'll also drop the queryResult.Contains check on each iteration in favour of a single Distinct call at the end. That should keep us from iterating queryResult multiple times.
var queryStrings = w.ToArray<string>();
List<String> queryResult = new List<String>();
for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
{
    var row = excelDataTable.Rows[rowIndex];
    var rowStart = row[0].ToString();
    if (!queryStrings.Any(q => rowStart.StartsWith(q))) continue;

    var s = string.Join(":", row.ItemArray.Select(c => c.ToString()).ToArray());
    queryResult.Add(s);
}
queryResult = queryResult.Distinct().ToList();
That's not going to do a whole lot for performance (it basically gets rid of one iteration over the rows), but everything else I see is really context-specific and may end up with worse performance, assuming your data looks like I expect it does (< 10 queryStrings, < 50 queryResults, and thousands of rows).
A couple of additional thoughts you can try, though, depending on your data:
- If queryStrings and/or queryResults are large, using a HashSet<string> may be a better choice. You'll take a hit on insert, but lookups (Any and Contains) will be faster (see the sketch after this list).
- The queryResult.Contains call can effectively invalidate all of our work in looking through queryStrings and concatenating row. Depending on how many duplicate results you expect vs. how many queryString matches, swapping that and the queryStrings.Any could help.
- You could partition the rows and then parallel-process them. This may help depending on the number of rows and CPUs available.
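To make the first bullet concrete, here is a minimal sketch of the deduplication side using a HashSet<string>. It reuses the names from the code above (w and excelDataTable are assumed to be in scope as in the question) and only changes the queryResult container; the StartsWith matching itself can't benefit from hashing, since it isn't an exact-match lookup. It requires System.Collections.Generic and System.Linq.

var queryStrings = w.ToArray<string>();
var queryResult = new HashSet<string>();
for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
{
    var row = excelDataTable.Rows[rowIndex];
    var rowStart = row[0].ToString();
    if (!queryStrings.Any(q => rowStart.StartsWith(q))) continue;

    // HashSet<string>.Add is an O(1) insert that silently rejects
    // duplicates, so no Contains check or final Distinct pass is needed.
    queryResult.Add(string.Join(":", row.ItemArray.Select(c => c.ToString()).ToArray()));
}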
From there, any further optimizations would be heavily dependent on your data and would need some example data of the correct relative sizes to profile and test.
Guessing at intent a little bit here, but this should simplify things and be more performant (I don't see the point in the outer loop over the columns). That being said, where is your bottleneck? I'd almost guess it's in the database operation GetConfiguration() more than in any of this code.
var sqlData = GetConfiguration();
////var q = sqlData.AsEnumerable().Where(data => data.Field<string>("slideNo") == "5");
var w = sqlData
    .AsEnumerable()
    .Where(data => data.Field<string>("slideNo") == "5")
    .Select(data => data.Field<string>("QuestionStartText")).Distinct();

var queryResult = new List<string>();
foreach (var queryString in w.Where(item => item != null))
{
    ////queryResult.Clear();
    for (var k = 2; k < excelDataTable.Rows.Count; k++)
    {
        if (!excelDataTable.Rows[k][0].ToString().StartsWith(queryString))
        {
            continue;
        }

        var rowSb = new StringBuilder();
        for (var j = 0; j < excelDataTable.Columns.Count; j++)
        {
            rowSb.Append(excelDataTable.Rows[k][j] + ":");
        }

        var row = rowSb.ToString();
        if (!queryResult.Contains(row))
        {
            queryResult.Add(row);
        }
    }
}
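On the bottleneck question, a quick way to confirm where the time goes is to time the two phases separately. Here is a minimal sketch using System.Diagnostics.Stopwatch; the GetConfiguration stub stands in for the questioner's existing database call, and the search loop from the answer would be pasted in where indicated.

using System;
using System.Data;
using System.Diagnostics;

class BottleneckCheck
{
    // Stand-in for the questioner's existing database call.
    static DataTable GetConfiguration()
    {
        return new DataTable(); // replace with the real implementation
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        var sqlData = GetConfiguration();
        sw.Stop();
        Console.WriteLine("GetConfiguration: {0} ms ({1} rows)", sw.ElapsedMilliseconds, sqlData.Rows.Count);

        sw.Restart();
        // ... run the search/concatenation loop from the answer here ...
        sw.Stop();
        Console.WriteLine("Search loop:      {0} ms", sw.ElapsedMilliseconds);
    }
}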