
This might be difficult for me to describe clearly, but I will try anyway. Also note that this is entirely speculative; it might not even be possible to make it run any faster (that is why I am asking).

This is a question about performance. I have been trying to improve the performance of a method and have managed to speed it up quite a bit already (around 40% faster). Right now, the entire process takes about 1 minute and 15 seconds to complete. I would like to see if you can spot some obvious faults in my code that would improve it even further, seeing as I am not quite satisfied yet.

The method fetches a large amount of data from a couple of databases, and the result needs to be displayed in a grid in a certain way. Here is the heavy part of the method; I will describe it in more detail afterwards:

DataClasses1DataContext dbContext = new DataClasses1DataContext(sqlConnectionString);

//Find minimum value of N in the specified timespan
var headerMin = (from y in dbContext.Testheaders
                 where Convert.ToDateTime(y.StartTime) >= GlobalVariables.StartTimeMSA
                 select y.N).Min();

//Find maximum value of N in the specified timespan
var headerMax = (from y in dbContext.Testheaders
                 where Convert.ToDateTime(y.StartTime) <= GlobalVariables.EndTimeMSA
                 select y.N).Max();

//Find appropriate data in the specified timespan (min and max headerid)
var msaData = from x in dbContext.Testdatas
              where x.HeaderID >= headerMin && x.HeaderID <= headerMax
              select new
              {
                  StartTime = Convert.ToDateTime((from z in dbContext.Testheaders
                                                  where x.HeaderID == z.N
                                                  select z.StartTime).FirstOrDefault()),
                  DUT_id = Convert.ToString((from y in dbContext.Testheaders
                                             where x.HeaderID == y.N
                                             select y.DUT_id).FirstOrDefault()),
                  Meas = (double)x.Meas,
                  x.Step,
                  x.LogID,
                  x.HeaderID,
                  printID = Convert.ToInt32((from y in dbContext.Testheaders
                                             where x.HeaderID == y.N
                                             select y.PrintID).FirstOrDefault()),
                  Prod_Blok = Convert.ToInt32((from y in dbContext.Testheaders
                                               where x.HeaderID == y.N
                                               select y.Prod_Blok).FirstOrDefault()),
                  Rep_Count = Convert.ToInt32((from y in dbContext.Testheaders
                                               where x.HeaderID == y.N
                                               select y.Rep_Count).FirstOrDefault()),
                  Operator = Convert.ToString((from y in dbContext.Testheaders
                                               where x.HeaderID == y.N
                                               select y.Operator).FirstOrDefault()),
              };
msaDataTable.Columns.Add("Time");
msaDataTable.Columns.Add("DUT");
msaDataTable.Columns.Add("Position");

var functionDistinct = msaData.Select(x => new { x.Step, x.LogID })
                              .Distinct()
                              .ToList()
                              .OrderBy(x => x.Step)
                              .ThenBy(x => x.LogID);

//Add columns to datatable corresponding to each function. The number of columns is only known at runtime.
foreach (var item in functionDistinct)
{
    var column = new DataColumn()
    {
        DataType = typeof(double),
        ColumnName = String.Format("{0}-{1}", item.Step, item.LogID),
        AllowDBNull = true
    };
    msaDataTable.Columns.Add(column);
}

var distinctHeaderID = msaData.Select(x => x.HeaderID).Distinct();
foreach (var headerID in distinctHeaderID)
{
    ct.ThrowIfCancellationRequested();

    //Fetch the data for the current headerID
    var headerIDData = msaData.Where(x => x.HeaderID == headerID)
                              .Select(x => new { x.StartTime, x.DUT_id, x.printID, x.Step, x.LogID, x.Meas })
                              .ToList();
    if (headerIDData != null)
    {
        DataRow row = msaDataTable.NewRow();
        row["Time"] = headerIDData.FirstOrDefault().StartTime;
        row["DUT"] = headerIDData.FirstOrDefault().DUT_id;
        row["Position"] = headerIDData.FirstOrDefault().printID;

        //Run through the data for the current headerID and assign the values to the right function.
        foreach (var data in headerIDData)
        {
            string function = String.Format("{0}-{1}", data.Step, data.LogID);
            row[function] = data.Meas;
        }
        msaDataTable.Rows.Add(row);
    }
}

Some sample rows from the database query (msaData) could look like this (these are just made up):

[Image: sample rows]

I am basically taking those sample rows and combining them into a single row, where each distinct Step/LogID pair becomes a "function" shown as a column, with the measurements as cell values. To illustrate, here is how the data above would be displayed correctly:

[Image: sample rows, combined correctly]

A few notes:

  • Step and LogID combined are known as a "function". In the database they are separated into two columns instead of just the function. For example, Step could have a field value of 5 and LogID 1. Then the function would be known as 5-1. This is what I am displaying as columns instead.

  • To achieve the goal I am creating a DataTable and binding it to the grid after I am done filling it with appropriate data.

  • I need to create the columns dynamically as there is no way to know how many functions there are at design time. That is the reason I am using a DataTable in the first place instead of just creating a class and an ObservableCollection for example.

  • The sample rows above are a tiny fraction of what the actual data will consist of. We are talking 400,000 rows in the test scenario I am currently running.

Are there any immediate problems in my code that you can spot? Perhaps some places where a ToList() call could improve performance (even though I have already tried a few), or perhaps another way to tackle how I combine the rows?

asked Dec 19, 2014 at 12:05
  • Just out of curiosity, why would you return 400k rows to an end user? That's sure to cause the user's brain to overflow with too much data. Commented Dec 19, 2014 at 12:26
  • Well, the thing is that the number of rows will be reduced greatly before it is shown to the user, since the number of functions can be as large as about 70. This is a client request to get it shown this way, and with my method it is shown the way they want it. I am using DevExpress, so it will be easy for them to search for items in the grid. But the most common use will be to export the grid data into Excel, where they will use some statistical program (that I have no idea how it works or what it actually is) to do some automatic modifications/readings on the data. Commented Dec 19, 2014 at 12:32
  • Furthermore, it might be acceptable for them for the application to take this long before spewing out the data, but I find it somewhat slow. I am asking the question out of curiosity to see if anyone can spot some obvious flaws, seeing as I am relatively new to C# and haven't actually done an application like this before. Hence why I stated the question as merely speculative :) Commented Dec 19, 2014 at 12:34
  • Have you looked at the generated SQL, and at the plan and stats for it? Commented Dec 19, 2014 at 13:47
  • Oh sorry, yeah, have a google around analysing query plans in SQL Server. You will be able to speed your function up more effectively by speeding up the SQL aspect. Commented Dec 19, 2014 at 16:07

2 Answers


I'm thinking that the two calls at the beginning might be reduced to one, which should cut down the time somewhat. Something like this:

var headers = from y in dbContext.Testheaders
              let startTime = Convert.ToDateTime(y.StartTime)
              where startTime >= GlobalVariables.StartTimeMSA &&
                    startTime <= GlobalVariables.EndTimeMSA
              select y.N;

//Find minimum value of N in the specified timespan
var headerMin = headers.Min();
//Find maximum value of N in the specified timespan
var headerMax = headers.Max();

In a similar vein, extracting the appropriate header only once for each record you're looking at in Testdatas should eliminate the repeated subqueries you're doing in the select new block. Something like this:

var msaData = from x in dbContext.Testdatas
              where x.HeaderID >= headerMin && x.HeaderID <= headerMax
              let header = (from h in dbContext.Testheaders
                            where x.HeaderID == h.N
                            select h).FirstOrDefault()
              select new
              {
                  StartTime = Convert.ToDateTime(header.StartTime),
                  DUT_id = Convert.ToString(header.DUT_id),
                  Meas = (double)x.Meas,
                  x.Step,
                  x.LogID,
                  x.HeaderID,
                  printID = Convert.ToInt32(header.PrintID),
                  Prod_Blok = Convert.ToInt32(header.Prod_Blok),
                  Rep_Count = Convert.ToInt32(header.Rep_Count),
                  Operator = Convert.ToString(header.Operator),
              };
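As a variation on the above, the same lookup could be written as an explicit join, which the provider can translate to a single SQL JOIN rather than a per-row subquery. This is only a sketch against the same Testdatas/Testheaders mappings, and note the changed semantics: an inner join drops any rows whose HeaderID has no matching header, whereas the FirstOrDefault version keeps them with null header fields.

```
// Sketch only: same hypothetical mappings as above, but with an explicit inner join.
var msaData = from x in dbContext.Testdatas
              join h in dbContext.Testheaders on x.HeaderID equals h.N
              where x.HeaderID >= headerMin && x.HeaderID <= headerMax
              select new
              {
                  StartTime = Convert.ToDateTime(h.StartTime),
                  DUT_id = Convert.ToString(h.DUT_id),
                  Meas = (double)x.Meas,
                  x.Step,
                  x.LogID,
                  x.HeaderID,
                  printID = Convert.ToInt32(h.PrintID),
                  Prod_Blok = Convert.ToInt32(h.Prod_Blok),
                  Rep_Count = Convert.ToInt32(h.Rep_Count),
                  Operator = Convert.ToString(h.Operator)
              };
```

If the left-join behaviour matters, `join ... into g from h in g.DefaultIfEmpty()` restores it.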
answered Dec 23, 2014 at 2:58
  • Hmm, interesting; I haven't heard of the let clause before. I'll try this out. Commented Dec 23, 2014 at 6:48
  • Doing this actually did cut the elapsed time a bit. Now it is down to about 50 seconds, which I do find somewhat acceptable. Thanks! I'll go read what let actually does to gain a bit of understanding of what you did. Commented Dec 23, 2014 at 7:07
  • You should call ToList() on the big Select() and remove the calls to ToList() here

        var functionDistinct = msaData.Select(x => new { x.Step, x.LogID })
                                      .Distinct()
                                      .ToList()
                                      .OrderBy(x => x.Step)
                                      .ThenBy(x => x.LogID);

    or here

        var headerIDData = msaData.Where(x => x.HeaderID == headerID)
                                  .Select(x => new { x.StartTime, x.DUT_id, x.printID, x.Step, x.LogID, x.Meas })
                                  .ToList();

  • Skip the creation of the anonymous types where you don't need them.

  • Replace the check headerIDData != null with headerIDData.Any(): ToList() never returns null, only a (possibly empty) list.

  • Call Distinct() before Select() by implementing an IEqualityComparer<T> for

        var distinctHeaderID = msaData.Select(x => x.HeaderID).Distinct();
    
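The last bullet is terse, so here is a minimal sketch of what that comparer could look like once the rows are materialized. MsaRow and HeaderIdComparer are hypothetical names, not from the original code; a named row type is needed because a comparer class cannot refer to an anonymous type.

```
using System.Collections.Generic;
using System.Linq;

// Sketch only: MsaRow stands in for the query's anonymous row type.
class MsaRow
{
    public int HeaderID { get; set; }
    public int Step { get; set; }
    public int LogID { get; set; }
}

// Treats two rows as equal when their HeaderID matches, so Distinct()
// can run on the full rows before any Select().
class HeaderIdComparer : IEqualityComparer<MsaRow>
{
    public bool Equals(MsaRow a, MsaRow b)
    {
        if (a == null || b == null) return ReferenceEquals(a, b);
        return a.HeaderID == b.HeaderID;
    }

    public int GetHashCode(MsaRow r)
    {
        return r == null ? 0 : r.HeaderID.GetHashCode();
    }
}

// Usage:
// var distinctRows = rows.Distinct(new HeaderIdComparer()).ToList();
```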
answered Dec 19, 2014 at 12:59
  • Alright, tried removing the ToList() calls where you specified. Here are the elapsed times: 00:01:06.7339292 (with the ToList() calls) versus 00:03:20.5137437 (without them). So removing the ToList() calls basically triples the elapsed time. Commented Dec 19, 2014 at 13:10
  • Oh yeah, let me update the question with that query. Commented Dec 19, 2014 at 13:19
  • Once again, the elapsed time was way longer with calling ToList() on the initial msaData and removing it from functionDistinct and headerIDData. Performance: 00:05:41.6543591. Performance with old code: 00:00:58.3041223. Commented Dec 19, 2014 at 13:44
  • So, the call to ToList() will take almost 5 minutes? Commented Dec 19, 2014 at 13:50
  • Apparently, yes. Does seem a bit weird. Commented Dec 19, 2014 at 13:58
