The following code takes about 20 seconds to run with about 200,000 records in the TaskLogs table:
using (var db = new DAL.JobManagerEntities())
{
    return db.TaskLogs.Select(taskLog => new TaskLog()
    {
        TaskLogID = taskLog.TaskLogID,
        TaskID = taskLog.TaskID,
        TaskDescription = taskLog.Task.TaskDescription,
        TaskType = (TaskType)taskLog.Task.TaskTypeID,
        RunID = taskLog.RunID,
        ProcessID = taskLog.ProcessID,
        JobID = taskLog.JobID,
        JobName = taskLog.Job.JobName,
        Result = taskLog.Result,
        StartTime = taskLog.StartTime,
        TimeTaken = taskLog.TimeTaken
    }).OrderByDescending(t => t.RunID).ThenByDescending(t => t.RunID).ThenByDescending(t => t.StartTime).ToList();
}
I tweaked it until I got something that runs faster. Here's where I got to:
using (var db = new DAL.JobManagerEntities())
{
    db.Configuration.LazyLoadingEnabled = false;

    var tasks = db.Tasks.ToList();
    var jobs = db.Jobs.ToList();

    var result = db.TaskLogs.Select(x => new TaskLog()
    {
        TaskLogID = x.TaskLogID,
        TaskID = x.TaskID,
        RunID = x.RunID,
        ProcessID = x.ProcessID,
        JobID = x.JobID,
        Result = x.Result,
        StartTime = x.StartTime,
        TimeTaken = x.TimeTaken
    }).OrderByDescending(t => t.RunID).ThenByDescending(t => t.StartTime).ToList();

    foreach (var r in result)
    {
        r.TaskDescription = tasks.Single(t => t.TaskID == r.TaskID).TaskDescription;
        r.TaskType = (TaskType)tasks.Single(t => t.TaskID == r.TaskID).TaskTypeID;
        r.JobName = jobs.Single(j => j.JobID == r.JobID).JobName;
    }

    return result;
}
Which runs in less than 6 seconds for the same number of records.
The TaskLog table is linked to the Job and Task tables as follows:

[schema diagram: TaskLog.TaskID references Task.TaskID, and TaskLog.JobID references Job.JobID]
The Job and Task tables will have hundreds and thousands of records, respectively.
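For reference, the TaskLog being constructed in these queries appears to be a plain view model rather than the mapped entity (EF will not construct a mapped entity type inside a LINQ-to-Entities projection). A minimal sketch of its assumed shape, with property names taken from the queries and the types guessed:

// Assumed view model: the names come from the queries above, the types are guesses.
public class TaskLog
{
    public int TaskLogID { get; set; }
    public int TaskID { get; set; }
    public string TaskDescription { get; set; }
    public TaskType TaskType { get; set; }
    public int RunID { get; set; }
    public int ProcessID { get; set; }
    public int JobID { get; set; }
    public string JobName { get; set; }
    public string Result { get; set; }
    public DateTime StartTime { get; set; }
    public TimeSpan TimeTaken { get; set; }
}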
Is there anything else I could do in order to further improve the efficiency of the code?
2 Answers
1) Not sure if it impacts performance, but your first code fragment has a redundant order-by:
.OrderByDescending(t => t.RunID).ThenByDescending(t => t.RunID)
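Dropping the duplicate leaves the same ordering your second fragment already uses, for example:

.OrderByDescending(t => t.RunID).ThenByDescending(t => t.StartTime)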
2) You could improve the performance of your second fragment with "client-side indexing" (using a dictionary):
var tasksMap = tasks.ToDictionary(t => t.TaskID);
var jobsMap = jobs.ToDictionary(t => t.JobID);
foreach (var r in result)
{
var task = tasksMap[r.TaskID];
r.TaskDescription = task.TaskDescription;
r.TaskType = (TaskType)task.TaskTypeID;
r.JobName = jobsMap[r.JobID].JobName;
}
- Kappacake (Feb 12, 2018 at 20:28): Thanks for the answer. Would var taskMap = db.Tasks.ToDictionary(t => t.TaskID) be the same as var tasks = db.Tasks.ToList(); var taskMap = tasks.ToDictionary(t => t.TaskID)?
- JanDotNet (Feb 12, 2018 at 20:45): Not sure, but I'd expect it.
- Kappacake (Feb 14, 2018 at 10:47): The client-side indexing does indeed improve performance. Execution time for the foreach loop went from about 500 ms to about 30 ms!
You can do two things to speed this query up:
- use joins so that the entire query runs on the server and you don't have to run over the results again with the foreach loop
- add .AsNoTracking() to each table so that EF does not have to create proxy objects for change-tracking
Here's an example of how it would look when we apply both suggestions.
using (var db = new DAL.JobManagerEntities())
{
    db.Configuration.LazyLoadingEnabled = false;

    var result =
        from taskLog in db.TaskLogs.AsNoTracking()
        join task in db.Tasks.AsNoTracking() on taskLog.TaskID equals task.TaskID
        join job in db.Jobs.AsNoTracking() on taskLog.JobID equals job.JobID
        orderby taskLog.RunID descending, taskLog.StartTime descending
        select new TaskLog
        {
            TaskLogID = taskLog.TaskLogID,
            TaskID = taskLog.TaskID,
            RunID = taskLog.RunID,
            ProcessID = taskLog.ProcessID,
            JobID = taskLog.JobID,
            Result = taskLog.Result,
            StartTime = taskLog.StartTime,
            TimeTaken = taskLog.TimeTaken,
            TaskDescription = task.TaskDescription,
            TaskType = (TaskType)task.TaskTypeID,
            JobName = job.JobName
        };

    return result.ToList();
}
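A quick way to verify that the join and the ordering really are executed on the server is EF6's Database.Log hook (one of the comments on the question suggests the same idea); the captured SQL can then be run in a dedicated SQL tool for comparison. A minimal sketch:

using (var db = new DAL.JobManagerEntities())
{
    // Write every SQL statement EF sends to the server to the debug output.
    db.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);

    // ... run the query above, then copy the logged SQL into e.g. SSMS
    // to compare its execution plan and timing against the original.
}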
- Kappacake (Feb 14, 2018 at 7:56): Thanks a lot for the answer! I'm going to try this later on and let you know.
- Kappacake (Feb 14, 2018 at 10:03): I had to make a few changes to make the above code work: join job in db.Jobs.AsNoTracking() on taskLog.JobID equals job.TaskID needs to be join job in db.Jobs.AsNoTracking() on taskLog.JobID equals job.JobID, and I also needed to order the result before returning: return result.OrderByDescending(t => t.RunID).ThenByDescending(t => t.StartTime).ToList();. Unfortunately this code runs in about 20 seconds, so it is much slower than the original one.
- t3chb0t (Feb 14, 2018 at 10:07): @demonicdaron Yeah, I thought I might have the IDs wrong, but this was notepad-coding ;-] If this is even slower then it's most probably because you order it again. There is already an orderby that will run on the server; no need to do it locally again.
- Kappacake (Feb 14, 2018 at 10:11): Sorry, I completely missed that order by! For some reason it wasn't returning the same data as the original code, so I added the order by and that corrected the problem. I'm going to do some more testing and come back to your answer. Thanks for the help.
- Kappacake (Feb 14, 2018 at 10:17): I just tried removing the order by altogether (both from the query and the return), but it made little difference. I also removed the where clause, but that also had little effect.
Comments on the question:

- taskLog.Task.TaskDescription will be handled just fine without lazy loading. Can you disable LazyLoading and proxy creation for the first sample as well as using AsNoTracking? It would be interesting whether this boosts performance.
- ... Select(taskLog => new { ... }) instead of new TaskLog() ..., and don't cast the TaskTypeID but take it as the returned type? Below you commented that the query runs in like 15 ms, the rest being the ToList ... how did you test this? It sounds a bit fast.
- ... db.Database.Log = x => WriteSomewhere(x) and check query performance with some dedicated SQL tool for comparison.
- JobName and TaskDescription sum up to 1000 characters per record, which is more than all the other properties combined. Since you mention there will be far fewer entries (1000s vs 200,000 records), the solution from @JanDotNet makes perfect sense: it will simply transfer far fewer duplicated strings from server to client.
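Following up on the comment about projecting to an anonymous type: one option is to let the server do the joins and ordering into an anonymous type and only then map to the view model in memory, where the enum cast is cheap. A sketch only, reusing the names from the answers above:

using (var db = new DAL.JobManagerEntities())
{
    var rows = (from taskLog in db.TaskLogs.AsNoTracking()
                join task in db.Tasks.AsNoTracking() on taskLog.TaskID equals task.TaskID
                join job in db.Jobs.AsNoTracking() on taskLog.JobID equals job.JobID
                orderby taskLog.RunID descending, taskLog.StartTime descending
                select new
                {
                    taskLog.TaskLogID,
                    taskLog.TaskID,
                    taskLog.RunID,
                    taskLog.ProcessID,
                    taskLog.JobID,
                    taskLog.Result,
                    taskLog.StartTime,
                    taskLog.TimeTaken,
                    task.TaskDescription,
                    task.TaskTypeID,
                    job.JobName
                }).ToList();

    // Map to the view model in memory; the enum cast happens client-side.
    return rows.Select(r => new TaskLog
    {
        TaskLogID = r.TaskLogID,
        TaskID = r.TaskID,
        RunID = r.RunID,
        ProcessID = r.ProcessID,
        JobID = r.JobID,
        Result = r.Result,
        StartTime = r.StartTime,
        TimeTaken = r.TimeTaken,
        TaskDescription = r.TaskDescription,
        TaskType = (TaskType)r.TaskTypeID,
        JobName = r.JobName
    }).ToList();
}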