I have a task in my program that is inserting thousands (94,953 in one instance and 6,930 in another) of records into my database using Entity Framework.
Right now I am doing this, calling the .Add() method for each record, but it takes about 1 minute to insert the smaller batch and over 20 minutes for the larger batch. I have tried the .AddRange() method, but that jumped the smaller batch to over 4 minutes.
Is there another approach with Entity Framework 6 or is this a limitation I have to live with? If it makes any difference, the data is going into a SQL Server 2012 R2 instance.
var taskCodes = DynamicsHelper.GetTaskCodes(dynamicsSession, "01-01-1990");
Console.WriteLine("Adding task codes to database.");

using (var db = new JobSightDbContext())
{
    foreach (var taskCode in taskCodes)
    {
        var projectID = db.ProjectCodes
            .Where(project => project.Code == taskCode.Item4)
            .Select(project => project.ID)
            .FirstOrDefault();

        if (projectID != 0)
        {
            var newTaskCode = new TaskCode()
            {
                Code = taskCode.Item1,
                Description = taskCode.Item2,
                IsActive = taskCode.Item3,
                ProjectID = projectID
            };

            db.TaskCodes.Add(newTaskCode);
            db.SaveChanges();
        }
    }

    Console.WriteLine("{0} tasks added to the database.", db.TaskCodes.Count());
}
3 Answers
Since you're not making any changes to existing objects, you might set the AutoDetectChangesEnabled property of your context to false.
From MSDN:
Gets or sets a value indicating whether the DetectChanges method is called automatically by methods of DbContext and related classes. The default value is true.
Every call to Add triggers the DetectChanges method, which is expensive. Turn it off and, at the end, turn it on again.
Example:
using (var db = new JobSightDbContext())
{
    try
    {
        // Skip the automatic DetectChanges call while bulk-adding.
        db.Configuration.AutoDetectChangesEnabled = false;
        //logic
    }
    finally
    {
        // Restore the default so later operations behave normally.
        db.Configuration.AutoDetectChangesEnabled = true;
    }
}
More reading on this: Secrets of DetectChanges Part 3
Also, you call the SaveChanges method on every iteration of your loop. That means a database round trip per entity, which is intensive and time consuming. Move that line of code outside your loop.
foreach (var taskCode in taskCodes)
{
    //logic
    db.TaskCodes.Add(newTaskCode);
}
db.SaveChanges();
It's much more performant to make 1 call for many items than many calls for 1 item!
Here's an example question on StackOverflow about this: Any difference between calling SaveChanges() inside and outside a foreach loop?
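Putting both changes together against the code from the question gives a sketch like this (the per-record project lookup is kept as-is here; see the next answer for improving it):

using (var db = new JobSightDbContext())
{
    try
    {
        // Change tracking is not needed while we only add new entities.
        db.Configuration.AutoDetectChangesEnabled = false;

        foreach (var taskCode in taskCodes)
        {
            var projectID = db.ProjectCodes
                .Where(project => project.Code == taskCode.Item4)
                .Select(project => project.ID)
                .FirstOrDefault();

            if (projectID != 0)
            {
                db.TaskCodes.Add(new TaskCode()
                {
                    Code = taskCode.Item1,
                    Description = taskCode.Item2,
                    IsActive = taskCode.Item3,
                    ProjectID = projectID
                });
            }
        }

        // One round trip to persist everything.
        db.SaveChanges();
    }
    finally
    {
        db.Configuration.AutoDetectChangesEnabled = true;
    }
}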
- I edited your code example to be db.Configuration.AutoDetectChangesEnabled as that is the proper call into it per msdn.microsoft.com/en-us/data/jj556205.aspx, hope you don't mind the small fix. (Matthew Verstraete, Jun 10, 2016)
You're doing this for each record:

var projectID = db.ProjectCodes
    .Where(project => project.Code == taskCode.Item4)
    .Select(project => project.ID)
    .FirstOrDefault();

That alone is a massive performance issue. Instead, before looping through taskCodes:
- extract the distinct Item4 values (what a bad name, BTW) from your taskCodes,
- use those to retrieve the appropriate ProjectCodes,
- and store the code/ID combinations in a Dictionary<TKey, TValue>.
When you loop through taskCodes you can now use TryGetValue to retrieve the appropriate projectID.
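A sketch of that idea, assuming Item4 holds the project code as a string and ID is an int, as the question's usage suggests:

// Query all of the needed project IDs once, instead of once per record.
var codes = taskCodes.Select(t => t.Item4).Distinct().ToList();

var projectIdsByCode = db.ProjectCodes
    .Where(project => codes.Contains(project.Code))
    .ToDictionary(project => project.Code, project => project.ID);

foreach (var taskCode in taskCodes)
{
    int projectID;
    if (projectIdsByCode.TryGetValue(taskCode.Item4, out projectID))
    {
        // Build and add the TaskCode as before.
    }
}

EF translates the Contains call into a single IN clause, so the whole lookup becomes one query.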
However, considering the volume of data, it might be worthwhile for you to look outside of EF: consider SqlBulkCopy; here's an example.
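For reference, a minimal SqlBulkCopy sketch; the dbo.TaskCodes table name, connectionString, and the newTaskCodes list are assumptions based on the question's schema (requires System.Data and System.Data.SqlClient):

// Stage the rows in a DataTable whose columns match the destination table.
var table = new DataTable();
table.Columns.Add("Code", typeof(string));
table.Columns.Add("Description", typeof(string));
table.Columns.Add("IsActive", typeof(bool));
table.Columns.Add("ProjectID", typeof(int));

foreach (var taskCode in newTaskCodes)
{
    table.Rows.Add(taskCode.Code, taskCode.Description, taskCode.IsActive, taskCode.ProjectID);
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.TaskCodes";
        // Streams all rows to the server in one bulk operation.
        bulkCopy.WriteToServer(table);
    }
}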
- Item4 is hard-named by MS since I am using a Tuple, or at least I don't know how to change the name. I will give your extraction a try to see how much the time is improved. I was thinking about using SqlBulkCopy, but this insert will almost never be run; it is just to do the initial data import for the application, so I kind of want to see what improvements can be done on the EF side without having to build out a whole ADO connection just for this. (Matthew Verstraete, Jun 3, 2016)
- Thinking on how to implement your suggestion of the extraction, I am not seeing how it would work due to tasks having the same name (something I just realized). The data source I am pulling from allows multiple tasks with the same name, but only one task of a given name per project. Because of this I am not seeing a way to make the connection from the dictionary to the tuple. (Matthew Verstraete, Jun 3, 2016)
- @Matthew The key of the dictionary is the value of Item4; the value is the value of projectID. And why are you using a Tuple with four items? Why not use a custom class with properly named properties? (BCdotWEB, Jun 3, 2016)
- @Matthew Exposing a Tuple in a public API is bad design, because the client code has no way of telling the meaning of Item1 from that of Item4. You don't change the name; you take 5 minutes to implement a tiny little public type that exposes the properties you need to carry around, and then you can rename them at will (see the sketch after these comments). (Mathieu Guindon, Jun 4, 2016)
- @BCdotWEB Thank you for your help. I had implemented your suggestions and did see a slight improvement in the speed, but after using Abbas's suggestion in tandem with yours there was a significant speed improvement, so I have accepted his as the answer since I can only accept one and his had the biggest impact on speed. (Matthew Verstraete, Jun 10, 2016)
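To illustrate that last point, a tiny type along these lines (the name TaskCodeImport is made up) could replace the four-item Tuple returned by DynamicsHelper.GetTaskCodes:

public class TaskCodeImport
{
    public string Code { get; set; }        // was Item1
    public string Description { get; set; } // was Item2
    public bool IsActive { get; set; }      // was Item3
    public string ProjectCode { get; set; } // was Item4
}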
Entity Framework isn't great for speed, as others have stated, and SqlBulkCopy is a better tool for large inserts.

EntityFramework.BulkInsert is a NuGet package that encapsulates the bulk copy so that it looks like an EF operation. I've made some use of it in the past.
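Usage is roughly like this sketch; I'm assuming the extension-method API the package exposes, the question's context type, and a prepared newTaskCodes list (check the package docs for the exact call):

using EntityFramework.BulkInsert.Extensions;

using (var db = new JobSightDbContext())
{
    // One bulk operation instead of thousands of individual INSERTs.
    db.BulkInsert(newTaskCodes);
}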
- Note, this NuGet package is not free; there is a license fee. (Avi, Jan 17, 2023)