I reviewed this question that in my opinion uses a not so pretty batch-save so I tried improve it and this is what I came up with.
In this more generic solution that allows to reuse it I wanted to implement the using
statement instead of manually disposing and recreating the context.
It is an extension that you can use on a collection of entities. It requires the size of the batch and a factory method for the context. The last parameter is for reporting progress.
Unlike the original solution this one works with double so it is able to report progress for collection that are smaller then 100 items.
Internally it uses two lambdas: one for checking if the batch should be saved and the other one for progress reporting. The context is wrapped with a using statement.
static class EnumerableExtensions
{
public static void BatchSave<T, TContext>(
this IList<T> items,
int batchSize,
Func<TContext> createContext,
IProgress<double> progress
)
where T : class
where TContext : DbContext
{
var count = 0;
var lastPercentage = 0d;
var nextBatch = new Func<bool>(() => ++count % batchSize == 0);
var reportProgress = new Action(() =>
{
var percentage = Math.Round((double)count / (double)items.Count * 100d, 1);
if (percentage > lastPercentage)
{
progress.Report(percentage);
lastPercentage = percentage;
}
});
var enumerator = items.GetEnumerator();
while (enumerator.MoveNext())
{
using (var context = createContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
do
{
//context.Set<T>().Add(enumerator.Current);
Console.WriteLine("Add " + enumerator.Current);
reportProgress();
} while (!nextBatch() && enumerator.MoveNext());
//context.SaveChanges();
Console.WriteLine("Save batch = " + count);
}
}
reportProgress();
}
}
class TestContext : DbContext { }
I needed to comment the two entity framework lines out so that it doesn't crash with the TestContext
.
Example usage:
void Main()
{
// convert items to string to satisfy the class constraint
var items = Enumerable.Range(0, 23).Select(x => x.ToString()).ToArray();
var batchSize = 10;
var progress = new Progress<double>(percent => Console.WriteLine($"Progress {percent}"));
items.BatchSave(batchSize, () => new TestContext(), progress);
}
Output:
Add 0
Add 1
Add 2
Add 3
Add 4
Add 5
Add 6
Add 7
Add 8
Add 9
Save batch = 10
Add 10
Progress 4,3
Add 11
Add 12
Add 13
Add 14
Add 15
Add 16
Add 17
Add 18
Add 19
Save batch = 20
Add 20
Add 21
Add 22
Save batch = 23
Progress 13
Progress 17,4
Progress 21,7
Progress 26,1
Progress 30,4
Progress 34,8
Progress 39,1
Progress 43,5
Progress 47,8
Progress 52,2
Progress 56,5
Progress 60,9
Progress 65,2
Progress 69,6
Progress 73,9
Progress 78,3
Progress 8,7
Progress 82,6
Progress 91,3
Progress 95,7
Progress 100
Progress 87
-
\$\begingroup\$ I know nothing about Entity Framework. Is there a reason this is an extension method on a List rather than part of a repository? \$\endgroup\$404– 4042016年12月22日 21:40:16 +00:00Commented Dec 22, 2016 at 21:40
-
\$\begingroup\$ @eurotrash it should be reusable for any repository/context, just a utility extension. Batch save is something you might need in many places. \$\endgroup\$t3chb0t– t3chb0t2016年12月22日 21:42:59 +00:00Commented Dec 22, 2016 at 21:42
-
\$\begingroup\$ But you commented out the two lines that are crucial for the code to actually work (or do anything at all, for that matter). Thus is this really "working" code? \$\endgroup\$Kirk Woll– Kirk Woll2016年12月22日 22:48:48 +00:00Commented Dec 22, 2016 at 22:48
-
\$\begingroup\$ @KirkWoll I disabled the context becasue of this original question. This pattern seem to work for them so I wanted to save some time and just work on a prettier version of it. \$\endgroup\$t3chb0t– t3chb0t2016年12月23日 05:41:34 +00:00Commented Dec 23, 2016 at 5:41
2 Answers 2
var count = 0; var lastPercentage = 0d; var nextBatch = new Func<bool>(() => ++count % batchSize == 0);
It was a clever usage of context capture but I think it ended up playing against you.
Your method has way too much responsibility. You should had at least another method to resolve the batching problem. Because you mixed the two things(batch + db stuff) you ended up by having this:
var enumerator = items.GetEnumerator();
while (enumerator.MoveNext())
{
//...
do{
//...
} while (!nextBatch() && enumerator.MoveNext());
}
And the first thing that I thought about was: "Why the hell is he advancing the enumerator twice? Why couldn't he use foreach
instead?" And than I realized, without even parsing the code too much: "Ah it must be because of the batching stuff...".
It doesn't really matter if I got that right or not, I had to spend time to think about it, asking this kind questions is very useful actually. I would strongly suggest you next time you see yourself using enumerator.MoveNext
to ask the same thing. Why aren't you using foreach
?
Anyway, as I said you should separate those two concerns. A batch implementation would become like this:
public IEnumerable<IEnumerable<int>> Batch(int total, int batchSize){
for(int i = 0; i < total; i += batchSize){
var currentBatchSize = Math.Min(batchSize, total - i);
yield return Enumerable.Range(i * batchSize, currentBatchSize);
}
}
A similar thing could be said about reportProgress
. You ended up by creating abstraction over a simple calculation that is to calculate the report progress(with a simple mathematical formula). It didn't really contribute in any positive way to the readability/maintainability of your code.
Putting that all into a new implementation would look like this:
public static IEnumerable<IEnumerable<int>> GetBatches(int total, int batchSize){
for(int i = 0; i < total; i += batchSize){
var currentBatchSize = Math.Min(batchSize, total - i);
yield return Enumerable.Range(i, currentBatchSize);
}
}
public static void BatchSave<T, TContext>(
this IList<T> items,
int batchSize,
Func<TContext> createContext,
IProgress<double> progress
)
where T : class
where TContext : DbContext
{
var batches = GetBatches(items.Count, batchSize).ToList();
int currentBatch = 0;
foreach(var batch in batches){
using (var context = createContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
foreach(var i in batch){
Console.WriteLine("Add " + items[i]);
}
//context.SaveChanges();
progress.Report(++currentBatch * 100.0 / batches.Count);
Console.WriteLine("Save batch = " + currentBatch);
}
}
progress.Report(100);
}
This alternative might yet lack something. For example, if you are really a neat freak, maybe you would try to change the return result of GetBatches
in someway to be more useful. One thing that could be useful is to know which batch is the current batch. Should the return be IEnumerable<IEnumerable<Tuple<int, int>>>
(where tuple can be another classes with two integers if you want...). Would it be more useful to turn it to GetBatches<T>
? with IEnumerable<IEnumerable<T>>
return? Well I will leave that for you to decide, at least I am providing the foundations for it.
One another thing that might be worth pointing is that there isn't really a reason to createContext()
multiple times, you can create it just one time, and just call SaveChanges
per each batch. Or maybe there is an efficiency reason, that I am not aware of... but you have to prove it by measuring.
-
\$\begingroup\$ I agree, precalculating batches looks much cleaner then the enumerator hack which you are right about, I needed it for batching. I have to defend the
reportProgress
;-) Its purpose is to report progress if it changes by at least one percent. In your solution it would report progress after each batch. On the other side it's questionable if I even should do it for batches that are not yet fully saved but if I used a transaction that it would be fine again, I guess. \$\endgroup\$t3chb0t– t3chb0t2016年12月23日 05:33:25 +00:00Commented Dec 23, 2016 at 5:33 -
\$\begingroup\$ You brought me to an idea with the
GetBatches
method. Actually there is one in the .Net that can already do something simliar:Partitioner.Create(0, items.Length, batchSize).GetDynamicPartitions()
. It's in theSystem.Collections.Concurrent
namespace. I think I can use this one. \$\endgroup\$t3chb0t– t3chb0t2016年12月23日 07:00:29 +00:00Commented Dec 23, 2016 at 7:00
I don't really know C# but this looks pretty nice.
The main part of the code is tightly coupled to a DbContext
.
As I saw a TestContext
,
I was a bit surprised that it wasn't used more to verify the behavior of the BatchSave
method.
That is, instead of commenting out the lines //context.Set<T>().Add(enumerator.Current);
and //context.SaveChanges();
and replacing them with print statements for the sake of a demo,
it would be great if you could leave the implementation intact and write a TestContext
that would allow verification.
A minor point is that nextBatch
is declared too early,
far from the first place of use.
It would be better to move it closer,
right in front of the declaration of enumerator
.
Finally, a tiny thing, the casting of items.Count
to double
is redundant, as the cast on count
will already make the expression double
.
Explore related questions
See similar questions with these tags.