Generic method to split provided collection into smaller collections

Question

This is my first time writing a generic method. It takes a List<T> and an int chunk size and returns a List<List<T>> in which each inner List<T> holds at most that many items.

No doubt there are easier / less verbose ways of handling this via LINQ, but I wanted to try my hand at a generic method. Please let me know of any ways I could improve it.

static List<List<T>> SplitIntoChunks<T>(List<T> fullBatch, int chunkSize)
{
    if (chunkSize <= 0)
    {
        throw new ArgumentOutOfRangeException("Chunk size cannot be less than or equal to zero.");
    }
    if (fullBatch == null)
    {
        throw new ArgumentNullException("Input to be split cannot be null.");
    }
    int numOfChunks = fullBatch.Count / chunkSize;
    // handles an uneven number of items within the full batch to ensure none at the end are missed
    if (fullBatch.Count % chunkSize > 0)
    {
        numOfChunks++;
    }
    int cellCounter = 0;
    List<List<T>> splitChunks = new List<List<T>>();
    for (int chunkNum = 0; chunkNum < numOfChunks; chunkNum++)
    {
        var chunk = new List<T>();
        for (int index = 0; index < chunkSize; index++)
        {
            // cellCounter is the position in fullBatch across all chunks;
            // indexing with `index` would copy the first chunkSize items into every chunk
            if (cellCounter < fullBatch.Count)
            {
                chunk.Add(fullBatch[cellCounter]);
                cellCounter++;
            }
        }
        splitChunks.Add(chunk);
    }
    return splitChunks;
}
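Since the input is a List<T>, List<T>.GetRange can copy each chunk in a single call, which removes the inner loop and the shared cellCounter entirely. A minimal sketch, assuming List<T> input as in the question; the class name ChunkingSketch and method name SplitWithGetRange are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingSketch
{
    // Hypothetical helper: splits a list into chunks of at most chunkSize items.
    public static List<List<T>> SplitWithGetRange<T>(List<T> fullBatch, int chunkSize)
    {
        if (fullBatch == null)
            throw new ArgumentNullException("fullBatch");
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize", chunkSize, "Chunk size must be greater than zero.");

        // Same chunk count as the question: integer division, plus one for a remainder.
        int numOfChunks = fullBatch.Count / chunkSize + (fullBatch.Count % chunkSize > 0 ? 1 : 0);
        var splitChunks = new List<List<T>>(numOfChunks);

        for (int chunkNum = 0; chunkNum < numOfChunks; chunkNum++)
        {
            int start = chunkNum * chunkSize;
            // The last chunk may be shorter than chunkSize.
            int length = Math.Min(chunkSize, fullBatch.Count - start);
            // GetRange returns a new List<T> holding a shallow copy of that range.
            splitChunks.Add(fullBatch.GetRange(start, length));
        }
        return splitChunks;
    }
}
```

For seven items and a chunk size of 3 this yields chunks of 3, 3 and 1 items; an empty list yields no chunks.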

Answer 1 (accepted)

Your implementation is not bad for a non-LINQ solution, but there's always room for improvement. First, here is a LINQ solution that provides a clean way to return a chunked list:

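public static List<List<T>> Split<T>(List<T> collection, int size)
{
    // Skip, Take and ToList come from System.Linq (using System.Linq;)
    if (size <= 0)
        throw new ArgumentOutOfRangeException("size", size, "Chunk size must be greater than zero.");
    var chunks = new List<List<T>>();
    var chunkCount = collection.Count / size;
    if (collection.Count % size > 0)
        chunkCount++;
    for (var i = 0; i < chunkCount; i++)
        chunks.Add(collection.Skip(i * size).Take(size).ToList());
    return chunks;
}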

Basically it comes down to this:

1. Calculate the number of chunks needed.
2. Loop over the chunk indices.
3. Use the Enumerable.Skip and Enumerable.Take methods to get each chunk.
4. Return the list of chunks.
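The four steps above can also be collapsed into a single query by projecting each chunk index with Enumerable.Range and Select. A sketch, assuming the same List<T> input; the names LinqChunkSketch and SplitWithRange are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqChunkSketch
{
    // Hypothetical helper: the same Skip/Take idea, expressed as one query.
    public static List<List<T>> SplitWithRange<T>(List<T> collection, int size)
    {
        if (collection == null)
            throw new ArgumentNullException("collection");
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", size, "Chunk size must be greater than zero.");

        // Step 1: number of chunks, rounding up for a partial last chunk.
        int chunkCount = collection.Count / size + (collection.Count % size > 0 ? 1 : 0);

        // Steps 2-4: one Skip/Take per chunk index, materialized into a list of lists.
        return Enumerable.Range(0, chunkCount)
            .Select(i => collection.Skip(i * size).Take(size).ToList())
            .ToList();
    }
}
```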

Since you implemented a non-LINQ solution, I wrote one too. Mine doesn't need to calculate the number of chunks up front or use two nested loops to build the list of chunks:

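public static List<List<T>> SplitNoLinq<T>(List<T> collection, int size)
{
    if (size <= 0)
        throw new ArgumentOutOfRangeException("size", size, "Chunk size must be greater than zero.");
    var chunks = new List<List<T>>();
    var count = 0;
    var temp = new List<T>();
    foreach (var element in collection)
    {
        // the current chunk is already full: store it and start a new one
        if (count++ == size)
        {
            chunks.Add(temp);
            temp = new List<T>();
            count = 1;
        }
        temp.Add(element);
    }
    // add the last (possibly partial) chunk; an empty input produces no chunks
    if (temp.Count > 0)
        chunks.Add(temp);
    return chunks;
}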

The code iterates over the collection once, adding each item to a temporary list and keeping a counter. When the counter shows that the temporary list already holds size items, that list is added to the result and a new one is started. At the end, the last (possibly partial) chunk is added.
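The value compared against size is always equal to temp.Count, so the same single-pass idea can be sketched by checking the chunk's own Count, which removes the separate counter and its reset. This variant stores a chunk as soon as it fills up and skips the final add when nothing is left over, so an empty input yields no chunks. The names SinglePassSketch and SplitByCount are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassSketch
{
    // Hypothetical helper: single pass over any sequence, no separate counter variable.
    public static List<List<T>> SplitByCount<T>(IEnumerable<T> collection, int size)
    {
        if (collection == null)
            throw new ArgumentNullException("collection");
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", size, "Chunk size must be greater than zero.");

        var chunks = new List<List<T>>();
        var temp = new List<T>();
        foreach (var element in collection)
        {
            temp.Add(element);
            // The chunk just reached the requested size: store it and start a new one.
            if (temp.Count == size)
            {
                chunks.Add(temp);
                temp = new List<T>();
            }
        }
        // A partial last chunk remains only when the item count is not a multiple of size.
        if (temp.Count > 0)
            chunks.Add(temp);
        return chunks;
    }
}
```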

The var keyword:

From the C# Programming Guide:

The var keyword can also be useful when the specific type of the variable is tedious to type on the keyboard, or is obvious, or does not add to the readability of the code.

So lines like:

List<List<T>> splitChunks = new List<List<T>>();

would become:

var splitChunks = new List<List<T>>();

Furthermore, you could turn this into an extension method and accept an IEnumerable<T> instead of a List<T>:

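public static class Extensions
{
    public static List<List<T>> Split<T>(this IEnumerable<T> collection, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", size, "Chunk size must be greater than zero.");
        var chunks = new List<List<T>>();
        var count = 0;
        var temp = new List<T>();
        foreach (var element in collection)
        {
            // the current chunk is already full: store it and start a new one
            if (count++ == size)
            {
                chunks.Add(temp);
                temp = new List<T>();
                count = 1;
            }
            temp.Add(element);
        }
        // add the last (possibly partial) chunk; an empty input produces no chunks
        if (temp.Count > 0)
            chunks.Add(temp);
        return chunks;
    }
}

// Usage:
var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };
var chunked = numbers.Split(5);   // two chunks: { 1, 2, 3, 4, 5 } and { 6, 7, 8, 9, 0 }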

Comment

If you're going to make the parameter IEnumerable<T>, you might as well make the return type IEnumerable<IEnumerable<T>> too. There's probably an opportunity for lazy evaluation with yield return as well.

Comment (from the asker)

Excellent suggestions, thank you. I'll keep this question open for a while to see what others come up with.

Answer 2

Here's a version using techniques I referenced in my comment on this answer:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> fullBatch, int chunkSize)
{
    if (chunkSize <= 0)
    {
        throw new ArgumentOutOfRangeException(
            "chunkSize",
            chunkSize,
            "Chunk size cannot be less than or equal to zero.");
    }
    if (fullBatch == null)
    {
        throw new ArgumentNullException("fullBatch", "Input to be split cannot be null.");
    }
    var cellCounter = 0;
    var chunk = new List<T>(chunkSize);
    foreach (var element in fullBatch)
    {
        if (cellCounter++ == chunkSize)
        {
            yield return chunk;
            chunk = new List<T>(chunkSize);
            cellCounter = 1;
        }
        chunk.Add(element);
    }
    // skip the final yield for an empty input so no empty chunk is produced
    if (chunk.Count > 0)
    {
        yield return chunk;
    }
}

Note that I'm doing the following:

Pre-allocating the list capacity to the chunk size (minimizes reallocations while adding to the list)
Using the yield return state machine so that evaluation is lazy (so it can be used effectively in LINQ queries)
Making it an extension method on IEnumerable<T> so that it plays nicely with LINQ
Using the proper exception constructor overloads so that all pertinent information (parameter name, actual value, message) is provided
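One caveat with the version above: because its body contains yield return, the compiler turns the whole method into an iterator, so the argument checks run only when enumeration starts (the first MoveNext), not when Split is called. A common remedy is to validate in a regular method and delegate to a private iterator. A sketch under that assumption; EagerSplitExtensions, SplitEager and SplitIterator are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class EagerSplitExtensions
{
    // Not an iterator: these checks run as soon as SplitEager is called.
    public static IEnumerable<IEnumerable<T>> SplitEager<T>(this IEnumerable<T> fullBatch, int chunkSize)
    {
        if (fullBatch == null)
            throw new ArgumentNullException("fullBatch", "Input to be split cannot be null.");
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize", chunkSize, "Chunk size cannot be less than or equal to zero.");
        return SplitIterator(fullBatch, chunkSize);
    }

    // Iterator: evaluated lazily, producing one chunk per MoveNext.
    private static IEnumerable<IEnumerable<T>> SplitIterator<T>(IEnumerable<T> fullBatch, int chunkSize)
    {
        var chunk = new List<T>();
        foreach (var element in fullBatch)
        {
            chunk.Add(element);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>();
            }
        }
        // Yield the partial last chunk, if any.
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

The trade-off is one extra method, in exchange for a caller getting the ArgumentNullException at the call site rather than later, wherever the sequence happens to be enumerated.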

Comment

And because it should be mentioned: stackoverflow.com/questions/30176121/…
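For readers on newer frameworks: .NET 6 added a built-in Enumerable.Chunk(source, size) that returns an IEnumerable<TSource[]>, where every array except possibly the last holds size elements. A minimal usage sketch, assuming a .NET 6 or later target; the wrapper class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuiltInChunkDemo
{
    // Requires .NET 6+: Enumerable.Chunk yields arrays of at most `size` elements
    // and throws ArgumentOutOfRangeException when size is less than 1.
    public static List<int[]> ChunkNumbers(IEnumerable<int> numbers, int size)
    {
        return numbers.Chunk(size).ToList();
    }
}
```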

Comment

Not the shortest source code, but I love how nicely optimized it is. I assume Skip() can be pretty slow with lazy evaluation and an expensive element getter; this one offers a reasonable balance between memory usage and speed. BTW, sometimes LINQ is faster than eager evaluation, especially with huge amounts of data.
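The concern about Skip can be checked by counting how many elements a lazy source hands out. The sketch below assumes a lazy IEnumerable<T> produced by an iterator method (not a List<T>, which LINQ may index directly) and uses hypothetical names. Exact totals for the Skip/Take path depend on the runtime's LINQ implementation, so only the comparison matters:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    // Number of elements produced by the source so far.
    public static int Produced;

    // Lazy source: every element handed out increments the counter.
    public static IEnumerable<int> CountingSource(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Skip/Take per chunk: each chunk re-enumerates the lazy source from the start.
    public static int SkipTakeCost(int n, int size)
    {
        Produced = 0;
        var source = CountingSource(n);
        int count = source.Count();
        int chunkCount = count / size + (count % size > 0 ? 1 : 0);
        for (int i = 0; i < chunkCount; i++)
            source.Skip(i * size).Take(size).ToList();
        return Produced;
    }

    // Single pass: each element is produced exactly once.
    public static int SinglePassCost(int n, int size)
    {
        Produced = 0;
        var chunk = new List<int>();
        foreach (var element in CountingSource(n))
        {
            chunk.Add(element);
            if (chunk.Count == size)
                chunk = new List<int>();
        }
        return Produced;
    }
}
```

For example, SinglePassCost(10, 3) produces exactly 10 elements, while SkipTakeCost(10, 3) produces noticeably more, because the count and every chunk after the first start again from the beginning of the source.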

Accepted answer by Abbas, 2015-05-08 19:06:43Z
