This function divides a sequence into partitions, where a partition is a list of consecutive matching elements.
Example
Input: (A, A, B, B, B, A, C, C)
Output: ((A, A), (B, B, B), (A), (C, C))
I've tried to make this code "obviously correct", but it still doesn't look that way to me.
public static IEnumerable<List<T>> PartitionBy<T, PK>(this IEnumerable<T> sequence, Func<T, PK> partitionKey)
{
return sequence.PartitionBy(partitionKey, EqualityComparer<PK>.Default);
}
public static IEnumerable<List<T>> PartitionBy<T>(this IEnumerable<T> sequence, IEqualityComparer<T> comparer)
{
return sequence.PartitionBy(item => item, comparer);
}
public static IEnumerable<List<T>> PartitionBy<T, X>(this IEnumerable<T> sequence, Func<T, X> partitionKey, IEqualityComparer<X> comparer)
{
var itr = sequence.GetEnumerator();
if (!itr.MoveNext())
{
// empty sequence was passed in, so return empty sequence
yield break;
}
// Start the first partition.
var currentList = new List<T>(new[] { itr.Current });
while (itr.MoveNext())
{
var key1 = partitionKey(currentList[0]);
var key2 = partitionKey(itr.Current);
if (comparer.Equals(key1, key2))
{
// continue current partition
currentList.Add(itr.Current);
}
else
{
// yield current partition and start a new one
yield return currentList;
currentList = new List<T>(new[] { itr.Current });
}
}
// We know it has at least 1 element here.
yield return currentList;
}
1 Answer 1
Just a small point to start:
new List<T>(new[] { itr.Current });
You don't need the array too, you can just do:
new List<T> { itr.Current };
I'd suggest that you aim for consistency with your generic type names too. Why X
vs PK
? I'd suggest TKey
for both.
You could do it just with a foreach:
public static IEnumerable<IEnumerable<T>> Partition<T, TKey>(
this IEnumerable<T> items,
Func<T, TKey> keySelector,
IEqualityComparer<TKey> comparer)
{
List<T> currentPartition = null;
foreach (var item in items)
{
if (currentPartition != null
&& comparer.Equals(keySelector(item), keySelector(currentPartition[0])))
{
currentPartition.Add(item);
}
else
{
if (currentPartition != null)
{
yield return currentPartition;
}
currentPartition = new List<T> { item };
}
}
if (currentPartition != null)
{
yield return currentPartition;
}
}
Is it clearer than your code? I'm not convinced it is.
-
\$\begingroup\$ Despite the (relative) complexity inside the loop, I do think this will be clearer for most programmers, simply because it avoids the cumbersome enumerator code that
foreach
is meant to free us from. \$\endgroup\$VisualMelon– VisualMelon2017年05月31日 21:35:22 +00:00Commented May 31, 2017 at 21:35 -
\$\begingroup\$ Yeah, I started with a foreach but it wasn't as clean as yours. I think I slightly prefer yours to my OP now. And naming it "currentPartition" is a big improvement over "currentList". \$\endgroup\$default.kramer– default.kramer2017年05月31日 22:13:50 +00:00Commented May 31, 2017 at 22:13
(A, A)
and another is(A)
\$\endgroup\$