Suppose a series of objects (presented here as tuples):
"a" | 1 "a" | 2 "b" | 3 "b" | 4 "a" | 5
There is no built in function (that I know of) to group by the first columns's sequence, that is, all the "a"'s in a row, then the "b"'s, then the one "a" alone. So that the groups become: {1,2},{3,4},{5} and not {1,2,5},{3,4}.
So I wrote this, which I'm submitting for review. I emulate all 8 variants of GroupBy
which I present here as the two main variants (with and without result selector):
public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>
(this TSource[] source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector,
IEqualityComparer<TKey> comparer)
{
var newElement = source.Select(keySelector).ToArray().MakeSequentialKey(comparer).Zip(
source.Select(elementSelector),
(x, y) => new Tuple<int, TElement>(x, y));
var groupElement = newElement.GroupBy(t => t.Item1, t => t.Item2);
var newKey = source.Select(keySelector).ToArray().MakeSequentialKey(comparer).Zip(
source.Select(keySelector),
(x, y) => new Tuple<int, TKey>(x, y));
var groupKey = newKey.GroupBy(t => t.Item1, t => t.Item2);
return groupKey.Zip(groupElement,
(key,element) => new Grouping<TKey,TElement>(key.First(),element));
}
public static IEnumerable<TResult> GroupBySequence<TSource, TKey, TElement, TResult>
(this TSource[] source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector,
Func<TKey, IEnumerable<TElement>, TResult> resultSelector,
IEqualityComparer<TKey> comparer)
{
return source.GroupBySequence(keySelector,
elementSelector, comparer).Select(x => resultSelector(x.Key, x));
}
Helper methods:
//Performs an operation over each consecutive item. Here used for determining equality.
public static IEnumerable<TResult> WithNext<T, TResult>
(this T[] source, Func<T, T, TResult> operation)
{
return source.Zip(source.Skip(1), operation);
}
//Makes the unique key
public static IEnumerable<int> MakeSequentialKey<T>
(this T[] source, IEqualityComparer<T> comparer)
{
if (source.Length == 0)
return Enumerable.Empty<int>();
return (new[] { 0 })
.Concat(source.ToArray().WithNext<T, int>((x, y) => comparer.Equals(x, y) ? 0 : 1))
.ToArray()
.RunningSum();
}
//Sum of all previous elements up to each item of an array
public static IEnumerable<int> RunningSum(this int[] source)
{
int cumul = 0;
foreach (int i in source)
yield return cumul += i;
}
And the Grouping class, which is pretty much a straightforward implementation of IGrouping:
public class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
TKey key;
IEnumerable<TElement> elements;
public Grouping(TKey key, IEnumerable<TElement> elements)
{
this.key = key;
this.elements = elements;
}
public TKey Key { get { return key; } }
public IEnumerator<TElement> GetEnumerator()
{
return elements.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return elements.GetEnumerator();
}
}
Anticipated questions:
- What is your general approach here?
Generate a really unique key from the given key, group the elements AND the key with that, and either reform new groups with
Grouping
or apply result to it, so that the original key type is still used. - Why extent
T[]
and notIEnumerable<T>
? Because usage like this means the elements are ordered. It would not make sense to useGroupBySequence
over a Dictionary or a HashSet which both implement IEnumerable, because these two collections have AFAIK no notion of order. If there s a better or clearer way to indicate this, I don't know it.
I'm looking for criticism, suggestions on clarity and best practices. Thank you for your time.
4 Answers 4
As in my comment:
If
{1,2},{3,4},{5}
is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed?
public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>(
this TSource[] source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector,
IEqualityComparer<TKey> comparer)
{
if (source.Length == 0)
{
yield break;
}
TKey currentKey = keySelector(source.First());
var foundItems = new List<TElement>();
foreach (var item in source)
{
TKey key = keySelector(item);
if (!comparer.Equals(currentKey, key))
{
yield return new Grouping<TKey, TElement>(currentKey, foundItems);
currentKey = key;
foundItems = new List<TElement>();
}
foundItems.Add(elementSelector(item));
}
if (foundItems.Count > 0)
{
yield return new Grouping<TKey, TElement>(currentKey, foundItems);
}
}
-
1\$\begingroup\$ I changed
foundItems.Clear()
tofoundItems = new List<TElement>()
since it can cause problems otherwise. \$\endgroup\$Steven Jeuris– Steven Jeuris2011年12月08日 17:50:31 +00:00Commented Dec 8, 2011 at 17:50 -
\$\begingroup\$ Hmm, I guess the separate classes were really unnecessary. I should have seen that. \$\endgroup\$Jeff Mercado– Jeff Mercado2011年12月09日 00:06:40 +00:00Commented Dec 9, 2011 at 0:06
-
\$\begingroup\$ @JeffMercado: While looking through your solution I was wondering whether it was somehow more reusable. You might be onto something of splitting the behavior of 'consecutive equal objects'. Combining this with
Zip
could result in concise syntax which is conceptually separated from the grouping. Didn't have time to attempt this. :) \$\endgroup\$Steven Jeuris– Steven Jeuris2011年12月09日 10:15:14 +00:00Commented Dec 9, 2011 at 10:15
Here's another approach (less LINQ, slightly more code):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace testGrouping
{
static class GroupBySequenceExtension
{
internal class Grouping<TKey, TVal> : IGrouping<TKey, TVal>
{
public TKey Key { get; set; }
public IEnumerable<TVal> Items { get; set; }
public IEnumerator<TVal> GetEnumerator()
{
return Items.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return Items.GetEnumerator();
}
}
public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>
(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector,
IEqualityComparer<TKey> keyComparer)
{
TKey lastKey = default(TKey);
bool atFirst = true;
List<TElement> items = new List<TElement>();
foreach (var item in source)
{
var key = keySelector(item);
var element = elementSelector(item);
if (atFirst)
{
lastKey = key;
atFirst = false;
}
if (keyComparer.Equals(key, lastKey))
{
items.Add(element);
}
else
{
yield return new Grouping<TKey, TElement>
{
Key = lastKey,
Items = items
};
items = new List<TElement>();
items.Add(element);
}
lastKey = key;
}
if (items.Count > 0)
{
yield return new Grouping<TKey, TElement>
{
Key = lastKey,
Items = items
};
}
}
}
}
-
\$\begingroup\$ Although my code has less duplicate lines and doesn't rely on
default()
, this answer seems to carry the same idea of an easier solution than that of the OP. +1 for that. \$\endgroup\$Steven Jeuris– Steven Jeuris2011年12月08日 15:59:47 +00:00Commented Dec 8, 2011 at 15:59
For this, I would not use LINQ at all here. There unfortunately aren't any methods available to make this task easier. In fact, I would say trying to use what currently is available makes it more complicated and inefficient than it has to. As you can see by all the helper methods and whatnot you had to add, you can see how complicated it can be.
Just make it work for IEnumerable<TSource>
, there's no sense in restricting it to arrays. Sure HashSets and Dictionaries don't have a notion of ordering, but how else would you make this accessible to other "ordered" enumerables? You'd have to add overloads for each type you want to support since there aren't any interfaces implemented that has this distinction. It would be easier to let it work for all enumerables and leave when to use it up to the user of your code.
I actually have some code that does something like this. You only really need to keep track of the previous key that was added. If the current key matches the previous one, stick it in the same group, otherwise create a new one.
public static partial class EnumerableEx
{
public static IEnumerable<IGrouping<TKey, TSource>> GroupByConsecutive<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return GroupByConsecutive(source, keySelector, null as IEqualityComparer<TKey>);
}
public static IEnumerable<IGrouping<TKey, TSource>> GroupByConsecutive<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
return GroupByConsecutive(source, keySelector, Functions.Identity<TSource>, keyComparer);
}
public static IEnumerable<IGrouping<TKey, TElement>> GroupByConsecutive<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector)
{
return GroupByConsecutive(source, keySelector, elementSelector, null as IEqualityComparer<TKey>);
}
public static IEnumerable<IGrouping<TKey, TElement>> GroupByConsecutive<TSource, TKey, TElement>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector,
IEqualityComparer<TKey> keyComparer)
{
return ConsecutiveGrouper<TKey, TElement>.Create(source, keySelector, elementSelector, keyComparer as IEqualityComparer<TKey>);
}
internal class ConsecutiveGrouper<TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>
{
internal static ConsecutiveGrouper<TKey, TElement> Create<TSource, TKey, TElement>(
IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector,
IEqualityComparer<TKey> keyComparer)
{
source.ThrowIfNull("source");
keySelector.ThrowIfNull("keySelector");
elementSelector.ThrowIfNull("elementSelector");
var grouper = new ConsecutiveGrouper<TKey, TElement>(keyComparer);
foreach (var item in source)
{
grouper.NextGroup(keySelector(item)).Add(elementSelector(item));
}
return grouper;
}
private ConsecutiveGrouper(IEqualityComparer<TKey> keyComparer)
{
this._keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
this._groupings = new List<Grouping>();
this._lastGrouping = null;
}
private IEqualityComparer<TKey> _keyComparer;
private List<Grouping> _groupings;
private Grouping _lastGrouping;
private Grouping NextGroup(TKey key)
{
if (_lastGrouping == null)
{
_lastGrouping = new Grouping(key);
_groupings.Add(_lastGrouping);
}
else if (!_keyComparer.Equals(_lastGrouping.Key, key))
{
_lastGrouping = new Grouping(key);
_groupings.Add(_lastGrouping);
}
return _lastGrouping;
}
public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator()
{
return _groupings.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
class Grouping : IGrouping<TKey, TElement>
{
internal Grouping(TKey key)
{
this.Key = key;
this._elements = new List<TElement>();
}
public TKey Key { get; private set; }
private List<TElement> _elements;
internal void Add(TElement element)
{
_elements.Add(element);
}
public IEnumerator<TElement> GetEnumerator()
{
return _elements.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
}
p.s., Python's itertools.groupby()
iterator has the same semantics that you want as far as I can tell. You might want to give a look at the equivalent implementation.
-
\$\begingroup\$ I'm not sure I agree that helper functions are a sign of complexity. I might as well have coded the helper stuff in the functions themselves. I had to restrain myself not to make
RunningSum
take a lambda as a parameter in case one would like to use it for something else. Some languages allow many variations of Running functions. Ultimately I thought it didn't make it clear though. \$\endgroup\$MPelletier– MPelletier2011年12月07日 14:40:53 +00:00Commented Dec 7, 2011 at 14:40 -
\$\begingroup\$ Also, what does
null as IEqualityComparer<TKey>
do? I've never seen that before. \$\endgroup\$MPelletier– MPelletier2011年12月07日 14:41:31 +00:00Commented Dec 7, 2011 at 14:41 -
\$\begingroup\$ My point about the helper functions is that you needed the helper functions to make using the LINQ functions work. If you had used a more direct approach using a different algorithm, you could have avoided them all together. The
null as IEqualityComparer<TKey>
is just me being explicit about what I'm passing thenull
as. It's just a simple cast as if I did(IEqualityComparer<TKey>)null
. It's expecting anIEqualityComparer<TKey>
there but I'm passing innull
. It's helpful for distinguishing between intended calls to certain function overloads. \$\endgroup\$Jeff Mercado– Jeff Mercado2011年12月07日 19:55:03 +00:00Commented Dec 7, 2011 at 19:55 -
\$\begingroup\$ I see! I knew it was a cast, I just wasn't sure what it was doing there. Thought maybe the compiler would redefine it to the type's default. So why get rid of the comparer? \$\endgroup\$MPelletier– MPelletier2011年12月08日 02:26:00 +00:00Commented Dec 8, 2011 at 2:26
-
\$\begingroup\$ Written that way, I was trying to minimize the amount of repetition in the code. A common pattern when it comes to providing an optional comparer is to use
null
to signify that the default comparer is to be used. In this case, the comparer is set in the constructor forConsecutiveGrouper
. It is cleaner this way than to have that repeated in every overload that required it. \$\endgroup\$Jeff Mercado– Jeff Mercado2011年12月08日 04:02:12 +00:00Commented Dec 8, 2011 at 4:02
Since I was looking at similar functionality for runs of sequential numbers, I thought I'd offer my LINQ solution to this.
Using an extension method based on the APL scan operator, which is something like an Aggregate
that returns the intermediate results, which I also have modified to use KeyValuePair
to pair the scan value and matching element so you don't have to use ValueTuple
s and keep up with it manually:
public static IEnumerable<KeyValuePair<TKey, T>> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<KeyValuePair<TKey, T>, T, TKey> combine) {
var srce = src.GetEnumerator();
if (srce.MoveNext()) {
var seed = new KeyValuePair<TKey, T>(seedKey, srce.Current);
while (srce.MoveNext()) {
yield return seed;
seed = new KeyValuePair<TKey, T>(combine(seed, srce.Current), srce.Current);
}
yield return seed;
}
}
You can create a group number for each matching sequence of keys, then group on these group numbers:
public static IEnumerable<IGrouping<int, TResult>> GroupByRuns<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) {
cmp = cmp ?? EqualityComparer<TKey>.Default;
return src.ScanPair(0,
(kvp, cur) => cmp.Equals(key(kvp.Value), key(cur)) ? kvp.Key : kvp.Key + 1)
.GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
}
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.GroupByRuns(key, e => e);
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement>(this IEnumerable<TElement> src) => src.GroupByRuns(e => e, e => e);
public static IEnumerable<IEnumerable<TResult>> Runs<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) =>
src.GroupByRuns(key, result).Select(g => g.Select(s => s));
public static IEnumerable<IEnumerable<TElement>> Runs<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.Runs(key, e => e);
public static IEnumerable<IEnumerable<TElement>> Runs<TElement>(this IEnumerable<TElement> src) => src.Runs(e => e, e => e);
I have two variations, the ones named GroupByRuns
return IEnumerable<IGrouping>
analogous to GroupBy
, the ones named Runs
returns IEnumerable<IEnumerable>
.
{1,2},{3,4},{5}
is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed? \$\endgroup\$GroupBy
. \$\endgroup\$