Grouping by sequence in LINQ

Question 1

Suppose a series of objects (presented here as tuples):

"a" | 1
"a" | 2
"b" | 3
"b" | 4
"a" | 5

There is no built in function (that I know of) to group by the first columns's sequence, that is, all the "a"'s in a row, then the "b"'s, then the one "a" alone. So that the groups become: {1,2},{3,4},{5} and not {1,2,5},{3,4}.

So I wrote this, which I'm submitting for review. I emulate all 8 variants of GroupBy which I present here as the two main variants (with and without result selector):

public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>
 (this TSource[] source,
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 IEqualityComparer<TKey> comparer)
{
 var newElement = source.Select(keySelector).ToArray().MakeSequentialKey(comparer).Zip(
 source.Select(elementSelector),
 (x, y) => new Tuple<int, TElement>(x, y));
 var groupElement = newElement.GroupBy(t => t.Item1, t => t.Item2);
 var newKey = source.Select(keySelector).ToArray().MakeSequentialKey(comparer).Zip(
 source.Select(keySelector),
 (x, y) => new Tuple<int, TKey>(x, y));
 var groupKey = newKey.GroupBy(t => t.Item1, t => t.Item2);
 return groupKey.Zip(groupElement, 
 (key,element) => new Grouping<TKey,TElement>(key.First(),element));
}
public static IEnumerable<TResult> GroupBySequence<TSource, TKey, TElement, TResult>
 (this TSource[] source,
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 Func<TKey, IEnumerable<TElement>, TResult> resultSelector,
 IEqualityComparer<TKey> comparer)
{
 return source.GroupBySequence(keySelector, 
 elementSelector, comparer).Select(x => resultSelector(x.Key, x));
}

Helper methods:

//Performs an operation over each consecutive item. Here used for determining equality.
public static IEnumerable<TResult> WithNext<T, TResult>
 (this T[] source, Func<T, T, TResult> operation)
{
 return source.Zip(source.Skip(1), operation);
}
//Makes the unique key
public static IEnumerable<int> MakeSequentialKey<T>
 (this T[] source, IEqualityComparer<T> comparer)
{
 if (source.Length == 0)
 return Enumerable.Empty<int>();
 return (new[] { 0 })
 .Concat(source.ToArray().WithNext<T, int>((x, y) => comparer.Equals(x, y) ? 0 : 1))
 .ToArray()
 .RunningSum();
}
//Sum of all previous elements up to each item of an array
public static IEnumerable<int> RunningSum(this int[] source)
{
 int cumul = 0;
 foreach (int i in source)
 yield return cumul += i;
}

And the Grouping class, which is pretty much a straightforward implementation of IGrouping:

public class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
 TKey key;
 IEnumerable<TElement> elements;
 public Grouping(TKey key, IEnumerable<TElement> elements)
 {
 this.key = key;
 this.elements = elements;
 }
 public TKey Key { get { return key; } }
 public IEnumerator<TElement> GetEnumerator()
 {
 return elements.GetEnumerator();
 }
 IEnumerator IEnumerable.GetEnumerator()
 {
 return elements.GetEnumerator();
 }
}

Anticipated questions:

What is your general approach here? Generate a really unique key from the given key, group the elements AND the key with that, and either reform new groups with Grouping or apply result to it, so that the original key type is still used.
Why extent T[] and not IEnumerable<T>? Because usage like this means the elements are ordered. It would not make sense to use GroupBySequence over a Dictionary or a HashSet which both implement IEnumerable, because these two collections have AFAIK no notion of order. If there s a better or clearer way to indicate this, I don't know it.

I'm looking for criticism, suggestions on clarity and best practices. Thank you for your time.

Question 2

"Pre-anticipated"? When would you ever anticipate anything except before it happens? ;)

Question 3

Oops, pleonasm...

Question 4

If {1,2},{3,4},{5} is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed?

Question 5

@StevenJeuris My example is simplified, I was looking at making something generic and portable for various usage. I added no complexity that was not already present in GroupBy.

Question 6

@StevenJeuris Done: programmers.stackexchange.com/questions/124128/…

Question 7

As in my comment:

If {1,2},{3,4},{5} is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed?

public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>(
 this TSource[] source,
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 IEqualityComparer<TKey> comparer)
{
 if (source.Length == 0)
 {
 yield break;
 }
 TKey currentKey = keySelector(source.First());
 var foundItems = new List<TElement>();
 foreach (var item in source)
 {
 TKey key = keySelector(item);
 if (!comparer.Equals(currentKey, key))
 { 
 yield return new Grouping<TKey, TElement>(currentKey, foundItems);
 currentKey = key;
 foundItems = new List<TElement>();
 }
 foundItems.Add(elementSelector(item));
 }
 if (foundItems.Count > 0)
 {
 yield return new Grouping<TKey, TElement>(currentKey, foundItems);
 }
}

Question 8

I changed foundItems.Clear() to foundItems = new List<TElement>() since it can cause problems otherwise.

Question 9

Hmm, I guess the separate classes were really unnecessary. I should have seen that.

Question 10

@JeffMercado: While looking through your solution I was wondering whether it was somehow more reusable. You might be onto something of splitting the behavior of 'consecutive equal objects'. Combining this with Zip could result in concise syntax which is conceptually separated from the grouping. Didn't have time to attempt this. :)

Question 11

Here's another approach (less LINQ, slightly more code):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace testGrouping
{
 static class GroupBySequenceExtension
 {
 internal class Grouping<TKey, TVal> : IGrouping<TKey, TVal>
 { 
 public TKey Key { get; set; }
 public IEnumerable<TVal> Items { get; set; }
 public IEnumerator<TVal> GetEnumerator()
 {
 return Items.GetEnumerator();
 }
 System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
 {
 return Items.GetEnumerator();
 } 
 }
 public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>
 (this IEnumerable<TSource> source, 
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 IEqualityComparer<TKey> keyComparer)
 {
 TKey lastKey = default(TKey);
 bool atFirst = true;
 List<TElement> items = new List<TElement>();
 foreach (var item in source)
 {
 var key = keySelector(item);
 var element = elementSelector(item);
 if (atFirst)
 {
 lastKey = key;
 atFirst = false;
 }
 if (keyComparer.Equals(key, lastKey))
 {
 items.Add(element);
 }
 else
 {
 yield return new Grouping<TKey, TElement>
 {
 Key = lastKey, 
 Items = items
 };
 items = new List<TElement>();
 items.Add(element);
 }
 lastKey = key;
 }
 if (items.Count > 0)
 {
 yield return new Grouping<TKey, TElement>
 {
 Key = lastKey,
 Items = items
 };
 }
 }
 }
}

Question 12

Although my code has less duplicate lines and doesn't rely on default(), this answer seems to carry the same idea of an easier solution than that of the OP. +1 for that.

Question 13

For this, I would not use LINQ at all here. There unfortunately aren't any methods available to make this task easier. In fact, I would say trying to use what currently is available makes it more complicated and inefficient than it has to. As you can see by all the helper methods and whatnot you had to add, you can see how complicated it can be.

Just make it work for IEnumerable<TSource>, there's no sense in restricting it to arrays. Sure HashSets and Dictionaries don't have a notion of ordering, but how else would you make this accessible to other "ordered" enumerables? You'd have to add overloads for each type you want to support since there aren't any interfaces implemented that has this distinction. It would be easier to let it work for all enumerables and leave when to use it up to the user of your code.

I actually have some code that does something like this. You only really need to keep track of the previous key that was added. If the current key matches the previous one, stick it in the same group, otherwise create a new one.

public static partial class EnumerableEx
{
 public static IEnumerable<IGrouping<TKey, TSource>> GroupByConsecutive<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
 {
 return GroupByConsecutive(source, keySelector, null as IEqualityComparer<TKey>);
 }
 public static IEnumerable<IGrouping<TKey, TSource>> GroupByConsecutive<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
 {
 return GroupByConsecutive(source, keySelector, Functions.Identity<TSource>, keyComparer);
 }
 public static IEnumerable<IGrouping<TKey, TElement>> GroupByConsecutive<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector)
 {
 return GroupByConsecutive(source, keySelector, elementSelector, null as IEqualityComparer<TKey>);
 }
 public static IEnumerable<IGrouping<TKey, TElement>> GroupByConsecutive<TSource, TKey, TElement>(
 this IEnumerable<TSource> source,
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 IEqualityComparer<TKey> keyComparer)
 {
 return ConsecutiveGrouper<TKey, TElement>.Create(source, keySelector, elementSelector, keyComparer as IEqualityComparer<TKey>);
 }
 internal class ConsecutiveGrouper<TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>
 {
 internal static ConsecutiveGrouper<TKey, TElement> Create<TSource, TKey, TElement>(
 IEnumerable<TSource> source,
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 IEqualityComparer<TKey> keyComparer)
 {
 source.ThrowIfNull("source");
 keySelector.ThrowIfNull("keySelector");
 elementSelector.ThrowIfNull("elementSelector");
 var grouper = new ConsecutiveGrouper<TKey, TElement>(keyComparer);
 foreach (var item in source)
 {
 grouper.NextGroup(keySelector(item)).Add(elementSelector(item));
 }
 return grouper;
 }
 private ConsecutiveGrouper(IEqualityComparer<TKey> keyComparer)
 {
 this._keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
 this._groupings = new List<Grouping>();
 this._lastGrouping = null;
 }
 private IEqualityComparer<TKey> _keyComparer;
 private List<Grouping> _groupings;
 private Grouping _lastGrouping;
 private Grouping NextGroup(TKey key)
 {
 if (_lastGrouping == null)
 {
 _lastGrouping = new Grouping(key);
 _groupings.Add(_lastGrouping);
 }
 else if (!_keyComparer.Equals(_lastGrouping.Key, key))
 {
 _lastGrouping = new Grouping(key);
 _groupings.Add(_lastGrouping);
 }
 return _lastGrouping;
 }
 public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator()
 {
 return _groupings.GetEnumerator();
 }
 System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
 {
 return GetEnumerator();
 }
 class Grouping : IGrouping<TKey, TElement>
 {
 internal Grouping(TKey key)
 {
 this.Key = key;
 this._elements = new List<TElement>();
 }
 public TKey Key { get; private set; }
 private List<TElement> _elements;
 internal void Add(TElement element)
 {
 _elements.Add(element);
 }
 public IEnumerator<TElement> GetEnumerator()
 {
 return _elements.GetEnumerator();
 }
 System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
 {
 return GetEnumerator();
 }
 }
 }
}

p.s., Python's itertools.groupby() iterator has the same semantics that you want as far as I can tell. You might want to give a look at the equivalent implementation.

Question 14

I'm not sure I agree that helper functions are a sign of complexity. I might as well have coded the helper stuff in the functions themselves. I had to restrain myself not to make RunningSum take a lambda as a parameter in case one would like to use it for something else. Some languages allow many variations of Running functions. Ultimately I thought it didn't make it clear though.

Question 15

Also, what does null as IEqualityComparer<TKey> do? I've never seen that before.

Question 16

My point about the helper functions is that you needed the helper functions to make using the LINQ functions work. If you had used a more direct approach using a different algorithm, you could have avoided them all together. The null as IEqualityComparer<TKey> is just me being explicit about what I'm passing the null as. It's just a simple cast as if I did (IEqualityComparer<TKey>)null. It's expecting an IEqualityComparer<TKey> there but I'm passing in null. It's helpful for distinguishing between intended calls to certain function overloads.

Question 17

I see! I knew it was a cast, I just wasn't sure what it was doing there. Thought maybe the compiler would redefine it to the type's default. So why get rid of the comparer?

Question 18

Written that way, I was trying to minimize the amount of repetition in the code. A common pattern when it comes to providing an optional comparer is to use null to signify that the default comparer is to be used. In this case, the comparer is set in the constructor for ConsecutiveGrouper. It is cleaner this way than to have that repeated in every overload that required it.

Question 19

Since I was looking at similar functionality for runs of sequential numbers, I thought I'd offer my LINQ solution to this.

Using an extension method based on the APL scan operator, which is something like an Aggregate that returns the intermediate results, which I also have modified to use KeyValuePair to pair the scan value and matching element so you don't have to use ValueTuples and keep up with it manually:

public static IEnumerable<KeyValuePair<TKey, T>> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<KeyValuePair<TKey, T>, T, TKey> combine) {
 var srce = src.GetEnumerator();
 if (srce.MoveNext()) {
 var seed = new KeyValuePair<TKey, T>(seedKey, srce.Current);
 while (srce.MoveNext()) {
 yield return seed;
 seed = new KeyValuePair<TKey, T>(combine(seed, srce.Current), srce.Current);
 }
 yield return seed;
 }
}

You can create a group number for each matching sequence of keys, then group on these group numbers:

public static IEnumerable<IGrouping<int, TResult>> GroupByRuns<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) {
 cmp = cmp ?? EqualityComparer<TKey>.Default;
 return src.ScanPair(0,
 (kvp, cur) => cmp.Equals(key(kvp.Value), key(cur)) ? kvp.Key : kvp.Key + 1)
 .GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
}
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.GroupByRuns(key, e => e);
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement>(this IEnumerable<TElement> src) => src.GroupByRuns(e => e, e => e);
public static IEnumerable<IEnumerable<TResult>> Runs<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) =>
 src.GroupByRuns(key, result).Select(g => g.Select(s => s));
public static IEnumerable<IEnumerable<TElement>> Runs<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.Runs(key, e => e);
public static IEnumerable<IEnumerable<TElement>> Runs<TElement>(this IEnumerable<TElement> src) => src.Runs(e => e, e => e);

I have two variations, the ones named GroupByRuns return IEnumerable<IGrouping> analogous to GroupBy, the ones named Runs returns IEnumerable<IEnumerable>.

Steven Jeuris Steven Jeuris 2,6623 gold badges21 silver badges34 bronze badges · Accepted Answer · 2011-12-08 14:59:11Z

As in my comment:

If {1,2},{3,4},{5} is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed?

public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>(
 this TSource[] source,
 Func<TSource, TKey> keySelector,
 Func<TSource, TElement> elementSelector,
 IEqualityComparer<TKey> comparer)
{
 if (source.Length == 0)
 {
 yield break;
 }
 TKey currentKey = keySelector(source.First());
 var foundItems = new List<TElement>();
 foreach (var item in source)
 {
 TKey key = keySelector(item);
 if (!comparer.Equals(currentKey, key))
 { 
 yield return new Grouping<TKey, TElement>(currentKey, foundItems);
 currentKey = key;
 foundItems = new List<TElement>();
 }
 foundItems.Add(elementSelector(item));
 }
 if (foundItems.Count > 0)
 {
 yield return new Grouping<TKey, TElement>(currentKey, foundItems);
 }
}

I changed foundItems.Clear() to foundItems = new List<TElement>() since it can cause problems otherwise.
Hmm, I guess the separate classes were really unnecessary. I should have seen that.
@JeffMercado: While looking through your solution I was wondering whether it was somehow more reusable. You might be onto something of splitting the behavior of 'consecutive equal objects'. Combining this with Zip could result in concise syntax which is conceptually separated from the grouping. Didn't have time to attempt this. :)

Stack Exchange Network

Grouping by sequence in LINQ

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Grouping by sequence in LINQ

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions