LINQ Extension
I present a LINQ extension method ArgBy that returns the item of a sequence selected by the specified predicate. Two additional methods, MinBy and MaxBy, are provided as special cases. Its null-handling specification matches that of the existing LINQ methods (see link).
The purpose is to replace a two-pass function:
var max = collection.Max(item => item.Value);
var argMax = collection.First(item => item.Value == max);
by a single-pass function:
var argMax = collection.MaxBy(x => x.Value);
Questions
- Is this method usable in practice?
- Is this correct LINQ style?
- Am I compliant to the existing LINQ specification of equivalent methods?
- Are there edge cases or optimisations that require more thought?
Source Code
ArgBy iterates the source sequence and tests each item against a predicate. Any match on the predicate overwrites the reference that the remaining items are tested against, so the last match is returned. As with the existing LINQ implementation (see the reference source of Min, for instance), there is slightly different behavior for value types versus null-assignable types. There is additional complexity over the existing methods, since we include a key selector, and a key itself can also be either a value type or null-assignable.
public static class LinqExtension
{
public static TSource ArgBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<(TKey Current, TKey Previous), bool> predicate)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
if (predicate == null) throw new ArgumentNullException(nameof(predicate));
var value = default(TSource);
var key = default(TKey);
if (value == null)
{
foreach (var other in source)
{
if (other == null) continue;
var otherKey = keySelector(other);
if (otherKey == null) continue;
if (value == null || predicate((otherKey, key)))
{
value = other;
key = otherKey;
}
}
return value;
}
else
{
bool hasValue = false;
foreach (var other in source)
{
var otherKey = keySelector(other);
if (otherKey == null) continue;
if (hasValue)
{
if (predicate((otherKey, key)))
{
value = other;
key = otherKey;
}
}
else
{
value = other;
key = otherKey;
hasValue = true;
}
}
if (hasValue) return value;
throw new InvalidOperationException("Sequence contains no elements");
}
}
}
Two additional methods are provided to get the item with, respectively, the minimum or maximum key given a key selector.
public static TSource MinBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer = null)
{
if (comparer == null) comparer = Comparer<TKey>.Default;
return source.ArgBy(keySelector, lag => comparer.Compare(lag.Current, lag.Previous) < 0);
}
public static TSource MaxBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer = null)
{
if (comparer == null) comparer = Comparer<TKey>.Default;
return source.ArgBy(keySelector, lag => comparer.Compare(lag.Current, lag.Previous) > 0);
}
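As a quick usage sketch (my own illustration, not part of the original post), calling these with an explicit comparer might look like this:

```csharp
// Sketch, not from the original post: MinBy/MaxBy with an explicit comparer.
var names = new[] { "alpha", "Bravo", "charlie" };

// OrdinalIgnoreCase compares the uppercased forms, so "alpha" orders first
// and "charlie" orders last regardless of the mixed casing.
var first = names.MinBy(s => s, StringComparer.OrdinalIgnoreCase); // "alpha"
var last = names.MaxBy(s => s, StringComparer.OrdinalIgnoreCase); // "charlie"
```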
Unit Tests
Let's create both a struct and a class.
struct Tag
{
public readonly ValueProvider ValueProvider;
public readonly string Description;
public Tag(ValueProvider valueProvider, string description)
=> (ValueProvider, Description) = (valueProvider, description);
}
class ValueProvider : IComparable<ValueProvider>
{
public readonly int Value;
public ValueProvider(int value) => Value = value;
public int CompareTo(ValueProvider other)
{
return Value.CompareTo(other.Value);
}
}
And perform some basic tests:
[TestMethod]
public void MinByClass()
{
var sequence = new[]
{
null,
new ValueProvider(1),
new ValueProvider(0),
null,
new ValueProvider(2),
};
var min = sequence.MinBy(x => x.Value);
var max = sequence.MaxBy(x => x.Value);
Assert.AreEqual(0, min.Value);
Assert.AreEqual(2, max.Value);
}
[TestMethod]
public void MinByStruct()
{
var sequence = new[]
{
new Tag(new ValueProvider(1), "B"),
new Tag(new ValueProvider(0), "A"),
new Tag(null, "D"),
new Tag(new ValueProvider(2), "C"),
};
var min = sequence.MinBy(x => x.ValueProvider);
var max = sequence.MaxBy(x => x.ValueProvider);
Assert.AreEqual("A", min.Description);
Assert.AreEqual("C", max.Description);
}
3 Answers
public static TSource ArgBy<TSource, TKey>(
MinBy and MaxBy are very intelligible names, but ArgBy doesn't say much to me. Really this is a specialised Aggregate (and I wonder whether an implementation using Aggregate might be simpler - or, indeed, whether the existence of Aggregate makes ArgBy unnecessary).
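To make the Aggregate point concrete, a MaxBy built directly on it might look like the following (my sketch, not code from the question or the answer):

```csharp
// Sketch: MaxBy expressed via Aggregate. Like Aggregate itself, this throws
// InvalidOperationException on an empty sequence and does no null filtering.
public static TSource MaxBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer = null)
{
    comparer = comparer ?? Comparer<TKey>.Default;
    return source.Aggregate(
        (best, next) => comparer.Compare(keySelector(next), keySelector(best)) > 0
            ? next
            : best);
}
```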
Func<(TKey Current, TKey Previous), bool> predicate)
I presume that you chose Func<(TKey Current, TKey Previous), bool> instead of Func<TKey, TKey, bool> because it lets you name the arguments? Whatever the reason, it would be worth documenting.
if (value == null)
{
    foreach (var other in source)
    {
        if (other == null) continue;
        var otherKey = keySelector(other);
        if (otherKey == null) continue;

Those two continues would also be worth documenting. I might want null to sort before everything else rather than be discarded.
if (value == null) { ... return value; }
else
{
    bool hasValue = false;
    ...
    if (hasValue) return value;
    throw new InvalidOperationException("Sequence contains no elements");
}

That inconsistency is a big gotcha. I see from your reference that System.Linq does the same, but I wouldn't expect it and I definitely think it should be documented. Also, the exception message might not be true: maybe the sequence contained elements, but their keys were null.
Note that LINQ predates some of the type constraints we have now. Perhaps it would be worth handling the two cases by having two implementations, one with where TSource : class and one with where TSource : struct. That way the method-level comments don't have to case-split on the type parameter (<exception cref="InvalidOperationException">Thrown when the sequence is empty and the element type is a struct.</exception>).
-
\$\begingroup\$ (1) Func<(TKey Current, TKey Previous), bool> yes this is my way of documenting the tuple. (2) the exception message might not be true you are right, it should be consistent with LINQ and be "Sequence contains no matching elements" (3) two separate methods for struct and class, I considered this and perhaps it would be more clean code indeed (4) it does require documentation in general, as does the existing LINQ API :p \$\endgroup\$dfhwze– dfhwze2019年08月31日 08:48:33 +00:00Commented Aug 31, 2019 at 8:48
-
2\$\begingroup\$ @dfhwze .net is probably not the best example of consistency. If everyone would try to mimic their conventions of KeyNotFoundException without naming the key (which immutables now do) we would just throw exceptions like you know, there's an error, good luck finding it, I won't tell you more about it :-] \$\endgroup\$t3chb0t– t3chb0t2019年08月31日 08:51:54 +00:00Commented Aug 31, 2019 at 8:51
-
\$\begingroup\$ @t3chb0t Yeah I also feel, because I tried to be as consistent as possible with LINQ's reference source, it hurts readability and lacks documentation of specification. \$\endgroup\$dfhwze– dfhwze2019年08月31日 08:53:45 +00:00Commented Aug 31, 2019 at 8:53
-
1\$\begingroup\$ Thinking about all the possible null-handling preferences (source items null allowed, null preferred, keys null allowed, null preferred), perhaps there is a good reason LINQ does not provide a MinBy. This method would become really cluttered. \$\endgroup\$dfhwze– dfhwze2019年08月31日 09:08:22 +00:00Commented Aug 31, 2019 at 9:08
-
1\$\begingroup\$ @dfhwze, IMO the way to handle that is to include no special casing for null at all: if the caller needs to include a Where(elt => elt != null) in the chain, they can. \$\endgroup\$Peter Taylor– Peter Taylor2019年09月02日 08:23:55 +00:00Commented Sep 2, 2019 at 8:23
I think in general MinBy and MaxBy are useful, but handling all the null cases might be tricky, and indeed it is, seeing the repeated logic and two similar loops. For this reason I also think it'd be easier not to handle them at all, but let the user decide what he wants to do with them, and implement both simply with OrderBy(Descending).First():
public static TSource MinBy<TSource, TKey>
(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer = null
)
{
return
source
.OrderBy(s => keySelector(s), comparer ?? Comparer<TKey>.Default)
//.OrderByDescending() // for MaxBy
.First();
}
If the user wants nulls to be sorted, then he'd just call it with ?.:
sequence.MinBy(x => x?.Value);
and if not, then he can filter them out:
sequence.Where(x => x != null).MinBy(x => x.Value);
Now, he has all the freedom and we no longer need to decide for him or document any possible tricks or hidden features. If he doesn't filter nulls out and doesn't use ?., it'll just throw.
Don't use SortedSet
It turns out that it isn't the right tool for this job. OrderBy[Descending], however, performs quite well (even much better than expected) and is only marginally behind the single-pass approach.
(Tested with extended benchmarks with 100k shuffled items where the filter value was a random number between 0 and 100.)
obsolete
(削除) In performance-critical scenarios I'd use a SortedSet with a custom wrapper-comparer (also because I'm lazy). Its complexity is O(log n) according to Comparative Analysis of List, HashSet and SortedSet, which is better than O(n) according to What does O(log n) mean exactly?. The disadvantage might be that the set needs additional memory to hold the items, but maybe this can be neglected and the faster search outweighs the memory usage. (削除ここまで)
private class MinComparer<TSource, TKey> : IComparer<TSource>
{
public IComparer<TKey> Comparer { get; set; }
public Func<TSource, TKey> KeySelector { get; set; }
public int Compare(TSource x, TSource y)
{
return Comparer.Compare(KeySelector(x), KeySelector(y));
}
}
public static TSource MinBy<TSource, TKey>
(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer = null
)
{
var c = new MinComparer<TSource, TKey>
{
KeySelector = keySelector,
Comparer = comparer ?? Comparer<TKey>.Default,
};
return new SortedSet<TSource>(source, c).First();
}
Another MaxComparer could order the items in descending order so that we could still use First with it.
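Such a MaxComparer might simply swap the arguments relative to MinComparer (a sketch of the idea, not code from the answer):

```csharp
// Sketch: descending counterpart to MinComparer. Swapping x and y inverts
// the ordering, so the set's First() yields the element with the maximum key.
private class MaxComparer<TSource, TKey> : IComparer<TSource>
{
    public IComparer<TKey> Comparer { get; set; }
    public Func<TSource, TKey> KeySelector { get; set; }

    public int Compare(TSource x, TSource y)
    {
        return Comparer.Compare(KeySelector(y), KeySelector(x));
    }
}
```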
-
\$\begingroup\$ @dfhwze there is one more possibility with a SortedSet, this one is O(log n) \$\endgroup\$t3chb0t– t3chb0t2019年08月31日 10:02:56 +00:00Commented Aug 31, 2019 at 10:02
-
\$\begingroup\$ I noticed your edit, so I removed my obsolete comment :p I could re-iterate that I find your solution compact and elegant. \$\endgroup\$dfhwze– dfhwze2019年08月31日 10:09:09 +00:00Commented Aug 31, 2019 at 10:09
-
1\$\begingroup\$ @HenrikHansen I call this oops ;-] fixed \$\endgroup\$t3chb0t– t3chb0t2019年08月31日 11:09:10 +00:00Commented Aug 31, 2019 at 11:09
-
2\$\begingroup\$ SortedSet is not a high performance alternative: it's \$\Theta(\lg n)\$ per Add, which means that new SortedSet<TSource>(source, c) takes \$\Theta(n \lg n)\$. \$\endgroup\$Peter Taylor– Peter Taylor2019年08月31日 13:19:04 +00:00Commented Aug 31, 2019 at 13:19
-
1\$\begingroup\$ @dfhwze I tuned the posted benchmarks and there is virtually no difference between a single run and OrderBy[Descending]. The difference averages to ~0.030 ms for 100k items in random order (1.209 ms vs 1.245 ms). SortedSet really sucks haha. I'll have to update the question but it's not the worst one :P MinByA and MaxByA had to be excluded from the tests because I wanted them to complete today lol \$\endgroup\$t3chb0t– t3chb0t2019年09月01日 06:26:21 +00:00Commented Sep 1, 2019 at 6:26
- AggByA: baseline two-pass method presented by the OP.
- AggByB: one-pass method using Aggregate, suggested by @PeterTaylor.
- AggByC: one-pass method implemented by hand.
I left most error handling to the reader since I personally don't believe it is necessary, or even helpful, in something this abstracted. If the caller wants extra safety then they can easily add whatever checks they desire through the selector and predicate arguments.
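For instance (my own usage sketch, reusing the ValueProvider sequence from the question), a caller who wants the original null filtering can add it to the chain explicitly:

```csharp
// Sketch: the caller supplies the null handling instead of the extension.
var min = Program.MinByC(
    sequence.Where(x => x != null), // drop null elements up front
    x => x.Value);                  // the selector can now assume non-null
```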
Q&A:
- Is this method usable in practice? Of course, it just is not ideal.
- Is this correct LINQ style? Opinion-based, but I'd say 'No', primarily because ArgBy seems to be unnecessary (as pointed out by @PeterTaylor). That said, relying on the Aggregate method ends up being slower than the naive alternative.
Benchmark Results:
Benchmark Code:
public readonly struct SomeStructure
{
private readonly int m_count;
private readonly string m_name;
public int Count => m_count;
public string Name => m_name;
public SomeStructure(int count, string name) {
m_count = count;
m_name = name;
}
}
public class Race
{
private static SomeStructure[] m_data;
[GlobalSetup]
public void Setup() {
m_data = new[] {
new SomeStructure(0, "A"),
new SomeStructure(1, "B"),
new SomeStructure(2, "C"),
};
}
[Benchmark]
public SomeStructure MinByA() => Program.MinByA(m_data, a => a.Count);
[Benchmark]
public SomeStructure MinByB() => Program.MinByB(m_data, a => a.Count);
[Benchmark]
public SomeStructure MinByC() => Program.MinByC(m_data, a => a.Count);
[Benchmark]
public SomeStructure MaxByA() => Program.MaxByA(m_data, a => a.Count);
[Benchmark]
public SomeStructure MaxByB() => Program.MaxByB(m_data, a => a.Count);
[Benchmark]
public SomeStructure MaxByC() => Program.MaxByC(m_data, a => a.Count);
}
class Program
{
static void Main(string[] args) {
var summary = BenchmarkRunner.Run<Race>();
Console.ReadKey();
}
public static TElement AggByA<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector, Func<TSelector, TSelector, bool> predicate, Func<Func<TElement, TSelector>, TSelector> aggregator) =>
items.First(acc => predicate(selector(acc), aggregator(selector)));
public static TElement AggByB<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector, Func<TSelector, TSelector, bool> predicate) =>
items.Aggregate((acc, cur) => (predicate(selector(acc), selector(cur)) ? cur : acc));
public static TElement AggByC<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector, Func<TSelector, TSelector, bool> predicate) {
TElement accumulator;
var enumerator = items.GetEnumerator();
try {
if (!enumerator.MoveNext()) {
throw new InvalidOperationException("no elements to enumerate");
}
accumulator = enumerator.Current;
while (enumerator.MoveNext()) {
var current = enumerator.Current;
if (predicate(selector(accumulator), selector(current))) {
accumulator = current;
}
}
}
finally {
enumerator.Dispose();
}
return accumulator;
}
public static TElement MaxByA<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector) where TSelector : IComparable<TSelector> =>
AggByA(items, selector, (acc, cur) => (0 == acc.CompareTo(cur)), items.Max);
public static TElement MaxByB<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector) where TSelector : IComparable<TSelector> =>
AggByB(items, selector, (acc, cur) => (0 > acc.CompareTo(cur)));
public static TElement MaxByC<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector) where TSelector : IComparable<TSelector> =>
AggByC(items, selector, (acc, cur) => (0 > acc.CompareTo(cur)));
public static TElement MinByA<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector) where TSelector : IComparable<TSelector> =>
AggByA(items, selector, (acc, cur) => (0 == acc.CompareTo(cur)), items.Min);
public static TElement MinByB<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector) where TSelector : IComparable<TSelector> =>
AggByB(items, selector, (acc, cur) => (0 < acc.CompareTo(cur)));
public static TElement MinByC<TElement, TSelector>(IEnumerable<TElement> items, Func<TElement, TSelector> selector) where TSelector : IComparable<TSelector> =>
AggByC(items, selector, (acc, cur) => (0 < acc.CompareTo(cur)));
}
-
\$\begingroup\$ These tests are broken. AggByC doesn't dispose the enumerator; MinByA and MaxByA both enumerate items twice. Also, these tests only test three already sorted items. Run them with 100_000 shuffled items and OrderBy[Descending] wins them all or is so close to MinByC that the results are barely comparable. MinByA and MaxByA have to be excluded because they take so long. SortedSet is not the slowest one... MinByA and MaxByA are. \$\endgroup\$t3chb0t– t3chb0t2019年09月01日 05:59:09 +00:00Commented Sep 1, 2019 at 5:59
\$\begingroup\$ +1 for creating these tests but -1 for broken ones, non-realistic conditions and not testing all approaches, so the result is 0. \$\endgroup\$t3chb0t– t3chb0t2019年09月01日 06:02:47 +00:00Commented Sep 1, 2019 at 6:02
-
\$\begingroup\$ @t3chb0t I explicitly mentioned that AggByA is two passes; it is the baseline that the OP is trying to beat. Of course it's slowest! The SortedSet implementation wastes memory and doesn't play nice with large inputs; why bother using it at all? As for the lack of dispose, a simple oversight that I have corrected. \$\endgroup\$Kittoes0124– Kittoes01242019年09月01日 07:02:05 +00:00Commented Sep 1, 2019 at 7:02
1\$\begingroup\$ why bother using it at all - for the sake of science and proof :P \$\endgroup\$t3chb0t– t3chb0t2019年09月01日 07:56:28 +00:00Commented Sep 1, 2019 at 7:56
-
\$\begingroup\$ @t3chb0t We don't even need to perform a benchmark in order to prove that any solution that involves extra memory is going to be inherently slower than any reasonably intelligent algorithm that accumulates "in-place." Science tells us that such implementations are objectively awful before we even write a single line of code. \$\endgroup\$Kittoes0124– Kittoes01242019年09月01日 13:13:50 +00:00Commented Sep 1, 2019 at 13:13