Basically, here's the problem statement:
Given an
IEnumerable<T> source
and anIEnumerable<T> exceptions
, return anIEnumerable<T> result
which contains all ofsource
, withoutexceptions
, only omitting exact instances of a value. I.e.: if a value appears twice insource
, but once inexceptions
, thenresult
will have that value exactly one time.
Let's take an example:
Exception list:
{ 2, 2, 3, 4, 5}
Source list:
{ 2, 2, 2, 6 }
Result:
{ 2, 6 }
Code:
public static IEnumerable<T> ExceptExact<T>(this IEnumerable<T> source, IEnumerable<T> exceptions)
{
var tExceptions = new List<T>();
tExceptions.AddRange(exceptions);
var result = new List<T>();
foreach (var el in source)
{
if (tExceptions.Contains(el))
{
tExceptions.RemoveAt(tExceptions.IndexOf(el));
}
else
{
result.Add(el);
}
}
return result;
}
Usage:
var a = new List<int> { 2, 3, 5, 4, 2 }; var b = new List<int> { 2, 2, 2, 6 }; var c = b.ExceptExact(a); var d = a.ExceptExact(b); Console.WriteLine("A: { " + string.Join(", ", a) + " }"); Console.WriteLine("B: { " + string.Join(", ", b) + " }"); Console.WriteLine("C: { " + string.Join(", ", c) + " }"); Console.WriteLine("D: { " + string.Join(", ", d) + " }");
Output:
A: { 2, 3, 5, 4, 2 } B: { 2, 2, 2, 6 } C: { 2, 6 } D: { 3, 5, 4 }
5 Answers 5
Additional to the valid point from Rick Davin:
1) You can drop the Contains
because IndexOf
returns -1 if the item is not contained.
2) You could use yield to get rid of the additional list
public static IEnumerable<T> ExceptExact<T>(this IEnumerable<T> source, IEnumerable<T> exceptions)
{
var tExceptions = exceptions.ToList();
foreach (var el in source)
{
var index = tExceptions.IndexOf(el);
if (index >= 0)
{
tExceptions.RemoveAt(index);
continue;
}
yield return el;
}
}
After taking a look to MSDN (List.Remove), I figured out that there is even a simpler way:
Removes the first occurrence of a specific object from the List.
Returns true if item is successfully removed; otherwise, false.This method also returns false if item was not found in the List.
public static IEnumerable<T> ExceptExact<T>(this IEnumerable<T> source, IEnumerable<T> exceptions)
{
var tExceptions = exceptions.ToList();
return source.Where(el => !tExceptions.Remove(el));
}
-
1\$\begingroup\$ I always forget about
yield return
...thanks for pointing that out! :) \$\endgroup\$Der Kommissar– Der Kommissar2016年07月10日 20:16:31 +00:00Commented Jul 10, 2016 at 20:16 -
\$\begingroup\$ You are welcome... I found another simpler solution that doesn't even require the yield ;). \$\endgroup\$JanDotNet– JanDotNet2016年07月10日 20:34:49 +00:00Commented Jul 10, 2016 at 20:34
-
\$\begingroup\$ This is a nice trick with the remove ;-) \$\endgroup\$t3chb0t– t3chb0t2016年07月10日 21:19:03 +00:00Commented Jul 10, 2016 at 21:19
It's a short enough method that there's not much to comment upon. You could shorten these two lines:
var tExceptions = new List<T>();
tExceptions.AddRange(exceptions);
Into just one:
var tExceptions = exceptions.ToList();
Consider the case where source
and exceptions
each consist of one value, repeated n
times, e.g.
source = Enumerable.Repeat(1, 1000000)
and
exceptions = Enumerable.Repeat(1, 1000000)
Your algorithm and JanDotNet's algorithm will run in \$O(n^2)\$ time.
We can reduce this to \$O(n)\$ time by doing the following:
- Create a dictionary containing the count of each distinct value in
exceptions
- For each value in
values
:- If the dictionary contains a non-zero count for
value
:- Reduce the count of
value
in the dictionary by one
- Reduce the count of
- Otherwise:
- Yield return
value
- Yield return
- If the dictionary contains a non-zero count for
If you don't need to preserve the order of the elements a simple LINQ query like this one would give you the same results:
static class Extensions
{
public static IEnumerable<T> ExceptExact<T>(
this IEnumerable<T> x,
IEnumerable<T> y)
{
return
x
.GroupBy(n => n)
.Select(g =>
g.Skip(y.Where(m => m.Equals(g.Key)).Count()))
.SelectMany(n => n);
}
}
-
1\$\begingroup\$ +1 for an alternative approach. However IMHO it is more difficult to read than the foreach loop. \$\endgroup\$JanDotNet– JanDotNet2016年07月10日 19:35:19 +00:00Commented Jul 10, 2016 at 19:35
-
1\$\begingroup\$ I do want to preserve element order... \$\endgroup\$Der Kommissar– Der Kommissar2016年07月10日 20:18:47 +00:00Commented Jul 10, 2016 at 20:18
-
1\$\begingroup\$ I wouldn't call this LINQ query "simple". More like "wtf is going on here" :) \$\endgroup\$Nikita B– Nikita B2016年07月11日 07:28:52 +00:00Commented Jul 11, 2016 at 7:28
-
\$\begingroup\$ @NikitaB it all depends on how good you know LINQ. Theoretically every query that contains more than a single where/select isn't simple. \$\endgroup\$t3chb0t– t3chb0t2016年07月11日 07:32:43 +00:00Commented Jul 11, 2016 at 7:32
-
\$\begingroup\$ @t3chb0t, it is more about how good your query conveys your intent. I saw larger LINQ queries which were way easier to read. The fact that you replaced good variable names with a bunch of x's, y's and m's does not help either. \$\endgroup\$Nikita B– Nikita B2016年07月11日 07:45:11 +00:00Commented Jul 11, 2016 at 7:45
I guess am late to the party !!!! Most of the improvements have been suggested by @JanDotNet, @mjolka, @Rick Davin and @t3chb0t. There are two things I would like to add here is
- There is no check to ensure the source and exception enumerables are not empty before proceeding with the other computations
- I'm not criticising your approach but I believe there are smarter ways to achieve your objective here. I was a little surprised no one suggested using Except for this approach as it is more robust(can be applied to string and int)
The signature for Except is
Enumerable.Except<TSource>(IEnumerable<TSource>, IEnumerable<TSource>, IEqualityComparer<TSource>)
you can use the IEqualityComparer to create a custom Equality Comparer e.g Why Enumerable.Except() Might Not Work the Way You Might Expect. Note, using the default Except may result in your duplicate being removed e.g
Exception list:
{ 2, 2, 3, 4, 5}
Source list:
{ 2, 2, 2, 6 }
var result = source.Except(exceptions)
The result would give {6}.
The implementation of Except ensures duplicates are removed as a set class is used for the implementation. Hence the second Enumerable is added to the set and duplicates are removed when the first Enumerable is added. That's why 2 got removed giving {6). A snippet of the implementation is given below.
public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first,
IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
if (first == null)
{
throw Error.ArgumentNull("first");
}
if (second == null)
{
throw Error.ArgumentNull("second");
}
return ExceptIterator<TSource>(first, second, comparer);
}
private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first,
IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
Set<TSource> iteratorVariable0 = new Set<TSource>(comparer);
foreach (TSource local in second)
{
iteratorVariable0.Add(local);
}
foreach (TSource iteratorVariable1 in first)
{
if (!iteratorVariable0.Add(iteratorVariable1))
{
continue;
}
yield return iteratorVariable1;
}
}
Note: This question has been answered by Marius Schulz in his github page Without method-EXTRALINQ by extending the ExceptIterator to a new Iterator called WithoutIterator which uses a hashset and linq to behave exactly the way you would want Except method to function.
tExceptions
is a temporary list of theexceptions
parameter, because I cannot delete values from the input parameter. \$\endgroup\$