In order to divide an IEnumerable
in C# by a predicate, I implemented the following (in a static class):
public static IEnumerable<IEnumerable<T>> SplitBeforeIf<T>(this IEnumerable<T> input, Func<T, bool> pred)
{
//This exists to store the predicate satisfier, which I kept as
//the first T in the inner IEnumerable (except, maybe the first one).
T predSatis = default(T);
while(input.Any())
{
var temp = input.TakeWhile(v => !pred(v));
//I was told that the below is the best test for default-ness.
//(I assume that pred(default(T)) is always false.)
if(predSatis == null || predSatis.Equals(default(T)))
yield return temp;
else
yield return (new List<T>{predSatis}).Concat(temp);
input = input.SkipWhile(v => !pred(v));
predSatis = input.FirstOrDefault();
input = input.Skip(1);
}
if(predSatis != null && !predSatis.Equals(default(T)))
yield return new List<T>{predSatis};
}
This returns as follows:
new List<int>{1,2,3,4,5}.SplitBeforeIf(v => v % 2 == 0)
gives{{1}, {2,3}, {4,5}}
new List<int>{6,1,2,3,4,5}.SplitBeforeIf(v => v % 2 == 0)
gives{{}, {6, 1}, {2,3}, {4, 5}}
new List<int>{1,2,3,4,5,6}.SplitBeforeIf(v => v % 2 == 0)
gives{{1}, {2,3}, {4,5}, {6}}
.
I have 3 questions (besides any other comments on the code):
- I see that, despite my best efforts, this is \$O(n^2)\$. Why? Can this be improved for a generic
IEnumerable
? - This does not support a stateful predicate or a stateful
IEnumerable
input. Can this be remedied? - This looks awfully like Clojure's partition-by. How does Clojure avoid the first issue (it may also have the second, but functional programming dislikes state anyway)?
Being a little pedantic about these things, I also would prefer solutions that don't accumulate values into another data structure (just to see if full laziness is possible).
4 Answers 4
So I thought I'd try this out with a test enumerable.
public class NumberValues : IEnumerable<int>
{
public NumberValues(int startValue, int endValue)
{
_endValue = endValue;
_startValue = startValue;
counter = 0;
}
public NumberValues(int endValue) : this(0, 10) { }
public int counter { get; set; }
public IEnumerator<int> GetEnumerator()
{
var iterator = _startValue;
while (iterator < _endValue)
{
Trace.Write(iterator + ",");
iterator++;
counter++;
yield return iterator;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
followed closely by
[TestCase(0,5)]
[TestCase(1,5)]
[TestCase(0,6)]
[TestCase(1,6)]
[TestCase(0,10)]
[TestCase(1,10)]
public void TestSkipBeforeIf(int start, int end)
{
var numberValue = new NumberValues(start,end);
numberValue.SplitBeforeIf(i => i%2 == 0).ToList();
}
which results in the following output.
0,5 > 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4, - 25
1,5 > 1,1,1,2,1,2,3,1,2,3,4,1,2,3,4,1,2,3,4, - 19
0,6 > 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5, - 27
1,6 > 1,1,1,2,1,2,3,1,2,3,4,1,2,3,4,5,1,2,3,4,5, - 21
0,10 > 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5,6,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,8,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9, - 65
1,10 > 1,1,1,2,1,2,3,1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,1,2,3,,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9, - 55
0,20 > 230
0,40 > 860
0,100 > 5150
But why?
while (input.Any())
{
var temp = input.TakeWhile(v => !pred(v));
if (predSatis == null || predSatis.Equals(default(T)))
yield return temp;
else
yield return (new List<T> { predSatis }).Concat(temp);
input = input.SkipWhile(v => !pred(v));
predSatis = input.FirstOrDefault();
input = input.Skip(1);
}
if you evaluate your 2 lines
input = input.SkipWhile(v => !pred(v));
predSatis = input.FirstOrDefault();
over iterations of your while loop, it is necessary to expand the input variable for each iteration.
input.SkipWhile(pred).FirstOrDefault
input.SkipWhile(pred).SkipWhile(pred).FirstOrDefault
input.SkipWhile(pred).SkipWhile(pred).SkipWhile(pred).FirstOrDefault
input.SkipWhile(pred).SkipWhile(pred).SkipWhile(pred).SkipWhile(pred).FirstOrDefault
having seen this, we now consider the evaluation of input.SkipWhile(!pred)
so over 100 iterations
pred = x => x % 2 == 0 : 5150
pred = x => x % 3 == 0 : 3600
pred = x => x % 4 == 0 : 2625
pred = x => x % 5 == 0 : 2120
pred = x => x % 6 == 0 : 1849
pred = x => x % 15 == 0 : 837
pred = x => x % 20 == 0 : 605
pred = x => x % 25 == 0 : 504
pred = x => x % 30 == 0 : 564
pred = x => x % 30 == 0 : 413
Your functions behaviour is therefore dependent not just on the size of your data, but on the form of your predicate.
For your own analysis, this is what I used for evaluating the predicate
[TestCase(1,5000,10)]
public void TestSkipBeforeIf(int start, int end, int mod)
{
var numberValue = new NumberValues(start,end);
numberValue.SplitBeforeIf(i => i%mod == 0).ToList();
}
I'm not clear what you meant by stateful predicate or IEnumerable, however in terms of the IEnumerable I have maintained a count as state. Similarly I could maintain a count for the pred, or add some bizarre checking behaviour.
Func<object, int, bool> predStateful = (y,x) =>
{
if (x == y.Check(x))
{
return false;
}
return x%mod == 0;
};
Func<int, bool> pred = x => predStateful(MyCheckingObject,x);
numberValue.SplitBeforeIf(pred).ToList();
Thanks for the interesting question.
EDIT: Modified Chunk function using a predicate. original
public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(
this IEnumerable<TValue> values,
Func<TValue,bool> pred)
{
using (var enumerator = values.GetEnumerator())
{
while (enumerator.MoveNext())
{
yield return GetChunk(enumerator, pred).ToList();
}
}
}
private static IEnumerable<T> GetChunk<T>(
IEnumerator<T> enumerator,
Func<T,bool> pred)
{
do
{
yield return enumerator.Current;
} while (!pred(enumerator.Current) && enumerator.MoveNext());
}
-
\$\begingroup\$ Thank you very much for the interesting points! I meant a similar thing by "stateful predicate" as you've seen for the "stateful
IEnumerable
" - if the predicate captures variables (see blogs.msdn.microsoft.com/matt/2008/03/01/…). \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月12日 13:41:48 +00:00Commented Jul 12, 2016 at 13:41 -
\$\begingroup\$ here is a function that returns lazily stackoverflow.com/questions/12389203/… \$\endgroup\$Gareth– Gareth2016年07月12日 18:38:01 +00:00Commented Jul 12, 2016 at 18:38
-
\$\begingroup\$ I like the idea behind that SO thread. Unfortunately, I have to use the predicate. \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月12日 19:00:59 +00:00Commented Jul 12, 2016 at 19:00
-
\$\begingroup\$ the predicate is easy to add, by just replacing
int size
withFunc<int,bool> pred
\$\endgroup\$Gareth– Gareth2016年07月13日 08:17:51 +00:00Commented Jul 13, 2016 at 8:17 -
\$\begingroup\$ I think the issue is that the chunk size is fixed and I'd like the predicate on the value type of the
IEnumerable
. Either way, I think my code turns out to be much like stackoverflow.com/a/23652544/5292630 \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月13日 11:58:57 +00:00Commented Jul 13, 2016 at 11:58
I can't really answer your questions but I can show you a simpler (easier to read) way to achieve the same result :
public static IEnumerable<IEnumerable<T>> SplitBeforeIf<T> (
this IEnumerable<T> source, Func<T, bool> predicate)
{
var temp = new List<T> ();
foreach (var item in source)
if (predicate (item))
{
if (temp.Any ())
yield return temp;
temp = new List<T> { item };
}
else
temp.Add (item);
yield return temp;
}
Edit
If you want something with no intermediate data structure and with O(n) complexity, I think this could do the job.
Note that it's a bit convoluted ; I tried to make a class with a state machine like the ones used internally by Linq [WhereEnumerableIterator for example] but without success
It could be "simpler" (all state contained in one single method) with VB.Net because it has lambda Iterator
static class Extensions
{
private enum EnumerationState
{
Ended = -1,
Started = 0,
Enumerating
}
private static EnumerationState currentState;
public static IEnumerable<IEnumerable<T>> SplitBeforeIf<T> (
this IEnumerable<T> source, Func<T, bool> predicate)
{
using (var enumerator = source.GetEnumerator ())
{
currentState = EnumerationState.Started;
var counter = 0;
while (currentState != EnumerationState.Ended)
{
++counter;
yield return SplitBeforeIfInternal (enumerator, predicate, counter);
}
}
}
private static IEnumerable<T> SplitBeforeIfInternal<T> (IEnumerator<T> enumerator, Func<T, bool> predicate, int amountToSkip)
{
while (amountToSkip > 0)
{
--amountToSkip;
if (currentState == EnumerationState.Enumerating && amountToSkip == 0)
yield return enumerator.Current;
bool hasMoved;
while ((hasMoved = enumerator.MoveNext ()) && !predicate (enumerator.Current))
if (amountToSkip == 0)
yield return enumerator.Current;
currentState = hasMoved ? EnumerationState.Enumerating : EnumerationState.Ended;
}
}
}
I join the VB.Net equivalent just for informatory purpose :
Module Extensions
Private Enum EnumerationState
Ended = -1
Started = 0
Enumerating
End Enum
<Runtime.CompilerServices.Extension>
Iterator Function SplitBeforeIf(Of T)(source As IEnumerable(Of T), predicate As Func(Of T, Boolean)) As IEnumerable(Of IEnumerable(Of T))
Using e = source.GetEnumerator
Dim state = EnumerationState.Started
Dim amountToSkip = 0
While state <> EnumerationState.Ended
amountToSkip += 1
Yield Iterator Function()
While amountToSkip > 0
amountToSkip -= 1
If state = EnumerationState.Enumerating AndAlso amountToSkip = 0 Then Yield e.Current
Dim hasMoved = e.MoveNext
While hasMoved AndAlso Not predicate(e.Current)
If amountToSkip = 0 Then Yield e.Current
hasMoved = e.MoveNext
End While
state = If(hasMoved, EnumerationState.Enumerating, EnumerationState.Ended)
End Function()
End While
End Using
End Function
End Module
-
\$\begingroup\$ This was one of my implementations. I just wanted to force myself to use lazy sequences and IEnumerables instead of having a List. (Sorry for not mentioning that.) Technically, I think this fixes question 1 everywhere and 2 for a stateful predicate. \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月09日 13:43:12 +00:00Commented Jul 9, 2016 at 13:43
-
\$\begingroup\$ @HemanGandhi having a backing array (what
List
essentially is) does not prevent it from being lazy.yield
is what makes it lazy. \$\endgroup\$asibahi– asibahi2016年07月09日 13:57:55 +00:00Commented Jul 9, 2016 at 13:57 -
\$\begingroup\$ @AbdulRahmanSibahi My point was that I wanted the inner lists to also be lazy. \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月09日 16:39:14 +00:00Commented Jul 9, 2016 at 16:39
-
\$\begingroup\$ I really like this idea. Unfortunately, it seems that unless you enumerate the previous inner values, you won't progress through the IEnumerable. (ie. SplitBeforeIf().ElementAt(n)) returns the same thing for any n.) I wonder if this can be altered? \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月11日 13:31:34 +00:00Commented Jul 11, 2016 at 13:31
-
1\$\begingroup\$ @Mat'sMug I choose
Func
only because it's what the other "Linq" method do too ;) \$\endgroup\$Sehnsucht– Sehnsucht2016年07月11日 15:48:15 +00:00Commented Jul 11, 2016 at 15:48
because it is a public method you should validate the arguments at least against
null
, but according to Ben Aaronson's commentYou shouldn't do guard checks like this inside an iterator method because they'll only get checked when the code starts iterating, rather than when the method is called.
you should then call a separate method where the Enumeration will take place.
omitting braces
{}
although they might be optional can lead to serious bugs. I would like to encourage you to always use them to make your code less error prone and more readable.names like
predSatis
which needs comments liketo store the predicate satisfier
are a sign that the variables are named bad. If you have the feeling that you need a variable you should just name it well although it will involve more typing but the readability of the code increases a lot. If you had named thispredicateSatisfier
it would have been clear what it is about. That is true forpred
parameter as well.(削除)if(predSatis == null || predSatis.Equals(default(T)))
well, the first check is superflous because for value types it won't do anything and for reference typesdefault(T)
equals tonull
. So better just useif (predSatis.Equals(default(T)))
(削除ここまで)Linq methods are a great way to reduce typing and a lot of devs think it is easier to read. But using 5 different linq methods to do this job of which 2 are using the same predicate is over the top IMO.
Using just a plain foreach with the support of List<T>
will run in O(N)
public static IEnumerable<IEnumerable<T>> SplitBeforeIf<T>(this IEnumerable<T> input, Func<T, bool> predicate)
{
if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
return SafeSplitBeforeIf(input, predicate);
}
private static IEnumerable<IEnumerable<T>> SafeSplitBeforeIf<T>(this IEnumerable<T> input, Func<T, bool> predicate)
{
var result = new List<T>();
foreach (var item in input)
{
if (predicate(item))
{
yield return result;
result.Clear();
}
result.Add(item);
}
if (result.Count > 0)
{
yield return result;
}
}
-
\$\begingroup\$ I had said that I didn't want a List in order to make the inner lists lazy as well (a part of my pedantry). Also, I compared against null because I was getting some sort of null pointer exception without it. \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月11日 17:01:02 +00:00Commented Jul 11, 2016 at 17:01
-
\$\begingroup\$ @HemanGandhi you are right about the
null
check. I have edited my answer to address the lazyness of the inner enumerables as well. \$\endgroup\$Heslacher– Heslacher2016年07月12日 04:49:21 +00:00Commented Jul 12, 2016 at 4:49 -
1\$\begingroup\$ @Heslacher You're not actually lazily computing the inner values. You're eagerly generating all of them, you're just creating a sequence that's super inefficient to enumerate with all of the nested concats and by wrapping every single item in an array. Your solution is just strictly worse than if you had used a list. \$\endgroup\$Servy– Servy2016年07月12日 13:28:08 +00:00Commented Jul 12, 2016 at 13:28
-
1\$\begingroup\$ @Heslacher it's always better then the
CodeOverviewException
;-P \$\endgroup\$t3chb0t– t3chb0t2016年07月12日 15:42:05 +00:00Commented Jul 12, 2016 at 15:42 -
2\$\begingroup\$ You shouldn't do guard checks like this inside an iterator method because they'll only get checked when the code starts iterating, rather than when the method is called. MS splits LINQ methods for this reason: referencesource.microsoft.com/#System.Core/System/Linq/… \$\endgroup\$Ben Aaronson– Ben Aaronson2016年07月21日 08:21:52 +00:00Commented Jul 21, 2016 at 8:21
I can't help you with your questions, but below you'll find another implementation which avoids instatiation of new lists. The penalty is though the lst.Count(), but if the frequence of elements satisfying the predicate is high it may be ok?
public static IEnumerable<IEnumerable<T>> SplitBeforeIf2<T>(this IEnumerable<T> input, Func<T, bool> pred)
{
var lst = input.TakeWhile((x, i) => i == 0 ? true : !pred(x));
var count = lst.Count();
if (count > 0)
{
yield return lst;
foreach (var l in input.Skip(count).SplitBeforeIf2(pred))
yield return l;
}
}
As the above solution has a rather poor performance for large datasets, I went on and found the solution below which seems to be more effecient:
public static IEnumerable<IEnumerable<T>> SplitBeforeIf4<T>(this IEnumerable<T> input, Func<T, bool> pred)
{
int count = 0;
IGrouping<bool, T> group = null;
while ((group = input.Skip(count).TakeWhile((x, i) => i == 0 ? true : !pred(x)).GroupBy(x => true).FirstOrDefault()) != null)
{
yield return group.Select(x => x);
count += group.Count();
}
}
-
\$\begingroup\$ Shouldn't the
yield return lst
come outside the if statement? Also, I love the recursion! \$\endgroup\$Heman Gandhi– Heman Gandhi2016年07月09日 16:41:44 +00:00Commented Jul 9, 2016 at 16:41 -
1\$\begingroup\$ @HemanGandhi placing the yield return lst inside the if statement prevents the last empty selection to be returned. \$\endgroup\$user73941– user739412016年07月09日 16:44:36 +00:00Commented Jul 9, 2016 at 16:44
1, 2, 3, 4, 5 => {1}{2,3}{4,5}
;0, 1, 2, 3, 4, 5 => {}{1}{2,3}{4,5}
;1, 2, 3, 4, 5, 6 => {1}{2,3}{4,5}{6}
;2, 3, 4, 5 => {}{2,3}{4,5}
\$\endgroup\$O(N^2)
in complexity, just run it with input size doubling and see if the runtime doubles or quadruples. If it doubles it is O(n), if it quadruples it is O(n^2) \$\endgroup\$