19

I am running a test.

It looks like:

method 1)

List<int> = new List<int>{1,2,4, .....} //assume 1000k
var result ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).First();

method 2)

List<int> = new List<int>{1,2,4, .....} //assume 1000k
var result = ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).ToArray()[0];

Why is method 2 is so slow compared to method 1?

Rick Sladkey
34.3k6 gold badges73 silver badges95 bronze badges
asked May 19, 2011 at 20:57
1
  • 2
    What's the point of converting to an array anyway? Commented May 19, 2011 at 21:01

10 Answers 10

62

You have a jar containing a thousand coins, many of which are dimes. You want a dime. Here are two methods for solving your problem:

  1. Pull coins out of the jar, one at a time, until you get a dime. Now you've got a dime.

  2. Pull coins out of the jar, one at a time, putting the dimes in another jar. If that jar turns out to be too small, move them, one at a time, to a larger jar. Keep on doing that until you have all of the dimes in the final jar. That jar is probably too big. Manufacture a jar that is exactly big enough to hold that many dimes, and then move the dimes, one at a time, to the new jar. Now start taking dimes out of that jar. Take out the first one. Now you've got a dime.

Is it now clear why method 1 is a whole lot faster than method 2?

answered May 19, 2011 at 22:50

7 Comments

OK, awesome analogy to run through that (but wouldn't the resizes use block-copy? not that this would change the performance profile at all ;p)
@Marc: The original code uses ints, which on 32 bit architectures, is the unit of atomicity. Copying a block of ints still just copies them one at a time when you get right down to it; it's just that the overhead per move is really low. On a 64 bit architecture the block copy does in fact move them over two at a time, which is even better. But still, the point is that huge amounts of memory are being copied unnecessarily if all that you want to do is find the first item.
@Eric - fair enough ;p Maybe it was hope-over-reality, but I somehow expected that anything getting down to something akin to BlockCopy would work at larger chunk sizes. My bad.
@Marc: Sadly, the hardware is what you get. There are of course really sneaky tricks to "copy" entire pages around; for example, tell the operating system to map the same hunk of physical memory to two different places in an address space; by doing so you appear to instantaneously "copy" from one buffer to another but in fact it is just two different "views" on the same physical storage.
Are there not SSE registers that would let you copy 128 (or soon 256) bits about at once? Not that it changes the core point of course, but just out of interest...
|
46

Erm... because you are creating an extra array (rather than just using the iterator). The first approach stops after the first match (Where is a non-buffered streaming API). The second loads all the matches into an array (presumably with several re-sizes), then takes the first item.

As a side note; you can create infinite sequences; the first approach would still work, the second would run forever (or explode).

It could also be:

var result ErrorCodes.First(x => ReturnedErrorCodes.Contains(x));

(that won't make it any faster, but is perhaps easier to read)

answered May 19, 2011 at 21:00

4 Comments

The second approach would definitely explode if run on an infinite series; it'll get to sizeof(T) * array.Length > int.MaxValue and throw an OutOfMemoryException. No running forever about it.
@KeithS - that depends on whether the infinite sequence are all excluded by the Where, so that becomes never-ending but never-yielding. Or indeed, any sane finite number of matches and an infinite number of "misses".
To make the query even more readable, I'd suggest: var result = ErrorCodes.First(ReturnedErrorCodes.Contains);
What would have been the case if the array (or other collection, lets say list), has already been created? would there be any performance changes?
4

Because of deferred execution.

The code ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)) doesn't return a collection of integers, instead it returns an expression that is capable of returning a stream of integers. It doesn't do any actual work until you start reading integers from it.

The ToArray method will consume the entire stream and put all the integers in an array. This means that every item in the entire list has to be compared to the error codes.

The First method on the other hand will only get the first item from the stream, and then stop reading from the stream. This will make it a lot faster, because it will stop comparing items from the list to the error codes as soon as it finds a match.

answered May 19, 2011 at 21:09

Comments

1

Because ToArray() copies the entire sequence to an array.

Method 2 has to iterate over the whole sequence to build an array, and then returns the first element.

Method 1 just iterates over enough of the sequence to find the first matching element.

answered May 19, 2011 at 21:00

Comments

1

ToArray() walks through the whole sequence it has been given and creates and array out of it.

If you don't callt ToArray(), First() lets Where() return just the first item that matches and immediatelly returns.

answered May 19, 2011 at 21:01

Comments

1

First() is complexity of O(1)

ToArray()[0] is complexity O(n)+1

answered May 19, 2011 at 21:01

4 Comments

-1. This is not correct. Both are complexity O(nm), where *n is the number of items in the integer list and m is the number of items in the error codes list. The complexity doesn't say anything about their relative performance, only how they react to changed conditions.
@Marc Gravell, @Guffa - If you use First() and not First(predicate) the first item will always mach, therefore complexity of O(1). The ToArray() Will go over the whole List, therefore complexity of O(n).
true; I phrased it poorly - I meant in combination with Where(predicate).First(), as per the question
Yes, using only First() is O(1), but that's irrelevant to this question. So, either your answer is wrong, or irrelevant. Have your pick. ;)
1
var @e = array.GetEnumerator();
// First
@e.MoveNext();
return @e.Current;
// ToArray (with yield [0] should as fast as First...)
while (@e.MoveNext() {
 yield return @e.Current;
}
answered May 19, 2011 at 21:21

Comments

0

Because in the second example, you are actually converting the IEnumerable<T> to an array, whereas in the first example, no conversion is taking place.

answered May 19, 2011 at 20:59

1 Comment

The important bit is not that it is doing some conversion, but what that conversion actually does.
0

In method 2 the entire array must be converted to an array first. Also, it seems awkward to mix array access when First() is so much more readable.

answered May 19, 2011 at 21:00

Comments

0

This makes sense, ToArray probably involves a copy, which is always going to be more expensive, since Linq can't make any guarantees about how you're going to use your array, while First() can just return the single element at the beginning.

answered May 19, 2011 at 21:01

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.