C# LINQ First() faster than ToArray()[0]?

Question 1

I am running a test.

It looks like:

method 1)

List<int> = new List<int>{1,2,4, .....} //assume 1000k
var result ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).First();

method 2)

List<int> = new List<int>{1,2,4, .....} //assume 1000k
var result = ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).ToArray()[0];

Why is method 2 is so slow compared to method 1?

Question 2

What's the point of converting to an array anyway?

Question 3

You have a jar containing a thousand coins, many of which are dimes. You want a dime. Here are two methods for solving your problem:

Pull coins out of the jar, one at a time, until you get a dime. Now you've got a dime.
Pull coins out of the jar, one at a time, putting the dimes in another jar. If that jar turns out to be too small, move them, one at a time, to a larger jar. Keep on doing that until you have all of the dimes in the final jar. That jar is probably too big. Manufacture a jar that is exactly big enough to hold that many dimes, and then move the dimes, one at a time, to the new jar. Now start taking dimes out of that jar. Take out the first one. Now you've got a dime.

Is it now clear why method 1 is a whole lot faster than method 2?

Question 4

OK, awesome analogy to run through that (but wouldn't the resizes use block-copy? not that this would change the performance profile at all ;p)

Question 5

@Marc: The original code uses ints, which on 32 bit architectures, is the unit of atomicity. Copying a block of ints still just copies them one at a time when you get right down to it; it's just that the overhead per move is really low. On a 64 bit architecture the block copy does in fact move them over two at a time, which is even better. But still, the point is that huge amounts of memory are being copied unnecessarily if all that you want to do is find the first item.

Question 6

@Eric - fair enough ;p Maybe it was hope-over-reality, but I somehow expected that anything getting down to something akin to BlockCopy would work at larger chunk sizes. My bad.

Question 7

@Marc: Sadly, the hardware is what you get. There are of course really sneaky tricks to "copy" entire pages around; for example, tell the operating system to map the same hunk of physical memory to two different places in an address space; by doing so you appear to instantaneously "copy" from one buffer to another but in fact it is just two different "views" on the same physical storage.

Question 8

Are there not SSE registers that would let you copy 128 (or soon 256) bits about at once? Not that it changes the core point of course, but just out of interest...

Question 9

Erm... because you are creating an extra array (rather than just using the iterator). The first approach stops after the first match (Where is a non-buffered streaming API). The second loads all the matches into an array (presumably with several re-sizes), then takes the first item.

As a side note; you can create infinite sequences; the first approach would still work, the second would run forever (or explode).

It could also be:

var result ErrorCodes.First(x => ReturnedErrorCodes.Contains(x));

(that won't make it any faster, but is perhaps easier to read)

Question 10

The second approach would definitely explode if run on an infinite series; it'll get to sizeof(T) * array.Length > int.MaxValue and throw an OutOfMemoryException. No running forever about it.

Question 11

@KeithS - that depends on whether the infinite sequence are all excluded by the Where, so that becomes never-ending but never-yielding. Or indeed, any sane finite number of matches and an infinite number of "misses".

Question 12

To make the query even more readable, I'd suggest: var result = ErrorCodes.First(ReturnedErrorCodes.Contains);

Question 13

What would have been the case if the array (or other collection, lets say list), has already been created? would there be any performance changes?

Question 14

Because of deferred execution.

The code ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)) doesn't return a collection of integers, instead it returns an expression that is capable of returning a stream of integers. It doesn't do any actual work until you start reading integers from it.

The ToArray method will consume the entire stream and put all the integers in an array. This means that every item in the entire list has to be compared to the error codes.

The First method on the other hand will only get the first item from the stream, and then stop reading from the stream. This will make it a lot faster, because it will stop comparing items from the list to the error codes as soon as it finds a match.

Question 15

Because ToArray() copies the entire sequence to an array.

Method 2 has to iterate over the whole sequence to build an array, and then returns the first element.

Method 1 just iterates over enough of the sequence to find the first matching element.

Question 16

ToArray() walks through the whole sequence it has been given and creates and array out of it.

If you don't callt ToArray(), First() lets Where() return just the first item that matches and immediatelly returns.

Question 17

First() is complexity of O(1)

ToArray()[0] is complexity O(n)+1

Question 18

-1. This is not correct. Both are complexity O(nm), where *n is the number of items in the integer list and m is the number of items in the error codes list. The complexity doesn't say anything about their relative performance, only how they react to changed conditions.

Question 19

@Marc Gravell, @Guffa - If you use First() and not First(predicate) the first item will always mach, therefore complexity of O(1). The ToArray() Will go over the whole List, therefore complexity of O(n).

Question 20

true; I phrased it poorly - I meant in combination with Where(predicate).First(), as per the question

Question 21

Yes, using only First() is O(1), but that's irrelevant to this question. So, either your answer is wrong, or irrelevant. Have your pick. ;)

Question 22

var @e = array.GetEnumerator();
// First
@e.MoveNext();
return @e.Current;
// ToArray (with yield [0] should as fast as First...)
while (@e.MoveNext() {
 yield return @e.Current;
}

Question 23

Because in the second example, you are actually converting the IEnumerable<T> to an array, whereas in the first example, no conversion is taking place.

Question 24

The important bit is not that it is doing some conversion, but what that conversion actually does.

Question 25

In method 2 the entire array must be converted to an array first. Also, it seems awkward to mix array access when First() is so much more readable.

Question 26

This makes sense, ToArray probably involves a copy, which is always going to be more expensive, since Linq can't make any guarantees about how you're going to use your array, while First() can just return the single element at the beginning.

score 62 · Accepted Answer · 2011-05-19 22:50:19Z

62

You have a jar containing a thousand coins, many of which are dimes. You want a dime. Here are two methods for solving your problem:

Pull coins out of the jar, one at a time, until you get a dime. Now you've got a dime.
Pull coins out of the jar, one at a time, putting the dimes in another jar. If that jar turns out to be too small, move them, one at a time, to a larger jar. Keep on doing that until you have all of the dimes in the final jar. That jar is probably too big. Manufacture a jar that is exactly big enough to hold that many dimes, and then move the dimes, one at a time, to the new jar. Now start taking dimes out of that jar. Take out the first one. Now you've got a dime.

Is it now clear why method 1 is a whole lot faster than method 2?

Share

Improve this answer

answered May 19, 2011 at 22:50

Eric Lippert's user avatar

Eric Lippert Eric Lippert

663k184 gold badges1.3k silver badges2.1k bronze badges

7 Comments

Marc Gravell

Marc Gravell Over a year ago

OK, awesome analogy to run through that (but wouldn't the resizes use block-copy? not that this would change the performance profile at all ;p)

2011年05月20日T05:05:35.14Z+00:00

Eric Lippert

Eric Lippert Over a year ago

@Marc: The original code uses ints, which on 32 bit architectures, is the unit of atomicity. Copying a block of ints still just copies them one at a time when you get right down to it; it's just that the overhead per move is really low. On a 64 bit architecture the block copy does in fact move them over two at a time, which is even better. But still, the point is that huge amounts of memory are being copied unnecessarily if all that you want to do is find the first item.

2011年05月23日T20:47:35.917Z+00:00

Marc Gravell

Marc Gravell Over a year ago

@Eric - fair enough ;p Maybe it was hope-over-reality, but I somehow expected that anything getting down to something akin to BlockCopy would work at larger chunk sizes. My bad.

2011年05月23日T20:50:39.32Z+00:00

Eric Lippert

Eric Lippert Over a year ago

@Marc: Sadly, the hardware is what you get. There are of course really sneaky tricks to "copy" entire pages around; for example, tell the operating system to map the same hunk of physical memory to two different places in an address space; by doing so you appear to instantaneously "copy" from one buffer to another but in fact it is just two different "views" on the same physical storage.

2011年05月23日T21:37:33.45Z+00:00

ShuggyCoUk

ShuggyCoUk Over a year ago

Are there not SSE registers that would let you copy 128 (or soon 256) bits about at once? Not that it changes the core point of course, but just out of interest...

2011年05月24日T08:45:26.53Z+00:00

|

CollectivesTM on Stack Overflow

C# LINQ First() faster than ToArray()[0]?

10 Answers 10

7 Comments

4 Comments

Comments

Comments

Comments

4 Comments

Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

10 Answers 10

7 Comments

4 Comments

Comments

Comments

Comments

4 Comments

Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related