I am reading this post on Big-O
It says that the following code is O(n^2):
bool ContainsDuplicates(String[] strings)
{
    for (int i = 0; i < strings.Length; i++)
    {
        for (int j = 0; j < strings.Length; j++)
        {
            if (i == j) // Don't compare with self
            {
                continue;
            }
            if (strings[i] == strings[j])
            {
                return true;
            }
        }
    }
    return false;
}
But I cannot understand why.
The inner loop does something constant.
So it is a summation from 1 to N of a constant, i.e. a constant number: O(1).
The outer loop is a summation over that O(1).
So I would imagine it is n * O(1).
I think I am misunderstanding something here.
I do not think that all nested loops mean O(n^2), right?
4 Answers
Your mistake is with the inner loop. It does something constant n times, so it is O(n). The outer loop runs the inner loop n times, so it is O(n * n), or O(n^2).

In general, if the number of iterations a loop makes is dependent on the size of the input, it is O(n). And if k such loops are nested, it is O(n^k).
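The multiplication can be checked by counting iterations directly. A minimal sketch in Java (the question's code is C#, but the counting argument is language-independent; the class and method names are illustrative):

```java
public class NestedLoopCount {
    // Count how many times the body of two nested loops over 0..n-1 runs.
    static long bodyRuns(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                count++; // the O(1) body executes once per (i, j) pair
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(bodyRuns(10)); // prints 100, i.e. n * n
    }
}
```

The inner loop contributes a factor of n, the outer loop another factor of n, so the body runs n * n times.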
- Perhaps an easier-to-understand notation would be to say that the algorithm runs in O(i*j). Here both loops iterate over the length of strings, thus i = j = n, where n is strings.Length. – Newtopian (Sep 26, 2011)
- @Newtopian: You usually assume that the strings are about constant in length; that's actually a pretty good approximation in practice. – Donal Fellows (Sep 26, 2011)
- @Donal: Indeed, with proper knowledge of the data set we can make such assumptions, but as always there are exceptions. DNA analysis comes to mind, where string length can range from a few characters for a codon to many megabytes for a full chromosome. That said, in this particular example the string length is of no consequence, since we iterate over the strings themselves, not their characters (unless the equality operator has been overloaded with some logic that depends on the length of the operands). I doubt, however, that the OP intended to dig this far in his analysis. – Newtopian (Sep 27, 2011)
- @Newtopian: To be fair, the ContainsDuplicates function/method from the question isn't very likely to be used with full chromosomes. – Donal Fellows (Sep 30, 2011)
- @Donal: Probably not, no :-) – Newtopian (Sep 30, 2011)
If the length of the array is n, the test i == j will execute n^2 times. The order of an algorithm is the order of whatever part of it has the highest order. Since i is compared to j n^2 times, the order of the algorithm cannot possibly be less than O(n^2).

For a very large n, doubling n will quadruple the run time.
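The quadrupling can be made concrete by counting the i == j tests the method performs for two input sizes. A sketch in Java (the counts are exact; actual run time only approximately tracks them, as the comments below discuss):

```java
public class DoublingDemo {
    // Count the i == j tests performed by the duplicate check for input size n.
    static long tests(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                count++; // every (i, j) pair is tested, matches included
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(tests(1000)); // prints 1000000
        System.out.println(tests(2000)); // prints 4000000: doubling n quadruples the count
    }
}
```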
- But isn't the test i == j an O(1)? – user10326 (Sep 25, 2011)
- The test itself is O(1). The inner loop executes the test n times, so it's O(n*1). The outer loop executes the inner loop n times, so it's O(n*n*1). So the code overall is O(n^2). If you execute an O(1) step n^2 times, that's O(n^2). – David Schwartz (Sep 25, 2011)
- Minor nit: actually, for any value of N, doubling it will quadruple the run time; it's just that it doesn't matter that much for small N. – Peter Rowell (Sep 25, 2011)
- @Peter: That's not true. For example, going from n=2 to n=4 may not double run time, because a 32-bit machine may use the same number of memory fetches to read 2 bytes as 4. Similarly, for small n, the overhead of entering the function may be significant, and that's constant. Larger n may have better branch prediction on average. And so on. – David Schwartz (Sep 25, 2011)
- In this context, "very large n" means as n tends toward infinity, but ignoring any inefficiencies that may occur due to large n (such as not fitting in a 32-bit or 64-bit integer). The point of big-O notation is to analyze how an algorithm's performance changes as n increases, so long as it stays within the machine's capabilities. (This may be a quibble in this context, but in general, understanding what big-O notation includes and ignores is important to understanding what it means.) – David Schwartz (Sep 26, 2011)
You are misinterpreting what a constant operation means.
A constant operation is an operation that always executes in a fixed amount of time, independent of the input it receives.
i == j is a constant operation because the statement executes in a fixed amount of time. Let's say it takes 1 unit of time.
But this constant operation is performed (number of values of i) * (number of values of j) times. Say i and j each run 10 times: then completing all the i == j tests in the nested loop takes 100 units of time, and the total varies as the ranges of i and j vary.
We can be sure that a single i == j test takes 1 unit of time, but we cannot know in advance how many times it will be performed.
If the inner loop performs it n times, that takes n units of time, and the outer loop executes the inner loop n times. So in essence the i == j operation is done n^2 times.
Nested loops mean O(n^(number of nested loops)). Here O denotes an upper bound, which means the code will execute in time less than or equal to the value of O(...). This is a guarantee that it will not take longer than that; it can take less time, but not more.
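The same counting argument extends to deeper nesting. A sketch in Java with three nested loops, so the number of nested loops is 3 and the body runs n^3 times (class and method names are illustrative):

```java
public class TripleNesting {
    // With 3 nested loops over 0..n-1, the innermost body runs n^3 times.
    static long bodyRuns(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    count++; // executes once per (i, j, k) triple
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(bodyRuns(10)); // prints 1000, i.e. 10^3
    }
}
```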
Nested loops run in O(i1 * i2 * ... * in), where n is the number of nested loops and ix is the number of iterations in loop x. (Or, to put it another way, it's the product of the number of iterations of each of the loops.)

Your pair of loops iterates over the same array twice, which by coincidence is O(n * n), or O(n^2).

What happens inside the innermost loop is treated as constant because it's going to happen every time. Big-O notation isn't really about measuring the performance of linear code; it's more about making relative comparisons of how iterative algorithms behave when given n items of input to deal with. Certain operations that don't actually do any iterating (such as a push or pop on a stack) get classified as O(1) because they handle one element of the data.
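When the loop bounds differ, the counts multiply rather than square. A Java sketch with two hypothetical bounds, rows and cols:

```java
public class ProductOfBounds {
    // With different loop bounds, the body runs rows * cols times, not n^2.
    static long bodyRuns(int rows, int cols) {
        long count = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                count++; // once per (r, c) pair
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(bodyRuns(3, 7)); // prints 21, i.e. 3 * 7
    }
}
```

In the question's code both loops happen to iterate over the same array, which is why the product collapses to n * n.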
- Not true if i_k is not constant: think about sweepline algorithms. In the worst case they are O(n^2), in the best case O(n) (not counting the initial sort), with some inner loops running over all elements or over none depending on the problem. Also, quicksort implemented without recursion is a nested loop, yet it is still O(n * log n). – ratchet freak (Sep 26, 2011)
- Quicksort with random pivot selection is nondeterministic, which means the O(n log n) figure is an average. It still fits what I said: i1 = n and i2 = log n. – Blrfl (Sep 26, 2011)
- Minor nit: the inner loop should start at i+1 instead of 0. As written, in the worst case (no dupes) half the comparisons are redundant.
- Since comparing two strings is not O(1), this code would actually be O(N^2 * M), where M is the length of the strings.
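The suggested change can be sketched as follows: a Java port of the question's C# method (the class name and test data are illustrative), with the inner loop starting at i + 1 so each unordered pair is compared exactly once:

```java
public class DuplicateCheck {
    // Port of the question's ContainsDuplicates with the inner loop starting
    // at i + 1: roughly half the comparisons of the original, though the
    // asymptotic bound is still O(n^2) string comparisons.
    static boolean containsDuplicates(String[] strings) {
        for (int i = 0; i < strings.length; i++) {
            for (int j = i + 1; j < strings.length; j++) {
                if (strings[i].equals(strings[j])) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsDuplicates(new String[] {"a", "b", "c"})); // prints false
        System.out.println(containsDuplicates(new String[] {"a", "b", "a"})); // prints true
    }
}
```

Starting at i + 1 also makes the i == j guard unnecessary, since j is always greater than i.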