I have to create an algorithm which returns the largest missing element in an array of size n, which ranges from values 1-k.
Example: If the array contained 1,1,3,1,3 then k=3, n=5, and the return should be 2.
If time complexity isn't taken into account, this is relatively straightforward; simply iterate through the array searching for k-1, k-2,...2, and the first time a value isn't found, return that value (or -1 if no such value exists). However, this will be a loop inside a loop, resulting in O(n*k) complexity.
I'm specifically asked to achieve O(n+k) in my result, and while I'm new to algorithm design, I believe this means my algorithm is not allowed to contain any nested loops whatsoever. None of the array sorting methods I've learned have linear worst-case times, so I can't use those, and I can't see any other way to keep n and k separate in all segments.
I've been stuck on this concept for several hours now. I'm not necessarily looking for a full answer, but if anyone could point me in the right direction, or even explain how I should be approaching O(n+k) complexity in a problem like this, I'd be very grateful.
3 Answers 3
C# sample code with comments:
int k = 3;
int[] data = new int[] {1, 1, 3, 1, 3};
// Step 1: collecting items presented
// O(n): we have to scan "data" array == O(n), adding item - n * O(1) == O(n)
// O(n) + n * O(1) == O(n) + O(n) == O(n) for the step 1
HashSet<int> present = new HashSet<int>(data);
// Step 2: scan 1..k items and find items missing:
// O(k) - 1..k scan + k * O(1) - checking each item
// O(k) + k * O(1) == O(k) + O(k) = O(k) for the step 2
var missing = Enumerable
.Range(1, k) // O(k) - scan
.Where(item => !present.Contains(item)) // k * O(1) == O(k)
.ToList(); // materialize
// Test
Console.Write(string.Join(" ", missing));
Finally, we have step 1 followed by step 2:
O(n) + O(k) = O(n + k)
Edit: Step 2 neither Linq nor lambda, but good old for loop:
for (int i = 1; i <= k; ++i) // O(k) - scan
if (!present.Contains(i)) // O(1) - check
Console.WriteLine(i);
-
Is it possible not to use hash? I am curious.javaLover– javaLover05/30/2017 07:59:55Commented May 30, 2017 at 7:59
-
@javaLover: you can use a boolean array
[1..k]
instead of hash: just mark k-th item withtrue
if item is presentDmitrii Bychenko– Dmitrii Bychenko05/30/2017 08:03:44Commented May 30, 2017 at 8:03 -
-
This answer seems hopelessly too C# specific (on a question that didn't specify a language) - lambdas are far from a universal concept and make this answer much less readable to the non-C# audience than if you were just to have looped over manually. An explicit explanation of the approach outside of code also tends to make the answer much easier to understand much faster.Bernhard Barker– Bernhard Barker05/30/2017 08:04:56Commented May 30, 2017 at 8:04
-
@Dukeling: Thank you! I totally agree that Linq may be difficult to read esp. for those who don't use C#; I've edited the answer.Dmitrii Bychenko– Dmitrii Bychenko05/30/2017 08:09:26Commented May 30, 2017 at 8:09
You could use another array to store the occurrences without using hashset, though it would waste some memory,
public int maxMissingElement(final int arr[], final int max) {
boolean[] holder = new boolean[max + 1];
for (int i = 0; i < arr.length; i++) {
holder[arr[i]]=true;
}
for (int i = max - 1; i >= 0; i--) {
if (!holder[i]) {
return i;
}
}
return -1;
}
Time complexity: O(n+k)
Space complexity: O(k)
-
Thank you for the responses everyone! I've selected this as the most helpful to me because it avoids hashing, but learned from and have upvoted all of them.MMMMMCK– MMMMMCK05/30/2017 08:13:58Commented May 30, 2017 at 8:13
-
Actually an array is just an implementation of the generic hash set concept. This answer isn't much different, though I still think it's a great answer for it's compactness.Glubus– Glubus05/30/2017 11:32:22Commented May 30, 2017 at 11:32
First thing that comes to my mind is keeping a hash set of size k
. A hash set has the property that it is created in linear-k time, and looking up data is in O(1)
time.
This means that you could create a hash set (or dictionary) with a flag for each value of k. So for k = 3
you'd get a set (0 -> false, 1 -> false, 2 -> false, 3 -> false)
.
Then you could simply walk through the array of data (of size n) in linear-n time, and for each value simply set the corresponding record in your hash set to true.
After this process, you've ended up with a hash set that has records set to true only for the values that exist in the data set.
At the end, simply walk through the hash set once and keep an object called result
for instance. Each time you find a value that is false
(i.e. did not exist in the data array) you compare it to the result
. If the value is bigger than result
, you set result
to that value.
At the end of the cycle over the hash set, the value in result
will be the biggest missing value in the data set.
So is it O(k + n)
?
- We created the hash set = O(k)
- We walked over the data set, each time checking the hash set = O(n) * O(1)
- We walked over the hash set, comparing and updating a variable = O(k) * O(1)
We add up these time complexities as follows:
O(k) + O(n)O(1) + O(k)*O(1)
= 2 * O(k) + O(n) = O(k) + O(n)
= O(k + n).
Note
Keeping a hash set to me sounds practical but not really elegant. I'm sure that there exists a way to do this without the hash set, but it meets the constraints of the problem.