1

I have to create an algorithm which returns the largest missing element in an array of size n, which ranges from values 1-k.

Example: If the array contained 1,1,3,1,3 then k=3, n=5, and the return should be 2.

If time complexity isn't taken into account, this is relatively straightforward; simply iterate through the array searching for k-1, k-2,...2, and the first time a value isn't found, return that value (or -1 if no such value exists). However, this will be a loop inside a loop, resulting in O(n*k) complexity.

I'm specifically asked to achieve O(n+k) in my result, and while I'm new to algorithm design, I believe this means my algorithm is not allowed to contain any nested loops whatsoever. None of the array sorting methods I've learned have linear worst-case times, so I can't use those, and I can't see any other way to keep n and k separate in all segments.

I've been stuck on this concept for several hours now. I'm not necessarily looking for a full answer, but if anyone could point me in the right direction, or even explain how I should be approaching O(n+k) complexity in a problem like this, I'd be very grateful.

asked May 30, 2017 at 7:44

3 Answers 3

3

C# sample code with comments:

int k = 3;
int[] data = new int[] {1, 1, 3, 1, 3};
// Step 1: collecting items presented
// O(n): we have to scan "data" array == O(n), adding item - n * O(1) == O(n)
// O(n) + n * O(1) == O(n) + O(n) == O(n) for the step 1
HashSet<int> present = new HashSet<int>(data);
// Step 2: scan 1..k items and find items missing:
// O(k) - 1..k scan + k * O(1) - checking each item
// O(k) + k * O(1) == O(k) + O(k) = O(k) for the step 2 
var missing = Enumerable
 .Range(1, k) // O(k) - scan 
 .Where(item => !present.Contains(item)) // k * O(1) == O(k) 
 .ToList(); // materialize 
// Test
Console.Write(string.Join(" ", missing));

Finally, we have step 1 followed by step 2:

O(n) + O(k) = O(n + k) 

Edit: Step 2 neither Linq nor lambda, but good old for loop:

for (int i = 1; i <= k; ++i) // O(k) - scan
 if (!present.Contains(i)) // O(1) - check
 Console.WriteLine(i);
answered May 30, 2017 at 7:58
5
  • Is it possible not to use hash? I am curious. Commented May 30, 2017 at 7:59
  • @javaLover: you can use a boolean array [1..k] instead of hash: just mark k-th item with true if item is present Commented May 30, 2017 at 8:03
  • Oh, no, another cheat XD. Commented May 30, 2017 at 8:04
  • This answer seems hopelessly too C# specific (on a question that didn't specify a language) - lambdas are far from a universal concept and make this answer much less readable to the non-C# audience than if you were just to have looped over manually. An explicit explanation of the approach outside of code also tends to make the answer much easier to understand much faster. Commented May 30, 2017 at 8:04
  • @Dukeling: Thank you! I totally agree that Linq may be difficult to read esp. for those who don't use C#; I've edited the answer. Commented May 30, 2017 at 8:09
2

You could use another array to store the occurrences without using hashset, though it would waste some memory,

public int maxMissingElement(final int arr[], final int max) {
boolean[] holder = new boolean[max + 1];
for (int i = 0; i < arr.length; i++) {
 holder[arr[i]]=true;
}
for (int i = max - 1; i >= 0; i--) {
 if (!holder[i]) {
 return i;
 }
}
return -1;
}

Time complexity: O(n+k)

Space complexity: O(k)

answered May 30, 2017 at 8:02
2
  • Thank you for the responses everyone! I've selected this as the most helpful to me because it avoids hashing, but learned from and have upvoted all of them. Commented May 30, 2017 at 8:13
  • Actually an array is just an implementation of the generic hash set concept. This answer isn't much different, though I still think it's a great answer for it's compactness. Commented May 30, 2017 at 11:32
1

First thing that comes to my mind is keeping a hash set of size k. A hash set has the property that it is created in linear-k time, and looking up data is in O(1) time.

This means that you could create a hash set (or dictionary) with a flag for each value of k. So for k = 3 you'd get a set (0 -> false, 1 -> false, 2 -> false, 3 -> false).

Then you could simply walk through the array of data (of size n) in linear-n time, and for each value simply set the corresponding record in your hash set to true.

After this process, you've ended up with a hash set that has records set to true only for the values that exist in the data set.

At the end, simply walk through the hash set once and keep an object called result for instance. Each time you find a value that is false (i.e. did not exist in the data array) you compare it to the result. If the value is bigger than result, you set result to that value.

At the end of the cycle over the hash set, the value in result will be the biggest missing value in the data set.

So is it O(k + n)?

  • We created the hash set = O(k)
  • We walked over the data set, each time checking the hash set = O(n) * O(1)
  • We walked over the hash set, comparing and updating a variable = O(k) * O(1)

We add up these time complexities as follows:

 O(k) + O(n)O(1) + O(k)*O(1)
 = 2 * O(k) + O(n) = O(k) + O(n)
 = O(k + n).

Note

Keeping a hash set to me sounds practical but not really elegant. I'm sure that there exists a way to do this without the hash set, but it meets the constraints of the problem.

answered May 30, 2017 at 8:02

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.