12

Given you have an array A[1..n] of size n, it contains elements from the set {1..n}. However, two of the elements are missing, (and perhaps two of the array elements are repeated). Find the missing elements.

Eg if n=5, A may be A[5] = {1,2,1,3,2}; and so the missing elements are {4,5}

The approach I used was:

int flag[n] = {0}; 
int i; 
for(i = 0; i < n; i++) { 
 flag[A[i]-1] = 1; 
 } 
for(i = 0; i < n; i++) { 
 if(!flag[i]) { 
 printf("missing: %d", (i+1)); 
} 

the space complexity comes to O(n). I feel this is a very kiddish and inefficient code. So could you please provide a better algo with better space and time complexity.

Mark Baker
213k34 gold badges354 silver badges390 bronze badges
asked Mar 9, 2011 at 17:49
6
  • 3
    You can't do it in less than O(n) time, because you have to look at every element at least once. However there might be some solutions that use less memory. Commented Mar 9, 2011 at 17:56
  • flag[A[i]] = 1; is a no-no! According to your post, each A[i] has a value between 1 and n; flag[n] is undefined (the indexes for flag go from 0 to n-1). Oh ... and tyou're trying to access A[0] which, according to your post, should not be considred; and you're failing to access A[n] (last A you look at is A[n-1]). OTOH, if A is defined as A[5] = {1, 2, 1, 3, 2} there is an inconsistency in the description. Commented Mar 9, 2011 at 18:09
  • oops sorry.. though you should have understood its supposed to be flag[A[i]-1] = 1; thanks will modify it.. Commented Mar 9, 2011 at 18:11
  • possible duplicates: stackoverflow.com/q/2348731/10396, and stackoverflow.com/q/2590165/10396 Commented Mar 9, 2011 at 18:28
  • Definite duplicate of Q2 in stackoverflow.com/questions/3492302/… Commented Mar 9, 2011 at 18:32

9 Answers 9

16
+100

Theoretically,

It is possible to do in O(1) space (in RAM model, i.e. O(1) words) and O(n) time even with a read-only array.

Warning: Long post with some mathematics. If you are only interested in the code and not the algorithm/proof, skip to the code section. You would need to read some parts of the algorithm section to understand the code, though.


Algorithm

Assume the missing numbers are x and y.

There are two possibilities for the array:

1) One number is repeated three times, and the remaining numbers in the array appear exactly once.

For this case, the bucketed XOR trick will work.

Do a XOR of all elements of the array with 1,2,...,n.

You end up with z = x XOR y.

There is at least one bit of z which is non-zero.

Now differentiating the elements of the array based on that bit (two buckets) do a XOR pass through the array again.

You will end up with x and y.

Once you have the x and y, you can confirm if these are indeed the missing elements.

If it so happens that the confirmation step fails, then we must have the second case:

2) Two elements repeated exactly twice and the rest appearing exactly once.

Let the two repeated elements be a and b (x and y are the missing ones).

Warning: Math ahead.

Let S_k = 1^k + 2^k + .. + n^k

For instance S_1 = n(n+1)/2, S_2 = n(n+1)(2n+1)/6 etc.

Now we compute seven things:

T_1 = Sum of the elements of the array = S_1 + a + b - x - y.
T_2 = Sum of the squares = S_2 + a^2 + b^2 - x^2 - y^2
T_3 = Sum of cubes = S_3 + a^3 + b^3 - x^3 - y^3
T_4 = Sum of fourth powers = S_4 + a^4 + b^4 - x^4 - y^4
...
T_7 = Sum of seventh powers = S_7 + a^7 + b^7 - x^7 - y^7

Note, we can use O(1) words (intsead of one) to deal with the overflow issues. (I estimate 8-10 words will be enough).

Let Ci = T_i - S_i

Now assume that a,b,x,y are the roots of the 4th degree polynomial P(z) = z^4 + pz^3 + qz^2 + rz + s

Now we try to transform the above seven equations into four linear equations in p,q,r,s.

For instance, if we do 4th Eqn + p * 3rd Eqn + q* 2nd equation + r* 1st equation

we get

C4 + p*C3 + q*C2 + r*C1 = 0

Similarly we get

C5 + p*C4 + q*C3 + r*C2 + s*C1 = 0
C6 + p*C5 + q*C4 + r*C3 + s*C2 = 0
C7 + p*C6 + q*C5 + r*C4 + s*C3 = 0

These are four linear equations in p,q,r,s which can be solved by Linear Algebra techniques like Gaussian Elimination.

Note that p,q,r,s will be rationals and so can be computed only with integer arithmetic.

Now suppose we are given a solution p,q,r,s to the above set of equations.

Consider P(z) = z^4 + pz^3 + qz^2 + rz + s.

What the above equations are saying is basically

P(a) + P(b) - P(x) - P(y) = 0
aP(a) + bP(b) - xP(x) -yP(y) = 0
a^2 P(a) + b^2 P(b) - x^2 P(x) - y^2 P(y) = 0
a^3 P(a) + b^3 P(b) - x^3 P(x) - y^3 P(y) = 0

Now the matrix

 1 1 -1 -1
 a b -x -y
 a^2 b^2 -x^2 -y^2
 a^3 b^3 -x^3 -y^3

has the same determinant as the Vandermonde matrix and thus is invertible, if a,b,x,y are distinct.

Thus we must have that P(a) = P(b) = P(x) = P(y) = 0.

Now check which of 1,2,3,...,n are roots of x^4 + px^3 + qx^2 + rx + s = 0.

Thus this is a linear time constant space algorithm.


Code

I wrote the following C# (.Net 4.0) code and it seems to work for the few samples I tried... (Note: I didn't bother catering to case 1 above).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Numerics;
namespace SOManaged
{
 class Program
 {
 static void Main(string[] args)
 {
 ulong[] inp = {1,3,2,1,2};
 ulong[] inp1 = { 1,2,3,4,5,6,7,8,
 9,10,11,13,14,15,
 16,17,18,19,20,21,5,14};
 int N = 100000;
 ulong[] inp2 = new ulong[N];
 for (ulong i = 0; i < (ulong)N; i++)
 {
 inp2[i] = i+1;
 }
 inp2[122] = 44;
 inp2[419] = 13;
 FindMissingAndRepeated(inp);
 FindMissingAndRepeated(inp1);
 FindMissingAndRepeated(inp2);
 }
 static void FindMissingAndRepeated(ulong [] nums)
 {
 BigInteger[] C = new BigInteger[8];
 // Compute the C_i
 for (int k = 0; k < 8; k++)
 {
 C[k] = 0;
 }
 BigInteger i = 1;
 BigInteger n = 0;
 for (int j = 0; j < nums.Length; j++)
 {
 n = nums[j];
 i = j + 1;
 for (int k = 1; k < 8; k++)
 {
 C[k] += i - n;
 n = n * nums[j];
 i = i * (j + 1);
 }
 }
 for (int k = 1; k <= 7; k++)
 {
 Console.Write("C[" + k.ToString() + "] = " + 
 C[k].ToString() +", ");
 }
 Console.WriteLine();
 // Solve for p,q,r,s
 BigInteger[] pqrs = new BigInteger[4];
 BigInteger[] constants = new BigInteger[4];
 BigInteger[,] matrix = new BigInteger[4, 4];
 int start = 4;
 for (int row = 0; row < 4; row++ )
 {
 constants[row] = -C[start];
 int k = start-1;
 for (int col = 0; col < 4; col++)
 {
 matrix[row, col] = C[k];
 k--;
 }
 start++;
 }
 Solve(pqrs, matrix, constants, 4);
 for (int k = 0; k < 4; k++)
 {
 Console.Write("pqrs[" + k.ToString() + "] = " 
 + pqrs[k].ToString() + ", ");
 }
 Console.WriteLine();
 // Find the roots.
 for (int k = 1; k <= nums.Length; k++)
 {
 BigInteger x = new BigInteger(k);
 BigInteger p_k = x * x * x* x + pqrs[0] * x* x * x 
 + pqrs[1] * x * x + pqrs[2] * x 
 + pqrs[3];
 if (p_k == 0)
 {
 Console.WriteLine("Found: " + k.ToString());
 }
 }
 }
 // Solve using Cramer's method.
 // matrix * pqrs = constants.
 static void Solve(BigInteger[] pqrs, BigInteger[,] matrix, 
 BigInteger[] constants, int n)
 {
 BigInteger determinant = Determinant(matrix, n);
 for (int i = 0; i < n; i++)
 {
 BigInteger[,] numerator = Replace(matrix, constants, n, i);
 BigInteger numDet = Determinant(numerator,4);
 pqrs[i] = numDet/ determinant;
 }
 }
 // Replace a column of matrix with constants.
 static BigInteger[,] Replace(BigInteger[,] matrix, 
 BigInteger[] constants, int n, int col)
 {
 BigInteger[,] newMatrix = new BigInteger[n, n];
 for (int i = 0; i < n; i++)
 {
 for (int j = 0; j < n; j++)
 {
 if (j != col)
 {
 newMatrix[i, j] = matrix[i, j];
 }
 else
 {
 newMatrix[i, j] = constants[i];
 }
 }
 }
 return newMatrix;
 }
 // Recursively compute determinant for matrix.
 static BigInteger Determinant(BigInteger[,] matrix, int n)
 {
 BigInteger determinant = new BigInteger(0);
 int multiplier = 1;
 if (n == 1)
 {
 return matrix[0,0];
 }
 for (int i = 0; i < n; i++)
 {
 BigInteger [,] subMatrix = new BigInteger[n-1,n-1];
 int row = 0;
 for (int j=1; j < n; j++)
 {
 int col = 0;
 for (int k = 0; k < n ; k++)
 {
 if (k == i)
 {
 continue;
 }
 subMatrix[row,col] = matrix[j,k];
 col++;
 }
 row++;
 }
 BigInteger subDeterminant = Determinant(subMatrix, n - 1);
 determinant += multiplier * subDeterminant * matrix[0,i];
 multiplier = -multiplier;
 }
 return determinant;
 }
 }
}

The output is

C[1] = 6, C[2] = 36, C[3] = 180, C[4] = 864, C[5] = 4116, C[6] = 19656, C[7] = 9
4380,
pqrs[0] = -12, pqrs[1] = 49, pqrs[2] = -78, pqrs[3] = 40,
Found: 1
Found: 2
Found: 4
Found: 5
C[1] = 15, C[2] = 407, C[3] = 9507, C[4] = 215951, C[5] = 4861515, C[6] = 108820
727, C[7] = 2424698067,
pqrs[0] = -53, pqrs[1] = 980, pqrs[2] = -7396, pqrs[3] = 18480,
Found: 5
Found: 12
Found: 14
Found: 22
C[1] = 486, C[2] = 189424, C[3] = 75861486, C[4] = 31342069984, C[5] = 130971109
69326, C[6] = 5492487308851024, C[7] = 2305818940736419566,
pqrs[0] = -600, pqrs[1] = 83183, pqrs[2] = -3255216, pqrs[3] = 29549520,
Found: 13
Found: 44
Found: 123
Found: 420
answered Mar 9, 2011 at 20:51
9
  • @Georg: Here is some data. For the example problem 1,1,2,2,3: c1 = 6 C2 = 36 c3 = 180 C4 = 864 C5 = 4116 C6 = 19656 C7 = 94380 Solving gives p=-12, q= 49, r = -78, s= 40. THe corresponding polynomial is (z-1)(z-2)(z-4)(z-5) Commented Mar 10, 2011 at 2:08
  • I used a calculator and this site: wims.unice.fr/wims/wims.cgi I currently don't have time to write a full program, unfortunately. Commented Mar 10, 2011 at 2:09
  • @Georg: I wrote some code and have edited the answer with it. Ideally it should be the other way round... Code requiring proof, not proof requiring code! :-) Commented Mar 11, 2011 at 7:57
  • @Moron: I've upvoted your answer long ago anyway. :) I'm quite impressed by your dedication to this answer. Commented Mar 11, 2011 at 9:31
  • 1
    @Scott: By your own logic, how will you even maintain an index into the array in O(1) space? That is why I specified the Word RAM model. If n occupies O(1) words, then so does n^7. 8-10 comes from that fact that 1^7 + 2^7 + ... + n^7 is of the order n^8. Commented Apr 22, 2011 at 4:43
9

As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too. In pseudo-code:

for i := 1 to n
 while A[A[i]] != A[i] 
 swap(A[i], A[A[i]])
 end if
end for
for i := 1 to n
 if A[i] != i then 
 print i
 end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

answered Apr 22, 2011 at 4:23
2
  • Since we are skipping the 0th index and iterating upto i=N, it seems we always need to extend given arr by 1 and copy first ele to new index. Tried for One Missing One Repeating case e.g. [4,3,5,2,2] and also for Two Missing Two Repeating case e.g. [5,3,5,2,2]. Also tried for One Missing case e.g. [4,3,5,2] based on your answer to this question, and that too seems to need k+1 additional space, instead of just k. Or did I go wrong somewhere? Commented Apr 18, 2019 at 15:40
  • @dd9chndn: This is pseudo-code using 1-based arrays (because that's how the problem was formulated). Commented Apr 19, 2019 at 13:31
5

Your solution isn't bad. Here's an alternative - Sort the list and iterate through it checking adjacent numbers. When there is a gap, print all the numbers in between. If k is the length of the array and n is the number to count to, we get O(k lg k + n) time, O(1) space

templatetypedef
375k112 gold badges948 silver badges1.1k bronze badges
answered Mar 9, 2011 at 17:52
8
  • That depends on the sorting algorithm, that usually it's already O(n), so probably it's more efficient the OP's code. Commented Mar 9, 2011 at 17:55
  • @redent84: the OPs code is already optimal, any kind of sorting algo will make it slower. Commented Mar 9, 2011 at 17:59
  • O(1) space if the input array is mutable - but then if the input array is mutable we can play tricks to make the questioner's approach O(1) space too, in O(k) time. Commented Mar 9, 2011 at 18:00
  • Depends on what optimal means, for low k and high n, this is more time efficient and it is always more space efficient.... Commented Mar 9, 2011 at 18:01
  • k can't be smaller than n - your k is n from the question, and your n is whatever number the higher of the two gaps actually is. So there is no "low k and high n". Commented Mar 9, 2011 at 18:02
1

This is bit qwirky As all your numbers are positive (by problem). I ll make the number at position i-1 a negetive if i is present in array.

int i; 
for(i = 0; i < n; i++) { 
 A[abs(A[i])-1] = -1*abs(A[abs(A[i])-1]);
 } 
for(i = 0; i < n; i++) { 
 if(A[i]>0) { 
 printf("missing: %d", i+1); 
} 

Complexity O(n), no auxiliary array user, but destroys the input array.

answered Mar 9, 2011 at 18:03
7
  • This algorithm doesn't work. Not only that, also it causes an array index out of bounds (and segmentation violation fault - if you are lucky). Commented Mar 9, 2011 at 18:10
  • I tried for your input and it gave me correct answer. You mean this is not what your interviewers expected ? Commented Mar 9, 2011 at 18:11
  • 1
    Any input different from a sequence starting at 1 and ending at N will cause the error. That's a bit risky! Commented Mar 9, 2011 at 18:11
  • @redent84 : Yeah thats correct I just followed as per given in question :) Commented Mar 9, 2011 at 18:13
  • @Zimbabao You can reset the sign of the array in the second for cycle without increasing complexity Commented Mar 9, 2011 at 18:35
1

Following is one of the approaches to identify all missing numbers when an array is known to contain only lumbers between 1 to n inclusive, without using any additional space. Time complexity is O(n).

Lets take a smallest number k such that it is not supposed to be in array therfore k = n+1 (lets call it an add factor).

first loop through each array and for every a[i] we will update a[a[i] - 1] += k; after this loop every array element contains two sets of information the number which was originally in the array element + k * (number of occurances of ith number in the array).

in the second loop you could find out how many repetations of ith number by doing an integer division of the number at each location by k. And original number at ith location would be a[i] % k;

Lets work through an example

A[5] = {1,2,1,3,2};

here (addfactor) k = 5 (array length) + 1 = 6

After fisrt loop array would look like if original element is m and occurances of ith number is O(i) resulting array element would be m + k * O(i) this element divide(integer) by k you'll get occurances of ith elemnent, and %k you'd get original array.

A = {1 + 6*2, 2 + 6*2, 1 + 6*1, 3+6*0 , 2+6*0 }
A = {13, 14, 7, 3, 2 }

Following is C# code which (I'm sorry, my C is little rusty its been a while.) could be ported to any language just by replacing Printf & scanfs.

 static void Main(string[] args)
 {
 int[] A = { 1, 2, 1, 3, 2 };
 PrintDuplicateAndMissing(A);
 Console.ReadLine();
 }
 static void PrintDuplicateAndMissing(int[] array)
 {
 int addfactor = array.Length + 1;
 for (int i = 0; i < array.Length; i++)
 {
 array[array[i] - 1] += addfactor; // -1 only if array contains from 1 to n. if it is 0 to n (this -1 is not required)
 }
 for (int i = 0; i < array.Length; i++)
 {
 if ( (array[i] / addfactor) == 0 )
 Console.WriteLine(string.Format("{0} is missing", i + 1)); // i + 1 only if array is 1 to n, if 0 to n then +1 is not required
 array[i] %= addfactor; //restore original content of the array
 }
 }
answered Mar 10, 2011 at 7:18
1

A sample code snippet for finding the missing elements without sorting the array below:

 public static void series(int[] arr) {
 for (int i = 0; i < arr.length; i++) {
 while (arr[i] != i + 1) {
 int jump = arr[arr[i] - 1];
 if (jump == arr[i]) {
 break;
 }
 arr[arr[i] - 1] = arr[i];
 arr[i] = jump;
 }
 }
 System.out.println("Missing number is ");
 for (int i = 0; i < arr.length; i++) {
 if (arr[i] != i + 1) {
 System.out.println(i + 1);
 } else {
 arr[i] = -1;
 }
 }

This code works for a series of numbers from 0 to N.

answered Feb 2, 2016 at 11:26
2
  • I think we can replace the while loop condition while ( arr[i] != i + 1 ) with while ( arr[arr[i] - 1] != arr[i] ) and then get rid of the if (jump == arr[i]) condition and directly do swap ( arr[arr[i] - 1], arr[i] ) Commented Apr 19, 2019 at 11:01
  • @Chandan Hegde yes the code can be reduced, but we have not defined the swap function here. So to avoid confusion, I have used the extra variable 'jump'. Commented Dec 21, 2019 at 2:29
0

Cycle each element 0...n-1.

x = abs(A[i]) (with i = 0...n-1);
A[x - 1] can be: 
> 0: we haven't checked the element, change its sign to negative:
 A[x - 1] = -A[x - 1]
< 0: we already found the same number

At the end of the cycle, pass each A[0...n-1]. The index of the positive elements + 1 is the missing numbers.

So if

y = abs(A[i]) > 0: i + 1 is missing.

In C#

var arr = new[] { 1, 2, 1, 2, 4 };
for (int i = 0; i < arr.Length; i++) {
 int x = Math.Abs(arr[i]);
 int y = arr[x - 1];
 if (y > 0) {
 arr[x - 1] = -arr[x - 1];
 }
}
for (int i = 0; i < arr.Length; i++) {
 int x = arr[i];
 if (x > 0) {
 Console.WriteLine("Missing {0}", i + 1);
 } else {
 arr[i] = -arr[i];
 }
}

And the array is as good as new.

answered Mar 9, 2011 at 18:26
1
  • I think his is wrong. I think he can access a negative index element. Try 2, 1, 1, 2, 4 Commented Mar 9, 2011 at 18:37
0

As we know we are looking for elements between 1 to N Create a Hash set containing 1 to N.

foreach(int i in input)
{
 if(hashset.contains(i))
 {
 hashset.delete(i);
 }
}
return "remaining elements in Hashset.";

The remaining elements in Hashset are the missing elements.

answered Mar 12, 2012 at 2:53
0

As you have given an array of n size and find the missing number when it's in a sequence.

#include<stdio.h>
main()
{
print("value of n");
scan("%d",&n);
print("enter the elements");
for(i=0;i<n;i++)
scan("%d",&a[i]);
for(i=0;i<n;i++)
{
d1[i]=a[i+1]-a[i];
temp=d1[i];
d[i]=temp;
}
for(i=0;i<n;i++)
{
if(d[i]==d[i+1]
{
c=d[i];
break;
}
}
for(i=0;i<n;i++)
b[i]=a[0]+i*c;
for(i=0;i<n;i++)
{
miss=0;
for(j=0;j<n;j++)
{
if(b[i]!=a[j])
{
miss++;
}
if(miss==n)
print("missing no. %d",b[i]);
}
}

It would find the missing when its in sequence only.

answered Nov 22, 2013 at 21:15

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.