
This code returns the duplicates (sorted) from an array, and also the missing value within the range of the given array. It runs correctly, but I need to optimize it so it executes in less time. Can anyone help?

function findDupsMiss(arr) {
 // your code here
 var noDuplicates = [];
 var missedNum;
 var duplicates = [];
 var sortedDuplicates = [];
 arr.forEach((el, i) => {
   if (noDuplicates.includes(el) == false) {
     noDuplicates.push(el);
   }
 });
 var sortedArr = noDuplicates.sort((a, b) => a - b);
 for (var i = 0; i < sortedArr.length - 1; i++) {
   if ((sortedArr[i] + 1) !== sortedArr[i + 1]) {
     missedNum = sortedArr[i] + 1;
   }
 }
 arr.forEach(el => {
   if (arr.indexOf(el) != arr.lastIndexOf(el)) {
     duplicates.push(el);
   }
 });
 duplicates.forEach(el => {
   if (sortedDuplicates.includes(el) == false) {
     sortedDuplicates.push(el);
   }
 });
 var lastdup = sortedDuplicates.sort((a, b) => a - b);
 return [missedNum, lastdup];
}
findDupsMiss([10, 9, 8, 9, 6, 1, 2, 4, 3, 2, 5, 5, 3]); // → [7, [2, 3, 5, 9]]

aki
asked Jun 7, 2020 at 20:50
Comment: Can you give us a better description of what the code does? Maybe add sample input and output. What are "the limits of the given array"? What is "the missed value between limits"? Commented Jun 8, 2020 at 3:19

2 Answers


On the small 13 element array your implementation appears to be fairly performant, but since you request a "faster" implementation I took a stab at one myself.

Implementation

In this implementation I wanted to remove the multiple loops and use a data structure that lends itself to faster lookups, so I used plain JavaScript objects as maps. I also defined each array-function callback outside the place it's used, so that only a single copy is instantiated.

function findDupsMiss2(arr) {
 const processedNumbers = {};
 const duplicates = {};
 const noDuplicates = {};
 const checkDupOrNot = (el, i) => {
   if (processedNumbers[el]) {
     duplicates[el] = el;
     delete noDuplicates[el];
   } else {
     noDuplicates[el] = el;
   }
   processedNumbers[el] = el;
 };
 arr.forEach(checkDupOrNot);
 const missedReduceFn = (missed, el, i, arr) =>
   el + 1 !== arr[i + 1] ? el : missed;
 const missed = [
   ...Array(
     processedNumbers[Object.values(processedNumbers).length - 1]
   ).keys()
 ].reduce(missedReduceFn, null);
 return [missed, Object.values(duplicates)];
}

Testing

This is the function I defined to measure how performant a function is:

const measurePerf = (fn, data, runs = 1000000) =>
 [...Array(runs).keys()]
   .map(() => {
     const start = performance.now();
     fn(data);
     const end = performance.now();
     return end - start;
   })
   .reduce((total, current) => total + current) / runs;

This takes a function, measures how long each run takes, sums the total time, then divides by the number of runs to get an average execution time.

Used as such

measurePerf(findDupsMiss, data); // uses default 1 million iterations
measurePerf(findDupsMiss, data, 1000);

Initial Results

Using the default 1-million iteration performance measure

findDupsMiss1
[7, Array[4]]
0: 7
1: Array[4]
0: 2
1: 3
2: 5
3: 9
0.0021040050006413367
findDupsMiss2
[7, Array[4]]
0: 7
1: Array[4]
0: 2
1: 3
2: 5
3: 9
0.0023690749967063313

At first this may seem like your code is already faster (or my implementation is slower), but this isn't the entire story.

If we analyze the Big-O complexity of your implementation we find it is O(n^2). Of the outer loops (three O(n) forEach loops and two O(n log n) sorts), each forEach loop contains an additional O(n) search (indexOf, lastIndexOf, or includes) to find an index or check whether an element is already included, bumping the overall complexity to O(n^2).

The other implementation also has four loops (one forEach, one array spread, one array reduce, one Object.values), but has no inner loops; each loop uses an O(1) lookup in a map, so the overall complexity is O(n).
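For illustration, the same O(1)-lookup idea can also be sketched with the built-in Set type. This is my own sketch of the approach, not the code above; the function name findDupsMissSet is hypothetical:

```javascript
// Single pass with Sets: O(n) dedup via O(1) average membership tests,
// then O(n log n) sorts at the end. A sketch of the idea, not the exact code above.
function findDupsMissSet(arr) {
  const seen = new Set();
  const dups = new Set();
  for (const el of arr) {
    if (seen.has(el)) dups.add(el); // O(1) average lookup
    else seen.add(el);
  }
  const sorted = [...seen].sort((a, b) => a - b);
  let missed;
  for (let i = 0; i < sorted.length - 1; i++) {
    if (sorted[i] + 1 !== sorted[i + 1]) missed = sorted[i] + 1;
  }
  return [missed, [...dups].sort((a, b) => a - b)];
}

console.log(findDupsMissSet([10, 9, 8, 9, 6, 1, 2, 4, 3, 2, 5, 5, 3]));
// → [7, [2, 3, 5, 9]]
```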

But where is the sort?

When using an object as a map with integer-like keys, the keys are enumerated in ascending numeric order, so we get the "sorting" for free (in reality the engine probably still does some minimal work internally).
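This behavior can be checked directly. Per the ECMAScript property-enumeration rules, integer-like keys come back in ascending numeric order regardless of insertion order:

```javascript
// Integer-like keys are enumerated in ascending numeric order,
// no matter the order in which they were inserted.
const byKey = {};
[10, 2, 7, 1].forEach(n => { byKey[n] = n; });
console.log(Object.keys(byKey));   // → ["1", "2", "7", "10"] (keys are strings)
console.log(Object.values(byKey)); // → [1, 2, 7, 10]
```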

Further Results & Extended Examination

What does this Big-O O(n^2) vs O(n) mean?

This has to do with how much work there is to do based upon input size. When an algorithm's Big-O complexity is O(n), doubling the input size roughly doubles the runtime; when it is O(n^2), doubling the input size roughly quadruples the runtime.
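The scaling claim can be sanity-checked with back-of-the-envelope operation counts (illustrative only, assuming each basic step costs the same):

```javascript
// Relative work when the input size doubles, under a constant-cost-per-step assumption.
const linearOps = n => n;        // O(n)
const quadraticOps = n => n * n; // O(n^2)
console.log(linearOps(2000) / linearOps(1000));       // → 2: doubling input doubles work
console.log(quadraticOps(2000) / quadraticOps(1000)); // → 4: doubling input quadruples work
```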

Here's a rough table of output. Because of performance I had to drop the measurement to 100 iterations (versus 10^6), but figured 100 runs is still enough data for a decent comparison.

To be honest, the average is less relevant here; the table is meant to illustrate the ratio between the two algorithms, with a secondary comparison in parentheses showing the ratio of each row's time to the previous row's.

const measurePerf = (fn, data, runs = 1000000) =>
 [...Array(runs).keys()]
   .map(() => {
     const start = performance.now();
     fn(data);
     const end = performance.now();
     return end - start;
   })
   .reduce((total, current) => total + current) / runs;

function findDupsMiss1(arr) {
 // your code here
 var noDuplicates = [];
 var missedNum;
 var duplicates = [];
 var sortedDuplicates = [];
 arr.forEach((el, i) => {
   if (noDuplicates.includes(el) == false) {
     noDuplicates.push(el);
   }
 });
 var sortedArr = noDuplicates.sort((a, b) => a - b);
 for (var i = 0; i < sortedArr.length - 1; i++) {
   if (sortedArr[i] + 1 !== sortedArr[i + 1]) {
     missedNum = sortedArr[i] + 1;
   }
 }
 arr.forEach(el => {
   if (arr.indexOf(el) != arr.lastIndexOf(el)) {
     duplicates.push(el);
   }
 });
 duplicates.forEach(el => {
   if (sortedDuplicates.includes(el) == false) {
     sortedDuplicates.push(el);
   }
 });
 var lastdup = sortedDuplicates.sort((a, b) => a - b);
 return [missedNum, lastdup];
}

function findDupsMiss2(arr) {
 const processedNumbers = {};
 const duplicates = {};
 const noDuplicates = {};
 const checkDupOrNot = (el, i) => {
   if (processedNumbers[el]) {
     duplicates[el] = el;
     delete noDuplicates[el];
   } else {
     noDuplicates[el] = el;
   }
   processedNumbers[el] = el;
 };
 arr.forEach(checkDupOrNot);
 const missedReduceFn = (missed, el, i, arr) =>
   el + 1 !== arr[i + 1] ? el : missed;
 const missed = [
   ...Array(
     processedNumbers[Object.values(processedNumbers).length - 1]
   ).keys()
 ].reduce(missedReduceFn, null);
 return [missed, Object.values(duplicates)];
}

const data = [10, 9, 8, 9, 6, 1, 2, 4, 3, 2, 5, 5, 3];
const perf1 = measurePerf(findDupsMiss1, data);
console.log(findDupsMiss1(data), perf1);
const perf2 = measurePerf(findDupsMiss2, data);
console.log(findDupsMiss2(data), perf2);

const processIterations = async () => {
 const results = [];
 // # elements 1, 2, 4, 8, 16, 32, 64, ..., 262144, 524288
 for (let i = 0; i < 20; i++) {
   const dataLength = 1 << i;
   const randData = [...Array(dataLength).keys()].map(() =>
     // note: Math.random must be called; the original had Math.floor(Math.random),
     // which always yields NaN. Scaling by dataLength is assumed here.
     Math.floor(Math.random() * dataLength)
   );
   const [t1, t2] = await Promise.all([
     // For your sanity, start skipping findDupsMiss1 here at 11.
     // Maybe don't go more than 15 unless you like creating space heaters.
     i < 11 ? measurePerf(findDupsMiss1, randData, 100) : "skipped",
     measurePerf(findDupsMiss2, randData, 100)
   ]);
   results.push({
     iteration: i,
     dataLength,
     t1: Number.isFinite(t1) ? Number(t1).toFixed(3) : t1,
     t2: Number.isFinite(t2) ? Number(t2).toFixed(3) : t2
   });
 }
 return results;
};

processIterations().then(results => {
 console.log(`\t# Elements\tt1\tt2\tt2:t1 ratio`);
 results.forEach(({ iteration, dataLength, t1, t2 }, i, arr) => {
   const prev = arr[i - 1] || {};
   const t2t1Ratio = Number(t2 / t1).toFixed(2);
   const t1Ratio = Number(t1 / prev.t1).toFixed(1);
   const t2Ratio = Number(t2 / prev.t2).toFixed(1);
   console.log(
     `${iteration}\t${dataLength}\t${t1}(${t1Ratio})\t${t2}(${t2Ratio})\t${t2t1Ratio}`
   );
 });
});

 iter  # Elements  t1 avg  t2 avg  t2:t1 ratio
0 1 0.002(NaN) 0.003(NaN) 1.50 
1 2 0.002(1.0) 0.004(1.3) 2.00 
2 4 0.003(1.5) 0.005(1.3) 1.67 
3 8 0.006(2.0) 0.006(1.2) 1.00 
4 16 0.005(0.8) 0.012(2.0) 2.40 
5 32 0.011(2.2) 0.016(1.3) 1.45 
6 64 0.031(2.8) 0.029(1.8) 0.94 
7 128 0.084(2.7) 0.048(1.7) 0.57 
8 256 0.270(3.2) 0.094(2.0) 0.35 
9 512 1.032(3.8) 0.186(2.0) 0.18 
10 1024 4.696(4.6) 0.372(2.0) 0.08 
11 2048 16.303(3.5) 0.749(2.0) 0.05 
12 4096 67.092(4.1) 1.492(2.0) 0.02 
13 8192 266.986(4.0) 3.018(2.0) 0.01 
14 16384 1081.774(4.1) 6.785(2.2) 0.01 
15 32768 skipped(NaN) 12.298(1.8) NaN 
16 65536 skipped(NaN) 25.941(2.1) NaN 
17 131072 skipped(NaN) 49.775(1.9) NaN 
18 262144 skipped(NaN) 99.556(2.0) NaN 
19 524288 skipped(NaN) 195.841(2.0) NaN 

Notes:

  • Average time is in milliseconds
  • Item in () is the ratio to the previous time (i.e. line above)
  • Each iteration 100 runs for performance measuring
  • Stopped measuring algorithm 1 after 15 iterations, it's just too slow

Conclusion

As can be seen, the table confirms that your algorithm, after a certain point, settles into roughly 4x the runtime for 2x the input, while the other algorithm settles around 2x, as expected. There is also a consistent tipping point around iteration 6 (64 elements), where algorithm #2 overtakes algorithm #1 and becomes the more performant of the two.

Edit: find duplicates and last missed algorithm comparison

answered Jun 8, 2020 at 6:18

From a short review;

  • I like variables to be either spelled out or Spartan (meaning 1 char)
    • missedNum -> missedNumber
    • noDuplicates -> uniqueDuplicates?
    • el -> i (since you expect numbers)
  • Comments, you only have 1 line, and it could go ;)
  • Reducing loop counts is the way to reduce run time.

This counter-example has three 'loops': one map, because I don't want to mess with the original list; one sort, because that's central to the functionality; and one for loop to analyze the data. The code is not sexy at all, but I think it reads well and seems to perform better.

function analyzeNumberList(list) {
 let numbers = list.map(i => i).sort((a, b) => a - b);
 let missing, dupes = [];

 for (let i = 0; i < numbers.length; i++) {
   let current = numbers[i];
   let last = numbers[i - 1];
   let secondLast = numbers[i - 2];

   if (current == last) {
     if (current != secondLast) {
       dupes.push(current);
     }
   } else if (current != last + 1) {
     missing = last + 1;
   }
 }
 return [missing, dupes];
}
console.log(analyzeNumberList([10, 9, 8, 9, 6, 1, 2, 4, 3, 2, 5, 5, 3]));

answered Jul 8, 2020 at 12:19
