I need to filter an array of form controls to check for duplicate values. I have written the following algorithm to identify and return an array of the controls which are duplicates. I have not seen this indexOf-based approach in any of the sample duplicate checkers one finds on Google. Can anyone here spot a flaw in my code?
// select all keys
const duplicateKeys = values.map((current) => current.name);
// filter duplicate keys
const filteredDuplicateKeys = duplicateKeys.reduce((prev, name, index, names) => {
    if (names.indexOf(name) !== names.indexOf(name, index)) {
        if (!prev.includes(name)) return prev.concat(name);
    }
    return prev;
}, []);
// filter original array based on identified duplicate keys
const duplicates = values.reduce((prev, current) => filteredDuplicateKeys.indexOf(current.name) > -1 ? prev.concat(current) : prev, []);
2 Answers
The important flaw in your code is runtime complexity. reduce, indexOf and includes all have runtimes that are linear (O(n)) in the size of the array. You're running the latter two once for every iteration of reduce, giving a quadratic (O(n²)) runtime. Doubling the length of the input will quadruple the running time.
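To make that growth concrete, here is a small illustrative sketch (the counting model and names are mine, not the question's exact code) that tallies how many element comparisons a nested indexOf-style scan performs. Doubling the input size roughly quadruples the count:

```javascript
// Model the work done when indexOf is called once per element of a
// reduce loop: each indexOf scans the array from the start.
function quadraticComparisons(n) {
    let comparisons = 0;
    const names = Array.from({ length: n }, (_, i) => `name${i}`);
    names.forEach((name) => {
        for (const other of names) {
            comparisons += 1;
            if (other === name) break; // indexOf stops at the first match
        }
    });
    return comparisons;
}

console.log(quadraticComparisons(100)); // 5050
console.log(quadraticComparisons(200)); // 20100 — about 4x the work for 2x the input
```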
A secondary flaw is excess regular human-brain complexity: it's more lines of code and data structures than are needed.
The typical approach involves a single hash table to count duplicates and just two lines of code: one to count, one to filter:
var countKeys = {};
values.forEach( value => countKeys[value.name] = (countKeys[value.name] || 0) + 1 )
const duplicates = values.filter( value => countKeys[value.name]>1 );
const values = [
    {
        name: 'Duplicate 1',
        value: [1,2,3],
    },
    {
        name: 'Duplicate 2',
        value: [1,2,3],
    },
    {
        name: 'Duplicate 1',
        value: [1,2,34],
    },
    {
        name: 'Duplicate 2',
        value: [1,2,3],
    },
    {
        name: 'Duplicate 2',
        value: [1,2,3],
    },
    {
        name: 'Not Duplicate',
        value: [1,2,3],
    },
];
var countKeys = {};
values.forEach( value => countKeys[value.name] = (countKeys[value.name] || 0) + 1 )
const duplicates = values.filter( value => countKeys[value.name]>1 );
console.log(duplicates)
Thank you for taking the time to share that information. That complexity is quite high - I guess that is why this is not an appropriate solution to the problem. – hsimah, Feb 4, 2019 at 1:03
Some inefficiencies in your code.
- `concat` is a memory and CPU hog as it creates a new array each time you call it. If you are just adding to an array use `array.push` as it has much less overhead.
- You look for two indexes with `names.indexOf(name) !== names.indexOf(name, index)`, but you only need to know if the second result is `> -1`; the first search is redundant.
- You don't need to create an array of keys `duplicateKeys`. You can use `Array.findIndex` instead and work on the original items.
- When you have located a duplicate key you do not need to check if it already exists in the list of duplicate keys. It does not matter if you duplicate items in the array of duplicate keys. Memory is cheaper than CPU cycles, so favour memory over CPU cycles.
- When you have the list of duplicated keys you can use `Array.filter` to extract the items with those keys.
Thus you can simplify the function to
function getDuplicates(arr, key = "name") {
    const dupKeys = arr.reduce((prev, item, index, arr) => {
        if (arr.findIndex((val, idx) => idx > index && val[key] === item[key]) > -1) {
            prev.push(item[key]);
        }
        return prev;
    }, []);
    return arr.filter(item => dupKeys.includes(item[key]));
}
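For reference, a self-contained run of that function with a small hypothetical data set (the `controls` array is mine, chosen so one name repeats):

```javascript
function getDuplicates(arr, key = "name") {
    const dupKeys = arr.reduce((prev, item, index, arr) => {
        // Only look forward from the current index; a later match means a duplicate.
        if (arr.findIndex((val, idx) => idx > index && val[key] === item[key]) > -1) {
            prev.push(item[key]);
        }
        return prev;
    }, []);
    return arr.filter(item => dupKeys.includes(item[key]));
}

const controls = [
    { name: "a", value: 1 },
    { name: "b", value: 2 },
    { name: "a", value: 3 },
];
console.log(getDuplicates(controls)); // both "a" entries; "b" is filtered out
```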
Hash tables for faster lookups
You can get better performance and less complexity if you use a Map, which stores its entries in a hash table so lookups are much faster. Use the map to count how many copies of each key there are, then filter the array on the count.
function getDuplicates(arr, key = "name") {
    const keys = new Map();
    for (const val of arr) {
        if (keys.has(val[key])) { keys.get(val[key]).count += 1; }
        else { keys.set(val[key], { count: 1 }); }
    }
    return arr.filter(val => keys.get(val[key]).count > 1);
}
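And a quick usage sketch of the Map-based version (again with hypothetical form-control data of my own):

```javascript
function getDuplicates(arr, key = "name") {
    // First pass: count occurrences of each key in a Map (O(n)).
    const keys = new Map();
    for (const val of arr) {
        if (keys.has(val[key])) { keys.get(val[key]).count += 1; }
        else { keys.set(val[key], { count: 1 }); }
    }
    // Second pass: keep only items whose key appeared more than once.
    return arr.filter(val => keys.get(val[key]).count > 1);
}

const controls = [
    { name: "email", value: "x" },
    { name: "email", value: "y" },
    { name: "phone", value: "z" },
];
console.log(getDuplicates(controls).map(c => c.name)); // ["email", "email"]
```

Two linear passes keep the whole thing O(n), versus the quadratic indexOf approach in the question.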
Thanks for taking the time to review my code. I appreciate the tips for improving efficiency. – hsimah, Feb 4, 2019 at 1:04