I have a group of arrays from which I need to remove duplicates. Within each array there must be no duplicate values, and within the group as a whole no two arrays may hold the same set of values.

The first part is easy: for each inner array, I can pass it through a Set to drop the repeated values. So, given the matrix arrays, I can apply the following:

const sets : string[][] = arrays.map(arr=>[...new Set(arr)].sort());

This gives me an array of sets. How can I turn it into a set of sets? For example, if sets = [[a, b], [c], [d, a], [c], [e]], I would like setOfSets to equal [[a, b], [c], [d, a], [e]].

Applying setOfSets = [...new Set(sets)]; would not work, because a Set compares arrays by reference, so two arrays with the same contents still count as different values. Is there a way to force Set to compare by value, or another effective way to achieve this?
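To make the problem concrete, here is a minimal sketch of the behaviour described above. The variable names are only illustrative:

```javascript
// Two arrays with identical contents are still two distinct objects.
const first = ["c"];
const second = ["c"];

// Set uses SameValueZero equality, which compares objects by identity.
const byReference = new Set([first, second]);
const sameReference = new Set([first, first]);

console.log(byReference.size);   // 2
console.log(sameReference.size); // 1
```

Only the second Set collapses to one entry, because both of its entries are the very same array object.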

Edit

Original matrix:

[[a, b, b],
[c,c],
[b,a],
[d,a],
[c,c],
[e,e]]

after creating and sorting sets:

[[a,b],
[c],
[a,b],
[d,a],
[c],
[e]]

desired result:

[[a,b],
[c],
[d,a],
[e]]
asked Jan 6, 2021 at 9:23

1 Answer

If the data in your set is easy to serialize, I would opt for a solution like this:

const data = [
  ["a", "b", "b"],
  ["c", "c"],
  ["b", "a"],
  ["d", "a"],
  ["c", "c"],
  ["e", "e"]
];

// Create the "hash" of your set
const serializeSet = s => Array
  .from(s)
  .sort()
  .join("___");

// Create a map (or object) that ensures 1 entry per hash
const outputMap = data
  .map(xs => new Set(xs))
  .reduce(
    (acc, s) => acc.set(serializeSet(s), s),
    new Map()
  );

// Turn your Map and Sets back into arrays
const output = Array
  .from(outputMap.values())
  .map(s => Array.from(s));

console.log(output);

To come up with a good hash function for your set, you need to have a good look at your data. For example:

  • When your arrays consist of single characters from a-z, like in my example above, you can sort the strings using the default sorter and then join the result using a character from outside the a-z range.
  • If your arrays consist of arbitrary strings or numbers, JSON.stringify(Array.from(s).sort()) is safer to use.
  • When your arrays consist of plain objects, you could JSON.stringify the sorted elements, but watch out for differences in the order of the objects' properties (e.g. {a: 1, b: 2} vs {b: 2, a: 1}).
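As a rough sketch of the JSON.stringify approach, assuming the elements are all strings (or all numbers), the serialization could be wrapped in a small helper. The names toKey and uniqueSets are made up for illustration and are not part of any library:

```javascript
// Hypothetical helper: canonical string key for an array treated as a set.
// Assumes the elements are all strings (or all numbers).
const toKey = (values) => JSON.stringify([...new Set(values)].sort());

// Hypothetical helper: keep the first array seen for each distinct key.
function uniqueSets(arrays) {
  const seen = new Map();
  for (const arr of arrays) {
    const key = toKey(arr);
    if (!seen.has(key)) {
      // Store the de-duplicated values in their original order.
      seen.set(key, [...new Set(arr)]);
    }
  }
  return [...seen.values()];
}

const result = uniqueSets([
  ["a", "b", "b"],
  ["c", "c"],
  ["b", "a"],
  ["d", "a"],
  ["c", "c"],
  ["e", "e"],
]);

console.log(result); // [["a","b"],["c"],["d","a"],["e"]]
```

This variant keeps the first array it encounters for each key. Whether to keep the first or the last occurrence is a design choice that depends on your data.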
answered Jan 6, 2021 at 9:53

10 Comments

Ah, the examples were added when I was typing my answer. Thanks for the heads up. I'll have a look
@Wimanicesir I just tried it and my code does give the desired result?
What is the purpose of the join("___")?
When you create a hash of your data, there are often edge cases. The unusual delimiter is an attempt to minimize those risks, but it is not foolproof. For example, with my hashing function, ["a___b"] and ["a", "b"] both give "a___b" as a hash, so they would incorrectly be seen as duplicates. To create a hashing function that is safe to use, you need to know what your data might look like. For the sample input here, ___ is enough.
I'll add some examples to my answer. In the meantime: the most commonly used way of hashing your arrays would be JSON.stringify. Maybe that makes more sense: serializeSet = s => JSON.stringify(Array.from(s).sort())
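To illustrate the difference between the two serializers, here is a small sketch. The input values are an assumption, picked deliberately so that one of them contains the delimiter:

```javascript
// The two serializers discussed in this thread.
const joinKey = (s) => Array.from(s).sort().join("___");
const jsonKey = (s) => JSON.stringify(Array.from(s).sort());

// Two different sets; the first one contains the delimiter itself.
const one = new Set(["a___b"]);
const two = new Set(["a", "b"]);

console.log(joinKey(one) === joinKey(two)); // true: both are "a___b"
console.log(jsonKey(one) === jsonKey(two)); // false: '["a___b"]' vs '["a","b"]'
```

The JSON version keeps the element boundaries in the key, because each string is quoted separately.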