Merge intersecting sub-arrays

Question 1

What I'm trying to achieve is to group all sub-arrays, which are connected (sharing the same elements) into one array. So here is an example:

Input: [[1, 17], [3, 5], [5], [1, 3], [7], [2], [9, 8], [7, 2]];

What I am expecting to get: [[ 1, 3, 5, 17 ], [ 2, 7 ], [ 8, 9 ]]

Indexes of arrays, that are related:

0, 1, 2, 3
4, 5, 7
6

My code does what's needed, but my gut tells me that performance, amount of iterations and readability of the code can be improved, especially because it will be run on pretty long arrays.

So here is what I have:

const input = [[1, 17], [3, 5], [5], [1, 3], [7], [2], [9, 8], [7, 2]];
let i = 0;
const mergeIntersections = (array) => {
 const arrayClone = [].concat(array);
 let intersectionFound;
 let currIntersections = [];
 arrayClone.forEach(() => {
 /**
 * Subarray, which elements we'll look for
 * Removing it from the source array so it does not disturb us
 */
 const wanted = arrayClone.shift().sort((a, b) => a - b);
 
 /*
 * Preserving the length of the source array
 */
 arrayClone.push([]);
 let intersections = [].concat(wanted);
 wanted.forEach((wantedElement) => {
 arrayClone.forEach((arr, arrIndex) => {
 if (~arr.indexOf(wantedElement)) {
 intersections = intersections.concat(arr);
 intersectionFound = true;
 arrayClone[arrIndex] = [];
 }
 i++;
 });
 });
 if (intersectionFound) {
 currIntersections.push(Array.from(new Set(intersections)));
 } else {
 currIntersections = array;
 }
 currIntersections = currIntersections.filter(item => item.length);
 });
 if (intersectionFound) {
 return mergeIntersections(currIntersections);
 }
 return currIntersections;
};
console.log('result: ', JSON.stringify(mergeIntersections(input)));
console.log('amount of iterations: ', i);

Question 2

Have you tried it with those long arrays? if its fast enough you will want to priorize readability on what you already have, if not a change in the algorithm might be needed.

Question 3

Yep, I tried, and nope, it's not fast at all. That's why I am here

Question 4

Would you like a _ (lodash) solution?

Question 5

how long are these arrays?

Question 6

@MuhammadFaizan I'd prefer vanilla js; ~150k of sub-arrays, each containing unknown amount of elements

Question 7

I find several issues with the current implementation:

Complicated, very difficult to understand how it works
shift on an array while iterating over it with forEach
Creating too many objects in the process (arrays and sets)
Using indexOf on arrays instead working with sets
Using ~arr.indexOf instead of the more readable simple condition > -1

I propose a simpler, more efficient algorithm:

Convert each array to a Set
Initialize an empty object, to be used as a map of values and the set they belong to. Let's call this pools.
For each set, and for each item in the set:
- If item is not in pools, then add item -> set mapping in pools
- Otherwise, add each item2 in pools[item] to set, and add mappings item2 -> set in pools
At this point pools has all the items in all the input arrays as keys, pointing to sets that have the values merged. Keys in the same pool all point to the set, so there are no unnecessary set objects here.
What remains is to get the distinct values from pools, convert them to arrays and sort them (if you need to sort them).

Here's one way to implement:

const mergeIntersections2 = (arrays) => {
 const pools = {};
 arrays.map(arr => new Set(arr)).forEach(set => {
 Array.from(set).forEach(item => {
 if (!pools[item]) {
 pools[item] = set;
 } else {
 pools[item].forEach(item2 => {
 pools[item2] = set.add(item2);
 });
 }
 });
 });
 const seen = {};
 const merged = [];
 Object.keys(pools).forEach(item => {
 if (!seen[item]) {
 merged.push(Array.from(pools[item]).sort((a, b) => a - b));
 pools[item].forEach(item2 => seen[item2] = 1);
 }
 });
 return merged;
};

Question 8

Nice short solution. 2 notes though: pools[item] != set will always be true. And this has a worst case complexity of amount of numbers in input * amount of arrays in input * cost of set add (imagine this case: [[1, 2, 3, 4, 5, 6], [1], [1], [1], [1]])

Question 9

I went full performance here, so no forEach and such. The idea is to use associative arrays and assume they cost O(1) for speed improvement. The steps of this algorithm:

For each number, store the index of the array it first appeared on.
For repeated numbers, we have intersections. As resolving the intersection right now is not performant (we might have more intersections later), we will just mark the current array and the array that number first appeared on as neighbors.
After all this preprocessing, we are ready to solve the problem. What we are left with is a graph, where a node is an array inside input. For that node, we have stored its inmediate neighbors, which represent the other arrays that also had a common number with it.
Now we do a standard DFS to iterate over all direct and indirect neighbors of a node, where we keep adding the numbers of each of those nodes to the group we are making.

The most expensive part of the algorithm is the sorting, which I ́m not sure if you needed it. If you do not care about order, no need to sort them. The important part is that we only add each number to the result array once, and we process each number once, so except for the sorting, the complexity should be linear on the amount of numbers in input.

const input = [[1, 17], [3, 5], [5], [1, 3], [7], [2], [9, 8], [7, 2]];
function merge(input) {
 var numberGroup = {}
 var neighbors = new Array(input.length); // direct intersections
 var processed = new Array(input.length); // for the dfs
 
 for (var arrayIndex = 0; arrayIndex < input.length; arrayIndex++) {
 var array = input[arrayIndex];
 neighbors[arrayIndex] = {}; // we don ́t use [] because we don ́t want to keep repeated neighbors
 processed[arrayIndex] = false;
 
 for (var i = 0; i < array.length; i++) {
 var number = array[i];
 
 if (numberGroup.hasOwnProperty(number) == false) { // haven ́t seen the number before
 numberGroup[number] = arrayIndex; // we mark its first apparition
 } else { // we connect the node with the one where this number appeared for the first time
 var neighbor = numberGroup[number]; 
 neighbors[arrayIndex][neighbor] = true;
 neighbors[neighbor][arrayIndex] = true;
 }
 }
 }
 
 //inline function, could be taken outside and pass input , processed and neighbors
 function makeGroup (index, group) { // dfs
 if (processed[index] == false) {
 processed[index] = true;
 
 for (var i = 0; i < input[index].length; i++) {
 group[input[index][i]] = true; // we only care about key, so we put value true but could be false
 }
 
 for (var neighbor in neighbors[index]) makeGroup(neighbor, group)
 return true; // this makes a new group
 }
 return false; // this does not make a new group
 }
 
 var result = [];
 
 for (var i = 0; i < input.length; i++) {
 var group = {};
 if(makeGroup(i, group)) {
 result.push(Object.keys(group).map(Number).sort((a, b) => a - b)); // pass the keys to an array and sort it
 }
 }
 
 return result;
}
console.log(merge(input));

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2017-11-01 21:40:11Z

I find several issues with the current implementation:

Complicated, very difficult to understand how it works
shift on an array while iterating over it with forEach
Creating too many objects in the process (arrays and sets)
Using indexOf on arrays instead working with sets
Using ~arr.indexOf instead of the more readable simple condition > -1

I propose a simpler, more efficient algorithm:

Convert each array to a Set
Initialize an empty object, to be used as a map of values and the set they belong to. Let's call this pools.
For each set, and for each item in the set:
- If item is not in pools, then add item -> set mapping in pools
- Otherwise, add each item2 in pools[item] to set, and add mappings item2 -> set in pools
At this point pools has all the items in all the input arrays as keys, pointing to sets that have the values merged. Keys in the same pool all point to the set, so there are no unnecessary set objects here.
What remains is to get the distinct values from pools, convert them to arrays and sort them (if you need to sort them).

Here's one way to implement:

const mergeIntersections2 = (arrays) => {
 const pools = {};
 arrays.map(arr => new Set(arr)).forEach(set => {
 Array.from(set).forEach(item => {
 if (!pools[item]) {
 pools[item] = set;
 } else {
 pools[item].forEach(item2 => {
 pools[item2] = set.add(item2);
 });
 }
 });
 });
 const seen = {};
 const merged = [];
 Object.keys(pools).forEach(item => {
 if (!seen[item]) {
 merged.push(Array.from(pools[item]).sort((a, b) => a - b));
 pools[item].forEach(item2 => seen[item2] = 1);
 }
 });
 return merged;
};

Nice short solution. 2 notes though: pools[item] != set will always be true. And this has a worst case complexity of amount of numbers in input * amount of arrays in input * cost of set add (imagine this case: [[1, 2, 3, 4, 5, 6], [1], [1], [1], [1]])

Stack Exchange Network

Merge intersecting sub-arrays

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Merge intersecting sub-arrays

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions