Algorithm for commonalities across related objects

Question 1

I am making early forays into applying algorithms to solve problems, and I am finding it difficult to choose the best approach.

The problem is: given the data below (two columns), find those ids whose linkedIds match entirely:

Input

id linkedId
1 103
2 107
1 107
2 103
4 64
6 120
7 107
8 180
7 64
9 129

Expected output: [1,2], as ids 1 and 2 both map to exactly the linkedIds 103 and 107.

As I look at the data, I would envisage a graph of some description so that I can establish the various links between the ids and linkedIds but I am at a loss as to how to approach this.

I do not want a full solution, just some guidance in the right direction.

Thanks

Question 2

What do you mean by "matches entirely"? Do you mean that the sets of linkedIds that the ids in the output map to are exactly the same? What if the input contains two groups of ids, where the ids within each group map to the same linkedIds but the two groups map to different linkedIds: should both groups be returned? In other words, are we looking for all ids for which there exists another id with exactly the same set of linkedIds?

Answer 1

My first thought when I looked at this was hashing, but the range of the input data is not given, so it is hard to commit to a specific algorithm.
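As an illustration of the hashing idea mentioned above, here is a minimal Python sketch. It assumes the input arrives as (id, linkedId) pairs and that "matching entirely" means two ids have identical sets of linkedIds. The function name `group_by_linked_set` is hypothetical, and this is one possible reading of the idea rather than a definitive implementation.

```python
from collections import defaultdict

def group_by_linked_set(pairs):
    """Group ids whose complete sets of linkedIds are identical."""
    # Collect the set of linkedIds for every id.
    linked = defaultdict(set)
    for id_, linked_id in pairs:
        linked[id_].add(linked_id)

    # A frozenset is hashable, so the whole set can act as a dictionary key.
    groups = defaultdict(list)
    for id_, linked_ids in linked.items():
        groups[frozenset(linked_ids)].append(id_)

    # Keep only the sets that are shared by at least two ids.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]

# The sample input from the question.
pairs = [(1, 103), (2, 107), (1, 107), (2, 103), (4, 64),
         (6, 120), (7, 107), (8, 180), (7, 64), (9, 129)]
print(group_by_linked_set(pairs))  # [[1, 2]]
```

Because the result is a list of groups, it also covers the case raised in the comments where several separate groups of ids each share their own set of linkedIds.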

However, there's another idea:

1. While reading the input, keep a table entry for each id that records all of its linkedIds together with their sum.
2. After reading all the input, group the ids whose linkedId sums are equal. There are many ways to do this, such as an $O(n^2)$ brute-force comparison or an $O(n\log n)$ sort-based grouping (discretization). This step is only a classification: ids in the same group are not necessarily equivalent, but equivalent ids always land in the same group.
3. Go through each group and compare the actual linkedIds to find the ids that are truly equivalent.

This should be a lot more efficient than brute force over all ids, because the full comparisons only happen between ids that share the same sum. Hope it helps.
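The steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the answer's exact method: the input is taken to be (id, linkedId) pairs, repeated pairs are ignored, and the grouping by sum uses a dictionary instead of the brute-force or sort-based options mentioned above. The function name `equivalent_ids_by_sum` is hypothetical.

```python
from collections import defaultdict

def equivalent_ids_by_sum(pairs):
    """Find groups of ids with identical linkedId sets, using sums as a pre-filter."""
    # Step 1: record the linkedIds of every id (assumption: duplicates ignored).
    linked = defaultdict(set)
    for id_, linked_id in pairs:
        linked[id_].add(linked_id)

    # Step 2: bucket ids by the sum of their linkedIds. Equal sums do not
    # guarantee equal sets, but equal sets always have equal sums.
    buckets = defaultdict(list)
    for id_, linked_ids in linked.items():
        buckets[sum(linked_ids)].append(id_)

    # Step 3: inside each bucket, compare the actual sets.
    result = []
    for candidates in buckets.values():
        remaining = candidates
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [i for i in rest if linked[i] == linked[first]]
            if len(same) > 1:
                result.append(sorted(same))
            # Continue with the ids that did not match the first one.
            remaining = [i for i in rest if linked[i] != linked[first]]
    return result

pairs = [(1, 103), (2, 107), (1, 107), (2, 103), (4, 64),
         (6, 120), (7, 107), (8, 180), (7, 64), (9, 129)]
print(equivalent_ids_by_sum(pairs))  # [[1, 2]]
```

The sum acts only as a cheap fingerprint. For example, the sets {1, 4} and {2, 3} share the sum 5 but are different, which is why step 3 still compares the sets themselves.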

ice1000 · Answer 1 · 2018-10-22 20:46:36Z

