I am making early forays into the application of algorithms to best solve a problem and I am finding it difficult to choose the best application.
The problem is give the data below (two columns) find those ids where the linkedId matches entirely:
Input
id linkedId
1 103
2 107
1 107
2 103
4 64
6 120
7 107
8 180
7 64
9 129
Expected output [1,2] as the linkedIds match entirely for the two ids
As I look at the data, I would envisage a graph of some description so that I can establish the various links between the ids and linkedIds but I am at a loss as to how to approach this.
I do not want a solution but more guiding in the right direction.
Thanks
-
$\begingroup$ By "matched exactly" what do you mean? Do you mean that the set of linkedIds that each id in the output maps to are exactly the same? What if there are two sets of ids in the input such that each set of ids maps to the same linkedIDs within a set but not across sets, should we return them then? In other words, are we looking for all Ids such that there exists another ID with the exact same mapped set of linkedIds? $\endgroup$Throckmorton– Throckmorton2019年04月21日 00:31:00 +00:00Commented Apr 21, 2019 at 0:31
1 Answer 1
I came up with the idea of hashing when I just started looking at this, but seems that the range of the input data is not given, so it's very hard to decide on a specific algorithm.
However, there's another idea:
- During input, create a table for each
id
, record all thelinkedId
of it, and the sum oflinkedId
s of it. - After recording all the input, you have tons of ways to find those
id
s whose sum of alllinkedId
s are the same, put them into groups, like an $O(n^2)$ brute-force or some $O(n\log(n))$ discretization. This step is a kind of classification, allid
s in a group may not be the equivalent, but equivalentid
s must be in the same group - Traverse through all groups, find those equivalent
id
s
This is expected to be a lot more efficient than brute-force. Hope it can help you.