I would like to know if there's a known algorithm or best practice way to do the following:
I have a collection with a subcollection, for example:
R1  R2  R3
--  --  --
M   M   M
N   N
L   L
    A
What I need is an algorithm that produces the following result:
R1, R2: M N L
R2: A
R3: M
This is not what I want, because the values for R repeat more often than in the result above:
R1, R2, R3: M
R1, R2: N L
R2: A
I need to group in a way that gives me the most optimized groups of R. The fewer groups of R, the better, so that I get the largest subcollections.
Another example (with the most obvious result):
R1 R2 R3
-- -- --
M M A
V V B
L L C
Should result in:
R1, R2: M V L
R3: A B C
I need to do this in LINQ/C#.
Any solutions? Tips? Links?
-
It's not clear what you are trying to minimize or maximize. Do you always want to list the largest set common to more than one R group? Suppose you have R1: ABCD; R2: ABCD; R3: ABC. Would you then like to get R1, R2: ABCD; R3: ABC? – kevin cline, Nov 28, 2012 at 17:56
-
I want the fewest combinations for R. The more "R"s in a group, the better. In your example: yes, you're right. – Jeroen, Nov 28, 2012 at 18:05
-
Your "not what I want" result has the most R's in a group. Do you mean you want the fewest total R's across all groups? I think that's what you're trying to say. – Jimmy Hoffa, Nov 28, 2012 at 18:11
-
Did anyone else initially misread the title as "algorithm to optimize groping"? – Mason Wheeler, Nov 28, 2012 at 18:17
-
You're right. I guess my explanation isn't the best. By "the more R's in a group" I meant that ideally I have them all in one group. The less they repeat, the better. – Jeroen, Nov 28, 2012 at 18:19
1 Answer
Since you want the fewest total R's, I think you want to invert your key/value mapping and use the values as keys, so your initial list becomes:
- M: R1, R2, R3
- N: R1, R2
- L: R1, R2
- A: R2
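A minimal LINQ sketch of that inversion, assuming the input is held as a `Dictionary<string, string[]>` keyed by R (the question does not say how the data is stored, so that type and the `Invert` name are illustrative only):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InvertExample
{
    // Turns "R -> values" into "value -> set of Rs that contain it".
    public static Dictionary<string, HashSet<string>> Invert(
        Dictionary<string, string[]> byR)
    {
        return byR
            // Flatten to one (R, Value) pair per entry.
            .SelectMany(kv => kv.Value, (kv, value) => new { R = kv.Key, Value = value })
            // Regroup the pairs by value, keeping only the R names.
            .GroupBy(x => x.Value, x => x.R)
            .ToDictionary(g => g.Key, g => new HashSet<string>(g));
    }

    public static void Main()
    {
        // The first example from the question.
        var byR = new Dictionary<string, string[]>
        {
            { "R1", new[] { "M", "N", "L" } },
            { "R2", new[] { "M", "N", "L", "A" } },
            { "R3", new[] { "M" } },
        };

        foreach (var kv in Invert(byR))
            Console.WriteLine(kv.Key + ": " + string.Join(", ", kv.Value));
    }
}
```

Storing each list of R's as a `HashSet<string>` makes the later intersection and removal steps straightforward.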
Then you can count the members in the intersection ( http://msdn.microsoft.com/en-us/library/system.linq.enumerable.intersect.aspx ) of these lists. With n as the total number of R entries across all lists, intersecting M and N gives 2 common members, which means you get n-(2 sets * 2 members) R's; if you intersect L as well you get n-(3*2) R's, and if you intersect A as well you get n-(4*1) R's. So your best choice is the second intersection set (M, N and L). Unfortunately this is a naive approach that takes exponential time, because you have to intersect every possible combination of sets (2^k - 1 non-empty combinations for k values) to find the intersection that gives the largest reduction in R's needed to portray all sets.
Once you have found the best combination with this technique, the one where (intersection count * sets crossed) is largest, output it as a group and remove all the R's in the intersection from the sets you intersected. Then repeat the (terribly inefficient) combinatorial comparison against the sets as they are left. Rinse and repeat, finding the largest intersection and removing it from the sets each time, until no R's are left in any set.
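The loop described above could be sketched roughly as follows. This is only one reading of the approach: the `Group` method, the tuple-based return shape and the bitmask enumeration of combinations are assumptions made for illustration, and the enumeration is exponential, so it is only usable for a small number of distinct values (well under 31). As a greedy heuristic it is not guaranteed to reach the minimum total number of R's.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyGroupingExample
{
    // byValue: value -> set of Rs that contain it (the inverted list).
    // Returns groups as (Rs, values) pairs.
    public static List<Tuple<List<string>, List<string>>> Group(
        Dictionary<string, HashSet<string>> byValue)
    {
        // Work on a copy so the caller's sets are not modified.
        var remaining = byValue.ToDictionary(
            kv => kv.Key, kv => new HashSet<string>(kv.Value));
        var groups = new List<Tuple<List<string>, List<string>>>();

        while (remaining.Values.Any(rs => rs.Count > 0))
        {
            // Only values that still have Rs left take part in this round.
            var keys = remaining.Where(kv => kv.Value.Count > 0)
                                .Select(kv => kv.Key).ToList();
            List<string> bestValues = null;
            HashSet<string> bestRs = null;
            int bestScore = 0;

            // Try every non-empty combination of values (exponential).
            for (int mask = 1; mask < (1 << keys.Count); mask++)
            {
                var subset = keys.Where((k, i) => (mask & (1 << i)) != 0).ToList();
                var common = new HashSet<string>(remaining[subset[0]]);
                foreach (var k in subset.Skip(1))
                    common.IntersectWith(remaining[k]);

                // Score = intersection count * number of sets crossed.
                int score = common.Count * subset.Count;
                if (score > bestScore)
                {
                    bestScore = score;
                    bestValues = subset;
                    bestRs = common;
                }
            }

            // Emit the best group, then remove its Rs from the sets it used.
            groups.Add(Tuple.Create(bestRs.OrderBy(r => r).ToList(), bestValues));
            foreach (var k in bestValues)
                remaining[k].ExceptWith(bestRs);
        }
        return groups;
    }

    public static void Main()
    {
        // Inverted list for the first example in the question.
        var byValue = new Dictionary<string, HashSet<string>>
        {
            { "M", new HashSet<string> { "R1", "R2", "R3" } },
            { "N", new HashSet<string> { "R1", "R2" } },
            { "L", new HashSet<string> { "R1", "R2" } },
            { "A", new HashSet<string> { "R2" } },
        };

        foreach (var g in Group(byValue))
            Console.WriteLine(string.Join(", ", g.Item1) + ": " + string.Join(" ", g.Item2));
    }
}
```

On the first example, the first group chosen is R1, R2 with M, N and L (score 2 * 3 = 6). R3: M and R2: A then follow in either order, since they tie with a score of 1.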