I have n
sets that each contain some elements, for example:
{2, 3}, {6, 3}, {2, 7, 3}, {7}
Now the goal is to find the minimum number of elements that occur in all sets. For this example it would be 2 ({3, 7}
). Some more examples:
{2}, {4}
- minimum is 2 ({2, 4}
)
{4, 15, 3}, {3, 2}, {24, 4, 2}, {24}, {15, 6}
- minimum is 3 ({15, 2, 24}
or {15, 3, 24}
or {3, 24, 6}
)
I don't need to know the set of those minimum number of elements necessarily, just the size of that set is enough.
I've written a recursive algorithm myself already that goes over all possibilities, but the execution time really gets out of hand as the number of different elements rises. Is there an algorithm available (maybe already written but I couldn't find it) for this specific problem? I've looked at the set cover problem, it looks a bit like this one but it's not exactly the same.
2 Answers 2
The problem you are describing is the hitting set problem, and as you suggest is closely related to the set cover problem. Finding a hitting set is NP-hard, so it is no surprise that your implementation will be slow as the number of elements rises!
You can approximate the solution with various approaches such as a greedy algorithm, but this is very much a case of 'pick your battles': whatever approach you choose, you are trading correctness for time.
-
To get from this problem to set cover, you have to reverse the relationship between sets and numbers. So instead of putting number 3 in sets 1, 2 and 3, you make set 3 contain numbers 1, 2 and 3. Then it's set cover.Stack Exchange Broke The Law– Stack Exchange Broke The Law10/05/2020 16:11:20Commented Oct 5, 2020 at 16:11
The problem is NP-complete. That means (look it up in CS) nobody has found an algorithm that is always fast, and likely never will.
You can try to keep the execution time lower. Remove all empty sets. If there is a set with a single element, like 24 in your example, that number goes to your solution because you need it, then you remove that number from the problem. If there is an element common to all sets then you take it and have the solution. If a set is a subset of another, remove the superset. If an element is only in one set, remove it. If an element always comes with the same other element then remove it. So far it’s easy.
Then you switch to your recursive approach. Count how many sets contain each element, and you want to try the ones contained in the largest number of sets first. Keep track of the best solution found and stop examining across paths that can’t give better results. For example if you have 100 sets and covered them with 10 numbers, then you only check numbers recursively that are in at least 12 sets because now you only care about nine or fewer numbers covering all sets.
resultSet
starts with all elements that are members of singleton sets. In your original example,7
. Then, I would remove7
from all sets (removing{7}
entirely rather than leaving it empty), and calculate the intersection of all remaining sets. If the intersection is empty, you find the smallest set, and add all of its elements to the result, and repeat. If it's not empty (in this case, it will be{3}
), remove the intersection elements from all sets. Repeat until all original sets are "checked off".