Identifying a starting point for maximum coverage

Asked 8 years, 6 months ago

Viewed 53 times

$\begingroup$

I have 10,000 sequentially numbered items I wish to characterize. For each item, I created a list of the other items I think might have a similar characteristic.

For example:

Item 1: [2,4,7,8,9,...,9489] (list contains 250 items)
Item 2: [1,4,13,23,...,9424] (list contains 12 items)
Item 3: [1,2,4,7,...,9489] (list contains 140 items)
Item 4: [1,3,7,9,...,9211] (list contains 250 items)
Item 5: [1,3,7,9,...,9221] (list contains 250 items)
Item 6: [1,2,7,25,...,9248] (list contains 241 items)
Item 7: [4,5,6] (list contains 3 items)

If I only have time to test 50 items, how might I choose which items to test so that the maximum number of items is represented? The identified similarity is not necessarily bidirectional: notice that item 7's list contains 4, 5, and 6 but not 1 or 3 (both of which have 7 in their lists).

Some lists may be very similar or even identical, as seen in items 4 and 5. Testing both 4 and 5 would not achieve the maximum coverage.

algorithms

asked Mar 25, 2017 at 7:18

Mark Brown

$\endgroup$

$\begingroup$ The first step would be to find a quantitative objective function to optimize. Without this, all we can offer is common-sense heuristics that you can think of on your own. $\endgroup$

– Yuval Filmus
Commented Mar 25, 2017 at 7:31
$\begingroup$ I don't yet have a function to optimize, which is why I didn't post this on StackOverflow. Let's say I order the lists by the number of items in descending order. For each other list, I then check whether the items in list 1 match more than 99% of its items; if so, I remove that other list. Then I go back through and do the same starting with list 2. Does this sound reasonable? $\endgroup$

– Mark Brown
Commented Mar 25, 2017 at 7:55
$\begingroup$ That's your common-sense heuristic, and it's as good as anybody's. $\endgroup$

– Yuval Filmus
Commented Mar 25, 2017 at 8:07
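
The pruning heuristic proposed in the comments could be sketched roughly as follows. This is only one reading of it: the function name `prune_near_duplicates`, the `threshold` parameter, and the choice to measure overlap as a fraction of the shorter (later) list are all assumptions, not something stated in the thread.

```python
from typing import Dict, Iterable, List, Set


def prune_near_duplicates(
    lists: Dict[int, Iterable[int]], threshold: float = 0.99
) -> List[int]:
    """Return the items whose lists survive near-duplicate removal."""
    sets: Dict[int, Set[int]] = {item: set(sim) for item, sim in lists.items()}
    # Longest lists first, as proposed; ties keep their input order.
    order = sorted(sets, key=lambda item: len(sets[item]), reverse=True)
    kept: List[int] = []
    for item in order:
        current = sets[item]
        # Assumption: a list is redundant when some already-kept list
        # contains more than `threshold` of its items.
        redundant = any(
            len(current) > 0
            and len(current & sets[k]) / len(current) > threshold
            for k in kept
        )
        if not redundant:
            kept.append(item)
    return kept
```

Note that this only removes redundant lists; it does not by itself pick 50 items, so some selection rule would still be needed afterwards.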

1 Answer

$\begingroup$

It sounds like you want to select a set of 50 lists whose union contains as many elements as possible. This is the maximum coverage problem. It is NP-hard, but standard techniques give decent-quality solutions: the greedy algorithm (repeatedly pick the list that covers the most not-yet-covered elements) is a (1 − 1/e)-approximation, and the problem can also be formulated as an integer linear program (ILP).
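
A minimal sketch of the greedy approach, assuming the lists are stored in a dict keyed by item number. The function name `greedy_max_coverage`, the `count_self` flag (whether testing an item also counts as representing that item), and the toy data are illustrative assumptions, not part of the question.

```python
from typing import Dict, Iterable, List, Set, Tuple


def greedy_max_coverage(
    lists: Dict[int, Iterable[int]], budget: int, count_self: bool = True
) -> Tuple[List[int], Set[int]]:
    """Pick up to `budget` items whose lists jointly cover the most items."""
    # Turn each list into a set; optionally let a tested item cover itself.
    cover = {
        item: set(similar) | ({item} if count_self else set())
        for item, similar in lists.items()
    }
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(budget):
        # Choose the item whose set adds the most not-yet-covered items.
        best = max(
            (item for item in cover if item not in chosen),
            key=lambda item: len(cover[item] - covered),
            default=None,
        )
        if best is None or not (cover[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= cover[best]
    return chosen, covered


# Hypothetical toy data; items 4 and 5 have identical lists.
toy = {
    1: [2, 4, 7],
    2: [1, 4],
    3: [1, 2, 4, 7],
    4: [1, 3, 7, 9],
    5: [1, 3, 7, 9],
    6: [1, 2, 7],
    7: [4, 5, 6],
}
chosen, covered = greedy_max_coverage(toy, budget=2)
print(chosen, sorted(covered))
```

Because each round scores a list only by the elements it newly covers, a near-duplicate of an already chosen list has almost no gain and is skipped naturally, so no separate deduplication pass is needed. For the real data, `budget=50` would match the question.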

answered Mar 31, 2017 at 7:09

D.W. ♦

$\endgroup$
