UPD: The following problem comes from error correction codes, to be precise -- either maximum-likelihood or belief-propagation decoding when the spectrum of special failing subsets is known (to some extent). Then the failure probability can be formulated as follows.
Let us say the universe set is $\mathcal U = \{1,2,\dotsc,N\}$ and we have $M$ subsets of it: $S_1, S_2, \dotsc, S_M$ ($S_i \subseteq \mathcal U$ for each $i$). Given $W,ドル find number of $W$-subsets of $\mathcal U$ that cover at least one of $S_1, S_2, \dotsc, S_M$. Mathematically speaking -- find cardinality of the following set: $$ \{S \subseteq \mathcal U : |S|=W, \exists i \text{ s.t. } S_i \subseteq S\} ,円. $$
NB: union of the sets $S_1, S_2, \dotsc, S_M$ is not necessarily $\mathcal U$.
The algorithm does not need to be the most efficient, as I just need to find these numbers for my other problem. Hence, say, running time of 1 hour is acceptable. (Well, in the worst case, even 1 day might be okay if there is nothing better). The order of parameters I am interested in: $N \approx 200,ドル $M \leq 50,ドル $|S_i| \leq 8,ドル $W \leq 30$.
UPD: It seems that to find exact answer is very hard, therefore I am also interested in good bounds and/or approximations.
Example. $N=4, M=2, S_1=\{1,2\}, S_2=\{2,3\}$.
If $W=1,ドル the answer is 0: there are no 1-subsets that cover either $S_1$ or $S_2$.
If $W=2,ドル the answer is 2: $S_1$ and $S_2$ themselves.
If $W=3,ドル the answer is 3: $\{1,2,3\}$ (covers both $S_1$ and $S_2$), $\{1,2,4\}$ (covers $S_1$), $\{2,3,4\}$ (covers $S_2$).
If $W=4,ドル the answer is 1: $\mathcal U = \{1,2,3,4\}$ covers anything.
UPD: $M$ is now much smaller.
-
$\begingroup$ Do you need the exact number, or is an approximate answer OK too? $\endgroup$D.W.– D.W. ♦2018年01月17日 20:12:44 +00:00Commented Jan 17, 2018 at 20:12
-
$\begingroup$ Well, very close approximation will work. But not heuristics or so $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年01月17日 21:49:05 +00:00Commented Jan 17, 2018 at 21:49
-
$\begingroup$ Perhaps, I should consider also done bounds on the correct answer. I might need to start with Bonferroni inequalities... $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年01月18日 23:06:11 +00:00Commented Jan 18, 2018 at 23:06
-
$\begingroup$ Interesting problem! What's the context where you encountered this problem? Can you credit the source? Is there a motivation from a practical situation? $\endgroup$D.W.– D.W. ♦2018年01月20日 06:23:46 +00:00Commented Jan 20, 2018 at 6:23
-
$\begingroup$ Updated. Also, because of no progress approximate answer is now OK too :) $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年01月20日 08:16:57 +00:00Commented Jan 20, 2018 at 8:16
2 Answers 2
I can suggest two possible methods.
#SAT
This is closely related to #SAT, and one approach might be to try applying an off-the-shelf #SAT (or approximate-#SAT) solver.
Let $t_k$ denote the number of $W$-sets that contain $S_k$ but not any of $S_1,\dots,S_{k-1},ドル i.e., $t_k$ is the cardinality of the set
$$\{S \subseteq \mathcal{U} : |S|=W, S_k \subseteq S, S_1 \not\subseteq S, \dots, S_{k-1} \not\subseteq S\}.$$
It is easy to see that you want to compute $t_1+t_2+\dots+t_M,ドル so if we can figure out how to compute $t_k,ドル we are done.
How shall we compute $t_k$? Well, any set $S$ that meets the conditions must cover $S_k,ドル so all of the elements of $S_k$ are in $S,ドル and the only question is the remaining elements. Set $\mathcal{U}' = \mathcal{U} \setminus S_k,ドル $S'_i = S_i \setminus S_k,ドル and $W' = W-|S_k|$. Then $t_k$ is the number of $W'$-sets (over universe $\mathcal{U}'$) that don't cover any of $S'_1,\dots,S'_{k-1},ドル i.e., the cardinality of the set
$$\{S' \subseteq \mathcal{U}' : |S'|=W', S'_1 \not\subseteq S', \dots, S'_{k-1} \not\subseteq S'\}.$$
As you stare at this, the condition on $S'$ turns out to be expressible as a boolean formula in CNF form. Let $x_1,\dots,x_{N'}$ (where $N'=|\mathcal{U}'|$) be boolean variables; the idea is that $x_i$ is true iff $i \in S$. Then the condition $S'_1 \not\subseteq S', \dots, S'_{k-1} \not\subseteq S'$ is equivalent to a CNF formula on $x_1,\dots,x_{N'}$ with $k-1$ clauses and $N'$ variables; each constraint $S'_i \not\subseteq S'$ induces a clause $\lor_{j \in S'_i} \neg x_j$. Then, $t_k$ is the number of satisfying assignments of weight $k$ for this formula. You can easily conjoin clauses to this formula that require the assignment to have weight $k$.
In this way, we have reduced the problem of computing $t_k$ to the problem of counting the number of satisfying assignments of a boolean formula in CNF form. This is exactly the #SAT problem. The problem is known to be hard, but there are off-the-shelf solvers you could try. There are also solvers that yield an approximation to the number, which are worth trying as well. See, e.g., Is there a sometimes-efficient algorithm to solve #SAT? and https://cstheory.stackexchange.com/q/18135/5038.
I don't know whether this will be fast enough for your needs, but it is a possible approach you could consider.
Inclusion-Exclusion
If $I \subseteq \{1,2,\dots,M\}$ is a set of indices, let $u_I$ denote number of $W$-sets that contain $S_i$ for each $i \in I,ドル i.e., the cardinality of the set
$$\{S \subseteq \mathcal{U} : |S|=W, \forall i \in I . S_i \subseteq S\}.$$
You can express the number you want using the inclusion-exclusion formula. In particular, you want to compute
$$u = \sum_I (-1)^{|I|} \cdot u_I$$
where $I$ ranges over all non-empty subsets of $\{1,2,\dots,M\}$.
Heuristically, we can expect that the larger $I$ is, the smaller $u_I$ will be, so we can approximate the above sum as follows:
$$u \approx \sum_{|I| \le 8} (-1)^{|I|} \cdot u_I.$$
Note that you can compute $u_I$ easily for any given $I$; in particular,
$$u_I = {N-|S_I| \choose W-|S_I|}$$
where we define $S_I = \cup_{i \in U} S_i$. Now since ${50 \choose 8} \approx 2^{29},ドル for your particular parameters it is easy enough to enumerate all index sets $I$ such that $|I|\le 8$ and compute $u_I$ for each, then use the approximate inclusion-exclusion formula above. You can adjust the parameter 8 to trade off computation time vs quality of the approximation.
The disadvantage is that there is no a priori guarantee on how good the approximation will be -- but this might be useful nonetheless.
It might be possible to refine and improve this approach further. While it's true that on average we expect larger sets $I$ correspond to smaller values of $u_I,ドル this isn't necessarily true. Therefore, it's possible that you might get a better approximation by picking a threshold $\tau$ and then using
$$u \approx \sum_{u_I \ge \tau} (-1)^{|I|} \cdot u_I.$$
This requires you to iterate through sets $I$ in order of decreasing $u_I,ドル and then stop once you reach the threshold $\tau$. How do you do this? The key insight is that as you add elements to $I,ドル the value of $u_I$ gets monotonically smaller. Therefore, you can use breadth-first search to explore which elements to add to the set. You start out with all sets $I$ of size 1 and push them onto a priority queue. The elements of the priority queue are ordered by the value of $u_I$. In each step, you pop from the priority queue a set $I$ with the largest value of $u_I$ out of all on the queue; and assuming you haven't processed $I$ before, you add $(-1)^{|I|} \cdot u_I$ to your running total and then push the sets $I \cup \{j\}$ onto the queue for each $j \notin I$.
I suggest trying both variants on inclusion-exclusion to see which gives a better approximation.
-
$\begingroup$ Isn't that the case that in inclusion-exclusion principle $u = \sum_{k=1}^{M} (-1)^k \left( \sum_{|I|=k} u_I \right)$ terms in brackets decrease with increase of $k$? $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年02月01日 08:37:20 +00:00Commented Feb 1, 2018 at 8:37
-
$\begingroup$ @YauhenYakimenka, I'm not sure. If you are asking does $u_I$ decrease as $|I|$ increases, not necessarily. I think you can have sets $I,I'$ with $|I'|>|I|$ but $u_{I'} > u_I$. If you are asking does $\sum_{|I|=k} u_I$ decrease as $k$ increases, I don't know the answer to that. That sounds plausible but you might want to double-check it to be sure. $\endgroup$2018年02月01日 16:27:35 +00:00Commented Feb 1, 2018 at 16:27
-
$\begingroup$ I was meaning the latter... $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年02月01日 17:02:33 +00:00Commented Feb 1, 2018 at 17:02
Is using the Inclusion-Exclusion-Principle (also called Sieve-formula) too inefficient? According to the principle, if the union of any $L+1$ sets out of $S_{1}, S_{2}, \ldots S_{M}$ is bigger than $W$ then the number you are looking for is $$ \sum_{n=1}^{L}{(-1)^n \sum_{i_{1},\ldots,i_{n}} \binom{N - |S_{i_{1}} \cup \ldots \cup S_{i_{n}}|}{W - |S_{i_{1}} \cup \ldots \cup S_{i_{n}}|}} $$ Edit: The complexity is around $M^{\frac{W}{|S_{i}|}}$ if the $S_{i}$s are close to being disjoint and around 2ドル^M$ if the $S_{i}$s are close to being equal, so probably to big indeed.
-
1$\begingroup$ And the complexity then is around 2ドル^M$? For $M=5000$ it doesn't look anything close to realistic... $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年01月18日 08:42:31 +00:00Commented Jan 18, 2018 at 8:42
-
1$\begingroup$ The complexity is around $M^{\frac{W}{|S_{i}|}},ドル but yes probably in most cases to big. $\endgroup$Tore Vincent Carstens– Tore Vincent Carstens2018年01月18日 09:20:55 +00:00Commented Jan 18, 2018 at 9:20
-
$\begingroup$ on the other hand, I now think I can at least use Bonferroni inequalities (i.e. partial sums from Inclusion-Exclusion Principle) to at least bound the exact answer. $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年01月20日 08:19:25 +00:00Commented Jan 20, 2018 at 8:19
-
$\begingroup$ By the way, do you know some good algorithm to calculate the formula, better than straightforward calculation? $\endgroup$Yauhen Yakimenka– Yauhen Yakimenka2018年01月30日 10:36:59 +00:00Commented Jan 30, 2018 at 10:36