I am interested to find an absolute value (not an approximation) of "combination without repetition" for given \$n\$ and \$k\,ドル or \$\binom{n}{k}\$.
The brute force solution would look like this
private static ulong Factorial(int x)
{
ulong res = 1;
while (x > 1)
{
res *= (ulong)x--;
}
return res;
}
public static int Combination0(int k, int n)
{
k = Math.Min(k, n - k);
if (n < 2 || k < 1) return 1;
if (k == 1) return n;
return (int)(Factorial(n) / (Factorial(k) * Factorial(n - k)));
}
We can slightly optimize this solution, by finding \$\prod_{n\geq i>k}{i}\$ instead of \$\frac{n!}{(n-k)!}\$.
private static ulong Factorial(int x, int until = 0)
{
ulong res = 1;
while (x > until)
{
res *= (ulong)x--;
}
return res;
}
public static int Combination1(int k, int n)
{
k = Math.Min(k, n - k);
if (n < 2 || k < 1) return 1;
if (k == 1) return n;
return (int)(Factorial(n, n - k) / Factorial(k));
}
But these two solutions have one significant problem - we are limited by ulong.MaxValue
, which is more than \20ドル!\,ドル but less than \21ドル!\$.
Another way to find the number of combinations, which doesn't have the previously described problem, is the Pascal's triangle.
public static int Combination2(int k, int n)
{
k = Math.Min(k, n - k);
if (n < 2 || k < 1) return 1;
if (k == 1) return n;
int[] triangle = new int[k + 1];
triangle[0] = 1;
// expanding
int i = 0;
for (; i < k; i++)
{
for (int j = i + 1; j > 0; j--)
{
triangle[j] += triangle[j - 1];
}
}
// progressing
for (; i < n - k; i++)
{
for (int j = k; j > 0; j--)
{
triangle[j] += triangle[j - 1];
}
}
// collapsing
for (; i < n; i++)
{
int until = k - (n - i);
for (int j = k; j > until; j--)
{
triangle[j] += triangle[j - 1];
}
}
return triangle[k];
}
example of finding a number of combinations of 8 taken 3
But the problem is that Combination2
is significantly slow.
I would appreciate any comments and suggestions for an improvement.
Update
@quasar and @henrik-hansen suggested the way to prevent overflow by calculating \$\prod_{0 \leq i < k}{\frac{n-i}{i+1}}\$.
3 Answers 3
You should forget about factorial when it comes to Combinations (n, k). Instead you can use the formula: n(n-1)(n-2)...(n-k+1)/(1*2*3*...*k)
. You start with n
and then iterate over x = 1 .. k - 1
and successively multiply with (n-x)
and at the same time reduce by dividing with x
. All in all it ends up like this:
public ulong Combinations(ulong n, ulong k)
{
ulong count = n;
for (ulong x = 1; x <= k - 1; x++)
{
count = count * (n - x) / x;
}
return count / k;
}
In this way you prevent overflow from intermediate factorial calculations.
-
1\$\begingroup\$ Slightly modified your solution. Thank you! \$\endgroup\$pgs– pgs2018年06月17日 14:09:31 +00:00Commented Jun 17, 2018 at 14:09
-
\$\begingroup\$ @pgs: It looks good, but when I compare some of your algorithms against mine, mine "wins" in all cases. I think, the cause that mine performs rather badly in you tests may be the conversions you do from int to ulong? Why do you stick to ints? \$\endgroup\$user73941– user739412018年06月18日 06:27:30 +00:00Commented Jun 18, 2018 at 6:27
-
\$\begingroup\$ Indeed, I changed
ulong
toint
where it makes sense. Results of benchmarking now are updated. \$\endgroup\$pgs– pgs2018年06月18日 14:15:38 +00:00Commented Jun 18, 2018 at 14:15
One trick is to keep the partial products in small, \${n}\choose{k}\$ so they don't overflow.
I iteratively multiply \$n/(n-k)\$ by \$(n-1)/(n-k-1)\,ドル cache the result in an accumulator, multiply that by \$(n-2)/(n-k-2)\$ and so forth.
#include <iostream>
template <class T>
T choose(T n, T k)
{
T accum = 1;
T m = n;
for (T i = 1; i <= m - k; i++)
{
accum = accum * n / (n - k);
n--;
}
return accum;
}
int main()
{
std::cout << std::fixed;
long double n = 50, k = 25;
std::cout << "\nLDBL_MAX" << LDBL_MAX;
long double result = choose(n, k);
std::cout << "\nC(" << n << "," << k << ") = " << result;
std::cin.clear();
std::cin.get();
return 0;
}
-
1\$\begingroup\$ It works if you use floating point types for
T
but not for integer types! \$\endgroup\$user73941– user739412018年06月17日 07:02:00 +00:00Commented Jun 17, 2018 at 7:02
I think the question of which method is faster depends on your intended use. FWIW I've found that if I use any C(N,k) in a program then I tend to use a lot of them. In this case the fastest way can be to make a table of all the C(N,k) (0<=k<=N, N<=Nmax) at program start, and then the speed of the routine for calculating C(N,k) is less important. Anyway, I think that the fastest way to do this is to use the recurrence relation; my tests show it's around 100 times faster than Henrik's method for Nmax = 57.
The recurrence relation method is also able to compute a few more rows than the others. I reckon overflow strikes Henrik's method around N=58, whereas the recurrence relation is good up to N=68
Finally in production code I would want to see at least a claim, ideally a reference to a proof, that where there are integer divisions the left hand side is indeed a multiple of the right hand side. This is the case in Henrik's method, but as Henrik points out, not in Quasars's. The problem with not having such a claim is that, say in debugging, someone might stare at the code and wonder if they are losing accuracy.
Explore related questions
See similar questions with these tags.