Pascal Triangle calculation with BigIntegers

Question 1

On a well known programming challenges website(CW) there is a Kyu2 eggs height problem that can be solved utilizing a variation of Pascals Triangles properties.

The problem asks to solve f(n, m) and the solution is the sum of Pascal's Triangle row m up to cell/column n, so the sum of f(i) = (m + 1 - i) * f(i-1) / i for i = 1 to n.

On the second variation of that problem (Kyu1) it is asked that you solve the same challenge linearly with the f(n, m) output being orders of magnitude bigger.

I've tried to solve the challenge with the code below, but it exceed the allowed time constrains to solve the challenge.

using System.Numerics;
public class EggChallenge
{
 private static BigInteger mo = new BigInteger(998244353);
 public static BigInteger Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : BigInteger.Remainder(SumHeight(n+1, m), mo);
 }
 public static BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger BigInt = new BigInteger(1);
 BigInteger BigIntegerSum = new BigInteger(0);
 for(int i = 1; i < n; i++)
 {
 BigInt = BigInteger.Divide(BigInteger.Multiply(m + 1 - i, BigInt), (BigInteger)i);
 BigIntegerSum = BigInteger.Add(BigIntegerSum, BigInt);
 }
 return BigIntegerSum;
 }
}

The above code solves test cases such as:

EggChallenge.Height(10000,100000);
EggChallenge.Height(80000,100000);
EggChallenge.Height(3000,BigInteger.Pow(2,200));

But it's not able to beat the actual challenge time constrains.

I don't think the algorithm is what to improve has it seems to run at O(n), but I could be very wrong.

So far I've tried memorization in a BigInteger[] to release the cost of the constant addition, but it performed worst then the current code.

My question is how would this code be improved performance wise, which techniques could be applied to improve code performance?, and if possible why are those the best to apply?

initial challenge details
linear challenge details

Question 2

using System.Numerics;
public class EggChallenge
{
 private static BigInteger mo = new BigInteger(998244353);
 public static BigInteger Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : BigInteger.Remainder(SumHeight(n+1, m), mo);
 }
 public static BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger BigInt = new BigInteger(1);
 BigInteger BigIntegerSum = new BigInteger(0);
 for(int i = 1; i < n; i++)
 {
 BigInt = BigInteger.Divide(BigInteger.Multiply(m + 1 - i, BigInt), (BigInteger)i);
 BigIntegerSum = BigInteger.Add(BigIntegerSum, BigInt);
 }
 return BigIntegerSum;
 }
}

C# supports operator overloading subject to some constraints, and BigInteger has all of the arithmetical operators implemented. So you can improve legibility a lot by using standard operators rather than BigInteger.OperatorName(foo, bar).

Local variables in C# normally begin with a lower case letter, so using upper case is not very helpful. In addition, BigInt doesn't tell me anything useful: I know that the type is BigInteger because I can see the declaration, but what does the variable mean?

Renaming and using overloaded operators I get rewritten code which IMO is easier to read:

 BigInteger Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : SumHeight(n + 1, m) % mo;
 }
 BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger term = 1;
 BigInteger sum = 0;
 for (int i = 1; i < n; i++)
 {
 term = (m + 1 - i) * term / i;
 sum += term;
 }
 return sum;
 }

I don't think the algorithm is what to improve has it seems to run at O(n), but I could be very wrong.

I'm afraid that you are very wrong. The code is performing \$O(n)\$ arithmetical operations on BigIntegers, but an arithmetical operation on a BigInteger is not \$O(1)\$. The fundamental problem is this:

 ... SumHeight(n + 1, m) % mo;

It should instead be

 ... SumHeight(n + 1, m, mo);

with the % being applied regularly inside the loop:

 for (int i = 1; i < n; i++)
 {
 term = (m + 1 - i) * term % mo;
 term = ModDiv(term, i, mo); // I'll come back to this
 sum += term;
 }
 return sum % mo;

The massive advantage is that you no longer need BigInteger: it suffices to use long (or maybe ulong: it's a matter of taste because 998244353 was chosen such that long would work in languages like Java which don't have ulong).

I infer from your comment on an earlier answer that you haven't studied modular arithmetic. This is quite understandable: I think it's safe to say that 99% of programmers will never use it in their careers. But although it is irrelevant for most line-of-business programming, it is quite important in code contests/challenges (interpreted broadly to include katas, Project Euler, etc.) because they frequently use modular arithmetic to make large test cases feasible. If you find katas either enjoyable or useful for your CV, it's probably worth your time to study.

The properties I'm relying on are (treating % as an operator on arbitrary-width integers and assuming that we're avoiding overflow, which is % Pow(2, IntegerTypeWidth)):

((a % m) + (b % m)) % m == (a + b) % m
((a % m) * (b % m)) % m == (a * b) % m

Now I can come back to modular division. Essentially ModDiv(term, i, mo) needs to calculate a modular reciprocal of i (i.e. j such that i * j % mo == 1) and then return term * j % mo. There is always such a j as long as it is coprime with mo, and 998244353 is deliberately chosen to be a prime number, deliberately chosen so that you can divide by any number which isn't a multiple of it. The standard approach to calculate a modular reciprocal is the extended Euclidean algorithm, and is described in so many millions of web pages that I won't try to give my own explanation here.

To wrap up the explanation, I must address a point raised by Rick Davin. If you rewrite SumHeight so that n and m are both long, how to handle the test case m = BigInteger.Pow(2, 200)? If we look at how m is used, it takes part in addition and multiplication. So the way to do this is to do the modulo in BigInteger and then cast to long:

 long Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : SumHeight((long)n + 1, (long)(m % mo), mo);
 }

Question 3

Should probably be long since (ulong)Math.Pow(2, 200) returns 0, which would short-circuit to return 0 in Height, whereas (long)Math.Pow(2, 200) returns -9223372036854775808.

Question 4

@RickDavin, I don't quite see it that way, but you indirectly raise an important point about my failure to address extreme values of m. In short: m should be taken % mo using BigInteger and then cast to long or ulong. I'll edit to address this later.

Question 5

Peter thanks for your input, I'll need to delve deeper into modular reciprocals to understand the logic. Excuse my naivety but a couple question. Won't performing modulus per iteration have a big cost? plus the cost of calculating the modular reciprocals? I don't see the traded off (unless BigInt is really that expensive), and applying % mo per term will it not distort the calculation for the next iteration of term? Should that correct itself in the implementation of ModDiv().

Question 6

@Miguel_Ryu, yes, it's expensive, but it's still going to be a lot cheaper than working with 600000-bit numbers (in the extreme of n = 3000, m = 2**200. Applying it to term is fine unless you need to divide by a multiple of the modulus, but since n is never as large as the modulus and i < n that's not a problem here.

Question 7

I've got a version that is slightly faster. I'd like to think the whole algorithm could be improved but it's too early in the morning for me to concentrate on that. What I would caution you is that sometimes casting can hurt performance, most notably if you are casting hundreds of millions of times.

That said there are lots of implicit castings going on in your code. A couple of examples:

i < n
m + 1 - i

Certainly m needs to be a BigInteger to hold Math.Pow(2, 200). But the constraint is that n will be less than 80K. So n could be a simple integer, which means there will be no implicit casting in the for conditional.

So I took your code and tried removing all such implicit castings to BigInteger while making n an Int32. I got over 10% improvement. On a side note, local variables should begin with a lowercase letter.

public static BigInteger Height(int n, BigInteger m)
{
 return n == 0 || m == BigInteger.Zero ? BigInteger.Zero : BigInteger.Remainder(SumHeight(n + 1, m), mo);
}
public static BigInteger SumHeight(int n, BigInteger m)
{
 BigInteger bigInt = BigInteger.One;
 BigInteger bigIntegerSum = BigInteger.Zero;
 for (int i = 1; i < n; i++)
 {
 BigInteger bi = new BigInteger(i);
 BigInt = BigInteger.Divide(BigInteger.Multiply(BigInteger.Subtract(BigInteger.Add(m, BigInteger.One), bi), bigInt), bi);
 bigIntegerSum = BigInteger.Add(bigIntegerSum, bigInt);
 }
 return bigIntegerSum;
}

Note that just your for loop alone required i to be cast as BigInteger 3 times (2 implicitly and 1 explicit). My reworking reduces that to only 1. Still, it would be cool to see if there is a better algorithm. Perhaps there should be some binary halving method. Instead of going from the bottom floor up, test the middle floor. If it fails, check the middle between it and the bottom, whereas if it succeeded check the middle between the top?

Question 8

Okay, I've got a fundamental improvement you can make, that will greatly speed up your second test case.

Basically, you're looping to get the sum of all entries in a row of Pascal's triangle up through entry N, right? If you call GetHeight(80000,100000), you're essentially looping through 80,000 values.

But there's another property of Pascal's triangle you can exploit: the sum of the entire row is just 2^RowNum. So for values where your n is more than half your m, you can speed up the process by calculating GetHeight(m-n+1,m), and subtracting it from 2^RowNum.

In my run-through of your second test case, I was able to complete it in 0.8 seconds instead of 5.8.

 public static BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger lengthRemaining = m - n + 1;
 if (lengthRemaining < n)
 {
// yes, the line below is a kludge that will fail on very large m values.
 BigInteger tot = BigInteger.Pow(2, (int)m);
 return tot - SumHeight(lengthRemaining, m);
 }
 BigInteger BigInt = new BigInteger(1);
 BigInteger BigIntegerSum = new BigInteger(0);
 for (int i = 1; i < n; i++)
 {
 BigInt = ((m + 1 - i) * BigInt) / (BigInteger)i;
 BigIntegerSum = BigIntegerSum + BigInt;
 }
 return BigIntegerSum;
 }

Question 9

If n is BigInteger then i needs to be BigInteger also.

Taking the modus of 998244353 in the loop will prevent BigInteger from having to expand.

Don't need BigInteger for 998244353. That will fit in Int64;

Kind of hard for me to believe n is BigInteger is the requirement. According to the links n in just integer. You are wasting CPU by not being exact.

for(int i = 1; i < n; i++)

This is not iterating i = 1 to n

Shorter syntax but not sure it is any better.
Because of integer rounding these are not same.

term = (m + 1 - i) * term / i; 
term *= (m + 1) / i - 1;

Just looking at I don't see a way to short circuit anything

for (BigInteger i = 1; i < n; i++)
{
 term *= (m + 1) / i - 1;
 sum += term;
}

Question 10

"Shorter syntax but not sure it is any better": it's much worse, because it might not give the same result.

Question 11

@PeterTaylor How? Do you have an example?

Question 12

It's integer division. Why should \$m + 1\$ be divisible by \$n!\$?

Question 13

@PeterTaylor Yes I figured that out an revised the question. Thanks

Question 14

May I ask why the down vote.

Peter Taylor Peter Taylor 24.5k1 gold badge49 silver badges94 bronze badges · Answer 1 · 2018-05-03 14:22:48Z

using System.Numerics;
public class EggChallenge
{
 private static BigInteger mo = new BigInteger(998244353);
 public static BigInteger Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : BigInteger.Remainder(SumHeight(n+1, m), mo);
 }
 public static BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger BigInt = new BigInteger(1);
 BigInteger BigIntegerSum = new BigInteger(0);
 for(int i = 1; i < n; i++)
 {
 BigInt = BigInteger.Divide(BigInteger.Multiply(m + 1 - i, BigInt), (BigInteger)i);
 BigIntegerSum = BigInteger.Add(BigIntegerSum, BigInt);
 }
 return BigIntegerSum;
 }
}

C# supports operator overloading subject to some constraints, and BigInteger has all of the arithmetical operators implemented. So you can improve legibility a lot by using standard operators rather than BigInteger.OperatorName(foo, bar).

Local variables in C# normally begin with a lower case letter, so using upper case is not very helpful. In addition, BigInt doesn't tell me anything useful: I know that the type is BigInteger because I can see the declaration, but what does the variable mean?

Renaming and using overloaded operators I get rewritten code which IMO is easier to read:

 BigInteger Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : SumHeight(n + 1, m) % mo;
 }
 BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger term = 1;
 BigInteger sum = 0;
 for (int i = 1; i < n; i++)
 {
 term = (m + 1 - i) * term / i;
 sum += term;
 }
 return sum;
 }

I don't think the algorithm is what to improve has it seems to run at O(n), but I could be very wrong.

I'm afraid that you are very wrong. The code is performing \$O(n)\$ arithmetical operations on BigIntegers, but an arithmetical operation on a BigInteger is not \$O(1)\$. The fundamental problem is this:

 ... SumHeight(n + 1, m) % mo;

It should instead be

 ... SumHeight(n + 1, m, mo);

with the % being applied regularly inside the loop:

 for (int i = 1; i < n; i++)
 {
 term = (m + 1 - i) * term % mo;
 term = ModDiv(term, i, mo); // I'll come back to this
 sum += term;
 }
 return sum % mo;

The massive advantage is that you no longer need BigInteger: it suffices to use long (or maybe ulong: it's a matter of taste because 998244353 was chosen such that long would work in languages like Java which don't have ulong).

I infer from your comment on an earlier answer that you haven't studied modular arithmetic. This is quite understandable: I think it's safe to say that 99% of programmers will never use it in their careers. But although it is irrelevant for most line-of-business programming, it is quite important in code contests/challenges (interpreted broadly to include katas, Project Euler, etc.) because they frequently use modular arithmetic to make large test cases feasible. If you find katas either enjoyable or useful for your CV, it's probably worth your time to study.

The properties I'm relying on are (treating % as an operator on arbitrary-width integers and assuming that we're avoiding overflow, which is % Pow(2, IntegerTypeWidth)):

((a % m) + (b % m)) % m == (a + b) % m
((a % m) * (b % m)) % m == (a * b) % m

Now I can come back to modular division. Essentially ModDiv(term, i, mo) needs to calculate a modular reciprocal of i (i.e. j such that i * j % mo == 1) and then return term * j % mo. There is always such a j as long as it is coprime with mo, and 998244353 is deliberately chosen to be a prime number, deliberately chosen so that you can divide by any number which isn't a multiple of it. The standard approach to calculate a modular reciprocal is the extended Euclidean algorithm, and is described in so many millions of web pages that I won't try to give my own explanation here.

To wrap up the explanation, I must address a point raised by Rick Davin. If you rewrite SumHeight so that n and m are both long, how to handle the test case m = BigInteger.Pow(2, 200)? If we look at how m is used, it takes part in addition and multiplication. So the way to do this is to do the modulo in BigInteger and then cast to long:

 long Height(BigInteger n, BigInteger m)
 {
 return n == 0 || m == 0 ? 0 : SumHeight((long)n + 1, (long)(m % mo), mo);
 }

Should probably be long since (ulong)Math.Pow(2, 200) returns 0, which would short-circuit to return 0 in Height, whereas (long)Math.Pow(2, 200) returns -9223372036854775808.
@RickDavin, I don't quite see it that way, but you indirectly raise an important point about my failure to address extreme values of m. In short: m should be taken % mo using BigInteger and then cast to long or ulong. I'll edit to address this later.
Peter thanks for your input, I'll need to delve deeper into modular reciprocals to understand the logic. Excuse my naivety but a couple question. Won't performing modulus per iteration have a big cost? plus the cost of calculating the modular reciprocals? I don't see the traded off (unless BigInt is really that expensive), and applying % mo per term will it not distort the calculation for the next iteration of term? Should that correct itself in the implementation of ModDiv().
@Miguel_Ryu, yes, it's expensive, but it's still going to be a lot cheaper than working with 600000-bit numbers (in the extreme of n = 3000, m = 2**200. Applying it to term is fine unless you need to divide by a multiple of the modulus, but since n is never as large as the modulus and i < n that's not a problem here.

Rick Davin Rick Davin 6,7321 gold badge20 silver badges32 bronze badges · Answer 2 · 2018-05-03 13:15:33Z

I've got a version that is slightly faster. I'd like to think the whole algorithm could be improved but it's too early in the morning for me to concentrate on that. What I would caution you is that sometimes casting can hurt performance, most notably if you are casting hundreds of millions of times.

That said there are lots of implicit castings going on in your code. A couple of examples:

i < n
m + 1 - i

Certainly m needs to be a BigInteger to hold Math.Pow(2, 200). But the constraint is that n will be less than 80K. So n could be a simple integer, which means there will be no implicit casting in the for conditional.

So I took your code and tried removing all such implicit castings to BigInteger while making n an Int32. I got over 10% improvement. On a side note, local variables should begin with a lowercase letter.

public static BigInteger Height(int n, BigInteger m)
{
 return n == 0 || m == BigInteger.Zero ? BigInteger.Zero : BigInteger.Remainder(SumHeight(n + 1, m), mo);
}
public static BigInteger SumHeight(int n, BigInteger m)
{
 BigInteger bigInt = BigInteger.One;
 BigInteger bigIntegerSum = BigInteger.Zero;
 for (int i = 1; i < n; i++)
 {
 BigInteger bi = new BigInteger(i);
 BigInt = BigInteger.Divide(BigInteger.Multiply(BigInteger.Subtract(BigInteger.Add(m, BigInteger.One), bi), bigInt), bi);
 bigIntegerSum = BigInteger.Add(bigIntegerSum, bigInt);
 }
 return bigIntegerSum;
}

Note that just your for loop alone required i to be cast as BigInteger 3 times (2 implicitly and 1 explicit). My reworking reduces that to only 1. Still, it would be cool to see if there is a better algorithm. Perhaps there should be some binary halving method. Instead of going from the bottom floor up, test the middle floor. If it fails, check the middle between it and the bottom, whereas if it succeeded check the middle between the top?

Kevin Kevin 3461 silver badge4 bronze badges · Answer 3 · 2018-05-03 19:47:44Z

Okay, I've got a fundamental improvement you can make, that will greatly speed up your second test case.

Basically, you're looping to get the sum of all entries in a row of Pascal's triangle up through entry N, right? If you call GetHeight(80000,100000), you're essentially looping through 80,000 values.

But there's another property of Pascal's triangle you can exploit: the sum of the entire row is just 2^RowNum. So for values where your n is more than half your m, you can speed up the process by calculating GetHeight(m-n+1,m), and subtracting it from 2^RowNum.

In my run-through of your second test case, I was able to complete it in 0.8 seconds instead of 5.8.

 public static BigInteger SumHeight(BigInteger n, BigInteger m)
 {
 BigInteger lengthRemaining = m - n + 1;
 if (lengthRemaining < n)
 {
// yes, the line below is a kludge that will fail on very large m values.
 BigInteger tot = BigInteger.Pow(2, (int)m);
 return tot - SumHeight(lengthRemaining, m);
 }
 BigInteger BigInt = new BigInteger(1);
 BigInteger BigIntegerSum = new BigInteger(0);
 for (int i = 1; i < n; i++)
 {
 BigInt = ((m + 1 - i) * BigInt) / (BigInteger)i;
 BigIntegerSum = BigIntegerSum + BigInt;
 }
 return BigIntegerSum;
 }

paparazzo paparazzo 6,1263 gold badges20 silver badges41 bronze badges · Answer 4 · 2018-05-03 16:16:48Z

If n is BigInteger then i needs to be BigInteger also.

Taking the modus of 998244353 in the loop will prevent BigInteger from having to expand.

Don't need BigInteger for 998244353. That will fit in Int64;

Kind of hard for me to believe n is BigInteger is the requirement. According to the links n in just integer. You are wasting CPU by not being exact.

for(int i = 1; i < n; i++)

This is not iterating i = 1 to n

Shorter syntax but not sure it is any better.
Because of integer rounding these are not same.

term = (m + 1 - i) * term / i; 
term *= (m + 1) / i - 1;

Just looking at I don't see a way to short circuit anything

for (BigInteger i = 1; i < n; i++)
{
 term *= (m + 1) / i - 1;
 sum += term;
}

"Shorter syntax but not sure it is any better": it's much worse, because it might not give the same result.
It's integer division. Why should \$m + 1\$ be divisible by \$n!\$?
@PeterTaylor Yes I figured that out an revised the question. Thanks

Stack Exchange Network

Pascal Triangle calculation with BigIntegers

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Pascal Triangle calculation with BigIntegers

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions