On a well known programming challenges website(CW) there is a Kyu2 eggs height problem that can be solved utilizing a variation of Pascals Triangles properties.
The problem asks to solve f(n, m)
and the solution is the sum of Pascal's Triangle row m
up to cell/column n
, so the sum of
f(i) = (m + 1 - i) * f(i-1) / i
for i = 1 to n
.
On the second variation of that problem (Kyu1) it is asked that you solve the same challenge linearly with the f(n, m)
output being orders of magnitude bigger.
I've tried to solve the challenge with the code below, but it exceed the allowed time constrains to solve the challenge.
using System.Numerics;
public class EggChallenge
{
private static BigInteger mo = new BigInteger(998244353);
public static BigInteger Height(BigInteger n, BigInteger m)
{
return n == 0 || m == 0 ? 0 : BigInteger.Remainder(SumHeight(n+1, m), mo);
}
public static BigInteger SumHeight(BigInteger n, BigInteger m)
{
BigInteger BigInt = new BigInteger(1);
BigInteger BigIntegerSum = new BigInteger(0);
for(int i = 1; i < n; i++)
{
BigInt = BigInteger.Divide(BigInteger.Multiply(m + 1 - i, BigInt), (BigInteger)i);
BigIntegerSum = BigInteger.Add(BigIntegerSum, BigInt);
}
return BigIntegerSum;
}
}
The above code solves test cases such as:
EggChallenge.Height(10000,100000);
EggChallenge.Height(80000,100000);
EggChallenge.Height(3000,BigInteger.Pow(2,200));
But it's not able to beat the actual challenge time constrains.
I don't think the algorithm is what to improve has it seems to run at O(n), but I could be very wrong.
So far I've tried memorization in a BigInteger[]
to release the cost of the constant addition, but it performed worst then the current code.
My question is how would this code be improved performance wise, which techniques could be applied to improve code performance?, and if possible why are those the best to apply?
4 Answers 4
using System.Numerics; public class EggChallenge { private static BigInteger mo = new BigInteger(998244353); public static BigInteger Height(BigInteger n, BigInteger m) { return n == 0 || m == 0 ? 0 : BigInteger.Remainder(SumHeight(n+1, m), mo); } public static BigInteger SumHeight(BigInteger n, BigInteger m) { BigInteger BigInt = new BigInteger(1); BigInteger BigIntegerSum = new BigInteger(0); for(int i = 1; i < n; i++) { BigInt = BigInteger.Divide(BigInteger.Multiply(m + 1 - i, BigInt), (BigInteger)i); BigIntegerSum = BigInteger.Add(BigIntegerSum, BigInt); } return BigIntegerSum; } }
C# supports operator overloading subject to some constraints, and BigInteger
has all of the arithmetical operators implemented. So you can improve legibility a lot by using standard operators rather than BigInteger.OperatorName(foo, bar)
.
Local variables in C# normally begin with a lower case letter, so using upper case is not very helpful. In addition, BigInt
doesn't tell me anything useful: I know that the type is BigInteger
because I can see the declaration, but what does the variable mean?
Renaming and using overloaded operators I get rewritten code which IMO is easier to read:
BigInteger Height(BigInteger n, BigInteger m)
{
return n == 0 || m == 0 ? 0 : SumHeight(n + 1, m) % mo;
}
BigInteger SumHeight(BigInteger n, BigInteger m)
{
BigInteger term = 1;
BigInteger sum = 0;
for (int i = 1; i < n; i++)
{
term = (m + 1 - i) * term / i;
sum += term;
}
return sum;
}
I don't think the algorithm is what to improve has it seems to run at O(n), but I could be very wrong.
I'm afraid that you are very wrong. The code is performing \$O(n)\$ arithmetical operations on BigInteger
s, but an arithmetical operation on a BigInteger
is not \$O(1)\$. The fundamental problem is this:
... SumHeight(n + 1, m) % mo;
It should instead be
... SumHeight(n + 1, m, mo);
with the %
being applied regularly inside the loop:
for (int i = 1; i < n; i++)
{
term = (m + 1 - i) * term % mo;
term = ModDiv(term, i, mo); // I'll come back to this
sum += term;
}
return sum % mo;
The massive advantage is that you no longer need BigInteger
: it suffices to use long
(or maybe ulong
: it's a matter of taste because 998244353
was chosen such that long
would work in languages like Java which don't have ulong
).
I infer from your comment on an earlier answer that you haven't studied modular arithmetic. This is quite understandable: I think it's safe to say that 99% of programmers will never use it in their careers. But although it is irrelevant for most line-of-business programming, it is quite important in code contests/challenges (interpreted broadly to include katas, Project Euler, etc.) because they frequently use modular arithmetic to make large test cases feasible. If you find katas either enjoyable or useful for your CV, it's probably worth your time to study.
The properties I'm relying on are (treating %
as an operator on arbitrary-width integers and assuming that we're avoiding overflow, which is % Pow(2, IntegerTypeWidth)
):
((a % m) + (b % m)) % m == (a + b) % m
((a % m) * (b % m)) % m == (a * b) % m
Now I can come back to modular division. Essentially ModDiv(term, i, mo)
needs to calculate a modular reciprocal of i
(i.e. j
such that i * j % mo == 1
) and then return term * j % mo
. There is always such a j
as long as it is coprime with mo
, and 998244353
is deliberately chosen to be a prime number, deliberately chosen so that you can divide by any number which isn't a multiple of it. The standard approach to calculate a modular reciprocal is the extended Euclidean algorithm, and is described in so many millions of web pages that I won't try to give my own explanation here.
To wrap up the explanation, I must address a point raised by Rick Davin. If you rewrite SumHeight
so that n
and m
are both long
, how to handle the test case m = BigInteger.Pow(2, 200)
? If we look at how m
is used, it takes part in addition and multiplication. So the way to do this is to do the modulo in BigInteger
and then cast to long
:
long Height(BigInteger n, BigInteger m)
{
return n == 0 || m == 0 ? 0 : SumHeight((long)n + 1, (long)(m % mo), mo);
}
-
\$\begingroup\$ Should probably be
long
since(ulong)Math.Pow(2, 200)
returns 0, which would short-circuit to return 0 inHeight
, whereas(long)Math.Pow(2, 200)
returns -9223372036854775808. \$\endgroup\$Rick Davin– Rick Davin2018年05月03日 14:56:40 +00:00Commented May 3, 2018 at 14:56 -
\$\begingroup\$ @RickDavin, I don't quite see it that way, but you indirectly raise an important point about my failure to address extreme values of
m
. In short:m
should be taken% mo
usingBigInteger
and then cast tolong
orulong
. I'll edit to address this later. \$\endgroup\$Peter Taylor– Peter Taylor2018年05月03日 14:59:36 +00:00Commented May 3, 2018 at 14:59 -
\$\begingroup\$ Peter thanks for your input, I'll need to delve deeper into modular reciprocals to understand the logic. Excuse my naivety but a couple question. Won't performing modulus per iteration have a big cost? plus the cost of calculating the modular reciprocals? I don't see the traded off (unless BigInt is really that expensive), and applying
% mo
perterm
will it not distort the calculation for the next iteration ofterm
? Should that correct itself in the implementation ofModDiv()
. \$\endgroup\$Miguel_Ryu– Miguel_Ryu2018年05月03日 18:39:34 +00:00Commented May 3, 2018 at 18:39 -
\$\begingroup\$ @Miguel_Ryu, yes, it's expensive, but it's still going to be a lot cheaper than working with 600000-bit numbers (in the extreme of
n = 3000, m = 2**200
. Applying it to term is fine unless you need to divide by a multiple of the modulus, but sincen
is never as large as the modulus andi < n
that's not a problem here. \$\endgroup\$Peter Taylor– Peter Taylor2018年05月03日 18:52:16 +00:00Commented May 3, 2018 at 18:52
I've got a version that is slightly faster. I'd like to think the whole algorithm could be improved but it's too early in the morning for me to concentrate on that. What I would caution you is that sometimes casting can hurt performance, most notably if you are casting hundreds of millions of times.
That said there are lots of implicit castings going on in your code. A couple of examples:
i < n
m + 1 - i
Certainly m
needs to be a BigInteger to hold Math.Pow(2, 200)
. But the constraint is that n
will be less than 80K
. So n
could be a simple integer, which means there will be no implicit casting in the for
conditional.
So I took your code and tried removing all such implicit castings to BigInteger while making n
an Int32
. I got over 10% improvement. On a side note, local variables should begin with a lowercase letter.
public static BigInteger Height(int n, BigInteger m)
{
return n == 0 || m == BigInteger.Zero ? BigInteger.Zero : BigInteger.Remainder(SumHeight(n + 1, m), mo);
}
public static BigInteger SumHeight(int n, BigInteger m)
{
BigInteger bigInt = BigInteger.One;
BigInteger bigIntegerSum = BigInteger.Zero;
for (int i = 1; i < n; i++)
{
BigInteger bi = new BigInteger(i);
BigInt = BigInteger.Divide(BigInteger.Multiply(BigInteger.Subtract(BigInteger.Add(m, BigInteger.One), bi), bigInt), bi);
bigIntegerSum = BigInteger.Add(bigIntegerSum, bigInt);
}
return bigIntegerSum;
}
Note that just your for
loop alone required i
to be cast as BigInteger
3 times (2 implicitly and 1 explicit). My reworking reduces that to only 1. Still, it would be cool to see if there is a better algorithm. Perhaps there should be some binary halving method. Instead of going from the bottom floor up, test the middle floor. If it fails, check the middle between it and the bottom, whereas if it succeeded check the middle between the top?
Okay, I've got a fundamental improvement you can make, that will greatly speed up your second test case.
Basically, you're looping to get the sum of all entries in a row of Pascal's triangle up through entry N, right? If you call GetHeight(80000,100000)
, you're essentially looping through 80,000 values.
But there's another property of Pascal's triangle you can exploit: the sum of the entire row is just 2^RowNum. So for values where your n is more than half your m, you can speed up the process by calculating GetHeight(m-n+1,m), and subtracting it from 2^RowNum.
In my run-through of your second test case, I was able to complete it in 0.8 seconds instead of 5.8.
public static BigInteger SumHeight(BigInteger n, BigInteger m)
{
BigInteger lengthRemaining = m - n + 1;
if (lengthRemaining < n)
{
// yes, the line below is a kludge that will fail on very large m values.
BigInteger tot = BigInteger.Pow(2, (int)m);
return tot - SumHeight(lengthRemaining, m);
}
BigInteger BigInt = new BigInteger(1);
BigInteger BigIntegerSum = new BigInteger(0);
for (int i = 1; i < n; i++)
{
BigInt = ((m + 1 - i) * BigInt) / (BigInteger)i;
BigIntegerSum = BigIntegerSum + BigInt;
}
return BigIntegerSum;
}
If n
is BigInteger then i
needs to be BigInteger also.
Taking the modus of 998244353 in the loop will prevent BigInteger from having to expand.
Don't need BigInteger for 998244353. That will fit in Int64;
Kind of hard for me to believe n
is BigInteger is the requirement. According to the links n in just integer. You are wasting CPU by not being exact.
for(int i = 1; i < n; i++)
This is not iterating i = 1 to n
Shorter syntax but not sure it is any better.
Because of integer rounding these are not same.
term = (m + 1 - i) * term / i;
term *= (m + 1) / i - 1;
Just looking at I don't see a way to short circuit anything
for (BigInteger i = 1; i < n; i++)
{
term *= (m + 1) / i - 1;
sum += term;
}
-
\$\begingroup\$ "Shorter syntax but not sure it is any better": it's much worse, because it might not give the same result. \$\endgroup\$Peter Taylor– Peter Taylor2018年05月03日 16:37:08 +00:00Commented May 3, 2018 at 16:37
-
\$\begingroup\$ @PeterTaylor How? Do you have an example? \$\endgroup\$paparazzo– paparazzo2018年05月03日 16:43:14 +00:00Commented May 3, 2018 at 16:43
-
\$\begingroup\$ It's integer division. Why should \$m + 1\$ be divisible by \$n!\$? \$\endgroup\$Peter Taylor– Peter Taylor2018年05月03日 17:37:40 +00:00Commented May 3, 2018 at 17:37
-
\$\begingroup\$ @PeterTaylor Yes I figured that out an revised the question. Thanks \$\endgroup\$paparazzo– paparazzo2018年05月03日 17:50:59 +00:00Commented May 3, 2018 at 17:50
-
\$\begingroup\$ May I ask why the down vote. \$\endgroup\$paparazzo– paparazzo2018年05月03日 21:10:25 +00:00Commented May 3, 2018 at 21:10
Explore related questions
See similar questions with these tags.