I've read other solutions to Project Euler problem 10 (finding the sum of all prime numbers under 2,000,000), but I wanted to try it on my own using the Sieve of Eratosthenes. It runs quickly and seems to work for all of my smaller test cases (limits under 10, the first 1000 primes), but it doesn't produce the correct answer for the full 2,000,000 range. Could anyone help point out the problem in my code? I know it's not the prettiest or most efficient code, so sorry.
#include <iostream>
using namespace std;

int main(int argc, char const *argv[])
{
    // upper limit & sum
    int sum = 0;
    int upLimit = 2000000;
    // array of isPrime check for integers
    bool isPrime[upLimit+1];
    // initialize
    for (int i=2; i <= upLimit; i++) {
        isPrime[i] = true;
    }
    for (int i=2; i<=upLimit; i++) {
        if (isPrime[i]) {
            // add to sum
            sum += i;
            // mark all multiples
            int j = i;
            while (j <= upLimit) {
                isPrime[j] = false;
                j += i;
            }
        }
    }
    cout << sum << endl;
}
3 Answers
You need to check for overflow before adding.
// Note: std::numeric_limits requires #include <limits>.
for (int i=2; i<=upLimit; i++) {
    if (isPrime[i]) {
        // Check for overflow before adding
        if (std::numeric_limits<decltype(sum)>::max() - sum <= i) {
            std::cerr << "Overflow" << std::endl;
            return 1;
        }
        // add to sum
        sum += i;
        ...
As int is only guaranteed to hold values up to 32767, you should use a long long for sum.
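For illustration, here is a minimal self-contained sketch combining both suggestions: the overflow check plus a long long sum. This is my own rewrite, not the asker's code; the static array is only there to keep the example within standard C++ and off the stack.

#include <iostream>
#include <limits>

int main()
{
    const int upLimit = 2000000;
    long long sum = 0;                         // at least 64 bits, wide enough here
    static bool isPrime[upLimit + 1];          // static: zero-initialized, not on the stack

    for (int i = 2; i <= upLimit; i++) isPrime[i] = true;

    for (int i = 2; i <= upLimit; i++) {
        if (!isPrime[i]) continue;
        // Defensive check as suggested above: bail out if sum + i would overflow.
        if (std::numeric_limits<long long>::max() - sum < i) {
            std::cerr << "Overflow" << std::endl;
            return 1;
        }
        sum += i;
        for (int j = i; j <= upLimit; j += i) {
            isPrime[j] = false;                // mark every multiple of i as composite
        }
    }
    std::cout << sum << std::endl;
}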
- int is only guaranteed to hold as many values as short, which in turn is guaranteed to hold as many values as char. In any case, if you suspect that overflow is a problem, testing for overflow inside an int loop running up to upLimit is a dubious solution. – vnp, Jan 6, 2015
- Note that long long is only guaranteed to be at least 64 bits since C++11. – Daan, Jan 8, 2015
- (Or C99, if you're not averse to mixing the two.) – Daan, Jan 8, 2015
The problem is overflow, which can be solved by changing some of the types in your code (I'm going to pretend i is predeclared here):
int sum = 0;
int upLimit = 2000000;
int i = 2;
int j = i;
While int is commonly a 32-bit integer, this isn't required. And we really care about the size (and the range it implies) of our integer types. So let's start by switching to the fixed-width integer types defined in cstdint*. We should also switch to the unsigned variety: since our code only uses positive integers, this effectively doubles our overflow-safe range:
uint32_t sum = 0;
uint32_t upLimit = 2000000;
uint32_t i = 2;
uint32_t j = i;
This helps, but will still break eventually. So let's establish an upper bound on the maximum value that we'd like to store in sum. There's some really interesting math in this area, but for now, let's pretend we don't know anything about primes. We only know that we'll be summing distinct 32-bit numbers: at most 2^32 values, each of them less than 2^32. That means sum can never reach 2^32 * 2^32 = 2^64, and we have a type that is guaranteed to hold any value below that:
uint64_t sum = 0;
No more overflow.
If you're interested, cstdint also contains type definitions like uint_fast32_t, which allow the compiler to use a larger type if that would let it generate more efficient code. See for example cstdint on CppReference. The usual caveats regarding premature optimization apply, of course.
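As a quick illustration (my own, not part of the answer), the fast variants are drop-in replacements whose actual width you can inspect with sizeof:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint_fast32_t n = 0;   // at least 32 bits; may be wider if that is faster on this target
    std::cout << "uint_fast32_t occupies " << sizeof n << " bytes here\n";
}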
One last tip, not related to overflow: a bool[] takes up 1 byte per value, while a std::vector<bool> is allowed to be more space-efficient (typically 1 bit per value). Space efficiency (and more generally cache friendliness) will be paramount if you want to take this beyond a few million.
*) These definitions require C++11. If that's not available, many compilers offer similar types in C++98 mode, although portability might suffer. Alternatively, if your compiler supports C99, you might be able to get working fixed-width definitions from stdint.h.
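Putting these suggestions together (uint64_t for the running sum, std::vector<bool> for the sieve), a minimal self-contained sketch might look like the following. This is my own consolidation of the advice above, not code taken from the answer:

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    const std::uint32_t upLimit = 2000000;
    std::uint64_t sum = 0;                          // cannot overflow for this problem
    std::vector<bool> isPrime(upLimit + 1, true);   // bit-packed: roughly 250 KB instead of ~2 MB

    for (std::uint32_t i = 2; i <= upLimit; ++i) {
        if (!isPrime[i]) continue;
        sum += i;
        for (std::uint32_t j = i; j <= upLimit; j += i) {
            isPrime[j] = false;                      // mark every multiple of i as composite
        }
    }
    std::cout << sum << '\n';
}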
Other than the datatype limits, try this optimisation, which is based on the fact that any prime greater than 3 can be written as 6x-1 or 6x+1.
int sum = 5; // 2 + 3

// For a generic upLimit (i.e. upLimit < 4), add the appropriate early returns.
// Initialize the isPrime array with true.
// Loop with i starting at 5: check i (6x-1) and i+2 (6x+1), then advance i by 6
// (done in two steps: i += 2 inside the body and i += 4 in the loop header).
for (int i = 5; i <= upLimit; i += 4) {
    if (isPrime[i]) { // prime of the form 6x-1
        // add to sum
        sum += i;
        // mark all multiples
        int j = i;
        while (j <= upLimit) {
            isPrime[j] = false;
            j += i;
        }
    }
    i += 2; // candidate of the form 6x+1
    if (i <= upLimit && isPrime[i]) { // guard: i may now be just past upLimit
        // add to sum
        sum += i;
        // mark all multiples
        int j = i;
        while (j <= upLimit) {
            isPrime[j] = false;
            j += i;
        }
    }
}
- While this is not bad advice, in my experience space efficiency matters more to a sieve. For example, storing only odd candidates in your sieve array (such that isPrime[i] represents candidate 2i+1) will likely provide a larger speedup than the above optimization; a sketch of this layout follows these comments. IIRC, the paper available after correctly submitting problem #10 discusses the odd-only option in greater detail. – Daan, Jan 8, 2015
- Yes, that surely will add to it, but this keeps the code readable. Your array size could be reduced to upLimit/3 by storing only the 6x-1 and 6x+1 candidates. – thepace, Jan 8, 2015
- Why stop there? You can use the length-30 prime wheel to reduce the size requirements to 8 bits per 30 candidates :-). But yes, maintainability suffers. – Daan, Jan 8, 2015
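For reference, here is a minimal sketch of the odd-only layout mentioned in the first comment, where index i stands for the odd candidate 2i+1. This is my own illustration of the idea, not material from the referenced paper:

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    const std::uint32_t upLimit = 2000000;
    // Index i represents the odd candidate 2i+1; index 0 (the number 1) is simply unused.
    std::vector<bool> isPrime((upLimit + 1) / 2, true);

    std::uint64_t sum = 2;                               // 2 is the only even prime
    for (std::uint32_t i = 1; 2 * i + 1 <= upLimit; ++i) {
        if (!isPrime[i]) continue;
        const std::uint64_t p = 2 * i + 1;
        sum += p;
        // Cross off odd multiples of p, starting at p*p (smaller ones were already handled).
        for (std::uint64_t m = p * p; m <= upLimit; m += 2 * p) {
            isPrime[m / 2] = false;                      // index of odd m is (m-1)/2 == m/2
        }
    }
    std::cout << sum << '\n';
}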