Yes, I realize there are many, many, many, many, many, prime generators on the Code Review SE. However, I have been unable to find any using this method for generating primes inside a vector, so I figure it should be covered.
Here's the C++ code:
long int primeCap = 1e10;
std::vector<long int> primes;
primes.push_back(2);
for(long int i = 2; i < primeCap; i++){
bool isPrime = true;
for(long int j = 0; primes[j] <= sqrt(i); j++){
if(i % primes[j] == 0){
isPrime = false;
break;
}
}
if(isPrime)
primes.push_back(i);
}
This code takes a while to fill a vector with \10ドル^{10}\$ (or more, depending on how much is needed) primes. How can this code be improved for the generation primes?
Update: The issue described below was a nonissue, and was simply an error on my part, the part below can be ignored.
(削除) There may also be an issue where the very last item in the vector is set to \2ドル\$. I have not effectively verified this, but modifying the if(isPrime)
statement to be:
if(isPrime){
std::cout << i << '\n';
primes.push_back(i);
}
Results in the last output being two. Once again, I have not effectively verified this, but it is something to note. (削除ここまで)
2 Answers 2
Here are some things that may help you improve your code.
Use the required #include
s
The code uses std::vector
which means that it should #include <vector>
. It was not difficult to infer, but it helps reviewers if the code is complete.
Use <cmath>
instead of <math.h>
I infer that you've used <math.h>
instead of <cmath>
because you're invoking sqrt(i)
rather than std::sqrt(i)
. The difference between the two include forms is that the former defines things within the std::
namespace versus into the global namespace. Language lawyers have lots of fun with this, but for daily use I'd recommend using <cmath>
. See this SO question for details.
Use the appropriate constructor
There's not really any reason to push the value of 2 after the stack is created when it could be done all at the same time. In fact, I'd write this:
std::vector<int> primes{2, 3};
Seeding the vector with the first two primes for reasons that will be very clear in the next suggestion.
Use a more efficient algorithm
Of all even numbers, only 2, which is already in the vector, is prime, so thereafter all even numbers may be safely skipped. Further, there's little reason to attempt to divide by 2 if you've skipped all other even numbers. For that reason, I'd write it like this:
for(int i = 3; i < primeCap; i+=2) {
bool isPrime = true;
int lim = std::sqrt(i);
for(int j = 1; primes[j] <= lim; j++){
if(i % primes[j] == 0){
isPrime = false;
break;
}
}
if(isPrime)
primes.push_back(i);
}
Doing this reduces the time by about half.
Use const
where practical
Since primeCap
doesn't seem to change within this program, it would make sense to me to declare it as const
or even constexpr
:
constexpr long int primeCap = 1e8;
Maximize the available numerical range
Assuming we're not hunting for negative prime numbers, wouldn't unsigned numbers make more sense? It's also logical to try to assure that primeCap
and the values in the std::vector
are of the same type -- either unsigned
or unsigned long
.
Avoid memory reallocations
It would save some reallocations and moves if the code were to preallocate the required size. One quick and relatively accurate approximation for the number of prime numbers less than a particular number \$n\$ is \$\frac{n}{\log{n}-1}\$. We can use that to reserve
approximately the right amount of memory:
primes.reserve(primeCap / (std::log(primeCap) - 1));
Avoid break
ing out of loops
It's generally better to put loop exit conditions at the top rather than forcing the reader to hunt for a break
buried somewhere inside the loop. It also often simplifies the code. The for
loop could be rewritten as:
for(unsigned j = 1; isPrime && operator[](j) <= lim; j++){
isPrime = i % operator[](j);
}
Consider a custom class
If we need a std::vector
of prime numbers, there's little use in it until it's actually created and populated. For that reason, one way to make sure that it's useful immediately after construction is to create a class with a constructor:
Result
Here's the code with all of these suggestions. It runs about twice as fast as the original for any given range.
#include <vector>
#include <cmath>
#include <iostream>
class Primes : public std::vector<unsigned>
{
public:
Primes(unsigned primeCap)
: std::vector<unsigned>{2,3}
{
reserve(primeCap / (std::log(primeCap) - 1));
for(unsigned i = back()+2; i < primeCap; i+=2) {
bool isPrime = true;
unsigned lim = std::sqrt(i);
for(unsigned j = 1; isPrime && operator[](j) <= lim; j++){
isPrime = i % operator[](j);
}
if(isPrime)
push_back(i);
}
}
};
int main() {
Primes primes(1e8);
std::cout << "prime[" << primes.size() << "] = " << primes.back() << "\n";
}
-
\$\begingroup\$ For my use case, it is not optimal to use it in a class, but I understand where you're coming from. Thanks for all the recommendations, they were all very useful. I am in fact using
<cmath>
, but I had no idea that usingstd::
was better, I assumed that thestd::
part was optional (learning C++ on my own usually means I miss a few crucial things). Thanks! \$\endgroup\$esote– esote2016年11月01日 01:40:06 +00:00Commented Nov 1, 2016 at 1:40 -
\$\begingroup\$ I'm happy it was useful to you. As always, you can and should tailor the advice to your actual use; you know your application better than I do! \$\endgroup\$Edward– Edward2016年11月01日 01:52:14 +00:00Commented Nov 1, 2016 at 1:52
-
\$\begingroup\$ I was experiencing some odd behavior using the vector. I discovered that the vector has
3
twice (i.e.{2,3,3,5...}
. To fix this I updated the firstfor
loop to be:for(unsigned i = 5; i < primeCap; i+=2)
(wherei
is set to5
initially). This fixed the issue. \$\endgroup\$esote– esote2016年11月02日 22:11:49 +00:00Commented Nov 2, 2016 at 22:11 -
1\$\begingroup\$ @Anonymous: Ah, you're right. I fixed it by making it start from
back()+2
so that if you decide to add more primes in the constructor, it will continue to work correctly. Thanks! \$\endgroup\$Edward– Edward2016年11月02日 22:17:55 +00:00Commented Nov 2, 2016 at 22:17
Note that I didn't do any benchmark, neither did I run a profiler on the code. This is just from reading your code. You should do profiling yourself if this is really time critical. (But then you should probably be using another method instead of the Sieve of Erasthothenes).
Reserve space for your primes
You know that you'll be looking for primes below primeCap = 1e10;
. The number of primes below x
is pi(x)
and can be approximated well with x/(log x - 1)
. So use that to reserve space beforehand in order to avoid (potentially costly) allocations.
What's the sign useful for?
Primes are positive. Why do you use (signed
) long int
when you could use unsigned
(long int
) numbers? Though keep in mind that some object against using unsigned integer datatypes as there are some pitfalls regarding "the usual arithmetic conversions".
Two times 2
The prime 2
will be twice in your vector. Change the initialization of the loop variable i
to i = 3
to solve that.
{
and }
for everyone
This is a very opinionated argument here, but consider using {}
even if there's only a single statement after an if
(or for
). We had some pretty popular bugs (partly) due to this "syntax shortcut" recently.
Notes:
- Don't try to optimize the
sqrt(i)
. - Some may object against the
break
. I don't think that it's an issue here, as the flow of control is pretty obvious.
There may also be an issue where the very last item in the vector is set to
2
. [..]
I have no idea what you want to suggest here. With the change you mention I definitively do not get 2
as last output (unless primeCap
is set accordingly).
-
\$\begingroup\$ The part about the issue with the last item being set to
2
turned out to be an error on my part. I updated my question to avoid further confusion. \$\endgroup\$esote– esote2016年11月01日 01:30:40 +00:00Commented Nov 1, 2016 at 1:30
long int
. \$\endgroup\$