I am using Trail Division Method with a pre-calculated list of primes to calculate the prime factorization of all numbers less than M (M <= 10^7).
I am using an array of vectors of pairs. The format for 10 is as follows:
PF[10][0].first = 2 // Base
PF[10][0].second = 1 // Exp
PF[10][1].first = 5
PF[10][1].second = 1
My approach is working fine but it is too slow. For M=10^7 it took 36.841 sec to compute PF of all numbers <=M on my system.
Questions
- Which is the best approach for this question?
- For my approach what other optimizations can i do?
My code
#include <iostream>
#include <fstream>
#include <vector>
#include <utility>
#include "math.h"
#include "stdio.h"
using namespace std;
const int lim=45000;
const int Max=10000000;
char prime[lim];
vector<pair <int,unsigned char> > PF[Max];
void prep()
{
//Calculation of Prime Numbers
for(int i=1;i<lim;prime[i++]=1);
for(int i=2;i*i<lim;i++)
if(prime[i])
for(int j=i+i;j<lim;prime[j]=0,j+=i);
for(int i=2;i<Max;i++) {
int num=i;
unsigned char pq=0;
//Check for powers of 2
while(num%2==0) {
pq++;
num=num/2;
}
if(pq>0)
PF[i].push_back( make_pair(2,pq) );
int pan=num;
//Loop for all primes j such that j*j<num
for(int j=3;j*j<=num;j+=2) {
if(prime[j]) {
pq=0;
while(num%j==0) {
pq++;
num=num/j;
}
if(pq>0)
PF[i].push_back( make_pair(j,pq) );
}
}
if(num>1)
PF[i].push_back( make_pair(num,1) );
}
}
main()
{
prep();
}
Timing
My Code
real 0m36.841s
user 0m36.624s
sys 0m0.265s
Igor ostrovsky Code
real 0m41.628s
user 0m41.390s
sys 0m0.265s
5 Answers 5
I did this a while ago, in the interest of reducing ratios of factorials
#include <utility>
#include <vector>
std::vector< std::pair<int, int> > factor_table;
void fill_sieve( int n )
{
factor_table.resize(n+1);
for( int i = 1; i <= n; ++i ) {
if (i & 1)
factor_table[i] = std::pair<int, int>(i, 1);
else
factor_table[i] = std::pair<int, int>(2, i>>1);
}
for( int j = 3, j2 = 9; j2 <= n; ) {
if (factor_table[j].second == 1) {
int i = j;
int ij = j2;
while (ij <= n) {
factor_table[ij] = std::pair<int, int>(j, i);
++i;
ij += j;
}
}
j2 += (j + 1) << 2;
j += 2;
}
}
std::vector<int> factor( int num )
{
std::vector<int> factors;
factors.reserve(30);
do {
factors.push_back(factor_table[num].first);
num = factor_table[num].second;
} while (num != 1);
return factors;
}
I believe it will be much faster than yours, because I was able to avoid ever using division (/
and %
). As a matter of fact, I don't use any multiplies either.
(It's about 6x faster with g++ than the code in the question, and the actual factor generation part is marginally faster, about 10%, than lol4t0's. Returning factors in a std::vector<int>
takes 5x as long as the factor generation.)
Test program: http://ideone.com/pt9nu
Original application:
-
\$\begingroup\$ Hey, I comprehend your idea! It looks very interesting to me. Still the idea is not obviuos from your code -- so one may want your data structure to be explained in more detail. \$\endgroup\$user1123502– user11235022012年02月17日 00:09:45 +00:00Commented Feb 17, 2012 at 0:09
-
\$\begingroup\$ It seems that your
for( int j
cycle may be further improved byinc=2
...j+=inc; inc=6-inc;
trick, soj2
have to be calculated slightly differently. Also one have to startj
from 5, and addj=3
special processing case just likej=2
special processing case infactor_table[i] = std::pair<int, int>(2, i>>1);
\$\endgroup\$user1123502– user11235022012年02月17日 00:48:05 +00:00Commented Feb 17, 2012 at 0:48 -
\$\begingroup\$ @ben .. Thanks for your reply :) .. link -- In this particular problem 'factor(k).swap(factors);'. will make factors to always store the return value and hence it will have the MAX-1 value after execution .. How can i display factors of some other number ?? . . When i declared factors as 2D vectors i need to use 'factors.push_back(factor(k));'. which is very time consuming .. It is taking 23 sec on my system to run while with 'factor(k).swap(factors);'. it took 8 sec .. Edit: while with reserving required memory beforehand it took 14 sec \$\endgroup\$anudeep2011– anudeep20112012年02月17日 10:05:37 +00:00Commented Feb 17, 2012 at 10:05
-
2\$\begingroup\$ @anudeep2011: Looking up the factors from the table I built is extremely fast, there's no need to build a table of vectors of factors. Just call the
factor(int num)
function when you need the factors ofnum
. \$\endgroup\$Ben Voigt– Ben Voigt2012年02月17日 15:19:21 +00:00Commented Feb 17, 2012 at 15:19 -
2\$\begingroup\$ This method is bloody brilliant. \$\endgroup\$Todd Lehman– Todd Lehman2012年02月18日 06:16:06 +00:00Commented Feb 18, 2012 at 6:16
Off the top of my head, once you find a factor, you can look that up in your existing table. That will keep you from having to continue searching for more primes.
On the topic of optimisations:
- You're using
push_back
while not reserving any memory. Are you sure you don't want to reserve some amount you consider safe first? - You're checking whether something is a multiple of two by using modulo and then division. Your compiler probably optimises this out already, but you could use bitwise-and and a right-shift instead, as long as you know the numbers aren't negative.
- You're multiplying
j
with itself on every iteration of one of the inner loops. Storingceil(sqrt(num))
is likely to be more efficient. - Rereading, I see that you're doing the same thing with
i
in the second loop.
Apart from that, you should probably take a look at the following issues:
- Declaring
main
with no return type is not allowed. - C headers of the style
header.h
are deprecated, usecheader
now. (For example,cmath
instead ofmath.h
.) As Ben Voigt remarks, you're not using anything from many of your includes so you should drop them altogether. - You've got
using namespace std;
. While this may be okay for a short program, it does not combine well with global variables, and explicitly qualifying names would make more sense in this case (you only have five names to qualify, anyway). - The lines that calculate prime numbers are not readable to me as they are. As you can be sure that the body of the loop will always be executed before the increment, you could change it to be
for (int j = i+i; j < lim; j += i) prime[j] = 0;
, which would do the same thing while making it more readable. A similar thing can be done with the very first loop. - Splitting this code into functions would help a lot. Seeing as most of your variables are global, the only thing you need to worry about is the function call overhead (which may be non-existent if the function is inlined). In particular, I would move out
calculate_primes
. - You're not using
pan
anywhere, so you shouldn't declare it in the first place.
I tried to use cache ang finished with foolwing code (based on Igor's one):
#include <vector>
#include <iostream>
#include <cmath>
using namespace std;
const int MAX = 10000000;
const int SQRT_MAX = sqrt(MAX);
int factor[MAX];
struct Node
{
int number;
Node* prior;
};
vector<Node> cache(MAX);
void precompute()
{
for(int i = 1; i < MAX; i++) {
factor[i] = i;
}
cache[1].prior = 0;
cache[1].number = 1;
for(int i = 2; i < SQRT_MAX; i++) {
if (factor[i] == i) {
for(int j = i << 1; j < MAX; j += i) {
factor[j] = i;
}
}
}
for(int i = 2; i < MAX; i++) {
int& f = factor[i];
cache[i].prior = & cache[i/f];
cache[i].number = f;
}
}
Node* get_factors(int x)
{
return &cache[x];
}
int main() {
precompute();
for (int k = 2; k < MAX; k++) {
Node* it = get_factors(k);
while (it != 0) {
//std::cout << it->number << " ";
it = it->prior;
}
//std::cout << std::endl;
}
}
My result is:
sh-3.1$ g++ 1.cpp -O3
sh-3.1$ time ./a
real 0m0.900s
user 0m0.000s
sys 0m0.015s
sh-3.1$
Opposed Igor's:
sh-3.1$ time ./a
real 0m10.313s
user 0m0.015s
sys 0m0.015s
sh-3.1$
With the following main cycle:
int main() {
precompute();
for (int k = 2; k < MAX; k++) {
vector<int> factors = get_factors(k);
for(int i=0; i<factors.size(); i++) {
//std::cout << factors[i] << " ";
}
//std::cout << std::endl;
}
}
-
\$\begingroup\$ (1) Really slick implementation. (2) Why do you print 0 (zero) at the end of each line of output? \$\endgroup\$Todd Lehman– Todd Lehman2012年02月18日 06:58:00 +00:00Commented Feb 18, 2012 at 6:58
-
\$\begingroup\$ BTW, you can speed this up by about 32% with a very simple modification: Inside
precompute()
, also initialize the entirecache
array in a single straightforward loop from 1 to MAX. Thenget_factors
simply returns&cache[x]
without doing any other work. On my system, this improved the runtime from 0.65 seconds to 0.49 seconds. \$\endgroup\$Todd Lehman– Todd Lehman2012年02月18日 07:22:43 +00:00Commented Feb 18, 2012 at 7:22 -
\$\begingroup\$ I forgot to say: Your implementation is the most beautiful prime factorization algorithm I've ever seen. Storing the lists of factors of all the numbers as interwoven linked lists is so cool it blows my mind. I've never seen this before. Did you discover this algorithm yourself or see it elsewhere? \$\endgroup\$Todd Lehman– Todd Lehman2012年02月18日 07:25:01 +00:00Commented Feb 18, 2012 at 7:25
-
1\$\begingroup\$ @ToddLehman, There is an error in code. I missed ` cache[1].number= 1;`. It will print 1 (1 is actually a factor of every number). Hm, if I make the improvement, that you recommend, I become faster than @Ben Voigt, yeaaah! Also, you can see that @Ben Voigt uses same structured cache, storing index of prior number, instead of pointer to it. \$\endgroup\$Lol4t0– Lol4t02012年02月18日 10:09:22 +00:00Commented Feb 18, 2012 at 10:09
-
\$\begingroup\$ @Lol4t0: That, and
first
is both the index and the actual paired factor itself, and I don't have anything equivalent to your separatefactor
array. \$\endgroup\$Ben Voigt– Ben Voigt2012年02月18日 18:03:18 +00:00Commented Feb 18, 2012 at 18:03
Instead of storing the full factorization of each integer, just store one of the factors:
#include <vector>
#include <iostream>
using namespace std;
const int MAX = 10000000;
int factor[MAX];
void precompute()
{
for(int i = 1; i < MAX; i++) {
factor[i] = i;
}
for(int i = 2; i * i < MAX; i++) if (factor[i] == i) {
for(int j = i + i; j < MAX; j += i) {
factor[j] = i;
}
}
}
vector<int> get_factors(int x)
{
vector<int> factors;
while (x > 1) {
int f = factor[x];
factors.push_back(f);
x /= f;
}
return factors;
}
void main() {
precompute();
vector<int> factors = get_factors(2012);
for(int i=0; i<factors.size(); i++) {
std::cout << factors[i] << " ";
}
std::cout << std::endl;
}
Edit: I tested the code and fixed a couple of minor bugs.
-
\$\begingroup\$ it seems you store the biggest factor for an entry in the table. Thus the factors will be found in descending order. If you store the smallest factor, they will be found in ascending order. Valid as well. Advantage? Start your inner loop in
precompute()
fromi*i
- a speedup! Put 2 in all even entries and then skip them, usingj += 2*i
step in the inner loop - a speedup! Separate out the testing of even/odd (test byn&1
, notn%2
), have your table twice as small - a speedup! \$\endgroup\$Will Ness– Will Ness2012年04月26日 16:37:02 +00:00Commented Apr 26, 2012 at 16:37