Optimization - Prime Factorizations of Numbers <= 10^7

Question 1

I am using Trail Division Method with a pre-calculated list of primes to calculate the prime factorization of all numbers less than M (M <= 10^7).

I am using an array of vectors of pairs. The format for 10 is as follows:

PF[10][0].first = 2 // Base
PF[10][0].second = 1 // Exp
PF[10][1].first = 5
PF[10][1].second = 1

My approach is working fine but it is too slow. For M=10^7 it took 36.841 sec to compute PF of all numbers <=M on my system.

Questions

Which is the best approach for this question?
For my approach what other optimizations can i do?

My code

#include <iostream>
#include <fstream>
#include <vector>
#include <utility>
#include "math.h"
#include "stdio.h"
using namespace std;
const int lim=45000;
const int Max=10000000;
char prime[lim];
vector<pair <int,unsigned char> > PF[Max];
void prep()
{
 //Calculation of Prime Numbers
 for(int i=1;i<lim;prime[i++]=1);
 for(int i=2;i*i<lim;i++)
 if(prime[i])
 for(int j=i+i;j<lim;prime[j]=0,j+=i);
 for(int i=2;i<Max;i++) {
 int num=i;
 unsigned char pq=0;
 //Check for powers of 2
 while(num%2==0) {
 pq++;
 num=num/2;
 } 
 if(pq>0)
 PF[i].push_back( make_pair(2,pq) );
 int pan=num;
 //Loop for all primes j such that j*j<num
 for(int j=3;j*j<=num;j+=2) {
 if(prime[j]) {
 pq=0;
 while(num%j==0) {
 pq++;
 num=num/j;
 } 
 if(pq>0)
 PF[i].push_back( make_pair(j,pq) );
 }
 }
 if(num>1)
 PF[i].push_back( make_pair(num,1) );
 }
}
main()
{
 prep();
}

Timing

My Code

real 0m36.841s
user 0m36.624s
sys 0m0.265s

Igor ostrovsky Code

real 0m41.628s
user 0m41.390s
sys 0m0.265s

Question 2

I did this a while ago, in the interest of reducing ratios of factorials

#include <utility>
#include <vector>
std::vector< std::pair<int, int> > factor_table;
void fill_sieve( int n )
{
 factor_table.resize(n+1);
 for( int i = 1; i <= n; ++i ) {
 if (i & 1)
 factor_table[i] = std::pair<int, int>(i, 1);
 else
 factor_table[i] = std::pair<int, int>(2, i>>1);
 }
 for( int j = 3, j2 = 9; j2 <= n; ) {
 if (factor_table[j].second == 1) {
 int i = j;
 int ij = j2;
 while (ij <= n) {
 factor_table[ij] = std::pair<int, int>(j, i);
 ++i;
 ij += j;
 }
 }
 j2 += (j + 1) << 2;
 j += 2;
 }
}
std::vector<int> factor( int num )
{
 std::vector<int> factors;
 factors.reserve(30);
 do {
 factors.push_back(factor_table[num].first);
 num = factor_table[num].second;
 } while (num != 1);
 return factors;
}

I believe it will be much faster than yours, because I was able to avoid ever using division (/ and %). As a matter of fact, I don't use any multiplies either.

(It's about 6x faster with g++ than the code in the question, and the actual factor generation part is marginally faster, about 10%, than lol4t0's. Returning factors in a std::vector<int> takes 5x as long as the factor generation.)

Test program: http://ideone.com/pt9nu

Original application:

http://ideone.com/Weeg6

Question 3

Hey, I comprehend your idea! It looks very interesting to me. Still the idea is not obviuos from your code -- so one may want your data structure to be explained in more detail.

Question 4

It seems that your for( int j cycle may be further improved by inc=2...j+=inc; inc=6-inc; trick, so j2 have to be calculated slightly differently. Also one have to start j from 5, and add j=3 special processing case just like j=2 special processing case in factor_table[i] = std::pair<int, int>(2, i>>1);

Question 5

@ben .. Thanks for your reply :) .. link -- In this particular problem 'factor(k).swap(factors);'. will make factors to always store the return value and hence it will have the MAX-1 value after execution .. How can i display factors of some other number ?? . . When i declared factors as 2D vectors i need to use 'factors.push_back(factor(k));'. which is very time consuming .. It is taking 23 sec on my system to run while with 'factor(k).swap(factors);'. it took 8 sec .. Edit: while with reserving required memory beforehand it took 14 sec

Question 6

@anudeep2011: Looking up the factors from the table I built is extremely fast, there's no need to build a table of vectors of factors. Just call the factor(int num) function when you need the factors of num.

Question 7

This method is bloody brilliant.

Question 8

Off the top of my head, once you find a factor, you can look that up in your existing table. That will keep you from having to continue searching for more primes.

Question 9

On the topic of optimisations:

You're using push_back while not reserving any memory. Are you sure you don't want to reserve some amount you consider safe first?
You're checking whether something is a multiple of two by using modulo and then division. Your compiler probably optimises this out already, but you could use bitwise-and and a right-shift instead, as long as you know the numbers aren't negative.
You're multiplying j with itself on every iteration of one of the inner loops. Storing ceil(sqrt(num)) is likely to be more efficient.
Rereading, I see that you're doing the same thing with i in the second loop.

Apart from that, you should probably take a look at the following issues:

Declaring main with no return type is not allowed.
C headers of the style header.h are deprecated, use cheader now. (For example, cmath instead of math.h.) As Ben Voigt remarks, you're not using anything from many of your includes so you should drop them altogether.
You've got using namespace std;. While this may be okay for a short program, it does not combine well with global variables, and explicitly qualifying names would make more sense in this case (you only have five names to qualify, anyway).
The lines that calculate prime numbers are not readable to me as they are. As you can be sure that the body of the loop will always be executed before the increment, you could change it to be for (int j = i+i; j < lim; j += i) prime[j] = 0;, which would do the same thing while making it more readable. A similar thing can be done with the very first loop.
Splitting this code into functions would help a lot. Seeing as most of your variables are global, the only thing you need to worry about is the function call overhead (which may be non-existent if the function is inlined). In particular, I would move out calculate_primes.
You're not using pan anywhere, so you shouldn't declare it in the first place.

Question 10

I tried to use cache ang finished with foolwing code (based on Igor's one):

#include <vector>
#include <iostream>
#include <cmath>
using namespace std;
const int MAX = 10000000;
const int SQRT_MAX = sqrt(MAX);
int factor[MAX];
struct Node
{
 int number;
 Node* prior;
};
vector<Node> cache(MAX);
void precompute()
{
 for(int i = 1; i < MAX; i++) {
 factor[i] = i;
 }
 cache[1].prior = 0;
 cache[1].number = 1;
 for(int i = 2; i < SQRT_MAX; i++) {
 if (factor[i] == i) {
 for(int j = i << 1; j < MAX; j += i) {
 factor[j] = i;
 }
 }
 }
 for(int i = 2; i < MAX; i++) {
 int& f = factor[i];
 cache[i].prior = & cache[i/f];
 cache[i].number = f;
 }
}
Node* get_factors(int x)
{
 return &cache[x];
}
int main() {
 precompute();
 for (int k = 2; k < MAX; k++) {
 Node* it = get_factors(k);
 while (it != 0) { 
 //std::cout << it->number << " ";
 it = it->prior;
 }
 //std::cout << std::endl;
 }
}

My result is:

sh-3.1$ g++ 1.cpp -O3
sh-3.1$ time ./a
real 0m0.900s
user 0m0.000s
sys 0m0.015s
sh-3.1$

Opposed Igor's:

sh-3.1$ time ./a
real 0m10.313s
user 0m0.015s
sys 0m0.015s
sh-3.1$

With the following main cycle:

int main() {
 precompute();
 for (int k = 2; k < MAX; k++) {
 vector<int> factors = get_factors(k);
 for(int i=0; i<factors.size(); i++) { 
 //std::cout << factors[i] << " ";
 }
 //std::cout << std::endl;
 }
}

Question 11

(1) Really slick implementation. (2) Why do you print 0 (zero) at the end of each line of output?

Question 12

BTW, you can speed this up by about 32% with a very simple modification: Inside precompute(), also initialize the entire cache array in a single straightforward loop from 1 to MAX. Then get_factors simply returns &cache[x] without doing any other work. On my system, this improved the runtime from 0.65 seconds to 0.49 seconds.

Question 13

I forgot to say: Your implementation is the most beautiful prime factorization algorithm I've ever seen. Storing the lists of factors of all the numbers as interwoven linked lists is so cool it blows my mind. I've never seen this before. Did you discover this algorithm yourself or see it elsewhere?

Question 14

@ToddLehman, There is an error in code. I missed ` cache[1].number= 1;`. It will print 1 (1 is actually a factor of every number). Hm, if I make the improvement, that you recommend, I become faster than @Ben Voigt, yeaaah! Also, you can see that @Ben Voigt uses same structured cache, storing index of prior number, instead of pointer to it.

Question 15

@Lol4t0: That, and first is both the index and the actual paired factor itself, and I don't have anything equivalent to your separate factor array.

Question 16

Instead of storing the full factorization of each integer, just store one of the factors:

#include <vector>
#include <iostream>
using namespace std;
const int MAX = 10000000;
int factor[MAX];
void precompute()
{
 for(int i = 1; i < MAX; i++) {
 factor[i] = i;
 }
 for(int i = 2; i * i < MAX; i++) if (factor[i] == i) {
 for(int j = i + i; j < MAX; j += i) {
 factor[j] = i;
 }
 }
}
vector<int> get_factors(int x)
{
 vector<int> factors;
 while (x > 1) {
 int f = factor[x];
 factors.push_back(f);
 x /= f;
 }
 return factors;
}
void main() {
 precompute();
 vector<int> factors = get_factors(2012);
 for(int i=0; i<factors.size(); i++) { 
 std::cout << factors[i] << " ";
 }
 std::cout << std::endl;
}

Edit: I tested the code and fixed a couple of minor bugs.

Question 17

it seems you store the biggest factor for an entry in the table. Thus the factors will be found in descending order. If you store the smallest factor, they will be found in ascending order. Valid as well. Advantage? Start your inner loop in precompute() from i*i - a speedup! Put 2 in all even entries and then skip them, using j += 2*i step in the inner loop - a speedup! Separate out the testing of even/odd (test by n&1, not n%2), have your table twice as small - a speedup!

Ben Voigt Ben Voigt 8712 gold badges8 silver badges19 bronze badges · Answer 1 · 2012-02-16 21:03:43Z

I did this a while ago, in the interest of reducing ratios of factorials

#include <utility>
#include <vector>
std::vector< std::pair<int, int> > factor_table;
void fill_sieve( int n )
{
 factor_table.resize(n+1);
 for( int i = 1; i <= n; ++i ) {
 if (i & 1)
 factor_table[i] = std::pair<int, int>(i, 1);
 else
 factor_table[i] = std::pair<int, int>(2, i>>1);
 }
 for( int j = 3, j2 = 9; j2 <= n; ) {
 if (factor_table[j].second == 1) {
 int i = j;
 int ij = j2;
 while (ij <= n) {
 factor_table[ij] = std::pair<int, int>(j, i);
 ++i;
 ij += j;
 }
 }
 j2 += (j + 1) << 2;
 j += 2;
 }
}
std::vector<int> factor( int num )
{
 std::vector<int> factors;
 factors.reserve(30);
 do {
 factors.push_back(factor_table[num].first);
 num = factor_table[num].second;
 } while (num != 1);
 return factors;
}

I believe it will be much faster than yours, because I was able to avoid ever using division (/ and %). As a matter of fact, I don't use any multiplies either.

(It's about 6x faster with g++ than the code in the question, and the actual factor generation part is marginally faster, about 10%, than lol4t0's. Returning factors in a std::vector<int> takes 5x as long as the factor generation.)

Test program: http://ideone.com/pt9nu

Original application:

http://ideone.com/Weeg6

Hey, I comprehend your idea! It looks very interesting to me. Still the idea is not obviuos from your code -- so one may want your data structure to be explained in more detail.
It seems that your for( int j cycle may be further improved by inc=2...j+=inc; inc=6-inc; trick, so j2 have to be calculated slightly differently. Also one have to start j from 5, and add j=3 special processing case just like j=2 special processing case in factor_table[i] = std::pair<int, int>(2, i>>1);
@ben .. Thanks for your reply :) .. link -- In this particular problem 'factor(k).swap(factors);'. will make factors to always store the return value and hence it will have the MAX-1 value after execution .. How can i display factors of some other number ?? . . When i declared factors as 2D vectors i need to use 'factors.push_back(factor(k));'. which is very time consuming .. It is taking 23 sec on my system to run while with 'factor(k).swap(factors);'. it took 8 sec .. Edit: while with reserving required memory beforehand it took 14 sec
@anudeep2011: Looking up the factors from the table I built is extremely fast, there's no need to build a table of vectors of factors. Just call the factor(int num) function when you need the factors of num.

JoelJoel · Answer 2 · 2012-02-16 18:37:04Z

Off the top of my head, once you find a factor, you can look that up in your existing table. That will keep you from having to continue searching for more primes.

Komi Golov Komi Golov 3,26920 silver badges29 bronze badges · Answer 3 · 2012-02-16 20:45:19Z

On the topic of optimisations:

You're using push_back while not reserving any memory. Are you sure you don't want to reserve some amount you consider safe first?
You're checking whether something is a multiple of two by using modulo and then division. Your compiler probably optimises this out already, but you could use bitwise-and and a right-shift instead, as long as you know the numbers aren't negative.
You're multiplying j with itself on every iteration of one of the inner loops. Storing ceil(sqrt(num)) is likely to be more efficient.
Rereading, I see that you're doing the same thing with i in the second loop.

Apart from that, you should probably take a look at the following issues:

Declaring main with no return type is not allowed.
C headers of the style header.h are deprecated, use cheader now. (For example, cmath instead of math.h.) As Ben Voigt remarks, you're not using anything from many of your includes so you should drop them altogether.
You've got using namespace std;. While this may be okay for a short program, it does not combine well with global variables, and explicitly qualifying names would make more sense in this case (you only have five names to qualify, anyway).
The lines that calculate prime numbers are not readable to me as they are. As you can be sure that the body of the loop will always be executed before the increment, you could change it to be for (int j = i+i; j < lim; j += i) prime[j] = 0;, which would do the same thing while making it more readable. A similar thing can be done with the very first loop.
Splitting this code into functions would help a lot. Seeing as most of your variables are global, the only thing you need to worry about is the function call overhead (which may be non-existent if the function is inlined). In particular, I would move out calculate_primes.
You're not using pan anywhere, so you shouldn't declare it in the first place.

Lol4t0 Lol4t0 2241 silver badge9 bronze badges · Answer 4 · 2012-02-16 20:14:16Z

I tried to use cache ang finished with foolwing code (based on Igor's one):

#include <vector>
#include <iostream>
#include <cmath>
using namespace std;
const int MAX = 10000000;
const int SQRT_MAX = sqrt(MAX);
int factor[MAX];
struct Node
{
 int number;
 Node* prior;
};
vector<Node> cache(MAX);
void precompute()
{
 for(int i = 1; i < MAX; i++) {
 factor[i] = i;
 }
 cache[1].prior = 0;
 cache[1].number = 1;
 for(int i = 2; i < SQRT_MAX; i++) {
 if (factor[i] == i) {
 for(int j = i << 1; j < MAX; j += i) {
 factor[j] = i;
 }
 }
 }
 for(int i = 2; i < MAX; i++) {
 int& f = factor[i];
 cache[i].prior = & cache[i/f];
 cache[i].number = f;
 }
}
Node* get_factors(int x)
{
 return &cache[x];
}
int main() {
 precompute();
 for (int k = 2; k < MAX; k++) {
 Node* it = get_factors(k);
 while (it != 0) { 
 //std::cout << it->number << " ";
 it = it->prior;
 }
 //std::cout << std::endl;
 }
}

My result is:

sh-3.1$ g++ 1.cpp -O3
sh-3.1$ time ./a
real 0m0.900s
user 0m0.000s
sys 0m0.015s
sh-3.1$

Opposed Igor's:

sh-3.1$ time ./a
real 0m10.313s
user 0m0.015s
sys 0m0.015s
sh-3.1$

With the following main cycle:

int main() {
 precompute();
 for (int k = 2; k < MAX; k++) {
 vector<int> factors = get_factors(k);
 for(int i=0; i<factors.size(); i++) { 
 //std::cout << factors[i] << " ";
 }
 //std::cout << std::endl;
 }
}

(1) Really slick implementation. (2) Why do you print 0 (zero) at the end of each line of output?
BTW, you can speed this up by about 32% with a very simple modification: Inside precompute(), also initialize the entire cache array in a single straightforward loop from 1 to MAX. Then get_factors simply returns &cache[x] without doing any other work. On my system, this improved the runtime from 0.65 seconds to 0.49 seconds.
I forgot to say: Your implementation is the most beautiful prime factorization algorithm I've ever seen. Storing the lists of factors of all the numbers as interwoven linked lists is so cool it blows my mind. I've never seen this before. Did you discover this algorithm yourself or see it elsewhere?
@ToddLehman, There is an error in code. I missed ` cache[1].number= 1;`. It will print 1 (1 is actually a factor of every number). Hm, if I make the improvement, that you recommend, I become faster than @Ben Voigt, yeaaah! Also, you can see that @Ben Voigt uses same structured cache, storing index of prior number, instead of pointer to it.
@Lol4t0: That, and first is both the index and the actual paired factor itself, and I don't have anything equivalent to your separate factor array.

Igor ostrovskyIgor ostrovsky · Answer 5 · 2012-02-16 18:37:06Z

Instead of storing the full factorization of each integer, just store one of the factors:

#include <vector>
#include <iostream>
using namespace std;
const int MAX = 10000000;
int factor[MAX];
void precompute()
{
 for(int i = 1; i < MAX; i++) {
 factor[i] = i;
 }
 for(int i = 2; i * i < MAX; i++) if (factor[i] == i) {
 for(int j = i + i; j < MAX; j += i) {
 factor[j] = i;
 }
 }
}
vector<int> get_factors(int x)
{
 vector<int> factors;
 while (x > 1) {
 int f = factor[x];
 factors.push_back(f);
 x /= f;
 }
 return factors;
}
void main() {
 precompute();
 vector<int> factors = get_factors(2012);
 for(int i=0; i<factors.size(); i++) { 
 std::cout << factors[i] << " ";
 }
 std::cout << std::endl;
}

Edit: I tested the code and fixed a couple of minor bugs.

it seems you store the biggest factor for an entry in the table. Thus the factors will be found in descending order. If you store the smallest factor, they will be found in ascending order. Valid as well. Advantage? Start your inner loop in precompute() from i*i - a speedup! Put 2 in all even entries and then skip them, using j += 2*i step in the inner loop - a speedup! Separate out the testing of even/odd (test by n&1, not n%2), have your table twice as small - a speedup!

Stack Exchange Network

Optimization - Prime Factorizations of Numbers <= 10^7

Questions

My code

Timing

My Code

Igor ostrovsky Code

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimization - Prime Factorizations of Numbers <= 10^7

Questions

My code

Timing

My Code

Igor ostrovsky Code

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions