I'm doing an Objective-C project and I need to do some prime factorization, so I came up with the following algorithm, which is 99% C except for the array part, which was easiest to implement in Objective-C because it dynamically allocates new memory for elements as they are added. The Objective-C part is pretty self-explanatory.
I'm mainly looking for feedback on the algorithm, not necessarily the coding style, but I will gladly take feedback on anything I can improve! On my machine with -O3 optimizations, I get just under two seconds for factorizing 1234567890. I really don't know if that is good or bad. One thing I would like to add is a table of previously calculated factorizations, since with large numbers you will most likely encounter the same values more than once (e.g. 4, 6, etc.). I could also make the function append a 2 to the factors whenever n is even and skip the division/prime check, which may improve performance.
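For reference, a memo table along those lines might look like the sketch below; the names factorCache and cachedFactorize are hypothetical, and this is not part of the posted code:

#import <Foundation/Foundation.h>

// Hypothetical memo table: maps each boxed n to its previously computed
// factor list, so repeated inputs are factorized only once.
static NSMutableDictionary *factorCache;

static NSArray *cachedFactorize(long n) {
    NSNumber *key = @(n);
    NSArray *hit = factorCache[key];
    if (hit) return hit;                         // reuse an earlier result
    NSMutableArray *result = [NSMutableArray array];
    long m = n;
    for (long i = 2; i * i <= m; i += (i == 2 ? 1 : 2)) {
        while (m % i == 0) {                     // peel off each prime factor
            [result addObject:@(i)];
            m /= i;
        }
    }
    if (m > 1) [result addObject:@(m)];          // whatever remains is prime
    factorCache[key] = result;
    return result;
}

factorCache would need to be initialized once, e.g. factorCache = [NSMutableDictionary dictionary]; at the top of main(), before the first call.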
#import <Foundation/Foundation.h>
#include <math.h>     // sqrt()
#include <stdbool.h>  // bool

// Declare a global array for the factors
NSMutableArray *factors;

bool isPrime(long n) {
    // Check if even or ends in a five
    if ((n % 2 == 0 && n > 2) || (n > 5 && n % 10 == 5)) {
        return 0;
    }
    if (n == 2) return 1;
    // Set upper bound to square root
    long u = sqrt(n);
    long c = 0;
    for (long i = 3; i <= u; i += 2)
        if (n % i == 0) {
            c++;
            break;
        }
    return !c;
}
void factorize(long n) {
    // If prime, just record it and return
    if (isPrime(n)) {
        [factors addObject:@(n)];
        return;
    }
    for (long i = n/2; i > 1; i--) {
        if (n % i) continue; // Skip i if it does not divide n
        long d = n/i;
        bool iP = isPrime(i), dP = isPrime(d);
        if (iP)
            [factors addObject:@(i)];
        else
            factorize(i);
        if (dP)
            [factors addObject:@(d)];
        else
            factorize(d);
        return;
    }
}

int main(int argc, const char * argv[]) {
    factors = [NSMutableArray array];
    long n = 1234567890;
    factorize(n);
    NSLog(@"%@", factors);
    return 0;
}
- Why bother optimizing a slow algorithm when you can just replace it with another (easy to implement) algorithm that will instantly factor all 64-bit integers in under a few thousand or so simple iterations? Work smart, not hard: en.wikipedia.org/wiki/Pollard%27s_rho_algorithm – Thomas, Sep 5, 2014

- @awashburn This is prime factorization, not primality testing. – Thomas, Sep 5, 2014

- @Thomas I'd still argue against a stochastic algorithm... – recursion.ninja, Sep 5, 2014

- @awashburn The algorithm I linked is deterministic, and randomizing it doesn't really gain anything. Most nontrivial factorization algorithms are probabilistic, though; that's just the reality of things. For 64-bit inputs, I find that deterministic Miller-Rabin and rho (after trial division by the first few primes, for efficiency) is as fast as it gets. I haven't tried ECM, but I suspect it has too much overhead in this case. – Thomas, Sep 5, 2014

- @Thomas But the function you linked to uses stochastic principles to define a partial (non-total) function. You're right that it is deterministic, but I still wouldn't recommend a partial function... – recursion.ninja, Sep 5, 2014
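For context on the Pollard's rho suggestion in the first comment, here is a compact C sketch of the idea (Floyd's cycle-finding variant; it assumes a composite input and uses GCC/Clang's __int128 for overflow-free modular multiplication; none of this is code from the thread):

// Euclid's gcd for the sketch below.
static long gcd_long(long a, long b) {
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

// Pseudo-random step f(x) = (x*x + c) mod n; __int128 avoids overflow.
static long step(long x, long c, long n) {
    return (long)(((__int128)x * x + c) % n);
}

// Returns a nontrivial factor of a composite n. It would not terminate on a
// prime, so callers are expected to run a primality test first.
long pollard_rho(long n) {
    if (n % 2 == 0) return 2;
    for (long c = 1; ; c++) {               // retry with a new constant on failure
        long x = 2, y = 2, d = 1;
        while (d == 1) {
            x = step(x, c, n);              // tortoise: one step
            y = step(step(y, c, n), c, n);  // hare: two steps
            d = gcd_long(x > y ? x - y : y - x, n);
        }
        if (d != n) return d;               // found a proper factor
    }
}

In practice one first runs a primality test (such as the deterministic Miller-Rabin that Thomas mentions) and calls this only on numbers known to be composite.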
4 Answers
Your algorithm tries to find the largest factors first. This is ineffective because you have to test each possible factor for primality. It is much more effective to start with the smallest factor. If you divide the number by each factor found this way, you will get only prime factors, and the prime-testing function isPrime() becomes obsolete. You start by dividing by 2 and then test possible odd factors in increasing order. (If you have a pre-computed list of primes, this can be improved further.)

On my computer this reduced the runtime for the factorization of 1234567890 from 1.6 seconds to 0.00005 seconds.

In addition, with this change a global factors variable is no longer necessary, and your method could look like this:
NSMutableArray * primeFactorization(long n) {
    NSMutableArray *factors = [NSMutableArray array];

    // Divide by 2:
    while (n > 1 && n % 2 == 0) {
        [factors addObject:@(2)];
        n /= 2;
    }

    // Divide by 3, 5, 7, ...
    //
    // i is a possible *smallest* factor of the (remaining) number n.
    // If i * i > n then n is either 1 or a prime number.
    for (long i = 3; i * i <= n; i += 2) {
        while (n > 1 && n % i == 0) {
            [factors addObject:@(i)];
            n /= i;
        }
    }

    if (n > 1) {
        // Append last prime factor:
        [factors addObject:@(n)];
    }
    return factors;
}
int main(int argc, const char * argv[]) {
    long n = 1234567890;
    NSDate *startTime = [NSDate date];
    NSArray *factors = primeFactorization(n);
    NSLog(@"time: %f", -[startTime timeIntervalSinceNow]);
    NSLog(@"%@", factors);
    return 0;
}
- Since you don't use argc & argv, I would put void in their spot. At least, that is what I would do in C. Other than that, very good advice: +1 – syb0rg, Sep 5, 2014
Feedback on the algorithm
//Check if even or ends in a five
if ((n % 2 == 0 && n > 2) || (n > 5 && n % 10 == 5)) {
    return 0;
}
The part that checks whether the number ends in a five adds very little optimization, since divisibility by 5 is tested anyway on the second iteration of the loop that follows.
for (long i = 3; i <= u; i += 2)
    if (n % i == 0) {
        c++;
        break;
    }
I suggest creating a cache of all prime numbers up to, say, 100000, and rewriting the loop like this:
for (long i = 0; i < primeCacheSize && primeCache[i] < n; i++)
    if (n % primeCache[i] == 0) {
        return 0;  // a cached prime divides n, so n is not prime
    }
for (long i = 100001; i <= u; i++)
    if (n % i == 0) {
        return 0;
    }
return 1;
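The answer doesn't show how primeCache would be built; one common way is a sieve of Eratosthenes (a sketch, reusing the answer's primeCache and primeCacheSize names; buildPrimeCache is a hypothetical helper):

#include <stdlib.h>

#define CACHE_LIMIT 100000

long primeCache[CACHE_LIMIT];   // generously sized; ~9592 primes fit below 100000
long primeCacheSize = 0;

// Fill primeCache with all primes up to CACHE_LIMIT via a sieve.
void buildPrimeCache(void) {
    char *composite = calloc(CACHE_LIMIT + 1, 1);
    for (long p = 2; p * p <= CACHE_LIMIT; p++)
        if (!composite[p])
            for (long q = p * p; q <= CACHE_LIMIT; q += p)
                composite[q] = 1;               // mark every multiple of p
    for (long p = 2; p <= CACHE_LIMIT; p++)
        if (!composite[p])
            primeCache[primeCacheSize++] = p;   // keep the survivors
    free(composite);
}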
This is a bit strange:
//If prime return
if (isPrime(n)) {
    [factors addObject:@(n)];
    return;
}
for (long i = n/2; i > 1; i--) {
    if (n % i) continue; //Check if indivisible
How will this code work on 208907 * 208907 * 208907 (9117147423118643)? Your isPrime function will determine that n is divisible by 208907 and return false, and you'll then start iterating from 9117147423118643 / 2 downwards. That is a practically infinite (and useless) loop.
The solution I suggest: make your isPrime function return the number itself if n is prime, or a divisor of the number if it isn't:
for (long i = 0; i < primeCacheSize; i++)
    if (n % primeCache[i] == 0) {
        return primeCache[i];
    }
for (long i = 100001; i <= u; i++)
    if (n % i == 0) {
        return i;
    }
return n;
And rewrite factorize like this:

void factorize(long n) {
    while (n > 1) {
        long divider = isPrime(n);  // a divisor of n, or n itself if n is prime
        [factors addObject:@(divider)];
        n /= divider;
    }
}
- I'd limit the prime cache loops with u as well to avoid unnecessary checks: i < primeCacheSize && i <= u. – DarkDust, Sep 5, 2014

- Also, with the suggestion of having isPrime return the divider: the name isPrime is now misleading, as I'd expect a boolean return value, so it should be changed to something like firstPrimeOf or whatever. Also, I'd recommend using an NSMutableSet instead of an NSMutableArray for factors. – DarkDust, Sep 5, 2014

- A note on your prime cache: if you have 10000 primes cached, and the number is not in the cache and is greater than the upper limit of the cache, you should not start the next loop. – recursion.ninja, Sep 5, 2014

- If the number is not in the cache, you should start the second loop not at i=cacheSize+1 but at i=cache[cacheSize]+2 (+2 so it's an odd number), and the loop should increment by i+=2. This ensures that you are only checking odd numbers. – recursion.ninja, Sep 5, 2014
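Reading these comments together (and taking the bound in the first one as primeCache[i] <= u, with primeCacheSize as the element count, so the last cached prime is primeCache[primeCacheSize - 1]), a corrected helper might look like this sketch; smallestDivisor is a hypothetical name, not from the answer:

// Smallest divisor of n up to u, or n itself if none is found (n is prime).
long smallestDivisor(long n, long u) {
    for (long i = 0; i < primeCacheSize && primeCache[i] <= u; i++)
        if (n % primeCache[i] == 0)
            return primeCache[i];
    // Resume just past the largest cached prime, stepping over even numbers.
    for (long i = primeCache[primeCacheSize - 1] + 2; i <= u; i += 2)
        if (n % i == 0)
            return i;
    return n;
}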
Given that this question is marked with the c tag, and the user quite appropriately points out that the code is 99% C, I decided that it might be worthwhile to demonstrate how simple the C solution to this problem is.

There are only two aspects of Objective-C the user uses:

- NSMutableArray. This is convenient, and it handles the resizing of the array for us. However, it's important to note that there's nothing magical about how an NSMutableArray resizes; Objective-C mutable arrays are resized in much the same way as a C array.
- NSNumber. We don't actually see NSNumber anywhere directly, but that's what the @() syntax is doing: creating NSNumber objects. And why? Because NSArray cannot store primitive data types; it must store pointers.

Unless the actual intent is to use this array in some other Objective-C code where the NSNumber objects will actually be easier to deal with, there's not a particularly good reason to use Objective-C here, except perhaps being uncomfortable working with C-style arrays.
So, here's my solution to this problem in C.
As a starting point, I used Martin R's answer, which is also very similar to the algorithm I used for a slightly different problem here.
Please note, the following code includes the code from Martin's answer so that the two solutions can be compared side by side in case he ever edits or removes his answer.
Martin's very good answer:
NSMutableArray * primeFactorization(long n) {
    NSMutableArray *factors = [NSMutableArray array];

    // Divide by 2:
    while (n > 1 && n % 2 == 0) {
        [factors addObject:@(2)];
        n /= 2;
    }

    // Divide by 3, 5, 7, ...
    //
    // i is a possible *smallest* factor of the (remaining) number n.
    // If i * i > n then n is either 1 or a prime number.
    for (long i = 3; i * i <= n; i += 2) {
        while (n > 1 && n % i == 0) {
            [factors addObject:@(i)];
            n /= i;
        }
    }

    if (n > 1) {
        // Append last prime factor:
        [factors addObject:@(n)];
    }
    return factors;
}
My approach in straight C:
long * cPrimeFactorization(long n, long *factorCount) {
    long currentSize = 2;
    long currentIndex = 0;
    long *factors = malloc(sizeof(long) * currentSize);

    while (n > 1 && n % 2 == 0) {
        factors[currentIndex++] = 2;
        if (currentIndex >= currentSize) {
            currentSize *= 2;
            long *reallocFactors = realloc(factors, currentSize * sizeof(long));
            if (reallocFactors) {
                factors = reallocFactors;
            } else {
                printf("realloc failed");
                free(factors);
                return NULL;
            }
        }
        n /= 2;
    }

    for (long i = 3; i * i <= n; i += 2) {
        while (n > 1 && n % i == 0) {
            factors[currentIndex++] = i;
            if (currentIndex >= currentSize) {
                currentSize *= 2;
                long *reallocFactors = realloc(factors, currentSize * sizeof(long));
                if (reallocFactors) {
                    factors = reallocFactors;
                } else {
                    printf("realloc failed");
                    free(factors);
                    return NULL;
                }
            }
            n /= i;
        }
    }

    if (n > 1) {
        factors[currentIndex++] = n;
    }
    *factorCount = currentIndex;
    return factors;
}
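One readability note on the above, in the spirit of the "room for improvement" caveat later in this answer: the grow-and-append logic appears twice and could be pulled into a helper. A sketch with a hypothetical name (appendFactor), which also leaves error reporting to the caller as a comment below suggests:

#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: append value to *factors, doubling the buffer when
// full. On allocation failure it frees the buffer and returns false, leaving
// error reporting to the caller.
static bool appendFactor(long **factors, long *size, long *count, long value) {
    if (*count == *size) {
        long newSize = *size * 2;
        long *grown = realloc(*factors, newSize * sizeof(long));
        if (!grown) {
            free(*factors);
            return false;
        }
        *factors = grown;
        *size = newSize;
    }
    (*factors)[(*count)++] = value;
    return true;
}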
Running them side by side:
int main(int argc, const char * argv[]) {
    long n = 1234567890;

    NSDate *startTime = [NSDate date];
    NSArray *oFactors = primeFactorization(n);
    NSLog(@"time (Objective-C): %f", -[startTime timeIntervalSinceNow]);

    startTime = [NSDate date];
    long factorCount;
    long *cFactors = cPrimeFactorization(n, &factorCount);
    NSLog(@"time (C): %f", -[startTime timeIntervalSinceNow]);

    NSLog(@"%@", oFactors);
    for (long i = 0; i < factorCount; ++i) {
        printf("%li\n", cFactors[i]);
    }
    return 0;
}
For the sake of benchmarking consistency, I just used the Objective-C timing approach for both versions (because Martin already had it and I didn't feel like looking up a C timing approach). I tried a handful of different numbers and ran them all multiple times. The C approach runs faster in every case.
More than speed, the C approach actually takes less total memory than the Objective-C approach. Not only does the Objective-C array have more overhead than the C array (each index is the same size, 8 bytes), but the Objective-C array is an array of pointers to objects. So the Objective-C array is already slightly bigger than the C array, plus there's several magnitudes more memory needed to hold all of the NSNumber objects out on the heap as well.
And the fact that we have to spend so much time creating these NSNumber objects out on the heap is part of the reason the Objective-C solution is slower. It's also slower because passing messages to objects (the addObject: we call several times) is very slow compared to directly inserting a value at a memory location (which is what we do with the C array).
Please note, I'm definitely an Objective-C programmer, and not a C programmer in the slightest. My C code almost certainly has room for improvement, I'd imagine... but as Objective-C programmers, we must always keep in mind that pure C should always be an option.
As a note, this speed (and memory) difference becomes particularly noticeable with very large powers of two. For example, with 2^49, which is 562,949,953,421,312, the Objective-C solution takes almost 3 times as long on my computer.
- With regards to printf("realloc failed");, the caller should handle reporting errors. – Corbin, Sep 5, 2014

- Yeah, probably. I'm not familiar with the C standards or what should be done in a case like this... I just know it's possible for realloc to fail... – nhgrif, Sep 5, 2014

- @nhgrif: My last comment was wrong and I have deleted it. But it would be interesting whether, after you added the necessary allocation to the while (n > 1 && n % 2 == 0) loop, the difference is still particularly noticeable for powers of two. – Martin R, Sep 7, 2014
There are potential math problems with double sqrt(double):
bool isPrime(long n) {
    ...
    long u = sqrt(n);
    ...
    for (long i = 3; i <= u; i += 2)
- Conversion from long to double may lose precision. Although a double can typically represent every value of an int54_t, long could be wider, such as 64 bits.

- A good sqrt() will return the exact value for a perfect square, but not all sqrt() implementations behave so nicely. Code should be prepared for something like sqrt(121) --> 10.99999....

Since the for() loop has no trouble iterating a little past the ideal bound, I recommend a simple solution to both issues above: increase the upper limit by 1.

long u = sqrt(n);
u++;
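An alternative, not from this answer, is to sidestep floating point entirely by comparing i against n / i, which is equivalent to i * i <= n without any conversion or overflow risk (a sketch; isPrimeNoSqrt is a hypothetical name):

#include <stdbool.h>

// Trial division with an integer-only bound: i <= n / i avoids both
// sqrt() precision issues and i * i overflow.
bool isPrimeNoSqrt(long n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (long i = 3; i <= n / i; i += 2)
        if (n % i == 0)
            return false;
    return true;
}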