Here's my implementation to factorize a positive integer:
def is_prime(n):
if n == 1:
return False
if n % 2 == 0:
return False
i = 3
while i * i <= n:
if n % i == 0:
return False
i += 2
return True
# @timeit
def prime_factors(n):
prime_factor_list = []
while not n % 2:
prime_factor_list.append(2)
n //= 2
while not n % 3:
prime_factor_list.append(3)
n //= 3
i = 5
while n != 1:
if is_prime(i):
while not n % i:
prime_factor_list.append(i)
n //= i
i += 2
return prime_factor_list
How can I improve this code in speed and make it pythonic?
-
1\$\begingroup\$ I think that the best way to do fast number factorization in python is just call an external library (which does not rely on Python iteration) such as sympy. See related answer on StackOverflow stackoverflow.com/questions/4643647/… ). \$\endgroup\$John Donn– John Donn2016年03月04日 09:54:23 +00:00Commented Mar 4, 2016 at 9:54
4 Answers 4
prime_factors()
The single greatest improvement to performance that you could make would be to remove the call to is_prime()
altogether. If all factors smaller than i
have been removed from n
already, then i
could not possibly be composite.
The next improvement you could make would be to turn prime_factors()
into a generator. The function would no longer need to resize and recopy prime_factor_list
whenever the list needs to grow beyond its estimated size. Furthermore, the caller could start making use of initial results before the entire factorization is finished.
Using three lines (i = 5
, while n != 1
, and i += 2
) to structure the loop — something that would be succinctly written in C as a for (i = 5; n != 1; i += 2)
header — is un-Pythonic. Such counting loops are usually better written using some kind of range()
, xrange()
, enumerate()
, or something from itertools
. Here, itertools.chain()
and itertools.count()
would be appropriate.
Without changing the spirit of your algorithm, I would write:
import itertools
def prime_factors(n):
for i in itertools.chain([2], itertools.count(3, 2)):
if n <= 1:
break
while n % i == 0:
n //= i
yield i
is_prime()
As mentioned above, you don't need is_prime()
at all, if your only goal is to implement prime_factors()
.
However, if you wanted to implement is_prime()
anyway, you could tighten it up a bit. My main recommendation would be to avoid performing the i * i
multiplication with each iteration. You would probably be better off computing the \$\sqrt{n}\$ limit just once.
Then, as with the other function, you should express the loop more Pythonically using range()
or xrange()
. Actually, you wouldn't need an explicit loop; you could use any()
instead.
So, if we stick to your algorithm in spirit, we could tighten up the implementation like this:
def is_prime(n):
if n < 3 or n % 2 == 0:
return n == 2
else:
return not any(n % i == 0 for i in range(3, int(n**0.5 + 2), 2))
-
\$\begingroup\$ using is_prime was really dumb. Good Observation. \$\endgroup\$user2305– user23052016年03月04日 10:51:34 +00:00Commented Mar 4, 2016 at 10:51
-
\$\begingroup\$ @200_success @user2305 "If all factors smaller than
i
have been removed fromn
already, theni
could not possibly be composite."i
can and will be composite (it's just going through odd numbers, and most of them are composite). It's just that, ifi
is composite, it will not divide the remainingn
, and I think that's what the OP was trying to skip. However, I do agree with removing the call tois_prime()
: trying to avoid aO(1)
operation with aO(log(n))
check is not exactly an optimization! :) \$\endgroup\$OxTaz– OxTaz2016年03月04日 17:16:07 +00:00Commented Mar 4, 2016 at 17:16 -
1\$\begingroup\$ @200_success: you can break the loop if
n < i * i
and yieldn
as the largest prime factor. \$\endgroup\$Daniel– Daniel2016年03月06日 15:37:37 +00:00Commented Mar 6, 2016 at 15:37
Your code is written fairly well, yet in is_prime
you forgot to check whether the input integer is 2; just add this:
if n == 2:
return True
Also, for some negative integers is_prime
returns True
. I suggest you add the following statement to prune them away:
if n <= 1:
return False
prime_factors
never calls is_prime
with a number that's divisible by 2 or 3. is_prime
could be faster if it used this knowledge. However, if is_prime
will use this knowledge, then it may be a bit misleading in global scope. So you could move the definition of is_prime
inside of prime_factors
.
Although the code is Pythonic alright, it would be more idiomatic to use a prime generator to get the next prime factor, instead of iterating from 5 upwards. Given a prime generator gen
, the final loop of the code would become:
while n != 1:
prime = next(gen)
while not n % prime:
prime_factor_list.append(prime)
n //= prime
Few additional thoughts to Fast Number Factorization in Python answer.
is_prime()
In case if you have multiple consequent calls you should use something like Sieve_of_Eratosthenes. If you will, time to generate sieve will depend on maximum value of number to factorize but total time will be reduces.
prime_factors()
There is one thing you miss in your code. Lets take prime number, let is be \$ 10007 \$ and multiply it by \$ 2 \,ドル we will receive \$ 20014 \$. Its factorization will be \$ 20014 = 10007 \times 2 \$.
Now lets analyze your prime_factors
. You will find that \$ 2 \$ is prime divisor of \$ 20014 \$ and will continue to iterate through all prime numbers up to \$ 10007 \$ but actually you have to iterate up to square root of initial number, because if you cross this line and reminder still not equal \$ 1 \$ than it is prime.
This trick can reduce time for such cases.
As the result, you can assume that you need to generate sieve up to square root of highest number.