How can I speed up a script that iterates over a large range (600 billion)?

Chris Torek nospam at torek.net
Wed Jun 22 01:58:51 EDT 2011


Now that the exercise has been solved...
Instead of "really short code to solve the problem", how about 
some "really long code"? :-)
I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:
 $ factor 99999999999999999
 99999999999999999: 3 3 2071723 5363222357
The python code is rather too slow for this particular number (I
gave up on it finding 5363222357) but works quite well on 600851475143,
or even, e.g., 12186606004219:
 $ python factor.py 600851475143 12186606004219
 600851475143: 71 839 1471 6857
 12186606004219: 2071723 5882353
While searching for speed hacks I came across a few interesting
tricks, particularly TZ<omega>TZIOY's mod-30 scheme (erat3) at
stackoverflow.com (see URLs in the comments), which only really
works well in Python 2.7 and later (using itertools.compress).
Here is the end result (run with 2.5 and 2.7, I have no 3.x installed
anywhere convenient, and of course the print calls would change):
import itertools
import sys
def count(start = 0, step = 1):
 """
 Yield count starting from <start> and counting up by <step>.
 Same as itertools.count() except for the <step> argument, and
 allowing non-"int" arguments.
 
 Python 2.7 and later provides this directly, via itertools.
 Note: it's usually faster to use itertools.islice(itertools.count(...))
 than to run the "while True" loop below, so we do that if we can.
 """
 if (sys.version_info[0] > 2 or
 (sys.version_info[0] == 2 and sys.version_info[1] >= 7)):
 return itertools.count(start, step)
 if isinstance(start, int) and isinstance(step, int):
 if step == 1:
 return itertools.count(start)
 if 1 < step < 5: # 5 upper bound is a guess
 return itertools.islice(itertools.count(start), 0, None, step)
 def f(start, step):
 while True:
 yield start
 start += step
 return f(start, step)
# Sieve of Eratosthenes-based prime-number generator.
#
# Based on David Eppstein's for python 2.3(?) and subsequent
# discussion -- see
# http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
#
# See also:
# http://oreilly.com/pub/a/python/excerpt/pythonckbk_chap1/index1.html?page=last
#
# http://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python/3796442#3796442
def primes():
 """
 Yields sequence of prime numbers via Sieve of Eratosthenes.
 """
 # Keep the state from the last time we abandoned the
 # generator, in primes.primes and primes.D. We can then
 # very quickly re-yield previously-saved primes. We're
 # going to skip even numbers below, we so prime the
 # "primes so far" list with 2.
 #
 # For the fancy (erat3) version, we need to pre-provide
 # 2, 3, and 5, and pre-load D. Having done that we can
 # start either version at 7.
 try:
 primes.primes
 except AttributeError:
 primes.primes = [2, 3, 5]
 for p in primes.primes:
 yield p
 # OK, now we need a mapping holding known-composite values
 # (numbers "struck from the sieve").
 try:
 D = primes.D
 except AttributeError:
 D = primes.D = { 9: 3, 25: 5 }
 # D[q] exists if q is composite; its value is the first prime
 # number that proves that q is composite. (However, only odd
 # numbers appear in D.)
 q = p + 2 # where we start sieve-ing, below
 if sys.version_info[0] == 2 and sys.version_info[1] < 7:
 for q in count(q, 2):
 p = D.pop(q, None)
 if p is None:
 # q was not marked composite, so it's prime; moreover,
 # q-squared is composite (and odd) and its first prime
 # factor is q.
 primes.primes.append(q)
 D[q * q] = q
 yield q
 else:
 # q is composite, p is its first prime factor -- e.g.,
 # q=9 and p=3, or q=15 and p=3. Extend the sieve:
 # find first natural number k where q + 2kp is not already
 # already known as composite. (e.g., if q=9 and p=3, k=1
 # so that we mark D[15] as composite, with 3 as its first
 # prime factor.) Note that we are skipping even numbers,
 # so p and q are both odd, so q+p is even, q+2p is odd,
 # q+3p is even, q+4p is odd, etc. We don't need to mark
 # even-numbered composites in D, which is why we only look
 # at q + 2kp.
 twop = p + p
 x = q + twop # next odd multiple of p
 while x in D: # skip already-known composites
 x += twop
 D[x] = p
 else:
 # 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
 MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
 MODULOS = frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )
 # If we started counting from 7, we'd want:
 # itertools.compress(itertools.count(7,2), itertools.cycle(MASK))
 # But we start counting from q which means we need to clip off
 # the first ((q - 7) % 30) // 2 items:
 offset = ((q - 7) % 30) // 2
 for q in itertools.compress(itertools.count(q, 2),
 itertools.islice(itertools.cycle(MASK), offset, None, 1)):
 p = D.pop(q, None)
 if p is None:
 D[q * q] = q
 primes.primes.append(q)
 yield q
 else:
 twop = p + p
 x = q + twop
 while x in D or (x % 30) not in MODULOS:
 x += twop
 D[x] = p
def factors(num):
 """
 Return all the prime factors of the given number.
 """
 if num < 0:
 num = -num
 if num < 2:
 return
 for p in primes():
 q, r = divmod(num, p)
 while r == 0:
 yield p
 if q == 1:
 return
 num = q
 q, r = divmod(num, p)
if __name__ == '__main__':
 for arg in (sys.argv[1:] if len(sys.argv) > 1 else ['600851475143']):
 try:
 arg = int(arg)
 except ValueError, error:
 print error
 else:
 print '%d:' % arg,
 for fac in factors(arg):
 print fac,
 sys.stdout.flush()
 print
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html


More information about the Python-list mailing list

AltStyle によって変換されたページ (->オリジナル) /