Answering this SO question recently, I've developed the following code (based on a well-known Active Code recipe, as discussed here):
wheel = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,
4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
wsize = 48
def primes():
yield from (2, 3, 5, 7)
yield from wsieve()
def wsieve(): # wheel-sieve, by Will Ness. cf. ideone.com/WFv4f
yield 11 # cf. https://stackoverflow.com/a/10733621/849891
mults = {} # https://stackoverflow.com/a/19391111/849891
ps = wsieve()
p = next(ps)
psq, c, i = p*p, 11, 0 # 13 = 11 + wheel[0]
cbase, ibase = 11, 0
while True:
c += wheel[i] ; i = (i+1) % wsize # 17 = 13 + wheel[1]
if c in mults:
(j,pbase) = mults.pop(c) # add(mults, NEXT c, j, pbase)
elif c < psq:
yield c ; continue
else: # (c==psq)
while not cbase == p:
cbase += wheel[ibase]
ibase = (ibase+1) % wsize # ibase - initial offset into wheel, for p
j, pbase = ibase, p # add(mults, NEXT c, ibase, p)
p = next(ps) ; psq = p*p
m = c + pbase*wheel[j] ; j = (j+1) % wsize # mults(p) = map (*p)
while m in mults: # roll(wheel,ibase,p)
m += pbase*wheel[j] ; j = (j+1) % wsize
mults[m] = (j,pbase)
Comparing its performance on ideone with its odds-based predecessor reveals it runs about 1.5x faster. Empirical complexity seems okay (~ n1.1, for n primes produced).
Theoretically, it's supposed to be 1 / (2/3 * 4/5 * 6/7) =
2.1875x times faster, if I'm not mistaken (indeed, taking ratio of circumference over number of teeth, (210/48) / (2/1) = 2.1875
). But it is slower than that.
Is this just because of the increased code complexity? Is there a way to make it run faster, without changing it drastically? Improve it in some other sense, or make it prettier/more pythonic without loosing performance? Will appreciate a review.
In particular, I have the rolled wheel streams inlined, using different variables for different "objects". It seems all over the place; is there a way to encapsulate it into a bona fide object, without loosing performance? (last time when I used functions for some encapsulation, efficiency suffered).
-
2\$\begingroup\$ FWIW: PyPy. \$\endgroup\$Veedrac– Veedrac2015年06月01日 22:30:22 +00:00Commented Jun 1, 2015 at 22:30
3 Answers 3
I thought of two things that achieve a 27 % speedup and simplify wsieve
a bit, at the expense of some precomputations.
Turning the wheel step by step from prime to prime seems wasteful. Instead of this code
while not cbase == p: cbase += wheel[ibase] ibase = (ibase+1) % wsize j = ibase
you could do directly
j = spoke_index[p % 210]
having precomputed a suitablespoke_index
dictionary with 48 entries. The variablescbase
andibase
can be eliminated.When a wheel is involved I find it hard to resist using
itertools.cycle
. Instead of thisi = 0 while True: c += wheel[i] ; i = (i+1) % wsize
you could do this
for step in cycle(wheel): c += step
The other uses of the wheel are more difficult to replace because there is no direct way to start the cycle at an arbitrary position. However, we can precompute all 48 rotations of
wheels
and put them in a dictionary for easy lookup.
My version of full solution. Instead of the index j
I put a cyclic iterator in mults
.
from itertools import cycle
CIRCUMFERENCE = 2*3*5*7
BASE_PRIMES = (2,3,5,7)
NEXT_PRIME = 11
def wheel(start):
result = []
i = start
for j in range(i + 1, i + 1 + CIRCUMFERENCE):
if all(j % k for k in BASE_PRIMES):
result.append(j - i)
i = j
return result
def rotated_wheels():
result = {}
i = 1
while i < CIRCUMFERENCE:
result[i] = wheel(i)
i = i + result[i][0]
return result
def primes():
yield from BASE_PRIMES
yield from wsieve()
def wsieve(wheels=rotated_wheels()):
yield NEXT_PRIME
mults = {}
ps = wsieve()
p = next(ps)
psq, c = p*p, p
cwheel = cycle(wheels[c])
for step in cwheel:
c += step
if c in mults:
(mwheel, pbase) = mults.pop(c)
elif c < psq:
yield c
continue
else: # (c==psq)
mwheel = cycle(wheels[p % CIRCUMFERENCE])
pbase = p
p = next(ps) ; psq = p*p
m = c
for mstep in mwheel:
m += pbase * mstep
if m not in mults:
break
mults[m] = (mwheel, pbase)
Edited to use for
instead of while
.
-
\$\begingroup\$ the start index on the wheel for a prime
p
should be(p-11) % 210
, more or less. I thought turning the wheel is nice, conceptually, as it finds the intersection of coprimes (zipped with the wheel position) and primes; and the primes are found as set difference of coprimes and multiples of primes. it has a nice math quality about it (see the last entry here). :) also, since it's going at the sqrt pace, it shouldn't have contributed that much to the run time, I thought...cycle
is exactly the kind of idea I was hoping for. \$\endgroup\$Will Ness– Will Ness2015年06月02日 19:05:03 +00:00Commented Jun 2, 2015 at 19:05 -
\$\begingroup\$ using itertools tools may increase speed because they are working "on the C side" presumably. \$\endgroup\$Will Ness– Will Ness2015年06月02日 21:20:25 +00:00Commented Jun 2, 2015 at 21:20
-
1\$\begingroup\$ @WillNess Well if you prefer to turn the wheel, you can use
cycle(enumerate(wheel))
as the iterator. \$\endgroup\$Janne Karila– Janne Karila2015年06月03日 10:43:47 +00:00Commented Jun 3, 2015 at 10:43 -
\$\begingroup\$ following your advice, using itertools I was able to streamline the code some more and achieve a speedup of 11%! Thank you. \$\endgroup\$Will Ness– Will Ness2015年06月07日 11:17:39 +00:00Commented Jun 7, 2015 at 11:17
-
\$\begingroup\$ (18% actually...) \$\endgroup\$Will Ness– Will Ness2015年06月07日 12:23:35 +00:00Commented Jun 7, 2015 at 12:23
How long does your code take to find the first million primes (that is, all the primes below 15,485,864)?
>>> from timeit import timeit
>>> timeit(lambda:list(islice(primes(), 10**6)), number=1)
5.053490979000344
Compared to the implementations in this answer, this is faster than sieve
but slower than sieve2
:
>>> timeit(lambda:list(sieve(15485864)), number=1)
10.124665475999791
>>> timeit(lambda:list(sieve2(15485864)), number=1)
3.4201999040014925
Even though these implementations ought to be asympotically worse.
The trouble with implementing this kind of algorithm in Python is that the big overhead of the Python interpreter tends to overwhelm small algorithmic improvements, if those improvements mean that you end up spending more time in the slow Python bytecode and less time in the fast C implementation.
Update
There's some confusion in comments about what I mean by "asympotically worse" above. The complexity of sieving for primes below \$n\$ is $$ C n \log \log n + O(1) $$ for some constant \$C\$. Wheel sieving using the first \$k\$ primes theoretically saves a factor of $$ \prod_{i\le k}{p_i-1\over p_i},$$ so wheel sieving with 2 should halve the runtime; with 2 and 3 it should reduce the runtime to a third, and so on. This is a constant factor of improvement, so it wouldn't appear in the big-O analysis (it would get folded into the constant \$C\$). But in real life, constant factors matter too: effectively in your code you are exchanging a larger number of cheap operations for a smaller number of expensive operations. With a large-enough wheel, you ought to be able to beat the naïve algorithms. But it might have to be quite large.
-
\$\begingroup\$ 1M 5.10s-8.7MB on ideone. \$\endgroup\$Will Ness– Will Ness2015年06月01日 13:58:47 +00:00Commented Jun 1, 2015 at 13:58
-
\$\begingroup\$ I wonder whether the unrolling could help - like shown in your linked answer's #8. It allows for constant step again. Maybe for the smaller wheel, with 30-circumference and only 8 teeth... will have to try it sometime. \$\endgroup\$Will Ness– Will Ness2015年06月01日 14:53:18 +00:00Commented Jun 1, 2015 at 14:53
-
2\$\begingroup\$ The key to the performance of
sieve8
is that it does very little in the Python interpreter: by careful computation of indexes it manages to dispatch nearly all the work to NumPy, where numeric operations on arrays are cheap. \$\endgroup\$Gareth Rees– Gareth Rees2015年06月01日 14:55:16 +00:00Commented Jun 1, 2015 at 14:55
Using itertools after the advice from Janne Karila (big thanks!), I got a whopping 1.5x speedup with the following streamlined, and dare I say, altogether canonical-looking code,
def wsieve(): # wheel-sieve, by Will Ness. ideone.com/mqO25A->0hIE89
wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
cs = accumulate( chain( [11], cycle( wh11)))
yield( next( cs)) # cf. ideone.com/WFv4f
ps = wsieve() # codereview.stackexchange.com/q/92365/9064
p = next(ps) # 11
psq = p*p # 121
D = dict( zip( accumulate( chain( [0], wh11)), count(0))) # start from
sieve = {}
for c in cs:
if c in sieve:
wheel = sieve.pop(c)
for m in wheel:
if not m in sieve:
break
sieve[m] = wheel # sieve[143] = wheel@187
elif c < psq:
yield c
else: # (c==psq)
# map (p*) (roll wh from p) = roll (wh*p) from (p*p)
x = [p*d for d in wh11]
i = D[ (p-11) % 210]
wheel = accumulate( chain( [psq], cycle( x[i:] + x[:i])))
p = next(ps) ; psq = p*p
next(wheel) ; m = next(wheel)
sieve[m] = wheel
a later note: an attempt at unrolling it didn't pan out, ran at ~20% slower.
-
\$\begingroup\$ Answer converted to community wiki. \$\endgroup\$Simon Forsberg– Simon Forsberg2016年07月29日 23:15:58 +00:00Commented Jul 29, 2016 at 23:15
-
\$\begingroup\$ Is there a way I can make this yield 2,3,5, and 7 before it starts? doing so the obvious way breaks everything. \$\endgroup\$Oscar Smith– Oscar Smith2016年11月01日 23:43:15 +00:00Commented Nov 1, 2016 at 23:43
-
\$\begingroup\$ @OscarSmith just create another generator that yields 2,3,5,7, and then calls
wsieve
. It's already defined in the question itself, asdef primes()
. \$\endgroup\$Will Ness– Will Ness2016年11月02日 08:49:35 +00:00Commented Nov 2, 2016 at 8:49
Explore related questions
See similar questions with these tags.