Wheel based unbounded sieve of Eratosthenes in Python

Question 1

Answering this SO question recently, I've developed the following code (based on a well-known Active Code recipe, as discussed here):

wheel = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,
 4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
wsize = 48
def primes(): 
 yield from (2, 3, 5, 7)
 yield from wsieve()
def wsieve(): # wheel-sieve, by Will Ness. cf. ideone.com/WFv4f
 yield 11 # cf. https://stackoverflow.com/a/10733621/849891 
 mults = {} # https://stackoverflow.com/a/19391111/849891
 ps = wsieve() 
 p = next(ps) 
 psq, c, i = p*p, 11, 0 # 13 = 11 + wheel[0]
 cbase, ibase = 11, 0 
 while True:
 c += wheel[i] ; i = (i+1) % wsize # 17 = 13 + wheel[1]
 if c in mults:
 (j,pbase) = mults.pop(c) # add(mults, NEXT c, j, pbase)
 elif c < psq: 
 yield c ; continue 
 else: # (c==psq) 
 while not cbase == p:
 cbase += wheel[ibase]
 ibase = (ibase+1) % wsize # ibase - initial offset into wheel, for p
 j, pbase = ibase, p # add(mults, NEXT c, ibase, p)
 p = next(ps) ; psq = p*p
 m = c + pbase*wheel[j] ; j = (j+1) % wsize # mults(p) = map (*p)
 while m in mults: # roll(wheel,ibase,p)
 m += pbase*wheel[j] ; j = (j+1) % wsize 
 mults[m] = (j,pbase)

Comparing its performance on ideone with its odds-based predecessor reveals it runs about 1.5x faster. Empirical complexity seems okay (~ n^1.1, for n primes produced).

Theoretically, it's supposed to be 1 / (2/3 * 4/5 * 6/7) = 2.1875x times faster, if I'm not mistaken (indeed, taking ratio of circumference over number of teeth, (210/48) / (2/1) = 2.1875). But it is slower than that.

Is this just because of the increased code complexity? Is there a way to make it run faster, without changing it drastically? Improve it in some other sense, or make it prettier/more pythonic without loosing performance? Will appreciate a review.

In particular, I have the rolled wheel streams inlined, using different variables for different "objects". It seems all over the place; is there a way to encapsulate it into a bona fide object, without loosing performance? (last time when I used functions for some encapsulation, efficiency suffered).

Question 2

FWIW: PyPy.

Question 3

I thought of two things that achieve a 27 % speedup and simplify wsieve a bit, at the expense of some precomputations.

Turning the wheel step by step from prime to prime seems wasteful. Instead of this code
```
while not cbase == p:
 cbase += wheel[ibase]
 ibase = (ibase+1) % wsize 
j = ibase 
```
you could do directly j = spoke_index[p % 210] having precomputed a suitable spoke_index dictionary with 48 entries. The variables cbase and ibase can be eliminated.
When a wheel is involved I find it hard to resist using itertools.cycle. Instead of this
```
i = 0 
while True:
 c += wheel[i] ; i = (i+1) % wsize 
```
you could do this
```
for step in cycle(wheel):
 c += step
```
The other uses of the wheel are more difficult to replace because there is no direct way to start the cycle at an arbitrary position. However, we can precompute all 48 rotations of wheels and put them in a dictionary for easy lookup.

My version of full solution. Instead of the index j I put a cyclic iterator in mults.

from itertools import cycle
CIRCUMFERENCE = 2*3*5*7
BASE_PRIMES = (2,3,5,7)
NEXT_PRIME = 11
def wheel(start):
 result = []
 i = start
 for j in range(i + 1, i + 1 + CIRCUMFERENCE):
 if all(j % k for k in BASE_PRIMES):
 result.append(j - i)
 i = j
 return result
def rotated_wheels():
 result = {}
 i = 1
 while i < CIRCUMFERENCE:
 result[i] = wheel(i)
 i = i + result[i][0]
 return result
def primes(): 
 yield from BASE_PRIMES
 yield from wsieve()
def wsieve(wheels=rotated_wheels()): 
 yield NEXT_PRIME 
 mults = {} 
 ps = wsieve() 
 p = next(ps) 
 psq, c = p*p, p 
 cwheel = cycle(wheels[c])
 for step in cwheel:
 c += step
 if c in mults:
 (mwheel, pbase) = mults.pop(c) 
 elif c < psq: 
 yield c 
 continue 
 else: # (c==psq) 
 mwheel = cycle(wheels[p % CIRCUMFERENCE])
 pbase = p 
 p = next(ps) ; psq = p*p
 m = c
 for mstep in mwheel:
 m += pbase * mstep
 if m not in mults:
 break
 mults[m] = (mwheel, pbase)

Edited to use for instead of while.

Question 4

the start index on the wheel for a prime p should be (p-11) % 210, more or less. I thought turning the wheel is nice, conceptually, as it finds the intersection of coprimes (zipped with the wheel position) and primes; and the primes are found as set difference of coprimes and multiples of primes. it has a nice math quality about it (see the last entry here). :) also, since it's going at the sqrt pace, it shouldn't have contributed that much to the run time, I thought... cycle is exactly the kind of idea I was hoping for.

Question 5

using itertools tools may increase speed because they are working "on the C side" presumably.

Question 6

@WillNess Well if you prefer to turn the wheel, you can use cycle(enumerate(wheel)) as the iterator.

Question 7

following your advice, using itertools I was able to streamline the code some more and achieve a speedup of 11%! Thank you.

Question 8

(18% actually...)

Question 9

How long does your code take to find the first million primes (that is, all the primes below 15,485,864)?

>>> from timeit import timeit
>>> timeit(lambda:list(islice(primes(), 10**6)), number=1)
5.053490979000344

Compared to the implementations in this answer, this is faster than sieve but slower than sieve2:

>>> timeit(lambda:list(sieve(15485864)), number=1)
10.124665475999791
>>> timeit(lambda:list(sieve2(15485864)), number=1)
3.4201999040014925

Even though these implementations ought to be asympotically worse.

The trouble with implementing this kind of algorithm in Python is that the big overhead of the Python interpreter tends to overwhelm small algorithmic improvements, if those improvements mean that you end up spending more time in the slow Python bytecode and less time in the fast C implementation.

Update

There's some confusion in comments about what I mean by "asympotically worse" above. The complexity of sieving for primes below \$n\$ is $$ C n \log \log n + O(1) $$ for some constant \$C\$. Wheel sieving using the first \$k\$ primes theoretically saves a factor of $$ \prod_{i\le k}{p_i-1\over p_i},$$ so wheel sieving with 2 should halve the runtime; with 2 and 3 it should reduce the runtime to a third, and so on. This is a constant factor of improvement, so it wouldn't appear in the big-O analysis (it would get folded into the constant \$C\$). But in real life, constant factors matter too: effectively in your code you are exchanging a larger number of cheap operations for a smaller number of expensive operations. With a large-enough wheel, you ought to be able to beat the naïve algorithms. But it might have to be quite large.

Question 10

1M 5.10s-8.7MB on ideone.

Question 11

I wonder whether the unrolling could help - like shown in your linked answer's #8. It allows for constant step again. Maybe for the smaller wheel, with 30-circumference and only 8 teeth... will have to try it sometime.

Question 12

The key to the performance of sieve8 is that it does very little in the Python interpreter: by careful computation of indexes it manages to dispatch nearly all the work to NumPy, where numeric operations on arrays are cheap.

Question 13

Using itertools after the advice from Janne Karila (big thanks!), I got a whopping 1.5x speedup with the following streamlined, and dare I say, altogether canonical-looking code,

def wsieve(): # wheel-sieve, by Will Ness. ideone.com/mqO25A->0hIE89
 wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
 4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
 cs = accumulate( chain( [11], cycle( wh11)))
 yield( next( cs)) # cf. ideone.com/WFv4f
 ps = wsieve() # codereview.stackexchange.com/q/92365/9064
 p = next(ps) # 11
 psq = p*p # 121
 D = dict( zip( accumulate( chain( [0], wh11)), count(0))) # start from
 sieve = {}
 for c in cs:
 if c in sieve:
 wheel = sieve.pop(c) 
 for m in wheel: 
 if not m in sieve: 
 break
 sieve[m] = wheel # sieve[143] = wheel@187
 elif c < psq: 
 yield c 
 else: # (c==psq) 
 # map (p*) (roll wh from p) = roll (wh*p) from (p*p)
 x = [p*d for d in wh11]
 i = D[ (p-11) % 210]
 wheel = accumulate( chain( [psq], cycle( x[i:] + x[:i])))
 p = next(ps) ; psq = p*p
 next(wheel) ; m = next(wheel)
 sieve[m] = wheel

a later note: an attempt at unrolling it didn't pan out, ran at ~20% slower.

Question 14

Answer converted to community wiki.

Question 15

Is there a way I can make this yield 2,3,5, and 7 before it starts? doing so the obvious way breaks everything.

Question 16

@OscarSmith just create another generator that yields 2,3,5,7, and then calls wsieve. It's already defined in the question itself, as def primes().

Janne Karila Janne Karila 10.6k21 silver badges34 bronze badges · Accepted Answer · 2015-06-02 15:11:13Z

I thought of two things that achieve a 27 % speedup and simplify wsieve a bit, at the expense of some precomputations.

Turning the wheel step by step from prime to prime seems wasteful. Instead of this code
```
while not cbase == p:
 cbase += wheel[ibase]
 ibase = (ibase+1) % wsize 
j = ibase 
```
you could do directly j = spoke_index[p % 210] having precomputed a suitable spoke_index dictionary with 48 entries. The variables cbase and ibase can be eliminated.
When a wheel is involved I find it hard to resist using itertools.cycle. Instead of this
```
i = 0 
while True:
 c += wheel[i] ; i = (i+1) % wsize 
```
you could do this
```
for step in cycle(wheel):
 c += step
```
The other uses of the wheel are more difficult to replace because there is no direct way to start the cycle at an arbitrary position. However, we can precompute all 48 rotations of wheels and put them in a dictionary for easy lookup.

My version of full solution. Instead of the index j I put a cyclic iterator in mults.

from itertools import cycle
CIRCUMFERENCE = 2*3*5*7
BASE_PRIMES = (2,3,5,7)
NEXT_PRIME = 11
def wheel(start):
 result = []
 i = start
 for j in range(i + 1, i + 1 + CIRCUMFERENCE):
 if all(j % k for k in BASE_PRIMES):
 result.append(j - i)
 i = j
 return result
def rotated_wheels():
 result = {}
 i = 1
 while i < CIRCUMFERENCE:
 result[i] = wheel(i)
 i = i + result[i][0]
 return result
def primes(): 
 yield from BASE_PRIMES
 yield from wsieve()
def wsieve(wheels=rotated_wheels()): 
 yield NEXT_PRIME 
 mults = {} 
 ps = wsieve() 
 p = next(ps) 
 psq, c = p*p, p 
 cwheel = cycle(wheels[c])
 for step in cwheel:
 c += step
 if c in mults:
 (mwheel, pbase) = mults.pop(c) 
 elif c < psq: 
 yield c 
 continue 
 else: # (c==psq) 
 mwheel = cycle(wheels[p % CIRCUMFERENCE])
 pbase = p 
 p = next(ps) ; psq = p*p
 m = c
 for mstep in mwheel:
 m += pbase * mstep
 if m not in mults:
 break
 mults[m] = (mwheel, pbase)

Edited to use for instead of while.

the start index on the wheel for a prime p should be (p-11) % 210, more or less. I thought turning the wheel is nice, conceptually, as it finds the intersection of coprimes (zipped with the wheel position) and primes; and the primes are found as set difference of coprimes and multiples of primes. it has a nice math quality about it (see the last entry here). :) also, since it's going at the sqrt pace, it shouldn't have contributed that much to the run time, I thought... cycle is exactly the kind of idea I was hoping for.
using itertools tools may increase speed because they are working "on the C side" presumably.
@WillNess Well if you prefer to turn the wheel, you can use cycle(enumerate(wheel)) as the iterator.
following your advice, using itertools I was able to streamline the code some more and achieve a speedup of 11%! Thank you.

Stack Exchange Network

Wheel based unbounded sieve of Eratosthenes in Python

3 Answers 3

Update

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Wheel based unbounded sieve of Eratosthenes in Python

3 Answers 3

Update

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions