I wanted to test my performance optimizing skills, and so wanted to find how fast I could get the first \$n\$ prime numbers. I limited myself to only the standard library, as I'm sure numpy
or another library written in C has a prime generator way faster than Python ever will be - and offloading to a library doesn't really improve my ability to improve performance.
I've implemented both the Sieve of Eratosthenes and the Sieve of Sundaram. The SoE was based off my answer here, and the SoS was based off Wikipedias definition. These are available at the end of the answer.
I improved performance by:
SoE: Vectorizing the creation of primes.
primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
SoE: Change the start of the slice from \2ドルb\$ to \$b^2\$. [1]
primes[base*base:limit:base] = [False]*((((limit - base*base) - 1) // base) + 1)
SoE: Simplify the calculations - addition seems to be faster than multiplication.
primes[base * base::base] = [False] * ((limit - 1) // base - base + 1)
SoE: Use
itertools.compress
, rather than a comprehension.SoS: Vectorize the inner loop.
start = 1 + 3*j step = 1 + 3*j primes[start::step] = [False] * ceil((n - start) / step)
SoS: Vectorize the creation of values that only have one value in the sequence.
When \$\frac{n - \text{start}}{\text{stop}} = \frac{n - (1 + 3j)}{1 + 2j} \le 1\$ is equivalent to \$n \le 2 + 5j\$ we know we can stop at \$j = \frac{n - 2}{5}\$.
multi_stop = (n - 2) // 5 for j in range(1, multi_stop): start = 1 + 3*j step = 1 + 2*j primes[start::step] = [False] * ceil((n - start) / step) if multi_stop >= 1: single_start = multi_stop * 3 + 1 primes[single_start::3] = [False] * ceil((n - single_start) / 3)
- SoS: It doesn't seem like you need the
if
created above, and so you can just save wasted cycles.
I tried defining false = [False]*limit
and slice it, but found it to be slower than creating new lists in the loop.
This got the following prime sieves:
from math import ceil
from itertools import compress
def sieve_eratosthenes(limit):
if limit <= 1:
return []
primes = [True] * limit
for base in range(2, int(limit**0.5 + 1)):
if primes[base]:
primes[base * base::base] = [False] * ((limit - 1) // base - base + 1)
primes[0] = primes[1] = False
return list(compress(range(limit), primes))
def sieve_sundaram(limit):
if limit <= 1:
return []
n = (limit - 1) // 2
primes = [True] * n
for j in range(1, (n - 2) // 5):
start = 1 + 3*j
step = 1 + 2*j
primes[start::step] = [False] * ceil((n - start) / step)
return [2] + [2*i + 1 for i, p in enumerate(primes) if p][1:]
Both are faster than the both the original functions.
enter image description here enter image description here
Code to generate graphs:
from math import ceil
from itertools import compress
import numpy as np
import matplotlib.pyplot as plt
from graphtimer import Plotter, MultiTimer
def sieve_eratosthenes_orig(limit):
if limit <= 1:
return []
primes = [True] * limit
for base in range(2, int(limit**0.5 + 1)):
if primes[base]:
for composite in range(base * 2, limit, base):
primes[composite] = False
return [num for num, is_prime in enumerate(primes) if is_prime][2:]
def sieve_eratosthenes(limit):
if limit <= 1:
return []
primes = [True] * limit
for base in range(2, int(limit**0.5 + 1)):
if primes[base]:
primes[base * base::base] = [False] * ((limit - 1) // base - base + 1)
primes[0] = primes[1] = False
return list(compress(range(limit), primes))
def sieve_sundaram_orig(limit):
if limit <= 1:
return []
n = (limit - 1) // 2
primes = [True] * n
for j in range(1, n):
for i in range(1, j + 1):
value = i + j + 2*i*j
if value < n:
primes[value] = False
return [2] + [2*i + 1 for i, p in enumerate(primes) if p][1:]
def sieve_sundaram(limit):
if limit <= 1:
return []
n = (limit - 1) // 2
primes = [True] * n
for j in range(1, (n - 2) // 5):
start = 1 + 3*j
step = 1 + 2*j
primes[start::step] = [False] * ceil((n - start) / step)
return [2] + [2*i + 1 for i, p in enumerate(primes) if p][1:]
def sieve_test(limit):
if limit <= 1:
return []
n = (limit - 1) // 2
primes = [True] * n
multi_stop = (n - 2) // 5
for j in range(1, multi_stop):
start = 1 + 3*j
step = 1 + 2*j
primes[start::step] = [False] * ceil((n - start) / step)
return [2] + [2*i + 1 for i, p in enumerate(primes) if p][1:]
def test():
for exp in range(6):
limit = 10 ** exp
assert sieve_test(limit) == sieve_eratosthenes(limit)
def main():
fig, axs = plt.subplots()
axs.set_yscale('log')
axs.set_xscale('log')
(
Plotter(MultiTimer([
sieve_eratosthenes_orig,
sieve_eratosthenes,
sieve_sundaram,
sieve_sundaram_orig,
# sieve_test,
]))
.repeat(5, 5, np.logspace(0.35, 2), args_conv=int)
.min()
.plot(axs, x_label='limit')
)
fig.show()
if __name__ == '__main__':
test()
main()
To use the above code snippet you need to install numpy, matplotlib and graphtimer. All should be available via pypi.
Can they be made faster, or is a different sieve faster?
1 Answer 1
def sieve_eratosthenes(limit): if limit <= 1: return [] primes = [True] * limit for base in range(2, int(limit**0.5 + 1)): if primes[base]: primes[base * base::base] = [False] * ((limit - 1) // base - base + 1) primes[0] = primes[1] = False return list(compress(range(limit), primes))
No attempt at all to use a wheel? I get roughly a 25% speedup just by special-casing the prime 2 with:
def sieve_eratosthenes_wheel(limit):
if limit <= 1:
return []
primes = [True] * limit
if limit > 4:
primes[4::2] = [False] * ((limit - 1) // 2 - 2 + 1)
for base in range(3, int(limit**0.5 + 1), 2):
if primes[base]:
# We require off + (len-1)*step < limit <= off + len*step
# So len = ceil((limit - off) / step)
primes[base*base::2*base] = [False] * ((limit - base*base + 2*base - 1) // (2*base))
primes[0] = primes[1] = False
return list(compress(range(limit), primes))
Using primes 2 and 3 it's possible to do two range updates with step sizes of 6*base
, but it gets more complicated to calculate the initial offsets, which depend on base % 6
:
def sieve_eratosthenes_wheel3(limit):
if limit <= 1:
return []
primes = [True] * limit
def mark_composite(off, step):
# We require off + (len-1)*step < limit <= off + len*step
# So len = ceil((limit - off) / step)
primes[off::step] = [False] * ((limit - off + step - 1) // step)
mark_composite(4, 2)
mark_composite(9, 6)
base = 5
max_base = int(limit**0.5)
while base <= max_base:
# base == 5 (mod 6)
if primes[base]:
mark_composite(base*base, 6*base)
mark_composite(base*(base+2), 6*base)
base += 2
# base == 1 (mod 6)
if primes[base]:
mark_composite(base*base, 6*base)
mark_composite(base*(base+4), 6*base)
base += 4
primes[0] = primes[1] = False
return list(compress(range(limit), primes))
For limit
50 million, taking sieve_eratosthenes
as the baseline of 100 time units, I measure sieve_eratosthenes_wheel
at about 73 time units and sieve_eratosthenes_wheel3
at about 63 time units.
-
\$\begingroup\$ Could you explain what "a wheel" is? I know car wheels, but this meaning is definitely new to me :) \$\endgroup\$2019年08月28日 16:43:34 +00:00Commented Aug 28, 2019 at 16:43
-
\$\begingroup\$ @Peilonrayz en.wikipedia.org/wiki/Wheel_factorization \$\endgroup\$Rick Davin– Rick Davin2019年08月28日 17:42:54 +00:00Commented Aug 28, 2019 at 17:42
Explore related questions
See similar questions with these tags.
xrange = range
and ensuring no IndexErrors I get the following graph. Which makes it faster than SoS at 1000 and always slower than SoE. \$\endgroup\$