Fast Prime Sieve (Python Implementation)

Question 1

A modified version of the prime sieve. I actually doubt it could be implemented any faster, but I might be wrong:

from math import sqrt
def sieve(n):
 """
 * A fast implementation of the prime sieve algorithm. Finds all prime numbers up to a given upper bound.
 * Params:
 * n: The upper bound.
 * Return:
 * A list of all prime numbers in [0, n].
 * Implementation:
 * Implements the sieve of Erasthotenes with a few optimisations for speed.
 * We only delete multiples of numbers up to `int(sqrt(n))`. This suffices to find all prime numbers.
 * We manipulate a list of boolean values, indexed by the number itself with the values denoting its primality. This is much faster than deleting elements in the list.
 """
 if n < 2: return []
 prime = [True]*(n + 1) #Initialise the list.
 rt = int(sqrt(n))+1
 for x in range(2, rt): #We only need to seive multiples of numbers <= to `n`.
 if prime[x]:
 for c in range(x*x, n + 1, x): prime[c] = False
 return [idx for idx, p in enumerate(prime) if p][2:]

Question 2

Please always include a language tag with your question. With Python it's also best to include one of python-3.x or python-2.x if your question is targeting that environment.

Question 3

Since this is tagged beginner, you should know that speed isn't everything. First of all you should care about making your code readable as ultimately that is more important.

You should always put new statements on new lines. At first I ignored your first guard statement, due to this. I only caught on that you'd done this when I saw the second from last line was huge.
Your comments aren't great. Yes [True]*(n + 1) initializes a list, we can read the code to understand that.
Your variable names don't really help reading the code.

def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so `sieve` is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 for composite in range(base*2, limit, base):
 primes[composite] = False
 return [num for num, is_prime in enumerate(primes) if is_prime][2:]

I'd like to point out that your memory usage can be improved. How many \$O(n)\$ lists do you have?

primes to perform the sieve.
Your list comprehension, that gets all the indexes.
Your duplication of the list comprehension to remove two values.

You can remove the second by returning an iterator. As for the third, you can instead perform an islice, or setting 0 and 1 to false. Yes this is now 10% slower.

However if you change the second for loop to run in C land rather than Python land you net a 40% speed-up from the original.

from math import ceil
def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
 primes[0] = primes[1] = False
 return (num for num, is_prime in enumerate(primes) if is_prime)

From this changing the last line to use itertools.compress rather than an iterator comprehension reduces time an additional 40% (nearly 70% from the original).

from math import ceil
from itertools import compress
def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
 primes[0] = primes[1] = False
 return compress(range(limit), primes)

Question 4

Thanks for the suggestions, I am not so sure why you removed sqrt() though? Wouldn't it be faster than n**0.5? Also, I don't understand what this line of code does: primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2). Could you please explain?

Question 5

@TobiAlafin Preference and speed. timeit("sqrt(10000)", "from math import sqrt", number=100000000) -> 9.42233170700274, timeit("10000 ** 0.5", number=100000000) -> 0.8613513769996644. It's called vectorization, in short Python slow, C fast, do everything in C.

Question 6

Nice idea with the slice assignment. I thought doing primes[base*base:limit:base] = [False]*((((limit - base*base) - 1) // base) + 1) would be even faster, since the slice is a bit smaller, but it seems to be exactly the same speed...

Question 7

@Graipher Oh, I guess late last night I didn't notice the difference between n*n and n*2, oops. That's interesting, wouldn't have thought changing a couple hundred/thousand values at a time would run that quickly. Did you test with larger limits? Mine are at 10000.

Question 8

@Peilonrayz: Here's the plot: i.sstatic.net/tb2Pe.png

Peilonrayz ♦Peilonrayz 44.4k7 gold badges80 silver badges157 bronze badges · Accepted Answer · 2019-02-14 02:22:51Z

Since this is tagged beginner, you should know that speed isn't everything. First of all you should care about making your code readable as ultimately that is more important.

You should always put new statements on new lines. At first I ignored your first guard statement, due to this. I only caught on that you'd done this when I saw the second from last line was huge.
Your comments aren't great. Yes [True]*(n + 1) initializes a list, we can read the code to understand that.
Your variable names don't really help reading the code.

def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so `sieve` is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 for composite in range(base*2, limit, base):
 primes[composite] = False
 return [num for num, is_prime in enumerate(primes) if is_prime][2:]

I'd like to point out that your memory usage can be improved. How many \$O(n)\$ lists do you have?

primes to perform the sieve.
Your list comprehension, that gets all the indexes.
Your duplication of the list comprehension to remove two values.

You can remove the second by returning an iterator. As for the third, you can instead perform an islice, or setting 0 and 1 to false. Yes this is now 10% slower.

However if you change the second for loop to run in C land rather than Python land you net a 40% speed-up from the original.

from math import ceil
def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
 primes[0] = primes[1] = False
 return (num for num, is_prime in enumerate(primes) if is_prime)

From this changing the last line to use itertools.compress rather than an iterator comprehension reduces time an additional 40% (nearly 70% from the original).

from math import ceil
from itertools import compress
def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
 primes[0] = primes[1] = False
 return compress(range(limit), primes)

Thanks for the suggestions, I am not so sure why you removed sqrt() though? Wouldn't it be faster than n**0.5? Also, I don't understand what this line of code does: primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2). Could you please explain?
@TobiAlafin Preference and speed. timeit("sqrt(10000)", "from math import sqrt", number=100000000) -> 9.42233170700274, timeit("10000 ** 0.5", number=100000000) -> 0.8613513769996644. It's called vectorization, in short Python slow, C fast, do everything in C.
Nice idea with the slice assignment. I thought doing primes[base*base:limit:base] = [False]*((((limit - base*base) - 1) // base) + 1) would be even faster, since the slice is a bit smaller, but it seems to be exactly the same speed...
@Graipher Oh, I guess late last night I didn't notice the difference between n*n and n*2, oops. That's interesting, wouldn't have thought changing a couple hundred/thousand values at a time would run that quickly. Did you test with larger limits? Mine are at 10000.

Stack Exchange Network

Fast Prime Sieve (Python Implementation)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Fast Prime Sieve (Python Implementation)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions