A modified version of the prime sieve. I actually doubt it could be implemented any faster, but I might be wrong:
from math import sqrt
def sieve(n):
"""
* A fast implementation of the prime sieve algorithm. Finds all prime numbers up to a given upper bound.
* Params:
* n: The upper bound.
* Return:
* A list of all prime numbers in [0, n].
* Implementation:
* Implements the sieve of Erasthotenes with a few optimisations for speed.
* We only delete multiples of numbers up to `int(sqrt(n))`. This suffices to find all prime numbers.
* We manipulate a list of boolean values, indexed by the number itself with the values denoting its primality. This is much faster than deleting elements in the list.
"""
if n < 2: return []
prime = [True]*(n + 1) #Initialise the list.
rt = int(sqrt(n))+1
for x in range(2, rt): #We only need to seive multiples of numbers <= to `n`.
if prime[x]:
for c in range(x*x, n + 1, x): prime[c] = False
return [idx for idx, p in enumerate(prime) if p][2:]
-
\$\begingroup\$ Please always include a language tag with your question. With Python it's also best to include one of python-3.x or python-2.x if your question is targeting that environment. \$\endgroup\$Peilonrayz– Peilonrayz ♦2019年02月14日 01:37:04 +00:00Commented Feb 14, 2019 at 1:37
1 Answer 1
Since this is tagged beginner, you should know that speed isn't everything. First of all you should care about making your code readable as ultimately that is more important.
You should always put new statements on new lines. At first I ignored your first guard statement, due to this. I only caught on that you'd done this when I saw the second from last line was huge.
Your comments aren't great. Yes
[True]*(n + 1)
initializes a list, we can read the code to understand that.Your variable names don't really help reading the code.
def sieve(limit):
if limit < 2:
return []
limit += 1 # Preincrement `limit` so `sieve` is inclusive, unlike `range`.
primes = [True]*limit
for base in range(2, int(limit**0.5 + 1)):
if primes[base]:
for composite in range(base*2, limit, base):
primes[composite] = False
return [num for num, is_prime in enumerate(primes) if is_prime][2:]
I'd like to point out that your memory usage can be improved. How many \$O(n)\$ lists do you have?
primes
to perform the sieve.- Your list comprehension, that gets all the indexes.
- Your duplication of the list comprehension to remove two values.
You can remove the second by returning an iterator. As for the third, you can instead perform an islice
, or setting 0 and 1 to false. Yes this is now 10% slower.
However if you change the second for loop to run in C land rather than Python land you net a 40% speed-up from the original.
from math import ceil
def sieve(limit):
if limit < 2:
return []
limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
primes = [True]*limit
for base in range(2, int(limit**0.5 + 1)):
if primes[base]:
primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
primes[0] = primes[1] = False
return (num for num, is_prime in enumerate(primes) if is_prime)
From this changing the last line to use itertools.compress
rather than an iterator comprehension reduces time an additional 40% (nearly 70% from the original).
from math import ceil
from itertools import compress
def sieve(limit):
if limit < 2:
return []
limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
primes = [True]*limit
for base in range(2, int(limit**0.5 + 1)):
if primes[base]:
primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
primes[0] = primes[1] = False
return compress(range(limit), primes)
-
\$\begingroup\$ Thanks for the suggestions, I am not so sure why you removed
sqrt()
though? Wouldn't it be faster thann**0.5
? Also, I don't understand what this line of code does:primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
. Could you please explain? \$\endgroup\$Tobi Alafin– Tobi Alafin2019年02月14日 07:53:35 +00:00Commented Feb 14, 2019 at 7:53 -
\$\begingroup\$ @TobiAlafin Preference and speed.
timeit("sqrt(10000)", "from math import sqrt", number=100000000)
-> 9.42233170700274,timeit("10000 ** 0.5", number=100000000)
-> 0.8613513769996644. It's called vectorization, in short Python slow, C fast, do everything in C. \$\endgroup\$2019年02月14日 09:00:00 +00:00Commented Feb 14, 2019 at 9:00 -
\$\begingroup\$ Nice idea with the slice assignment. I thought doing
primes[base*base:limit:base] = [False]*((((limit - base*base) - 1) // base) + 1)
would be even faster, since the slice is a bit smaller, but it seems to be exactly the same speed... \$\endgroup\$Graipher– Graipher2019年02月14日 09:43:07 +00:00Commented Feb 14, 2019 at 9:43 -
\$\begingroup\$ @Graipher Oh, I guess late last night I didn't notice the difference between
n*n
andn*2
, oops. That's interesting, wouldn't have thought changing a couple hundred/thousand values at a time would run that quickly. Did you test with larger limits? Mine are at 10000. \$\endgroup\$2019年02月14日 09:54:31 +00:00Commented Feb 14, 2019 at 9:54 -
1\$\begingroup\$ @Peilonrayz: Here's the plot: i.sstatic.net/tb2Pe.png \$\endgroup\$Graipher– Graipher2019年02月14日 10:57:51 +00:00Commented Feb 14, 2019 at 10:57
Explore related questions
See similar questions with these tags.