2
\$\begingroup\$

A modified version of the prime sieve. I actually doubt it could be implemented any faster, but I might be wrong:

from math import sqrt
def sieve(n):
 """
 * A fast implementation of the prime sieve algorithm. Finds all prime numbers up to a given upper bound.
 * Params:
 * n: The upper bound.
 * Return:
 * A list of all prime numbers in [0, n].
 * Implementation:
 * Implements the sieve of Erasthotenes with a few optimisations for speed.
 * We only delete multiples of numbers up to `int(sqrt(n))`. This suffices to find all prime numbers.
 * We manipulate a list of boolean values, indexed by the number itself with the values denoting its primality. This is much faster than deleting elements in the list.
 """
 if n < 2: return []
 prime = [True]*(n + 1) #Initialise the list.
 rt = int(sqrt(n))+1
 for x in range(2, rt): #We only need to seive multiples of numbers <= to `n`.
 if prime[x]:
 for c in range(x*x, n + 1, x): prime[c] = False
 return [idx for idx, p in enumerate(prime) if p][2:]
Peilonrayz
44.4k7 gold badges80 silver badges157 bronze badges
asked Feb 14, 2019 at 1:28
\$\endgroup\$
1
  • \$\begingroup\$ Please always include a language tag with your question. With Python it's also best to include one of python-3.x or python-2.x if your question is targeting that environment. \$\endgroup\$ Commented Feb 14, 2019 at 1:37

1 Answer 1

5
\$\begingroup\$

Since this is tagged , you should know that speed isn't everything. First of all you should care about making your code readable as ultimately that is more important.

  • You should always put new statements on new lines. At first I ignored your first guard statement, due to this. I only caught on that you'd done this when I saw the second from last line was huge.

  • Your comments aren't great. Yes [True]*(n + 1) initializes a list, we can read the code to understand that.

  • Your variable names don't really help reading the code.

def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so `sieve` is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 for composite in range(base*2, limit, base):
 primes[composite] = False
 return [num for num, is_prime in enumerate(primes) if is_prime][2:]

I'd like to point out that your memory usage can be improved. How many \$O(n)\$ lists do you have?

  1. primes to perform the sieve.
  2. Your list comprehension, that gets all the indexes.
  3. Your duplication of the list comprehension to remove two values.

You can remove the second by returning an iterator. As for the third, you can instead perform an islice, or setting 0 and 1 to false. Yes this is now 10% slower.

However if you change the second for loop to run in C land rather than Python land you net a 40% speed-up from the original.

from math import ceil
def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
 primes[0] = primes[1] = False
 return (num for num, is_prime in enumerate(primes) if is_prime)

From this changing the last line to use itertools.compress rather than an iterator comprehension reduces time an additional 40% (nearly 70% from the original).

from math import ceil
from itertools import compress
def sieve(limit):
 if limit < 2:
 return []
 limit += 1 # Preincrement `limit` so sieve is inclusive, unlike `range`.
 primes = [True]*limit
 for base in range(2, int(limit**0.5 + 1)):
 if primes[base]:
 primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2)
 primes[0] = primes[1] = False
 return compress(range(limit), primes)
answered Feb 14, 2019 at 2:22
\$\endgroup\$
10
  • \$\begingroup\$ Thanks for the suggestions, I am not so sure why you removed sqrt() though? Wouldn't it be faster than n**0.5? Also, I don't understand what this line of code does: primes[base*2:limit:base] = [False]*(ceil(limit / base) - 2). Could you please explain? \$\endgroup\$ Commented Feb 14, 2019 at 7:53
  • \$\begingroup\$ @TobiAlafin Preference and speed. timeit("sqrt(10000)", "from math import sqrt", number=100000000) -> 9.42233170700274, timeit("10000 ** 0.5", number=100000000) -> 0.8613513769996644. It's called vectorization, in short Python slow, C fast, do everything in C. \$\endgroup\$ Commented Feb 14, 2019 at 9:00
  • \$\begingroup\$ Nice idea with the slice assignment. I thought doing primes[base*base:limit:base] = [False]*((((limit - base*base) - 1) // base) + 1) would be even faster, since the slice is a bit smaller, but it seems to be exactly the same speed... \$\endgroup\$ Commented Feb 14, 2019 at 9:43
  • \$\begingroup\$ @Graipher Oh, I guess late last night I didn't notice the difference between n*n and n*2, oops. That's interesting, wouldn't have thought changing a couple hundred/thousand values at a time would run that quickly. Did you test with larger limits? Mine are at 10000. \$\endgroup\$ Commented Feb 14, 2019 at 9:54
  • 1
    \$\begingroup\$ @Peilonrayz: Here's the plot: i.sstatic.net/tb2Pe.png \$\endgroup\$ Commented Feb 14, 2019 at 10:57

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.