Find primes using Sieve of Eratosthenes with Python

Question 1

I was Googling for a 'correct' or 'efficient' implementation of the Sieve of Eratosthenes. I found them all more complicated than what I came up with. I am by no means a Python expert, so I was wondering if something is wrong with my code.

def get_primes(max_int):
 numbers = range(2, max_int) 
 while len(numbers) > 0:
 numbers = [ num for num in numbers if num % numbers[0] != 0 or num == numbers[0] ] # only keep numbers that are NOT divisible by the prime 
 yield numbers[0] # yield the prime
 numbers = numbers[1:] # make a new numbers-array, the first num is a new prime

Call it with

primes = get_primes(1000)

And it gives you a generator of primes up until 1000.

Question 2

Your code looks correct to me, it does produce prime numbers in the given range.

PEP8 Online reports some code style violations with respect to whitespace and line length, but it is not too bad.

But: This is not the Sieve of Eratosthenes. The Sieve of Eratosthenes keeps a (single) list of all candidate numbers, and computes multiples of each found prime to mark subsequent composite numbers in the list. Your algorithm computes the remainder of all remaining candidates instead, and creates new lists in each step.

As a simple benchmark I ran

print(sum(p for p in get_primes(100000)))

with your code, this takes approximately 5 seconds on my MacBook.

It can be improved slightly by filtering the list only once in each step and not twice:

def get_primes(max_int):
 numbers = range(2, max_int)
 while len(numbers) > 0:
 yield numbers[0]
 numbers = [num for num in numbers if num % numbers[0] != 0]

This reduces the time to 4.6 seconds.

But any "real" Eratosthenes sieve is much faster. As an example, with this one from Rosetta code

def primes_upto(limit):
 is_prime = [False] * 2 + [True] * (limit - 1) 
 for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
 if is_prime[n]:
 for i in range(n*n, limit+1, n):
 is_prime[i] = False
 return [i for i, prime in enumerate(is_prime) if prime]

the above benchmark runs in 0.08 seconds.

Question 3

It seems I did not understand The Sieve enough, your answer helped me clarify. I've coded the implementation correctly now and it is indeed way faster.

Question 4

If you want to speed it up further, you can use slice assignment instead of the inner "for" loop:

def primes_slice(limit):
 is_prime = [False] * 2 + [True] * (limit - 1)
 for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
 if is_prime[n]:
 is_prime[n*n::n] = [False] * ((limit - n*n)/n + 1)
 return [i for i, prime in enumerate(is_prime) if prime]

That lets C code do the former inner loop for you under the covers, at the expense of some complicated stuff on the right side of the slice assignment. Even that can go away if you use the bitarray package for is_prime, which also reduces the memory footprint considerably. This tweak gives me about a 2x speed boost with n = 10,000,000

$ time ./prime_rosetta_slice.py 1e7
664579
real 0m1.134s
user 0m0.797s
sys 0m0.344s
$ time ./prime_rosetta_upto.py 1e7
664579
real 0m2.242s
user 0m1.875s
sys 0m0.281s

For the next performance improvement, try keeping only odd numbers> 2 in is_prime and then manually returning 2 as a special case.

Martin R Martin R 24.2k2 gold badges37 silver badges95 bronze badges · Accepted Answer · 2018-05-19 14:48:15Z

Your code looks correct to me, it does produce prime numbers in the given range.

PEP8 Online reports some code style violations with respect to whitespace and line length, but it is not too bad.

But: This is not the Sieve of Eratosthenes. The Sieve of Eratosthenes keeps a (single) list of all candidate numbers, and computes multiples of each found prime to mark subsequent composite numbers in the list. Your algorithm computes the remainder of all remaining candidates instead, and creates new lists in each step.

As a simple benchmark I ran

print(sum(p for p in get_primes(100000)))

with your code, this takes approximately 5 seconds on my MacBook.

It can be improved slightly by filtering the list only once in each step and not twice:

def get_primes(max_int):
 numbers = range(2, max_int)
 while len(numbers) > 0:
 yield numbers[0]
 numbers = [num for num in numbers if num % numbers[0] != 0]

This reduces the time to 4.6 seconds.

But any "real" Eratosthenes sieve is much faster. As an example, with this one from Rosetta code

def primes_upto(limit):
 is_prime = [False] * 2 + [True] * (limit - 1) 
 for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
 if is_prime[n]:
 for i in range(n*n, limit+1, n):
 is_prime[i] = False
 return [i for i, prime in enumerate(is_prime) if prime]

the above benchmark runs in 0.08 seconds.

It seems I did not understand The Sieve enough, your answer helped me clarify. I've coded the implementation correctly now and it is indeed way faster.

Stack Exchange Network

Find primes using Sieve of Eratosthenes with Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Find primes using Sieve of Eratosthenes with Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions