Numpy segmented Sieve Of Eratosthenes

Question 1

For the segmented Sieve Of Eratosthenes, I have devised the following code provided. And I understand that it is not the best implementation; However, it should be much quicker than what it is now. The c Profile provides numerous results. However, it only defines built-in methods and gives no further context as to which built-in method is taking up the most time. I wish to optimize this sieve to the maximum possible performance capable through python.

Which numpy built-in method is making my program slower?

import numpy as np
import math
import cProfile
def generateprimes(n: int = 0) -> list:#simple sieve of Eratosthenes to find primes that make up
 #all factors of the upper bound.
 Primelist = np.full((1,int(math.sqrt(n) )),1, dtype='int8') 
 Primelist[0,0], Primelist[0,1] = 0,0
 for i in range(2,int(math.sqrt(math.sqrt(n))) + 1):
 if Primelist[0,i] == 1:
 for x in range (i*i,int(math.sqrt(n) ),i):
 Primelist[0,x] = 0
 indices = np.where(Primelist == 1)[1]
 return indices
 #unique, counts = np.unique(Primelist, return_counts=True)
 #dict(zip(unique, counts))
 #return [b for b in range(int(math.sqrt(n))) if Primelist[0,b] == 1]
def SegmentedSieve(R: int, L:int, Primes: tuple) -> tuple:
 Finalprime = np.empty((0,1), dtype = 'int32')
 limit = (R//L) # Total amount of segments
 Finalprime = np.append(Finalprime, Primes)
 for x in range(1, limit):
 Low = (L * x)
 High = Low + L
 Segment = np.full((1,High-Low),1,dtype='int8') #creates a segment that is the size of the differences between L and R
 NewPrimes = np.extract(Primes <= int(math.sqrt(High)),Primes) # Only taking the primes that are less than the square root of the upperbound. 
 #i.e the limit of the current segment
 for p in NewPrimes:
 s = ((int(math.ceil(Low / p))) * p) % Low # Finds the position of the first occurrence of the prime in the 
 # newest lowest bound or Low
 for i in range(s,L,p): # starts at the position found and skips every p distance. The upperbound is the 
 #amount of elements in each segment.
 Segment[0, i] = 0
 indices = np.where(Segment == 1)[1] + Low
 Finalprime = np.append(Finalprime, indices)
 continue
 return Finalprime
def main():
 R = (10 ** 7)
 L = int(math.sqrt(R))
 SegmentedSieve(R, L, generateprimes(R))
if __name__ == '__main__':
 cProfile.run('main()')

Run results

Question 2

Performance

`numpy.append`

The first thing I noticed is the numpy.append method with a cumulative time of 1.131s. For each of the 3162 times it is called a new array is created and the content of the existing array is copied. This can be improved by using a (builtin) list instead of a numpy array. Appending to a list is less expensive because lists are specifically designed to be appended to.

In order to still return a numpy array, the list of arrays Finalprime has to be converted to an array using the numpy.hstack method.

This change reduced the total runtime from 4.5s to 2.8s.

Python `for` loop

Next I used the line based profile "Scalene" to find out which code lines use up a lot of time.

This lead me to the following loop:

for i in range(s,L,p): # starts at the position found and skips every p distance. The upperbound is the 
 #amount of elements in each segment.
 Segment[0, i] = 0

All this loop does setting every pth element to zero. The same can be achieved using the arguments of the range function as a numpy slice instead:

Segment[0, s:L:p] = 0

This improvement further reduced the runtime from 2.8s to 1.0s

General

There are a number things that are not performance related but are a bit odd:

the continue keyword at the end of the main loop of SegmentedSieve is executed every time at the end of the loop and is therefor useless.
Your code does not follow the standard python formatting conventions:
- functions and variables should be named using lower_snake_case instead of PascalCase
- after a comma there should usually be a space
- before and after a top level definition there should be two blank lines
Some of the type annotations are incorrect. generateprimes does not return a list but a numpy array. The Primes argument of the SegmentedSieve method is not a tuple but a numpy array. The same is true for the return value of said method.
You are using 2D arrays for Segment and Primelist but the first dimension of these arrays has size 1 and it is never really used. This can be simplified to 1D arrays.

Changed Code

This is the code with the performance and general suggestions applied to it:

import numpy as np
from numpy.typing import NDArray
import math
import cProfile
def generateprimes(n: int = 0) -> NDArray[np.intp]:
 """
 simple sieve of Eratosthenes to find primes that make up
 all factors of the upper bound.
 """
 upper_bound = int(math.sqrt(n))
 primes = np.full(upper_bound, 1, dtype=np.int8)
 primes[0], primes[1] = 0, 0
 for i in range(2, int(math.sqrt(math.sqrt(n))) + 1):
 if primes[i] == 1:
 for x in range(i**2, upper_bound, i):
 primes[x] = 0
 indices = np.where(primes == 1)[0]
 return indices
def segmented_sieve(r: int, l: int, primes: NDArray[np.intp]) -> NDArray[np.intp]:
 finalprime = []
 limit = r // l # Total amount of segments
 finalprime.append(primes)
 for x in range(1, limit):
 low = l * x
 high = low + l
 # creates a segment that is the size of the differences between L and R
 segment = np.full(high - low, 1, dtype=np.int8)
 # Only taking the primes that are less than the square root of the upperbound.
 # i.e the limit of the current segment
 new_primes = np.extract(primes <= int(math.sqrt(high)), primes)
 for p in new_primes:
 # Finds the position of the first occurrence of the prime in the
 # newest lowest bound or Low
 s = (math.ceil(low / p) * p) % low
 # starts at the position found and skips every p distance. The upperbound is the
 # amount of elements in each segment.
 segment[s:l:p] = 0
 indices = np.where(segment == 1)[0] + low
 finalprime.append(indices)
 return np.hstack(finalprime)
def main():
 r = 10**7
 l = int(math.sqrt(r))
 segmented_sieve(r, l, generateprimes(r))
if __name__ == "__main__":
 cProfile.run("main()")

M472 M472M472 3861 silver badge5 bronze badges · Accepted Answer · 2022-11-28 21:41:08Z

Performance

`numpy.append`

The first thing I noticed is the numpy.append method with a cumulative time of 1.131s. For each of the 3162 times it is called a new array is created and the content of the existing array is copied. This can be improved by using a (builtin) list instead of a numpy array. Appending to a list is less expensive because lists are specifically designed to be appended to.

In order to still return a numpy array, the list of arrays Finalprime has to be converted to an array using the numpy.hstack method.

This change reduced the total runtime from 4.5s to 2.8s.

Python `for` loop

Next I used the line based profile "Scalene" to find out which code lines use up a lot of time.

This lead me to the following loop:

for i in range(s,L,p): # starts at the position found and skips every p distance. The upperbound is the 
 #amount of elements in each segment.
 Segment[0, i] = 0

All this loop does setting every pth element to zero. The same can be achieved using the arguments of the range function as a numpy slice instead:

Segment[0, s:L:p] = 0

This improvement further reduced the runtime from 2.8s to 1.0s

General

There are a number things that are not performance related but are a bit odd:

the continue keyword at the end of the main loop of SegmentedSieve is executed every time at the end of the loop and is therefor useless.
Your code does not follow the standard python formatting conventions:
- functions and variables should be named using lower_snake_case instead of PascalCase
- after a comma there should usually be a space
- before and after a top level definition there should be two blank lines
Some of the type annotations are incorrect. generateprimes does not return a list but a numpy array. The Primes argument of the SegmentedSieve method is not a tuple but a numpy array. The same is true for the return value of said method.
You are using 2D arrays for Segment and Primelist but the first dimension of these arrays has size 1 and it is never really used. This can be simplified to 1D arrays.

Changed Code

This is the code with the performance and general suggestions applied to it:

import numpy as np
from numpy.typing import NDArray
import math
import cProfile
def generateprimes(n: int = 0) -> NDArray[np.intp]:
 """
 simple sieve of Eratosthenes to find primes that make up
 all factors of the upper bound.
 """
 upper_bound = int(math.sqrt(n))
 primes = np.full(upper_bound, 1, dtype=np.int8)
 primes[0], primes[1] = 0, 0
 for i in range(2, int(math.sqrt(math.sqrt(n))) + 1):
 if primes[i] == 1:
 for x in range(i**2, upper_bound, i):
 primes[x] = 0
 indices = np.where(primes == 1)[0]
 return indices
def segmented_sieve(r: int, l: int, primes: NDArray[np.intp]) -> NDArray[np.intp]:
 finalprime = []
 limit = r // l # Total amount of segments
 finalprime.append(primes)
 for x in range(1, limit):
 low = l * x
 high = low + l
 # creates a segment that is the size of the differences between L and R
 segment = np.full(high - low, 1, dtype=np.int8)
 # Only taking the primes that are less than the square root of the upperbound.
 # i.e the limit of the current segment
 new_primes = np.extract(primes <= int(math.sqrt(high)), primes)
 for p in new_primes:
 # Finds the position of the first occurrence of the prime in the
 # newest lowest bound or Low
 s = (math.ceil(low / p) * p) % low
 # starts at the position found and skips every p distance. The upperbound is the
 # amount of elements in each segment.
 segment[s:l:p] = 0
 indices = np.where(segment == 1)[0] + low
 finalprime.append(indices)
 return np.hstack(finalprime)
def main():
 r = 10**7
 l = int(math.sqrt(r))
 segmented_sieve(r, l, generateprimes(r))
if __name__ == "__main__":
 cProfile.run("main()")

Stack Exchange Network

Numpy segmented Sieve Of Eratosthenes

1 Answer 1

Performance

`numpy.append`

Python `for` loop

General

Changed Code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Numpy segmented Sieve Of Eratosthenes

1 Answer 1

Performance

numpy.append

Python for loop

General

Changed Code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`numpy.append`

Python `for` loop