For the segmented Sieve Of Eratosthenes, I have devised the following code provided. And I understand that it is not the best implementation; However, it should be much quicker than what it is now. The c Profile provides numerous results. However, it only defines built-in methods and gives no further context as to which built-in method is taking up the most time. I wish to optimize this sieve to the maximum possible performance capable through python.
Which numpy built-in method is making my program slower?
import numpy as np
import math
import cProfile
def generateprimes(n: int = 0) -> list:#simple sieve of Eratosthenes to find primes that make up
#all factors of the upper bound.
Primelist = np.full((1,int(math.sqrt(n) )),1, dtype='int8')
Primelist[0,0], Primelist[0,1] = 0,0
for i in range(2,int(math.sqrt(math.sqrt(n))) + 1):
if Primelist[0,i] == 1:
for x in range (i*i,int(math.sqrt(n) ),i):
Primelist[0,x] = 0
indices = np.where(Primelist == 1)[1]
return indices
#unique, counts = np.unique(Primelist, return_counts=True)
#dict(zip(unique, counts))
#return [b for b in range(int(math.sqrt(n))) if Primelist[0,b] == 1]
def SegmentedSieve(R: int, L:int, Primes: tuple) -> tuple:
Finalprime = np.empty((0,1), dtype = 'int32')
limit = (R//L) # Total amount of segments
Finalprime = np.append(Finalprime, Primes)
for x in range(1, limit):
Low = (L * x)
High = Low + L
Segment = np.full((1,High-Low),1,dtype='int8') #creates a segment that is the size of the differences between L and R
NewPrimes = np.extract(Primes <= int(math.sqrt(High)),Primes) # Only taking the primes that are less than the square root of the upperbound.
#i.e the limit of the current segment
for p in NewPrimes:
s = ((int(math.ceil(Low / p))) * p) % Low # Finds the position of the first occurrence of the prime in the
# newest lowest bound or Low
for i in range(s,L,p): # starts at the position found and skips every p distance. The upperbound is the
#amount of elements in each segment.
Segment[0, i] = 0
indices = np.where(Segment == 1)[1] + Low
Finalprime = np.append(Finalprime, indices)
continue
return Finalprime
def main():
R = (10 ** 7)
L = int(math.sqrt(R))
SegmentedSieve(R, L, generateprimes(R))
if __name__ == '__main__':
cProfile.run('main()')
1 Answer 1
Performance
numpy.append
The first thing I noticed is the numpy.append
method with a cumulative time of 1.131s. For each of the 3162 times it is called a new array is created and the content of the existing array is copied. This can be improved by using a (builtin) list instead of a numpy array. Appending to a list is less expensive because lists are specifically designed to be appended to.
In order to still return a numpy array, the list of arrays Finalprime
has to be converted to an array using the numpy.hstack
method.
This change reduced the total runtime from 4.5s to 2.8s.
Python for
loop
Next I used the line based profile "Scalene" to find out which code lines use up a lot of time.
This lead me to the following loop:
for i in range(s,L,p): # starts at the position found and skips every p distance. The upperbound is the
#amount of elements in each segment.
Segment[0, i] = 0
All this loop does setting every pth element to zero. The same can be achieved using the arguments of the range function as a numpy slice instead:
Segment[0, s:L:p] = 0
This improvement further reduced the runtime from 2.8s to 1.0s
General
There are a number things that are not performance related but are a bit odd:
- the
continue
keyword at the end of the main loop ofSegmentedSieve
is executed every time at the end of the loop and is therefor useless. - Your code does not follow the standard python formatting conventions:
- functions and variables should be named using
lower_snake_case
instead ofPascalCase
- after a comma there should usually be a space
- before and after a top level definition there should be two blank lines
- functions and variables should be named using
- Some of the type annotations are incorrect.
generateprimes
does not return alist
but a numpyarray
. ThePrimes
argument of theSegmentedSieve
method is not atuple
but a numpyarray
. The same is true for the return value of said method. - You are using 2D arrays for
Segment
andPrimelist
but the first dimension of these arrays has size 1 and it is never really used. This can be simplified to 1D arrays.
Changed Code
This is the code with the performance and general suggestions applied to it:
import numpy as np
from numpy.typing import NDArray
import math
import cProfile
def generateprimes(n: int = 0) -> NDArray[np.intp]:
"""
simple sieve of Eratosthenes to find primes that make up
all factors of the upper bound.
"""
upper_bound = int(math.sqrt(n))
primes = np.full(upper_bound, 1, dtype=np.int8)
primes[0], primes[1] = 0, 0
for i in range(2, int(math.sqrt(math.sqrt(n))) + 1):
if primes[i] == 1:
for x in range(i**2, upper_bound, i):
primes[x] = 0
indices = np.where(primes == 1)[0]
return indices
def segmented_sieve(r: int, l: int, primes: NDArray[np.intp]) -> NDArray[np.intp]:
finalprime = []
limit = r // l # Total amount of segments
finalprime.append(primes)
for x in range(1, limit):
low = l * x
high = low + l
# creates a segment that is the size of the differences between L and R
segment = np.full(high - low, 1, dtype=np.int8)
# Only taking the primes that are less than the square root of the upperbound.
# i.e the limit of the current segment
new_primes = np.extract(primes <= int(math.sqrt(high)), primes)
for p in new_primes:
# Finds the position of the first occurrence of the prime in the
# newest lowest bound or Low
s = (math.ceil(low / p) * p) % low
# starts at the position found and skips every p distance. The upperbound is the
# amount of elements in each segment.
segment[s:l:p] = 0
indices = np.where(segment == 1)[0] + low
finalprime.append(indices)
return np.hstack(finalprime)
def main():
r = 10**7
l = int(math.sqrt(r))
segmented_sieve(r, l, generateprimes(r))
if __name__ == "__main__":
cProfile.run("main()")
Explore related questions
See similar questions with these tags.