Cached infinite prime generator

Question 1

For a project I am working on, I need to use a lot of primes, frequently. To do this, I added a cache to Will Ness's prime sieve, that stores already generated primes so getting them is quick. How can I improve this code?

def prime_sieve(): # postponed sieve, by Will Ness
 for c in (2,3,5,7): # original code David Eppstein,
 yield c
 sieve = {} # Alex Martelli, ActiveState Recipe 2002
 ps = prime_sieve() # a separate base Primes Supply:
 p = next(ps) and next(ps) # (3) a Prime to add to dict
 q = p*p # (9) its sQuare
 for c in count(9,2): # the Candidate
 if c in sieve: # c’s a multiple of some base prime
 s = sieve.pop(c) # i.e. a composite ; or
 elif c < q:
 yield c # a prime
 continue
 else: # (c==q): # or the next base prime’s square:
 s=count(q+2*p,2*p) # (9+6, by 6 : 15,21,27,33,...)
 p=next(ps) # (5)
 q=p*p # (25)
 for m in s: # the next multiple
 if m not in sieve: # no duplicates
 break
 sieve[m] = s # original test entry: ideone.com/WFv4f
class cached_primes:
 def __init__(self):
 self.prime_gen = prime_sieve()
 self.vals = [2]
 next(self.prime_gen)
 def get(self, start=2, end=float('inf')):
 vals = self.vals
 lo = self.get_ind(start)
 if vals[-1] >= end:
 hi = self.get_ind(end)
 yield from islice(vals, lo, hi)
 return
 else:
 while vals[-1] < end:
 if len(vals) > lo:
 bound = len(vals)
 yield from takewhile(lambda p: p <=end, islice(vals, lo, bound))
 lo = bound
 vals.append(next(self.prime_gen))
 hi = self.get_ind(end)
 yield from islice(vals, lo, hi)
 def get_ind(self, x):
 if x <= 2:
 return 0
 vals = self.vals
 B = x/log(x)
 if x < 17:
 return bisect(vals, x, 0, min(len(vals), int(B)))
 return bisect(vals, x, int(B), min(len(vals), int(1.25506*B)))

Question 2

Can you add a link to the prime sieve solution on which your code is based?

Question 3

ideone.com/WFv4f

Question 4

prime_sieve() isn't your code, so I won't review it (although I will express my opinion that it's in dire need of more commenting).

However, the choice to use it is firmly within the scope of this review, and frankly I can't understand it. The whole point of that particular implementation is to optimise memory use at the cost of considerable code complexity. But given that your goal is a fully cached list (i.e. optimising speed at the cost of memory usage), it would make far more sense to me to use a simpler implementation which directly accesses the cache.

 if vals[-1] >= end:
 hi = self.get_ind(end)
 yield from islice(vals, lo, hi)
 return
 else:
 while vals[-1] < end:
 if len(vals) > lo:
 bound = len(vals)
 yield from takewhile(lambda p: p <=end, islice(vals, lo, bound))
 lo = bound
 vals.append(next(self.prime_gen))
 hi = self.get_ind(end)
 yield from islice(vals, lo, hi)

Taking a step back to look at the structure, we have

 if condition():
 foo()
 return
 else:
 while not condition():
 bar()
 foo()
 return # implicit because it's the end of the function

Firstly,

 return
 else:

can often be clearer without the else, which is unnecessary: the scope it defines is unreachable if the if block executed. That gives

 if condition():
 foo()
 return
 while not condition():
 bar()
 foo()
 return

which is semantically identical to

 while not condition():
 bar()
 foo()
 return

 while vals[-1] < end:
 if len(vals) > lo:
 bound = len(vals)
 yield from takewhile(lambda p: p <=end, islice(vals, lo, bound))
 lo = bound
 vals.append(next(self.prime_gen))

The whole if clause in here seems dubious to me. Why can't it be removed entirely? (I figured out the reason, but IMO there should be a comment to save me the effort).

Given that all but the first time round it will yield from a slice containing precisely one value (the one appended the previous time round), it seems rather heavyweight. It would probably make sense to refactor as:

 yield from (something)
 while vals[-1] < end:
 extension = next(self.prime_gen)
 vals.append(extension)
 yield extension

Further modification to the final refactors implemented for the above points may come from the simpler replacement of prime_gen. Do you want a replacement with produces primes one at a time, or something like a paged sieve? The latter should be more efficient, but at the cost of slight complication.

 def get_ind(self, x):

If I had to guess without any contextual clues, I would assume that ind is short for independent. If it's necessary to abbreviate index then I consider idx to be the best option. But here there is a clear correct name for the method: index, for consistency with list.index.

(Of course, that's assuming you want it to be public. Given that it returns incorrect values for primes which haven't yet been reached by get, it would probably be better to call it _index).

Peter Taylor Peter Taylor 24.5k1 gold badge49 silver badges94 bronze badges · Answer 1 · 2019-08-12 10:37:14Z

prime_sieve() isn't your code, so I won't review it (although I will express my opinion that it's in dire need of more commenting).

However, the choice to use it is firmly within the scope of this review, and frankly I can't understand it. The whole point of that particular implementation is to optimise memory use at the cost of considerable code complexity. But given that your goal is a fully cached list (i.e. optimising speed at the cost of memory usage), it would make far more sense to me to use a simpler implementation which directly accesses the cache.

 if vals[-1] >= end:
 hi = self.get_ind(end)
 yield from islice(vals, lo, hi)
 return
 else:
 while vals[-1] < end:
 if len(vals) > lo:
 bound = len(vals)
 yield from takewhile(lambda p: p <=end, islice(vals, lo, bound))
 lo = bound
 vals.append(next(self.prime_gen))
 hi = self.get_ind(end)
 yield from islice(vals, lo, hi)

Taking a step back to look at the structure, we have

 if condition():
 foo()
 return
 else:
 while not condition():
 bar()
 foo()
 return # implicit because it's the end of the function

Firstly,

 return
 else:

can often be clearer without the else, which is unnecessary: the scope it defines is unreachable if the if block executed. That gives

 if condition():
 foo()
 return
 while not condition():
 bar()
 foo()
 return

which is semantically identical to

 while not condition():
 bar()
 foo()
 return

 while vals[-1] < end:
 if len(vals) > lo:
 bound = len(vals)
 yield from takewhile(lambda p: p <=end, islice(vals, lo, bound))
 lo = bound
 vals.append(next(self.prime_gen))

The whole if clause in here seems dubious to me. Why can't it be removed entirely? (I figured out the reason, but IMO there should be a comment to save me the effort).

Given that all but the first time round it will yield from a slice containing precisely one value (the one appended the previous time round), it seems rather heavyweight. It would probably make sense to refactor as:

 yield from (something)
 while vals[-1] < end:
 extension = next(self.prime_gen)
 vals.append(extension)
 yield extension

Further modification to the final refactors implemented for the above points may come from the simpler replacement of prime_gen. Do you want a replacement with produces primes one at a time, or something like a paged sieve? The latter should be more efficient, but at the cost of slight complication.

 def get_ind(self, x):

If I had to guess without any contextual clues, I would assume that ind is short for independent. If it's necessary to abbreviate index then I consider idx to be the best option. But here there is a clear correct name for the method: index, for consistency with list.index.

(Of course, that's assuming you want it to be public. Given that it returns incorrect values for primes which haven't yet been reached by get, it would probably be better to call it _index).

Stack Exchange Network

Cached infinite prime generator

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Cached infinite prime generator

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions