For a project I am working on, I need to use a lot of primes, frequently. To do this, I added a cache to Will Ness's prime sieve, that stores already generated primes so getting them is quick. How can I improve this code?
def prime_sieve(): # postponed sieve, by Will Ness
for c in (2,3,5,7): # original code David Eppstein,
yield c
sieve = {} # Alex Martelli, ActiveState Recipe 2002
ps = prime_sieve() # a separate base Primes Supply:
p = next(ps) and next(ps) # (3) a Prime to add to dict
q = p*p # (9) its sQuare
for c in count(9,2): # the Candidate
if c in sieve: # c’s a multiple of some base prime
s = sieve.pop(c) # i.e. a composite ; or
elif c < q:
yield c # a prime
continue
else: # (c==q): # or the next base prime’s square:
s=count(q+2*p,2*p) # (9+6, by 6 : 15,21,27,33,...)
p=next(ps) # (5)
q=p*p # (25)
for m in s: # the next multiple
if m not in sieve: # no duplicates
break
sieve[m] = s # original test entry: ideone.com/WFv4f
class cached_primes:
def __init__(self):
self.prime_gen = prime_sieve()
self.vals = [2]
next(self.prime_gen)
def get(self, start=2, end=float('inf')):
vals = self.vals
lo = self.get_ind(start)
if vals[-1] >= end:
hi = self.get_ind(end)
yield from islice(vals, lo, hi)
return
else:
while vals[-1] < end:
if len(vals) > lo:
bound = len(vals)
yield from takewhile(lambda p: p <=end, islice(vals, lo, bound))
lo = bound
vals.append(next(self.prime_gen))
hi = self.get_ind(end)
yield from islice(vals, lo, hi)
def get_ind(self, x):
if x <= 2:
return 0
vals = self.vals
B = x/log(x)
if x < 17:
return bisect(vals, x, 0, min(len(vals), int(B)))
return bisect(vals, x, int(B), min(len(vals), int(1.25506*B)))
-
2\$\begingroup\$ Can you add a link to the prime sieve solution on which your code is based? \$\endgroup\$Martin R– Martin R2018年01月05日 06:23:56 +00:00Commented Jan 5, 2018 at 6:23
-
\$\begingroup\$ ideone.com/WFv4f \$\endgroup\$Oscar Smith– Oscar Smith2018年01月05日 06:35:14 +00:00Commented Jan 5, 2018 at 6:35
1 Answer 1
prime_sieve()
isn't your code, so I won't review it (although I will express my opinion that it's in dire need of more commenting).
However, the choice to use it is firmly within the scope of this review, and frankly I can't understand it. The whole point of that particular implementation is to optimise memory use at the cost of considerable code complexity. But given that your goal is a fully cached list (i.e. optimising speed at the cost of memory usage), it would make far more sense to me to use a simpler implementation which directly accesses the cache.
if vals[-1] >= end: hi = self.get_ind(end) yield from islice(vals, lo, hi) return else: while vals[-1] < end: if len(vals) > lo: bound = len(vals) yield from takewhile(lambda p: p <=end, islice(vals, lo, bound)) lo = bound vals.append(next(self.prime_gen)) hi = self.get_ind(end) yield from islice(vals, lo, hi)
Taking a step back to look at the structure, we have
if condition():
foo()
return
else:
while not condition():
bar()
foo()
return # implicit because it's the end of the function
Firstly,
return
else:
can often be clearer without the else
, which is unnecessary: the scope it defines is unreachable if the if
block executed. That gives
if condition():
foo()
return
while not condition():
bar()
foo()
return
which is semantically identical to
while not condition():
bar()
foo()
return
while vals[-1] < end: if len(vals) > lo: bound = len(vals) yield from takewhile(lambda p: p <=end, islice(vals, lo, bound)) lo = bound vals.append(next(self.prime_gen))
The whole if
clause in here seems dubious to me. Why can't it be removed entirely? (I figured out the reason, but IMO there should be a comment to save me the effort).
Given that all but the first time round it will yield from
a slice containing precisely one value (the one appended the previous time round), it seems rather heavyweight. It would probably make sense to refactor as:
yield from (something)
while vals[-1] < end:
extension = next(self.prime_gen)
vals.append(extension)
yield extension
Further modification to the final refactors implemented for the above points may come from the simpler replacement of prime_gen
. Do you want a replacement with produces primes one at a time, or something like a paged sieve? The latter should be more efficient, but at the cost of slight complication.
def get_ind(self, x):
If I had to guess without any contextual clues, I would assume that ind
is short for independent. If it's necessary to abbreviate index
then I consider idx
to be the best option. But here there is a clear correct name for the method: index
, for consistency with list.index
.
(Of course, that's assuming you want it to be public. Given that it returns incorrect values for primes which haven't yet been reached by get
, it would probably be better to call it _index
).