I'm trying to solve the following exercise from Codility:
A prime is a positive integer X that has exactly two distinct divisors: 1 and X. The first few prime integers are 2, 3, 5, 7, 11 and 13.
A semiprime is a natural number that is the product of two (not necessarily distinct) prime numbers. The first few semiprimes are 4, 6, 9, 10, 14, 15, 21, 22, 25, 26.
You are given two non-empty arrays P and Q, each consisting of M integers. These arrays represent queries about the number of semiprimes within specified ranges.
Query K requires you to find the number of semiprimes within the range (P[K], Q[K]), where 1 ≤ P[K] ≤ Q[K] ≤ N.
For example, consider an integer N = 26 and arrays P, Q such that:
P[0] = 1 Q[0] = 26 P[1] = 4 Q[1] = 10 P[2] = 16 Q[2] = 20
The number of semiprimes within each of these ranges is as follows:
(1, 26) is 10, (4, 10) is 4, (16, 20) is 0. Write a function:
def solution(N, P, Q)
that, given an integer N and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M elements specifying the consecutive answers to all the queries.
For example, given an integer N = 26 and arrays P, Q such that:
P[0] = 1 Q[0] = 26 P[1] = 4 Q[1] = 10 P[2] = 16 Q[2] = 20
the function should return the values [10, 4, 0], as explained above.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..50,000]; M is an integer within the range [1..30,000]; each element of arrays P and Q is an integer within the range [1..N]; P[i] ≤ Q[i]. https://app.codility.com/programmers/lessons/11-sieve_of_eratosthenes/count_semiprimes/
My idea is to first find the primes in a given range, then find the semiprimes by multiplying the primes with each other, calculate the prefix sum of the semiprimes and finally return the number of semiprimes in a given slice by subtracting the prefix sum of the lower index from the higher.
My code is:
def find_primes(n):
primes = set(range(1, n+1))
i = 2
while i * i <= n:
if i in primes:
k = i * i
while k <=n :
if k in primes:
primes.remove(k)
k += i
i += 1
return list(primes)
def find_semi_primes(n):
primes = find_primes(n)
p = len(primes)
semi_primes = set()
for i in range(1, p):
for j in range(i, p):
p1 = primes[i]
p2 = primes[j]
if p1 * p2 > n:
break
semi_primes.add(p1 * p2)
return semi_primes
def solution(N, P, Q):
semi_primes = find_semi_primes(N)
prefix_count = (N+1) * [0]
for n in range(1, N+1):
prefix_count[n] = prefix_count[n-1]
if n in semi_primes:
prefix_count[n] += 1
q = len(P)
result = q * [0]
for i in range(q):
result[i] = prefix_count[Q[i]] - prefix_count[P[i]-1]
return result
3 Answers 3
Here are some suggestions, mainly regarding coding style.
Syntax
When I copied your code from the question and tried to run it,
I got a compile error due to the missing indentation in the
find_primes
function:
if k in primes:
primes.remove(k)
Perhaps there was a problem when you copied the code into the post. This works:
if k in primes:
primes.remove(k)
Let's start from the top:
def find_primes(n):
That's a good name for a function. However, it doesn't specifically convey what it is doing. That could be improved with a docstring as recommended by the PEP 8 style guide. The docstring could summarize the function's purpose as well as specify the input type and return type:
def find_primes(n):
"""
Find all primes numbers less than or equal to the input.
The input (n) is expected to be a positive integer.
Return a list of primes.
"""
primes
is a good name for an array variable:
primes = set(range(1, n+1))
However, it at first seems a little misleading because you are initially setting it to all integers, not just prime numbers. I recommend adding a comment stating that the non-primes will be removed from the array.
It would also be helpful to add comments regarding your algorithm,
explaining your nested while
loops.
This line uses inconsistent spacing around the operators:
while k <=n :
The black program can be used to automatically reformat the code with consistency:
while k <= n:
The common definition of prime numbers does not include 1
, yet your
find_primes
function returns 1
in its primes
array. Even the
Codility description shows the first prime number as 2
. I suggest
adding a comment in the code for this.
You should also add a docstring for the find_semi_primes
function, which should
include a description of what a "semiprime" is. It should also include
a description of the algorithm, explaining the nested loops.
The expression p1 * p2
is repeated a couple times. You can avoid the
DRY using a
new variable named product
:
for j in range(i, p):
product = primes[i] * primes[j]
if product > n:
break
semi_primes.add(product)
This code seems very "procedural" in nature, looking like the work of a C or C++ programmer rather than that of a fluent Pythonista. For example, I see a lot of for i in range():
loops where the idiomatic Python would use a comprehension or transform.
Let's start by correcting find_primes()
- 1 is not a prime, so the initial set should start at 2
:
primes = set(range(2, n+1))
Then instead of ignoring the first element in find_semi_primes()
, we should index from zero:
for i in range(0, p):
That makes the code more understandable for readers used to working with primes.
Consider the final loop:
q = len(P) result = q * [0] for i in range(q): result[i] = prefix_count[Q[i]] - prefix_count[P[i]-1] return result
My Pythonic replacement uses map
to evaluate each query and produce the result:
def semiprimes_in_range(lower, upper):
return prefix_count[upper] - prefix_count[lower-1]
return list(map(semiprimes_in_range, p, q))
It's possible to write find_semiprimes
can be written more readably as:
def find_semiprimes(n):
primes = find_primes((n+1)//2)
return {x * y for x, y in itertools.combinations_with_replacement(primes, 2 )}
However, I don't recommend this because it generates many semiprimes greater than n
, hugely impacting performance.
But do note the ceil division by 2 - since the smallest prime is 2, we know that's the smallest possible factor of the semiprimes.
We can use itertools.accumulate()
to generate prefix_count
- that's another chunk of code we don't have to write ourselves:
prefix_count = list(itertools.accumulate(map(lambda x: x in semiprimes, range(n+1))))
My modified code
It's not perfect, but demonstrates my suggestions as a complete program.
import itertools
def find_primes(n):
'''
Generate prime numbers up to (and including) 'n'.
'''
# Implementation is sieve of Eratosthenes.
primes = set(range(2, n+1))
i = 2
while i * i <= n:
if i in primes:
k = i * i
while k <= n:
if k in primes:
primes.remove(k)
k += i
i += 1
return sorted(primes)
def find_semiprimes(n):
'''
Generate semiprime numbers up to (and including) 'n'.
'''
primes = find_primes(n)
p = len(primes)
semiprimes = set()
for i in range(0, p):
for j in range(i, p):
p1 = primes[i]
p2 = primes[j]
if p1 * p2 > n:
break
semiprimes.add(p1 * p2)
return semiprimes
def solution(n, p, q):
'''
Create a list containing the following elements:
For every element of p and corresponding element of q (both ≤ n),
count how many semiprimes in the range [p, q).
'''
semiprimes = find_semiprimes(n)
prefix_count = list(itertools.accumulate(map(lambda x: x in semiprimes, range(n+1))))
def semiprimes_in_range(lower, upper):
return prefix_count[upper] - prefix_count[lower-1]
return list(map(semiprimes_in_range, p, q))
if __name__ == '__main__':
print(solution(26, [1, 4, 16], [26, 10, 20]))
A bug: find_semi_primes
assumes that the list of primes is in sorted order (it skips index 0 where it assumes the number 1 is, and it breaks the inner loop when the semiprime becomes too large). But find_primes
returns list(primes)
where primes
is a set, and sets are unordered. So you might get any arbitrary order instead of sorted order. (In CPython you currently do get sorted order, but that's due to implementation details that should not be relied on.)
And I'd change
if k in primes:
primes.remove(k)
to
primes.discard(k)
for simplicity.
Explore related questions
See similar questions with these tags.