Express a number as sum of 2 primes

Question 1

Problem: Given an even number (greater than 2), return two prime numbers whose sum will be equal to given number.

NOTE 1: A solution will always exist. (Goldbach's conjecture)

Note 2: If there are more than one solutions possible, return the lexicographically smaller solution.

This solution becomes slower for higher order numbers. How can I optimise this code?

import math
# Function to create a list of all prime numbers uptil given number
def list_prime(A):
 primes = [] # initialise an empty list of primes
 bool_a = [True for i in range(2,A+3)] # creates boolean list of index 
 for i in range(2, int(math.sqrt(A))+1): 
 if bool_a[i]:
 for j in range( i * i, A +1 , i): # eliminate all multiples of i*i+i, i*i+2i and so on uptil A+1 
 if bool_a[j]:
 bool_a[j] = False;
 for i in range(2,A): # the index of reamaining bools are prime numbers 
 if bool_a[i]:
 primes.append(i)
 return primes
#Function that returns two prime numbers whose sum is equal to the given number
def prime_sum(A):
 solution_set = []
 for i in (list_prime(A)):
 for j in (list_prime(A)[::-1]):
 if i + j == A:
 solution_set.append((i,j))
 break
 return min(solution_set)

Question 2

You calculate all the primes on each iteration. Caching this result will help tremendously

Question 3

(Goldbach's conjecture is yet unproven).

Question 4

@Jamal: How does changing the wording from "This is my solution: [solution] It becomes slower for [...]" to "This solution becomes slower for [...]: [solution]" make the question better? I agree that most of your changes increased the readability of the question or otherwise improved it, but the change mentioned above seems more like an adaptation to your personal preference than an objective improvement in readability or otherwise. I find that this slightly goes against the policy to always respect the author's intent, even if, in this case, it applies more to the language style than the content.

Question 5

@Stingy: Because "this is my solution" is a bit unnecessary as it's already obvious that a solution is given. The real concern is then moved up so that it stands out more.

Question 6

Using the Sieve of Eratosthenes seems like a reasonable way to generate a large number of primes.

One thing that you could do is directly return your bitset representation of the primes, rather than copying it to a condensed list. This would save a bit of time in the short term, and be helpful for the next step.

for i in (list_prime(A)):
 for j in (list_prime(A)[::-1]):

A few comments.

First, doing something twice is slower than doing it once! Since list_prime(A) is repeated in both lines, it will have to re-run the Sieve of Eratosthenes: even though you know that the list of prime numbers isn't going to change, Python doesn't! Even worse, because the second line is in a loop, you're running list_prime again for every value of i. So you can speed it up a bit by storing that list in a variable before you get to the loops.

Second, when you have nested loops like this, you make your algorithm quadratic in the size of list_prime(A). The issue is that you're doing the imprecise test for j before the precise test, which is normally the slow way around. That is, you're considering all the values of j which meet the rule "j is a prime number" and then testing whether "j + i == A". Instead consider all the values of j for which "j + i == A" first. There is exactly one of those for each i, which is found by simple subtraction. Then check whether that j is in your list of prime numbers. If so, you have a match. If not, you can immediately check the next i.

(The reason that I suggested returning the bitset representation of the list of primes is that it makes it much faster to check whether a specific number is prime.)

A few less substantial optimisations

solution_set.append((i,j))

It is good to consider the tie-breaking rules for problems like this, and listing candidates first and checking which wins afterwards is a valid way to do so.

However, it is worth thinking about the order of your algorithm before implementing the tie-breaking check. If, as is the case here, the first satisfying value is guaranteed to be the one that wins the tie break because it has the smallest i value, you might as well just return the first one you get to.

if bool_a[j]:
 bool_a[j] = False;

If the end result of these two lines is that bool_a[j] must be False, just set it to false. if statements are surprisingly slow on modern CPUs.

primes.append(i)

Whenever you find yourself appending something to a list one element at a time, consider whether you can rewrite it using list comprehensions. For example something like the following:

primes = [index for (index, is_prime) in enumerate(bool_a) if is_prime]

(However, as above, my actual recommendation is that you completely remove that step and directly return bool_a)

Question 7

Restating and condensing what @Josiah already said in their answer:

Don't do unnecessary calculations. This includes not re-doing the calculations for all prime as well as not going through all pairs of numbers just to find two that sum to the final number.
Simplify the sieve of Eratosthenes to be a generator that yields each prime number as it finds it.

With these two changes your code would become:

def prime_sieve(limit):
 prime = [True] * limit
 prime[0] = prime[1] = False
 for i, is_prime in enumerate(prime):
 if is_prime:
 yield i
 for n in range(i * i, limit, i):
 prime[n] = False
def sum_of_primes(n):
 primes = list(prime_sieve(n))
 # for fast `in` lookups:
 primes_set = set(primes)
 for a in primes:
 b = n - a
 if b in primes_set:
 # this is the lexicographically smallest by design
 return a, b

This outperforms your original code by a huge margin:

In [5]: %timeit prime_sum(1002)
27.1 ms ± 256 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %timeit sum_of_primes(1002)
142 μs ± 699 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

It also scales a lot better (O(n) instead of O(n^2)):

enter image description here

Question 8

Although I am generally in favour of generators, is there much advantage to a generator that is immediately packed into a list?

Question 9

@Josiah: No, not really. But there is an advantage over the current approach by the OP which goes through all numbers first and then goes through the array again, accumulating all numbers which are prime.

Question 10

@Josiah: Generator code is usually a couple of lines shorter, and there are more opportunities for re-use (it might not always be the case that it gets immediately turned into a list).

Question 11

This could be further modularized by having a method two_sum(numbers, target) that finds two numbers in a list that sum up to a given target. The method neither knows nor cares about prime numbers.

Josiah Josiah 6,1062 gold badges18 silver badges39 bronze badges · Accepted Answer · 2018-05-12 17:12:41Z

Using the Sieve of Eratosthenes seems like a reasonable way to generate a large number of primes.

One thing that you could do is directly return your bitset representation of the primes, rather than copying it to a condensed list. This would save a bit of time in the short term, and be helpful for the next step.

for i in (list_prime(A)):
 for j in (list_prime(A)[::-1]):

A few comments.

First, doing something twice is slower than doing it once! Since list_prime(A) is repeated in both lines, it will have to re-run the Sieve of Eratosthenes: even though you know that the list of prime numbers isn't going to change, Python doesn't! Even worse, because the second line is in a loop, you're running list_prime again for every value of i. So you can speed it up a bit by storing that list in a variable before you get to the loops.

Second, when you have nested loops like this, you make your algorithm quadratic in the size of list_prime(A). The issue is that you're doing the imprecise test for j before the precise test, which is normally the slow way around. That is, you're considering all the values of j which meet the rule "j is a prime number" and then testing whether "j + i == A". Instead consider all the values of j for which "j + i == A" first. There is exactly one of those for each i, which is found by simple subtraction. Then check whether that j is in your list of prime numbers. If so, you have a match. If not, you can immediately check the next i.

(The reason that I suggested returning the bitset representation of the list of primes is that it makes it much faster to check whether a specific number is prime.)

A few less substantial optimisations

solution_set.append((i,j))

It is good to consider the tie-breaking rules for problems like this, and listing candidates first and checking which wins afterwards is a valid way to do so.

However, it is worth thinking about the order of your algorithm before implementing the tie-breaking check. If, as is the case here, the first satisfying value is guaranteed to be the one that wins the tie break because it has the smallest i value, you might as well just return the first one you get to.

if bool_a[j]:
 bool_a[j] = False;

If the end result of these two lines is that bool_a[j] must be False, just set it to false. if statements are surprisingly slow on modern CPUs.

primes.append(i)

Whenever you find yourself appending something to a list one element at a time, consider whether you can rewrite it using list comprehensions. For example something like the following:

primes = [index for (index, is_prime) in enumerate(bool_a) if is_prime]

(However, as above, my actual recommendation is that you completely remove that step and directly return bool_a)

Stack Exchange Network

Express a number as sum of 2 primes

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Express a number as sum of 2 primes

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions