Goldbach’s Conjecture in Python

Question 1

Input Format

Input starts with an integer n (1 ≤ n ≤ 100) indicating the number of cases. The following n lines each contain a test case of a single even number x (4 ≤ x ≤ 32000).

Output Format

For each test case x, give the number of unique ways that x can be represented as a sum of two primes. Then list the sums (one sum per line) in increasing order of the first addend. The first addend must always be less than or equal to the second to avoid duplicates.

Sample Input

2
26
100

Sample Output

26 has 3 representation(s)
3+23
7+19
13+13

The above problem was originally posted in 2013 ICPC North America Qualifier, I encountered it in HackerRank's Round-I Holiday Cup contest.

My Python 3.x Code

import math
import time
def sieve(n):
 "Return all primes <= n using sieve of erato."
 np1 = n + 1
 s = list(range(np1))
 s[1] = 0
 sqrtn = int(round(n**0.5))
 for i in range(2, sqrtn + 1):
 if s[i]:
 s[i*i: np1: i] = [0] * len(range(i*i, np1, i))
 return filter(None, s)
def primeSum(n):
 p=2
 limit=math.floor(n/2)
 for p in primes:
 q=n-p
 if(p>q):
 break
 if(q in primes):
 out.extend([p,q])
 print (n,"has",round(len(out)/2),"representation(s)")
 for i,j in zip(out[0::2],out[1::2]):
 print (i,"+",j,sep='')
 print ("")
primes = sorted(set(sieve(32000)))
for __ in range(int(input())):
 out = []
 primeSum(int(input()))

This code almost took over 5 seconds for larger values of n but ran in less than 1 second in most cases. Is it because of time taken to output all those representations for larger values (because 32000 has over 300 representations?

Can this be optimized? Also review my general coding (I'm new to python).

Question 2

Apparently there is a bug. It shows 0 representations for 125.

Question 3

Yea because Goldbach's conjecture is all about evens and primes. It says if n is even it can be represented as n=p+q where both p and q are primes.

Question 4

Oops... I don't know what I was thinking about. Sorry.

Question 5

I don't know the problem well enough to tell you how to improve the speed of the logic, but yes your print calls are slow. print can actually be quite a slow command (it depends on where you're running it).

But there is a way to speed it up without reducing text, and it suits your purposes. If you make less print calls with longer strings, you can reduce the time it takes. And in your case this also means replacing a for loop with a generator expression, which is also good. Generator expressions are list like objects that pass out one value at a time to a function or a loop.

So here's the code we can improve:

for i,j in zip(out[0::2],out[1::2]):
 print (i,"+",j,sep='')

You're iterating over pairs and printing each time, but with the str.join command you could use that for loop setup to instead make one long string that contains all the results. To make it a generator first the print arguments need to be switched to just a normal string formation, like this:

("{}+{}".format(i, j))

i and j will be swapped in where those curly braces are, making the same string you were printing with sep=''. Now here's how the generator expression looks:

("{}+{}".format(i, j) for i, j in zip(out[0::2], out[1::2]))

I have changed the string, but you can see that the for loop is just the same as before. Now to actually turn this into a string, we need to pass it to "".join. join is called on a string object, and whatever you put in that string object is inserted between each value taken from the generator. Since each print is a new line, a newline character fits best:

print(n, "has", round(len(out) / 2), "representation(s)")
results = ("{}+{}".format(i, j) for i, j in zip(out[0::2], out[1::2]))
print('\n'.join(results))

This may not solve your full speed problem, but it is faster and is a cleaner way to format the code.

Question 6

This is what actually solved the speed problem!

Question 7

Pass more arguments to functions

primes is global, but passing it as an argument to primeSum is trivial:

def primeSum(n, primes):

and then:

primeSum(int(input()), primes)

This makes the reasoning about and testing the function easier.

Do not modify random global variables

out.extend([p,q]) where out is a global variable. Don't. It makes the code impossible to use in different situations and therefore wastes your programming effort.

Do not print from a computation function

primeSum should find a prime sum. Printing the couples nicely is the job for another function. Again a computer friendly return format improves re-usability.

Remove dead code

You write:

limit=math.floor(n/2)

But limit is never been used, so delete that line.

Also p=2 can be eliminated as it does nothing.

Meaningful names

Avoid single-letter names, be descriptive. Good names can really make the difference between un-readable and readable code. And remember to use snake_case in Python.

Final version

Putting all my advice together gives this nice prime_sum function:

def prime_sums(number, primes):
 """
 Checks how a number may be written as a sum of two primes.
 >>> list(prime_sums(100, sorted(set(sieve(100)))))
 [(3, 97), (11, 89), (17, 83), (29, 71), (41, 59), (47, 53)]
 """
 for prime in primes:
 difference = number - prime
 if(prime > difference):
 break
 if(difference in primes):
 yield (prime, difference)

Question 8

Hmm..I agree as it was a contest I tried to shorten the code. Thanks anyways.

Question 9

Actually I have not learnt to use yield, I'll look into it!

Question 10

@Protino it is a sleek way of returning a list from a function basically. Stackoverflow explain it very well: stackoverflow.com/questions/231767/…

Question 11

@oliverpool Sure

Question 12

I love doing code review review ^^ (I deleted the comments that you fixed)

Question 13

Here are some comments:

Cut out unused code. A few instances I spotted:

import time # The time module is never used
limit=math.floor(n/2) # We never use this variable

Read PEP 8, the Python style guide. I’m not going to break it down in detail, but among other things:
- Spaces around binary operators
- Function names should be lowercase_with_underscores, not camelCase as in primeSum().
- Two newlines between functions.
Use better variable names. Names like p, q and s aren’t very helpful – single-letter variable names rarely are. You should give them meaningful names, that tell me what the value represents in the original problem.
Rethink how the primeSum() function works. Right now, it manipulates an external list (out) and prints some stuff to the screen. The printing is unhelpful if I want to use this as part of a larger function (not reusable), and it’s trusting the caller to set up a list called out. It would be better if:
- It didn’t print anything
- It created a new, empty version of out as an internal list, and returned that
Subsequent invocations will do weird things with the contents of out.
More comments and docstrings. There’s only one docstring in this function, and it’s quite short. You should bulk up the docstrings, telling me what the function does, what inputs it takes, and what the meaning of the return value is.

There should also be some comments relating the code back to the original problem. This is really important for reading, reviewing and maintainability. (Starting example: why do you set s[1] to be 0 on line 7? This isn’t obvious to somebody who only has the code.)
Tidy up the computation of the sqrtn variable.
- Rather than using n**0.5, use math.sqrt(n). Slightly longer, but more explicit.
- Once you’ve called round(), you don’t need to cast to int. Quoting from the docs:
  
  The return value is an integer if called with one argument, otherwise of the same type as number.
  
  (Note this behaviour is new with Python 3.)
- Since this variable is only used once, I’d inline it with the range() on the next line.

Question 14

Building on top of what was already pointed out in the other posts, there are a couple of optimizations to be made.

Set operations in place of iterating list

You are creating a list of sorted primes after calling set on the original list from your sieve and then using that primes list to check for members in your main loop:

for p in primes:
 q=n-p
 if(p>q):
 break
 if(q in primes):
 out.extend([p,q])

In the trimmed down sample below, you can save a copy of that call to set while still iterating in the sorted primes list but checking for membership using the set logical operations:

pset = set(sieve(32000))
primes = sorted(pset)

then the loop is:

for p in primes:
 q=n-p
 if(p>q):
 break
 if {q} & pset:
 out.extend([p,q])

This is improving the overall time by about two orders of magnitude:

import math
import time
from timeit import default_timer
def sieve(n):
 "Return all primes <= n using sieve of erato."
 np1 = n + 1
 s = list(range(np1))
 s[1] = 0
 sqrtn = int(round(n**0.5))
 for i in range(2, sqrtn + 1):
 if s[i]:
 s[i*i: np1: i] = [0] * int(math.ceil((np1 - i*i) / i))
 return filter(None, s)
def primeSum(n):
 p=2
 for p in primes:
 q=n-p
 if(p>q):
 break
 if {q} & pset:
 out.extend([p,q])
pset = set(sieve(32000))
primes = sorted(pset)
test_runs = 20
total = 0
for _ in range(test_runs):
 out = []
 start = default_timer()
 primeSum(30000)
 stop = default_timer()
 total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))

Results in:

Total Time: 0.027330
Avg Testrun Time: 0.001367

Where, a trimmed down version of your original (minus the print calls):

import math
import time
from timeit import default_timer
def sieve(n):
 "Return all primes <= n using sieve of erato."
 np1 = n + 1
 s = list(range(np1))
 s[1] = 0
 sqrtn = int(round(n**0.5))
 for i in range(2, sqrtn + 1):
 if s[i]:
 s[i*i: np1: i] = [0] * len(range(i*i, np1, i))
 return filter(None, s)
def primeSum(n):
 p=2
 limit=math.floor(n/2)
 for p in primes:
 q=n-p
 if(p>q):
 break
 if(q in primes):
 out.extend([p,q])
primes = sorted(set(sieve(32000)))
test_runs = 20
total = 0
for _ in range(test_runs):
 out = []
 start = default_timer()
 primeSum(30000)
 stop = default_timer()
 total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))

Runs in:

Total Time: 7.038189
Avg Testrun Time: 0.351909

Note: this code is run with python3.5

Math operation instead of useless object creation

I calculate the length of the sequence to be added in the sieve:

s[i*i: np1: i] = [0] * int(math.ceil((np1 - i*i) / i))

Instead of calling len over a range object you won't use:

s[i*i: np1: i] = [0] * len(range(i*i, np1, i))

Question 15

Verify the Total time of my code without print calls, it is actually around 1.8s and not 7.038s. Combining your code and @SuperBiasedMan 's output formatting it just takes 0.28s

Question 16

@Protino did you factor in that 7.038 was the total aggregate time from 20 tests run? The bottom value of 0.35 was the average I got from your code without print calls

Question 17

yes the total time is 1.870s and Avg is 0.09sI verified it on my windows 32-bit python platform. Do re-run it and verify.

Question 18

You should grab a copy of python3.5 and give this a try! I ran the ideone code and sure enough it was timed as you said ~2.8s. I re-ran the code in my system but, still got the same results of ~7.0. I started to get suspicious of the version numbers here and saw ideone was using python3.4. I'm running archlinux and python3.5 is installed so, I grabbed a copy of 3.4 and the results matched yours and ideone. Really strange that the results would vary that much.

Question 19

Anyways I got awesome results in HackerRank though. I combined all the suggestions provided here and the code ran in less than 0.06s for a test case involving 100 test runs!. Cheers.

Question 20

OK I was wrong. Checking every prime is way slower than a sieve. My mistake was comparing my optimized code against a slower larger sieve. There is a trade off of speed for size. The sieves that use a set of primes are maybe 2x faster than this sieve code but use much more storage per prime stored. This uses 1 bit per prime where python ints are much larger. Usually the bit array is about 1/5 the size.

I decided to use a bitarray() sieve of Eratosthenes from a program I wrote earlier and then use the 6k-1 prime_sum code I used before. The big advantages are; much faster sieve generator and a very very fast prime check. p[n] just checks a bit for 1 or 0 instead of a long drawn out prime check. This code is 6 times faster than my previous prime_sum(). This is not faster than the set() of prime numbers but it is able to do 4GB lists of primes.

To answer the original question: The sieve is too large using ints and Yes the prints take a long time. Imagine printing out 7,930,427 sums! I just printed the first few. Enough to prove Goldbach right without waiting for a while.

Here are the two outputs for large numbers:

My previous code:

$ python simple_goldbach.py 
Total Time: 0.041585
Avg Testrun Time: 0.002079
30,000 has 602 representation(s)
Please enter an even number greater than 3> 12345678
12,345,678 has 71,169 representation(s) took 1.613520
Please enter an even number greater than 3> 4000000000
4,000,000,000 has 7,930,427 representation(s) took 498.131707
Please enter an even number greater than 3>

And the new sieve code:

 $ python sieve_goldbach.py 
 Made new sieve 1,000,000,000 in 4.366924
Total Time: 0.012563
Avg Testrun Time: 0.000628
30,000 has 602 representation(s)
Please enter an even number greater than 3> 12345678
12,345,678 has 71,169 representation(s)
prime_sum took: 0.195684
31 + 12,345,647 = 12,345,678
41 + 12,345,637 = 12,345,678
97 + 12,345,581 = 12,345,678
101 + 12,345,577 = 12,345,678
Please enter an even number greater than 3> 4000000000
1,000,000,000 is less than 4,000,000,000 making new sieve
 Made new sieve 4,000,000,000 in 19.151368
4,000,000,000 has 7,930,427 representation(s)
prime_sum took: 77.966109
89 + 3,999,999,911 = 4,000,000,000
131 + 3,999,999,869 = 4,000,000,000
239 + 3,999,999,761 = 4,000,000,000
383 + 3,999,999,617 = 4,000,000,000
Please enter an even number greater than 3> 200000000
200,000,000 has 538,290 representation(s)
prime_sum took: 3.153349
37 + 199,999,963 = 200,000,000
43 + 199,999,957 = 200,000,000
97 + 199,999,903 = 200,000,000
181 + 199,999,819 = 200,000,000
Please enter an even number greater than 3>

Here is my faster sieve representation:

from timeit import default_timer
from bitarray import bitarray
global primes
def make_sieve(size):
 """Create a sieve of Eratosthenes up to the given size."""
 s_start = default_timer()
 limit = int(1 + size**0.5) + 2
 p = bitarray(size+2) # One bit per value
 p.setall(True)
 p[0:2] = False # Clear zero and one
 p[4::2] = False # Clear multiples of 2
 p[9::3] = False # Clear multiples of 3
 for i in range(5, limit, 6): # Process only numbers of the form 6k-1 and 6k+1
 h = i + 2 # 6k+1
 if p[i]: # If 6k-1 is prime
 p[i*i::2 * i] = False # Clear multiples of 6k-1
 if p[h]: # If 6k+1 is prime
 p[h*h::2 * h] = False # Clear multiples of 6k+1
 p = p[:size]
 s_stop = default_timer()
 print(f" Made new sieve {len(p):,} in {s_stop - s_start:4f}")
 #return [i for i in range(2,size+1) if p[i]] # to return list of primes
 return p
def prime_sum(n):
 global primes
 if len(primes) < n:
 print(f"{len(primes):,} is less than {n:,} making new sieve")
 primes = make_sieve(n)
 sums = []
 if n&1 or n<4: 
 print("Only even numbers greater than 3")
 return sums
 if n == 4:
 return [[2,2]]
 if primes[n-3]:
 sums += [[3,n-3]]
 for k6 in range(5,n//2+1,6): # only check 6k-1 and 6k+1
 if primes[k6] and primes[n-k6]:
 sums += [[k6,n-k6]]
 if primes[k6+2] and primes[n-(k6+2)]:
 sums += [[k6+2,n-(k6+2)]]
 return sums
 
primes = make_sieve(10**9) 
test_runs = 20
total = 0
test_num = 30000
for _ in range(test_runs):
 out = []
 start = default_timer()
 a = prime_sum(test_num)
 stop = default_timer()
 total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))
N = test_num
goldbach_sums = prime_sum(N)
print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
while True:
 N=-1
 while N<4 or N&1:
 N = int(input("Please enter an even number greater than 3> "))
 start = default_timer()
 goldbach_sums = prime_sum(N)
 stop = default_timer()
 print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
 print('prime_sum took: {:4f}'.format(stop-start))
 print(*[f"{x:,} + {y:,} = {N:,}" for [x,y] in goldbach_sums[:4]],sep='\n')

Question 21

+1 I'll have to check the numbers closer but going over the bit-array aspects is a good use of resources, I like the mindset.

Question 22

No sieve, just check a few primes

Looking at the previous answers, I noticed that most use sieves. This is OK as long as the number in question is smaller than 8 billion or so. The sieve needs to be as large as n//2+1 to collect all of the pairs. Also the sieves take a few minutes to create. So I decided not to use a sieve at all, but to check for individual primes. Checking primes is expensive. I did not want to start at 2 and go through all of the numbers from 2 to n//2+1 and check each for prime.

I used the knowledge that only 6k-1 and 6k+1 are primes after 3. (5=6-1 and 7=6+1, 11=12-1 and 13=12+1...) So I do very few checks. It can be faster. If you add:

from sympy import isprime as p

and comment out my prime check, it should run several times faster.
It will do 4 p(xx) checks per rep worst case. Moving 6 numbers up for each loop.
Here is my output on my i5 1.1GHz laptop:

>python simple_goldbach.py
Total Time: 0.140530
Avg Testrun Time: 0.007027
30,000 has 603 representation(s)
Please enter an even number greater than 4> 120
120 has 13 representation(s)
7 +たす 113 =わ 120
11 +たす 109 =わ 120
13 +たす 107 =わ 120
17 +たす 103 =わ 120
19 +たす 101 =わ 120
23 +たす 97 =わ 120
31 +たす 89 =わ 120
37 +たす 83 =わ 120
41 +たす 79 =わ 120
47 +たす 73 =わ 120
53 +たす 67 =わ 120
59 +たす 61 =わ 120
61 +たす 59 =わ 120
Please enter an even number greater than 3>

Here is the code:

from timeit import default_timer
from gmpy2 import is_prime as p
def p_by_hand(n): # 6k prime check
 if n <=1: return False
 if n <=3: return True
 if n&1 == 0 or n%3 == 0: return False
 limit = int(n**.5+1)+1
 for i in range(5,limit,6):
 if n%i==0 or n%(i+2)==0:
 return False
 return True
def prime_sum(n):
 sums = []
 if n&1 or n<4: 
 print("Only even numbers greater than 3")
 return sums
 if n == 4:
 return [[2,2]]
 if p(n-3):
 sums += [[3,n-3]]
 for k6 in range(5,n//2+1,6): # only check 6k-1 and 6k+1
 if p(k6) and p(n-k6):
 sums += [[k6,n-k6]]
 if p(k6+2) and p(n-(k6+2)):
 sums += [[k6+2,n-(k6+2)]]
 return sums
 
test_runs = 20
total = 0
test_num = 30000
for _ in range(test_runs):
 out = []
 start = default_timer()
 a = prime_sum(test_num)
 stop = default_timer()
 total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))
N = test_num
goldbach_sums = prime_sum(N)
print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
while True:
 N=-1
 while N<4 or N&1:
 N = int(input("Please enter an even number greater than 3> "))
 goldbach_sums = prime_sum(N)
 print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
 print(*[f"{x} + {y} = {N}" for [x,y] in goldbach_sums],sep='\n')

Question 23

"My Goldbach code is O((n//2+1)//6)." Please read the big-Oh page. It's relevant to say that your code takes \$k\$ times that expression, and then give a value of \$k\$ for your particular server. But big-Oh notation deliberately hides that multiplicative constant; it mostly highlights linear vs. polynomial or e.g. \$O(n \log n)\$ behavior. // Also, pep-8 asks that you name it prime_sum().

Question 24

You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer.

Question 25

Made some fixes. Sorry. I wrote in on the fly without thinking. I copied the def primeSum(n): from the original question.

score 3 · Accepted Answer · 2015-12-04 17:47:47Z

I don't know the problem well enough to tell you how to improve the speed of the logic, but yes your print calls are slow. print can actually be quite a slow command (it depends on where you're running it).

But there is a way to speed it up without reducing text, and it suits your purposes. If you make less print calls with longer strings, you can reduce the time it takes. And in your case this also means replacing a for loop with a generator expression, which is also good. Generator expressions are list like objects that pass out one value at a time to a function or a loop.

So here's the code we can improve:

for i,j in zip(out[0::2],out[1::2]):
 print (i,"+",j,sep='')

You're iterating over pairs and printing each time, but with the str.join command you could use that for loop setup to instead make one long string that contains all the results. To make it a generator first the print arguments need to be switched to just a normal string formation, like this:

("{}+{}".format(i, j))

i and j will be swapped in where those curly braces are, making the same string you were printing with sep=''. Now here's how the generator expression looks:

("{}+{}".format(i, j) for i, j in zip(out[0::2], out[1::2]))

I have changed the string, but you can see that the for loop is just the same as before. Now to actually turn this into a string, we need to pass it to "".join. join is called on a string object, and whatever you put in that string object is inserted between each value taken from the generator. Since each print is a new line, a newline character fits best:

print(n, "has", round(len(out) / 2), "representation(s)")
results = ("{}+{}".format(i, j) for i, j in zip(out[0::2], out[1::2]))
print('\n'.join(results))

This may not solve your full speed problem, but it is faster and is a cleaner way to format the code.

1

\$\begingroup\$ This is what actually solved the speed problem! \$\endgroup\$

Gurupad Mamadapur
– Gurupad Mamadapur

2015年12月07日 15:02:08 +00:00
Commented Dec 7, 2015 at 15:02

Stack Exchange Network

Goldbach’s Conjecture in Python

Input Format

Output Format

Sample Input

Sample Output

My Python 3.x Code

6 Answers 6

No sieve, just check a few primes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Goldbach’s Conjecture in Python

Input Format

Output Format

Sample Input

Sample Output

My Python 3.x Code

6 Answers 6

No sieve, just check a few primes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions