Input Format
Input starts with an integer n (1 ≤ n ≤ 100) indicating the number of cases. The following n lines each contain a test case of a single even number x (4 ≤ x ≤ 32000).
Output Format
For each test case x, give the number of unique ways that x can be represented as a sum of two primes. Then list the sums (one sum per line) in increasing order of the first addend. The first addend must always be less than or equal to the second to avoid duplicates.
Sample Input
2
26
100
Sample Output
26 has 3 representation(s)
3+23
7+19
13+13
The above problem was originally posted in 2013 ICPC North America Qualifier, I encountered it in HackerRank's Round-I Holiday Cup contest.
My Python 3.x Code
import math
import time
def sieve(n):
"Return all primes <= n using sieve of erato."
np1 = n + 1
s = list(range(np1))
s[1] = 0
sqrtn = int(round(n**0.5))
for i in range(2, sqrtn + 1):
if s[i]:
s[i*i: np1: i] = [0] * len(range(i*i, np1, i))
return filter(None, s)
def primeSum(n):
p=2
limit=math.floor(n/2)
for p in primes:
q=n-p
if(p>q):
break
if(q in primes):
out.extend([p,q])
print (n,"has",round(len(out)/2),"representation(s)")
for i,j in zip(out[0::2],out[1::2]):
print (i,"+",j,sep='')
print ("")
primes = sorted(set(sieve(32000)))
for __ in range(int(input())):
out = []
primeSum(int(input()))
This code almost took over 5 seconds for larger values of n but ran in less than 1 second in most cases. Is it because of time taken to output all those representations for larger values (because 32000 has over 300 representations?
Can this be optimized? Also review my general coding (I'm new to python).
-
\$\begingroup\$ Apparently there is a bug. It shows 0 representations for 125. \$\endgroup\$vnp– vnp2015年12月02日 19:37:40 +00:00Commented Dec 2, 2015 at 19:37
-
\$\begingroup\$ Yea because Goldbach's conjecture is all about evens and primes. It says if n is even it can be represented as n=p+q where both p and q are primes. \$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月02日 19:40:15 +00:00Commented Dec 2, 2015 at 19:40
-
\$\begingroup\$ Oops... I don't know what I was thinking about. Sorry. \$\endgroup\$vnp– vnp2015年12月02日 19:41:38 +00:00Commented Dec 2, 2015 at 19:41
6 Answers 6
I don't know the problem well enough to tell you how to improve the speed of the logic, but yes your print calls are slow. print
can actually be quite a slow command (it depends on where you're running it).
But there is a way to speed it up without reducing text, and it suits your purposes. If you make less print calls with longer strings, you can reduce the time it takes. And in your case this also means replacing a for loop with a generator expression, which is also good. Generator expressions are list like objects that pass out one value at a time to a function or a loop.
So here's the code we can improve:
for i,j in zip(out[0::2],out[1::2]):
print (i,"+",j,sep='')
You're iterating over pairs and printing each time, but with the str.join
command you could use that for
loop setup to instead make one long string that contains all the results. To make it a generator first the print arguments need to be switched to just a normal string formation, like this:
("{}+{}".format(i, j))
i
and j
will be swapped in where those curly braces are, making the same string you were printing with sep=''
. Now here's how the generator expression looks:
("{}+{}".format(i, j) for i, j in zip(out[0::2], out[1::2]))
I have changed the string, but you can see that the for loop is just the same as before. Now to actually turn this into a string, we need to pass it to "".join
. join
is called on a string object, and whatever you put in that string object is inserted between each value taken from the generator. Since each print is a new line, a newline character fits best:
print(n, "has", round(len(out) / 2), "representation(s)")
results = ("{}+{}".format(i, j) for i, j in zip(out[0::2], out[1::2]))
print('\n'.join(results))
This may not solve your full speed problem, but it is faster and is a cleaner way to format the code.
-
1\$\begingroup\$ This is what actually solved the speed problem! \$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月07日 15:02:08 +00:00Commented Dec 7, 2015 at 15:02
Pass more arguments to functions
primes
is global, but passing it as an argument to primeSum
is trivial:
def primeSum(n, primes):
and then:
primeSum(int(input()), primes)
This makes the reasoning about and testing the function easier.
Do not modify random global variables
out.extend([p,q])
where out
is a global variable. Don't. It makes the code impossible to use in different situations and therefore wastes your programming effort.
Do not print from a computation function
primeSum
should find a prime sum. Printing the couples nicely is the job for another function. Again a computer friendly return format improves re-usability.
Remove dead code
You write:
limit=math.floor(n/2)
But limit
is never been used, so delete that line.
Also p=2
can be eliminated as it does nothing.
Meaningful names
Avoid single-letter names, be descriptive. Good names can really make the difference between un-readable and readable code. And remember to use snake_case
in Python.
Final version
Putting all my advice together gives this nice prime_sum
function:
def prime_sums(number, primes):
"""
Checks how a number may be written as a sum of two primes.
>>> list(prime_sums(100, sorted(set(sieve(100)))))
[(3, 97), (11, 89), (17, 83), (29, 71), (41, 59), (47, 53)]
"""
for prime in primes:
difference = number - prime
if(prime > difference):
break
if(difference in primes):
yield (prime, difference)
-
\$\begingroup\$ Hmm..I agree as it was a contest I tried to shorten the code. Thanks anyways. \$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月02日 19:49:27 +00:00Commented Dec 2, 2015 at 19:49
-
\$\begingroup\$ Actually I have not learnt to use yield, I'll look into it! \$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月02日 20:18:04 +00:00Commented Dec 2, 2015 at 20:18
-
\$\begingroup\$ @Protino it is a sleek way of returning a list from a function basically. Stackoverflow explain it very well: stackoverflow.com/questions/231767/… \$\endgroup\$Caridorc– Caridorc2015年12月02日 20:20:00 +00:00Commented Dec 2, 2015 at 20:20
-
\$\begingroup\$ @oliverpool Sure \$\endgroup\$Caridorc– Caridorc2015年12月03日 13:43:36 +00:00Commented Dec 3, 2015 at 13:43
-
\$\begingroup\$ I love doing code review review ^^ (I deleted the comments that you fixed) \$\endgroup\$oliverpool– oliverpool2015年12月03日 14:00:49 +00:00Commented Dec 3, 2015 at 14:00
Here are some comments:
Cut out unused code. A few instances I spotted:
import time # The time module is never used limit=math.floor(n/2) # We never use this variable
Read PEP 8, the Python style guide. I’m not going to break it down in detail, but among other things:
- Spaces around binary operators
- Function names should be lowercase_with_underscores, not camelCase as in
primeSum()
. - Two newlines between functions.
Use better variable names. Names like p, q and s aren’t very helpful – single-letter variable names rarely are. You should give them meaningful names, that tell me what the value represents in the original problem.
Rethink how the primeSum() function works. Right now, it manipulates an external list (
out
) and prints some stuff to the screen. The printing is unhelpful if I want to use this as part of a larger function (not reusable), and it’s trusting the caller to set up a list calledout
. It would be better if:- It didn’t print anything
- It created a new, empty version of
out
as an internal list, and returned that
Subsequent invocations will do weird things with the contents of
out
.More comments and docstrings. There’s only one docstring in this function, and it’s quite short. You should bulk up the docstrings, telling me what the function does, what inputs it takes, and what the meaning of the return value is.
There should also be some comments relating the code back to the original problem. This is really important for reading, reviewing and maintainability. (Starting example: why do you set
s[1]
to be0
on line 7? This isn’t obvious to somebody who only has the code.)Tidy up the computation of the sqrtn variable.
- Rather than using
n**0.5
, usemath.sqrt(n)
. Slightly longer, but more explicit. Once you’ve called
round()
, you don’t need to cast to int. Quoting from the docs:The return value is an integer if called with one argument, otherwise of the same type as number.
(Note this behaviour is new with Python 3.)
Since this variable is only used once, I’d inline it with the range() on the next line.
- Rather than using
Building on top of what was already pointed out in the other posts, there are a couple of optimizations to be made.
Set operations in place of iterating list
You are creating a list
of sorted
primes after calling set
on the original list
from your sieve and then using that primes list
to check for members in your main loop:
for p in primes:
q=n-p
if(p>q):
break
if(q in primes):
out.extend([p,q])
In the trimmed down sample below, you can save a copy of that call to set
while still iterating in the sorted
primes list
but checking for membership using the set
logical operations:
pset = set(sieve(32000))
primes = sorted(pset)
then the loop is:
for p in primes:
q=n-p
if(p>q):
break
if {q} & pset:
out.extend([p,q])
This is improving the overall time by about two orders of magnitude:
import math
import time
from timeit import default_timer
def sieve(n):
"Return all primes <= n using sieve of erato."
np1 = n + 1
s = list(range(np1))
s[1] = 0
sqrtn = int(round(n**0.5))
for i in range(2, sqrtn + 1):
if s[i]:
s[i*i: np1: i] = [0] * int(math.ceil((np1 - i*i) / i))
return filter(None, s)
def primeSum(n):
p=2
for p in primes:
q=n-p
if(p>q):
break
if {q} & pset:
out.extend([p,q])
pset = set(sieve(32000))
primes = sorted(pset)
test_runs = 20
total = 0
for _ in range(test_runs):
out = []
start = default_timer()
primeSum(30000)
stop = default_timer()
total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))
Results in:
Total Time: 0.027330
Avg Testrun Time: 0.001367
Where, a trimmed down version of your original (minus the print
calls):
import math
import time
from timeit import default_timer
def sieve(n):
"Return all primes <= n using sieve of erato."
np1 = n + 1
s = list(range(np1))
s[1] = 0
sqrtn = int(round(n**0.5))
for i in range(2, sqrtn + 1):
if s[i]:
s[i*i: np1: i] = [0] * len(range(i*i, np1, i))
return filter(None, s)
def primeSum(n):
p=2
limit=math.floor(n/2)
for p in primes:
q=n-p
if(p>q):
break
if(q in primes):
out.extend([p,q])
primes = sorted(set(sieve(32000)))
test_runs = 20
total = 0
for _ in range(test_runs):
out = []
start = default_timer()
primeSum(30000)
stop = default_timer()
total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))
Runs in:
Total Time: 7.038189
Avg Testrun Time: 0.351909
Note: this code is run with python3.5
Math operation instead of useless object creation
I calculate the length of the sequence to be added in the sieve:
s[i*i: np1: i] = [0] * int(math.ceil((np1 - i*i) / i))
Instead of calling len
over a range
object you won't use:
s[i*i: np1: i] = [0] * len(range(i*i, np1, i))
-
\$\begingroup\$ Verify the Total time of my code without print calls, it is actually around
1.8s
and not7.038s
. Combining your code and @SuperBiasedMan 's output formatting it just takes0.28s
\$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月05日 11:02:43 +00:00Commented Dec 5, 2015 at 11:02 -
\$\begingroup\$ @Protino did you factor in that
7.038
was the total aggregate time from 20 tests run? The bottom value of0.35
was the average I got from your code withoutprint
calls \$\endgroup\$tijko– tijko2015年12月05日 18:29:20 +00:00Commented Dec 5, 2015 at 18:29 -
\$\begingroup\$ yes the total time is
1.870s
and Avg is0.09s
I verified it on my windows 32-bit python platform. Do re-run it and verify. \$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月05日 18:51:52 +00:00Commented Dec 5, 2015 at 18:51 -
1\$\begingroup\$ You should grab a copy of
python3.5
and give this a try! I ran the ideone code and sure enough it was timed as you said ~2.8s. I re-ran the code in my system but, still got the same results of ~7.0. I started to get suspicious of the version numbers here and saw ideone was usingpython3.4
. I'm running archlinux andpython3.5
is installed so, I grabbed a copy of 3.4 and the results matched yours and ideone. Really strange that the results would vary that much. \$\endgroup\$tijko– tijko2015年12月06日 04:59:27 +00:00Commented Dec 6, 2015 at 4:59 -
1\$\begingroup\$ Anyways I got awesome results in HackerRank though. I combined all the suggestions provided here and the code ran in less than
0.06s
for a test case involving100 test runs!.
Cheers. \$\endgroup\$Gurupad Mamadapur– Gurupad Mamadapur2015年12月06日 06:48:16 +00:00Commented Dec 6, 2015 at 6:48
OK I was wrong. Checking every prime is way slower than a sieve. My mistake was comparing my optimized code against a slower larger sieve. There is a trade off of speed for size. The sieves that use a set of primes are maybe 2x faster than this sieve code but use much more storage per prime stored. This uses 1 bit per prime where python ints are much larger. Usually the bit array is about 1/5 the size.
I decided to use a bitarray() sieve of Eratosthenes from a program I wrote earlier and then use the 6k-1 prime_sum code I used before. The big advantages are; much faster sieve generator and a very very fast prime check. p[n] just checks a bit for 1 or 0 instead of a long drawn out prime check. This code is 6 times faster than my previous prime_sum(). This is not faster than the set() of prime numbers but it is able to do 4GB lists of primes.
To answer the original question: The sieve is too large using ints and Yes the prints take a long time. Imagine printing out 7,930,427 sums! I just printed the first few. Enough to prove Goldbach right without waiting for a while.
Here are the two outputs for large numbers:
My previous code:
$ python simple_goldbach.py
Total Time: 0.041585
Avg Testrun Time: 0.002079
30,000 has 602 representation(s)
Please enter an even number greater than 3> 12345678
12,345,678 has 71,169 representation(s) took 1.613520
Please enter an even number greater than 3> 4000000000
4,000,000,000 has 7,930,427 representation(s) took 498.131707
Please enter an even number greater than 3>
And the new sieve code:
$ python sieve_goldbach.py
Made new sieve 1,000,000,000 in 4.366924
Total Time: 0.012563
Avg Testrun Time: 0.000628
30,000 has 602 representation(s)
Please enter an even number greater than 3> 12345678
12,345,678 has 71,169 representation(s)
prime_sum took: 0.195684
31 + 12,345,647 = 12,345,678
41 + 12,345,637 = 12,345,678
97 + 12,345,581 = 12,345,678
101 + 12,345,577 = 12,345,678
Please enter an even number greater than 3> 4000000000
1,000,000,000 is less than 4,000,000,000 making new sieve
Made new sieve 4,000,000,000 in 19.151368
4,000,000,000 has 7,930,427 representation(s)
prime_sum took: 77.966109
89 + 3,999,999,911 = 4,000,000,000
131 + 3,999,999,869 = 4,000,000,000
239 + 3,999,999,761 = 4,000,000,000
383 + 3,999,999,617 = 4,000,000,000
Please enter an even number greater than 3> 200000000
200,000,000 has 538,290 representation(s)
prime_sum took: 3.153349
37 + 199,999,963 = 200,000,000
43 + 199,999,957 = 200,000,000
97 + 199,999,903 = 200,000,000
181 + 199,999,819 = 200,000,000
Please enter an even number greater than 3>
Here is my faster sieve representation:
from timeit import default_timer
from bitarray import bitarray
global primes
def make_sieve(size):
"""Create a sieve of Eratosthenes up to the given size."""
s_start = default_timer()
limit = int(1 + size**0.5) + 2
p = bitarray(size+2) # One bit per value
p.setall(True)
p[0:2] = False # Clear zero and one
p[4::2] = False # Clear multiples of 2
p[9::3] = False # Clear multiples of 3
for i in range(5, limit, 6): # Process only numbers of the form 6k-1 and 6k+1
h = i + 2 # 6k+1
if p[i]: # If 6k-1 is prime
p[i*i::2 * i] = False # Clear multiples of 6k-1
if p[h]: # If 6k+1 is prime
p[h*h::2 * h] = False # Clear multiples of 6k+1
p = p[:size]
s_stop = default_timer()
print(f" Made new sieve {len(p):,} in {s_stop - s_start:4f}")
#return [i for i in range(2,size+1) if p[i]] # to return list of primes
return p
def prime_sum(n):
global primes
if len(primes) < n:
print(f"{len(primes):,} is less than {n:,} making new sieve")
primes = make_sieve(n)
sums = []
if n&1 or n<4:
print("Only even numbers greater than 3")
return sums
if n == 4:
return [[2,2]]
if primes[n-3]:
sums += [[3,n-3]]
for k6 in range(5,n//2+1,6): # only check 6k-1 and 6k+1
if primes[k6] and primes[n-k6]:
sums += [[k6,n-k6]]
if primes[k6+2] and primes[n-(k6+2)]:
sums += [[k6+2,n-(k6+2)]]
return sums
primes = make_sieve(10**9)
test_runs = 20
total = 0
test_num = 30000
for _ in range(test_runs):
out = []
start = default_timer()
a = prime_sum(test_num)
stop = default_timer()
total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))
N = test_num
goldbach_sums = prime_sum(N)
print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
while True:
N=-1
while N<4 or N&1:
N = int(input("Please enter an even number greater than 3> "))
start = default_timer()
goldbach_sums = prime_sum(N)
stop = default_timer()
print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
print('prime_sum took: {:4f}'.format(stop-start))
print(*[f"{x:,} + {y:,} = {N:,}" for [x,y] in goldbach_sums[:4]],sep='\n')
-
\$\begingroup\$ +1 I'll have to check the numbers closer but going over the bit-array aspects is a good use of resources, I like the mindset. \$\endgroup\$tijko– tijko2024年06月07日 15:23:00 +00:00Commented Jun 7, 2024 at 15:23
No sieve, just check a few primes
Looking at the previous answers, I noticed that most use sieves. This is OK as long as the number in question is smaller than 8 billion or so. The sieve needs to be as large as n//2+1 to collect all of the pairs. Also the sieves take a few minutes to create. So I decided not to use a sieve at all, but to check for individual primes. Checking primes is expensive. I did not want to start at 2 and go through all of the numbers from 2 to n//2+1 and check each for prime.
I used the knowledge that only 6k-1 and 6k+1 are primes after 3. (5=6-1 and 7=6+1, 11=12-1 and 13=12+1...) So I do very few checks. It can be faster. If you add:
from sympy import isprime as p
and comment out my prime check, it should run several times faster.
It will do 4 p(xx) checks per rep worst case. Moving 6 numbers up for each loop.
Here is my output on my i5 1.1GHz laptop:
>python simple_goldbach.py
Total Time: 0.140530
Avg Testrun Time: 0.007027
30,000 has 603 representation(s)
Please enter an even number greater than 4> 120
120 has 13 representation(s)
7 +たす 113 =わ 120
11 +たす 109 =わ 120
13 +たす 107 =わ 120
17 +たす 103 =わ 120
19 +たす 101 =わ 120
23 +たす 97 =わ 120
31 +たす 89 =わ 120
37 +たす 83 =わ 120
41 +たす 79 =わ 120
47 +たす 73 =わ 120
53 +たす 67 =わ 120
59 +たす 61 =わ 120
61 +たす 59 =わ 120
Please enter an even number greater than 3>
Here is the code:
from timeit import default_timer
from gmpy2 import is_prime as p
def p_by_hand(n): # 6k prime check
if n <=1: return False
if n <=3: return True
if n&1 == 0 or n%3 == 0: return False
limit = int(n**.5+1)+1
for i in range(5,limit,6):
if n%i==0 or n%(i+2)==0:
return False
return True
def prime_sum(n):
sums = []
if n&1 or n<4:
print("Only even numbers greater than 3")
return sums
if n == 4:
return [[2,2]]
if p(n-3):
sums += [[3,n-3]]
for k6 in range(5,n//2+1,6): # only check 6k-1 and 6k+1
if p(k6) and p(n-k6):
sums += [[k6,n-k6]]
if p(k6+2) and p(n-(k6+2)):
sums += [[k6+2,n-(k6+2)]]
return sums
test_runs = 20
total = 0
test_num = 30000
for _ in range(test_runs):
out = []
start = default_timer()
a = prime_sum(test_num)
stop = default_timer()
total += (stop - start)
print('Total Time: {:4f}'.format(total))
print('Avg Testrun Time: {:4f}'.format(total / test_runs))
N = test_num
goldbach_sums = prime_sum(N)
print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
while True:
N=-1
while N<4 or N&1:
N = int(input("Please enter an even number greater than 3> "))
goldbach_sums = prime_sum(N)
print(f"{N:,} has {len(goldbach_sums):,} representation(s)")
print(*[f"{x} + {y} = {N}" for [x,y] in goldbach_sums],sep='\n')
-
1\$\begingroup\$ "My Goldbach code is O((n//2+1)//6)." Please read the big-Oh page. It's relevant to say that your code takes \$k\$ times that expression, and then give a value of \$k\$ for your particular server. But big-Oh notation deliberately hides that multiplicative constant; it mostly highlights linear vs. polynomial or e.g. \$O(n \log n)\$ behavior. // Also, pep-8 asks that you name it
prime_sum()
. \$\endgroup\$J_H– J_H2024年06月06日 23:04:39 +00:00Commented Jun 6, 2024 at 23:04 -
\$\begingroup\$ You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer. \$\endgroup\$Toby Speight– Toby Speight2024年06月07日 10:08:59 +00:00Commented Jun 7, 2024 at 10:08
-
\$\begingroup\$ Made some fixes. Sorry. I wrote in on the fly without thinking. I copied the
def primeSum(n):
from the original question. \$\endgroup\$Andy Richter– Andy Richter2024年06月07日 12:54:11 +00:00Commented Jun 7, 2024 at 12:54
Explore related questions
See similar questions with these tags.