I'm solving Euler exercise number 7 and trying to go a bit further on bigger primes. It is too slow with big numbers. I've been reading ways to optimize the sieve, but I would like to ask people with more experience with this.
import time
from math import log, ceil
start = time.time()
primes = [2]
op_count = 0
limitN = 0
pr_count = 0
def primes_goto(prime_index):
global primes
global op_count
global limitN
global pr_count
if prime_index < 6:
limitN = 100
else:
limitN = ceil(prime_index * (log(prime_index) + log(log(prime_index)))) #bound
not_prime = set()
while pr_count < prime_index:
for i in range(3, limitN, 2):
if i in not_prime:
continue
for j in range(i*3, limitN, i*2):
not_prime.add(j)
primes.append(i)
pr_count += 1
return primes
ind = int(10001)
primes_goto(ind)
ind_prime = primes[ind-1]
end = time.time()
print("Prime number at posizion: {} = {}".format(ind, ind_prime))
print("Runtime: {}".format(end-start))
-
1\$\begingroup\$ What big numbers have you tried and what is the expected performance? By the way, 1 is not prime. \$\endgroup\$Marc– Marc2021年01月05日 01:36:44 +00:00Commented Jan 5, 2021 at 1:36
-
\$\begingroup\$ projecteuler.net/problem=7 \$\endgroup\$Al Mastro– Al Mastro2021年01月05日 08:22:21 +00:00Commented Jan 5, 2021 at 8:22
-
\$\begingroup\$ I've tried to calculate the 1000000 prime number, and it took 5 seconds \$\endgroup\$Al Mastro– Al Mastro2021年01月05日 08:23:03 +00:00Commented Jan 5, 2021 at 8:23
-
\$\begingroup\$ Python is very slow, if you care about optimised performance you would write in C/C++, which is roughly 20x-100x faster than python. \$\endgroup\$CaptainCodeman– CaptainCodeman2021年01月05日 10:20:26 +00:00Commented Jan 5, 2021 at 10:20
3 Answers 3
Unused variables
op_count
is declared in the global scope, pulled into the primes_goto
local scope, but is never used. It may be deleted.
Unnecessary Globals
limitN
and pr_count
are declared in the global scope, but only ever used inside the primes_goto
function. They may be removed from the global scope, and simply declared inside the primes_goto
function.
Unused Return
The primes_goto
function ends with return primes
, but the returned value is not assigned to anything.
One way to fix this would be to remove the return primes
statement.
A better way would be to move the primes = [2]
initialization inside the primes_goto
function, and remove global primes
declaration. Then, return this local result, and assign the result to a variable in the caller’s context. Ie)
primes = primes_goto(ind)
Unnecessary cast
ind = int(10001)
The value 10001
is already an integer; there is no need to "cast" it.
Organization
Python programs should follow the following organization:
- imports
- function & class declarations
- mainline
The initialization of variables should be moved from before primes_goto
to after all function declarations.
Profiling
time.time()
has limited resolution, due to it expressing the time from an epoch decades in the past, in factions of a second. time.perf_counter()
expresses time from an arbitrary epoch to the highest resolution available, making it ideal for measuring time intervals.
Reworked code
import time
from math import log, ceil
def primes_goto(prime_index):
primes = [2]
pr_count = 0
if prime_index < 6:
limitN = 100
else:
limitN = ceil(prime_index * (log(prime_index) + log(log(prime_index)))) #bound
not_prime = set()
while pr_count < prime_index:
for i in range(3, limitN, 2):
if i in not_prime:
continue
for j in range(i*3, limitN, i*2):
not_prime.add(j)
primes.append(i)
pr_count += 1
return primes
start = time.perf_counter()
ind = 10001
primes_goto(ind)
ind_prime = primes[ind-1]
end = time.perf_counter()
print("Prime number at position: {} = {}".format(ind, ind_prime))
print("Runtime: {}".format(end-start))
Optimization
bitarray
As pointed out in other reviews, bitarray
can be efficiently used to store the sieve flags, and i*i
is a better starting point for crossing off prime candidates due to all smaller multiples already being eliminated as multiples of smaller primes.
Avoid unnecessary work
Again, as pointed out in other answers: marking off primes candidates as 3*i
(or i*i
) is pointless once i
exceeds limitN//3
(or isqrt(limitN)
), you can gain efficiency by separating the prime your while
loop into two: the first part crossing off multiples of a prime number while adding that prime to the primes
list, the second while
loop just adding discovered primes to the primes
list.
PEP 8
The Style Guide for Python Programs enumerates several rules that Python programs should follow. The main violation in your code relates to naming: You should use only snake_case
for variables. limitN
should be renamed to limit_n
, or upper_limit
.
Naming
Why we’re talking about names, ind
, ind_prime
, pr_count
and prime_goto
are all terrible names. You might know what they mean today, but other programmers reading the code will have a hard time trying to elude their meaning; you may even have problems if you revisit the code months down the road.
first_n_primes(n)
would be a better function name. I’ll leave you to come up with better variable names.
-
\$\begingroup\$ Nice review, I didn't point those issues out cause I felt the OP needed just performance review. \$\endgroup\$theProgrammer– theProgrammer2021年01月06日 05:28:02 +00:00Commented Jan 6, 2021 at 5:28
-
\$\begingroup\$ @theProgrammer The OP wanted a performance review. They needed a full code review. \$\endgroup\$AJNeufeld– AJNeufeld2021年01月06日 07:04:13 +00:00Commented Jan 6, 2021 at 7:04
-
\$\begingroup\$ @theProgrammer I actually struggled between upvoting and downvoting your answer. From the help centre: "Every answer must make at least one insightful observation about the code in the question. Answers that merely provide an alternate solution with no explanation or justification do not constitute valid Code Review answers and may be deleted." Your answer was dangerously close to just being an alternate solution. In the end I decided that the "consuming all free memory (6GB)" was enough of an insightful observation to allow the answer to stand, and instead added my own review. \$\endgroup\$AJNeufeld– AJNeufeld2021年01月06日 16:09:05 +00:00Commented Jan 6, 2021 at 16:09
You can make this part a bit faster:
for j in range(i*3, limitN, i*2):
not_prime.add(j)
Better start at i*i
, and better don't do your own loop. So it becomes:
not_prime.update(range(i*i, limitN, i*2))
You can use bitarray
, this tends to be a little faster.
The code using bitarray
is
from bitarray import bitarray
import time
from math import sqrt, ceil, log
def primes_goto2(index: int):
prime_up_to = ceil(index * (log(index) + log(log(index)))) + 4
primes = bitarray(prime_up_to)
primes.setall(True)
primes[0] = primes[1] = False
primes[4::2] = False
for i in range(3, int(sqrt(prime_up_to)), 2):
if primes[i]:
primes[i * i::i] = False
prime_list = [i for i in range(len(primes)) if primes[i]]
return prime_list[index]
Time taken for 1000000
(NOTE: my index does not require index - 1, so in order to get same value as me you have to use 1000001
)
index = 1000000
t0 = time.perf_counter()
prime_number = primes_goto2(index)
t1 = time.perf_counter()
print("Prime number at position: {} = {}".format(index, prime_number))
print("Runtime: {}".format(t1-t0))
Output:
Prime number at position: 1000000 =わ 15485867
Runtime: 1.888874288000011
Which is very much better than yours.
I also noticed, yours consumes a lot of memory, running 1000001
ate up all my free memory(6gb)
-
1\$\begingroup\$ You should use
isqrt()
instead ofint(sqrt())
, especially for large numbers, where floating point accuracy will corrupt the result. You've got an off-by-one error, since range is half-open. You wantisqrt(prime_up_to)+1
. \$\endgroup\$AJNeufeld– AJNeufeld2021年01月05日 19:10:47 +00:00Commented Jan 5, 2021 at 19:10 -
1\$\begingroup\$ Using
2*i
for step will increase your speed:primes[i*i::2*i] = False
\$\endgroup\$AJNeufeld– AJNeufeld2021年01月05日 19:13:13 +00:00Commented Jan 5, 2021 at 19:13 -
\$\begingroup\$ this solution, with suggestions implemented, took only 0.76 secs, which is dramatically better than mine \$\endgroup\$Al Mastro– Al Mastro2021年01月06日 17:03:24 +00:00Commented Jan 6, 2021 at 17:03
Explore related questions
See similar questions with these tags.