I implemented the Miller-Rabin primality test in Python 3, but I have the feeling it is very slow. I know that there are faster versions in the gmpy2 package, but I want to compare it with other code I have in Python, so I don't want to use an existing implementation. I used the pseudocode from Wikipedia. What am I doing wrong?
Here is my code.
import time
import random
import gmpy2

def rab_test(n, rounds):
    start = time.time()
    if n == 1:
        return False
    if n in [2, 3]:
        return True
    if n % 2 == 0:
        return False
    # strip factors of 2 from n - 1
    d = n - 1
    s = 1
    while d % 2 == 0:
        d = d // 2
        s += 1
    # witness loop: each round tests one random base a
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = gmpy2.powmod(a, d, n)
        for _ in range(s):
            y = gmpy2.powmod(x, 2, n)
            if y == 1 and x != 1 and x != n - 1:
                return False
            x = y
        if y != 1:
            return False
    return True
1 Answer
I don't have many suggestions for how to improve the actual function. That said, I see some evidence that you could use a better context for checking both performance and correctness. Specifically, you're assigning start = time.time() and then never using it; don't do that.
Here's a quick test rig:
import gmpy2
import random
import timeit
def rab_test(n, rounds):
    if n == 1:
        return False
    if n in [2, 3]:
        return True
    if n % 2 == 0:
        return False
    # strip factors of 2 from n - 1
    d = n - 1
    s = 1
    while d % 2 == 0:
        d = d // 2
        s += 1
    # witness loop: each round tests one random base a
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = gmpy2.powmod(a, d, n)
        for _ in range(s):
            y = gmpy2.powmod(x, 2, n)
            if y == 1 and x != 1 and x != n - 1:
                return False
            x = y
        if y != 1:
            return False
    return True

def main():
    for _ in range(1000*10):
        n = random.randrange(1000*1000*1000, 1000*1000*1000*1000)
        mine = rab_test(n, 50)
        theirs = gmpy2.is_prime(n, 50)
        if mine != theirs:
            print(f"{n}, {mine}, {theirs}\n")
            raise AssertionError("disagreement with gmpy2.is_prime")
    print(timeit.timeit("rab_test(random.randrange(1000*1000*1000*100, 1000*1000*1000*1000), 50)", globals=globals()))
    print(timeit.timeit("gmpy2.is_prime(random.randrange(1000*1000*1000*100, 1000*1000*1000*1000), 50)", globals=globals()))

if __name__ == "__main__":
    main()
The timeit module is the usual choice in Python, and probably good enough. Whether randomly checking 10k values for correctness is good enough is less clear; you can and should tweak the above code to convince yourself that it's doing a good test.
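For example, here is a minimal sketch of the kind of deterministic spot checks one could bolt onto main() before the random sweep. The specific values are my own choice, not from the answer above:

# 561 = 3*11*17 is a Carmichael number: it fools the plain Fermat test for
# most bases, but each Miller-Rabin round rejects it with probability >= 3/4,
# so 50 rounds should essentially never let it through.
assert rab_test(1, 50) is False
assert rab_test(2, 50) is True
assert rab_test(97, 50) is True        # a small prime
assert rab_test(104729, 50) is True    # the 10000th prime
assert rab_test(561, 50) is False      # the Carmichael number from above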
So what's it output?
6.803874525998253
0.7338983130030101
Your code is roughly 10x slower than the library! But the library's written in C, so it's impossible to say whether they're doing something trickier algorithmically or this is just C's natural advantage. (Actually it's not impossible: you could always check their source code.)
Now that you have a decent way of comparing alternatives, you might try out alterations to the function. My guess is that, while you could maybe make it fancier looking in various ways, none of them will increase speed, with the possible exception of parallelism.
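For what it's worth, here is a rough sketch of what round-level parallelism could look like, splitting the rounds across processes. The name rab_test_parallel and the worker count are my own inventions, and for numbers this small the process startup overhead may well swallow any gain:

from concurrent.futures import ProcessPoolExecutor

def rab_test_parallel(n, rounds, workers=4):
    # Run independent batches of rounds in separate processes; n passes
    # only if every batch passes. Any remainder rounds are dropped in this
    # sketch. rab_test must be defined at module top level so the worker
    # processes can import it.
    per_worker = rounds // workers
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(rab_test, [n] * workers, [per_worker] * workers)
    return all(results)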
I think this can benefit from using timeit(..., setup=...) - on my machine the gmpy2 timeit spends about 20% of its time just generating the random number. Or even better, generate 1000 or so random numbers before the timeits, and test on those, so the two functions have exactly the same input. It may be a somewhat unfair comparison - for numbers in this range it suffices to just test the bases 2, 3, 5, 7, 11, and 13. I wouldn't be surprised if gmpy is aware of this type of fact. – Izaak van Dongen, Mar 16, 2024 at 8:11
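A sketch of what the pre-generated-input version of that timing might look like (my own snippet; the sample count and repeat count are illustrative, and it assumes the rig above has been run so rab_test and gmpy2 are in scope):

import random
import timeit

# Fixed inputs generated up front: both functions see exactly the same
# numbers, and random generation stays outside the timed region.
samples = [random.randrange(1000*1000*1000*100, 1000*1000*1000*1000)
           for _ in range(1000)]

print(timeit.timeit("for n in samples: rab_test(n, 50)",
                    globals=globals(), number=10))
print(timeit.timeit("for n in samples: gmpy2.is_prime(n, 50)",
                    globals=globals(), number=10))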
One thing to note is that for large N, gmpy2 is limiting the number of cases that are actually run (see the source code). This means that your tests might not be apples to apples, and you are running more iterations than gmpy2 is. – C.Nivs, Mar 21, 2024 at 21:55
Interesting point. I don't think it affects any of the tests we've actually done though; the cap is 1k. And, if I understand the math correctly, 1k is actually quite generous. – ShapeOfMatter, Mar 21, 2024 at 23:15
I dug into the GMP source. gmpy2.is_prime wraps the GMP function mpz_probab_prime_p. This does trial division by small primes, then calls mpz_millerrabin. For numbers below 64 bits, mpz_millerrabin just does a BPSW test, which is deterministic. In particular it totally ignores the reps argument. You can observe that if you change reps to 1000 in the call to gmpy2.is_prime, the runtime barely changes. You could even the playing field a bit by testing much larger integers. – Izaak van Dongen, Mar 25, 2024 at 11:41
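A quick sketch of how one might observe that effect (my own snippet; it lets gmpy2 pick a sub-64-bit prime rather than hardcoding one):

import timeit
import gmpy2

# A prime comfortably below 64 bits; let gmpy2 find one.
n = int(gmpy2.next_prime(10**11))

# If the reps argument is ignored for numbers this small, these two
# timings should come out nearly identical.
print(timeit.timeit(lambda: gmpy2.is_prime(n, 50), number=100000))
print(timeit.timeit(lambda: gmpy2.is_prime(n, 1000), number=100000))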
That's important! I re-ran the tests, only changing the sampling range and iterations (couldn't be bothered to get them both using the same pre-generated samples). At (2**65, 2**68) with 100 iterations, the gmpy2 advantage is down to 4x, and at (2**74, 2**75) with 1k iterations it's down to about 2x. So maybe for really large numbers they're the same... – ShapeOfMatter, Mar 26, 2024 at 20:34
gmpy2.powmod(x, 2, n) is a culprit. A simple x = (x*x) % n might be faster. I don't have gmpy2 installed to confirm.
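A sketch of that last suggestion applied to the function under review; rab_test_mul is my name for the variant, and as the commenter says, whether it actually wins is unconfirmed:

import random
import gmpy2

def rab_test_mul(n, rounds):
    # Identical to rab_test, except the repeated squaring uses plain
    # integer arithmetic instead of gmpy2.powmod(x, 2, n).
    if n == 1:
        return False
    if n in [2, 3]:
        return True
    if n % 2 == 0:
        return False
    d = n - 1
    s = 1
    while d % 2 == 0:
        d = d // 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        # gmpy2.powmod returns an mpz; cast to int so the squaring below
        # really runs through plain-int arithmetic rather than gmpy2's.
        x = int(gmpy2.powmod(a, d, n))
        for _ in range(s):
            y = (x * x) % n
            if y == 1 and x != 1 and x != n - 1:
                return False
            x = y
        if y != 1:
            return False
    return True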