
I implemented the Miller–Rabin primality test in Python 3, but I have the feeling it is very slow. I know that there are faster versions in the gmpy2 package, but I want to compare it with other code I have in Python, so I don't want to use an existing implementation. I used the pseudocode from Wikipedia. What am I doing wrong?

Here is my code.

import gmpy2
import random
import time

def rab_test(n, rounds):
    start = time.time()
    if n == 1:
        return False
    if n in [2, 3]:
        return True
    if n % 2 == 0:
        return False
    d = n - 1
    s = 1
    while d % 2 == 0:
        d = d // 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = gmpy2.powmod(a, d, n)
        for _ in range(s):
            y = gmpy2.powmod(x, 2, n)
            if y == 1 and x != 1 and x != n - 1:
                return False
            x = y
        if y != 1:
            return False
    return True
asked Mar 15, 2024 at 9:05
  • Let me start with "does the code produce the correct output?" – Commented Mar 15, 2024 at 10:55
  • @possum yes it does – Commented Mar 15, 2024 at 11:30
  • When in doubt, profile. Where does the code spend most of its time? That said, my first instinct is that gmpy2.powmod(x, 2, n) is the culprit. A simple x = (x*x) % n might be faster. I don't have gmpy2 installed to confirm. – Commented Mar 15, 2024 at 21:40
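The comment's instinct about modular squaring can be spot-checked with a stdlib-only micro-benchmark. This is a sketch: Python's built-in three-argument pow stands in for gmpy2.powmod (which may not be installed), and the modulus is just an arbitrary odd number in the question's range.

```python
import random
import timeit

# Compare two ways of squaring modulo n. The built-in pow(x, 2, n) stands in
# for gmpy2.powmod(x, 2, n); the two compute the same value.
random.seed(1)
n = random.randrange(10**11, 10**12) | 1   # arbitrary odd modulus, ~12 digits
x = random.randrange(2, n)

t_pow = timeit.timeit(lambda: pow(x, 2, n), number=100_000)
t_mul = timeit.timeit(lambda: (x * x) % n, number=100_000)
print(f"pow(x, 2, n): {t_pow:.3f}s   (x*x) % n: {t_mul:.3f}s")
```

For operands this small, the plain multiply-and-reduce tends to win because it avoids the generality of the modular-exponentiation machinery; the only way to know on your machine is to run it.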

1 Answer


I don't have many suggestions for how to improve the actual function. That said, I see some evidence that you could use a better context for checking both performance and correctness. Specifically, you're assigning start = time.time() and then never using it; don't do that.

Here's a quick test rig:

import gmpy2
import random
import timeit

def rab_test(n, rounds):
    if n == 1:
        return False
    if n in [2, 3]:
        return True
    if n % 2 == 0:
        return False
    d = n - 1
    s = 1
    while d % 2 == 0:
        d = d // 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = gmpy2.powmod(a, d, n)
        for _ in range(s):
            y = gmpy2.powmod(x, 2, n)
            if y == 1 and x != 1 and x != n - 1:
                return False
            x = y
        if y != 1:
            return False
    return True

def main():
    for _ in range(1000 * 10):
        n = random.randrange(1000 * 1000 * 1000, 1000 * 1000 * 1000 * 1000)
        mine = rab_test(n, 50)
        theirs = gmpy2.is_prime(n, 50)
        if mine != theirs:
            print(f"{n}, {mine}, {theirs}\n")
            raise
    print(timeit.timeit("rab_test(random.randrange(1000*1000*1000*100, 1000*1000*1000*1000), 50)", globals=globals()))
    print(timeit.timeit("gmpy2.is_prime(random.randrange(1000*1000*1000*100, 1000*1000*1000*1000), 50)", globals=globals()))

if __name__ == "__main__":
    main()

The standard-library module timeit is the normal choice in Python, and probably good enough. Whether randomly checking 10k values for correctness is good enough is less clear; you can and should tweak the above code to convince yourself that it's doing a good test.
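One way to tighten the timing comparison is to pre-generate a shared input list, so both candidates are measured on exactly the same numbers and random-number generation stays outside the timed region. This is a sketch; fermat_2 below is a hypothetical stand-in for whichever functions you actually want to compare.

```python
import random
import timeit

# One shared, reproducible input list for every candidate function.
random.seed(0)
samples = [random.randrange(10**11, 10**12) | 1 for _ in range(1_000)]

def fermat_2(n):
    # Base-2 Fermat test: a placeholder candidate, not a full primality test.
    return pow(2, n - 1, n) == 1

# Time one full pass over the fixed inputs; no randrange inside the timing.
elapsed = timeit.timeit(lambda: [fermat_2(n) for n in samples], number=1)
print(f"{elapsed:.4f}s for {len(samples)} inputs")
```

Timing each candidate over the same samples list removes both the per-call randrange overhead and the run-to-run input variance from the comparison.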

So what's it output?

6.803874525998253
0.7338983130030101

Your code is 10x slower than the library! But the library is written in C, so it's impossible to say whether it's doing something trickier algorithmically or whether this is just C's natural advantage. (Actually it's not impossible: you could always check their source code.)

Now that you have a decent way of comparing alternatives, you might try out alterations to the function. My guess is that, while you could maybe make it fancier looking in various ways, none of them will increase speed, with the possible exception of parallelism.
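One algorithmic alteration worth trying: for inputs below a known bound, the random bases can be replaced by a small fixed set. It is a known result that testing the first twelve primes as bases is deterministic for all n < 3,317,044,064,679,887,385,961,981, which covers the range used above. A minimal sketch, using Python's built-in pow rather than gmpy2 so it runs without dependencies:

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin for n below ~3.3e24 (first twelve prime bases)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    # Trial division by the bases themselves handles small n and cheap rejects.
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True
```

Plugged into the rig above (in place of the 50 random rounds), this does at most 12 modular exponentiations per call instead of 50, with no loss of correctness in that range.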

answered Mar 15, 2024 at 18:26
\$\endgroup\$
  • I think this can benefit from using timeit(..., setup=...) - on my machine the gmpy2 timeit spends about 20% of its time just generating the random number. Or even better, generate 1000 or so random numbers before the timeits, and test on those, so the two functions have exactly the same input. It may be a somewhat unfair comparison - for numbers in this range it suffices to just test the bases 2, 3, 5, 7, 11, and 13. I wouldn't be surprised if gmpy is aware of this type of fact. – Commented Mar 16, 2024 at 8:11
  • One thing to note is that for large N, gmpy2 is limiting the number of cases that are actually run. See the source code here. This means that your tests might not be apples to apples, and you are running more iterations than gmpy2 is. – Commented Mar 21, 2024 at 21:55
  • Interesting point. I don't think it affects any of the tests we've actually done though; the cap is 1k. And, if I understand the math correctly, 1k is actually quite generous. – Commented Mar 21, 2024 at 23:15
  • I dug into the GMP source. gmpy2.is_prime wraps the GMP function mpz_probab_prime_p. This does trial division by small primes, then calls mpz_millerrabin. For numbers below 64 bits, mpz_millerrabin just does a BPSW test, which is deterministic. In particular it totally ignores the reps argument. You can observe that if you change reps to 1000 in the call to gmpy2.is_prime, the runtime barely changes. You could even the playing field a bit by testing much larger integers. – Commented Mar 25, 2024 at 11:41
  • That's important! I re-ran the tests, only changing the sampling range and iterations (couldn't be bothered to get them both using the same pre-gen samples). At (2**65, 2**68), 100 the gmpy2 advantage is down to 4, and at (2**74, 2**75), 1k it's down to ~2. So maybe for really large numbers they're the same... – Commented Mar 26, 2024 at 20:34
