Extended Euclidean algorithm

Question 1

I have programmed the extended euclidean algorithm. Is this a good approach?

def ext_ggT(a, b):
 x,y, u,v = 0,1, 1,0
 while a != 0:
 q, r = b//a, b%a
 m, n = x-u*q, y-v*q
 b,a, x,y, u,v = a,r, u,v, m,n
 gcd = b
 return gcd, x, y

Question 2

Close voters, just because you don't know what the extended Euclidean algorithm is doesn't mean that the question is unclear. I don't close C questions because I don't know C and it's 'unclear' to me.

Question 3

Is this a good approach? approach to what exactly? Or do you want opinions if you coded it well, advice how to code it better?

Question 4

I guess advice... There are always more efficient ways of doing it, so I wanted some feedback. I have a book for computer science in which I found the theory of the algorithm and with it I programmed this function.

Question 5

For what it's worth, as a number theorist, I found this a very clear transcription in code of how I would think of the problem.

Question 6

Looks good to me. I would add a comment explaining the return value and the Bezout identity.

Question 7

Not being familiar with the algorithm in question, it was very non-obvious to me what the code did, and in general I think it's good to write code in such a way that it's comprehensible with minimal reasonable effort even if the reader isn't a domain expert (i.e. they should be able to look up documentation, look at your code, and see how the two relate). The main hindrances I faced were:

It wasn't immediately obvious that x, y, u, v represented two distinct series (where u was always a prior value of x, etc). In general there are a lot of variables in this code to keep track of, and not a lot of explanation of what they're all for.
Dependencies between the different values likewise were not obvious. Having all the assignments smushed onto one line made it hard to discern this visually; it's nice to use compact tuple assignments when the relationships are obvious, but it doesn't necessarily always improve readability.
There're no doc/comments explaining what's going on.
The name ext_ggT doesn't follow Python's snake_case naming convention and is a bit cryptic.

After reading the wiki link (thx Peilon) I was able to sort of reverse-engineer it and then I made some changes so that the code matches up more with my understanding based on the wiki article (and is commented so that anyone looking at this code side by side with the wiki article will immediately see what goes with what).

from collections import deque
from typing import Tuple
def extended_euclidean(a: int, b: int) -> Tuple[int, int, int]:
 """
 Returns (gcd, x, y) such that:
 gcd = greatest common divisor of (a, b)
 x, y = coefficients such that ax + by = gcd
 """
 # We only need to keep the last two elements of each series.
 r = deque([b, a], 2)
 s = deque([0, 1], 2)
 t = deque([1, 0], 2)
 # The next element of each series is a function of the previous two.
 # We stop building these series once r (the remainder) is zero; 
 # the final result comes from the iteration prior to that one.
 while r[-1] != 0:
 q = r[-2] // r[-1]
 r.append(r[-2] % r[-1])
 s.append(s[-2] - s[-1] * q)
 t.append(t[-2] - t[-1] * q)
 return r[-2], s[-2], t[-2]
assert extended_euclidean(240, 46) == (2, -9, 47)

The biggest change is that I've represented the various series described in the wiki article as iterables, rather than representing each as two scalars; this doesn't make much difference to the way the code actually runs, but the fact that these six values (previously a, b, x, y, u, and v) represent three distinct series is now very obvious to the reader. The three series are initialized and extended in such a way as to "make alike look alike" -- you can see at a glance how each successive element is computed from the prior two, and easily discern where there are and aren't dependencies between these values.

You could initialize these series as simply:

r = [b, a]
s = [0, 1]
t = [1, 0]

and the code would return the correct result, but to preserve the behavior of only keeping the last two elements (which I agree is a good space optimization) I've converted them to deques with maxlen=2. The deque abstracts away the business of automatically popping the unneeded values off of the left side, which helps declutter the "interesting" part of the implementation while still preserving the property of having it use constant space.

Question 8

I strongly disagree with using deques. There is so much unnecessary overhead for a simple algorithm. The original implementation follows the mathematical formulation but this version strays from it.

Question 9

I timed things, and this is rougly twice as slow as the original code with python3, and ten times as slow with pypy3.

Question 10

Good points! In most real-world situations, it's more important to optimize reading and debugging a particular function than constant factors of its actual runtime, because human time is much more expensive than CPU time. But in a situation where every nanosecond counts because you're operating at massive scale, you may not want to use Python at all (say, use Rust to write the performance-sensitive part of your code and call the compiled library via CFFI from your Python code).

Question 11

I'd be very interested in seeing other implementations that follow the description of the algorithm as described in Wiki (i.e. expressing it as the r, s, t series). On the way to this one I did a version with 2-tuples that probably performs better than deques, but I didn't like the way that it read.

Question 12

Inspite of all reasons, simple commenting and meaningful variable names would have been enough. Spirit of simplicity is lost. In fact, yours needs original one to show (a Mathl. newbie) how simple it is. Might be both complement each other, so in a sense both implementations are needed.

Question 13

Give your variables meaningful names reflecting their role in the algorithm.

Question 14

In the context of mathematics it can be hard to find meaningful variable names

Question 15

I can only think of good names for quotient and remainder. Nothing else really makes anything clearer

Question 16

@northerner It's hard to find meaningful names in computer programming too. Just seems like a cop out.

Question 17

@Peilonrayz you seriously seem to miss the point

Question 18

@Peilonrayz you don't backup your assertion at all. Take the Euclidean Algorithm for instance. It takes two arguments, commonly called a and b. Please tell me a better name for them?

Samwise Samwise 4,0107 silver badges14 bronze badges · Accepted Answer · 2020-04-09 16:41:41Z

Not being familiar with the algorithm in question, it was very non-obvious to me what the code did, and in general I think it's good to write code in such a way that it's comprehensible with minimal reasonable effort even if the reader isn't a domain expert (i.e. they should be able to look up documentation, look at your code, and see how the two relate). The main hindrances I faced were:

It wasn't immediately obvious that x, y, u, v represented two distinct series (where u was always a prior value of x, etc). In general there are a lot of variables in this code to keep track of, and not a lot of explanation of what they're all for.
Dependencies between the different values likewise were not obvious. Having all the assignments smushed onto one line made it hard to discern this visually; it's nice to use compact tuple assignments when the relationships are obvious, but it doesn't necessarily always improve readability.
There're no doc/comments explaining what's going on.
The name ext_ggT doesn't follow Python's snake_case naming convention and is a bit cryptic.

After reading the wiki link (thx Peilon) I was able to sort of reverse-engineer it and then I made some changes so that the code matches up more with my understanding based on the wiki article (and is commented so that anyone looking at this code side by side with the wiki article will immediately see what goes with what).

from collections import deque
from typing import Tuple
def extended_euclidean(a: int, b: int) -> Tuple[int, int, int]:
 """
 Returns (gcd, x, y) such that:
 gcd = greatest common divisor of (a, b)
 x, y = coefficients such that ax + by = gcd
 """
 # We only need to keep the last two elements of each series.
 r = deque([b, a], 2)
 s = deque([0, 1], 2)
 t = deque([1, 0], 2)
 # The next element of each series is a function of the previous two.
 # We stop building these series once r (the remainder) is zero; 
 # the final result comes from the iteration prior to that one.
 while r[-1] != 0:
 q = r[-2] // r[-1]
 r.append(r[-2] % r[-1])
 s.append(s[-2] - s[-1] * q)
 t.append(t[-2] - t[-1] * q)
 return r[-2], s[-2], t[-2]
assert extended_euclidean(240, 46) == (2, -9, 47)

The biggest change is that I've represented the various series described in the wiki article as iterables, rather than representing each as two scalars; this doesn't make much difference to the way the code actually runs, but the fact that these six values (previously a, b, x, y, u, and v) represent three distinct series is now very obvious to the reader. The three series are initialized and extended in such a way as to "make alike look alike" -- you can see at a glance how each successive element is computed from the prior two, and easily discern where there are and aren't dependencies between these values.

You could initialize these series as simply:

r = [b, a]
s = [0, 1]
t = [1, 0]

and the code would return the correct result, but to preserve the behavior of only keeping the last two elements (which I agree is a good space optimization) I've converted them to deques with maxlen=2. The deque abstracts away the business of automatically popping the unneeded values off of the left side, which helps declutter the "interesting" part of the implementation while still preserving the property of having it use constant space.

I strongly disagree with using deques. There is so much unnecessary overhead for a simple algorithm. The original implementation follows the mathematical formulation but this version strays from it.
I timed things, and this is rougly twice as slow as the original code with python3, and ten times as slow with pypy3.
Good points! In most real-world situations, it's more important to optimize reading and debugging a particular function than constant factors of its actual runtime, because human time is much more expensive than CPU time. But in a situation where every nanosecond counts because you're operating at massive scale, you may not want to use Python at all (say, use Rust to write the performance-sensitive part of your code and call the compiled library via CFFI from your Python code).
I'd be very interested in seeing other implementations that follow the description of the algorithm as described in Wiki (i.e. expressing it as the r, s, t series). On the way to this one I did a version with 2-tuples that probably performs better than deques, but I didn't like the way that it read.
Inspite of all reasons, simple commenting and meaningful variable names would have been enough. Spirit of simplicity is lost. In fact, yours needs original one to show (a Mathl. newbie) how simple it is. Might be both complement each other, so in a sense both implementations are needed.

Stack Exchange Network

Extended Euclidean algorithm

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Extended Euclidean algorithm

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions