A recent question on credit card validation here on Code Review, led me down a dark rabbit hole of check digit algorithms. I took a stop at the Verhoeff algorithm and tried to implement it myself.
That lead to the following piece of code:
class Verhoeff:
"""Calculate and verify check digits using Verhoeff's algorithm"""
MULTIPLICATION_TABLE = (
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
(1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
(2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
(3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
(4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
(5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
(6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
(7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
(8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
(9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
)
INVERSE_TABLE = (0, 4, 3, 2, 1, 5, 6, 7, 8, 9)
PERMUTATION_TABLE = (
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
(1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
(5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
(8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
(9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
(4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
(2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
(7, 0, 4, 6, 9, 1, 3, 2, 5, 8)
)
@classmethod
def calculate(cls, input_: str) -> str:
"""Calculate the check digit using Verhoeff's algorithm"""
check_digit = 0
for i, digit in enumerate(reversed(input_), 1):
col_idx = cls.PERMUTATION_TABLE[i % 8][int(digit)]
check_digit = cls.MULTIPLICATION_TABLE[check_digit][col_idx]
return str(cls.INVERSE_TABLE[check_digit])
@classmethod
def validate(cls, input_: str) -> bool:
"""Validate the check digit using Verhoeff's algorithm"""
check_digit = 0
for i, digit in enumerate(reversed(input_)):
col_idx = cls.PERMUTATION_TABLE[i % 8][int(digit)]
check_digit = cls.MULTIPLICATION_TABLE[check_digit][col_idx]
return cls.INVERSE_TABLE[check_digit] == 0
I chose to implement it as a class with two class methods because I plan to include other algorithms as well and structuring the code this way seemed reasonable to me.
I'm particularly interested in your feedback on the following aspects:
- What do you think about the API?
calculate(input_: str) -> str
andvalidate(input_: str) -> bool
seem reasonable and symmetric, but I could also imagine using something likecalculate(input_: Sequence[int]) -> int
/validate(input_: Sequence[int], int) -> bool
. - There seems to be reasonable amount of code duplication between the two functions
calculate
/validate
, but I couldn't really wrap my head around how to define one in respect to the other.
In addition to the class above, I also decided to take a shot at some unit tests for the algorithm using pytest.
import string
import itertools
import pytest
from check_sums import Verhoeff
# modification and utility functions to test the check digit algorihm robustness
DIGIT_REPLACEMENTS = {
digit: string.digits.replace(digit, "") for digit in string.digits
}
def single_digit_modifications(input_):
"""Generate all single digit modifications of a numerical input sequence"""
for i, digit in enumerate(input_):
for replacement in DIGIT_REPLACEMENTS[digit]:
yield input_[:i] + replacement + input_[i+1:]
def transposition_modifications(input_):
"""Pairwise transpose of all neighboring digits
The algorithm tries to take care that transpositions really change the
input. This is done to make sure that those permutations actually alter the
input."""
for i, digit in enumerate(input_[:-1]):
if digit != input_[i+1]:
yield input_[:i] + input_[i+1] + digit + input_[i+2:]
def flatten(iterable_of_iterables):
"""Flatten one level of nesting
Borrowed from
https://docs.python.org/3/library/itertools.html#itertools-recipes
"""
return itertools.chain.from_iterable(iterable_of_iterables)
# Verhoeff algoritm related tests
# Test data taken from
# https://en.wikibooks.org/wiki/Algorithm_Implementation/Checksums/Verhoeff_Algorithm
VALID_VERHOEF_INPUTS = [
"2363", "758722", "123451", "1428570", "1234567890120",
"84736430954837284567892"
]
@pytest.mark.parametrize("input_", VALID_VERHOEF_INPUTS)
def test_verhoeff_calculate_validate(input_):
"""Test Verhoeff.calculate/Verhoeff.validate with known valid inputs"""
assert Verhoeff.calculate(input_[:-1]) == input_[-1]\
and Verhoeff.validate(input_)
@pytest.mark.parametrize(
"modified_input",
flatten(single_digit_modifications(i) for i in VALID_VERHOEF_INPUTS)
)
def test_verhoeff_single_digit_modifications(modified_input):
"""Test if single digit modifications can be detected"""
assert not Verhoeff.validate(modified_input)
@pytest.mark.parametrize(
"modified_input",
flatten(transposition_modifications(i) for i in VALID_VERHOEF_INPUTS)
)
def test_verhoeff_transposition_modifications(modified_input):
"""Test if transposition modifications can be detected"""
assert not Verhoeff.validate(modified_input)
The tests cover known precomputed input and check digit values, as well as some of the basic error classes (single-digit errors, transpositions) the checksum was designed to detect. I decided to actually generate all the modified inputs in the test fixture so that it would be easier to see which of the modified inputs cause a failure of the algorithm. So far I have found none.
Note: There is a thematically related question of mine on optimizing the Luhn check digit algorithm.
2 Answers 2
Your tests look ok. I have three concerns:
(削除) If I read correctly, your "single digit modifications" test cycle is going to have over 1000000000000000000000 cycles. That's... not practical. pick a compromise. (削除ここまで)- The positive tests are checking
calculate
andvalidate
. I see no reason not to check both in your negative tests too. - You're only checking syntactically valid input. This ties into your question about what the type signature should be.
- You have a few options for your type signature. Without going over the compromises in detail, I would suggest that the first line in
calculate
andvalidate
should be an isdigit() check, and raise an exception if it fails. - Whatever you do for types, your tests should check that it's handling edge cases as intended, whatever it is you decide you intend.
- empty strings
- a single digit (could break
validate
) - all zeros
- whitespace in different configurations
- illegal characters
- You have a few options for your type signature. Without going over the compromises in detail, I would suggest that the first line in
You don't have to address all these points, depending on the use-cases of this project, and whatever else is going on in your life, it may be fine to call it good enough as-is.
-
\$\begingroup\$ Thanks for the feedback! All the tests, including the same test set for some other algorithms I've cooked up, run in under 4 seconds, so thats okay at the moment. Maybe I should check that the modification function actually does what it's supposed to do ;-) Regarding
isdigit()
: as per the docs it's not completely the right fit for the task since it also permits "digits" that cannot be handled in base 10 and some other funny stuff. I have some plans for this, but this will go in another question. \$\endgroup\$AlexV– AlexV2019年06月19日 18:31:22 +00:00Commented Jun 19, 2019 at 18:31 -
\$\begingroup\$ Assuming you don't want to worry about Arabic numerals, a regex would serve for input validation. :) \$\endgroup\$ShapeOfMatter– ShapeOfMatter2019年06月19日 18:38:28 +00:00Commented Jun 19, 2019 at 18:38
-
\$\begingroup\$ I don't see any "problems" with the single-modification function, although it is a little convoluted. My actual guess is that pytest isn't completing the iteration for some reason. \$\endgroup\$ShapeOfMatter– ShapeOfMatter2019年06月19日 18:44:37 +00:00Commented Jun 19, 2019 at 18:44
-
1\$\begingroup\$ I just checked the output of the modificication functions and it seems to be what I expect. The function modifies the string in one place and one place only. There is no combinatorics involved, so the number of calls scales linearly with the number of digits used in the test (should be 59 at the moment). For every digit, there are 9 ways the digit can be altered (ignoring the rest of the number), so that would be 531 modified inputs. \$\endgroup\$AlexV– AlexV2019年06月19日 19:02:46 +00:00Commented Jun 19, 2019 at 19:02
I've not had much exposure with good tests. So this focuses on the first codeblock.
- I think
*_TABLE
isn't that useful. InsteadPERMUTATIONS
andINVERSE
looks nicer to me. - Given that
calculate
andvalidate
are almost duplicate functions you should probably define a private helper to handle the common code.
class Verhoeff:
...
@classmethod
def _find_check_digit(cls, digits):
check_digit = 0
for i, digit in digits:
col_idx = cls.PERMUTATIONS[i % 8][int(digit)]
check_digit = cls.MULTIPLICATIONS[check_digit][col_idx]
return check_digit
@classmethod
def calculate(cls, input_: str) -> str:
"""Calculate the check digit using Verhoeff's algorithm"""
check_digit = cls._find_check_digit(enumerate(reversed(input_), 1))
return str(cls.INVERSES[check_digit])
@classmethod
def validate(cls, input_: str) -> bool:
"""Validate the check digit using Verhoeff's algorithm"""
check_digit = cls._find_check_digit(enumerate(reversed(input_)))
return cls.INVERSES[check_digit] == 0
Explore related questions
See similar questions with these tags.
validate
?assert Verhoeff.calculate(input_[:-1]) == input_[-1]\ and Verhoeff.validate(input_)
\$\endgroup\$