Verhoeff check digit algorithm

Question 1

A recent question on credit card validation here on Code Review, led me down a dark rabbit hole of check digit algorithms. I took a stop at the Verhoeff algorithm and tried to implement it myself.

That lead to the following piece of code:

class Verhoeff:
 """Calculate and verify check digits using Verhoeff's algorithm"""
 MULTIPLICATION_TABLE = (
 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
 (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
 (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
 (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
 (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
 (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
 (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
 (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
 (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
 (9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
 )
 INVERSE_TABLE = (0, 4, 3, 2, 1, 5, 6, 7, 8, 9)
 PERMUTATION_TABLE = (
 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
 (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
 (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
 (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
 (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
 (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
 (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
 (7, 0, 4, 6, 9, 1, 3, 2, 5, 8)
 )
 @classmethod
 def calculate(cls, input_: str) -> str:
 """Calculate the check digit using Verhoeff's algorithm"""
 check_digit = 0
 for i, digit in enumerate(reversed(input_), 1):
 col_idx = cls.PERMUTATION_TABLE[i % 8][int(digit)]
 check_digit = cls.MULTIPLICATION_TABLE[check_digit][col_idx]
 return str(cls.INVERSE_TABLE[check_digit])
 @classmethod
 def validate(cls, input_: str) -> bool:
 """Validate the check digit using Verhoeff's algorithm"""
 check_digit = 0
 for i, digit in enumerate(reversed(input_)):
 col_idx = cls.PERMUTATION_TABLE[i % 8][int(digit)]
 check_digit = cls.MULTIPLICATION_TABLE[check_digit][col_idx]
 return cls.INVERSE_TABLE[check_digit] == 0

I chose to implement it as a class with two class methods because I plan to include other algorithms as well and structuring the code this way seemed reasonable to me.

I'm particularly interested in your feedback on the following aspects:

What do you think about the API? calculate(input_: str) -> str and validate(input_: str) -> bool seem reasonable and symmetric, but I could also imagine using something like calculate(input_: Sequence[int]) -> int/validate(input_: Sequence[int], int) -> bool.
There seems to be reasonable amount of code duplication between the two functions calculate/validate, but I couldn't really wrap my head around how to define one in respect to the other.

In addition to the class above, I also decided to take a shot at some unit tests for the algorithm using pytest.

import string
import itertools
import pytest
from check_sums import Verhoeff
# modification and utility functions to test the check digit algorihm robustness
DIGIT_REPLACEMENTS = {
 digit: string.digits.replace(digit, "") for digit in string.digits
}
def single_digit_modifications(input_):
 """Generate all single digit modifications of a numerical input sequence"""
 for i, digit in enumerate(input_):
 for replacement in DIGIT_REPLACEMENTS[digit]:
 yield input_[:i] + replacement + input_[i+1:]
def transposition_modifications(input_):
 """Pairwise transpose of all neighboring digits
 The algorithm tries to take care that transpositions really change the
 input. This is done to make sure that those permutations actually alter the
 input."""
 for i, digit in enumerate(input_[:-1]):
 if digit != input_[i+1]:
 yield input_[:i] + input_[i+1] + digit + input_[i+2:]
def flatten(iterable_of_iterables):
 """Flatten one level of nesting
 Borrowed from
 https://docs.python.org/3/library/itertools.html#itertools-recipes
 """
 return itertools.chain.from_iterable(iterable_of_iterables)
# Verhoeff algoritm related tests
# Test data taken from 
# https://en.wikibooks.org/wiki/Algorithm_Implementation/Checksums/Verhoeff_Algorithm
VALID_VERHOEF_INPUTS = [
 "2363", "758722", "123451", "1428570", "1234567890120",
 "84736430954837284567892"
]
@pytest.mark.parametrize("input_", VALID_VERHOEF_INPUTS)
def test_verhoeff_calculate_validate(input_):
 """Test Verhoeff.calculate/Verhoeff.validate with known valid inputs"""
 assert Verhoeff.calculate(input_[:-1]) == input_[-1]\
 and Verhoeff.validate(input_)
@pytest.mark.parametrize(
 "modified_input",
 flatten(single_digit_modifications(i) for i in VALID_VERHOEF_INPUTS)
)
def test_verhoeff_single_digit_modifications(modified_input):
 """Test if single digit modifications can be detected"""
 assert not Verhoeff.validate(modified_input)
@pytest.mark.parametrize(
 "modified_input",
 flatten(transposition_modifications(i) for i in VALID_VERHOEF_INPUTS)
)
def test_verhoeff_transposition_modifications(modified_input):
 """Test if transposition modifications can be detected"""
 assert not Verhoeff.validate(modified_input)

The tests cover known precomputed input and check digit values, as well as some of the basic error classes (single-digit errors, transpositions) the checksum was designed to detect. I decided to actually generate all the modified inputs in the test fixture so that it would be easier to see which of the modified inputs cause a failure of the algorithm. So far I have found none.

Note: There is a thematically related question of mine on optimizing the Luhn check digit algorithm.

Question 2

Doesn't this heavily hint at simplification of validate? assert Verhoeff.calculate(input_[:-1]) == input_[-1]\ and Verhoeff.validate(input_)

Question 3

Your tests look ok. I have three concerns:

~~(削除) If I read correctly, your "single digit modifications" test cycle is going to have over 1000000000000000000000 cycles. That's... not practical. pick a compromise. (削除ここまで)~~
The positive tests are checking calculate and validate. I see no reason not to check both in your negative tests too.
You're only checking syntactically valid input. This ties into your question about what the type signature should be.
- You have a few options for your type signature. Without going over the compromises in detail, I would suggest that the first line in calculate and validate should be an isdigit() check, and raise an exception if it fails.
- Whatever you do for types, your tests should check that it's handling edge cases as intended, whatever it is you decide you intend.
  - empty strings
  - a single digit (could break validate)
  - all zeros
  - whitespace in different configurations
  - illegal characters

You don't have to address all these points, depending on the use-cases of this project, and whatever else is going on in your life, it may be fine to call it good enough as-is.

Question 4

Thanks for the feedback! All the tests, including the same test set for some other algorithms I've cooked up, run in under 4 seconds, so thats okay at the moment. Maybe I should check that the modification function actually does what it's supposed to do ;-) Regarding isdigit(): as per the docs it's not completely the right fit for the task since it also permits "digits" that cannot be handled in base 10 and some other funny stuff. I have some plans for this, but this will go in another question.

Question 5

Assuming you don't want to worry about Arabic numerals, a regex would serve for input validation. :)

Question 6

I don't see any "problems" with the single-modification function, although it is a little convoluted. My actual guess is that pytest isn't completing the iteration for some reason.

Question 7

I just checked the output of the modificication functions and it seems to be what I expect. The function modifies the string in one place and one place only. There is no combinatorics involved, so the number of calls scales linearly with the number of digits used in the test (should be 59 at the moment). For every digit, there are 9 ways the digit can be altered (ignoring the rest of the number), so that would be 531 modified inputs.

Question 8

I've not had much exposure with good tests. So this focuses on the first codeblock.

I think *_TABLE isn't that useful. Instead PERMUTATIONS and INVERSE looks nicer to me.
Given that calculate and validate are almost duplicate functions you should probably define a private helper to handle the common code.

class Verhoeff:
 ...
 @classmethod
 def _find_check_digit(cls, digits):
 check_digit = 0
 for i, digit in digits:
 col_idx = cls.PERMUTATIONS[i % 8][int(digit)]
 check_digit = cls.MULTIPLICATIONS[check_digit][col_idx]
 return check_digit
 @classmethod
 def calculate(cls, input_: str) -> str:
 """Calculate the check digit using Verhoeff's algorithm"""
 check_digit = cls._find_check_digit(enumerate(reversed(input_), 1))
 return str(cls.INVERSES[check_digit])
 @classmethod
 def validate(cls, input_: str) -> bool:
 """Validate the check digit using Verhoeff's algorithm"""
 check_digit = cls._find_check_digit(enumerate(reversed(input_)))
 return cls.INVERSES[check_digit] == 0

ShapeOfMatter ShapeOfMatter 4,4377 silver badges25 bronze badges · Accepted Answer · 2019-06-19 18:08:28Z

Your tests look ok. I have three concerns:

~~(削除) If I read correctly, your "single digit modifications" test cycle is going to have over 1000000000000000000000 cycles. That's... not practical. pick a compromise. (削除ここまで)~~
The positive tests are checking calculate and validate. I see no reason not to check both in your negative tests too.
You're only checking syntactically valid input. This ties into your question about what the type signature should be.
- You have a few options for your type signature. Without going over the compromises in detail, I would suggest that the first line in calculate and validate should be an isdigit() check, and raise an exception if it fails.
- Whatever you do for types, your tests should check that it's handling edge cases as intended, whatever it is you decide you intend.
  - empty strings
  - a single digit (could break validate)
  - all zeros
  - whitespace in different configurations
  - illegal characters

You don't have to address all these points, depending on the use-cases of this project, and whatever else is going on in your life, it may be fine to call it good enough as-is.

Thanks for the feedback! All the tests, including the same test set for some other algorithms I've cooked up, run in under 4 seconds, so thats okay at the moment. Maybe I should check that the modification function actually does what it's supposed to do ;-) Regarding isdigit(): as per the docs it's not completely the right fit for the task since it also permits "digits" that cannot be handled in base 10 and some other funny stuff. I have some plans for this, but this will go in another question.
Assuming you don't want to worry about Arabic numerals, a regex would serve for input validation. :)
I don't see any "problems" with the single-modification function, although it is a little convoluted. My actual guess is that pytest isn't completing the iteration for some reason.
I just checked the output of the modificication functions and it seems to be what I expect. The function modifies the string in one place and one place only. There is no combinatorics involved, so the number of calls scales linearly with the number of digits used in the test (should be 59 at the moment). For every digit, there are 9 ways the digit can be altered (ignoring the rest of the number), so that would be 531 modified inputs.

Stack Exchange Network

Verhoeff check digit algorithm

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Verhoeff check digit algorithm

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions