What are the best ways to unit test code that outputs random sequences satisfying specific conditions, such as Markov chains?
Let's be specific. There are two natural things to test:
- That the initial state returned in each chain follows the user-specified probabilities, i.e. chain[0] == i with probability p_initial[i].
- That the transitions between states are correct, that is, if chain[k] == i, then the probability of chain[k+1] == j is given by p_transition[i, j].
Now, I can think of two ways to do this.
- Test that the code does what I want it to do. Generate a large number of sequences (say, 10000) and test that the frequencies for the initial state approximate p_initial. Next, test that the frequencies of transitions to state j from state i follow p_transition. (A sketch of such a statistical test is shown after this list.)
The difficulty with this approach is that any finite sample will exhibit fluctuations from the prescribed probabilities. I thus have to set an acceptable level for these fluctuations. But this means that a) the test might miss actual errors in the code if they only change the frequencies a little bit, and b) after a refactoring I might get false failures due to a particular chain falling outside the accepted fluctuation level by pure chance. (I'm assuming I'm using a fixed seed for the random number generator so that at least without refactoring the tests should be reproducible.) Worse, the more I relax my acceptance criteria to mitigate problem b), the more I risk falling for a).
Moreover, I'm only testing a small subset of the requirements for the model (I also want the transitions to be independent, for instance). And compared to how simple the implementation for this code is (given a random number generator; see below), it doesn't seem worth it to add more bloat in the testing code.
- The alternative is to assume that the random number generator was tested by its creators and works fine, and just test that my code calls it in the right way. So I can mock the RNG and check that my code calls it with the right arguments.
The problem with this approach is that it makes the test depend strongly on the implementation. For instance, I could choose the initial state by using something like numpy.random.choice in Python, or I could simply generate a random number between 0 and 1 (numpy.random.random) and implement my own logic for choosing the initial state based on that. Even within each of these choices, I could choose to order the states in any way. The tests must know what the code is doing, and that makes the tests fragile to refactoring.
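For concreteness, here is a rough sketch of what the first, statistical approach might look like for the PlainMarkov class defined further below. The parameter values, the fixed seed, and the 0.02 tolerance are arbitrary choices of mine; the tolerance is exactly the acceptance level whose choice is the problem described above.

import numpy as np

def test_initial_state_frequencies():
    # Statistical check: draw many chains with a fixed seed and compare the
    # empirical initial-state frequencies with p_initial.
    p_initial = [0.2, 0.5, 0.3]
    p_transition = [[1.0, 0.0, 0.0],
                    [0.3, 0.4, 0.3],
                    [0.5, 0.2, 0.3]]
    markov = PlainMarkov(p_initial, p_transition,
                         rng=np.random.default_rng(12345))  # fixed seed for reproducibility
    n_chains = 10000
    first_states = []
    for _ in range(n_chains):
        chain = markov.run()
        # An empty chain means the initial draw was the terminal state 0.
        first_states.append(chain[0] if chain else 0)
    freqs = np.bincount(first_states, minlength=len(p_initial)) / n_chains
    # The tolerance is the arbitrary acceptance level discussed above.
    assert np.allclose(freqs, p_initial, atol=0.02)

With 10000 chains the typical fluctuation in each frequency is of order sqrt(p(1-p)/10000), roughly 0.005, so a tolerance of 0.02 is about a four-sigma acceptance level; that tolerance is the knob that trades off problems a) and b) above.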
So that's my problem: it seems wrong to couple the test to the implementation so strongly. But it also seems wrong to perform a lot of tests of my function that are in effect tests of the RNG (and not particularly stringent tests either).
Is there a different, better way to do this that allows me to test the code that I wrote while relying on the RNG writers to have checked their own code?
For definiteness, here is some (Python) code implementing a simple Markov chain generator:
import numpy as np

class PlainMarkov(object):
    def __init__(self, p_initial, p_transition, rng=None):
        """ p_initial = sequence of initial-state probabilities
            p_transition = sequence of sequences for transition probabilities
            rng = random number generator
        """
        self.p_initial = np.asarray(p_initial)
        self.p_transition = np.asarray(p_transition)
        self.rng = rng if rng is not None else np.random.random.__self__

    def run(self):
        """ Draw one chain from the Markov distribution. """
        n_states = len(self.p_initial)
        state = self.rng.choice(n_states, p=self.p_initial)
        chain = []
        while state != 0:
            chain.append(state)
            state = self.rng.choice(n_states, p=self.p_transition[state])
        return chain
4 Answers
The alternative is to assume that the random number generator was tested by its creators and works fine, and just test that my code calls it in the right way. So I can mock the RNG and check that my code calls it with the right arguments.
This is, of course, the correct answer.
Unless you are actually in the business of designing your own RNG, you want to treat randomness as something to be provided to your system as an input, and thus your tests can choose what inputs to provide on the "random" channel.
Even within each of these choices, I could choose to order the states in any way. The tests must know what the code is doing, and that makes the tests fragile to refactoring.
There are two responses to this:
First: if the observable behavior of the system is altered by a change to the implementation, that change is not a refactoring.
Second: you don't have to over-fit your tests. If you can describe the constraints that are not satisfied by incorrect outputs, then write your tests to check the output against those constraints, rather than worrying that the output matches some previously calculated value.
This is property based testing vs example based testing in a nutshell. I recommend Scott Wlaschin's An Introduction to Property Based Testing if you aren't already familiar with the subject.
This discussion of the Fischer fairy chess problem describes some of the concerns, albeit on a simpler sort of problem with a more limited range of outputs.
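As a hedged sketch of what a property-style test might look like for the PlainMarkov class from the question (this uses the hypothesis library, which is one common choice for property-based testing in Python; the strategies, bounds, and constraints below are my own assumptions, not an exhaustive specification):

import numpy as np
from hypothesis import given, strategies as st

# Strategy for weight vectors; normalizing them yields valid probability vectors.
weights = st.lists(st.floats(min_value=0.05, max_value=1.0), min_size=3, max_size=3)

@given(weights, st.lists(weights, min_size=3, max_size=3),
       st.integers(min_value=0, max_value=2**32 - 1))
def test_chain_satisfies_constraints(w_init, w_trans, seed):
    p_initial = np.asarray(w_init) / np.sum(w_init)
    p_transition = np.asarray(w_trans) / np.sum(w_trans, axis=1, keepdims=True)
    chain = PlainMarkov(p_initial, p_transition,
                        rng=np.random.default_rng(seed)).run()
    # Constraints any correct output must satisfy, regardless of how the
    # implementation consumes the RNG internally:
    assert all(0 <= s < len(p_initial) for s in chain)      # legal state indices
    if chain:
        assert p_initial[chain[0]] > 0                       # allowed initial state
    assert all(p_transition[a, b] > 0
               for a, b in zip(chain, chain[1:]))            # allowed transitions

The point is that the assertions describe constraints on the output rather than a specific expected chain, so the test survives refactorings that change how the implementation consumes the RNG.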
- "that change is not a refactoring." - really? So changing random.nextInt(0,2) == 0 to random.nextBoolean() is not a refactoring? – Stack Exchange Broke The Law, Commented Apr 15, 2019 at 4:49
- Refactoring, by definition [refactoring.com/] [en.wikipedia.org/wiki/Code_refactoring], does not alter the observable behaviour of the system. – KarlM, Commented May 15, 2019 at 21:59
A unit test should be what the name implies: a test of a specific unit.
That generally means you mock away all dependencies. So if you made a fake random number generator that returned a fixed sequence of numbers (1, 7, 4, 3, 9, 2, for example, and always that sequence), then you could test that you generate a very specific Markov chain based on that. And your Markov chain generator function should be one that gives a very specific chain based on a sequence of random numbers. Even if you refactor your function, you would expect that the same random number sequence as input gives the same results.
You are not going to show with a unit test that your function generates some specific probability distribution. You are just trying to show that, if you know exactly which numbers come out of your random number generator, your algorithm gives the expected output. If need be, you can create a second test based on a different set of generated random numbers and show that one works too. If both those tests pass, you don't know your function will always work; you just know it is likely enough to work that it's worth the effort of doing more complex tests, or seeing what happens in production.
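A minimal sketch of this idea applied to the PlainMarkov class from the question. The FakeRNG class, the scripted values, and the expected chain are illustrative assumptions; the expected chain is worked out by hand from the script.

class FakeRNG:
    """A stand-in RNG that plays back a scripted sequence of 'draws'.
    Each scripted value is taken to be the state index returned by
    choice(); this mirrors how PlainMarkov calls its rng."""
    def __init__(self, scripted_states):
        self._states = iter(scripted_states)

    def choice(self, n_states, p=None):
        # Ignore the probabilities and return the next scripted state.
        return next(self._states)

def test_run_with_scripted_rng():
    p_initial = [0.1, 0.5, 0.4]
    p_transition = [[1.0, 0.0, 0.0],
                    [0.2, 0.3, 0.5],
                    [0.4, 0.4, 0.2]]
    # Scripted draws: initial state 1, then 2, then 1, then 0 (terminates).
    markov = PlainMarkov(p_initial, p_transition, rng=FakeRNG([1, 2, 1, 0]))
    assert markov.run() == [1, 2, 1]

Note that this fake bakes in the assumption that the implementation draws states via choice(), which is exactly the coupling debated in the comments below.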
Of course you can also do some kind of automated integration test/load test/statistical analysis on a large number of runs, but that's not a unit test. And with those tests you have to deal with statistically unlikely scenarios (like your random number generator producing 0 twenty times in a row).
Think of your function as a blackjack scoring function. The card that's being drawn is random, but if I have an 8 and a Queen, my score should always be 18 (and I'm alive, etc.). You don't want to test probabilistic behavior; you want to test very deterministic results after your random/probability function has actually been executed.
- But there are several different but equally valid ways in which an implementation could take a sequence of numbers from an RNG and turn them into a sequence of states from a Markov chain. So even if I fix the RNG output, the output from the Markov chain generator is implementation-dependent. What you suggest, I think, is essentially to make the specific way in which this transformation is done a part of the interface. But that doesn't seem much better than making the test depend on a hidden implementation detail. – Legendre17, Commented Aug 6, 2018 at 18:39
- I would argue that making an explicit transformation between random numbers and sequences is much better exactly because it is testable and you could show other people the algorithm and talk about the correctness. As long as you used the same algorithm you could refactor and keep the tests. It's exactly why hidden implementation details are worse. You are testing a specific implementation of your interface. Of course if you had a different implementation you would have to write new tests. Anyone using your interface would be tested with a mock implementation with a (static) mock sequence. – Batavia, Commented Aug 6, 2018 at 18:53
- That's a fair point, but this approach makes the interface cater to the testing framework more than to the user. As a user, I wouldn't care how the transformation from random numbers to state chains is performed. If I had two implementations that differed only in this transformation, I would think of them as having the same interface because nothing in my code would need to change to account for this. Is the best way really to make an implementation detail public just so I can write the tests? – Legendre17, Commented Aug 6, 2018 at 19:39
- @Legendre17 deterministic unit tests do not lock you to a particular implementation. They make it very clear when your implementation has functionally changed. You want to play with functionally different implementations? Fine, do it. Just write each kind its own unit tests. Switch tests when you switch implementations. You want to write statistical tests that don't care about implementation? Fine, do it. Just don't call these slow things unit tests. Run slow tests separately. Nothing Batavia has said dictates any interface details. Just that you use tests wisely. – candied_orange, Commented Aug 6, 2018 at 22:57
- @candied_orange I think I'm not explaining this right. The only functional requirement for writing a Markov generator is that it generates chains drawn from the appropriate distribution. Given a PRNG, such a class is trivial to implement, but that can be done in several ways that use the PRNG differently. These aren't functionally different implementations; they differ only in their internals. So if I write a test for each of them, I'm writing tests that depend on details that are not part of the interface of the class. – Legendre17, Commented Aug 6, 2018 at 23:31
Perhaps I'm missing something, but are you aware that you can use random.seed()?
Use that to set some specific seed. By hand, validate the answer. And store/validate that you get those states after a run with that seeded input value.
Then maybe try another seed, run, and validate those answers (by hand or some other mechanism). And validate that your Markov chain evaluator gets those answers too.
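A hedged sketch of this seeded approach for the PlainMarkov class from the question. It passes a seeded NumPy RandomState rather than calling random.seed(), since that is what PlainMarkov accepts; the seed and parameters are placeholders, and a real "golden" test would pin the hand-validated chain instead of just comparing two runs.

import numpy as np

def test_run_is_reproducible_for_a_fixed_seed():
    p_initial = [0.2, 0.5, 0.3]
    p_transition = [[1.0, 0.0, 0.0],
                    [0.3, 0.4, 0.3],
                    [0.5, 0.2, 0.3]]
    # Two independently seeded generators produce the same draw sequence,
    # so the resulting chains must be identical.
    chain_1 = PlainMarkov(p_initial, p_transition, rng=np.random.RandomState(42)).run()
    chain_2 = PlainMarkov(p_initial, p_transition, rng=np.random.RandomState(42)).run()
    assert chain_1 == chain_2
    # A golden-value test would instead assert chain_1 == [...], where the
    # expected states have been validated by hand for this seed.

As discussed in the comments below, such a pinned chain is tied to the current implementation's use of the generator, so it needs updating whenever that changes.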
Sorry if I missed something, but I hope that helped...
- Yes, I fix the random generator seed to keep the tests deterministic. The problem is that there are different ways in which the code can be implemented that are all correct but can lead to different results, even though the PRNG sequence is fixed. So if I check that I get a particular chain for a particular seed, this can fail when I refactor the code just because an implementation detail changed. I was hoping there's a better way. – Legendre17, Commented Aug 6, 2018 at 20:50
- I'm afraid I don't follow how that's possible. A Markov chain becomes completely deterministic and you can compute the 'right answer' given a hard-wiring of the sequence of 'dice rolls'. A Markov chain is a simple state machine, where the rules for transitioning between states are probabilistic. If you fix the rolls on the dice, you fix the sequence of states. If you actually change your set of states (is that what you mean by refactor?) - then of course you will get a different answer - you have a different Markov chain! @Legendre17 – Lewis Pringle, Commented Aug 6, 2018 at 21:37
- What I'm saying is that even if the dice rolls are fixed, we still have freedom in choosing what each roll means. If we roll 1, does that map to state 1 or state 5? More to the point, suppose I decide to write (rng.random() > 1 - np.cumsum(p)).nonzero()[0][0] instead of rng.choice(len(p), p=p) in the run method. Then I'll get the exact same statistics, but the outcome for each choice of seed will be different. – Legendre17, Commented Aug 7, 2018 at 1:52
- @Legendre17. That choice is enshrined in the test results. You cannot renumber your states or change your random number generator (which the above effectively does). It's worth remembering the point of unit tests or regression tests. It's a tool! The world doesn't end when they fail. Your attention is caught and you are warned to look more closely at your code change. What you describe is not a real problem. – Lewis Pringle, Commented Aug 7, 2018 at 2:57
I think in this case, the most you can test at the interface level is whether the resulting chain is a valid Markov chain, i.e. whether the initial element is one allowed by p_initial, and whether each subsequent element is a valid transition from the element before it. And for that, the test doesn't need to know about the implementation at all.
I think having a further test of whether the results are statistically correct may be a good idea, but I wouldn't call that test a "unit test".
You also can have unit tests of specific implementations, which rely on knowledge of those implementations. Whether that's a useful thing to have depends on the situation. In this case, I wouldn't consider it to be, but YMMV.
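A minimal sketch of such an interface-level check for the PlainMarkov class from the question; the helper name and parameter values are my own, and the check deliberately knows nothing about how run() uses the RNG.

import numpy as np

def assert_valid_chain(chain, p_initial, p_transition):
    """Interface-level check: the chain starts in a state allowed by
    p_initial and only uses transitions allowed by p_transition."""
    p_initial = np.asarray(p_initial)
    p_transition = np.asarray(p_transition)
    if chain:                         # an empty chain means the initial draw was the terminal state 0
        assert p_initial[chain[0]] > 0
    for a, b in zip(chain, chain[1:]):
        assert p_transition[a, b] > 0

def test_run_produces_valid_chains():
    p_initial = [0.0, 0.6, 0.4]
    p_transition = [[1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.0],
                    [0.2, 0.3, 0.5]]
    markov = PlainMarkov(p_initial, p_transition, rng=np.random.default_rng(1))
    for _ in range(50):
        assert_valid_chain(markov.run(), p_initial, p_transition)

Such a check is cheap to run over many generated chains, while the statistical test of the actual probabilities is better kept as a separate, slower test, as noted above.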
- You could mock rng.choice() to return hardcoded values and check that run() returns the expected chain.
- But that presupposes the implementation uses rng.choice() at all. Besides, the output from rng.choice() depends on the arguments with which it's called, so hardcoding the output tests only part of the functionality.