I have something like this:
public byte[] EncodeMyObject(MyObject obj)
I've been unit testing like this:
byte[] expectedResults = new byte[3] { 0x01, 0x02, 0xFF };
CollectionAssert.AreEqual(expectedResults, EncodeMyObject(myObject));
EDIT: The two ways I've seen proposed are:
1) Using hardcoded expected values, like the above example.
2) Using a decoder to decode the encoded byte array and comparing the input/output objects.
The problem I see with method 1 is that it is very brittle and requires a lot of hardcoded values.
The problem with method 2 is that testing the encoder depends on the decoder working correctly. If the encoder and decoder are broken equally (in the same place), the tests could produce false positives.
These may very well be the only ways to test this type of method. If that's the case, then fine. I'm asking the question to see if there are any better strategies for this type of testing. I cannot reveal the internals of the particular encoder I am working on; I am asking in general how you would solve this type of problem, and I don't feel the internals are important. Assume that a given input object will always produce the same output byte array.
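For reference, method 2 would look something like this sketch; MyDecoder and BuildTestObject are hypothetical names for the decoder and a test-data builder:
[TestMethod]
public void EncodeMyObject_RoundTripsThroughDecoder()
{
    MyObject original = BuildTestObject(); // hypothetical test-data builder
    byte[] encoded = EncodeMyObject(original);
    MyObject roundTripped = new MyDecoder().DecodeMyObject(encoded);
    Assert.AreEqual(original, roundTripped);
}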
6 Answers
You're in a bit of an obnoxious situation there. If you had a static format you were encoding into, your first method would be the way to go. If it were just your own format, and nobody else had to decode it, then the second method would be the way to go. But you don't really fit into either of those categories.
What I'd do is try to break things down by the level of abstraction.
So I'd start with something at the bit level, which I'd test something like this:
bitWriter = new BitWriter();
bitWriter.writeInt(42, bits = 7);
// 42 is 0101010 in 7 bits; packed MSB-first and zero-padded, that comes out as 0x54
assertEqual( bitWriter.data(), {0x54} )
So the idea is that the bitwriter knows how to write out the most primitive types of fields, like ints.
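To make that concrete, here is a minimal sketch of what such a bit writer might look like in C#. The class name, the MSB-first packing, and the zero-padding of the final byte are all assumptions for illustration, not a prescribed implementation:
using System.Collections.Generic;

// Minimal illustrative bit writer: packs values MSB-first into bytes.
public class BitWriter
{
    private readonly List<byte> _bytes = new List<byte>();
    private int _bitCount; // total number of bits written so far

    public void WriteInt(int value, int bits)
    {
        // Write the most significant of the requested bits first.
        for (int i = bits - 1; i >= 0; i--)
        {
            if (_bitCount % 8 == 0)
                _bytes.Add(0); // start a new, zero-padded byte
            int bit = (value >> i) & 1;
            _bytes[_bytes.Count - 1] |= (byte)(bit << (7 - (_bitCount % 8)));
            _bitCount++;
        }
    }

    // Returns the packed bytes; unused trailing bits stay zero.
    public byte[] Data() => _bytes.ToArray();
}
Under these assumptions, writeInt(42, bits = 7) packs 0101010 followed by a padding zero, which is why the test above expects 0x54.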
More complex types would be implemented using the bit writer and tested something like this:
bitWriter = new BitWriter();
writeDate(bitWriter, new Datetime(2001, 10, 4));
// build the expected bytes from the already-tested primitive writes
bitWriter2 = new BitWriter();
bitWriter2.writeInt(2001, 12);
bitWriter2.writeInt(10, 4);
bitWriter2.writeInt(4, 6);
assertEquals( bitWriter.data(), bitWriter2.data() )
Notice that this avoids any knowledge of how the actual bits get packed. That's tested by the previous test, and for this test we'll pretty much just assume that it works.
Then at the next level of abstraction we'd have
bitWriter = new BitWriter();
encodeObject(bitWriter, myObject);
// again, express the expectation in terms of the lower-level writers
bitWriter2 = new BitWriter();
bitWriter2.writeInt(42, 32);
writeDate(bitWriter2, new Datetime(2001, 10, 4));
writeVarString(bitWriter2, "alphanumeric");
assertEquals( bitWriter.data(), bitWriter2.data() )
So, again, we don't try to include knowledge of how varstrings or dates or numbers are actually encoded. In this test, we are only interested in the encoding produced by encodeObject.
The end result is that if the format for dates changes, you'll have to fix the tests that actually involve dates, but no other code or tests are concerned with how dates are encoded; once you update the date-writing code, all those tests will pass just fine.
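In C# with MSTest, the date-level test from above might look like the following; WriteDate (hung off a hypothetical DateEncoder class here) is the code under test, and BitWriter is the sketch from earlier:
[TestMethod]
public void WriteDate_PacksYearMonthDay()
{
    // Run the code under test.
    var writer = new BitWriter();
    DateEncoder.WriteDate(writer, new DateTime(2001, 10, 4));

    // Build the expected bytes from the already-tested primitive writes,
    // without duplicating any knowledge of how bits are packed.
    var expected = new BitWriter();
    expected.WriteInt(2001, 12);
    expected.WriteInt(10, 4);
    expected.WriteInt(4, 6);

    CollectionAssert.AreEqual(expected.Data(), writer.Data());
}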
I like this. I guess this is what some of the other commenters were saying about breaking it into smaller pieces. It doesn't completely avoid the problem when the spec changes, but it makes it better. – ConditionRacer, Feb 14, 2013 at 14:53
Depends. If the encoding is something completely fixed, where every implementation is supposed to create exactly the same output, it doesn't make sense to check anything other than verifying that example inputs map to exactly the expected outputs. That is the most obvious test, and probably also the easiest to write.
If there is wiggle room with alternative outputs, as in the MPEG standard (e.g. there are certain operators you can apply to the input, but you are free to trade off encoding effort versus output quality or storage space), then it's better to apply the defined decoding strategy to the output and verify that it's the same as the input - or, if the encoding is lossy, that it's reasonably close to the original input. That is harder to program, but protects you against any future improvements that may be made to your encoder.
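As a sketch of what the lossy variant might look like, here a hypothetical MyLossyEncoder/MyLossyDecoder pair round-trips a signal, and the test bounds the per-sample error instead of demanding exact equality:
[TestMethod]
public void Encode_ThenDecode_IsCloseToOriginal()
{
    // A simple deterministic test signal.
    double[] input = new double[256];
    for (int i = 0; i < input.Length; i++)
        input[i] = Math.Sin(2 * Math.PI * i / 32.0);

    byte[] encoded = new MyLossyEncoder().Encode(input);
    double[] decoded = new MyLossyDecoder().Decode(encoded);

    Assert.AreEqual(input.Length, decoded.Length);
    for (int i = 0; i < input.Length; i++)
    {
        // A lossy codec can't round-trip exactly; bound the error instead.
        // The 0.01 tolerance is an arbitrary placeholder.
        Assert.IsTrue(Math.Abs(input[i] - decoded[i]) < 0.01,
            $"Sample {i} deviates too much: {input[i]} vs {decoded[i]}");
    }
}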
Suppose you use the decoder and compare values. What if the encoder and decoder are both broken in the same place? The encoder encodes incorrectly and the decoder decodes incorrectly, but the input/output objects match because the process was done incorrectly twice. – ConditionRacer, Feb 13, 2013 at 18:17
@Justin984 Then use so-called "test vectors": known input/output pairs that you can use precisely to test an encoder and a decoder. – ratchet freak, Feb 13, 2013 at 18:56
@ratchetfreak That puts me back to testing with expected values. Which is fine, that's what I'm currently doing, but it's a bit brittle, so I was looking to see if there are better ways. – ConditionRacer, Feb 13, 2013 at 19:22
Aside from carefully reading the standard and creating a test case for every rule, there is hardly a way to avoid both the encoder and the decoder containing the same bug. For example, assume that "ABC" must be translated to "xyz", but the encoder doesn't know that, and your decoder also wouldn't understand "xyz" if it ever encountered it. The handcrafted test cases don't contain the "ABC" sequence because the programmer wasn't aware of that rule, and a test encoding/decoding random strings would incorrectly pass because both encoder and decoder ignore the problem. – user281377, Feb 13, 2013 at 21:54
To help catch bugs that affect both your encoder and your decoder due to missing knowledge, make an effort to obtain encoder outputs from other vendors, and also test your encoder's output against third-party decoders. There is no way around it. – rwong, Feb 14, 2013 at 5:20
Test that encode(decode(coded_value)) == coded_value and decode(encode(value)) == value. You can give random input to the tests if you want.
It's still possible that both the encoder and decoder are broken in complementary ways, but that seems pretty unlikely unless you have a conceptual misunderstanding of the encoding standard. Doing hardcoded tests of the encoder and decoder (like you're doing already) should guard against that.
If you have access to another implementation of this that's known to work, you can at least use it to get confidence that your implementation is good even if using it in the unit tests would be impossible.
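A sketch of such a randomized round-trip test in C#; CreateRandomMyObject and MyDecoder are hypothetical, and the fixed seed keeps failures reproducible:
[TestMethod]
public void Decode_Encode_RoundTripsRandomObjects()
{
    var rng = new Random(12345); // fixed seed so failures are reproducible
    for (int i = 0; i < 1000; i++)
    {
        MyObject original = CreateRandomMyObject(rng); // hypothetical builder
        byte[] encoded = new MyEncoder().EncodeMyObject(original);
        MyObject roundTripped = new MyDecoder().DecodeMyObject(encoded);
        Assert.AreEqual(original, roundTripped, $"Round trip failed for case {i}");
    }
}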
I agree that a complementary encoder/decoder error is unlikely in general. In my specific case, the code for the encoder/decoder classes is generated by another tool based on rules from a database, so complementary errors do happen occasionally. – ConditionRacer, Feb 13, 2013 at 21:32
How can there be "complementary errors"? That implies that there is an external specification for the encoded form, and hence an external decoder. – kevin cline, Feb 13, 2013 at 22:04
I don't understand your use of the word "external", but there is a specification for how the data is encoded, and also a decoder. A complementary error is where the encoder and decoder both operate in a way that is complementary but deviates from the specification. I have an example in the comments under the original question. – ConditionRacer, Feb 13, 2013 at 22:15
If the encoder was supposed to implement ROT13 but accidentally did ROT14, and the decoder did too, then decode(encode('a')) == 'a' but the encoder is still broken. For things much more complicated than that, it's probably much less likely that this sort of thing would happen, but theoretically it could. – Michael Shaw, Feb 14, 2013 at 0:35
@MichaelShaw Just a piece of trivia: the encoder and decoder for ROT13 are the same; ROT13 is its own inverse. If you implemented ROT14 by mistake, then decode(encode(char)) would not equal char (it would equal char+2). – Tom Marthenal, Feb 17, 2013 at 6:46
Test to the requirements.
If the requirement is only "encode to a byte stream that, when decoded, produces an equivalent object", then just test the encoder by decoding. If you are writing both the encoder and the decoder, then just test them together; they can't have "matching errors". If they work together, the test passes.
If there are other requirements for the data stream, then you will have to test them by examining the encoded data.
If the encoded format is predefined, then either you will have to verify the encoded data against the expected result, as you did, or (better) obtain a reference decoder that can be trusted to do the verification. Use of a reference decoder eliminates the possibility that you have misinterpreted the format specification.
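For example, a test against a trusted reference implementation might look like this; ReferenceDecoder and BuildKnownObject are hypothetical stand-ins:
[TestMethod]
public void EncodeMyObject_IsAcceptedByReferenceDecoder()
{
    MyObject input = BuildKnownObject(); // hypothetical test-data builder
    byte[] encoded = new MyEncoder().EncodeMyObject(input);

    // ReferenceDecoder stands in for a trusted third-party implementation.
    MyObject decoded = ReferenceDecoder.Decode(encoded);
    Assert.AreEqual(input, decoded);
}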
Depending on the testing framework and paradigm you're using, you can still use the Arrange Act Assert pattern for this like you've said.
[TestMethod]
public void EncodeMyObject_ForValidInputs_Encodes()
{
    // Arrange object under test
    MyEncoder encoderUnderTest = new MyEncoder();
    MyObject validObject = new MyObject();
    // arrange object for condition under test

    // Act
    byte[] actual = encoderUnderTest.EncodeMyObject(validObject);

    // Assert
    byte[] expected = new byte[3] { 0x01, 0x02, 0xFF };
    CollectionAssert.AreEqual(expected, actual);
}
You should know the requirements for EncodeMyObject() and can use this pattern to test against each of them, for valid and invalid criteria, arranging each case and hardcoding the expected result; do the same for the decoder.
Since the expected values are hardcoded, these tests will be fragile if the format undergoes a massive change.
You may be able to automate this with something parameter-driven (have a look at Pex), or if you're doing DDD or BDD, have a look at Gherkin/Cucumber.
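For instance, MSTest's DynamicData attribute can drive the same test body from a table of test vectors, so adding a case is one line of data rather than a new test method. The MyObject constructor arguments and byte values below are placeholders:
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MyEncoderTests
{
    public static IEnumerable<object[]> EncodingCases()
    {
        // Each row: an input object and its expected bytes (placeholder values).
        yield return new object[] { new MyObject(1), new byte[] { 0x01, 0x02, 0xFF } };
        yield return new object[] { new MyObject(2), new byte[] { 0x01, 0x03, 0x00 } };
    }

    [DataTestMethod]
    [DynamicData(nameof(EncodingCases), DynamicDataSourceType.Method)]
    public void EncodeMyObject_ProducesExpectedBytes(MyObject input, byte[] expected)
    {
        CollectionAssert.AreEqual(expected, new MyEncoder().EncodeMyObject(input));
    }
}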
You get to decide what is important to you.
Is it important to you that an Object survives the round trip, and the exact wire format isn't really important? Or is the exact wire format an important part of the functionality of your encoder and decoder?
If the former, then just make sure that objects survive the round trip. If the encoder and decoder are both broken in exactly complementary ways, you don't really care.
If the latter, then you need to be testing that the wire format is as you expect for the given inputs. This means either testing the format directly, or else using a reference implementation. But having tested the basics, you may get value from additional round-trip tests, which should be easier to write in volume.
How do you go from myObject to { 0x01, 0x02, 0xFF }? Can that algorithm be broken down and tested? The reason I ask is that presently, it looks like you have a test that proves one magic thing produces another magic thing. Your only confidence is that the one input produces the one output. If you can break down the algorithm, you can gain further confidence in the algorithm and be less reliant on magical inputs and outputs.