I have something like this:
public byte[] EncodeMyObject(MyObject obj)
I've been unit testing like this:
byte[] expectedResults = new byte[3] { 0x01, 0x02, 0xFF };
CollectionAssert.AreEqual(expectedResults, EncodeMyObject(myObject));
EDIT: The two ways I've seen proposed are:
1) Using hardcoded expected values, like the above example.
2) Using a decoder to decode the encoded byte array and comparing the input/output objects.
The problem I see with method 1 is that it is very brittle and requires a lot of hardcoded values.
The problem with method 2 is that testing the encoder depends on the decoder working correctly. If the encoder and decoder are broken equally (in the same place), the tests could produce false positives.
These may very well be the only ways to test this type of method. If that's the case, then fine. I'm asking the question to see if there are any better strategies for this type of testing. I cannot reveal the internals of the particular encoder I am working on; I am asking in general how you would solve this type of problem, and I don't feel the internals are important. Assume that a given input object will always produce the same output byte array.
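For reference, method 2 would look something like this sketch; MyDecoder and BuildTestObject are hypothetical names for the decoder and a test-data builder:
[TestMethod]
public void EncodeMyObject_RoundTripsThroughDecoder()
{
    MyObject original = BuildTestObject(); // hypothetical test-data builder
    byte[] encoded = EncodeMyObject(original);
    MyObject roundTripped = new MyDecoder().DecodeMyObject(encoded);
    Assert.AreEqual(original, roundTripped);
}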
6 Answers
You're in a bit of an obnoxious situation there. If you had a static format you were encoding into, your first method would be the way to go. If it were just your own format, and nobody else had to decode it, then the second method would be the way to go. But you don't really fit into either of those categories.
What I'd do is try to break things down by the level of abstraction.
So I'd start with something at the bit level, which I'd test something like this:
bitWriter = new BitWriter();
bitWriter.writeInt(42, bits = 7);
// 42 is 0101010 in 7 bits; packed MSB-first and zero-padded, that comes out as 0x54
assertEqual( bitWriter.data(), {0x54} )
So the idea is that the bitwriter knows how to write out the most primitive types of fields, like ints.
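To make that concrete, here is a minimal sketch of what such a bit writer might look like in C#. The class name, the MSB-first packing, and the zero-padding of the final byte are all assumptions for illustration, not a prescribed implementation:
using System.Collections.Generic;

// Minimal illustrative bit writer: packs values MSB-first into bytes.
public class BitWriter
{
    private readonly List<byte> _bytes = new List<byte>();
    private int _bitCount; // total number of bits written so far

    public void WriteInt(int value, int bits)
    {
        // Write the most significant of the requested bits first.
        for (int i = bits - 1; i >= 0; i--)
        {
            if (_bitCount % 8 == 0)
                _bytes.Add(0); // start a new, zero-padded byte
            int bit = (value >> i) & 1;
            _bytes[_bytes.Count - 1] |= (byte)(bit << (7 - (_bitCount % 8)));
            _bitCount++;
        }
    }

    // Returns the packed bytes; unused trailing bits stay zero.
    public byte[] Data() => _bytes.ToArray();
}
Under these assumptions, writeInt(42, bits = 7) packs 0101010 followed by a padding zero, which is why the test above expects 0x54.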
More complex types would be implemented using the bit writer and tested something like this:
bitWriter = new BitWriter();
writeDate(bitWriter, new Datetime(2001, 10, 4));
// build the expected bytes from the already-tested primitive writes
bitWriter2 = new BitWriter();
bitWriter2.writeInt(2001, 12);
bitWriter2.writeInt(10, 4);
bitWriter2.writeInt(4, 6);
assertEquals( bitWriter.data(), bitWriter2.data() )
Notice that this avoids any knowledge of how the actual bits get packed. That's tested by the previous test, and for this test we'll pretty much just assume that it works.
Then at the next level of abstraction we'd have
bitWriter = new BitWriter();
encodeObject(bitWriter, myObject);
// again, express the expectation in terms of the lower-level writers
bitWriter2 = new BitWriter();
bitWriter2.writeInt(42, 32);
writeDate(bitWriter2, new Datetime(2001, 10, 4));
writeVarString(bitWriter2, "alphanumeric");
assertEquals( bitWriter.data(), bitWriter2.data() )
So, again, we don't try to include knowledge of how varstrings or dates or numbers are actually encoded. In this test, we are only interested in the encoding produced by encodeObject.
The end result is that if the format for dates changes, you'll have to fix the tests that actually involve dates, but no other code or tests are concerned with how dates are encoded; once you update the date-writing code, all those tests will pass just fine.
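In C# with MSTest, the date-level test from above might look like the following; WriteDate (hung off a hypothetical DateEncoder class here) is the code under test, and BitWriter is the sketch from earlier:
[TestMethod]
public void WriteDate_PacksYearMonthDay()
{
    // Run the code under test.
    var writer = new BitWriter();
    DateEncoder.WriteDate(writer, new DateTime(2001, 10, 4));

    // Build the expected bytes from the already-tested primitive writes,
    // without duplicating any knowledge of how bits are packed.
    var expected = new BitWriter();
    expected.WriteInt(2001, 12);
    expected.WriteInt(10, 4);
    expected.WriteInt(4, 6);

    CollectionAssert.AreEqual(expected.Data(), writer.Data());
}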
I like this. I guess this is what some of the other commenters were saying about breaking it into smaller pieces. It doesn't completely avoid the problem when the spec changes, but it makes it better. – ConditionRacer, Feb 14, 2013 at 14:53
Depends. If the encoding is something completely fixed, where every implementation is supposed to create exactly the same output, it doesn't make sense to check anything other than verifying that example inputs map to exactly the expected outputs. That is the most obvious test, and probably also the easiest to write.
If there is wiggle room with alternative outputs, as in the MPEG standard (e.g. there are certain operators you can apply to the input, but you are free to trade off encoding effort versus output quality or storage space), then it's better to apply the defined decoding strategy to the output and verify that it's the same as the input - or, if the encoding is lossy, that it's reasonably close to the original input. That is harder to program, but protects you against any future improvements that may be made to your encoder.
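As a sketch of what the lossy variant might look like, here a hypothetical MyLossyEncoder/MyLossyDecoder pair round-trips a signal, and the test bounds the per-sample error instead of demanding exact equality:
[TestMethod]
public void Encode_ThenDecode_IsCloseToOriginal()
{
    // A simple deterministic test signal.
    double[] input = new double[256];
    for (int i = 0; i < input.Length; i++)
        input[i] = Math.Sin(2 * Math.PI * i / 32.0);

    byte[] encoded = new MyLossyEncoder().Encode(input);
    double[] decoded = new MyLossyDecoder().Decode(encoded);

    Assert.AreEqual(input.Length, decoded.Length);
    for (int i = 0; i < input.Length; i++)
    {
        // A lossy codec can't round-trip exactly; bound the error instead.
        // The 0.01 tolerance is an arbitrary placeholder.
        Assert.IsTrue(Math.Abs(input[i] - decoded[i]) < 0.01,
            $"Sample {i} deviates too much: {input[i]} vs {decoded[i]}");
    }
}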
Suppose you use the decoder and compare values. What if the encoder and decoder are both broken in the same place? The encoder encodes incorrectly and the decoder decodes incorrectly, but the input/output objects match because the process was done incorrectly twice. – ConditionRacer, Feb 13, 2013 at 18:17
@Justin984 Then use so-called "test vectors": known input/output pairs that you can use precisely to test an encoder and a decoder. – ratchet freak, Feb 13, 2013 at 18:56
@ratchetfreak That puts me back to testing with expected values. Which is fine, that's what I'm currently doing, but it's a bit brittle, so I was looking to see if there are better ways. – ConditionRacer, Feb 13, 2013 at 19:22
Aside from carefully reading the standard and creating a test case for every rule, there is hardly a way to avoid both the encoder and the decoder containing the same bug. For example, assume that "ABC" must be translated to "xyz", but the encoder doesn't know that, and your decoder also wouldn't understand "xyz" if it ever encountered it. The handcrafted test cases don't contain the "ABC" sequence because the programmer wasn't aware of that rule, and a test encoding/decoding random strings would incorrectly pass because both encoder and decoder ignore the problem. – user281377, Feb 13, 2013 at 21:54
To help catch bugs that affect both your encoder and your decoder due to missing knowledge, make an effort to obtain encoder outputs from other vendors, and also test your encoder's output against third-party decoders. There is no way around it. – rwong, Feb 14, 2013 at 5:20
Test that encode(decode(coded_value)) == coded_value and decode(encode(value)) == value. You can give random input to the tests if you want.
It's still possible that both the encoder and decoder are broken in complementary ways, but that seems pretty unlikely unless you have a conceptual misunderstanding of the encoding standard. Doing hardcoded tests of the encoder and decoder (like you're doing already) should guard against that.
If you have access to another implementation of this that's known to work, you can at least use it to get confidence that your implementation is good even if using it in the unit tests would be impossible.
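A sketch of such a randomized round-trip test in C#; CreateRandomMyObject and MyDecoder are hypothetical, and the fixed seed keeps failures reproducible:
[TestMethod]
public void Decode_Encode_RoundTripsRandomObjects()
{
    var rng = new Random(12345); // fixed seed so failures are reproducible
    for (int i = 0; i < 1000; i++)
    {
        MyObject original = CreateRandomMyObject(rng); // hypothetical builder
        byte[] encoded = new MyEncoder().EncodeMyObject(original);
        MyObject roundTripped = new MyDecoder().DecodeMyObject(encoded);
        Assert.AreEqual(original, roundTripped, $"Round trip failed for case {i}");
    }
}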
I agree that a complementary encoder/decoder error is unlikely in general. In my specific case, the code for the encoder/decoder classes is generated by another tool based on rules from a database, so complementary errors do happen occasionally. – ConditionRacer, Feb 13, 2013 at 21:32
How can there be "complementary errors"? That implies that there is an external specification for the encoded form, and hence an external decoder. – kevin cline, Feb 13, 2013 at 22:04
I don't understand your use of the word "external", but there is a specification for how the data is encoded, and also a decoder. A complementary error is where the encoder and decoder both operate in a way that is complementary but deviates from the specification. I have an example in the comments under the original question. – ConditionRacer, Feb 13, 2013 at 22:15
If the encoder was supposed to implement ROT13 but accidentally did ROT14, and the decoder did too, then decode(encode('a')) == 'a' but the encoder is still broken. For things much more complicated than that, it's probably much less likely that this sort of thing would happen, but theoretically it could. – Michael Shaw, Feb 14, 2013 at 0:35
@MichaelShaw Just a piece of trivia: the encoder and decoder for ROT13 are the same; ROT13 is its own inverse. If you implemented ROT14 by mistake, then decode(encode(char)) would not equal char (it would equal char+2). – Tom Marthenal, Feb 17, 2013 at 6:46
Test to the requirements.
If the requirement is only "encode to a byte stream that, when decoded, produces an equivalent object", then just test the encoder by decoding. If you are writing both the encoder and the decoder, then just test them together; they can't have "matching errors". If they work together, the test passes.
If there are other requirements for the data stream, then you will have to test them by examining the encoded data.
If the encoded format is predefined, then either you will have to verify the encoded data against the expected result, as you did, or (better) obtain a reference decoder that can be trusted to do the verification. Use of a reference decoder eliminates the possibility that you have misinterpreted the format specification.
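For example, a test against a trusted reference implementation might look like this; ReferenceDecoder and BuildKnownObject are hypothetical stand-ins:
[TestMethod]
public void EncodeMyObject_IsAcceptedByReferenceDecoder()
{
    MyObject input = BuildKnownObject(); // hypothetical test-data builder
    byte[] encoded = new MyEncoder().EncodeMyObject(input);

    // ReferenceDecoder stands in for a trusted third-party implementation.
    MyObject decoded = ReferenceDecoder.Decode(encoded);
    Assert.AreEqual(input, decoded);
}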
Depending on the testing framework and paradigm you're using, you can still use the Arrange Act Assert pattern for this like you've said.
[TestMethod]
public void EncodeMyObject_ForValidInputs_Encodes()
{
    // Arrange object under test
    MyEncoder encoderUnderTest = new MyEncoder();
    MyObject validObject = new MyObject();
    // arrange object for condition under test

    // Act
    byte[] actual = encoderUnderTest.EncodeMyObject(validObject);

    // Assert
    byte[] expected = new byte[3] { 0x01, 0x02, 0xFF };
    CollectionAssert.AreEqual(expected, actual);
}
You should know the requirements for EncodeMyObject() and can use this pattern to test against each of them, for valid and invalid criteria, arranging each case and hardcoding the expected result; do the same for the decoder.
Since the expected values are hardcoded, these tests will be fragile if the format undergoes a massive change.
You may be able to automate this with something parameter-driven (have a look at Pex), or if you're doing DDD or BDD, have a look at Gherkin/Cucumber.
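For instance, MSTest's DynamicData attribute can drive the same test body from a table of test vectors, so adding a case is one line of data rather than a new test method. The MyObject constructor arguments and byte values below are placeholders:
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MyEncoderTests
{
    public static IEnumerable<object[]> EncodingCases()
    {
        // Each row: an input object and its expected bytes (placeholder values).
        yield return new object[] { new MyObject(1), new byte[] { 0x01, 0x02, 0xFF } };
        yield return new object[] { new MyObject(2), new byte[] { 0x01, 0x03, 0x00 } };
    }

    [DataTestMethod]
    [DynamicData(nameof(EncodingCases), DynamicDataSourceType.Method)]
    public void EncodeMyObject_ProducesExpectedBytes(MyObject input, byte[] expected)
    {
        CollectionAssert.AreEqual(expected, new MyEncoder().EncodeMyObject(input));
    }
}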
You get to decide what is important to you.
Is it important to you that an Object survives the round trip, and the exact wire format isn't really important? Or is the exact wire format an important part of the functionality of your encoder and decoder?
If the former, then just make sure that objects survive the round trip. If the encoder and decoder are both broken in exactly complementary ways, you don't really care.
If the latter, then you need to be testing that the wire format is as you expect for the given inputs. This means either testing the format directly, or else using a reference implementation. But having tested the basics, you may get value from additional round-trip tests, which should be easier to write in volume.
How do you go from myObject to { 0x01, 0x02, 0xFF }? Can that algorithm be broken down and tested? The reason I ask is that presently, it looks like you have a test that proves one magic thing produces another magic thing. Your only confidence is that the one input produces the one output. If you can break down the algorithm, you can gain further confidence in the algorithm and be less reliant on magical inputs and outputs.