
Being portable C++ code, our project has an abstraction layer over threads, and this layer has unit tests. These include tests that Thread::sleep(100) wakes up within 130 ms, that CondVar::notify() wakes the other thread within 50 ms, and the like. Obviously these occasionally fail, especially when the computer is also doing something else.

Is it normal to test these things? Is there a way to still test them so that the tests pass reliably in continuous integration, where the server might be running a second build in parallel and is therefore also doing something else?

asked May 18, 2017 at 11:09

3 Answers


Tests should be deterministic and definitive. Assuming the tests conform to the spec, a failed test should indicate a bug. If it doesn't, that is, if the test can fail in a case that is not an error and not the code's fault, then the test doesn't tell you anything useful other than that you might be assuming too much about your environment.

The big issue here is that outside a real-time system, you can't guarantee anything about how quickly a thread will wake up once its nap is over. A thread that's been told to sleep for 100 ms will sleep for at least 100 ms. But it may well sleep for ten seconds if the system is really busy. In most environments, that is not an error, and it should not be treated as one.
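For illustration, a minimal sketch of a test that asserts only that lower bound, which is the one thing the platform actually promises (the Thread::sleep name comes from the question; std::chrono::steady_clock and a plain assert stand in for whatever test framework the project actually uses):

    #include <cassert>
    #include <chrono>
    #include <thread>

    // Stand-in for the project's adapter; assumed to take milliseconds.
    struct Thread {
        static void sleep(int ms) {
            std::this_thread::sleep_for(std::chrono::milliseconds(ms));
        }
    };

    int main() {
        using clock = std::chrono::steady_clock;

        const auto start = clock::now();
        Thread::sleep(100);
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start);

        // The platform only promises "at least this long"; there is no reliable upper bound.
        assert(elapsed.count() >= 100);
        return 0;
    }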

answered May 18, 2017 at 14:04
    Maybe the test asserts that at least 100 ms have elapsed since it was put to sleep. Commented May 18, 2017 at 15:05
  • @GregBurghardt: That would be a valid test, as it's something that the system can promise (and generally does). Of course, at that point you'd test for "at least 100 ms" rather than any relation to 130 ms, lest it fail when the system does manage to wake up a thread exactly when its nap's over. Commented May 18, 2017 at 15:52

Obviously these occasionally fail.

If it is acceptable for a test to occasionally fail, then what you're testing is not an actual requirement.

Is it normal to test these things?

It sounds like you've created an adapter, and the tests you're describing don't sound like unit tests; they sound like integration tests.

One reason for creating an adapter is to enable unit testing of something that would have been hard to test otherwise. E.g., you want to verify that some higher-level function calls sleep() when it's supposed to, but there's no way to inject a test double for the operating system's static sleep() function. So you create an adapter, a very thin object with a sleep(n) method that calls the static sleep(n) function. That way, you can inject a double for your adapter in the unit test of the higher-level function.
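A sketch of that shape, with hypothetical names (ISleeper, OsSleeper, FakeSleeper) and an invented retryAfterDelay caller, purely to illustrate the injection idea:

    #include <cassert>
    #include <chrono>
    #include <thread>

    // Thin adapter interface over the platform sleep (hypothetical names).
    struct ISleeper {
        virtual ~ISleeper() = default;
        virtual void sleep(int ms) = 0;
    };

    // Production adapter: trivially forwards to the real sleep.
    struct OsSleeper : ISleeper {
        void sleep(int ms) override {
            std::this_thread::sleep_for(std::chrono::milliseconds(ms));
        }
    };

    // Test double: records the request instead of actually sleeping.
    struct FakeSleeper : ISleeper {
        int lastRequestMs = -1;
        void sleep(int ms) override { lastRequestMs = ms; }
    };

    // Higher-level code under test: supposed to back off for 100 ms.
    void retryAfterDelay(ISleeper& sleeper) {
        sleeper.sleep(100);
    }

    int main() {
        FakeSleeper fake;
        retryAfterDelay(fake);
        // The unit test checks the request, not the wall clock.
        assert(fake.lastRequestMs == 100);
        return 0;
    }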

It's hard to create a unit test for the adapter itself when its whole purpose is to provide a unit-testable interface to something that wasn't unit-testable to begin with.

Usually you make the adapter so trivially thin that everyone can agree the adapter itself does not need to be tested.

answered May 18, 2017 at 15:53
  • Yes, it is an adapter. The tests test the adapter itself in large part. Commented May 18, 2017 at 16:21
  • @JanHudec So, "Is it normal to test these things?" In my experience, No. When I have written adapters in the past, I was doing it to enable unit testing of something else. I did not write tests for the adapter itself. The adapters were so simple that other members of my team agreed that they did not require tests of their own. Commented May 18, 2017 at 16:36
  • Well, the point of the adapter is not to create a unit-testable interface. The point of the adapter is to allow the code to work in different environments with different system APIs. Commented May 18, 2017 at 16:51

With Sleep(n), two regressions that I can imagine a test usefully catching are:

  1. Failure to call the underlying platform sleep, resulting in no sleep occurring at all.
  2. The delay units being translated incorrectly (e.g., 100 seconds instead of 100 milliseconds).

Whether the risk of such a regression is worth writing tests for is a matter of judgement.

Tests like these are timing-sensitive and pass or fail at the whim of the current speed of the environment they run on. To make them reliable you are forced to bump up the delays. For example, testing that Sleep(10) isn't really Sleep(0) or Sleep(10000) cannot reliably be done by checking that the delay is between 10 ms and 20 ms: if the environment is slow, such a test can pass even though bug #1 exists (a false pass) and fail even though no bug is present (a false failure).

Unit tests should be reliable and repeatable on any machine, especially on developer machines.

The only way to bolster the reliability of such a test is to push the tested delays far above the environment's typical worst case. For example, test that Sleep(5000) causes a delay between 5000 ms and 6000 ms. But now the tests are slow to run.
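As a sketch, such a loosened test might look like the following (a stand-in Sleep wrapper over std::this_thread::sleep_for; the 5000 ms to 6000 ms window is the one suggested above and would still need tuning for the slowest machine you expect to run it on):

    #include <cassert>
    #include <chrono>
    #include <thread>

    // Stand-in for the adapter's Sleep(n), assumed to take milliseconds.
    static void Sleep(int ms) {
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    }

    int main() {
        using clock = std::chrono::steady_clock;

        const auto start = clock::now();
        Sleep(5000);
        const auto ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start).count();

        // The wide window catches "no sleep at all" and grossly wrong units
        // (regressions 1 and 2 above) while tolerating normal scheduler noise.
        assert(ms >= 5000 && ms < 6000);
        return 0;
    }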

Unit tests should be fast so that developers aren't discouraged from running them frequently.

It might be possible to find a balance so the tests are both reasonably fast and reliable, but with random anti-virus scans, updates, varying machine specs, and who knows what else going on on developer machines, it might be easier not to include them as unit tests at all and instead run them as a separate suite.

This subject also touches on testing multi-threaded components. You can write a component in such a way that it is controllable under test, but I tend to dislike that, because it makes the design more complex just for testing, which violates the KISS principle. Often there is a simpler way to test a multi-threaded component, but you need large delays in the tests to ensure reliability. This approach is often hated because such tests tend to be included in the unit test suite; suddenly devs find the unit tests taking ages, and sometimes certain tests fail randomly, prompting someone to "fix" them by bumping up the delays, which makes the tests take even longer.
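As one hedged sketch of that "simpler way with generous delays" for the CondVar::notify case from the question, written directly against std::condition_variable rather than the project's wrapper, and with an arbitrary two-second timeout:

    #include <cassert>
    #include <chrono>
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    int main() {
        std::mutex m;
        std::condition_variable cv;
        bool signalled = false;

        // Waiter: blocks until notified, but gives up after a deliberately generous timeout.
        std::thread waiter([&] {
            std::unique_lock<std::mutex> lock(m);
            const bool wokeUp = cv.wait_for(lock, std::chrono::seconds(2),
                                            [&] { return signalled; });
            // Assert only that the wake-up happened, not how quickly.
            assert(wokeUp);
        });

        // Main thread: pause briefly, then signal.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        {
            std::lock_guard<std::mutex> lock(m);
            signalled = true;
        }
        cv.notify_one();

        waiter.join();
        return 0;
    }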

I mention this because such tests could go into the same slow-running suite, run in a controlled environment and tailored to run reliably on a test machine dedicated to them with nothing else going on, perhaps kicked off by CI. That suite could also include other types of slow-running tests that are not necessarily reliable, such as randomization tests, which are underrated; the kind of useful tests I see missing because no one knows where to put them.

answered May 19, 2017 at 10:30
