When in the refactoring step of a TDD process, if we "factor out" some common functionality from two (or more) code modules, how do we avoid that "factored out" code not being properly covered by unit tests?
Here is a toy example to illustrate what I mean. Since it is a toy example, the factored-out code is very small, but we could imagine that it was a much larger set of code that was common between two modules. Let's say we have arrived at the following situation by following a TDD process:
# ----- foo.py -------
def foo(a: int, b:int) -> str:
x = str(a + b)
... # some elided operations that modify x further
return x
# ----- bar.py ------
def bar(a: int, b: int) -> str:
x = str(a + b)
... # some different operations that modify x further
return x
In the refactoring step, we want to break out the str(a + b)
part of the two functions:
After:
# ---- baz.py ----
def stringy_sum(a: int, b: int) -> str:
return str(a + b)
# ---- foo.py ----
def foo(a: int, b:int) -> str:
x = stringy_sum(a, b)
... # some elided operations that modify x further
return x
# ----- bar.py ------
def bar(a: int, b: int) -> str:
x = stringy_sum(a, b)
... # some different operations that modify x further
return x
Now we have a function stringy_sum
that is not covered by any unit tests of its own. It's only covered by what could be regarded as integration tests (the tests for foo and bar still use the function, but do not actually test its operation). Where did we go wrong in this process from a TDD standpoint?
4 Answers 4
You went wrong by thinking of stringy_sum
as something that should be tested. It's an internal implementation detail, and you shouldn't test internal implementation details.
An alternative way of looking at it: you went wrong by making stringy_sum
part of your public interface. Hide it away so nobody other than the people working on the module know about it and you don't need to worry that it doesn't have its own unit tests, because your unit tests still cover all the behaviour of the module. (Yes, I know Python doesn't have a great way of saying "this is private, you can't use it"; that is mostly irrelevant here).
-
10<obligatory_rant> Python is a language for consenting adults. If you want to mark something private, simply prefix the name with an underscore, e.g. "_stringy_sum". Whether other developers will ignore this and invoke it anyway is a different problem. The canonical example I've seen for public/private is a toy "ATM" class - if your coworkers are trying to break into the ATM, you've got bigger problems than the fact that Python doesn't enforce private variables. </obligatory_rant>Blackhawk– Blackhawk2022年12月15日 22:17:59 +00:00Commented Dec 15, 2022 at 22:17
-
3Why shouldn't stringy_sum have tests if this is useful?Stack Exchange Broke The Law– Stack Exchange Broke The Law2022年12月16日 10:34:58 +00:00Commented Dec 16, 2022 at 10:34
-
1@user253751 Precisely because of this issue - if you want to change the code using stringy_sum but not its behaviour (i.e. refactor) then you are required to add or remove tests. The only utility of tests (according to this interpretation) is to verify behaviour. So either stringy_sum is part of the public API (behaviour) of the class, and should be tested, or it is used only implicitly, in which case the actual behaviour should be tested. If stringy_sum contains complex behaviour, then remodelling might make it a new unit in the system, which is used by foo and bar.preferred_anon– preferred_anon2022年12月16日 11:11:57 +00:00Commented Dec 16, 2022 at 11:11
-
@user253751 Of course I have. It's a natural thing to want. You asked why not, so I told you why not. Note the last sentence in particular.preferred_anon– preferred_anon2022年12月16日 11:18:06 +00:00Commented Dec 16, 2022 at 11:18
Philip Kendall's answer says the issue can be discussed away by interpreting stringy_sum
as a private function, as an implementation detail which does not need a test on its own. That is a possible point of view.
However, we can also take the point of view that stringy_sum
shall become a public function, intentionally, and that this is exactly what you want. For this case, we can approach the issue in a straightforward TDD way, just by making smaller "baby steps".
You can try it this way:
Plan the new function
stringy_sum
and write a test for it (which fails, sincestringy_sum
does not exist yet).Implement
stringy_sum
by copying the code out of foo or bar and make some adaptions, if necessary. Now the new test should pass.Modify
foo
by replacing the old code by a call tostringy_sum
. Run all tests, they should pass (otherwise debug).Do the same for
bar
.
This is TDD by the book - red, green, refactor - and results in a new function with a new test, which is exactly what you asked for.
To decide, however, which point of view is more adequate, toy examples don't provide enough context. In a real-world sitation, you can way more easily decide if your method to extract is something you want in public, or something you want to keep in private.
Rather than talk about what you can do, let’s talk about how to decide.
If this thing is going to be used by code you have no control over then it needs its own test, because it’s effectively part of an API.
Otherwise, if you have dominion over everything that will call this then you can treat it like an implementation detail and let it get tested when other things call it.
Thinking of it this way avoids depending on access modifiers that don't exist in every language. This is the real reason we care about things being public. If it's public we assume we'll lose control over how it's used. That isn't always true but it is what drove that knee jerk rule.
At the heart of this issue is how you define a unit. If you think every class is it's own unit you wrap every class in tests and no one wants to refactor anything in your code base because doing so always breaks some test.
Fowler= called those solitary tests. He preferred sociable tests that allowed for many forms of abstraction. Sometimes a class abstracts many others (see The Façade Pattern= ). Sometimes you just need to be sure the facade works and don't care about what's hiding behind it. Not caring allows for a lot of refactoring. So this was never about access modifiers anyway. It's always been about abstraction.
That is, until performance becomes an issue. While I am perfectly willing to allow that the unit under test can be larger than a single class I shudder to allow everything in the unit. The best rules for limiting what you allow in the unit come from Mitchel Feathers. He won’t let you call your test a unit test if:
- It talks to the database
- It communicates across the network
- It touches the file system
- It can't run at the same time as any of your other unit tests
- You have to do special things to your environment (such as editing config files) to run it.
People have argued to death whether to call their tests unit tests or integration tests. Personally I don't care what they're called anymore. I care about how I can use them. I have one pile of tests that run so fast I can run them as often as I type a semicolon. I have another pile that I run when I merge code. Guess where the slow tests go?
If you don't separate these tests so you can run them at different times then you're doing it wrong.
So you decide if stringy_sum
needs it's own test based on if you can control what calls it and if what calls it is fast enough that you're willing to run it over and over while you code.
You didn't go wrong anywhere. The function is still covered. If at some point the code base changes further so that it isn't covered anymore, then you'd have an issue, but then your coverage metrics would tell you so.
In fact, you did good, because as it was, your coverage metric was slightly misleading in that it counted basically the same code twice. Now that it has been reduced to a single occurrence, your metric is slightly closer to being functionally correct.
is not covered by any unit tests of its own
does not mean that it is not covered by unit tests. Your original unit tests, with zero modification, should already be testing the logic of the factored out code. Otherwise you are not achieving good code coverage.foo(a, b)
equal tofoo_helper(stringy_sum(a, b))
, then write tests forfoo_helper
(as well asstringy_sum
) instead offoo
; similarly withbar
. I'm not saying I recommend it, but you can do it.