How should factored-out code be tested as part of the TDD refactoring step?

Question 1

When in the refactoring step of a TDD process, if we "factor out" some common functionality from two (or more) code modules, how do we avoid that "factored out" code not being properly covered by unit tests?

Here is a toy example to illustrate what I mean. Since it is a toy example, the factored-out code is very small, but we could imagine that it was a much larger set of code that was common between two modules. Let's say we have arrived at the following situation by following a TDD process:

# ----- foo.py -------
def foo(a: int, b:int) -> str:
 x = str(a + b)
 ... # some elided operations that modify x further
 return x
# ----- bar.py ------
def bar(a: int, b: int) -> str:
 x = str(a + b)
 ... # some different operations that modify x further
 return x

In the refactoring step, we want to break out the str(a + b) part of the two functions:

After:

# ---- baz.py ----
def stringy_sum(a: int, b: int) -> str:
 return str(a + b)
# ---- foo.py ----
def foo(a: int, b:int) -> str:
 x = stringy_sum(a, b)
 ... # some elided operations that modify x further
 return x
# ----- bar.py ------
def bar(a: int, b: int) -> str:
 x = stringy_sum(a, b)
 ... # some different operations that modify x further
 return x

Now we have a function stringy_sum that is not covered by any unit tests of its own. It's only covered by what could be regarded as integration tests (the tests for foo and bar still use the function, but do not actually test its operation). Where did we go wrong in this process from a TDD standpoint?

Question 2

I guess its just terminology at this point. I think the assumption with the refactor step is that you are not adding new public methods and classes. ie the testable surface stays the same. If you are. then add more tests

Question 3

If you're doing this as a TDD step, the code that you factor out is already covered with unit tests. Refactoring restructures your code without changing functionality, and you do the refactoring step after you already have passing tests.

Question 4

"Now we have a function [...] that is not covered by any unit tests of its own" - that is fine. Not every single function that exists has to have it's own dedicated test. This is how you get brittle tests and code that's hard to change. It should just be covered by something. You have to decide what your components/units are, what constitutes their interfaces/contracts, and test on that level.

Question 5

is not covered by any unit tests of its own does not mean that it is not covered by unit tests. Your original unit tests, with zero modification, should already be testing the logic of the factored out code. Otherwise you are not achieving good code coverage.

Question 6

You can achieve more testing orthogonality by making foo(a, b) equal to foo_helper(stringy_sum(a, b)), then write tests for foo_helper (as well as stringy_sum) instead of foo; similarly with bar. I'm not saying I recommend it, but you can do it.

Question 7

You went wrong by thinking of stringy_sum as something that should be tested. It's an internal implementation detail, and you shouldn't test internal implementation details.

An alternative way of looking at it: you went wrong by making stringy_sum part of your public interface. Hide it away so nobody other than the people working on the module know about it and you don't need to worry that it doesn't have its own unit tests, because your unit tests still cover all the behaviour of the module. (Yes, I know Python doesn't have a great way of saying "this is private, you can't use it"; that is mostly irrelevant here).

Question 8

<obligatory_rant> Python is a language for consenting adults. If you want to mark something private, simply prefix the name with an underscore, e.g. "_stringy_sum". Whether other developers will ignore this and invoke it anyway is a different problem. The canonical example I've seen for public/private is a toy "ATM" class - if your coworkers are trying to break into the ATM, you've got bigger problems than the fact that Python doesn't enforce private variables. </obligatory_rant>

Question 9

Why shouldn't stringy_sum have tests if this is useful?

Question 10

@user253751 Precisely because of this issue - if you want to change the code using stringy_sum but not its behaviour (i.e. refactor) then you are required to add or remove tests. The only utility of tests (according to this interpretation) is to verify behaviour. So either stringy_sum is part of the public API (behaviour) of the class, and should be tested, or it is used only implicitly, in which case the actual behaviour should be tested. If stringy_sum contains complex behaviour, then remodelling might make it a new unit in the system, which is used by foo and bar.

Question 11

@user253751 Of course I have. It's a natural thing to want. You asked why not, so I told you why not. Note the last sentence in particular.

Question 12

Philip Kendall's answer says the issue can be discussed away by interpreting stringy_sum as a private function, as an implementation detail which does not need a test on its own. That is a possible point of view.

However, we can also take the point of view that stringy_sum shall become a public function, intentionally, and that this is exactly what you want. For this case, we can approach the issue in a straightforward TDD way, just by making smaller "baby steps".

You can try it this way:

Plan the new function stringy_sum and write a test for it (which fails, since stringy_sum does not exist yet).
Implement stringy_sum by copying the code out of foo or bar and make some adaptions, if necessary. Now the new test should pass.
Modify foo by replacing the old code by a call to stringy_sum. Run all tests, they should pass (otherwise debug).
Do the same for bar.

This is TDD by the book - red, green, refactor - and results in a new function with a new test, which is exactly what you asked for.

To decide, however, which point of view is more adequate, toy examples don't provide enough context. In a real-world sitation, you can way more easily decide if your method to extract is something you want in public, or something you want to keep in private.

Question 13

Rather than talk about what you can do, let’s talk about how to decide.

If this thing is going to be used by code you have no control over then it needs its own test, because it’s effectively part of an API.

Otherwise, if you have dominion over everything that will call this then you can treat it like an implementation detail and let it get tested when other things call it.

Thinking of it this way avoids depending on access modifiers that don't exist in every language. This is the real reason we care about things being public. If it's public we assume we'll lose control over how it's used. That isn't always true but it is what drove that knee jerk rule.

At the heart of this issue is how you define a unit. If you think every class is it's own unit you wrap every class in tests and no one wants to refactor anything in your code base because doing so always breaks some test.

Fowler⁼ called those solitary tests. He preferred sociable tests that allowed for many forms of abstraction. Sometimes a class abstracts many others (see The Façade Pattern⁼). Sometimes you just need to be sure the facade works and don't care about what's hiding behind it. Not caring allows for a lot of refactoring. So this was never about access modifiers anyway. It's always been about abstraction.

That is, until performance becomes an issue. While I am perfectly willing to allow that the unit under test can be larger than a single class I shudder to allow everything in the unit. The best rules for limiting what you allow in the unit come from Mitchel Feathers. He won’t let you call your test a unit test if:

It talks to the database

It communicates across the network

It touches the file system

It can't run at the same time as any of your other unit tests

You have to do special things to your environment (such as editing config files) to run it.

artima.com - A Set of Unit Testing Rules

People have argued to death whether to call their tests unit tests or integration tests. Personally I don't care what they're called anymore. I care about how I can use them. I have one pile of tests that run so fast I can run them as often as I type a semicolon. I have another pile that I run when I merge code. Guess where the slow tests go?

If you don't separate these tests so you can run them at different times then you're doing it wrong.

So you decide if stringy_sum needs it's own test based on if you can control what calls it and if what calls it is fast enough that you're willing to run it over and over while you code.

Question 14

You didn't go wrong anywhere. The function is still covered. If at some point the code base changes further so that it isn't covered anymore, then you'd have an issue, but then your coverage metrics would tell you so.

In fact, you did good, because as it was, your coverage metric was slightly misleading in that it counted basically the same code twice. Now that it has been reduced to a single occurrence, your metric is slightly closer to being functionally correct.

Philip Kendall Philip Kendall 26k10 gold badges66 silver badges68 bronze badges · Answer 1 · 2022-12-15 12:55:17Z

31

You went wrong by thinking of stringy_sum as something that should be tested. It's an internal implementation detail, and you shouldn't test internal implementation details.

An alternative way of looking at it: you went wrong by making stringy_sum part of your public interface. Hide it away so nobody other than the people working on the module know about it and you don't need to worry that it doesn't have its own unit tests, because your unit tests still cover all the behaviour of the module. (Yes, I know Python doesn't have a great way of saying "this is private, you can't use it"; that is mostly irrelevant here).

Share

Improve this answer

answered Dec 15, 2022 at 12:55

Philip Kendall's user avatar

Philip Kendall Philip Kendall

26k10 gold badges66 silver badges68 bronze badges

4

10

<obligatory_rant> Python is a language for consenting adults. If you want to mark something private, simply prefix the name with an underscore, e.g. "_stringy_sum". Whether other developers will ignore this and invoke it anyway is a different problem. The canonical example I've seen for public/private is a toy "ATM" class - if your coworkers are trying to break into the ATM, you've got bigger problems than the fact that Python doesn't enforce private variables. </obligatory_rant>

Blackhawk
– Blackhawk

2022年12月15日 22:17:59 +00:00
Commented Dec 15, 2022 at 22:17
3

Why shouldn't stringy_sum have tests if this is useful?

Stack Exchange Broke The Law
– Stack Exchange Broke The Law

2022年12月16日 10:34:58 +00:00
Commented Dec 16, 2022 at 10:34
1

@user253751 Precisely because of this issue - if you want to change the code using stringy_sum but not its behaviour (i.e. refactor) then you are required to add or remove tests. The only utility of tests (according to this interpretation) is to verify behaviour. So either stringy_sum is part of the public API (behaviour) of the class, and should be tested, or it is used only implicitly, in which case the actual behaviour should be tested. If stringy_sum contains complex behaviour, then remodelling might make it a new unit in the system, which is used by foo and bar.

preferred_anon
– preferred_anon

2022年12月16日 11:11:57 +00:00
Commented Dec 16, 2022 at 11:11
@user253751 Of course I have. It's a natural thing to want. You asked why not, so I told you why not. Note the last sentence in particular.

preferred_anon
– preferred_anon

2022年12月16日 11:18:06 +00:00
Commented Dec 16, 2022 at 11:18

Add a comment |

Doc Brown Doc Brown 219k35 gold badges405 silver badges619 bronze badges · Answer 2 · 2022-12-15 13:13:44Z

Philip Kendall's answer says the issue can be discussed away by interpreting stringy_sum as a private function, as an implementation detail which does not need a test on its own. That is a possible point of view.

However, we can also take the point of view that stringy_sum shall become a public function, intentionally, and that this is exactly what you want. For this case, we can approach the issue in a straightforward TDD way, just by making smaller "baby steps".

You can try it this way:

Plan the new function stringy_sum and write a test for it (which fails, since stringy_sum does not exist yet).
Implement stringy_sum by copying the code out of foo or bar and make some adaptions, if necessary. Now the new test should pass.
Modify foo by replacing the old code by a call to stringy_sum. Run all tests, they should pass (otherwise debug).
Do the same for bar.

This is TDD by the book - red, green, refactor - and results in a new function with a new test, which is exactly what you asked for.

To decide, however, which point of view is more adequate, toy examples don't provide enough context. In a real-world sitation, you can way more easily decide if your method to extract is something you want in public, or something you want to keep in private.

score 6 · Answer 3 · 2022-12-15 20:51:05Z

Rather than talk about what you can do, let’s talk about how to decide.

If this thing is going to be used by code you have no control over then it needs its own test, because it’s effectively part of an API.

Otherwise, if you have dominion over everything that will call this then you can treat it like an implementation detail and let it get tested when other things call it.

Thinking of it this way avoids depending on access modifiers that don't exist in every language. This is the real reason we care about things being public. If it's public we assume we'll lose control over how it's used. That isn't always true but it is what drove that knee jerk rule.

At the heart of this issue is how you define a unit. If you think every class is it's own unit you wrap every class in tests and no one wants to refactor anything in your code base because doing so always breaks some test.

Fowler⁼ called those solitary tests. He preferred sociable tests that allowed for many forms of abstraction. Sometimes a class abstracts many others (see The Façade Pattern⁼). Sometimes you just need to be sure the facade works and don't care about what's hiding behind it. Not caring allows for a lot of refactoring. So this was never about access modifiers anyway. It's always been about abstraction.

That is, until performance becomes an issue. While I am perfectly willing to allow that the unit under test can be larger than a single class I shudder to allow everything in the unit. The best rules for limiting what you allow in the unit come from Mitchel Feathers. He won’t let you call your test a unit test if:

It talks to the database

It communicates across the network

It touches the file system

It can't run at the same time as any of your other unit tests

You have to do special things to your environment (such as editing config files) to run it.

artima.com - A Set of Unit Testing Rules

People have argued to death whether to call their tests unit tests or integration tests. Personally I don't care what they're called anymore. I care about how I can use them. I have one pile of tests that run so fast I can run them as often as I type a semicolon. I have another pile that I run when I merge code. Guess where the slow tests go?

If you don't separate these tests so you can run them at different times then you're doing it wrong.

So you decide if stringy_sum needs it's own test based on if you can control what calls it and if what calls it is fast enough that you're willing to run it over and over while you code.

Kilian Foth Kilian Foth 111k45 gold badges301 silver badges323 bronze badges · Answer 4 · 2022-12-15 12:52:20Z

You didn't go wrong anywhere. The function is still covered. If at some point the code base changes further so that it isn't covered anymore, then you'd have an issue, but then your coverage metrics would tell you so.

In fact, you did good, because as it was, your coverage metric was slightly misleading in that it counted basically the same code twice. Now that it has been reduced to a single occurrence, your metric is slightly closer to being functionally correct.

Stack Exchange Network

How should factored-out code be tested as part of the TDD refactoring step?

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How should factored-out code be tested as part of the TDD refactoring step?

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions