12

I've come to believe that continuous refactoring is a good way to maintain the health of a system that is in continuous development. That is, a system for which new features are always in the pipeline.

By 'continuous refactoring' I mean making small updates to the code surrounding the new feature being added or modified. While working on the feature, the particulars of the code are fresh in the programmer's mind, so this is an appropriate time to make related improvements.

In this refactoring approach, improvements are always being made to the system, over and above the new features being added. It also affects the programmers' mentality -- they come to think of continuous improvement of the system as something natural and desirable.

There would still be tasks devoted solely to refactoring (as opposed to refactoring alongside features); those are reserved for bigger changes. As a rough estimate, I would consider it appropriate for a programmer to do a medium-sized refactoring about once every 3-6 months. Large refactorings would be rare; those need to be planned carefully, and done only when specific needs have been identified. What constitutes a 'medium' versus a 'large' refactoring depends on the particulars of the system.

However, there is an issue with continuous refactoring which I want to overcome, and which is the subject of this post.

When working on a feature, the ideal set of changes includes only the updates needed to implement the feature. That way, other programmers reviewing the code need concern themselves only with the required changes. But this is exactly the opposite of the idea of continuous refactoring. So, on the one hand, making only changes related to the task at hand is good for code review; on the other hand, the associated refactoring is good for the long-term health of the system.

What are some ideas to overcome that challenge: to be able to make small refactors while working on features, whilst being able to separate the feature changes from the refactoring changes so that they are easy to review in isolation?

Greg Burghardt
45.7k · 8 gold badges · 85 silver badges · 149 bronze badges
asked Feb 7 at 16:14
9
  • 7
    When I first came to this question, it had a -1 score and one vote to close as needing focus. I couldn't disagree more. This question has a tightly focused problem to solve and the author showed a good amount of research. Questions like this should be the bread and butter of this community; +1 and very good first question, in my opinion. Commented Feb 7 at 18:31
  • As an example of a way to accomplish this, imagine if it were possible to somehow tag specific changes within a commit and mark them as part of some group, say a "refactor" tag. Then, when the work on the feature is complete, if there were some git operation that would separate the "refactor"-tagged lines from the others and combine them into 2 separate changesets (maybe in a new branch), that would allow for mixed commits, with automatic separation afterwards. That's just one possible way to do it (if it existed). I'm sure others could think of different solutions. Commented Feb 8 at 20:08
  • Another way to do it requires more effort, but it could be doable. First, the commits are done as described. Then, at the end, 2 new branches are created whose base is the same as the feature branch. The developer could then manually cherry-pick from the feature branch into the 2 new branches: 1 for the feature work, and another for the refactors. This is a heavy way of doing it, but at this moment I'm more concerned with listing different ways to approach it, without discriminating on the difficulty of the process. Basically, a type of brainstorming of approaches. Commented Feb 8 at 20:16
  • 2
    Close voters please comment. Commented Feb 8 at 23:56
  • 1
    @Dess what friction? Consider a small script that merges a refactoring branch each time a feature branch is checked out. Then commit and switch before every refactoring task. The only problem is a decision on what belongs where. But it is applicable to layers too. Commented Feb 9 at 0:39

7 Answers

9

To quote Kent Beck: "make the change easy, then make the easy change".

First make a pull request (PR) that contains the refactoring that makes the change you want to make easy. No functionality should be changed.

Make a second PR that contains the new feature or whatever it is you need to change. Add or update tests in this PR as well.

Instead of two separate PRs, you could consider separating the two things with commits, but that generally only works for very small refactorings and changes.
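
As a sketch, the two-PR flow might look like this in git. All branch names, file names, and commit messages below are invented for illustration, not prescribed by this answer:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
git commit -q --allow-empty -m "initial"

# PR 1: the refactoring that makes the change easy; no behaviour change
git switch -q -c refactor/extract-helper
echo "helper code" > helper.txt
git add helper.txt && git commit -q -m "refactor: extract helper"

# PR 2: the feature itself, branched on top of the refactoring
git switch -q -c feature/new-thing
echo "feature code" > feature.txt
git add feature.txt && git commit -q -m "feat: add new thing"

# Each branch is now reviewable on its own
git log --oneline
```

The second PR would target the first branch (or be opened after the first is merged), so reviewers of the feature PR only see the feature diff.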

candied_orange
120k · 27 gold badges · 232 silver badges · 368 bronze badges
answered Feb 7 at 16:23
4
  • git add -p; git commit -m "refactor: ..."; gh pr create; git add -u; git commit -m "feat: ..."; gh pr create; Commented Feb 8 at 2:02
  • 6
    The crux of the question is that the changes being made for the feature, and for the refactors, are being done simultaneously (presumably in an isolated branch). Any or all commits can potentially include changes from both feature and refactors. The question is, at the end, how to separate the changesets. Commented Feb 8 at 3:08
  • @Dess: did you notice? That is exactly what my answer tries to address. Commented Mar 27 at 21:12
  • I think this is as good an answer as we're going to get. If you're mixing and matching refactoring with features, you risk confusing the reviewer of the code as well as mixing intent. It is possible that some refactoring breaks the code while the feature is fine, and vice versa. Yes, I know - unit tests... but you get my point. Commented Mar 31 at 16:35
5

Some good answers here, but from your comments I guess no one really answered your literal question:

to be able to make small refactors while working on features, being able to separate the feature changes from the refactor changes such that they are easy to review in isolation?

Well, only you, the developer who is making the changes, are able to distinguish which kind of change is really a feature change and which is a refactoring change. And once you muddle those two types of changes up, no tool will be able to separate them afterwards; hence, whatever you do, some self-discipline and willingness to separate the changes will always be necessary.

As others have already written, your goal is to get the pure refactoring changes and the pure feature changes into two different changesets, as the basis of two different PRs, where the refactoring changes should be applicable first, whilst the feature changes expect the refactorings to already be in place. This, however, leads to a chicken-and-egg problem: to find out which refactorings will make a feature implementation easier, you need to start with the implementation and see where you run into obstacles. On the other hand, you want to be sure the refactorings work without any feature change, so you need to try them out in isolation.

In simple situations, you may be able to overcome this issue by doing a thorough impact analysis before making the first feature change. You may take notes and make some experimental changes to the code base to find out what you need (but undo them before the first commit). Then you do all the refactorings you think will make sense in one commit / PR, and afterwards start the pure feature implementation.
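
One low-ceremony way to do those throwaway experiments, assuming git, is to save the exploration as a patch before undoing it (the file names here are invented):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
echo "original" > module.txt
git add . && git commit -q -m "initial"

# Experiment freely to learn which refactorings would help...
echo "experimental restructuring" > module.txt

# ...then keep the exploration as notes and restore a clean tree
git diff > ../exploration.patch
git checkout -q -- module.txt
```

`git stash` works too; a patch file just survives branch switches and can be annotated alongside your notes.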

If the situation becomes more complicated, you may introduce two branches (a pure refactoring branch as well as the feature branch) and keep them both as two working copies in parallel in your local dev environment. This allows you to keep two windows of your favourite IDE or editor open and switch between the two branches by simply changing the active editor window.

Now imagine you are working on your feature branch. Whenever you detect the need to refactor something where you are sure it could have been refactored beforehand, before the feature implementation started, stop working in the "feature branch editor" and go immediately to the "refactoring branch editor". Apply the refactoring there, run all your tests, fix any bugs, commit the changes from the refactoring branch to the dev branch, and then merge the dev branch into your feature branch.

As a result, the refactorings will be sorted out from the feature changes, even though you effectively work on features and refactorings in parallel. All refactorings can now be reviewed individually without feature changes, and when you finally commit the whole feature changeset to the dev branch, it will not contain any refactorings any more, which means it can be reviewed individually as well. To an outside observer, it will look as if you did all the refactorings before implementing the feature, though your real workflow was different.

Of course, you will have to make sure you do not accidentally make changes on the wrong branch / in the wrong editor window. But that's also the case when you regularly switch between different branches in a single working copy using git checkout, for example. Most IDEs or editors can be configured in some way to clearly show which folder you are currently working in.
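
A minimal sketch of that two-working-copy setup using `git worktree`, simplified so the refactoring branch is merged into the feature branch directly (branch and file names are invented):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
git commit -q --allow-empty -m "initial"
git branch refactoring
git switch -q -c feature

# Second working copy (and editor window) for the refactoring branch
git worktree add -q ../repo-refactoring refactoring

# A refactoring is made and committed in the other working copy...
( cd ../repo-refactoring \
  && echo "renamed" > rename.txt \
  && git add rename.txt \
  && git commit -q -m "refactor: rename things" )

# ...and merged into the feature branch, keeping the histories separable
git merge -q refactoring
```

With `git worktree` both directories share one repository, so commits made in either copy are immediately visible to the other.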

Did I try this out? Yes, but to be honest, not for this purpose. A common case is the need for creating hotfixes for the latest release while working on new features. There, it is quite common to create a branch as a 2nd working copy which starts from the latest released version, apply some changes and merge them into the current dev branch. I don't see any reason why this should not work for your use case as well.

answered Feb 10 at 12:58
0
3

When working on a feature, the ideal set of changes includes only the updates needed to implement the feature. That way, other programmers reviewing the code need concern themselves only with the required changes. But this is exactly the opposite of the idea of continuous refactoring. So, on the one hand, making only changes related to the task at hand is good for code review; on the other hand, the associated refactoring is good for the long-term health of the system.

Always know who the work is for.

Continuous refactoring is best when driven by customer-facing features, it’s true. But that short-sighted approach has a hidden downside called tech debt. Tech debt will let you go fast early but slow you down more and more later. Put it off long enough and eventually you have no choice but to beg management for time just to deal with it.

I’m working on a dedicated refactoring right now. It spans many sprints. It offers the customer no direct benefit at all. It wasn’t my idea. And when I came across the ticket for it I tried to kill it. By trying to kill it I forced them to take ownership of it. I made them document what they wanted out of it.

Now I know who my "customer" is for this (the dev team). By taking this attitude I protect myself from chasing my own pet peeves that no one else cares about.

Priorities have been set and people know how this fits into the overall plan. We know what work needs to wait on it. What work doesn’t. Why and how much we care.

Generally refactoring should not be a management decision. It is a development need. The more you keep management from dealing with it the better. But sometimes there is a big mess to deal with that becomes visible to them. Once it does they will want to track it. The work will need to be done either way.

How did I end up in this undesirable state? The old code is full of old ideas that the new coders desperately want gone. We wanted to change things years ago, but too many things insisted on the old way. Management finally put us in a position to get rid of the old ideas, even though this will break many things. Management happens to be focused on a few things now and doesn’t care about the many this will break. That makes this a good opportunity.

So even though I greatly prefer keeping refactoring off management’s plate, here I am with them insisting on it. I’m not against it. I just want to know who cares before I burn time on it.

That’s how I strike my balance between YAGNI and BDUF. I make sure these non-feature-driven-changes make someone besides me happy.

But this kinda thing should be rare. Prevented when possible. And regarded skeptically. Try to drive most of your refactoring with features the customer does care about. Makes it much easier to keep management from knowing or caring that it’s happening.

Kent was right. Make the change easy. Then make the easy change.

I wish what I was dealing with was just making one change easy. This is so much more. Seriously avoid letting your code get in a state like this if you can. Not fun. Personally, I find it embarrassing.

In short, you enable continuous refactoring by being willing to do it without getting credit for it. Do it because you’re a pro.

answered Feb 8 at 23:04
2
  • Perfect answer, but the definition of a feature is too narrow for my taste. I see no reason not to include tech-debt compensation in the estimation of any activity the "customer" requests. Example: management wants a new feature? Refactor the integration API. Management wants to support old features? Refactor while fixing bugs. Commented Feb 8 at 23:36
  • To generalize: communication with management has to be two-way - requests vs estimations. And some estimations have to be large enough to include large refactorings. Commented Feb 8 at 23:41
2

This is a difficult question to answer, because it's a subtle XY problem. It's unclear exactly what kind of refactoring is happening here and how necessary it is. With this answer I'm trying to apply your core question to things I've seen happening in a professional capacity - it's up to you to assess the context and justifications for your own use case.

If you're not here for a more general explanation that lays the groundwork, skip to the last section titled "Direct answers".


General guideline

When working on a feature, the ideal set of changes includes only the updates needed to implement the feature.

I suspect your words are different from what you intended to say; but I strongly disagree with your words and in my professional opinion such a mindset is often the main driver that causes developers to only think about their current task and not the (quality of the) codebase that they're working in.

This way, other programmers that review the code only need to concern themselves with the required changes.

The point of "refactor as you go" is that you do this at the same time as your intended change; which implies that you only refactor things that sufficiently pertain to what you're already working on. If you're going off to refactor unrelated things, that's not refactoring as you go - that's not to say that that refactor isn't necessary, but at least it's not necessary to include it in the same PR as your main task.

I suspect you're thinking of cases where you're e.g. changing a reused structure, thus prompting you to (slightly) update all prior usages of that changed structure, which will show up as part of the changeset.

While that observation is correct, I don't see the problem here - or at least I don't see how to improve on this without negatively impacting the quality of the refactoring; i.e. I'm excluding the idea that you half-ass the refactor just so you can keep the changeset down at the cost of the overall codebase consistency.
Even with the best laid plans, if you uncover that a sizeable refactoring is needed, you should not hold this necessary refactoring back purely for the sake of keeping the PR readable, even if this refactor is aggressive enough to make this PR less than trivial to vet.

A PR should not be reviewed file-by-file, it should be read through, parsed as a single change, and the quality of the implementation should be judged as a whole. Breaking it down file-by-file only works for syntax cleanup and inline optimizations, not judging the overall design.

Additionally, most PR reviewing tools I've seen allow you to select a subset of the commits in the PR and review those. If you separate the allegedly "unrelated" changes in a separate commit and clearly label them with the commit message, a reviewer can filter them out.
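
For example, if refactoring commits carry a conventional message prefix such as `refactor:` (an assumption of this sketch, not a universal standard), a reviewer or a script can list them separately from the feature commits:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
git commit -q --allow-empty -m "refactor: tidy module layout"
git commit -q --allow-empty -m "feat: add export button"
git commit -q --allow-empty -m "refactor: rename FooManager"

# Only the commits labelled as refactorings
git log --oneline --grep '^refactor:'
```

The same `--grep` filter works on a real mixed branch, which is what makes consistent commit-message labelling useful for reviewers.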


Main concern

The overall asterisk I see with your question is that you seem to be close to observing/deciding that we should limit our refactoring, not on the basis of what is necessary to refactor, but with the primary motivator being how easy it is for a reviewer.

While we should of course strive to make PRs easy to read, this is a lesser priority compared to keeping the codebase itself maintainable and reasonably free from technical debt.


Direct answers

With all the above in mind, here are the most direct answers I can give to your explicit questions:

What are some ideas to overcome that challenge: to be able to make small refactors while working on features, ...

Don't start from a premise of needing to keep refactors small. Size is not a relevant criterion. Thematic adherence to the main task is what's important; plus there is an additional pragmatic edge case for freshly discovered concerning technical debt that needs to be immediately addressed (e.g. because it's a blocker for the task at hand).

... whilst being able to separate the feature changes from the refactor changes such that they are easy to review in isolation?

If you could do that, then the refactor work is unrelated to the feature change, at which point there's no reason to even keep them in the same PR in the first place - which renders the question moot.

answered Feb 9 at 23:41
1

I don't fully recognise the assumptions this question proceeds from, and something is not quite ringing true; for that reason, this answer is going to meander a bit to try and hopefully hit on the issue.

The meaning of refactoring

Originally, "refactoring" was a subset of readaptation. It was the application of routine/mechanical alterations to the source code that were supposed to preserve the same "behaviour", but achieve it through a program whose internal elements or structure were slightly different.

The analogy is with algebraic rearrangement of a formula - and more subtly (and perhaps more insidiously), computer programs themselves are analogised with mathematical formulas. The analogy between programs and formulas is not always a correct analogy, but nevertheless the idea works sometimes.

There are two broad reasons why the readaptation of computer programs occurs. One is to explicitly change what the program does. This can be either to cope with new circumstances, or to fix perceived faults in the original program - the distinction between the two is often unclear.

The other is because the program as source code is found not to be in an ideal form for the developers to further work with - that is, for the developers to continue to grasp and exert a controlling influence over.

There can be a lot of reasons for this. Muddled naming, algorithms that are far more complex than necessary, or an overall structure that isn't keeping up with the evolving conceptualisation inside the developer's heads.

I'm not being exhaustive, but the point is that these things cause various difficulties for developers. They make it more difficult than necessary for developers to reason about the code so as to work out how to change it correctly (which means changes happen slower, with more errors, and requiring more and more cognitive musculature to make progress). They make it difficult to be reminded, by simply reading the code, about the conceptualisation that the design proceeds from - meaning once the creators move on and forget details, any changes later will require development work to be somewhat redone and/or code rewritten from scratch. And they make it especially difficult to introduce any new developer, as the conceptualisation becomes increasingly difficult to infer from reading the code itself.

In terms of changes to the code, it's not always a case of progressively fitting new features. Each proposed alteration poses a challenge of how to integrate the change into the existing edifice of how everything else is programmed to work, and some changes are simple to incorporate whereas others can require an extensive overhaul and revolution in the design.

The rewrite is of course the least conservative approach - similar in civil engineering to building a new bridge alongside, diverting the traffic, then dynamiting the old.

But sometimes you can also take other approaches, like taking the strain off the original bridge piers in some way, building new piers in some way, whilst preserving the original deck and roadway in continuous use. Eventually with the new piers in place, the deck will either last longer, or bear more weight than the original piers could possibly have supported without collapse.

I'm falling into civil engineering analogies here because it's considerably more complicated to describe these techniques as actually applied to software - only experienced practitioners will get the gist, the novice won't know what the hell I'm talking about (which, as an aside, is why it's often not useful for novices to simply read what experts say).

In this context, "refactoring" is broadly the act of taking the weight off the original piers whilst preserving the use of the deck. It's altering the program internally to facilitate the ability of the developers to work further with the program, often with a view to making a future change that really does change how the program works outwardly. The need for this refactoring as a preparatory step, is because it is perceived to be too difficult to plan and execute all the necessary changes to the program in one operation, without risking oversights that would damage how the program works, or which would stop smaller/reactive maintenance activities occurring within a reasonable time without forking the codebase.

It's also possible to engage in refactoring without a specific plan for future outward change in the program. Developers might do this so they are in a position to react more quickly to future changes, or so that they can continue to be able to read the code to be reminded how it works (or so that another developer can hope to read it more effectively).

The supposed distinction between features and refactorings

Now, in your question you seem to be identifying two separate streams of work, features and refactorings. You also talk of constant small refactorings.

Well, as I say above, refactoring projects are often massive - just short of the threshold of throwing everything away and rewriting - and done with a view to massive outward changes. Refactoring is not inherently small.

Generally, you wouldn't distinguish between "small refactorings" and "small features". Practically every new feature is going to require something to change and perhaps be "refactored".

We would distinguish between refactorings and features either when the refactoring is done explicitly to facilitate the continued work of the developers and keep the software under their control (i.e. it's not done for any specific new feature - it might be just cleaning up after the ones that are already there), or it is done when it is an explicit preparatory step to providing a feature which imposes massive changes.

In either case, it's not an optional development activity, and it's not a belief system - which are some of the terms in which you speak. You can argue about the necessity of doing certain specific instances of work, but not about the general practice of the field (where such activities can never be excluded in principle).

When discussing small refactorings that are associated with fitting new features, you'd just talk about features - because it can be taken for granted that new features usually require some small transformations of existing code.

Combining unrelated work

I wonder whether you are really asking about "how do we moonlight on unrelated refactorings which we the developers think are necessary, whilst purporting to work on features which are authorised work and don't reasonably require the refactoring". My advice would be not to muddle the work together, if they can be isolated from one another, and if you have a system for tackling items one at a time.

That said, some changes cannot really be isolated. The "unrelated" refactorings, whilst pursued for a separate purpose, might be so integrated and bound up with the workings implicated by the new "feature" that it makes no sense to isolate the work on them both.

You identify the desirability of developers getting the unrelated refactoring work done whilst that area of the program is hot in their minds - and that makes sense. Well, why doesn't that apply to the reviewers too? Presumably the reviewers need to understand the program and the developer's purpose in order to effectively review his work, so why would the reviewers benefit from an isolation which the developer doesn't?

The issue you might be hitting on is that it is often easy for the developer to perceive the need for adjacent work and desirable for him to act immediately on the perception to maintain high standards, but it's not as easy to explain and document the perception or the appropriateness of the reaction to it, so as to inform a completely unfamiliar person about what's going on.

Unfortunately, that's a fundamental inconsistency between mechanical systems of work (such as ticketing systems) and intelligent behaviour, with mechanisation either requiring simplicity in the activity or else (if it impedes intelligent action on work that has to be complex) causing a collapse in quality.

It's also an inconsistency between allowing an individual practitioner to go forth and make autonomous decisions about work that needs complicated cognitive immersion, but then introducing someone later to second-guess the work without having participated as it went along or having seen the questions arise in context with the relevant materials present. Often these reviews become either perfunctory (because the reviewer is not cognitively engaged with the work that's been done) or, in the worst case, struggle sessions.

Both these contradictions are why neither ticketing-style systems nor code reviews are universally adopted in software development.

answered Feb 8 at 11:36
13
  • 1
    This post is a philosophical stance on the meaning of refactoring. I'm not going to take a position on whether or not I agree with your philosophy. However, this post didn't attempt to answer my question, which is: once the decision has been made to approach development in the way I described, how to then compartmentalize the feature work and the "refactor" work such that they can be reviewed in isolation. Commented Feb 8 at 16:52
  • 1
    @Dess, the problem is that the answer to that question seems simple. It's like asking how do you shake one leg without the other? A: you shake one and keep one still. How do you separate work? You do the work separately, and record the work separately. The reason why I've delivered such a wide-ranging answer is to give you the opportunity to engage on where you think the misunderstanding might be - whether it is on your part, or on my part. Why do you think the separation of the work would be so difficult as to require advice? Commented Feb 8 at 17:04
  • 1
    Yes, feature work and refactors can be done in isolated branches. In that case, there is no issue of separation. But the question that I asked has a separate setup. How that setup came to be should be immaterial; somehow it came to be, and it is like that now. So, then, proceeding from there, how to tackle the issue that arises from that setup, and which I described? Commented Feb 8 at 17:15
  • @Dess, I'm sorry I haven't been able to discern any special "setup" or why it poses any difficulties with proceeding separately on items of work you say are separable. Commented Feb 8 at 17:29
  • 1
    If you're not able to discern it, then it's ok. We can just leave it there. Commented Feb 8 at 17:36
1

A pragmatic approach.

First solve the coding problem. Commit each refactoring- or feature-related change with an appropriate placeholder commit message. For any decent-sized change you will easily make hundreds. All the more reason to keep tickets small.

Now, with a solution at the desired level of working and clean, create a new branch, rearrange one commit, and test it.

We rearrange one commit at a time so that if something goes wrong, we know which commit was the problem and can revert to the previous branch. If you are more adventurous you can do more, particularly if you are confident about what will work. Just try to keep the amount of change low to make problem identification easier.

The goal:

  • pulling feature-specific work up,
  • pushing cleaning work down, and
  • merging commits that really should be together, e.g. because you forgot to commit a line, or made the same change in another file later.

Keep your original working branch, the last successful edit, and the current rearrangement you are working on.

If interactive rebasing flops, don't be afraid to rebuild manually by cherry-picking. In fact, you may prefer to build the reviewable branch through cherry-picks, keeping a list of commits still to be merged in.
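
A sketch of that cherry-pick rebuild, assuming the placeholder commit messages were labelled with `refactor:`/`feat:` prefixes (the prefixes, branch names, and files are all invented here):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
git commit -q --allow-empty -m "initial"
git branch base   # remember where the work started

# Messy working branch: feature and refactoring commits interleaved
git switch -q -c work
echo a > feat1.txt   && git add . && git commit -q -m "feat: part 1"
echo b > cleanup.txt && git add . && git commit -q -m "refactor: cleanup"
echo c > feat2.txt   && git add . && git commit -q -m "feat: part 2"

# Reviewable branch: all refactorings first, then the feature commits
git switch -q -c reviewable base
git cherry-pick $(git log work --reverse --grep '^refactor:' --format=%H)
git cherry-pick $(git log work --reverse --grep '^feat:' --format=%H)

# Sanity check: the final trees must be identical
git diff --quiet work reviewable && echo "change sets match"
```

The `git diff --quiet work reviewable` check at the end is the "change set is identical" test from the answer: if it exits cleanly, the rearranged branch ends at exactly the same content as the original.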

When the commit history looks reviewable and its change set is identical to your original branch's, start your PR strategy by pushing a suitable number of commits into a PR for review.

answered Feb 12 at 6:10
-1

TLDR: refactoring does not exist in isolation.

I often see a mindset, inspired by TDD, that refactoring is a separate, independent task that has to have dedicated time allocated to it. This is wrong, because refactoring without a goal is dangerous, and instrumental goals are volatile until the whole feature is done. Refactoring is not done to implement an unknown feature tomorrow. It is done to implement a known feature requested today.

Refactoring should never happen in isolation, and tasks devoted to refactoring have to be associated with features (or bugs), they can't exist in a vacuum.

To extract a refactoring step, do it in the following general way:

  • prototype a feature
    • violate any existing design as necessary
    • do not bother with best practices
    • cut nonessential or obvious corners (with more experience more corners are cut)
    • do not focus on tests
  • analyze the resulting design
    • isolate defects
    • find all feature requirements
    • determine an optimal design
  • throw out the prototype
  • refactor the old design to match the new optimal one
    • follow best practices
    • add tests
  • implement the feature on top of the new design
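
The steps above can be sketched as a branch workflow in git (the `spike/` naming and the files are conventions I'm inventing for the example, not part of the answer):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
echo "old design" > core.txt
git add . && git commit -q -m "initial"

# 1. Prototype the feature on a throwaway branch, cutting corners
git switch -q -c spike/feature
echo "hacky prototype" >> core.txt
git commit -q -am "spike: prototype feature"

# 2-3. Learn from the prototype, then throw it away
git switch -q -
git branch -q -D spike/feature

# 4. Refactor the old design toward the one the spike revealed
echo "new design" > core.txt
git commit -q -am "refactor: move to design learned from spike"

# 5. Implement the feature on top of the new design
echo "feature" >> core.txt
git commit -q -am "feat: implement feature"
```

Deleting the spike branch outright is the point: nothing from the prototype's code survives, only the knowledge of what the optimal design looks like.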

The main idea here is that no refactoring makes sense or is possible without a feature or goal. The final solution is only clear once the goal is achieved. The refactoring helps to implement a known feature, so it has to be done before the feature (not after, as TDD suggests).

Some steps can be omitted with experience. A common task may already have an example, eliminating the need for a prototype. But the general approach stays the same: only refactor with a clear vision of the final state, not of a state "before the feature".

answered Feb 8 at 8:16
27
  • 4
    Downvote is mine, but I only partially agree with this. Refactoring, in my case, happens because I am looking at a piece of code before making changes, and it's too convoluted or deeply nested. Commented Feb 8 at 14:56
  • 1
    @RohitGupta why are you looking at it? Stop if it does not help to implement a feature. Commented Feb 8 at 14:57
  • 2
    "refactoring does not exist"... looks at commit log Commented Feb 8 at 16:09
  • 1
    "The main idea here is - no refactoring makes sense or is possible without a feature or goal." If your codebase was built from scratch with every merge into its trunk leaving no technical debt behind, this statement is correct. However, that scenario is naively idealistic. Even if you never choose to ignore technical debt, you simply cannot guarantee that developers will never make an oversight or need to dramatically change tack due to a new tech stack, a new paradigm, or changed business requirements; at which point this answer's idealism becomes unhelpful. Commented Feb 9 at 23:37
  • 1
    @Basilevs Stating that (a) a codebase can have preexisting debt and (b) refactoring as an independent task is impossible is contradictory. Technical debt is defined as knowing that a current implementation could be done better - the existence of technical debt implies that you could (if you spent the effort) work it away. Not all technical debt is an imperfect implementation of a feature/requirement. Some technical debt accrues as the tech stack evolves and shifts the baseline on what counts as good practice, syntax, and appropriate patterns. Commented Feb 10 at 10:56
