We have a large Git repo. The files are a mix of styles: tabs vs. spaces, Unix vs. Windows line endings, JSON formatting styles, etc.
We'd like to change every file to be consistent.
However, if I make the change, git blame will show me as the last person to modify every file in the repo. This would be literally true, but misleading to future modifiers of the file. If we did all the changes in a special account (fixer) we'd have a similar problem.
How do you suggest we make such a global change?
- By the way, you don't need to actually create an account for this. If you use the --author option to git commit, you can pass in any name you want. It doesn't have to match any existing account. Your own name will still show up in the 'committer' header of the commit, which is separate from the 'author' header. – bdsl, Apr 11, 2019 at 16:48
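As a rough illustration of the --author suggestion in the comment above, here is a minimal Python sketch that only builds the command line and does not run it. The helper name `commit_command`, the author name `fixer`, and the email address are placeholders made up for this example; `--author` and `-m` are the real git commit options.

```python
import shlex


def commit_command(author_name, author_email, message):
    """Build the argv for a commit attributed to an arbitrary author.

    git expects the author in the form "Name <email>". The committer
    header is still taken from your own git configuration.
    """
    author = f"{author_name} <{author_email}>"
    return ["git", "commit", f"--author={author}", "-m", message]


# Hypothetical values: no account named "fixer" needs to exist anywhere.
cmd = commit_command("fixer", "fixer@example.invalid", "Reformat all files")

# Print a shell-quoted version that could be pasted into a terminal.
print(shlex.join(cmd))
```

Building the argv as a list (rather than one string) avoids quoting mistakes if the command is later passed to `subprocess.run`.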
3 Answers
There are two problems to consider:
- How do you review such a commit?
- How do you attribute it?
It is clear that the commit will be so massive that it is practically impossible to review. What you should do instead is use an automated tool to make the change, and put the exact tool used, the version number, and the configuration file / command line arguments into the commit message. That way, everybody can check out the old version of the code, run the tool, and verify that the output of the tool is identical with the new version of the code.
This already allows you to verify that there are no functional code changes hidden inside the massive commit.
And verifying that the style change didn't break any code is now shifted from having to review the massive commit to having to review the quality of the tool used, and that the configuration and command line arguments are correct.
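The check described above (re-run the tool on the old version and compare with the committed result) could be sketched roughly as follows. This is an illustration under assumptions, not a real tool: `normalize` is a stand-in for whatever formatter you actually use, and the trees are plain dictionaries of path to file content rather than real checkouts.

```python
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Stand-in formatter (assumption): LF line endings, tabs -> 4 spaces."""
    return text.replace("\r\n", "\n").replace("\t", "    ")


def mismatches(
    old_tree: Dict[str, str],
    new_tree: Dict[str, str],
    formatter: Callable[[str], str],
) -> List[str]:
    """Return paths where formatter(old content) differs from the new content.

    Files that exist in only one of the two trees are also reported,
    because a pure re-format should neither add nor remove files.
    """
    paths = sorted(set(old_tree) | set(new_tree))
    return [
        p
        for p in paths
        if p not in old_tree
        or p not in new_tree
        or formatter(old_tree[p]) != new_tree[p]
    ]


old = {"a.py": "def f():\r\n\treturn 1\r\n", "b.txt": "x\ty\n"}
new = {"a.py": "def f():\n    return 1\n", "b.txt": "x    y\n"}
# Same commit, but with a functional change smuggled into a.py.
tampered = {**new, "a.py": "def f():\n    return 2\n"}

print(mismatches(old, new, normalize))       # → []
print(mismatches(old, tampered, normalize))  # → ['a.py']
```

An empty list means the commit contains exactly what the tool produces and nothing else.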
If you have good test coverage, you can temporarily duplicate your tests and your code, so you have two copies of the test suite (one in the old style, and one re-formatted) and two copies of the production code. Then, you run all four combinations of test suite vs. production code. All four results should be identical; if they are, you can be reasonably sure that the re-format broke neither your tests nor your production code.
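The four-way comparison can be pictured with a small sketch. Everything here is a hypothetical stand-in: `add_old` / `add_new` play the role of the production code before and after re-formatting, and `suite_old` / `suite_new` play the role of the two copies of the test suite. In a real project each cell would be a full test run.

```python
from itertools import product


def run_matrix(suites, builds):
    """Run every test suite against every build.

    Returns a dict mapping (suite name, build name) to that run's result.
    """
    return {
        (suite_name, build_name): suite(build)
        for (suite_name, suite), (build_name, build)
        in product(suites.items(), builds.items())
    }


# "Production code" before and after re-formatting (hypothetical).
def add_old(a,b): return a+b


def add_new(a, b):
    return a + b


# "Test suites" before and after re-formatting (hypothetical).
def suite_old(add): return (add(1,2)==3, add(-1,1)==0)


def suite_new(add):
    return (add(1, 2) == 3, add(-1, 1) == 0)


results = run_matrix(
    {"old tests": suite_old, "new tests": suite_new},
    {"old code": add_old, "new code": add_new},
)

# The re-format is plausible only if all four cells agree.
all_identical = len(set(results.values())) == 1
print(len(results), all_identical)
```

Comparing the results with each other, rather than only checking that each run passes, also catches a re-format that silently disables a test.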
For the second problem, there is unfortunately no better solution than what you and Robin already proposed: create a special user and attribute the commit to them.
You could get fancy, and try to break up the commit so that you attribute every re-formatted line to the author of the original line, but there are several problems with that:
- What if, during re-formatting, two lines of two authors get merged into one?
- What if, during re-formatting, one line gets split into two?
- It is actually a lie, because that person did not write that piece of code, a program did.
- Noting, of course, that Google internally uses an automated formatting tool for check-ins and even there with all the resources at their disposal it reportedly still requires some small amount of hand-waving occasionally. I really like your note about test coverage! – Patrick Hughes, Apr 11, 2019 at 18:25
Almost 3 years ago, I made a massive formatting change in one of our repos. In the time since, I have received exactly one question prompted by my name being all over the place in git blame. That's because the usual second step after seeing who made a change is looking up why they made the change, which brings them to a massive pull request with only whitespace and punctuation changes.
In other words, you should be fine with however you choose to attribute this change in git. People are generally smart enough to differentiate the substantial changes.
- Most GUI Git wrappers let you continue the blame past a given commit in one or a few clicks. – max630, Apr 14, 2019 at 16:40
If you can allow it to be "eventually consistent" instead of updating the whole repo in a single "transaction", what I suggest is that you add an .editorconfig file to the root of your repo.
Most IDEs and editors support .editorconfig these days and will enforce its rules as each file is edited and saved.
.editorconfig supports defining line endings, tabs vs. spaces, indentation size, trimming of spaces at the end of a line, a newline at the end of the file, the character set, and different settings for each file extension. If you use Visual Studio, it has an endless list of other settings.
I suggest that you add an .editorconfig to your repo in any case, just to make sure that your defined style stays consistent in the future.
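To make the effect of a few of these settings concrete, here is a minimal Python sketch of what an editor does on save when the corresponding EditorConfig properties (`end_of_line`, `trim_trailing_whitespace`, `insert_final_newline`) are enabled. The function `apply_rules` is a hypothetical helper written for this illustration; it is not part of EditorConfig or of any editor, and real editors may differ in edge cases.

```python
def apply_rules(
    text,
    end_of_line="\n",
    trim_trailing_whitespace=True,
    insert_final_newline=True,
):
    """Apply three EditorConfig-style whitespace rules to a file's text."""
    # Work on LF internally, whatever the input used (CRLF, CR, or LF).
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")

    if trim_trailing_whitespace:
        # Remove spaces and tabs at the end of every line.
        lines = [line.rstrip(" \t") for line in lines]

    body = "\n".join(lines)

    if insert_final_newline and body and not body.endswith("\n"):
        # Make sure a non-empty file ends with exactly one line terminator
        # more than it had; existing blank lines are left alone.
        body += "\n"

    # Convert to the requested line ending last.
    return body.replace("\n", end_of_line)


print(repr(apply_rules("a  \r\nb\t")))                # → 'a\nb\n'
print(repr(apply_rules("a\n", end_of_line="\r\n")))  # → 'a\r\n'
```

Because the rules are applied only when a file is saved, files that nobody touches keep their old style, which is the "eventually consistent" behaviour described above.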
- +1 for the .editorconfig suggestion in general, but with the caveat that this will reformat during day-to-day functional changes, so diffs and pull requests will include style changes along with functional changes. That can make diffs harder to review for correctness. – Jules Dupont, Apr 17, 2019 at 1:39
- @JulesDupont That's true, but I usually ignore whitespace in my diffs, so it isn't a big issue. Some of the C#-specific formatting rules could be, though. – lpacheco, Apr 17, 2019 at 15:51