homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: difflib.unified_diff(...) produces invalid patches
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: trentm Nosy List: TBBle, abacabadabacaba, benjamin.peterson, eric.araujo, georg.brandl, jwilk, meatballhat, puxlit, techtonik, thatiparthy, tim.peters, trentm, ysj.ray
Priority: normal Keywords: patch

Created on 2008年02月18日 20:16 by trentm, last changed 2022年04月11日 14:56 by admin.

Files
File name Uploaded Description Edit
python_difflib_unified_diff.patch trentm, 2008年02月18日 20:24 patch against the Python 2.6 svn trunk to fix this
python_difflib_no_eol.patch trentm, 2009年05月26日 16:50 patch (against trunk, currently 2.7) to fix and with test cases
issue_2142_trunk.diff ysj.ray, 2010年05月14日 08:54 patch against the trunk.
issue_2142_py3k.diff ysj.ray, 2010年05月14日 08:54 patch against py3k
issue_2142_py3k_updated.diff ysj.ray, 2010年07月20日 04:22 updated patch against py3k
issue_2142.diff ysj.ray, 2011年03月10日 07:03
issue_2142_2.diff ysj.ray, 2011年03月15日 05:55 review
Messages (23)
msg62543 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008年02月18日 20:16
When comparing content with difflib, if the resulting diff covers the
last line of one or both of the inputs that that line doesn't end with
an end-of-line character(s), then the generated diff lines don't include
an EOL. Fair enough.
Naive (and I suspect typical) usage of difflib.unified_diff(...) is:
 diff = ''.join(difflib.unified_diff(...))
This results in an *incorrect* unified diff for the conditions described
above.
>>> from difflib import *
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
... "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three+trois
The proper behaviour would be:
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
... "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three
\ No newline at end of file
+trois
\ No newline at end of file
I *believe* that "\ No newline at end of file" are the appropriate
markers -- that tools like "patch" will know how to use. At least this
is what "svn diff" generates.
I'll try to whip up a patch. 
Do others concur that this should be fixed?
msg62544 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008年02月18日 20:24
Attached is a patch against the Python 2.6 svn trunk for this.
msg62545 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008年02月18日 20:25
At a glance I suspect this patch will work back to Python 2.3 (when
difflib.unified_diff() was added). I haven't looked at the Py3k tree yet.
Note: This *may* also applied to difflib.context_diff(), but I am not sure.
msg88375 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2009年05月26日 16:50
Here is a new patch that also fixes the same issue in
difflib.context_diff() and adds a couple test cases.
msg105636 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年05月13日 14:55
Closed 8702 as a duplicate of this one. Combining nosy lists.
msg105638 - (view) Author: anatoly techtonik (techtonik) Date: 2010年05月13日 16:01
You mean that this patch sits in here for 2 years already? This suxx.
And nobody added the tag 'easy', because people haven't explicitly requested tracker privileges? That suxx. 
And it won't be in 2.7 because of that.. I am disappointed.
Why this issue is not labeled as 'difflib' related? How are people supposed to find this one?
msg105639 - (view) Author: anatoly techtonik (techtonik) Date: 2010年05月13日 16:03
Trent, the "\ New line..." seems to be feature specific to unified format only. http://en.wikipedia.org/wiki/Diff#Unified_format 
msg105647 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年05月13日 17:20
Hmm. Not in my version of diff: it seems to use '\ No newline at end of file' for both unified and 'normal' diffs.
$ diff --version
diff (GNU diffutils) 2.8.1
Copyright (C) 2002 Free Software Foundation, Inc.
...
$ diff 1.txt 2.txt
2c2
< two
\ No newline at end of file
---
> two
msg105692 - (view) Author: ysj.ray (ysj.ray) Date: 2010年05月14日 08:54
This is really a bug, but why it's not fixed during such a long time?
Since trentm's python_difflib_no_eol.patch patch failed against the current trunk, I modified it to work again, also a patch against py3k.
msg105693 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年05月14日 09:13
I think it's arguable whether this is a bug or not. There's no official specification for the unified diff format that I can find anywhere; the GNU description at
http://www.gnu.org/software/hello/manual/diff/Detailed-Unified.html#Detailed-Unified
doesn't mention this detail. The '\ No newline at end of file' is actually unnecessary for these Python functions, since they operate on lists and produce a generator, so it would be needless complication. And changing this might break existing Python code that manually parses the output of unified_diff or context_diff and doesn't know what to do with a leading '\' character. (Does such Python code exist? I don't know.)
I'd suggest adding a keyword argument to the unified_diff and context_diff functions to enable this feature, leaving it disabled by default.
msg105762 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年05月14日 21:08
Marking as a feature request: I think this is somewhere on the line between bugfix and new feature, but either way 2.7 is too close to release to start messing with something that isn't really all that broken.
For 2.7 (and 2.6, 3.1), there's also the option of updating the documentation to point out this issue. It would also be worth updating the command-line-interface example in the docs to do this post-processing (i.e., for each line that doesn't end in a newline, add '\n\ No newline at end of file') before calling writelines.
msg110778 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010年07月19日 16:43
The two latest patch files were missing TestDifflib from run_unittest. Having shown that the tests failed if test_difflib only was patched, then patch difflib against the 3.1 maintainance and py3k branches, the former passed ok and the latter failed. Could someone please take a look, at the same time considering Mark D's suggestion about using a keyword argument to the unified_diff and context_diff functions.
msg110864 - (view) Author: ysj.ray (ysj.ray) Date: 2010年07月20日 04:22
Firstly, since this is marked as feature request, I updated the patch against py3k, to fix the problems found by BreamoreBoy.
msg110866 - (view) Author: ysj.ray (ysj.ray) Date: 2010年07月20日 05:10
I feel adding a keyword argument to deal with such a case is not quiet worthy. I'd prefer not fix it for 2.x, and make its behavior consist with svn diff and other similar tools, as trentm said, "tools like "patch" will know how to use."
msg118125 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2010年10月07日 17:43
c.f. some discussion on python-dev here:
http://mail.python.org/pipermail/python-dev/2010-October/104501.html 
msg123163 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010年12月03日 02:42
To sum up the current consensus as I understand it, the missing end of line is clearly a bug, but the addition of "\No newline at end of file" is controversial. In the current patch, both are changed, so backporting only the first fix to 3.1 and 2.7 will require either agreeing to add "\No newline etc" in those versions too, or make another patch for 3.1/2.7.
I uploaded the patch to http://codereview.appspot.com/3432041/ and made some minor comments.
msg130491 - (view) Author: ysj.ray (ysj.ray) Date: 2011年03月10日 07:03
Updated patch which can apply to current py3k cleanly and with changes follow eric's review comments.
msg130677 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011年03月12日 14:49
Patch does not look bad. I re-read the discussion on python-dev, where it was suggested to add a keyword argument to get the old behavior. Have you considered it?
msg130683 - (view) Author: ysj.ray (ysj.ray) Date: 2011年03月12日 15:28
> I re-read the discussion on python-dev, where it was suggested to add a keyword argument to get the old behavior. Have you considered it?
IIUC, at the time of that discusstion, 3.2 is pre-beta so the suitable option is to add "\No newline etc" by default and add a keyword to get the old behavior, but now it is for 3.3. I wonder if 3.3 is a chance to totally get rid of the *OLD WRONG* behavior.
msg130801 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011年03月14日 10:51
I’m not sure whether the compatibility policy would prevent that. Aren’t there some cases with iterators (not strings) that work fine today without the change? If so, it should be possible to get the same behavior.
msg130959 - (view) Author: ysj.ray (ysj.ray) Date: 2011年03月15日 05:55
Yes there may be.
Here is the updated patch:
Add a new keyword argument to context_diff() and unified_diff() named "warn_no_newline_at_end". If true, emit "\ No newline etc". It defaults to True, set it to false to get the old behavior. I'm not sure if this name is too long, but I haven't got a better name.
msg236541 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015年02月24日 20:26
Can we have a review on the updated patch please.
msg384192 - (view) Author: Paul "TBBle" Hampson (TBBle) Date: 2021年01月01日 17:46
I just bounced off this issue and proposed a work-around for it in Black (https://github.com/psf/black/pull/1897).
Since it wasn't mentioned here earlier, the `\` marker _is_ documented in the GNU Diffutils documentation, under "Incomplete Lines" (https://www.gnu.org/software/diffutils/manual/html_node/Incomplete-Lines.html#Incomplete-Lines), and applies to both Context and Normal diff output formats.
History
Date User Action Args
2022年04月11日 14:56:30adminsetgithub: 46395
2021年02月18日 19:42:20djcsetnosy: - djc
2021年02月18日 17:52:07mark.dickinsonsetnosy: - mark.dickinson
2021年01月01日 17:46:45TBBlesetnosy: + TBBle
messages: + msg384192
2020年10月28日 11:35:02jwilksetnosy: + jwilk
2018年10月23日 11:55:15thatiparthysetnosy: + thatiparthy
2018年10月11日 17:57:24BreamoreBoysetnosy: - BreamoreBoy
2018年10月11日 13:04:53puxlitsetnosy: + puxlit
2015年11月04日 08:54:37abacabadabacabasetnosy: + abacabadabacaba
2015年02月24日 20:26:42BreamoreBoysetnosy: + BreamoreBoy

messages: + msg236541
versions: + Python 3.5, - Python 3.3
2011年03月15日 05:55:06ysj.raysetfiles: + issue_2142_2.diff
nosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130959
2011年03月14日 10:51:08eric.araujosetnosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130801
2011年03月12日 15:28:19ysj.raysetnosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130683
2011年03月12日 14:49:48eric.araujosetnosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130677
versions: + Python 3.3, - Python 3.2
2011年03月10日 07:03:32ysj.raysetfiles: + issue_2142.diff
nosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130491
2010年12月03日 02:42:53eric.araujosetnosy: + georg.brandl, benjamin.peterson, djc, eric.araujo, - BreamoreBoy
messages: + msg123163
2010年10月07日 17:43:11trentmsetmessages: + msg118125
2010年10月06日 22:15:40trentmsetassignee: trentm
2010年07月20日 05:10:47ysj.raysetmessages: + msg110866
2010年07月20日 04:22:01ysj.raysetfiles: + issue_2142_py3k_updated.diff

messages: + msg110864
2010年07月19日 16:43:05BreamoreBoysetnosy: + BreamoreBoy
messages: + msg110778
2010年05月14日 21:08:47mark.dickinsonsettype: behavior -> enhancement
messages: + msg105762
versions: - Python 2.6, Python 3.1, Python 2.7
2010年05月14日 09:13:55mark.dickinsonsetnosy: + tim.peters
messages: + msg105693
2010年05月14日 08:54:51ysj.raysetfiles: + issue_2142_py3k.diff
2010年05月14日 08:54:18ysj.raysetfiles: + issue_2142_trunk.diff
nosy: + ysj.ray
messages: + msg105692

2010年05月14日 01:43:38meatballhatsetnosy: + meatballhat
2010年05月13日 17:20:18mark.dickinsonsetmessages: + msg105647
2010年05月13日 16:03:08techtoniksetmessages: + msg105639
2010年05月13日 16:01:19techtoniksetmessages: + msg105638
title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff) -> difflib.unified_diff(...) produces invalid patches
2010年05月13日 14:55:34mark.dickinsonsetversions: + Python 2.7, Python 3.2
2010年05月13日 14:55:19mark.dickinsonsetnosy: + mark.dickinson, techtonik
messages: + msg105636
2010年05月13日 14:54:50mark.dickinsonlinkissue8702 superseder
2009年05月26日 16:50:14trentmsetfiles: + python_difflib_no_eol.patch

title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char -> naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff)
messages: + msg88375
stage: test needed -> patch review
2009年05月12日 14:09:32ajaksu2setstage: test needed
versions: + Python 3.1, - Python 2.5, Python 2.4, Python 2.3
2008年02月19日 09:09:53christian.heimessetpriority: normal
keywords: + patch
2008年02月18日 20:25:22trentmsetmessages: + msg62545
2008年02月18日 20:24:08trentmsetfiles: + python_difflib_unified_diff.patch
messages: + msg62544
versions: + Python 2.6, Python 2.4, Python 2.3
2008年02月18日 20:16:55trentmcreate

AltStyle によって変換されたページ (->オリジナル) /