[Python-checkins] cpython: #14332: provide a better explanation of junk in difflib docs

andrew.kuchling python-checkins at python.org
Wed Mar 19 21:44:26 CET 2014


http://hg.python.org/cpython/rev/0a69b1e8b7fe
changeset: 89861:0a69b1e8b7fe
user: Andrew Kuchling <amk at amk.ca>
date: Wed Mar 19 16:43:06 2014 -0400
summary:
 #14332: provide a better explanation of junk in difflib docs
Initial patch by Alba Magallanes.
files:
 Doc/library/difflib.rst | 14 +++++++++++---
 Lib/difflib.py | 26 +++++++++++++-------------
 2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -27,7 +27,9 @@
 little fancier than, an algorithm published in the late 1980's by Ratcliff and
 Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
 find the longest contiguous matching subsequence that contains no "junk"
- elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
+ elements; these "junk" elements are ones that are uninteresting in some
+ sense, such as blank lines or whitespace. (Handling junk is an
+ extension to the Ratcliff and Obershelp algorithm.) The same
 idea is then applied recursively to the pieces of the sequences to the left and
 to the right of the matching subsequence. This does not yield minimal edit
 sequences, but does tend to yield matches that "look right" to people.
@@ -210,7 +212,7 @@
 Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
 delta (a :term:`generator` generating the delta lines).
 
- Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+ Optional keyword parameters *linejunk* and *charjunk* are filtering functions
 (or ``None``):
 
 *linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@
 *charjunk*: A function that accepts a character (a string of length 1), and
 returns if the character is junk, or false if not. The default is module-level
 function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
- blank or tab; note: bad idea to include newline in this!).
+ blank or tab; it's a bad idea to include newline in this!).
 
 :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
 
@@ -624,6 +626,12 @@
 length 1), and returns true if the character is junk. The default is ``None``,
 meaning that no character is considered junk.
 
+ These junk-filtering functions speed up matching to find
+ differences and do not cause any differing lines or characters to
+ be ignored. Read the description of the
+ :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+ parameter for an explanation.
+
 :class:`Differ` objects are used (deltas generated) via a single method:
 
 
diff --git a/Lib/difflib.py b/Lib/difflib.py
--- a/Lib/difflib.py
+++ b/Lib/difflib.py
@@ -853,10 +853,9 @@
 and return true iff the string is junk. The module-level function
 `IS_LINE_JUNK` may be used to filter out lines without visible
 characters, except for at most one splat ('#'). It is recommended
- to leave linejunk None; as of Python 2.3, the underlying
- SequenceMatcher class has grown an adaptive notion of "noise" lines
- that's better than any static definition the author has ever been
- able to craft.
+ to leave linejunk None; the underlying SequenceMatcher class has
+ an adaptive notion of "noise" lines that's better than any static
+ definition the author has ever been able to craft.
 
 - `charjunk`: A function that should accept a string of length 1. The
 module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@
 Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
 
 Optional keyword parameters `linejunk` and `charjunk` are for filter
- functions (or None):
+ functions, or can be None:
 
- - linejunk: A function that should accept a single string argument, and
+ - linejunk: A function that should accept a single string argument and
 return true iff the string is junk. The default is None, and is
- recommended; as of Python 2.3, an adaptive notion of "noise" lines is
- used that does a good job on its own.
+ recommended; the underlying SequenceMatcher class has an adaptive
+ notion of "noise" lines.
 
- - charjunk: A function that should accept a string of length 1. The
- default is module-level function IS_CHARACTER_JUNK, which filters out
- whitespace characters (a blank or tab; note: bad idea to include newline
- in this!).
+ - charjunk: A function that accepts a character (string of length
+ 1), and returns true iff the character is junk. The default is
+ the module-level function IS_CHARACTER_JUNK, which filters out
+ whitespace characters (a blank or tab; note: it's a bad idea to
+ include newline in this!).
 
 Tools/scripts/ndiff.py is a command-line front-end to this function.
 
@@ -1680,7 +1680,7 @@
 tabsize -- tab stop spacing, defaults to 8.
 wrapcolumn -- column number where lines are broken and wrapped,
 defaults to None where lines are not wrapped.
- linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+ linejunk,charjunk -- keyword arguments passed into ndiff() (used by
 HtmlDiff() to generate the side by side HTML differences). See
 ndiff() documentation for argument default values and descriptions.
 """
-- 
Repository URL: http://hg.python.org/cpython
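
Illustration (not part of the changeset): the note added to difflib.rst says the junk-filtering functions affect how matches are found but do not cause differing lines or characters to be ignored. The sketch below tries to show both halves of that claim. The first part reuses the strings from the find_longest_match documentation; the sample lists `a` and `b` and all variable names are assumptions made up for this example.

```python
import difflib

# Part 1: junk changes which block is chosen as the longest match.
# With no junk function, the leading blank takes part in the match.
plain = difflib.SequenceMatcher(None, " abcd", "abcd abcd")
plain_match = plain.find_longest_match(0, 5, 0, 9)

# With blanks declared junk, the match is anchored on "abcd" instead.
no_blanks = difflib.SequenceMatcher(lambda ch: ch == " ", " abcd", "abcd abcd")
junk_match = no_blanks.find_longest_match(0, 5, 0, 9)

print(plain_match)  # Match(a=0, b=4, size=5)
print(junk_match)   # Match(a=1, b=0, size=4)

# Part 2: a junk line that differs is still reported in the delta.
# `a` and `b` are hypothetical inputs; the blank line exists only in `a`.
a = ["one\n", "\n", "two\n"]
b = ["one\n", "two\n"]
delta = list(difflib.ndiff(a, b, linejunk=difflib.IS_LINE_JUNK))
print("".join(delta), end="")

# Both inputs can be recovered from the delta, so nothing was dropped.
restored_a = list(difflib.restore(delta, 1))
restored_b = list(difflib.restore(delta, 2))
```

The blank line shows up in the delta as a "- " entry even though IS_LINE_JUNK classifies it as junk, and difflib.restore() rebuilds both inputs from the delta, which is consistent with the wording added in this changeset.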

