[Python-Dev] Triple-quoted strings and indentation

Andrew Durdin adurdin at gmail.com
Wed Jul 6 11:45:52 CEST 2005


Here's the draft PEP I wrote up:
Abstract
 Triple-quoted string (TQS henceforth) literals in Python preserve
 the formatting of the literal string including newlines and 
 whitespace. When a programmer desires no leading whitespace for 
 the lines in a TQS, he must align all lines but the first in the 
 first column, which differs from the syntactic indentation when a 
 TQS occurs within an indented block. This PEP addresses this 
 issue.
Motivation
 TQS's are generally used in two distinct manners: as multiline 
 text used by the program (typically command-line usage information 
 displayed to the user) and as docstrings.
 Here's a hypothetical but fairly typical example of a TQS as a 
 multiline string:
 
     if not interactive_mode:
         if not parse_command_line():
             print """usage: UTIL [OPTION] [FILE]...
     try `util -h' for more information."""
             sys.exit(1)

 Here the second line of the TQS begins in the first column, which 
 at a glance appears to occur after the close of both "if" blocks.
 This results in a discrepancy between how the code is parsed and 
 how the reader initially sees it, forcing the reader over the 
 mental hurdle of realising that the call to sys.exit() is actually 
 within the second "if" block.
 
 Docstrings on the other hand are usually indented to be more 
 readable, which causes them to have extraneous leading whitespace 
 on most lines. To counteract the problem, PEP 257 [1] specifies a 
 standard algorithm for trimming this whitespace.
 
 In the end, the programmer is left with a dilemma: either to align 
 the lines of his TQS to the first column, and sacrifice readability;
 or to indent it to be readable, but have to deal with unwanted
 whitespace.
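
 To make the dilemma concrete, here is a small sketch in current
 Python 3. The function names are hypothetical. It shows the unwanted
 whitespace left by an indented TQS, and one existing way to remove
 it at run time with textwrap.dedent from the standard library. That
 workaround is not part of this proposal, which moves the trimming
 into the parser.

```python
import textwrap


def raw_usage():
    # Indented for readability: the second line keeps its 4 leading spaces.
    return """usage: UTIL [OPTION] [FILE]...
    try `util -h' for more information."""


def dedented_usage():
    # Run-time workaround: start the text on its own line (escaped newline)
    # so that every line shares a common margin for dedent() to remove.
    return textwrap.dedent("""\
        usage: UTIL [OPTION] [FILE]...
        try `util -h' for more information.""")
```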
 This PEP proposes that TQS's should have a certain amount of 
 leading whitespace trimmed by the parser, thus avoiding the 
 drawbacks of the current behaviour.
 
Specification
 Leading whitespace in TQS's will be dealt with in a similar manner 
 to that proposed in PEP 257:
 
 "... strip a uniform amount of indentation from the second
 and further lines of the [string], equal to the minimum 
 indentation of all non-blank lines after the first line. Any 
 indentation in the first line of the [string] (i.e., up to 
 the first newline) is insignificant and removed. Relative 
 indentation of later lines in the [string] is retained."
 Note that a line within the TQS that is entirely blank or consists 
 only of whitespace will not count toward the minimum indent, and will 
 be retained as a blank line (possibly with some trailing whitespace).
 
 There are several significant differences between this proposal and
 PEP 257's docstring parsing algorithm:
 
 * This proposal considers all lines to end at the next newline in
 the source code (whether escaped or not); PEP 257's algorithm
 only considers lines to end at the next (necessarily unescaped)
 newline in the parsed string.
 
 * Only literal whitespace is counted; an escape such as \x20 
 will not be counted as indentation.
 
 * Tabs are not converted to spaces.
 
 * Blank lines at the beginning and end of the TQS will *not* be 
 stripped.
 
 * Leading whitespace on the first line is preserved, as is 
 trailing whitespace on all lines.
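
 The rules above can be sketched in Python as follows. This is a
 minimal illustration under stated assumptions, not the
 implementation mentioned later in this draft. The name
 trim_tqs_source is hypothetical. The function works on the raw text
 between the triple quotes, before escape processing, and measures
 indentation by counting literal space and tab characters without
 expanding tabs. It also assumes, following Example 3 below, that the
 source line holding the closing triple quote counts toward the
 minimum even when only whitespace precedes the quote.

```python
def trim_tqs_source(raw):
    """Strip the common indent from every source line after the first.

    `raw` is the literal source text between the triple quotes
    (an assumption of this sketch; escapes are not yet processed).
    """
    lines = raw.split("\n")
    first, rest = lines[0], lines[1:]

    def indent_of(line):
        # Only literal spaces and tabs count; tabs are not expanded.
        return len(line) - len(line.lstrip(" \t"))

    # Blank or whitespace-only lines are ignored, except the last one:
    # in the source that line also holds the closing triple quote.
    counted = [
        indent_of(line)
        for i, line in enumerate(rest)
        if line.strip(" \t") or i == len(rest) - 1
    ]
    margin = min(counted) if counted else 0

    # The first line is kept as written; trailing whitespace is untouched.
    return "\n".join([first] + [line[margin:] for line in rest])
```

 A string with any later line in the first column has a margin of
 zero and comes back unchanged, which is the backward-compatible
 case discussed below.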
Rationale
 I considered several different ways of determining
 the amount of whitespace to be stripped, including:
 
 1. Determined by the column (after allowing for expanded tabs) of 
 the triple-quote:
 
     myverylongvariablename = """\
                                 This line is indented,
                              But this line is not.
                              Note the trailing newline:
                              """
 
 + Easily allows all lines to be indented.
 
 - Easily leads to problems due to re-alignment of all but 
 first line when mixed tabs and spaces are used.
 
 - Forces programmers to use a particular level of 
 indentation for continuing TQS's.
 
 - Unclear whether the lines should align with the triple-
 quote or immediately after it.
 - Not backward compatible with most non-docstrings.
 2. Determined by the indent level of the second line of the 
 string:
 
     myverylongvariablename = """\
         This line is not indented (and has no leading newline),
             But this one is.
         Note the trailing newline:
         """
 
 + Allows for flexible alignment of lines.
 
 + Mixed tabs and spaces should be fine (as long as they're 
 consistent).
 
 - Cannot support an indent on the second line of the 
 string (very bad!).
 
 - Not backward compatible with most non-docstrings.
 
 3. Determined by the minimum indent level of all lines after the 
 first:
 
     myverylongvariablename = """\
             This line is indented,
         But this line is not.
         Note the trailing newline:
         """
 
 + Allows for flexible alignment of lines.
 
 + Mixed tabs and spaces should be fine (as long as they're 
 consistent).
 + Backward compatible with all docstrings and a majority of 
 non-docstrings.
 
 - Support for indentation on all lines not immediately 
 obvious.
 Overall, solution 3 provided the best balance of features, and 
 (importantly) had the best backward compatibility. I thus
 consider it the most suitable.
Examples
 The examples here are set out in pairs: the first of each pair 
 shows how the TQS must be currently written to avoid indentation 
 issues; the second shows how it can be written using this proposal 
 (although some variation is possible). All examples are taken or 
 adapted from the Python standard library or another real source.
 
 1. Command-line usage information:
     def usage(outfile):
         outfile.write("""Usage: %s [OPTIONS] <file> [ARGS]
     Meta-options:
     --help       Display this help then exit.
     --version    Output version information then exit.
     """ % sys.argv[0])
 #------------------------#
 
     def usage(outfile):
         outfile.write("""Usage: %s [OPTIONS] <file> [ARGS]
             Meta-options:
             --help       Display this help then exit.
             --version    Output version information then exit.
             """ % sys.argv[0])
 2. Embedded Python code:
             self.runcommand("""if 1:
             import sys as _sys
             _sys.path = %r
             del _sys
             \n""" % (sys.path,))
 #------------------------#
             self.runcommand("""\
                 if 1:
                     import sys as _sys
                     _sys.path = %r
                     del _sys
                 \n""" % (sys.path,))
 3. Unit testing
 
     class WrapTestCase(BaseTestCase):
         def test_subsequent_indent(self):
             # Test subsequent_indent parameter
             expect = '''\
       * This paragraph will be filled, first
         without any indentation, and then
         with some (including a hanging
         indent).'''
             result = fill(self.text, 40,
                           initial_indent="  * ",
                           subsequent_indent="    ")
             self.check(result, expect)
 #------------------------#
 
     class WrapTestCase(BaseTestCase):
         def test_subsequent_indent(self):
             # Test subsequent_indent parameter
             expect = '''\
                 * This paragraph will be filled, first
                   without any indentation, and then
                   with some (including a hanging
                   indent).\
               '''
             result = fill(self.text, 40,
                           initial_indent="  * ",
                           subsequent_indent="    ")
             self.check(result, expect)
 Example 3 illustrates how indentation of all lines (by 2 spaces) 
 is achieved with this proposal: the position of the closing 
 triple quote is used to determine the minimum indentation for the 
 whole string. To avoid a trailing newline in the string, the 
 final newline is escaped. Example 2 avoids the need for this 
 construction by placing the first line (which is not indented) on 
 the line after the triple-quote, and escaping the leading 
 newline.
Backwards Compatibility
 Uses of TQS's fall into two broad categories: those where 
 indentation is significant, and those where it is not. Those in 
 the latter (larger) category, which includes all docstrings, will 
 remain effectively unchanged under this proposal. Docstrings in 
 particular are usually trimmed according to the rules in PEP 257 
 before their value is used; the trimmed strings will be the same 
 under this proposal as they are now.
 
 Of the former category, the majority are those which have at least 
 one line beginning in the first column of the source code; these 
 will be entirely unaffected if left alone, but may be reformatted 
 to increase readability (see example 1 above). However, a small 
 number of strings in this first category depend on all lines (or 
 all but the first) being indented. Under this proposal, these 
 will need to be edited to ensure that the intended amount of 
 whitespace is preserved. Examples 2 and 3 above show two 
 different ways to reformat the strings for these cases. Note that 
 in both examples, the overall indentation of the code is cleaner, 
 producing more readable code.
 
 Some evidence may be desired to support the claims made above 
 regarding the distribution of the different uses of TQS's. I have 
 begun some analysis to produce some statistics for these; while 
 still incomplete, I have some initial results for the Python 2.4.1 
 standard library (these figures should not be off by more than a 
 small margin):
 
 In the standard library (some 396,598 lines of Python code), there 
 are 7,318 occurrences of TQS's, an average rate of one per 54 
 lines. Of these, 6,638 (90.7%) are docstrings; the remaining 680 
 (9.3%) are not. A further examination shows that only 64 of the 
 7,318 (0.9%) have leading indentation on all lines (the only case 
 where the proposed solution is not backward compatible). These 
 must be manually checked to determine whether they will be 
 affected; such a check reveals only 7-15 TQS's (0.1%-0.2% of the 
 total) that actually need to be edited.
 
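
 A count of this kind could be approximated with the standard
 tokenize module. The sketch below is not the analysis that produced
 the figures above; it is an illustration written for current
 Python 3, and the name indented_tqs is hypothetical. It only flags
 candidate TQS's whose lines after the first are all indented. It
 does not separate docstrings from other strings, and each hit would
 still need the manual check described above.

```python
import io
import tokenize


def indented_tqs(source):
    """Return (line_number, literal) for TQS's with every later line indented."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Drop any string prefix (r, u, b, f) before looking at the quotes.
        body = tok.string.lstrip("rRuUbBfF")
        if not body.startswith(('"""', "'''")):
            continue
        # Lines after the first; the last one still contains the closing
        # quotes, so it is never blank and always takes part in the check.
        later = [ln for ln in tok.string.split("\n")[1:] if ln.strip()]
        if later and all(ln[0] in " \t" for ln in later):
            found.append((tok.start[0], tok.string))
    return found
```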
 Although small, the impact of this proposal on compatibility is 
 still more than negligible; if accepted in principle, it might be 
 better suited to be initially implemented as a __future__ feature, 
 or perhaps relegated to Python 3000.
 
Implementation
 An implementation for this proposal has been made; however I have 
 not yet made a patch file with the changes, nor do the changes yet 
 extend to the documentation or other affected areas.
References
 [1] PEP 257, Docstring Conventions, David Goodger, Guido van Rossum
 http://www.python.org/peps/pep-0257.html
Copyright
 This document has been placed in the public domain.

