[Python-checkins] peps: incorporated %a comments; general clean-up

Tue Mar 25 23:33:59 CET 2014

http://hg.python.org/peps/rev/829c7e796eb8
changeset: 5429:829c7e796eb8
user: Ethan Furman <ethan at stoneleaf.us>
date: Tue Mar 25 15:33:49 2014 -0700
summary:
 incorporated %a comments; general clean-up
files:
 pep-0461.txt | 91 +++++++++++++++++++++------------------
 1 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/pep-0461.txt b/pep-0461.txt
--- a/pep-0461.txt
+++ b/pep-0461.txt
@@ -8,8 +8,8 @@
 Content-Type: text/x-rst
 Created: 13-Jan-2014
 Python-Version: 3.5
-Post-History: 14-Jan-2014, 15-Jan-2014, 17-Jan-2014, 22-Feb-2014
-Resolution: 
+Post-History: 14-Jan-2014, 15-Jan-2014, 17-Jan-2014, 22-Feb-2014, 25-Mar-2014
+Resolution:
 
 
 Abstract
@@ -40,13 +40,8 @@
 restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
 writing new wire format code, and in porting Python 2 wire format code.
 
-
-Overriding Principles
-=====================
-
-In order to avoid the problems of auto-conversion and Unicode exceptions
-that could plague Python 2 code, ``str`` objects will not be supported as
-interpolation values [4]_ [5]_.
+Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
+formats, and ``FTP`` and ``HTTP`` communications, among many others.
 
 
 Proposed semantics for ``bytes`` and ``bytearray`` formatting
@@ -57,23 +52,31 @@
 
 All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
 ``%g``, etc.) will be supported, and will work as they do for str, including
-the padding, justification and other related modifiers.
+the padding, justification and other related modifiers. The only difference
+will be that the results from these codes will be ASCII-encoded text, not
+Unicode. In other words, for any numeric formatting code ``%x``::
 
-Example::
+ b"%x" % val
+
+is equivalent to::
+
+ ("%x" % val).encode("ascii")
+
+Examples::
 
 >>> b'%4x' % 10
 b'   a'
 
- >>> '%#4x' % 10
+ >>> b'%#4x' % 10
 b' 0xa'
 
- >>> '%04X' % 10
+ >>> b'%04X' % 10
 b'000A'
 
 ``%c`` will insert a single byte, either from an ``int`` in range(256), or from
 a ``bytes`` argument of length 1, not from a ``str``.
 
-Example::
+Examples::
 
 >>> b'%c' % 48
 b'0'
@@ -81,7 +84,9 @@
 >>> b'%c' % b'a'
 b'a'
 
-``%s`` is restricted in what it will accept::
+``%s`` is included for two reasons: 1) ``b`` is already a format code for
+``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
+code uses ``%s``; however, it is restricted in what it will accept::
 
 - input type supports ``Py_buffer`` [6]_?
 use it to collect the necessary bytes
@@ -89,40 +94,46 @@
 - input type is something else?
 use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``
 
+In particular, ``%s`` will not accept numbers (use a numeric format code for
+that), nor ``str`` (encode it to ``bytes``).
+
 Examples::
 
 >>> b'%s' % b'abc'
 b'abc'
 
+ >>> b'%s' % 'some string'.encode('utf8')
+ b'some string'
+
 >>> b'%s' % 3.14
 Traceback (most recent call last):
 ...
- TypeError: 3.14 has no __bytes__ method, use a numeric code instead
+ TypeError: b'%s' does not accept numbers, use a numeric code instead
 
 >>> b'%s' % 'hello world!'
 Traceback (most recent call last):
 ...
- TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?
+ TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
+
+
+``%a`` will call ``ascii()`` on the interpolated value. This is intended
+as a debugging aid, rather than something that should be used in production.
+Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
+representation. Use cases include developing a new protocol and writing
+landmarks into the stream; debugging data going into an existing protocol
+to see if the problem is the protocol itself or bad data; a fall-back for a
+serialization format; or even a rudimentary serialization format when
+defining ``__bytes__`` would not be appropriate [8]_.
 
 .. note::
 
- Because the ``str`` type does not have a ``__bytes__`` method, attempts to
- directly use ``'a string'`` as a bytes interpolation value will raise an
- exception. To use strings they must be encoded or otherwise transformed
- into a ``bytes`` sequence::
-
- 'a string'.encode('latin-1')
-
-``%a`` will call ``ascii()`` on the interpolated value's ``repr()``.
-This is intended as a debugging aid, rather than something that should be used
-in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn``
-representation.
+ If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
 
 
 Unsupported codes
 -----------------
 
-``%r`` (which calls ``__repr__`` and returns a '`str`') is not supported.
+``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
 
 
 Proposed variations
@@ -131,6 +142,9 @@
 It was suggested to let ``%s`` accept numbers, but since numbers have their own
 format codes this idea was discarded.
 
+It has been suggested to use ``%b`` for bytes as well as ``%s``. This was
+rejected as not adding any value either in clarity or simplicity.
+
 It has been proposed to automatically use ``.encode('ascii','strict')`` for
 ``str`` arguments to ``%s``.
 
@@ -171,19 +185,11 @@
 ``bytes`` and ``bytearray`` already have several methods which assume an ASCII
 compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
 just a few. %-interpolation, with its very restricted mini-language, will not
-be any more of a nuisance than the already existing methdods.
+be any more of a nuisance than the already existing methods.
 
-
-Open Questions
-==============
-
-It has been suggested to use ``%b`` for bytes as well as ``%s``.
-
- - Pro: clearly says 'this is bytes'; should be used for new code.
-
- - Con: does not exist in Python 2.x, so we would have two ways of doing the
- same thing, ``%s`` and ``%b``, with no difference between them.
-
+Some have objected to allowing the full range of numeric formatting codes with
+the claim that decimal alone would be sufficient. However, at least two
+formats (dbf and pdf) make use of non-decimal numbers.
 
 
 Footnotes
@@ -197,8 +203,7 @@
 .. [6] http://docs.python.org/3/c-api/buffer.html
 examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
 .. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
-.. [8] mainly implicit encode/decode, with intermittent errors when the data
- was not ASCII compatible
+.. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
 
 
 Copyright
-- 
Repository URL: http://hg.python.org/peps
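
The semantics described in the diff above can be checked with a short
script. This is only a sketch, assuming an interpreter that implements
bytes %-interpolation (Python 3.5 or later). The helper names
``numeric_equivalent`` and ``rejects_percent_s`` are made up for
illustration and are not part of the PEP. The exact ``TypeError`` wording
in a real interpreter may differ from the draft text, so only the
exception type is checked.

```python
def numeric_equivalent(fmt, val):
    """Format with str, then ASCII-encode, as the PEP's equivalence states."""
    return (fmt % val).encode("ascii")


# Numeric codes: bytes formatting should match str formatting + ASCII encoding.
for fmt, val in [("%4x", 10), ("%#4x", 10), ("%04X", 10), ("%e", 1.5)]:
    assert fmt.encode("ascii") % val == numeric_equivalent(fmt, val)

# %c takes an int in range(256) or a bytes object of length 1.
single_from_int = b"%c" % 48
single_from_bytes = b"%c" % b"a"

# %s takes bytes-like objects; a str has to be encoded by the caller first.
passthrough = b"%s" % b"abc"
encoded_first = b"%s" % "some string".encode("utf8")


def rejects_percent_s(value):
    """Return True if b'%s' refuses the value with a TypeError."""
    try:
        b"%s" % value
    except TypeError:
        return True
    return False


# %a interpolates ascii(value); a str keeps its surrounding quotes.
ascii_repr = b"%a" % "caf\u00e9"

print(single_from_int, single_from_bytes, passthrough, encoded_first)
print(rejects_percent_s(3.14), rejects_percent_s("hello world!"))
print(ascii_repr)
```

Passing numbers or ``str`` to ``%s`` is expected to fail, which is why the
helper catches ``TypeError`` instead of comparing message text.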