Re: [Python-Dev] PEP 460 reboot

Tue, 14 Jan 2014 09:32:44 -0800

Brett,
I like your proposal. There is one idea I have that could,
perhaps, improve it:
1. "%s" and "{}" will continue to work for bytes and bytearray in
the following fashion:
 - Check whether the argument supports __bytes__/Py_buffer.
 - If it does, check that the bytes are strictly in the printable
  ASCII subset (a-z, A-Z, 0-9 + special symbols like ! etc).
  Throw an error if the check fails; otherwise, concatenate.
 - If it does not, try str(), and do ".encode('ascii', 'strict')" on the result.
This way *most* of the use cases of python2 will be covered without
touching the code. So:
 - b'Hello {}'.format('world')
  will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
 - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError
 - b'Status: {}'.format(200)
  will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
 - b'Hello %s' % ('world',) - the same as the first example
 - b'Connection: {}'.format(b'keep-alive') - works
 - b'Hello %s' % (b'\xce\x94',) - will fail, not in the ASCII subset we accept
I think it's OK to check the buffers for the ASCII subset only. Yes, it
will cost some performance, but then, it's quite rare that string
formatting is used to concatenate huge buffers.
2. New operators {!b} and %b. These will just use '__bytes__' and
Py_buffer.
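
To make point 1 concrete, here is a rough sketch of the conversion rule in
pure Python. It is only an illustration, not a proposed implementation: the
names to_ascii_bytes and bytes_format are hypothetical, only bare "{}" fields
are handled, ValueError is an arbitrary choice for "throw an error", and
counting the space character as part of the accepted subset is my assumption.

```python
import string

# Accepted subset: ASCII letters, digits, punctuation, plus space (assumption).
_ALLOWED = frozenset(
    (string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii")
)


def to_ascii_bytes(value):
    """Hypothetical helper: convert one argument following point 1."""
    data = None
    try:
        # Py_buffer support: bytes, bytearray, memoryview, ...
        data = bytes(memoryview(value))
    except TypeError:
        # No buffer; fall back to __bytes__ if the type defines it.
        to_bytes = getattr(type(value), "__bytes__", None)
        if to_bytes is not None:
            data = to_bytes(value)
    if data is not None:
        # Bytes-like input must stay inside the accepted ASCII subset.
        if not _ALLOWED.issuperset(data):
            raise ValueError("bytes argument is outside the accepted ASCII subset")
        return data
    # Everything else goes through str() and a strict ASCII encode.
    return str(value).encode("ascii", "strict")


def bytes_format(template, *args):
    """Hypothetical stand-in for bytes.format(); supports bare '{}' only."""
    parts = template.split(b"{}")
    if len(parts) - 1 != len(args):
        raise ValueError("number of '{}' fields and arguments differ")
    out = [parts[0]]
    for arg, tail in zip(args, parts[1:]):
        out.append(to_ascii_bytes(arg))
        out.append(tail)
    return b"".join(out)


print(bytes_format(b"Status: {}", 200))
print(bytes_format(b"Connection: {}", b"keep-alive"))
```

With this sketch, passing '\u0394' raises UnicodeEncodeError from the strict
encode, and passing b'\xce\x94' is rejected by the subset check, matching the
examples listed under point 1.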
-- 
Yury Selivanov
On January 14, 2014 at 11:31:51 AM, Brett Cannon ([email protected]) wrote:
> 
> On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum wrote:
> 
> > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote:
> > > I have been going on the assumption that bytes.format() would change
> > > what '{}' meant for itself and would only interpolate bytes. That
> > > convenient between Python 2 and 3 since it represents what we want it
> > > to (str and bytes under the hood, respectively), so it just falls
> > > through. We could also add a 'b' conversion for bytes() explicitly so
> > > as to help people not accidentally mix up things in bytes.format() and
> > > str.format(). But I was not suggesting adding a specific format spec
> > > for bytes but instead making bytes.format() just do the
> > > .encode('ascii') automatically to help with compatibility when a
> > > format spec was present. If people want fancy formatting for bytes
> > > they can always do it themselves before calling bytes.format().
> >
> > This seems hastily written (e.g. verb missing :-), and I'm not clear
> > on what you are (or were) actually proposing. When exactly would
> > bytes.format() need .encode('ascii')?
> >
> > I would be happy to wait a few hours or days for you to write it up
> > clearly, rather than responding in a hurry.
> 
> 
> Sorry about that. Busy day at work + trying to stay on top of this entire
> conversation was a bit tough. Let me try to lay out what I'm suggesting for
> bytes.format() in terms of how it changes
> http://docs.python.org/3/library/string.html#format-string-syntax for bytes.
> 
> 1. New conversion operator of 'b' that operates as PEP 460 specifies (i.e.
> tries to get a buffer, else calls __bytes__). The default conversion
> changes from 's' to 'b'.
> 2. Use of the conversion field adds a step of calling
> str.encode('ascii', 'strict') on the result returned from calling
> __format__().
> 
> That's it. So point 1 means that the following would work in Python 3.5::
> 
> b'Hello, {}, how are you?'.format(b'Guido')
> b'Hello, {!b}, how are you?'.format(b'Guido')
> 
> It would produce an error if you used a text argument for 'Guido' since str
> doesn't define __bytes__ or a buffer. That gives the EIBTI group their
> bytes.format() where nothing magical happens.
> 
> For point 2, let's say you have the following in Python 2::
> 
> 'I have {} bottles of beer on the wall'.format(10)
> 
> Under my proposal, how would you change it to get the same result in Python
> 2 and 3?::
> 
> b'I have {:d} bottles of beer on the wall'.format(10)
> 
> In Python 2 you're just being more explicit about the format, otherwise
> it's the same semantics as today. In Python 3, though, this would translate
> into (under the hood)::
> 
> b'I have {} bottles of beer on the wall'.format(format(10,
> 'd').encode('ascii', 'strict'))
> 
> This leads to the same bytes value in Python 2 (since it's just a string)
> and in Python 3 (as everything accepted by bytes.format() is either bytes
> already or converted to ASCII bytes by encoding). While Python 2 users
> would need to make sure they used a format spec to get the same result in
> both Python 2 and 3 for ASCII bytes, it's a minor change which also makes
> the format more explicit so it's not an inherently bad thing. And for those
> that don't want to utilize the automatic ASCII encoding they can just not
> use a format spec in the format string and just pass in bytes directly
> (i.e. call __format__() themselves and then call str.encode() on their
> own). So PBP people get to have a simple way to use bytes.format() in
> Python 2 and 3 when dealing with things that can be represented as ASCII
> (just as the bytes methods allow for currently).
> 
> I think this covers your desire to have numbers and anything else that can
> be represented as ASCII be supported for easy porting while covering my
> desire that any automatic encoding is clearly explicit in the format string
> and in no way special-cased for only some types (the introduction of a 'c'
> converter from PEP 460 is also fine with me).
> 
> How you would want to translate this proposal with the % operator I'm not
> sure since it has been quite a while since I last seriously used it and so
> I don't think I'm in a good position to propose a shift for it.
> _______________________________________________
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev 
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com 
> 
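
For comparison, the two points in Brett's quoted proposal can be sketched the
same way. Again this is only an illustration under assumptions: convert_b and
format_field are hypothetical names, and I read point 2 through his {:d}
example, i.e. the ASCII encode happens only when a format spec is present.

```python
def convert_b(value):
    """Hypothetical 'b' conversion: try a buffer first, else call __bytes__."""
    try:
        return bytes(memoryview(value))
    except TypeError:
        pass
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        raise TypeError(
            "%s supports neither the buffer protocol nor __bytes__"
            % type(value).__name__
        )
    return to_bytes(value)


def format_field(value, spec=None):
    """Hypothetical handling of a single replacement field."""
    if spec is None:
        # Point 1: plain '{}' or '{!b}' only ever interpolates bytes.
        return convert_b(value)
    # Point 2: with a format spec, format as text and encode strictly as ASCII.
    return format(value, spec).encode("ascii", "strict")


print(b"Hello, " + format_field(b"Guido") + b", how are you?")
print(b"I have " + format_field(10, "d") + b" bottles of beer on the wall")
```

The visible difference from my suggestion is that format_field('Guido') and
format_field(10) both raise TypeError here, because no encoding happens unless
the format string asks for it.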