[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Tue Feb 14 08:09:55 CET 2006

On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
> bytes(map(ord, str_or_unicode))
>
> In other words, without an encoding, bytes() should simply treat str and
> unicode objects *as if they were a sequence of integers*, and produce an
> error when an integer is out of range. This is a logical and consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.

If you're talking about "raw data", then make bytes(unicodestring)
produce what buffer(unicodestring) currently does -- something
completely and utterly worthless. :) [Its output depends on whether
Python was compiled with UCS-2 or UCS-4 unicode and on the endianness
of your system.]
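
To make that build dependence concrete, here is a minimal sketch. It
assumes a modern Python 3 interpreter (the thread itself is about
Python 2.x) and does not call buffer() at all: the explicit UTF-16 and
UTF-32 codecs are used as stand-ins for the internal layouts that a
UCS-2 or UCS-4 build would expose on little- and big-endian machines.
The sample string and the layout labels are made up for illustration.

```python
# Assumption: Python 3 syntax; explicit codecs stand in for the raw
# memory that buffer(unicodestring) would expose on different builds.
text = "hi\u20ac"  # 'h', 'i', EURO SIGN

# Four plausible "raw data" views of the same three characters.
layouts = {
    "UCS-2, little-endian": text.encode("utf-16-le"),
    "UCS-2, big-endian": text.encode("utf-16-be"),
    "UCS-4, little-endian": text.encode("utf-32-le"),
    "UCS-4, big-endian": text.encode("utf-32-be"),
}

for name, raw in layouts.items():
    # Print each layout as hex so the differences are easy to compare.
    print(f"{name:22} {raw.hex()}")
```

All four byte strings differ, so "just the raw data" is not a single
well-defined answer for a unicode string.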

There really is no case where you don't care about the encoding: there
is always a specific desired output encoding, and you have to think
about what encoding that is. The argument that latin-1 is a sensible
default just because you can convert to latin-1 by chopping off the
upper bytes of a unicode character's ordinal is not convincing; you're
still doing an encoding operation, it just happens to be
computationally easy. That Jython programs have to pretend that
unicode strings are an appropriate way to store bytes, and thus often
have to do fake "latin-1" conversions which are really no such thing,
doesn't make a convincing argument either. Using unicode strings to
store bytes read from or written to a socket is really just broken.
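
A small sketch of why the "sequence of integers" reading is latin-1 in
disguise. It assumes Python 3, where bytes(iterable_of_ints) exists and
rejects values outside range(256); the helper name ordinals_to_bytes
and the sample strings are hypothetical, not anything proposed in the
thread.

```python
def ordinals_to_bytes(text):
    """Treat each character as an integer, as in bytes(map(ord, s))."""
    return bytes(map(ord, text))


accented = "caf\u00e9"  # every code point is below 256
euro = "\u20ac"         # code point 0x20AC, far above 255

# For code points below 256 the result equals the latin-1 encoding...
same_as_latin1 = ordinals_to_bytes(accented) == accented.encode("latin-1")

# ...and differs from another perfectly reasonable choice, UTF-8.
differs_from_utf8 = ordinals_to_bytes(accented) != accented.encode("utf-8")

# Out-of-range ordinals fail, just as the latin-1 codec does.
try:
    ordinals_to_bytes(euro)
    ordinal_error = None
except ValueError as exc:
    ordinal_error = type(exc).__name__

try:
    euro.encode("latin-1")
    codec_error = None
except UnicodeEncodeError as exc:
    codec_error = type(exc).__name__

print(same_as_latin1, differs_from_utf8, ordinal_error, codec_error)
```

So omitting the encoding does not avoid choosing one; it only hides
which one was chosen.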

Actually, having any default encoding at all is IMO a poor idea, but
as Python has one at the moment (ascii), we might as well keep using it
for consistency until it's eliminated (sys.setdefaultencoding('undefined')
is my friend).
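
For illustration only, a sketch of how the "ascii" and "undefined"
codecs behave when named explicitly. It assumes a Python 3 interpreter,
where the conversions are spelled out rather than applied implicitly
through a default encoding; encode_or_none is a hypothetical helper.

```python
def encode_or_none(text, encoding):
    """Return the encoded bytes, or None if the codec refuses the text."""
    try:
        return text.encode(encoding)
    except UnicodeError:
        return None


# "ascii" accepts only code points below 128.
plain = encode_or_none("abc", "ascii")
accented = encode_or_none("caf\u00e9", "ascii")

# "undefined" refuses every conversion, which turns any reliance on a
# default encoding into an immediate error.
refused = encode_or_none("abc", "undefined")

print(plain, accented, refused)
```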
James