[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Michael Foord fuzzyman at voidspace.org.uk
Tue Jun 28 19:22:38 CEST 2011


On 28/06/2011 18:06, Terry Reedy wrote:
> On 6/28/2011 10:46 AM, Paul Moore wrote:
>> I use Windows, and come from the UK, so 99% of my text files are
>> ASCII. So the majority of my code will be unaffected. But in the
>> occasional situation where I use a £ sign, I'll get encoding errors,
>
> I do not understand this. With utf-8 you would never get a string
> encoding error.
>
I assumed he meant that files written out as utf-8 by python would then 
be read in using the platform encoding (i.e. not utf-8 on Windows) by 
the other applications he is inter-operating with. The error would not 
be in Python but in those applications.
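To make both failure modes concrete, here is a minimal sketch. It assumes cp1252 as a stand-in for the Windows "ANSI" platform encoding. ASCII-only text produces identical bytes in UTF-8 and cp1252, so only characters such as £ expose the mismatch: UTF-8 output read as cp1252 turns into mojibake in the other application, and cp1252 input read as UTF-8 raises an error inside Python.

```python
# Assumption: cp1252 stands in for the Windows "ANSI" platform encoding.
ascii_text = "Total: 10 GBP"
pound_text = "Total: £10"

# ASCII-only text encodes to identical bytes under both codecs,
# so most files never notice which default is used.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")  # True

# Python writes UTF-8: "£" (U+00A3) becomes the two bytes C2 A3.
utf8_bytes = pound_text.encode("utf-8")

# Another program reads those bytes as cp1252 and shows mojibake;
# nothing fails inside Python itself.
misread = utf8_bytes.decode("cp1252")  # 'Total: Â£10'

# The reverse direction: a cp1252 file stores "£" as the single byte A3,
# which is not valid UTF-8, so a UTF-8 reader raises an error.
try:
    pound_text.encode("cp1252").decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc.reason  # 'invalid start byte'

print(same_bytes, decode_error)  # True invalid start byte
```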

>> where currently things will "just work".
>
> As long as you only use the machine-dependent restricted character set.
>

Which is the situation he is describing. You do go into those details
below, and which choice is "correct" depends on which trade-off you want
to make.

For the sake of backwards compatibility we are probably stuck with the
current trade-off, however, unless we deprecate using open(...) without
an explicit encoding.
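Passing an explicit encoding removes the dependence on the platform default. The sketch below shows the kind of deprecation mentioned above as a hypothetical wrapper, `open_text`, that warns whenever a text-mode open relies on the default reported by `locale.getpreferredencoding(False)`. The wrapper name and the choice of `DeprecationWarning` are illustrative assumptions, not an existing API.

```python
import locale
import os
import tempfile
import warnings


def open_text(path, mode="r", encoding=None, **kwargs):
    """Hypothetical wrapper: warn when text-mode open() relies on the platform default."""
    if encoding is None and "b" not in mode:
        warnings.warn(
            "open() without an explicit encoding uses the platform default "
            f"({locale.getpreferredencoding(False)!r})",
            DeprecationWarning,
            stacklevel=2,
        )
    return open(path, mode, encoding=encoding, **kwargs)


path = os.path.join(tempfile.mkdtemp(), "prices.txt")

# Explicit encoding: the same bytes are written on every platform.
with open_text(path, "w", encoding="utf-8") as f:
    f.write("£10\n")

with open_text(path, encoding="utf-8") as f:
    text = f.read()  # '£10\n'

# Omitting the encoding triggers the (hypothetical) warning; opening
# alone decodes nothing, so no read error can occur here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    open_text(path).close()

print(caught[0].category.__name__)  # DeprecationWarning
```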

All the best,

Michael

>> And the failures will be data dependent, and hence intermittent
>> (the worst type of problem).
>
> That is the situation now, with platform/machine dependencies added in.
> Some people share code with other machines, even locally.
>
>> So, in effect, you propose making the default favour writing
>> multiplatform portable code at the expense of quick and dirty scripts?
>
> Let us frame it another way. Should Python installations be compatible
> with other Python installations, or with the other apps on the same 
> machine? Part of the purpose of Python is to cover up platform 
> differences, to the extent possible (and perhaps sensible -- there is 
> the argument). This was part of the purpose of writing our own io 
> module instead of using the compiler stdlib. The evolution of floating 
> point math has gone in the same direction. For instance, float now 
> expects uniform platform-independent Python-dependent names for 
> infinity and nan instead of compiler-dependent names.
>
> As for practicality: Notepad++ on Windows offers ANSI, utf-8 (with or
> without BOM) and utf-16 (big or little endian). I believe that ODF
> documents are utf-8 encoded xml (compressed or not). My original claim
> for this proposal was/is that even Windows apps are moving to utf-8 and
> that someday making that the default for Python everywhere will be the
> obvious and sensible thing.
>
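Two concrete points in the quoted message can be checked in Python 3 with a small sketch. First, `float()` accepts the platform-independent spellings "inf", "infinity" and "nan" (case-insensitive, optionally signed). Second, for UTF-8 files saved with or without a BOM, as Notepad++ offers, the standard `utf-8-sig` codec strips a leading BOM if present, while plain `utf-8` keeps it as U+FEFF.

```python
import codecs
import math

# float() accepts these spellings, case-insensitive and optionally signed.
for name in ("inf", "-Infinity", "NaN"):
    print(name, "->", float(name))
# inf -> inf
# -Infinity -> -inf
# NaN -> nan

assert math.isinf(float("-Infinity")) and math.isnan(float("NaN"))

# UTF-8 text as an editor might save it, with and without a BOM (EF BB BF).
with_bom = codecs.BOM_UTF8 + "£10".encode("utf-8")
without_bom = "£10".encode("utf-8")

# The plain "utf-8" codec keeps a BOM as the character U+FEFF ...
plain = with_bom.decode("utf-8")               # '\ufeff£10'
# ... while "utf-8-sig" strips a leading BOM and also accepts files without one.
sig_with = with_bom.decode("utf-8-sig")        # '£10'
sig_without = without_bom.decode("utf-8-sig")  # '£10'
```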
-- 
http://www.voidspace.org.uk/
May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


More information about the Python-Dev mailing list