[Python-Dev] PEP 263 - default encoding

Guido van Rossum guido@python.org
15 Mar 2002 14:39:05 -0500


> a. Does this really make sense for UTF-16? It looks to me like a
> great way to induce bugs of the form "write a unicode literal
> containing 0x0A, then translate it to raw form by stripping the u
> prefix."

Of course not. I don't expect anyone to put UTF-16 in their source
encoding cookie. But should we bother making a list of encodings that
shouldn't be used?
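To make the hazard in (a) concrete, here is a minimal sketch. It assumes present-day Python 3 rather than the interpreter discussed in this thread, and the character U+010A is picked purely for illustration. It shows that the UTF-16 form of an ordinary character can contain the byte 0x0A, which byte-oriented tools would read as a line feed.

```python
# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE), chosen only as an example.
ch = "\u010a"

utf16 = ch.encode("utf-16-le")  # little-endian, no byte order mark
utf8 = ch.encode("utf-8")

# The UTF-16 bytes include 0x0A, which looks like a newline to any
# tool that scans the file byte by byte.
has_newline_utf16 = b"\n" in utf16
has_newline_utf8 = b"\n" in utf8

print(utf16, has_newline_utf16)
print(utf8, has_newline_utf8)
```

UTF-8 avoids this because every byte of a multi-byte sequence has the high bit set, so it can never collide with an ASCII control character.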

> b. No editor is likely to implement correct display to distinguish
> between u"" and just "".

That's fine. Given phase 2, the editor should display the entire file
using the encoding given in the cookie, even though phase 1 applies
the encoding only to u"" literals. The rest of the file is
supposed to be ASCII, and if it isn't, that's the user's problem.
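As a rough illustration of the whole-file (phase 2) behaviour, the sketch below assumes present-day Python 3, where the cookie governs how the entire source is decoded and every string literal is Unicode. The file name "<cookie-demo>" and the sample text are made up for the example.

```python
# Bytes of a tiny source file saved as Latin-1, with a coding cookie
# on the first line. 0xE9 is the Latin-1 byte for U+00E9.
source_bytes = (
    "# -*- coding: latin-1 -*-\n"
    "s = 'caf\u00e9'\n"
).encode("latin-1")

# compile() accepts bytes and uses the cookie to decode them.
namespace = {}
exec(compile(source_bytes, "<cookie-demo>", "exec"), namespace)

decoded = namespace["s"]
print(decoded == "caf\u00e9")
```

An editor that honours the same cookie shows the same characters the compiler sees, which is the point being made about display.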

> c. This definitely breaks Emacs coding cookie semantics. Emacs
> applies the coding cookie to the whole buffer. I don't see a way to
> lose offhand, but this is sufficiently subtle that I don't want to
> break my head trying to prove that you can't lose, either.

I wouldn't worry about that; see above.

> d. You probably have to deprecate ISO 2022 7-bit coding systems, too,
> because people will try to get the representation of a string by
> inputting a raw string in coded form. This might contain a quote
> character.

Good point. This sounds like a documentation issue at worst.
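The concern in (d) can be seen with a small sketch, again assuming present-day Python 3 and its "iso2022_jp" codec. The character U+3042 is chosen only because its 7-bit encoding happens to include the ASCII double-quote byte.

```python
# U+3042 (HIRAGANA LETTER A), chosen only as an example.
ch = "\u3042"

encoded = ch.encode("iso2022_jp")

# ISO 2022 7-bit encodings reuse the ASCII byte range for multi-byte
# characters, so a quote byte can show up inside a character.
contains_quote = b'"' in encoded
all_seven_bit = all(byte < 0x80 for byte in encoded)

print(encoded, contains_quote, all_seven_bit)
```

A tokenizer that looks for the closing quote before decoding would end the literal in the middle of that character.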

> e. This causes problems for UTF-8 transition, since people will want
> to put arbitrary byte strings in a raw string.

I'm not sure I understand. What do you call a raw string? Do you
mean an r"" literal? Why would people want to use that for arbitrary
binary data? Arbitrary binary data should *always* be encoded using
\xDD hex or \OOO octal escapes.
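A minimal sketch of the escape approach, assuming present-day Python 3, where byte strings are written with a b prefix (in the Python of this thread a plain "" literal was a byte string). The byte values are arbitrary examples.

```python
# The same three bytes written with hex escapes and with octal escapes.
data_hex = b"\xff\x00\x80"
data_oct = b"\377\000\200"

same_bytes = data_hex == data_oct
values = list(data_hex)

# The literal as it appears in the source file is pure ASCII, so it is
# valid under any ASCII-compatible source encoding, including UTF-8.
literal_text = 'b"\\xff\\x00\\x80"'
literal_is_ascii = literal_text.encode("ascii").decode("ascii") == literal_text

print(same_bytes, values, literal_is_ascii)
```

Because the escapes are ASCII, the source file stays legal whatever cookie it carries, while the resulting bytes are not valid UTF-8 and do not need to be.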

> But these will not be
> legal UTF-8 files, even though they have a UTF-8 coding cookie.
> People who are trying to do the right thing will have the rules
> changed again later, most likely.

If you're trying to do the right thing you shouldn't be putting
arbitrary binary data in any string literal.

> This means that until editors reliably implement b. and similar
> features, developers must change coding systems to type raw strings
> and Unicode strings.

Sounds like a YAGNI to me.

--Guido van Rossum (home page: http://www.python.org/~guido/)
