Message 142177
| Author |
pitrou |
| Recipients |
Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner |
| Date |
2011-08-16 09:18:45 |
| Message-id |
<1313486199.3542.3.camel@localhost.localdomain> |
| In-reply-to |
<1313485930.8.0.601749695449.issue10542@psf.upfronthosting.co.za> |
| Content |
> I think the 4 macros:
> #define _Py_UNICODE_ISSURROGATE
> #define _Py_UNICODE_ISHIGHSURROGATE
> #define _Py_UNICODE_ISLOWSURROGATE
> #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.
I don't want to bikeshed, but can we have proper, consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(and likewise for the other three macros).
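
For illustration, a minimal sketch of how the four macros might look with
underscore-separated names. The macro bodies are an assumption based on the
standard UTF-16 surrogate ranges (U+D800..U+DBFF high, U+DC00..U+DFFF low);
they are not taken from the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical definitions: names follow the word separation proposed
   above, bodies assume the standard UTF-16 surrogate ranges. */
#define _Py_UNICODE_IS_SURROGATE(ch) \
    (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define _Py_UNICODE_IS_HIGH_SURROGATE(ch) \
    (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define _Py_UNICODE_IS_LOW_SURROGATE(ch) \
    (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into one code point in U+10000..U+10FFFF.
   The caller is expected to have checked both halves first. */
#define _Py_UNICODE_JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) & 0x03FF) << 10) | \
                ((uint32_t)(low) & 0x03FF)))
```

With these definitions, joining the pair 0xD83D, 0xDE00 gives U+1F600.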
> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
>
> They will also be used in many str methods and afaiu PEP 393 should
> address that. I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.
AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form. |
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2011-08-16 09:18:46 | pitrou | set | recipients:
+ pitrou, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, vstinner, eric.smith, stutzbach, ezio.melotti, tchrist |
| 2011-08-16 09:18:45 | pitrou | link | issue10542 messages |
| 2011-08-16 09:18:45 | pitrou | create |
|