Message121582
Author: vstinner
Recipients: amaury.forgeotdarc, belopolsky, ezio.melotti, vstinner
Date: 2010-11-20 00:15:55
SpamBayes Score: 2.0913382e-11
Marked as misclassified: No
Message-id: <201011200115.49177.victor.stinner@haypocalc.com>
In-reply-to: <AANLkTimBUsS3VQixbG-NLHP_PrDvK7jxHZQUfmqBCvam@mail.gmail.com>
Content:
On Friday 19 November 2010 21:58:25 you wrote:
> > I chose to use ASCII instead of UTF-8, because a UTF-8 decoder is long
> > (210 lines) and complex (see PyUnicode_DecodeUTF8Stateful()), whereas
> > ASCII decoding is just "unicode_char = (Py_UNICODE)byte;" plus an if
> > beforehand to check that 0 <= byte <= 127.
>
> I don't think we need 210 lines to replace "*s++ = *f" with proper
> UTF-8 logic. Even if we do, the code can be shared with
> PyUnicode_DecodeUTF8 and a UTF-8 iterator may be a welcome addition to
> Python C API.
Why should we do that? An ASCII-only format is just fine. Remember that
PyUnicode_FromFormatV() is part of the C API: I don't think that anyone would
use a non-ASCII format string in C. If someone does, (s)he should open a new
issue for that :-) But I don't think that we should make the code more complex
for no benefit.
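As a rough illustration of the ASCII fast path quoted above (a stand-alone sketch, not CPython's actual implementation; `unicode_char_t` and `decode_ascii` are hypothetical stand-ins for Py_UNICODE and the inlined loop):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE. */
typedef uint32_t unicode_char_t;

/* Decode a NUL-terminated byte string as ASCII into `out`
 * (capacity `n` characters). Returns the number of characters
 * written, or -1 if a byte falls outside the range 0..127. */
static ptrdiff_t
decode_ascii(const char *s, unicode_char_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n && s[i] != '\0'; i++) {
        unsigned char byte = (unsigned char)s[i];
        if (byte > 127)                   /* the "if beforehand" check */
            return -1;
        out[i] = (unicode_char_t)byte;    /* unicode_char = (Py_UNICODE)byte; */
    }
    return (ptrdiff_t)i;
}
```

This is the whole complexity being weighed against a 210-line stateful UTF-8 decoder: a range check and a widening cast per byte.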
Victor