Author vstinner
Recipients ezio.melotti, jcea, loewis, skrah, vstinner
Date 2011-12-10.13:13:49
Message-id <4EE35B06.7060300@haypocalc.com>
In-reply-to <1323465152.13.0.29944008784.issue13570@psf.upfronthosting.co.za>
Content
On 09/12/2011 22:12, Stefan Krah wrote:
> The bottleneck in _decimal is (res is ascii):
>
> PyUnicode_FromString(res);
>
> PyUnicode_DecodeASCII(res) has the same performance.
>
>
> With this function ...
>
> static PyObject*
> unicode_fromascii(const char* s, Py_ssize_t size)
> {
>     PyObject *res;
>     res = PyUnicode_New(size, 127);
>     if (!res)
>         return NULL;
>     memcpy(PyUnicode_1BYTE_DATA(res), s, size);
>     return res;
> }
>
> ... I get the same performance as with Python 2.7 (5.85s)!
The problem is that unicode_fromascii() is unsafe: it doesn't check 
that the string is pure ASCII. That's why this function is private.
Because of PEP 383, the ASCII and UTF-8 decoders (PyUnicode_DecodeASCII 
and PyUnicode_FromString) first have to scan the input for errors, and 
only then do a fast memcpy. The scanner of these two decoders is 
already optimized to process the input string word by word (word = the C 
long type) instead of byte by byte, using a bit mask.
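As a rough illustration of such a word-by-word scan, here is a minimal sketch in plain C. It is not the actual CPython scanner: the names is_pure_ascii and ASCII_HIGH_BITS are made up for this example, and each word is loaded with memcpy so that the sketch does not depend on the alignment of the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mask with the high bit of every byte set (0x8080...80 for the width
   of unsigned long).  (unsigned long)-1 / 0xFF yields 0x0101...01. */
#define ASCII_HIGH_BITS (((unsigned long)-1 / 0xFF) * 0x80)

/* Hypothetical helper: return 1 if s[0..size-1] contains only bytes
   in the range 0x00..0x7F, 0 otherwise. */
static int
is_pure_ascii(const char *s, size_t size)
{
    size_t i = 0;

    assert(s != NULL || size == 0);

    /* Fast path: test sizeof(unsigned long) bytes per iteration. */
    while (i + sizeof(unsigned long) <= size) {
        unsigned long word;
        memcpy(&word, s + i, sizeof(word));
        if (word & ASCII_HIGH_BITS)
            return 0;           /* some byte has its high bit set */
        i += sizeof(word);
    }

    /* Tail: fewer bytes than one word remain, check them one by one. */
    for (; i < size; i++) {
        if ((unsigned char)s[i] & 0x80)
            return 0;
    }
    return 1;
}
```

Only once a check like this has succeeded is the plain memcpy into a 1-byte string safe.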
--
You can write your own super fast ASCII decoder using two lines:
 res = PyUnicode_New(size, 127);
 memcpy(PyUnicode_1BYTE_DATA(res), s, size);
(this is exactly what unicode_fromascii does)
> I think it would be really beneficial for C-API users to have
> more ascii low level functions that don't do error checking and
> are simply as fast as possible.
It is really important to ensure that an ASCII string doesn't contain 
characters outside [U+0000; U+007F] because many operations on ASCII 
strings are optimized (e.g. the UTF-8 pointer is shared with the ASCII pointer).
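To make the shared-pointer point concrete, here is a small sketch that does not use any CPython internals. It relies only on the fact that UTF-8 encodes U+0000..U+007F as one byte and U+0080..U+00FF as two bytes. The helper names utf8_length_of_latin1 and can_share_utf8_buffer are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes UTF-8 needs to encode `size`
   one-byte code points.  U+0000..U+007F take one byte each,
   U+0080..U+00FF take two bytes each. */
static size_t
utf8_length_of_latin1(const unsigned char *s, size_t size)
{
    size_t i, n = 0;

    assert(s != NULL || size == 0);
    for (i = 0; i < size; i++)
        n += (s[i] < 0x80) ? 1 : 2;
    return n;
}

/* A one-byte-per-character buffer can double as its own UTF-8
   encoding only if both have the same length, which holds exactly
   when every byte is ASCII. */
static int
can_share_utf8_buffer(const unsigned char *s, size_t size)
{
    return utf8_length_of_latin1(s, size) == size;
}
```

If a byte such as 0xE9 slips into a string that is flagged as ASCII, the buffer handed out as "UTF-8" is no longer valid UTF-8, and any code that trusts the flag will silently produce wrong results.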
I prefer not to expose such a function, or someone will use it without 
understanding exactly how dangerous it is.
Martin and others may disagree with me.
Do you know Murphy's Law? :-)
http://en.wikipedia.org/wiki/Murphy%27s_law 
History
Date User Action Args
2011-12-10 13:15:50 vstinner set recipients: + vstinner, loewis, jcea, ezio.melotti, skrah
2011-12-10 13:13:49 vstinner link issue13570 messages
2011-12-10 13:13:49 vstinner create