Message215301
| Author |
vstinner |
| Recipients |
ezio.melotti, josh.r, serhiy.storchaka, vstinner |
| Date |
2014-04-01 08:23:34 |
| Message-id |
<1396340617.89.0.519156771668.issue21118@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
str.translate() currently allocates a buffer of UCS4 characters.
translate_writer.patch:
- modify _PyUnicode_TranslateCharmap() to use the _PyUnicodeWriter API
- drop the optimizations for error handlers other than "ignore", because there are no unit tests for them and str.translate() uses "ignore". It is safer to drop untested optimizations.
- also clean up the code: charmaptranslate_output() is now responsible for handling the result of charmaptranslate_lookup() (to decrement the reference counter)
str.translate() may be slightly faster when translating ASCII to ASCII for large strings, but not by much.
bytes.translate() is much faster because it builds a C array of 256 items for fast table lookup, whereas str.translate() requires a Python dict lookup for each character, which is much slower.
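To illustrate the difference at the Python level, here is a minimal sketch (not part of the patch) showing the two kinds of translation tables. The sample strings and variable names are made up for illustration only.

```python
# bytes.translate() takes a flat 256-byte table: the byte value is the index.
byte_table = bytes.maketrans(b"abc", b"xyz")
print(type(byte_table), len(byte_table))  # a bytes object of length 256

# str.translate() takes a mapping keyed by code point; str.maketrans()
# builds a plain dict, so each character costs a dict lookup.
str_table = str.maketrans("abc", "xyz")
print(type(str_table), str_table)

# Both produce the equivalent result on ASCII input.
bytes_result = b"aabbcc".translate(byte_table)
str_result = "aabbcc".translate(str_table)
print(bytes_result, str_result)
```

The byte table can be indexed directly in C, which is why the bytes path avoids the per-character dict lookup described above.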
codecs.charmap_build() (PyUnicode_BuildEncodingMap()) creates a C array ("a three-level trie") for fast lookup. It is used with codecs.charmap_encode() for 8-bit encodings. We may reuse it for simple cases, like translating ASCII to ASCII. |
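As a rough sketch of how the encoding map mentioned above is used from Python (this only shows the existing codec machinery, not a proposed str.translate() implementation): codecs.charmap_build() is a CPython helper used by the generated 8-bit codec modules, and the decoding table below is a made-up example that swaps the case of ASCII letters.

```python
import codecs

# Hypothetical 256-character decoding table: byte i decodes to chr(i),
# except that ASCII letters have their case swapped.
decoding_table = "".join(
    chr(i).swapcase() if i < 128 else chr(i) for i in range(256)
)

# Build the fast lookup structure from the decoding table.
encoding_map = codecs.charmap_build(decoding_table)

# charmap_encode() maps each character to the byte whose decoded value it is,
# so here it effectively translates ASCII to ASCII.
encoded, consumed = codecs.charmap_encode("Hello", "strict", encoding_map)
print(encoded, consumed)
```

Because the result is bytes rather than str, reusing this structure for str.translate() would still need an adaptation step; the sketch only shows the lookup side.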
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2014-04-01 08:23:37 | vstinner | set | recipients:
+ vstinner, ezio.melotti, serhiy.storchaka, josh.r |
| 2014-04-01 08:23:37 | vstinner | set | messageid: <1396340617.89.0.519156771668.issue21118@psf.upfronthosting.co.za> |
| 2014-04-01 08:23:37 | vstinner | link | issue21118 messages |
| 2014-04-01 08:23:34 | vstinner | create |
|