Message187035
| Author |
vstinner |
| Recipients |
serhiy.storchaka, vstinner |
| Date |
2013-04-15.22:03:32 |
| Message-id |
<1366063415.31.0.505871872027.issue17742@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
In Python 3.3, I added the _PyUnicodeWriter API to factor out the code handling a Unicode "buffer", that is, the code that allocates memory and resizes the buffer when needed.
I propose to do the same with a new _PyBytesWriter API. The API is very similar to _PyUnicodeWriter:
* _PyBytesWriter_Init(writer)
* _PyBytesWriter_Prepare(writer, count)
* _PyBytesWriter_WriteStr(writer, bytes_obj)
* _PyBytesWriter_WriteChar(writer, ch)
* _PyBytesWriter_Finish(writer)
* _PyBytesWriter_Dealloc(writer)
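To make the intended lifecycle of such a writer concrete, here is a minimal sketch in plain C. It is not the code from the patch: the `toy_writer` struct, the `toy_*` function names, their signatures and the use of `realloc` are all assumptions made for illustration, and the real API presumably works on Python bytes objects rather than raw `char` buffers. Integer overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the writer state; not the struct from the patch. */
typedef struct {
    char *buf;         /* heap buffer, NULL until the first prepare */
    size_t size;       /* number of bytes allocated */
    size_t pos;        /* number of bytes written so far */
    int overallocate;  /* if nonzero, request 25% more than needed */
} toy_writer;

/* Counterpart of an "Init" step: start with an empty writer. */
static void toy_init(toy_writer *w)
{
    memset(w, 0, sizeof(*w));
}

/* Counterpart of a "Prepare" step: make room for `count` more bytes.
   Returns 0 on success, -1 on allocation failure. */
static int toy_prepare(toy_writer *w, size_t count)
{
    size_t needed = w->pos + count;
    if (needed <= w->size)
        return 0;
    if (w->overallocate)
        needed += needed / 4;
    char *p = realloc(w->buf, needed);
    if (p == NULL)
        return -1;
    w->buf = p;
    w->size = needed;
    return 0;
}

/* Counterpart of a "WriteStr" step: append `len` bytes. */
static int toy_write_str(toy_writer *w, const char *data, size_t len)
{
    if (toy_prepare(w, len) < 0)
        return -1;
    assert(w->pos + len <= w->size);
    if (len > 0)
        memcpy(w->buf + w->pos, data, len);
    w->pos += len;
    return 0;
}

/* Counterpart of a "WriteChar" step: append a single byte. */
static int toy_write_char(toy_writer *w, char ch)
{
    return toy_write_str(w, &ch, 1);
}

/* Counterpart of a "Finish" step: trim the buffer to the written length
   and hand it to the caller, who must free() it. The writer is reset. */
static char *toy_finish(toy_writer *w, size_t *len)
{
    char *result = realloc(w->buf, w->pos ? w->pos : 1);
    if (result == NULL)
        result = w->buf;  /* shrinking failed; the old block is still valid */
    *len = w->pos;
    toy_init(w);
    return result;
}

/* Counterpart of a "Dealloc" step: release the buffer on an error path. */
static void toy_dealloc(toy_writer *w)
{
    free(w->buf);
    toy_init(w);
}
```

In this sketch an encoder would call `toy_init`, append output with `toy_write_str` and `toy_write_char`, then call `toy_finish` on success or `toy_dealloc` on error, so the allocation and resizing logic lives in one place instead of being repeated in every encoder.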
The patch changes the ASCII, Latin1, UTF-8 and charmap encoders to use the _PyBytesWriter API. A second patch changes the CJK encoders.
I have not run a benchmark yet. I wrote the patch to factor out the code, not to make the code faster.
Notes on performance:
* I took the "small buffer allocated on the stack" idea from the UTF-8 encoder, but the small buffer is always 500 bytes (instead of a size depending on the maximum character of the input Unicode string)
* _PyBytesWriter overallocates by 25% (when overallocation is enabled), whereas the charmap encoder doubles the buffer:
/* exponentially overallocate to minimize reallocations */
if (requiredsize < 2*outsize)
requiredsize = 2*outsize;
* I didn't check whether the allocation size is the same with the patch. The min_size and overallocate attributes must be set correctly so that the code does not become slower.
* The code writing a single character into a _PyUnicodeWriter buffer is inlined in unicodeobject.c. The _PyBytesWriter API does not provide an inlined function for the same purpose. |
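To give a rough feel for the difference between overallocating by 25% and doubling, the following sketch counts how many times a buffer would be resized when bytes are appended one at a time. The function names and the exact 25% formula (`needed + needed / 4`) are assumptions for illustration; the doubling rule mirrors the charmap snippet quoted above. Real encoders write in larger chunks and start from a preallocated size, so this is a worst-case illustration, not a benchmark.

```c
#include <stddef.h>

/* A growth policy maps (bytes needed, current size) to a new size. */
typedef size_t (*growth_policy)(size_t needed, size_t current);

/* Assumed form of a 25% overallocation: ask for a quarter more than needed. */
static size_t grow_quarter(size_t needed, size_t current)
{
    (void)current;  /* this policy ignores the current size */
    return needed + needed / 4;
}

/* Doubling as in the charmap snippet: at least twice the current size. */
static size_t grow_double(size_t needed, size_t current)
{
    return needed < 2 * current ? 2 * current : needed;
}

/* Count the resizes needed to append `total` bytes one at a time,
   starting from an empty buffer. */
static size_t count_resizes(size_t total, growth_policy policy)
{
    size_t size = 0;
    size_t resizes = 0;
    for (size_t pos = 0; pos < total; pos++) {
        size_t needed = pos + 1;
        if (needed > size) {
            size = policy(needed, size);
            resizes++;
        }
    }
    return resizes;
}
```

Calling `count_resizes(total, grow_quarter)` and `count_resizes(total, grow_double)` for the same `total` shows that the 25% policy resizes more often but wastes less memory at the end, which is the trade-off behind setting min_size and overallocate carefully.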