Issue 15612: Rewrite StringIO to use the _PyUnicodeWriter API


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59817

classification

Title:	Rewrite StringIO to use the _PyUnicodeWriter API
Type:	performance	Stage:
Components:	IO, Unicode	Versions:	Python 3.4

process

Dependencies:	Superseder:
Status:	closed	Resolution:	out of date
Assigned To:	Nosy List:	Arfrever, ezio.melotti, pitrou, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2012-08-10 02:30 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
stringio_unicode_writer.patch	vstinner, 2012-08-10 02:30
bench_stringio.py	vstinner, 2012-08-10 02:32
bench_stringio2.py	vstinner, 2012-08-11 15:31

Messages (12)
msg167850 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-08-10 02:30
Attached patch rewrites the C implementation of StringIO to use the _PyUnicodeWriter API instead of the PyAccu API. It provides better performance when writing non-ASCII strings.

The patch adds new functions:
- _PyUnicodeWriter_Truncate()
- _PyUnicodeWriter_WriteStrAt()
- _PyUnicodeWriter_GetValue()
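Judging by their names alone, the three new helpers appear to correspond to StringIO operations that go beyond plain appending: truncating, overwriting at an arbitrary position, and taking a snapshot of the content. That mapping is an assumption, not something the message states. The sketch below uses only the public io.StringIO API to show the Python-level behaviour any replacement backend would have to preserve.

```python
import io

buf = io.StringIO()
buf.write("hello world")   # plain append at the current position
buf.seek(6)
buf.write("there")         # overwrite in the middle of the buffer
buf.truncate(8)            # shrink the buffer to 8 characters
value = buf.getvalue()     # snapshot of the whole content
print(value)               # → hello th
```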
msg167851 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-08-10 02:32
Results of my micro benchmark. Use attached bench_stringio.py with benchmark.py:
https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py

Command: ./python benchmark.py script bench_stringio.py

----

Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
Platform: Linux-3.4.4-4.fc16.x86_64-x86_64-with-fedora-16-Verne
Bits: int=32, long=64, long long=64, pointer=64
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

Platform of campaign pyaccu:
Date: 2012-08-10 04:24:53
SCM: hg revision=aaa68dce117e tag=tip branch=default date="2012-08-09 21:38 +0200"
Python version: 3.3.0b1 (default:aaa68dce117e, Aug 10 2012, 04:24:19) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

Platform of campaign writer:
Date: 2012-08-10 04:23:21
SCM: hg revision=aaa68dce117e+ tag=tip branch=default date="2012-08-09 21:38 +0200"
Python version: 3.3.0b1 (default:aaa68dce117e+, Aug 10 2012, 04:18:39) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

--------------------------------------+-------------+---------------
Tests                                 | pyaccu      | writer
--------------------------------------+-------------+---------------
writer ascii                          | 30.4 ms (*) | 30.4 ms
writer reader ascii                   | 37.1 ms (*) | 37 ms
writer latin1                         | 31.5 ms (*) | 30.6 ms
writer reader latin1                  | 38.6 ms (*) | 37.4 ms
writer bmp                            | 31.8 ms (*) | 29.7 ms (-7%)
writer reader bmp                     | 40.8 ms (*) | 36.6 ms (-10%)
writer non-bmp                        | 33.4 ms (*) | 30.2 ms (-10%)
writer reader non-bmp                 | 40.9 ms (*) | 36.7 ms (-10%)
writer long lines ascii               | 7.96 ms (*) | 7.34 ms (-8%)
writer-reader long lines ascii        | 8.16 ms (*) | 7.39 ms (-9%)
writer long lines latin1              | 8.01 ms (*) | 7.4 ms (-8%)
writer-reader long lines latin1       | 8.05 ms (*) | 7.4 ms (-8%)
writer long lines bmp                 | 14 ms (*)   | 9.42 ms (-33%)
writer-reader long lines bmp          | 14.2 ms (*) | 9.45 ms (-34%)
writer long lines non-bmp             | 13.9 ms (*) | 9.62 ms (-31%)
writer-reader long lines non-bmp      | 14.3 ms (*) | 9.63 ms (-32%)
writer very long lines ascii          | 7.96 ms (*) | 7.36 ms (-7%)
writer-reader very long lines ascii   | 8.05 ms (*) | 7.37 ms (-8%)
writer very long lines latin1         | 7.98 ms (*) | 7.33 ms (-8%)
writer-reader very long lines latin1  | 8 ms (*)    | 7.39 ms (-8%)
writer very long lines bmp            | 14.1 ms (*) | 9.34 ms (-34%)
writer-reader very long lines bmp     | 14.2 ms (*) | 9.4 ms (-34%)
writer very long lines non-bmp        | 13.9 ms (*) | 9.5 ms (-32%)
writer-reader very long lines non-bmp | 14 ms (*)   | 9.61 ms (-31%)
reader ascii                          | 6.48 ms (*) | 6.22 ms
reader latin1                         | 6.59 ms (*) | 6.57 ms
reader bmp                            | 7.22 ms (*) | 6.9 ms
reader non-bmp                        | 7.65 ms (*) | 7.31 ms
--------------------------------------+-------------+---------------
Total                                 | 489 ms (*)  | 431 ms (-12%)
--------------------------------------+-------------+---------------
msg167857 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-08-10 08:10
> It provides better performance when writing non-ASCII strings.

I would like to know why that is the case. If PyUnicode_Join is not optimal, then perhaps we should better optimize it.

Also, you should post benchmarks with tiny strings as well.
msg167858 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-08-10 08:12
> Also, you should post benchmarks with tiny strings as well.

Oops, sorry, they are already there. Thanks for the numbers.
msg167926 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-08-10 23:12
> I would like to know why that is the case.
> If PyUnicode_Join is not optimal, then perhaps we should
> better optimize it.

I don't know. _PyUnicodeWriter overallocates its buffer (+25%). It may reduce the number of realloc(), and so the number of times that the buffer is copied.
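The overallocation argument can be illustrated with a toy model. The sketch below is not the _PyUnicodeWriter implementation; it only counts how often a buffer that grows by fixed-size chunks would need to be reallocated, with and without 25% of spare capacity. The function name and parameters are made up for the illustration.

```python
def count_reallocs(total_chars, chunk, overallocate):
    """Count buffer reallocations in a toy append-only buffer model."""
    capacity = 0
    size = 0
    reallocs = 0
    while size < total_chars:
        size += chunk
        if size > capacity:
            # Either allocate exactly what is needed, or add 25% of slack.
            capacity = (size + size // 4) if overallocate else size
            reallocs += 1
    return reallocs


exact = count_reallocs(100_000, 10, overallocate=False)
padded = count_reallocs(100_000, 10, overallocate=True)
print(exact, padded)
```

With exact-size growth every append reallocates, while with 25% of slack the count grows only logarithmically with the final size. Whether this matters for StringIO depends on what it is being compared against.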
msg167927 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-08-10 23:20
> > I would like to know why that is the case.
> > If PyUnicode_Join is not optimal, then perhaps we should
> > better optimize it.
>
> I don't know. _PyUnicodeWriter overallocates its buffer (+25%). It may
> reduce the number of realloc(), and so the number of times that the
> buffer is copied.

But PyUnicode_Join doesn't realloc() anything, since it creates a buffer of exactly the right size. So this can't be the answer.
msg167950 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-08-11 10:10
Victor, your benchmark is buggy (it writes one character at a time). You should apply the following patch:

$ diff -u bench_stringio_orig.py bench_stringio.py
--- bench_stringio_orig.py	2012-08-11 12:02:16.528321958 +0200
+++ bench_stringio.py	2012-08-11 12:05:53.939536902 +0200
@@ -41,8 +41,8 @@
         ('bmp', '\u20ac' * k + '\n'),
         ('non-bmp', '\U0010ffff' * k + '\n'),
     ):
-        bench.bench_func('writer long lines %s' % charset, writer, n // k, text)
-        bench.bench_func('writer-reader long lines %s' % charset, writer_reader, n // k, text)
+        bench.bench_func('writer long lines %s' % charset, writer, n, [text])
+        bench.bench_func('writer-reader long lines %s' % charset, writer_reader, n, [text])
 
     for charset, text in (
         ('ascii', 'a' * (n // 10) + '\n'),
@@ -50,8 +50,8 @@
         ('bmp', '\u20ac' * (n // 10) + '\n'),
         ('non-bmp', '\U0010ffff' * (n // 10) + '\n'),
     ):
-        bench.bench_func('writer very long lines %s' % charset, writer, 10, text)
-        bench.bench_func('writer-reader very long lines %s' % charset, writer_reader, 10, text)
+        bench.bench_func('writer very long lines %s' % charset, writer, 100, [text])
+        bench.bench_func('writer-reader very long lines %s' % charset, writer_reader, 100, [text])
 
     data = 'abc\n' * n
     bench.bench_func('reader ascii', reader, data)
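The bug comes down to passing a str where a sequence of lines is expected: iterating over a str yields one character at a time. The sketch below shows the effect. The writer() here is a hypothetical stand-in, since the real function from bench_stringio.py is not shown in this thread; it is only assumed to loop over its second argument and write each item.

```python
import io


def writer(n, lines):
    # Hypothetical stand-in for the benchmark's writer(); it also counts
    # the write() calls so the difference is visible.
    buf = io.StringIO()
    calls = 0
    for _ in range(n):
        for line in lines:
            buf.write(line)
            calls += 1
    return calls, buf.getvalue()


text = "a" * 5 + "\n"
calls_str, value_str = writer(2, text)     # str: iterated character by character
calls_list, value_list = writer(2, [text])  # list: one write() per line
print(calls_str, calls_list)                # → 12 2
```

Both calls produce the same content, so the mistake is invisible in the output and only shows up in the timings.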
msg167974 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-08-11 15:31
> Victor, your benchmark is buggy (it writes one character at a time).

Oh, it's not what I wanted to test. I attach a new benchmark. Here are the results. PyAccu looks much more appropriate than _PyUnicodeWriter, because it is always faster, except to write 100.000 very long lines.

Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, pointer=64
Python unicode implementation: PEP 393
Platform: Linux-3.4.4-4.fc16.x86_64-x86_64-with-fedora-16-Verne

Platform of campaign pyaccu:
SCM: hg revision=9804aec74d4a tag=tip branch=default date="2012-08-10 18:55 -0400"
Date: 2012-08-11 16:53:46
Python version: 3.3.0b1 (default:9804aec74d4a, Aug 11 2012, 16:53:12) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

Platform of campaign writer:
SCM: hg revision=9804aec74d4a+ tag=tip branch=default date="2012-08-10 18:55 -0400"
Date: 2012-08-11 16:50:40
Python version: 3.3.0b1 (default:9804aec74d4a+, Aug 11 2012, 16:33:18) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

--------------------------------------+-------------+---------------
10 lines                              | pyaccu      | writer
--------------------------------------+-------------+---------------
reader short line ascii               | 1.53 us (*) | 1.46 us
writer short line ascii               | 4.85 us (*) | 4.48 us (-8%)
writer-reader short line ascii        | 6.45 us (*) | 5.71 us (-12%)
reader short line latin1              | 1.57 us (*) | 1.45 us (-8%)
writer short line latin1              | 4.92 us (*) | 4.56 us (-7%)
writer-reader short line latin1       | 6.6 us (*)  | 5.78 us (-13%)
reader short line bmp                 | 1.64 us (*) | 1.54 us (-6%)
writer short line bmp                 | 5.01 us (*) | 4.43 us (-12%)
writer-reader short line bmp          | 6.68 us (*) | 5.71 us (-14%)
reader short line non-bmp             | 1.61 us (*) | 1.59 us
writer short line non-bmp             | 5.1 us (*)  | 4.55 us (-11%)
writer-reader short line non-bmp      | 6.74 us (*) | 5.66 us (-16%)
reader long lines ascii               | 103 us (*)  | 33.4 us (-68%)
writer long lines ascii               | 998 ns (*)  | 836 ns (-16%)
writer-reader long lines ascii        | 1.45 us (*) | 1.18 us (-19%)
reader long lines latin1              | 105 us (*)  | 34.2 us (-67%)
writer long lines latin1              | 997 ns (*)  | 831 ns (-17%)
writer-reader long lines latin1       | 1.47 us (*) | 1.2 us (-18%)
reader long lines bmp                 | 121 us (*)  | 85.9 us (-29%)
writer long lines bmp                 | 995 ns (*)  | 861 ns (-13%)
writer-reader long lines bmp          | 1.43 us (*) | 1.13 us (-21%)
reader long lines non-bmp             | 97.1 us (*) | 99.7 us
writer long lines non-bmp             | 1 us (*)    | 819 ns (-18%)
writer-reader long lines non-bmp      | 1.4 us (*)  | 1.18 us (-16%)
reader very long lines ascii          | 1.42 us (*) | 1.45 us
writer very long lines ascii          | 3.04 us (*) | 2.88 us (-5%)
writer-reader very long lines ascii   | 4.59 us (*) | 4.12 us (-10%)
reader very long lines latin1         | 1.57 us (*) | 1.47 us (-7%)
writer very long lines latin1         | 3.04 us (*) | 2.73 us (-10%)
writer-reader very long lines latin1  | 4.66 us (*) | 4.04 us (-13%)
reader very long lines bmp            | 1.55 us (*) | 1.55 us
writer very long lines bmp            | 3.03 us (*) | 2.91 us
writer-reader very long lines bmp     | 4.72 us (*) | 4.08 us (-14%)
reader very long lines non-bmp        | 1.55 us (*) | 1.49 us
writer very long lines non-bmp        | 3.09 us (*) | 2.93 us (-5%)
writer-reader very long lines non-bmp | 4.59 us (*) | 4.06 us (-12%)
--------------------------------------+-------------+---------------
Total                                 | 525 us (*)  | 342 us (-35%)
--------------------------------------+-------------+---------------

--------------------------------------+-------------+---------------
1000 lines                            | pyaccu      | writer
--------------------------------------+-------------+---------------
reader short line ascii               | 68.2 us (*) | 66.1 us
writer short line ascii               | 308 us (*)  | 307 us
writer-reader short line ascii        | 378 us (*)  | 374 us
reader short line latin1              | 72 us (*)   | 68.5 us
writer short line latin1              | 324 us (*)  | 313 us
writer-reader short line latin1       | 395 us (*)  | 383 us
reader short line bmp                 | 74.8 us (*) | 71.9 us
writer short line bmp                 | 326 us (*)  | 303 us (-7%)
writer-reader short line bmp          | 397 us (*)  | 378 us
reader short line non-bmp             | 72.9 us (*) | 72.6 us
writer short line non-bmp             | 329 us (*)  | 304 us (-8%)
writer-reader short line non-bmp      | 397 us (*)  | 383 us
reader long lines ascii               | 104 us (*)  | 33.8 us (-67%)
writer long lines ascii               | 1.99 us (*) | 2.52 us (+27%)
writer-reader long lines ascii        | 4.37 us (*) | 3.45 us (-21%)
reader long lines latin1              | 104 us (*)  | 33.3 us (-68%)
writer long lines latin1              | 2.07 us (*) | 2.55 us (+23%)
writer-reader long lines latin1       | 4.51 us (*) | 3.57 us (-21%)
reader long lines bmp                 | 120 us (*)  | 80.5 us (-33%)
writer long lines bmp                 | 2.15 us (*) | 2.55 us (+18%)
writer-reader long lines bmp          | 4.71 us (*) | 3.86 us (-18%)
reader long lines non-bmp             | 90.6 us (*) | 97.6 us (+8%)
writer long lines non-bmp             | 2.18 us (*) | 2.68 us (+23%)
writer-reader long lines non-bmp      | 4.24 us (*) | 4.05 us
reader very long lines ascii          | 2.53 us (*) | 1.66 us (-34%)
writer very long lines ascii          | 3.07 us (*) | 3.46 us (+13%)
writer-reader very long lines ascii   | 6.18 us (*) | 4.89 us (-21%)
reader very long lines latin1         | 2.57 us (*) | 1.75 us (-32%)
writer very long lines latin1         | 3.16 us (*) | 3.46 us (+10%)
writer-reader very long lines latin1  | 6.32 us (*) | 4.98 us (-21%)
reader very long lines bmp            | 2.7 us (*)  | 2.34 us (-14%)
writer very long lines bmp            | 3.52 us (*) | 3.65 us
writer-reader very long lines bmp     | 6.73 us (*) | 5.7 us (-15%)
reader very long lines non-bmp        | 2.45 us (*) | 2.35 us
writer very long lines non-bmp        | 3.47 us (*) | 3.87 us (+12%)
writer-reader very long lines non-bmp | 5.98 us (*) | 5.85 us
--------------------------------------+-------------+---------------
Total                                 | 3.63 ms (*) | 3.34 ms (-8%)
--------------------------------------+-------------+---------------

--------------------------------------+-------------+---------------
100000 lines                          | pyaccu      | writer
--------------------------------------+-------------+---------------
reader short line ascii               | 6.74 ms (*) | 6.43 ms
writer short line ascii               | 30.7 ms (*) | 29.8 ms
writer-reader short line ascii        | 37.5 ms (*) | 36.6 ms
reader short line latin1              | 7.08 ms (*) | 6.64 ms (-6%)
writer short line latin1              | 31.3 ms (*) | 30.1 ms
writer-reader short line latin1       | 38.8 ms (*) | 37.5 ms
reader short line bmp                 | 7.46 ms (*) | 6.98 ms (-6%)
writer short line bmp                 | 32 ms (*)   | 29 ms (-9%)
writer-reader short line bmp          | 40.5 ms (*) | 35.9 ms (-11%)
reader short line non-bmp             | 7.36 ms (*) | 7.23 ms
writer short line non-bmp             | 33.3 ms (*) | 29.4 ms (-12%)
writer-reader short line non-bmp      | 40.5 ms (*) | 36.5 ms (-10%)
reader long lines ascii               | 103 us (*)  | 32.6 us (-68%)
writer long lines ascii               | 59.4 us (*) | 66.5 us (+12%)
writer-reader long lines ascii        | 220 us (*)  | 99.2 us (-55%)
reader long lines latin1              | 105 us (*)  | 32.2 us (-69%)
writer long lines latin1              | 60.2 us (*) | 67.3 us (+12%)
writer-reader long lines latin1       | 240 us (*)  | 97.6 us (-59%)
reader long lines bmp                 | 122 us (*)  | 76.9 us (-37%)
writer long lines bmp                 | 62.1 us (*) | 73.8 us (+19%)
writer-reader long lines bmp          | 242 us (*)  | 151 us (-38%)
reader long lines non-bmp             | 95.7 us (*) | 92.1 us
writer long lines non-bmp             | 76.5 us (*) | 90.3 us (+18%)
writer-reader long lines non-bmp      | 198 us (*)  | 173 us (-12%)
reader very long lines ascii          | 91.6 us (*) | 11.5 us (-87%)
writer very long lines ascii          | 7.15 us (*) | 11.9 us (+67%)
writer-reader very long lines ascii   | 145 us (*)  | 20.1 us (-86%)
reader very long lines latin1         | 110 us (*)  | 12 us (-89%)
writer very long lines latin1         | 7.52 us (*) | 12.1 us (+61%)
writer-reader very long lines latin1  | 165 us (*)  | 20.7 us (-87%)
reader very long lines bmp            | 91.1 us (*) | 46.7 us (-49%)
writer very long lines bmp            | 12.3 us (*) | 22.5 us (+82%)
writer-reader very long lines bmp     | 150 us (*)  | 61.9 us (-59%)
reader very long lines non-bmp        | 66.8 us (*) | 66.6 us
writer very long lines non-bmp        | 22.4 us (*) | 38.4 us (+72%)
writer-reader very long lines non-bmp | 108 us (*)  | 87.7 us (-19%)
--------------------------------------+-------------+---------------
Total                                 | 316 ms (*)  | 294 ms (-7%)
--------------------------------------+-------------+---------------

-------------+-------------+--------------
Summary      | pyaccu      | writer
-------------+-------------+--------------
10 lines     | 525 us (*)  | 342 us (-35%)
1000 lines   | 3.63 ms (*) | 3.34 ms (-8%)
100000 lines | 316 ms (*)  | 294 ms (-7%)
-------------+-------------+--------------
Total        | 320 ms (*)  | 297 ms (-7%)
-------------+-------------+--------------
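The percentages in the right-hand column read as the relative change of the writer timing against the pyaccu reference, so a negative value means the _PyUnicodeWriter build was faster. The sketch below recomputes the summary rows under that assumption. The helper name is made up, and the formula is inferred from the numbers rather than taken from benchmark.py.

```python
def relative_change(reference, candidate):
    """Relative change of candidate against reference, in whole percent."""
    return round((candidate - reference) / reference * 100)


# (pyaccu, writer) timings from the summary table; units cancel out per row.
summary = {
    "10 lines": (525, 342),        # us
    "1000 lines": (3.63, 3.34),    # ms
    "100000 lines": (316, 294),    # ms
    "Total": (320, 297),           # ms
}
changes = {name: relative_change(ref, new) for name, (ref, new) in summary.items()}
print(changes)
# → {'10 lines': -35, '1000 lines': -8, '100000 lines': -7, 'Total': -7}
```

Rows without a percentage appear to be those where the difference is below roughly 5%, which would explain why pairs such as 3.03 us and 2.91 us carry no annotation. This threshold is a guess from the data.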
msg167975 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-08-11 15:35
"PyAccu looks much more appropriate than _PyUnicodeWriter, because it is always faster, except to write 100.000 very long lines."

Oh... I added colors to my tool, but there was a bug: I used the wrong colors... It's just the opposite. _PyUnicodeWriter is almost always faster, except to write more than 100.000 very long lines.
msg167977 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-08-11 16:19
> _PyUnicodeWriter is almost always faster

Actually, PyAccu is consistently faster for the "writer" case, while _PyUnicodeWriter is faster for the "writer-reader" case. This is not because of PyAccu, but because of the way StringIO uses it: when e.g. readline() is called, the PyAccu result is converted into a PyUCS4* buffer, then each readline() result is converted again by finding the max char in the sub-buffer.

So I would suggest using PyAccu, but converting its result to a _PyUnicodeWriter rather than a PyUCS4* buffer.
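The "finding the max char" step exists because, under PEP 393, a str is stored with 1, 2 or 4 bytes per character depending on its largest code point. Building a str from a UCS4 sub-buffer therefore needs a scan of that sub-buffer first. The sketch below models only that width decision in Python; storage_width() is an illustrative helper, not a CPython function.

```python
def storage_width(text):
    """Bytes per character a PEP 393 string with this content would use."""
    max_char = max(map(ord, text), default=0)  # the scan over the sub-buffer
    if max_char < 0x100:
        return 1   # ASCII and Latin-1
    if max_char < 0x10000:
        return 2   # rest of the Basic Multilingual Plane
    return 4       # non-BMP


buffer = "abc\n\u20ac\n\U0010ffff\n"
lines = buffer.splitlines(keepends=True)
widths = [storage_width(line) for line in lines]
print(widths)  # → [1, 2, 4]
```

Each line read back from the buffer pays for one such scan, which is consistent with the reader timings growing with line length in the pyaccu column.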
msg167978 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012-08-11 16:45
See benchmark results in issue15381 (the patch is not applicable to StringIO). These numbers show that the resize strategy can be much slower than the append/join strategy on Windows.
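As a rough Python-level analogy of the two strategies, the sketch below builds the same string once by growing a single str with += (which CPython may resize in place) and once by appending to a list and joining at the end. It is not the code measured in issue15381, the function names are made up, and the timings depend heavily on the platform's allocator, so no numbers are given.

```python
import timeit


def build_by_resize(chunks):
    # "Resize" strategy: one result object that keeps growing.
    result = ""
    for chunk in chunks:
        result += chunk
    return result


def build_by_join(chunks):
    # "Append/join" strategy: collect the pieces, allocate the result once.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


def time_both(chunks, number=5):
    """Return (resize_seconds, join_seconds) for building the same string."""
    resize = timeit.timeit(lambda: build_by_resize(chunks), number=number)
    join = timeit.timeit(lambda: build_by_join(chunks), number=number)
    return resize, join


if __name__ == "__main__":
    sample = ["\u20ac" * 80 + "\n"] * 10_000
    print(time_both(sample))
```

Running it on more than one operating system is the interesting part, since the cost of realloc() is what differs between them.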
msg238415 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2015-03-18 11:04
I'm no longer interested in working on this issue, and it's not clear that _PyUnicodeWriter is always faster. Switching from a list to _PyUnicodeWriter on a specific event would make the code much more complex. I prefer to just close the issue.

History
Date	User	Action	Args
2022-04-11 14:57:34	admin	set	github: 59817
2015-03-18 11:04:56	vstinner	set	status: open -> closed resolution: out of date messages: + msg238415
2012-09-25 00:18:34	Arfrever	set	nosy: + Arfrever
2012-08-11 16:45:34	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg167978
2012-08-11 16:19:35	pitrou	set	messages: + msg167977
2012-08-11 15:35:26	vstinner	set	messages: + msg167975
2012-08-11 15:31:19	vstinner	set	files: + bench_stringio2.py messages: + msg167974
2012-08-11 10:10:55	pitrou	set	messages: + msg167950
2012-08-10 23:20:23	pitrou	set	messages: + msg167927
2012-08-10 23:12:28	vstinner	set	messages: + msg167926
2012-08-10 08:12:08	pitrou	set	messages: + msg167858
2012-08-10 08:10:01	pitrou	set	messages: + msg167857
2012-08-10 02:33:33	vstinner	set	title: Rewriter StringIO to use the _PyUnicodeWriter API -> Rewrite StringIO to use the _PyUnicodeWriter API
2012-08-10 02:33:22	vstinner	set	type: performance
2012-08-10 02:32:19	vstinner	set	files: + bench_stringio.py messages: + msg167851
2012-08-10 02:30:29	vstinner	create
