homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Faster utf-32 decoder
Type: performance Stage: resolved
Components: Interpreter Core, Unicode Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, asvetlov, ezio.melotti, georg.brandl, pitrou, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: 3.3regression, needs review, patch

Created on 2012年04月19日 20:59 by serhiy.storchaka, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
decode_utf32_a_3.patch serhiy.storchaka, 2012年10月20日 19:10 review
decode_utf32_b_3.patch serhiy.storchaka, 2012年10月20日 19:11 review
Messages (14)
msg158749 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年04月19日 20:59
I suggest two variants of patch, accelerating the utf-32 decoder. With PEP 393 utf-32 decoder slowed down up to 2x, these patches returns a performance at the level of Python 3.2 and even much higher (2-3x over 3.2). The variant A is simpler, but the variant B is a little faster (+8-15%).
msg158750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012年04月19日 21:03
See also #14624 for UTF-16 decoder.
msg159078 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年04月23日 21:01
Here are the results of benchmarking (numbers in MB/s).
On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:
 Py2.7 Py3.2 Py3.3 patchA patchB
utf-32le 'A'*10000 461 (+215%) 454 (+220%) 292 (+398%) 1213 (+20%) 1454
utf-32le '\x80'*10000 458 (+177%) 454 (+180%) 271 (+369%) 1124 (+13%) 1270
utf-32le '\x80'+'A'*9999 462 (+171%) 454 (+175%) 271 (+361%) 1111 (+13%) 1250
utf-32le '\u0100'*10000 457 (+133%) 454 (+135%) 220 (+385%) 1002 (+6%) 1067
utf-32le '\u0100'+'A'*9999 461 (+129%) 454 (+132%) 220 (+379%) 993 (+6%) 1054
utf-32le '\u0100'+'\x80'*9999 458 (+131%) 454 (+133%) 220 (+381%) 1002 (+6%) 1059
utf-32le '\u8000'*10000 457 (+133%) 454 (+135%) 220 (+385%) 1002 (+6%) 1066
utf-32le '\u8000'+'A'*9999 462 (+128%) 454 (+132%) 220 (+379%) 994 (+6%) 1053
utf-32le '\u8000'+'\x80'*9999 457 (+132%) 454 (+134%) 220 (+382%) 1000 (+6%) 1061
utf-32le '\u8000'+'\u0100'*9999 457 (+132%) 454 (+134%) 220 (+382%) 1002 (+6%) 1061
utf-32le '\U00010000'*10000 386 (+167%) 416 (+148%) 212 (+386%) 930 (+11%) 1031
utf-32le '\U00010000'+'A'*9999 461 (+126%) 415 (+152%) 222 (+370%) 940 (+11%) 1044
utf-32le '\U00010000'+'\x80'*9999 458 (+125%) 415 (+148%) 222 (+364%) 930 (+11%) 1031
utf-32le '\U00010000'+'\u0100'*9999 458 (+125%) 415 (+148%) 212 (+386%) 930 (+11%) 1031
utf-32le '\U00010000'+'\u8000'*9999 458 (+125%) 415 (+149%) 222 (+365%) 930 (+11%) 1032
utf-32be 'A'*10000 461 (+216%) 454 (+221%) 292 (+399%) 1209 (+20%) 1456
utf-32be '\x80'*10000 457 (+177%) 454 (+179%) 271 (+368%) 1125 (+13%) 1268
utf-32be '\x80'+'A'*9999 462 (+171%) 453 (+176%) 271 (+362%) 1112 (+12%) 1251
utf-32be '\u0100'*10000 457 (+144%) 453 (+146%) 220 (+407%) 1048 (+6%) 1116
utf-32be '\u0100'+'A'*9999 462 (+139%) 454 (+143%) 220 (+402%) 1034 (+7%) 1104
utf-32be '\u0100'+'\x80'*9999 459 (+142%) 453 (+145%) 220 (+405%) 1047 (+6%) 1112
utf-32be '\u8000'*10000 457 (+144%) 453 (+147%) 220 (+408%) 1046 (+7%) 1117
utf-32be '\u8000'+'A'*9999 462 (+139%) 454 (+143%) 220 (+402%) 1034 (+7%) 1104
utf-32be '\u8000'+'\x80'*9999 459 (+142%) 453 (+145%) 220 (+405%) 1045 (+6%) 1112
utf-32be '\u8000'+'\u0100'*9999 459 (+142%) 454 (+144%) 220 (+404%) 1047 (+6%) 1109
utf-32be '\U00010000'*10000 386 (+155%) 416 (+137%) 212 (+364%) 940 (+5%) 984
utf-32be '\U00010000'+'A'*9999 461 (+116%) 415 (+140%) 213 (+367%) 948 (+5%) 994
utf-32be '\U00010000'+'\x80'*9999 458 (+115%) 415 (+137%) 222 (+343%) 938 (+5%) 983
utf-32be '\U00010000'+'\u0100'*9999 458 (+115%) 415 (+137%) 212 (+364%) 940 (+5%) 983
utf-32be '\U00010000'+'\u8000'*9999 458 (+115%) 415 (+137%) 222 (+343%) 939 (+5%) 983
On 32-bit Linux, Intel Atom N570 @ 1.66GHz:
 Py2.7 Py3.2 Py3.3 patchA patchB
utf-32le 'A'*10000 165 (+173%) 165 (+173%) 100 (+350%) 389 (+16%) 450
utf-32le '\x80'*10000 165 (+159%) 165 (+159%) 76 (+462%) 374 (+14%) 427
utf-32le '\x80'+'A'*9999 165 (+161%) 165 (+161%) 76 (+466%) 374 (+15%) 430
utf-32le '\u0100'*10000 165 (+119%) 165 (+119%) 81 (+346%) 333 (+8%) 361
utf-32le '\u0100'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 334 (+9%) 363
utf-32le '\u0100'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
utf-32le '\u8000'*10000 165 (+119%) 165 (+119%) 80 (+352%) 333 (+9%) 362
utf-32le '\u8000'+'A'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
utf-32le '\u8000'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
utf-32le '\u8000'+'\u0100'*9999 165 (+118%) 165 (+118%) 81 (+343%) 333 (+8%) 359
utf-32le '\U00010000'*10000 155 (+130%) 151 (+136%) 80 (+346%) 324 (+10%) 357
utf-32le '\U00010000'+'A'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358
utf-32le '\U00010000'+'\x80'*9999 165 (+118%) 165 (+118%) 80 (+349%) 325 (+10%) 359
utf-32le '\U00010000'+'\u0100'*9999 165 (+116%) 165 (+116%) 80 (+346%) 324 (+10%) 357
utf-32le '\U00010000'+'\u8000'*9999 165 (+117%) 165 (+117%) 80 (+348%) 324 (+10%) 358
utf-32be 'A'*10000 165 (+172%) 165 (+172%) 100 (+348%) 390 (+15%) 448
utf-32be '\x80'*10000 165 (+159%) 165 (+159%) 75 (+469%) 373 (+14%) 427
utf-32be '\x80'+'A'*9999 165 (+160%) 165 (+160%) 75 (+472%) 375 (+14%) 429
utf-32be '\u0100'*10000 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
utf-32be '\u0100'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 335 (+8%) 363
utf-32be '\u0100'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 335 (+8%) 362
utf-32be '\u8000'*10000 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
utf-32be '\u8000'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 334 (+9%) 363
utf-32be '\u8000'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 335 (+8%) 362
utf-32be '\u8000'+'\u0100'*9999 165 (+118%) 165 (+118%) 81 (+344%) 335 (+7%) 360
utf-32be '\U00010000'*10000 155 (+130%) 151 (+136%) 80 (+346%) 324 (+10%) 357
utf-32be '\U00010000'+'A'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358
utf-32be '\U00010000'+'\x80'*9999 165 (+118%) 165 (+118%) 80 (+349%) 325 (+10%) 359
utf-32be '\U00010000'+'\u0100'*9999 165 (+117%) 165 (+117%) 80 (+348%) 324 (+10%) 358
utf-32be '\U00010000'+'\u8000'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358
For scripts see issue14624.
msg160443 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年05月11日 19:24
The patches updated to stylistic conformity of the UTF-8 decoder. Patch B is significantly accelerated for aligned input data (i. e. almost always), especially for natural order. The UTF-32 decoder can now be faster than ASCII decoder! May be it is time to change the title to "Amazingly faster UTF-32 decoding"? ;)
 Py3.2 Py3.3 patchA patchB
utf-32le 'A'*10000 162 (+462%) 100 (+810%) 391 (+133%) 910
utf-32le 'A'*9999+'\x80' 162 (+411%) 99 (+736%) 377 (+120%) 828
utf-32le 'A'*9999+'\u0100' 162 (+277%) 95 (+543%) 324 (+89%) 611
utf-32le 'A'*9999+'\u8000' 162 (+278%) 95 (+545%) 324 (+89%) 613
utf-32le 'A'*9999+'\U00010000' 162 (+280%) 95 (+547%) 322 (+91%) 615
utf-32le '\x80'*10000 162 (+436%) 94 (+823%) 389 (+123%) 868
utf-32le '\x80'+'A'*9999 162 (+441%) 94 (+832%) 388 (+126%) 876
utf-32le '\x80'*9999+'\u0100' 162 (+273%) 90 (+571%) 320 (+89%) 604
utf-32le '\x80'*9999+'\u8000' 162 (+271%) 90 (+568%) 319 (+88%) 601
utf-32le '\x80'*9999+'\U00010000' 162 (+268%) 90 (+562%) 318 (+87%) 596
utf-32le '\u0100'*10000 161 (+445%) 83 (+958%) 405 (+117%) 878
utf-32le '\u0100'+'A'*9999 162 (+440%) 83 (+954%) 403 (+117%) 875
utf-32le '\u0100'+'\x80'*9999 162 (+444%) 83 (+963%) 403 (+119%) 882
utf-32le '\u0100'*9999+'\u8000' 162 (+441%) 83 (+955%) 404 (+117%) 876
utf-32le '\u0100'*9999+'\U00010000' 162 (+259%) 79 (+637%) 325 (+79%) 582
utf-32le '\u8000'*10000 162 (+441%) 83 (+955%) 404 (+117%) 876
utf-32le '\u8000'+'A'*9999 162 (+441%) 83 (+955%) 404 (+117%) 876
utf-32le '\u8000'+'\x80'*9999 161 (+448%) 83 (+964%) 403 (+119%) 883
utf-32le '\u8000'+'\u0100'*9999 161 (+443%) 83 (+954%) 402 (+118%) 875
utf-32le '\u8000'*9999+'\U00010000' 162 (+262%) 79 (+643%) 325 (+81%) 587
utf-32le '\U00010000'*10000 149 (+483%) 83 (+947%) 390 (+123%) 869
utf-32le '\U00010000'+'A'*9999 162 (+444%) 83 (+963%) 389 (+127%) 882
utf-32le '\U00010000'+'\x80'*9999 162 (+430%) 83 (+935%) 389 (+121%) 859
utf-32le '\U00010000'+'\u0100'*9999 162 (+429%) 83 (+933%) 389 (+120%) 857
utf-32le '\U00010000'+'\u8000'*9999 162 (+431%) 83 (+937%) 388 (+122%) 861
utf-32be 'A'*10000 162 (+199%) 100 (+384%) 393 (+23%) 484
utf-32be 'A'*9999+'\x80' 162 (+186%) 99 (+368%) 376 (+23%) 463
utf-32be 'A'*9999+'\u0100' 162 (+138%) 95 (+306%) 323 (+20%) 386
utf-32be 'A'*9999+'\u8000' 162 (+139%) 95 (+307%) 323 (+20%) 387
utf-32be 'A'*9999+'\U00010000' 162 (+138%) 95 (+305%) 322 (+20%) 385
utf-32be '\x80'*10000 161 (+196%) 94 (+407%) 389 (+23%) 477
utf-32be '\x80'+'A'*9999 161 (+197%) 94 (+409%) 387 (+24%) 478
utf-32be '\x80'*9999+'\u0100' 161 (+137%) 90 (+324%) 321 (+19%) 382
utf-32be '\x80'*9999+'\u8000' 162 (+135%) 89 (+328%) 320 (+19%) 381
utf-32be '\x80'*9999+'\U00010000' 162 (+134%) 89 (+326%) 318 (+19%) 379
utf-32be '\u0100'*10000 161 (+196%) 83 (+473%) 404 (+18%) 476
utf-32be '\u0100'+'A'*9999 161 (+196%) 83 (+475%) 402 (+19%) 477
utf-32be '\u0100'+'\x80'*9999 162 (+196%) 83 (+477%) 403 (+19%) 479
utf-32be '\u0100'*9999+'\u8000' 161 (+196%) 83 (+473%) 404 (+18%) 476
utf-32be '\u0100'*9999+'\U00010000' 162 (+131%) 79 (+373%) 325 (+15%) 374
utf-32be '\u8000'*10000 161 (+195%) 83 (+472%) 404 (+18%) 475
utf-32be '\u8000'+'A'*9999 161 (+197%) 83 (+476%) 402 (+19%) 478
utf-32be '\u8000'+'\x80'*9999 161 (+197%) 83 (+476%) 403 (+19%) 478
utf-32be '\u8000'+'\u0100'*9999 162 (+194%) 83 (+473%) 403 (+18%) 476
utf-32be '\u8000'*9999+'\U00010000' 161 (+133%) 79 (+375%) 325 (+15%) 375
utf-32be '\U00010000'*10000 148 (+222%) 83 (+473%) 391 (+22%) 476
utf-32be '\U00010000'+'A'*9999 161 (+198%) 83 (+477%) 389 (+23%) 479
utf-32be '\U00010000'+'\x80'*9999 162 (+194%) 83 (+473%) 389 (+22%) 476
utf-32be '\U00010000'+'\u0100'*9999 162 (+194%) 83 (+475%) 389 (+23%) 477
utf-32be '\U00010000'+'\u8000'*9999 161 (+196%) 83 (+475%) 389 (+23%) 477
msg173405 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年10月20日 19:10
Patches updated to 3.4.
msg173406 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年10月20日 19:23
I suggest apply patch A to 3.3 as it fixes performance regression (2x) and is very simple.
msg173407 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012年10月20日 19:38
Very simple? You're changing most of the code there.
msg173408 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年10月20日 19:59
It was too complicated code. Actually patched code is smaller.
 1 file changed, 71 insertions(+), 80 deletions(-)
UTF-16 codec was modified in some way.
msg173409 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012年10月20日 20:21
That the new code is smaller is no guarantee that it's as correct :)
That is exactly the reason we don't put optimizations in bugfix releases.
msg174170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012年10月30日 00:59
> I suggest apply patch A to 3.3 as it fixes performance
> regression (2x) and is very simple.
ASCII and UTF-8 are the two most common codecs in the world, so it's justified to have heavily optimized encoders and decoders.
I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough.
--
32 bit units is commonly used with wchar_t, but this format already has a fast decoder, PyUnicode_FromWideChar(), which uses memcpy() or _PyUnicode_CONVERT_BYTES().
msg174188 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年10月30日 09:29
> I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough.
Agree. I had the same doubts. That's why I proposed two patches for your choice.
msg174233 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年10月30日 22:24
New changeset 9badfe3a31a7 by Victor Stinner in branch 'default':
Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster
http://hg.python.org/cpython/rev/9badfe3a31a7 
msg174234 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012年10月30日 22:26
I applied the patch "A" with minor changes: replace multiple goto with classic break/continue and if/else.
msg174239 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年10月30日 23:27
> I applied the patch "A" with minor changes: replace multiple goto with
> classic break/continue and if/else.
Looks good. Thanks.
History
Date User Action Args
2022年04月11日 14:57:29adminsetgithub: 58830
2012年10月30日 23:27:45serhiy.storchakasetmessages: + msg174239
2012年10月30日 22:26:24vstinnersetmessages: + msg174234
2012年10月30日 22:24:09python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg174233

resolution: fixed
stage: patch review -> resolved
2012年10月30日 09:29:56serhiy.storchakasetmessages: + msg174188
2012年10月30日 00:59:28vstinnersetmessages: + msg174170
2012年10月24日 09:02:12serhiy.storchakasetstage: patch review
2012年10月20日 20:21:48georg.brandlsetmessages: + msg173409
2012年10月20日 19:59:49serhiy.storchakasetmessages: + msg173408
2012年10月20日 19:38:13georg.brandlsetmessages: + msg173407
2012年10月20日 19:25:53serhiy.storchakasetnosy: + georg.brandl
2012年10月20日 19:23:40serhiy.storchakasetmessages: + msg173406
versions: + Python 3.3
2012年10月20日 19:11:26serhiy.storchakasetkeywords: + 3.3regression
files: + decode_utf32_b_3.patch
2012年10月20日 19:10:07serhiy.storchakasetkeywords: + needs review
files: + decode_utf32_a_3.patch
messages: + msg173405

versions: + Python 3.4, - Python 3.3
2012年10月20日 19:07:24serhiy.storchakasetfiles: - decode_utf32_b_2.patch
2012年10月20日 19:06:51serhiy.storchakasetfiles: - decode_utf32_a_2.patch
2012年10月20日 19:06:34serhiy.storchakasetfiles: - decode_utf32_b.patch
2012年10月20日 19:06:21serhiy.storchakasetfiles: - decode_utf32_a.patch
2012年05月11日 19:46:42serhiy.storchakasetnosy: + ezio.melotti
components: + Unicode
2012年05月11日 19:25:28serhiy.storchakasetfiles: + decode_utf32_b_2.patch
2012年05月11日 19:24:54serhiy.storchakasetfiles: + decode_utf32_a_2.patch

messages: + msg160443
2012年04月23日 21:01:40serhiy.storchakasetmessages: + msg159078
2012年04月20日 21:38:12asvetlovsetnosy: + asvetlov
2012年04月20日 06:36:35Arfreversetnosy: + Arfrever
2012年04月19日 21:03:20vstinnersetnosy: + pitrou, vstinner
messages: + msg158750
2012年04月19日 21:00:28serhiy.storchakasetfiles: + decode_utf32_b.patch
2012年04月19日 20:59:56serhiy.storchakacreate

AltStyle によって変換されたページ (->オリジナル) /