Created on 2009-01-07 15:25 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

**Files**

| File name | Uploaded |
|---|---|
| utf8decode3.patch | pitrou, 2009-01-08 03:03 |
| utf8decode4.patch | amaury.forgeotdarc, 2009-01-08 13:11 |
| decode5.patch | pitrou, 2009-01-08 19:20 |
| decode6.patch | pitrou, 2009-01-08 20:37 |

**Messages (14)**

**msg79338** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-07 15:24

Here is a patch to speed up UTF-8 decoding. On a 64-bit build, the maximum speedup is around 30%, and on a 32-bit build around 15%. (*)

The patch may look disturbingly trivial, and I haven't studied the assembler output, but I think it is explained by the fact that having a separate loop counter breaks the register dependencies (when the 's' pointer was incremented, other operations had to wait for the incrementation to be committed).

[side note: UTF-8 encoding is still much faster than decoding, but that may be because it allocates a smaller object, regardless of the iteration count]

The same principle can probably be applied to the other decoding functions in unicodeobject.c, but first I wanted to know whether the principle is OK to apply. Marc-André, what is your take?

(*) The benchmark I used is:

./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'abcde'*1000" "c(s)"

More complex input also gets a speedup, albeit a smaller one (~10%):

./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'\xc3\xa9\xe7\xb4\xa2'*1000" "c(s)"
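
[Editor's note] The register-dependency argument above is easier to see with the two loop shapes side by side. The following is only an illustrative C sketch, not code from the patch; the function and variable names are made up for the example.

```c
#include <stddef.h>

/* Pointer-driven loop: the termination test and the store both depend on
   the value of 's' that was just incremented, so each iteration has to
   wait for the previous pointer update. */
static void
widen_with_pointer(const unsigned char *s, const unsigned char *end,
                   unsigned short *p)
{
    while (s < end)
        *p++ = (unsigned short)*s++;
}

/* Counter-driven loop: the index 'i' is updated independently of the
   loads and stores, which keeps the write-to-read dependency on the
   pointer out of the loop-control path. */
static void
widen_with_counter(const unsigned char *s, size_t n, unsigned short *p)
{
    size_t i;
    for (i = 0; i < n; i++)
        p[i] = (unsigned short)s[i];
}
```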

**msg79353** - Author: Martin v. Löwis (loewis) (Python committer) - Date: 2009-01-07 17:45

Can you please upload it to Rietveld?

I'm skeptical about changes that merely rely on the compiler's register allocator to do a better job. This kind of change tends to pessimize the code for other compilers, and may also pessimize it for future versions of the same compiler.

**msg79356** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-07 18:05

As I said, I don't think it's due to register allocation, but simply to avoiding register write-to-read dependencies by using separate variables for the loop count and the pointer.

I'm going to try under Windows (in a virtual machine, but it shouldn't make much difference since the workload is CPU-bound).

I've opened a Rietveld issue here: http://codereview.appspot.com/11681

**msg79358** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-07 18:30

Ha, the patch makes things slower on MSVC. The patch can probably be rejected, then.

(And interestingly, MSVC produces 40% faster code than gcc on my mini-benchmark, despite the virtual machine overhead.)

**msg79360** - Author: Marc-Andre Lemburg (lemburg) (Python committer) - Date: 2009-01-07 18:35

On 2009-01-07 16:25, Antoine Pitrou wrote:
> [...]
> The same principle can probably be applied to the other decoding
> functions in unicodeobject.c, but first I wanted to know whether the
> principle is ok to apply. Marc-André, what is your take?

I'm +1 on anything that makes codecs faster :-)

However, the patch should be checked with some other compilers as well, e.g. using MS VC++.

**msg79397** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-08 03:03

Reopening and attaching a more ambitious patch, based on the optimization of runs of ASCII characters. This time the speedup is much more impressive: up to 75% faster on pure ASCII input -- actually faster than latin1. The worst case (tight interleaving of ASCII and non-ASCII chars) shows an 8% slowdown.

(Performance measured with gcc and MSVC.)
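
[Editor's note] The usual way to skip over runs of ASCII quickly is to test a whole machine word at a time against a mask that has the high bit of every byte set: if the masked value is zero, all sizeof(long) bytes are plain ASCII and can be widened in bulk. The sketch below only illustrates that idea; it is not the code from utf8decode3.patch. The function name, the mask-building trick, and the unsigned short (UCS-2-style) output buffer are assumptions made for the example.

```c
#include <stddef.h>

/* A mask with the high bit of every byte set: 0x80808080UL with 32-bit
   longs, 0x8080808080808080UL with 64-bit longs. */
#define ASCII_CHAR_MASK ((unsigned long)-1 / 0xFF * 0x80)

/* Widen the leading ASCII run of s[0..size) into p and return the number
   of input bytes consumed.  Aligned words are tested in one go: if no
   byte has its high bit set, the whole word is ASCII. */
static size_t
widen_ascii_run(const unsigned char *s, size_t size, unsigned short *p)
{
    const unsigned char *start = s;
    const unsigned char *end = s + size;

    while (s < end) {
        /* Fast path: 's' is word-aligned and a full word is available. */
        if (((size_t)s & (sizeof(long) - 1)) == 0 && s + sizeof(long) <= end) {
            unsigned long data = *(const unsigned long *)s;
            if ((data & ASCII_CHAR_MASK) == 0) {
                size_t i;
                for (i = 0; i < sizeof(long); i++)
                    *p++ = s[i];
                s += sizeof(long);
                continue;
            }
        }
        if (*s & 0x80)
            break;              /* non-ASCII byte: leave it to the slow path */
        *p++ = *s++;            /* copy a single ASCII byte */
    }
    return (size_t)(s - start);
}
```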

**msg79409** - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) - Date: 2009-01-08 13:11

Very nice!

It seems that you can get slightly faster by not copying the initial char first: 's' is often already aligned at the beginning of the string, but not after the first copy... The attached patch (utf8decode4.patch) changes this and may enter the fast loop on the first character.

Does this idea apply to the encode function as well?

**msg79416** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-08 15:22

> The attached patch (utf8decode4.patch) changes this and may enter the
> fast loop on the first character.

Thanks!

> Does this idea apply to the encode function as well?

Probably, although with less efficiency (a long can hold 1, 2 or 4 unicode characters depending on the build).

The unrolling part also applies to simple codecs such as latin1. Unrolling PyUnicode_DecodeLatin1 a bit (4 copies per iteration) makes it twice as fast on non-tiny strings. I'll experiment with utf16.
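
[Editor's note] As a rough illustration of the unrolling mentioned here (not the actual PyUnicode_DecodeLatin1 change, and again assuming a UCS-2-style unsigned short output buffer), four copies per iteration could look like this:

```c
#include <stddef.h>

/* Latin-1 decoding is a pure widening copy (byte value == code point),
   so the loop body can simply be duplicated: four characters per
   iteration, then a byte-at-a-time tail for whatever is left over. */
static void
latin1_widen_unrolled(const unsigned char *s, size_t size, unsigned short *p)
{
    const unsigned char *end = s + size;

    while (s + 4 <= end) {
        p[0] = s[0];
        p[1] = s[1];
        p[2] = s[2];
        p[3] = s[3];
        p += 4;
        s += 4;
    }
    while (s < end)             /* remaining 0-3 bytes */
        *p++ = *s++;
}
```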

**msg79430** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-08 19:20

The attached patch adds acceleration for latin1 and utf16 decoding as well.

All three codecs (utf8, utf16, latin1) are now in the same ballpark performance-wise on favorable input: on my machine, they are able to decode at almost 1 GB/s.

(Unpatched, it is between 150 and 500 MB/s, depending on the codec.)

**msg79431** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-08 19:27

(PS: performance measured on UCS-2 and UCS-4 builds with gcc, and under Windows with MSVC)

**msg79432** - Author: Marc-Andre Lemburg (lemburg) (Python committer) - Date: 2009-01-08 19:37

Antoine Pitrou wrote:
> The attached patch adds acceleration for latin1 and utf16 decoding as well.
> [...]
> Added file: http://bugs.python.org/file12655/decode5.patch

A few style comments:

* please use indented pre-processor directives whenever possible, e.g.

      #if
      # define
      #else
      # define
      #endif

* the conditions should only accept SIZEOF_LONG == 4 and 8 and fail with an #error for any other value
* you should use unsigned longs instead of signed ones
* please use spaces around arithmetic operators, e.g. not a+b, but a + b
* when calling functions with lots of parameters, put each parameter on a new line (e.g. for unicode_decode_call_errorhandler())

Please also add a comment somewhere to the bit masks explaining what they do and how they are used. Verbose comments are always good to have when doing optimizations such as these. Have a look at the dictionary implementation for what I mean by this.

Thanks,
-- Marc-Andre Lemburg
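
[Editor's note] A tiny sketch of what the requested conditional might look like with those style comments applied. The macro name and mask values are assumptions carried over from the earlier sketch, not the patch's actual code; SIZEOF_LONG is assumed to come from CPython's pyconfig.h.

```c
/* ASCII_CHAR_MASK has the high bit of every byte set, so
   (data & ASCII_CHAR_MASK) == 0 means "every byte in this word is ASCII". */
#if SIZEOF_LONG == 8
# define ASCII_CHAR_MASK 0x8080808080808080UL
#elif SIZEOF_LONG == 4
# define ASCII_CHAR_MASK 0x80808080UL
#else
# error "unsupported SIZEOF_LONG"
#endif
```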

**msg79434** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-08 20:37

Marc-Andre, this patch should address your comments.

**msg79447** - Author: Marc-Andre Lemburg (lemburg) (Python committer) - Date: 2009-01-08 22:06

Antoine Pitrou wrote:
> Marc-Andre, this patch should address your comments.
>
> Added file: http://bugs.python.org/file12656/decode6.patch

Thanks. Much better!

BTW: I'd also change the variable name "word" to something different, e.g. bitmap or just data. It looks too much like a reserved word (which it isn't) ;-)

**msg79549** - Author: Antoine Pitrou (pitrou) (Python committer) - Date: 2009-01-10 15:46

I committed the patch with the last suggested change (word -> data) in py3k (r68483). I don't intend to backport it to trunk, but I suppose it wouldn't be too much work to do.

**History**

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:43 | admin | set | github: 49118 |
| 2010-04-04 03:26:17 | ezio.melotti | set | nosy: + ezio.melotti |
| 2009-01-10 15:46:28 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg79549 |
| 2009-01-08 22:06:30 | lemburg | set | messages: + msg79447 |
| 2009-01-08 20:38:00 | pitrou | set | files: + decode6.patch; messages: + msg79434 |
| 2009-01-08 19:37:37 | lemburg | set | messages: + msg79432 |
| 2009-01-08 19:27:38 | pitrou | set | messages: + msg79431 |
| 2009-01-08 19:20:19 | pitrou | set | files: + decode5.patch; messages: + msg79430 |
| 2009-01-08 17:06:39 | kevinwatters | set | nosy: + kevinwatters |
| 2009-01-08 15:22:31 | pitrou | set | messages: + msg79416 |
| 2009-01-08 13:11:21 | amaury.forgeotdarc | set | files: + utf8decode4.patch; nosy: + amaury.forgeotdarc; messages: + msg79409 |
| 2009-01-08 03:03:52 | pitrou | set | files: - utf8decode.patch |
| 2009-01-08 03:03:42 | pitrou | set | status: closed -> open; resolution: rejected -> (no value); messages: + msg79397; files: + utf8decode3.patch |
| 2009-01-07 18:35:13 | lemburg | set | messages: + msg79360 |
| 2009-01-07 18:30:50 | pitrou | set | status: open -> closed; resolution: rejected; messages: + msg79358 |
| 2009-01-07 18:05:18 | pitrou | set | messages: + msg79356 |
| 2009-01-07 17:45:57 | loewis | set | nosy: + loewis; messages: + msg79353 |
| 2009-01-07 15:25:03 | pitrou | create | |