This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2012-03-10 22:19 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description | Edit |
|---|---|---|---|
| utf16_decoder_union.patch | vstinner, 2012-03-20 00:30 | | review |
| utf16_decoder_shift.patch | serhiy.storchaka, 2012-03-20 13:50 | | review |
| utf16_decoder_shift_2.patch | serhiy.storchaka, 2012-03-20 14:23 | | review |
| utf16_decoder_shift_3.patch | serhiy.storchaka, 2012-03-30 08:47 | Fixed for big-endian platforms. | review |
| Messages (18) | |||
|---|---|---|---|
| msg155357 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012-03-10 22:19 | |
There are a couple of aliasing warnings in non-debug mode. For example: http://www.python.org/dev/buildbot/all/builders/x86%20Gentoo%20Non-Debug%203.x/builds/1741

Objects/object.c:293: warning: ignoring return value of 'fwrite', declared with attribute warn_unused_result
Objects/object.c:302: warning: ignoring return value of 'fwrite', declared with attribute warn_unused_result
Objects/unicodeobject.c:5533: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5533: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5533: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5523: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5523: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
cc1: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
cc1: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
cc1: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5523: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5523: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5533: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5533: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
Objects/unicodeobject.c:5533: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
cc1: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
cc1: warning: dereferencing pointer 'pblock' does break strict-aliasing rules
cc1: warning: dereferencing pointer 'pblock' does break strict-aliasing rules |
|||
| msg155432 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2012-03-12 02:47 | |
gcc 4.5 doesn't warn for me. Is this a compiler bug in 4.4 or 4.5? That is, are these actual aliasing violations? |
|||
| msg155451 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012-03-12 15:43 | |
Benjamin Peterson <report@bugs.python.org> wrote:
> gcc 4.5 doesn't warn for me. Is this a compiler bug in 4.4 or 4.5?
> That is, are these actual aliasing violations?

I see this with 4.4, but also with 4.6 when using -Wstrict-aliasing=2. However, nothing bad happens when I compile with -fstrict-aliasing. I think, but I would have to consult the standard again to be sure, that technically this might be an aliasing violation in C99, since 'pblock' does not have the type of 'block'.

unsigned long block = * (unsigned long *) _q;
unsigned short *pblock = (unsigned short *) &block; |
|||
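To make the aliasing question concrete, here is a minimal sketch of the pattern the compiler complains about and the union-based alternative that utf16_decoder_union.patch adopts. The names and the use of `uint64_t` (instead of `long`) are illustrative, not the actual decoder code:

```c
#include <stdint.h>
#include <string.h>

/* The decoder read a 64-bit block, then re-read it through an
 * unsigned short lvalue of a different type, which is what the C99
 * strict-aliasing rules forbid:
 *
 *     unsigned long block = *(unsigned long *) _q;
 *     unsigned short *pblock = (unsigned short *) &block;  // UB on read
 *
 * Reading through a union member instead is well defined under GCC
 * (and blessed by a C99 TC3 footnote), and silences the warning. */
typedef union {
    uint64_t whole;
    uint16_t units[4];
} block_t;

static uint16_t unit_via_union(uint64_t block, unsigned i)
{
    block_t b;
    b.whole = block;
    return b.units[i];          /* no aliasing violation */
}

/* memcpy is the other standards-blessed way to type-pun; on any
 * platform it must agree with the union read. */
static uint16_t unit_via_memcpy(uint64_t block, unsigned i)
{
    uint16_t u;
    memcpy(&u, (unsigned char *) &block + i * sizeof u, sizeof u);
    return u;
}
```

Both helpers return the code units in platform byte order; the point is only that each is a legal way to reinterpret the block, unlike the pointer cast.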
| msg156372 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-03-20 00:30 | |
The attached patch uses a union to silence the compiler warning. It should not speed up Python, because the function already ensures that the pointer is aligned to the size of a long. It may slow the function down; I don't know gcc well enough to guess the exact impact on performance. An alternative is to use __attribute__((__may_alias__)), a GCC-specific attribute. I don't know the performance impact of that attribute either. |
|||
| msg156406 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-03-20 13:50 | |
What if we add a bit more hacking? If a long integer is already used for buffering and checking, let's use it for swapping and splitting too. With my patch (attached), codecs.utf_16_be_decode runs 5% faster (on 32-bit Linux; I have not tested 64-bit). And of course, no pointers -- no aliasing warnings. |
|||
| msg156407 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-03-20 14:04 | |
> With my patch (attached) codecs.utf_16_be_decode runs 5% faster (on 32-bit Linux, I have not tested 64-bit). And of course no pointers -- no aliasing warnings.

Your patch is wrong: you need to use & 0xffff to get the lower 16 bits when reading a UTF-16 unit. For example, (Py_UCS2)(block >> 32) should be written (Py_UCS2)((block >> 32) & 0xffff). |
|||
| msg156408 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-03-20 14:20 | |
Heh. This was in a previous version of my patch. I removed '& 0xFFFFu' and the parentheses for simplicity; GCC produces the same binaries for both sources. But you can put it back. It has an effect only on platforms with a non-16-bit short, and Python does not support those now (Python is not very portable to exotic platforms). |
|||
| msg156416 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-03-20 14:54 | |
> It has an effect only on platforms with a non-16-bit short

The problem is for a 64-bit long: "long >> 32" returns the 32 high bits (32..64), not 16 bits (32..48). |
|||
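The masking point under discussion can be sketched as follows (illustrative names, with `uint64_t` standing in for a 64-bit long; this is not the patch itself). A cast to a 16-bit type already truncates, so the explicit `& 0xFFFF` is redundant where such a cast is applied -- but without either the cast or the mask, `block >> 32` keeps bits 32..63, not just one code unit:

```c
#include <stdint.h>

/* Extract the i-th 16-bit UTF-16 code unit from a little-endian
 * 64-bit block.  The cast to uint16_t already discards the high
 * bits; the & 0xFFFFu mask makes the truncation explicit. */
static uint16_t unit_at(uint64_t block, unsigned i)
{
    return (uint16_t)((block >> (16 * i)) & 0xFFFFu);
}

/* Without a mask or a 16-bit cast, the shift alone is not enough:
 * it keeps ALL bits above position 16*i, not one code unit. */
static uint64_t unmasked(uint64_t block, unsigned i)
{
    return block >> (16 * i);
}
```

For example, with `block = 0x0004000300020001`, `unit_at(block, 2)` yields `3`, while `unmasked(block, 2)` yields `0x00040003` -- two code units fused together.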
| msg156433 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-03-20 16:53 | |
"(unsigned short)(long >> 32)" returns the 16 bits (32..48) if short is 16-bit. I agree that this variant is stricter and more reliable (and it was my original version), and if you do not find it verbose and redundant, so be it. The difference would be noticeable only on a very exotic platform (with 9-bit chars, for example), where the original code would not work either. Frankly, this straightforward patch contains less hackery than the original code.

I made a mistake with the microbenchmark. In fact, the speedup is not 5% but 20-40%.

./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = ("\u263A" * 1000).encode("utf-16be")' 'd(x)'
|
|||
| msg157130 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-03-30 08:47 | |
I'm sorry. Here is the corrected patch for big-endian platforms. |
|||
| msg157566 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-04-05 11:44 | |
New changeset 2c514c382a2a by Victor Stinner in branch 'default':
Close #14249: Use an union instead of a long to short pointer to avoid aliasing
http://hg.python.org/cpython/rev/2c514c382a2a |
|||
| msg157567 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-04-05 11:46 | |
Results of the benchmark before/after my commit. I prefer a union over manually manipulating a long as shorts or bytes, because I think the compiler knows better how to optimize operations on integers.
unpatched:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 4.64 usec per loop
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = ("\u263A" * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 5.87 usec per loop
patched:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 3.53 usec per loop
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = ("\u263A" * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 4.85 usec per loop
|
|||
| msg157612 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-04-05 19:46 | |
What compiler are you using? With gcc 4.4 on a 32-bit Linux netbook I get:

                            unpatched  union  shift
  utf-16le " "*10000           129      126    109
  utf-16le "\u263A"*10000      208      203    160
  utf-16be " "*10000           153      147    114
  utf-16be "\u263A"*10000      226      227    167

The difference is that for the shift version the compiler keeps block in a register, while for the union it stores block in memory so that it can take its address. Maybe more recent compilers have learned to do this more effectively? Besides, shifts are more typical of CPython code; a search shows very few uses of union in the source. |
|||
| msg157614 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012-04-05 20:03 | |
On 64-bit Linux with gcc-4.4 I get:
Unpatched:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 4.1 usec per loop
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = ("\u263A" * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 5.87 usec per loop
2c514c382a2a:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 3.68 usec per loop
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = ("\u263A" * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 4.72 usec per loop
utf16_decoder_shift_3.patch:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 2.23 usec per loop
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = ("\u263A" * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 3.11 usec per loop
|
|||
| msg157615 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012-04-05 20:17 | |
Linux, 64-bit, Intel Core i5 2500:
-> unpatched:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 2.99 usec per loop
-> Victor's commit:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
100000 loops, best of 3: 2.83 usec per loop
-> utf16_decoder_shift_3.patch:
$ ./python -m timeit -s 'import codecs; d = codecs.utf_16_be_decode; x = (" " * 1000).encode("utf-16be")' 'd(x)'
1000000 loops, best of 3: 1.88 usec per loop
It seems that the wrong patch was committed.
|
|||
| msg157620 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-04-05 20:54 | |
New changeset 489f252b1f8b by Victor Stinner in branch 'default':
Close #14249: Use bit shifts instead of an union, it's more efficient.
http://hg.python.org/cpython/rev/489f252b1f8b |
|||
| msg157622 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-04-05 21:00 | |
Ok, the benchmarks have spoken, amen. I applied Serhiy Storchaka's patch (version 3). I just replaced the expressions in calls to Py_MAX with variables: Py_MAX is a macro and it may have to compute each expression twice. I didn't check whether it's more or less efficient. Thanks Serhiy for your contribution! I added your name to Misc/ACKS.

We may change Py_MIN/Py_MAX to something more efficient using typeof():

#define max(a,b) \
    ({ typeof (a) _a = (a); \
       typeof (b) _b = (b); \
       _a > _b ? _a : _b; })

I don't know which C compilers support it, but gcc does, at least. |
|||
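The double-evaluation hazard behind hoisting Py_MAX's arguments into variables can be demonstrated directly. This is illustrative code, not CPython's; the statement-expression version requires GCC or Clang:

```c
/* A textbook max macro evaluates each argument expression twice
 * (once in the comparison, once in the selected branch), so side
 * effects and expensive calls run twice.  The GCC statement
 * expression evaluates each argument exactly once. */
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

#define GNU_MAX(a, b)                  \
    ({ __typeof__(a) _a = (a);         \
       __typeof__(b) _b = (b);         \
       _a > _b ? _a : _b; })

static int calls = 0;

static int tick(void)               /* side-effecting argument */
{
    return ++calls;                 /* returns 1, 2, 3, ... */
}

static int naive_demo(void)
{
    calls = 0;
    return NAIVE_MAX(tick(), 0);    /* tick() runs for the test AND the result */
}

static int gnu_demo(void)
{
    calls = 0;
    return GNU_MAX(tick(), 0);      /* tick() runs once */
}
```

With the naive macro, `naive_demo()` returns 2 (the second call's value) and leaves `calls == 2`; the GNU version returns 1 with `calls == 1`, which is why Victor's hoisted-variable rewrite and the typeof() sketch are equivalent in effect.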
| msg157624 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-04-05 21:24 | |
> I just replaced the expressions in calls to Py_MAX with variables: Py_MAX is a macro and it may have to compute each expression twice.

gcc computes those values only once. It even caches them for use in PyUnicode_WRITE. But other compilers may not be so smart. Instead of Py_MAX(a,b) here you can use a|b. In theory this should be more efficient, but I couldn't see the difference even with a microscope. However, all this does not matter: soon I will submit a larger patch, which speeds up the utf-16 decoder by 2-5 times. |
|||
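The a|b trick works here, plausibly, because of how the maximum is used: it only has to land in the right power-of-two bucket (the maxchar thresholds 0x80, 0x100, 0x10000 used to pick a string's storage kind). Bitwise OR never sets a bit that neither operand has, so a|b stays below a power-of-two limit exactly when max(a,b) does, even though a|b can exceed the true maximum. A small sketch (illustrative, not the decoder code):

```c
#include <stdint.h>

static uint32_t classic_max(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* a|b >= max(a,b) always, and for a power-of-two limit L:
 *   (a|b) < L  <=>  a < L && b < L  <=>  max(a,b) < L,
 * so OR selects the same storage kind as the true maximum. */
static int same_bucket(uint32_t a, uint32_t b, uint32_t limit)
{
    return ((a | b) < limit) == (classic_max(a, b) < limit);
}
```

For instance, 0x41 | 0x263A is 0x267B, which overshoots the true maximum 0x263A but still falls in the same under-0x10000 bucket, so the kind decision is unchanged.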
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:27 | admin | set | github: 58457 |
| 2012-04-05 21:24:25 | serhiy.storchaka | set | messages: + msg157624 |
| 2012-04-05 21:00:31 | vstinner | set | messages: + msg157622 |
| 2012-04-05 20:54:48 | python-dev | set | status: open -> closed resolution: fixed messages: + msg157620 stage: commit review -> resolved |
| 2012-04-05 20:17:29 | pitrou | set | status: closed -> open messages: + msg157615 assignee: vstinner resolution: fixed -> (no value) stage: resolved -> commit review |
| 2012-04-05 20:03:55 | skrah | set | messages: + msg157614 |
| 2012-04-05 19:46:23 | serhiy.storchaka | set | messages: + msg157612 |
| 2012-04-05 11:46:02 | vstinner | set | messages: + msg157567 |
| 2012-04-05 11:44:34 | python-dev | set | status: open -> closed nosy: + python-dev messages: + msg157566 resolution: fixed stage: resolved |
| 2012-03-30 08:47:13 | serhiy.storchaka | set | files: + utf16_decoder_shift_3.patch messages: + msg157130 |
| 2012-03-20 16:53:24 | serhiy.storchaka | set | messages: + msg156433 |
| 2012-03-20 14:54:01 | vstinner | set | messages: + msg156416 |
| 2012-03-20 14:23:15 | serhiy.storchaka | set | files: + utf16_decoder_shift_2.patch |
| 2012-03-20 14:20:16 | serhiy.storchaka | set | messages: + msg156408 |
| 2012-03-20 14:04:48 | vstinner | set | messages: + msg156407 |
| 2012-03-20 13:50:20 | serhiy.storchaka | set | files: + utf16_decoder_shift.patch nosy: + serhiy.storchaka messages: + msg156406 |
| 2012-03-20 00:30:50 | vstinner | set | files: + utf16_decoder_union.patch nosy: + pitrou messages: + msg156372 keywords: + patch |
| 2012-03-12 15:43:24 | skrah | set | messages: + msg155451 |
| 2012-03-12 02:47:33 | benjamin.peterson | set | nosy: + benjamin.peterson messages: + msg155432 |
| 2012-03-10 22:19:52 | skrah | create | |