Message190475

| Field | Value |
|---|---|
| Author | ebfe |
| Recipients | asvetlov, christian.heimes, ebfe, gregory.p.smith, isoschiz, jcea, mark.dickinson, neologix, pitrou, python-dev, rhettinger, serhiy.storchaka, skrah, tim.peters, vstinner |
| Date | 2013-06-02 09:30:21 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za> |
| In-reply-to | |

Content
I was investigating a callgrind dump of my code, which showed how badly unicode_hash() was affecting my performance. Using Google's CityHash instead of the builtin algorithm to hash unicode objects improves overall performance by about 15 to 20 percent in my case - that is quite a lot.
Valgrind shows that the share of instructions spent in unicode_hash() drops from ~20% to ~11%. By Amdahl's law, that roughly two-fold speedup of the hash function itself shrinks to the overall improvement of about 15 percent mentioned above.
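For reference, a back-of-the-envelope Amdahl's-law check of those numbers (a sketch only; it assumes the callgrind instruction shares map directly onto run time, which they only roughly do):

```python
# Rough Amdahl's-law sanity check, using the instruction shares from the
# callgrind runs above (assumption: instruction share ~ time share).
hash_share_before = 0.20   # fraction spent in unicode_hash() with the builtin hash
hash_share_after = 0.11    # fraction spent hashing with CityHash

# If only the hash function got faster by a factor s, the new hashing share is
#   (f/s) / ((1 - f) + f/s)   with f = hash_share_before.
# Solving that for s gives the local speedup of the hash function itself:
local_speedup = (hash_share_before * (1 - hash_share_after)
                 / (hash_share_after * (1 - hash_share_before)))

# Amdahl's law for the whole program:
overall_speedup = 1 / ((1 - hash_share_before) + hash_share_before / local_speedup)

print(f"hashing itself got ~{local_speedup:.1f}x faster")
print(f"overall program speedup ~{(overall_speedup - 1) * 100:.0f}%")
```

With the 20% and 11% shares this gives a local speedup of about 2x and an overall gain of roughly 11%, in the same ballpark as the 15-20% observed in practice (instruction counts do not translate one-to-one into wall-clock time).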
CityHash was chosen because of its MIT license and its advertised performance on short strings.
I've now found this bug and attached a log of haypo's benchmark, which compares the native hash against CityHash. Hash caching was disabled during the test. CityHash was compiled with -O3 -msse4.2 (CityHash can use the CPU's native CRC instructions). CPython's unit tests fail because of known_hash and the gdb output; apart from that, everything seems to work fine.
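For context, the known_hash failures are the expected kind of breakage: some tests assert hash values that were precomputed with the builtin algorithm, so they trip on any algorithm change even though hashing still behaves correctly. A hypothetical sketch of such a test (the constant and test name are invented here, and it assumes hash randomization is disabled, e.g. PYTHONHASHSEED=0):

```python
# Hypothetical illustration, not the actual CPython test.  Tests of this kind
# pin the hash algorithm by comparing against a literal value captured with
# the builtin hash; swapping in CityHash makes the assertion fail.
import unittest

class KnownHashTest(unittest.TestCase):
    def test_known_string_hash(self):
        precomputed = 1843349550  # invented value, stands in for the old algorithm's output
        self.assertEqual(hash("spam"), precomputed)

if __name__ == "__main__":
    unittest.main()
```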
CityHash is advertised for its performance on short strings, which does not really show in the benchmark. Longer strings, however, perform *much* better.
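For anyone who wants to eyeball the length dependence from plain Python, a rough timeit sketch (the string sizes and iteration count are arbitrary; since str objects cache their hash, a fresh object is built on every call by decoding from bytes, and the decode cost is measured separately and subtracted):

```python
import timeit

def hash_cost(length, number=50000):
    """Approximate per-call cost of hashing a fresh ASCII string of `length` chars."""
    blob = b"x" * length
    # hash() on an existing str is cached after the first call, so decode a
    # new str object each iteration and subtract the cost of the decode alone.
    total = timeit.timeit(lambda: hash(blob.decode("ascii")), number=number)
    baseline = timeit.timeit(lambda: blob.decode("ascii"), number=number)
    return (total - baseline) / number

for n in (8, 64, 1024, 65536):
    print(f"{n:>6} chars: {hash_cost(n) * 1e9:.0f} ns/hash")
```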
If people are interested, I can repeat the test on an armv7l machine.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2013-06-02 09:30:24 | ebfe | set | recipients: + ebfe, tim.peters, rhettinger, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, asvetlov, skrah, neologix, python-dev, serhiy.storchaka, isoschiz |
| 2013-06-02 09:30:24 | ebfe | set | messageid: <1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za> |
| 2013-06-02 09:30:23 | ebfe | link | issue16427 messages |
| 2013-06-02 09:30:23 | ebfe | create | |