homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: base64.urlsafe_b64**code are too slow
Type: performance Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: Devin Jeanpierre, gvanrossum, jcea, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: easy, patch

Created on 2012年06月22日 14:59 by gvanrossum, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
base64.patch gvanrossum, 2012年06月22日 16:55 review
base64.patch gvanrossum, 2012年06月22日 17:19 review
base64_27.diff Devin Jeanpierre, 2015年05月30日 18:56
Messages (18)
msg163420 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 14:59
This bug is still present in Python 3.3. Some people use urlsafe strings *a lot*. And they find that the (obvious) implementation in base64.py is way too slow. In fact, I found myself writing the following a few times in App Engine libraries:
 # This is 3-4x faster than urlsafe_b64decode() 
 urlsafe = base64.b64encode(<arg>)
 return urlsafe.rstrip('=').replace('+', '-').replace('/', '_')
Someone from YouTube told me they patch the stdlib for the same reason (they claim their solution is 10x faster).
IIUC the cause of the slowness is the (mostly irrelevant) generality in b64encode and the _translate() function.
msg163422 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年06月22日 15:32
This is a feature request, not a bug. Marking for 3.3.
msg163424 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年06月22日 15:42
Guido, would your patch be enough?. Is a 3x improvement enough?. What improvement could be acceptable?. Implemented in C in "binascii" module?
msg163430 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 16:28
Why would it be a feature? There is no change in semantics, only in speed. Do we have a policy that forbids performance enhancements in bugfix releases? (It doesn't really matter, I'm just curious. Also, if it really was a feature request, it couldn't be introduced once the 3.3 beta has started, which would strike me as silly.)
Anyway, I don't think my hack should be applied literally; I was more thinking of memoizing the translation table and inlining the _translate call.
msg163432 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年06月22日 16:38
> Do we have a policy that forbids performance enhancements in bugfix
> releases?
Yes, except when it fixes a regression. I don't know where that policy is codified, but it seems quite ingrained.
It also means the window for getting it in 3.3 is getting narrow, the beta being so soon :)
msg163435 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 16:55
Here's a patch. It makes urlsafe_b64{en,de}code about 4x faster on my machine. The tests in base64_test.py still pass.
msg163439 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年06月22日 17:11
$ ./python -m timeit -s "from base64 import urlsafe_b64encode; url=bytes(range(256))" "urlsafe_b64encode(url)"
10000 loops, best of 3: 73.5 usec per loop
$ ./python -m timeit -s "from base64 import b64encode; url=bytes(range(256))" "b64encode(url).replace(b'+', b'-').replace(b'/', b'_')"
100000 loops, best of 3: 18.9 usec per loop
$ ./python -m timeit -s "from base64 import b64encode; url=bytes(range(256)); tr=bytes.maketrans(b'+/', b'-_')" "b64encode(url).translate(tr)"
100000 loops, best of 3: 15.6 usec per loop
msg163440 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 17:19
Heh. I'd forgotten about maketrans. Here's a patch that kills the _translate() helper function completely.
msg163441 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年06月22日 17:26
You're faster than me.
I think that asserts should be replaced by if/raise.
msg163443 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 17:28
I didn't touch that assert.
msg163475 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年06月22日 20:07
Antoine thinks so: http://bugs.python.org/issue15139 :-)
I think we should have an official policy, though.
msg163494 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 21:38
So, can I check this into 3.3?
msg163496 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年06月22日 21:52
I haven't tested the patch, but it looks fine to me.
msg163499 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012年06月22日 22:19
Submitted.
msg244485 - (view) Author: Devin Jeanpierre (Devin Jeanpierre) * Date: 2015年05月30日 18:56
Here's a backport of the patch to 2.7. It's pretty rad, and basically identical to how YouTube monkeypatches base64.
Not sure what will happen to this patch. According to recent discussion on the list (e.g. https://mail.python.org/pipermail/python-dev/2015-May/140380.html ), performance improvements are open for inclusion in 2.7 if anyone wants to bother with merging this in and taking on the review / maintenance burden.
I'm OK with just publishing it for others to merge in with their own private versions of Python. It is only relevant if you use base64 a lot. :)
msg247672 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015年07月30日 14:26
I'm going to apply the 2.7 patch, except I'm going to leave the (now unused) _translate() function in place, in case it's used anywhere (minimizing the risk of breaking anything).
msg247675 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015年07月30日 14:50
New changeset 309cbbc32491 by Guido van Rossum in branch '2.7':
Issue #15138: Speed up base64.urlsafe_b64* considerably (2.7 backport).
https://hg.python.org/cpython/rev/309cbbc32491 
msg247677 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015年07月30日 14:58
And just in case someone objects to my breaking the policy stated by Antoine in msg163432: see http://bugs.python.org/issue4753 -- it's not actually a real policy.
History
Date User Action Args
2022年04月11日 14:57:31adminsetgithub: 59343
2015年07月30日 14:58:04gvanrossumsetmessages: + msg247677
versions: + Python 2.7
2015年07月30日 14:50:45python-devsetnosy: + python-dev
messages: + msg247675
2015年07月30日 14:26:56gvanrossumsetmessages: + msg247672
2015年05月30日 18:56:12Devin Jeanpierresetfiles: + base64_27.diff
nosy: + Devin Jeanpierre
messages: + msg244485

2012年06月22日 22:19:47gvanrossumsetstatus: open -> closed
resolution: fixed
messages: + msg163499
2012年06月22日 21:52:23pitrousetmessages: + msg163496
versions: - Python 3.4
2012年06月22日 21:39:43gvanrossumsetassignee: gvanrossum
stage: needs patch -> patch review
2012年06月22日 21:38:38gvanrossumsetmessages: + msg163494
2012年06月22日 20:07:26jceasetmessages: + msg163475
2012年06月22日 17:28:38gvanrossumsetmessages: + msg163443
2012年06月22日 17:26:01serhiy.storchakasetmessages: + msg163441
2012年06月22日 17:19:43gvanrossumsetfiles: + base64.patch

messages: + msg163440
2012年06月22日 17:11:05serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg163439
2012年06月22日 16:55:23gvanrossumsetfiles: + base64.patch
keywords: + patch
messages: + msg163435
2012年06月22日 16:38:01pitrousetnosy: + pitrou
messages: + msg163432
2012年06月22日 16:28:25gvanrossumsetmessages: + msg163430
2012年06月22日 15:42:01jceasetmessages: + msg163424
2012年06月22日 15:32:06jceasetnosy: + jcea

messages: + msg163422
versions: - Python 2.6, Python 3.1, Python 2.7, Python 3.2
2012年06月22日 14:59:59gvanrossumcreate

AltStyle によって変換されたページ (->オリジナル) /