Issue 14339: Optimizing bin, oct and hex

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58547

classification

Title:	Optimizing bin, oct and hex
Type:	performance	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.3

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	mark.dickinson	Nosy List:	eric.smith, ezio.melotti, loewis, mark.dickinson, pitrou, python-dev, serhiy.storchaka, skrah, vstinner
Priority:	normal	Keywords:	patch

Created on 2012年03月16日 21:09 by serhiy.storchaka, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
long_to_binary_base.patch	serhiy.storchaka, 2012年03月16日 21:09	review
long_to_binary_base_2.patch	serhiy.storchaka, 2012年04月15日 20:32	review
long_to_binary_base_3.patch	serhiy.storchaka, 2012年04月16日 09:02	review

Messages (20)
msg156085 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年03月16日 21:09
This patch slightly optimize conversion int to string with base 2, 8 or 16 by calculating exactly number of digits and avoiding unnecessary memory allocation. The code is similar to used for decimal conversion. Without patch: $ ./python -m timeit -s 'x=1' 'bin(x)' 1000000 loops, best of 3: 0.295 usec per loop $ ./python -m timeit -s 'x=1' 'oct(x)' 1000000 loops, best of 3: 0.327 usec per loop $ ./python -m timeit -s 'x=1' 'hex(x)' 1000000 loops, best of 3: 0.298 usec per loop $ ./python -m timeit -s 'x=9999' 'bin(x)' 100000 loops, best of 3: 11.9 usec per loop $ ./python -m timeit -s 'x=9999' 'oct(x)' 100000 loops, best of 3: 4.55 usec per loop $ ./python -m timeit -s 'x=9999' 'hex(x)' 100000 loops, best of 3: 3.99 usec per loop With patch: $ ./python -m timeit -s 'x=1' 'bin(x)' 1000000 loops, best of 3: 0.213 usec per loop $ ./python -m timeit -s 'x=1' 'oct(x)' 1000000 loops, best of 3: 0.228 usec per loop $ ./python -m timeit -s 'x=1' 'hex(x)' 1000000 loops, best of 3: 0.215 usec per loop $ ./python -m timeit -s 'x=9999' 'bin(x)' 100000 loops, best of 3: 9.76 usec per loop $ ./python -m timeit -s 'x=9999' 'oct(x)' 100000 loops, best of 3: 3.58 usec per loop $ ./python -m timeit -s 'x=9999' 'hex(x)' 100000 loops, best of 3: 3.3 usec per loop
msg157234 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012年03月31日 22:54
While optimizing stuff is nice, I'm not sure there's a use case here. Does it significantly speed up a real-world workload you're having?
msg157260 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年04月01日 07:10
No, I do not think that any application will significantly speeded up. In fact, it is not optimization, and getting rid of the apparent pessimization. In addition to increasing the speed, memory fragmentation is reduced. The patch has a minimum size, the new code is not larger and not more complex than the old one, it is similar to code already used in this file, there are small nonconditional advantages. I see no reason not to accept the changes.
msg158351 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月15日 18:48
A few comments: (1) The patch appears to assume that a Unicode string created with PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND. While this might be true in the current implementation, I don't know whether this is guaranteed in general. Martin, any comments on this? (2) The patch doesn't compile with '--with-pydebug': there's a reference to the (now) undefined variable 'buffer' in one of the asserts. (3) The overflow check looks as though it needs to be reworked.
msg158366 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年04月15日 20:32
> (1) The patch appears to assume that a Unicode string created with PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND. While this might be true in the current implementation, I don't know whether this is guaranteed in general. Martin, any comments on this? The same assumption is used above in long_to_decimal_string(). I added the same assert and changed maxchar to 'f'. > (2) The patch doesn't compile with '--with-pydebug': there's a reference to the (now) undefined variable 'buffer' in one of the asserts. Fixed. > (3) The overflow check looks as though it needs to be reworked. I was left an old constraint (it is enough), but it really can be weakened (in fact, it can be so weakened in the current code). Thank you for the comments and the found error.
msg158402 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月16日 07:14
Thanks for the updates. > The same assumption is used above in long_to_decimal_string(). I added > the same assert and changed maxchar to 'f'. I think 'x' would be more appropriate. :-)
msg158405 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年04月16日 09:02
> I think 'x' would be more appropriate. :-) Oops. You are right, of cause.
msg158441 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年04月16日 13:25
> (1) The patch appears to assume that a Unicode string created with > PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND. > While this might be true in the current implementation, I don't know > whether this is guaranteed in general. Martin, any comments on > this? There is a guarantee that the shortest form must always be used. So as long as the Unicode representation doesn't fundamentally change again, this property is indeed guaranteed.
msg158445 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月16日 13:38
> There is a guarantee that the shortest form must always be used. Okay, sounds good. Thanks.
msg158881 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年04月20日 20:21
New changeset dcd3344b6d5b by Mark Dickinson in branch 'default': Issue #14339: Improve speed of bin, oct and hex builtins. Patch by Serhiy Storchaka (with minor modifications). http://hg.python.org/cpython/rev/dcd3344b6d5b
msg158882 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月20日 20:23
Thanks for the patch. I made some minor changes, notably moving the overflow check closer to where it's needed, moving some comments around, and removing a (possibly inappropriate) PyErr_NoMemory call.
msg158884 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年04月20日 20:41
This should have waited until Serhiy submits a contributor form. Serhiy, can you please do this soon? Else I'll have to revert the change.
msg158890 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月20日 20:49
Aargh. Sorry, yes. Serhiy, can you do this? The form can be found at: http://www.python.org/psf/contrib/contrib-form-python/
msg158891 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月20日 20:52
Whoops; that looks like a slightly older version of the form. I think the correct one is: http://www.python.org/psf/contrib/contrib-form/
msg159076 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年04月23日 21:00
I can only do this tomorrow.
msg159220 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年04月24日 22:06
2.100 - v = PyUnicode_DecodeASCII(p, &buffer[sz] - p, NULL); ... 2.104 + assert(p == PyUnicode_1BYTE_DATA(v)); 2.105 return v; 2.106 } You may call assert(_PyUnicode_CheckConsistency(v, 1)) to ensure that the newly created string is "consistent" (see the function for the details).
msg159440 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年04月27日 06:33
Serhiy: what's the status of your contributor form? Note that you can also fax it, or scan it and send it by email.
msg159449 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年04月27日 11:19
> You may call assert(_PyUnicode_CheckConsistency(v, 1)) to ensure > that the newly created string is "consistent" (see the function > for the details). Done in the following changeset: changeset: 76560:3bdcf0cab164 parent: 76558:5fea362b92fc user: Victor Stinner <victor.stinner@gmail.com> date: Thu Apr 26 00:37:21 2012 +0200 files: Objects/longobject.c description: long_to_decimal_string() and _PyLong_Format() check the consistency of newly created strings using _PyUnicode_CheckConsistency() in debug mode
msg159463 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年04月27日 14:52
I sent the signed form. Martin, sorry for the delay. Mark, sorry, that involuntarily let you down.
msg159498 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2012年04月27日 21:51
Serhiy: thanks for submitting the form! No need to apologise: it was my bad for making the commit prematurely. And now that I see that all-important asterisk next to your name, I'll reclose the issue. :-)

History
Date	User	Action	Args
2022年04月11日 14:57:28	admin	set	github: 58547
2012年04月27日 21:51:23	mark.dickinson	set	status: open -> closed messages: + msg159498
2012年04月27日 14:52:33	serhiy.storchaka	set	messages: + msg159463
2012年04月27日 11:19:54	vstinner	set	messages: + msg159449
2012年04月27日 06:33:04	loewis	set	messages: + msg159440
2012年04月24日 22:06:27	vstinner	set	nosy: + vstinner messages: + msg159220
2012年04月23日 21:00:53	serhiy.storchaka	set	messages: + msg159076
2012年04月20日 20:52:55	mark.dickinson	set	messages: + msg158891
2012年04月20日 20:49:55	mark.dickinson	set	messages: + msg158890
2012年04月20日 20:41:08	loewis	set	status: closed -> open messages: + msg158884
2012年04月20日 20:23:27	mark.dickinson	set	status: open -> closed resolution: fixed messages: + msg158882
2012年04月20日 20:21:46	python-dev	set	nosy: + python-dev messages: + msg158881
2012年04月17日 11:16:22	mark.dickinson	set	assignee: mark.dickinson
2012年04月16日 13:38:49	mark.dickinson	set	messages: + msg158445
2012年04月16日 13:25:25	loewis	set	messages: + msg158441
2012年04月16日 09:02:31	serhiy.storchaka	set	files: + long_to_binary_base_3.patch messages: + msg158405
2012年04月16日 07:14:11	mark.dickinson	set	messages: + msg158402
2012年04月15日 20:32:11	serhiy.storchaka	set	files: + long_to_binary_base_2.patch messages: + msg158366
2012年04月15日 18:48:12	mark.dickinson	set	nosy: + loewis messages: + msg158351
2012年04月14日 04:28:50	ezio.melotti	set	nosy: + ezio.melotti
2012年04月01日 07:10:29	serhiy.storchaka	set	messages: + msg157260
2012年03月31日 22:54:28	pitrou	set	nosy: + pitrou messages: + msg157234
2012年03月17日 02:05:24	eric.smith	set	nosy: + eric.smith
2012年03月16日 23:33:08	pitrou	set	nosy: + mark.dickinson, skrah components: + Interpreter Core, - Library (Lib) stage: patch review
2012年03月16日 21:09:30	serhiy.storchaka	create

homepage