Issue 26256: Fast decimalisation and conversion to other bases

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70444

classification

Title:	Fast decimalisation and conversion to other bases
Type:	enhancement	Stage:	resolved
Components:	Documentation	Versions:	Python 3.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	skrah	Nosy List:	cheryl.sabella, docs@python, jneb, mark.dickinson, rhettinger, serhiy.storchaka, skrah, vstinner
Priority:	normal	Keywords:	patch

Created on 2016年02月01日 10:28 by jneb, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
fastStr.py	jneb, 2016年02月01日 10:28	Fast conversion of integers to string (in any number base)

Pull Requests
URL	Status	Linked	Edit
PR 4808	merged	cheryl.sabella, 2017年12月12日 01:41
PR 11736	merged	miss-islington, 2019年02月02日 14:40
PR 11736	merged	miss-islington, 2019年02月02日 14:40
PR 11736	merged	miss-islington, 2019年02月02日 14:40

Messages (17)
msg259317 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016年02月01日 10:28
Inspired by the recently new discovered 49th Mersenne prime number I wrote a module to do high speed long/int to decimal conversion. I can now output the new Mersenne number in 18.5 minutes (instead of several hours) on my machine. For numbers longer than about 100000 bits, this routine it is faster than str(number) thanks to the Karatsuba multiplication in CPython. The module supports all number bases 2 through 36, and is written in pure python (both 2 and 3). There is a simple way to save more time by reusing the conversion object (saving about half the time for later calls). My suggestion is to incorporate this into some library, since Python still lacks a routine to convert to any number base. Ideally, it could be incorporated in the builtin str function, but this would need more work. When converting to C, it is recommended to optimize bases 4 and 32 the same way as oct, hex and bin do (which isn't easily accessible from Python). Hope you like it. At least, it was a lot of fun to write... Hope you like it.
msg259324 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2016年02月01日 14:02
Very nice. I'd suggest posting it as a recipe or as a Python package on PyPI for others to use. It's not really something that it would be appropriate to add to the Python core: Python isn't really intended for super-efficient high-precision arithmetic, and there are external libraries for that sort of thing (like gmpy2, for example).
msg259326 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016年02月01日 14:20
Thanks for the tip. I'll consider making it a recipe. - Jurjen
msg259402 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2016年02月02日 16:56
Is this the same algorithm as in issue3451?
msg259408 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2016年02月02日 18:07
Not really the same, since that issue was more about fast division. But it's the same spirit. It looks like I was more enthusiastic back then; now I'm older and grumpier. It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers. Cryptographic-size integers with a few hundred to a few thousand digits should be fine; number-theoretic work requiring hundreds of thousands of digits should use gmpy2 or something similar.
msg259421 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2016年02月02日 19:10
Jurjen, this is very nice! -- Like Mark, I'm not sure if this should be in CPython. Decimal (Python >= 3.3) has sneaked in a couple of fast bignum algorithms, so calculating and converting the latest Mersenne prime takes a couple of seconds: from decimal import * c = getcontext() c.prec = MAX_PREC c.Emax = MAX_EMAX c.Emin = MIN_EMIN x = Decimal(2)**74207281 - 1 s = str(x) assert s[:12] == "300376418084" assert s[-12:] == "391086436351"
msg259465 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016年02月03日 09:42
OMG is decimal that fast? Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations. It only says that precision "can be as large as needed for a given problem", but I never realized that that included millions of digits. It computed to my complete surprise 2229 to full precision (161 million decimal digits) in about a minute. That smells suspiciously like FFT style multiplication, which implies that it is way more sophisticated than integer multiplication! I suggest that the documentation of the decimal module recommends using decimal for multiprecision computations, as long as you use the builtin version.
msg259466 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2016年02月03日 09:51
"It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers." Yeah I agree. Maybe we need to explain that somewhere? In the devguide? Even in Python doc? I know that they are super crazy^W fast algorithm to handle very large numbers (hum, what is large? more than 4096 bits?), but they are usually very complex. Projects like GMP have ultra fast code with optimizations in assemblers. Having assembler implementation is clearly out of the scope of Python. "Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations." Well, nowhere means: https://docs.python.org/dev/whatsnew/3.3.html#decimal Usually, we don't document performances of a function, maybe only its complexity. -- You may move to the numpy community which is problably more keen on such optimization than the Python core.
msg259467 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016年02月03日 10:00
That reference you gave says that the binary version is faster than the Python version, but here the _complexity_ actually changed by a lot. Only people who know the library by name will notice that it is this fast. But you are right, for 99% of the people it doesn't make much of a difference.
msg259471 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2016年02月03日 10:41
On Wed, Feb 03, 2016 at 09:51:53AM +0000, STINNER Victor wrote: > Well, nowhere means: > https://docs.python.org/dev/whatsnew/3.3.html#decimal Okay, but hardly anyone reads that, and I can't blame them. :) For example, if I use something like Lua, I won't read a release announcement from 2012. > Usually, we don't document performances of a function, maybe only its complexity. As Jurjen said, the presence of the FFT (actually fast number theoretic transform here) in Decimal is something completely unexpected, so if there's a place for a line or two in the docs that makes some users happy I'd be in favor of it.
msg300769 - (view)	Author: Cheryl Sabella (cheryl.sabella) * (Python committer)	Date: 2017年08月23日 22:39
Hello, I came across this issue and was wondering if the FAQ section of the doc might be a good place to mention the presence of FFT, or actually fast number theoretic transform, as Stefan pointed out. I was also wondering if the short code snippet for the Mersenne prime might be included in the recipes as an example of what decimal is capable of?
msg300770 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2017年08月24日 00:43
A couple of thoughts: * It is okay to add a couple lines to the decimal module docs noting a Cpython specific implementation detail that it uses better than expected algorithms which can payoff nicely when used with very large numbers. * Mersenne prime examples and whatnot belong external to our docs. I usually put my recipes and examples on the ASPN Python Cookbook. http://code.activestate.com/recipes/langs/python/
msg300780 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2017年08月24日 11:13
pypy-5.8.0-beta0 (Python 3.5.3) is using a very nicely written CFFI wrapper for libmpdec, so it also has the fast bignums.
msg300781 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2017年08月24日 11:20
What needs to be mentioned though is that the context has to be set for unrounded calculations: c = getcontext() c.prec = MAX_PREC c.Emax = MAX_EMAX c.Emin = MIN_EMIN Otherwise some people believe that the bignums are just rounded floating point arithmetic that is expected to be fast (I've seen this misconception somewhere, perhaps on Stackoverflow).
msg316169 - (view)	Author: Cheryl Sabella (cheryl.sabella) * (Python committer)	Date: 2018年05月04日 12:32
ping
msg334747 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2019年02月02日 14:37
New changeset 00e9c55d27aff3e445ab4c8629cf4d59f46ff945 by Stefan Krah (Cheryl Sabella) in branch 'master': bpo-26256: Document algorithm speed for the Decimal module. (#4808) https://github.com/python/cpython/commit/00e9c55d27aff3e445ab4c8629cf4d59f46ff945
msg334748 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2019年02月02日 14:46
New changeset a2f4c4023314f69333d2e8cee68e316619f3d68e by Stefan Krah (Miss Islington (bot)) in branch '3.7': bpo-26256: Document algorithm speed for the Decimal module. (GH-4808) (#11736) https://github.com/python/cpython/commit/a2f4c4023314f69333d2e8cee68e316619f3d68e

History
Date	User	Action	Args
2022年04月11日 14:58:27	admin	set	github: 70444
2019年02月02日 14:49:57	skrah	set	status: open -> closed assignee: docs@python -> skrah resolution: fixed stage: patch review -> resolved
2019年02月02日 14:46:11	skrah	set	messages: + msg334748
2019年02月02日 14:40:45	miss-islington	set	pull_requests: + pull_request11641
2019年02月02日 14:40:34	miss-islington	set	pull_requests: + pull_request11640
2019年02月02日 14:40:22	miss-islington	set	pull_requests: + pull_request11639
2019年02月02日 14:37:43	skrah	set	messages: + msg334747
2018年05月04日 12:32:07	cheryl.sabella	set	messages: + msg316169
2017年12月12日 01:43:28	cheryl.sabella	set	versions: + Python 3.7, - Python 3.4
2017年12月12日 01:41:34	cheryl.sabella	set	keywords: + patch stage: patch review pull_requests: + pull_request4704
2017年08月24日 11:20:10	skrah	set	messages: + msg300781
2017年08月24日 11:13:20	skrah	set	messages: + msg300780
2017年08月24日 00:43:30	rhettinger	set	nosy: + rhettinger messages: + msg300770
2017年08月23日 22:39:11	cheryl.sabella	set	nosy: + cheryl.sabella messages: + msg300769
2016年02月03日 10:41:36	skrah	set	messages: + msg259471
2016年02月03日 10:00:18	jneb	set	messages: + msg259467
2016年02月03日 09:51:53	vstinner	set	nosy: + vstinner messages: + msg259466
2016年02月03日 09:42:27	jneb	set	versions: + Python 3.4, - Python 2.7 nosy: + docs@python messages: + msg259465 assignee: docs@python components: + Documentation, - Library (Lib)
2016年02月02日 19:10:40	skrah	set	nosy: + skrah messages: + msg259421
2016年02月02日 18:07:50	mark.dickinson	set	messages: + msg259408
2016年02月02日 16:56:39	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg259402
2016年02月01日 14:20:50	jneb	set	messages: + msg259326
2016年02月01日 14:02:22	mark.dickinson	set	nosy: + mark.dickinson messages: + msg259324
2016年02月01日 10:28:33	jneb	create

homepage