Issue 37000: _randbelow_with_getrandbits function inefficient with powers of two

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/81181

classification

Title:	_randbelow_with_getrandbits function inefficient with powers of two
Type:	performance	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.8

process

Dependencies:	Superseder:
Status:	closed	Resolution:	rejected
Assigned To:	rhettinger	Nosy List:	Mathis Hammel, mark.dickinson, rhettinger, tim.peters
Priority:	low	Keywords:	patch

Created on 2019年05月21日 20:50 by Mathis Hammel, last changed 2022年04月11日 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 13485	closed	Mathis Hammel, 2019年05月22日 06:20

Messages (6)
msg343094 - (view)	Author: Mathis Hammel (Mathis Hammel) *	Date: 2019年05月21日 20:50
In case _randbelow_with_getrandbits is called with a power of two as its argument (say 2^k), the function will consume k+1 random bits instead of k. Instead of never being rejected, the sampled number will be rejected on average once per call, causing a computational overhead in addition to wasting k+2 bits on average. This is due to taking n.bit_size() instead of (n-1).bit_size() for the size of the random candidates. Using n instead of n-1 is apparently deliberate in order to save the case where n = 1, but this could be avoided by directly returning 0 if n == 1.
msg343149 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2019年05月22日 06:52
> this could be avoided by directly returning 0 if n == 1 The other solution is to make `getrandbits(0)` valid, always returning `0`.
msg343235 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2019年05月22日 20:40
Some quick notes: * In issue 33144, we achieved a significant speed-up for _randbelow_with_getrandbits() by removing a single test. The code for that method is thin and almost any additional logic will slow it down. * The attached PR (now closed) causes a performance regression. Shuffling a thousand element list regressed from 505 usec per loop to 576 usec per loop. * We only promise that the output of random() will be reproducible across versions; however, we should have an aversion to changing the output of the other methods unless it is really necessary (because it may change the result of simulations or random selections which will cause some consternation for some end-users). For seed(8675309), the result of "[randrange(1024) for i in range(10)]" changes under the PR from [823, 438, 575, 465, 718, 186, 25, 1015, 654, 988] to [411, 219, 522, 961, 679, 516, 881, 919, 287, 882]. This is allowed but not desireable. When I get a chance, I'll take a closer look at Mark's suggestion.
msg343263 - (view)	Author: Tim Peters (tim.peters) * (Python committer)	Date: 2019年05月23日 01:15
I believe the thrust of Mark's suggestion was that it would allow using `k = (n-1).bit_length()` even when n == 1, without special-casing n == 1. But you'd still be adding a new "subtract 1" operation, and would still change results in some cases. That said, given that `(0).bit_length() == 0`, it's a bit surprising all on its own that `getrandbits(0)` raises an exception. In any case, I'd leave _randbelow_with_getrandbits alone. Conauming "extra" bits is generally a red herring, since the underlying `getrandbits()` consumes 32 bits at a time from the Twister. That is, we're _typically_ "wasting" more than a dozen bits regardless already (e.g., getrandbits(2), getrandbits(17), and getrandbits(29) all consume 32 bits). It's unfortunate that _randbelow_with_getrandbits(power_of_2) may invoke getrandbits() more than once. But there's also a bright side: because there's always the possibility that _randbelow_with_getrandbits() may invoke getrandbits() more than once, we can't guess how many times getrandbits() _was_ called. So, in turn, we can't know how much of the Twister's state space was consumed. Which, in turn, makes it much harder to deduce the Twister's' internal state from the visible outputs (but this can be done with certainty from a long enough string of, say, random.choice([0, 1]) outputs if we knew getrandbits was called exactly once for each, despite that we're only seeing 1 bit of each 32-bit Twister output). That last point shouldn't drive anything, but it is kinda pleasant that people inappropriately using the Twister in contexts where keeping secrets is important are partially protected by under-the-covers accept/reject methods.
msg343302 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2019年05月23日 15:22
> it's a bit surprising all on its own that `getrandbits(0)` > raises an exception. Given that there would be no randomness in the result, it makes sense to me that getrandbits(0) is documented to raise an exception. Related: `randrange(0)` raises an exception `choice([])` raises an exception > In any case, I'd leave _randbelow_with_getrandbits alone. That makes sense to me as well. I'll mark this as closed. There's one other bright side. If someone really cares about the speed of the power-of-two case, they can already call `getrandbits(10)` instead of `randrange(1024)`. The former is about 7x faster. Mathis, thank you for taking the time to look at this code.
msg343305 - (view)	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2019年05月23日 15:47
Related: `randrange(0)` raises an exception `choice([])` raises an exception Those are very different from `getrandbits(0)`: in both cases there's no reasonable value that can be returned: for the first case, there's no integer `x` with `0 <= x < 0`; for the second, there's no element of `[]`, period. In contrast, there's an obvious, valid, return value for `getrandbits(0)`. The `getrandbits(0)` example is much more similar to `randrange(1)` (in fact, it's pretty much the same thing: apart from `n = 0`, `getrandbits(n)` is equivalent at some level to `randrange(2**n)`. So if `getrandbits(0)` should be an exception on the basis of not having any randomness in the result, then `randrange(1)` should be an exception on the same basis, as should `random.uniform(2.0, 2.0)`, etc. So to me, it makes no sense at all that `getrandbits(0)` raises: I can't see any good reason for it to do so.

History
Date	User	Action	Args
2022年04月11日 14:59:15	admin	set	github: 81181
2021年07月12日 20:00:39	mark.dickinson	link	issue44593 superseder
2021年01月28日 05:57:52	rhettinger	link	issue43040 superseder
2019年05月23日 15:47:29	mark.dickinson	set	messages: + msg343305
2019年05月23日 15:22:36	rhettinger	set	status: open -> closed resolution: rejected messages: + msg343302 stage: patch review -> resolved
2019年05月23日 01:15:09	tim.peters	set	nosy: + tim.peters messages: + msg343263
2019年05月22日 20:40:08	rhettinger	set	messages: + msg343235
2019年05月22日 06:52:08	mark.dickinson	set	messages: + msg343149
2019年05月22日 06:20:17	Mathis Hammel	set	keywords: + patch stage: patch review pull_requests: + pull_request13398
2019年05月22日 06:08:17	rhettinger	set	priority: normal -> low assignee: rhettinger versions: - Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.9
2019年05月21日 20:50:16	Mathis Hammel	create

homepage