homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add randbytes() method to random.Random
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson, rhettinger, serhiy.storchaka, tim.peters, veky, vstinner
Priority: normal Keywords: patch

Created on 2020年04月14日 22:30 by vstinner, last changed 2022年04月11日 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
run_random.py rhettinger, 2020年04月20日 18:17 Custom PRNG where randbytes() doesn't fit
Pull Requests
URL Status Linked Edit
PR 19527 merged vstinner, 2020年04月14日 22:38
PR 19574 merged serhiy.storchaka, 2020年04月17日 19:14
PR 19575 merged vstinner, 2020年04月17日 20:12
PR 19797 merged vstinner, 2020年04月29日 16:03
PR 19870 merged rhettinger, 2020年05月02日 22:45
Messages (25)
msg366454 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月14日 22:30
The random module lacks a getrandbytes() method which leads developers to be creative how to generate bytes:
https://stackoverflow.com/questions/5495492/random-byte-string-in-python
It's a common use request:
* bpo-13396 in 2011
* bpo-27096 in 2016
* https://bugs.python.org/issue40282#msg366444 in 2020
Python already has three functions to generate random bytes:
* os.getrandom(): specific to Linux, not portable
* os.urandom()
* secrets.token_bytes()
These 3 functions are based on system entropy and they block on Linux until the kernel collected enough entropy: PEP 524.
While many users are fine with these functions, there are also use cases for simulation where the security doesn't matter, and it's more about being able to get reproducible experience from a seed. That's what random.Random is about.
The numpy module provides numpy.random.bytes(length) function for such use case:
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.bytes.html
One example can be to generate UUID4 with the ability to reproduce the random UUID from a seed for testing purpose, or to get reproducible behavior.
Attached PR implements the getrandbytes() method.
msg366457 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年04月14日 23:00
If we have to have this, the method name should be differentiated from getrandbits() because the latter returns an integer. I suggest just random.bytes(n), the same as numpy.
> Python already has three functions to generate random bytes:
Now, there will be four ;-)
msg366496 - (view) Author: Vedran Čačić (veky) * Date: 2020年04月15日 08:36
> I suggest just random.bytes(n), the same as numpy.
The problem with this is that people who `from random import *` (some schools insist on this, probably because most functions they need already start with `rand`) will shadow builtin `bytes`. Not that those schools do anything with `bytes`, but still, it might be inconvenient.
(The metaproblem is of course that some functions already do the "poor man's namespacing" in C-style by starting with `rand`, and some don't. I'm always for user control of namespacing, but I'm just saying that it doesn't correspond to how many beginners use `random` module.)
msg366498 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年04月15日 08:46
Do you have another name suggestion that doesn't have a parallelism problem with the existing name? The names getrandbytes() and getrandbits() suggest a parallelism that is incorrect.
msg366500 - (view) Author: Vedran Čačić (veky) * Date: 2020年04月15日 08:57
I think that the "module owner";-P must decide whether the `random` module should follow the C-namespacing or not. Of course, I'm in the "not" camp, so I believe those two "rand..." functions (randrange is completely redundant with random.choice(range)) should be supplemented with random.int and random.float. And then random.bytes will be completely natural. And people might be gently nudged into the right direction when using Python module namespaces.
msg366502 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年04月15日 09:27
I concur that bytes() isn't a good name, but am still concerned that the proposed name is a bad API decision.
msg366505 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020年04月15日 10:50
Maybe randbytes()?
msg366507 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月15日 12:40
I like "from random import randbytes" name. I concur that "from random import bytes" overrides bytes() builtin type and so can likely cause troubles.
msg366512 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月15日 13:21
I updated my PR to rename the method to randbytes().
msg366514 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月15日 13:35
The performance of the new method is not my first motivation.
My first motivation is to avoid consumers of the random to write a wrong implementation which would be biased. It's too easy to write biased functions without notifying.
Moreover, it seems like we can do something to get reproducible behavior on different architectures (different endianness) which would also be a nice feature.
For example, in bpo-13396, Amaury found this two functions in the wild:
* struct.pack("Q", random.getrandbits(64))
* sha1(str(random.getrandbits(8*20))).digest()
As I wrote, users are creative to workaround missing features :-) I don't think that these two implementations give the same result on big and little endian.
msg366524 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020年04月15日 15:12
All Random methods give the same result independently of endianess and bitness of the platform.
> I don't think that these two implementations give the same result on big and little endian.
The second one does.
msg366665 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月17日 17:03
I wrote a quick benchmark:
---
import pyperf
import random
gen = random.Random()
# gen = random.SystemRandom()
gen.seed(850779834)
if 1: #hasattr(gen, 'randbytes'):
 func = type(gen).randbytes
elif 0:
 def py_randbytes(gen, n):
 data = bytearray(n)
 i = 0
 while i < n:
 chunk = 4
 word = gen.getrandbits(32)
 word = word.to_bytes(4, 'big')
 chunk = min(n, 4)
 data[i:i+chunk] = word[:chunk]
 i += chunk
 return bytes(data)
 func = py_randbytes
else:
 def getrandbits_to_bytes(gen, n):
 return gen.getrandbits(n * 8).to_bytes(n, 'little')
 func = getrandbits_to_bytes
runner = pyperf.Runner()
for nbytes in (1, 4, 16, 1024, 1024 * 1024):
 runner.bench_func(f'randbytes({nbytes})', func, gen, nbytes)
---
Results on Linux using gcc -O3 (without LTO or PGO) using the C randbytes() implementation as the reference:
+--------------------+-------------+----------------------------------+-------------------------------+
| Benchmark | c_randbytes | py_randbytes | getrandbits_to_bytes |
+====================+=============+==================================+===============================+
| randbytes(1) | 71.4 ns | 1.04 us: 14.51x slower (+1351%) | 244 ns: 3.42x slower (+242%) |
+--------------------+-------------+----------------------------------+-------------------------------+
| randbytes(4) | 71.4 ns | 1.03 us: 14.48x slower (+1348%) | 261 ns: 3.66x slower (+266%) |
+--------------------+-------------+----------------------------------+-------------------------------+
| randbytes(16) | 81.9 ns | 3.07 us: 37.51x slower (+3651%) | 321 ns: 3.92x slower (+292%) |
+--------------------+-------------+----------------------------------+-------------------------------+
| randbytes(1024) | 1.05 us | 173 us: 165.41x slower (+16441%) | 3.66 us: 3.49x slower (+249%) |
+--------------------+-------------+----------------------------------+-------------------------------+
| randbytes(1048576) | 955 us | 187 ms: 196.30x slower (+19530%) | 4.37 ms: 4.58x slower (+358%) |
+--------------------+-------------+----------------------------------+-------------------------------+
* c_randbytes: PR 19527, randbytes() methods implemented in C
* py_randbytes: bytearray, getrandbits(), .to_bytes()
* getrandbits_to_bytes: Serhiy's implementation: gen.getrandbits(n * 8).to_bytes(n, 'little')
So well, the C randbytes() implementation is always the fastest.
random.SystemRandom().randbytes() (os.urandom(n)) performance using random.Random().randbytes() (Mersenne Twister) as a reference:
+--------------------+-------------+---------------------------------+
| Benchmark | c_randbytes | systemrandom |
+====================+=============+=================================+
| randbytes(1) | 71.4 ns | 994 ns: 13.93x slower (+1293%) |
+--------------------+-------------+---------------------------------+
| randbytes(4) | 71.4 ns | 1.04 us: 14.60x slower (+1360%) |
+--------------------+-------------+---------------------------------+
| randbytes(16) | 81.9 ns | 1.02 us: 12.49x slower (+1149%) |
+--------------------+-------------+---------------------------------+
| randbytes(1024) | 1.05 us | 6.22 us: 5.93x slower (+493%) |
+--------------------+-------------+---------------------------------+
| randbytes(1048576) | 955 us | 5.64 ms: 5.91x slower (+491%) |
+--------------------+-------------+---------------------------------+
os.urandom() is way slower than Mersenne Twister.
Well, that's not surprising: os.urandom() requires at least one syscall (getrandom() syscall on my Linux machine).
msg366666 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月17日 17:05
New changeset 9f5fe7910f4a1bf5a425837d4915e332b945eb7b by Victor Stinner in branch 'master':
bpo-40286: Add randbytes() method to random.Random (GH-19527)
https://github.com/python/cpython/commit/9f5fe7910f4a1bf5a425837d4915e332b945eb7b
msg366676 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020年04月17日 20:51
New changeset 223221b290db00ca1042c77103efcbc072f29c90 by Serhiy Storchaka in branch 'master':
bpo-40286: Makes simpler the relation between randbytes() and getrandbits() (GH-19574)
https://github.com/python/cpython/commit/223221b290db00ca1042c77103efcbc072f29c90
msg366677 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月17日 20:54
New changeset 87502ddd710eb1f030b8ff5a60b05becea3f474f by Victor Stinner in branch 'master':
bpo-40286: Use random.randbytes() in tests (GH-19575)
https://github.com/python/cpython/commit/87502ddd710eb1f030b8ff5a60b05becea3f474f
msg366860 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年04月20日 18:17
The randbytes() method needs to depend on genrandbits(). It is documented that custom generators can supply there own random() and genrandbits() methods and expect that the other downstream generators all follow. See the attached example which demonstrates that randbytes() bypasses this framework pattern.
Also, I don't want randbytes() in the C extension. We're tried to keep as much of the code as possible in pure Python and only have the MersenneTwister specific code in the C module. The improves maintainability and makes the code more accessible to a broader audience.
Also, please don't change the name of the genrand_int32() function. It was a goal to change as little as possible from the official, standard version of the C code at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html . For the most part, we just want to wrap that code for Python bindings, but not modify it.
msg366861 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年04月20日 18:20
Direct link to MT code that I would like to leave mostly unmodified: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c 
msg366869 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年04月20日 19:25
When a new method gets added to a module, it should happen in a way that is in harmony with the module's design.
msg366894 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月20日 21:53
I created bpo-40346: "Redesign random.Random class inheritance".
msg366911 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020年04月21日 06:51
$ ./python -m timeit -s 'import random' 'random.randbytes(10**6)'
200 loops, best of 5: 1.36 msec per loop
$ ./python -m timeit -s 'import random' 'random.getrandbits(10**6*8).to_bytes(10**6, "little")'
50 loops, best of 5: 6.31 msec per loop
The Python implementation is only 5 times slower than the C implementation. I am fine with implementing randbytes() in Python. This would automatically make it depending on the getrandbits() implementation.
msg366919 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月21日 12:49
Raymond:
> Also, I don't want randbytes() in the C extension. We're tried to keep as much of the code as possible in pure Python and only have the MersenneTwister specific code in the C module. The improves maintainability and makes the code more accessible to a broader audience.
I don't see how 30 lines makes Python so harder to maintain. These lines make the function 4x to 5x faster. We are not talking about 5% or 10% faster. I think that such optimization is worth it. When did we decide to stop optimizing Python?
Raymond:
> The randbytes() method needs to depend on genrandbits().
I created bpo-40346: "Redesign random.Random class inheritance" for a more generic fix, not just randbytes().
Raymond:
> Also, please don't change the name of the genrand_int32() function. It was a goal to change as little as possible from the official, standard version of the C code at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html .
This code was already modified to replace "unsigned long" with "uint32_t" for example. I don't think that renaming genrand_int32() to genrand_uint32() makes the code impossible to maintain. Moreover, it seems like http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html was not updated for 13 years.
msg367180 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月24日 09:32
Raymond:
> The randbytes() method needs to depend on genrandbits().
I created PR 19700 which allows to keep the optimization (C implementation in _randommodule.c) and Random subclasses implement randbytes() with getrandbits().
msg367676 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月29日 16:49
New changeset 2d8757758d0d75882fef0fe0e3c74c4756b3e81e by Victor Stinner in branch 'master':
bpo-40286: Remove C implementation of Random.randbytes() (GH-19797)
https://github.com/python/cpython/commit/2d8757758d0d75882fef0fe0e3c74c4756b3e81e
msg367677 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月29日 16:51
It removed the C implementation of randbytes(): it was the root issue which started discussions here and in bpo-40346. I rejected bpo-40346 (BaseRandom) and related PRs.
I close the issue.
msg368104 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020年05月05日 05:52
New changeset f01d1be97d740ea0369379ca305646a26694236e by Raymond Hettinger in branch 'master':
bpo-40286: Put methods in correct sections. Add security notice to use secrets for session tokens. (GH-19870)
https://github.com/python/cpython/commit/f01d1be97d740ea0369379ca305646a26694236e
History
Date User Action Args
2022年04月11日 14:59:29adminsetgithub: 84466
2020年05月05日 05:52:17rhettingersetmessages: + msg368104
2020年05月02日 22:45:45rhettingersetpull_requests: + pull_request19183
2020年04月29日 16:51:01vstinnersetstatus: open -> closed

messages: + msg367677
stage: patch review -> resolved
2020年04月29日 16:49:04vstinnersetmessages: + msg367676
2020年04月29日 16:03:55vstinnersetstage: resolved -> patch review
pull_requests: + pull_request19118
2020年04月24日 09:32:37vstinnersetmessages: + msg367180
2020年04月21日 12:49:42vstinnersetmessages: + msg366919
2020年04月21日 06:51:24serhiy.storchakasetmessages: + msg366911
2020年04月20日 21:53:59vstinnersetmessages: + msg366894
2020年04月20日 19:25:22rhettingersetnosy: + tim.peters
messages: + msg366869
2020年04月20日 18:20:31rhettingersetmessages: + msg366861
2020年04月20日 18:17:58rhettingersetstatus: closed -> open
files: + run_random.py
messages: + msg366860
2020年04月17日 20:54:41vstinnersetmessages: + msg366677
2020年04月17日 20:51:35serhiy.storchakasetmessages: + msg366676
2020年04月17日 20:12:00vstinnersetpull_requests: + pull_request18915
2020年04月17日 19:14:19serhiy.storchakasetpull_requests: + pull_request18914
2020年04月17日 17:06:22vstinnersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020年04月17日 17:05:50vstinnersetmessages: + msg366666
2020年04月17日 17:03:17vstinnersetmessages: + msg366665
2020年04月15日 15:12:55serhiy.storchakasetmessages: + msg366524
2020年04月15日 13:35:40vstinnersetmessages: + msg366514
2020年04月15日 13:21:06vstinnersetmessages: + msg366512
title: Add getrandbytes() method to random.Random -> Add randbytes() method to random.Random
2020年04月15日 12:40:52vstinnersetmessages: + msg366507
2020年04月15日 10:50:23serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg366505
2020年04月15日 09:27:18rhettingersetmessages: + msg366502
2020年04月15日 08:57:43vekysetmessages: + msg366500
2020年04月15日 08:46:20rhettingersetmessages: + msg366498
2020年04月15日 08:36:31vekysetnosy: + veky
messages: + msg366496
2020年04月15日 08:04:24pitrousetnosy: + mark.dickinson
2020年04月14日 23:00:46rhettingersetnosy: + rhettinger
messages: + msg366457
2020年04月14日 22:38:33vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request18877
2020年04月14日 22:30:18vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /