BitGenerator support #499

Open

flying-sheep wants to merge 47 commits into PyO3:main

from flying-sheep:pa/bitgen

Conversation

flying-sheep commented Jun 6, 2025 (edited)

See

Fixes #498

The idea is to have a safe wrapper around the npy_bitgen struct that implements rand::RngCore. That way pyo3 functions could be passed a np.random.Generator, get that wrapper from it, and pass it to Rust APIs, which could then call its methods repeatedly.

The way it’s implemented, the workflow would look like this:

1. Acquire the GIL.
2. Downcast a np.random.BitGenerator instance into a numpy::random::PyBitGenerator.
3. Call .lock() on it to get a numpy::random::PyBitGeneratorGuard.
4. Release the GIL.
5. Call functions on the guard object without needing to hold the GIL.

TODO:

I see local crashes when running all tests, so there’s probably some UB, I’d appreciate help to fix it.

Safety

If somebody releases the threading lock of the BitGenerator while we’re using it, this isn’t safe 🤔

API design options

I could make this more complex by adding a new trait implemented by both PyBitGenerator and PyBitGeneratorGuard, letting callers choose whether to

1. use the PyBitGenerator’s random_* methods directly on that object while holding the GIL and without locking it, or
2. use it the way it works now, by locking the np.random.BitGenerator and getting back an object that can be used without the GIL.

For now I have only implemented the use case that’s actually desired.

flying-sheep added a commit:

WIP bitgen (06d6ce1)

flying-sheep changed the title from "BItGenerator support" to "BitGenerator support" on Jun 6, 2025

flying-sheep added 14 commits on June 6, 2025 19:11:

nonnull (07e2416)
fix and test (b611943)
cmt (d93a264)
safer: don’t allow trying to get BitGen from any PyAny (f52b2fa)
less indirection (05814d6)
add tryfrom (37d360e)
implement rand (eed5b19)
fmt (6c1a89b)
rename and deref (d1909d3)
order (bde2553)
make into lock (a0b9ec5)
docs (ee32246)
more docs (1be6838)
guard (2aa3d90)

@flying-sheep flying-sheep marked this pull request as ready for review

June 8, 2025 12:44

Icxolu (Contributor) reviewed on Jun 8, 2025 and left a comment:

This looks like a useful addition! Thanks for working on it. I'm definitely not an expert here, but I left a few comments about things that stood out to me. Let me know what you think.
Also, are there any differences between numpy v1 and v2 that we need to consider?

.vscode/settings.json

Icxolu (Contributor), Jun 8, 2025:

This should be removed

flying-sheep (Author), Jun 8, 2025:

sure, will do when I’m done. I like working on multiple machines, and I don’t like re-doing settings for individual projects

src/random.rs (outdated, resolved)

flying-sheep added 10 commits on June 8, 2025 17:01:

call_method0 (0258e6d)
reaname test (876001b)
manually drop and capsule (71ce8be)
remove useless test (2de7072)
doctests (016eb7a)
smaller (1f7f37f)
clarify where to release the GIL (1d01c7a)
safety (c90176a)
oops (f49d3fa)
less unsafe (a16846d)

Icxolu reviewed on Jun 8, 2025

src/random.rs (outdated, resolved)

flying-sheep added a commit:

add thread test (573d890)

Icxolu (Contributor) reviewed on Jun 9, 2025 and left a comment:

I still don't love the drop impl, but with the option to release it manually with a Python token, it may be acceptable. Maybe @davidhewitt has an idea and/or comments about the approach. Otherwise I only have a few minor remarks.

src/random.rs (outdated, resolved)

flying-sheep (Author) commented Jun 9, 2025 (edited)

Thanks for the comments, I’ll address them!

The main issue is that I think I’m triggering UB somehow and I don’t know how: when running all tests, an unrelated test that runs after this one often crashes ...

> Also, are there any differences between numpy v1 and v2 that we need to consider?

I didn’t forget about this either, will look!

/edit: the C API for random has been available since 1.19: https://numpy.org/doc/1.26/reference/random/c-api.html

flying-sheep added 6 commits on June 10, 2025 10:11:

no copy/clone (c6105c9)
rename to release (3a0aa92)
remove lifetime (a92861a)
static (6dbb6dc)
no mut ref conversion (b102d20)
disambiguate (e5e440e)

mejrs requested changes on Jun 10, 2025

.vscode/settings.json (resolved)
src/npyffi/random.rs (resolved)
Cargo.toml (outdated, resolved)
src/npyffi/random.rs (outdated, resolved)

src/random.rs

//! # use pyo3::prelude::*;
//! use rand::Rng as _;
//! # use numpy::random::{PyBitGenerator, PyBitGeneratorMethods as _};
//! # // TODO: reuse function definition from above?

mejrs (Member), Jun 10, 2025:

It feels like there should be a convenient way to get this. I'm thinking about something like

impl PyBitGenerator {
 fn new(py: Python<'_>) -> PyResult<Bound<..>>;
}

flying-sheep (Author), Jun 10, 2025:

there are many implementations, we’d have to cover all of them.

I’d rather leave this minimal until this PR is mostly done.

src/random.rs (outdated, resolved)
src/random.rs (resolved)
src/random.rs (outdated, resolved)

flying-sheep added 8 commits on June 10, 2025 14:59:

rand_core only (e73e3a2)
rename bitgen type (c6493df)
c_str macro (2327f36)
intern strings (e5c6458)
docs (e8cd5e8)
more doc
clean up tests
no let-else (1fd7bb5)

Icxolu reviewed on Jun 10, 2025

src/random.rs (outdated)

.getattr(intern!(py, "capsule"))?
.downcast_into::<PyCapsule>()?;
let lock = self.getattr(intern!(py, "lock"))?;
// we’re holding the GIL, so there’s no race condition checking the lock and acquiring it later.

Icxolu (Contributor), Jun 10, 2025:

This may not be true under free-threaded Python. Is the lock known to be thread-safe, with acquire simply failing if the lock is already held? If not, we may need to guard the whole module under cfg(not(Py_GIL_DISABLED)).

flying-sheep (Author), Jun 10, 2025:

it doesn’t fail, it hangs, but that’s configurable with a timeout or by making it non-blocking: https://docs.python.org/3/library/threading.html#threading.Lock.acquire

and it’s a threading.Lock!
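The blocking behaviour described in this reply can be checked with the standard library alone. The following is a minimal sketch, assuming a plain threading.Lock stands in for the BitGenerator's lock attribute; it is not code from the PR.

```python
import threading

# Stand-in for the `lock` attribute of a BitGenerator (an assumption for this
# sketch; the thread above states that the attribute is a threading.Lock).
lock = threading.Lock()

# The first acquisition succeeds immediately.
first = lock.acquire()

# A non-blocking attempt while the lock is held returns False instead of hanging.
second = lock.acquire(blocking=False)

# A bounded wait gives up after the timeout and also returns False.
third = lock.acquire(timeout=0.05)

lock.release()

# Once released, the lock can be taken again.
fourth = lock.acquire(blocking=False)
lock.release()

print(first, second, third, fourth)  # True False False True
```

A failed non-blocking acquire is the case that a wrapper could turn into an error such as "already locked", rather than letting the caller hang.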

src/random.rs (outdated, resolved)

flying-sheep added 3 commits on June 10, 2025 18:55:

use GILOnceCell::import
add released attr (7bc0be8)
f64 (8caf054)

flying-sheep (Author) commented Jun 10, 2025

OK, with the release attr and changing the parallel test to use the explicit release as well, the UB now sometimes manifests as a lock poisoning error. progress?

Icxolu (Contributor) commented Jun 10, 2025

I may have found a problem:

This fails as intended:

Python::with_gil(|py| {
 let obj = get_bit_generator(py)?;
 let a = obj.lock()?;
 let b = obj.lock()?;
 Ok::<_, PyErr>(())
})
.unwrap();

returning

called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'RuntimeError'>, value: RuntimeError('BitGenerator is already locked'), traceback: None }

But this does not fail:

Python::with_gil(|py| {
 let a = get_bit_generator(py)?.lock()?;
 let b = get_bit_generator(py)?.lock()?;
 Ok::<_, PyErr>(())
})
.unwrap();

and crucially it gives the same pointers:

[src/random.rs:113:18] ptr = 0x00007b9f6be44cc0
[src/random.rs:113:18] *ptr = bitgen_t {
 state: 0x00007b9f6be44d08,
 next_uint64: 0x00007b9f6837d320,
 next_uint32: 0x00007b9f6837d370,
 next_double: 0x00007b9f6837d3f0,
 next_raw: 0x00007b9f6837d320,
}
[src/random.rs:113:18] ptr = 0x00007b9f6be44cc0
[src/random.rs:113:18] *ptr = bitgen_t {
 state: 0x00007b9f6be44d08,
 next_uint64: 0x00007b9f6837d320,
 next_uint32: 0x00007b9f6837d370,
 next_double: 0x00007b9f6837d3f0,
 next_raw: 0x00007b9f6837d320,
}

So when using multiple threads, for example multiple tests running in parallel, we have a data race on the state. I think we need a lock across all instances to make this work.
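One possible reading of "a lock across all instances" is a process-wide registry keyed by the raw state address, so that two wrappers pointing at the same state cannot both be handed out. The sketch below only models that bookkeeping in Python; StateRegistry, try_claim and release are hypothetical names and nothing here is taken from the PR.

```python
import threading


class StateRegistry:
    """Tracks which raw state addresses are currently claimed (hypothetical helper)."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the set below
        self._in_use = set()

    def try_claim(self, state_address: int) -> bool:
        """Claim an address; return False if another guard already holds it."""
        with self._mutex:
            if state_address in self._in_use:
                return False
            self._in_use.add(state_address)
            return True

    def release(self, state_address: int) -> None:
        """Give the address back so that a later guard may claim it."""
        with self._mutex:
            self._in_use.discard(state_address)


registry = StateRegistry()
address = 0x00007B9F6BE44D08  # the `state` pointer from the debug output above

first_claim = registry.try_claim(address)   # first guard wins
second_claim = registry.try_claim(address)  # second wrapper, same state: refused
registry.release(address)
third_claim = registry.try_claim(address)   # free again after release
```

Keying on the address rather than on the Python object is what would catch the second case above, where two separate objects report the same pointers.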

flying-sheep (Author) commented Jun 11, 2025 (edited)

Oh wow, so while default_rng(...).bit_generator.state is always different (somehow), default_rng(...).bit_generator.ctypes.state_address isn’t necessarily (somehow).

Note the different seed sequences passed to the function; not even then is the state address different, wtf:

>>> np.random.default_rng([1, 4]).bit_generator.ctypes.state_address
4355856392
>>> np.random.default_rng([2, 4]).bit_generator.ctypes.state_address
4355856392

I have no clue what to make of that. I just assumed different random state on the Python side means a different underlying struct, because how can that not be the case?

But anyway, you made me realize that the whole approach is flawed because the same BitGenerator can be passed from the Python side multiple times. So a generator passed from Python isn’t guaranteed to have state independent of another’s. Therefore, if we want to use it, we’d have to use its threading lock as intended instead of abusing that lock into meaning "we can now do whatever we want with it".

So I think the way to go is instead of locking to use spawn to get independent child generators (which are always different, so we could use them to our hearts’ content, but also one should probably use one per thread anyway):

>>> [bg.ctypes.state_address for bg in np.random.default_rng().bit_generator.spawn(2)]
[4355860968, 4355862376]

mejrs (Member) commented Jun 11, 2025

Maybe we should skip the guard part and just lock and unlock within the RngCore implementation itself. Can you give an example for why you'd want this, and why the api has this form? Why would someone want to use this rather than the RngCore impl that rand ships with? Maybe we can come up with a better design.
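The "lock and unlock within the implementation itself" idea can be modelled without NumPy. This is a rough sketch under stated assumptions: random.Random stands in for the generator state, threading.Lock for the BitGenerator's lock, and LockedRng, next_u64 and draw_many are hypothetical names rather than the PR's API.

```python
import random
import threading


class LockedRng:
    """Takes the lock around every single draw (hypothetical model, not the PR's API)."""

    def __init__(self, seed: int):
        self._lock = threading.Lock()    # stand-in for the BitGenerator's lock
        self._rng = random.Random(seed)  # stand-in for the underlying state

    def next_u64(self) -> int:
        with self._lock:                 # lock, draw once, unlock
            return self._rng.getrandbits(64)


def draw_many(rng: LockedRng, count: int, out: list) -> None:
    """Append `count` draws from the shared generator to `out`."""
    out.extend(rng.next_u64() for _ in range(count))


shared = LockedRng(seed=42)
results = [[] for _ in range(4)]
threads = [threading.Thread(target=draw_many, args=(shared, 100, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

drawn = sorted(value for r in results for value in r)

# The same values a single-threaded consumer would have seen, in some interleaving.
reference = random.Random(42)
expected = sorted(reference.getrandbits(64) for _ in range(400))
```

In this model no guard object outlives a call, so there is nothing to forget to release; the price is one lock round trip per draw.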

flying-sheep (Author) commented Jun 11, 2025 (edited)

When implementing Python-facing APIs, having a rng: np.random.Generator parameter is common. I want to write code that actually respects that parameter and uses it instead of ignoring it or calling it once to seed the actual generator.

flying-sheep added a commit:

correct locking (d8b62ac)

juntyr commented Jun 24, 2025

Would it also be possible to go the other way, i.e. provide a rand RNG from Rust to Python as a NumPy BitGenerator?

flying-sheep (Author) commented Jun 24, 2025

Yes, that's part of numpy's API as well!

flying-sheep added a commit:

test that double locking fails (43e2d97)

Labels

None yet

4 participants

@flying-sheep @Icxolu @mejrs @juntyr
