I implemented a simple function that generated random bytes Vec<u8>
with given n
size. Then I wondered how can I optimize this.
Here is the first implementation:
use rand::{Rng, RngCore, SeedableRng};
use rand_chacha::ChaCha20Rng;
fn random_bytes(n: usize) -> Vec<u8> {
let mut bytes = Vec::with_capacity(n);
let mut rng = ChaCha20Rng::from_entropy();
for _ in 0..n {
let b = rng.gen::<u8>();
bytes.push(b);
}
bytes
}
I wanted to go through unsafe path to make things interesting.
First thing that I came up with is to make allocated memory uninitialized. Using Box::<[T]>::new_uninit_slice
which happens to be stabilized in 1.82.0.
Second thing that came to my mind is to fill the bytes in chunks. RngCore
trait provides next_u64
method which returns a random u64
. But I could not figure out how to handle remainders. (e.g. 13 bytes is not multiple of 8 bytes leaving 5 byte chunk at the end). So I relied on using RngCore::fill_bytes
method.
Here is the unsafe implementation:
fn unsafe_random_bytes(n: usize) -> Vec<u8> {
let mut uninit_boxed = Box::<[u8]>::new_uninit_slice(n);
let bytes = {
let ptr = uninit_boxed[0].as_mut_ptr();
unsafe { std::slice::from_raw_parts_mut(ptr, n) }
};
let mut rng = ChaCha20Rng::from_entropy();
rng.fill_bytes(bytes);
unsafe { uninit_boxed.assume_init().into_vec() }
}
There are two unsafe blocks, first is for creating mutable slice from pointer and second is the assuming the memory initialized.
I'm not sure if there is any UBs. But it seems to be working. In my opinion, since box ensures that n
amount of type T
is allocated. We can construct a mutable slice from first element address, since MaybeUninit
and ManuallyDrop
is #[repr(transparent)]
.
fill_bytes
from rand::RngCore
guarantees that the destination is filled with new data.
Thus, assuming initialized for the second unsafe block also ok.
Here is the comparison for both functions using criterion
crate:benchmark
Let me know what your thoughts are. I may be missing something or don't have the knowledge.
1 Answer 1
Design
Your functions instantiate a new RNG on each call. It'd be cheaper to re-use an existing RNG.
While at it, you can use extension traits to provide the respective methods for RNGs.
Performance
Your safe approach is slow because you generate each byte one by one and push it to the Vec
. Using fill_bytes
improves performance a lot, even when not using unsafe
.
Safety
You did not provide any // SAFETY: ...
comment. Why do you assume the operations are safe? I'm not saying they are not, but you did not provide any reasoning as to why they are.
Food for thought: What happens with the uninit slice when your code panics in-between the two unsafe
statements?
Alternative implementation
use rand::{Rng, RngCore};
pub trait RandomBytes {
fn random_bytes(&mut self, n: usize) -> Vec<u8>;
}
impl<T> RandomBytes for T
where
T: Rng,
{
fn random_bytes(&mut self, n: usize) -> Vec<u8> {
let mut bytes = Vec::with_capacity(n);
for _ in 0..n {
let b = self.gen::<u8>();
bytes.push(b);
}
bytes
}
}
pub trait UnsafeRandomBytes {
fn unsafe_random_bytes(&mut self, n: usize) -> Vec<u8>;
}
impl<T> UnsafeRandomBytes for T
where
T: RngCore,
{
fn unsafe_random_bytes(&mut self, n: usize) -> Vec<u8> {
let mut uninit_boxed = Box::<[u8]>::new_uninit_slice(n);
let bytes = {
let ptr = uninit_boxed[0].as_mut_ptr();
unsafe { std::slice::from_raw_parts_mut(ptr, n) }
};
self.fill_bytes(bytes);
unsafe { uninit_boxed.assume_init().into_vec() }
}
}
pub trait AltRandomBytes {
fn alt_random_bytes(&mut self, n: usize) -> Vec<u8>;
}
impl<T> AltRandomBytes for T
where
T: RngCore,
{
fn alt_random_bytes(&mut self, n: usize) -> Vec<u8> {
let mut bytes = vec![0; n];
self.fill_bytes(&mut bytes);
bytes
}
}
Benchmark
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use rand_chacha::rand_core::SeedableRng;
use rand_chacha::ChaCha20Rng;
use random_bytes::{AltRandomBytes, RandomBytes, UnsafeRandomBytes};
fn bench_random_bytes(c: &mut Criterion) {
let mut group = c.benchmark_group("Random");
for i in [10usize, 100, 1000, 2000, 4000, 8000, 16_000] {
group.bench_with_input(BenchmarkId::new("Safe", i), &i, |b, i| {
let mut rng = ChaCha20Rng::from_entropy();
b.iter(|| RandomBytes::random_bytes(&mut rng, *i))
});
group.bench_with_input(BenchmarkId::new("Unsafe", i), &i, |b, i| {
let mut rng = ChaCha20Rng::from_entropy();
b.iter(|| UnsafeRandomBytes::unsafe_random_bytes(&mut rng, *i))
});
group.bench_with_input(BenchmarkId::new("Safe alt", i), &i, |b, i| {
let mut rng = ChaCha20Rng::from_entropy();
b.iter(|| AltRandomBytes::alt_random_bytes(&mut rng, *i))
});
}
group.finish();
}
criterion_group!(benches, bench_random_bytes);
criterion_main!(benches);
You be the judge, as to whether the performance difference justifies the use of unsafe
here.
from_raw_parts_mut
requires that all elements are initialized. \$\endgroup\$