I would like to be able to generate random bytea fields of arbitrary length (< 1 GB) for populating test data.
What is the best way of doing this?
3 Answers
Enhancing Jack Douglas's answer to avoid the need for PL/PgSQL looping and bytea concatenation, you can use:
CREATE OR REPLACE FUNCTION random_bytea(bytea_length integer)
RETURNS bytea AS $body$
SELECT decode(string_agg(lpad(to_hex(width_bucket(random(), 0, 1, 256)-1),2,'0') ,''), 'hex')
FROM generate_series(1, $1);
$body$
LANGUAGE 'sql'
VOLATILE
SET search_path = 'pg_catalog';
It's a simple SQL function that's cheaper to call than PL/PgSQL.
The difference in performance due to the changed aggregation method is immense for larger bytea values. Though the original function is actually up to 3x faster for sizes < 50 bytes, this one scales much better for larger values.
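To make the byte-generating expression in the function easier to follow, here is a small sketch that feeds it fixed inputs in place of random(). The input values and the column aliases are chosen purely for illustration; only width_bucket, to_hex and lpad come from the function itself.

```sql
-- Illustration only: fixed values stand in for random(), which returns [0, 1).
-- width_bucket(r, 0, 1, 256) maps r to a bucket 1..256; subtracting 1 gives 0..255.
SELECT r,
       width_bucket(r, 0, 1, 256) - 1                           AS byte_value,
       lpad(to_hex(width_bucket(r, 0, 1, 256) - 1), 2, '0')     AS hex_pair
FROM (VALUES (0.0::float8), (0.5::float8), (0.999999::float8)) AS t(r);
-- byte_value: 0, 128, 255
-- hex_pair:   00, 80, ff
```

Each two-character hex pair is one byte, so aggregating N pairs with string_agg and passing the result to decode(..., 'hex') yields an N-byte bytea.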
Or use a C extension function:
I've implemented a random bytea generator as a simple C extension function. It's in my scrapcode repository on GitHub. See the README there.
It nukes the performance of the above SQL version:
regress=# \a
regress=# \o /dev/null
regress=# \timing on
regress=# select random_bytea(2000000);
Time: 895.972 ms
regress=# drop function random_bytea(integer);
regress=# create extension random_bytea;
regress=# select random_bytea(2000000);
Time: 24.126 ms
- Well, I came up with nearly the same solution, but tested only for lower values. There @Jack's solution was a clear winner. +1 for you for not stopping here :) – András Váczi, Aug 16, 2012 at 4:35
- Thank you - this is excellent and thought provoking. I think FROM generate_series(0, $1) needs to be FROM generate_series(1, $1). Have you tried recursion? My limited testing implies that this scales better. – Jack Douglas, Aug 16, 2012 at 5:45
- I tried symlinking /dev/urandom into /var/lib/pgsql/data and reading it with pg_read_file() for bonus crazy points, but unfortunately pg_read_file() reads text input via an encoding conversion, so it can't read bytea. If you really want max speed, write a C extension function that uses a fast pseudo-random number generator to produce binary data and wrap a bytea datum around the buffer :-) – Craig Ringer, Aug 16, 2012 at 6:29
- @JackDouglas I couldn't help it. C extension version of random_bytea. github.com/ringerc/scrapcode/tree/master/postgresql/… – Craig Ringer, Aug 16, 2012 at 8:57
- Another excellent answer! Actually one of the best I've seen so far. I haven't tested the extension, but I trust it works as advertised. – Erwin Brandstetter, Aug 16, 2012 at 22:42
The pgcrypto extension has gen_random_bytes(count integer):
test=# create extension pgcrypto;
test=# select gen_random_bytes(16);
gen_random_bytes
------------------------------------
\xaeb98ae41489460c5292aafade4498ee
(1 row)
The create extension only needs to be done once.
- That's the right answer as long as you need <= 1024 bytes at a time. – Christophe, Apr 24, 2023 at 17:06
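One possible way around the 1024-byte limit mentioned in the comment is to request several chunks and concatenate them with string_agg. This is only a sketch: the function name random_bytea_chunked and the parameter name p_length are made up for illustration, and it assumes pgcrypto is installed in a schema on the search path.

```sql
-- Assumption: pgcrypto is available; gen_random_bytes() accepts at most 1024 bytes per call.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Hypothetical helper: builds p_length random bytes from chunks of up to 1024 bytes.
CREATE OR REPLACE FUNCTION random_bytea_chunked(p_length integer)
RETURNS bytea AS $body$
  SELECT coalesce(
           -- i is the offset of each chunk; the last chunk may be shorter than 1024
           string_agg(gen_random_bytes(least(1024, $1 - i)), ''::bytea),
           ''::bytea  -- no rows are generated for p_length = 0, so return an empty bytea
         )
  FROM generate_series(0, $1 - 1, 1024) AS g(i);
$body$
LANGUAGE sql
VOLATILE;

-- Usage: the result should have exactly the requested length.
SELECT length(random_bytea_chunked(2500));
```

The order in which chunks are aggregated does not matter here because every chunk is random anyway. I have not benchmarked this against the other approaches in this thread.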
I would like to be able to generate random bytea fields of arbitrary length
This function will do it, but 1 GB will take a long time because run time does not scale linearly with output length (every loop iteration concatenates onto an ever-growing bytea value):
create function random_bytea(p_length in integer) returns bytea language plpgsql as $$
declare
o bytea := '';
begin
for i in 1..p_length loop
o := o||decode(lpad(to_hex(width_bucket(random(), 0, 1, 256)-1),2,'0'), 'hex');
end loop;
return o;
end;$$;
output test:
select random_bytea(2);
/*
|random_bytea|
|:-----------|
|\xcf99 |
*/
select random_bytea(10);
/*
|random_bytea |
|:---------------------|
|\x781b462c3158db229b3c|
*/
select length(random_bytea(100000))
, clock_timestamp()-statement_timestamp() time_taken;
/*
|length|time_taken |
|-----:|:--------------|
|100000|00:00:00.654008|
*/
dbfiddle here
- Nice use of width_bucket. Handy. – Craig Ringer, Aug 16, 2012 at 0:22
- I've enhanced your approach to avoid the PL/PgSQL and expensive concatenation loop; see new answer. By using string_agg over generate_series instead of a PL/PgSQL concatenation loop on bytea I'm seeing a 150-fold improvement in performance. – Craig Ringer, Aug 16, 2012 at 0:38