
I would like to be able to generate random bytea fields of arbitrary length (<1Gb) for populating test data.

What is the best way of doing this?

Evan Carroll
asked Aug 15, 2012 at 10:08

3 Answers


Enhancing Jack Douglas's answer to avoid the need for PL/PgSQL looping and bytea concatenation, you can use:

CREATE OR REPLACE FUNCTION random_bytea(bytea_length integer)
RETURNS bytea AS $body$
 SELECT decode(string_agg(lpad(to_hex(width_bucket(random(), 0, 1, 256)-1),2,'0') ,''), 'hex')
 FROM generate_series(1, 1ドル);
$body$
LANGUAGE sql
VOLATILE
SET search_path = 'pg_catalog';
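
For example, a quick sanity check (the byte content varies per call, but the length is deterministic):

```sql
-- Length always matches the requested size; content is random each call.
SELECT length(random_bytea(16)); -- 16
SELECT random_bytea(4); -- e.g. a value like \x9f03c2d1 (varies)
```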

It's a simple SQL function that's cheaper to call than PL/PgSQL.

The change in aggregation method makes an immense difference for larger bytea values: the original function is actually up to 3x faster for sizes under 50 bytes, but this one scales much better as the output grows.
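The per-byte trick is the same in both versions: `width_bucket(random(), 0, 1, 256)` maps a uniform double in [0,1) to a bucket numbered 1..256, so subtracting 1 gives a uniform value 0..255, and `to_hex`/`lpad` turn that into exactly two hex digits for `decode(..., 'hex')`. A minimal illustration:

```sql
-- width_bucket(random(), 0, 1, 256) returns 1..256; minus 1 gives 0..255,
-- and lpad/to_hex always produce a two-character hex pair ('00'..'ff').
SELECT lpad(to_hex(width_bucket(random(), 0, 1, 256) - 1), 2, '0') AS one_random_hex_byte;
```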

Or use a C extension function:

I've implemented a random bytea generator as a simple C extension function. It's in my scrapcode repository on GitHub. See the README there.

It nukes the performance of the above SQL version:

regress=# \a
regress=# \o /dev/null
regress=# \timing on
regress=# select random_bytea(2000000);
Time: 895.972 ms
regress=# drop function random_bytea(integer);
regress=# create extension random_bytea;
regress=# select random_bytea(2000000);
Time: 24.126 ms
answered Aug 16, 2012 at 0:35
  • Well, I came up with nearly the same solution, but tested only for lower values. There @Jack's solution was a clear winner. +1 for you for not stopping here :) Commented Aug 16, 2012 at 4:35
  • Thank you - this is excellent and thought provoking. I think FROM generate_series(0, 1ドル); needs to be FROM generate_series(1, 1ドル);. Have you tried recursion? My limited testing implies that this scales better: Commented Aug 16, 2012 at 5:45
  • I tried symlinking /dev/urandom into /var/lib/pgsql/data and reading it with pg_read_file() for bonus crazy points, but unfortunately pg_read_file() reads text input via an encoding conversion, so it can't read bytea. If you really want max speed, write a C extension function that uses a fast pseudo-random number generator to produce binary data and wrap a bytea datum around the buffer :-) Commented Aug 16, 2012 at 6:29
  • @JackDouglas I couldn't help it. C extension version of random_bytea. github.com/ringerc/scrapcode/tree/master/postgresql/… Commented Aug 16, 2012 at 8:57
  • Another excellent answer! Actually one of the best I've seen so far. I haven't tested the extension, but I trust it works as advertised. Commented Aug 16, 2012 at 22:42

The pgcrypto extension has gen_random_bytes(count integer):

test=# create extension pgcrypto;
test=# select gen_random_bytes(16);
 gen_random_bytes
------------------------------------
 \xaeb98ae41489460c5292aafade4498ee
(1 row)

The create extension only needs to be done once.
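Note that gen_random_bytes() raises an error for counts above 1024, so for larger values you have to aggregate chunks. The wrapper below, random_bytea_pgcrypto, is a hypothetical sketch of that approach (not part of pgcrypto itself), stitching 1024-byte pieces together with string_agg:

```sql
-- Hypothetical wrapper: works around gen_random_bytes's 1024-byte-per-call
-- limit by concatenating full 1024-byte chunks plus a final partial chunk.
CREATE OR REPLACE FUNCTION random_bytea_pgcrypto(n integer)
RETURNS bytea
LANGUAGE sql VOLATILE AS $$
  SELECT string_agg(gen_random_bytes(least(1024, n - (chunk - 1) * 1024)), ''::bytea)
  FROM generate_series(1, (n + 1023) / 1024) AS chunk;
$$;

SELECT length(random_bytea_pgcrypto(3000)); -- 3000
```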

answered Mar 24, 2020 at 9:06
  • That's the right answer as long as you need <=1024 bytes at a time. Commented Apr 24, 2023 at 17:06

I would like to be able to generate random bytea fields of arbitrary length

This function will do it, but 1Gb will take a long time because it does not scale linearly with output length:

create function random_bytea(p_length in integer) returns bytea language plpgsql as $$
declare
 o bytea := '';
begin 
 for i in 1..p_length loop
 o := o||decode(lpad(to_hex(width_bucket(random(), 0, 1, 256)-1),2,'0'), 'hex');
 end loop;
 return o;
end;$$;

output test:

select random_bytea(2);
/*
|random_bytea|
|:-----------|
|\xcf99 |
*/
select random_bytea(10);
/*
|random_bytea |
|:---------------------|
|\x781b462c3158db229b3c|
*/
select length(random_bytea(100000))
 , clock_timestamp()-statement_timestamp() time_taken;
/*
|length|time_taken |
|-----:|:--------------|
|100000|00:00:00.654008|
*/

dbfiddle here

answered Aug 15, 2012 at 10:08
  • Nice use of width_bucket. Handy. Commented Aug 16, 2012 at 0:22
  • I've enhanced your approach to avoid the PL/PgSQL and expensive concatenation loop; see new answer. By using string_agg over generate_series instead of a PL/PgSQL concatenation loop on bytea I'm seeing a 150-fold improvement in performance. Commented Aug 16, 2012 at 0:38
