I am trying to stream binary data into a bytea
field.
The process is quite simple:

loop until the end of the incoming stream
    UPDATE myTable SET data = data || $chunk WHERE id = myId

- $chunk is a binding for the current stream chunk
- the row (myId) already exists with NULL data
- each chunk is about 64k
- the final data is around 4MB
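For concreteness, here is a minimal client-side sketch of that loop, assuming Python with psycopg2; the DSN, the stream object, and the coalesce() guard for the initially-NULL column are my assumptions, not part of the question itself.

# Minimal sketch of the chunked-append loop (assumption: Python + psycopg2).
# Table/column names (myTable, data, id) come from the question; the DSN and
# chunk source are hypothetical.
import psycopg2

CHUNK_SIZE = 64 * 1024  # each chunk is about 64k

def stream_into_bytea(dsn, my_id, stream):
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:
            while True:
                chunk = stream.read(CHUNK_SIZE)
                if not chunk:
                    break
                # coalesce() guards the first append, since the row starts
                # with NULL data (NULL || chunk would stay NULL).
                cur.execute(
                    "UPDATE myTable "
                    "SET data = coalesce(data, ''::bytea) || %s "
                    "WHERE id = %s",
                    (psycopg2.Binary(chunk), my_id),
                )
    finally:
        conn.close()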
Everything is working properly except that concatenating data becomes slower and slower as chunks are stored.
Since I need to keep my SQL quite portable, I cannot use "PostgreSQL Large Objects".
Is it possible to optimize this process?
- maybe by pre-allocating the row with empty data, and then using overlay() to place each chunk?
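For illustration, a hedged sketch of that overlay() variant follows, using the same hypothetical Python/psycopg2 client; pre-allocating ~4MB of zero bytes with decode(repeat('00', n), 'hex') and the 1-based offset arithmetic are my assumptions. Note that each such UPDATE still produces a new copy of the whole row (see the answer below), so this alone may not remove the slowdown.

# Sketch of the pre-allocate + overlay() variant (assumptions: Python + psycopg2,
# a known total size of ~4MB, zero-byte pre-allocation).
import psycopg2

TOTAL_SIZE = 4 * 1024 * 1024   # the final data is around 4MB
CHUNK_SIZE = 64 * 1024

def stream_with_overlay(dsn, my_id, stream):
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:
            # Pre-allocate the column with TOTAL_SIZE zero bytes.
            cur.execute(
                "UPDATE myTable "
                "SET data = decode(repeat('00', %s), 'hex') "
                "WHERE id = %s",
                (TOTAL_SIZE, my_id),
            )
            offset = 0
            while True:
                chunk = stream.read(CHUNK_SIZE)
                if not chunk:
                    break
                # overlay() uses 1-based byte positions, hence offset + 1.
                cur.execute(
                    "UPDATE myTable "
                    "SET data = overlay(data PLACING %s FROM %s) "
                    "WHERE id = %s",
                    (psycopg2.Binary(chunk), offset + 1, my_id),
                )
                offset += len(chunk)
    finally:
        conn.close()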
- Can't you assemble the chunks outside the database? I think Postgres has to rewrite the entire data for each chunk in this case, so you're asking it to do a large amount of unnecessary work. – Mad Scientist, Feb 13, 2018 at 9:42
- Unfortunately, due to memory limit reasons, it is not possible to assemble chunks outside the database. – Franck Freiburger, Feb 13, 2018 at 13:46
1 Answer
Updating a row in a multiversion model like Postgres means creating a second copy of that row with the new contents. Physically, there is no in-place update: UPDATE is similar to DELETE + INSERT of the new contents.
So in the above loop, the first chunk, instead of being written once, is written N times, the second chunk is written N-1 times, the third N-2 times, and so on. With 64 chunks of 64k (about 4MB in total), that adds up to roughly 130MB of row data written instead of 4MB.
To make it worse, all these writes have to go to the WAL files for journaling too, and all the intermediate versions of the rows will need to be picked up by the autovacuum process to be eventually discarded.
Assuming that these chunks cannot be assembled on the client and must be streamed, it might help to do something like this:
CREATE TEMPORARY SEQUENCE s;
CREATE TEMPORARY TABLE buffer(seq int DEFAULT nextval('s'), chunk bytea);

-- buffer in temporary storage (the LOOP runs on the client side)
LOOP
  INSERT INTO buffer(chunk) VALUES ($chunk);
END LOOP

-- assemble in final storage
INSERT INTO permanent_table(data)
  SELECT string_agg(chunk, ''::bytea ORDER BY seq) FROM buffer;

TRUNCATE buffer;  -- or DROP TABLE buffer
At least the chunks will be written only twice (once in buffer storage and once in final durable storage), and only once in the journal, as the temporary table won't be WAL-logged.
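Purely as an illustration, a client driving this approach could look like the sketch below; the Python/psycopg2 wrapper, the DSN, and the permanent_table column list are assumptions, while the SQL statements are the ones from the answer above.

# Client-side sketch of the temporary-buffer approach (assumption: Python + psycopg2).
import psycopg2

CHUNK_SIZE = 64 * 1024

def stream_via_temp_buffer(dsn, stream):
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:
            # Temporary objects live only for this session and are not WAL-logged.
            cur.execute("CREATE TEMPORARY SEQUENCE s")
            cur.execute(
                "CREATE TEMPORARY TABLE buffer"
                "(seq int DEFAULT nextval('s'), chunk bytea)"
            )
            # Buffer each incoming chunk in temporary storage.
            while True:
                chunk = stream.read(CHUNK_SIZE)
                if not chunk:
                    break
                cur.execute(
                    "INSERT INTO buffer(chunk) VALUES (%s)",
                    (psycopg2.Binary(chunk),),
                )
            # Assemble once into final, durable storage.
            cur.execute(
                "INSERT INTO permanent_table(data) "
                "SELECT string_agg(chunk, ''::bytea ORDER BY seq) FROM buffer"
            )
            cur.execute("TRUNCATE buffer")
    finally:
        conn.close()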
- Assuming that we are okay with a Postgres crash meaning that we lose the streamed chunks up to that point, would making the buffer table additionally UNLOGGED be more performant? – Mario Ishac, Feb 17, 2024 at 9:23
- @MarioIshac: no, because temporary tables are already unlogged by design. – Daniel Vérité, Feb 19, 2024 at 11:50