For each entity, I have a list of columns that I want returned from a SELECT/UPDATE/INSERT statement.
I want to reuse the same list instead of having a copy in each query.
Example:
SELECT col1, col2 FROM entity;
INSERT INTO entity (...) VALUES (...) RETURNING col1, col2;
UPDATE entity SET ... RETURNING col1, col2;
It's not always simple col1, but also more complex expressions, e.g. COALESCE(a, b). That's why I am aiming for reuse.
One way I found it can be done is with functions such as this:
CREATE FUNCTION to_entity_columns(
    e entity,
    OUT col1 INTEGER,
    OUT col2 INTEGER
)
AS
$$
    SELECT
        e.col1,
        e.col2
$$ LANGUAGE SQL
IMMUTABLE
STRICT;
It's possible to do:
SELECT (to_entity_columns(entity)).* FROM entity;
INSERT INTO entity (...) VALUES (...) RETURNING (to_entity_columns(entity)).*;
UPDATE entity SET ... RETURNING (to_entity_columns(entity)).*;
While this approach works, the query time now scales with the number of rows. It can go up by as much as 100x or 1000x; I see queries going from 1 ms to 1 s. The function is always IMMUTABLE, but Postgres won't inline it as I had hoped. This is because (see source code here) the function returns a RECORD.
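Whether a call was inlined can be checked with EXPLAIN (VERBOSE), which prints the output expressions of each plan node. The following is only a minimal sketch: the two-column definition of entity is an assumption made for illustration, since the real table is not shown here.

```sql
-- Hypothetical table; the real definition of entity is not shown in the question.
CREATE TABLE entity (col1 INTEGER, col2 INTEGER);

-- Same function as above, repeated so the sketch runs on its own.
CREATE FUNCTION to_entity_columns(e entity, OUT col1 INTEGER, OUT col2 INTEGER)
AS $$ SELECT e.col1, e.col2 $$
LANGUAGE SQL IMMUTABLE STRICT;

-- If the Output line of the plan still mentions to_entity_columns(...),
-- the function call was not inlined.
EXPLAIN (VERBOSE)
SELECT (to_entity_columns(entity)).* FROM entity;
```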
I have tried modifying the function, e.g. to return a composite type instead, and removing .* from the function call, but it doesn't make a difference.
The question here is twofold:
a) Is there a way to make the functions like the one above work with reasonable performance?
b) Are there any alternatives that would allow simple reuse of the list of columns like shown in the example above?
1 Answer
I guess that at least part of the performance problem arises because the function will be evaluated once per result row and result column, as the documentation states:
For example, if myfunc() is a function returning a composite type with columns a, b, and c, then these two queries have the same result:
SELECT (myfunc(x)).* FROM some_table;
SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table;
Tip
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
SELECT m.* FROM some_table, LATERAL myfunc(x) AS m;
Placing the function in a LATERAL FROM item keeps it from being invoked more than once per row. m.* is still expanded into m.a, m.b, m.c, but now those variables are just references to the output of the FROM item. (The LATERAL keyword is optional here, but we show it to clarify that the function is getting x from some_table.)
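Applied to the SELECT from the question, the documented pattern might look like the sketch below. It assumes the entity table and the to_entity_columns function from the question already exist; the aliases e and c are illustrative only.

```sql
-- The function is a FROM item here, so it is invoked once per row of entity;
-- c.* is expanded into references to the columns of that one result.
SELECT c.*
FROM entity AS e,
     LATERAL to_entity_columns(e) AS c;
```

This form only covers SELECT. The RETURNING list of an INSERT or UPDATE has no FROM clause of its own, so those statements need a different arrangement.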
So you could, for example, rewrite the INSERT as
WITH x(r) AS (
INSERT INTO entity (...) VALUES (...)
RETURNING to_entity_columns(entity)
)
SELECT (r).* FROM x;
I cannot say if that will inline the function or not (read the Wiki article and experiment), but at least it will avoid calling the function more often than necessary.
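The same arrangement should carry over to UPDATE. This is a sketch under the same assumptions (the entity table and function from the question); the SET assignment and WHERE condition are placeholders, not taken from the question.

```sql
WITH x(r) AS (
   UPDATE entity
   SET col2 = col2 + 1      -- placeholder assignment
   WHERE col1 = 42          -- placeholder condition
   RETURNING to_entity_columns(entity)
)
-- r is a single composite column of the CTE, so (r).* expands into field
-- references on the already computed value, not into new function calls.
SELECT (r).* FROM x;
```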
Thank you for your suggestions. I've tried these things, and the speedup is significant (e.g. 50%) but marginal overall: instead of a 100x slowdown, I get 50x. I think functions are not the way to go here unless they are reliably inlined. – m1ch4ls, Sep 8, 2021 at 18:45