Reuse SELECT query by adding results in array?

Question 1

I have written a PostgreSQL function that returns products in a specific order. Now I would like not only to return the results of the first SELECT query, but also to put them into an array, so I can reuse the IDs inside another SELECT query. I first tried adding an alias to the query, like SELECT * FROM (SELECT id FROM products) AS pr, and using pr inside the NOT IN (pr) clause of the second query, but that doesn't work ...

I will explain it more clearly with an example. This is a simplified version of the function:

CREATE OR REPLACE FUNCTION featured_products(
    valid_to_in timestamp without time zone,
    taxonomy_id_in integer,
    product_limit_in integer)
  RETURNS SETOF integer AS
$BODY$
BEGIN
    RETURN QUERY
    (
        -- #1
        SELECT * FROM (
            SELECT "product"."supplier_id" FROM products AS "product"
        ) AS "featured"
        LIMIT 2
    )
    UNION ALL
    SELECT *
    FROM (
        SELECT "product"."supplier_id" FROM products AS "product"
    ) AS "featured"
    -- the subquery only exposes supplier_id, so filter on that column
    WHERE supplier_id NOT IN (
        -- #2
        SELECT * FROM (
            SELECT "product"."supplier_id" FROM products AS "product"
        ) AS "featured"
        LIMIT 2
    )
    LIMIT product_limit_in;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;

I removed some joins and the GROUP BY and ORDER BY clauses to make the function more readable. I also added the markers #1 and #2 in the code above to show which queries I mean by query 1 and query 2.

As you can see, query #2 should return the same results as query #1. In reality these queries are much bigger, so I want to replace the second, identical query with just an array of IDs. That would be less code and probably faster.

I don't know how to put the IDs returned by the first query into an array and use that array in a NOT IN (<IDs>) clause instead of the second query.

Does anyone know how to do this?

Comment 1

How about a CTE (WITH x AS (...subquery...)) at the top level of the UNION query?

Comment 2

Very nice, thanks. Less code it is ;) But does it still execute the complete query twice if I use SELECT * FROM <name_of_CTE_WITH_query> twice? Or does it save the results in some kind of cache? I ask because the function still takes half a second to execute.

Comment 3

The CTE subquery should be executed only once, but more generally EXPLAIN or EXPLAIN ANALYZE should be used to know how the planner breaks down a query execution.
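
One way to check this is to run the query body on its own under EXPLAIN ANALYZE, because EXPLAIN on a call to a PL/pgSQL function shows only a Function Scan node and not the plan of the query inside it. The following is a minimal sketch, assuming the products table and supplier_id column from the question; the literal 10 is a placeholder standing in for product_limit_in.

```sql
-- Sketch: inspect the plan of the query that the function would run.
-- Assumes the products table from the question; 10 stands in for product_limit_in.
-- Note that EXPLAIN ANALYZE actually executes the query.
EXPLAIN ANALYZE
WITH featured AS (SELECT supplier_id FROM products LIMIT 2)
SELECT supplier_id
FROM   featured
UNION ALL
(
SELECT p.supplier_id
FROM   products p
LEFT   JOIN featured f USING (supplier_id)
WHERE  f.supplier_id IS NULL
LIMIT  10
);
```

In the output, each reference to the CTE should show up as a CTE Scan node on featured, while the CTE itself is planned once as a separate subplan.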

Answer

It's a textbook case for a CTE, as @Daniel commented.
The example can be simplified further, and you need to be aware of how LIMIT works in a UNION query.

CREATE OR REPLACE FUNCTION featured_products(valid_to_in timestamp
                                           , taxonomy_id_in integer
                                           , product_limit_in integer)
  RETURNS SETOF integer AS
$func$
BEGIN
   RETURN QUERY
   WITH featured AS (SELECT supplier_id FROM products LIMIT 2)
   SELECT supplier_id
   FROM   featured
   UNION ALL
   (
   SELECT p.supplier_id
   FROM   products p
   LEFT   JOIN featured f USING (supplier_id)
   WHERE  f.supplier_id IS NULL
   LIMIT  product_limit_in
   );  -- parens required - or not?
END
$func$ LANGUAGE plpgsql VOLATILE;

LIMIT can only be applied once in a UNION (ALL) query, unless you enclose the leg of the query in parentheses. You may or may not want to add parentheses.
- The way I have it, a maximum of product_limit_in rows are returned in addition to the "featured" rows from the CTE.
- If you remove the parentheses you get a maximum of product_limit_in rows total - meaning that even "featured" products may be discarded.
  Related: Optimize a query on two big tables
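
The difference between the two variants can be seen on throwaway data. This is only an illustrative sketch: generate_series stands in for the products table, the ids 1 and 2 play the role of the "featured" rows, and LIMIT 1 stands in for product_limit_in.

```sql
-- Variant A: parenthesized second leg, LIMIT applies to that leg only.
-- Returns 3 rows: both "featured" rows plus one more.
WITH featured AS (SELECT g AS id FROM generate_series(1, 2) AS g)
SELECT id FROM featured
UNION ALL
(
SELECT g FROM generate_series(3, 10) AS g
LIMIT  1
);

-- Variant B: no parentheses, LIMIT applies to the combined result.
-- Returns 1 row in total, so one of the "featured" rows is dropped.
WITH featured AS (SELECT g AS id FROM generate_series(1, 2) AS g)
SELECT id FROM featured
UNION ALL
SELECT g FROM generate_series(3, 10) AS g
LIMIT  1;
```

Without an ORDER BY, which rows survive the LIMIT is not guaranteed; only the row counts are.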
Either way, don't ORDER BY the outer (combined) result before you LIMIT, if you can avoid it. Without an outer ORDER BY, Postgres can optimize the query very efficiently and just stop evaluating once enough rows have been returned (possibly fetching tuples from the top of a matching index). With an outer ORDER BY that is no longer possible, which can make a huge difference in performance.
I use LEFT JOIN / IS NULL to exclude featured rows from the second SELECT, which is probably faster than NOT IN and does not carry "surprises" when dealing with NULL values or empty results.
- Select rows which are not present in other table
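
The NULL "surprise" with NOT IN can be reproduced in isolation. In this sketch the VALUES lists are made-up stand-ins for the products and featured rows; the aliases t, f, x and y are purely illustrative.

```sql
-- NOT IN with a NULL in the subquery result: no row qualifies,
-- because x <> NULL evaluates to NULL rather than true.
SELECT x
FROM  (VALUES (1), (2), (3)) AS t(x)
WHERE  x NOT IN (SELECT y FROM (VALUES (1), (NULL::integer)) AS f(y));
-- returns 0 rows

-- The LEFT JOIN anti-join with an IS NULL filter is not affected.
SELECT t.x
FROM  (VALUES (1), (2), (3)) AS t(x)
LEFT   JOIN (VALUES (1), (NULL::integer)) AS f(y) ON f.y = t.x
WHERE  f.y IS NULL;
-- returns the rows 2 and 3
```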
In Postgres (as opposed to some other RDBMS), you can refer to p.supplier_id and f.supplier_id after joining with USING (supplier_id).
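
A small sketch of that behavior, again with made-up VALUES lists in place of real tables (the aliases p and f mirror the query above, the column name id is illustrative):

```sql
-- After USING (id), the unqualified id is the merged join column,
-- while p.id and f.id still refer to each side separately.
SELECT id, p.id AS p_id, f.id AS f_id
FROM  (VALUES (1), (2)) AS p(id)
LEFT   JOIN (VALUES (1)) AS f(id) USING (id);
-- returns (1, 1, 1) and (2, 2, NULL)
```

This is what makes the filter on f.supplier_id IS NULL possible even though the join is written with USING.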

And yes, the CTE is only evaluated once:

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries.

Bold emphasis mine.
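
The single evaluation can be observed with a volatile function. This is an illustrative sketch, not part of the original function; the names r, v, a and b are made up for the example.

```sql
-- The CTE is computed once, so both references see the same random value.
WITH r AS (SELECT random() AS v)
SELECT a.v = b.v AS same_value
FROM   r AS a, r AS b;
-- same_value is true

-- Two independent subqueries are each evaluated on their own,
-- so the values almost certainly differ.
SELECT a.v = b.v AS same_value
FROM  (SELECT random() AS v) AS a
    , (SELECT random() AS v) AS b;
```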

score 0 · Accepted Answer · 2015-02-12 16:29:04Z
