Help optimizing this query with multiple where exists

Question

In the application I'm building, the user is able to define 'types' where each 'type' has a set of 'attributes'.

The user is able to create instances of products by defining a value for each attribute the product's type has.

A pic of the schema: [image: database schema]

I'm writing a query in which the user specifies the product type and the attribute values, and it should return the IDs of all products that match.

The problem I see in my query is that I'm performing a whole select * from attributes_products ... for each attribute that the product's type has.

Is there a way to optimize this? If I create an index on attributes_products.product_id, would this query actually be optimal?

Example of a query where I'm looking for a product whose type has 3 attributes:

select p.id
from Products as p
where 
exists(
 select * from attributes_products
 where
 product_id = p.id AND
 attribute_id = 27 AND
 value = 'some_value'
) AND
exists(
 select * from attributes_products
 where
 product_id = p.id AND
 attribute_id = 28 AND
 value = 'other_value'
) AND
exists(
 select * from attributes_products
 where
 product_id = p.id AND
 attribute_id = 29 AND
 value = 'oother_value'
)
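Because every product type has its own set of attributes, the application has to generate one EXISTS clause per attribute rather than hard-coding three. A minimal sketch of doing that with bound parameters instead of string concatenation, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The helper name `find_products_exists` and the toy rows are hypothetical; with psycopg2 the `?` placeholders would be `%s`.

```python
import sqlite3


def find_products_exists(conn, filters):
    """Return the ids of products having every (attribute_id, value) pair in filters.

    Hypothetical helper: emits one correlated EXISTS clause per filter, like the
    hand-written query above. EXISTS ignores the subquery's select list, so
    SELECT 1 behaves the same as SELECT *.
    """
    if not filters:
        raise ValueError("at least one attribute filter is required")
    clause = ("EXISTS (SELECT 1 FROM attributes_products"
              " WHERE product_id = p.id AND attribute_id = ? AND value = ?)")
    sql = ("SELECT p.id FROM products AS p WHERE "
           + " AND ".join([clause] * len(filters))
           + " ORDER BY p.id")
    # Flatten [(27, 'a'), (28, 'b')] into [27, 'a', 28, 'b'] to match the placeholders.
    params = [item for pair in filters for item in pair]
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical toy data: product 1 has attributes 27, 28, 29; product 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE attributes_products (
        product_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        value TEXT
    );
    INSERT INTO products VALUES (1), (2);
    INSERT INTO attributes_products VALUES (1, 27, 'a'), (1, 28, 'b'), (1, 29, 'c');
""")

print(find_products_exists(conn, [(27, 'a'), (28, 'b'), (29, 'c')]))  # [1]
print(find_products_exists(conn, [(27, 'a'), (28, 'x')]))             # []
```

Only the values are bound as parameters; the SQL text is built from a fixed template, so user input never becomes part of the statement itself.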

Many thanks.

Conclusions

So, Gareth Rees (accepted answer) proposed another solution, which uses multiple joins. Here is the EXPLAIN output for that query, as shown by pgAdmin: [image: EXPLAIN output for the accepted answer's query]

And here is the EXPLAIN output for my original query: [image: EXPLAIN output for the original query]

I believe the accepted answer's query is slightly faster but consumes a lot more memory (because of the triple join). My original query, I believe, is slightly slower (only very slightly, since there's an index on the attributes_products table) but much more memory-efficient.

Comment

What database engine are you using?

Comment

@Malachi, I'm using PostgreSQL as the database engine.

Answer

SQL allows you to join the same table multiple times, so what you need here is:

SELECT p.id FROM products AS p
JOIN attributes_products AS ap1
 ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = '...'
JOIN attributes_products AS ap2
 ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = '...'
JOIN attributes_products AS ap3
 ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = '...'
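One behavioural difference between the two forms is worth checking before choosing. The toy schema below declares no unique constraint on attributes_products, so if the same (product_id, attribute_id, value) row is ever stored twice, the join returns the product once per matching combination of rows, whereas the EXISTS version returns it once. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL/PostgreSQL; `find_products_join` and the duplicate row are hypothetical, for illustration only. It generates the self-join for any number of filters and shows what `DISTINCT` changes:

```python
import sqlite3


def find_products_join(conn, filters, distinct=True):
    """Hypothetical helper: one self-join of attributes_products per filter."""
    if not filters:
        raise ValueError("at least one attribute filter is required")
    joins, params = [], []
    for i, (attribute_id, value) in enumerate(filters, start=1):
        alias = f"ap{i}"  # ap1, ap2, ... as in the query above
        joins.append(
            f"JOIN attributes_products AS {alias}"
            f" ON {alias}.product_id = p.id"
            f" AND {alias}.attribute_id = ? AND {alias}.value = ?"
        )
        params.extend([attribute_id, value])
    select = "SELECT DISTINCT p.id" if distinct else "SELECT p.id"
    sql = f"{select} FROM products AS p " + " ".join(joins) + " ORDER BY p.id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE attributes_products (
        product_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        value TEXT
    );
    INSERT INTO products VALUES (1), (2);
    INSERT INTO attributes_products VALUES (1, 27, 'a'), (1, 28, 'b'), (1, 29, 'c');
    -- Accidental duplicate row: nothing in this schema forbids it.
    INSERT INTO attributes_products VALUES (1, 27, 'a');
""")

filters = [(27, 'a'), (28, 'b'), (29, 'c')]
print(find_products_join(conn, filters, distinct=False))  # [1, 1]
print(find_products_join(conn, filters))                  # [1]
```

If each product is meant to hold exactly one value per attribute, a UNIQUE (product_id, attribute_id) constraint would rule out such duplicates at the source; that is an assumption about the data model, not something the question states.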

Here's the toy MySQL database that I'm using to answer this question:

CREATE TABLE products (
 id INTEGER PRIMARY KEY AUTO_INCREMENT
);
CREATE TABLE attributes_products (
 product_id INTEGER NOT NULL,
 attribute_id INTEGER NOT NULL,
 value CHAR(40)
);
CREATE INDEX ap_product ON attributes_products (product_id);
CREATE INDEX ap_attribute ON attributes_products (attribute_id);
INSERT INTO products VALUES (1);
INSERT INTO products VALUES (2);
INSERT INTO attributes_products VALUES (1, 27, 'a');
INSERT INTO attributes_products VALUES (1, 28, 'b');
INSERT INTO attributes_products VALUES (1, 29, 'c');
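The toy database indexes product_id and attribute_id separately. Another option worth measuring is one composite index on the three columns the filters touch, so that each lookup in either query can, where the engine supports it, be answered from the index alone. Here is a sketch that uses SQLite's `EXPLAIN QUERY PLAN` to check that the index is picked up for both queries; SQLite is only a stand-in (its plan output looks nothing like MySQL's EXPLAIN table or PostgreSQL's), and the index name `ap_attr_value_product` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE attributes_products (
        product_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        value TEXT
    );
    -- Hypothetical composite index: equality columns attribute_id and value
    -- first, product_id last, so every lookup below can be served by it.
    CREATE INDEX ap_attr_value_product
        ON attributes_products (attribute_id, value, product_id);
    INSERT INTO products VALUES (1), (2);
    INSERT INTO attributes_products VALUES (1, 27, 'a'), (1, 28, 'b'), (1, 29, 'c');
""")

QUERIES = {
    "EXISTS": """
        SELECT p.id FROM products AS p
        WHERE EXISTS (SELECT 1 FROM attributes_products
                      WHERE product_id = p.id AND attribute_id = 27 AND value = 'a')
          AND EXISTS (SELECT 1 FROM attributes_products
                      WHERE product_id = p.id AND attribute_id = 28 AND value = 'b')
    """,
    "JOIN": """
        SELECT p.id FROM products AS p
        JOIN attributes_products AS ap1
          ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = 'a'
        JOIN attributes_products AS ap2
          ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = 'b'
    """,
}

plans = {}
for name, sql in QUERIES.items():
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last is a human-readable
    # description whose exact wording varies between SQLite versions.
    plans[name] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, conn.execute(sql).fetchall())
    for detail in plans[name]:
        print("   ", detail)
```

Putting attribute_id and value first lets the join start from the rows that match one filter; with product_id also in the key, the EXISTS lookups can use all three columns. PostgreSQL's B-tree indexes accept the same multicolumn definition, but whether its planner prefers it over the single-column indexes is something to confirm with EXPLAIN on real data.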

With my query above, MySQL reports the following query plan:

+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys           | key          | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | ap1   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap2   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap3   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY                 | PRIMARY      | 4       | temp.ap3.product_id |    1 | Using where; Using index |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+

See the MySQL documentation for an explanation of the EXPLAIN output.

This looks better than the plan for the OP's query:

+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| id | select_type        | table               | type  | possible_keys           | key          | key_len | ref   | rows | Extra                    |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
|  1 | PRIMARY            | p                   | index | NULL                    | PRIMARY      | 4       | NULL  |    2 | Using where; Using index |
|  4 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  3 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  2 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+

But results will vary from one database to another: a good query planner might be able to make something efficient out of the OP's query.
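A third formulation often used for attribute-value tables like this one (a form of relational division) fetches the rows matching any of the requested pairs in one pass and keeps the products that matched all of them. Whether it beats either query above depends, as noted, on the planner and the data. A sketch with Python's sqlite3 as a stand-in; `find_products_grouped` is hypothetical, it assumes each filter names a different attribute_id, and it reads only attributes_products, so it does not check that the product row exists in products:

```python
import sqlite3


def find_products_grouped(conn, filters):
    """Hypothetical helper: one pass over attributes_products, then GROUP BY.

    Assumes every filter names a different attribute_id, so a product matches
    all filters exactly when it matched len(filters) distinct attributes.
    """
    if not filters:
        raise ValueError("at least one attribute filter is required")
    condition = " OR ".join(["(attribute_id = ? AND value = ?)"] * len(filters))
    sql = (
        "SELECT product_id FROM attributes_products"
        f" WHERE {condition}"
        " GROUP BY product_id"
        " HAVING COUNT(DISTINCT attribute_id) = ?"
        " ORDER BY product_id"
    )
    params = [item for pair in filters for item in pair] + [len(filters)]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes_products (
        product_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        value TEXT
    );
    INSERT INTO attributes_products VALUES
        (1, 27, 'a'), (1, 28, 'b'), (1, 29, 'c'),
        (2, 27, 'a'), (2, 28, 'z');
""")

print(find_products_grouped(conn, [(27, 'a'), (28, 'b'), (29, 'c')]))  # [1]
print(find_products_grouped(conn, [(27, 'a')]))                        # [1, 2]
```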

Comment

Wouldn't this still bog down a little? Not as much as the original, I mean, but it would still bog down, wouldn't it?

Comment

It all depends on the query planner, but with indexes on attributes_products.product_id and attributes_products.attribute_id, MySQL appears to have a decent plan. Try it on your preferred database and see for yourself.

Comment

I just assumed this was SQL Server; I didn't even look at the tags... oops.

Comment

But what if I create an index on attributes_products.product_id? Yes, my query runs multiple subqueries, but I think each of them would take minimal time given that index, and in exchange my query doesn't consume much memory. I think (but correct me if I'm wrong) that your query would be slightly faster, but it would hold a lot of memory for the inner joins.

Comment

There's no substitute for trying both and seeing which is better!
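In that spirit, here is a minimal timing harness that runs both queries against the same synthetic data and checks that they return the same ids. It uses SQLite only because it ships with Python; its timings say little about PostgreSQL, where running `EXPLAIN ANALYZE` on each query against real data is the more direct comparison. The data (2,000 products, attributes 27 to 29, values v0 to v4 chosen at random) is made up for illustration, and the two single-column indexes mirror the toy database in the answer:

```python
import random
import sqlite3
import timeit

# Both queries look for products with attribute 27 = 'v1', 28 = 'v2', 29 = 'v3'.
EXISTS_SQL = """
    SELECT p.id FROM products AS p
    WHERE EXISTS (SELECT 1 FROM attributes_products
                  WHERE product_id = p.id AND attribute_id = 27 AND value = 'v1')
      AND EXISTS (SELECT 1 FROM attributes_products
                  WHERE product_id = p.id AND attribute_id = 28 AND value = 'v2')
      AND EXISTS (SELECT 1 FROM attributes_products
                  WHERE product_id = p.id AND attribute_id = 29 AND value = 'v3')
    ORDER BY p.id
"""

JOIN_SQL = """
    SELECT p.id FROM products AS p
    JOIN attributes_products AS ap1
      ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = 'v1'
    JOIN attributes_products AS ap2
      ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = 'v2'
    JOIN attributes_products AS ap3
      ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = 'v3'
    ORDER BY p.id
"""


def build_db(n_products=2000, seed=0):
    """Build an in-memory database filled with hypothetical random data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY);
        CREATE TABLE attributes_products (
            product_id INTEGER NOT NULL,
            attribute_id INTEGER NOT NULL,
            value TEXT
        );
        CREATE INDEX ap_product ON attributes_products (product_id);
        CREATE INDEX ap_attribute ON attributes_products (attribute_id);
    """)
    conn.executemany("INSERT INTO products VALUES (?)",
                     [(i,) for i in range(1, n_products + 1)])
    conn.executemany(
        "INSERT INTO attributes_products VALUES (?, ?, ?)",
        [(i, attr, f"v{rng.randrange(5)}")
         for i in range(1, n_products + 1) for attr in (27, 28, 29)],
    )
    conn.commit()
    conn.execute("ANALYZE")  # give SQLite's planner row-count statistics
    return conn


conn = build_db()
exists_ids = [row[0] for row in conn.execute(EXISTS_SQL)]
join_ids = [row[0] for row in conn.execute(JOIN_SQL)]
print("same ids:", exists_ids == join_ids, "|", len(exists_ids), "matching products")

RUNS = 10
for name, sql in (("EXISTS", EXISTS_SQL), ("JOIN", JOIN_SQL)):
    seconds = timeit.timeit(lambda: conn.execute(sql).fetchall(), number=RUNS)
    print(f"{name}: {seconds / RUNS * 1000:.2f} ms per run")
```

Checking that the two result lists are equal before comparing timings guards against "optimizing" a query into one that answers a different question.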

Gareth Rees · Accepted Answer · 2013-11-04 13:42:06Z