6
\$\begingroup\$

In the application I'm building, the user is able to define 'types' where each 'type' has a set of 'attributes'.

The user is able to create instances of products by defining a value for each attribute the product's type has.

A pic of the schema: A pic of the schema

I'm creating the query where the user specifies the attributes values and the product type and with that I should return all the product id's that meets the query.

The problem I see in my query is that I'm performing a whole select * from attributes_products ... for each attribute that the product's type has.

Is there a way to optimize this? If I create an index in the column attributes_products.product_id would this query be actually optimal?

Example of a query where I'm looking for a product whose type has 3 attributes:

select p.id
from Products as p
where 
exists(
 select * from attributes_products
 where
 product_id = p.id AND
 attribute_id = 27 AND
 value = 'some_value'
) AND
exists(
 select * from attributes_products
 where
 product_id = p.id AND
 attribute_id = 28 AND
 value = 'other_value'
) AND
exists(
 select * from attributes_products
 where
 product_id = p.id AND
 attribute_id = 29 AND
 value = 'oother_value'
)

Many thanks.

Conclusions

So, Gareth Rees (selected answer) proposed another solution which involves multiple Joins. Here is the explanation of its query (done by PGAdmin): Selected answer query explanation

This is the explanation of the original query: Original query

I believe that the selected answer is slightly faster, but consumes a lot more memory (because of the triple join). I believe that my original query is slightly slower (very slightly, since there's an index on the attributes_products table) but a lot more efficient in memory.

asked Nov 4, 2013 at 1:17
\$\endgroup\$
2
  • \$\begingroup\$ what Database Engine are you using? \$\endgroup\$ Commented Nov 4, 2013 at 17:30
  • \$\begingroup\$ @Malachi, I'm using PostgreSQL as database manager/engine \$\endgroup\$ Commented Nov 4, 2013 at 17:54

1 Answer 1

5
\$\begingroup\$

SQL allows you to join the same table multiple times, so what you need here is:

SELECT p.id FROM products AS p
JOIN attributes_products AS ap1
 ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = '...'
JOIN attributes_products AS ap2
 ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = '...'
JOIN attributes_products AS ap3
 ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = '...'

Here's the toy MySQL database that I'm using to answer this question:

CREATE TABLE products (
 id INTEGER PRIMARY KEY AUTO_INCREMENT
);
CREATE TABLE attributes_products (
 product_id INTEGER NOT NULL,
 attribute_id INTEGER NOT NULL,
 value CHAR(40)
);
CREATE INDEX ap_product ON attributes_products (product_id);
CREATE INDEX ap_attribute ON attributes_products (attribute_id);
INSERT INTO products VALUES (1);
INSERT INTO products VALUES (2);
INSERT INTO attributes_products VALUES (1, 27, 'a');
INSERT INTO attributes_products VALUES (1, 28, 'b');
INSERT INTO attributes_products VALUES (1, 29, 'c');

With my query above, MySQL reports the following query plan:

+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| 1 | SIMPLE | ap1 | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 1 | SIMPLE | ap2 | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 1 | SIMPLE | ap3 | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | temp.ap3.product_id | 1 | Using where; Using index |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+

See the MySQL documentation for an explanation of the EXPLAIN output.

This looks better than the plan for the OP's query:

+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| 1 | PRIMARY | p | index | NULL | PRIMARY | 4 | NULL | 2 | Using where; Using index |
| 4 | DEPENDENT SUBQUERY | attributes_products | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | attributes_products | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | attributes_products | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+

But results will vary from one database to another: a good query planner might be able to make something efficient out of the OP's query.

answered Nov 4, 2013 at 13:42
\$\endgroup\$
5
  • \$\begingroup\$ wouldn't this still bog down a little bit? I mean not as much as the original but it would still bog down wouldn't it? \$\endgroup\$ Commented Nov 4, 2013 at 17:01
  • \$\begingroup\$ It all depends on the query planner, but with indexes on attributes_products.product_id and attributes_products.attribute_id MySQL looks like it has a decent plan. Try it on your preferred database and see for yourself. \$\endgroup\$ Commented Nov 4, 2013 at 17:25
  • \$\begingroup\$ I just assumed this was SQL Server, I didn't even look for tags...oops \$\endgroup\$ Commented Nov 4, 2013 at 17:29
  • \$\begingroup\$ But what if I create an index in the attributes_products table on the product_id. Yes, my query is doing multiple subqueries, but I think that they would take minimum time given the index I'm talking about; in exchange of that, my query doesn't consumes too much memory. I think (but correct me if I'm wrong) that your query would be slightly faster, but it will hold a bunch of memory for the inner join. \$\endgroup\$ Commented Nov 4, 2013 at 18:32
  • 1
    \$\begingroup\$ There's no substitute for trying both and seeing which is better! \$\endgroup\$ Commented Nov 4, 2013 at 19:45

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.