In the application I'm building, the user is able to define 'types' where each 'type' has a set of 'attributes'.
The user is able to create instances of products by defining a value for each attribute the product's type has.
A pic of the schema: A pic of the schema
I'm creating the query where the user specifies the attributes values and the product type and with that I should return all the product id's that meets the query.
The problem I see in my query is that I'm performing a whole select * from attributes_products ...
for each attribute that the product's type has.
Is there a way to optimize this? If I create an index in the column attributes_products.product_id would this query be actually optimal?
Example of a query where I'm looking for a product whose type has 3 attributes:
select p.id
from Products as p
where
exists(
select * from attributes_products
where
product_id = p.id AND
attribute_id = 27 AND
value = 'some_value'
) AND
exists(
select * from attributes_products
where
product_id = p.id AND
attribute_id = 28 AND
value = 'other_value'
) AND
exists(
select * from attributes_products
where
product_id = p.id AND
attribute_id = 29 AND
value = 'oother_value'
)
Many thanks.
Conclusions
So, Gareth Rees (selected answer) proposed another solution which involves multiple Joins. Here is the explanation of its query (done by PGAdmin): Selected answer query explanation
This is the explanation of the original query: Original query
I believe that the selected answer is slightly faster, but consumes a lot more memory (because of the triple join). I believe that my original query is slightly slower (very slightly, since there's an index on the attributes_products table) but a lot more efficient in memory.
-
\$\begingroup\$ what Database Engine are you using? \$\endgroup\$Malachi– Malachi2013年11月04日 17:30:16 +00:00Commented Nov 4, 2013 at 17:30
-
\$\begingroup\$ @Malachi, I'm using PostgreSQL as database manager/engine \$\endgroup\$Lay González– Lay González2013年11月04日 17:54:07 +00:00Commented Nov 4, 2013 at 17:54
1 Answer 1
SQL allows you to join the same table multiple times, so what you need here is:
SELECT p.id FROM products AS p
JOIN attributes_products AS ap1
ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = '...'
JOIN attributes_products AS ap2
ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = '...'
JOIN attributes_products AS ap3
ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = '...'
Here's the toy MySQL database that I'm using to answer this question:
CREATE TABLE products (
id INTEGER PRIMARY KEY AUTO_INCREMENT
);
CREATE TABLE attributes_products (
product_id INTEGER NOT NULL,
attribute_id INTEGER NOT NULL,
value CHAR(40)
);
CREATE INDEX ap_product ON attributes_products (product_id);
CREATE INDEX ap_attribute ON attributes_products (attribute_id);
INSERT INTO products VALUES (1);
INSERT INTO products VALUES (2);
INSERT INTO attributes_products VALUES (1, 27, 'a');
INSERT INTO attributes_products VALUES (1, 28, 'b');
INSERT INTO attributes_products VALUES (1, 29, 'c');
With my query above, MySQL reports the following query plan:
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| 1 | SIMPLE | ap1 | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 1 | SIMPLE | ap2 | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 1 | SIMPLE | ap3 | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | temp.ap3.product_id | 1 | Using where; Using index |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
See the MySQL documentation for an explanation of the EXPLAIN
output.
This looks better than the plan for the OP's query:
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| 1 | PRIMARY | p | index | NULL | PRIMARY | 4 | NULL | 2 | Using where; Using index |
| 4 | DEPENDENT SUBQUERY | attributes_products | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | attributes_products | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | attributes_products | ref | ap_product,ap_attribute | ap_attribute | 4 | const | 1 | Using where |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
But results will vary from one database to another: a good query planner might be able to make something efficient out of the OP's query.
-
\$\begingroup\$ wouldn't this still bog down a little bit? I mean not as much as the original but it would still bog down wouldn't it? \$\endgroup\$Malachi– Malachi2013年11月04日 17:01:37 +00:00Commented Nov 4, 2013 at 17:01
-
\$\begingroup\$ It all depends on the query planner, but with indexes on
attributes_products.product_id
andattributes_products.attribute_id
MySQL looks like it has a decent plan. Try it on your preferred database and see for yourself. \$\endgroup\$Gareth Rees– Gareth Rees2013年11月04日 17:25:52 +00:00Commented Nov 4, 2013 at 17:25 -
\$\begingroup\$ I just assumed this was SQL Server, I didn't even look for tags...oops \$\endgroup\$Malachi– Malachi2013年11月04日 17:29:50 +00:00Commented Nov 4, 2013 at 17:29
-
\$\begingroup\$ But what if I create an index in the attributes_products table on the product_id. Yes, my query is doing multiple subqueries, but I think that they would take minimum time given the index I'm talking about; in exchange of that, my query doesn't consumes too much memory. I think (but correct me if I'm wrong) that your query would be slightly faster, but it will hold a bunch of memory for the inner join. \$\endgroup\$Lay González– Lay González2013年11月04日 18:32:14 +00:00Commented Nov 4, 2013 at 18:32
-
1\$\begingroup\$ There's no substitute for trying both and seeing which is better! \$\endgroup\$Gareth Rees– Gareth Rees2013年11月04日 19:45:33 +00:00Commented Nov 4, 2013 at 19:45