We're having a problem with a query that consistently executes very slowly:
SELECT `e`.*,
IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) AS `news_from_date`,
IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) AS `news_to_date`
FROM `catalog_product_entity` AS `e`
INNER JOIN `catalog_product_website` AS `product_website`
ON product_website.product_id = e.entity_id AND product_website.website_id = 3
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_from_date_default`
ON (`at_news_from_date_default`.`entity_id` = `e`.`entity_id`) AND
(`at_news_from_date_default`.`attribute_id` = '81') AND `at_news_from_date_default`.`store_id` = 0
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_from_date`
ON (`at_news_from_date`.`entity_id` = `e`.`entity_id`) AND
(`at_news_from_date`.`attribute_id` = '81') AND (`at_news_from_date`.`store_id` = 9)
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_to_date_default`
ON (`at_news_to_date_default`.`entity_id` = `e`.`entity_id`) AND
(`at_news_to_date_default`.`attribute_id` = '82') AND `at_news_to_date_default`.`store_id` = 0
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_to_date`
ON (`at_news_to_date`.`entity_id` = `e`.`entity_id`) AND
(`at_news_to_date`.`attribute_id` = '82') AND (`at_news_to_date`.`store_id` = 9)
WHERE (((((IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) <=
'2021-10-06 23:59:59') OR
(IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS null)))))
AND (((((IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) >=
'2021-10-06 00:00:00') OR
(IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) IS null)))))
AND ((IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS not null) OR
(IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) IS not null))
The query provides the list of new products, but it takes minutes to execute. I'm not sure how to optimize it, since all of the conditions are needed to actually determine the new products.
I'm also adding some profiling results and the EXPLAIN output (not reproduced here).
Is it possible to somehow tweak the database to improve the execution time of this query?
Here's the structure of the tables used:
-- auto-generated definition
create table catalog_product_entity
(
entity_id int unsigned auto_increment comment 'Entity ID'
primary key,
attribute_set_id smallint unsigned default 0 not null comment 'Attribute Set ID',
type_id varchar(32) default 'simple' not null comment 'Type ID',
sku varchar(64) null comment 'SKU',
has_options smallint default 0 not null comment 'Has Options',
required_options smallint unsigned default 0 not null comment 'Required Options',
created_at timestamp default current_timestamp() not null comment 'Creation Time',
updated_at timestamp default current_timestamp() not null on update current_timestamp() comment 'Update Time',
mv_product_id int unsigned null comment 'Megaventory Id',
mageworx_is_require smallint default 0 not null comment 'MageWorx Is Required',
is_replace_product_sku tinyint(1) default 0 not null comment 'Is Replace Product SKU',
created_by int null comment 'Created by Mass Product Import profiles id',
updated_by int null comment 'Created by Mass Product Import profiles id'
)
comment 'Catalog Product Table';
create index CATALOG_PRODUCT_ENTITY_ATTRIBUTE_SET_ID
on catalog_product_entity (attribute_set_id);
create index CATALOG_PRODUCT_ENTITY_SKU
on catalog_product_entity (sku);
-- auto-generated definition
create table catalog_product_website
(
product_id int unsigned not null comment 'Product ID',
website_id smallint unsigned not null comment 'Website ID',
primary key (product_id, website_id),
constraint CATALOG_PRODUCT_WEBSITE_WEBSITE_ID_STORE_WEBSITE_WEBSITE_ID
foreign key (website_id) references store_website (website_id)
on delete cascade,
constraint CAT_PRD_WS_PRD_ID_CAT_PRD_ENTT_ENTT_ID
foreign key (product_id) references catalog_product_entity (entity_id)
on delete cascade
)
comment 'Catalog Product To Website Linkage Table';
create index CATALOG_PRODUCT_WEBSITE_WEBSITE_ID
on catalog_product_website (website_id);
-- auto-generated definition
create table catalog_product_entity_datetime
(
value_id int auto_increment comment 'Value ID'
primary key,
attribute_id smallint unsigned default 0 not null comment 'Attribute ID',
store_id smallint unsigned default 0 not null comment 'Store ID',
entity_id int unsigned default 0 not null comment 'Entity ID',
value datetime null comment 'Value',
constraint CATALOG_PRODUCT_ENTITY_DATETIME_ENTITY_ID_ATTRIBUTE_ID_STORE_ID
unique (entity_id, attribute_id, store_id),
constraint CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID_STORE_STORE_ID
foreign key (store_id) references store (store_id)
on delete cascade,
constraint CAT_PRD_ENTT_DTIME_ATTR_ID_EAV_ATTR_ATTR_ID
foreign key (attribute_id) references eav_attribute (attribute_id)
on delete cascade,
constraint CAT_PRD_ENTT_DTIME_ENTT_ID_CAT_PRD_ENTT_ENTT_ID
foreign key (entity_id) references catalog_product_entity (entity_id)
on delete cascade
)
comment 'Catalog Product Datetime Attribute Backend Table';
create index CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID
on catalog_product_entity_datetime (attribute_id);
create index CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID
on catalog_product_entity_datetime (store_id);
Any ideas on how to speed it up are much appreciated.
1 Answer
There's a lot going on in your query, and potentially a lot of room for improvement. As a general answer and set of advice: yes, creating proper indexes on the tables involved should help improve the performance of your query. Proper indexes are ones that cover all of the fields used in the query. Sometimes this is feasible, other times not.
Generally speaking, to create a proper covering index, you should list the predicate fields (those used in your JOIN, WHERE, or HAVING clauses) first in the index definition, followed by the fields you're referencing in the SELECT list. The order in which you list the fields matters, because it is the order in which the underlying B-Tree structure sorts the data in that index. If the first field in an index definition is one your query doesn't actually use, that index probably won't be applicable to (or used by) your query, even if all of the remaining fields in the definition are used.
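As a small, purely hypothetical illustration of that ordering rule (the table, columns, and index name below are made up for demonstration and are not part of your schema):
-- Hypothetical table and index, for illustration only.
create table demo_orders
(
    customer_id int not null,
    status      varchar(16) not null,
    total       decimal(10, 2) not null
);
create index IDX_DEMO_CUSTOMER_STATUS_TOTAL
    on demo_orders (customer_id, status, total);
-- Can use the index: the filter starts at the leading column.
SELECT total FROM demo_orders WHERE customer_id = 42 AND status = 'open';
-- Probably cannot seek the index: the leading column customer_id is not
-- referenced, so the B-Tree sort order does not help this predicate.
SELECT total FROM demo_orders WHERE status = 'open';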
So, an example covering index you could try adding in your scenario would be on the table catalog_product_entity_datetime, over the fields (entity_id, attribute_id, store_id, value_id). This index definition includes all three fields from your JOIN conditions, plus value_id, which you ultimately reference in the SELECT list.
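A possible statement for that index follows; the index name here is just an assumption, and since your SELECT expressions also read the value column, appending it as well would let the index fully cover this query:
create index CATALOG_PRODUCT_ENTITY_DATETIME_COVERING
    on catalog_product_entity_datetime (entity_id, attribute_id, store_id, value_id, value);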
Another word of advice: avoid using SELECT * in general, as it's bad practice and can prevent covering indexes from being applicable to your queries. It also often returns an unnecessary amount of data when only a subset of the fields is really needed. In your example you're doing SELECT e.*; do you really need every column of catalog_product_entity returned by this query, or can you explicitly list only the subset of columns needed? (In addition to the performance drawbacks, it can also cause code-maintenance issues down the road, should your schema change without the consumer of your query being updated accordingly.)
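For illustration, a trimmed SELECT list might look like the sketch below; the entity columns chosen here are only an assumption about what your code actually consumes, and the rest of the statement stays exactly as in your original query:
SELECT e.entity_id,
       e.sku,
       e.type_id,
       IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) AS news_from_date,
       IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) AS news_to_date
FROM catalog_product_entity AS e
-- ... the remaining JOIN and WHERE clauses are unchanged from the original query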
Additionally, your query has a number of ORs in its predicates, specifically in the WHERE clause. That sometimes makes it difficult for the database engine to come up with the most efficient plan and can cause what would have been an appropriate index to go unused. Oftentimes you can rewrite such a query by replacing the OR with a UNION, selecting from the query a second time and applying the other side of the OR in that second branch, as sketched below.
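A minimal sketch of the UNION pattern, reduced to only the news_from_date condition for brevity; a full rewrite would have to carry all of your joins and the news_to_date logic in both branches, so treat this as a pattern rather than a drop-in replacement:
SELECT e.*
FROM catalog_product_entity AS e
INNER JOIN catalog_product_website AS product_website
        ON product_website.product_id = e.entity_id AND product_website.website_id = 3
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date_default
       ON at_news_from_date_default.entity_id = e.entity_id
      AND at_news_from_date_default.attribute_id = 81
      AND at_news_from_date_default.store_id = 0
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date
       ON at_news_from_date.entity_id = e.entity_id
      AND at_news_from_date.attribute_id = 81
      AND at_news_from_date.store_id = 9
WHERE IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) <= '2021-10-06 23:59:59'
UNION
-- The second branch handles the IS NULL side of the original OR. The two
-- conditions are mutually exclusive, so UNION ALL would also be safe here.
SELECT e.*
FROM catalog_product_entity AS e
INNER JOIN catalog_product_website AS product_website
        ON product_website.product_id = e.entity_id AND product_website.website_id = 3
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date_default
       ON at_news_from_date_default.entity_id = e.entity_id
      AND at_news_from_date_default.attribute_id = 81
      AND at_news_from_date_default.store_id = 0
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date
       ON at_news_from_date.entity_id = e.entity_id
      AND at_news_from_date.attribute_id = 81
      AND at_news_from_date.store_id = 9
WHERE IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS NULL;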
Finally, I'll just mention that you're also using the IF function in your WHERE clause. I'm not familiar enough with MariaDB to say whether that function specifically causes performance issues, but as a general heads-up, functions in predicates can sometimes hurt performance because the engine can't evaluate them in a way that lets it apply what would otherwise have been an appropriate index. So rewriting the query so that no function is applied in the predicate is sometimes another way to improve performance; a partial sketch follows.
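As a hedged illustration only: if you filter on the bare value column of the store-scoped row, the predicate becomes sargable, i.e. an index that includes value can be used to satisfy it. This deliberately ignores the store-versus-default fallback logic, which would still need to be layered back on (for example via the UNION approach above):
-- Partial sketch: store-scoped news_from_date filter without a function in the predicate.
SELECT e.entity_id
FROM catalog_product_entity AS e
INNER JOIN catalog_product_entity_datetime AS at_news_from_date
        ON at_news_from_date.entity_id = e.entity_id
       AND at_news_from_date.attribute_id = 81
       AND at_news_from_date.store_id = 9
WHERE at_news_from_date.value <= '2021-10-06 23:59:59';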
Long story short: with the advice above, you can try a multitude of different indexes and query rewrites to find the most efficient outcome. It will require a lot of testing, which is why I've given general guidance on things you can do and what to look out for. Best of luck!
Thank you for the detailed response. Although in my case it was, as noted in the comments, the conversion between int and string, this is generally very useful information. – gabtzi, Oct 6, 2021 at 16:22
One possible issue is the implicit conversion between int and string in conditions like (at_news_from_date.attribute_id = '81'), where a quoted string literal is compared against an integer column. Another might be that there are redundant indexes: CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID and CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID are both already covered by the indexes backing the constraints on the same columns (a foreign key creates an index if one does not exist, and then a second one is created manually).
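If you want to act on this comment, a hedged sketch of both checks follows; verify against your own schema first, since constraint and index names may differ:
-- 1. Compare the attribute/store IDs as numeric literals so no implicit
--    int/string conversion is involved, e.g.:
--    ... AND at_news_from_date.attribute_id = 81 AND at_news_from_date.store_id = 9
-- 2. Inspect the existing indexes before dropping anything:
SHOW INDEX FROM catalog_product_entity_datetime;
-- and only if an index really is confirmed redundant:
-- DROP INDEX CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID ON catalog_product_entity_datetime;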