We're having a problem with a query that consistently executes very slowly:
SELECT `e`.*,
IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) AS `news_from_date`,
IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) AS `news_to_date`
FROM `catalog_product_entity` AS `e`
INNER JOIN `catalog_product_website` AS `product_website`
ON product_website.product_id = e.entity_id AND product_website.website_id = 3
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_from_date_default`
ON (`at_news_from_date_default`.`entity_id` = `e`.`entity_id`) AND
(`at_news_from_date_default`.`attribute_id` = '81') AND `at_news_from_date_default`.`store_id` = 0
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_from_date`
ON (`at_news_from_date`.`entity_id` = `e`.`entity_id`) AND
(`at_news_from_date`.`attribute_id` = '81') AND (`at_news_from_date`.`store_id` = 9)
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_to_date_default`
ON (`at_news_to_date_default`.`entity_id` = `e`.`entity_id`) AND
(`at_news_to_date_default`.`attribute_id` = '82') AND `at_news_to_date_default`.`store_id` = 0
LEFT JOIN `catalog_product_entity_datetime` AS `at_news_to_date`
ON (`at_news_to_date`.`entity_id` = `e`.`entity_id`) AND
(`at_news_to_date`.`attribute_id` = '82') AND (`at_news_to_date`.`store_id` = 9)
WHERE (((((IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) <=
'2021-10-06 23:59:59') OR
(IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS null)))))
AND (((((IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) >=
'2021-10-06 00:00:00') OR
(IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) IS null)))))
AND ((IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS not null) OR
(IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) IS not null))
The query provides the list of new products, but it takes minutes to execute. I'm not sure how to optimize it, since all of the conditions are needed to actually determine the new products.
I'm also adding some profiling results and the EXPLAIN output (not reproduced here).
Is it possible to somehow tweak the database to improve the execution time of this query?
Here's the structure of the tables used:
-- auto-generated definition
create table catalog_product_entity
(
entity_id int unsigned auto_increment comment 'Entity ID'
primary key,
attribute_set_id smallint unsigned default 0 not null comment 'Attribute Set ID',
type_id varchar(32) default 'simple' not null comment 'Type ID',
sku varchar(64) null comment 'SKU',
has_options smallint default 0 not null comment 'Has Options',
required_options smallint unsigned default 0 not null comment 'Required Options',
created_at timestamp default current_timestamp() not null comment 'Creation Time',
updated_at timestamp default current_timestamp() not null on update current_timestamp() comment 'Update Time',
mv_product_id int unsigned null comment 'Megaventory Id',
mageworx_is_require smallint default 0 not null comment 'MageWorx Is Required',
is_replace_product_sku tinyint(1) default 0 not null comment 'Is Replace Product SKU',
created_by int null comment 'Created by Mass Product Import profiles id',
updated_by int null comment 'Created by Mass Product Import profiles id'
)
comment 'Catalog Product Table';
create index CATALOG_PRODUCT_ENTITY_ATTRIBUTE_SET_ID
on catalog_product_entity (attribute_set_id);
create index CATALOG_PRODUCT_ENTITY_SKU
on catalog_product_entity (sku);
-- auto-generated definition
create table catalog_product_website
(
product_id int unsigned not null comment 'Product ID',
website_id smallint unsigned not null comment 'Website ID',
primary key (product_id, website_id),
constraint CATALOG_PRODUCT_WEBSITE_WEBSITE_ID_STORE_WEBSITE_WEBSITE_ID
foreign key (website_id) references store_website (website_id)
on delete cascade,
constraint CAT_PRD_WS_PRD_ID_CAT_PRD_ENTT_ENTT_ID
foreign key (product_id) references catalog_product_entity (entity_id)
on delete cascade
)
comment 'Catalog Product To Website Linkage Table';
create index CATALOG_PRODUCT_WEBSITE_WEBSITE_ID
on catalog_product_website (website_id);
-- auto-generated definition
create table catalog_product_entity_datetime
(
value_id int auto_increment comment 'Value ID'
primary key,
attribute_id smallint unsigned default 0 not null comment 'Attribute ID',
store_id smallint unsigned default 0 not null comment 'Store ID',
entity_id int unsigned default 0 not null comment 'Entity ID',
value datetime null comment 'Value',
constraint CATALOG_PRODUCT_ENTITY_DATETIME_ENTITY_ID_ATTRIBUTE_ID_STORE_ID
unique (entity_id, attribute_id, store_id),
constraint CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID_STORE_STORE_ID
foreign key (store_id) references store (store_id)
on delete cascade,
constraint CAT_PRD_ENTT_DTIME_ATTR_ID_EAV_ATTR_ATTR_ID
foreign key (attribute_id) references eav_attribute (attribute_id)
on delete cascade,
constraint CAT_PRD_ENTT_DTIME_ENTT_ID_CAT_PRD_ENTT_ENTT_ID
foreign key (entity_id) references catalog_product_entity (entity_id)
on delete cascade
)
comment 'Catalog Product Datetime Attribute Backend Table';
create index CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID
on catalog_product_entity_datetime (attribute_id);
create index CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID
on catalog_product_entity_datetime (store_id);
Any ideas on how to speed it up are much appreciated.
1 Answer
There's a lot going on in your query, and potentially a lot of room for improvement. As a general answer and set of advice: yes, creating proper indexes on the tables involved should help improve the performance of your query. Proper indexes are ones that cover all of the fields used in the query. Sometimes this is feasible, other times not.
Generally speaking, to create a proper covering index, you should list the predicate fields (those used in your JOIN, WHERE, or HAVING clauses) first in the index definition, followed by the fields you're referencing in the SELECT list. The order in which you list the fields matters, because it is the order in which the underlying B-Tree structure sorts the data in that index. If the first field in an index definition is one your query doesn't actually use, that index probably won't be applicable to (or used by) your query, even if all of the remaining fields in the definition are used.
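As a small, purely hypothetical illustration of that ordering rule (the table, columns, and index name below are made up for demonstration and are not part of your schema):
-- Hypothetical table and index, for illustration only.
create table demo_orders
(
    customer_id int not null,
    status      varchar(16) not null,
    total       decimal(10, 2) not null
);
create index IDX_DEMO_CUSTOMER_STATUS_TOTAL
    on demo_orders (customer_id, status, total);
-- Can use the index: the filter starts at the leading column.
SELECT total FROM demo_orders WHERE customer_id = 42 AND status = 'open';
-- Probably cannot seek the index: the leading column customer_id is not
-- referenced, so the B-Tree sort order does not help this predicate.
SELECT total FROM demo_orders WHERE status = 'open';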
So, an example covering index you could try adding in your scenario would be on the table catalog_product_entity_datetime, over the fields (entity_id, attribute_id, store_id, value_id). This index definition includes all three fields from your JOIN conditions, plus value_id, which you ultimately reference in the SELECT list.
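A possible statement for that index follows; the index name here is just an assumption, and since your SELECT expressions also read the value column, appending it as well would let the index fully cover this query:
create index CATALOG_PRODUCT_ENTITY_DATETIME_COVERING
    on catalog_product_entity_datetime (entity_id, attribute_id, store_id, value_id, value);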
Another word of advice: avoid using SELECT * in general, as it's bad practice and can prevent covering indexes from being applicable to your queries. It also often returns an unnecessary amount of data when only a subset of the fields is really needed. In your example you're doing SELECT e.*; do you really need every column of catalog_product_entity returned by this query, or can you explicitly list only the subset of columns needed? (In addition to the performance drawbacks, it can also cause code-maintenance issues down the road, should your schema change without the consumer of your query being updated accordingly.)
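For illustration, a trimmed SELECT list might look like the sketch below; the entity columns chosen here are only an assumption about what your code actually consumes, and the rest of the statement stays exactly as in your original query:
SELECT e.entity_id,
       e.sku,
       e.type_id,
       IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) AS news_from_date,
       IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) AS news_to_date
FROM catalog_product_entity AS e
-- ... the remaining JOIN and WHERE clauses are unchanged from the original query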
Additionally, your query has a number of ORs in its predicates, specifically in the WHERE clause. That sometimes makes it difficult for the database engine to come up with the most efficient plan and can cause what would have been an appropriate index to go unused. Oftentimes you can rewrite such a query by replacing the OR with a UNION, selecting from the query a second time and applying the other side of the OR in that second branch, as sketched below.
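A minimal sketch of the UNION pattern, reduced to only the news_from_date condition for brevity; a full rewrite would have to carry all of your joins and the news_to_date logic in both branches, so treat this as a pattern rather than a drop-in replacement:
SELECT e.*
FROM catalog_product_entity AS e
INNER JOIN catalog_product_website AS product_website
        ON product_website.product_id = e.entity_id AND product_website.website_id = 3
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date_default
       ON at_news_from_date_default.entity_id = e.entity_id
      AND at_news_from_date_default.attribute_id = 81
      AND at_news_from_date_default.store_id = 0
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date
       ON at_news_from_date.entity_id = e.entity_id
      AND at_news_from_date.attribute_id = 81
      AND at_news_from_date.store_id = 9
WHERE IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) <= '2021-10-06 23:59:59'
UNION
-- The second branch handles the IS NULL side of the original OR. The two
-- conditions are mutually exclusive, so UNION ALL would also be safe here.
SELECT e.*
FROM catalog_product_entity AS e
INNER JOIN catalog_product_website AS product_website
        ON product_website.product_id = e.entity_id AND product_website.website_id = 3
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date_default
       ON at_news_from_date_default.entity_id = e.entity_id
      AND at_news_from_date_default.attribute_id = 81
      AND at_news_from_date_default.store_id = 0
LEFT JOIN catalog_product_entity_datetime AS at_news_from_date
       ON at_news_from_date.entity_id = e.entity_id
      AND at_news_from_date.attribute_id = 81
      AND at_news_from_date.store_id = 9
WHERE IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS NULL;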
Finally, I'll just mention that you're also using the IF function in your WHERE clause. I'm not familiar enough with MariaDB to say whether that function specifically causes performance issues, but as a general heads-up, functions in predicates can sometimes hurt performance because the engine can't evaluate them in a way that lets it apply what would otherwise have been an appropriate index. So rewriting the query so that no function is applied in the predicate is sometimes another way to improve performance; a partial sketch follows.
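As a hedged illustration only: if you filter on the bare value column of the store-scoped row, the predicate becomes sargable, i.e. an index that includes value can be used to satisfy it. This deliberately ignores the store-versus-default fallback logic, which would still need to be layered back on (for example via the UNION approach above):
-- Partial sketch: store-scoped news_from_date filter without a function in the predicate.
SELECT e.entity_id
FROM catalog_product_entity AS e
INNER JOIN catalog_product_entity_datetime AS at_news_from_date
        ON at_news_from_date.entity_id = e.entity_id
       AND at_news_from_date.attribute_id = 81
       AND at_news_from_date.store_id = 9
WHERE at_news_from_date.value <= '2021-10-06 23:59:59';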
Long story short: with the advice above, you can try a multitude of different indexes and query rewrites to find the most efficient outcome. It will require a lot of testing, which is why I've given general guidance on things you can do and what to look out for. Best of luck!
Thank you for the detailed response. Although in my case it was, as noted in the comments, the conversion between int and string, this is generally very useful information. – gabtzi, Oct 6, 2021 at 16:22
One possible issue is the implicit conversion between int and string in conditions like (at_news_from_date.attribute_id = '81'), where a quoted string literal is compared against an integer column. Another might be that there are redundant indexes: CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID and CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID are both already covered by the indexes backing the constraints on the same columns (a foreign key creates an index if one does not exist, and then a second one is created manually).
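If you want to act on this comment, a hedged sketch of both checks follows; verify against your own schema first, since constraint and index names may differ:
-- 1. Compare the attribute/store IDs as numeric literals so no implicit
--    int/string conversion is involved, e.g.:
--    ... AND at_news_from_date.attribute_id = 81 AND at_news_from_date.store_id = 9
-- 2. Inspect the existing indexes before dropping anything:
SHOW INDEX FROM catalog_product_entity_datetime;
-- and only if an index really is confirmed redundant:
-- DROP INDEX CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID ON catalog_product_entity_datetime;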