0

I have a single reporting table of sales data with about 4 million rows of data:

CREATE TABLE reporting_sales (
 customer_id bigint(20) DEFAULT NULL,
 effective_date date DEFAULT NULL,
 expiration_date date DEFAULT NULL,
 license_type_id int(11) DEFAULT NULL,
 residency varchar(10) DEFAULT NULL,
 gender varchar(10) DEFAULT NULL,
 age_range varchar(10) DEFAULT NULL,
 KEY ndx_reporting_sales (license_type_id,
 effective_date,
 expiration_date,
 customer_id,
 residency,
 gender,
 age_range) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
;

And this is the statement I want to run to summarize the data as of a particular day:

SELECT COUNT(DISTINCT customer_id),
 license_type_id,
 residency,
 gender,
 age_range
FROM tmp_reporting_sales_fl
WHERE license_type_id in (1, 2, 3, 4, 5)
 AND effective_date <= '2021-01-01'
 AND expiration_date >= '2021-01-01'
GROUP BY license_type_id, residency, gender, age_range

I'm not sure how the index should be structured, specifically with respect to the customer_id field and the grouping.

Here's the explain for the index I have created, as shown above:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE reporting_sales range ndx_reporting_sales ndx_reporting_sales 4 1829784 16.66 Using where; Using index; Using filesort

How can I improve the performance of this statement and/or what would be a more suitable index?

asked Jan 12, 2023 at 6:43
2
  • 1
    Use the index by either (license_type_id, effective_date) or (license_type_id, expiration_date) (use those index which shows best selectivity, I predict that this will be the latter one). Anycase the rest of your index cannot work for shown query. Commented Jan 12, 2023 at 7:08
  • What's the PRIMARY KEY? Commented Jan 12, 2023 at 20:32

1 Answer 1

1

A general rule for multi-column indexes is that you can have N leading columns in the index that are used in equality conditions.

Then you can have one more column in the index after those equality columns, to use for either inequality/range conditions, or grouping, or sorting. But not more than one.

Any further columns are not used for searching, sorting, or grouping. At best, they're used for a covering index.

In the query you show, you have three range conditions. Only one of these conditions can make use of the index.

You can tell from the EXPLAIN report's len column that it is only using 4 bytes of your index. That's for the first column license_type_id, which is a 4-byte integer. The other columns of the index are ignored for this query. They don't help narrow down the examined rows, nor do they help the group by.

In the query you show, that's the best you can do.


Possible exception to the above rule: MySQL 8.0.13 implemented the skip scan range access method, which might help in some cases, but there are a lot of limitations. Read the section https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#range-access-skip-scan for details.

answered Jan 12, 2023 at 14:35
4
  • 1
    Thank you @bill-karwin, for the explanation. Going to check out your book. So, I should just build the index using whichever of the three range conditions is most selective, for instance, (expiration_date)? Will it accomplish anything to add one of the grouping columns, say (expiration_date, age_range)? Commented Jan 12, 2023 at 15:26
  • Thanks for checking out my book (be sure to get the 2022 revision)! Commented Jan 12, 2023 at 15:29
  • Yes, pick the column that is most selective, in other words reduces the examined rows the the smallest number. Note this might change over time. If your query has a range condition, then no further columns of the index can optimize group by or order by. Commented Jan 12, 2023 at 15:30
  • @BillKarwin - All the columns seem to be NULLable. So, a key_len of 4 would imply a column of length 3, such as a DATE. Commented Jan 12, 2023 at 20:34

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.