0

In the query

select pricing_date,count(*) 
from prices 
where pricing_date>='2022-12-01' 
group by pricing_date;

I receive results with multiple pricing_date groups of the same date:

+--------------+----------+
| pricing_date | count(*) |
|--------------+----------|
| 2022年12月01日 | 40713 |
| 2022年12月05日 | 1362 |
| 2022年12月06日 | 448 |
| 2022年12月01日 | 112 |
| 2022年12月05日 | 790 |
| 2022年12月06日 | 1007 |
| 2022年12月05日 | 905 |
| 2022年12月06日 | 225 |

etc., for 400 rows. Here's the result of explain:

explain format=json select pricing_date,count(*) from prices where pricing_date>='2022-12-01' group by pricing_date;
{
 "query_block": {
 "select_id": 1,
 "cost_info": {
 "query_cost": "397070.02"
 },
 "grouping_operation": {
 "using_filesort": false,
 "table": {
 "table_name": "prices",
 "access_type": "ALL",
 "possible_keys": [
 "pd_id",
 "id_pd"
 ],
 "rows_examined_per_scan": 520926311,
 "rows_produced_per_join": 1914444,
 "filtered": "0.37",
 "Using": "parallel query (1 columns, 1 filters, 0 exprs; 0 extra)",
 "cost_info": {
 "read_cost": "205625.62",
 "eval_cost": "191444.40",
 "prefix_cost": "397070.02",
 "data_read_per_join": "759M"
 },
 "used_columns": [
 "pricing_date"
 ],
 "attached_condition": "(`prices`.`pricing_date` >= DATE'2022-12-01')"
 }
 }
 }
}

However, if I change the date slightly the 2022年12月05日 and 2022年12月06日 dates appear to group as expected:

select pricing_date,count(*) 
from prices 
where pricing_date>='2022-12-05' 
group by pricing_date;
+--------------+----------+
| pricing_date | count(*) |
|--------------+----------|
| 2022年12月05日 | 149752 |
| 2022年12月06日 | 223728 |
+--------------+----------+

here's the result of explain:

explain select pricing_date,count(*) from prices where pricing_date>='2022-12-04' group by pricing_date;
+----+-------------+-----------------------+------------+-------+-----------------+---------+---------+--------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----+-------------+-----------------------+------------+-------+-----------------+---------+---------+--------+--------+----------+--------------------------|
| 1 | SIMPLE | prices | <null> | range | pd_id,id_pd | pd_id | 4 | <null> | 976306 | 100.0 | Using where; Using index |
+----+-------------+-----------------------+------------+-------+-----------------+---------+---------+--------+--------+----------+--------------------------+

Why doesn't the first query accurately group all the dates with the same date into the same group? I see that the second query uses a key as expected, but it doesn't seem like group by should fail on non-indexed results.

Here's the table schema:

+-----------+--------------------------------------------------------------------+
| Table | Create Table |
|-----------+--------------------------------------------------------------------|
| prices | CREATE TABLE `prices` ( |
| | `pricing_date` date DEFAULT NULL, |
| | `id` varchar(12) DEFAULT NULL, |
| | UNIQUE KEY `pd_id` (`pricing_date`,`id`), |
| | KEY `id_pd` (`id`,`pricing_date`) |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-----------+--------------------------------------------------------------------+

I'm using mysql 8.0.23 in AWS Aurora serverless v2 if that's relevant.

asked Dec 7, 2022 at 21:04
9
  • What if you execute this select date(pricing_date) as new_pricing_date, count(*) from prices where date(pricing_date)>='2022年12月01日' group by new_pricing_date; , does it change the results ? Commented Dec 7, 2022 at 21:14
  • Looks like a bug. Commented Dec 7, 2022 at 22:51
  • @ErgestBasha that yields the correct results, but takes 200x longer than the same results using an index select pricing_date,count(*) from prices force index (pd_id) where pricing_date>='2022年12月01日' group by pricing_date; (2.3 seconds vs 493 seconds) Commented Dec 7, 2022 at 23:29
  • 1
    @enharmonic it seems that something is messing up the dates as the group by is working as expected when using the date function. I know the execution time would be higher because the date function doesn't use the index. Maybe you should file a bug on the Aurora side. Commented Dec 7, 2022 at 23:56
  • 1
    If Aurora has a way to turn off "parallel query", I recommend doing that. Commented Dec 9, 2022 at 0:02

1 Answer 1

1

As suggested by Rick James, turning off Aurora's parallel query seems to solve the issue and give the expected results.

answered Dec 9, 2022 at 15:44

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.