Can Oracle be smart about aggregate functions, such as MIN()
, MAX()
, and (AVG)
? My testing shows that it seems to be surprisingly stupid.
I have the following query:
SELECT COUNT(userId), AVG(age),
STDDEV(age), MIN(age), MAX(age),
date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010')) AND (date_range_end <= TO_DATE('30-Nov-2011'))
GROUP BY date_range_start, date_range_end;
It takes 27 seconds.
Now I remove the STDDEV
, MIN
and MAX
aggregations the same query takes only 12 seconds.
OK, I can see STDDEV
slowing things down as it requires 2 passes.
So I try AVG
+ MIN
and MAX
-- I get 21s.
How is this even possible? How can adding calculation of min and max to the calculation of AVG slow things down by the factor of 2 almost? Considering that out of the 12 seconds that it takes with AVG only 10 are spent on the full table scan? So adding min/max calculation changes the group by step from 2 seconds to 10?
The explain plan:
-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 12 |00:00:24.61 | 369K| 369K| | | |
| 1 | HASH GROUP BY | | 1 | 12 | 12 |00:00:24.61 | 369K| 369K| 762K| 762K| 11M (0)|
|* 2 | TABLE ACCESS FULL| USER | 1 | 29M| 29M|00:00:09.34 | 369K| 369K| | | |
-------------------------------------------------------------------------------------------------------------------------------------
2 Answers 2
Use Oracle analytics:
SELECT distinct
COUNT(userId) over (partition by date_range_start, date_range_end)
, AVG(age) over (partition by date_range_start, date_range_end)
, STDDEV(age) over (partition by date_range_start, date_range_end)
, MIN(age) over (partition by date_range_start, date_range_end)
, MAX(age) over (partition by date_range_start, date_range_end)
, date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010')) AND (date_range_end <= TO_DATE('30-Nov-2011'))
/
It does the same but most of the times it is surprizingly faster.
If userId
cannot be NULL, change COUNT(userId)
to COUNT(*)
.
If date_range_start <= date_range_end
is true for all rows, then you could slightly change the query to:
SELECT COUNT(*), AVG(age),
STDDEV(age), MIN(age), MAX(age),
date_range_start, date_range_end
FROM users
WHERE (date_range_start BETWEEN TO_DATE('01-Dec-2010') AND TO_DATE('30-Nov-2011'))
AND (date_range_end <= TO_DATE('30-Nov-2011'))
GROUP BY date_range_start, date_range_end ;
If you have an index on date_range_start
, the change will help narrowing the search.
I would also think that an index on (date_range_start, date_range_end, age)
would be most helpful for this query but Oracle behaves so much different than MySQL that I may be totally wrong.
date_range_start <= date_range_end
?