Oracle aggregate function performance

Question 1

Can Oracle be smart about aggregate functions such as MIN(), MAX(), and AVG()? My testing suggests that it is surprisingly stupid.

I have the following query:

SELECT COUNT(userId), AVG(age), 
 STDDEV(age), MIN(age), MAX(age), 
 date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010')) AND (date_range_end <= TO_DATE('30-Nov-2011')) 
GROUP BY date_range_start, date_range_end;

It takes 27 seconds. When I remove the STDDEV, MIN, and MAX aggregations, the same query takes only 12 seconds.
OK, I can see STDDEV slowing things down, as it requires 2 passes. So I try AVG + MIN and MAX, and I get 21s.

How is this even possible? How can adding MIN and MAX to the AVG calculation slow things down by almost a factor of 2, considering that 10 of the 12 seconds the AVG-only query takes are spent on the full table scan? So adding the MIN/MAX calculation changes the GROUP BY step from 2 seconds to 10?

The explain plan:

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | A-Time      | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |     12 | 00:00:24.61 |    369K |  369K |      |      |          |
|   1 |  HASH GROUP BY      |      |      1 |     12 |     12 | 00:00:24.61 |    369K |  369K | 762K | 762K |  11M (0) |
|*  2 |   TABLE ACCESS FULL | USER |      1 |    29M |    29M | 00:00:09.34 |    369K |  369K |      |      |          |
------------------------------------------------------------------------------------------------------------------------
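
For anyone who wants to reproduce a plan with the Starts / A-Rows / A-Time columns shown above, here is a minimal sketch. It assumes SQL*Plus and the `users` table from the question. The explicit `'DD-Mon-YYYY'` format mask is an addition for safety (the original query relies on the session's NLS settings), and this is not necessarily how the plan above was captured.

```sql
-- SQL*Plus only: with SERVEROUTPUT on, the "last statement" seen by
-- DISPLAY_CURSOR can be an internal DBMS_OUTPUT call instead of the query.
SET SERVEROUTPUT OFF

-- The hint asks Oracle to collect row-source statistics for this execution.
SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(userId), AVG(age), MIN(age), MAX(age),
       date_range_start, date_range_end
FROM   users
WHERE  date_range_start >= TO_DATE('01-Dec-2010', 'DD-Mon-YYYY')
AND    date_range_end   <= TO_DATE('30-Nov-2011', 'DD-Mon-YYYY')
GROUP  BY date_range_start, date_range_end;

-- NULL, NULL means "the last statement executed in this session";
-- 'ALLSTATS LAST' prints estimated and actual figures for that execution.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Running this once per combination of aggregates (AVG only, AVG + MIN + MAX, all four) and comparing the A-Time of the HASH GROUP BY line should show where the extra seconds go.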

Question 2

Do you have a tkprof/xplan showing this behaviour?

Question 3

@NiallLitchfield I have the xplan (updated), I can't get tkprof at the moment.

Question 4

How many rows does the query return?

Question 5

@ypercube The table above contains two relevant columns, E-Rows and A-Rows. E-Rows stands for "Estimated Rows" and A-Rows stands for "Actual Rows". So the query returns 12 rows.

Question 6

Is this always true: date_range_start <= date_range_end?

user953 · Answer 1 · 2012-04-26 13:56:39Z

Use Oracle analytics:

SELECT distinct
 COUNT(userId) over (partition by date_range_start, date_range_end)
, AVG(age) over (partition by date_range_start, date_range_end)
, STDDEV(age) over (partition by date_range_start, date_range_end)
, MIN(age) over (partition by date_range_start, date_range_end)
, MAX(age) over (partition by date_range_start, date_range_end)
, date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010')) AND (date_range_end <= TO_DATE('30-Nov-2011')) 
/

It does the same thing, but most of the time it is surprisingly faster.

ypercubeTM · Answer 2 · 2012-02-17 01:21:50Z

If userId cannot be NULL, change COUNT(userId) to COUNT(*).

If date_range_start <= date_range_end is true for all rows, then you could slightly change the query to:

SELECT COUNT(*), AVG(age), 
 STDDEV(age), MIN(age), MAX(age), 
 date_range_start, date_range_end
FROM users
WHERE (date_range_start BETWEEN TO_DATE('01-Dec-2010') AND TO_DATE('30-Nov-2011')) 
 AND (date_range_end <= TO_DATE('30-Nov-2011')) 
GROUP BY date_range_start, date_range_end ;
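
Both preconditions can be checked before rewriting the query. This is a minimal sketch, assuming the table and column names from the question; the column aliases are made up for illustration. Each query should return 0 if the corresponding assumption holds.

```sql
-- Number of rows where userId is NULL.
-- COUNT(*) and COUNT(userId) are guaranteed to agree only when this is 0.
SELECT COUNT(*) - COUNT(userId) AS null_user_ids
FROM   users;

-- Number of rows that violate date_range_start <= date_range_end.
-- Rows with a NULL in either column are not counted here, but the
-- WHERE clause of the main query filters those rows out anyway.
SELECT COUNT(*) AS inverted_ranges
FROM   users
WHERE  date_range_start > date_range_end;
```

Checking the whole table is stricter than necessary, since only rows inside the date range matter, but it is the simplest sufficient test.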

If you have an index on date_range_start, the change will help narrow the search.

I would also think that an index on (date_range_start, date_range_end, age) would be most helpful for this query, but Oracle behaves so differently from MySQL that I may be totally wrong.
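
A sketch of that index, in case someone wants to test the idea. The index name is hypothetical, and whether the optimizer actually picks the index depends on statistics and on how many rows fall inside the date range.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
CREATE INDEX users_range_age_ix
    ON users (date_range_start, date_range_end, age);
```

With the COUNT(*) version of the query, every referenced column is in this index, so Oracle could in principle answer the query from the index alone, without touching the table. With COUNT(userId) that is not possible unless userId is added to the index as well.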
