I have a large table with three columns like this:
"START_DATE" DATE,
"START_VALUE" NUMBER(10,7),
"START_DATE_VALUE" NUMBER(18,7)
GENERATED ALWAYS AS
(
(extract(YEAR FROM START_DATE) * 10000 +
extract(MONTH FROM START_DATE)*100 +
extract(DAY FROM START_DATE))*power(10,3) +
(START_VALUE+180)
) VIRTUAL
The START_DATE_VALUE column is a virtual column that is used for partitioning. However, when I have a query like this:
select *
from mytable
where
start_date > to_date('02-01-2012', 'MM-DD-YYYY')
and start_value > 120.23452
It scans all partitions for the result. How can I make Oracle use the virtual column and then pick just the right partition to work on it?
Sorry my table definition is very large, I can't copy it in here.
2 Answers
I think you have a problem with your virtual column definition. For your special values 2012-02-01 (in YYYY-MM-DD format) and 120.23452 the value of the virtual column is
2012*10000+2*100+1*1000+180+120.23452 = 20120000+200+1000+300.23452 = 20121500.23452
and not
20120201300.23452
as you expected.
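Because the result hinges on how the parentheses group the date terms before the multiplication by power(10,3), it may be worth evaluating the defining expression directly rather than by hand. A minimal sketch against DUAL, using the sample values from the question (the inline view and its aliases d and v are my own, purely for illustration; paste the expression exactly as it appears in your DDL):

```sql
-- Evaluate the virtual column's defining expression for one sample row.
-- d and v are illustrative aliases standing in for START_DATE and START_VALUE.
select (extract(year from d) * 10000 +
        extract(month from d) * 100 +
        extract(day from d)) * power(10,3) +
       (v + 180) as start_date_value
from (select to_date('02-01-2012', 'MM-DD-YYYY') as d,
             120.23452 as v
      from dual);
```

Moving or dropping a single parenthesis in the select list and re-running the query shows quickly which grouping your table actually uses.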
Also check if your virtual column is the column your table is partitioned by.
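One way to check this, assuming the table lives in your own schema, is to look it up in the data dictionary. The sketch below uses the USER_PART_KEY_COLUMNS view; MYTABLE is just the placeholder name from the question:

```sql
-- List the partitioning key column(s) of the table.
-- 'MYTABLE' is the placeholder table name from the question.
select name, column_name, column_position
from user_part_key_columns
where name = 'MYTABLE'
  and object_type = 'TABLE'
order by column_position;
```

If START_DATE_VALUE does not show up here, no rewrite of the predicate will lead to pruning on that column.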
From the VLDB and Partitioning Guide:
Virtual column-based partitioned tables benefit from partition pruning for statements that use the virtual column-defining expression in the SQL statement.
So I think your select statement
select * from mytable where start_date > todate('02-01-2012', 'MM-DD-YYYY') and start_value > 120.23452
should be something like
select * from mytable where (
(extract(YEAR FROM START_DATE) * 10000 +
extract(MONTH FROM START_DATE)*100 +
extract(DAY FROM START_DATE))*power(10,3) +
(START_VALUE+180)
) > 20121500.23452
for partition pruning to take place.
The todate function you use in your select statement does not exist in Oracle SQL. The name of the function is to_date.
-
Thank you for pointing out my mistake in calculation. In this test, I want to know if Oracle is smart enough to figure out that it needs to use the virtual column to go to a subset of partitions instead of going to every single partition. I have more than 3500 partitions and scanning all of them really hurts performance. Can we influence the query plan somehow to make it use the virtual column even though we don't specify it? – Sean Nguyen, Apr 9, 2012 at 10:10
-
No, I think it will not be "smart enough". I think it will work if you write the query the same way as I do in my example; that is how it is defined in the manual. – miracle173, Apr 9, 2012 at 11:27
"How can I make oracle to use the virtual column and then pick just the right partition to work on it."
You know there's a relationship between (START_DATE, START_VALUE) and START_DATE_VALUE but alas the optimizer doesn't. All it knows is that you are querying on two columns which don't feature in your partitioning strategy.
You haven't posted your partition key, so I've made a guess at what you're doing in this test case:
create table t23
(id number not null primary key
, start_date date not null
, start_value number not null
, start_date_value number GENERATED ALWAYS AS
(
(extract(YEAR FROM START_DATE) * 10000 +
extract(MONTH FROM START_DATE)*100 +
extract(DAY FROM START_DATE))*power(10,3) +
(START_VALUE+180)
) VIRTUAL
)
PARTITION BY range (start_date_value)
(
PARTITION range_10 values LESS THAN (1900000000) ,
PARTITION range_20 values LESS THAN (2000000000),
PARTITION range_30 values LESS THAN (2100000000),
PARTITION range_40 values LESS THAN (2200000000),
PARTITION range_50 values LESS THAN (11000000000),
PARTITION range_60 values LESS THAN (12000000000),
PARTITION range_70 values LESS THAN (13000000000),
PARTITION range_80 values LESS THAN (14000000000),
PARTITION range_90 values LESS THAN (15000000000),
PARTITION range_100 values LESS THAN (16000000000),
PARTITION range_110 values LESS THAN (17000000000),
PARTITION range_120 values LESS THAN (18000000000),
PARTITION range_130 values LESS THAN (19000000000),
PARTITION range_140 values LESS THAN (20000000000),
PARTITION range_150 values LESS THAN (21000000000),
PARTITION range_160 values LESS THAN (22000000000),
PARTITION range_mx values LESS THAN (maxvalue)
)
/
It's quite easy to reproduce the scenario you describe. All the rows for the query are in the partition range_80 but the explain plan shows a search of all partitions:
SQL> explain plan for
select * from t23
where start_date between to_date('31-dec-1390')
and to_date('01-jan-1393')
and start_value between 1050 and 4999
/
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 4042841927
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 250 (1)| 00:00:03 | | |
| 1 | PARTITION RANGE ALL| | 1 | 25 | 250 (1)| 00:00:03 | 1 | 17 |
|* 2 | TABLE ACCESS FULL | T23 | 1 | 25 | 250 (1)| 00:00:03 | 1 | 17 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("START_VALUE"<=4999 AND "START_DATE">=TO_DATE(' 1390-12-31 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "START_DATE"<=TO_DATE(' 1393-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "START_VALUE">=1050)
16 rows selected.
SQL>
So you have two options. The first option is to use START_DATE_VALUE in your query. Presumably there's a reason why you're not applying this obvious solution; probably because that column has no business meaning.
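For what it's worth, here is a sketch of that first option against my T23 test case. It keeps the original business predicates as filters and adds a redundant range on START_DATE_VALUE that every matching row must satisfy, which gives the optimizer something to prune on. The bounds are my own derivation from the defining expression (date digits times 1000, plus the value, plus 180), so treat them as an assumption to verify against your real definition:

```sql
-- Redundant predicate on the partition key, derived from the other two predicates.
-- Lower bound: earliest date digits * 1000 + smallest value + 180.
-- Upper bound: latest date digits * 1000 + largest value + 180.
select *
from t23
where start_date_value >= 13901231 * 1000 + 1050 + 180
  and start_date_value <= 13930101 * 1000 + 4999 + 180
  and start_date between to_date('31-12-1390', 'DD-MM-YYYY')
                     and to_date('01-01-1393', 'DD-MM-YYYY')
  and start_value between 1050 and 4999;
```

The extra range only needs to be a safe superset of the rows you want; the predicates on START_DATE and START_VALUE still do the exact filtering. Running explain plan on it should show whether Pstart and Pstop narrow down as hoped.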
The alternative is to change the partitioning strategy. This should be quite simple to achieve. Your START_DATE_VALUE column basically orders rows by START_VALUE within START_DATE. You can get the same effect with sub-partitioning.
So here is table T42. It has range partitions for START_DATE and range subpartitions for START_VALUE (the template clause applies the same subpartitions to each partition):
create table t42
(id number not null primary key
, start_date date not null
, start_value number not null
)
PARTITION BY range (start_date)
SUBPARTITION BY range (start_value)
SUBPARTITION TEMPLATE(
SUBPARTITION lowval VALUES LESS THAN (1000) ,
SUBPARTITION medval VALUES LESS THAN (5000) ,
SUBPARTITION highval VALUES LESS THAN (MAXVALUE)
)
(
PARTITION range_10 values LESS THAN (to_date('01-JAN-700')),
PARTITION range_20 values LESS THAN (to_date('01-JAN-800')),
PARTITION range_30 values LESS THAN (to_date('01-JAN-900')),
PARTITION range_40 values LESS THAN (to_date('01-JAN-1000')),
PARTITION range_50 values LESS THAN (to_date('01-JAN-1100')),
PARTITION range_60 values LESS THAN (to_date('01-JAN-1200')),
PARTITION range_70 values LESS THAN (to_date('01-JAN-1300')),
PARTITION range_80 values LESS THAN (to_date('01-JAN-1400')),
PARTITION range_90 values LESS THAN (to_date('01-JAN-1500')),
PARTITION range_100 values LESS THAN (to_date('01-JAN-1600')),
PARTITION range_110 values LESS THAN (to_date('01-JAN-1700')),
PARTITION range_120 values LESS THAN (to_date('01-JAN-1800')),
PARTITION range_130 values LESS THAN (to_date('01-JAN-1900')),
PARTITION range_140 values LESS THAN (to_date('01-JAN-2000')),
PARTITION range_150 values LESS THAN (to_date('01-JAN-2100')),
PARTITION range_160 values LESS THAN (to_date('01-JAN-2200')),
PARTITION range_mx values LESS THAN (maxvalue)
)
/
Just to prove there's nothing up my sleeve I will populate T42 with the exact same data from T23:
SQL> insert into t42
select id, start_date, start_value
from t23
/
232800 rows created.
SQL> commit;
Commit complete.
SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'T42')
PL/SQL procedure successfully completed.
SQL> explain plan for
select * from t42
where start_date between to_date('31-dec-1390')
and to_date('01-jan-1393')
and start_value between 1050 and 4999
/
Explained.
SQL>
As you can see explain plan now picks a single sub-partition within a single partition. Precision pruning!
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2460612001
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 27 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE | | 1 | 17 | 27 (0)| 00:00:01 | 8 | 8 |
| 2 | PARTITION RANGE SINGLE| | 1 | 17 | 27 (0)| 00:00:01 | 2 | 2 |
|* 3 | TABLE ACCESS FULL | T42 | 1 | 17 | 27 (0)| 00:00:01 | 23 | 23 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("START_VALUE"<=4999 AND "START_DATE">=TO_DATE(' 1390-12-31 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "START_DATE"<=TO_DATE(' 1393-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "START_VALUE">=1050)
17 rows selected.
SQL>
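If you want to map the Pstart/Pstop numbers in the plan back to segment names, the data dictionary can help. A small sketch, assuming T42 is in your own schema; it relies on the USER_TAB_SUBPARTITIONS view, and with a subpartition template I would expect the names to combine the partition name and the template name, but check the output rather than taking my word for it:

```sql
-- Show the subpartitions of one partition of T42, in position order.
-- RANGE_80 is the partition the plan above reports as partition 8.
select partition_name, subpartition_name, subpartition_position
from user_tab_subpartitions
where table_name = 'T42'
  and partition_name = 'RANGE_80'
order by subpartition_position;
```

The row with SUBPARTITION_POSITION = 2 is the one the second PARTITION RANGE SINGLE step refers to.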
"The reason I don't want to use subpartition is because of my text index."
I suppose the pertinent question is, do you need subpartitions? I mean, how big is your table (number of rows)? How many rows per day? Could you scrape by with only one partition per START_DATE day and forget about START_VALUE?
-
Thanks APC. The reason I don't want to use subpartition is because of my text index. I don't want to have a domain text index because I don't want to rebuild it when I split my max partition. I was trying to come up with an alternative by using this virtual column. – Sean Nguyen, Apr 9, 2012 at 21:50
start_date_value > 20120201300.23452? Does it only scan a single partition?