There is a longstanding(?) problem(?) with Oracle: if you run a SELECT DISTINCT on a column that has a B*-Tree index but is nullable, the index is not used. (As other answers suggest, this holds even if a check constraint was added after the fact, so that no nulls are present.)
There are various ways around this (including using a BITMAP index or adding a second NOT NULL or constant column to the index). However, I just noticed that if I run the SELECT DISTINCT with WHERE ... IS NOT NULL, Oracle is able to use the index (index fast full scan).
My question: since which Oracle version has this behavior been present, is it reliable (and when is the two-column index preferred), and why isn't it mentioned more often (for example, this otherwise good answer does not mention it)?
A small reproducer:
drop table SCOTT.T;
-- will not work with short rows (SELECT OWNER,SUBOBJECT_NAME ...)
create table SCOTT.T AS SELECT * FROM ALL_OBJECTS;
create index SCOTT.IDX_T_OWNER on SCOTT.T(OWNER); -- NOT NULL
-- alternative: (subobject_name,1) or (subobject_name,namespace), i.e. the nullable column plus a constant or NOT NULL column
create index SCOTT.IDX_T_SUBOBJ on SCOTT.T(subobject_name); -- NULL
exec dbms_stats.gather_table_stats(OWNNAME=>'SCOTT',TABNAME=>'T', cascade=>true );
desc SCOTT.T;
set autotrace on explain
-- index fast full scan:
select distinct OWNER from SCOTT.T;
-- full table scan:
select distinct SUBOBJECT_NAME from SCOTT.T;
-- index fast full scan:
select distinct SUBOBJECT_NAME from SCOTT.T where SUBOBJECT_NAME IS NOT NULL;
The plans (on 18c) look similar to:
select distinct subobject_name from T;
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   715 |  1430 |   436   (3)| 00:00:01 |
|   1 |  HASH UNIQUE       |      |   715 |  1430 |   436   (3)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T    | 81426 |   159K|   428   (1)| 00:00:01 |
---------------------------------------------------------------------------
select distinct subobject_name from T where SUBOBJECT_NAME is not null;
--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |   624 |  1248 |     5  (20)| 00:00:01 |
|   1 |  HASH UNIQUE          |              |   624 |  1248 |     5  (20)| 00:00:01 |
|*  2 |   INDEX FAST FULL SCAN| IDX_T_SUBOBJ |  1459 |  2918 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
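The asterisk next to Id 2 marks a step that carries a predicate; the autotrace output above does not show that section. A minimal sketch for displaying the full plan, including the Predicate Information section, assuming the reproducer table exists and the session has a usable PLAN_TABLE:

```sql
-- Store the plan for the filtered DISTINCT query without executing it.
explain plan for
select distinct SUBOBJECT_NAME from SCOTT.T where SUBOBJECT_NAME is not null;

-- Print the stored plan, including the Predicate Information section.
select * from table(dbms_xplan.display);
```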
- That answer doesn't mention it because if you add WHERE col IS NOT NULL, you have a different query. – ypercubeᵀᴹ, Sep 16, 2019 at 16:16
- It is only partially different, because if the table has the check constraint the result would be the same (i.e. no nulls present). In my case it's very common that I don't care about or expect nulls in such queries. (One could argue that the column should be NOT NULL in that case, but that's not possible for some unrelated reasons.) – eckes, Sep 16, 2019 at 17:03
1 Answer
Since which Oracle version has this behavior been present?
I have no idea, but I expect it has been there for a long time.
is it reliable?
100% reliable
When is the two-column index preferred?
When you want to work with null records, too.
select distinct SUBOBJECT_NAME from SCOTT.T;
can use the following index
create index SCOTT.IDX_T_SUBOBJ on SCOTT.T(1, subobject_name);
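In the question's reproducer the name IDX_T_SUBOBJ is already taken by the single-column index, so the statement above needs either a prior drop or a different name. A minimal sketch using the trailing-constant variant from the question; the name IDX_T_SUBOBJ_C is hypothetical, and whether the optimizer actually picks the index still depends on statistics and cost:

```sql
-- IDX_T_SUBOBJ_C is a hypothetical name chosen to avoid clashing with IDX_T_SUBOBJ.
-- The constant second column means every row gets an index entry,
-- including rows where SUBOBJECT_NAME is null.
create index SCOTT.IDX_T_SUBOBJ_C on SCOTT.T(subobject_name, 1);

set autotrace on explain
-- Unfiltered DISTINCT: the index now covers the null rows too,
-- so an index fast full scan becomes a candidate access path.
select distinct SUBOBJECT_NAME from SCOTT.T;
```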
And why isn't it mentioned more often (for example, this otherwise good answer does not mention it)?
The whole logic is quite clear once you know that B-tree indexes do not contain entries for rows whose indexed columns are all null, and that constraints are not used by the optimizer here (as seems to be the case).
If your index does not contain nulls and you ask for DISTINCT on a nullable column, the index cannot be used, because the null rows are missing from it. If you add WHERE col IS NOT NULL, the index can be used, because all non-null values are there.
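The missing null entries can be checked against the question's reproducer. A rough sketch, assuming the table and index from the question exist and statistics were gathered with cascade=>true as shown there:

```sql
-- All rows vs. rows with a non-null SUBOBJECT_NAME
-- (count(column) ignores nulls).
select count(*)              as table_rows,
       count(subobject_name) as non_null_rows
from SCOTT.T;

-- Number of index entries recorded at the last statistics gathering.
-- For the single-column index this should track non_null_rows, not table_rows.
select index_name, num_rows
from all_indexes
where owner = 'SCOTT'
  and index_name = 'IDX_T_SUBOBJ';
```

The plans in the question point the same way: the index fast full scan is estimated at 1459 rows, while the full table scan is estimated at 81426.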
- It could use the index anyway: as soon as it knows whether there is a null or not, the distinct values can be gathered from an index scan. But yes, I see it's an Oracle limitation, and it's good to know that the WHERE helps me avoid another index. Thanks for the feedback. – eckes, Jan 27, 2021 at 13:46