A little case study based on an example just in on the Oracle-L list server. This was supplied with a complete, working, test case that was small enough to understand and explain very quickly.
The user created a table, and used calls to dbms_stats to fake some statistics into place. Here, with a little cosmetic editing, is the code they supplied.
set serveroutput off set linesize 180 set pagesize 60 set trimspool on drop table t1 purge; create table t1 (id number(20), v varchar2(20 char)); create unique index pk_id on t1(id); alter table t1 add (constraint pk_id primary key (id) using index pk_id enable validate); exec dbms_stats.gather_table_stats(user, 't1'); declare srec dbms_stats.statrec; numvals dbms_stats.numarray; charvals dbms_stats.chararray; begin dbms_stats.set_table_stats( ownname => user, tabname => 't1', numrows => 45262481, numblks => 1938304, avgrlen => 206 ); numvals := dbms_stats.numarray (1, 45262481); srec.epc:=2; dbms_stats.prepare_column_values (srec, numvals); dbms_stats.set_column_stats ( ownname => user, tabname => 't1', colname => 'id', distcnt => 45262481, density => 1/45262481, nullcnt => 0, srec => srec, avgclen => 6 ); charvals := dbms_stats.chararray ('', ''); srec.epc:=2; dbms_stats.prepare_column_values (srec, charvals); dbms_stats.set_column_stats( ownname => user, tabname => 't1', colname => 'v', distcnt => 0, density => 0, nullcnt => 45262481, srec => srec, avgclen => 0 ); dbms_stats.set_index_stats( ownname => user, indname =>'pk_id', numrows => 45607914, numlblks => 101513, numdist => 45607914, avglblk => 1, avgdblk => 1, clstfct => 33678879, indlevel => 2 ); end; / variable n1 nvarchar2(32) variable n2 number begin :n1 := 'D'; :n2 := 50; end; / select /*+ gather_plan_statistics */ * from ( select a.id col0,a.id col1 from t1 a where a.v = :n1 and a.id > 1 order by a.id ) where rownum <= :n2 ; select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost peeked_binds '));
From Oracle’s perspective the table has 45M rows, with a unique sequential key starting at 1 in the id column. The query looks like a pagination query, asking for 50 rows, ordered by id. But the in-line view asks for rows where id > 1 (which, initiall, means all of them) and applies a filter on the v column.
Of course we know that v is always null, so in theory the predicate a.v = :n1 is always going to return false (or null, but not true) – so the query will never return any data. However, if you read the code carefully you’ll notice that the bind variable v has been declared as an nvarchar2() not a varchar2().
Here’s the exection plan I got on an instance running 19.3 – and it’s very similar to the plan supplied by the OP:
---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 3747 (100)| 0 |00:00:00.01 | |* 1 | COUNT STOPKEY | | 1 | | | 0 |00:00:00.01 | | 2 | VIEW | | 1 | 50 | 3747 (1)| 0 |00:00:00.01 | |* 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 452K| 3747 (1)| 0 |00:00:00.01 | |* 4 | INDEX RANGE SCAN | PK_ID | 0 | 5000 | 14 (0)| 0 |00:00:00.01 | ---------------------------------------------------------------------------------------------------- Peeked Binds (identified by position): -------------------------------------- 2 - :2 (NUMBER): 50 Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=:N2) 3 - filter(SYS_OP_C2C("A"."V")=:N1) 4 - access("A"."ID">1)
The question we were asked was this: “Why does the optimizer estimate that it will return 5,000 entries from the index range scan at operation4?”
The answer is the result of combining two observations.
First: In the Predicate Information you can see that Oracle has applied a character-set conversion to the original predicate “a.v = :n1” to produce filter(SYS_OP_C2C(“A”.”V”)=:N1). The selectivity of “function of something = bind value” is one of those cases where Oracle uses one of its guesses, in this case 1%. Note that the E-rows estimate for operation 3 (table access) is 452K, which is 1% of the 45M rows in the table.
In real life if you had optimizer_dynamic_sampling set at level 3, or had added the hint /*+ dynamic_sampling(3) */ to the query, Oracle would sample some rows to avoid the need for guessing at this point.
Secondly: the optimizer has peeked the bind variable for the rownum predicate, so it is optimizing for 50 rows (basically doing the arithmetic of first_rows(50) optimisation). The optimizer “knows” that the filter predicate at the table will eliminate all but 1% of the rows acquired, and it “knows” that it has to do enough work to find 50 rows in total – so it can calculate that (statistically speaking) it has to walk through 5,000 (= 50 * 100) index entries to visit enough rows in the table to end up with 50 rows.
Next Steps (left as exercise)
Once you’ve got the answer to the question “Why is this number 5,000?”, you might go back and point out that the estimate for the table access was 95 times larger than the estimate for the number of rowids selected from the index and wonder how that could be possible. (Answer: that’s just one of the little defects in the code for first_rows(n).)
You might also wonder what would have happened in this model if the bind variable n1 had been declared as a varchar2() rather than an nvarchar2() – and that might have taken you on to ask yet another question about what the optimizer was playing at.
Once you’ve modelled something that is a little puzzle there’s always scope for pushing the model a little further and learning a little bit more before you file the model away for testing on the next version of Oracle.
Comments and related questions are welcome.