I wrote a batch of notes about the opt_estimate() hint a couple of years ago, including one where I explained the option for using the hint to specify the number of rows in a query block. I’ve just come across a particular special case for that strategy that others might find a use for. It’s something to do whant using the “select from dual … connect by” trick for multiplying rows.
Here’s a little data to model the idea – I’ve used the all_tables view to generate some “well-known” data since I want to add a tiny bit of complexity to the query while still leaving it easy to understand the index. The results from this demonstration come from Oracle 21.3.0.0, and I’ve included the hint /*+ no_adaptive_plan */ to stop Oracle from getting too clever during optimisation.
rem
rem Script: opt_estimate_dual.sql
rem Author: Jonathan Lewis
rem Dated: Nov 2022
rem
rem Last tested
rem 21.3.0.0
rem
create table tables_table as select * from all_objects where object_type = 'TABLE';
create table objects_table as select * from all_objects;
alter table objects_table add constraint ot_pk primary key(object_id);
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname => 'tables_table',
method_opt => 'for columns size 60 owner'
);
end;
/
set serveroutput off
with driver as (
select /*+ materialize */
tt.owner, tt.object_id, v1.rn
from tables_table tt,
(
select
/*+ opt_estimate(query_block scale_rows=10) */
rownum rn
from dual
connect by
level <= 10
) v1
where
tt.owner = 'OUTLN'
)
select /*+ no_adaptive_plan */
dr.rn, dr.owner, dr.object_id,
ot.object_id, ot.owner, ot.object_type, ot.object_name
from
driver dr,
objects_table ot
where
ot.object_id = dr.object_id
/
select * from table(dbms_xplan.display_cursor(format => 'hint_report'));
In my system tables_table holds 727 rows and objects_table holds 58383 rows. Three rows in tables_table correspond to tables owned by user ‘OUTLN’ which means I expect the driver CTE (common table expression / “with” subquery) to generate 30 rows and, given the join on unique id, the query to return 30 rows.
I’ve used the /*+ materialize */ hint to force Oracle to create an in-memory temporary table for the driver CTE, the /*+ no_adaptive_plan */ hint to stop Oracle from getting too clever during optimisation, and the critical /*+ opt_estimate() */ hint to help the optimizer understand the effect of my “connect by” on dual. Here’s the execution plan I get if I omit that last hint:
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 14 (100)| |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6632_31D19D4 | | | | |
| 3 | MERGE JOIN CARTESIAN | | 3 | 78 | 9 (0)| 00:00:01 |
| 4 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 5 | COUNT | | | | | |
| 6 | CONNECT BY WITHOUT FILTERING | | | | | |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 3 | 39 | 9 (0)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | TABLES_TABLE | 3 | 39 | 7 (0)| 00:00:01 |
| 10 | NESTED LOOPS | | 3 | 453 | 5 (0)| 00:00:01 |
| 11 | NESTED LOOPS | | 3 | 453 | 5 (0)| 00:00:01 |
| 12 | VIEW | | 3 | 276 | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6632_31D19D4 | 3 | 78 | 2 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | OT_PK | 1 | | 0 (0)| |
| 15 | TABLE ACCESS BY INDEX ROWID | OBJECTS_TABLE | 1 | 59 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
9 - filter("TT"."OWNER"='OUTLN')
14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
0 - STATEMENT
- no_adaptive_plan
2 - SEL$1
- materialize
I’ve highlighted operations 4 and 8 in the plan: operation 4 is the view of dual that has generated 10 rows – unfortunately the optimizer has only considered the stats of the dual table, and hasn’t factored in the effects of the “connect by with rownum”. Operation 8 shows us that the optimizer has (correctly, thanks to the histogram I requested) estimated 3 rows for the tablescan of tables_table. The result of these two estimates is that operation 3 reports an estimate of 3 ( = 3 * 1 ) rows to be used in probing objects_table.
This is the plan after enabling the /*+ opt_estimate() */ hint:
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 45 (100)| |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6633_31D19D4 | | | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 780 | 13 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | TABLES_TABLE | 3 | 39 | 7 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 10 | 130 | 6 (0)| 00:00:01 |
| 6 | VIEW | | 10 | 130 | 2 (0)| 00:00:01 |
| 7 | COUNT | | | | | |
| 8 | CONNECT BY WITHOUT FILTERING | | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | NESTED LOOPS | | 30 | 4530 | 32 (0)| 00:00:01 |
| 11 | NESTED LOOPS | | 30 | 4530 | 32 (0)| 00:00:01 |
| 12 | VIEW | | 30 | 2760 | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6633_31D19D4 | 30 | 780 | 2 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | OT_PK | 1 | | 0 (0)| |
| 15 | TABLE ACCESS BY INDEX ROWID | OBJECTS_TABLE | 1 | 59 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("TT"."OWNER"='OUTLN')
14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
0 - STATEMENT
- no_adaptive_plan
2 - SEL$1
- materialize
There are three things that stand out in this report.
- I’ve highlighted operations 4 and 6: operation 4 is the tablescan of tables_table that correctly estimates 3 rows; operation 6 is the view operation that now correctly estimates 10 rows.
- With the correct estimate for the view the estimate for the join to objects_table is now correct and the join order for the merge join cartesian at operation 3 has been reversed.
- The Hint Report tells us that the opt_estimate() hint is not (always) an optimizer hint! This is a real pain because when the opt_estimate() hints you’ve tried to use don’t appear to work it’s not easy to work out what you’ve done wrong.
To make a point, I can take the demo a little further by changing the /*+ opt_estimate() */ hint to scale_rows=120. Here’s the body of the resulting plan:
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 369 (100)| |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D663A_31D19D4 | | | | |
| 3 | MERGE JOIN CARTESIAN | | 360 | 9360 | 13 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | TABLES_TABLE | 3 | 39 | 7 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 120 | 1560 | 6 (0)| 00:00:01 |
| 6 | VIEW | | 120 | 1560 | 2 (0)| 00:00:01 |
| 7 | COUNT | | | | | |
| 8 | CONNECT BY WITHOUT FILTERING | | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | HASH JOIN | | 360 | 54360 | 356 (1)| 00:00:01 |
| 11 | VIEW | | 360 | 33120 | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D663A_31D19D4 | 360 | 9360 | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS FULL | OBJECTS_TABLE | 58383 | 3363K| 354 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
The earlier plans used a nested loop join into objects_table. In this plan we can see at operation 10 that the optimizer has selected a hash join because the larger row estimate for the CTE has increased the cost of the query beyond the inflection point between nested loop and hash joins.
Summary
If you need to use the “connect by” in an inline view then you may find that the optimizer gets a very bad estimate of the number of rows the view definition will generate and that an /*+ opt_estimate() */ hint in the view using the “scale_rows=nnn” option will produce better estimates of cardinality, hence a better plan.
Footnote
In this particular case where I’ve used the dual table by itself in an inline view I could have used the “rows=NNN” option to get the same effect.
In any case I could have added a /*+ qb_name() */ hint to the inline view, and includes a qualifying “@qb” in the /*+ opt_estimate() */ hint.
Using hints is hard, especially when they’re not documented. There is a lot more to learn about this hint; for example, telling the optimizer about the size of a rowsource doesn’t help if it’s going to use its estimate of distinct values in the next steps of the plan – a correction you’ve managed to introduce at one point may disappear in the very next optimizer calculation.
This catalogue lists more articles on the opt_estimate() hint and its relatives.
[…] opt_estimate 4a (Nov 2022) – addressing a specific recent example that appeared on Oracle-l […]
Pingback by opt_estimate catalogue | Oracle Scratchpad — November 15, 2022 @ 11:23 am GMT Nov 15,2022 |