Oracle Scratchpad

January 6, 2023

Case Study

Filed under: Execution plans,Oracle — Jonathan Lewis @ 4:34 pm GMT Jan 6,2023

Here’s a query that appeared on the MOS “SQL Performance” forum (needs a support account) with the request: “Can someone please assist me to optimize this query?”

It looks like a nice simple query (though looks can be deceiving) so I thought I’d use it as another example of how to think about tuning SQL.

SELECT 
        MSI.SEGMENT1, OL.SHIP_FROM_ORG_ID, 
        OL.ORDERED_QUANTITY SOLD_QTY, 
        OL.UNIT_SELLING_PRICE SELLING_PRICE
FROM 
        OE_ORDER_HEADERS_ALL   OH, 
        OE_ORDER_LINES_ALL     OL, 
        MTL_SYSTEM_ITEMS_B     MSI
WHERE 
        OH.HEADER_ID         = OL.HEADER_ID
AND     OL.SHIP_FROM_ORG_ID  = MSI.ORGANIZATION_ID
AND     OL.INVENTORY_ITEM_ID = MSI.INVENTORY_ITEM_ID
AND     NVL(oh.source_document_type_id,0) <> 10
AND     SUBSTR(SEGMENT1,4,3) = 'FIF'
AND     UPPER(OL.FLOW_STATUS_CODE) NOT IN ('ENTERED','CANCELLED')

I really dislike (and actually have trouble) reading blocks of text in upper case, so before I do anything else here’s the same code converted to (mostly) lower case, with one minor adjustment:

select
        msi.segment1, 
        ol.ship_from_org_id, 
        ol.ordered_quantity     sold_qty,
        ol.unit_selling_price   selling_price
from 
        oe_order_headers_all   oh, 
        oe_order_lines_all     ol, 
        mtl_system_items_b     msi
where 
        oh.header_id         = ol.header_id
and     ol.ship_from_org_id  = msi.organization_id
and     ol.inventory_item_id = msi.inventory_item_id
and     nvl(oh.source_document_type_id,0) <> 10
and     substr(msi.segment1,4,3) = 'FIF'
and     upper(ol.flow_status_code) not in ('ENTERED','CANCELLED')

I’ve highlighted line 15 to pick out the minor adjustment: I’ve added a table alias to the expression substr(segment1,4,3). Every column reference should include its table alias. The query was pretty good in following this guideline, and it was fairly easy to work out the appropriate alias here because we can also see msi.segment1 in the select list; if there had been a column called segment1 in one of the other tables Oracle would have reported error “ORA-00918: column ambiguously defined”.

We’ve been told that the query is a “sales query” against Oracle EBS R12, and we’ve been given an execution plan – though the plan, unfortunately, is an image from one of the non-Oracle GUIs:

Conveniently the plan tells us that the three tables in the query really are tables and not views that hide complex subqueries; it also gives us some idea of the size of two of the tables (very big). What it doesn’t give us is any information about how and where the original predicates have been used.

Things we don’t know

  • What the query means (in business terms)
  • How long it takes the query to run
  • How long it probably should take the query to run
  • How fast the OP would like the query to run
  • How often the query will be run
  • How many rows from each table have to participate in deriving the result
  • How many rows and blocks there are in each table
  • How many distinct values there are for each of the columns in the where clause.
  • Whether there are any virtual columns (or extended stats) on the table
  • Whether there are any (numerically) skewed data distribution patterns
  • What physical patterns there might be in the rows identified by the predicates.
  • What the definitions of any available indexes are (including “function-based”)
  • Whether or not we’re allowed to change the code, or whether we have to “hint” it somehow.
  • Whether this is the only set of literal values that would appear or whether it’s generated code that allows many variations to appear.
  • Whether this is a case where a production query using bind variables is being tested with one known set of values.
  • Which version of Oracle is in use, and what the settings are for optimizer_features_enable and any hacked optimizer parameters
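Several of these unknowns are things an on-site DBA could read straight from the data dictionary. The queries below are a minimal sketch, assuming the optimizer statistics are reasonably fresh and that the session can see the three tables through the all_* views. No owner filter is applied because the owning schemas were not given in the thread.

```sql
-- Table sizes as recorded by the most recent stats collection
select  owner, table_name, num_rows, blocks, last_analyzed
from    all_tables
where   table_name in (
                'OE_ORDER_HEADERS_ALL',
                'OE_ORDER_LINES_ALL',
                'MTL_SYSTEM_ITEMS_B'
        )
order by
        owner, table_name
;

-- Column stats (distinct values, nulls, histogram type) for the filter columns
select  table_name, column_name, num_distinct, num_nulls, histogram, num_buckets
from    all_tab_col_statistics
where   (table_name, column_name) in (
                ('OE_ORDER_HEADERS_ALL', 'SOURCE_DOCUMENT_TYPE_ID'),
                ('OE_ORDER_LINES_ALL',   'FLOW_STATUS_CODE'),
                ('MTL_SYSTEM_ITEMS_B',   'SEGMENT1')
        )
order by
        table_name, column_name
;

-- Index definitions, including the expressions behind any function-based indexes
select  table_name, index_name, column_position, column_name
from    all_ind_columns
where   table_name in (
                'OE_ORDER_HEADERS_ALL',
                'OE_ORDER_LINES_ALL',
                'MTL_SYSTEM_ITEMS_B'
        )
order by
        table_name, index_name, column_position
;

select  table_name, index_name, column_position, column_expression
from    all_ind_expressions
where   table_name in (
                'OE_ORDER_HEADERS_ALL',
                'OE_ORDER_LINES_ALL',
                'MTL_SYSTEM_ITEMS_B'
        )
order by
        table_name, index_name, column_position
;
```

The business questions (what the query means, how fast it needs to be, how often it runs) can only come from the OP, but the rest of the list is mostly a few minutes of querying.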

Basic threats

Every single predicate that compares a column with a literal hides the column inside a function call – which means the optimizer may be “losing” important statistical information.

Two of the three literal-based predicates are “negative”, i.e. “not equal” and “not in”, which is another detail that can mess up the optimizer’s arithmetic (though the specific effects may vary with version and the column statistics).

Observations and Guesswork

The optimizer cardinality estimate for upper(ol.flow_status_code) not in ('ENTERED','CANCELLED') is 76,240 with a tablescan cost of 989,658. That looks like a very small fraction of a very large table. But it’s possible that this is the standard estimate for “unknown value not in (list of 2 items)” which would be 5% of 5% in 19c. Certainly 76,240 * 20 * 20 = 30M sounds like a suitable number of rows for a table with a tablescan cost close to a million. Possibly a better cardinality estimate would change the plan.
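The arithmetic behind that guess can be checked directly. This is simply the “5% of 5%” rule quoted above run in both directions; the implied row count of about 30.5M is an inference from the plan, not a figure supplied in the thread.

```sql
select
        76240 * 20 * 20                 implied_table_rows,     -- 30,496,000
        round(30496000 * 0.05 * 0.05)   guessed_estimate        -- 76,240
from
        dual
;
```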

The name flow_status_code suggests a column that would have a small number of distinct values with an enormous data skew. If we had extended stats, or a virtual column, on upper(flow_status_code) with a histogram in place the optimizer might pick a completely different path. If the estimate became very small it might be able to choose nested loops and an indexed access path all the way through the query; if the estimate became very large it might decide to use a different join order.
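As a sketch of what “extended stats with a histogram” might look like for this expression: the owner name 'ONT' below is an assumption (substitute the real owner), and the code assumes that changing statistics on this table is permitted and that the cost of gathering stats on a table of this size is acceptable. An EBS site may have its own supported route for gathering stats, so treat this as an illustration of the dbms_stats calls rather than a recommendation.

```sql
-- Create the column-expression extension (the function returns the generated name)
select
        dbms_stats.create_extended_stats(
                ownname   => 'ONT',                     -- assumed owner
                tabname   => 'OE_ORDER_LINES_ALL',
                extension => '(upper(flow_status_code))'
        )       extension_name
from
        dual
;

-- Gather stats on the extension, asking for a histogram of up to 254 buckets
begin
        dbms_stats.gather_table_stats(
                ownname    => 'ONT',                    -- assumed owner
                tabname    => 'OE_ORDER_LINES_ALL',
                method_opt => 'for columns (upper(flow_status_code)) size 254'
        );
end;
/
```

With only a handful of distinct status codes the histogram should be a frequency histogram, which is what lets the optimizer see the skew.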

The optimizer cardinality estimate for nvl(oh.source_document_type_id,0) <> 10 is 6.65M with a tablescan cost of 95,823. The selectivity for this predicate is derived as 1 – selectivity(nvl(oh.source_document_type_id,0)= 10), which is derived as 1 – selectivity( oh.source_document_type_id = 10).

At this point I could launch into all sorts of speculation about the column: the number of distinct values, the number of nulls, the existence (or not) of a histogram, a comparison between the blocks and cardinality of this “order headers” table and the estimates made above for the “order lines” table – but trying to cover all the options would be a long and tangled document, so I’ll just show you one example that might be a valid model for this predicate:

  • t1 is a table with 100,000 rows
  • data_type_id is a numeric column with 80 distinct values
  • there are 1,000 rows in t1 where data_type_id is null
  • I’ve gathered stats just before executing a couple of queries

Here are the two queries, each followed by the critical line from its execution plan:

SQL> select count(*) from t1 where nvl(data_type_id,0) <> 50;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|*  2 |   TABLE ACCESS FULL| T1   | 98763 |   289K|   553   (2)| 00:00:01 |
---------------------------------------------------------------------------


SQL> select count(*) from t1 where data_type_id = 50;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|*  2 |   TABLE ACCESS FULL| T1   |  1238 |  3714 |   551   (1)| 00:00:01 |
---------------------------------------------------------------------------

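A quick arithmetic check: add the two row estimates: 98,763 + 1,238 = 100,001. Allowing for rounding errors in the arithmetic that’s the 100,000 rows in the table (the nvl() turns the 1,000 nulls into a value that passes the “<> 50” test, so they are included in the first estimate).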

Unfortunately the actual count from the first query is 2,793 – the optimizer’s estimate is out by a factor of more than 35; and the actual count from the second query is 97,207 – the optimizer’s estimate is out (in the opposite direction) by a factor of more than 78.

The statistics are “up to date” – except 50 is an extremely common value for data_type_id and I haven’t created a histogram on the column.

So perhaps the original query is asking for the “rare” order headers but there’s no histogram to give the optimizer any clue that 10 (the unwanted value) is an extremely common value for source_document_type_id. Would a suitable histogram allow the optimizer to produce a much lower cardinality estimate and a significant change in the execution plan?
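For anyone who wants to experiment with the effect, here is a minimal sketch of a table with the same shape as the t1 model: 100,000 rows, 1,000 nulls, 80 distinct values and one very common value. The data generation is an assumption made for illustration, so the actual row counts will not match the 2,793 and 97,207 quoted above; only the pattern matters.

```sql
create table t1
as
select
        rownum                          id,
        case
                when mod(rownum,100) = 0 then null                      -- 1,000 nulls
                when rownum <= 2000      then 1 + mod(rownum,80)        -- thin spread over 1..80
                else 50                                                 -- the very common value
        end                             data_type_id,
        rpad('x',100)                   padding
from
        dual
connect by
        level <= 1e5
;

-- Step 1: stats with no histograms, then check the estimates
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

explain plan for
select count(*) from t1 where nvl(data_type_id,0) <> 50;

select * from table(dbms_xplan.display);

-- Step 2: add a histogram on the skewed column and repeat the check
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns size 254 data_type_id')

explain plan for
select count(*) from t1 where nvl(data_type_id,0) <> 50;

select * from table(dbms_xplan.display);
```

Comparing the Rows column of the two plans against a real count(*) shows how much of the error comes from the missing histogram rather than from the nvl().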

It may be significant that the table order in the from clause, and the predicate order in the where clause, suggest that the original programmer may have been aiming for an execution plan that started at the oe_order_headers_all table.

Finally we come to the predicate substr(msi.segment1,4,3) = 'FIF'. I don’t know much about EBS but I can guess what the table names oe_order_headers_all and oe_order_lines_all represent; on the other hand I can’t think what is really behind the name mtl_system_items_b, especially when the column names that join it to the order lines table make it look as if it should be called something more like oe_products_all; and there’s no way to guess what the significance of a little chunk of a column called segment1 might be.

The only thing we can infer from the execution plan is that it must be a fairly huge table since the optimizer has chosen to probe it through a nested loop 76,240 times with a total incremental cost of 152,551 (= 1,244,546 – 1,091,995) rather than doing a tablescan and hash join.

The cardinality and cost estimates show a couple of interesting details. First, there’s a clear error in the cardinality estimate of the nested loop join since the number of rows produced by the join is (very slightly) larger than the number of rows supplied to it by the hash join, even though it’s accessing the table by a unique scan of a unique index.

Secondly it’s worth commenting on the cost of the indexed access, which might seem low at only 2 when I’ve claimed it’s a “fairly huge” table which would presumably require traversing an index with a blevel of 2 (root, branch, leaf) before visiting the required table block – so a cost of 4 for the table visit would seem reasonable. In fact 4 is what we’d see for a non-unique index (on this data) and a query for “id = {constant}”; the optimizer has subtracted one for the uniqueness, and one because it’s a nested loop join.
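The numbers in the last few paragraphs can be tied together with a quick calculation. This is only the arithmetic described in the text; the small gap between the two totals is presumably the CPU component of the cost, though that is a guess.

```sql
select
        1244546 - 1091995       incremental_cost,       -- 152,551: cost added by the nested loop
        76240 * 2               probes_x_unit_cost,     -- 152,480: 76,240 probes at a cost of 2 each
        2 + 1 + 1               non_unique_cost,        -- 4: blevel + leaf block + table block
        (2 + 1 + 1) - 1 - 1     unique_nl_cost          -- 2: less 1 for uniqueness, 1 for nested loop
from
        dual
;
```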

If we know that there are only a very few rows where substr(segment1,4,3) = 'FIF', and if we think that this is a good starting table for the optimizer then we need to create a virtual column (or extended stats) and might need to support that with a histogram. Even then, of course, it might not be a good table to have first in the join order.

Joins

So far we’ve only been considering ways to improve the optimizer’s cardinality estimates in the hope that better information would give it a better execution plan. We believe that all three tables are very large and suspect that if the actual volume of relevant data is small we can get a better path that uses nested loops from beginning to end. But if we want that path to be efficient we’re going to need suitable indexes, including a precision index into the first table in the join.

What we need to do now is consider the resources that might be needed to ensure we have the “perfect” indexes for the optimum path, compared with the excess resources that would be used if we could find a “good enough” path.

Looking at the query, my feeling is that there are two possible paths that might work reasonably well (assuming the required volume of data is sufficiently small):

  • oe_order_headers_all -> oe_order_lines_all -> mtl_system_items_b
  • oe_order_lines_all -> oe_order_headers_all -> mtl_system_items_b

I suspect (in the absence of any supplied information) that a path starting with mtl_system_items_b will do too much random I/O into the order lines table. It’s a standard example of the problem displayed by pairings like products -> order_lines and customers -> orders: the order lines for any given product are likely to be scattered widely across the table, as are the orders for any particular customer. Starting with mtl_system_items_b might do a lot of random I/O before discarding a lot of unwanted order lines.

Looking at the oe_order_lines_all table it strikes me that only a small number of rows will be newly entered or cancelled, and most of them will be in states like “completed”, “picked”, “invoiced”, etc. That means the predicate on flow_status_code is probably not one that will eliminate a lot of data, so I’m just going to talk about the options for getting into the oe_order_headers_all table. Assuming the predicate “nvl(oh.source_document_type_id,0) <> 10” does identify a “small enough” number of rows then we probably have a good enough index (the foreign key index – which probably should be created with a degree of compression and will probably also be the primary key) into oe_order_lines_all, and we know we have a unique index from there into mtl_system_items_b.

So how do we access that small number of rows as efficiently as possible with a minimum of overhead? We’ve set up a histogram on source_document_type_id so that the optimizer gets a better idea of the number of rows – but that still leaves us with a tablescan of a huge table unless we create a tailored index. Here’s an example of the type of approach we can take – based on a table t2 that has 50,000 rows of which almost all have a document_type of ‘XX’, which we want to ignore, and a few rows where the document_type is null which have to be included in the ones we keep:

SQL> execute dbms_stats.gather_table_stats(user,'t2',method_opt =>'for all columns size 1 for columns size 254 document_type')

SQL> select count(*) from t2 where nvl(document_type,'YY') <> 'XX';

  COUNT(*)
----------
       300

Execution Plan
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   278   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T2   |   300 |  1200 |   278   (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("DOCUMENT_TYPE",'YY')<>'XX')

In keeping with the original supplied code I have a predicate which translates NULLs into a value that isn’t the ‘XX’ that we want to exclude. With the histogram in place the optimizer has actually got exactly the right estimate.

So here’s a possible index definition that will allow us to create a very small index that identifies exactly those rows as efficiently as possible:

SQL> create index t2_i1 on t2 (case when nvl(document_type,'YY') <> 'XX' then 1 end);

SQL> execute dbms_stats.gather_table_stats(user,'t2',method_opt =>'for all hidden columns')

SQL> select count(*) from t2 where case when nvl(document_type,'YY') <> 'XX' then 1 end = 1;

  COUNT(*)
----------
       300

Execution Plan
----------------------------------------------------------
Plan hash value: 3503408237

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     2 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |       |     1 |     2 |            |          |
|*  2 |   INDEX RANGE SCAN| T2_I1 |   300 |   600 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(CASE  WHEN NVL("DOCUMENT_TYPE",'YY')<>'XX' THEN 1 END =1)

Two things to note; first, after I’ve created the “function-based” index I’ve gathered stats on “all hidden columns”. This is one way of gathering stats on the system-generated, invisible, virtual column that supports the index, and is a step that is often missed until the stats are gathered overnight by the automatic stats collection job at some random point in the future. Secondly I’ve had to change the query so that the predicate I use is an exact match for the index definition; this is why it’s often nicer to create a virtual column for the expression and index the virtual column – and to keep 3rd party or legacy code safe it’s often a good idea to declare such columns invisible.
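The virtual column alternative might look something like the following. This is a sketch under stated assumptions: the column name keep_flag and index name t2_keep are made up for the illustration, invisible columns need Oracle 12c or later, and the existing function-based index is dropped first because I would expect Oracle to reject a virtual column whose expression duplicates the hidden column behind t2_i1.

```sql
-- Remove the function-based index (and its hidden column) before reusing the expression
drop index t2_i1;

-- Invisible, so "select *" and inserts without a column list are unaffected
alter table t2 add (
        keep_flag invisible
        generated always as (case when nvl(document_type,'YY') <> 'XX' then 1 end) virtual
);

create index t2_keep on t2(keep_flag);

-- Column stats for the new virtual column
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for columns size 1 keep_flag')

-- The predicate can now use the simple column name instead of the full expression
select count(*) from t2 where keep_flag = 1;
```

The index is just as small as before, since keep_flag is null for every row we want to ignore, but the code that wants to use it only has to know a column name.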

Finally, just to show the efficiency of this strategy, here are a couple of the interesting stats about the index:

SQL> select table_name, index_name, num_rows , leaf_blocks from user_indexes where table_name = 'T2';

TABLE_NAME                INDEX_NAME             NUM_ROWS LEAF_BLOCKS
------------------------- -------------------- ---------- -----------
T2                        T2_I1                       300           1


The table has 50,000 rows, but the index has only 300 entries (that fit in a single leaf block) as the expression is NULL for the other 49,700. As an important bonus it’s fairly safe to assume that there won’t be any code in the system that will decide to use this strangely defined index when it should be using some other index.

Conclusion

We started with a query that was “too slow”. It contained some predicates that would hide any useful statistical information from the optimizer. In two of the three cases we could give the optimizer some useful statistics by creating virtual columns or extended stats on the expressions; and in the third case the structure of the expression was a special “nvl()” case that could simply have been missing a histogram on the underlying column.

Some of the information in the execution plan gave us clues about the scale of the tables – including the table which was accessed by a unique index – but on-site DBAs wouldn’t need to make guesses about some of the numbers I came up with, they could simply query the data dictionary or, for some details, query the data directly.

Reviewing table and column names, and making some assumptions about some of the data distributions (again details that could be extracted from the actual data), I picked a path that would probably be suitable if the required volume of data was relatively small, then demonstrated how we could add in an efficient index that would make this query as efficient as needed without requiring much overhead in index maintenance and without introducing the risk of other queries changing execution plans to use this new index.

November 15, 2022

opt_estimate 4a

Filed under: CBO,Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 11:21 am GMT Nov 15,2022

I wrote a batch of notes about the opt_estimate() hint a couple of years ago, including one where I explained the option for using the hint to specify the number of rows in a query block. I’ve just come across a particular special case for that strategy that others might find a use for. It’s something to do when using the “select from dual … connect by” trick for multiplying rows.

Here’s a little data to model the idea – I’ve used the all_objects view to generate some “well-known” data since I want to add a tiny bit of complexity to the query while still leaving it easy to understand the index. The results from this demonstration come from Oracle 21.3.0.0, and I’ve included the hint /*+ no_adaptive_plan */ to stop Oracle from getting too clever during optimisation.

rem
rem     Script:         opt_estimate_dual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2022
rem
rem     Last tested 
rem             21.3.0.0
rem

create table tables_table as select * from all_objects where object_type = 'TABLE';
create table objects_table as select * from all_objects;

alter table objects_table add constraint ot_pk primary key(object_id);

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'tables_table',
                method_opt => 'for columns size 60 owner'
        );
end;
/

set serveroutput off

with driver as (
        select  /*+ materialize */
                tt.owner, tt.object_id, v1.rn
        from    tables_table tt,
                (
                select
                        /*+  opt_estimate(query_block scale_rows=10) */
                        rownum rn
                from    dual
                connect by
                        level <= 10
                ) v1
        where
                tt.owner = 'OUTLN'
)
select  /*+ no_adaptive_plan */
        dr.rn, dr.owner, dr.object_id,
        ot.object_id, ot.owner, ot.object_type, ot.object_name
from
        driver dr,
        objects_table   ot
where
        ot.object_id = dr.object_id
/

select * from table(dbms_xplan.display_cursor(format => 'hint_report'));


In my system tables_table holds 727 rows and objects_table holds 58383 rows. Three rows in tables_table correspond to tables owned by user ‘OUTLN’ which means I expect the driver CTE (common table expression / “with” subquery) to generate 30 rows and, given the join on unique id, the query to return 30 rows.

I’ve used the /*+ materialize */ hint to force Oracle to create an in-memory temporary table for the driver CTE, the /*+ no_adaptive_plan */ hint to stop Oracle from getting too clever during optimisation, and the critical /*+ opt_estimate() */ hint to help the optimizer understand the effect of my “connect by” on dual. Here’s the execution plan I get if I omit that last hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    14 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6632_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |     3 |    78 |     9   (0)| 00:00:01 |
|   4 |     VIEW                                 |                            |     1 |    13 |     2   (0)| 00:00:01 |
|   5 |      COUNT                               |                            |       |       |            |          |
|   6 |       CONNECT BY WITHOUT FILTERING       |                            |       |       |            |          |
|   7 |        FAST DUAL                         |                            |     1 |       |     2   (0)| 00:00:01 |
|   8 |     BUFFER SORT                          |                            |     3 |    39 |     9   (0)| 00:00:01 |
|*  9 |      TABLE ACCESS FULL                   | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|  10 |   NESTED LOOPS                           |                            |     3 |   453 |     5   (0)| 00:00:01 |
|  11 |    NESTED LOOPS                          |                            |     3 |   453 |     5   (0)| 00:00:01 |
|  12 |     VIEW                                 |                            |     3 |   276 |     2   (0)| 00:00:01 |
|  13 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6632_31D19D4 |     3 |    78 |     2   (0)| 00:00:01 |
|* 14 |     INDEX UNIQUE SCAN                    | OT_PK                      |     1 |       |     0   (0)|          |
|  15 |    TABLE ACCESS BY INDEX ROWID           | OBJECTS_TABLE              |     1 |    59 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - filter("TT"."OWNER"='OUTLN')
  14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  no_adaptive_plan

   2 -  SEL$1
           -  materialize


I’ve highlighted operations 4 and 8 in the plan: operation 4 is the view of dual that generates 10 rows at run time – unfortunately the optimizer has only considered the stats of the dual table (hence its estimate of 1 row), and hasn’t factored in the effects of the “connect by with rownum”. Operation 8, the buffer sort fed by the tablescan at operation 9, shows us that the optimizer has (correctly, thanks to the histogram I requested) estimated 3 rows for the tablescan of tables_table. The result of these two estimates is that operation 3 reports an estimate of 3 ( = 3 * 1 ) rows to be used in probing objects_table.

This is the plan after enabling the /*+ opt_estimate() */ hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    45 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6633_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |    30 |   780 |    13   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL                    | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|   5 |     BUFFER SORT                          |                            |    10 |   130 |     6   (0)| 00:00:01 |
|   6 |      VIEW                                |                            |    10 |   130 |     2   (0)| 00:00:01 |
|   7 |       COUNT                              |                            |       |       |            |          |
|   8 |        CONNECT BY WITHOUT FILTERING      |                            |       |       |            |          |
|   9 |         FAST DUAL                        |                            |     1 |       |     2   (0)| 00:00:01 |
|  10 |   NESTED LOOPS                           |                            |    30 |  4530 |    32   (0)| 00:00:01 |
|  11 |    NESTED LOOPS                          |                            |    30 |  4530 |    32   (0)| 00:00:01 |
|  12 |     VIEW                                 |                            |    30 |  2760 |     2   (0)| 00:00:01 |
|  13 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6633_31D19D4 |    30 |   780 |     2   (0)| 00:00:01 |
|* 14 |     INDEX UNIQUE SCAN                    | OT_PK                      |     1 |       |     0   (0)|          |
|  15 |    TABLE ACCESS BY INDEX ROWID           | OBJECTS_TABLE              |     1 |    59 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("TT"."OWNER"='OUTLN')
  14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  no_adaptive_plan

   2 -  SEL$1
           -  materialize


There are three things that stand out in this report.

  • I’ve highlighted operations 4 and 6: operation 4 is the tablescan of tables_table that correctly estimates 3 rows; operation 6 is the view operation that now correctly estimates 10 rows.
  • With the correct estimate for the view the estimate for the join to objects_table is now correct and the join order for the merge join cartesian at operation 3 has been reversed.
  • The Hint Report tells us that the opt_estimate() hint is not (always) an optimizer hint! This is a real pain because when the opt_estimate() hints you’ve tried to use don’t appear to work it’s not easy to work out what you’ve done wrong.

To make a point, I can take the demo a little further by changing the /*+ opt_estimate() */ hint to scale_rows=120. Here’s the body of the resulting plan:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |   369 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D663A_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |   360 |  9360 |    13   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                    | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|   5 |     BUFFER SORT                          |                            |   120 |  1560 |     6   (0)| 00:00:01 |
|   6 |      VIEW                                |                            |   120 |  1560 |     2   (0)| 00:00:01 |
|   7 |       COUNT                              |                            |       |       |            |          |
|   8 |        CONNECT BY WITHOUT FILTERING      |                            |       |       |            |          |
|   9 |         FAST DUAL                        |                            |     1 |       |     2   (0)| 00:00:01 |
|  10 |   HASH JOIN                              |                            |   360 | 54360 |   356   (1)| 00:00:01 |
|  11 |    VIEW                                  |                            |   360 | 33120 |     2   (0)| 00:00:01 |
|  12 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D663A_31D19D4 |   360 |  9360 |     2   (0)| 00:00:01 |
|  13 |    TABLE ACCESS FULL                     | OBJECTS_TABLE              | 58383 |  3363K|   354   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

The earlier plans used a nested loop join into objects_table. In this plan we can see at operation 10 that the optimizer has selected a hash join because the larger row estimate for the CTE has increased the cost of the query beyond the inflection point between nested loop and hash joins.

Summary

If you need to use the “connect by” in an inline view then you may find that the optimizer gets a very bad estimate of the number of rows the view definition will generate and that an /*+ opt_estimate() */ hint in the view using the “scale_rows=nnn” option will produce better estimates of cardinality, hence a better plan.

Footnote

In this particular case where I’ve used the dual table by itself in an inline view I could have used the “rows=NNN” option to get the same effect.

In any case I could have added a /*+ qb_name() */ hint to the inline view, and included a qualifying “@qb” in the /*+ opt_estimate() */ hint.
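Putting the two footnotes together, the inline view might be written along the following lines. This is an untested sketch of an undocumented hint: the query block name gen_rows is made up for the illustration, and the exact spelling of the qualified form is an assumption based on the description above, so check the plan and the Hint Report before relying on it.

```sql
select
        /*+
                qb_name(gen_rows)
                opt_estimate(@gen_rows query_block rows=10)
        */
        rownum rn
from
        dual
connect by
        level <= 10
;
```

Naming the query block means the opt_estimate() hint no longer depends on where it sits in the statement, which may make it easier to move into the main query block alongside any other hints.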

Using hints is hard, especially when they’re not documented. There is a lot more to learn about this hint; for example, telling the optimizer about the size of a rowsource doesn’t help if it’s going to use its estimate of distinct values in the next steps of the plan – a correction you’ve managed to introduce at one point may disappear in the very next optimizer calculation.

This catalogue lists more articles on the opt_estimate() hint and its relatives.

September 29, 2022

Case Study

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:27 pm BST Sep 29,2022

A recent question on the Oracle Developer Community forum asked for help with a statement that was taking a long time to run. The thread included the results from a trace file that had been passed through tkprof so we have the query and the actual execution plan with the Row Source Operation details.

Here’s the query – extracted from the tkprof output:

SELECT DISTINCT
       pll.po_line_id,
       ploc.line_location_id,
       (SELECT ptl.line_type
          FROM apps.po_line_types_tl ptl
         WHERE ptl.line_type_id = pll.line_type_id AND ptl.LANGUAGE = 'US')
           "Line_Type",
       ploc.quantity_accepted,
       NULL
           release_approved_date,
       NULL
           release_date,
       NULL
           release_hold_flag,
       NULL
           release_type,
       DECODE (ploc.po_release_id, NULL, NULL, ploc.quantity)
           released_quantity,
       (SELECT items.preprocessing_lead_time
          FROM apps.mtl_system_items_b items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id = ploc.SHIP_TO_ORGANIZATION_ID)
           "PreProcessing_LT",
       (SELECT items.full_lead_time
          FROM apps.mtl_system_items_b items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id = ploc.SHIP_TO_ORGANIZATION_ID)
           "Processing_LT",
       (SELECT items.postprocessing_lead_time
          FROM apps.mtl_system_items_b items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id = ploc.SHIP_TO_ORGANIZATION_ID)
           "PostProcessing_LT",
       ploc.firm_status_lookup_code,
       NVL (
           (SELECT pla.promised_date
              FROM apps.po_line_locations_archive_all pla
             WHERE     pla.po_header_id = pha.po_header_id
                   AND pla.po_line_id = pll.po_line_id
                   AND pla.line_location_id = ploc.line_location_id
                   AND pla.revision_num =
                       (SELECT MIN (revision_num)
                          FROM apps.po_line_locations_archive_all plla2
                         WHERE     plla2.promised_date IS NOT NULL
                               AND plla2.line_location_id =
                                   ploc.line_location_id)),
           ploc.promised_date)
           "Original_Promise_Date",
       (SELECT items.long_description
          FROM apps.mtl_system_items_tl items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id IN
                       (SELECT fin.inventory_organization_id
                          FROM apps.financials_system_params_all fin
                         WHERE fin.org_id = pha.org_id)
               AND items.LANGUAGE = 'US')
           "Item_Long_Description",
       NVL (ploc.approved_flag, 'N')
           approved_code,
       pvs.country
           "Supplier_Site_Country",
       pll.note_to_vendor,
         NVL (ploc.quantity, 0)
       - NVL (ploc.quantity_cancelled, 0)
       - NVL (ploc.quantity_received, 0) * ploc.price_override
           "Shipment_Amount",
       ploc.attribute4
           "PO_Ship_Date",
       (SELECT meaning
          FROM apps.fnd_lookup_values
         WHERE     lookup_type = 'SHIP_METHOD'
               AND lookup_code = ploc.attribute9
               AND language = 'US')
           "Ship_Method",
       (SELECT prla.note_to_receiver
          FROM apps.po_req_distributions_all  prda
               INNER JOIN apps.po_requisition_lines_all prla
                   ON prda.requisition_line_id = prla.requisition_line_id
         WHERE prda.distribution_id = pdi.req_distribution_id)
           "Note_To_Receiver",
       DECODE (pha.USER_HOLD_FLAG, 'Y', 'Y', pll.USER_HOLD_FLAG)
           "Hold_Flag",
       (SELECT ABC_CLASS_NAME
          FROM APPS.MTL_ABC_ASSIGNMENT_GROUPS  ASG
               INNER JOIN APPS.MTL_ABC_ASSIGNMENTS ASSI
                   ON ASG.ASSIGNMENT_GROUP_ID = ASSI.ASSIGNMENT_GROUP_ID
               INNER JOIN APPS.MTL_ABC_CLASSES classes
                   ON ASSI.ABC_CLASS_ID = classes.ABC_CLASS_ID
         WHERE     ASG.organization_id = ploc.SHIP_TO_ORGANIZATION_ID
               AND ASG.ASSIGNMENT_GROUP_NAME = 'MIN ABC Assignment'
               AND ASSI.inventory_item_id = pll.item_id)
           ABCClass,
       (SELECT CONCATENATED_SEGMENTS AS charge_accountsfrom
          FROM apps.gl_code_combinations_kfv gcc
         WHERE gcc.code_combination_id = pdi.code_combination_id)
           AS charge_accounts
  FROM apps.po_headers_all         pha,
       apps.po_lines_all           pll,
       apps.po_line_locations_all  ploc,
       apps.po_distributions_all   pdi,
       apps.per_all_people_f       papf,
       apps.AP_SUPPLIERS           pv,
       apps.AP_SUPPLIER_SITES_ALL  pvs,
       apps.AP_SUPPLIER_CONTACTS   pvc,
       apps.ap_terms               apt,
       apps.po_lookup_codes        plc1,
       apps.po_lookup_codes        plc2,
       apps.hr_locations           hlv_line_ship_to,
       apps.hr_locations           hlv_ship_to,
       apps.hr_locations           hlv_bill_to,
       apps.hr_organization_units  hou,
       apps.hr_locations_no_join   loc,
       apps.hr_locations_all_tl    hrl1,
       apps.hr_locations_all_tl    hrl2
 WHERE     1 = 1
       AND pll.po_header_id(+) = pha.po_header_id
       AND ploc.po_line_id(+) = pll.po_line_id
       AND pdi.line_location_id(+) = ploc.line_location_id
       AND ploc.shipment_type IN ('STANDARD', 'PLANNED')
       AND papf.person_id(+) = pha.agent_id
       AND TRUNC (SYSDATE) BETWEEN papf.effective_start_date
                               AND papf.effective_end_date
       AND papf.employee_number IS NOT NULL
       AND pv.vendor_id(+) = pha.vendor_id
       AND pvs.vendor_site_id(+) = pha.vendor_site_id
       AND pvc.vendor_contact_id(+) = pha.vendor_contact_id
       AND apt.term_id(+) = pha.terms_id
       AND plc1.lookup_code(+) = pha.fob_lookup_code
       AND plc1.lookup_type(+) = 'FOB'
       AND plc2.lookup_code(+) = pha.freight_terms_lookup_code
       AND plc2.lookup_type(+) = 'FREIGHT TERMS'
       AND hlv_line_ship_to.location_id(+) = ploc.ship_to_location_id
       AND hlv_ship_to.location_id(+) = pha.ship_to_location_id
       AND hlv_bill_to.location_id(+) = pha.bill_to_location_id
       AND hou.organization_id = pha.org_id
       AND hou.location_id = loc.location_id(+)
       AND hrl1.location_id(+) = pha.ship_to_location_id
       AND hrl1.LANGUAGE(+) = 'US'
       AND hrl2.location_id(+) = pha.bill_to_location_id
       AND hrl2.LANGUAGE(+) = 'US'
       AND hou.organization_id IN (2763)
       AND NVL (pha.closed_code, 'OPEN') IN ('OPEN', 'CLOSED')
       AND NVL (pll.closed_code, 'OPEN') IN ('OPEN', 'CLOSED')
       AND NVL (ploc.cancel_flag, 'N') = 'N'
       AND pha.authorization_status IN
               ('APPROVED', 'REQUIRES REAPPROVAL', 'IN PROCESS')

As you can see there are 10 inline scalar subqueries (highlighted) in the query with a select distinct to finish off the processing of an 18 table join. That’s a lot of scalar subqueries so it’s worth asking whether the code should be rewritten to use joins (though in newer versions of Oracle some of the subqueries might be transformed to outer joins anyway – but the OP is using 11.2.0.4). We also know that a distinct is sometimes a hint that the code has a logic error that has been “fixed” by eliminating duplicates.

Ignoring those points, let’s consider the execution plan from the tkprof output which (with a tiny bit of extra formatting) is as follows:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.46       1.75          0          3          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    50346    279.02    1059.39     179103   30146895          0      755164
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    50348    279.49    1061.14     179103   30146898          0      755164

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 678  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         9          9          9  TABLE ACCESS BY INDEX ROWID PO_LINE_TYPES_TL (cr=20 pr=0 pw=0 time=680 us cost=2 size=32 card=1)
         9          9          9   INDEX UNIQUE SCAN PO_LINE_TYPES_TL_U1 (cr=11 pr=0 pw=0 time=323 us cost=1 size=0 card=1)(object id 63682480)

    576365     576365     576365  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267756 pr=28 pw=0 time=22598079 us cost=4 size=13 card=1)
    576365     576365     576365   INDEX UNIQUE SCAN MTL_SYSTEM_ITEMS_B_U1 (cr=1720936 pr=0 pw=0 time=4644552 us cost=3 size=0 card=1)(object id 42812859)

    576365     576365     576365  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267747 pr=0 pw=0 time=2442479 us cost=4 size=13 card=1)
    576365     576365     576365   INDEX UNIQUE SCAN MTL_SYSTEM_ITEMS_B_U1 (cr=1720936 pr=0 pw=0 time=1238342 us cost=3 size=0 card=1)(object id 42812859)

    576365     576365     576365  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267743 pr=0 pw=0 time=2029190 us cost=4 size=14 card=1)
    576365     576365     576365   INDEX UNIQUE SCAN MTL_SYSTEM_ITEMS_B_U1 (cr=1720932 pr=0 pw=0 time=967729 us cost=3 size=0 card=1)(object id 42812859)

    672743     672743     672743  TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=5507736 pr=163043 pw=0 time=535914552 us cost=3 size=27 card=1)
    672743     672743     672743   INDEX UNIQUE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=4560824 pr=163043 pw=0 time=533161038 us cost=2 size=0 card=1)(object id 42811947)
    755121     755121     755121    SORT AGGREGATE (cr=3540960 pr=163043 pw=0 time=530079821 us)
   1040963    1040963    1040963     TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=3540960 pr=163043 pw=0 time=534243973 us cost=5 size=15 card=1)
   1776649    1776649    1776649      INDEX RANGE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=1123074 pr=6392 pw=0 time=37128373 us cost=3 size=0 card=2)(object id 42811947)

    587486     587486     587486  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_TL (cr=3436629 pr=3564 pw=0 time=64125044 us cost=5 size=34 card=1)
    587486     587486     587486   INDEX RANGE SCAN MTL_SYSTEM_ITEMS_TL_U1 (cr=2852930 pr=869 pw=0 time=45628505 us cost=4 size=0 card=1)(object id 136492495)
         1          1          1    TABLE ACCESS BY INDEX ROWID FINANCIALS_SYSTEM_PARAMS_ALL (cr=645351 pr=0 pw=0 time=5743158 us cost=2 size=10 card=1)
    322268     322268     322268     INDEX SKIP SCAN FINANCIALS_SYSTEM_PARAMS_U1 (cr=323083 pr=0 pw=0 time=5104895 us cost=1 size=0 card=1)(object id 42770563)

        10         10         10  TABLE ACCESS BY INDEX ROWID FND_LOOKUP_VALUES (cr=51 pr=1 pw=0 time=3620 us cost=5 size=60 card=1)
        20         20         20   INDEX RANGE SCAN FND_LOOKUP_VALUES_X99 (cr=31 pr=1 pw=0 time=2133 us cost=4 size=0 card=1)(object id 42759866)

    634276     634276     634276  NESTED LOOPS  (cr=3540930 pr=5535 pw=0 time=181518759 us cost=5 size=28 card=1)
    634276     634276     634276   TABLE ACCESS BY INDEX ROWID PO_REQ_DISTRIBUTIONS_ALL (cr=1631471 pr=5253 pw=0 time=65405333 us cost=3 size=12 card=1)
    634276     634276     634276    INDEX UNIQUE SCAN PO_REQ_DISTRIBUTIONS_U1 (cr=994522 pr=5252 pw=0 time=31023194 us cost=2 size=0 card=1)(object id 42788583)
    634276     634276     634276   TABLE ACCESS BY INDEX ROWID PO_REQUISITION_LINES_ALL (cr=1909459 pr=282 pw=0 time=115275921 us cost=2 size=16 card=1)
    634276     634276     634276    INDEX UNIQUE SCAN PO_REQUISITION_LINES_U1 (cr=944449 pr=268 pw=0 time=12285440 us cost=1 size=0 card=1)(object id 42789681)

    511989     511989     511989  NESTED LOOPS  (cr=3533763 pr=6 pw=0 time=8999321 us cost=5 size=55 card=1)
    511989     511989     511989   NESTED LOOPS  (cr=2850293 pr=6 pw=0 time=7086027 us cost=4 size=45 card=1)
    576055     576055     576055    TABLE ACCESS BY INDEX ROWID MTL_ABC_ASSIGNMENT_GROUPS (cr=612378 pr=0 pw=0 time=2002832 us cost=2 size=29 card=1)
    576055     576055     576055     INDEX UNIQUE SCAN MTL_ABC_ASSIGNMENT_GROUPS_U2 (cr=36323 pr=0 pw=0 time=951307 us cost=1 size=0 card=1)(object id 42783622)
    511989     511989     511989    TABLE ACCESS BY INDEX ROWID MTL_ABC_ASSIGNMENTS (cr=2237915 pr=6 pw=0 time=4672006 us cost=3 size=16 card=1)
    511989     511989     511989     INDEX UNIQUE SCAN MTL_ABC_ASSIGNMENTS_U1 (cr=1551490 pr=4 pw=0 time=2533524 us cost=2 size=0 card=1)(object id 42757737)
    511989     511989     511989   TABLE ACCESS BY INDEX ROWID MTL_ABC_CLASSES (cr=683470 pr=0 pw=0 time=1488045 us cost=1 size=10 card=1)
    511989     511989     511989    INDEX UNIQUE SCAN MTL_ABC_CLASSES_U1 (cr=171481 pr=0 pw=0 time=693745 us cost=0 size=0 card=1)(object id 42789694)

     13320      13320      13320  TABLE ACCESS BY INDEX ROWID GL_CODE_COMBINATIONS (cr=34801 pr=0 pw=0 time=802675 us cost=3 size=49 card=1)
     13320      13320      13320   INDEX UNIQUE SCAN GL_CODE_COMBINATIONS_U1 (cr=21481 pr=0 pw=0 time=397344 us cost=2 size=0 card=1)(object id 42775044)


    755164     755164     755164  HASH UNIQUE (cr=30147018 pr=179103 pw=0 time=1058922684 us cost=749257 size=197349453 card=482517)
    768890     768890     768890   HASH JOIN  (cr=7289842 pr=6926 pw=0 time=244582512 us cost=696202 size=197349453 card=482517)
    140451     140451     140451    TABLE ACCESS FULL PER_ALL_PEOPLE_F (cr=38207 pr=0 pw=0 time=313692 us cost=18484 size=13278261 card=428331)
    768890     768890     768890    NESTED LOOPS OUTER (cr=7251635 pr=6926 pw=0 time=242897348 us cost=672652 size=30016980 card=79410)
    755121     755121     755121     NESTED LOOPS OUTER (cr=5538283 pr=6031 pw=0 time=154841427 us cost=443987 size=28382903 card=78623)
    755121     755121     755121      NESTED LOOPS OUTER (cr=5508916 pr=6031 pw=0 time=153523676 us cost=443982 size=18184959 card=51809)
    755121     755121     755121       NESTED LOOPS OUTER (cr=5386279 pr=6031 pw=0 time=151985656 us cost=443978 size=11642422 card=34142)
    755121     755121     755121        NESTED LOOPS  (cr=5090949 pr=6031 pw=0 time=139220421 us cost=375644 size=11574138 card=34142)
    792959     792959     792959         NESTED LOOPS  (cr=1747964 pr=134 pw=0 time=64597738 us cost=109035 size=19934760 card=73560)
    254919     254919     254919          HASH JOIN OUTER (cr=315780 pr=6 pw=0 time=14811187 us cost=29121 size=5413350 card=22650)
    254919     254919     254919           NESTED LOOPS OUTER (cr=286919 pr=0 pw=0 time=12395919 us cost=13792 size=5209500 card=22650)
    254919     254919     254919            HASH JOIN RIGHT OUTER (cr=107134 pr=0 pw=0 time=12153146 us cost=13790 size=3868572 card=17426)
      3834       3834       3834             VIEW  HR_LOCATIONS (cr=3913 pr=0 pw=0 time=15826 us cost=125 size=360 card=60)
      3834       3834       3834              NESTED LOOPS  (cr=3913 pr=0 pw=0 time=15055 us cost=125 size=1080 card=60)
      3834       3834       3834               TABLE ACCESS FULL HR_LOCATIONS_ALL (cr=262 pr=0 pw=0 time=11211 us cost=125 size=304 card=38)
      3834       3834       3834               INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=3651 pr=0 pw=0 time=6183 us cost=0 size=20 card=2)(object id 42783719)
    254919     254919     254919             HASH JOIN RIGHT OUTER (cr=103221 pr=0 pw=0 time=11917174 us cost=13666 size=3764016 card=17426)
      3834       3834       3834              VIEW  HR_LOCATIONS (cr=3898 pr=0 pw=0 time=14651 us cost=125 size=360 card=60)
      3834       3834       3834               NESTED LOOPS  (cr=3898 pr=0 pw=0 time=14267 us cost=125 size=1080 card=60)
      3834       3834       3834                TABLE ACCESS FULL HR_LOCATIONS_ALL (cr=247 pr=0 pw=0 time=9532 us cost=125 size=304 card=38)
      3834       3834       3834                INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=3651 pr=0 pw=0 time=9539 us cost=0 size=20 card=2)(object id 42783719)
    254919     254919     254919              HASH JOIN RIGHT OUTER (cr=99323 pr=0 pw=0 time=11817243 us cost=13541 size=3659460 card=17426)
        45         45         45               INDEX RANGE SCAN FND_LOOKUP_VALUES_U1 (cr=21 pr=0 pw=0 time=614 us cost=4 size=49 card=1)(object id 63685210)
    254919     254919     254919               HASH JOIN RIGHT OUTER (cr=99302 pr=0 pw=0 time=11729251 us cost=13537 size=2805586 card=17426)
        59         59         59                INDEX RANGE SCAN FND_LOOKUP_VALUES_U1 (cr=20 pr=0 pw=0 time=445 us cost=4 size=49 card=1)(object id 63685210)
    254919     254919     254919                NESTED LOOPS  (cr=99282 pr=0 pw=0 time=11653162 us cost=13533 size=1951712 card=17426)
         1          1          1                 NESTED LOOPS OUTER (cr=116 pr=0 pw=0 time=113273 us cost=3 size=40 card=1)
         1          1          1                  NESTED LOOPS  (cr=113 pr=0 pw=0 time=113227 us cost=2 size=32 card=1)
         1          1          1                   INDEX UNIQUE SCAN HR_ALL_ORGANIZATION_UNTS_TL_PK (cr=110 pr=0 pw=0 time=113164 us cost=1 size=17 card=1)(object id 63680720)
         1          1          1                   TABLE ACCESS BY INDEX ROWID HR_ALL_ORGANIZATION_UNITS (cr=3 pr=0 pw=0 time=59 us cost=1 size=15 card=1)
         1          1          1                    INDEX UNIQUE SCAN HR_ORGANIZATION_UNITS_PK (cr=2 pr=0 pw=0 time=7 us cost=0 size=0 card=1)(object id 42789144)
         1          1          1                  TABLE ACCESS BY INDEX ROWID HR_LOCATIONS_ALL (cr=3 pr=0 pw=0 time=42 us cost=1 size=8 card=1)
         1          1          1                   INDEX UNIQUE SCAN HR_LOCATIONS_PK (cr=2 pr=0 pw=0 time=7 us cost=0 size=0 card=1)(object id 42797079)
    254919     254919     254919                 TABLE ACCESS BY INDEX ROWID PO_HEADERS_ALL (cr=99166 pr=0 pw=0 time=11505632 us cost=13530 size=1254672 card=17426)
    255397     255397     255397                  INDEX SKIP SCAN PO_HEADERS_ALL_X3 (cr=1753 pr=0 pw=0 time=725236 us cost=352 size=0 card=37674)(object id 42773719)
    254883     254883     254883            INDEX UNIQUE SCAN AP_TERMS_TL_U1 (cr=179785 pr=0 pw=0 time=183291 us cost=0 size=8 card=1)(object id 42798416)
    482528     482528     482528           TABLE ACCESS FULL AP_SUPPLIER_SITES_ALL (cr=28861 pr=6 pw=0 time=227983 us cost=13727 size=4323123 card=480347)
    792959     792959     792959          TABLE ACCESS BY INDEX ROWID PO_LINES_ALL (cr=1432184 pr=128 pw=0 time=53002963 us cost=5 size=96 card=3)
    793375     793375     793375           INDEX RANGE SCAN PO_LINES_U2 (cr=504726 pr=20 pw=0 time=17603112 us cost=2 size=0 card=5)(object id 42755253)
    755121     755121     755121         TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ALL (cr=3342985 pr=5897 pw=0 time=71357938 us cost=4 size=68 card=1)
   1138558    1138558    1138558          INDEX RANGE SCAN PO_LINE_LOCATIONS_N15 (cr=1707311 pr=5830 pw=0 time=37903421 us cost=3 size=0 card=2)(object id 63697005)
    723002     723002     723002        VIEW PUSHED PREDICATE  HR_LOCATIONS (cr=295330 pr=0 pw=0 time=11391536 us cost=2 size=2 card=1)
    723002     723002     723002         NESTED LOOPS  (cr=295330 pr=0 pw=0 time=11004720 us cost=2 size=18 card=1)
    723002     723002     723002          INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=146911 pr=0 pw=0 time=1391389 us cost=1 size=10 card=1)(object id 42783719)
    723002     723002     723002          TABLE ACCESS BY INDEX ROWID HR_LOCATIONS_ALL (cr=148419 pr=0 pw=0 time=9233363 us cost=1 size=8 card=1)
    723002     723002     723002           INDEX UNIQUE SCAN HR_LOCATIONS_PK (cr=117800 pr=0 pw=0 time=836734 us cost=0 size=0 card=1)(object id 42797079)
    755119     755119     755119       INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=122637 pr=0 pw=0 time=829404 us cost=0 size=20 card=2)(object id 42783719)
    755121     755121     755121      INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=29367 pr=0 pw=0 time=716408 us cost=0 size=20 card=2)(object id 42783719)
    768883     768883     768883     TABLE ACCESS BY INDEX ROWID PO_DISTRIBUTIONS_ALL (cr=1713352 pr=895 pw=0 time=75314769 us cost=3 size=17 card=1)
    768883     768883     768883      INDEX RANGE SCAN PO_DISTRIBUTIONS_N1 (cr=1096671 pr=874 pw=0 time=24392643 us cost=2 size=0 card=1)(object id 42782429)

The plan is a bit long, but you may recall that a query with scalar subqueries in the select list reports the plans for each of the separate scalar subqueries before reporting the main query block – and I’ve inserted blank lines in the output above to improve the visibility of the individual blocks / scalar subqueries.

An odd little detail of this tkprof output was that there was no report of the wait information recorded against the query, though the following information appeared as the summary for the trace file, giving us a very good idea of the wait events for the individual query:

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        6      0.85       2.14          0          6          0           0
Execute      6      0.00       0.00          0          7        104          85
Fetch    50358    279.03    1059.39     179103   30146895          0      755329
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    50370    279.88    1061.54     179103   30146908        104      755414

Misses in library cache during parse: 3

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   50363        0.00          0.00
  SQL*Net message from client                 50362      157.17        227.70
  row cache lock                                141        0.03          0.67
  library cache lock                             77        0.01          0.21
  library cache pin                              75        0.01          0.27
  Disk file operations I/O                      791        0.00          0.01
  gc current block 3-way                     835881        0.15        305.35
  gc current block 2-way                     471360        0.24        144.04
  KJC: Wait for msg sends to complete            40        0.00          0.00
  gc cr multi block request                       8        0.00          0.00
  gc current block congested                  10014        0.03          4.23
  gc cr block 3-way                           20215        0.06          4.69
  gc current grant busy                          20        0.00          0.00
  gc cr grant 2-way                          165010        0.07         25.13
  db file sequential read                    179103        0.05        196.31
  gc cr grant congested                         729        0.19          0.36
  gc current block busy                       71431        0.05        118.15
  gc cr block 2-way                            1800        0.01          0.31
  latch free                                      3        0.00          0.00
  gc cr block congested                         197        0.01          0.06
  latch: cache buffers chains                    45        0.00          0.00
  latch: gc element                              15        0.00          0.00
  gc cr block busy                               15        0.02          0.07
  latch: object queue header operation            1        0.00          0.00
  KSV master wait                                 2        0.00          0.00
  ASM file metadata operation                     1        0.00          0.00
  SQL*Net more data to client                     1        0.00          0.00
  gc current grant 2-way                          6        0.00          0.00

An important initial observation is that the query returned 750,000 rows in 50,000 fetches (all figures rounded for convenience) and that’s consistent with the SQL*Plus default arraysize of 15. So there might be a little time saved by setting the arraysize to a larger value (but only a few tens of seconds, based on the 227 seconds total minus the 157 second maximum wait reported for “SQL*Net message from client”), and there may be some benefit in increasing the SQL*Net SDU_SIZE at the same time. Critically, though, we should ask “why do you want a query to return 750,000 rows?”, and “how fast do you think is ‘reasonable’?” You’ll also note from the “gc” waits that the system is based on RAC with at least 3 nodes – and RAC is always a suspect when you see unexpected time spent in a query.
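The arithmetic behind the arraysize observation can be checked with a few lines of Python. This is only a sketch: the figures are copied from the tkprof output above (and the wait figures cover the whole trace file, not just this query), while the candidate arraysize of 500 is an arbitrary illustrative value rather than a recommendation from the thread.

```python
import math

# Figures copied from the tkprof Fetch line and the wait event summary.
rows_returned = 755164
fetch_calls = 50346
msg_from_client_total = 227.70   # seconds, "SQL*Net message from client"
msg_from_client_max = 157.17     # seconds, longest single wait

# Average rows per fetch call: close to the SQL*Plus default arraysize of 15.
rows_per_fetch = rows_returned / fetch_calls

# Client wait time left after discounting the single longest wait
# (assumed here to be idle time rather than part of the fetch traffic).
client_wait_excluding_max = msg_from_client_total - msg_from_client_max

# Hypothetical larger arraysize, chosen only for illustration.
candidate_arraysize = 500
fetches_at_candidate = math.ceil(rows_returned / candidate_arraysize)

print(f"rows per fetch       : {rows_per_fetch:.2f}")
print(f"client wait less max : {client_wait_excluding_max:.2f} s")
print(f"fetches at arraysize {candidate_arraysize}: {fetches_at_candidate}")
```

The roughly 70 seconds left over is an upper bound on what a larger arraysize could save, since some of that time would still be spent on the remaining round trips.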

Where does most of the time go between the last hash join of the driving query block (line 62) and the hash unique (line 61)? It’s in the query block whose plan starts at line 28, where we see 163,000 physical blocks read (pr=) and 535 seconds (time= in microseconds). Of those blocks 6,400 come from the index range scan operation at line 32, but most come from line 31 fetching 1 million rows (by index rowid) from table po_line_locations_archive_all.

    672743     672743     672743  TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=5507736 pr=163043 pw=0 time=535914552 us cost=3 size=27 card=1)
    672743     672743     672743   INDEX UNIQUE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=4560824 pr=163043 pw=0 time=533161038 us cost=2 size=0 card=1)(object id 42811947)
    755121     755121     755121    SORT AGGREGATE (cr=3540960 pr=163043 pw=0 time=530079821 us)
   1040963    1040963    1040963     TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=3540960 pr=163043 pw=0 time=534243973 us cost=5 size=15 card=1)
   1776649    1776649    1776649      INDEX RANGE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=1123074 pr=6392 pw=0 time=37128373 us cost=3 size=0 card=2)(object id 42811947)
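A quick sketch of how the physical reads in this block break down, using only the figures in the extract above and the "disk" total from the tkprof Fetch line. It relies on the fact that the pr= figure on a plan line includes the reads done by its child operations.

```python
# Figures copied from the plan extract above and the tkprof Fetch line.
total_disk_reads = 179103        # "disk" column for the whole query
subquery_reads = 163043          # pr= on the top line of the subquery block
index_range_scan_reads = 6392    # pr= on the index range scan line

# pr= is cumulative up the plan, so the table access by rowid accounts for
# whatever its child index range scan did not read.
table_access_reads = subquery_reads - index_range_scan_reads

# Fraction of all the query's physical reads done inside this one subquery.
share_of_query = subquery_reads / total_disk_reads

print(f"table access reads : {table_access_reads}")
print(f"share of all reads : {share_of_query:.1%}")
```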

This part of the workload comes from 672,743 executions of the subquery starting at line 36 of the original query text:

           (SELECT pla.promised_date
              FROM apps.po_line_locations_archive_all pla
             WHERE     pla.po_header_id = pha.po_header_id
                   AND pla.po_line_id = pll.po_line_id
                   AND pla.line_location_id = ploc.line_location_id
                   AND pla.revision_num =
                       (SELECT MIN (revision_num)
                          FROM apps.po_line_locations_archive_all plla2
                         WHERE     plla2.promised_date IS NOT NULL
                               AND plla2.line_location_id =
                                   ploc.line_location_id))

If we want to improve the performance of this query with a minimum of re-engineering, recoding and risk then a good point to start would be to examine this query block in isolation and see if there is a simple, low-cost way of improving its efficiency. (Note: this may not be a route to optimising the whole query “properly”, but it may give a quick win that is “good enough”.)

We could go a little further down this route of optimising the scalar subqueries by looking at the time spent in each of them in turn. Taking out the top line of each of the separate sections of the plan and extracting just the pr, pw and time values (which I’ll scale back from microseconds to seconds) we get the following:

pr=      0      pw=0    time=   0
pr=     28      pw=0    time=  23
pr=      0      pw=0    time=   2
pr=      0      pw=0    time=   2
pr= 163043      pw=0    time= 536
pr=   3564      pw=0    time=  64
pr=      1      pw=0    time=   0
pr=   5535      pw=0    time= 182
pr=      6      pw=0    time=   9
pr=      0      pw=0    time=   1
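The table above can be reproduced mechanically. The following Python sketch extracts the pr, pw and time values from the top line of each scalar subquery block; the lines are copied from the tkprof output with the row counts and optimizer estimates trimmed, and the helper name `summarise` is just an illustrative choice.

```python
import re

# Top line of each scalar subquery block, copied from the tkprof output
# (row counts and optimizer estimates trimmed for brevity).
top_lines = [
    "TABLE ACCESS BY INDEX ROWID PO_LINE_TYPES_TL (cr=20 pr=0 pw=0 time=680 us)",
    "TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267756 pr=28 pw=0 time=22598079 us)",
    "TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267747 pr=0 pw=0 time=2442479 us)",
    "TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267743 pr=0 pw=0 time=2029190 us)",
    "TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=5507736 pr=163043 pw=0 time=535914552 us)",
    "TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_TL (cr=3436629 pr=3564 pw=0 time=64125044 us)",
    "TABLE ACCESS BY INDEX ROWID FND_LOOKUP_VALUES (cr=51 pr=1 pw=0 time=3620 us)",
    "NESTED LOOPS (cr=3540930 pr=5535 pw=0 time=181518759 us)",
    "NESTED LOOPS (cr=3533763 pr=6 pw=0 time=8999321 us)",
    "TABLE ACCESS BY INDEX ROWID GL_CODE_COMBINATIONS (cr=34801 pr=0 pw=0 time=802675 us)",
]

STATS = re.compile(r"pr=(\d+) pw=(\d+) time=(\d+) us")


def summarise(line):
    """Return (pr, pw, seconds) with the time rounded to whole seconds."""
    pr, pw, time_us = map(int, STATS.search(line).groups())
    return pr, pw, round(time_us / 1_000_000)


summary = [summarise(line) for line in top_lines]
for pr, pw, secs in summary:
    print(f"pr={pr:7d}      pw={pw}    time={secs:4d}")

# Sum of the rounded per-subquery times.
total_seconds = sum(secs for _, _, secs in summary)
print("total seconds:", total_seconds)
```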

The 8th scalar subquery (line 42 in the plan, line 75 in the query) gives us an opportunity to reduce the run time by 182 seconds, so might be worth a little investment in programmer time.

The 6th subquery (line 34 in the plan, line 49 in the query) adds only 64 seconds to the run time, so we might be less inclined to do anything about it.

You might note that the 2nd, 3rd and 4th subqueries are against the same table with the same predicate to get three different columns – this group is the “obvious” choice for recoding as a single join rather than three separate subqueries, but if you look at the total times of the three subqueries the “extra” two executions add only two seconds each to the total time – so although this scalar subquery coding pattern is undesirable, it’s not necessarily going to be worth expending the effort to rewrite it in this case.

If you’re wondering, by the way, why different subqueries are reporting different numbers of rows returned (and each one should return at most one row on each execution), there are two reasons for any subquery to be reporting fewer than the 768,890 rows reported by the basic driving hash join:

  1. an execution may simply return no rows,
  2. there may be some benefits from scalar subquery caching.

One of the nice details about newer versions of Oracle is that the “starts” statistic is also reported in the trace/tkprof output so you would be able to see how much your query had benefited from scalar subquery caching.
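As a rough illustration of how both effects reduce the row counts reported for a subquery, here is a minimal Python sketch. It models the cache as an unbounded dictionary, which is a simplifying assumption and not a description of how Oracle implements scalar subquery caching. The function names, the `stats` counters and the sample data are all hypothetical.

```python
# Invented sample data standing in for a small lookup table.
line_types = {1: "Goods", 2: "Services"}

# Counters modelled loosely on what a plan would report for the subquery.
stats = {"starts": 0, "rows": 0}


def lookup_line_type(line_type_id):
    """Stand-in for one real execution of a scalar subquery."""
    stats["starts"] += 1
    value = line_types.get(line_type_id)   # None models "no row returned"
    if value is not None:
        stats["rows"] += 1
    return value


cache = {}


def cached_lookup(line_type_id):
    # Only execute the "subquery" when this input has not been seen before.
    if line_type_id not in cache:
        cache[line_type_id] = lookup_line_type(line_type_id)
    return cache[line_type_id]


# Many driving rows, few distinct inputs; id 3 has no matching row.
driving_rows = [1, 2, 1, 1, 2, 3, 1, 2, 3, 1]
results = [cached_lookup(r) for r in driving_rows]

print("driving rows :", len(driving_rows))
print("starts       :", stats["starts"])
print("rows reported:", stats["rows"])
```

Ten driving rows lead to only three executions, one of which returns no row, so the "subquery" reports two rows. The same kind of gap appears between the 768,890 driving rows and the 9 rows reported for the first scalar subquery.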

If we add together the time reported by each of the scalar subquery sections of the plan the total time reported is approximately 819 seconds. Cross-checking with the difference in the times reported for operations 61 and 62 (hash unique of hash join) we see: 1,059 seconds – 245 seconds = 814 seconds. This is a good match (allowing for the accumulation of a large number of small errors) for the 819 seconds reported in the subqueries – so the hash unique isn’t a significant part of the query even though it has virtually no effect on the volume of data. You’ll note that it didn’t spill to disc (pw = 0) but completed in memory.
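The cross-check can be written out explicitly. This sketch uses the time= values (in microseconds) from the HASH UNIQUE and HASH JOIN lines of the plan and the whole-second subquery times from the table shown earlier; nothing here is new data.

```python
# time= values in microseconds, copied from the plan above.
hash_unique_us = 1058922684   # HASH UNIQUE, top of the main query block
hash_join_us = 244582512      # HASH JOIN, its child in the main query block

# Whole-second times of the ten scalar subquery blocks, as tabulated earlier.
subquery_seconds = [0, 23, 2, 2, 536, 64, 0, 182, 9, 1]

# Time accumulated between the hash join and the hash unique.
gap_seconds = (hash_unique_us - hash_join_us) / 1_000_000
subquery_total = sum(subquery_seconds)

print(f"hash unique - hash join : {gap_seconds:.0f} s")
print(f"sum of subquery times   : {subquery_total} s")
print(f"difference              : {subquery_total - gap_seconds:.1f} s")
```

The gap of a few seconds is small compared with totals of more than 800 seconds, which is what you would expect from rounding and from accumulating many small timing errors.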

Summary

I’ve written a quick note on this query because the coding style was undesirable and the execution plan quite lengthy. I’ve reviewed how the style of the SQL is echoed in the shape of the plan. I’ve then pursued the idea of optimising the code “piece-wise” to see if there were any opportunities for improving the performance “enough” without going through the effort of a complete redesign of the query. [Update: One of the participants in the thread is currently walking through the mechanics of manually unnesting the most expensive scalar subquery into an outer join.]

Given the information in the Row Source Operation section of the tkprof output it proved easy to identify where the largest amounts of time appeared that might be reduced by localised optimisation.

In passing I pointed out the possibility of reducing the time spent on network traffic by increasing the array fetch size, and increasing the SDU_SIZE (at both ends of the connection) for the SQL*Net messages to client.

Footnote (addendum)

I made a passing reference to the waits that told us that the user was running RAC. These waits merit a few follow-up comments.

The numbers for “gc” waits are high. Of particular note are the 71,000 waits and 118 seconds waited on “gc current block busy” which wave a big red flag telling us that there’s too much DML modifying the same object(s) from multiple nodes at the same time. (The even larger numbers for the “gc current block 2/3-way” say the same, but “busy” really emphasises the “hot-spot” aspect of the problem.)

Ideally we would like to see exactly where in the execution plan the bulk of those waits is occurring and, since the OP has been able to supply a trace file for the query, it’s possible that the query can be re-run to produce the SQL Monitor report (if the OP is suitably licensed) that summarises the Active Session History (ASH) for each line of the plan.
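A minimal sketch of pulling the SQL Monitor report as text from SQL*Plus, assuming the relevant licences are in place and that the statement was actually monitored (long-running statements usually are, otherwise the monitor hint can be added). The substitution variable m_sql_id is a placeholder for the real SQL_ID:

```sql
set long 1000000
set longchunksize 1000000
set linesize 250
set pagesize 0

select
        dbms_sqltune.report_sql_monitor(
                sql_id       => '&m_sql_id',
                type         => 'TEXT',
                report_level => 'ALL'
        )
from
        dual
;
```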

If the ASH data were available for a run of the report we could then do some analysis of parameter values recorded in v$active_session_history to see if that supplied further information. Unfortunately the view v$event_name doesn’t tell us what the parameter values mean for most of the “gc current%” waits, but a couple of the ones that do have descriptions report parameter1 as the file# and parameter2 as the block#, so maybe that’s true for many of them. (For some of the waits parameter1 is listed as the “le” (lock element), which doesn’t really help very much.)
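The following is only a sketch of the sort of queries that paragraph has in mind. The first lists whatever parameter descriptions v$event_name offers for the “gc current%” waits; the second counts ASH samples by plan line and event for one statement. The substitution variable m_sql_id is a placeholder, and querying v$active_session_history requires the Diagnostic Pack licence:

```sql
select
        name, parameter1, parameter2, parameter3
from
        v$event_name
where
        name like 'gc current%'
order by
        name
;

select
        sql_plan_line_id, event, count(*) samples
from
        v$active_session_history
where
        sql_id = '&m_sql_id'
and     event like 'gc current%'
group by
        sql_plan_line_id, event
order by
        count(*) desc
;
```

If the p1/p2 values do turn out to be file# and block# for the events of interest, adding them to the select list and group by would be the obvious next step in looking for hot blocks.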

Another diagnostic that could be very helpful is to take a snapshot of the session activity stats (v$sesstat) for the session as this might tell us that part of the load comes from “unexpected” work going on. In particular if we do an analysis of the “db file sequential read” waits we may find that many of the waits are for blocks in the undo tablespace, which would prompt us to examine the session stats to see what they tell us through the “% – undo records applied” statistics.
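A minimal sketch of that check, assuming you know the SID of the session running the query (m_sid is a placeholder). The figures are cumulative for the session, so the query would need to be run before and after the report and the differences taken:

```sql
select
        sn.name, ss.value
from
        v$statname      sn,
        v$sesstat       ss
where
        ss.statistic# = sn.statistic#
and     ss.sid        = &m_sid
and     sn.name like '%undo records applied'
order by
        sn.name
;
```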

As a final comment – linking the “gc” activity back to my comment “That’s a lot of scalar subqueries so it’s worth asking whether the code should be rewritten to use joins” – if you add an extra table to a query with a simple join condition to add columns from that table to the query select list then Oracle can pin various index blocks; if you replace the join with a scalar subquery (which means you’re after just one column from one row each time) then Oracle has to pin and release all the index blocks on each call to the subquery. The benefit of the subquery approach is that scalar subquery caching may mean the subquery is rarely executed (check the highly suggestive stats in the plan for the first and seventh scalar subquery blocks – lines 16 and 39); the downside to the subquery approach is that you may end up spending a lot more time in buffer cache handling which, for RAC, includes the global cache (gc) management.

July 26, 2022

Hinting

Filed under: Execution plans,Hints,Oracle,Upgrades — Jonathan Lewis @ 1:05 pm BST Jul 26,2022

This is just a lightweight note on the risks of hinting (which might also apply occasionally to SQL Plan Baselines). I’ve just rediscovered a little script I wrote (or possibly last tested/edited) in 2007 with a solution to the problem of how to structure a query to use an “index fast full scan” (index_ffs) followed by a “table access by rowid” – a path that is not available to the optimizer for select statements (even when hinted) though it became available (sometimes inappropriately) for deletes and updates in 12c.

It’s possible that this method was something I designed for a client using 9i, but the code still behaves as expected in 11.1.0.7. Here’s the setup and query:

rem
rem     Script:         wildcard.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2007
rem
rem     Last tested
rem             11.1.0.7
rem

create table t1
as
select
        cast(dbms_random.string('a',8) as varchar2(8))  str,
        rpad('x',100)                                   padding
from
        all_objects
where
        rownum <= 10000
;

alter table t1 modify str not null;
create index t1_i1 on t1(str);

begin
        dbms_stats.gather_table_stats(
                user, 't1', 
                cascade => true,
                method_opt => 'for all columns size 1'
        );
end;
/

explain plan for
select  
        /*+ 
                qb_name(main) 
                unnest(@subq1)
                leading(@sel$2fc9d0fe t1@subq1 t1@main)
                index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))
                use_nl(@sel$2fc9d0fe t1@main)
                rowid(@sel$2fc9d0fe t1@main)
        */
        * 
from    t1 
where   rowid in (
                select  /*+ qb_name(subq1) */
                        rowid 
                from    t1 
                where   upper(str) like '%CHD%'
)
;

select * from table(dbms_xplan.display(format=>'outline alias'));

As you can see, I’ve got an IN subquery (query block subq1) to generate a list of rowids from the table for the rows that match my predicate and then my main query (query block main) selects the rows identified by that list.

I’ve added hints to the main query block to unnest the subquery (which will result in a new query block appearing) then do a nested loop from the t1 referenced in subq1 (t1@subq1) to the t1 referenced in main (t1@main), starting with an index fast full scan of t1@subq1 and accessing t1@main by rowid.

The unnest hint was actually redundant – unnesting happened automatically and uncosted. You’ll notice all the other hints are directed at a query block called sel$2fc9d0fe which is the resulting query block name when subq1 is unnested into main.

Here’s the resulting execution plan showing, amongst other details in the Outline Data, that this really was running on 11.1.0.7:

Plan hash value: 1953350015

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500 | 65500 |   509   (0)| 00:00:07 |
|   1 |  NESTED LOOPS               |       |   500 | 65500 |   509   (0)| 00:00:07 |
|*  2 |   INDEX FAST FULL SCAN      | T1_I1 |   500 | 10500 |     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY USER ROWID| T1    |     1 |   110 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$2FC9D0FE
   2 - SEL$2FC9D0FE / T1@SUBQ1
   3 - SEL$2FC9D0FE / T1@MAIN

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$2FC9D0FE" "T1"@"MAIN")
      LEADING(@"SEL$2FC9D0FE" "T1"@"SUBQ1" "T1"@"MAIN")
      ROWID(@"SEL$2FC9D0FE" "T1"@"MAIN")
      INDEX_FFS(@"SEL$2FC9D0FE" "T1"@"SUBQ1" ("T1"."STR"))
      OUTLINE(@"SUBQ1")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ1")
      OUTLINE_LEAF(@"SEL$2FC9D0FE")
      ALL_ROWS
      DB_VERSION('11.1.0.7')
      OPTIMIZER_FEATURES_ENABLE('11.1.0.7')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(UPPER("STR") LIKE '%CHD%')

Running the test under 19.11.0.0 (and adding the hint_report option to the dbms_xplan format) this is the resulting plan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 | 55000 |    47   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   500 | 55000 |    47   (0)| 00:00:01 |
--------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$48592A03 / T1@MAIN

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$48592A03" "T1"@"MAIN")
      OUTLINE(@"SUBQ1")
      OUTLINE(@"MAIN")
      ELIMINATE_SQ(@"SUBQ1")
      OUTLINE_LEAF(@"SEL$48592A03")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("T1"."STR") LIKE '%CHD%')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 5 (U - Unused (1), N - Unresolved (4))
---------------------------------------------------------------------------
   0 -  SEL$2FC9D0FE
         N -  index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))
         N -  leading(@sel$2fc9d0fe t1@subq1 t1@main)
         N -  rowid(@sel$2fc9d0fe t1@main)
         N -  use_nl(@sel$2fc9d0fe t1@main)

   0 -  SUBQ1
         U -  unnest(@subq1)

Clearly the plan has changed – but the hint report says that Oracle has NOT ignored my hints; instead it tells us that they cannot be resolved. If we check the Query Block Name / Object Alias list and the Outline Data we see why: there is no query block named @sel$2fc9d0fe and the reason it doesn’t exist is that the optimizer has applied a previously non-existent transformation ‘eliminate_sq’ (which appeared in 12c) to subq1.

So, on the upgrade from 11.1.0.7 to 19.11.0.0 an SQL Plan Baseline that forced the path we wanted would no longer work (though it might be reported as “applied”) because there is a new transformation that we had (necessarily) not been blocking.

The solution is easy: add the hint no_eliminate_sq(@subq1) to our query and try again.

We still get the full tablescan even though the hint report tells us that the optimizer used the new hint. Here’s the new Outline Data, and the Hint Report showing that the hint was used.

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$8C456B9A" "T1"@"SUBQ1")
      OUTLINE(@"SUBQ1")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ1")
      OUTLINE(@"SEL$2FC9D0FE")
      ELIMINATE_JOIN(@"SEL$2FC9D0FE" "T1"@"MAIN")
      OUTLINE_LEAF(@"SEL$8C456B9A")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("STR") LIKE '%CHD%')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 7 (U - Unused (4))
---------------------------------------------------------------------------
   0 -  SUBQ1
           -  no_eliminate_sq(@subq1)
           -  qb_name(subq1)

   1 -  SEL$8C456B9A
         U -  leading(@sel$2fc9d0fe t1@subq1 t1@main)
           -  qb_name(main)

   1 -  SEL$8C456B9A / T1@MAIN
         U -  rowid(@sel$2fc9d0fe t1@main)
         U -  use_nl(@sel$2fc9d0fe t1@main)

   1 -  SEL$8C456B9A / T1@SUBQ1
         U -  index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))

But now the Outline Data is showing us a new hint – eliminate_join(@sel$2fc9d0fe t1@main). So we’re not losing the subquery, but we’ve lost the join thanks to a transformation that was actually available in 10.2 but presumably couldn’t be applied to our code pattern until at least 12.1. So let’s try again adding in another blocking hint no_eliminate_join(@sel$2fc9d0fe t1@main).

We still get the full tablescan – and this time the Outline Data tells us that the problem hint is now eliminate_join(@sel$2fc9d0fe t1@subq1) – which we might have anticipated, and now address by adding no_eliminate_join(@sel$2fc9d0fe t1@subq1) to the query and having one more go. This finally gets us back to the path that we had previously seen in 11.1.0.7.
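Assembled from the steps above, the final statement would look something like the following. This is a sketch: the text doesn’t say exactly where the three blocking hints were placed, so putting them in the main hint block (which works because each one names its query block explicitly) is an assumption.

```sql
explain plan for
select
        /*+
                qb_name(main)
                unnest(@subq1)
                no_eliminate_sq(@subq1)
                no_eliminate_join(@sel$2fc9d0fe t1@main)
                no_eliminate_join(@sel$2fc9d0fe t1@subq1)
                leading(@sel$2fc9d0fe t1@subq1 t1@main)
                index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))
                use_nl(@sel$2fc9d0fe t1@main)
                rowid(@sel$2fc9d0fe t1@main)
        */
        *
from    t1
where   rowid in (
                select  /*+ qb_name(subq1) */
                        rowid
                from    t1
                where   upper(str) like '%CHD%'
)
;

select * from table(dbms_xplan.display(format=>'outline alias hint_report'));
```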

(In fact, adding the hint optimizer_features_enable(‘11.1.0.’) to the original set of hints would have been enough – in this case, at least – to get the original execution plan.)

Summary

This note is just another simple demonstration that hints do not guarantee plan stability across upgrades – and that it can take a few experimental steps to discover what’s new in the optimizer that is making your previous set of hints ineffective.

Typically the problem will be the availability of new transformations (or enhancements to existing transformations) which manage to invalidate the old hints before the optimizer has had a chance to consider them. This is (to some extent) why a SQL Plan Baseline always captures the value of optimizer_features_enable() as part of the baseline.
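One possible shortcut when hunting for new transformations is the view v$sql_hint, which lists each hint with its inverse and the version in which it was introduced. A sketch for the two hints that appeared in this note:

```sql
select
        name, inverse, version, version_outline
from
        v$sql_hint
where
        name in ('ELIMINATE_SQ', 'ELIMINATE_JOIN')
order by
        name
;
```

The version column is a character string, so a predicate such as “version like '12%'” is a safer way of listing the hints introduced in a given release than a range comparison. Bear in mind that an old transformation can be enhanced without any new hint appearing, as the eliminate_join example shows.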

July 22, 2022

Trim CPU

Filed under: Execution plans,Hash Join,Joins,Oracle,Performance,Problem Solving — Jonathan Lewis @ 6:56 am BST Jul 22,2022

Prompted by an unexpectedly high CPU usage on a hash join of two large data sets Stefan Koehler posted a poll on Twitter recently asking for opinions on the following code fragment:

FROM
    TAB1
INNER JOIN TAB2 ON
    TAB1.COL1 = TAB2.COL1
AND TRIM(TAB1.COL3) > TRIM(TAB2.COL3)

While I struggle to imagine a realistic business requirement for the second predicate and think it’s indicative of a bad data model, I think it is nevertheless quite instructive to use the example to show how a hash join can use a lot of CPU if the join includes a predicate that isn’t on equality.

Trivia

Before examining the potential for wasting CPU, I’ll just point out two problems with using the trim() function in this way – because (while I hope that col3 is a character string in both tables) I’ve seen code that uses “to_date(to_char(date_column))” instead of trunc(date_column):

Cut-n-paste from SQL*Plus:

SQL> select 1 from dual where trim(100) > trim(20);

no rows selected

==================================================================

SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

SQL> select d1, d2 from t2 where trim(d1) > trim(d2);

20-jul-2022 15:24:46 19-aug-2022 15:26:44

1 row selected.

SQL> alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';

SQL> select d1, d2 from t2 where trim(d1) > trim(d2);

no rows selected

The trim() function converts numerics and dates to strings using the default format for the session before the comparison takes place, so not only can you get unexpected (i.e. wrong) results, two users can get contradictory results from the same data at the same time because they’ve specified different session defaults!
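The date example can be reproduced without the table t2 by making the implicit conversion explicit. This is a sketch using date literals that match the two (date-only) values shown above, with to_char() standing in for the conversion that trim() does using the session’s nls_date_format:

```sql
rem     String comparison: '20-...' sorts after '19-...', so this returns a row
select  'dd-mon-yyyy' fmt
from    dual
where   to_char(date '2022-07-20', 'dd-mon-yyyy') > to_char(date '2022-08-19', 'dd-mon-yyyy')
;

rem     Same dates, different mask: '2022-07...' sorts before '2022-08...', so no rows
select  'yyyy-mm-dd' fmt
from    dual
where   to_char(date '2022-07-20', 'yyyy-mm-dd') > to_char(date '2022-08-19', 'yyyy-mm-dd')
;

rem     Comparing the dates as dates: no rows, whatever the session settings
select  'date' fmt
from    dual
where   date '2022-07-20' > date '2022-08-19'
;
```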

The CPU issue

The critical point that everyone should remember is this: hash joins can only operate on equality (though, to avoid ambiguity, one should point out that “equality” does also mean “not equals”, which is why hash anti-joins can be efficient).

This means that even though the clause “where tab1.col1 = tab2.col1 and tab1.col3 > tab2.col3” might specify the matching rows for an individual tab1 row with high precision and great efficiency for a nested loop join with the right index, a hash join has a completely different workload. Every tab1 row has to have its col3 compared with every tab2 row that matches on col1. The secondary tests multiply up to “n-squared”, and if any col1 value is highly repetitive then the work done on checking col3 becomes excessive.

It’s easier to see this in a worked example, so here’s some sample data:

rem
rem     Script:         trim_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table tab1 as select * from all_Objects where owner != 'PUBLIC' and object_type != 'SYNONYM' and rownum <= 200;

create table tab2 as select * from all_Objects where owner != 'PUBLIC' and object_type != 'SYNONYM';

On a new pdb in 19.11 and 21.3 the second statement gave me roughly 46,000 rows. Checking owners and row counts I got the following results:

SQL> select owner, count(*) from tab1 group by owner;

OWNER                      COUNT(*)
------------------------ ----------
SYS                             128
SYSTEM                           65
OUTLN                             7

SQL> select owner, count(*) from tab2 group by owner;

OWNER                      COUNT(*)
------------------------ ----------
SYS                           40104
SYSTEM                          417
OUTLN                             7

... plus about 17 rows aggregating 6,000 rows

And here’s the query (indicating 4 variations) that I’m going to use to demonstrate the CPU issue, followed by its execution plan and rowsource_execution_statistics:

set serveroutput off
alter session set statistics_level = all;

select
        count(*)
from
        tab1
inner join 
        tab2 
on
        tab1.owner = tab2.owner
-- and  trim(tab1.object_name)  > trim(tab2.object_name)
-- and  rtrim(tab1.object_name) > rtrim(tab2.object_name)
-- and  ltrim(tab1.object_name) > ltrim(tab2.object_name)
and     tab1.object_name > tab2.object_name
;

select * from table(dbms_xplan.display_cursor(format=>'projection allstats last'));

SQL_ID  74m49y5av3mpg, child number 0
-------------------------------------
select  count(*) from  tab1 inner join  tab2 on  tab1.owner =
tab2.owner -- and trim(tab1.object_name)  > trim(tab2.object_name) -- and rtrim(tab1.object_name) > rtrim(tab2.object_name) 
-- and ltrim(tab1.object_name) > ltrim(tab2.object_name) and tab1.object_name > tab2.object_name

Plan hash value: 2043035240

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.39 |     942 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.39 |     942 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    101K|    329K|00:00:00.39 |     942 |  1335K|  1335K|  814K (0)|
|   3 |    TABLE ACCESS FULL| TAB1 |      1 |    200 |    200 |00:00:00.01 |       5 |       |       |          |
|   4 |    TABLE ACCESS FULL| TAB2 |      1 |  46014 |  46014 |00:00:00.01 |     937 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("TAB1"."OWNER"="TAB2"."OWNER")
       filter("TAB1"."OBJECT_NAME">"TAB2"."OBJECT_NAME")

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]
   2 - (#keys=1; rowset=407)
   3 - (rowset=256) "TAB1"."OWNER"[VARCHAR2,128], "TAB1"."OBJECT_NAME"[VARCHAR2,128]
   4 - (rowset=256) "TAB2"."OWNER"[VARCHAR2,128], "TAB2"."OBJECT_NAME"[VARCHAR2,128]

Comparing the basic columns, the CPU time recorded at the Hash Join operation was 0.39 seconds, of which only a tiny amount was in the feeding tablescans. There are two things to note from the plan.

First is confirmation of my comments about the join having to be an equality and the inequality being applied later. You can see this in the Predicate Information in the way the user’s predicate list has been split at operation 2 into an access() predicate and a filter() predicate. The access predicate finds the right hash bucket and row(s) within bucket – the filter predicate is applied as a secondary test.

The second point to note is that the Column Projection Information shows us that the basic column values are passed up to the Hash Join, which tells us that the hash join operation has to do the trimming. The big question at that point is – how many times does the same value from the same incoming row get trimmed?

Remember that there are 128 rows in tab1 where owner = ‘SYS’, so when a ‘SYS’ row arrives from tab2 the hash join has to find the right bucket then walk through the rows in that bucket (which will probably be nothing but those SYS rows). So how many times does Oracle evaluate trim(SYS)? Arguably it needs to for each tab1 row in the bucket (though the hash table might have been built to include the trimmed value) but clearly it ought not to re-evaluate it 128 times for the column in the single tab2 row – and we’ll come back to that point later.

Let’s go back to the 3 variants on the first test. We were interested in comparing trim() with trim(), but since trim() is equivalent to ltrim(rtrim()) I wondered if ltrim (left trim) and rtrim (right trim) took different amounts of time, and whether the trim() time would be close to the sum of ltrim() time and rtrim() time.

Without showing the plans etc. here are the times reported in my 19.11.0.0 test at the hash join operation (the 21.3 times were very similar):

  • no trim – 0.39 seconds
  • ltrim() – 1.02 seconds
  • rtrim() – 2.66 seconds
  • trim() – 2.70 seconds.

Clearly that’s a lot of extra CPU on top of the base CPU cost. This is not entirely surprising since string operations tend to be expensive; nevertheless the differences are large enough to be more than random fluctuations and operational error.

Remember that this is just two tables of 200 and 46,000 rows respectively. It turned out that the rowsources that Stefan was using were in the order of 800K and 2M rows – with CPU time increasing from 1,100 seconds to 2,970 seconds because of the trim().

So how many times was the trim() function called in total?

Faking it.

If we assume that the trim() built-in SQL function behaves in the same way as a deterministic PL/SQL function we can at least count the number of calls that take place by writing a deterministic function to put into the SQL. Something like:

create or replace package p1 as
        n1 number;
        function f1(v1 in varchar2) return varchar2 deterministic;
end;
/

create or replace package body p1 as 

        function f1 (v1 in varchar2)
        return varchar2 
        deterministic
        is
        begin
                p1.n1 := p1.n1 + 1;
                return trim(v1);
        end;

end;
/

set serveroutput off
alter session set statistics_level = all;

exec p1.n1 := 0

select
        count(*)
from
    tab1
inner join tab2 on
    tab1.owner = tab2.owner
and     p1.f1(tab1.object_name) > p1.f1(tab2.object_name)
-- and  p1.f1(tab1.object_name) > trim(tab2.object_name)
-- and  trim(tab1.object_name)  > p1.f1(tab2.object_name)
;

select * from table(dbms_xplan.display_cursor(format=>'projection allstats last'));

set serveroutput on
execute dbms_output.put_line(p1.n1);

I’ve created a package with a public variable n1 so that I can set it and read it from “outside”, then I’ve created (and lied about) a function that increments that variable and returns the trimmed value of its input, claiming that it’s deterministic. Once I’ve got the package in place I’ve:

  • set the variable to zero
  • run a query that does one of
    • use my function twice
    • use my function once – on the build table
    • use my function once – on the probe table
  • report the execution plan with stats
  • print the value of the variable

The timings are not really important, but here’s the execution plan when I used the function on both sides of the inequality:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:21.15 |    1513 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:21.15 |    1513 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |  23007 |    329K|00:00:21.13 |    1513 |  1335K|  1335K|  860K (0)|
|   3 |    TABLE ACCESS FULL| TAB1 |      1 |    200 |    200 |00:00:00.01 |       5 |       |       |          |
|   4 |    TABLE ACCESS FULL| TAB2 |      1 |  46014 |  46014 |00:00:00.02 |     937 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TAB1"."OWNER"="TAB2"."OWNER")
       filter("P1"."F1"("TAB1"."OBJECT_NAME")>"P1"."F1"("TAB2"."OBJECT_NAME"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]
   2 - (#keys=1)
   3 - (rowset=256) "TAB1"."OWNER"[VARCHAR2,128], "TAB1"."OBJECT_NAME"[VARCHAR2,128]
   4 - "TAB2"."OWNER"[VARCHAR2,128], "TAB2"."OBJECT_NAME"[VARCHAR2,128]

Apart from the change of function name the plan is the same – although it now takes over 21 CPU seconds to complete, of which most of the time is probably building and tearing down the PL/SQL stack. The important figure, though, is the number of function calls I saw recorded in p1.n1: it was a little over 10 million calls to generate the 329 thousand rows (A-Rows for the hash join).

When I ran the code with only one call to my deterministic function it was called 5 million times regardless of whether it was used for the build or probe table. Oracle did nothing to minimise the number of times the function was called.

Predictive Mode

Near the start of this note I showed you a little query to aggregate the rows of the two tables by owner; with a little enhancement I can reuse that code to show you exactly how many times the deterministic function was called:

select
        v1.owner, ct1, ct2, ct1 * ct2, sum(ct1 * ct2) over() tot_ct
from
        (select owner, count(object_name) ct1 from tab1 group by owner) v1,
        (select owner, count(object_name) ct2 from tab2 group by owner) v2
where
        v2.owner = v1.owner
/

OWNER                  CT1        CT2    CT1*CT2     TOT_CT
--------------- ---------- ---------- ---------- ----------
SYS                    128      40104    5133312    5160466
SYSTEM                  65        417      27105    5160466
OUTLN                    7          7         49    5160466

3 rows selected.

The number of comparisons done by the filter() predicate is 5,160,466: double it to get the number of function calls. For every single one of the 40,104 SYS rows in tab2 the function was called for every single one of the SYS rows in tab1, for both sides of the inequality.

It’s a shame that Oracle doesn’t calculate and project the “virtual columns” that will be used in the join predicates, because in my case that would have reduced the number of calls from 10 million to 40,232 – a factor of roughly 250. That would probably be worth a lot of CPU to Stefan.

Damage Limitation

For my silly little query that went from 0.39 seconds to 2.70 seconds you might decide there’s no point in trying to improve things – in fact many of the sites I’ve visited probably wouldn’t even notice the CPU wastage (on one call); but when the query runs for 2,970 seconds and a little fiddling around shows that it could run in 1,100 seconds you might be inclined to see if there’s something you could do to improve things.

Andrew Sayer suggested the possibility of rewriting the query with a pair of CTEs (“with” subqueries) that were forced to materialize the trim() in the CTE. The cost of physically creating the two large GTTs might well be much less than the CPU spent on the trim()ed join.
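Applied to the demo tables in this note, that suggestion might be sketched as follows. The CTE names are arbitrary, and the materialize hint is the usual (undocumented) way of forcing the CTEs into temporary tables, so treat this as an illustration rather than the exact code that was proposed:

```sql
with t1_trimmed as (
        select  /*+ materialize */
                owner, trim(object_name) object_name
        from    tab1
),
t2_trimmed as (
        select  /*+ materialize */
                owner, trim(object_name) object_name
        from    tab2
)
select
        count(*)
from
        t1_trimmed
inner join
        t2_trimmed
on
        t1_trimmed.owner = t2_trimmed.owner
and     t1_trimmed.object_name > t2_trimmed.object_name
;
```

With this shape each row is trimmed once as the temporary table is populated, and the filter predicate at the hash join compares the stored values.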

Alternatively – and dependent on the ownership and quality of the application – you could write a check constraint on each table to ensure that the column value was always equal to the trim() of the column value.
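For the demo tables such a constraint might look like this (the constraint names are made up for the example). Adding it will fail if any existing row has leading or trailing spaces, so the data would have to be cleaned first:

```sql
alter table tab1 add constraint tab1_ck_trim check (object_name = trim(object_name));
alter table tab2 add constraint tab2_ck_trim check (object_name = trim(object_name));
```

With the constraints in place the query can simply compare tab1.object_name with tab2.object_name. One caveat: trim() of a string consisting only of spaces returns null, so the check evaluates to “unknown” for such a value and lets it through.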

A similar option would be to add an (invisible) column to each table and use a trigger to populate the column with the trimmed value and then use the trimmed column in the query.
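A minimal sketch of that approach for tab1 (tab2 would need the same treatment), assuming 12c or later for the invisible column. The column and trigger names are invented for the example:

```sql
alter table tab1 add (object_name_trim varchar2(128) invisible);

create or replace trigger tab1_trim_biu
before insert or update of object_name on tab1
for each row
begin
        :new.object_name_trim := trim(:new.object_name);
end;
/

rem     One-off population of the new column for existing rows
update tab1 set object_name_trim = trim(object_name);
commit;
```

The join predicate would then become tab1.object_name_trim > tab2.object_name_trim, with the invisible columns referenced explicitly by name.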

Conclusion

I don’t think that anything I’ve done or described in this note could be called rocket science (or telescope science as, perhaps, it should be in honour of Webb); but it has shown how much insight you can gain into what Oracle is doing and how you may be able to pin-point excess work using a few simple mechanisms that have been around for more than 15 years.

July 21, 2022

Pagination cost

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 3:56 pm BST Jul 21,2022

There’s a thread on the MOSC database tuning forum (needs an account) at the moment asking why a “fetch first N” query to fetch next 41 rows with an offset of 8602 rows takes longer to run than the same query when the offset is zero rows. Here’s a possible answer with a little lesson in thinking about what’s going on.

Apart from gremlins in the system there are two possible reasons:

  • nothing has changed, but it takes longer to fetch 8643 rows in order and discard 8602 of them than it takes to fetch 41 rows in order and discard none
  • the optimizer has worked out that if it has to fetch 8643 rows then it ought to use a different plan but (as often happens) it was a bad idea to change the plan.

Here’s a little script to build some demo data.

rem
rem     Script:         fetch_first_offset.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1 
as
select 
        * 
from 
        all_objects
where   rownum <= 50000
order by 
        dbms_random.value
/

create index t1_i1 on t1(object_name);

alter session set statistics_level = all;
set serveroutput off
set feedback off

column owner format a20
column object_type format a12
column object_name format a32

All I’ve done is create a table of 50,000 rows with an order by clause that maximises the randomness of the data pattern so that the index on object_name will have a very high clustering_factor.
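A quick way to check that claim is to compare the index’s clustering_factor with the number of blocks and rows in the table. This sketch assumes the statistics are in place, which should be the case on recent versions because “create table as select” and “create index” gather basic stats as they run:

```sql
select
        i.clustering_factor, i.leaf_blocks, t.blocks, t.num_rows
from
        user_indexes    i,
        user_tables     t
where
        i.index_name = 'T1_I1'
and     t.table_name = i.table_name
;
```

A clustering_factor close to num_rows (rather than close to blocks) indicates that consecutive index entries nearly always point to different table blocks.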

Here’s the first query I’m going to run, followed by the execution plan, pulled from memory with rowsource execution stats enabled. I’ve queried for the first 20 rows (offset 0 next 20) ordered by object_name:

select
        owner, object_type, object_name
from
        t1
order by
        object_name
offset 
        0 rows
fetch next 
        20 rows only
/

select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last'));


SQL_ID  fmdb8vuxwkp99, child number 0
-------------------------------------
select  owner, object_type, object_name from  t1 order by  object_name
offset  0 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    23 (100)|     20 |00:00:00.01 |      25 |      2 |
|*  1 |  VIEW                         |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  50000 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     20 |     3   (0)|     20 |00:00:00.01 |       5 |      2 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber"<=20 AND
              "from$_subquery$_002"."rowlimit_$$_rownumber">0))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "OBJECT_NAME")<=20)

As you can see from the execution plan Oracle has used an index full scan (because that will access the data in exactly the right order and allows the “nosort stopkey” on the “window (no)sort” operation). It has fetched (A-Rows) 20 rows and reported a cost of 23 – which basically corresponds to 3 block visits for the index and one block visit for each row from the table. In passing you’ll notice from the Predicate Information at operation 2 that Oracle has transformed our “fetch first” into an analytic query using row_number() over(). The phrase “syntactic sugar” seems appropriate.

How do things change if we ask for the 2nd 20 rows (offset 20, next 20)? I’ll show only the output from dbms_xplan, including its slightly mangled SQL statement but dropping the Predicate Information:

select  owner, object_type, object_name from  t1 order by  object_name
offset  20 rows fetch next  20 rows only

Plan hash value: 3254925009

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    43 (100)|     20 |00:00:00.01 |      45 |
|*  1 |  VIEW                         |       |      1 |     40 |    43   (0)|     20 |00:00:00.01 |      45 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     40 |    43   (0)|     40 |00:00:00.01 |      45 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  49999 |    43   (0)|     40 |00:00:00.01 |      45 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     40 |     3   (0)|     40 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------------------------------

As you can see, the optimizer has still decided to use the index full scan, and this time has fetched 40 rows and passed them up the plan until at operation 1 it discards the first 20 rows. The cost (43) is again related to 3 blocks for the index plus 40 blocks for the 40 rows from the (randomly distributed) table.

What would we see if we added a /*+ full(t1) */ hint to the query to force a tablescan to get the 2nd 20 rows?

select /*+ full(t1) */  owner, object_type, object_name from  t1 order
by  object_name offset  20 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.01 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |     40 |   947   (1)|     20 |00:00:00.01 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|     40 |00:00:00.01 |     996 | 15360 | 15360 |14336  (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Oracle has obeyed the hint, and the tablescan has fetched all 50,000 rows from the table and sorted them. Fortunately the optimizer knows that it needs only the top 40 rows so it has been discarding rows as it sorts, hence the appearance of the “pushed rank” in the “window sort” at operation 2; we haven’t had to create a sorted list of all 50,000 rows before picking the top 40. Again, once we’ve got the top 40 we discard the top 20 to allow for the offset.

We note that the cost of the tablescan was 278 but the cost of the sort was really rather large, taking the total cost of this path to 947. So here’s a thought experiment – what’s likely to happen if we ask for an offset of 940 and next 20? Given the costs we’ve seen for the indexed access path the optimizer will calculate a cost of 3 (plus a bit, maybe) for the index and a cost of 960 for visiting the table giving a total cost of about 963 – which should make the tablescan strategy the lower cost.

select  owner, object_type, object_name from  t1 order by  object_name
offset  940 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.02 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |    960 |   947   (1)|     20 |00:00:00.02 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|    960 |00:00:00.02 |     996 |   267K|   267K|  237K (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

My guesswork about the cost seems to have been nearly right. Unhinted, with an offset of 940 (which you can see as the 960 rows fetched) the optimizer has decided that the tablescan path has a lower cost than the indexed access.

Of course we ought to check this by hinting the indexed access path and seeing what its cost is:

select  /*+ index(t1) */  owner, object_type, object_name from  t1
order by  object_name offset  940 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |   969 (100)|     20 |00:00:00.01 |     968 |      4 |
|*  1 |  VIEW                         |       |      1 |    960 |   969   (1)|     20 |00:00:00.01 |     968 |      4 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |  50000 |     9   (0)|    960 |00:00:00.01 |       9 |      4 |
-----------------------------------------------------------------------------------------------------------------------

The cost of the indexed access path is 969 – that’s 960 for the randomly scattered table rows we need plus 9 for the index scan (because at 960 index entries we’re going to visit a few more index leaf blocks than the original 3).
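The arithmetic behind the last few plans can be sketched as a query against dual. This is only a rough model, assumed for illustration: it treats the indexed path as costing a fixed 3 for the index plus one table block per row fetched (offset + next), and it takes the tablescan-and-sort cost as the constant 947 reported in the plans above. The column aliases are made up for the example.

```sql
-- Rough model only (an assumption, not the optimizer's actual formula):
--   indexed path   ~ 3 index blocks + 1 table block per row fetched
--   tablescan path ~ 947, the cost reported in the plans above
select
        offset_rows,
        offset_rows + next_rows         rows_fetched,
        3 + offset_rows + next_rows     approx_index_cost,
        947                             tablescan_cost,
        case
                when 3 + offset_rows + next_rows < 947
                then 'index full scan'
                else 'tablescan + sort'
        end                             cheaper_path
from    (
        select 0 offset_rows, 20 next_rows from dual union all
        select 20,  20 from dual union all
        select 940, 20 from dual
        )
order by
        offset_rows
/
```

The three rows give approximate index costs of 23, 43 and 963, so only the last one tips over to the tablescan. The real index component grows with the number of leaf blocks visited (9 rather than 3 at 960 entries), which is why the hinted plan reports 969 rather than 963.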

Summing Up

I’ve demonstrated with a simple query using “offset N rows fetch M rows only” that the optimizer will calculate the cost of fetching “N + M” rows using whatever paths are available, then pick the lowest cost path.

As you might expect, the presence of a suitable index will encourage the optimizer to walk the index in order jumping randomly around the table to avoid the cost of acquiring all the relevant data and sorting it. So for “small” values of “offset + next” Oracle might choose an indexed access path with “window nosort stopkey”, but for “large” values of “offset + next” it might choose to do a full tablescan with “window sort pushed rank”.

The consequence of this is that – in the same way we see the optimizer switching between nested loops and hash joins at the wrong moment – we may see the optimizer switch from an indexed access path to a full tablescan either too soon, or too late.

Answering the question

Why did the query with an offset of 8602 take so much longer than the query with an offset of zero when the next was only 41 rows?

It may be that the optimizer stuck with an indexed access path and had to do physical reads of 8,643 blocks when it should have switched to a tablescan.

It may be that the optimizer switched to a tablescan and sort when it should have stuck with using an index on well-clustered, well-cached, data.

As so often happens, the first step to answering an SQL performance question is to look at the actual execution plans.
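As a minimal sketch of that first step, here is the pattern used throughout this note applied to the numbers in the question. It runs against the demo table t1 rather than the OP's table (which we haven't seen), and uses the gather_plan_statistics hint as an alternative to setting statistics_level at the session level.

```sql
set serveroutput off

-- Run the statement with rowsource execution stats enabled for this cursor only
select  /*+ gather_plan_statistics */
        owner, object_type, object_name
from
        t1
order by
        object_name
offset
        8602 rows
fetch next
        41 rows only
/

-- Pull the plan for the last statement executed by this session from memory
select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last'));
```

Comparing the A-Rows, Buffers and Reads columns for the two offsets should show whether the extra time went on single block visits to the table or on a tablescan and sort.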

July 15, 2022

Index Wildcard

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 11:51 am BST Jul 15,2022

This is a little case study on working around a problem caused by a 3rd-party application that couldn’t be changed quickly and easily. There’s a touch of irony in this case as the problem I’ll address is a side effect of the best possible workaround to a more generic design flaw in the application; moreover, the side effect appears because of a special case optimisation mechanism introduced in 12.2.

A recent post on the MOS database forum (needs an account) came up with a slightly surprising problem – a query with a predicate “columnX like ‘%XYZ%’” would perform reasonably well if hinted to use a B-tree index on columnX (doing an index full scan) but performed badly when cursor_sharing was set to force because it switched to an index range scan which did a lot more work.

Modelling

Here’s a little script to build some data to demonstrate the issue:

rem
rem     Script:         index_wildcard.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0

create table t1
as
select  obj.*
from
        (select * from all_objects where rownum <= 50000) obj,
        (select rownum from dual connect by level <= 5) gen
/

create index t1_id on t1(object_name);

alter session set cursor_sharing = force;
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

set serveroutput off

All I’ve done is create a simple heap table with 250,000 rows and an index (which happens to be on a column declared as NOT NULL, but that detail is not important). Here’s my query:

select  /*+ index(t1(object_name)) cursor_sharing_exact */
        t1.*
from
        t1 t1
where
        object_name like '%XYZ%'
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

In normal circumstances the leading wildcard would mean that the optimizer would not be able to use an index driven plan (unless the query satisfied the requirements to allow an index fast full scan), so the default plan would have been a tablescan with a selectivity of 5% (estimated rows = 12,500).

Because I’ve hinted the use of an index I get the following plan from a call to dbms_xplan.display_cursor():

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  1q1p6fyubhafu, child number 0
-------------------------------------
select /*+ index(t1(object_name)) cursor_sharing_exact */  t1.* from
t1 t1 where  object_name like '%XYZ%'

Plan hash value: 2798063786

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.08 |    1718 |   1717 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.08 |    1718 |   1717 |
|*  2 |   INDEX FULL SCAN                   | T1_ID |      1 |  12500 |      0 |00:00:00.08 |    1718 |   1717 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("OBJECT_NAME" LIKE '%XYZ%')

Because I’ve set cursor_sharing to force I’ve included the hint /*+ cursor_sharing_exact */ in the query to make sure that Oracle sees the query with a literal rather than a system generated bind variable. The plan tells us that the optimizer has picked an index full scan, and the Predicate Information for operation 2 tells us that Oracle has checked our predicate for every single entry in the index, and the A-Rows column tells us that it didn’t find any matching index entries. So the total work done was 1,718 buffer gets and 250,000 index entries tested. (You’ll notice that the E-Rows is still the standard 5% guess.)

Now let’s try again, allowing enforced cursor sharing to be applied (i.e. taking out the cursor_sharing_exact hint):

SQL_ID  cm4tjwuqxgbqg, child number 0
-------------------------------------
select /*+ index(t1(object_name)) */  t1.* from  t1 t1 where
object_name like :"SYS_B_0"

Plan hash value: 539998951

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.26 |     241K|
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.26 |     241K|
|*  2 |   INDEX RANGE SCAN                  | T1_ID |      1 |  12500 |    250K|00:00:00.04 |    1718 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_NAME" LIKE :SYS_B_0)
   2 - access("OBJECT_NAME" LIKE :SYS_B_0)

The SQL text shows us that the literal has been replaced by the bind variable :SYS_B_0. Operation 2 has changed from an index full scan to an index range scan – which “feels” as if “the range” should be the whole index and the query shouldn’t, therefore, do any more work than the index full scan plan. But the Buffers column now shows 241K buffer visits at operation 1, the table access! What’s gone wrong?

In the case of the OP the unexpected visit to the table blocks didn’t do just a few hundred thousand “unnecessary” buffer visits, it did 4 million buffer gets and 127,000 disk reads.

Look at the Predicate Information (always): the filter() that matched our original predicate is no longer being applied to the index, it’s only applied at the table. The only predicate applied to the index is an access() predicate – so Oracle takes the bind value and works out that the range scan should start at the beginning of the index and stop at the end of the index – that’s what an access() predicate does, it doesn’t check the actual values in the index entries once it has derived the start and stop locations.

Trouble-shooting

You will have noticed in the initial code to set up the data I had a call to set event 10053 (the CBO trace event) – a fairly quick and easy option to look at in this case since I’m looking at such a simple query. The critical details that showed up in the trace files were:

  • in both cases the optimizer reported “User hint to use this index” in the Base Statistical Information.
  • in the case of the “literal string” SQL the optimizer did not consider an index range scan
  • in the case of the “forced bind” SQL the optimizer did not consider an index full scan

This is worth knowing as it tells us that the switch between paths was not an “accident” of circumstances or statistics, it was a deliberately coded strategy; so we need to discover whether there is something we can do to change the strategy.

Time to check the hidden parameters and fix controls, and this is what I found in a listing of the 19.11.0.0 system fix controls (listed as appearing in 12.2.x.x):

     BUGNO OPTIMIZE SQL_FEATURE                        DESCRIPTION
---------- -------- ---------------------------------- ----------------------------------------------------------------
  20289688 12.2.0.1 QKSFM_ACCESS_PATH_20289688         check for leading wildcard in LIKE with bind

Doesn’t that sound a little bit promising? Let’s turn the fix off by adding the hint /*+ opt_param('_fix_control' '20289688:0') */ to the problem query and see what we get:

SQL_ID  75cxtymdrh8r2, child number 0
-------------------------------------
select /*+   index(t1(object_name))   opt_param('_fix_control'
'20289688:0')  */  t1.* from  t1 t1 where  object_name like :"SYS_B_0"

Plan hash value: 539998951

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.08 |    1718 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.08 |    1718 |
|*  2 |   INDEX RANGE SCAN                  | T1_ID |      1 |  12500 |      0 |00:00:00.08 |    1718 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME" LIKE :SYS_B_0)
       filter("OBJECT_NAME" LIKE :SYS_B_0)

We still have an index range scan (with a range that is obviously the whole index), but our predicate is now being applied as a filter() in the index with a total of only 1,718 buffer gets even though the literal has been replaced by a system-generated bind variable.
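For reference, a listing like the fix-control extract in this section can be produced with a query along the following lines. It's a sketch that assumes you have select privilege on v$system_fix_control; the search string is just one plausible way of narrowing the list and is not necessarily the one used to find the entry above.

```sql
column  optimizer_feature_enable        format a8
column  sql_feature                     format a34
column  description                     format a64

-- Search the fix-control descriptions for anything mentioning wildcards
select
        bugno, optimizer_feature_enable, sql_feature, description
from
        v$system_fix_control
where
        lower(description) like '%wildcard%'
order by
        bugno
/
```

The optimizer_feature_enable column is what tells you the version where the fix was introduced, and the value column (not selected here) shows whether the fix is currently enabled.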

Summary

The application is accepting a search value from the user interface and constructing “literal string” SQL that adds a leading and trailing wildcard to the value. As a result of this the optimizer will not consider using a particular index unless hinted, and does a very expensive tablescan.

The optimizer will accept and use an index() hint to produce a plan that is much more efficient than the tablescan even though it has to use an index full scan and apply a filter predicate on every index entry.

Unfortunately cursor_sharing has been set to force to deal with the general parsing overheads of applications that construct “literal string” SQL, and when the index() hint is applied to a bind-variable version of this query the optimizer applies an index range scan and postpones the filter() until it has acquired rows from the table.

A check of the system fix controls identified a fix that looked as if it might have some relevance to our situation, “leading wild card, bind variable, LIKE clause”, and a test with an opt_param() hint to switch off that optimizer “fix” had the effect we wanted – the optimizer still reported an index range scan but it also used our initial predicate as a filter() predicate on the index.

Technically, since the fix_control was introduced in 12.2.0.1 we could have got the same effect by setting the optimizer_features_enable back to 12.1.0.2 – but that’s not a good idea when the production version is so much newer although it might be an acceptable strategy when implemented as a hint for a few special case queries.

After showing that the fix control works by hinting the next step should be to create an SQL Patch (using dbms_sqldiag.create_sql_patch) attached to the (forced) bind-variable version of the query.
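A minimal sketch of that step might look like the following. It uses the function form of dbms_sqldiag.create_sql_patch that takes an sql_id (available from 12.2), assumes the statement is still in the library cache, and uses the sql_id of the forced-bind statement from the demo above as a stand-in for the production statement. The patch name is hypothetical.

```sql
set serveroutput on

declare
        v_patch_name    varchar2(128);
begin
        v_patch_name := dbms_sqldiag.create_sql_patch(
                sql_id    => 'cm4tjwuqxgbqg',           -- demo sql_id: substitute the real one
                hint_text => q'[opt_param('_fix_control' '20289688:0')]',
                name      => 'patch_like_wildcard'      -- hypothetical patch name
        );
        dbms_output.put_line('Created patch: ' || v_patch_name);
end;
/

-- To remove the patch again:
-- execute dbms_sqldiag.drop_sql_patch(name => 'patch_like_wildcard')
```

After creating the patch, the Note section of the plan from dbms_xplan.display_cursor() should report that the SQL patch was used for the statement, which is the quick check that it has been picked up.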

Update (a few hours later)

It didn’t take long to find that the fix_control and the optimizer_features_enable were not the only workarounds. There’s a hidden parameter “_optim_peek_user_binds” with the description “enable peeking of user binds” and default value “true”, so if you set this to false the optimizer can’t notice that the critical bind value starts with a wildcard and behaves the way the OP wants. As with the other options you could set this as a system parameter or a session parameter, or as a hint (SQL patch etc.) /*+ opt_param('_optim_peek_user_binds' 'false') */ – the last being the kindest for the system.
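As a sketch of what those options look like at the session and statement level (the system-level equivalents simply use alter system), assuming the same demo table and index as above:

```sql
-- Option 1: switch off the fix control for this session only
alter session set "_fix_control" = '20289688:0';

-- Option 2: stop the optimizer peeking at bind values for this session only
alter session set "_optim_peek_user_binds" = false;

-- Option 3: limit the change to one statement with a hint
-- (with cursor_sharing = force the literal is still replaced by a bind)
select  /*+ index(t1(object_name)) opt_param('_optim_peek_user_binds' 'false') */
        t1.*
from
        t1 t1
where
        object_name like '%XYZ%'
/
```

Only one of the three is needed; the session-level settings affect every statement optimized by that session afterwards, which is the reason for preferring the hint (or SQL patch).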

Update (21st July)

Bug number 33500777 (unpublished) looks like an exact match for the problem; the description reads:

FILTER FOR PREDICATE WITH LIKE AND BIND WITH LEADING WILDCARD IS NOT ALLOCATED ON INDEX SCAN AFTER FIX 20289688

June 17, 2022

Case Study

Filed under: Execution plans,Oracle,Parallel Execution,Performance,Problem Solving,Troubleshooting,Tuning — Jonathan Lewis @ 1:25 pm BST Jun 17,2022

The question “How do you trouble-shoot a performance problem” came up in an online session I did for the AIOUG (All-India OUG) today. It’s a very broad question and the only possible answers are either extremely generic, or very specific – so here’s a specific example that I talked about to give some idea of the processes I go through. It’s an example from the Oracle-l list server asking the (paraphrased) question:

I’ve got a parallel query that takes 24 seconds to return 2,500 rows for 2018. The same query for 2019 should return a similar amount of data but consumes a lot of TEMP space before failing; it takes 45 minutes to complete if I remove the parallel hint. The only difference between the two queries is a change to a single predicate: “opclf.year_number = to_number('YYYY')” and the statistics are up to date – what’s going on?

The ease of diagnosing a problem is dependent on the ease of access to all the right information, and you don’t always know initially what the right information might be. In this case the OP had started by posting to github the SQL Monitor reports for the two queries, which were automatically generated since the queries were using parallel execution.
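If you need to produce a report like this yourself, a call along the following lines is one way to do it. This is a sketch, assuming you are licensed for the Tuning Pack and that the statement is still visible in v$sql_monitor; the sql_id is the one that appears in the Global Information section of the 2019 report, and the column alias is arbitrary.

```sql
set long 1000000
set longchunksize 1000000
set linesize 250
set pagesize 0
set trimspool on

-- Text version of the SQL Monitor report for one statement
select
        dbms_sqltune.report_sql_monitor(
                sql_id       => '3cjg20q2vw41f',
                type         => 'TEXT',
                report_level => 'ALL'
        ) monitor_report
from
        dual
/
```

The report can be generated while the statement is still executing, which matters for a query like the 2019 one that fails before it completes.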

The first few sections of the 2019 output:
SQL Text
------------------------------
SELECT /*+ PARALLEL(8) */ DISTINCT rd.document_id AS doc_id ,'firm_id' AS criteria_key ,opf.ultimate_parent_firm_id AS series_id ,period_number FROM ( SELECT /*+ PARALLEL(8) */ DISTINCT rd.document_id ,rd.client_role_id ,( CASE WHEN MONTHS_BETWEEN(TO_DATE('04/28/2022', 'MM/DD/YYYY'), TO_DATE('04/01/2017', 'MM/DD/YYYY')) > 12 THEN TRUNC(dc.date_value, 'YEAR') ELSE TRUNC(dc.date_value, 'MONTH') END ) period_number FROM REPORT_ENGINE.date_code dc ,REPORT_ENGINE.lit_fact_bd rd INNER JOIN
report_engine.firm FIRM ON rd.firm_id = FIRM.firm_id WHERE dc.date_value BETWEEN TO_DATE('04/01/2017', 'MM/DD/YYYY') AND TO_DATE('04/28/2022', 'MM/DD/YYYY') AND rd.publication_date_id = dc.date_id AND rd.year_number = to_number('2019') AND (FIRM.ultimate_parent_firm_id IN (to_number('252094'))) ) rd INNER JOIN report_engine.opposing_counsel op ON rd.client_role_id = op.legal_role_id INNER JOIN report_engine.lit_fact_bd opclf ON opclf.document_id = rd.document_id AND op.opposing_counsel_role_id
= opclf.client_role_id AND opclf.year_number = to_number('2019') INNER JOIN report_engine.firm opf ON opclf.firm_id = opf.firm_id AND opf.firm_id >= 1000 WHERE 1 = 1

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  SYS (647:28741)
 SQL ID              :  3cjg20q2vw41f
 SQL Execution ID    :  16777216
 Execution Started   :  06/09/2022 05:08:24
 First Refresh Time  :  06/09/2022 05:08:25
 Last Refresh Time   :  06/09/2022 05:08:27
 Duration            :  221s
 Module/Action       :  sqlplus@c111dhw (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  sqlplus@c111dhw (TNS V1-V3)

Global Stats
====================================================================
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Write | Write |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes |
====================================================================
|     222 |     179 |       28 |       15 |    49M | 11624 |   2GB |
====================================================================

Parallel Execution Details (DOP=8 , Servers Allocated=16)
=====================================================================================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |  Other   | Buffer | Write | Write |         Wait Events         |
|                |       |         | Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes |         (sample #)          |
=====================================================================================================================================
| PX Coordinator | QC    |         |    0.26 |    0.26 |          |     0.00 |  12144 |       |     . |                             |
| p000           | Set 1 |       1 |         |         |          |          |        |       |     . |                             |
| p001           | Set 1 |       2 |         |         |          |          |        |       |     . |                             |
| p002           | Set 1 |       3 |         |         |          |          |        |       |     . |                             |
| p003           | Set 1 |       4 |         |         |          |          |        |       |     . |                             |
| p004           | Set 1 |       5 |         |         |          |          |        |       |     . |                             |
| p005           | Set 1 |       6 |         |         |          |          |        |       |     . |                             |
| p006           | Set 1 |       7 |         |         |          |          |        |       |     . |                             |
| p007           | Set 1 |       8 |         |         |          |          |        |       |     . |                             |
| p008           | Set 2 |       1 |     220 |     177 |       28 |       15 |    48M | 11624 |   2GB | direct path write temp (28) |
| p009           | Set 2 |       2 |         |         |          |          |        |       |     . |                             |
| p010           | Set 2 |       3 |         |         |          |          |        |       |     . |                             |
| p011           | Set 2 |       4 |    1.71 |    1.70 |          |     0.01 |   595K |       |     . |                             |
| p012           | Set 2 |       5 |         |         |          |          |        |       |     . |                             |
| p013           | Set 2 |       6 |         |         |          |          |        |       |     . |                             |
| p014           | Set 2 |       7 |         |         |          |          |        |       |     . |                             |
| p015           | Set 2 |       8 |         |         |          |          |        |       |     . |                             |
=====================================================================================================================================

You’ll note that I haven’t got as far as the body of the execution plan yet, and I’ve highlighted line 44 (the p008 entry) – a line in the middle of the summary of activity for the parallel execution processes. There are 8 servers in each of two sets (we’re running parallel 8) and the line I’ve highlighted is the first server of the second set. The reason I’ve highlighted it is that it’s the one server that’s doing almost all the work – effectively the query (at some point in the plan) is running serially.

So, a first thought, maybe we’ve just been unlucky and running parallel 7 (say) would result in a better distribution of data across parallel servers and allow each of 7 processes to do whatever they had to do to a much smaller amount of data. Maybe a change to the distribution method (pq_distribute() hint) would spread the workload more evenly. In either case “N” smaller workloads might still use a lot of TEMP, but possibly no individual process would fail, and the job would complete nearly N times as quickly.

Note: the 2018 Monitor report showed an equivalent skew in the data distribution, but the amount of work needed was much smaller with a read/write load of only 143MB compared to 2GB for the 2019 report. The OP did try running parallel 7, but with no change to the overall effect.

Let’s take a quick glance at the plan body:
SQL Plan Monitoring Details (Plan Hash Value=1862006233)
=========================================================================================================================================================================================================================
| Id    |                     Operation                      |           Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Write | Write | Mem | Temp | Activity |        Activity Detail       |
|       |                                                    |                           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes |     |      |   (%)     |         (# samples)         |
=========================================================================================================================================================================================================================
|     0 | SELECT STATEMENT                                   |                           |         |      |           |        |     3 |          |       |       |     |      |   |                             |
|     1 |   PX COORDINATOR                                   |                           |         |      |           |        |     3 |          |       |       |     |      |   |                             |
|     2 |    PX SEND QC (RANDOM)                             | :TQ10003                  |     781 |  153 |           |        |       |          |       |       |     |      |   |                             |
|     3 |     HASH UNIQUE                                    |                           |     781 |  153 |           |        |       |          |       |       |     |      |   |                             |
|     4 |      PX RECEIVE                                    |                           |     781 |  151 |           |        |       |          |       |       |     |      |   |                             |
|     5 |       PX SEND HASH                                 | :TQ10002                  |     781 |  151 |           |        |     2 |          |       |       |     |      |   |                             |
|  -> 6 |        BUFFER SORT                                 |                           |     781 |  153 |       219 |     +3 |     2 |        0 | 11624 |   2GB | 61M |   2G |    26.21 | Cpu (26)                     |
|       |                                                    |                           |         |      |           |        |       |          |       |       |     |      |   | direct path write temp (28) |
|  -> 7 |         NESTED LOOPS                               |                           |     781 |  151 |       219 |     +3 |     2 |      39M |       |       |     |      |   |                             |
|  -> 8 |          NESTED LOOPS                              |                           |     781 |  151 |       219 |     +3 |     2 |      39M |       |       |     |      |     0.49 | Cpu (1)                      |
|  -> 9 |           NESTED LOOPS                             |                           |     777 |   44 |       219 |     +3 |     2 |      39M |       |       |     |      |   |                             |
| -> 10 |            NESTED LOOPS                            |                           |      41 |   26 |       219 |     +3 |     2 |     6463 |       |       |     |      |   |                             |
| -> 11 |             HASH JOIN                              |                           |      41 |   21 |       219 |     +3 |     2 |     6463 |       |       |  6M |      |   |                             |
|    12 |              BUFFER SORT                           |                           |         |      |         1 |     +3 |     2 |    36855 |       |       |     |      |   |                             |
|    13 |               PX RECEIVE                           |                           |      87 |   19 |         1 |     +3 |     2 |    36855 |       |       |     |      |   |                             |
|    14 |                PX SEND HASH                        | :TQ10000                  |      87 |   19 |         1 |     +3 |     1 |    38694 |       |       |     |      |   |                             |
|    15 |                 NESTED LOOPS                       |                           |      87 |   19 |         1 |     +3 |     1 |    38694 |       |       |     |      |   |                             |
|    16 |                  NESTED LOOPS                      |                           |      87 |   19 |         1 |     +3 |     1 |    38694 |       |       |     |      |   |                             |
|    17 |                   TABLE ACCESS BY INDEX ROWID      | FIRM                      |       1 |    2 |         1 |     +3 |     1 |       43 |       |       |     |      |   |                             |
|    18 |                    INDEX RANGE SCAN                | FIRM_ULT_PARENT_FIRM_IDX1 |       1 |    1 |         1 |     +3 |     1 |       43 |       |       |     |      |   |                             |
|    19 |                   PARTITION RANGE SINGLE           |                           |         |      |         1 |     +3 |    43 |    38694 |       |       |     |      |   |                             |
|    20 |                    BITMAP CONVERSION TO ROWIDS     |                           |         |      |         1 |     +3 |    43 |    38694 |       |       |     |      |   |                             |
|    21 |                     BITMAP INDEX SINGLE VALUE      | LIT_FACT_BD_IDX09         |         |      |         1 |     +3 |    43 |       49 |       |       |     |      |   |                             |
|    22 |                  TABLE ACCESS BY LOCAL INDEX ROWID | LIT_FACT_BD               |      63 |   19 |         3 |     +1 | 38694 |    38694 |       |       |     |      |     0.49 | Cpu (1)                      |
|    23 |              PX RECEIVE                            |                           |      20 |    2 |         1 |     +3 |     2 |        2 |       |       |     |      |   |                             |
|    24 |               PX SEND HASH                         | :TQ10001                  |      20 |    2 |           |        |       |          |       |       |     |      |   |                             |
|    25 |                PX BLOCK ITERATOR                   |                           |      20 |    2 |           |        |       |          |       |       |     |      |   |                             |
|    26 |                 TABLE ACCESS FULL                  | OPPOSING_COUNSEL          |      20 |    2 |           |        |       |          |       |       |     |      |   |                             |
| -> 27 |             TABLE ACCESS BY INDEX ROWID            | DATE_CODE                 |       1 |      |       219 |     +3 |  6465 |     6463 |       |       |     |      |   |                             |
| -> 28 |              INDEX UNIQUE SCAN                     | PK_DATE_CODE              |       1 |      |       219 |     +3 |  6465 |     6465 |       |       |     |      |   |                             |
| -> 29 |            PARTITION RANGE SINGLE                  |                           |      19 |      |       219 |     +3 |  6465 |      39M |       |       |     |      |   |                             |
| -> 30 |             TABLE ACCESS BY LOCAL INDEX ROWID      | LIT_FACT_BD               |      19 |      |       220 |     +2 |  6465 |      39M |       |       |     |      |    35.92 | Cpu (74)                     |
| -> 31 |              INDEX RANGE SCAN                      | LIT_FACT_BD_IDX20         |       1 |      |       219 |     +3 |  6465 |      39M |       |       |     |      |     9.22 | Cpu (19)                     |
| -> 32 |           INDEX UNIQUE SCAN                        | PK_FIRM                   |       1 |      |       219 |     +3 |   39M |      39M |       |       |     |      |    10.68 | Cpu (22)                     |
| -> 33 |          TABLE ACCESS BY INDEX ROWID               | FIRM                      |       1 |      |       219 |     +3 |   39M |      39M |       |       |     |      |    16.99 | Cpu (35)                     |
===================================================================================================================================================================

You can see from the “->” symbols at the left hand side of the plan that this report was generated while the plan was still running. The thing that leaps out as you glance down the page is the value in the “Rows (Actual)” column at operations 7-9 (which show the rowsources generated by some nested loop joins) and operations 29, 32 and 33 of the plan that tell us something about how those rowsources were generated.

Operation 29 has executed (Execs) 6,465 times so far, producing a total of 39M rows, and operations 32 and 33 have each executed 39M times, producing a total of 39M rows by index unique scan.

The plan for the 2018 data was similar though the join order for DATE_CODE, LIT_FACT_BD and FIRM was different (and it was the join to LIT_FACT_BD that increased the row count dramatically – so hinting it to be the last table in the join might help a bit), but the largest rowcount for the 2018 query was only 3M rows, not the 39M that had appeared after only 6,465 rows of a possible driving 39,855 in the 2019 query.

So it’s not surprising that the query could take much longer for 2019. It’s not the volume of output that matters, it’s the volume of input (or, more accurately, throughput or intermediate) data that matters.
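As a rough check on that claim, the Execs and Rows (Actual) columns of the two reports give the average number of rows returned per probe of LIT_FACT_BD through LIT_FACT_BD_IDX20. This is only a sketch: the 39M and 3M figures are the rounded values printed by SQL Monitor, and the 2019 figure comes from a report taken while the query was still running.

```sql
-- Average rows per execution of the LIT_FACT_BD probe, using the
-- rounded figures from the two SQL Monitor reports.
select
        round(39e6 / 6465)      rows_per_probe_2019,    -- 2019 plan, operation 29: 39M rows from 6,465 executions
        round(3e6  / 19334)     rows_per_probe_2018     -- 2018 plan, operation 27: 3M rows from 19,334 executions
from
        dual
;
```

That works out at roughly 6,000 rows per probe for 2019 against roughly 155 for 2018, which is why the intermediate rowsource grows so much faster in the 2019 query.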

Let’s think about that volume, though: the 2018 plan generated 3M rows and then crunched them down to 2,500 rows and the 2019 plan was supposed to produce a similar sized output (from 39M+ rows). Could we rewrite the query in some way that made it do some intermediate aggregation so that the volume of data to be aggregated was never enormous?

Let’s take a look at the plan from the 2018 Monitor report:
SQL Plan Monitoring Details (Plan Hash Value=472871521)
=======================================================================================================================================================================================================================================
| Id |                     Operation                      |           Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Temp  | Activity |       Activity Detail       |
|    |                                                    |                           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) |   (%)     |        (# samples)         |
=======================================================================================================================================================================================================================================
|  0 | SELECT STATEMENT                                   |                           |         |      |         1 |    +24 |    17 |     2613 |      |       |       |       |       |       |   |                            |
|  1 |   PX COORDINATOR                                   |                           |         |      |         1 |    +24 |    17 |     2613 |      |       |       |       |       |       |   |                            |
|  2 |    PX SEND QC (RANDOM)                             | :TQ10003                  |       1 |   39 |         1 |    +24 |     8 |     2613 |      |       |       |       |       |       |   |                            |
|  3 |     HASH UNIQUE                                    |                           |       1 |   39 |         9 |    +16 |     8 |     2613 |      |       |       |       |    9M |       |     6.90 | Cpu (2)                     |
|  4 |      PX RECEIVE                                    |                           |       1 |   38 |         9 |    +16 |     8 |       3M |      |       |       |       |       |       |   |                            |
|  5 |       PX SEND HASH                                 | :TQ10002                  |       1 |   38 |        12 |    +14 |     8 |       3M |      |       |       |       |       |       |     3.45 | Cpu (1)                     |
|  6 |        BUFFER SORT                                 |                           |       1 |   39 |        23 |     +2 |     8 |       3M | 4584 | 143MB |   703 | 143MB |  151M |  151M |    34.48 | Cpu (2)                     |
|    |                                                    |                           |         |      |           |        |       |          |      |       |       |       |       |       |   | direct path read temp (6)  |
|    |                                                    |                           |         |      |           |        |       |          |      |       |       |       |       |       |   | direct path write temp (2) |
|  7 |         NESTED LOOPS                               |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
|  8 |          NESTED LOOPS                              |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
|  9 |           NESTED LOOPS                             |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
| 10 |            NESTED LOOPS                            |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
| 11 |             HASH JOIN                              |                           |      41 |   21 |        15 |     +2 |     8 |    19334 |      |       |       |       |    7M |       |   |                            |
| 12 |              BUFFER SORT                           |                           |         |      |        13 |     +2 |     8 |    19233 |      |       |       |       |    1M |       |   |                            |
| 13 |               PX RECEIVE                           |                           |      89 |   19 |        13 |     +2 |     8 |    19233 |      |       |       |       |       |       |   |                            |
| 14 |                PX SEND HASH                        | :TQ10000                  |      89 |   19 |         1 |     +1 |     1 |    19233 |      |       |       |       |       |       |   |                            |
| 15 |                 NESTED LOOPS                       |                           |      89 |   19 |         1 |     +1 |     1 |    19233 |      |       |       |       |       |       |   |                            |
| 16 |                  NESTED LOOPS                      |                           |      89 |   19 |         1 |     +1 |     1 |    19233 |      |       |       |       |       |       |   |                            |
| 17 |                   TABLE ACCESS BY INDEX ROWID      | FIRM                      |       1 |    2 |         1 |     +1 |     1 |       43 |      |       |       |       |       |       |   |                            |
| 18 |                    INDEX RANGE SCAN                | FIRM_ULT_PARENT_FIRM_IDX1 |       1 |    1 |         1 |     +1 |     1 |       43 |      |       |       |       |       |       |   |                            |
| 19 |                   PARTITION RANGE SINGLE           |                           |         |      |         1 |     +1 |    43 |    19233 |      |       |       |       |       |       |   |                            |
| 20 |                    BITMAP CONVERSION TO ROWIDS     |                           |         |      |         1 |     +1 |    43 |    19233 |      |       |       |       |       |       |   |                            |
| 21 |                     BITMAP INDEX SINGLE VALUE      | LIT_FACT_BD_IDX09         |         |      |         1 |     +1 |    43 |       51 |      |       |       |       |       |       |   |                            |
| 22 |                  TABLE ACCESS BY LOCAL INDEX ROWID | LIT_FACT_BD               |      64 |   19 |         1 |     +1 | 19233 |    19233 |      |       |       |       |       |       |   |                            |
| 23 |              PX RECEIVE                            |                           |      20 |    2 |        15 |     +2 |     8 |       20 |      |       |       |       |       |       |   |                            |
| 24 |               PX SEND HASH                         | :TQ10001                  |      20 |    2 |         1 |    +14 |     8 |       20 |      |       |       |       |       |       |   |                            |
| 25 |                PX BLOCK ITERATOR                   |                           |      20 |    2 |         1 |    +14 |     8 |       20 |      |       |       |       |       |       |   |                            |
| 26 |                 TABLE ACCESS FULL                  | OPPOSING_COUNSEL          |      20 |    2 |         1 |    +14 |     3 |       20 |      |       |       |       |       |       |   |                            |
| 27 |             PARTITION RANGE SINGLE                 |                           |       1 |      |        15 |     +2 | 19334 |       3M |      |       |       |       |       |       |   |                            |
| 28 |              TABLE ACCESS BY LOCAL INDEX ROWID     | LIT_FACT_BD               |       1 |      |        16 |     +1 | 19334 |       3M |      |       |       |       |       |       |    17.24 | Cpu (5)                     |
| 29 |               INDEX RANGE SCAN                     | LIT_FACT_BD_IDX20         |       1 |      |        15 |     +2 | 19334 |       3M |      |       |       |       |       |       |   |                            |
| 30 |            TABLE ACCESS BY INDEX ROWID             | DATE_CODE                 |       1 |      |        15 |     +2 |    3M |       3M |      |       |       |       |       |       |    10.34 | Cpu (3)                     |
| 31 |             INDEX UNIQUE SCAN                      | PK_DATE_CODE              |       1 |      |        16 |     +1 |    3M |       3M |      |       |       |       |       |       |     6.90 | Cpu (2)                     |
| 32 |           INDEX UNIQUE SCAN                        | PK_FIRM                   |       1 |      |        23 |     +2 |    3M |       3M |      |       |       |       |       |       |     6.90 | Cpu (2)                     |
| 33 |          TABLE ACCESS BY INDEX ROWID               | FIRM                      |       1 |      |        16 |     +1 |    3M |       3M |      |       |       |       |       |       |    13.79 | Cpu (4)                     |
=======================================================================================================================================================================================================================================

We see from operations 3 – 7 that the 3M rows generated from the nested loop joins pass up through a buffer sort operation before being crunched down to 2,613 rows. It’s probably the buffer sort (which is buffering but not sorting) that has mostly passed through a single server and spilled to disc in the 2019 report. We just don’t want those 39M+ rows ever to exist.

So how easy will it be to change the SQL?
SELECT
        /*+ PARALLEL(8) */
        DISTINCT rd.document_id AS doc_id
        ,'firm_id' AS criteria_key
        ,opf.ultimate_parent_firm_id AS series_id
        ,period_number
FROM (
        SELECT
                /*+ PARALLEL(8) */
                DISTINCT rd.document_id
                ,rd.client_role_id
                ,(
                        CASE 
                                WHEN MONTHS_BETWEEN(TO_DATE('04/28/2022', 'MM/DD/YYYY'), TO_DATE('04/01/2017', 'MM/DD/YYYY')) > 12
                                        THEN TRUNC(dc.date_value, 'YEAR')
                                ELSE TRUNC(dc.date_value, 'MONTH')
                                END
                        ) period_number
        FROM REPORT_ENGINE.date_code dc
                ,REPORT_ENGINE.lit_fact_bd rd
        INNER JOIN report_engine.firm FIRM ON rd.firm_id = FIRM.firm_id
        WHERE dc.date_value BETWEEN TO_DATE('04/01/2017', 'MM/DD/YYYY')
                        AND TO_DATE('04/28/2022', 'MM/DD/YYYY')
                AND rd.publication_date_id = dc.date_id
                AND rd.year_number = to_number('2019')
                AND (FIRM.ultimate_parent_firm_id IN (to_number('252094')))
        ) rd
INNER JOIN report_engine.opposing_counsel op ON rd.client_role_id = op.legal_role_id
INNER JOIN report_engine.lit_fact_bd opclf ON opclf.document_id = rd.document_id
        AND op.opposing_counsel_role_id = opclf.client_role_id
        AND opclf.year_number = to_number('2019')
INNER JOIN report_engine.firm opf ON opclf.firm_id = opf.firm_id
        AND opf.firm_id >= 1000
WHERE 1 = 1;

Lines 7-10 and 27 tell us we already have an inline view where we’re doing a “select distinct” and, unwinding the mix of “Oracle” and “ANSI” syntax, we can see that it joins DATE_CODE, LIT_FACT_BD and FIRM, and we know that one of those tables explodes the intermediate data size to something enormous. So it looks like the original author of this code had already worked out that the query needed to aggregate early.

Checking back to the original plans we note that there’s only one “hash unique” operation, and there’s no sign of a “view” operation, so maybe the performance problem is a result of the optimizer suddenly deciding it can do complex view merging with this inline view, and perhaps all we need to do is add the hint /*+ no_merge */ to the inline view and see what happens.
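A minimal sketch of the hinted statement, assuming the hint goes inside the inline view’s own select (the thread doesn’t show the exact text the OP used). The only change from the original query is the extra no_merge in the inner hint:

```sql
SELECT
        /*+ PARALLEL(8) */
        DISTINCT rd.document_id AS doc_id
        ,'firm_id' AS criteria_key
        ,opf.ultimate_parent_firm_id AS series_id
        ,period_number
FROM (
        SELECT
                /*+ PARALLEL(8) NO_MERGE */     -- assumed placement: blocks complex view merging for this inline view
                DISTINCT rd.document_id
                ,rd.client_role_id
                ,(
                        CASE 
                                WHEN MONTHS_BETWEEN(TO_DATE('04/28/2022', 'MM/DD/YYYY'), TO_DATE('04/01/2017', 'MM/DD/YYYY')) > 12
                                        THEN TRUNC(dc.date_value, 'YEAR')
                                ELSE TRUNC(dc.date_value, 'MONTH')
                                END
                        ) period_number
        FROM REPORT_ENGINE.date_code dc
                ,REPORT_ENGINE.lit_fact_bd rd
        INNER JOIN report_engine.firm FIRM ON rd.firm_id = FIRM.firm_id
        WHERE dc.date_value BETWEEN TO_DATE('04/01/2017', 'MM/DD/YYYY')
                        AND TO_DATE('04/28/2022', 'MM/DD/YYYY')
                AND rd.publication_date_id = dc.date_id
                AND rd.year_number = to_number('2019')
                AND (FIRM.ultimate_parent_firm_id IN (to_number('252094')))
        ) rd
INNER JOIN report_engine.opposing_counsel op ON rd.client_role_id = op.legal_role_id
INNER JOIN report_engine.lit_fact_bd opclf ON opclf.document_id = rd.document_id
        AND op.opposing_counsel_role_id = opclf.client_role_id
        AND opclf.year_number = to_number('2019')
INNER JOIN report_engine.firm opf ON opclf.firm_id = opf.firm_id
        AND opf.firm_id >= 1000
WHERE 1 = 1;
```

An equivalent alternative would be to put no_merge(rd) in the hint of the outer select, naming the inline view by its alias.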

Here’s the plan after adding the hint:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                     |                           |       |       |   314 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR                                      |                           |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                                | :TQ10005                  |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,05 | P->S | QC (RAND)  |
|   3 |    HASH UNIQUE                                       |                           |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,05 | PCWP |            |
|   4 |     PX RECEIVE                                       |                           |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,05 | PCWP |            |
|   5 |      PX SEND HASH                                    | :TQ10004                  |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,04 | P->P | HASH       |
|   6 |       HASH UNIQUE                                    |                           |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,04 | PCWP |            |
|   7 |        NESTED LOOPS                                  |                           |    23 |  2599 |   313   (1)| 00:00:06 |       |       |  Q1,04 | PCWP |            |
|   8 |         NESTED LOOPS                                 |                           |    23 |  2599 |   313   (1)| 00:00:06 |       |       |  Q1,04 | PCWP |            |
|   9 |          NESTED LOOPS                                |                           |    23 |  2323 |   310   (1)| 00:00:06 |       |       |  Q1,04 | PCWP |            |
|* 10 |           HASH JOIN                                  |                           |   388 | 21340 |   148   (1)| 00:00:03 |       |       |  Q1,04 | PCWP |            |
|  11 |            PX RECEIVE                                |                           |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,04 | PCWP |            |
|  12 |             PX SEND BROADCAST                        | :TQ10002                  |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,02 | P->P | BROADCAST  |
|  13 |              PX BLOCK ITERATOR                       |                           |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWC |            |
|* 14 |               TABLE ACCESS FULL                      | OPPOSING_COUNSEL          |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|  15 |            VIEW                                      |                           |   835 | 39245 |   146   (1)| 00:00:03 |       |       |  Q1,04 | PCWP |            |
|  16 |             HASH UNIQUE                              |                           |   835 | 63460 |   146   (1)| 00:00:03 |       |       |  Q1,04 | PCWP |            |
|  17 |              PX RECEIVE                              |                           |   835 | 63460 |   145   (0)| 00:00:03 |       |       |  Q1,04 | PCWP |            |
|  18 |               PX SEND HASH                           | :TQ10003                  |   835 | 63460 |   145   (0)| 00:00:03 |       |       |  Q1,03 | P->P | HASH       |
|* 19 |                HASH JOIN BUFFERED                    |                           |   835 | 63460 |   145   (0)| 00:00:03 |       |       |  Q1,03 | PCWP |            |
|  20 |                 BUFFER SORT                          |                           |       |       |            |          |       |       |  Q1,03 | PCWC |            |
|  21 |                  PX RECEIVE                          |                           |   835 | 52605 |   136   (0)| 00:00:03 |       |       |  Q1,03 | PCWP |            |
|  22 |                   PX SEND HASH                       | :TQ10000                  |   835 | 52605 |   136   (0)| 00:00:03 |       |       |        | S->P | HASH       |
|  23 |                    NESTED LOOPS                      |                           |   835 | 52605 |   136   (0)| 00:00:03 |       |       |        |      |            |
|  24 |                     NESTED LOOPS                     |                           |   835 | 52605 |   136   (0)| 00:00:03 |       |       |        |      |            |
|  25 |                      TABLE ACCESS BY INDEX ROWID     | FIRM                      |     1 |    12 |     2   (0)| 00:00:01 |       |       |        |      |            |
|* 26 |                       INDEX RANGE SCAN               | FIRM_ULT_PARENT_FIRM_IDX1 |     1 |       |     1   (0)| 00:00:01 |       |       |        |      |            |
|  27 |                      PARTITION RANGE SINGLE          |                           |       |       |            |          |    30 |    30 |        |      |            |
|  28 |                       BITMAP CONVERSION TO ROWIDS    |                           |       |       |            |          |       |       |        |      |            |
|* 29 |                        BITMAP INDEX SINGLE VALUE     | LIT_FACT_BD_IDX09         |       |       |            |          |    30 |    30 |        |      |            |
|* 30 |                     TABLE ACCESS BY LOCAL INDEX ROWID| LIT_FACT_BD               |   598 | 30498 |   136   (0)| 00:00:03 |    30 |    30 |        |      |            |
|  31 |                 PX RECEIVE                           |                           |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,03 | PCWP |            |
|  32 |                  PX SEND HASH                        | :TQ10001                  |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,01 | P->P | HASH       |
|  33 |                   PX BLOCK ITERATOR                  |                           |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,01 | PCWC |            |
|* 34 |                    TABLE ACCESS FULL                 | DATE_CODE                 |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|  35 |           PARTITION RANGE SINGLE                     |                           |     1 |    46 |     0   (0)|          |    30 |    30 |  Q1,04 | PCWP |            |
|* 36 |            TABLE ACCESS BY LOCAL INDEX ROWID         | LIT_FACT_BD               |     1 |    46 |     0   (0)|          |    30 |    30 |  Q1,04 | PCWP |            |
|* 37 |             INDEX RANGE SCAN                         | LIT_FACT_BD_IDX20         |     1 |       |     0   (0)|          |    30 |    30 |  Q1,04 | PCWP |            |
|* 38 |          INDEX UNIQUE SCAN                           | PK_FIRM                   |     1 |       |     0   (0)|          |       |       |  Q1,04 | PCWP |            |
|  39 |         TABLE ACCESS BY INDEX ROWID                  | FIRM                      |     1 |    12 |     0   (0)|          |       |       |  Q1,04 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note particularly that operations 15 and 16 tell us that we’ve forced the optimizer into treating the inline view as a separate query block and we now have two aggregation steps, one inside the view, and another after joining FIRM (again) and LIT_FACT_BD (again) to the inline view.

Unfortunately the plan shown here is pulled from memory using dbms_xplan.display_cursor() after execution, so it includes the various parallel execution columns (TQ, IN-OUT, PQ Distrib) but, because rowsource execution stats weren’t enabled, we can’t see the actual workload and data volume. Still, in the words of the OP: “adding no_merge hint did the trick and now the SQL is just executing fine”.

Summary

The steps for solving the performance problems of a specific SQL statement are very fluid. For a long-running or parallel statement the SQL Monitor report will automatically be created (though there are limits on the size of the plan that may disable the feature) and this is the easiest source of useful information, though you might also need to pull the execution plan from v$sql_plan to get details about parallel execution and partitioning at the same time.
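For reference, here’s a sketch of how those two sources might be pulled from SQL*Plus. The substitution variable m_sql_id is a placeholder introduced for illustration, and the SQL Monitor call needs the diagnostic and tuning pack licences:

```sql
-- SQL*Plus settings so the CLOB report is not truncated or wrapped
set long 1000000 longchunksize 1000000 linesize 300 pagesize 0 trimspool on

-- Text-format SQL Monitor report for one statement
-- (&m_sql_id is a hypothetical substitution variable holding the sql_id)
select
        dbms_sqltune.report_sql_monitor(
                sql_id       => '&m_sql_id',
                type         => 'TEXT',
                report_level => 'ALL'
        ) report
from
        dual
;

-- Execution plan from memory (v$sql_plan); the default 'typical' format
-- adds the partition and parallel columns when the plan uses them
select * from table(dbms_xplan.display_cursor('&m_sql_id', null, 'typical'));
```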

If you’re not licensed for the diagnostic and performance packs, though, enabling SQL Trace to get the plan and waits gets you a lot of information, and querying (g)v$pq_tqstat immediately after running the query can fill in the parallel traffic details.
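A minimal sketch of that approach from a SQL*Plus session, assuming you can run the statement yourself; the choice of columns and ordering for v$pq_tqstat is just one reasonable layout, not a prescribed one:

```sql
-- Enable extended SQL trace (including wait events) for this session
execute dbms_monitor.session_trace_enable(waits => true, binds => false)

-- ... run the parallel query under investigation here ...

-- Same session, straight after the query: rows and bytes passed
-- through each table queue by each producer and consumer process
select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes, waits, timeouts
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

execute dbms_monitor.session_trace_disable
```

A skewed distribution shows up as one process in a table queue handling most of the num_rows while its siblings handle very few.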

In this example the SQL Monitor report showed a highly skewed distribution, which might have been fixable by changing the PQ distribution, or even by simply changing the degree of parallelism.

Further examination of the report showed that the query generated an enormous rowsource which it then crunched down to a small result set. Comparing the 2018 and 2019 plans (which were not quite identical but were sufficiently similar to justify comparison) the same skew and explosion of rowsource were visible in both, though the data size involved in the 2018 plan made it seem that the plan was a “good” one which really it wasn’t.

The obvious target for tuning was to see if the explosion in volume could be reduced or eliminated by writing the query with some intermediate non-mergeable view(s), and it turned out that the query had been written with that intent in its original form but without a hint to block complex view merging. After adding the hint the performance was acceptable.

January 19, 2022

Hash Aggregation – 2

Filed under: Execution plans,Infrastructure,Oracle,Performance,Tuning — Jonathan Lewis @ 12:03 pm GMT Jan 19,2022

In the note I wrote a couple of days ago about the way the optimizer switches from hash group by to sort group by if you add an order by X,Y clause to a query that has a group by X,Y I mentioned that this had been something I’d noticed about 15 years ago (shortly after Oracle introduced hash aggregation, in fact) but it was only the second of two details I’d noted when experimenting with this new operation. The first thing I’d noticed came from an even simpler example, and here’s a (cryptic) clue to what it was:


column operation_type format a24

select
        operation_type, count(*)
from
        v$sql_workarea
group by
        operation_type
order by
        operation_type
;

OPERATION_TYPE             COUNT(*)
------------------------ ----------
APPROXIMATE NDV                   1
BUFFER                          130
CONNECT-BY (SORT)                10
GROUP BY (HASH)                  12
GROUP BY (SORT)                 240
HASH-JOIN                       466
IDX MAINTENANCE (SORT)           39
LOAD WRITE BUFFERS               10
RANGE TQ COMPUTATION             13
RESULT CACHE                      4
SORT (v1)                        10
SORT (v2)                       147
WINDOW (SORT)                    35

The clue isn’t in the query, it’s in what’s missing from the result set, so here’s some code to create and query some data to make it easier to spot the anomaly:

rem
rem     Script:         hash_agg.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2007
rem

create table t1
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        lpad(mod(rownum-1,1000),6)      small_vc_K,
        lpad(rownum-1,6)                small_vc_M
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

set serveroutput off

prompt  ===========================
prompt  1000 distinct values (hash)
prompt  ===========================

select
        /*+ gather_plan_statistics 1000 */
        count(*)
from
        (
        select  /*+ no_merge */
                distinct small_vc_K
        from
                t1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  ===========================
prompt  1000 distinct values (sort)
prompt  ===========================

select
        /*+ gather_plan_statistics 1000 */
        count(*)
from
        (
        select  /*+ no_merge no_use_hash_aggregation */
                distinct small_vc_K
        from
                t1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

I’ve added the /*+ gather_plan_statistics */ hint to the query so that I can check on the rowsource execution stats and (particularly) the memory and/or temporary space used; and I’ve blocked hash aggregation in one of the two queries, so I expect to see a “hash unique” operation in the first query and a “sort unique” operation in the second. Here’s what I get from 19.11.0.0:


===========================
1000 distinct values (hash)
===========================

  COUNT(*)
----------
      1000

SQL_ID  1baadqgv02r6b, child number 0
-------------------------------------
select  /*+ gather_plan_statistics 1000 */  count(*) from  (  select
/*+ no_merge */   distinct small_vc_K  from   t1  )

Plan hash value: 171694178

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   765 (100)|      1 |00:00:00.07 |    2637 |   2632 |       |       |          |
|   1 |  SORT AGGREGATE      |      |      1 |      1 |            |      1 |00:00:00.07 |    2637 |   2632 |       |       |          |
|   2 |   VIEW               |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.07 |    2637 |   2632 |       |       |          |
|   3 |    HASH UNIQUE       |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.07 |    2637 |   2632 |  1889K|  1889K| 1417K (0)|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   405  (17)|   1000K|00:00:00.04 |    2637 |   2632 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

17 rows selected.

===========================
1000 distinct values (sort)
===========================

  COUNT(*)
----------
      1000

SQL_ID  a66rqhgw7a6dk, child number 0
-------------------------------------
select  /*+ gather_plan_statistics 1000 */  count(*) from  (  select
/*+ no_merge no_use_hash_aggregation */   distinct small_vc_K  from
t1  )

Plan hash value: 1750119335

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   765 (100)|      1 |00:00:00.22 |    2637 |       |       |          |
|   1 |  SORT AGGREGATE      |      |      1 |      1 |            |      1 |00:00:00.22 |    2637 |       |       |          |
|   2 |   VIEW               |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.22 |    2637 |       |       |          |
|   3 |    SORT UNIQUE       |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.22 |    2637 | 48128 | 48128 |43008  (0)|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   405  (17)|   1000K|00:00:00.02 |    2637 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Look back at the summary of v$sql_workarea. Can you now spot something that might be missing? Since we’ve now got two execution plans with their sql_ids, let’s run a much more precise query against the view.

select
        sql_id, child_number, operation_id, operation_type ,
        total_executions, last_memory_used/1024 last_mem_kb
from
        v$sql_workarea
where
        sql_id in (
                '1baadqgv02r6b',
                'a66rqhgw7a6dk'
)
order by
        sql_id, child_number, operation_id
/

SQL_ID        CHILD_NUMBER OPERATION_ID OPERATION_TYPE       TOTAL_EXECUTIONS LAST_MEM_KB
------------- ------------ ------------ -------------------- ---------------- -----------
1baadqgv02r6b            0            3 GROUP BY (HASH)                     1       1,417
a66rqhgw7a6dk            0            3 GROUP BY (SORT)                     1          42

The first thing to note, of course, is that the sort aggregate at operation 1 doesn’t report a workarea at all; but this shouldn’t be a surprise, the operation is simply counting rows as they arrive, there’s no real sorting going on.

Operation 3 in both cases is the more interesting one. In the plan it’s reported as a “hash/sort unique” but the workarea operation has changed this to a “group by (hash/sort)”.  It’s a little detail that probably won’t matter to most people most of the time – but it’s the sort of thing that can waste a couple of hours of time when you’re trying to analyze a performance oddity.
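
To see the naming mismatch side by side, one option is to join the plan rows to the workarea rows. The following is a minimal sketch, assuming the two sql_ids reported above are still in the library cache; it uses only the standard columns of v$sql_plan and v$sql_workarea, and the column aliases are my own.

```sql
select
        p.sql_id, p.child_number, p.id,
        p.operation || ' ' || p.options         plan_operation,
        w.operation_type                        workarea_operation,
        round(w.last_memory_used/1024)          last_mem_kb
from
        v$sql_plan      p,
        v$sql_workarea  w
where
        p.sql_id in ('1baadqgv02r6b', 'a66rqhgw7a6dk')
and     w.sql_id        = p.sql_id
and     w.child_number  = p.child_number
and     w.operation_id  = p.id          -- workarea rows are keyed by plan line id
order by
        p.sql_id, p.child_number, p.id
;
```

Only plan lines that own a workarea will appear, so the sort aggregate at operation 1 should be absent from the output.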

And another thing …

You’ll have noticed, by the way, that the hash unique operation actually demanded far more memory than the sort unique operation; 1.4MB compared to 42KB. In a large production system this may be fairly irrelevant, especially since the discrepancy tends to disappear for higher volumes, and hardly matters if the operation spills to disc. However, if you have a large number of processes doing a lot of small queries using hash aggregation you may find that cumulatively they use up an unexpected fraction of your pga_aggregate_target.
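
A rough way to check whether this matters on a given system is to look at the workareas that are active at a moment in time. This is a sketch rather than a monitoring tool: it assumes you can query v$sql_workarea_active, and it simply totals the memory by operation type for whatever happens to be running when you execute it.

```sql
select
        operation_type,
        count(*)                                active_workareas,
        round(sum(actual_mem_used)/1048576,1)   actual_mem_mb,
        round(sum(max_mem_used)/1048576,1)      max_mem_mb
from
        v$sql_workarea_active
group by
        operation_type
order by
        sum(actual_mem_used) desc
;
```

Remember that the hash unique will be reported here as GROUP BY (HASH), so the two operations cannot be separated by this query alone.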

There is some scope (with the usual “confirm with Oracle support” caveat) for modifying this behaviour with a fix_control:

16792882    QKSFM_COMPILATION_16792882  Cardinality threshold for hash aggregation(raised to power of 10)

alter session set "_fix_control"='16792882:3';

The default value is 0, the maximum value seems to be 3 and the description about raising to power of 10 seems to be the wrong way round, but I haven’t done any careful testing. When I set the value to 3 the hash unique with an estimated output of 1,000 switched to a sort unique (hence my suggestion about 10^N rather than N^10).
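
If you want to experiment with this, here is a sketch of how you might check the current setting for your session and restrict the change to a single statement. The first query uses the standard v$session_fix_control view; the opt_param() hint in the second is an assumption on my part that you should verify on your own version, and the usual “confirm with Oracle support” caveat still applies.

```sql
-- current value of the fix control for this session
select
        bugno, value, sql_feature, description
from
        v$session_fix_control
where
        bugno      = 16792882
and     session_id = sys_context('userenv','sid')
;

-- statement-level alternative to "alter session" (assumed to be honoured)
select
        /*+ opt_param('_fix_control' '16792882:3') */
        count(*)
from
        (
        select  /*+ no_merge */
                distinct small_vc_K
        from
                t1
        )
;
```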

Footnote

In the previous note I pointed out that Oracle would use a single sort group by operation rather than a consecutive hash group by / sort order by pair of operations. This simpler example helps to explain why. If you check the CBO trace files for the two you can check the costs of the inline aggregate view.

From a 19.11.0.0 trace file here are two costing fragments for the hash group by strategy; the first bit is the cost of the tablescan that acquires the rows (operation 4) the second is the final cost of the inline view / query block:


  Best:: AccessPath: TableScan
         Cost: 404.639881  Degree: 1  Resp: 404.639881  Card: 1000000.000000  Bytes: 0.000000

Final cost for query block SEL$2 (#2) - All Rows Plan:
  Best join order: 1
  Cost: 764.843155  Degree: 1  Card: 1000000.000000  Bytes: 7000000.000000
  Resc: 764.843155  Resc_io: 337.000000  Resc_cpu: 1069607888
  Resp: 764.843155  Resp_io: 337.000000  Resc_cpu: 1069607888

I won’t show you the equivalent extracts for the example where I blocked hash aggregation because the numbers are identical. So there’s no way that Oracle will want to do hash group by followed by sort order by, when it can do just a single sort group by that costs exactly the same as the hash group by operation alone. This is a shame, and a little ironic, because when Oracle Corp introduced hash aggregation they made a bit of a fuss about how much more efficient it was than sorting – but it looks like no-one told the optimizer about this.
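
For anyone who wants to reproduce the trace fragments above, this is a sketch of one common way to generate a CBO (10053) trace. The tracefile identifier and the extra text in the hint comment are arbitrary choices of mine; the only purpose of the changed comment is to produce a new sql_id, because the trace is written only when the statement is actually optimized.

```sql
alter session set tracefile_identifier = 'hash_agg';
alter session set events '10053 trace name context forever, level 1';

-- changed comment text forces a hard parse of an otherwise identical query
select
        /*+ gather_plan_statistics trace_me */
        count(*)
from
        (
        select  /*+ no_merge */
                distinct small_vc_K
        from
                t1
        )
;

alter session set events '10053 trace name context off';

-- where did the trace file go?
select  value
from    v$diag_info
where   name = 'Default Trace File'
;
```

Search the trace file for “Final cost for query block” to find the section quoted above.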

January 1, 2022

Happy New Year

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,Transformations — Jonathan Lewis @ 12:01 am GMT Jan 1,2022

Here’s an entertaining little bug that appeared on the oracle-l list server just in time to end one year and start another in a suitable way. The thread starts with an offering from Matthias Rogel (shown below with some cosmetic changes) to be run on Oracle 12.2.0.1:

rem
rem     Script:         group_by_bug.sql
rem     Author:         Matthias Rogel  / Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             19.11.0.0       Fixed
rem             12.2.0.1        Fail
rem

create table t as (
        select date'2021-12-30' d from dual 
        union all 
        select date'2021-12-31'   from dual
);

select extract(year from d), count(*) from t group by extract(year from d);

alter table t add primary key(d);
select extract(year from d), count(*) from t group by extract(year from d);

This doesn’t look particularly exciting – I’ve created a table with two rows holding two dates in the same year, then counted the number of rows for “each” year before and after adding a primary key on the date column. Pause briefly to think about what the results might look like …

Table created.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          2

1 row selected.


Table altered.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          1
              2021          1

2 rows selected.

After adding the primary key (with its unique index) the result changes to something that is clearly (for this very simple data set) wrong.

At this point I offered a hypothetical reason why Oracle might be producing the wrong result, but Tim Gorman was one step ahead of me and supplied a bug reference from MOS: Wrong Result Using GROUP BY with EXTRACT Function Against DATE (Doc ID 2629968.1)

The MOS document describes this as a bug introduced in the upgrade from 12.1.0.2 to 12.2.0.1, demonstrates the error with the extract() function applied to a date, and supplies three possible workarounds (but not the workaround or explanation I supplied in my response on oracle-l).

The document also pointed to a further bug note that described how the problem also appeared with the combination of the to_char() function applied to a date column with a unique index: 12.2 Wrong Results from Query with GROUP BY Clause Based on To_char Function of Unique Index Columns (Doc ID 2294763.1) with a further suggestion for applying a patch (see MOS Doc ID: 26588069.8) or upgrading to 18.1 (where the bug has been fixed).

Matthias Rogel supplied a follow-up demonstrating the problem with to_char(), which prompted me to create an example showing that it wasn’t just about dates – which I’ve tidied up below (reminder, all results on this page are from 12.2.0.1):

create  table t1 as 
select  round(rownum/10,1) n1 
from    all_objects 
where   rownum <= 10
;


select n1 from t1 order by n1;
select n1, count(*) from t1 group by n1 order by n1;

column nch format A3

select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

alter table t1 add constraint t1_pk primary key(n1);
select to_char(n1,'99') nch , count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

As before I’ve created a simple table, and populated it with a few rows of data. The first two queries are there to show you the data (0.1 to 1.0 by steps of 0.1), and show that aggregating the raw data produces one row per value.

I’ve then repeated the aggregation query, but converted each value to a character string that effectively rounds the value to an integer. Here are the two sets of results, before and after adding the primary key.

NCH   COUNT(*)
--- ----------
  0          4
  1          6

2 rows selected.

Table altered.

NCH   COUNT(*)
--- ----------
  0          1
  0          1
  0          1
  0          1
  1          1
  1          1
  1          1
  1          1
  1          1
  1          1

10 rows selected.

Again, the introduction of the primary key constraint on the column results in wrong results. In this example, though, I’ve pulled the execution plans from memory along with their outlines, and this is what the two plans look like.

SQL_ID  gt5a14jb0g4n0, child number 0
-------------------------------------
select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99')
order by 1,2

Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    10 |    30 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    10 |    30 |     4  (50)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |    10 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


SQL_ID  4fxxtmrh8cpzp, child number 0
-------------------------------------
select to_char(n1,'99') nch , count(*) from t1 group by
to_char(n1,'99') order by 1,2

Plan hash value: 1252675504

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     2 (100)|          |
|   1 |  SORT ORDER BY   |       |    10 |    30 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9BB7A81A")
      ELIM_GROUPBY(@"SEL$47952E7A")
      OUTLINE(@"SEL$47952E7A")
      ELIM_GROUPBY(@"SEL$1")
      OUTLINE(@"SEL$1")
      INDEX(@"SEL$9BB7A81A" "T1"@"SEL$1" ("T1"."N1"))
      END_OUTLINE_DATA
  */

In the absence of the primary key (index) Oracle does a full tablescan, then hash group by, then sort order by. When the primary key is put in place Oracle does an index full scan (which is legal because the index must contain all the data thanks to the not null declaration inherent in a primary key) and a sort order by without any group by.

You might wonder if the problem arises because Oracle assumes the indexed path somehow means the aggregation doesn’t apply – but with a /*+ full(t1) */ hint in place and a full tablescan in the plan the aggregation step is still missing — and if you look at the Outline Data section of the plan you can see that this is explicitly demanded by the hint(s): /*+ elim_groupby() */

My hypothesis (before I read the bug note) was that the optimizer had picked up the primary key declaration and seen that n1 was unique and therefore allowed the aggregating group by to be eliminated, but failed to “notice” that the to_char() – or extract() in the date example – meant that the assumption of uniqueness was no longer valid. To work around this problem very locally I simply added the hint /*+ no_elim_groupby */ (with no query block specified) to the query – and got the correct results.
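
For reference, this is what the hinted version of the test query looks like. It is just the earlier to_char() query against t1 with the hint added; on an affected 12.2.0.1 system it should bring the group by operation back into the plan and return the two-row result (counts of 4 and 6) that appeared before the primary key was added.

```sql
select
        /*+ no_elim_groupby */
        to_char(n1,'99') nch, count(*)
from
        t1
group by
        to_char(n1,'99')
order by
        1,2
;

-- check that the aggregation step is back in the plan
select * from table(dbms_xplan.display_cursor(format =>'outline'));
```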

Footnote

There is an interesting side note to this example (though not one that I would want to see used in a production system – this comment is for interest only). If you look at the Outline Data for the plan when there was no primary key you’ll notice that the only outline_leaf() is named sel$1 whereas in the plan with the primary key sel$1 appears as an outline() and the only outline_leaf() is named sel$9bb7a81a. An “outline leaf” is a query block that was used by the optimizer in constructing the final plan, while an “outline” is an intermediate query block that was examined before being transformed into another query block. So this difference in the Outline Data tells us that the problem appears thanks to a transformation that did not happen when there was no index – so what would our execution plan look like if the only hint we used in the query was /*+ outline_leaf(@sel$1) */ ?

SQL_ID  apgu34hc3ap7f, child number 0
-------------------------------------
select /*+ outline_leaf(@sel$1) */ to_char(n1,'99') nch , count(*) from
t1 group by to_char(n1,'99') order by 1,2

Plan hash value: 3280011052

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY    |       |    10 |    30 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY   |       |    10 |    30 |     3  (67)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


This posting was scheduled to launch at 00:01 GMT on 1st January 2022. Happy new year – just be careful that you don’t try to extract() or to_char() it if you’re running 12.2.0.1 unless you’ve applied patch 26588069.

December 22, 2021

Explain Plan

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:26 pm GMT Dec 22,2021

Here’s a little example that appeared on the Oracle database forum a few years ago (2013, 11.2.0.3 – it’s another of my drafts that waited a long time for completion) which captures a key problem with execution plans:

you need to make sure you look at the right one.

We have the tkprof output from an SQL statement that has been traced because it needs to go faster.


select clndr_id , count(*)
from
 task where (clndr_id = :"SYS_B_0") group by clndr_id
union
select clndr_id , count(*) from project where (clndr_id = :"SYS_B_1") group by clndr_id

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.01       0.00          0          0          0           0
Fetch        2     53.32     612.03      81650      58920          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     53.34     612.04      81650      58920          0           2

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 34  (PX)
Number of plan statistics captured: 1


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         2          2          2  SORT UNIQUE (cr=58923 pr=81650 pw=22868 time=113329109 us cost=58277 size=24 card=2)
         2          2          2   UNION-ALL  (cr=58923 pr=81650 pw=22868 time=113329001 us)
         1          1          1    SORT GROUP BY NOSORT (cr=58330 pr=81070 pw=22868 time=104312437 us cost=58128 size=7 card=1)
   5589739    5589739    5589739     VIEW  index$_join$_003 (cr=58330 pr=81070 pw=22868 time=619784236 us cost=57240 size=38875249 card=5553607)
   5589739    5589739    5589739      HASH JOIN  (cr=58330 pr=81070 pw=22868 time=617373467 us)
   5590158    5590158    5590158       INDEX RANGE SCAN NDX_TASK_CALENDAR (cr=21676 pr=21676 pw=0 time=113637058 us cost=11057 size=38875249 card=5553607)(object id 24749)
   6673774    6673774    6673774       INDEX FAST FULL SCAN NDX_TASK_PROJ_RSRC (cr=36651 pr=36526 pw=0 time=213370625 us cost=21921 size=38875249 card=5553607)(object id 217274)
         1          1          1    SORT GROUP BY NOSORT (cr=593 pr=580 pw=0 time=9016527 us cost=149 size=17 card=1)
    136390     136390     136390     INDEX FAST FULL SCAN NDX_PROJECT_CALENDAR (cr=593 pr=580 pw=0 time=165434 us cost=132 size=2315876 card=136228)(object id 154409)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      2   SORT (UNIQUE)
      2    UNION-ALL
      1     SORT (GROUP BY NOSORT)
5589739      TABLE ACCESS   MODE: ANALYZED (BY INDEX ROWID) OF 'TASK' (TABLE)
5589739       INDEX   MODE: ANALYZED (RANGE SCAN) OF 'NDX_TASK_CALENDAR' (INDEX)
5590158     SORT (GROUP BY NOSORT)
6673774      INDEX   MODE: ANALYZED (RANGE SCAN) OF 'NDX_PROJECT_CALENDAR' (INDEX)

One of the first things you should notice from the tkprof output is that the “Rowsource Operation” section and the “Execution Plan” section do not match. Remember that tkprof is simply calling explain plan when it generates the “Execution Plan” section of the output and has no information about the type and value of bind variables, so it’s easy for it to generate a plan that didn’t happen. (There are other reasons why the two sets of output might differ – but this is the commonest one.)

Another thing you might note in passing is that the system has cursor_sharing set to force or similar – a detail you can infer from the bind variable names having the form :SYS_B_nnn. This, alone, might be enough to convince you to ignore the Execution Plan because of its potential to mislead.

Despite the tendency to mislead there is (in this case) a very important clue in the Execution Plan. In lines 2 and 3 we can see the “sort unique” and “union all” that the optimizer has used to implement the UNION operator in the query. Then we see that the “union all” has two “sort group by (nosort)” children, one for each of the aggregate query blocks – the “nosort” in both cases because the query blocks ensure that only a single value of clndr_id is selected in each case anyway.

The interesting thing in the Execution Plan is that the range scan of ndx_task_calendar in the first query block tells us that the optimizer thinks that all the information we need can be found in that index. So why does the Rowsource Operation tell us that at run-time the optimizer thought it needed to include the ndx_task_proj_rsrc index in an index join as well?

Assuming we haven’t found a bug the answer must be that there is another predicate hidden somewhere behind the query. There may be a security predicate, or it may simply be that task is actually a view with a definition like “select {list of columns} from task_t where {some predicate}”.

The mismatch between Execution Plan and Rowsource Operation gives us a clue, but the output from tkprof (and even the underlying trace file) is incomplete – we need to see what predicates Oracle has used to execute this query, and where it used them. So here’s the plan for the query pulled from memory by a call to dbms_xplan.display_cursor():

--------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name                 | E-Rows |  OMem |  1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                      |        |       |       |          |         |
|   1 |  SORT UNIQUE              |                      |      2 |  2048 |  2048 | 2048  (0)|         |
|   2 |   UNION-ALL               |                      |        |       |       |          |         |
|   3 |    SORT GROUP BY NOSORT   |                      |      1 |       |       |          |         |
|*  4 |     VIEW                  | index$_join$_003     |   5553K|       |       |          |         |
|*  5 |      HASH JOIN            |                      |        |   207M|    11M|  176M (1)|     181K|
|*  6 |       INDEX RANGE SCAN    | NDX_TASK_CALENDAR    |   5553K|       |       |          |         |
|*  7 |       INDEX FAST FULL SCAN| NDX_TASK_PROJ_RSRC   |   5553K|       |       |          |         |
|   8 |    SORT GROUP BY NOSORT   |                      |      1 |       |       |          |         |
|*  9 |     INDEX FAST FULL SCAN  | NDX_PROJECT_CALENDAR |    136K|       |       |          |         |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("CLNDR_ID"=:SYS_B_0)
   5 - access(ROWID=ROWID)
   6 - access("CLNDR_ID"=:SYS_B_0)
   7 - filter("DELETE_SESSION_ID" IS NULL)
   9 - filter(("CLNDR_ID"=:SYS_B_1 AND "DELETE_SESSION_ID" IS NULL))

And there in the Predicate Information section we see two extra predicates “delete_session_id is null” – one on each table. Presumably the index ndx_project_calendar includes this column, but ndx_task_calendar does not – hence the need for the index join.
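
A sketch of the kind of call that produces output like the plan above, assuming the cursor is still in the library cache. The substitution variable m_sql_id is a hypothetical name of mine, and the text used to find the statement in v$sql is simply the start of the traced query.

```sql
-- find the cursor
select  sql_id, child_number
from    v$sql
where   sql_text like 'select clndr_id , count(*)%'
;

-- report its plan, including the Predicate Information section
select  *
from    table(dbms_xplan.display_cursor('&m_sql_id', null, 'allstats last'))
;
```

Unlike explain plan, this reports the plan that was actually used at run-time, so any predicates generated behind the scenes will be visible.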

Given that the predicate has been attached to both tables, it seems fairly likely (from an outsider’s perspective) that this is row-level security (RLS / FGAC / VPD) in place. Regardless of whether it is RLS, or just a view layer, it looks like the only options to improve the performance of this query are to persuade it into using a full tablescan, or to find an alternative index access path (which might mean adding delete_session_id to the index it’s already using – and checking whether this change would have any nasty side effects).
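
If you have access to the data dictionary, here is a sketch of how you might distinguish between the two possibilities. It assumes the objects really are called TASK and PROJECT (possibly in more than one schema), and m_sql_id is again a hypothetical substitution variable for the sql_id of the traced statement.

```sql
-- are TASK and PROJECT tables, views, or synonyms?
select  owner, object_name, object_type
from    dba_objects
where   object_name in ('TASK', 'PROJECT')
;

-- are there any row-level security policies on them?
select  object_owner, object_name, policy_name, pf_owner, sel, enable
from    dba_policies
where   object_name in ('TASK', 'PROJECT')
;

-- which policy predicates were applied to this particular cursor?
select  object_name, policy, predicate
from    v$vpd_policy
where   sql_id = '&m_sql_id'
;
```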

Footnote

A potentially significant, but easy to miss detail is the “Misses in library cache during parse:” – this statement had to be “hard parsed” (i.e. optimised) before execution; if you see a trace file where a statement has executed and parsed many times and the Misses is greater than one then it’s worth asking yourself why that might be.
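
As a starting point for that investigation, this sketch reports a few of the standard v$sql counters for one statement; m_sql_id is a hypothetical substitution variable. A high value for loads or invalidations relative to executions is a hint that the cursor is being re-optimized more often than you might expect.

```sql
select
        sql_id, child_number,
        parse_calls, executions,
        loads, invalidations,
        first_load_time, last_load_time
from
        v$sql
where
        sql_id = '&m_sql_id'
order by
        child_number
;
```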

December 1, 2021

Best Practice

Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 7:08 pm GMT Dec 1,2021

This showed up recently on the Oracle Developer Forum.

For a while I’m questioning myself about a best practice query for performance.

I’ve 2 tables: EMPLOYEES and DEPARTMENTS. The EMPLOYEES table can contains millions of records. Which query is better for performance ?

Select t1.emp_id, t1.emp_name, t2.dept_name
from EMPLOYEES t1, DEPARTMENTS t2
where t1.dept_id = t2.dept_id
order by t1.emp_name

Select emp_id, emp_name, (select mypackage.get_dept_name(dept_id) from dual) as dept_name
from EMPLOYEES
order by emp_name

The nice thing about this question is the correct use of English grammar; it uses “better” rather than “best” (the former is correct when comparing two things, the latter requires at least three things to be compared). On the down-side, any suggestion that a single pattern might be “best practice” for performance should be viewed with great caution; it’s far too easy for the less experienced to translate “this is a good idea for this query” as “you should always do this”. At best it’s possible to identify patterns at the edges of style where you could say either: “this usually works well enough” or “you shouldn’t do this unless it’s really necessary”.

So I’m going to ignore the generic question and pretend that the question was supposed to be about the performance of the specific case; and then I’m going to complain that the question is badly posed because it omits several pieces of information that are likely to be relevant.

Some points to consider as background before looking too closely into performance:

  • We might expect that there is a referential integrity (RI) constraint between departments and employees. We might also expect to see emp.dept_id declared as not null – if it’s not then do we really want employees who are not yet assigned to a department to disappear from the report. Maybe the join should be an outer join.
  • How many departments are there. If there are millions of employees might there be thousands, or even tens of thousands, of departments? That’s going to make a difference to performance and (depending on strategy) to the consistency of the performance.
  • How important is it to squeeze the last bit of performance from this query. An Oracle database can perform perfectly well with millions of rows but how rapidly can the network transfer the rows from the database to the client, and how well is the client process going to deal with such a large result set. (Why does the client want to run a query that returns millions of rows – is the client then going to do something, after a long and tedious data transfer, that should have been done in 1/100th of the time in the database to supply a much smaller result set).
  • How competent are the people who may be required to support the query once it has gone into production. Should you follow the principle of preferring simplicity to optimum performance. What’s the difference in (database) performance between the better performance and the one that’s easier to understand?
  • Is there any reason why the query must not use parallel execution. Depending on the coding strategy adopted it’s possible that the slower serial query becomes the faster parallel query, and parallelism hasn’t even been mentioned.
  • Do you think that the version of Oracle may change the relative performance of different strategies for the query? (The correct answer is “yes”, and the query that uses a function call and scalar subquery is the one that’s more likely to be affected by future tweaks to the Oracle internal processing.)
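
To illustrate the first point in the list above: if dept_id is allowed to be null and unassigned employees should still appear in the report, the join would have to become an outer join. This is a sketch using the table and column names from the original question, not a recommendation about which form performs better.

```sql
select
        e.emp_id, e.emp_name, d.dept_name
from
        employees       e
left join
        departments     d
on      d.dept_id = e.dept_id
order by
        e.emp_name
;
```

With the simple (inner) join, any employee with a null dept_id would silently disappear from the result.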

So let’s look at specifics.

I’m going to ignore the option of using a PL/SQL function to generate a department name from an id. Possibly the hope was that using it would reduce the number of times a piece of SQL was executed against the dept table; but you shouldn’t use the public result cache for such a lightweight call, particularly when the session’s scalar subquery cache would probably get all the benefit that was available anyway.

Assume, then, that the referential integrity and not null declarations are in place, and that the number of departments is vaguely commensurate with the millions of employees. I’ve set up a little test with 50,000 rows in an “emp” table and 200 rows in a “dept” table just to have some concrete results to prompt a few thoughts.
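
The script behind this test isn’t shown, so the following is only a sketch of a data set with the same shape: 200 departments, 50,000 employees, a not null dept_id and the referential integrity constraint in place. The table and column names (and dept_pk) match the plans in this note, but the generated values, the emp_pk and emp_fk_dept names, and the distribution of dept_id are my own assumptions, so the buffer counts and Starts figures will not match the ones reported here.

```sql
create table dept
as
select
        rownum                          id,
        lpad(rownum,20,'0')             dept_name
from
        dual
connect by
        level <= 200
;

alter table dept add constraint dept_pk primary key(id);

create table emp
as
select
        rownum                          id,
        lpad(rownum,10,'0')             ename,
        1 + mod(rownum - 1, 200)        dept_id         -- cycles through 1 to 200
from
        dual
connect by
        level <= 5e4
;

alter table emp modify dept_id not null;
alter table emp add constraint emp_pk primary key(id);
alter table emp add constraint emp_fk_dept foreign key(dept_id) references dept(id);

execute dbms_stats.gather_table_stats(user,'dept')
execute dbms_stats.gather_table_stats(user,'emp')
```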

Given the nature of the query (no predicates to limit data selected) the simple join with “order by” probably has only one sensible path:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |  50000 |00:00:00.03 |     953 |       |       |          |
|   1 |  SORT ORDER BY      |      |      1 |  50000 |  50000 |00:00:00.03 |     953 |  3738K|   834K| 3322K (0)|
|*  2 |   HASH JOIN         |      |      1 |  50000 |  50000 |00:00:00.01 |     953 |  1335K|  1335K| 1566K (0)|
|   3 |    TABLE ACCESS FULL| DEPT |      1 |    200 |    200 |00:00:00.01 |       5 |       |       |          |
|   4 |    TABLE ACCESS FULL| EMP  |      1 |  50000 |  50000 |00:00:00.01 |     948 |       |       |          |
-----------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEPT"."ID"="EMP"."DEPT_ID")

I’ve run this with rowsource execution stats enabled so that I can highlight details that might suggest ways to save resources like CPU or memory consumption. This gives rise to two thoughts:

First, would the subquery method do anything to reduce the volume of data sorted – i.e. would the subquery run late; secondly, even if it didn’t run late might it reduce the work needed to translate department ids into department names.

Here’s a version of the query using a scalar subquery with the resulting execution plan:

select 
        emp.id, emp.ename,
        (select dept.dept_name from dept where dept.id = emp.dept_id) dept_name
from 
        emp
order by
         emp.ename
/

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |      1 |        |  50000 |00:00:00.05 |    5884 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPT    |   4931 |      1 |   4931 |00:00:00.01 |    4935 |       |       |          |
|*  2 |   INDEX UNIQUE SCAN         | DEPT_PK |   4931 |      1 |   4931 |00:00:00.01 |       4 |       |       |          |
|   3 |  SORT ORDER BY              |         |      1 |  50000 |  50000 |00:00:00.05 |    5884 |  3738K|   834K| 3322K (0)|
|   4 |   TABLE ACCESS FULL         | EMP     |      1 |  50000 |  50000 |00:00:00.01 |     949 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEPT"."ID"=:B1)

The plan for any query with scalar subqueries in the select list reports the subqueries at the same level as the main query and before the query – so we can see here that the scalar subquery has been executed only 4,931 times, rather than the full 50,000, and that’s the benefit of scalar subquery caching. The execution count is in the thousands rather than being 200 (number of departments) because the cache is rather small and when hash values for department ids collide only the first one into the cache is retained (the cache doesn’t use linked lists to handle collisions).
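To make the effect of collisions concrete, here is a minimal Python sketch of a fixed-size cache that keeps only the first value to land on each hash slot. The slot counts and the modulo "hash" are assumptions chosen purely for illustration; they are not Oracle's actual cache size or hash function, and the model ignores any other optimisation Oracle may apply.

```python
def count_subquery_executions(keys, cache_slots):
    """Count 'subquery executions' for a stream of driving keys.

    Toy model only: slot = key % cache_slots stands in for the real
    (undocumented) hash function, and cache_slots is a made-up size.
    """
    cache = {}       # slot number -> key whose result is cached there
    executions = 0
    for key in keys:
        slot = key % cache_slots
        if cache.get(slot) == key:
            continue             # cache hit: no need to run the subquery
        executions += 1          # miss: run the subquery
        if slot not in cache:
            cache[slot] = key    # first key into an empty slot is retained
        # On a collision the existing entry stays and this key is never
        # cached, so it triggers an execution every time it reappears.
    return executions


# Same pattern as the test data: dept_id = 1 + mod(rownum - 1, 200)
dept_ids = [1 + (n - 1) % 200 for n in range(1, 50_001)]

no_collisions = count_subquery_executions(dept_ids, cache_slots=256)
with_collisions = count_subquery_executions(dept_ids, cache_slots=128)
print(no_collisions, with_collisions)
```

With 256 slots each of the 200 ids gets its own slot and the count is 200. With 128 slots the ids that lose the race for a slot are re-executed on every appearance and the count rises to 18,128. The absolute numbers mean nothing for Oracle, but the shape of the result is the same as the 4,931 in the plan: far fewer than 50,000, far more than 200.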

There’s some ambiguity here, though. We can see that the Buffers statistic at operation 3 (the Sort) matches the total for the query, and consists of the value for the table scan plus the value for the subquery execution (5884 = 4935 + 949, but don’t ask me why the emp tablescan reports 949 buffers instead of the 948 previously reported) so we can’t easily tell whether Oracle executed the subquery before or after it had sorted the data from the tablescan. This could make a difference to performance since including the department name in the data to be sorted could mean using a lot more memory (and more I/O if the larger volume resulted in the sort spilling to disc). We can do a quick check with a simplified query – selecting only the department id from emp, rather than translating id to name.

select 
        emp.id, emp.ename,
        emp.dept_id
--      (select dept.dept_name from dept where dept.id = emp.dept_id) dept_name
from 
        emp
order by
         emp.ename
/

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |  50000 |00:00:00.02 |     948 |       |       |          |
|   1 |  SORT ORDER BY     |      |      1 |  50000 |  50000 |00:00:00.02 |     948 |  2746K|   746K| 2440K (0)|
|   2 |   TABLE ACCESS FULL| EMP  |      1 |  50000 |  50000 |00:00:00.01 |     948 |       |       |          |
----------------------------------------------------------------------------------------------------------------


With this simpler query the Used-Mem for the sort operation drops from 3322K to 2440K so Oracle must have been using the scalar subquery to fetch the department name before sorting. (I don’t know why Buffers for the tablescan has dropped to 948 again.)
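As a rough sanity check on that difference, the arithmetic below spreads the extra memory over the 50,000 rows sorted. It assumes that the “K” in Used-Mem means 1,024 bytes and that the whole difference is attributable to the wider rows, neither of which the plan output guarantees.

```python
rows_sorted = 50_000
used_mem_with_name_kb = 3322   # Used-Mem for the sort carrying dept_name
used_mem_with_id_kb = 2440     # Used-Mem for the sort carrying dept_id

# Assumption: 1K = 1,024 bytes in the Used-Mem column.
extra_bytes = (used_mem_with_name_kb - used_mem_with_id_kb) * 1024
extra_bytes_per_row = extra_bytes / rows_sorted
print(f"{extra_bytes_per_row:.1f} extra bytes per row")
```

That works out at roughly 18 bytes per row, which is at least plausible for replacing a small numeric id with the 20-character dept_name generated by the test script.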

So maybe it would be worth rewriting the query to sort the “narrower” data set before calling the scalar subquery:

select
        /*+ 
                qb_name(main) 
                no_merge(@inline)
                no_eliminate_oby(@inline)
        */
        v1.id, v1.ename, 
        (select /*+ qb_name(ssq) */ dept.dept_name from dept where dept.id = v1.dept_id) dept_name
from    (
        select 
                /*+ qb_name(inline) */
                emp.id, emp.ename, emp.dept_id
        from 
                emp
        order by
                emp.ename
        )       v1
/

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |      1 |        |  50000 |00:00:00.03 |     949 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPT    |   4909 |      1 |   4909 |00:00:00.01 |    5112 |       |       |          |
|*  2 |   INDEX UNIQUE SCAN         | DEPT_PK |   4909 |      1 |   4909 |00:00:00.01 |     203 |       |       |          |
|   3 |  VIEW                       |         |      1 |  50000 |  50000 |00:00:00.03 |     949 |       |       |          |
|   4 |   SORT ORDER BY             |         |      1 |  50000 |  50000 |00:00:00.02 |     949 |  2746K|   746K| 2440K (0)|
|   5 |    TABLE ACCESS FULL        | EMP     |      1 |  50000 |  50000 |00:00:00.01 |     949 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEPT"."ID"=:B1)

As you can see, my execution plan now has a View operation telling us that the Sort operation completed inside a non-mergeable view; and we can see that the Buffers and the Used-Mem statistics for the Sort operation have stayed at the 948(+1 again) buffers and 2440K of the simpler query. Of course, the need for the data (i.e. sorted rowsource) to pass up the plan through the view operation means a little extra CPU at that point before we start executing the subquery. By a lucky fluke the hash collisions on the scalar subquery cache have dropped slightly because the department ids are arriving in a different order – this can be a good thing but, as you will have seen in the linked article above on filter subqueries, the change of order could have been a disaster. (More articles on scalar subquery caching – and deterministic functions – linked to from this URL)

You might also note the odd little glitch in the reporting – the final figure for Buffers has “lost” the result from the scalar subquery. (And this was on 19.11.0.0 – so it’s not an old error.)

So it seems we can do better than either of the options supplied in the original post. But the important question then is whether we should adopt that strategy or not. And the answer in many cases will be “probably not, unless it’s in very special circumstances and carefully documented”.

I’ve used a no_merge() hint and the undocumented no_eliminate_oby() hint; and that happens to have done what I wanted in the release of Oracle I was using; but in principle the optimizer could unnest the “@qb_name(ssq)” scalar subquery and produce a join between the “@qb_name(inline)” inline view and the unnested table – and that might cause the ordered data to become disordered (even though the sort order by operation would still be taking place). That’s a little unlikely, of course, but if it did happen someone would have to notice and deal with an urgent rewrite. (Strangely the reason why the optimizer doesn’t do this unnest in 19c and 21c is because – according to the 10053 trace file – the subquery “might return more than one row”, even though it’s selecting by equality on primary key!)

Summary Observations

Resist the temptation to ask for “best practices” for writing performant SQL. There are no immediately practical answers to the question (though there may be a couple of suggestions about how not to write your SQL).

You don’t necessarily want to put the more performant query into production.

Footnote

If you want to play around with this model, here’s the script I used to generate the data and run the test:

rem
rem     Script:         emp_dept_optimise.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2021
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem

define m_dept=200

drop table emp;
drop table dept;

execute dbms_random.seed(0)

create table dept as
select 
        rownum id, 
        lpad(dbms_random.value,20) dept_name, 
        rpad('x',100) padding
from
        all_objects
where 
        rownum <= &m_dept
/

alter table dept add constraint dept_pk primary key(id);

create table emp as
select 
        rownum id, 
        1 + mod(rownum-1,&m_dept) dept_id,
        lpad(dbms_random.value,20) ename, 
        rpad('x',100) padding
from
        all_objects
where
        rownum <= 50000
/

alter table emp add constraint emp_pk primary key (id)
/

alter table emp add constraint emp_fk_dept foreign key(dept_id) references dept
/

set linesize 180
set arraysize 250

set termout off
set serveroutput off

alter session set statistics_level = all;

select 
        emp.id, emp.ename,
        emp.dept_id
--      (select dept.dept_name from dept where dept.id = emp.dept_id) dept_name
from 
        emp
order by
         emp.ename
/

spool emp_dept_optimise.lst

prompt  ============================================
prompt  For a baseline on memory, query with dept_id
prompt  ============================================

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
spool off

select 
        emp.id, emp.ename,
        dept.dept_name
from 
        emp,
        dept
where
        dept.id = emp.dept_id
order by
         emp.ename
/

spool emp_dept_optimise.lst append

prompt  ======================================
prompt  For a simple join between emp and dept
prompt  ======================================

select * from table(dbms_xplan.display_cursor(format=>'allstats last projection'));
spool off

select 
        emp.id, emp.ename,
--      emp.dept_id
        (select dept.dept_name from dept where dept.id = emp.dept_id) dept_name
from 
        emp
order by
         emp.ename
/

spool emp_dept_optimise.lst append

prompt  ==============================================
prompt  Now switching to the dept_name scalar subquery
prompt  ==============================================

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
spool off



select
        /*+ 
                qb_name(main) 
                no_merge(@inline)
                no_eliminate_oby(@inline)
        */
        v1.id, v1.ename, 
        (select /*+ qb_name(ssq) */ dept.dept_name from dept where dept.id = v1.dept_id) dept_name
from    (
        select 
                /*+ qb_name(inline) */
                emp.id, emp.ename, emp.dept_id
        from 
                emp
        order by
                emp.ename
        )       v1
/

spool emp_dept_optimise.lst append

prompt  ==================================================
prompt  Now with the dept_name scalar subquery 'postponed'
prompt  ==================================================

select * from table(dbms_xplan.display_cursor(format=>'allstats last outline'));
spool off

set serveroutput on
alter session set statistics_level = typical;

November 12, 2021

Index Conspiracy

Filed under: Execution plans,humour,Indexing,Oracle — Jonathan Lewis @ 3:28 pm GMT Nov 12,2021

A little light entertainment – but with a tiny bit of information that’s worth knowing – for a Friday evening. This is a demo I used at IOUG 2003 to warm the audience up before the main event.

We start with a table and two indexes – and a clunky little query just to show the index names and object ids.

rem
rem     Script:         index_conspiracy_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2003 / Nov 2021
rem     Purpose:        Demonstrate the anti-microsoft conspiracy.
rem

create table t1 
as
select 
        rownum          n1, 
        rownum          n2,
        rpad('x',10)    small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 3000
;

create index first_col_index on t1(n1);
create index microsoft_index on t1(n2);

select object_id, table_name, index_name
from
        (
        select  object_id, object_name
        from    user_objects
        where   object_type = 'INDEX'
        )       v1,
        (
        select table_name, index_name
        from   user_indexes
        where  table_name = 'T1'
        )       v2
where
        object_name = index_name
order by 
        object_id
;

You’ll notice that the n1 and n2 columns are identical and that means the corresponding indexes will have identical content, statistics and costs. So let’s use autotrace to check the plans for a few queries – we won’t be using bind variables so it’s a good bet that the plans from autotrace would be what we’d get if we actually ran the queries and pulled the plans from memory:

set autotrace traceonly explain

prompt  ==========================================
prompt  Initial Behaviour (uses "first_col_index")
prompt  ==========================================

select
        *
from    t1
where   n1 = 44
and     n2 = 44
;

At the time of the IOUG conference there was quite a lot of antipathy between Microsoft and Oracle, so it didn’t surprise anyone that the query ignored the index called “microsoft_index” and used the following plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 4128733246

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |     1 |   120 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1              |     1 |   120 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | FIRST_COL_INDEX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N2"=44)
   2 - access("N1"=44)

We can demonstrate that the microsoft_index was an appropriate index for this query by renaming it, of course, and I’ll leave it as an exercise for the user to see what happens as we go through several different names:

prompt  ============================================
prompt  Renaming "microsoft_index" to "better_index"
prompt  ============================================

alter index microsoft_index rename to better_index;

select
        *
from    t1
where   n1 = 44
and     n2 = 44
;

prompt  ============================================
prompt  Renaming "better_index" to "microsoft_index"
prompt  ============================================

alter index better_index rename to microsoft_index ;

select
        *
from    t1
where   n1 = 44
and     n2 = 44
;

prompt  =============================================
prompt  Renaming "microsoft_index" to "ellison_index"
prompt  =============================================

alter index microsoft_index rename to ellison_index ;

select
        *
from    t1
where   n1 = 44
and     n2 = 44
;

prompt  =========================================
prompt  Renaming "ellison_index" to "gates_index"
prompt  =========================================

alter index ellison_index rename to gates_index ;

select
        *
from    t1
where   n1 = 44
and     n2 = 44
;

set autotrace off

I think the version of Oracle that I demonstrated this on was a late version of 9i. The behaviour is the same in 21c.

October 15, 2021

use_nl redux

Filed under: CBO,Execution plans,Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 2:58 pm BST Oct 15,2021

A question has just appeared on a note I wrote in 2012 about the incorrect use of the use_nl() hint in some sys-recursive SQL, linking forward to an explanation I wrote in 2017 of the use_nl() hint – particularly the interpretation of the form use_nl(a,b), which does not mean “use a nested loop from table A to table B”.

The question is essentially – “does Oracle pick the join order before it looks at the hints”?

I’m going to look at one of the queries in the question (based on the 2017 table creation code) and explain how Oracle gets to the plan it finally picks. I’ll be using an instance of 21.3 in the examples below. Here’s the query, followed by the plan:

select
        /*+ use_nl(b) */
        a.v1, b.v1, c.v1, d.v1
from
        a, b, c, d
where
        d.n100 = 0
and     a.n100 = d.id
and     b.n100 = a.n2
and     c.id   = a.id
/


| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 20000 |  1054K|    78   (4)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL | D    |   100 |  1800 |    26   (4)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 20000 |   703K|    52   (4)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    | 10000 |   214K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"SEL$1" "C"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "C"@"SEL$1")
      USE_HASH(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "A"@"SEL$1")
      LEADING(@"SEL$1" "B"@"SEL$1" "A"@"SEL$1" "D"@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "D"@"SEL$1")
      FULL(@"SEL$1" "A"@"SEL$1")
      FULL(@"SEL$1" "B"@"SEL$1")
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('21.1.0')
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."ID"="A"."ID")
   3 - access("A"."N100"="D"."ID")
   4 - filter("D"."N100"=0)
   5 - access("B"."N100"="A"."N2")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   6 -  SEL$1 / "B"@"SEL$1"
         U -  use_nl(b)

Note
-----
   - this is an adaptive plan

Points to note:

  • The Hint Report says the final plan did not use the use_nl(b) hint.
  • Whatever you may think the join order is by looking at the body of the plan, the leading() hint in the Outline Data tells us that the join order was (B A D C), and that explains why the use_nl(b) hint could not be used: B was never “the next table in the join order”.
  • The “visible” order of activity displayed in the plan is C D B A, but that’s because we swap_join_inputs(D) to put D above the (B,A) join, then swap_join_inputs(C) to put C above D.

So did Oracle completely pre-empt any plans that allowed B to be “the next table”, thus avoiding the hint, or did it consider some plans where B wasn’t the first table in the join order and, if so, would it have used a nested loop into B if that plan had had a low enough cost?

The only way to answer these questions is to look at the CBO (10053) trace file. For very simple queries it’s often enough to pick out a few lines as a starting point – in my case using egrep:

egrep -e "^Join order" -e"Best so far" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Join order aborted2: cost > best plan cost
Join order[5]:  D[D]#0  C[C]#3  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[6]:  D[D]#0  C[C]#3  B[B]#2  A[A]#1
Join order aborted2: cost > best plan cost

Join order[7]:  A[A]#1  D[D]#0  B[B]#2  C[C]#3
Join order aborted2: cost > best plan cost
Join order[8]:  A[A]#1  D[D]#0  C[C]#3  B[B]#2
Join order aborted2: cost > best plan cost
Join order[9]:  A[A]#1  B[B]#2  D[D]#0  C[C]#3
Join order aborted2: cost > best plan cost
Join order[10]:  A[A]#1  C[C]#3  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[11]:  A[A]#1  C[C]#3  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost

Join order[12]:  B[B]#2  D[D]#0  A[A]#1  C[C]#3
Join order aborted2: cost > best plan cost
Join order[13]:  B[B]#2  A[A]#1  D[D]#0  C[C]#3
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
Join order[14]:  B[B]#2  A[A]#1  C[C]#3  D[D]#0
Join order aborted2: cost > best plan cost
Join order[15]:  B[B]#2  C[C]#3  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Join order[16]:  C[C]#3  D[D]#0  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[17]:  C[C]#3  A[A]#1  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[18]:  C[C]#3  A[A]#1  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost
Join order[19]:  C[C]#3  B[B]#2  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Oracle has considered 19 possible join orders (out of a maximum of 24, i.e. 4!). In theory we might see 6 plans starting with each of the 4 tables. In fact we see that the optimizer’s first choice started with table D, producing 6 join orders, then switched to starting with table A, producing only 5 join orders.

The “missing” order is (A, B, C, D) which should have appeared between join orders 9 and 10. If we check the detail in the trace file for join order 9 (A, B, D, C) we’ll see that the optimizer aborted after calculating the join from A to B because the cost had already exceeded the “Best so far” by then and didn’t even start to calculate the cost of getting to the 3rd table (D) in that join order. Clearly, then, there was no point in the optimizer considering any other join order that started with (A, B) – hence the absence of (A, B, C, D), which would otherwise have been the next in the list.
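If you want to see exactly which of the 24 permutations never appeared, a few lines of Python will do the set arithmetic. This is only a convenience sketch: the list of join orders is copied by hand from the egrep output above, not read from the trace file.

```python
from itertools import permutations

# The 19 join orders reported in the egrep output, in the order they appeared.
considered_text = """
D A B C
D A C B
D B A C
D B C A
D C A B
D C B A
A D B C
A D C B
A B D C
A C D B
A C B D
B D A C
B A D C
B A C D
B C D A
C D A B
C A D B
C A B D
C B D A
"""
considered = {tuple(line.split()) for line in considered_text.splitlines() if line.strip()}

# All 4! = 24 possible orders of the four tables.
all_orders = set(permutations("ABCD"))
missing = sorted(all_orders - considered)

print(len(considered), "considered,", len(missing), "missing")
for order in missing:
    print(" ".join(order))
```

This reports five missing orders: (A B C D), (B C A D), (B D C A), (C B A D) and (C D B A). Only the first of these is explained in the text above; the extract shown here does not say why the other four were skipped.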

I’ve highlighted all the join orders where the optimizer didn’t abort, but produced a new “Best so far” line. My original egrep command was too simple to tell the whole story, though; the “Best so far” line includes only the cost of getting data from the first table in the join order and I need to expand the output in each case to 4 lines (one for each table). This is what the 4 non-aborted summaries look like:

egrep -A+3 -e"Best so far" or21_ora_15956.trc

Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 2  cost: 30137.036118  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 30163.548157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 3  cost: 78.079517  card: 10000.000000  bytes: 550000.000000
              Table#: 2  cost: 30163.348157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 2  cost: 2483.956340  card: 1000000.000000  bytes: 32000000.000000
              Table#: 1  cost: 2530.068379  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 2556.580418  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
              Table#: 1  cost: 52.204078  card: 20000.000000  bytes: 720000.000000
              Table#: 0  cost: 78.479517  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 104.991556  card: 20000.000000  bytes: 1380000.000000

As you can see, when we start with (B A) the estimated cost drops dramatically.
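For anyone who would rather not read the four blocks by eye, here is a small Python sketch that pulls out the join order and the final (fourth-table) cost from each block. The text is pasted from the egrep output above; the mapping of table numbers to names (0=D, 1=A, 2=B, 3=C) is taken from the “Join order” lines in the trace.

```python
import re

BEST_SO_FAR = """\
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 2  cost: 30137.036118  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 30163.548157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 3  cost: 78.079517  card: 10000.000000  bytes: 550000.000000
              Table#: 2  cost: 30163.348157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 2  cost: 2483.956340  card: 1000000.000000  bytes: 32000000.000000
              Table#: 1  cost: 2530.068379  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 2556.580418  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
              Table#: 1  cost: 52.204078  card: 20000.000000  bytes: 720000.000000
              Table#: 0  cost: 78.479517  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 104.991556  card: 20000.000000  bytes: 1380000.000000
"""

TABLE_NAMES = {0: "D", 1: "A", 2: "B", 3: "C"}   # from the "Join order" lines


def final_costs(text):
    """Return (join order, cumulative cost after the last table) per block."""
    results = []
    for block in text.split("--\n"):
        rows = re.findall(r"Table#:\s*(\d+)\s+cost:\s*([\d.]+)", block)
        if rows:
            order = " ".join(TABLE_NAMES[int(num)] for num, _ in rows)
            results.append((order, float(rows[-1][1])))
    return results


summary = final_costs(BEST_SO_FAR)
for order, cost in summary:
    print(f"{order}: {cost:.2f}")
```

The last line of each block is the cumulative cost for the complete join order, which is the figure that has to fall for a new “Best so far” to be recorded.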

Now that we’ve seen that Oracle looks at many (though not a completely exhaustive set of) plans on the way to the one it picks, the thing we need to do to address the original question is check that for every single calculation where B is “the next table”, Oracle obeys our hint.

Each time the optimizer joins “the next table” its default strategy is to consider the cost of a Nested Loop join, a Sort Merge join (possibly in two different ways), and a Hash join in that order. However, if the optimizer is obeying the hint it will consider only the nested loop join. Here’s a suitable call to egrep with the first four join orders:

egrep -e "^Join order" -e "^Now joining" -e"^NL Join" -e"^SM Join" -e"^HA Join" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Now joining: A[A]#1
NL Join
SM Join
SM Join (with index on outer)
HA Join
Now joining: B[B]#2
NL Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Now joining: C[C]#3
NL Join
SM Join
HA Join
Now joining: B[B]#2
NL Join

Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Now joining: B[B]#2
NL Join
Now joining: A[A]#1
NL Join
SM Join
HA Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Now joining: C[C]#3
NL Join
Join order aborted2: cost > best plan cost

As you can see, the only join considered when the optimizer is “Now joining” B is a Nested Loop join; for all other tables all three join methods (and sometimes two variants of the Sort Merge join) are evaluated.
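The same check can be automated. The Python sketch below groups the join methods by the table named on the preceding “Now joining” line; it works on the extract pasted from above (first four join orders only), so it illustrates the idea rather than analysing the whole trace file.

```python
from collections import defaultdict

TRACE_EXTRACT = """\
Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Now joining: A[A]#1
NL Join
SM Join
SM Join (with index on outer)
HA Join
Now joining: B[B]#2
NL Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Now joining: C[C]#3
NL Join
SM Join
HA Join
Now joining: B[B]#2
NL Join

Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Now joining: B[B]#2
NL Join
Now joining: A[A]#1
NL Join
SM Join
HA Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Now joining: C[C]#3
NL Join
Join order aborted2: cost > best plan cost
"""


def methods_by_table(text):
    """Map each 'Now joining' table to the set of join methods costed for it."""
    methods = defaultdict(set)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Now joining:"):
            # "Now joining: B[B]#2" -> "B"
            current = line.split(":", 1)[1].strip().split("[")[0]
        elif line.startswith("Join order"):
            current = None       # new join order (or an abort message)
        elif current is not None and line[:7] in ("NL Join", "SM Join", "HA Join"):
            methods[current].add(line[:2])
    return dict(methods)


methods = methods_by_table(TRACE_EXTRACT)
for table in sorted(methods):
    print(table, sorted(methods[table]))
```

For this extract the only method recorded against B is NL, while A and C each show NL, SM and HA.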

You may also notice another of the clever strategies the optimizer uses to minimise its workload. On the Join order[2] the optimizer has gone straight to “Now joining C” because it has remembered the result of joining A to D from the previous join order.

This is only a very simple example and analysis, but I hope it has given you some idea of how the optimizer works, how clever it tries to be about minimising the work, and how it can obey a hint while still producing an execution plan that appears to have ignored the hint.

October 11, 2021

Adaptive Study

Filed under: Execution plans,Oracle — Jonathan Lewis @ 11:57 am BST Oct 11,2021

This is a little case study of adaptive optimisation in Oracle 19c with a surprising side-effect showing up when the optimizer gave the execution engine the option to “do the right thing” and the execution engine took it – except the “right thing” turned out to be a wrong thing.

We started with a request to the Oracle-L list server asking about the difference between the operations “table access by rowid” and “table access by rowid batched” and why changing the parameter “optimizer_adaptive_reporting_only” should make a plan switch from one to the other, and how much of a performance impact this would have because this was the only change that showed up in a plan that went from fast (enough) to very slow when the parameter was changed from true to false.

The batching (or not) of the table access really shouldn’t make much difference; the batch option tends to appear if there’s a “blocking” operation (such as a hash join) further up the execution plan, but the mechanism by which a rowsource is produced and passed up the tree is only likely to be affected very slightly. So there had to be something else going on.

Fortunately the OP had the SQL Monitor reports available from a fast / non-batched / “reporting only = true” run and a slow / batched / “reporting only = false” run. I’ve shown these below:

Fast plan (reporting only):

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)         
 Instance ID         :  2                       
 Session             :  XXXXX (510:5394) 
 SQL ID              :  791qwn38bq6gv           
 SQL Execution ID    :  33554432                
 Execution Started   :  10/07/2021 11:46:56     
 First Refresh Time  :  10/07/2021 11:46:56     
 Last Refresh Time   :  10/07/2021 11:51:36     
 Duration            :  280s                    
 Module/Action       :  SQL*Plus/-              
 Service             :  XXXXX.XXXXX.com 
 Program             :  sqlplus.exe             
 Fetch Calls         :  370                     

Global Stats
===========================================================================
| Elapsed |   Cpu   |    IO    | Cluster  | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
===========================================================================
|     252 |     170 |       71 |       11 |   370 |    39M | 251K |   2GB |
===========================================================================

SQL Plan Monitoring Details (Plan Hash Value=250668601)
===============================================================================================================================================================================
| Id |                      Operation                       |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  |
|    |                                                      |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |
===============================================================================================================================================================================
|  0 | SELECT STATEMENT                                     |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  1 |   FILTER                                             |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  2 |    NESTED LOOPS OUTER                                |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  3 |     NESTED LOOPS OUTER                               |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  4 |      HASH JOIN OUTER                                 |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  5 |       NESTED LOOPS OUTER                             |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  6 |        STATISTICS COLLECTOR                          |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  7 |         NESTED LOOPS OUTER                           |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  8 |          HASH JOIN OUTER                             |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  9 |           NESTED LOOPS OUTER                         |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 10 |            STATISTICS COLLECTOR                      |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
| 11 |             NESTED LOOPS OUTER                       |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 12 |              NESTED LOOPS OUTER                      |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 13 |               NESTED LOOPS                           |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 14 |                NESTED LOOPS OUTER                    |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 15 |                 NESTED LOOPS                         |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 16 |                  NESTED LOOPS OUTER                  |                               |    272K |    1M |       279 |     +2 |     1 |       2M |       |       |     . |
| 17 |                   NESTED LOOPS                       |                               |    272K |    1M |       279 |     +2 |     1 |       2M |       |       |     . |
| 18 |                    FILTER                            |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
| 19 |                     NESTED LOOPS OUTER               |                               |    272K |  598K |       279 |     +2 |     1 |       2M |       |       |     . |
| 20 |                      VIEW                            | index$_join$_006              |    276K | 48299 |       279 |     +2 |     1 |       2M |       |       |     . |
| 21 |                       HASH JOIN                      |                               |         |       |       279 |     +2 |     1 |       2M |       |       | 132MB |
| 22 |                        HASH JOIN                     |                               |         |       |         2 |     +1 |     1 |       2M |       |       | 124MB |
| 23 |                         INDEX STORAGE FAST FULL SCAN | TET_IX2                       |    276K |  8505 |         1 |     +2 |     1 |       2M |       |       |     . |
| 24 |                         INDEX STORAGE FAST FULL SCAN | TET_IX4                       |    276K | 13077 |         1 |     +2 |     1 |       2M |       |       |     . |
| 25 |                        INDEX STORAGE FAST FULL SCAN  | TET_PK                        |    276K | 11889 |       279 |     +2 |     1 |       2M |   149 |  62MB |     . |
| 26 |                      TABLE ACCESS BY INDEX ROWID     | TT                            |       1 |     2 |       279 |     +2 |    2M |       2M |  2347 |  18MB |     . |
| 27 |                       INDEX UNIQUE SCAN              | TT_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |    11 | 90112 |     . |
| 28 |                    TABLE ACCESS BY INDEX ROWID       | TM                            |       1 |     2 |       279 |     +2 |    2M |       2M | 12476 |  97MB |     . |
| 29 |                     INDEX UNIQUE SCAN                | TM_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |  1683 |  13MB |     . |
| 30 |                   TABLE ACCESS BY INDEX ROWID        | TU                            |       1 |     1 |       257 |    +21 |    2M |    17764 |   137 |   1MB |     . |
| 31 |                    INDEX UNIQUE SCAN                 | TU_PK                         |       1 |       |       257 |    +21 |    2M |    17764 |     1 |  8192 |     . |
| 32 |                  TABLE ACCESS BY INDEX ROWID         | TEP                           |       1 |     2 |       279 |     +2 |    2M |       2M |  155K |   1GB |     . |
| 33 |                   INDEX UNIQUE SCAN                  | TEP_PK                        |       1 |     1 |       279 |     +2 |    2M |       2M |  1729 |  14MB |     . |
| 34 |                 TABLE ACCESS BY INDEX ROWID          | TLIM                          |       1 |     1 |       279 |     +2 |    2M |       2M |       |       |     . |
| 35 |                  INDEX UNIQUE SCAN                   | TLIM_PK                       |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 36 |                TABLE ACCESS BY INDEX ROWID           | TLPSE                         |       1 |     1 |       279 |     +2 |    2M |       2M |       |       |     . |
| 37 |                 INDEX UNIQUE SCAN                    | TLPSE_PK                      |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 38 |               INDEX RANGE SCAN                       | TCX_IX2                       |       1 |     2 |       279 |     +2 |    2M |       2M |  8870 |  69MB |     . |
| 39 |              TABLE ACCESS BY INDEX ROWID             | TC                            |       1 |     2 |       279 |     +2 |    2M |       2M | 14648 | 114MB |     . |
| 40 |               INDEX UNIQUE SCAN                      | TC_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |   157 |   1MB |     . |
| 41 |            INDEX RANGE SCAN                          | TCX_PK                        |       1 |     2 |       279 |     +2 |    2M |       2M |       |       |     . |
| 42 |           INDEX RANGE SCAN                           | TCX_PK                        |       1 |     2 |           |        |       |          |       |       |     . |
| 43 |          TABLE ACCESS BY INDEX ROWID                 | TC                            |       1 |     2 |       279 |     +2 |    2M |       2M | 16037 | 125MB |     . |
| 44 |           INDEX UNIQUE SCAN                          | TC_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |   224 |   2MB |     . |
| 45 |        TABLE ACCESS BY INDEX ROWID                   | TP                            |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 46 |         INDEX RANGE SCAN                             | TP_PK                         |      15 |     1 |       279 |     +2 |    2M |      28M |       |       |     . |
| 47 |       TABLE ACCESS BY INDEX ROWID                    | TP                            |       1 |     3 |           |        |       |          |       |       |     . |
| 48 |        INDEX RANGE SCAN                              | TP_PK                         |      15 |     1 |           |        |       |          |       |       |     . |
| 49 |      TABLE ACCESS STORAGE FULL FIRST ROWS            | TLIET                         |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 50 |     VIEW PUSHED PREDICATE                            | TEB_VW                        |       1 |    57 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 51 |      NESTED LOOPS OUTER                              |                               |       1 |    57 |       272 |     +8 |    2M |     1459 |       |       |     . |
| 52 |       NESTED LOOPS                                   |                               |       1 |    55 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 53 |        NESTED LOOPS                                  |                               |       1 |    53 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 54 |         NESTED LOOPS                                 |                               |       1 |    51 |       272 |     +9 |    2M |     1459 |       |       |     . |
| 55 |          NESTED LOOPS                                |                               |       5 |    41 |       279 |     +2 |    2M |     6965 |       |       |     . |
| 56 |           NESTED LOOPS                               |                               |       1 |     7 |       279 |     +2 |    2M |     770K |       |       |     . |
| 57 |            NESTED LOOPS                              |                               |       1 |     4 |       279 |     +2 |    2M |     770K |       |       |     . |
| 58 |             NESTED LOOPS                             |                               |       1 |     3 |       279 |     +2 |    2M |     770K |       |       |     . |
| 59 |              TABLE ACCESS BY INDEX ROWID             | TEP                           |       1 |     3 |       279 |     +2 |    2M |     770K |       |       |     . |
| 60 |               INDEX UNIQUE SCAN                      | TEP_PK                        |       1 |     2 |       279 |     +2 |    2M |       2M |       |       |     . |
| 61 |              INDEX RANGE SCAN                        | TLP_IX1                       |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 62 |             VIEW                                     |                               |       1 |     1 |       279 |     +2 |  770K |     770K |       |       |     . |
| 63 |              SORT AGGREGATE                          |                               |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 64 |               TABLE ACCESS BY INDEX ROWID            | TPR                           |       1 |     1 |       279 |     +2 |  770K |     770K |       |       |     . |
| 65 |                INDEX UNIQUE SCAN                     | TPR_PK                        |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 66 |            TABLE ACCESS BY INDEX ROWID               | TET                           |       1 |     3 |       279 |     +2 |  770K |     770K | 28892 | 226MB |     . |
| 67 |             INDEX RANGE SCAN                         | TET_Ix1                       |       1 |     2 |       279 |     +2 |  770K |     899K |  6957 |  54MB |     . |
| 68 |           TABLE ACCESS BY INDEX ROWID                | TWE                           |       5 |    34 |       272 |     +9 |  770K |     6965 |   890 |   7MB |     . |
| 69 |            INDEX RANGE SCAN                          | TWE_IDX1                      |      35 |     2 |       272 |     +9 |  770K |     6965 |    22 | 176KB |     . |
| 70 |          TABLE ACCESS BY INDEX ROWID                 | TT                            |       1 |     2 |       272 |     +9 |  6965 |     1459 |       |       |     . |
| 71 |           INDEX UNIQUE SCAN                          | TT_PK                         |       1 |     1 |       272 |     +9 |  6965 |     6965 |       |       |     . |
| 72 |         INDEX RANGE SCAN                             | TCX_IX2                       |       1 |     2 |       256 |    +24 |  1459 |     1459 |   932 |   7MB |     . |
| 73 |        TABLE ACCESS BY INDEX ROWID                   | TC                            |       1 |     2 |       256 |    +24 |  1459 |     1459 |       |       |     . |
| 74 |         INDEX UNIQUE SCAN                            | TC_PK                         |       1 |     1 |       256 |    +24 |  1459 |     1459 |       |       |     . |
| 75 |       TABLE ACCESS BY INDEX ROWID                    | TLS                           |       1 |     2 |       256 |    +24 |  1459 |     1451 |       |       |     . |
| 76 |        INDEX SKIP SCAN                               | TLS_PK                        |       1 |     1 |       256 |    +24 |  1459 |     1451 |       |       |     . |
| 77 |    SORT AGGREGATE                                    |                               |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 78 |     FIRST ROW                                        |                               |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 79 |      INDEX RANGE SCAN (MIN/MAX)                      | TCX_IX2                       |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
===============================================================================================================================================================================

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
    0 -  STATEMENT
         U -  first_rows / hint overridden by another in parent query block
           -  first_rows
 
  56 -  SEL$5
           -  no_merge
 
Note
-----
   - this is an adaptive plan

Slow plan (runtime adapted, “reporting_only = false”):

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)          
 Instance ID         :  2                        
 Session             :  XXXXX (509:27860) 
 SQL ID              :  8t19y7v5j9ztg            
 SQL Execution ID    :  33554432                 
 Execution Started   :  10/07/2021 07:56:09      
 First Refresh Time  :  10/07/2021 07:56:09      
 Last Refresh Time   :  10/07/2021 08:07:17      
 Duration            :  668s                     
 Module/Action       :  SQL*Plus/-               
 Service             :  XXXXX.XXXXX.com  
 Program             :  sqlplus.exe              
 Fetch Calls         :  370                      

Global Stats
==========================================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  | Fetch | Buffer | Read | Read  | Write | Write |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Reqs  | Bytes | Returned Bytes |
==========================================================================================================================
|     705 |     280 |      270 |        0.00 |      155 |   370 |    40M | 984K |  11GB |  6422 |   3GB |            6GB |
==========================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3015036808)
========================================================================================================================================================================================================
| Id |                      Operation                       |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Mem  | Temp  |
|    |                                                      |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes | (Max) | (Max) |
========================================================================================================================================================================================================
|  0 | SELECT STATEMENT                                     |                               |         |       |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  1 |   FILTER                                             |                               |         |       |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  2 |    NESTED LOOPS OUTER                                |                               |       1 |    3M |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  3 |     NESTED LOOPS OUTER                               |                               |       1 |    3M |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  4 |      HASH JOIN OUTER                                 |                               |       1 |    3M |       538 |   +131 |     1 |       2M |  3387 |   2GB |  3387 |   2GB | 450MB |   2GB |
|  5 |       NESTED LOOPS OUTER                             |                               |       1 |    3M |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  6 |        STATISTICS COLLECTOR                          |                               |         |       |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  7 |         NESTED LOOPS OUTER                           |                               |       1 |    3M |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  8 |          HASH JOIN OUTER                             |                               |       1 |    3M |       155 |     +3 |     1 |       2M |  3035 |   1GB |  3035 |   1GB | 309MB |   1GB |
|  9 |           NESTED LOOPS OUTER                         |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 10 |            STATISTICS COLLECTOR                      |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 11 |             NESTED LOOPS OUTER                       |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 12 |              NESTED LOOPS OUTER                      |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 13 |               NESTED LOOPS                           |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 14 |                NESTED LOOPS OUTER                    |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 15 |                 NESTED LOOPS                         |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 16 |                  NESTED LOOPS OUTER                  |                               |    272K |    1M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 17 |                   NESTED LOOPS                       |                               |    272K |    1M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 18 |                    FILTER                            |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 19 |                     NESTED LOOPS OUTER               |                               |    272K |  598K |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 20 |                      VIEW                            | index$_join$_006              |    276K | 48299 |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 21 |                       HASH JOIN                      |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       | 132MB |     . |
| 22 |                        HASH JOIN                     |                               |         |       |         3 |     +1 |     1 |       2M |       |       |       |       | 124MB |     . |
| 23 |                         INDEX STORAGE FAST FULL SCAN | TET_IX2                       |    276K |  8505 |         1 |     +1 |     1 |       2M |   129 |  54MB |       |       |     . |     . |
| 24 |                         INDEX STORAGE FAST FULL SCAN | TET_IX4                       |    276K | 13077 |         3 |     +1 |     1 |       2M |   167 |  81MB |       |       |     . |     . |
| 25 |                        INDEX STORAGE FAST FULL SCAN  | TET_PK                        |    276K | 11889 |       129 |     +3 |     1 |       2M |   198 |  61MB |       |       |     . |     . |
| 26 |                      TABLE ACCESS BY INDEX ROWID     | TT                            |       1 |     2 |       129 |     +3 |    2M |       2M |  1488 |  12MB |       |       |     . |     . |
| 27 |                       INDEX UNIQUE SCAN              | TT_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |     7 | 57344 |       |       |     . |     . |
| 28 |                    TABLE ACCESS BY INDEX ROWID       | TM                            |       1 |     2 |       129 |     +3 |    2M |       2M |  9875 |  77MB |       |       |     . |     . |
| 29 |                     INDEX UNIQUE SCAN                | TM_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |  1235 |  10MB |       |       |     . |     . |
| 30 |                   TABLE ACCESS BY INDEX ROWID        | TU                            |       1 |     1 |       119 |    +11 |    2M |    17764 |       |       |       |       |     . |     . |
| 31 |                    INDEX UNIQUE SCAN                 | TU_PK                         |       1 |       |       119 |    +11 |    2M |    17764 |       |       |       |       |     . |     . |
| 32 |                  TABLE ACCESS BY INDEX ROWID         | TEP                           |       1 |     2 |       129 |     +3 |    2M |       2M |  140K |   1GB |       |       |     . |     . |
| 33 |                   INDEX UNIQUE SCAN                  | TEP_PK                        |       1 |     1 |       129 |     +3 |    2M |       2M |  1478 |  12MB |       |       |     . |     . |
| 34 |                 TABLE ACCESS BY INDEX ROWID          | TLIM                          |       1 |     1 |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 35 |                  INDEX UNIQUE SCAN                   | TLIM_PK                       |       1 |       |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 36 |                TABLE ACCESS BY INDEX ROWID           | TLPSE                         |       1 |     1 |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 37 |                 INDEX UNIQUE SCAN                    | TLPSE_PK                      |       1 |       |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 38 |               INDEX RANGE SCAN                       | TCX_IX2                       |       1 |     2 |       129 |     +3 |    2M |       2M |  4642 |  36MB |       |       |     . |     . |
| 39 |              TABLE ACCESS BY INDEX ROWID             | TC                            |       1 |     2 |       129 |     +3 |    2M |       2M | 22307 | 174MB |       |       |     . |     . |
| 40 |               INDEX UNIQUE SCAN                      | TC_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |   546 |   4MB |       |       |     . |     . |
| 41 |            INDEX RANGE SCAN                          | TCX_PK                        |       1 |     2 |           |        |       |          |       |       |       |       |     . |     . |
| 42 |           INDEX RANGE SCAN                           | TCX_PK                        |       1 |     2 |         1 |   +131 |     1 |     976K |       |       |       |       |     . |     . |
| 43 |          TABLE ACCESS BY INDEX ROWID                 | TC                            |       1 |     2 |        27 |   +131 |    2M |       2M | 21549 | 168MB |       |       |     . |     . |
| 44 |           INDEX UNIQUE SCAN                          | TC_PK                         |       1 |     1 |        27 |   +131 |    2M |       2M |   959 |   7MB |       |       |     . |     . |
| 45 |        TABLE ACCESS BY INDEX ROWID BATCHED           | TP                            |       1 |     3 |           |        |       |          |       |       |       |       |     . |     . |
| 46 |         INDEX RANGE SCAN                             | TP_PK                         |      15 |     1 |           |        |       |          |       |       |       |       |     . |     . |
| 47 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TP                            |       1 |     3 |        36 |   +157 |     1 |       15 |       |       |       |       |     . |     . |
| 48 |        INDEX RANGE SCAN                              | TP_PK                         |      15 |     1 |        36 |   +157 |     1 |       15 |       |       |       |       |     . |     . |
| 49 |      TABLE ACCESS STORAGE FULL FIRST ROWS            | TLIET                         |       1 |     3 |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 50 |     VIEW PUSHED PREDICATE                            | TEB_VW                        |       1 |    57 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 51 |      NESTED LOOPS OUTER                              |                               |       1 |    57 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 52 |       NESTED LOOPS                                   |                               |       1 |    55 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 53 |        NESTED LOOPS                                  |                               |       1 |    53 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 54 |         NESTED LOOPS                                 |                               |       1 |    51 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 55 |          NESTED LOOPS                                |                               |       5 |    41 |       510 |   +159 |    2M |     6965 |       |       |       |       |     . |     . |
| 56 |           NESTED LOOPS                               |                               |       1 |     7 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 57 |            NESTED LOOPS                              |                               |       1 |     4 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 58 |             NESTED LOOPS                             |                               |       1 |     3 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 59 |              TABLE ACCESS BY INDEX ROWID             | TEP                           |       1 |     3 |       512 |   +157 |    2M |     770K |  661K |   5GB |       |       |     . |     . |
| 60 |               INDEX UNIQUE SCAN                      | TEP_PK                        |       1 |     2 |       512 |   +157 |    2M |       2M |  2934 |  23MB |       |       |     . |     . |
| 61 |              INDEX RANGE SCAN                        | TLP_IX1                       |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 62 |             VIEW                                     |                               |       1 |     1 |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 63 |              SORT AGGREGATE                          |                               |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 64 |               TABLE ACCESS BY INDEX ROWID            | TPR                           |       1 |     1 |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 65 |                INDEX UNIQUE SCAN                     | TPR_PK                        |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 66 |            TABLE ACCESS BY INDEX ROWID BATCHED       | TET                           |       1 |     3 |       511 |   +158 |  770K |     770K | 79759 | 623MB |       |       |     . |     . |
| 67 |             INDEX RANGE SCAN                         | TET_Ix1                       |       1 |     2 |       510 |   +159 |  770K |     899K | 15834 | 124MB |       |       |     . |     . |
| 68 |           TABLE ACCESS BY INDEX ROWID BATCHED        | TWE                           |       5 |    34 |       506 |   +163 |  770K |     6965 |  2080 |  16MB |       |       |     . |     . |
| 69 |            INDEX RANGE SCAN                          | TWE_IDX1                      |      35 |     2 |       506 |   +163 |  770K |     6965 |   118 | 944KB |       |       |     . |     . |
| 70 |          TABLE ACCESS BY INDEX ROWID                 | TT                            |       1 |     2 |       506 |   +163 |  6965 |     1459 |   208 |   2MB |       |       |     . |     . |
| 71 |           INDEX UNIQUE SCAN                          | TT_PK                         |       1 |     1 |       506 |   +163 |  6965 |     6965 |       |       |       |       |     . |     . |
| 72 |         INDEX RANGE SCAN                             | TCX_IX2                       |       1 |     2 |       506 |   +163 |  1459 |     1459 |  1388 |  11MB |       |       |     . |     . |
| 73 |        TABLE ACCESS BY INDEX ROWID                   | TC                            |       1 |     2 |       506 |   +163 |  1459 |     1459 |   936 |   7MB |       |       |     . |     . |
| 74 |         INDEX UNIQUE SCAN                            | TC_PK                         |       1 |     1 |       506 |   +163 |  1459 |     1459 |    75 | 600KB |       |       |     . |     . |
| 75 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TLS                           |       1 |     2 |       506 |   +163 |  1459 |     1451 |     1 |  8192 |       |       |     . |     . |
| 76 |        INDEX SKIP SCAN                               | TLS_PK                        |       1 |     1 |       506 |   +163 |  1459 |     1451 |     1 |  8192 |       |       |     . |     . |
| 77 |    SORT AGGREGATE                                    |                               |       1 |       |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 78 |     FIRST ROW                                        |                               |       1 |     3 |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 79 |      INDEX RANGE SCAN (MIN/MAX)                      | TCX_IX2                       |       1 |     3 |       512 |   +157 |    2M |       2M |  9356 |  73MB |       |       |     . |     . |
=======================================================================================================================================================================================================

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
   0 -  STATEMENT
         U -  first_rows / hint overridden by another in parent query block
           -  first_rows
 
  56 -  SEL$5
           -  no_merge
 
Note
-----
   - this is an adaptive plan

If you want to pull these plans into separate windows and compare (nothing but) the Operations and Names line by line you’ll find that the only differences appear at operations 45, 47, 66, 68, and 75:

| 45 |        TABLE ACCESS BY INDEX ROWID                   | TP                            |
| 47 |       TABLE ACCESS BY INDEX ROWID                    | TP                            |
| 66 |            TABLE ACCESS BY INDEX ROWID               | TET                           |
| 68 |           TABLE ACCESS BY INDEX ROWID                | TWE                           |
| 75 |       TABLE ACCESS BY INDEX ROWID                    | TLS                           |

| 45 |        TABLE ACCESS BY INDEX ROWID BATCHED           | TP                            |
| 47 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TP                            |
| 66 |            TABLE ACCESS BY INDEX ROWID BATCHED       | TET                           |
| 68 |           TABLE ACCESS BY INDEX ROWID BATCHED        | TWE                           |
| 75 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TLS                           |

So what could possibly make one plan so much slower than the other?

There are all sorts of bits and pieces in these plans that you need to be able to spot “in passing” if you want to become fluent at understanding execution plans. It’s something that takes a lot of practice but there’s one general tip (or warning, perhaps) that I can offer.

If you start out by looking for one particular thing you’ll miss lots of important clues; on a first pass through the plan just try to notice anything that looks a little interesting or informative, then go back for a more detailed examination on a second pass through the plan.

I won’t go through the entire pattern of thought that went through my mind as I started looking at these plans, but here are a couple of flags I raised:

  • Adaptive plans in SQL Monitor – we’re likely to see some “statistics collector” operations and that’s the “obvious” source of the anomaly, but reading plans that include their adaptive bits can be a mind-bending experience.
  • Global Stats of the slow one says 984K read requests (compared to 251K for the fast plan) – that might explain the difference in timing – keep an eye out for where the big numbers appear. (NB Don’t, at this point, go looking for them as that may lead you into missing the real issue.)
  • The slow plan shows the top operation with a Start Active of +157 while the fast plan has a Start Active of +2: that’s pretty consistent with a comment the user made (but I hadn’t mentioned) about the response time from the user’s perspective; and it tells us that there’s a blocking operation somewhere in the slow plan. That’s nice because that’s what we might expect from seeing an adaptive plan switching from a nested loop to a hash join. (So we already think we’re a clever bunny – too bad it wasn’t quite the right answer.)
  • There are two places in the plan which report Statistics Collector, both having a Nested Loop Outer then a Hash Join Outer immediately above them as the two candidate consumers for the rowsource supplied by the Statistics Collector’s child operation. (Note: in some cases you will see two Nested Loop operations above a Statistics Collector before you get to the Hash Join, but that’s just because Oracle may implement a nested loop join in two steps, first to the index, second to the table.)

First thoughts

Bearing in mind the warning about not digging “too deep too soon”, here are a few potentially significant observations. In the following notes I’ll simply refer to the plans as the “fast” plan and the “slow” plan.

  • The slow plan shows a large hash join with spill to disc at operation 4, and that’s one of the options triggered by the statistics collector at operation 6. It’s a little confusing that the nested loop join following it also reports execution statistics because only one of the two operations would have been doing anything, but we’ll assume for now that that’s a reporting error and postpone making a fuss about it. The same pattern appears at operations 8, 9 and 10.
  • A cross-check to the fast plan shows that the hash joins don’t do any work at the corresponding operations (although it also displays the oddity of operations that, we believe, didn’t actually run reporting a long Time Active).
  • The slow plan has an oddity at operation 41 – it’s an index range scan of a primary key (TCX_PK) that doesn’t report any execution statistics. Cross-checking to the fast plan we see that it’s operation 42 that doesn’t report any execution statistics (and it’s also an index range scan, and using the same index!). We note that the fast plan “Starts” its range scan 2 million times, while the slow plan starts just once, starting at time +131 and having an active time of 1 second. [side-note: I am a little suspicious of that number – it looks to me as if it ought to be reporting 27 seconds]
  • Keep going – because just a bit further down we see that the slow plan has no stats for operations 45 and 46 (index range scan of TP_PK with table access of TP) while the fast plan has no stats for operations 47 and 48 (also an index range scan of TP_PK with table access to TP). Again we see the same pattern that the slow plan executes the operation just once while the fast plan executes its operations 2M times.
  • Keep going – the previous two observations are interesting and probably need further investigation, but they might not be critical. The very next line (operation 49) in both plans shows us a “TABLE ACCESS STORAGE FULL FIRST ROWS” that executes 2 million times – that’s got to hurt, surely, but let’s not get side-tracked just yet.
  • Operation 50 is a “VIEW PUSHED PREDICATE” – that almost certainly means it’s the second child of a nested loop join with a join predicate pushed into a non-mergeable view (and the view name is TEB_VW so it’s not a view created by an internal transformation) and the operation has, like so many other lines in the plan, started 2 million times.
  • Looking at the rest of the plan, there are no more statistics collectors and the plans have an exact match on operations. Unfortunately we don’t have a Predicate Information section, so we can’t tell whether matching operations were really doing the same thing (e.g. was an index range scan in one plan using more columns to probe the index than the corresponding index range scan in the other plan); however, we can check times:
  • The View Pushed Predicate in the fast plan starts at time +24 [Another slightly suspicious time given all the +2 starts in the locale] and is active for 256 seconds, while in the slow plan it starts at time +163 and is active for 506 seconds. So it looks as if a lot of the excess run time of the query is spent in this part of the plan – for no logical reason that we can see – although the access to table TEP by index TEP_PK does no physical reads in the fast plan and 661K reads in the slow plan.
  • Again we take a quick note, and move on. The final observation is that the last three lines of the plan look like the plan for a subquery block (executed very efficiently) of the “find the most recent / highest / lowest” type, and a quick check to the top of the plan shows that its parent is a FILTER operation, corroborating our first guess.

Starting to dig

Reviewing the first pass we can see that we lose a lot of “startup” time to the two hash joins where the build table in each case has to be completed before any further tables can be joined. This is in the order of 160 seconds, which is consistent with the OP’s observations, and it’s happening because adaptive plans are activated, triggering a change from nested loop joins to hash joins.

More significant, from the perspective of time, is that the nested loop join into the View Pushed Predicate is active for twice as long in the slow plan as it is in the fast plan – so that’s a place to look a little more closely, revealing that operation 59 is probably the reason for the difference: 661 thousand read requests in the slow plan but none in the fast plan.

Unfortunately we don’t have any Activity Stats (i.e. active session history data) in the report, but since the access to the table is reported as unique access by unique index in both cases we can be fairly sure that the difference isn’t due to a difference in the Predicate Information (that isn’t in the report).

Clearly we need to stop the adaptive plan from being taken to avoid the start-up delay (e.g. add a /*+ no_adaptive_plan */ hint to the query) but that still leaves two puzzles:

  1. why are the rows estimates so bad (and at least part of the reason for that is that it turned out that the query was being optimized with optimizer_mode = first_rows – that’s the legacy first_rows, not a cost-based first_rows_N);
  2. how could the same sub-plan result in far more physical reads in one case compared to the other when the critical operation is a unique index access.
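
As a minimal sketch of how the adaptive switch might be blocked, and how the optimizer_mode could be confirmed: the hint is the one mentioned above, while the session-level parameter optimizer_adaptive_plans is an assumption about the version (it exists from 12.2 onwards; 12.1 used optimizer_adaptive_features). The query against dual is just a placeholder, not the OP’s statement.

```sql
-- Option 1: statement-level hint (placeholder query, not the OP's SQL)
select  /*+ no_adaptive_plan */
        dummy
from    dual
;

-- Option 2: session-level setting (assumes 12.2 or later)
alter session set optimizer_adaptive_plans = false;

-- Check the values the current session is really using
-- (requires select privilege on v$parameter)
select  name, value
from    v$parameter
where   name in ('optimizer_mode', 'optimizer_adaptive_plans')
;
```

The hint has the advantage of limiting the change to the one statement; the session setting is easier to use for a quick test when you can’t edit the SQL.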

The answer to the second question could be in an observation I first published 14 years ago – and it could indicate a generic threat to adaptive optimisation.

If you have an execution plan which, stripped to a minimum, looks like this:

Join Operation
        Table_X
        Table_Y

The order in which the result appears is likely to change depending on the join mechanism that Oracle chooses, viz Hash Join, Merge Join or Nested Loop Join.

Under “non-adaptive” conditions if you have a join that’s border-line between a hash join and a nested loop join it frequently means that the optimizer will flip-flop between two plans like the following (leading to the classic question – nothing changed, why did the plan change?):

Hash Join
        Table_X
        Table_Y

Nested Loop Join
        Table_Y
        Table_X

Note that the order in which the tables appear is reversed.

As it says in another article of mine: “all joins are nested loop joins, just with different startup costs”. In both the plans above Oracle picks a row from Table_Y and looks for a match in Table_X, so the order of the result set is dictated by Table_Y regardless of whether the join is a hash join or a nested loop join. However, if Oracle has decided to use an adaptive plan and starts with the nested loop (Y -> X) and decides to switch to a hash join it doesn’t swap the join order as the join mechanism is selected, so a result set whose order would have been dictated by Table_Y turns into the same result set (we hope) but in an order dictated by Table_X.
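
The three shapes described above can be imitated with hints, as in the following sketch. The tables table_x and table_y, the index table_x_i1, and the data are all hypothetical, invented purely for illustration; the third query is only an approximation of what an adaptive switch leaves behind (same join order as the nested loop, but executed as a hash join). No output is shown because the order of an unordered result set is never guaranteed, but with small in-memory joins you would expect the first two queries to follow the order of table_y and the third to follow the order of table_x.

```sql
-- Hypothetical data: table_x holds ids ascending, table_y holds them descending
create table table_x as
select  rownum id, rpad('x',20) padding
from    dual
connect by level <= 1000
;

create table table_y as
select  1001 - rownum id, rpad('y',20) padding
from    dual
connect by level <= 1000
;

create index table_x_i1 on table_x(id);

-- Nested loop, Y drives and probes X: result order follows table_y
select  /*+ leading(y x) use_nl(x) */
        y.id y_id, x.id x_id
from    table_y y, table_x x
where   x.id = y.id
;

-- "Ordinary" hash join: X is the build table, Y is the probe table,
-- so the result order still follows table_y
select  /*+ leading(x y) use_hash(y) no_swap_join_inputs(y) */
        y.id y_id, x.id x_id
from    table_y y, table_x x
where   x.id = y.id
;

-- Imitation of the adaptive switch: join order unchanged (Y first) but
-- executed as a hash join, so Y is built and X is the probe table:
-- result order now follows table_x
select  /*+ leading(y x) use_hash(x) no_swap_join_inputs(x) */
        y.id y_id, x.id x_id
from    table_y y, table_x x
where   x.id = y.id
;
```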

Consequences:

If you’re using very big tables and Oracle produces an adaptive nested loop join early in the plan this may result in a later nested loop being lucky and inducing lots of “self-caching” because its driving rowsource is in a nice order. If the plan adapts to a hash join the driving data set may appear in a completely different order that makes the later nested loop jump randomly around a very large table, inducing a lot of “self-flushing” as one table block is discarded from the buffer cache to make space for another. (I published an article several years ago about how a similar – though far from identical – type of anomaly could happen with Exadata and compression: an unlucky order of data access causing a massive extra workload.)

Conclusion and further thoughts

In this note I’ve tried to present my thoughts as I’ve read through an execution plan trying to understand what it’s doing and why it’s showing the performance characteristics it does.

In this case the interpretation was made harder because the plan was an adaptive plan – and there doesn’t appear to be an option in the procedure in dbms_sql_monitor to choose between hiding and revealing the adaptive parts [ed: this statement may apply only to the text option – see comment #1 for a counter-example using the “Active HTML” option]; moreover there was no Activity (ASH) information supplied and we didn’t have the Predicate Information.
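
If you have access to the system, a sketch like the following might fill in some of the gaps. It assumes SQL*Plus (for the set commands and the &m_sql_id substitution variable, which is a name made up for this example) and a statement that is still in the library cache and still visible to SQL Monitor. The first query asks dbms_xplan for the plan with its Predicate Information and with the inactive adaptive operations included; the second requests the Active HTML form of the Monitor report.

```sql
set long 1000000
set longchunksize 1000000
set linesize 255
set pagesize 0

-- Plan from memory, including Predicate Information;
-- "+adaptive" adds the inactive operations, flagged with a "-" beside the Id
select  *
from    table(dbms_xplan.display_cursor('&m_sql_id', null, 'typical +adaptive'))
;

-- SQL Monitor report in Active HTML form (spool it to a file and open it in a browser)
select  dbms_sql_monitor.report_sql_monitor(
                sql_id       => '&m_sql_id',
                type         => 'ACTIVE',
                report_level => 'ALL'
        ) as report
from    dual
;
```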

The performance “issue” was that when adaptive plans were allowed (as opposed to reported only) we could see that two nested loops changed to hash joins. It was fairly clear that this explained the huge delay before results started to appear, but didn’t explain why the query took so much longer to complete.

We have a hypothesis that the extra run time of the query was due to “bad luck” because we can see very clearly that a nested loop into a non-mergeable view with pushed predicate reports a huge number of single block read requests; and we know that changing a join from a nested loop to a hash join without changing the order of the child operations will change the order in which the join’s rowsource is generated.

In this case the query was executing under the legacy first_rows optimizer mode, and it’s possible that if first_rows_N had been used the optimizer would have behaved differently, especially since we have a query that is returning 2M rows and we only want the first few rows.

Next Steps

The obvious “next step” in this investigation is to check whether first_rows_N co-operates nicely with adaptive optimisation. After all, the only significant thing that adaptive optimisation does to (serial) execution plans is set an inflection point to dictate when a nested loop should change to a hash join – and a hash join is a blocking operation which is rarely a good thing for a first_rows_N plan.

So, does first_rows_N disable this adaptive plan analysis, does it move the inflection point significantly, or does the optimizer simply forget that hash joins are less desirable in first_rows_N optimisation? And if you’re running a system in first_rows_N mode should you disable adaptive plans by default, and only enable them for special cases?
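
A possible starting point for that test, offered only as a sketch: switch the session to one of the first_rows_N modes, run the query of interest, then check whether its child cursor was optimized as an adaptive plan. The choice of first_rows_10 and the substitution variable &m_sql_id are assumptions made for the example.

```sql
-- Valid N values are 1, 10, 100 and 1000
alter session set optimizer_mode = first_rows_10;

-- ... run the query under test here ...

-- is_resolved_adaptive_plan: null = not an adaptive plan,
-- 'N' = adaptive but not yet resolved, 'Y' = adaptive and resolved
select  sql_id, child_number, optimizer_mode, is_resolved_adaptive_plan
from    v$sql
where   sql_id = '&m_sql_id'
;
```

A null in is_resolved_adaptive_plan would suggest that first_rows_N had stopped the optimizer from generating an adaptive plan for that statement, though a single test case wouldn’t be enough to generalise.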

I also have an urge to test a couple of ideas about why the two timing anomalies I mentioned have appeared, but it’s already taken me several hours to write notes (including a few replies to the list server) about the 30 minutes I’ve spent looking at an execution plan, so any further experiments will have to go on my to-do list.
