Oracle Scratchpad

July 21, 2022

Pagination cost

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 3:56 pm BST Jul 21,2022

There’s a thread on the MOSC database tuning forum (needs an account) at the moment asking why a “fetch first N” query to fetch next 41 rows with an offset of 8602 rows takes longer to run than the same query when the offset is zero rows. Here’s a possible answer with a little lesson in thinking about what’s going on.

Apart from gremlins in the system there are two possible reasons

  • nothing has changed, but it takes longer to fetch 8643 rows in order and discard 8602 of them than it takes to fetch 41 rows in order and discard none
  • the optimizer has worked out that if it has to fetch 8643 rows then it ought to use a different plan but (as often happens) it was a bad idea to change the plan.

Here’s a little script to build some demo data.

rem
rem     Script:         fetch_first_offset.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1 
as
select 
        * 
from 
        all_objects
where   rownum <= 50000
order by 
        dbms_random.value
/

create index t1_i1 on t1(object_name);

alter session set statistics_level = all;
set serveroutput off
set feedback off

column owner format a20
column object_type format a12
column object_name format a32

All I’ve done is create a table of 50,000 rows with an order by clause that maximises the randomness of the data pattern so that the index on object_name will have a very high clustering_factor.

Here’s the first query I’m going to run, followed by the execution plan, pulled from memory with rowsource execution stats enabled. I’ve queried for the first 20 rows (offset 0 next 20) ordered by object_name:

select
        owner, object_type, object_name
from
        t1
order by
        object_name
offset 
        0 rows
fetch next 
        20 rows only
/

select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last'));


SQL_ID  fmdb8vuxwkp99, child number 0
-------------------------------------
select  owner, object_type, object_name from  t1 order by  object_name
offset  0 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    23 (100)|     20 |00:00:00.01 |      25 |      2 |
|*  1 |  VIEW                         |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  50000 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     20 |     3   (0)|     20 |00:00:00.01 |       5 |      2 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber"<=20 AND
              "from$_subquery$_002"."rowlimit_$$_rownumber">0))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "OBJECT_NAME")<=20)

As you can see from the execution plan Oracle has used an index full scan (because that will access the data in exactly the right order and allows the “nosort stopkey” on the “window (no)sort” operation). It has fetched (A-Rows) 20 rows and reported a cost of 23 – which basically corresponds to 3 block visits for the index and one block visit for each row from the table. In passing you’ll notice from the Predicate Information at operation 2 that Oracle has transformed our “fetch first” into an analytic query using row_number() over(). The phrase “syntactic sugar” seems appropriate.

How do things change if we ask for the 2nd 20 rows – (offset 20, next 20). I’ll show only the output from dbms_xplan, including its slightly mangled SQL statement but dropping the Predicate Information:

select  owner, object_type, object_name from  t1 order by  object_name
offset  20 rows fetch next  20 rows only

Plan hash value: 3254925009

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    43 (100)|     20 |00:00:00.01 |      45 |
|*  1 |  VIEW                         |       |      1 |     40 |    43   (0)|     20 |00:00:00.01 |      45 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     40 |    43   (0)|     40 |00:00:00.01 |      45 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  49999 |    43   (0)|     40 |00:00:00.01 |      45 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     40 |     3   (0)|     40 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------------------------------

As you can see, the optimizer has still decided to use the index full scan, and this time has fetched 40 rows and passed them up the plan until at operation 1 it discards the first 20 rows. The cost (43) is again related to 3 blocks for the index, 40 blocks for 40 rows from the (randomly distributed) table.

What would we see if we added a /*+ full(t1) */ hint to the query to force a tablescan to get the 2nd 20 rows?

select /*+ full(t1) */  owner, object_type, object_name from  t1 order
by  object_name offset  20 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.01 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |     40 |   947   (1)|     20 |00:00:00.01 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|     40 |00:00:00.01 |     996 | 15360 | 15360 |14336  (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Oracle has obeyed the hint, and the tablescan has fetched all 50,000 rows from the table and sorted them. Fortunately the optimizer knows that it needs only the top 40 rows so it has been discarding rows as it sorts, hence the appearance of the “pushed rank” in the “window sort” at operation 2; we haven’t had to create a sorted list of all 50,000 rows before picking the top 40. Again, once we’ve got the top 40 we discard the top 20 to allow for the offset.

We note that the cost of the tablescan was 278 but the cost of the sort was really rather large, taking the total cost of this path to 947. So here’s a thought experiment – what’s likely to happen if we ask for an offset of 940 and next 20? Given the costs we’ve seen for the indexed access path the optimizer will calculate a cost of 3 (plus a bit, maybe) for the index and a cost of 960 for visiting the table giving a total cost of about 963 – which should make the tablescan strategy the lower cost.

select  owner, object_type, object_name from  t1 order by  object_name
offset  940 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.02 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |    960 |   947   (1)|     20 |00:00:00.02 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|    960 |00:00:00.02 |     996 |   267K|   267K|  237K (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

My guesswork about the cost seems to have been nearly right. Unhinted, with an offset of 940 (which you can see as the 960 rows fetched) the optimizer has decided that the tablescan path has a lower cost than the indexed access.

Of course we ought to check this by hinting the indexed access path and seeing what its cost is:

select  /*+ index(t1) */  owner, object_type, object_name from  t1
order by  object_name offset  940 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |   969 (100)|     20 |00:00:00.01 |     968 |      4 |
|*  1 |  VIEW                         |       |      1 |    960 |   969   (1)|     20 |00:00:00.01 |     968 |      4 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |  50000 |     9   (0)|    960 |00:00:00.01 |       9 |      4 |
-----------------------------------------------------------------------------------------------------------------------

The cost of the indexed access path is 969 – that’s 960 for the randomly scattered table rows we need plus 9 for the index scan (because at 960 index entries we’re going to visit a few more index leaf blocks than the original 3).

Summing Up

I’ve demonstrated with a simple query using “offset N rows fetch M rows only” that the optimizer will calculate the cost of fetching “N + M” rows using whatever paths are available, then pick the lowest cost path.

As you might expect, the presence of a suitable index will encourage the optimizer to walk the index in order jumping randomly around the table to avoid the cost of acquiring all the relevant data and sorting it. So for “small” values of “offset + next” Oracle might choose an indexed access path with “window nosort stopkey”, but for “large” values of “offset + next” it might choose to do a full tablescan with “window sort pushed rank”.

The consequence of this is that – in the same way we see the optimizer switching between nested loops and hash joins at the wrong moment – we may see the optimizer switch from an indexed access path to a full tablescan either too soon, or too late.

Answering the question

Why did the query with an offset of 8602 take so much longer than the query with an offset of zero when the next was only 41 rows?

It may be that the optimizer stuck with an indexed access path and had to do physical reads of 8,643 blocks when it should have switched to a tablescan.

It may be that the optimizer switched to a tablescan and sort when it should have stuck with using an index on well-clustered, well-cached, data.

As so often happens, the first step to answering an SQL performance question is to look at the actual execution plans.

July 15, 2022

Index Wildcard

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 11:51 am BST Jul 15,2022

This is a little case study on working around a problem caused by a 3rd-party application that couldn’t be changed quickly and easily. There’s a touch of irony in this case as the problem I’ll address is a side effect of the best possible workaround to a more generic design flaw in the application, moreover the side effect appears because of a special case optimisation mechanism introduced in 12.2.

A recent post on the MOS database forum (needs an account) came up with a slightly surprising problem – a query with a predicate “columnX like ‘%XYZ%” would perform reasonably well if hinted to use a B-tree index on columnX (doing an index full scan) but performed badly when cursor_sharing was set to force because it switched to an index range scan which did a lot more work.

Modelling

Here’s a little script to build some data to demonstate the issue:

rem
rem     Script:         index_wildcard.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0

create table t1
as
select  obj.*
from
        (select * from all_objects where rownum <= 50000) obj,
        (select rownum from dual connect by level <= 5) gen
/

create index t1_id on t1(object_name);

alter session set cursor_sharing = force;
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

set serveroutput off

All I’ve done is create a simple heap table with 250,000 rows and an index (which happens to be on a column declared as NOT NULL, but that detail is not important). Here’s my query:

select  /*+ index(t1(object_name)) cursor_sharing_exact */
        t1.*
from
        t1 t1
where
        object_name like '%XYZ%'
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

In normal circumstances the leading wildcard would mean that the optimizer would not be able to use an index driven plan (unless the query satisfied the requirements to allow an index fast full scan), so the default plan would have been a tablescan with a selectivity of 5% (estimated rows = 12,500).

Because I’ve hinted the use of an index I get the following plan from a call to dbms_xplan.display_cursor():

LAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  1q1p6fyubhafu, child number 0
-------------------------------------
select /*+ index(t1(object_name)) cursor_sharing_exact */  t1.* from
t1 t1 where  object_name like '%XYZ%'

Plan hash value: 2798063786

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.08 |    1718 |   1717 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.08 |    1718 |   1717 |
|*  2 |   INDEX FULL SCAN                   | T1_ID |      1 |  12500 |      0 |00:00:00.08 |    1718 |   1717 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("OBJECT_NAME" LIKE '%XYZ%')

Because I’ve set cursor_sharing to force I’ve included the hint /*+ cursor_sharing_exact */ in the query to make sure that Oracle sees the query with a literal rather than a system generated bind variable. The plan tells us that the optimizer has picked an index full scan, and the Predicate Information for operation 2 tells us that Oracle has checked our predicate for every single entry in the index, and the A-rows column tells us that it didn’t find any matching index entries. So the total work done was 1718 buffer gets and 250,000 index entries tested. (You’ll notice that the E-rows is still the standard 5% guess.)

Now let’s try again, allowing enforced cursor sharing to be applied (i.e. taking out the cursor_sharing_exact hint):

SQL_ID  cm4tjwuqxgbqg, child number 0
-------------------------------------
select /*+ index(t1(object_name)) */  t1.* from  t1 t1 where
object_name like :"SYS_B_0"

Plan hash value: 539998951

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.26 |     241K|
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.26 |     241K|
|*  2 |   INDEX RANGE SCAN                  | T1_ID |      1 |  12500 |    250K|00:00:00.04 |    1718 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_NAME" LIKE :SYS_B_0)
   2 - access("OBJECT_NAME" LIKE :SYS_B_0)

The SQL text shows us that the literal has been replaced by the bind variable :SYS_B_0. Operation 2 has changed from an index full scan to an index range scan – which “feels” as if “the range” should be the whole index and the query shouldn’t, therefore, do any more work than the index full scan plan. But the Buffers column now shows 241K buffer visits at operation 1, the table access! What’s gone wrong?

In the case of the OP the unexpected visit to the table blocks didn’t do just a few hundred thousand “unnecessary” buffer visits, it did 4 million buffer gets and 127,000 disk reads.

Look at the Predicate Information (always): the filter() that matched our original predicate is no longer being applied to the index, it’s only applied at the table. The only predicate applied to the index is an access() predicate – so Oracle takes the bind value and works out that the range scan should start at the beginning of the index and stop at the end of the index – that’s what an access() predicate does, it doesn’t check the actual values in the index entries once it has derived the start and stop locations.

Trouble-shooting

You will have noticed in the initial code to set up the data I had a call to set event 10053 (the CBO trace event) – a fairly quick and easy option to look at in this case since I’m looking at such a simple query. The critical details that showed up in the trace files were:

  • in both cases the optimizer reported “User hint to use this index” in the Base Statistical Information.
  • in the case of the “literal string” SQL the optimizer did not consider an index full scan
  • in the case of the “forced bind” SQL the optimizer did not consider an index range scan

This is worth knowing as it tells us that the switch between paths was not an “accident” of circumstances or statistics, it was a deliberately code strategy; so we need to discover whether there is something we can do to change the strategy.

Time to check the hidden parameters and fix controls, and this is what I found in a listing of the 19 .11.0.0 system fix controls (listed as appearing in 12.2.x.x):

     BUGNO OPTIMIZE SQL_FEATURE                        DESCRIPTION

---------- -------- ---------------------------------- ---------------------------------------------------------------- 
  20289688 12.2.0.1 QKSFM_ACCESS_PATH_20289688         check for leading wildcard in LIKE with bind 

Doesn’t that sound a little bit promising? Let’s turn the fix off by adding the hint /*+ opt_param(‘_fix_control’ ‘20289688:0’) */ to the problem query and see what we get:

SQL_ID  75cxtymdrh8r2, child number 0
-------------------------------------
select /*+   index(t1(object_name))   opt_param('_fix_control'
'20289688:0')  */  t1.* from  t1 t1 where  object_name like :"SYS_B_0"

Plan hash value: 539998951

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.08 |    1718 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.08 |    1718 |
|*  2 |   INDEX RANGE SCAN                  | T1_ID |      1 |  12500 |      0 |00:00:00.08 |    1718 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME" LIKE :SYS_B_0)
       filter("OBJECT_NAME" LIKE :SYS_B_0)

We still have an index range scan (with a range that is obviously the whole index), but our predicate is now being applied as a filter() in the index with a total of only 1,718 buffer gets even though the literal has been replaced by a system-generated bind variable.

Summary

The application is accepting a search value from the user interface and constructing “literal string” SQL that adds a leading and trailing wildcard to the value. As a result of this the optimizer will not consider using a particular index unless hinted, and does a very expensive tablescan.

The optimizer will accept and use an index() hint to produce a plan that is much more efficient than the tablescan even though it has to use an index full scan and apply a filter predicate on every index entry.

Unfortunately cursor_sharing has been set to force to deal with the general parsing overheads of applications that construct “literal string” SQL, and when the index() hint is applied to a bind-variable version of this query the optimizer applies an index range scan and postpones the filter() until it has acquired rows from the table.

A check of the system fix controls identified a fix that looked as if it might have some relevance to our situation, “leading wild card, bind variable, LIKE clause”, and a test with an opt_param() hint to switch off that optimizer “fix” had the effect we wanted – the optimizer still reported an index range scan but it also used our initial predicate as a filter() predicate on the index.

Technically, since the fix_control was introduced in 12.2.0.1 we could have got the same effect by setting the optimizer_features_enable back to 12.1.0.2 – but that’s not a good idea when the production version is so much newer although it might be an acceptable strategy when implemented as a hint for a few special case queries.

After showing that the fix control works by hinting the next step should be to create an SQL Patch (using dbms_sqldiag.create_sql_patch) attached to the (forced) bind-variable version of the query.

Update (a few hours later)

It didn’t take long to find that the fix_control and the optimzer_features_enable were not the only workarounds. There’s a hidden parameter “_optim_peek_user_binds” with the description “enable peeking of user binds” default value “true”, so if you set this to false the optimizer can’t notice that the critical bind value starts with a wildcard and behaves the way the OP wants. As with the other options you could set this as a system parameter or a session parameter, or as a hint (SQL patch etc.) /*+ opt_param(‘_optim_peek_user_binds’ ‘false’) */ – the last being the kindest for the system.

Update (21st July)

Bug number 33500777 (unpublished) looks like an exact match for the problem; the description reads:

FILTER FOR PREDICATE WITH LIKE AND BIND WITH LEADING WILDCARD IS NOT ALLOCATED ON INDEX SCAN AFTER FIX 20289688

May 31, 2022

Join View delete

Filed under: ANSI Standard,Bugs,CBO,Join Elimination,Oracle,Transformations — Jonathan Lewis @ 5:39 pm BST May 31,2022

I’ve written a couple of notes about deleting from join views and (ultimately) the fact that if you have muliple key-preserved tables in join view then the first key preserved table in the from clause is the one where the delete is applied. The topic came up on the Oracle developer forum again today, and closed with a funny little quirk.

Here’s a little data model followed by one query and two deletes that I’ve just run on Oracle 19.11.0.0 and 21.3.0.0. The query and first delete feature a join using “ANSI” syntax, the second delete translates the join into “traditional” Oracle syntax.

rem
rem     Script:         delete_join_2.sql       
rem     Dated:          May 2022
rem     Author:         J P Lewis
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem

create table t_parent (
        id      number 
                        constraint t_parent_pk primary key,
        val     varchar2(10)
);

create table t_child (
        id      number 
                        constraint t_child_fk references t_parent(id),
        val     varchar2(10)
);

alter table t_child add constraint t_child_pk primary key(id);

insert into t_parent values(1,rpad('parent',10));
insert into t_child  values(1,rpad('child', 10));

commit;

execute dbms_stats.gather_table_stats(user,'t_parent',cascade=>true)
execute dbms_stats.gather_table_stats(user,'t_child', cascade=>true)

set autotrace on explain

select t_child.* from t_parent join t_child on t_child.id = t_parent.id;

delete
        (select t_child.* from t_parent join t_child on t_child.id = t_parent.id);

delete (select t_child.* from t_parent, t_child where t_child.id = t_parent.id);

set autotrace off

The table definitions and content (with a little modification) come from the forum posting, and I have to admit that the “one-to-one” referential integrity from parent to child is a little unusual (though occasionally a very effective strategy). I’ve also enabled autotrace to get the (probable) execution plans for the three statements, but in the script on my test machine I’ve also enabled the CBO (10053) trace to pick up a few extra details.

The query is there to show us that the join will return a single row, and the row content will be from the t_child table. What happens, though, when we try to use that query as an inline view to the delete.

It is an updateable join view – and both tables are, in fact, key-preserved thanks to the primary key constraints at both ends of the join. Here’s what we get (with the execution plans removed from the output):

        ID VAL
---------- ----------
         1 child

1 row selected.



delete
*
ERROR at line 1:
ORA-02292: integrity constraint (TEST_USER.T_CHILD_FK) violated - child record found


1 row deleted.

Line 5 tells us that we selected one row – as expected.

Line 11 tells us that we’ve raised an error because we’re trying to break the foreign key constraint on the t_child table – and that error shouldn’t be a surprise because the t_parent table appears first in the from clause, so that’s the table the optimizer wants to delete rows from.

However, line 15 tells us that if we change to “traditional ” Oracle syntax the delete works – so which of the two delete statements is displaying buggy behaviour?

Execution plans

There’s a clue in the execition plans (when we get them – autotrace doesn’t give a plan for the delete that raised the error). So here are the plans for the select and the “traditional” delete in that order:

----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    14 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T_CHILD |     1 |    14 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------



-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT |            |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  DELETE          | T_CHILD    |       |       |            |          |
|   2 |   INDEX FULL SCAN| T_CHILD_PK |     1 |     3 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With assistance from the foreign key constraint, Join Elimination has occurred in both cases leaving t_child as the candidate for deletion. You’ll also note that the optimizer has recognised that while the view’s select list is “all columns” it only needs to optimize for “select rowid” to drive the delete, so produces an index-only access path to get the data it needs.

If we use explain plan, or check the 10053 trace file, for the ANSI delete that failed we’ll find the following:

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |             |     1 |     6 |     1   (0)| 00:00:01 |
|   1 |  DELETE             | T_PARENT    |       |       |            |          |
|   2 |   NESTED LOOPS      |             |     1 |     6 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN  | T_CHILD_PK  |     1 |     3 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| T_PARENT_PK |     1 |     3 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$4396EC5C
   3 - SEL$4396EC5C / T_CHILD@SEL$1
   4 - SEL$4396EC5C / T_PARENT@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T_CHILD"."ID"="T_PARENT"."ID")

There’s no Join Elimination, and Oracle is following the rule that “the first key preserved table in the from clause should be the target of the deletes.

Where’s the bug?

This is a question that doesn’t have an obvious answer.

The traditional syntax gives you the result you probably wanted but it’s deleting from the second table in the from clause of the join view – so is that a documentation bug or a coding bug because if the documentation is correct the statement should have raised the error that the ANSI syntax gave.

Conversely, is there an optimizer bug (or limitation) because table elimination could have been used in the ANSI case but wasn’t, although you could say that the ANSI code is doing the right thing (following the documentation) by raising an error and that the traditional syntax is doing the wrong thing.

If we search the CBO (10053) traces of the select and the ANSI join, looking for sections where the optimizer considers Join Elimination, the results are helpful:

grep -nT "^JE" or19_ora_30861_*.trc >temp.txt

With a little cosmetic editing here’s the output from the trace of the select statement:

or19_ora_30861_select.trc:  78  JE - Join Elimination
or19_ora_30861_select.trc: 807  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_select.trc: 813  JE:[V2] Query block (0x7fd3a152f5c0) before join elimination:
or19_ora_30861_select.trc: 816  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 817  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 818  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 819  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_select.trc: 820  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_select.trc: 821  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 822  JE:[V2] Query block (0x7fd3a152f5c0) after join elimination:

or19_ora_30861_select.trc: 844  JE:[V2]   Considering Join Elimination on query block SEL$58A6D7F6 (#0)
or19_ora_30861_select.trc: 850  JE:[V2] Query block (0x7fd39c09ee60) before join elimination:
or19_ora_30861_select.trc: 853  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 854  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 855  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 856  JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 857  JE:[V2] Eliminate table: T_PARENT (T_PARENT)
or19_ora_30861_select.trc: 858  JE:[V2] Query block (0x7fd39c09ee60) after join elimination:

We can see that the optimizer has considered join elimination twice (lines 2 and 12), once for each of two query blocks (sel$1 – a query block embedded in the text – and sel$58a6d7f6, a query block name generated during some query transformation).

At lines 5 and 6 we can see that Oracle failed to take advantage of the referential integrity constraint to eliminate t_parent with the explanation “predicate column refs”; but at lines 16 and 17 we see that there’s a change of heart and the optimizer does take advantage of the referential integrity to eliminate t_parent. So let’s look at the text of the “query block before join elimination” in both cases and see what’s changed:

First pass
----------
SELECT 
        "T_PARENT"."ID"  "QCSJ_C000000000300000",
        "T_PARENT"."VAL" "QCSJ_C000000000300002",
        "T_CHILD"."ID"   "QCSJ_C000000000300001",
        "T_CHILD"."VAL"  "QCSJ_C000000000300003" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"


Second Pass
-----------
SELECT 
        "T_CHILD"."ID" "ID",
        "T_CHILD"."VAL" "VAL" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"

The first pass seems to have expanded the two tables in the intial JOIN, ignoring the limited select list, so join elimination is blocked by the presence of the (non-PK) column t_parent.val (aliased as qcsj_c000000000300002, but when we get to the 2nd pass the select list has been limited to the minimum necessary set of columns that would be needed by a pure select statement and join elimination is possible.

The clue to how/why this has happened is the in the few lines between the two passes; the trace file also reports: “CVM: Merging SPJ view SEL$1 (#0) into SEL$2 (#0)” and when we look for the point where the optimizer registered query block sel$2 it turns out to have one item in its from clause, an object called: from$_subquery$_003 (which tells us where the 3 came from in all those column aliases QCSJ_C0000000003nnnnn. In effect it looks like the optimizer started by rewriting the ANSI select in Oracle syntax as (with a little poetic licence for clarity):

select  child_id, child_val, parent_id
from    (
         select  t_child.*, t_parent.*
         from    t_child, t_parent
         where   t_child.id = t_parent.id
        ) from$_subquery$_003
/ 

So the first pass was the optimizer trying to do join elimination on the inner query block and the second pass happened after the inner block was merged with the outer block and the resulting query block (called sel$58a6d7f6) allowed the join elimination to take place

So what happened with the delete statement. Here’s the result of the call to grep:

or19_ora_30861_delete.trc:  90  JE - Join Elimination
or19_ora_30861_delete.trc: 837  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_delete.trc: 843  JE:[V2] Query block (0x7fd3a152f758) before join elimination:
or19_ora_30861_delete.trc: 846  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_delete.trc: 847  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 848  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_delete.trc: 849  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_delete.trc: 850  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_delete.trc: 851  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 852  JE:[V2] Query block (0x7fd3a152f758) after join elimination:

That’s the lot – there is no attempt to do join elimination on a querye block called sel$58a6d7f6, though a few lines after this extract the trace file reports:

CVM:   Merging SPJ view SEL$1 (#0) into SEL$2 (#0)
Registered qb: SEL$58A6D7F6 0x9c0a3038 (VIEW MERGE SEL$2; SEL$1; SEL$2)

That’s exactly the same view merging that was present in the select – but the optimizer doesn’t attempt join elimination before moving on to report, a couple of lines later:

CVM:   Merging SPJ view SEL$58A6D7F6 (#0) into DEL$1 (#0)
Registered qb: SEL$4396EC5C 0x9c09dde0 (VIEW MERGE DEL$1; SEL$58A6D7F6; DEL$1)

It looks like this is a case of the optimizer missing a cycle in its transform/optimze loopback. The delete went through exactly the same process as the select, but failed to optimize the intermediate query block (which happened to be called from$_subquery$_004 in this case, being one layer deeper thanks to the extra level introduced by the delete).

Workaround

I tried two or three things to get the ANSI-style code to work without changing the table order in the from clause – extra layers of subqueries, explicit aliases, but Oracle kept coming back to a point where it had the full projection of all the columns in the two tables. (This reminded me of an old bug in ANSI select expansion that was fixed in 12cR2. It made me wonder if this was a code path where the same bug had been overlooked – it also made me wonder if there would be any problems if the two tables in the join totalled more than 1,000 columns – but that’s a test for another day.)

I did eventually find a fix (that didn’t involved switching the table order in the inline view):

delete
        (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);

Note that I now have only the primary key of t_child in the select

The section in the CBO trace file for Join Elimination looked like this:

JE:[V2] Query block (0x7f93996231f8) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_PARENT"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_PARENT" "T_PARENT","TEST_USER"."T_CHILD" "T_CHILD" WHERE "T_CHILD"."ID"="T_PARENT"."ID"
JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2] Eliminate table: T_PARENT (T_PARENT)
JE:[V2] Replaced column: T_PARENT.ID with column: T_CHILD.ID
JE:[V2] Query block (0x7f93996231f8) after join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_CHILD"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_CHILD" "T_CHILD"
Registered qb: SEL$E703A888 0x996231f8 (JOIN REMOVED FROM QUERY BLOCK SEL$1; SEL$1; "T_PARENT"@"SEL$1")

You may decide that this is an obvious workaround, but it’s not particularly helpful. In the general case I would expect to see a stored view that joined the two tables, with the expectation that this type of delete was just one of many operations the view was used for. Rewriting a view to use the increasingly common ANSI syntax shouldn’t result in some statements crashing “for no apparent reason”.

The last laugh

Given that the select and delete both included a merge of sel$1 into sel$2 to produce a query block called sel$58a6d7f6 I wondered what would happen if I blocked the merge:

SQL>  delete  /*+  no_merge (@sel$1) */
  2          (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);
	(select t_child.id from t_parent join t_child on t_child.id = t_parent.id)
        *
ERROR at line 2:
ORA-01752: cannot delete from view without exactly one key-preserved table

Isn’t that the error message that should have been coming out the entire time – according to the orginal documentation for updateable join views?

April 13, 2022

Adaptive Joins

Filed under: CBO,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 1:53 pm BST Apr 13,2022

There’s a question on the Oracle Forums at the moment about a query that’s taking a long time to parse. Even after being reduced to one prebuilt (currently remote) table with two non-correlated outer joins to it the parse time is several hundred seconds. This seems fairly bizarre – I have seen some very long parse times from Oracle, but 10 minutes for 3 tables is well over the top; it did remind me, though of a note I started a few years ago of a 4 table join taking 4 seconds to parse, so I thought I’d present the query, the plan, and a little chat on debugging. Here’s the query:

select
        /*+ parallel(t4,3) */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

I’m expecting a simple cascade of hash joins, with t1, t2 and t3 – the “small” tables – turning into “build” tables, then t4 – the “large” table – passing through each of them in turn until the penultimate rowsource is aggregated.

Here’s the execution plan — which looks pretty much as I expected it to – but there’s something wrong about it that isn’t visible in the output. Why is the query (plan) saying it took 0.07 seconds to complete (A-time), returning only 60 rows, when my SQL*Plus session didn’t return any data for 4 seconds

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   300 (100)|          |        |      |            |     60 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR                |          |      1 |        |            |          |        |      |            |     60 |00:00:00.07 |       5 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)          | :TQ10004 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,04 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    HASH GROUP BY               |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |     60 |00:00:00.01 |       0 |      0 |  1394K|  1394K|     3/0/0|
|   4 |     PX RECEIVE                 |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |    180 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND HASH              | :TQ10003 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,03 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY            |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,03 | PCWP |            |    180 |00:00:00.14 |    6114 |   6018 |  1394K|  1394K|     3/0/0|
|*  7 |        HASH JOIN               |          |      3 |   8460 |   299   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.14 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|   8 |         JOIN FILTER CREATE     | :BF0000  |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|   9 |          PX RECEIVE            |          |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|  10 |           PX SEND BROADCAST    | :TQ10000 |      0 |     22 |     2   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  11 |            PX SELECTOR         |          |      3 |        |            |          |  Q1,00 | SCWC |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 12 |             TABLE ACCESS FULL  | T3       |      1 |     22 |     2   (0)| 00:00:01 |  Q1,00 | SCWP |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 13 |         HASH JOIN              |          |      3 |  27322 |   297   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  14 |          JOIN FILTER CREATE    | :BF0001  |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  15 |           PX RECEIVE           |          |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  16 |            PX SEND BROADCAST   | :TQ10001 |      0 |     21 |     2   (0)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  17 |             PX SELECTOR        |          |      3 |        |            |          |  Q1,01 | SCWC |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 18 |              TABLE ACCESS FULL | T2       |      1 |     21 |     2   (0)| 00:00:01 |  Q1,01 | SCWP |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 19 |          HASH JOIN             |          |      3 |  92953 |   294   (3)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  20 |           JOIN FILTER CREATE   | :BF0002  |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  21 |            PX RECEIVE          |          |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  22 |             PX SEND BROADCAST  | :TQ10002 |      0 |     19 |     2   (0)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  23 |              PX SELECTOR       |          |      3 |        |            |          |  Q1,02 | SCWC |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|* 24 |               TABLE ACCESS FULL| T1       |      1 |     19 |     2   (0)| 00:00:01 |  Q1,02 | SCWP |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|  25 |           JOIN FILTER USE      | :BF0000  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  26 |            JOIN FILTER USE     | :BF0001  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  27 |             JOIN FILTER USE    | :BF0002  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  28 |              PX BLOCK ITERATOR |          |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWC |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|* 29 |               TABLE ACCESS FULL| T4       |     48 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.06 |    6114 |   6018 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
  12 - SEL$1 / T3@SEL$1
  18 - SEL$1 / T2@SEL$1
  24 - SEL$1 / T1@SEL$1
  29 - SEL$1 / T4@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      OPT_PARAM('_fix_control' '16923858:5')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T4"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T3"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T4"@"SEL$1" "T2"@"SEL$1" "T3"@"SEL$1")
      USE_HASH(@"SEL$1" "T4"@"SEL$1")
      USE_HASH(@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$1" "T3"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T4"@"SEL$1" BROADCAST NONE)
      PX_JOIN_FILTER(@"SEL$1" "T4"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T2"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T2"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T3"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T3"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T2"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T3"@"SEL$1")
      GBY_PUSHDOWN(@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T3"."ID"="T4"."ID3")
  12 - filter((TO_NUMBER("T3"."SMALL_VC")=1 OR TO_NUMBER("T3"."SMALL_VC")=2 OR TO_NUMBER("T3"."SMALL_VC")=3 OR TO_NUMBER("T3"."SMALL_VC")=4 OR TO_NUMBER("T3"."SMALL_VC")=5))
  13 - access("T2"."ID"="T4"."ID2")
  18 - filter((TO_NUMBER("T2"."SMALL_VC")=1 OR TO_NUMBER("T2"."SMALL_VC")=2 OR TO_NUMBER("T2"."SMALL_VC")=3 OR TO_NUMBER("T2"."SMALL_VC")=4))
  19 - access("T1"."ID"="T4"."ID1")
  24 - filter((TO_NUMBER("T1"."SMALL_VC")=1 OR TO_NUMBER("T1"."SMALL_VC")=2 OR TO_NUMBER("T1"."SMALL_VC")=3))
  29 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0002,"T4"."ID1"),SYS_OP_BLOOM_FILTER(:BF0001,"T4"."ID2"),SYS_OP_BLOOM_FILTER(:BF0000,"T4"."ID3")))


One possible trouble-shooting step is simply to re-run the query, taking a snapshot of the session activity stats (v$mystat) and the session events (v$session_event) to see what they tell you (if anything) – here’s the critical extract from the session stats:

Name                                      Value
----                                      -----
CPU used when call started                  423
CPU used by this session                    429
parse time cpu                              411
parse time elapsed                          419

Most of the time is parse time, spent on the CPU. (If this had been a much larger scale problem and had occurred in the recent past I might have looked at ASH (v$active_session_hsitory) for any samples for the correct SQL_ID, and seen the problem revealed in the in_parse, in_hard_parse columns.

So let’s enable event 10053 and run the query again – but since it’s “only” 4 seconds, let’s tweak the timer option to report any step that took longer than 0.1 seconds. The default timer setting is a whole second (10^6 microseconds), so we set the fix-control to 5 to get 0.1 seconds (10^5 microseconds).

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

-- run the query, find the trace file

 grep TIMER or19_ora_23370.trc 

Here’s the output from the call to grep: it looks like group by placement (GBP) is causing a problem.

TIMER:  GBP: costing SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER: GBP: iteration (#1) SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER:  GBP: costing SEL$1565E019 cpu: 0.293 sec elapsed: 0.298 sec
TIMER: GBP: iteration (#2) SEL$1565E019 cpu: 0.294 sec elapsed: 0.299 sec
TIMER:  GBP: costing SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER: GBP: iteration (#3) SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER:  GBP: costing SEL$B5D97CA0 cpu: 0.533 sec elapsed: 0.540 sec
TIMER: GBP: iteration (#4) SEL$B5D97CA0 cpu: 0.534 sec elapsed: 0.540 sec
TIMER:  GBP: costing SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.531 sec
TIMER: GBP: iteration (#5) SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.532 sec
TIMER:  GBP: costing SEL$ED1298E3 cpu: 0.522 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#8) SEL$ED1298E3 cpu: 0.523 sec elapsed: 0.524 sec
TIMER:  GBP: costing SEL$5968095A cpu: 0.523 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#9) SEL$5968095A cpu: 0.524 sec elapsed: 0.523 sec
TIMER:  GBP: costing SEL$4CA81688 cpu: 0.525 sec elapsed: 0.527 sec
TIMER: GBP: iteration (#12) SEL$4CA81688 cpu: 0.526 sec elapsed: 0.528 sec
TIMER: Group-By Placement SEL$1 cpu: 3.766 sec elapsed: 3.793 sec
TIMER: Cost-Based Transformations (Overall) SEL$1 cpu: 3.769 sec elapsed: 3.795 sec
TIMER: Access Path Analysis (Final) SEL$1 cpu: 0.288 sec elapsed: 0.289 sec
TIMER: SQL Optimization (Overall) SEL$1 cpu: 4.072 sec elapsed: 4.108 sec

If you check further up the page, though, you’ll see in the Outline Information that Oracle has not used group by placement (it has done a “group by pushdown” but that’s different, and relates to aggregation in parallel execution. So one quick hack we could try is to add the hint /*+ no_place_group_by(@sel$1) */ to the query just to see what happens – and here’s the effect on the parse time:

Name                                      Value
----                                      -----
parse time cpu                               33
parse time elapsed                           34

Problem solved – provided we can get the hint into the code (by hand, or SQL Patch, etc.) But the question still remains: where did the time go? The trace file was fairly long (375,000 lines for the original, compared to 32,000 for the hinted) but a rapid scan seemed in order – and something very quickly caught my attention. It was pretty easy to spot because something big and nasty had happened 8 times.

The answer was in “Adaptive Plans”, which (mostly) get flagged with the label “AP:” in the 10053 trace file, for example:

AP: Computing costs for inflection point at min value 0.00
AP: Using binary search for inflection point search
AP: Costing Join for ADM inflection point at card 0.00
AP: Costing Join for ADM inflection point at card 0.00

You can see here that the optimizer is searching for an “inflection point”, that means it’s “thinking about” an adaptive join, and searching for the number of rows where a switch between a nested loop join and a hash join makes sense.

Notice particularly the comment about “using binary search”. After calculating the cost of the “best” join using the current estimates of cardinality for the incoming rowsource the optimizer starts calculating the possible costs (nested loop or hash, it ignores merge) for a set of alternative cardinalities until it finds the cardinality where the nested loop join and hash join have the same cost. That’s the number of rows that Oracle will use at run time to decide whether it should switch from its initial selection of join method to the alternative.

If the initial join selection was a hash join (i.e. large volume startegy) Oracle will use an alternative cardinality of 1 to start its list of estimates – on the assumption that that would produce a nested loop plan and then keep doubling the estimate until the resulting plan switched to a hash join, then close in on the break point by halving and doubling the remaining gap between the NL estimate and the HJ estimate.

If the initial join selection was a nested loop join Oracle will use a “worst case scenario” for the incoming estimate (acting, broadly speaking, as if every filter at that point had had a selectivity of 1 – i.e. 100% of the available data), and start by halving the estimate. This is fine when the starting estimate is (for example) for the first table in the join order and the esimate is just a few million rows.

Here’s the first estimate in my case (you’ll need a wide screen, or very small print):

6772:AP: Computing costs for inflection point at max value 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00 

After you’ve halved that number about 1,000 times you’re down to values in the region of a few thousand. And when you have to cost a couple of joins every time you halve, and when you’ve gone through the whole process 8 times that’s a lot of CPU.

In fact, with the no_place_group_by hint there was still one point where the optimizer did this adaptive join work – which probably accounts for most of the final 0.33 CPU seconds – but it didn’t show up in any separately reported timer events.

Of course the final little test of this parse time hypothesis is to add the hint /*+ no_adaptive_plan */ – so I did that, and the parse time was still about 0.3 seconds! Investigation of the 10053 trace showed that even with the hint in place the optimizer still went through that one huge binary chop – but when it had worked out the inflection point it printed the message:

AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

According to the hint report the hint was valid, however, so that behaviour looks a little buggy. It then occurred to me that maybe I could have hinted /*+ no_adaptive_plan(@sel$1) */ – and that worked properly with the trace reporting:

AP: Checking validity for query block SEL$1, sqlid=7fjtvwktcmsgq
AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

Had I not realised that this was possible I have fallen back on the hint /*+ opt_param() */ to set the parameter optimizer_adaptive_plans to false for the query in which case the following message (and no other “AP:” message) appeared near the start of the trace:

AP: Adaptive joins bypassed for query block SEL$1 due to disabled by adaptive join parameter

If you want to experiment on your own system here’s the script to create the data – the script name reflects the fact that I found this example by accident while working on something completely different:

rem
rem     Script:         12c_vector_transform_c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2015
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             12.1.0.2

drop table t1;
drop table t2;
drop table t3;
drop table t4;
purge recyclebin;

create table t1 
as
select
        rownum          id,
        to_char(mod(rownum,10)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t1 
        add constraint t1_pk primary key(id)
;

create table t2
as
select
        rownum          id,
        to_char(mod(rownum,12)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t2
        add constraint t2_pk primary key(id)
;

create table t3
as
select
        rownum          id,
        to_char(mod(rownum,14)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t3
        add constraint t3_pk primary key(id)
;

create table t4
nologging
as
select
        t1.id                   id1,
        t2.id                   id2,
        t3.id                   id3,
        rpad(rownum,10)         small_vc,
        rpad('x',100)           padding
from
        t1, t2, t3
;

spool 12c_vector_transform_c

set timing on
set linesize 255
set trimspool on
set pagesize 90
set arraysize 100
set serveroutput off

execute snap_my_stats.start_snap

alter session set statistics_level = all;

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

select
        /*+ 
--              opt_param('optimizer_adaptive_plans' 'false')
--              no_adaptive_plan
--              no_adaptive_plan(@sel$1)
--              no_place_group_by(@sel$1)
                parallel(t4,3)
        */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats all note -bytes'));

alter session set events '10053 trace name context off';

set serveroutput on
execute snap_my_stats.end_snap

spool off

March 22, 2022

Upgrade Surprise

Filed under: 19c,Bugs,Oracle,Transformations,Upgrades — Jonathan Lewis @ 10:04 am GMT Mar 22,2022

Here’s a little surprise that showed up in the most recent (March 2022) article that I sent to Simpletalk for the series on transformations. I had been using 19c (19.11.0.0) to create and run my little demos but the editor had used 12.2.0.1 to check the examples and questioned a comment I had made about a “default plan”.

Here’s the query in question. I was using the emp and dept tables from the Scott schema to demonstrate a point about subquery execution:

rem
rem     Script:         unnest_demo_simpletalk_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem 

break on deptno skip 1

select
        /*+ 
                qb_name(main)
                gather_plan_statistics 
        */
        *
from    emp e1
where   e1.sal + nvl(e1.comm,0) > (
                select  /*+ qb_name(subq) */
                        avg(e2.sal + nvl(e2.comm,0))
                from    emp e2
                where   e2.deptno = e1.deptno
        )
order by
        e1.deptno, e1.empno
/

As you can see, I’ve used a correlated aggregate subquery to report all employees who earned more than the average for their department, where “earnings” is calculated as the sum of salary and commission.

Here’s the plan I got when I ran this query under 19c:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |      1 |      6 |00:00:00.01 |      24 |  2048 |  2048 | 2048  (0)|
|*  2 |   FILTER             |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   3 |    TABLE ACCESS FULL | EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   4 |    SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:00.01 |      18 |       |       |          |
|*  5 |     TABLE ACCESS FULL| EMP  |      3 |      5 |     14 |00:00:00.01 |      18 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("E1"."SAL"+NVL("E1"."COMM",0)>)
   5 - filter("E2"."DEPTNO"=:B1)

When my editor ran the query on 12.2.0.1, and when I started up an instance of 12.2.0.1 and ran the query, the plan looked like this:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|*  1 |  FILTER              |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|   2 |   SORT GROUP BY      |      |      1 |      4 |     14 |00:00:00.02 |      29 |      6 |  2048 |  2048 | 2048  (0)|
|*  3 |    HASH JOIN         |      |      1 |     70 |     70 |00:00:00.02 |      29 |      6 |  1922K|  1922K| 1053K (0)|
|   4 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      6 |       |       |          |
|   5 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9E18A930")
      MERGE(@"SEL$AA0D0E02" >"SEL$B4BE209F")
      OUTLINE(@"SEL$B4BE209F")
      UNNEST(@"SUBQ")
      OUTLINE(@"SEL$AA0D0E02")
      OUTLINE(@"SEL$D6166863")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      FULL(@"SEL$9E18A930" "E2"@"SUBQ")
      FULL(@"SEL$9E18A930" "E1"@"MAIN")
      LEADING(@"SEL$9E18A930" "E2"@"SUBQ" "E1"@"MAIN")
      USE_HASH(@"SEL$9E18A930" "E1"@"MAIN")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("E1"."SAL"+NVL("E1"."COMM",0)>SUM("E2"."SAL"+NVL("E2"."COMM",0))/COUNT("E2"."SAL"+NVL("E2"."COMM",0))
              )
   3 - access("E2"."DEPTNO"="E1"."DEPTNO")

(I’ve added in a request for the ‘outline’ in the display_cursor() format.) The correlated subquery has been unnested and the resulting inline view has been subject to complex view merging. It was only at this point that I realised that the 19c plan was a little surprising and not what I should have expected.

After checking that the configuration and statistics (including the system stats) were the same on the two instances I re-ran the query on 12c with the /*+ no_unnest */ hint to make it use the plan that 19c had produced and I found (as expected) that the plan with filter subquery produced by 19c had a higher cost than the unnesting plan produced by 12c.

Next I re-ran the query on 19c with the /*+ unnest */ hint to make it use the plan that 12c had produced – but it didn’t! 19c “ignored” the hint and carried on using the filter subquery plan. It did, however, supply the following extra information when I added the ‘hint_report’ option to the to the display_cursor() format:

Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------

   1 -  MAIN
           -  qb_name(main)

   4 -  SUBQ
         U -  unnest / Failed basic validity checks
           -  qb_name(subq)

The code in 19c thinks that it would be illegal to unnest the subquery that 12c was unnesting (does this mean that some people may be getting wrong results in 12c). So I checked the 10053 (CBO) trace file to see if there was any further information there that would “explain” the problem. This is what I found:

SU: Checking validity for Null Accepting Semi JoinUnnesting for query block MAIN(#1)
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: More than 1 column in connect condition.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

The reference to “Null accepting” looks a little suspect but prompted a few experiments (modifying the data to eliminate nulls, adding not null declarations to columns, simplifying the query etc.) that suggested that the problem was essentially that the optimizer did not want to unnest when the comparison was with the expression (sal + comm) regardless of the operator, and even when all the relevant columns had been populated, declared not null, and the nvl() function had been removed.

It doesn’t seem reasonable in this case, but possibly the block is a generic strategy to avoid errors in some more subtle cases, and perhaps the block will be refined and relaxed in future versions. (Or maybe it’s a bug that the wrong test is being called at this point – there was nothing in the query requiring “null acceptance” by the time I got through the last test.)

I did find a workaround that you could use to avoid any code rewrite:

alter table emp add nvl_earnings 
        invisible 
        generated always as (sal + nvl(comm,0)) 
        virtual
;

There’s seems to be no good reason why this should work – but it does. The subquery unnests and the Predicate Information in the plan doesn’t give any clue that it’s using a virtual column.

Summary:

When you upgrade from 12c there are some queries involving correlated subqueries that no longer unnest the subquery. This may have a significant impact on performance and it may not be possible to bypass the problem unless you rewrite the query to do a manual unnest although I did find a virtual column workaround for my specific example. So far I’ve tested the query on 19.11.0.0 and 21.3.0.0, the behaviour is the same in both versions.

Footnote:

After failing to find anything on MOS about the problem I emailed a draft of this note to Nigel Bayliss at Oracle – who did find a promising match on MOS.

The failure to unnest may be the consequence of the fix for bug 30593046: “A query having a scalar subquery returned a wrong result”. The fix was introduced in 19.9.0.0 but was too restrictive, leading to the creation of bug 33325981: “Query Loses Subquery Unnesting From Plan in 19.9 and Above”.

The fix for 33325981 was distributed in 19.13.0.0 and 21.4.0.0 (plus a couple of earlier RURs, with patches available for various versions back to 12.2.0.1). Unfortunately the newer fix still doesn’t go far enough in reducing the restrictions and my example still doesn’t unnest.

Make sure you check any code that depends on “expression-based” subquery unnesting before you upgrade to 19.9, as it may change plan, which may make a difference to performance and a requirement for a workaround.

March 2, 2022

Generated Predicates

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 11:25 am GMT Mar 2,2022

A question arrived on the MOS Community forum yesterday (needs an account if you want to see the original) that reported a couple of fragments of a CBO trace (10053) file:

----- Current SQL Statement for this session (sql_id=4c85twjpdg8g9) -----
select /*+ 123456 */ count(*) from gl_detail where prepareddatev='2022-01-22 15:00:00'

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "NC63"."GL_DETAIL" "GL_DETAIL" 
WHERE "GL_DETAIL"."PREPAREDDATEV"='2022-01-22 15:00:00' 
AND SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

The question was:

Why after transformations ,oracle add condition SUBSTR(“GL_DETAIL”.”PREPAREDDATEV”,1,10)=’2022-01-22′

Mark Powell asked for the execution plan and information about indexes (normal and function-based) and histograms, as well as asking for the Oracle version. I asked about constraints and virtual columns and, in particular, the possibility of a virtual column being used as a partition key.

We didn’t get explicit answers to all our questions, but we did get “no constraints, no virtual columns, no partitioning”, and we also got the full 10053 trace file which, given the simplicity of the query, was mercifully short .. a mere 95KB and 2,800 lines.

The key aid to reading 10053 trace files is knowing what you’re expecting to see before you start looking. And with a generated predicate there was likely to be something that would tell me about about the “column” that caused the predicate to appear and the arithmetic that was the consequence of that predicate coming into existence. So I started with the section headed “SINGLE TABLE ACCESS PATH” where the cardinality estimate (for each individual table) would be calculated. This showed two columns being considered for the single table in the query:

  Column (#77): 
    NewDensity:0.000000, OldDensity:0.000035 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:8314506
  Column (#77): PREPAREDDATEV(
 
  Column (#88): 
    NewDensity:0.000188, OldDensity:0.000583 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:4551
  Column (#88): SYS_NC00088$(

Check the name of column #88 – sys_nc00088$ – that’s an internally generated virtual column which may well be to be associated with a function-based index, so let’s back up a bit to the “BASIC STATISTICAL INFORMATION” and (thirteen sets of) index stats for the table where we find:

  Index: I_GL_DETAIL_7  Col#: 88
    LVLS: 3  #LB: 433301  #DK: 4551  LB/K: 95.00  DB/K: 5922.00  CLUF: 26953639.00

The obvious first guess is that column #88 is the invisible virtual column underpinning an index that has been created on substr(prepareddatev,1,10) and here’s a quick and dirty test script to demonstrate that this could be the correct guess.

create table t1 (v1 varchar2(20), v2 varchar2(1));
create index t1_i1 on t1(substr(v1,1,10));

select column_name, virtual_column, hidden_column from user_tab_cols where table_name = 'T1';
select * from user_ind_expressions where table_name = 'T1';

insert into t1 values('2022-03-02 09:01:00', 'x');
commit;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace traceonly explain

select /*+ full(t1) */  * from t1 where v1 = '2022-03-02 09:01:00';

set autotrace off

And here’s the output cut and pasted from an SQL*Plus session running 11.2.0.4 (which is the version the CBO trace file came from).

Table created.


Index created.


COLUMN_NAME          VIR HID
-------------------- --- ---
V1                   NO  NO
V2                   NO  NO
SYS_NC00003$         YES YES

3 rows selected.


INDEX_NAME           TABLE_NAME                COLUMN_EXPRESSION                        COLUMN_POSITION
-------------------- ------------------------- ---------------------------------------- ---------------
T1_I1                T1                        SUBSTR("V1",1,10)                                      1

1 row selected.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("V1"='2022-03-02 09:01:00' AND
              SUBSTR("V1",1,10)='2022-03-02')

We see the “extra” predicate and a column with a name of the form sys_ncXXXXX$. The results from more recent versions of Oracle should be the same. I think there’s a pretty good chance that if the OP runs suitable queries against XXX_tab_cols and XXX_ind_expressions they’ll see similar results that explain the predicate that surprised them.

Footnote

There are various notes on the blog about constraints and transitive closure generating extra predicates, and how the optimizer can use function-based indexes that have definitions that are “good enough” though not perfect matches for user-supplied predicates. This is just another little detail in how the optimizer tries to find as much helpful information as it can from the data dictionary. The earliest note I can find on my blog about this at present is about partition elimination and generated predicates – which prompted various comments about function-based indexes and predicate generation.

February 21, 2022

CTE Enhancement

Filed under: 12c,Oracle,Performance,Transformations — Jonathan Lewis @ 9:24 am GMT Feb 21,2022

For many years I’ve told people that when you materialize a CTE (common table expression / “with” subquery) the result set will be written to the temporary tablespace using direct path writes and will be read back using cached reads. This stopped being an accurate description in 12c.

There is a clue about this in the way that the corresponding execution plans and I’ll be pointing that out later. The key difference between earlier versions of Oracle and newer versions is that the GTT (global temporary table) that holds the materialized result set is not necessarily written to disc, and may even avoid allocating a temporary segment.

I started looking at this feature a couple of days ago after remembering that someone, somewhere, had mentioned some details about a temporary object being kept in the PGA rather than being written to disc if the size wasn’t too big. I couldn’t remember if this was GTTs or temporary LOBs (or something completely different) and I only had a vague memory that there was a critical size that might have been 256KB; so I started experimenting with materializing CTEs.

Here’s the script I used to establish a baseline in 11g. I took a fairly arbitrary starting guess that if there was a PGA/Temp split is might be around 64KB.

rem
rem     Script:         cte_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

alter session set events '10046 trace name context forever, level 8';

prompt  ==============================
prompt  First sample - just under 64KB
prompt  ==============================

set arraysize  35
set pagesize  35

set pause Waiting...
set pause on

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select
        substr(v1,-10)
from
        g1
;

alter session set events '10046 trace name context off';

execute snap_my_stats.end_snap
execute snap_ts.end_snap

prompt  ==============================
prompt  Second sample - just over 64KB
prompt  ==============================

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 64
)
select
        substr(v1,-10)
from
        g1
;

execute snap_my_stats.end_snap
execute snap_ts.end_snap

set pause off

The calls to the package snap_my_stats are the usual ones I use (very old source at this Wayback URL) to report the session’s activity (v$mystat) between start and end snapshot and the calls in the snap_ts package do the same for the I/O per tablespace, summing bu tablespace across v$filestat and v$tempstat.

This script basically materializes and reports a GTT with a single column of 1,024 bytes, and in this case either 63 or 64 rows. You’ll note that I’ve built another assumption into the code that the CTE (if kept in memory) won’t follow all the details of a “real” table block, but will be a simple array with a tiny overhead per row – I’ve also assumed that the optimizer isn’t smart enough (or foolhardy enough) to push the substr() call inside the CTE.

I’ve set pause on and set both the arraysize and pagesize to a value less than the total number of rows I’ll be fetching so that I can check a couple of dynamic performance views (in particular v$sort_usage) from another session while the queries are running.

As a starting point, here are some critical values I’ve selected from the various outputs for just the 63 row case when running 11.2.04:

-------------
Session stats
-------------
Name                                                 Value
----                                                 -----
physical reads                                           9
physical reads cache                                     9
physical writes                                          9
physical writes direct                                   9
physical writes direct temporary tablespace              9

---------
T/S Stats 
---------
TS#        Reads      Blocks   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max Tablespace
----       -----      ------   ---------    ---      ------      ------   ---------    --- -------------------
   3           1           9        .000      1           1           9        .000      0 TEMP


Since I've enabled extended tracing at level 8 (waits) I can also show you the I/O waits reported for the reads and writes:
WAIT #140185232180256: nam='direct path write temp' ela= 180 file number=201 first dba=35073 block cnt=9 obj#=-1 tim=1645178184267768
...
WAIT #140185232180256: nam='db file scattered read' ela= 183 file#=201 block#=35073 blocks=9 obj#=-40016369 tim=1645178184268342

A dump of the 9 blocks of the temporary file (the “file number-201” translates to tempfile 1 since I have db_files = 200) starting at block 35073 looks like an ordinary table with pctfree 10 (which is one of the annoying details of GTTs – you can’t adjust it), 3 ITL slots (which is normal for CTAS) and 7 rows per block.

So, for 11g, we conclude that the effect of materializing a CTE is simply to create a GTT in the temporary tablespace, write it out using direct path writes, then read it back into the buffer cache using db file scattered reads. (You might want to confirm that this always happens, even if the CTE holds only one row.)

If you take advantage of the pause to issue “alter system flush buffer_cache” from another session you can also dump the segment header block (35072 in my case) to see that it’s a normal table segment header block – using freelist management, not ASSM because that’s the way temporary tablespaces have to be declared. The segment header block didn’t get written to disc in the normal course of the test.

12c Enhancement

This is the moment where the second query, and the pause that allows me to query v$sort_usage, becomes significant. When I started 12.2.0.1 with the 63 row query I saw:

  • No I/O on the temporary tablespace
  • No entry in v$sort_usage

To my great satisfaction the 64 row query did report I/O to the temporary tablespace (10 blocks this time – needing one extra block to cater for the 64th row) with v$sort_usage reporting a segment being created on my behalf. Obviously I re-ran the test a couple of times, flushing the buffer cache and shared pool, and connecting to a new session each time. The results were totally consistent: 63 rows => no GTT, 64 rows => GTT.

If you’re feeling a little suspicious at this point, bear with me.

This is the point where I switched to 19.11.0.0 – and both queries ran in memory with no sign of a GTT being created. Luckily I had cloned the query several times in the script generating different pairs of numbers of rows: 127/128, 255/256, 511/512, 1023/1024, and when I hit 1024 (and 1023) my session produced a GTT.

Somewhere between 512 and 1023 rows I was hitting a critical breakpoint – so I nearly started working through a binary chop to find the actually breakpoint; luckily, though, I had a little inspiration: if the overhead per row was 3 bytes (as it would be for a normal table column of more than 254 bytes) then 1023 rows would have an overhead of about 3KB – so I should test 1021 rows if I wanted to test a memory of just under 1MB.

Sure enough, at 1021 rows the GTT didn’t appear, at 1022 rows it did – time after time after time.

But …

My tests seemed to be totally repeatable. Again, I connected to a new session a few times, I flushed the buffer cache, I flushed the shared pool, I checked v$sort_usage. Consistently the results seemed to say:

  • 12.2 uses the PGA up to 64KB then dumps to a GTT
  • 19.11.0.0 uses the PGA up to 1MB then dumps to a GTT

Except – that night I had to shut down the two virtual machines because sometimes, for no obvious reason, I can’t hibernate my laptop while the VMs are running; and when I started everything up again the following morning and started re-running the tests as I wrote up the results something had changed. In fact my 19.11 instance didn’t dump to a GTT until I had reached nearly 10MB of data and my 12.2 wasn’t even dumping at 1MB; and I have no idea why a complete restart made such a difference.

After spending a little time trying to figure out what had changed – and I think it may have been that I’d been running the previous day’s tests after doing a lot of heavy work with temporary LOBs trying to pin down an anomaly with the handling of the temporary tablespace – I finally tried a google search using keywords that might be relevant and found this article that Keith Laker wrote about 5 years ago.

The feature is known as In-memory “cursor-duration” temporary table. I mentioned a clue in the execution plans at the start of this note: materialization shows up with a “temp table transformation” operation followed, in 11g, by with a child operation of “load as select”; but in 12.2 the child operation is “load as select (cursor duration memory)”. I really should have started my invesigation by putting the entire text of that operation into a search engine.

Summary

(Basically the highlights from Keith’s article):

  • The “in-memory cursor-duration”temporary table” change appeared in 12.2
  • It can be used in a number of transformations that the optimizer does
  • It’s not possible to force the use of the feature for a given query, it’s down to an internal algorithm
  • The mechanism uses memory that is “essentially” PGA
  • Despite the name this feature does not require you to licence the In-Memory option
  • If you’re still using an older version of Oracle this could be a good reason for upgrading as it can reduce the I/O load particularly for “analytic” types of query at a cost of using extra memory.

All the work I had done trying to find a break-point where Oracle switched from using PGA to using a GTT had been a waste of time – and the apparently consistent results on the first day had been an “accident” dictated (possibly) by some PGA-related activity that had taken place before I started running my tests .

Footnotes and geeky things

Five years on from the publication date of Keith’s article we shouldn’t be surprised to see some changes. Keith notes that the mechanism will apply only to serial queries that do more than one pass over the table – but there are two points to raise there:

  • possibly the two-pass thing is because it usually takes two passes over a CTE before Oracle will materialize a CTE automatically; my example shows the in-memory effect on a single pass – but that was a forced materialization.
  • the restrictions on parallelism may have been relaxed by now – if you check for hidden parameters you will find: _in_memory_cdt_maxpx, default value 4, described as “Max Parallelizers allowed for IMCDT”.

Interestingly there are more “cdt” parameters in 12.2 than there are in 19.11, and there are clear indications of some changes in algorithm and mechanism:

12c parameters like '%cdt%
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_cdt_shared_memory_limit                   0                    Shared Memory limit for CDT
_cdt_shared_memory_query_percent           20                   Per Query Percentage from CDT pool
_imcdt_use_mga                             ON                   MGA for In Memory CDT
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

19g parameters like '%cdt%'
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_hcs_enable_in_mem_cdt_hint                FALSE                add hint opt_param('_in_memory_cdt', 'off')
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

The parameter “_in_memory_cdt” can take the values ON, LIMITED, or OFF – which tells you that even if you can’t force a query to use in-memory CDTs you can (if you really want to) stop a query from using the feature. There are a few notes about this parameter and its significance to RAC and parallel execution (for 12.2) on MOS – if you have an account – Doc ID 2388236.1 What is _in_memory_cdt Parameter?

The reference to MGA (the “managed global area”) in 12.2 is also quite interesting. This is documented as a Solaris feature using OSM to share memory between processes. For more general details you can review MOS Doc ID 2638904.1 MGA (Managed Global Area) Reference Note (again, only if you have an account).

The “new” oradebug mechanism shows (from 18c) a couple of relevant components under SQL compilation and execution that you could trace if you want to investigate further.

Components in library RDBMS:
--------------------------
  SQL_Compiler                 SQL Compiler ((null))
    ICDT_Compile               In Memory CDT Compilation (qks3t)
  SQL_Execution                SQL Execution (qer, qes, kx, qee)
    ICDT_Exec                  In Memory CDT Execution (qes3t, kxtt)

January 17, 2022

Hash Aggregation – 1

Filed under: CBO,Hints,Oracle,Performance,Tuning — Jonathan Lewis @ 1:15 pm GMT Jan 17,2022

Here’s an observation I made some time in 2007, but didn’t mention online until a (possibly) relevant question appeared on the Oracle database forum in 2017; and the topic reappeared in a nearly unrelated question a little while ago. The 2017 question was this:

I am running following SQL, which it taking almost 4-5 seconds and returning only 18 records.

SELECT
   SUM(TOTAL_AMOUNT) C1,
   a.LEVEL2_ENAME AS c2,
   c.CURR_YEARMO AS c3
FROM TERRITORY_PRESET_MSNP a,
   CV_RESTATED_MSNP b
   LEFT OUTER JOIN
   MONTH_D c
   ON b.YEARMO = c.CURR_YEARMO,
   PRODUCT_OFFERING d
WHERE   b.PO_ID = d.ROW_ID
    AND b.DATASOURCE_ID = 10
    AND b.YEARMO = 201704
    AND b.OWNER_TERR_ID = a.TERR_ID
    AND c.CURR_YEARMO = 201704
    AND a.YEARMO = 201706
GROUP BY c.CURR_YEARMO, a.LEVEL2_ENAME
ORDER BY C3, C2;

If I remove the ORDER BY clause it is returning results in 1 second.

Before saying anything else, I’ll just make a couple of points about the SQL:

  • It’s not a good idea to mix traditional Oracle syntax with “ANSI” syntax – it’s likely to make things harder for the next person to read the code and there’s just a slight possibility that the rewrite that Oracle applies to hide the ANSI syntax may block some of the possible execution paths.
  • The C3, C2 in the order by clause are the column aliases for the curr_yearno, level2_ename columns used in the group by clause.  Although Oracle allows you to use aliases in the order by (but not in the group by) doing so can only make the SQL a little harder to interpret (especially in a case like this when you have both clauses).
  • There’s a left outer join to month_d (aliased as c), but the where clause then includes the predicate c.CURR_YEARMO = 201704 which will eliminate any rows where curr_yearmo is null, thus converting (we hope – but the mix and match syntax might introduce a side-effect) the outer join to an inner join – so maybe that’s a design error in the SQL.

Addressing the question, though, the first thought (rapidly eliminated) is that perhaps this is the standard complaint of the GUI interface: “it’s fast until I add an order by clause”.

The commonest reason for this complaint is that the typical GUI interface shows you the first few rows and waits for you to page down, so your impression of the response time is “the time to see the first few rows” rather than “the time to get the complete result set” when it might take much more time to return the entire result set. When you add an order by clause it’s possible that Oracle will have to sort the entire result set before you see any of it. It’s often the difference between “first few rows” and “entire result set” that triggers the complaint.

In this case the “rapid elimination” of this thought is due to the OP saying the result set was only 18 rows. which is likely to produce the gut feeling that it shouldn’t take very long for Oracle to sort 18 rows if it had to find all of them before showing displaying them. On the other hand the thought might need a little follow-up, after all:

  • it’s possible that the GUI is only displaying 15 rows at a time and it’s takes a lot of time to find the extra 3 rows. Just think of a tablescan with a filter subquery when the rows you want are the first few in the table. Without an order by the rows can be displayed as they are found, with an order by Oracle will have to get to the end of the tablescan before the rows can be sorted and displayed.
  • the optimizer can produce terrible estimates and the order by clause might prompt it to say “if I start with a different table, driving through a specific index, and changing the join order then I won’t have to do any sorting for the order by clause” The resulting path may be a very bad idea if the arithmetic produces the wrong results.

The OP hasn’t shown us the execution plan – and that’s what we really need to see; but there is an alternative  guess that we could make about what the optimizer is doing that would affect the performance this much.

The query is an aggregate query – we have a group by. Since 10g Oracle has been able to use “hash aggregation” – which shows up as the HASH GROUP BY operation in an execution plan. Here’s a little demo script, with a couple of sample queries:

rem
rem     Script:         sort_hash.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2012
rem 

create table t1
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                level <= 1e4  -- > comment to avoid wordpress format issue
)
select
        trunc(dbms_random.value(0,262144))      n_256K,
        trunc(dbms_random.value(0,131072))      n_128K,
        trunc(dbms_random.value(0,8192))        n_8k
from
        generator       v1,
        generator       v2
where
        rownum <= 8 * 1048576  -- > comment to avoid wordpress format issue
;

set arraysize 1000
set timing on
set autotrace traceonly 

prompt  ===========
prompt  No Order by
prompt  ===========

select
        n_8K, count(*) ct
from
        t1
group by
        n_8k
;


prompt  =============
prompt  With Order by
prompt  =============

select
        n_8K, count(*) ct
from
        t1
group by
        n_8k
order by
        1
;


My table has 8M rows, and my queries target the column with 8K distinct values. I’ve enabled autotrace from SQL*Plus, set a large arraysize (to reduce time lost to SQL*Net round-trips), and set timing on so we can get an elapsed time for total execution. I’ve set autotrace to “traceonly” so that the SQL*Plus client will fetch the data but won’t doesn’t waste resources formatting it, but I’m not actually interested in the handful of execution statistics that will be reported.

Here are the two sets of results from a test run on 19.11.0.0. Note, particularly, the figure for Elapsed:


===========
No Order by
===========

8192 rows selected.

Elapsed: 00:00:00.58

Execution Plan
----------------------------------------------------------
Plan hash value: 136660032

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
---------------------------------------------------------------------------

=============
With Order by
=============

8192 rows selected.

Elapsed: 00:00:03.03

Execution Plan
----------------------------------------------------------
Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   1 |  SORT GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
---------------------------------------------------------------------------

The time has jumped from slightly under 0.6 seconds to just over 3 seconds as the critical operation changes from a HASH GROUP BY to a SORT GROUP BY (even though the estimated cost, hence predicted run-time, of execution has not changed).

Your first thought at this point is probably along the lines of “surely it doesn’t take 2.4 seconds to sort 8,192 small rows, why doesn’t Oracle do a hash group by followed by a sort order by?” The answer seems to be “it just doesn’t”. So here’s one way to make it happen (with execution plan and elapsed time from 19.11.0.0 again):

select
        dist_8k, ct
from
        (
        select  /*+ no_merge */
                n_8K dist_8k, count(*) ct
        from
                t1
        group by
                n_8k
        )
order by 
        dist_8k
;

8192 rows selected.

Elapsed: 00:00:00.59

Execution Plan
----------------------------------------------------------
Plan hash value: 1705136228

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  8192 |   208K|  6938  (61)| 00:00:01 |
|   1 |  SORT ORDER BY       |      |  8192 |   208K|  6938  (61)| 00:00:01 |
|   2 |   VIEW               |      |  8192 |   208K|  6938  (61)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
-----------------------------------------------------------------------------


Again the estimated cost of execution doesn’t (to the nearest whole number) change from the basic cost of the hash aggregation – but we have brought the time back down to just under 0.6 seconds.

It’s worth reminding you at this point that if you can re-engineer any SQL that’s performing badly and can see that the driving core of the query can be reduced to something much simpler and more efficient, then wrapping that core into an inline view with the /*+ no_merge */ hint (and possibly putting it up into a “with subquery” clause) might be the safest first step and most effective way of improving performance.

There is an option for avoiding the query rewrite here – hint the path you want to see:


select  /*+ use_hash_aggregation */
        n_8K, count(*) ct
from
        t1
group by
        n_8k
order by 
        1
;

8192 rows selected.

Elapsed: 00:00:00.59

Execution Plan
----------------------------------------------------------
Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   1 |  SORT ORDER BY      |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
----------------------------------------------------------------------------

The nice thing about this, of course, is that you don’t actually have to edit the text; the hint could be attached to the query through an SQL Patch (or by abusing the SQL Profile or SQL Plan Baseline mechanisms).

The difficult part of hinting is finding the correct query block name for a more complex query. I simply added the hint /*+ use_hash_aggregation */ but the hint can be aimed at a query block so, in the absence of explicit query block names I could have used the hint /*+ use_hash_aggregation(@sel$1) */ using the default naming.

In a more complex case you can find the appropriate query block name by using the ‘alias’ format option when generating the execution plan. Consider the following query (where t2 and t3 are created from view all_objects), with its initial execution plan:


explain plan for
select 
        t2.owner, count(*)
from 
        t2 
where 
        t2.object_id in (
                select  t3.object_id 
                from    t3 
                where   t3.object_type = 'TABLE'
        )
group by 
        t2.owner
order by 
        t2.owner
/


select * from table(dbms_xplan.display(format=>'alias'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2646727453

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |    23 |   575 |   319   (8)| 00:00:01 |
|   1 |  SORT GROUP BY        |      |    23 |   575 |   319   (8)| 00:00:01 |
|*  2 |   HASH JOIN RIGHT SEMI|      |  2298 | 57450 |   317   (7)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | T3   |  2298 | 34470 |   158   (7)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T2   | 57448 |   561K|   156   (6)| 00:00:01 |
------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$5DA710D3
   3 - SEL$5DA710D3 / T3@SEL$2
   4 - SEL$5DA710D3 / T2@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
   3 - filter("T3"."OBJECT_TYPE"='TABLE')

The Query Block Name / Object Alias information tells us that the query block holding the sort group by at operation 1 is named SEL$5DA710D3 so we can use that as the target query block in the hint: /*+ use_hash_aggregation(@SEL$5DA710D3) */ and the plan changes to:


-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |    23 |   575 |   319   (8)| 00:00:01 |
|   1 |  SORT ORDER BY         |      |    23 |   575 |   319   (8)| 00:00:01 |
|   2 |   HASH GROUP BY        |      |    23 |   575 |   319   (8)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|      |  2298 | 57450 |   317   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T3   |  2298 | 34470 |   158   (7)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2   | 57448 |   561K|   156   (6)| 00:00:01 |
-------------------------------------------------------------------------------

It’s possible that you’ll only notice a significant difference in performance (and, perhaps, PGA memory allocated) when you aggregate a large number of rows into a small result set. And it does seem that this is one case where the only way to get the plan you want without a significant code rewrite is through a hint

tl;dr

If you have some code which does a “group by X, Y, Z order by X, Y, Z” (the order of the columns/expressions used doesn’t have to be the same for the two clauses) then Oracle will use a sort group by operation to handle the aggregation and ordering in a single step, even though we can find cases where hash aggregation followed by sort ordering is more efficient.

If you come across such a case then injecting the hint /*+ use_hash_aggregation(@query_block_name) */ may be the only way to change the execution plan if you’re not allowed to edit the SQL.

Footnote

In the second of the two links to the Oracle Developer Forum you’ll see that one poster pointed out that if the order by clause uses a suitable expression to substitute for one of the columns in the group by clause then you don’t need to hint the code, e.g.

group by
        numeric_column
order by
        numeric_column + 0

Very cute, but not a good idea.

Footnote 2

There is another part to the testing I started in 2007, and I’ll come back to that later in Hash Aggregation – 2.

January 1, 2022

Happy New Year

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,Transformations — Jonathan Lewis @ 12:01 am GMT Jan 1,2022

Here’s an entertaining little bug that appeared on the oracle-l list server just in time to end one year and start another in a suitable way. The thread starts with an offering from Matthias Rogel (shown below with some cosmetic changes) to be run on Oracle 12.2.0.1:

rem
rem     Script:         group_by_bug.sql
rem     Author:         Mathias Rogel  / Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             19.11.0.0       Fixed
rem             12.2.0.1        Fail
rem

create table t as (
        select date'2021-12-30' d from dual 
        union all 
        select date'2021-12-31'   from dual
);

select extract(year from d), count(*) from t group by extract(year from d);

alter table t add primary key(d);
select extract(year from d), count(*) from t group by extract(year from d);

This doesn’t look particularly exciting – I’ve created a table with two rows holding two dates in the same year, then counted the number of rows for “each” year before and after adding a primary key on the date column. Pause briefly to think about what the results might look like …

Table created.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          2

1 row selected.


Table altered.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          1
              2021          1

2 rows selected.

After adding the primary key (with its unique index) the result changes to something that is clearly (for this very simple data set) wrong.

At this point I offered a hypothetical reason why Oracle might be producing the wrong result, but Tim Gorman was one step ahead of me and supplied a bug reference from MOS: Wrong Result Using GROUP BY with EXTRACT Function Against DATE (Doc ID 2629968.1)

The MOS document describes this as a bug introduced in the upgrade from 12.1.0.2 to 12.2.0.1, demonstrates the error with the extract() function applied to a date, and supplies three possible workarounds (but not the workaround or explanation I supplied in my response on oracle-l).

The document also pointed to a further bug note that described how the problem also appeared with the combination of the to_char() function applied to a date column with a unique indexes: 12.2 Wrong Results from Query with GROUP BY Clause Based on To_char Function of Unique Index Columns (Doc ID 2294763.1) with a further suggestion for applying a patch (see MOS Doc ID: 26588069.8) or upgrading to 18.1 (where the bug has been fixed).

Matthias Rogel supplied a follow-up demonstrating the problem with to_char(), which prompted me to create an example showing that it wasn’t just about dates – which I’ve tidied up below (reminder, all results on this page are from 12.2.0.1):

create  table t1 as 
select  round(rownum/10,1) n1 
from    all_objects 
where   rownum <= 10
;


select n1 from t1 order by n1;
select n1, count(*) from t1 group by n1 order by n1;

column nch format A3

select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

alter table t1 add constraint t1_pk primary key(n1);
select to_char(n1,'99') nch , count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

As before I’ve created a simple table, and populated it with a few rows of data. THe first two queries are there to show you the data (0.1 to 1.0 by steps of 0.1), and show that aggregating the raw data produces one row per value.

I’ve then repeated the aggregation query, but converted each value to a character string that effectively rounds the value to an integer. Here are the two sets of results, before and after adding the primary key.

NCH   COUNT(*)
--- ----------
  0          4
  1          6

2 rows selected.

Table altered.

NCH   COUNT(*)
--- ----------
  0          1
  0          1
  0          1
  0          1
  1          1
  1          1
  1          1
  1          1
  1          1
  1          1

10 rows selected.

Again, the introduction of the primary key constraint on the column results in wrong results. In this example, though I’ve pulled the execution plans from memory along with their outlines, and this is what the two plans look like.

SQL_ID  gt5a14jb0g4n0, child number 0
-------------------------------------
select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99')
order by 1,2

Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    10 |    30 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    10 |    30 |     4  (50)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |    10 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


SQL_ID  4fxxtmrh8cpzp, child number 0
-------------------------------------
select to_char(n1,'99') nch , count(*) from t1 group by
to_char(n1,'99') order by 1,2

Plan hash value: 1252675504

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     2 (100)|          |
|   1 |  SORT ORDER BY   |       |    10 |    30 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9BB7A81A")
      ELIM_GROUPBY(@"SEL$47952E7A")
      OUTLINE(@"SEL$47952E7A")
      ELIM_GROUPBY(@"SEL$1")
      OUTLINE(@"SEL$1")
      INDEX(@"SEL$9BB7A81A" "T1"@"SEL$1" ("T1"."N1"))
      END_OUTLINE_DATA
  */

In the absence of the primary key (index) Oracle does a full tablescan, then hash group by, then sort order by. When the primary key is put in place Oracle does an index full scan (which is legal because the index must contain all the data thanks to the not null declaration inherent in a primary key) and a sort order by without any group by.

You might wonder if the problem arises because Oracle assumes the indexed path somehow means the aggregation doesn’t apply – but with a /*+ full(t1) */ hint in place and a full tablescan in the plan the aggregation step is still missing — and if you look at the Outline Data section of the plan you can see that this is explicitly demanded by the hint(s): /*+ elim_groupby() */

My hypothesis (before I read the bug note) was that the optimizer had picked up the primary key declaration and seen that n1 was unique and therefore allowed the aggregating group by to be eliminated, but failed to “notice” that the to_char() – or extract() in the date example – meant that the assumption of uniqueness was no longer valid. To work around this problem very locally I simply added the hint /*+ no_elim_groupby */ (with no query block specified) to the query – and got the correct results.

Footnote

There is an interesting side note to this example (though not one that I would want to see used in a production system – this comment is for interest only). If you look at the Outline Data for the plan when there was no primary key you’ll notice that the only outline_leaf() is named sel$1 whereas in the plan with the primary key sel$1 appears as an outline() and the only outline_leaf() is named sel$9bb7a81a. As “outline leaf” is a query block that was used by the optimizer in constructing the final plan, while an “outline” is an intermediate query block that was examined before being transformed into another query block. So this difference in the Outline Data tells us that the problem appears thanks to a transformation that did not happen when there was no index – so what would our execution plan look like if the only hint we used in the query was /*+ outline_leaf(@sel$1) */ ?

SQL_ID  apgu34hc3ap7f, child number 0
-------------------------------------
select /*+ outline_leaf(@sel$1) */ to_char(n1,'99') nch , count(*) from
t1 group by to_char(n1,'99') order by 1,2

Plan hash value: 3280011052

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY    |       |    10 |    30 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY   |       |    10 |    30 |     3  (67)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


This posting was scheduled to launch at 00:01 GMT on 1st January 2022. Happy new year – just be careful that you don’t try to extract() or to_char() it if you’re running 12.2.0.1 unless you’ve applied patch 26588069.

December 3, 2021

Column Groups

Filed under: CBO,Oracle,Statistics — Jonathan Lewis @ 10:40 am GMT Dec 3,2021

Here’s a little test that should have been the second test I ran when column groups were first made available in Oracle. It’s a check on the second of the two basic algorithms that Oracle uses for combining predicates:

  1. sel(predA and predB) = sel(predA) * sel(predB)
  2. sel(predA or predB) = sel(predA) + sel(predB) – sel(predA and predB)

As I pointed out many years ago in Cost Based Oracle – Fundamentals these are the formulae for combining probabilities of independent events. But if the predicates are not independent the formulae make the optimizer produce bad cardinality estimates and that’s why the Oracle developers introduced column groups – virtual columns that gather statistics about combinations of columns.

So let’s set up an extreme example and test the results (I’ve run it one 21c and 19c, but the issue is the same for all versions with column groups):

rem
rem     Script:         column_group_and_or.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        mod(rownum,100)                 n1,
        mod(rownum,100)                 n2,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator 
;

prompt  =========================================
prompt  Run the script twice, 
prompt  first without gathering the column group
prompt  then after gathering the column group
prompt  =========================================

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for columns (n1, n2) size 1'
        );
end;
/

column column_name format a32

select
        column_name, num_distinct, data_default
from
        user_tab_cols
where
        table_name = 'T1'
/

This script gives you a table with 10,000 rows, where the n1 and n2 columns always have exactly the same values as each other, and each column has 100 rows each of 100 distinct values. If you allow the pl/sql block to run the column stats query will report the following:

COLUMN_NAME                      NUM_DISTINCT DATA_DEFAULT
-------------------------------- ------------ --------------------------------------------------------------------------------
N1                                        100
N2                                        100
V1                                      10000
PADDING                                     1
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            100 SYS_OP_COMBINED_HASH("N1","N2")

5 rows selected.

After creating the data (and statistics) we can execute the following simple statements:

set autotrace traceonly explain

select  * 
from    t1
where
        n1 = 50
/

select  * 
from    t1
where
        n1 = 50
and     n2 = 50
/

select  *
from    t1
where
        n1 = 50
or      n2 = 50
/

set autotrace off

We know that the first query will return 100 rows, and that the optimizer has enough information to get the correct estimate: the selectivity of “n1 = 50” is 1/100, the number of rows in the table is 10,000, so the estimate should be 1/100 * 10,000 = 100.

For the second query: if we don’t create the column group we expect the optimizer to multiply the selectivities of the two predicates together before multiplying by the number of rows in the table to get an estimate of (1/100 * 1/100) * 10,000 = 1

If we’ve created the column group we expect the optimizer to use the column group selectivity to calculate its estimate (1/100 * 10,000) = 100.

Then we get to the third query: if we don’t create the column group we expect the optimizer to slot 1/100 into the 4 places it has to appear in the formula to get an estimate (1/100 + 1/100 – (1/100 * 1/100)) * 10,000 = 199.

If we have created the column group the third expression in the formula can use the column group 1/num_distinct for the third expression to get: (1/100 + 1/100 – (1/100)) * 10,000 = 100

Here are the three execution plans (with a little cosmetic work) when the column group exists:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 11800 |    26   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 11800 |    26   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 11800 |    26   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 11800 |    26   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50 AND "N2"=50)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   199 | 23482 |    27   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   199 | 23482 |    27   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50 OR "N2"=50)

As you can see from the results in the Rows column, the optimizer has used the column group statistics for the AND estimate, but not for the OR estimate, resulting in an estimate that is out by a factor of nearly 2.

It should be noted that failure to use the column group for AND’ed predicates can introduce a massive error in cardinality estimates but for OR’ed predicates the worst it can be is a factor of 2.

Summary

There is a possible enhancement to the the optimizer’s use of column group statistics that could make the arithmetic consistent for AND and OR predicates and could allow the optimizer to produce better estimates for OR conditions.

Although any error in cardinality estimates can lead to undesirable execution plans the scale of the error due to the current inconsistency will be at worst a factor or 2, which is unlikely to introduce many surprise plans. The nature of the error means the estimate will be too big, which might push the optimizer into using tablescans and hash joins when indexed accesses and nested loops might be the better choice.

October 15, 2021

use_nl redux

Filed under: CBO,Execution plans,Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 2:58 pm BST Oct 15,2021

A question has just appeared on a note I wrote in 2012 about the incorrect use of the use_nl() hint in some sys-recursive SQL, linking forward to an explanation I wrote in 2017 of the use_nl() hint – particularly the interpretation of the form use_nl(a,b), which does not mean “use a nested loop from table A to table B)”.

The question is essentially – “does Oracle pick the join order before it looks at the hints”?

I’m going to look at one of the queries in the question (based on the 2017 table creation code) and explain how Oracle gets to the plan it finally picks. I’ll be using an instance of 21.3 in the examples below. Here’s the query, followed by the plan:

select
        /*+ use_nl(b) */
        a.v1, b.v1, c.v1, d.v1
from
        a, b, c, d
where
        d.n100 = 0
and     a.n100 = d.id
and     b.n100 = a.n2
and     c.id   = a.id
/


| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 20000 |  1054K|    78   (4)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL | D    |   100 |  1800 |    26   (4)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 20000 |   703K|    52   (4)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    | 10000 |   214K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"SEL$1" "C"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "C"@"SEL$1")
      USE_HASH(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "A"@"SEL$1")
      LEADING(@"SEL$1" "B"@"SEL$1" "A"@"SEL$1" "D"@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "D"@"SEL$1")
      FULL(@"SEL$1" "A"@"SEL$1")
      FULL(@"SEL$1" "B"@"SEL$1")
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('21.1.0')
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."ID"="A"."ID")
   3 - access("A"."N100"="D"."ID")
   4 - filter("D"."N100"=0)
   5 - access("B"."N100"="A"."N2")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   6 -  SEL$1 / "B"@"SEL$1"
         U -  use_nl(b)

Note
-----
   - this is an adaptive plan

Points to note:

  • The Hint Report says the plan final did not use the use_nl(b) hint.
  • Whatever you may think the join order is by looking at the body of the plan, the leading() hint in the Outline Data tells us that the join order was (B A D C), and that explains why the use_nl(b) hint could not be used: B was never “the next table in the join order”.
  • The “visible” order of activity displayed in the plan is C D B A, but that’s because we swap_join_inputs(D) to put D above the (B,A) join, then swap_join_inputs(C) to put C above D.

So did Oracle completely pre-empt any plans that allowed B to be “the next table”, thus avoiding the hint, or did it consider some plans where B wasn’t the first table in the join order and, if so, would it have used a nested loop into B if that plan had had a low enough cost?

The only way to answer these questions is to look at the CBO (10053) trace file. For very simple queries it’s often enough to pick out a few lines as a starting point – in my case using egrep:

egrep -e "^Join order" -e"Best so far" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Join order aborted2: cost > best plan cost
Join order[5]:  D[D]#0  C[C]#3  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[6]:  D[D]#0  C[C]#3  B[B]#2  A[A]#1
Join order aborted2: cost > best plan cost

Join order[7]:  A[A]#1  D[D]#0  B[B]#2  C[C]#3
Join order aborted2: cost > best plan cost
Join order[8]:  A[A]#1  D[D]#0  C[C]#3  B[B]#2
Join order aborted2: cost > best plan cost
Join order[9]:  A[A]#1  B[B]#2  D[D]#0  C[C]#3
Join order aborted2: cost > best plan cost
Join order[10]:  A[A]#1  C[C]#3  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[11]:  A[A]#1  C[C]#3  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost

Join order[12]:  B[B]#2  D[D]#0  A[A]#1  C[C]#3
Join order aborted2: cost > best plan cost
Join order[13]:  B[B]#2  A[A]#1  D[D]#0  C[C]#3
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
Join order[14]:  B[B]#2  A[A]#1  C[C]#3  D[D]#0
Join order aborted2: cost > best plan cost
Join order[15]:  B[B]#2  C[C]#3  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Join order[16]:  C[C]#3  D[D]#0  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[17]:  C[C]#3  A[A]#1  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[18]:  C[C]#3  A[A]#1  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost
Join order[19]:  C[C]#3  B[B]#2  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Oracle has considerd 19 possible join orders (out of a maximum of 24 (i.e. 4!). In theory we might see 6 plans starting with each of the 4 tables. In fact we see that the optimizer’s first choice started with table D, producing 6 join orders, then switched to starting with table A, producing only 5 join orders.

The “missing” order is (A, B, C, D) which should have appeared between join orders 9 and 10. If we check the detail in the trace file for join order 9 (A, B, D, C) we’ll see that the optimizer aborted after calculating the join from A to B because the cost had already exceeded the “Best so far” by then and didn’t even start to calculate the cost of getting to the 3rd table (D) in that join order. Clearly, then, there was no point in the optimizer considering any other join order that started with (A, B) – hence the absence of (A, B, C, D), which would otherwise have been the next in the list

I’ve highlighted all the join orders where the optimizer didn’t abort, but produced a new “Best so far” line. My original egrep command was too simple to tell the whole story, though; the “Best so far” line includes only the cost of getting data from the first table in the join order and I need to expand the output in each case to 4 lines (one for each table). This is what the 4 non-aborted summaries look like:

egrep -A+3 -e"Best so far" or21_ora_15956.trc

Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 2  cost: 30137.036118  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 30163.548157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 3  cost: 78.079517  card: 10000.000000  bytes: 550000.000000
              Table#: 2  cost: 30163.348157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 2  cost: 2483.956340  card: 1000000.000000  bytes: 32000000.000000
              Table#: 1  cost: 2530.068379  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 2556.580418  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
              Table#: 1  cost: 52.204078  card: 20000.000000  bytes: 720000.000000
              Table#: 0  cost: 78.479517  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 104.991556  card: 20000.000000  bytes: 1380000.000000

As you can see, when we start with (B A) the estimated cost drops dramatically.

Now that we’ve see that Oracle looks at many (though not a completely exhaustive set of) plans on the way to the one it picks the thing we need to do to address the original question is check that for every single calculation where B is “the next table”, Oracle obeys our hint.

Each time the optimizer join “the next table” its default strategy is to consider the cost of a Nested Loop join, a Sort Merge join (possibly in two different ways), and a Hash join in that order. However, if the optimizer is obeying the hint it will consider only the nested loop join. Here’s a suitable call to egrep with the first four join orders::

egrep -e "^Join order" -e "^Now joining" -e"^NL Join" -e"^SM Join" -e"^HA Join" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Now joining: A[A]#1
NL Join
SM Join
SM Join (with index on outer)
HA Join
Now joining: B[B]#2
NL Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Now joining: C[C]#3
NL Join
SM Join
HA Join
Now joining: B[B]#2
NL Join

Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Now joining: B[B]#2
NL Join
Now joining: A[A]#1
NL Join
SM Join
HA Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Now joining: C[C]#3
NL Join
Join order aborted2: cost > best plan cost

As you can see, the only join considered when the optimizer is “Now joining” B is a Nested Loop join; for all other tables all three join methods (and sometimes two variants of the Sort Merge join) are evaluated.

You may also notice another of the clever strategies the optimizer uses to minimise its workload. On the Join order[2] the optimizer has gone straight to “Now joining C” because it has remembered the result of joining A to D from the previous join order.

This is only a very simple example and analysis, but I hope it has given you some idea of how the optimizer works, how clever it tries to be about minimising the work, and how it can obey a hint while still producing an execution plan that appears to have ignored the hint.

October 7, 2021

Hints and Costs

Filed under: 12c,CBO,Conditional SQL,Execution plans,Oracle — Jonathan Lewis @ 12:06 pm BST Oct 7,2021

This note is one I drafted three years ago, based on a question from the Oracle-L. It doesn’t directly address that question because at the time I was unable to create a data set that reproduced the problem’ but it did highlight a detail that’s worth mentioning, so I’ve finally got around to completing it (and testing on a couple of newer versions of Oracle).

I’ll start with a model that was supposed to demonstrate the problem behind the question:


rem
rem     Script:         122_or_expand.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:        
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem             12.2.0.1
rem

create table t1
segment creation immediate
pctfree 80 pctused 20
nologging
as
select
        *
from
        all_objects
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk
        primary key(object_id)
        using index pctfree 80
;

variable b1 number
variable b2 number
variable b3 number
variable b4 number
variable b5 number
variable b6 number

exec :b1 := 100
exec :b2 := 120
exec :b3 := 1100
exec :b4 := 1220
exec :b5 := 3100
exec :b6 := 3320

set serveroutput off

select
        object_name
from 
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));


The critical feature of the query is the list of disjuncts (ORs) which all specify a range for object_id. The problem was that the query used a plan with an index full scan when there were no statistics on the table (or its indexes), but switched to a plan  that used index range scans when statistics were gathered – and the performance of the plan with the full scan was unacceptable.  (Clearly the “proper” solution is to have some suitable statistics in place – but sometimes such things are out of the control of the people who have to solve the problems.)

The /*+ index() */ and (undocumented) /*+ index_rs_asc() */ hints had no effect on the plan. The reason why the /*+ index() */ hint made no difference is because an index full scan is one of the ways in which the /*+ index() */ hint can be obeyed – the hint doesn’t instruct the optimizer to pick an index range scan. The hint /*+ index_rs_asc() */ specifically tells the optimizer to pick an index Range Scan ASCending if the hint has been specified correctly and the choice is available and legal. So why was the optimizer not doing as it was told. Without seeing the execution plan or CBO trace file from a live example I can’t guarantee that the following hypothesis is correct, but I think it’s in the right ball park.

I think the optimizer was probably using the (new to 12c) cost-based“OR expansion” transformation, which basically transformed the query into a UNION ALL of several index range scans – and that’s why its outline would show /*+ index_rs_asc() */ hints, and the hint would only become valid after the transformation had taken place so if Oracle didn’t consider (or considered and discarded) the transformation when there were no stats in place then the hint would have to be “Unused” (as the new 19c hint-report would say).

When I tried to model the problem the optimizer kept doing nice things with my data, so I wasn’t able to demonstrate the OP’s problem. However in one of my attempts to get a silly plan I did something silly – that can happen by accident if your client code isn’t careful! I’ll tell you what that was in a moment – first, a couple of plans.

As it stands, with the data and bind variables as shown, the optimizer used “b-tree / bitmap conversion” to produce an execution plan that did three separate index range scans, converts rowids to bit, OR-ed the bit-strings, then converted back to rowids before accessing the table:

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |    84 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |    84   (5)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|   3 |    BITMAP OR                        |       |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   5 |      SORT ORDER BY                  |       |       |       |            |          |
|*  6 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   8 |      SORT ORDER BY                  |       |       |       |            |          |
|*  9 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|  10 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|  11 |      SORT ORDER BY                  |       |       |       |            |          |
|* 12 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

So the first thing I had to do was disable this feature, which I did by adding the hint /*+ opt_param(‘_b_tree_bitmap_plans’,’false’) */ to the query. This adjustment left Oracle doing the OR-expansion that I didn’t want to see:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   297 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   288 | 19008 |   297   (1)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|  10 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |   173 |  7612 |   177   (1)| 00:00:01 |
|* 11 |      INDEX RANGE SCAN                  | T1_PK           |   173 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

You’ll notice that the three range scans have different row estimates and costs – that’s the effect of bind variable peeking and my careful choice of bind variables to define different sized ranges. Take note, by the way, for the three filter predicates flagged at operations 3, 6, and 9.  These are the “conditional plan” filters that say things like: “don’t run the sub-plan if the runtime value of :b5 is greater than :b6”.

Since I didn’t want to see OR-expansion just yet I then added the hint /*+ no_or_expand(@sel$1) */ to the query and that gave me a plan with tablescan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   617 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   291 | 12804 |   617   (4)| 00:00:01 |
--------------------------------------------------------------------------

This was a shame because I really wanted to see the optimizer produce an index full scan at this point – so I decided to add an “unnamed index” hint to the growing list of hints – specifically: /*+ index_(@sel$1 t1@sel$1) */

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |   405 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |   405   (2)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | T1_PK |   291 |       |   112   (7)| 00:00:01 |
---------------------------------------------------------------------------------------------

This, of course, is where things started to get a little interesting – the index full scan costs less than the tablescan but didn’t appear until hinted. But after a moment’s thought you can dismiss this one (possibly correctly) as an example of the optimizer being cautious about the cost of access paths that are dictated by bind variables or unpeekable inputs. (But these bind variables were peekable – so maybe there’s more to it than that – I was still trying to get to a point where my model would behave more like the OP’s, so I didn’t follow up on this detail: maybe in a couple of years time … ).

Once last tweak – and that will bring me to the main point of this note. In my original code I was using three ranges dictated by 3 pairs of bind variables, for example [:b5, :b6]. What would happen if I made :b5 greater than :b6, say I swapped their values?

The original btree/bitmap plan didn’t change, but where I had simply blocked bree/bitmap plans and seen OR-expansion as a result the plan changed to a full tablescan (with the cost you saw above of 617). So tried again, adding the hint /*+ or_expand(@sel$1) */ to see why; and this is the plan I got:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   735 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   116 |  7656 |   735   (3)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|* 10 |     TABLE ACCESS FULL                  | T1              |     1 |    44 |   615   (4)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

I still get the same three branches in the expansion, but look what’s happened to the sub-plan for the third pair of bind variables. The optimizer still has the FILTER at operation 9 – and that will evaluate to FALSE for the currently peeked values; but the optimizer has decided that it should use a tablescan for this part of the query if it ever gets a pair of bind variables in the right order; and the cost of the tablescan has echoed up the plan to make the total cost of the plan 735, which is (for obvious reasons) higher than the cost of running the whole query as a single tablescan.

The same anomaly appears in 19.11.0.0 and 21.3.0.0. On the plus side, it’s possible that if you have code like this the optimizer will be using the btree/bitmap conversion anyway;

tl;dr

As a generic point it’s worth ensuring that if you’re using bind variables in client code to define ranges then you’ve got to get the values in the right order otherwise one day the answer to the question “nothing changed why is the query running so slowly?” will be “someone got in first with the bound values the wrong way round”.

DB Optimizer

Filed under: CBO,Oracle — Jonathan Lewis @ 8:56 am BST Oct 7,2021

I’ve rediscovered this list of articles by Stefan Koehler on my laptop; it has a bias towards SAP users, and it’s several years old judging by the dates so some details will have changed, but there still seem to be plenty of people using 12c (and SAP) so I thought I’d tidy it up and publish it

September 20, 2021

Optimizer Tip

Filed under: CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 9:04 am BST Sep 20,2021

This is a note I drafted in March 2016, starting with a comment that it had been about the same time the previous year that I had written:

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

I keep finding ancient drafts like this and if they still seem relevant – even if they are a little behind the times – I’ve taken to dusting them down and publishing. (There are still more than 730 drafts on my blog at present – which gives me scope for one per day for the next 2 years!)

With the passing of time, though, new information becomes available, algorithms change and (occasionally) I discover I’ve made a significant error in a hypothesis (or guess). In this case there are a couple of important additions that I’ve added to the end of the original note.

Optimizer Tips (IOUG Quick Tips 2015)

There are two very common reasons why the optimizer picks a bad execution plan. The first is that its estimate of the required data volume is bad, the second is that it has a misleading impression of how scattered that data is.

The first issue is often due to problems with the selectivity of complex predicates, the second to unsuitable values for the clustering_factor of potentially useful indexes. Recent [ed: i.e. as at 2015] versions of the Oracle software have given us features that try to address both these issues and I’m going to comment on them in the following note.

As always any change can have side effects; introducing a new feature might have no effect on 99% of what we do, a beneficial effect on 99% of the remainder, and a hideous effect on the 1% of 1% that’s left, so I will be commenting on both the pros and cons of both features.

Column Group Stats

The optimizer assumes that the data in two different columns of a single table are independent – for example the registration number on your car (probably) has nothing to do with the account number of your bank account. So when we execute queries like:

     colX = 'abcd'
and  colY = 'wxyz'

the optimizer’s calculations will be something like:

“one row in 5,000 is likely to have colX = ‘abcd’ and one row in 2,000 is likely to have colY = ‘wxyz’, so the combination will probably appear in roughly one row in ten million”.

On the other hand we often find tables that do things like storing post codes (zipcodes) in one column and city names in another, and there’s a strong correlation between post codes and city – for example the district code (first part of the post code) “OX1” will be in the city of Oxford (Oxfordshire, UK). So if we query a table of addresses for rows where:

     district_code = 'OX1'
and  city          = 'Oxford'

there’s a degree of redundancy, but the optimizer will multiply the total number of distinct district codes in the UK by the total number of distinct city names in the UK as it tries to work out the number of addresses that match the combined predicate and will come up with a result that is far too small.

In cases like this we can define “column group” statistics about combinations of columns that we query together, using the function dbms_stats.create_extended_stats(). This function will create a type of virtual column for a table and report the system-generated name back to us, and we will be able to see that name in the view user_tab_cols, and the definition in the view user_stat_extensions. If we define a column group in this way we then need to gather stats on it, which we can do in one of two ways, either by using the generated name or by using the expression that created it.


SQL> create table addresses (district_code varchar2(8), city varchar2(40));

Table created.

SQL> execute dbms_output.put_line( -
>        dbms_stats.create_extended_stats( -
>            user,'addresses','(district_code, city)'))

SYS_STU12RZM_07240SN3V2667EQLW

PL/SQL procedure successfully completed.

begin
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns SYS_STU12RZM_07240SN3V2667EQLW size 1'
        );
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns (district_code, city) size 1'
        );
end;
/

I’ve included both options in the anonymous pl/sql block, but you only need one of them. In fact if you use the second one without calling create_extended_stats() first Oracle will create the column group implicitly, but you won’t know what it’s called until you query user_stat_extensions.

I’ve limited the stats collection to basic stats with the “size 1” option. You can collect a histogram on a column group but since the optimizer can only use a column group with equality predicates you should only create a histogram in the special cases where you know that you’re going to get a frequency histogram or “Top-N” histogram.

You can also define extended stats on expressions (e.g. trunc(delivery-date) – trunc(collection_date)) rather than column groups, but since you’re only allowed 20 column groups per table [but see update 1] it would be better to use virtual columns for expressions since you can have as many virtual columns as you like on a table provided the total column count stays below the limit of 1,000 columns per table.

Warnings

  • Column group statistics are only used for equality expressions. [see also update 2]
  • Column group statistics will not be used if you’ve created a histogram on any of the underlying columns unless there’s also a histogram on the column group itself.
  • Column group statistics will not be used if you query any of the underlying columns with an “out of range” value. This, perhaps, is the biggest instability threat with column groups. As time passes and new data appears you may find people querying the new data. If you haven’t kept the column stats up to date then plans can change dramatically as the optimizer switches from using column group stats to multiplying the selectivities of underlying columns.
  • The final warning arrives with 12c. If you have all the adaptive optimizer options enabled the optimizer will keep a look out for tables that it thinks could do with column group stats, and automatically creates them the next time you gather stats on the table. In principle this shouldn’t be a problem – the optimizer should only do this when it has seen that column group stats should improve performance – but you might want to monitor your system for the arrival of new automatic columns.

Preference: table_cached_blocks

Even when the cardinality estimates are correct we may find that we get an inefficient execution plan because the optimizer doesn’t want to use an index that we think would be a really good choice. A common reason for this failure is that the clustering_factor on the index is unrealistically large.

The clustering_factor of an index is a measure of how randomly you will jump around the table as you do an index range scan through the index and the algorithm Oracle uses to calculate this measure has a serious flaw in it: it can’t tell the difference between a little bit of localised jumping and constant random leaps across the entire width of the table.

To calculate the clustering_factor Oracle basically walks the entire index in order using the rowid at the end of each index entry to check which table block it would have to visit, and every time it has to visit a “new” table block it increments a counter. The trouble with this approach is that, by default, it doesn’t remember its recent history so, for example, it can’t tell the difference in quality between the following two sequences of table block visits:

Block 612, block 87, block 154, block  3, block 1386, block 834, block 237
Block  98, block 99, block  98, block 99, block   98, block  99, block  98

In both cases Oracle would say that it had visited seven different blocks and the data was badly scattered. This has always been a problem, but it became much more of a problem when Oracle introduced ASSM (automatic segment space management). The point of ASSM is to ensure that concurrent inserts from different sessions tend to use different table blocks, the aim being to reduce contention due to buffer busy waits. As we’ve just seen, though, the clustering_factor doesn’t differentiate between “a little bit of scatter” and “a totally random disaster area”.

Oracle finally addressed this problem by introducing a “table preference” which allows you to tell it to “remember history” when calculating the clustering_factor. So, for example, a call like this:

execute dbms_stats.set_table_prefs(user,'t1','table_cached_blocks',16)

would tell Oracle that the next time you collect statistics on any indexes on table t1 the code to calculate the clustering_factor should remember the last 16 table blocks it had “visited” and not increment the counter if the next block to visit was already in that list.

If you look at the two samples above, this means the counter for the first list of blocks would reach 7 while the counter for the second list would only reach 2. Suddenly the optimizer will be able to tell the difference between data that is “locally” scattered and data that really is randomly scattered. You and the optimizer may finally agree on what constitutes a good index.

It’s hard to say whether there’s a proper “default” value for this preference. If you’re using ASSM (and there can’t be many people left who aren’t) then the obvious choice for the parameter would be 16 since ASSM tends to format 16 consecutive blocks at a time when a segment needs to make more space available for users [but see Update 3]. However, if you know that the real level of insert concurrency on a table is higher than 16 then you might be better off setting the value to match the known level of concurrency.

Are there any special risks to setting this preference to a value like 16? I don’t think so; it’s going to result in plans changing, of course, but indexes which should have a large clustering_factor should still end up with a large clustering_factor after setting the preference and gathering statistics; the indexes that ought to have a low clustering_factor are the ones most likely to change, and change in the right direction.

Footnote: “Danger, Will Robinson”.

I’ve highlighted two features that are incredibly useful as tools to give the optimizer better information about your data and allow it to get better execution plans with less manual intervention. The usual warning applies, though: “if you want to get there, you don’t want to start from here”. When you manipulate the information the optimizer is using it will give you some new plans; better information will normally result in better plans but it is almost inevitable that some of your current queries are running efficiently “by accident” (possibly because of bugs) and the new code paths will result in some plans changing for the worse.

Clearly it is necessary to do some thorough testing but fortunately both features are incremental and any changes can be backed out very quickly and easily. We can change the table_cached_blocks one table at a time (or even, with a little manual intervention, one index at a time) and watch the effects; we can add column groups one at a time and watch for side effects. All it takes to back out of a change is a call to gather index stats, or a call to drop extended stats. It’s never nice to live through change – especially change that can have a dramatic impact – but if we find after going to production that we missed a problem with our testing we can reverse the changes very quickly.

Updates

Update 1 – 20 sets of extended stats. In fact the limit is the larger of 20 and ceiling(column count/10), and the way the arithmetic is applied is a little odd so there are ways to hack around the limit.

Update 2 – Column groups and equality. It’s worth a special menton that the predicate colX is null is not an equality predicate, and column group stats will not apply but there can be unexpected side effects even for cases where you don’t use this “is null” predicate. (More articles here about column groups.)

Update 3 – table_cached_blocks = 16. This suggestions doesn’t cater for systems running RAC.

August 25, 2021

qbregistry 2

Filed under: CBO,dbms_xplan,Oracle,Transformations — Jonathan Lewis @ 1:45 pm BST Aug 25,2021

Following a question (very similar to one I had been asking myself) that appeared on twitter in response to my original posting on the new qbregistry format option in the dbms_xplan package, I’ve drafted a note of how I interpreted the execution plan so that I could more clearly see how my visualisation of the transformation maps (or fails to map) to the Query Block Registry.

I can’t guarantee the correctness of the description I’ve given here, but it’s probably fairly accurate.

Original Query (hiding the unnest and no_semijoin hints)

select  
        /* sel$1 */
        * 
from    t1 
where   t1.owner = 'OUTLN' 
and     object_name in (
                select  /* sel$2 */
                        distinct t2.object_name 
                from   t2 
                where  t2.object_type = 'TABLE'
        )
;

Transformation 1: unnest the subquery

This is visible in the Outline Information of the execution plan as the hint UNNEST(@”SEL$2″ UNNEST_INNERJ_DISTINCT_VIEW) and produces two new query blocks, the query block that “is” the unnested subquery and the query block that joins t1 to the unnested subquery vw_nso_1.

select  
        /* SEL$5DA710D3 */
        t1.* 
from
        t1,
        (
        select  /* SEL$683B0107 */
                distinct
                t2.object_name 
        from    t2 
        where   t2.object_type = 'TABLE'
        )       vw_nso_1
where
        t1.owner = 'OUTLN' 
and     vw_nso_1.object_name = t1.object_name



Transformation 2: view merge (join then aggregate)

This is visible in the Outline Information of the execution plan in the hint MERGE(@”SEL$683B0107″ >”SEL$5DA710D3″). I think this produces three new query blocks; the block that “is” the merged view, a block that selects (projects) from the merged view, and the query block that the main query now becomes.

We will pretend that t1 has only 4 columns, owner, object_name, object_type, object_id.

select
        /* SEL$B186933D */
        vm_nwvw_2.owner,
        vm_nwvw_2.object_name,
        vm_nwvw_2.object_type
        vm_nwvw_2.object_id
from    (
        select  /* SEL$2F1334C4 */
                -- no distinct, and t2.object_name and t1.rowid eliminated
                t1.owner,
                t1.object_name,
                t1.object_type
                t1.object_id
        from    (
                select  /* SEL$88A77D12 */
                        distinct
                        t1.rowid,        -- ensures we don't duplicate t1 rows
                        t1.owner,
                        t1.object_name,
                        t2.object_name,  -- seems redundant, but is in the trace file
                        t1.object_type
                        t1.object_id
                from
                        t1,
                        t2
                where
                        t2.object_type = 'TABLE'
                and     t1.owner = 'OUTLN'
                and     t1.object_name = t2.object_name
                ) 
        ) vm_nwvw_2
;

Transformation 3: aggregate into partial join

I realised only as I was writing this note that I had completely forgotten that the plan reported a semi join even though the subquery had been hinted with a no_semijoin hint, and that the reported semi join was actually a partial join.

However, the query block registry is identical with or without a partial join (controlled by the hint [no]partial_join) so there doesn’t seem to be a transformation stage corresponding to the choice of strategy. Maybe the apparently redundant query block layer allows the variation to appear if required.

It’s Difficult

A problem I have with the query block registry is deciding what it’s telling us – and maybe the trace file and the execution plan are not trying to tell us exactly the same thing. I think it’s quite difficult, anyway, to find a good way of presenting the information that is completely informative but clear and uncluttered.

Something that may help, when you can check the trace file and the final execution plan, is the order in which query blocks are registered. Some of them may be discarded, of course, as the optimizer works through options, some of them may be marked as COPY, but if you ignore those you may be able to see from what’s left the evolution of the plan. Here, for example, is the extract of the lines where each query block is registered, taken from the CBO trace for this query, with numbering:

32:Registered qb: SEL$1 0xc26e28e8 (PARSER)
36:  signature (): qb_name=SEL$1 nbfros=1 flg=0
37:    fro(0): flg=4 objn=76167 hint_alias="T1"@"SEL$1"

39:Registered qb: SEL$2 0xc26e0d58 (PARSER)
43:  signature (): qb_name=SEL$2 nbfros=1 flg=0
44:    fro(0): flg=4 objn=76168 hint_alias="T2"@"SEL$2"

966:Registered qb: SEL$683B0107 0xbcc187c8 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
970:  signature (): qb_name=SEL$683B0107 nbfros=1 flg=0
971:    fro(0): flg=0 objn=76168 hint_alias="T2"@"SEL$2"

973:Registered qb: SEL$5DA710D3 0xbb1fcb10 (SUBQUERY UNNEST SEL$1; SEL$2;)
977:  signature (): qb_name=SEL$5DA710D3 nbfros=2 flg=0
978:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
979:    fro(1): flg=5 objn=0 hint_alias="VW_NSO_1"@"SEL$5DA710D3"

1471:Registered qb: SEL$2F1334C4 0xbcf4b210 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3)
1475:  signature (): qb_name=SEL$2F1334C4 nbfros=1 flg=0
1476:    fro(0): flg=5 objn=0 hint_alias="VM_NWVW_2"@"SEL$2F1334C4"

1478:Registered qb: SEL$88A77D12 0xbcda5540 (PROJECTION VIEW FOR CVM SEL$683B0107)
1482:  signature (): qb_name=SEL$88A77D12 nbfros=2 flg=0
1483:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
1484:    fro(1): flg=1 objn=0 hint_alias="VW_NSO_1"@"SEL$5DA710D3"

1489:Registered qb: SEL$B186933D 0xbcda5540 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3)
1493:  signature (): qb_name=SEL$B186933D nbfros=2 flg=0
1494:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
1495:    fro(1): flg=0 objn=76168 hint_alias="T2"@"SEL$2"

Because it’s a very simple query you can almost see the “thinking” in the clumping of the line numbers.

  • The first two registrations are the original query blocks.
  • After a break the next pair is the t2 subquery being unested and the query which is the join between t1 and the unnested t2.
  • After another break we see, in rapid succession, the view using the merged join view, the projection view using that merged join view, then the query block selecting from that projection.

The thing I find difficult to keep clear in my mind (when trying to describe what the trace/registry is saying, that is, not when just reading the plan) is the “doubling” effect where transformation steps often seem to produce two query blocks, one for the inline view containing the transformed object and one for the query that is now using the transformed object; and a further source of confusion appears when a query block seems to be able to peek into an inner query block to reference the objects in it. I just keep losing track of the layers!

It’s probablyh as safe as it’s going to be to read this note (unless someone points out an error). I don’t think there’s any more that I could find to say about the example.

Next Page »

Website Powered by WordPress.com.