Oracle Scratchpad

May 31, 2022

Join View delete

Filed under: ANSI Standard,Bugs,CBO,Join Elimination,Oracle,Transformations — Jonathan Lewis @ 5:39 pm BST May 31,2022

I’ve written a couple of notes about deleting from join views and (ultimately) the fact that if you have muliple key-preserved tables in join view then the first key preserved table in the from clause is the one where the delete is applied. The topic came up on the Oracle developer forum again today, and closed with a funny little quirk.

Here’s a little data model followed by one query and two deletes that I’ve just run on Oracle 19.11.0.0 and 21.3.0.0. The query and first delete feature a join using “ANSI” syntax, the second delete translates the join into “traditional” Oracle syntax.

rem
rem     Script:         delete_join_2.sql       
rem     Dated:          May 2022
rem     Author:         J P Lewis
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem

create table t_parent (
        id      number 
                        constraint t_parent_pk primary key,
        val     varchar2(10)
);

create table t_child (
        id      number 
                        constraint t_child_fk references t_parent(id),
        val     varchar2(10)
);

alter table t_child add constraint t_child_pk primary key(id);

insert into t_parent values(1,rpad('parent',10));
insert into t_child  values(1,rpad('child', 10));

commit;

execute dbms_stats.gather_table_stats(user,'t_parent',cascade=>true)
execute dbms_stats.gather_table_stats(user,'t_child', cascade=>true)

set autotrace on explain

select t_child.* from t_parent join t_child on t_child.id = t_parent.id;

delete
        (select t_child.* from t_parent join t_child on t_child.id = t_parent.id);

delete (select t_child.* from t_parent, t_child where t_child.id = t_parent.id);

set autotrace off

The table definitions and content (with a little modification) come from the forum posting, and I have to admit that the “one-to-one” referential integrity from parent to child is a little unusual (though occasionally a very effective strategy). I’ve also enabled autotrace to get the (probable) execution plans for the three statements, but in the script on my test machine I’ve also enabled the CBO (10053) trace to pick up a few extra details.

The query is there to show us that the join will return a single row, and the row content will be from the t_child table. What happens, though, when we try to use that query as an inline view to the delete.

It is an updateable join view – and both tables are, in fact, key-preserved thanks to the primary key constraints at both ends of the join. Here’s what we get (with the execution plans removed from the output):

        ID VAL
---------- ----------
         1 child

1 row selected.



delete
*
ERROR at line 1:
ORA-02292: integrity constraint (TEST_USER.T_CHILD_FK) violated - child record found


1 row deleted.

Line 5 tells us that we selected one row – as expected.

Line 11 tells us that we’ve raised an error because we’re trying to break the foreign key constraint on the t_child table – and that error shouldn’t be a surprise because the t_parent table appears first in the from clause, so that’s the table the optimizer wants to delete rows from.

However, line 15 tells us that if we change to “traditional ” Oracle syntax the delete works – so which of the two delete statements is displaying buggy behaviour?

Execution plans

There’s a clue in the execition plans (when we get them – autotrace doesn’t give a plan for the delete that raised the error). So here are the plans for the select and the “traditional” delete in that order:

----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    14 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T_CHILD |     1 |    14 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------



-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT |            |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  DELETE          | T_CHILD    |       |       |            |          |
|   2 |   INDEX FULL SCAN| T_CHILD_PK |     1 |     3 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With assistance from the foreign key constraint, Join Elimination has occurred in both cases leaving t_child as the candidate for deletion. You’ll also note that the optimizer has recognised that while the view’s select list is “all columns” it only needs to optimize for “select rowid” to drive the delete, so produces an index-only access path to get the data it needs.

If we use explain plan, or check the 10053 trace file, for the ANSI delete that failed we’ll find the following:

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |             |     1 |     6 |     1   (0)| 00:00:01 |
|   1 |  DELETE             | T_PARENT    |       |       |            |          |
|   2 |   NESTED LOOPS      |             |     1 |     6 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN  | T_CHILD_PK  |     1 |     3 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| T_PARENT_PK |     1 |     3 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$4396EC5C
   3 - SEL$4396EC5C / T_CHILD@SEL$1
   4 - SEL$4396EC5C / T_PARENT@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T_CHILD"."ID"="T_PARENT"."ID")

There’s no Join Elimination, and Oracle is following the rule that “the first key preserved table in the from clause should be the target of the deletes.

Where’s the bug?

This is a question that doesn’t have an obvious answer.

The traditional syntax gives you the result you probably wanted but it’s deleting from the second table in the from clause of the join view – so is that a documentation bug or a coding bug because if the documentation is correct the statement should have raised the error that the ANSI syntax gave.

Conversely, is there an optimizer bug (or limitation) because table elimination could have been used in the ANSI case but wasn’t, although you could say that the ANSI code is doing the right thing (following the documentation) by raising an error and that the traditional syntax is doing the wrong thing.

If we search the CBO (10053) traces of the select and the ANSI join, looking for sections where the optimizer considers Join Elimination, the results are helpful:

grep -nT "^JE" or19_ora_30861_*.trc >temp.txt

With a little cosmetic editing here’s the output from the trace of the select statement:

or19_ora_30861_select.trc:  78  JE - Join Elimination
or19_ora_30861_select.trc: 807  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_select.trc: 813  JE:[V2] Query block (0x7fd3a152f5c0) before join elimination:
or19_ora_30861_select.trc: 816  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 817  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 818  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 819  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_select.trc: 820  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_select.trc: 821  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 822  JE:[V2] Query block (0x7fd3a152f5c0) after join elimination:

or19_ora_30861_select.trc: 844  JE:[V2]   Considering Join Elimination on query block SEL$58A6D7F6 (#0)
or19_ora_30861_select.trc: 850  JE:[V2] Query block (0x7fd39c09ee60) before join elimination:
or19_ora_30861_select.trc: 853  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 854  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 855  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 856  JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 857  JE:[V2] Eliminate table: T_PARENT (T_PARENT)
or19_ora_30861_select.trc: 858  JE:[V2] Query block (0x7fd39c09ee60) after join elimination:

We can see that the optimizer has considered join elimination twice (lines 2 and 12), once for each of two query blocks (sel$1 – a query block embedded in the text – and sel$58a6d7f6, a query block name generated during some query transformation).

At lines 5 and 6 we can see that Oracle failed to take advantage of the referential integrity constraint to eliminate t_parent with the explanation “predicate column refs”; but at lines 16 and 17 we see that there’s a change of heart and the optimizer does take advantage of the referential integrity to eliminate t_parent. So let’s look at the text of the “query block before join elimination” in both cases and see what’s changed:

First pass
----------
SELECT 
        "T_PARENT"."ID"  "QCSJ_C000000000300000",
        "T_PARENT"."VAL" "QCSJ_C000000000300002",
        "T_CHILD"."ID"   "QCSJ_C000000000300001",
        "T_CHILD"."VAL"  "QCSJ_C000000000300003" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"


Second Pass
-----------
SELECT 
        "T_CHILD"."ID" "ID",
        "T_CHILD"."VAL" "VAL" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"

The first pass seems to have expanded the two tables in the intial JOIN, ignoring the limited select list, so join elimination is blocked by the presence of the (non-PK) column t_parent.val (aliased as qcsj_c000000000300002, but when we get to the 2nd pass the select list has been limited to the minimum necessary set of columns that would be needed by a pure select statement and join elimination is possible.

The clue to how/why this has happened is the in the few lines between the two passes; the trace file also reports: “CVM: Merging SPJ view SEL$1 (#0) into SEL$2 (#0)” and when we look for the point where the optimizer registered query block sel$2 it turns out to have one item in its from clause, an object called: from$_subquery$_003 (which tells us where the 3 came from in all those column aliases QCSJ_C0000000003nnnnn. In effect it looks like the optimizer started by rewriting the ANSI select in Oracle syntax as (with a little poetic licence for clarity):

select  child_id, child_val, parent_id
from    (
         select  t_child.*, t_parent.*
         from    t_child, t_parent
         where   t_child.id = t_parent.id
        ) from$_subquery$_003
/ 

So the first pass was the optimizer trying to do join elimination on the inner query block and the second pass happened after the inner block was merged with the outer block and the resulting query block (called sel$58a6d7f6) allowed the join elimination to take place

So what happened with the delete statement. Here’s the result of the call to grep:

or19_ora_30861_delete.trc:  90  JE - Join Elimination
or19_ora_30861_delete.trc: 837  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_delete.trc: 843  JE:[V2] Query block (0x7fd3a152f758) before join elimination:
or19_ora_30861_delete.trc: 846  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_delete.trc: 847  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 848  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_delete.trc: 849  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_delete.trc: 850  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_delete.trc: 851  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 852  JE:[V2] Query block (0x7fd3a152f758) after join elimination:

That’s the lot – there is no attempt to do join elimination on a querye block called sel$58a6d7f6, though a few lines after this extract the trace file reports:

CVM:   Merging SPJ view SEL$1 (#0) into SEL$2 (#0)
Registered qb: SEL$58A6D7F6 0x9c0a3038 (VIEW MERGE SEL$2; SEL$1; SEL$2)

That’s exactly the same view merging that was present in the select – but the optimizer doesn’t attempt join elimination before moving on to report, a couple of lines later:

CVM:   Merging SPJ view SEL$58A6D7F6 (#0) into DEL$1 (#0)
Registered qb: SEL$4396EC5C 0x9c09dde0 (VIEW MERGE DEL$1; SEL$58A6D7F6; DEL$1)

It looks like this is a case of the optimizer missing a cycle in its transform/optimze loopback. The delete went through exactly the same process as the select, but failed to optimize the intermediate query block (which happened to be called from$_subquery$_004 in this case, being one layer deeper thanks to the extra level introduced by the delete).

Workaround

I tried two or three things to get the ANSI-style code to work without changing the table order in the from clause – extra layers of subqueries, explicit aliases, but Oracle kept coming back to a point where it had the full projection of all the columns in the two tables. (This reminded me of an old bug in ANSI select expansion that was fixed in 12cR2. It made me wonder if this was a code path where the same bug had been overlooked – it also made me wonder if there would be any problems if the two tables in the join totalled more than 1,000 columns – but that’s a test for another day.)

I did eventually find a fix (that didn’t involved switching the table order in the inline view):

delete
        (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);

Note that I now have only the primary key of t_child in the select

The section in the CBO trace file for Join Elimination looked like this:

JE:[V2] Query block (0x7f93996231f8) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_PARENT"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_PARENT" "T_PARENT","TEST_USER"."T_CHILD" "T_CHILD" WHERE "T_CHILD"."ID"="T_PARENT"."ID"
JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2] Eliminate table: T_PARENT (T_PARENT)
JE:[V2] Replaced column: T_PARENT.ID with column: T_CHILD.ID
JE:[V2] Query block (0x7f93996231f8) after join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_CHILD"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_CHILD" "T_CHILD"
Registered qb: SEL$E703A888 0x996231f8 (JOIN REMOVED FROM QUERY BLOCK SEL$1; SEL$1; "T_PARENT"@"SEL$1")

You may decide that this is an obvious workaround, but it’s not particularly helpful. In the general case I would expect to see a stored view that joined the two tables, with the expectation that this type of delete was just one of many operations the view was used for. Rewriting a view to use the increasingly common ANSI syntax shouldn’t result in some statements crashing “for no apparent reason”.

The last laugh

Given that the select and delete both included a merge of sel$1 into sel$2 to produce a query block called sel$58a6d7f6 I wondered what would happen if I blocked the merge:

SQL>  delete  /*+  no_merge (@sel$1) */
  2          (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);
	(select t_child.id from t_parent join t_child on t_child.id = t_parent.id)
        *
ERROR at line 2:
ORA-01752: cannot delete from view without exactly one key-preserved table

Isn’t that the error message that should have been coming out the entire time – according to the orginal documentation for updateable join views?

March 22, 2022

Upgrade Surprise

Filed under: 19c,Bugs,Oracle,Transformations,Upgrades — Jonathan Lewis @ 10:04 am GMT Mar 22,2022

Here’s a little surprise that showed up in the most recent (March 2022) article that I sent to Simpletalk for the series on transformations. I had been using 19c (19.11.0.0) to create and run my little demos but the editor had used 12.2.0.1 to check the examples and questioned a comment I had made about a “default plan”.

Here’s the query in question. I was using the emp and dept tables from the Scott schema to demonstrate a point about subquery execution:

rem
rem     Script:         unnest_demo_simpletalk_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem 

break on deptno skip 1

select
        /*+ 
                qb_name(main)
                gather_plan_statistics 
        */
        *
from    emp e1
where   e1.sal + nvl(e1.comm,0) > (
                select  /*+ qb_name(subq) */
                        avg(e2.sal + nvl(e2.comm,0))
                from    emp e2
                where   e2.deptno = e1.deptno
        )
order by
        e1.deptno, e1.empno
/

As you can see, I’ve used a correlated aggregate subquery to report all employees who earned more than the average for their department, where “earnings” is calculated as the sum of salary and commission.

Here’s the plan I got when I ran this query under 19c:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |      1 |      6 |00:00:00.01 |      24 |  2048 |  2048 | 2048  (0)|
|*  2 |   FILTER             |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   3 |    TABLE ACCESS FULL | EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   4 |    SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:00.01 |      18 |       |       |          |
|*  5 |     TABLE ACCESS FULL| EMP  |      3 |      5 |     14 |00:00:00.01 |      18 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("E1"."SAL"+NVL("E1"."COMM",0)>)
   5 - filter("E2"."DEPTNO"=:B1)

When my editor ran the query on 12.2.0.1, and when I started up an instance of 12.2.0.1 and ran the query, the plan looked like this:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|*  1 |  FILTER              |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|   2 |   SORT GROUP BY      |      |      1 |      4 |     14 |00:00:00.02 |      29 |      6 |  2048 |  2048 | 2048  (0)|
|*  3 |    HASH JOIN         |      |      1 |     70 |     70 |00:00:00.02 |      29 |      6 |  1922K|  1922K| 1053K (0)|
|   4 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      6 |       |       |          |
|   5 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9E18A930")
      MERGE(@"SEL$AA0D0E02" >"SEL$B4BE209F")
      OUTLINE(@"SEL$B4BE209F")
      UNNEST(@"SUBQ")
      OUTLINE(@"SEL$AA0D0E02")
      OUTLINE(@"SEL$D6166863")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      FULL(@"SEL$9E18A930" "E2"@"SUBQ")
      FULL(@"SEL$9E18A930" "E1"@"MAIN")
      LEADING(@"SEL$9E18A930" "E2"@"SUBQ" "E1"@"MAIN")
      USE_HASH(@"SEL$9E18A930" "E1"@"MAIN")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("E1"."SAL"+NVL("E1"."COMM",0)>SUM("E2"."SAL"+NVL("E2"."COMM",0))/COUNT("E2"."SAL"+NVL("E2"."COMM",0))
              )
   3 - access("E2"."DEPTNO"="E1"."DEPTNO")

(I’ve added in a request for the ‘outline’ in the display_cursor() format.) The correlated subquery has been unnested and the resulting inline view has been subject to complex view merging. It was only at this point that I realised that the 19c plan was a little surprising and not what I should have expected.

After checking that the configuration and statistics (including the system stats) were the same on the two instances I re-ran the query on 12c with the /*+ no_unnest */ hint to make it use the plan that 19c had produced and I found (as expected) that the plan with filter subquery produced by 19c had a higher cost than the unnesting plan produced by 12c.

Next I re-ran the query on 19c with the /*+ unnest */ hint to make it use the plan that 12c had produced – but it didn’t! 19c “ignored” the hint and carried on using the filter subquery plan. It did, however, supply the following extra information when I added the ‘hint_report’ option to the to the display_cursor() format:

Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------

   1 -  MAIN
           -  qb_name(main)

   4 -  SUBQ
         U -  unnest / Failed basic validity checks
           -  qb_name(subq)

The code in 19c thinks that it would be illegal to unnest the subquery that 12c was unnesting (does this mean that some people may be getting wrong results in 12c). So I checked the 10053 (CBO) trace file to see if there was any further information there that would “explain” the problem. This is what I found:

SU: Checking validity for Null Accepting Semi JoinUnnesting for query block MAIN(#1)
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: More than 1 column in connect condition.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

The reference to “Null accepting” looks a little suspect but prompted a few experiments (modifying the data to eliminate nulls, adding not null declarations to columns, simplifying the query etc.) that suggested that the problem was essentially that the optimizer did not want to unnest when the comparison was with the expression (sal + comm) regardless of the operator, and even when all the relevant columns had been populated, declared not null, and the nvl() function had been removed.

It doesn’t seem reasonable in this case, but possibly the block is a generic strategy to avoid errors in some more subtle cases, and perhaps the block will be refined and relaxed in future versions. (Or maybe it’s a bug that the wrong test is being called at this point – there was nothing in the query requiring “null acceptance” by the time I got through the last test.)

I did find a workaround that you could use to avoid any code rewrite:

alter table emp add nvl_earnings 
        invisible 
        generated always as (sal + nvl(comm,0)) 
        virtual
;

There’s seems to be no good reason why this should work – but it does. The subquery unnests and the Predicate Information in the plan doesn’t give any clue that it’s using a virtual column.

Summary:

When you upgrade from 12c there are some queries involving correlated subqueries that no longer unnest the subquery. This may have a significant impact on performance and it may not be possible to bypass the problem unless you rewrite the query to do a manual unnest although I did find a virtual column workaround for my specific example. So far I’ve tested the query on 19.11.0.0 and 21.3.0.0, the behaviour is the same in both versions.

Footnote:

After failing to find anything on MOS about the problem I emailed a draft of this note to Nigel Bayliss at Oracle – who did find a promising match on MOS.

The failure to unnest may be the consequence of the fix for bug 30593046: “A query having a scalar subquery returned a wrong result”. The fix was introduced in 19.9.0.0 but was too restrictive, leading to the creation of bug 33325981: “Query Loses Subquery Unnesting From Plan in 19.9 and Above”.

The fix for 33325981 was distributed in 19.13.0.0 and 21.4.0.0 (plus a couple of earlier RURs, with patches available for various versions back to 12.2.0.1). Unfortunately the newer fix still doesn’t go far enough in reducing the restrictions and my example still doesn’t unnest.

Make sure you check any code that depends on “expression-based” subquery unnesting before you upgrade to 19.9, as it may change plan, which may make a difference to performance and a requirement for a workaround.

February 21, 2022

CTE Enhancement

Filed under: 12c,Oracle,Performance,Transformations — Jonathan Lewis @ 9:24 am GMT Feb 21,2022

For many years I’ve told people that when you materialize a CTE (common table expression / “with” subquery) the result set will be written to the temporary tablespace using direct path writes and will be read back using cached reads. This stopped being an accurate description in 12c.

There is a clue about this in the way that the corresponding execution plans and I’ll be pointing that out later. The key difference between earlier versions of Oracle and newer versions is that the GTT (global temporary table) that holds the materialized result set is not necessarily written to disc, and may even avoid allocating a temporary segment.

I started looking at this feature a couple of days ago after remembering that someone, somewhere, had mentioned some details about a temporary object being kept in the PGA rather than being written to disc if the size wasn’t too big. I couldn’t remember if this was GTTs or temporary LOBs (or something completely different) and I only had a vague memory that there was a critical size that might have been 256KB; so I started experimenting with materializing CTEs.

Here’s the script I used to establish a baseline in 11g. I took a fairly arbitrary starting guess that if there was a PGA/Temp split is might be around 64KB.

rem
rem     Script:         cte_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

alter session set events '10046 trace name context forever, level 8';

prompt  ==============================
prompt  First sample - just under 64KB
prompt  ==============================

set arraysize  35
set pagesize  35

set pause Waiting...
set pause on

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select
        substr(v1,-10)
from
        g1
;

alter session set events '10046 trace name context off';

execute snap_my_stats.end_snap
execute snap_ts.end_snap

prompt  ==============================
prompt  Second sample - just over 64KB
prompt  ==============================

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 64
)
select
        substr(v1,-10)
from
        g1
;

execute snap_my_stats.end_snap
execute snap_ts.end_snap

set pause off

The calls to the package snap_my_stats are the usual ones I use (very old source at this Wayback URL) to report the session’s activity (v$mystat) between start and end snapshot and the calls in the snap_ts package do the same for the I/O per tablespace, summing bu tablespace across v$filestat and v$tempstat.

This script basically materializes and reports a GTT with a single column of 1,024 bytes, and in this case either 63 or 64 rows. You’ll note that I’ve built another assumption into the code that the CTE (if kept in memory) won’t follow all the details of a “real” table block, but will be a simple array with a tiny overhead per row – I’ve also assumed that the optimizer isn’t smart enough (or foolhardy enough) to push the substr() call inside the CTE.

I’ve set pause on and set both the arraysize and pagesize to a value less than the total number of rows I’ll be fetching so that I can check a couple of dynamic performance views (in particular v$sort_usage) from another session while the queries are running.

As a starting point, here are some critical values I’ve selected from the various outputs for just the 63 row case when running 11.2.04:

-------------
Session stats
-------------
Name                                                 Value
----                                                 -----
physical reads                                           9
physical reads cache                                     9
physical writes                                          9
physical writes direct                                   9
physical writes direct temporary tablespace              9

---------
T/S Stats 
---------
TS#        Reads      Blocks   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max Tablespace
----       -----      ------   ---------    ---      ------      ------   ---------    --- -------------------
   3           1           9        .000      1           1           9        .000      0 TEMP


Since I've enabled extended tracing at level 8 (waits) I can also show you the I/O waits reported for the reads and writes:
WAIT #140185232180256: nam='direct path write temp' ela= 180 file number=201 first dba=35073 block cnt=9 obj#=-1 tim=1645178184267768
...
WAIT #140185232180256: nam='db file scattered read' ela= 183 file#=201 block#=35073 blocks=9 obj#=-40016369 tim=1645178184268342

A dump of the 9 blocks of the temporary file (the “file number-201” translates to tempfile 1 since I have db_files = 200) starting at block 35073 looks like an ordinary table with pctfree 10 (which is one of the annoying details of GTTs – you can’t adjust it), 3 ITL slots (which is normal for CTAS) and 7 rows per block.

So, for 11g, we conclude that the effect of materializing a CTE is simply to create a GTT in the temporary tablespace, write it out using direct path writes, then read it back into the buffer cache using db file scattered reads. (You might want to confirm that this always happens, even if the CTE holds only one row.)

If you take advantage of the pause to issue “alter system flush buffer_cache” from another session you can also dump the segment header block (35072 in my case) to see that it’s a normal table segment header block – using freelist management, not ASSM because that’s the way temporary tablespaces have to be declared. The segment header block didn’t get written to disc in the normal course of the test.

12c Enhancement

This is the moment where the second query, and the pause that allows me to query v$sort_usage, becomes significant. When I started 12.2.0.1 with the 63 row query I saw:

  • No I/O on the temporary tablespace
  • No entry in v$sort_usage

To my great satisfaction the 64 row query did report I/O to the temporary tablespace (10 blocks this time – needing one extra block to cater for the 64th row) with v$sort_usage reporting a segment being created on my behalf. Obviously I re-ran the test a couple of times, flushing the buffer cache and shared pool, and connecting to a new session each time. The results were totally consistent: 63 rows => no GTT, 64 rows => GTT.

If you’re feeling a little suspicious at this point, bear with me.

This is the point where I switched to 19.11.0.0 – and both queries ran in memory with no sign of a GTT being created. Luckily I had cloned the query several times in the script generating different pairs of numbers of rows: 127/128, 255/256, 511/512, 1023/1024, and when I hit 1024 (and 1023) my session produced a GTT.

Somewhere between 512 and 1023 rows I was hitting a critical breakpoint – so I nearly started working through a binary chop to find the actually breakpoint; luckily, though, I had a little inspiration: if the overhead per row was 3 bytes (as it would be for a normal table column of more than 254 bytes) then 1023 rows would have an overhead of about 3KB – so I should test 1021 rows if I wanted to test a memory of just under 1MB.

Sure enough, at 1021 rows the GTT didn’t appear, at 1022 rows it did – time after time after time.

But …

My tests seemed to be totally repeatable. Again, I connected to a new session a few times, I flushed the buffer cache, I flushed the shared pool, I checked v$sort_usage. Consistently the results seemed to say:

  • 12.2 uses the PGA up to 64KB then dumps to a GTT
  • 19.11.0.0 uses the PGA up to 1MB then dumps to a GTT

Except – that night I had to shut down the two virtual machines because sometimes, for no obvious reason, I can’t hibernate my laptop while the VMs are running; and when I started everything up again the following morning and started re-running the tests as I wrote up the results something had changed. In fact my 19.11 instance didn’t dump to a GTT until I had reached nearly 10MB of data and my 12.2 wasn’t even dumping at 1MB; and I have no idea why a complete restart made such a difference.

After spending a little time trying to figure out what had changed – and I think it may have been that I’d been running the previous day’s tests after doing a lot of heavy work with temporary LOBs trying to pin down an anomaly with the handling of the temporary tablespace – I finally tried a google search using keywords that might be relevant and found this article that Keith Laker wrote about 5 years ago.

The feature is known as In-memory “cursor-duration” temporary table. I mentioned a clue in the execution plans at the start of this note: materialization shows up with a “temp table transformation” operation followed, in 11g, by with a child operation of “load as select”; but in 12.2 the child operation is “load as select (cursor duration memory)”. I really should have started my invesigation by putting the entire text of that operation into a search engine.

Summary

(Basically the highlights from Keith’s article):

  • The “in-memory cursor-duration”temporary table” change appeared in 12.2
  • It can be used in a number of transformations that the optimizer does
  • It’s not possible to force the use of the feature for a given query, it’s down to an internal algorithm
  • The mechanism uses memory that is “essentially” PGA
  • Despite the name this feature does not require you to licence the In-Memory option
  • If you’re still using an older version of Oracle this could be a good reason for upgrading as it can reduce the I/O load particularly for “analytic” types of query at a cost of using extra memory.

All the work I had done trying to find a break-point where Oracle switched from using PGA to using a GTT had been a waste of time – and the apparently consistent results on the first day had been an “accident” dictated (possibly) by some PGA-related activity that had taken place before I started running my tests .

Footnotes and geeky things

Five years on from the publication date of Keith’s article we shouldn’t be surprised to see some changes. Keith notes that the mechanism will apply only to serial queries that do more than one pass over the table – but there are two points to raise there:

  • possibly the two-pass thing is because it usually takes two passes over a CTE before Oracle will materialize a CTE automatically; my example shows the in-memory effect on a single pass – but that was a forced materialization.
  • the restrictions on parallelism may have been relaxed by now – if you check for hidden parameters you will find: _in_memory_cdt_maxpx, default value 4, described as “Max Parallelizers allowed for IMCDT”.

Interestingly there are more “cdt” parameters in 12.2 than there are in 19.11, and there are clear indications of some changes in algorithm and mechanism:

12c parameters like '%cdt%
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_cdt_shared_memory_limit                   0                    Shared Memory limit for CDT
_cdt_shared_memory_query_percent           20                   Per Query Percentage from CDT pool
_imcdt_use_mga                             ON                   MGA for In Memory CDT
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

19g parameters like '%cdt%'
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_hcs_enable_in_mem_cdt_hint                FALSE                add hint opt_param('_in_memory_cdt', 'off')
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

The parameter “_in_memory_cdt” can take the values ON, LIMITED, or OFF – which tells you that even if you can’t force a query to use in-memory CDTs you can (if you really want to) stop a query from using the feature. There are a few notes about this parameter and its significance to RAC and parallel execution (for 12.2) on MOS – if you have an account – Doc ID 2388236.1 What is _in_memory_cdt Parameter?

The reference to MGA (the “managed global area”) in 12.2 is also quite interesting. This is documented as a Solaris feature using OSM to share memory between processes. For more general details you can review MOS Doc ID 2638904.1 MGA (Managed Global Area) Reference Note (again, only if you have an account).

The “new” oradebug mechanism shows (from 18c) a couple of relevant components under SQL compilation and execution that you could trace if you want to investigate further.

Components in library RDBMS:
--------------------------
  SQL_Compiler                 SQL Compiler ((null))
    ICDT_Compile               In Memory CDT Compilation (qks3t)
  SQL_Execution                SQL Execution (qer, qes, kx, qee)
    ICDT_Exec                  In Memory CDT Execution (qes3t, kxtt)

January 1, 2022

Happy New Year

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,Transformations — Jonathan Lewis @ 12:01 am GMT Jan 1,2022

Here’s an entertaining little bug that appeared on the oracle-l list server just in time to end one year and start another in a suitable way. The thread starts with an offering from Matthias Rogel (shown below with some cosmetic changes) to be run on Oracle 12.2.0.1:

rem
rem     Script:         group_by_bug.sql
rem     Author:         Mathias Rogel  / Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             19.11.0.0       Fixed
rem             12.2.0.1        Fail
rem

create table t as (
        select date'2021-12-30' d from dual 
        union all 
        select date'2021-12-31'   from dual
);

select extract(year from d), count(*) from t group by extract(year from d);

alter table t add primary key(d);
select extract(year from d), count(*) from t group by extract(year from d);

This doesn’t look particularly exciting – I’ve created a table with two rows holding two dates in the same year, then counted the number of rows for “each” year before and after adding a primary key on the date column. Pause briefly to think about what the results might look like …

Table created.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          2

1 row selected.


Table altered.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          1
              2021          1

2 rows selected.

After adding the primary key (with its unique index) the result changes to something that is clearly (for this very simple data set) wrong.

At this point I offered a hypothetical reason why Oracle might be producing the wrong result, but Tim Gorman was one step ahead of me and supplied a bug reference from MOS: Wrong Result Using GROUP BY with EXTRACT Function Against DATE (Doc ID 2629968.1)

The MOS document describes this as a bug introduced in the upgrade from 12.1.0.2 to 12.2.0.1, demonstrates the error with the extract() function applied to a date, and supplies three possible workarounds (but not the workaround or explanation I supplied in my response on oracle-l).

The document also pointed to a further bug note that described how the problem also appeared with the combination of the to_char() function applied to a date column with a unique indexes: 12.2 Wrong Results from Query with GROUP BY Clause Based on To_char Function of Unique Index Columns (Doc ID 2294763.1) with a further suggestion for applying a patch (see MOS Doc ID: 26588069.8) or upgrading to 18.1 (where the bug has been fixed).

Matthias Rogel supplied a follow-up demonstrating the problem with to_char(), which prompted me to create an example showing that it wasn’t just about dates – which I’ve tidied up below (reminder, all results on this page are from 12.2.0.1):

create  table t1 as 
select  round(rownum/10,1) n1 
from    all_objects 
where   rownum <= 10
;


select n1 from t1 order by n1;
select n1, count(*) from t1 group by n1 order by n1;

column nch format A3

select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

alter table t1 add constraint t1_pk primary key(n1);
select to_char(n1,'99') nch , count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

As before I’ve created a simple table, and populated it with a few rows of data. THe first two queries are there to show you the data (0.1 to 1.0 by steps of 0.1), and show that aggregating the raw data produces one row per value.

I’ve then repeated the aggregation query, but converted each value to a character string that effectively rounds the value to an integer. Here are the two sets of results, before and after adding the primary key.

NCH   COUNT(*)
--- ----------
  0          4
  1          6

2 rows selected.

Table altered.

NCH   COUNT(*)
--- ----------
  0          1
  0          1
  0          1
  0          1
  1          1
  1          1
  1          1
  1          1
  1          1
  1          1

10 rows selected.

Again, the introduction of the primary key constraint on the column results in wrong results. In this example, though I’ve pulled the execution plans from memory along with their outlines, and this is what the two plans look like.

SQL_ID  gt5a14jb0g4n0, child number 0
-------------------------------------
select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99')
order by 1,2

Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    10 |    30 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    10 |    30 |     4  (50)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |    10 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


SQL_ID  4fxxtmrh8cpzp, child number 0
-------------------------------------
select to_char(n1,'99') nch , count(*) from t1 group by
to_char(n1,'99') order by 1,2

Plan hash value: 1252675504

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     2 (100)|          |
|   1 |  SORT ORDER BY   |       |    10 |    30 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9BB7A81A")
      ELIM_GROUPBY(@"SEL$47952E7A")
      OUTLINE(@"SEL$47952E7A")
      ELIM_GROUPBY(@"SEL$1")
      OUTLINE(@"SEL$1")
      INDEX(@"SEL$9BB7A81A" "T1"@"SEL$1" ("T1"."N1"))
      END_OUTLINE_DATA
  */

In the absence of the primary key (index) Oracle does a full tablescan, then hash group by, then sort order by. When the primary key is put in place Oracle does an index full scan (which is legal because the index must contain all the data thanks to the not null declaration inherent in a primary key) and a sort order by without any group by.

You might wonder if the problem arises because Oracle assumes the indexed path somehow means the aggregation doesn’t apply – but with a /*+ full(t1) */ hint in place and a full tablescan in the plan the aggregation step is still missing — and if you look at the Outline Data section of the plan you can see that this is explicitly demanded by the hint(s): /*+ elim_groupby() */

My hypothesis (before I read the bug note) was that the optimizer had picked up the primary key declaration and seen that n1 was unique and therefore allowed the aggregating group by to be eliminated, but failed to “notice” that the to_char() – or extract() in the date example – meant that the assumption of uniqueness was no longer valid. To work around this problem very locally I simply added the hint /*+ no_elim_groupby */ (with no query block specified) to the query – and got the correct results.

Footnote

There is an interesting side note to this example (though not one that I would want to see used in a production system – this comment is for interest only). If you look at the Outline Data for the plan when there was no primary key you’ll notice that the only outline_leaf() is named sel$1 whereas in the plan with the primary key sel$1 appears as an outline() and the only outline_leaf() is named sel$9bb7a81a. As “outline leaf” is a query block that was used by the optimizer in constructing the final plan, while an “outline” is an intermediate query block that was examined before being transformed into another query block. So this difference in the Outline Data tells us that the problem appears thanks to a transformation that did not happen when there was no index – so what would our execution plan look like if the only hint we used in the query was /*+ outline_leaf(@sel$1) */ ?

SQL_ID  apgu34hc3ap7f, child number 0
-------------------------------------
select /*+ outline_leaf(@sel$1) */ to_char(n1,'99') nch , count(*) from
t1 group by to_char(n1,'99') order by 1,2

Plan hash value: 3280011052

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY    |       |    10 |    30 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY   |       |    10 |    30 |     3  (67)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


This posting was scheduled to launch at 00:01 GMT on 1st January 2022. Happy new year – just be careful that you don’t try to extract() or to_char() it if you’re running 12.2.0.1 unless you’ve applied patch 26588069.

August 25, 2021

qbregistry 2

Filed under: CBO,dbms_xplan,Oracle,Transformations — Jonathan Lewis @ 1:45 pm BST Aug 25,2021

Following a question (very similar to one I had been asking myself) that appeared on twitter in response to my original posting on the new qbregistry format option in the dbms_xplan package, I’ve drafted a note of how I interpreted the execution plan so that I could more clearly see how my visualisation of the transformation maps (or fails to map) to the Query Block Registry.

I can’t guarantee the correctness of the description I’ve given here, but it’s probably fairly accurate.

Original Query (hiding the unnest and no_semijoin hints)

select  
        /* sel$1 */
        * 
from    t1 
where   t1.owner = 'OUTLN' 
and     object_name in (
                select  /* sel$2 */
                        distinct t2.object_name 
                from   t2 
                where  t2.object_type = 'TABLE'
        )
;

Transformation 1: unnest the subquery

This is visible in the Outline Information of the execution plan as the hint UNNEST(@”SEL$2″ UNNEST_INNERJ_DISTINCT_VIEW) and produces two new query blocks, the query block that “is” the unnested subquery and the query block that joins t1 to the unnested subquery vw_nso_1.

select  
        /* SEL$5DA710D3 */
        t1.* 
from
        t1,
        (
        select  /* SEL$683B0107 */
                distinct
                t2.object_name 
        from    t2 
        where   t2.object_type = 'TABLE'
        )       vw_nso_1
where
        t1.owner = 'OUTLN' 
and     vw_nso_1.object_name = t1.object_name



Transformation 2: view merge (join then aggregate)

This is visible in the Outline Information of the execution plan in the hint MERGE(@”SEL$683B0107″ >”SEL$5DA710D3″). I think this produces three new query blocks; the block that “is” the merged view, a block that selects (projects) from the merged view, and the query block that the main query now becomes.

We will pretend that t1 has only 4 columns, owner, object_name, object_type, object_id.

select
        /* SEL$B186933D */
        vm_nwvw_2.owner,
        vm_nwvw_2.object_name,
        vm_nwvw_2.object_type
        vm_nwvw_2.object_id
from    (
        select  /* SEL$2F1334C4 */
                -- no distinct, and t2.object_name and t1.rowid eliminated
                t1.owner,
                t1.object_name,
                t1.object_type
                t1.object_id
        from    (
                select  /* SEL$88A77D12 */
                        distinct
                        t1.rowid,        -- ensures we don't duplicate t1 rows
                        t1.owner,
                        t1.object_name,
                        t2.object_name,  -- seems redundant, but is in the trace file
                        t1.object_type
                        t1.object_id
                from
                        t1,
                        t2
                where
                        t2.object_type = 'TABLE'
                and     t1.owner = 'OUTLN'
                and     t1.object_name = t2.object_name
                ) 
        ) vm_nwvw_2
;

Transformation 3: aggregate into partial join

I realised only as I was writing this note that I had completely forgotten that the plan reported a semi join even though the subquery had been hinted with a no_semijoin hint, and that the reported semi join was actually a partial join.

However, the query block registry is identical with or without a partial join (controlled by the hint [no]partial_join) so there doesn’t seem to be a transformation stage corresponding to the choice of strategy. Maybe the apparently redundant query block layer allows the variation to appear if required.

It’s Difficult

A problem I have with the query block registry is deciding what it’s telling us – and maybe the trace file and the execution plan are not trying to tell us exactly the same thing. I think it’s quite difficult, anyway, to find a good way of presenting the information that is completely informative but clear and uncluttered.

Something that may help, when you can check the trace file and the final execution plan, is the order in which query blocks are registered. Some of them may be discarded, of course, as the optimizer works through options, some of them may be marked as COPY, but if you ignore those you may be able to see from what’s left the evolution of the plan. Here, for example, is the extract of the lines where each query block is registered, taken from the CBO trace for this query, with numbering:

32:Registered qb: SEL$1 0xc26e28e8 (PARSER)
36:  signature (): qb_name=SEL$1 nbfros=1 flg=0
37:    fro(0): flg=4 objn=76167 hint_alias="T1"@"SEL$1"

39:Registered qb: SEL$2 0xc26e0d58 (PARSER)
43:  signature (): qb_name=SEL$2 nbfros=1 flg=0
44:    fro(0): flg=4 objn=76168 hint_alias="T2"@"SEL$2"

966:Registered qb: SEL$683B0107 0xbcc187c8 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
970:  signature (): qb_name=SEL$683B0107 nbfros=1 flg=0
971:    fro(0): flg=0 objn=76168 hint_alias="T2"@"SEL$2"

973:Registered qb: SEL$5DA710D3 0xbb1fcb10 (SUBQUERY UNNEST SEL$1; SEL$2;)
977:  signature (): qb_name=SEL$5DA710D3 nbfros=2 flg=0
978:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
979:    fro(1): flg=5 objn=0 hint_alias="VW_NSO_1"@"SEL$5DA710D3"

1471:Registered qb: SEL$2F1334C4 0xbcf4b210 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3)
1475:  signature (): qb_name=SEL$2F1334C4 nbfros=1 flg=0
1476:    fro(0): flg=5 objn=0 hint_alias="VM_NWVW_2"@"SEL$2F1334C4"

1478:Registered qb: SEL$88A77D12 0xbcda5540 (PROJECTION VIEW FOR CVM SEL$683B0107)
1482:  signature (): qb_name=SEL$88A77D12 nbfros=2 flg=0
1483:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
1484:    fro(1): flg=1 objn=0 hint_alias="VW_NSO_1"@"SEL$5DA710D3"

1489:Registered qb: SEL$B186933D 0xbcda5540 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3)
1493:  signature (): qb_name=SEL$B186933D nbfros=2 flg=0
1494:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
1495:    fro(1): flg=0 objn=76168 hint_alias="T2"@"SEL$2"

Because it’s a very simple query you can almost see the “thinking” in the clumping of the line numbers.

  • The first two registrations are the original query blocks.
  • After a break the next pair is the t2 subquery being unested and the query which is the join between t1 and the unnested t2.
  • After another break we see, in rapid succession, the view using the merged join view, the projection view using that merged join view, then the query block selecting from that projection.

The thing I find difficult to keep clear in my mind (when trying to describe what the trace/registry is saying, that is, not when just reading the plan) is the “doubling” effect where transformation steps often seem to produce two query blocks, one for the inline view containing the transformed object and one for the query that is now using the transformed object; and a further source of confusion appears when a query block seems to be able to peek into an inner query block to reference the objects in it. I just keep losing track of the layers!

It’s probablyh as safe as it’s going to be to read this note (unless someone points out an error). I don’t think there’s any more that I could find to say about the example.

August 24, 2021

qbregistry

Filed under: CBO,dbms_xplan,Oracle,Transformations — Jonathan Lewis @ 11:54 am BST Aug 24,2021

If you look at the “Outline Information” from an execution plan it shows you a list of hints that will (in theory, at least) recreate the execution plan and it’s this information that gets stored as the “injection” part of an SQL Plan Baseline. Unfortunately the hints won’t necessarily allow you to infer what transformations the optimizer has used to get to the final execution plan.

If you’re prepared to generate the CBO trace file you could examine the Query Block Registry that appears near the end of the trace file to get some clues – here’s an example from 19.11.0.0 for a simple query involving a single table plus an IN subquery:

Query Block Registry:
SEL$2 0x6d47cde8 (PARSER)
  SEL$5DA710D3 0x6d480e60 (SUBQUERY UNNEST SEL$1; SEL$2;)
    SEL$2F1334C4 0x6d480e60 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$683B0107 0x6d47cde8 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
    SEL$B186933D 0x6d48e3a8 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3) [FINAL]
    SEL$88A77D12 0x6d48e3a8 (PROJECTION VIEW FOR CVM SEL$683B0107)
      SEL$B186933D 0x6d48e3a8 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3) [FINAL]
SEL$1 0x6d480e60 (PARSER)
  SEL$5DA710D3 0x6d480e60 (SUBQUERY UNNEST SEL$1; SEL$2;)

I’m not going to say anything about interpreting this extract because I want to highlight a recent feature of the dbms_xplan package (brought to my attention by Franck Pachot some time ago). One of the format options for displaying execution plans will report the query block registry. Here’s the output from display_cursor(format=>’qbregistry’)) in 21.3.0.0 for the query that produced the CBO trace extract above:

Query Block Registry:
---------------------
  SEL$1 (PARSER)
    SEL$5DA710D3 (SUBQUERY UNNEST SEL$1 ; SEL$2)
      SEL$2F1334C4 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$2 (PARSER)
    SEL$683B0107 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
      SEL$88A77D12 (PROJECTION VIEW FOR CVM SEL$683B0107)
        SEL$B186933D (VIEW MERGE SEL$88A77D12 ; SEL$683B0107) [FINAL]

Two things to notice here – first that the output has reduced the 9 lines to 7 lines (which can only be helpful). secondly that the redundant memory addresses which appear in the trace file don’t get copied into the report.

I’m still not going to say anything about interpreting the output because I want to show you the display_cursor() output for the same query when executed in 19.11.0..0. It looks like this:

Query Block Registry:
---------------------

  <q o="13"><n><![CDATA[SEL$88A77D12]]></n><p><![CDATA[SEL$683B0107]]></p><
        f><h><t><![CDATA[T1]]></t><s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[VW_N
        SO_1]]></t><s><![CDATA[SEL$5DA710D3]]></s></h></f></q>
  <q o="12"><n><![CDATA[SEL$683B0107]]></n><p><![CDATA[SEL$2]]></p><f><h><t
        ><![CDATA[T2]]></t><s><![CDATA[SEL$2]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![CDATA[S
        EL$1]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$2]]></n><f><h><t><![CDATA[T2]]></t><s><![CDATA[S
        EL$2]]></s></h></f></q>
  <q o="18" f="y" h="y"><n><![CDATA[SEL$B186933D]]></n><p><![CDATA[SEL$88A7
        7D12]]></p><i><o><t>VW</t><v><![CDATA[SEL$683B0107]]></v></o></i><f><h><t
        ><![CDATA[T1]]></t><s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[T2]]></t><s
        ><![CDATA[SEL$2]]></s></h></f></q>
  <q o="19" h="y"><n><![CDATA[SEL$5DA710D3]]></n><p><![CDATA[SEL$1]]></p><i
        ><o><t>SQ</t><v><![CDATA[SEL$2]]></v></o></i><f><h><t><![CDATA[T1]]></t><
        s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[VW_NSO_1]]></t><s><![CDATA[SEL$
        5DA710D3]]></s></h></f></q>
  <q o="15" f="y"><n><![CDATA[SEL$2F1334C4]]></n><p><![CDATA[SEL$5DA710D3]]
        ></p><f><h><t><![CDATA[VM_NWVW_2]]></t><s><![CDATA[SEL$2F1334C4]]></s></h
        ></f></q>

Yes, it’s naked XML (extracted from the v$sql_plan.other_xml column for operation 1).

I had been living in hope that someone else would write a messy bit of SQL to translate this into something readable – but the last time I searched the Internet for “other_xml qbregistry” I got the magical result of a Googlewhack (i.e. only one hit), which was in Russian, and largely a short description of all the options for the format command.

Since I’ve just installed 21.3 on a VM I decided to bite the bullet but I’ve taken the short-cut to writing the code. I’ve run a trace on a call to dbms_xplan.display_cursor() and extracted the critical query from the resulting trace file. Then I spent 30 minutes making it readable, hacking it to make it almost workable on 19c, then finding out why it can’t work without a little extra effort. Here’s the resulting hack:

rem
rem     Script:         qbregistry_query.sql
rem     Author:         Oracle Corp / Jonathan Lewis
rem     Dated:          Aug 2021
rem
rem     Last tested 
rem             19.11.0.0
rem

define m_sql_id='232sya6twg7sq'
define m_origin = 2

with 
xml as (
        select  other_xml
        from    V$sql_plan 
        where   sql_id = '&m_sql_id' 
        and     id = 1
        and     other_xml is not null
),
allqbs as ( 
        select 
                extractvalue(d.column_value, '/q/n') qbname, 
                extractvalue(d.column_value, '/q/@f') final, 
                extractvalue(d.column_value, '/q/p') prev, 
                extractvalue(d.column_value, '/q/@o') origin 
        from 
                table(xmlsequence(extract(xmltype ((select other_xml from xml)), '/other_xml/qb_registry/q'))) d 
), 
inpqbs as ( 
        select 
                xml.qbname qbname, 
                listagg(xml.depqbs, ',') within group (
                        order by xml.depqbs) depqbs 
        from 
                xmltable('/other_xml/qb_registry/q/i/o' passing xmltype((select other_xml from xml)) 
                        columns depqbs varchar2(256) path 'v', 
                        qbname varchar2(256) path './../../n'
                ) xml 
        where     xml.depqbs in ( select qbname from allqbs) 
        group by xml.qbname
), 
recqb   (src, origin, dest, final, lvl, inpobjs) as ( 
        select 
                qbname src, origin origin, null dest, final final, 1 lvl, null inpobjs 
        from 
                allqbs
        where 
--              origin = &m_origin
                origin in (2,3)
        union all 
        select 
                a.qbname src, a.origin origin, a.prev dest, a.final final, lvl+1, 
                (select depqbs from inpqbs i where i.qbname = a.qbname) inpobjs 
        from
                allqbs a, 
                recqb r 
        where a.prev = r.src
)
search depth first by src asc set ordseq, 
finalans as ( 
        select 
                src,
/* 
                (
                select 
                        name 
                from    v$query_block_origin 
                where 
                        origin_id=origin
                )       origin, 
*/
                origin,
                dest, final, lvl, inpobjs 
        from recqb order by ordseq
) 
select
        /*+ opt_param('parallel_execution_enabled', 'false') */ 
        g.qbreg 
from (
        select 
                rpad(' ', 2*(lvl-1)) || 
                src || ' (' || origin || 
                        case when length(dest)>0 
                                then ' ' || dest 
                                else '' 
                        end || 
                        case when length(inpobjs)>0 
                                then ' ; ' || inpobjs 
                                else ' ' 
                        end ||
                        ')' || 
                        case when final='y' 
                                then ' [final]' 
                                else '' 
                        end 
                qbreg 
        from 
                finalans
        ) g
/

In lines 10 and 11 I’ve defined a couple of substitution variables that appear further on in the script. One is the SQL_ID for the query you’re interested in, the other is a fixed (probably) symbolic constant used by Oracle.

Lines 14-20 are a “with” subquery that I’ve prepended to Oracle’s internal code to create a single row, single column table holding the other_xml value of the query of interest. You’ll notice that I’ve been fairly casual about this bit since I haven’t catered for the fact that a single sql_id may have several child cursors and might even be obsolete.

Lines 28 and 36 are where I’ve used my “with” subquery to supply the other_xml value that would have appeared as a bind variable (:B1) in the trace file.

Line 49 (commented out for the reason described in footnote 1) uses the m_origin variable to identify a row in the dynamic performance view v$query_block_origin (highlighted at line 68) and that’s where we have a problem with Oracle 19c: the view doesn’t exist, nor does the x$qbname structure that the view is based on (although it’s easy to find a table of the values in the oracle executable – albeit that several items in the 21c list don’t appear in the 19.11 executable).

In the code above I’ve actually commented out the whole of the inline scalar subquery that translates an origin number into an origin name and reported the actual value of origin. Originally I did this to check whether it was worth spending any more working on the code – and this is the result I got the initial test:

SEL$1 (2 )
  SEL$5DA710D3 (19 SEL$1 ; SEL$2)
    SEL$2F1334C4 (15 SEL$5DA710D3 ) [final]
SEL$2 (2 )
  SEL$683B0107 (12 SEL$2 )
    SEL$88A77D12 (13 SEL$683B0107 )
      SEL$B186933D (18 SEL$88A77D12 ; SEL$683B0107) [final]

A quick check by eye shows that it’s got the same pattern and set of query block names that the 21c output produced so it’s clearly a step in the right direction. Now all I need is a way to translate the origin numbers into names.

I could have tried searching x$ksmfsv to see if I could spot a pointer to the relevant structure and fake my way through the whole process of creating a “nearly dynamic” performance view, but I decided the quick and dirty workaround was to dump a CSV file listing the view contents in 21c, then read the file back as an external table to copy the data into a local IOT (index organized table) called my_query_block_origin. With the inline view back in play – and the name suitably changed – the 19c and 21c queries produced the same result (which is slightly surprising as the “SUBQUERY UNNEST” and “VIEW MERGE” options don’t seem to exist in the 19.11 list I found in the oracle executable.)

Footnote 1

Here’s a query to show the content of that 21c view (which is fairly interesting in its own right):

set linesize 144
set pagesize 100
set trimspool on
set tabout off

column  name format a60
column  hint_token format a32

spool query_block_origin.lst

select
        origin_id,
        name,
        hint_token
from
        v$query_block_origin
/

 ORIGIN_ID NAME                                                         HINT_TOKEN
---------- ------------------------------------------------------------ --------------------------------
         0 NOT NAMED
         1 ALLOCATE
         2 PARSER
         3 HINT
         4 COPY
         5 SAVE
         6 MV REWRITE                                                   REWRITE
         7 PUSHED PREDICATE                                             PUSH_PRED
         8 STAR TRANSFORM SUBQUERY
         9 COMPLEX VIEW MERGE
        10 COMPLEX SUBQUERY UNNEST
        11 OR EXPANSION                                                 USE_CONCAT
        12 SUBQ INTO VIEW FOR COMPLEX UNNEST
        13 PROJECTION VIEW FOR CVM
        14 GROUPING SET TO UNION
        15 SPLIT/MERGE QUERY BLOCKS
        16 COPY PARTITION VIEW
        17 RESTORE
        18 VIEW MERGE                                                   MERGE
        19 SUBQUERY UNNEST                                              UNNEST
        20 STAR TRANSFORM                                               STAR_TRANSFORMATION
        21 INDEX JOIN
        22 STAR TRANSFORM TEMP TABLE
        23 MAP QUERY BLOCK
        24 VIEW ADDED
        25 SET QUERY BLOCK
        26 QUERY BLOCK TABLES CHANGED
        27 QUERY BLOCK SIGNATURE CHANGED
        28 MV UNION QUERY BLOCK
        29 SPLIT QUERY BLOCK FOR GSET-TO-UNION                          EXPAND_GSET_TO_UNION
        30 PREDICATES REMOVED FROM QUERY BLOCK                          PULL_PRED
        31 PREDICATES ADDED TO QUERY BLOCK
        32 OLD PUSHED PREDICATE                                         OLD_PUSH_PRED
        33 ORDER BY REMOVED FROM QUERY BLOCK                            ELIMINATE_OBY
        34 JOIN REMOVED FROM QUERY BLOCK                                ELIMINATE_JOIN
        35 OUTER-JOIN REMOVED FROM QUERY BLOCK                          OUTER_JOIN_TO_INNER
        36 STAR TRANSFORMATION JOINBACK ELIMINATION                     ELIMINATE_JOIN
        37 BITMAP JOIN INDEX JOINBACK ELIMINATION                       ELIMINATE_JOIN
        38 CONNECT BY COST BASED TRANSFORMATION                         CONNECT_BY_COST_BASED
        39 CONNECT BY WITH FILTERING                                    CONNECT_BY_FILTERING
        40 CONNECT BY WITH NO FILTERING                                 NO_CONNECT_BY_FILTERING
        41 CONNECT BY START WITH QUERY BLOCK
        42 CONNECT BY FULL SCAN QUERY BLOCK
        43 PLACE GROUP BY                                               PLACE_GROUP_BY
        44 CONNECT BY NO FILTERING COMBINE                              NO_CONNECT_BY_FILTERING
        45 VIEW ON SELECT DISTINCT
        46 COALESCED SUBQUERY                                           COALESCE_SQ
        47 QUERY HAS COALESCED SUBQUERIES                               COALESCE_SQ
        48 SPLIT QUERY BLOCK FOR DISTINCT AGG OPTIM                     TRANSFORM_DISTINCT_AGG
        49 CONNECT BY ELIMINATE DUPLICATES FROM INPUT                   CONNECT_BY_ELIM_DUPS
        50 CONNECT BY COST BASED TRANSFORMATION FOR WHR ONLY            CONNECT_BY_CB_WHR_ONLY
        51 TABLE EXPANSION                                              EXPAND_TABLE
        52 TABLE EXPANSION BRANCH
        53 JOIN FACTORIZATION SET QUERY BLOCK                           FACTORIZE_JOIN
        54 DISTINCT PLACEMENT                                           PLACE_DISTINCT
        55 JOIN FACTORIZATION BRANCH QUERY BLOCK
        56 TABLE LOOKUP BY NESTED LOOP QUERY BLOCK                      TABLE_LOOKUP_BY_NL
        57 FULL OUTER JOIN TRANSFORMED TO OUTER                         FULL_OUTER_JOIN_TO_OUTER
        58 LEFT OUTER JOIN TRANSFORMED TO ANTI                          OUTER_JOIN_TO_ANTI
        59 VIEW DECORRELATED                                            DECORRELATE
        60 QUERY VIEW DECORRELATED                                      DECORRELATE
        61 NOT EXISTS SQ ADDED
        62 BRANCH WITH OUTER JOIN
        63 BRANCH WITH ANTI JOIN
        64 UNION ALL FOR FULL OUTER JOIN
        65 VECTOR TRANSFORMATION                                        VECTOR_TRANSFORM
        66 VECTOR TRANSFORMATION TEMP TABLE
        67 QUERY ANSI REARCHiTECTURE                                    ANSI_REARCH
        68 VIEW ANSI REARCHiTECTURE                                     ANSI_REARCH
        69 ELIMINATION OF GROUP BY                                      ELIM_GROUPBY
        70 UAL BRANCH OF UNNESTED SUBQUERY
        71 QUERY BLOCK HAS BUSHY JOIN                                   BUSHY_JOIN
        72 SUBQUERY ELIMINATE                                           ELIMINATE_SQ
        73 OR EXPANSION UNION ALL BRANCH
        74 OR EXPANSION UNION ALL VIEW                                  OR_EXPAND
        75 DIST AGG GROUPING SETS UNION ALL TRANSFORMATION              USE_DAGG_UNION_ALL_GSETS
        76 MATERIALIZED WITH CLAUSE
        77 STATISTCS BASED TRANSFORMED QB
        78 PQ TABLE EXPANSION
        79 LEFT OUTER JOIN TRANSFORMED TO BOTH INNER AND ANTI
        80 SHARD TEMP TABLE
        81 BRANCH OF COMPLEX UNNESTED SET QUERY BLOCK
        82 DIST AGG GROUPING SETS OPTIMIZATION                          DAGG_OPTIM_GSETS


You’ll notice the highlight for origin_id 2 which has the name PARSER – that’s the (first) significant value when reporting the query block registry but take note, also, of origin_id 3 which has the name hint. This is where the code built into 21c goes wrong. If you use the qb_name hint to name all your query blocks then their origin_id will be 3, and Oracle’s code won’t find them.

When I added the hint /*+ qb_name(main) */ to the query this is what I got from my registry query:

Query Block Registry:
---------------------

  SEL$1 (PARSER)
    SEL$7D4DB4AA (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$1)
      SEL$EFD91A2C (PROJECTION VIEW FOR CVM SEL$7D4DB4AA)
        SEL$7086F02E (VIEW MERGE SEL$EFD91A2C ; SEL$7D4DB4AA) [FINAL]

And when I also added the hint /*+ qb_name(subq) */ to the subquery the result was this:

Query Block Registry:
---------------------

An uncaught error happened in display_cursor : ORA-06502: PL/SQL: numeric or value error

I’ve said for a long time: “always name all your query blocks”. I think 21c (temporarily) demonstrates why you have two options: name ALL of them or name NONE of them. If you name just some of them you might not notice that parts of your plan don’t appear in the registry report, and I’d say it’s better to see an error than to be fooled into thinking you’ve got complete information.

Footnote 2

There’s another new option for the format parameter in 21c which is qbregistry_graph. I haven’t considered playing about with the trace file to see if I can extract and hack the SQL that generates the appropriate output (but that might change if I pick up a tool to turn the textual description into a graphic). For the registry listing above this is what the “graph” output looks like:

Query Block Registry Graph (dot format):
---------------------
digraph g{
  rankdir = TB
  "SEL$88A77D12"
  "SEL$683B0107"
  "SEL$1"
  "SEL$2"
  "SEL$B186933D" [peripheries=2]
  "SEL$5DA710D3"
  "SEL$2F1334C4" [peripheries=2]
  "SEL$683B0107" -> "SEL$88A77D12" [label="PROJECTION VIEW FOR CVM"]
  "SEL$2" -> "SEL$683B0107" [label="SUBQ INTO VIEW FOR COMPLEX UNNEST"]
  "SEL$88A77D12" -> "SEL$B186933D" [label="VIEW MERGE"]
  "SEL$1" -> "SEL$5DA710D3" [label="SUBQUERY UNNEST"]
  "SEL$5DA710D3" -> "SEL$2F1334C4" [label="SPLIT/MERGE QUERY BLOCKS"]
  "SEL$683B0107" -> "SEL$B186933D" [style=dotted]
  "SEL$2" -> "SEL$5DA710D3" [style=dotted]
  { rank = same }
  {
    rank="sink";
    rankdir = LR;
    item1 [style=invis];
    item2 [shape="plaintext" label="Participating query blocks"];
    item3 [label="&nbsp;" peripheries=2];
    item4 [shape="plaintext" label="Final query blocks"];
    item1 -> item2 [style=dotted];
    { rank=same item3 item4; }
  }
}

Footnote 2.1 (a few days later)

A couple of days after publishing this note I received an email pointing out that the qbregistry_graph output is in the Graphviz DOT language (see http://www.graphviz.org/) and there are even websites where it can easily be rendered into graphic form. This is the result I got by pasting the output into the website:

I haven’t tried to think through a generalised pattern for drawing these pictures, but I think I’d prefer to see a diagram which showed that the “final” query block sel$2f1334c4 was used by query block sel$b186933d. After all, the word “final” in this context means only that the query block was one for which the optimizer produced an independent (sub-)plan.

Footnote 3

For completeness – here’s the original SQL and plan for the statement that produced this qbregistry example:

select
        *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+
                                unnest
                                no_semijoin
                        */
                        distinct object_name
                from   t2
                where  object_type = 'TABLE'
        )
;

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |       |       |     5 (100)|          |
|   1 |  VIEW                                  | VM_NWVW_2 |     1 |   483 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                          |           |     1 |   155 |     5  (20)| 00:00:01 |
|   3 |    NESTED LOOPS SEMI                   |           |     1 |   155 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1        |     1 |   128 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_I1     |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2        |     1 |    27 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T2_I2     |    48 |       |     0   (0)|          |
----------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$B186933D / "VM_NWVW_2"@"SEL$2F1334C4"
   2 - SEL$B186933D
   4 - SEL$B186933D / "T1"@"SEL$1"
   5 - SEL$B186933D / "T1"@"SEL$1"
   6 - SEL$B186933D / "T2"@"SEL$2"
   7 - SEL$B186933D / "T2"@"SEL$2"

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      DB_VERSION('21.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$B186933D")
      MERGE(@"SEL$683B0107" >"SEL$5DA710D3")
      OUTLINE_LEAF(@"SEL$2F1334C4")
      OUTLINE(@"SEL$88A77D12")
      OUTLINE(@"SEL$683B0107")
      OUTLINE(@"SEL$5DA710D3")
      UNNEST(@"SEL$2" UNNEST_INNERJ_DISTINCT_VIEW)
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$2F1334C4" "VM_NWVW_2"@"SEL$2F1334C4")
      INDEX_RS_ASC(@"SEL$B186933D" "T1"@"SEL$1" ("T1"."OWNER"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B186933D" "T1"@"SEL$1")
      INDEX_RS_ASC(@"SEL$B186933D" "T2"@"SEL$2" ("T2"."OBJECT_TYPE"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B186933D" "T2"@"SEL$2")
      LEADING(@"SEL$B186933D" "T1"@"SEL$1" "T2"@"SEL$2")
      USE_NL(@"SEL$B186933D" "T2"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$B186933D" UNIQUE)
      PARTIAL_JOIN(@"SEL$B186933D" "T2"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("OWNER"='OUTLN')
   6 - filter("OBJECT_NAME"="OBJECT_NAME")
   7 - access("OBJECT_TYPE"='TABLE')

Query Block Registry:
---------------------
  SEL$1 (PARSER)
    SEL$5DA710D3 (SUBQUERY UNNEST SEL$1 ; SEL$2)
      SEL$2F1334C4 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$2 (PARSER)
    SEL$683B0107 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
      SEL$88A77D12 (PROJECTION VIEW FOR CVM SEL$683B0107)
        SEL$B186933D (VIEW MERGE SEL$88A77D12 ; SEL$683B0107) [FINAL]

Tables t1 and t2 are copies of the same data set, which is a subset of 100 rows from all_objects. You won’t necessarily see this plan on your systems because (even with the hints) the plan can vary depending on the number of rows with owner = ‘OUTLN’ (which is likely to be zero) or with object_type = ‘TABLE’ (which might be all of them). The script I started with was one I had used in a note I wrote about “distinct” appearing in the select list of subqueries, but the data it produced in the newer versions of Oracle was sufficiently different that I had to be a little more careful in constructing a data set that produced stable plans.

If you cross-check the Query Block Registry with the Outline Information you’ll see that the lines labelled FINAL start with the query block names that are shown as “outline_leaf” entries, and the other 5 query block names appear as “outline” entries.

Reading down the tree I then find myself strugling to interpret the QBR. I think I know what has happened, but I can’t quite manage to see exactly how the QBR is telling me that it happened.

QBR – tentative interpretation

Part of the difficulty is that the QBR seems to have a section for every initial query block in the query, so there’s (a) likely to be some overlap between sections and (b) some sequencing that means you can’t get the full picture just by reading straight from top to bottom. In this case we have two initial query blocks (the main query, implicitly named sel$1, and the subquery implicitly named sel$2), and I think the interpretation is as follows:

Starting with sel$2 section we can see that its second line tells us that the subquery was unnested and the resulting aggregate inline view is the sole content of a query block called sel$683B0107.

Jumping backwards to the sel$1 section, its second line tells us that sel$5DA710D3 is a query block consisting of a join between t1 and the inline aggregate view.

Sticking with the sel$1 tree, we then see a query block that tells us that the optimizer has transformed an “aggregate then join” into a “join then aggregate”. sel$2F1334C4 is the query block holding nothing but a select from the view VM_VWNW_2.

Returning to the sel$2 tree, sel$88A77D12 is the resulting query block when the inline aggregate view is merged using complex view merging. This is where I get a bit stuck, because this seems to be repeating a step that we’ve handled in the sel$1 section by a different route.

The final step of the sel$2 tree is sel$B186933D the query block where we select from the non-mergeable inline view VM_VWNW_2 that seems to have come from one of two different places.

Bottom line on this one: even though it’s an extremely simple query and I believe I understand what the execution plan is telling us about the transformations that took place, the query block registry is still something of a mystery to me.

August 23, 2021

Distributed Query

Filed under: distributed,Execution plans,Hints,Oracle,subqueries,Transformations,Troubleshooting — Jonathan Lewis @ 5:24 pm BST Aug 23,2021

Here’s an example that appeared on the Oracle Developer Community forum about a year ago that prompted me to do a little investigative work. The question involved a distributed query that was “misbehaving” – the interesting points were the appearance of the /*+ rule */ and /*+ driving_site() */ hints in the original query when combined with a suggestion to address the problem using the /*+ materialize */ hint with factored subqueries (common table expressions – CTEs), or when combined with my suggestion to use the /*+ no_merge */ hint.

If you don’t want to read the whole article there’s a tl;dr summary just before the end.

The original question was posed with a handful of poorly constructed code fragments that were supposed to describe the problem, viz:


select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ; -- query n°1

select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table); -- query n°2

select * from
select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ,
select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table)
where att_table_1 = att_table_2  -- sic

The crux of the problem was that the two separate statements individually produced an acceptable execution plan but the attempt to use the queries in inline views with a join resulted in a plan that (from the description) sounded like the result of Oracle merging the two inline views and running the two IN subqueries as FILTER (existence) subqueries.

We weren’t shown any execution plans and only had the title of the question (“Distributed sql query through multiple databases”) to give us the clue that there might be three different databases involved.

Obviously there are several questions worth asking when presented with this problem. The first being “can we have a more realistic piece of code”, also “which vesion of Oracle”, and “where are the execution plans”. I can’t help feeling that there’s more to the problem than just the three tables that seem to be suggested by the fragments supplied.

More significant, though, was the surprise that rule and driving_site should work together. There’s a long-standing (but incorrect) assertion that “any other hint invalidates the RULE hint”. I think I’ve published an example somewhere showing that /*+ unnest */ would affect an execution plan where the optimizer still obeyed the /*+ rule */ hint, and there’s an old post on this blog which points out that transformation and optimisation are (or were, at the time) independent of each other, implying that you could combine the rule hint with “transformational” hints and still end up with a rule-based execution plan.

Despite old memories suggesting the contrary my first thought was that the rule and driving_site hints couldn’t be working together – and that made it worth running a little test. Then one of the other specialists on the forums suggested using subquery factoring with the materialize hint – and I thought that probably wouldn’t help because when you insert into a global temporary table the driving site has to become the site that holds the global temporary tables (in fact this isn’t just a feature of GTTs). So there was another thing prompting me to run a test. (And then I suggested using the /*+ no_merge */ hint – but thought I’d check if that idea was going to work before I suggested it.)

So here’s a code sample to create some data, and the first two simple queries with calls for their predicted execution plans:

rem
rem     Script:         distributed_multi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

rem     create public database link test@loopback using 'test';
rem     create public database link test2@loopback using 'test2';

rem     create public database link orcl@loopback using 'orcl';
rem     create public database link orcl2@loopback using 'orcl2';

rem     create public database link orclpdb@loopback using 'orclpdb';
rem     create public database link orclpdb2@loopback using 'orclpdb2';

define m_target=test@loopback
define m_target2=test2@loopback

define m_target=orcl@loopback
define m_target2=orcl2@loopback

define m_target=orclpdb@loopback
define m_target2=orclpdb2@loopback

create table t0
as
select  *
from    all_objects
where   mod(object_id,4) = 1
;

create table t1
as
select  *
from    all_objects
where   mod(object_id,11) = 0
;

create table t2
as
select  *
from    all_Objects
where   mod(object_id,13) = 0
;

explain plan for
select  /*+ driving_site(t1) */
        t1.object_name, t1.object_id
from    t1@&m_target    t1
where
        t1.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

explain plan for
select
        /*+ rule driving_site(t2) */
        t2.object_name, t2.object_id
from    t2@&m_target2   t2
where
        t2.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

Reading from the top down – t0 is in the local database, t1 is in remote database 1, t2 is in remote database 2. I’ve indicated the creation and selection of a pair of public database links at the top of the script – in this case both of them are loopback links to the local database, but I’ve used substitition variables in the SQL to allow me to adjust which databases are the remote ones. Since there are no indexes on any of the tables the optimizer is very limited in its choice of execution plans, which are as follows in 19.3 (the oraclepdb/orclpdb2 links).

First, the query against t1@orclpdb1 – which will run cost-based:


-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|*  1 |  HASH JOIN SEMI        |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   2 |   TABLE ACCESS FULL    | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   3 |   REMOTE               | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A1"."OBJECT_ID"="A2"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement

You’ll note that operation 3 is simply REMOTE, and t0 is the object accessed – which means this query is behaving as if the (local) t0 table is the remote one as far as the execution plan is concerned. The IN-OUT column tells us that this operation is “Remote to Serial” (R->S)” and the instance called to is named “!” which is how the local database is identified in the plan from a remote database.

We can also see that the execution plan gives us the “Remote SQL Information” for operation 2 – and that’s the text of the query that gets sent by the driving site to the instance that holds the object of interest. In this case the query is simply selecting the object_id values from all the rows in t0.

Now the plan for the query against t2@orclpdb2 which includes a /*+ rule */ hint:

-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   SORT JOIN            |          |        |      |
|   3 |    TABLE ACCESS FULL   | T2       | ORCLP~ |      |
|*  4 |   SORT JOIN            |          |        |      |
|   5 |    VIEW                | VW_NSO_1 | ORCLP~ |      |
|   6 |     SORT UNIQUE        |          |        |      |
|   7 |      REMOTE            | T0       |      ! | R->S |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A1"."OBJECT_ID"="OBJECT_ID")
       filter("A1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   7 - SELECT /*+ RULE */ "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement
   - rule based optimizer used (consider using cbo)

The most striking feature of this plan is that it is an RBO (rule based optimizer) plan not a cost-based plan – and the Note section confirms that observation. We can also see that the Remote SQL Information is echoing the /*+ RULE */ hint back in it’s query against t0. Since the query is operating rule-based the hash join mechanism is not available (it’s a costed path – it needs to know the size of the data that will be used in the build table), and that’s why the plan is using a sort/merge join.

Following the “incremental build” strategy for writing SQL all we have to do as the next step of producing the final code is put the two queries into separate views and join them:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ rule driving_site(t2) */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

And here’s the execution plan – which, I have to admit, gave me a bit of a surprise on two counts when I first saw it:


-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT       |          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   MERGE JOIN           |          |        |      |
|   3 |    MERGE JOIN          |          |        |      |
|   4 |     SORT JOIN          |          |        |      |
|   5 |      REMOTE            | T2       | ORCLP~ | R->S |
|*  6 |     SORT JOIN          |          |        |      |
|   7 |      REMOTE            | T1       | ORCLP~ | R->S |
|*  8 |    SORT JOIN           |          |        |      |
|   9 |     VIEW               | VW_NSO_1 |        |      |
|  10 |      SORT UNIQUE       |          |        |      |
|  11 |       TABLE ACCESS FULL| T0       |        |      |
|* 12 |   SORT JOIN            |          |        |      |
|  13 |    VIEW                | VW_NSO_2 |        |      |
|  14 |     SORT UNIQUE        |          |        |      |
|  15 |      TABLE ACCESS FULL | T0       |        |      |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
       filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   8 - access("T2"."OBJECT_ID"="OBJECT_ID")
       filter("T2"."OBJECT_ID"="OBJECT_ID")
  12 - access("T1"."OBJECT_ID"="OBJECT_ID")
       filter("T1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   5 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T2" "T2"
       (accessing 'ORCLPDB2.LOCALDOMAIN@LOOPBACK' )

   7 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T1" "T1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - rule based optimizer used (consider using cbo)

The two surprises were that (a) the entire plan was rule-based, and (b) the driving_site() selection has disappeared from the plan.

Of course as soon as I actually started thinking about what I’d written (instead of trusting the knee-jerk “just stick the two bits together”) the flaw in the strategy became obvious.

  • Either the whole query runs RBO or it runs CBO – you can’t split the planning.
  • In the words of The Highlander “There can be only one” (driving site that is) – only one of the database involved will decide how to decompose and distribute the query.

It’s an interesting detail that the /*+ rule */ hint seems to have pushed the whole query into the arms of the RBO despite being buried somewhere in the depths of the query rather than being in the top level query block – but we’ve seen that before in some old data dictionary views.

The complete disregard for the driving_site() hints is less interesting – there is, after all, a comment in the manuals somewhere to the effect that when two hints contradict each other they are both ignored. (But I did wonder why the Hint Report that should appear with 19.3 plans didn’t tell me that the hints had been observed but not used.)

The other problem (from the perspective of the OP) is that the two inline views have been merged so the join order no longer reflects the two isolated components we used to have. So let’s fiddle around a little bit to see how close we can get to what the OP wants. The first step would be to add the /*+ no_merge */ hint to both inline view, and eliminate one of the /*+ driving_site() */ hints to see what happens, and since we’re modern we’ll also get rid of the /*+ rule */ hint:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ qb_name(subq1) no_merge driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ qb_name(subq2) no_merge */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|*  1 |  HASH JOIN             |      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|   2 |   VIEW                 |      |  4342 |   334K|    14   (8)| 00:00:01 |        |      |
|   3 |    REMOTE              |      |       |       |            |          |      ! | R->S |
|   4 |   VIEW                 |      |  5168 |   398K|    57   (8)| 00:00:01 |        |      |
|*  5 |    HASH JOIN SEMI      |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   6 |     TABLE ACCESS FULL  | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   7 |     REMOTE             | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A2"."OBJECT_ID"="A1"."OBJECT_ID")
   5 - access("A3"."OBJECT_ID"="A6"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - EXPLAIN PLAN INTO "PLAN_TABLE" FOR SELECT /*+ QB_NAME ("SUBQ2") NO_MERGE */
       "A1"."OBJECT_NAME","A1"."OBJECT_ID" FROM  (SELECT DISTINCT "A3"."OBJECT_ID"
       "OBJECT_ID" FROM "T0" "A3") "A2","T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1" WHERE
       "A1"."OBJECT_ID"="A2"."OBJECT_ID" (accessing '!' )

   7 - SELECT "OBJECT_ID" FROM "T0" "A6" (accessing '!' )

Note
-----
   - fully remote statement

In this plan we can see that the /*+ driving_site() */ hint has been applied – the plan is presented from the point of view of orclpdb (the database holding t1). The order of the two inline views has apparently been reversed as we move from the statement to its plan – but that’s just a minor side effect of the hash join (picking the smaller result set as the build table).

Operations 5 – 7 tell us that t1 is treated as the local table and used for the build table in a hash semi-join, and then t0 is accessed by a call back to our database and its result set is used as the probe table.

From operation 3 (in the body of the plan, and in the Remote SQL Information) we see that orclpdb has handed off the entire t2 query block to a remote operation – which is ‘accessing “!”. But there’s a problem (in my opinion) in the SQL that it’s handing off – the text is NOT the text of our inline view; it’s already been through a heuristic transformation that has unnested the IN subquery of our original text into a “join distinct view” – if we had used a hint to force this transformation it would have been the /*+ unnest(UNNEST_INNERJ_DISTINCT_VIEW) */ variant.

SELECT /*+ NO_MERGE */
        "A1"."OBJECT_NAME","A1"."OBJECT_ID"
FROM
       (SELECT DISTINCT "A3"."OBJECT_ID" "OBJECT_ID" FROM "T0" "A3") "A2",
       "T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1"
WHERE
        "A1"."OBJECT_ID"="A2"."OBJECT_ID"

I tried to change this by adding alternative versions of the /* unnest() */ hint to the original query, following the query block names indicated by the outline information (not shown), but it looks as if the code path constructs the Remote SQL operates without considering the main query hints – perhaps the decomposition code is simply following the code path of the old heuristic “I’ll do it if it’s legal” unnest. The drawback to this is that if the original form of the text had been sent to the other site the optimizer that had to handle it could have used cost-based query transformation and may have come up with a better plan.

You may be wondering why I left the /*+ driving_site() */ hint in one of the inline views rather than inserting it in the main query block. The answer is simple – it didn’t seem to work (even in 19.3) when I put /*+ driving_site(t1@subq1) */ in the main query block.

tl;dr

The optimizer has to operate rule-based or cost-based, it can’t do a bit of both in the same query – so if you’ve got a /*+ RULE */ hint that takes effect anywhere in the query the entire query will be optimised under the rule-based optimizer.

There can be only one driving site for a query, and if you manage to get multiple driving_site() hints in a query that contradict each other the optimizer will ignore all of them.

When the optimizer decomposes a distributed query and produces non-trivial components to send to remote sites you may find that some of the queries constructed for the remote sites have been subject to transformations that you cannot influence by hinting.

Footnote

I mentioned factored subqueries and the /*+ materialize */ option in the opening notes. In plans where the attempt to specify the driving site failed (i.e. when the query ran locally) the factored subqueries did materialize. In any plans where the driving site was a remote site the factored subqueries were always inline. This may well be related to the documented (though not always implemented) restriction that temporary tables cannot take part in distributed transactions.

March 8, 2021

Join Elimination redux

Filed under: Bugs,CBO,Join Elimination,Oracle,Transformations — Jonathan Lewis @ 12:58 pm GMT Mar 8,2021

This note is a followup to a post from a few years back (originally dating further back to 2012) where I described an inconsistency that appeared when join elimination and deferrable constraints collided. The bug resurfacted recently in a new guise in a question on the Oracle Developer forum with a wonderful variation on the symptons that ultimately gave a good clue to underlying issue. The post included a complete working example of the anomaly, but I’ll demonstrate it using a variation of my 2012/2017 code. We start with a pair of tables with referential integrity defined between them:

rem
rem     Script:         join_eliminate_bug_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2021
rem
rem     Last tested 
rem             19.11.0.0
rem             19.8.0.0 (LiveSQL)
rem

create table parent (
        id      number(4),
        name    varchar2(10),
        constraint par_pk primary key (id)
        deferrable initially immediate
)
;

create table child(
        id_p    number(4)       
                constraint chi_fk_par
                references parent,
        id      number(4),
        name    varchar2(10),
        constraint chi_pk primary key (id_p, id) 
)
;

insert into parent values (1,'Smith');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

commit;

begin
        dbms_stats.gather_table_stats(user,'child');
        dbms_stats.gather_table_stats(user,'parent');
end;
/

You’ll notice that I’ve created the primary key constraint on parent as “deferrable initially immediate”. So let’s write some code that defers the constraint, inserts some duplicate data executes a join between the two tables:

set serveroutput off
set constraint par_pk deferred;

insert into parent (id,name) values (1,'Smith');

select
        /*+ initially immediate  PK */
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);

Since I’ve now got two rows with id = 1 in parent the query ought to return duplicates for every row in child where id_p = 1, but it doesn’t. Here’s the output from the query and the execution plan:

     ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          2 Sally

2 rows selected.


PLAN_TABLE_OUTPUT
--------------------------------------------------
SQL_ID  gy6h8td4tmdpg, child number 0
-------------------------------------
select  /*+ initially immediate  PK */  chi.* from  child chi,  parent
par where  par.id = chi.id_p

Plan hash value: 2406669797

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS FULL| CHILD |     2 |    24 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

The optimizer has applied “join elimination” to take parent out of the transformed query, so no duplicates. Arguably this is the wrong result.

Let’s roll back the insert and change the experiment – let’s change the constraint on the parent primary key so that it’s still deferrable, but initially deferred then repeat the insert and query:

rollback;
alter table child drop constraint chi_fk_par;
alter table parent drop constraint par_pk;

alter table parent add constraint par_pk primary key (id) deferrable initially deferred;
alter table child add constraint chi_fk_par foreign key(id_p) references parent;

insert into parent (id,name) values (1,'Smith');

select
        /*+ initially deferred  PK */
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);

In this case we don’t need to “set constraint par_pk deferred”, it’s implicitly deferred by definition and will only be checked when we commit any transaction. Would you expect this to make any difference to the result? This is what we get:

      ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          1 Simon
         1          2 Sally
         1          2 Sally

4 rows selected.


PLAN_TABLE_OUTPUT
--------------------------------------------------
SQL_ID  8gvn3mzr8uv0h, child number 0
-------------------------------------
select  /*+ initially deferred  PK */  chi.* from  child chi,  parent
par where  par.id = chi.id_p

Plan hash value: 1687613841

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |       |       |     2 (100)|          |
|   1 |  NESTED LOOPS      |        |     2 |    30 |     2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| CHILD  |     2 |    24 |     2   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | PAR_PK |     1 |     3 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PAR"."ID"="CHI"."ID_P")

When the parent primary key is initially deferred then join elimination doesn’t take place – so we get two copies of each child row in the output. (This is still true even if we add the “rely” option to the parent primary key constraint).

Debug Analysis

As I said at the top of the article, this does give us a clue about the source of the bug. A check of the dictionary table cdef$ (constraint definitions) shows the following notes for column cdef$.defer:

  defer         number,                     /* 0x01 constraint is deferrable */
                                              /* 0x02 constraint is deferred */
                                /* 0x04 constraint has been system validated */
                                 /* 0x08 constraint name is system generated */
etc...

With my examples the “initially immediate” constraint reported defer = 5, for the “initially deferred” constraint it reported the value 7. It looks as if the optimizer code to handle join elimination look only at the static definition of the constraint (bit 0x02) and doesn’t consider the possibility that if bit 0x01 is set it should also check the session state to see if the constraint has been temporarily deferred.

Conclusion

If you are going to implement deferrable constraints be very careful about tracking exactly how you use them, and be aware that if you execute arbitrary queries in mid-transaction then you may find that the results are not exactly what you expect. In fact, though it’s not demonstrated here, different forms of SQL to that should express the same requirement may not give the same results.

Update (May 2021)

This buggy behaviour is still present in 19.11.0.0

September 5, 2010

Not NULL

Filed under: CBO,Execution plans,Indexing,Infrastructure,NULL,Oracle,Transformations — Jonathan Lewis @ 5:56 pm BST Sep 5,2010

Here’s a little detail that I’ve known for years – but keep forgetting until I embarrass myself by rediscovering it (usually in front of a client). I’ll start with a demonstration of a useful little feature of mandatory columns:
(more…)

Website Powered by WordPress.com.