Oracle Scratchpad

December 6, 2019

Temp space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:18 pm GMT Dec 6,2019

A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.

Of course you then run into a further problem tracking down the source of the problem. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced and cynical user) that there’s something fundamentally wrong with the way Oracle has been accounting for space usage.

If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the active session history may be able to help.  One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL  statement that had been holding the temporary space when the error was raised, for example:


column pga_allocated        format 999,999,999,999
column temp_space_allocated format 999,999,999,999

break on session_id skip 1 on session_serial#

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) != 0
order by
        session_id, sample_id
/

All I’ve done for this example is query v$active_session_history for the last 5 minutes reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range, and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage.

You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of memory so it’s nice to be able to see the two figures together.

There are plenty of limitations and flaws in the usefulness of this report and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:


rem
rem     Script:         allocate_tempspace.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 as 
select * from all_objects
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

execute dbms_stats.gather_table_stats(null,'t1')

execute dbms_lock.sleep(20)

set pagesize  60
set linesize 255
set trimspool on
set serveroutput off
alter session set statistics_level = all;

with ttemp as (
        select /*+ materialize */ * from t1
)
select 
        /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ 
        t1a.object_type,
        max(t1a.object_name)
from
        ttemp t1a, ttemp t1b
where
        t1a.object_id = t1b.object_id
group by
        t1a.object_type
order by
        t1a.object_type
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).


SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 |      1 |        |      0 |00:00:01.51 |   28915 |      0 |   9217 |  2068K|  2068K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    989K|    989K|00:00:00.24 |   19551 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     29 |     29 |00:00:08.51 |   18493 |  21345 |   2910 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     15M|     15M|00:00:03.93 |   18493 |  21345 |   2910 |    48M|  6400K|   65M (1)|   25600 |
|   6 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.36 |    9233 |   9218 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.35 |    9233 |   9218 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.40 |    9257 |   9217 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.39 |    9257 |   9217 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")

Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s  74,000 KB of temp space that doesn’t get reported Used-Tmp.

If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:

SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 |      1 |        |      0 |00:00:01.26 |   23706 |      0 |   5593 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    907K|    907K|00:00:00.21 |   18024 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     25 |     25 |00:00:06.89 |   11193 |  13843 |   2655 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     14M|     14M|00:00:03.55 |   11193 |  13843 |   2655 |    44M|  6400K|   64M (1)|      23M|
|   6 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.26 |    5598 |   5594 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.25 |    5598 |   5594 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.34 |    5595 |   5594 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.33 |    5595 |   5594 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE  (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that the Used-Tmp in 12.2 was reported in KB when it’s reported in MB in 19.3.0.0.

Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism is to write a GTT with pctfree=10 while the new avoid any free space and transactional details) so, as ever, I now have another draft for a blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.

With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.

First the results from 12.2.0.1

SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
        14           22234   15306218 1cwabt12zq6zb       95,962,112            1,048,576
                             15306219 1cwabt12zq6zb       97,731,584           37,748,736
                             15306220 1cwabt12zq6zb      148,194,304           77,594,624
                             15306221 1cwabt12zq6zb      168,117,248           85,983,232
                             15306222 1cwabt12zq6zb      168,117,248           90,177,536
                             15306223 1cwabt12zq6zb      168,117,248           95,420,416
                             15306224 1cwabt12zq6zb      168,117,248           98,566,144
                             15306225 1cwabt12zq6zb      168,117,248          102,760,448
                             15306226 1cwabt12zq6zb      116,933,632          103,809,024
                             15306227 1cwabt12zq6zb      116,933,632          103,809,024
                             15306228 b66ycurnwpgud        8,602,624            1,048,576

I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB that is reasonably close to the 103,800,000 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth to 77,590,000 of temp_space_allocated by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its resulte up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active.  (Note: however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)

We see the same pattern of behaviour in 19.3.0.0:


SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
       136           42767    2217500 1cwabt12zq6zb      143,982,592           46,137,344
                              2217501 1cwabt12zq6zb      193,527,808           54,525,952
                              2217502 1cwabt12zq6zb      193,527,808           57,671,680
                              2217503 1cwabt12zq6zb      193,527,808           61,865,984
                              2217504 1cwabt12zq6zb      197,722,112           67,108,864
                              2217505 1cwabt12zq6zb      150,601,728           70,254,592
                              2217506 1cwabt12zq6zb      150,601,728           70,254,592

We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).

Preliminary Observations

Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to allocate temp segment).

We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.

At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was the one being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may over-optimistic, however in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing that is responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.

Further questions

  • When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
  • Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
  • Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
  • Are there any other reasons why ASH might report temp_space_allocated when an execution plan doesn’t?
  • Is temp_space_allocated only about the temporary tablespace, or could it include informatiom about other (“permanent”) tablespaces ?

Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the srouce of a space management ORA-01652 issue.

 

 

December 4, 2019

E-rows / A-rows

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:17 pm GMT Dec 4,2019

This note was prompted by an error I made at the UKOUG TechFest19 yesterday. It’s fairly well-known that when you read an execution plan that includes the rowsource execution stats – so you get the E-rows (estimated) and A-rows (Actual) reported – then a sensible check of the quality of the optimizer’s calculations is to compare the estimates and actuals allowing for the fact that the E-rows is “per start” and the A-rows is “cumulative”, so A-rows = E-rows * Starts.

The error I made yesterday was to forget that this relationship isn’t always true. In particular partitioning and parallel query introduced the need to be a little flexibility in reading the numbers – which I’ll demonstrate with a coupld of simple examples running under 12.2.0.1


rem
rem     Script:         estimate_actual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem

create table pt_composite_1 (
        id,
        grp,
        small_vc,
        padding
)
nologging
partition by range(id) 
subpartition by hash (grp)
subpartitions 4
(
        partition p1 values less than (  4000),
        partition p2 values less than (  8000),
        partition p3 values less than ( 16000),
        partition p4 values less than ( 32000),
        partition p5 values less than ( 64000),
        partition p6 values less than (128000)
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                            id,
        trunc(rownum/100)                                 grp,
        cast(to_char(trunc(rownum/20)) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))              padding
from
        generator       g1,
        generator       g2
where 
        rownum <= 1e5 -- > comment to avoid WordPress format issue
/

create table t3 
nologging pctfree 80
storage (initial 1M next 1M)
as
select * from pt_composite_1
/

All I’ve done is create a couple of tables with 100,000 rows each – and now I’m going to count the rows and see what I get from the execution plans with rowsource execution stats enabled:


set serveroutput off
alter session set statistics_level = all;

prompt  =================
prompt  Partition effects
prompt  =================

select count(id) from pt_composite_1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ================
prompt  Parallel effects
prompt  ================

select /*+ parallel (t3 4) */ count(id) from t3;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

With a little cosmetic tidying, here are the two execution plans (note that I haven’t used the “last” format option when reporting the parallel plan:


=================
Partition effects
=================

select count(id) from pt_composite_1

-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |      1 |        |      1 |00:00:00.04 |    1866 |
|   1 |  SORT AGGREGATE      |                |      1 |      1 |      1 |00:00:00.04 |    1866 |
|   2 |   PARTITION RANGE ALL|                |      1 |    100K|    100K|00:00:00.04 |    1866 |
|   3 |    PARTITION HASH ALL|                |      6 |    100K|    100K|00:00:00.04 |    1866 |
|   4 |     TABLE ACCESS FULL| PT_COMPOSITE_1 |     24 |    100K|    100K|00:00:00.04 |    1866 |
-------------------------------------------------------------------------------------------------


================
Parallel effects
================

select /*+ parallel (t3 4) */ count(id) from t3

------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |      1 |00:00:00.04 |      20 |      0 |
|   1 |  SORT AGGREGATE        |          |      1 |      1 |      1 |00:00:00.04 |      20 |      0 |
|   2 |   PX COORDINATOR       |          |      1 |        |      4 |00:00:00.04 |      20 |      0 |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE     |          |      4 |      1 |      4 |00:00:00.11 |    8424 |   7692 |
|   5 |      PX BLOCK ITERATOR |          |      4 |    100K|    100K|00:00:00.11 |    8424 |   7692 |
|*  6 |       TABLE ACCESS FULL| T3       |     61 |    100K|    100K|00:00:00.06 |    8424 |   7692 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)

As you can see, the lines specifying partition selection report E-Rows for the whole table, not for any partition-level approximation, so for operations 3 and 4 we shouldn’t multiply Starts by E-rows to compare with A-row. (Starts = 6 for operation 3 because we have 6 partitions, and Start = 24 for operation 4 because at the lowest level we have a total of 24 data segments).

For the parallel query we see the same pattern – every parallel slave reports the expected total number of rows, and every “block iterator” (rowid range) reports the expected total number of rows. Again we see that using multiplication to compare E-rows and A-rows would not be valid.

In fact it’s not just partitioning and parallelism that can cause confusion. Even something as simple as a serial nested loop join has a couple of surprises (largely thanks to the evolution of the mechanics – without a matching adjustment to the execution plans – over time). Here’s a script to generate a couple of tables, which we will then join – hinting various mechanisms for the nested loop.


create table t1
as
select
        rownum           id,
        mod(rownum,100)  n1,
        cast(lpad(rownum,20) as varchar2(20)) v1 
from
        dual
connect by
        level <= 1000 -- > comment to avoid WordPress format issue
;

create table t2
as
select  * from t1
union all
select  * from t1
union all
select  * from t1
;

create index t2_i1 on t2(id);

It’s not a subtle test – as you can see we have 3 rows in table t2 for every row in t1. So let’s pick some t1 rows and join to t2 on id. Again it’s 12.2.0.1:

set serveroutput off
alter session set statistics_level = all;

prompt  ==============================
prompt  Nested loop join (traditional)
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                opt_param('_nlj_batching_enabled', 0)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with prefetch
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_prefetch(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with batching
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_batching(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Here are the three plans, with a couple of comments after each – all three queries returned the same 30 *- 10 * 3) rows.


==============================
Nested loop join (traditional)
==============================

--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                        |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|*  2 |   TABLE ACCESS FULL                  | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     10 |      3 |     30 |00:00:00.01 |      45 |
|*  4 |    INDEX RANGE SCAN                  | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")

This is the original nested loop structured (apart from the “batched” option that appeared in 12c) and follows the rule/guideline:

  • Operation 2 operates once and returns the 10 rows predicted.
  • Operation 3 is started 10 times by operation 1, with a prediction of 3 rows per start – and the actual comes out at 30 rows.
  • Operation 4 is started 10 times (once for each start of operation 3), with a predication of 3 rowids per start – and the actual comes out at 30 rows
==============================
Nested loop join with prefetch
==============================

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |      3 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS                      |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL                | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN                 | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")
 

Again in the order of rowsource generation

  • Operation 3 starts once with a prediction of 10 rows and the rule works.
  • Operation 4 is started 10 times by operation 2, with a prediction of 3 rows (rowids) per start, and the rule works.
  • Operation 2 was started once by operation 1, with a predication of 30 rows (rowids), and the rule works.
  • Operation 1 starts once, but the prediction is reported as the value you would have got from the original NLJ shape – and breaks the rule.
==============================
Nested loop join with batching
==============================

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS               |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN          | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2    |     30 |      3 |     30 |00:00:00.01 |      30 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")


In the order in which rowsources are created

  • Operation 3 starts once with a prediction of 10 rows – and the A-rows matches the rule
  • Operation 4 is started 10 times by opreation 2, with a prediction of 3 rows per start – and the A-rows matches the rule.
  • Operation 5 is started 30 times by operation 1, with a prediction of 3 rows per start – again reporting the value that you would have seen from the original representation of the NLJ, the prediction obviously should be 1 – so the rule is broken again

tl;dr

It is important to remember that the basic rule of “A-rows = starts * E-rows” does not hold for the partition-related lines or the PX related lines of partitioned and parallel execution plans.

You may also find a few other cases where you need be a little cautious about trusting the rule without first thinking carefully about the mechanics of what the shape of the plan is telling you.

October 31, 2019

IOT Hash

Filed under: Execution plans,Hash Join,Infrastructure,IOT,Joins,Oracle,Troubleshooting — Jonathan Lewis @ 2:59 pm GMT Oct 31,2019

It’s another of my double-entendre titles. The optimizer can turn a hash join involving an index-organized table into a real performance disaster (though you may have to help it along the way by using a silly definition for your primary key columns). This post was inspired by a question posted on the Oracle Developer Community forum recently so the table and column names I’ve used in my model reflect (almost, with a few corrections) the names used in the post.

We start with a simple requirement expressed through the following SQL:


rem
rem     Script:         iot_hash.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

insert
        /*+
                qb_name(insert)
        */
into t_iot(
        id, inst_id, nr_time,
        o_time, st, null_col, svname
)
select
        /*+
                qb_name(main)
                unnest(@subq)
                leading(@sel$a93afaed apar@main ob@subq)
                use_hash(@sel$a93afaed ob@subq)
                swap_join_inputs(@sel$a93afaed ob@subq)
                index_ss_asc(@sel$a93afaed ob@subq (t_iot.st t_iot.inst_id t_iot.svname))a
        */
        apar.id,
        'UP',
        to_date('2019-10-24','yyyy-mm-dd'),
        to_date('1969-12-31','yyyy-mm-dd'),
        'IDLE',
        null,
        'tkt007.jj.bb.com'
from
        t_base apar
where
        apar.id not in (
                select
                        /*+
                                qb_name(subq)
                        */
                        id
                from
                        t_iot ob
                where
                        inst_id = 'UP'
        )
and     nvl(apar.gp_nm,'UA') = 'UA'
and     rownum <= 5000
/

The requirement is simple – insert into table t_iot a set of values dictated by a subset of the rows in table t_base if they do not already exist in t_iot. To model the issue that appeared I’ve had to hint the SQL to get the following plan (which I pulled from memory after enabling rowsource execution stats):


---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |             |      1 |        |   296 (100)|      0 |00:00:00.03 |     788 |    148 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL | T_IOT       |      1 |        |            |      0 |00:00:00.03 |     788 |    148 |       |       |          |
|*  2 |   COUNT STOPKEY          |             |      1 |        |            |    100 |00:00:00.03 |      99 |     91 |       |       |          |
|*  3 |    HASH JOIN RIGHT ANTI  |             |      1 |    100 |   296   (2)|    100 |00:00:00.03 |      99 |     91 |    14M|  1843K|   15M (0)|
|*  4 |     INDEX SKIP SCAN      | T_IOT_STATE |      1 |  12614 |   102   (0)|  10000 |00:00:00.01 |      92 |     91 |       |       |          |
|*  5 |     TABLE ACCESS FULL    | T_BASE      |      1 |    100 |     2   (0)|    100 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<=5000)
   3 - access("APAR"."ID"="ID")
   4 - access("INST_ID"='UP')
       filter("INST_ID"='UP')
   5 - filter(NVL("APAR"."GP_NM",'UA')='UA')

The optimizer has unnested (as hinted) the subquery and converted it to an anti-join using a right hash anti-join. Take a look at the Used-mem for the hash join – would it surprise you to learn that the total size of the (not compressed in any way) IOT, and all its indexes, and the t_base table together total less than 4 MB. Something dramatically awful has happened in the hash join to generated a requirement of 14MB. (In the case of the OP this appeared as an unexpected 5GB written to the temporary tablespace.)

Before I address the source of the high memory usage, take a close look at the Predicate Information, particularly operation 3, and ask yourself what the definition of index t_iot_state might be. The predicate joins t_base.id to t_iot.id, and here’s the code to create both tables and all the indexes.

create table t_iot (
        nr_time         timestamp,
        id              varchar2(1024),
        inst_id         varchar2(200),
        o_time          timestamp,
        st              varchar2(200),
        null_col        varchar2(100),
        svname          varchar2(200),
        constraint t_iot_pk primary key(nr_time, id, inst_id)
)
organization index
/

insert into t_iot
select
        sysdate,
        dbms_random.string('l',10),
        'UP',
        sysdate,
        'IDLE',
        null,
        rpad('x',25,'x')
from
        all_objects
where
        rownum <= 1e4 -- > hint to avoid wordpress format issue
/

create index t_iot_state on t_iot(st, inst_id, svname); 
create index idx2        on t_iot(id, inst_id, svname);

create table t_base(
        id              varchar2(400) not null,
        gp_nm           varchar2(200)
)
/

insert into t_base
select
        dbms_random.string('l',10),
        'UA'
from
        all_objects
where
        rownum <= 100 -- > hint to avoid wordpress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 't_iot',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 't_base',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/


The index t_iot_state that Oracle has used in the hash join is defined on the columns (st, inst_id, svname) – so the predicate is doing a comparison with a column that’s not in the index! At least, it’s not visibly declared in the index; but this is a secondary index on an IOT, and IOTs don’t have “normal” rowids, the rowid in a secondary index is the value of the primary key (plus a “block guess”). So the columns in the index (even though not declared in the index) are: (st, inst_id, svname, {nr_time, id, inst_id, blockguess}). So this index does supply the required id column.

Side note: you’ll see in the list of columns above that inst_id appears twice. In fact (since Oracle 9, I think) the code to handle secondary indexes has been smart enough to avoid this duplication. If the secondary index contains columns from the primary key then the “rowid” doesn’t store those columns, the code knows how to construct the primaryh key value from the stored PK columns combined with the needed columns from the index entry. This can make IOTs a very nice choice of implementation for “intersection” tables that are used to represent many-to-many joins between two other tables.

Unfortunately this “rowid” is the explanation for the massive memory demand. Take a look at the “Column Projection Information” for my execution plan:


Column Projection Information (identified by operation id):
-----------------------------------------------------------
   2 - "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200], ROWNUM[8]
   3 - (#keys=1) "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200]
   4 - "OB".ROWID[ROWID,1249], "NR_TIME"[TIMESTAMP,11], "ID"[VARCHAR2,1024], "INST_ID"[VARCHAR2,200], "OB".ROWID[ROWID,1249]
   5 - "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200]

The interesting line is operation 4. A hash join takes the rowsource from its first child (the build table) and creates an in-memory hash table (which may spill to disc, of course), so if I see an unreasonable memory allocation (or unexpected spill to disc) a good starting point is to look at what the first child is supplying. In this case the first child seems to be saying that it’s supplying (or allowing for) nearly 3,700 bytes to be passed up to the hash join.

On closer inspection we can see it’s reporting the “rowid” twice, and also reporting the three columns that make up that rowid. I think it’s reasonable to assume that it’s only supplying the rowid once, and maybe it’s not even supplying the other three columns because they are embedded in the rowid. Doing a quick arithmetic check, let’s multiply the size of the rowid by the value of A-rows: 1,249 * 10,000 = 12,490,000. That’s pretty close to the 14MB reported by the hash join in operation 3.

Hypothesis – to get at the id column, Oracle has used this index (actually a very bad choice of those available) to extract the rowid and then passed the rowid up to the parent in a (length padded) fixed format. Oracle has then created a hash table by extracting the id column from the rowid and building the hash table on it but also carrying the length-padded rowid into the hash table.  Possible variants on this theme are that some or all of the other columns in the Column Projection Information are also passed upwards so that the id doesn’t have to be extracted, but if they are they are not padded to their maximum length.

A simple test that this is broadly the right assumption is to re-run the model making the declared length of the rowid much larger to see what happens to the memory allocation. Changing the inst_id declaration from 200 bytes to 1000 bytes (note the stored value is only the 2 bytes needed for the value ‘UP’) the Used-mem jumps to 23 MB (which is an increment very similar to 800 * 10,000).  You’ll note that I chose to experiment with a column that wasn’t the column used in the join. It was a column in the secondary index definition, though, so another test would be to change the nr_time column from a timestamp (11 bytes) to a large varchar2, so I re-ran the test declaring the nr_time as a varchar2(1000) – reverting the inst_id to varchar2(200) – and the Used-mem increased to 25MB.

Preliminary Conclusion

If Oracle uses the contents of the rowid of a secondary index on an IOT in a join then it constructs a fixed format version for the rowid by padding every column in the primary key to its maximum length and concatenating the results. This can have catastrophic side effects on performance if you’ve declared some very long columns “just in case”. Any time you use index organized tables you should remember to check the Column Projection Information in any execution plans that use secondary indexes in case they are passing a large, padded, primary key through the plan to a point where a blocking operation (such as a hash join or merge join) has to accumulate a large number of rows.

Footnote

In my test case I had to hint the query heavily to force Oracle into the path I wanted to demonstrate.

It’s surprising that the optimizer should have chosen this path in the OP’s system, given that there’s another secondary index that contains the necessary columns in its definition. (So one thought is that there’s a statistics problem to address, or possibly the “good” index is subject to updates that make it become very inefficient (large) very quickly.)

Another oddity of the OP’s system was that Oracle should have chosen to do a right hash anti-join when it looked as if joining the tables in the opposite order would produce a much smaller memory demand and lower cost – there was an explict swap_join_inputs() hint in the Outline Information (so copying the outline into the query and changing that to no_swap_join_inputs() might have been abother viable workaround.) In the end the OP hinted the query to use a nested loop (anti-)join from t_base to t_iot – which is another way to avoid the hash table threat with padded rowids.

 

October 22, 2019

ANSI Plans

Filed under: CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 6:59 pm BST Oct 22,2019

Here’s a thought that falls somewhere between philosophical and pragmatic. It came up while I was playing around with a problem from the Oracle database forum that was asking about options for rewriting a query with a certain type of predicate. This note isn’t really about that question but the OP supplied a convenient script to demonstrate their requirement and I’ve hi-jacked most of the code for my own purposes so that I can ask the question:

Should the presence of an intermediate view name generated by the optimizer in the course of cost-based query transformation cause two plans, which are otherwise identical and do exactly the same thing, to have different plan hash values ?

To demonstrate the issue let’s start with a simple script to create some data and generate an execution plan.


rem
rem     Script:         or_expand_plans.sql
rem     Author:         Jonathan Lewis
rem     Dated           Oct 2019
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Notes:
rem     Data creation copied from ODC
rem     https://community.oracle.com/thread/4297365
rem

create table t as   
select 1 as id, to_date('2019-10-11', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
union all  
select 2 as id, to_date('2019-10-12', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
union all  
select 3 as id, to_date('2019-10-15', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
union all  
select 4 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
;  

create table t_others as  
select 1 as id, to_date('2019-10-12', 'YYYY-MM-DD') as lastupdated, 'Blue'  as color, 'Zone 7' as zoneid from dual  
union all  
select 2 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'Red'   as color, 'Zone 7' as zoneid from dual  
union all  
select 3 as id, to_date('2019-10-16', 'YYYY-MM-DD') as lastupdated, 'White' as color, 'Zone 7' as zoneid from dual  
union all  
select 4 as id, to_date('2019-10-17', 'YYYY-MM-DD') as lastupdated, 'Green' as color, 'Zone 7' as zoneid from dual  
;  

create table t_further_info as  
select 1 as id, to_date('2019-10-12', 'YYYY-MM-DD') as lastupdated, 'A' as typeinfo from dual  
union all   
select 2 as id, to_date('2019-10-14', 'YYYY-MM-DD') as lastupdated, 'C' as typeinfo from dual  
union all  
select 3 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'D' as typeinfo from dual  
union all  
select 4 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'E' as typeinfo from dual  
;  

prompt  ====================
prompt  "Traditional" syntax
prompt  ====================

explain plan for
select 
        /*+ or_expand */ 
        * 
from
        t,
        t_others        pt,
        t_further_info  fi  
/*
where   (
             t.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')   
         or pt.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
         or fi.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
        )  
*/
where   to_date('2019-10-21', 'YYYY-MM-DD') <= any(t.lastupdated, pt.lastupdated, fi.lastupdated)   
and     pt.id = t.id  
and     fi.id = t.id  
;

select * from table(dbms_xplan.display(null,null,'outline'));


You’ll see that I have a simple three-table join with the nasty little detail that I have a “non-join” predicates that may require Oracle to check across all three tables before it can decide whether or not a row should be discarded. I’ve shown two variants on a theme – they both have exactly the same effect but the ANY() presentation is just a little bit neater and more compact.

Essentially Oracle can use one of two strategies for this type of query/predicate; the first is to join all three tables and wait until the final join rowsource appears and then apply the check, or it can split the query into a union all of three separate queries where each query drives off a different table selecting only the rows from that table that match “its” part of the predicate.

In the latter case the second and third branches of the union all have to be modified to ensure that they discard any rows already returned by preceding parts of the union all; this can mean lots of new predicates appearing that use the lnnvl() function. (Of course there are variants between these two extremes, but the general principle is unchanged.)

The presence of the (incomplete) /*+ or_expand */ hint in my query is there to tell the optimizer that it should attempt to transform the query into the three-part union all. This, by the way, is a 12c feature, though older versions of Oracle could get similar effects in some cases from the /*+ use_concat */ hint. Here’s the plan, with outline,  I got from 12.2.0.1:


Plan hash value: 3181357500

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                    |     3 |   270 |    17   (6)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_67EF6547    |     3 |   270 |    17   (6)| 00:00:01 |
|   2 |   UNION-ALL             |                    |       |       |            |          |
|*  3 |    HASH JOIN            |                    |     1 |    55 |     6   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN|                    |     4 |   144 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T_FURTHER_INFO     |     1 |    13 |     2   (0)| 00:00:01 |
|   6 |      BUFFER SORT        |                    |     3 |    69 |     2   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL | T_OTHERS           |     3 |    69 |     2   (0)| 00:00:01 |
|*  8 |     TABLE ACCESS FULL   | T                  |     3 |    57 |     2   (0)| 00:00:01 |
|*  9 |    HASH JOIN            |                    |     2 |   182 |    11  (10)| 00:00:01 |
|  10 |     VIEW                | VW_JF_SET$FB5125FC |     2 |   156 |     9  (12)| 00:00:01 |
|  11 |      UNION-ALL          |                    |       |       |            |          |
|* 12 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 13 |        TABLE ACCESS FULL| T_OTHERS           |     1 |    23 |     2   (0)| 00:00:01 |
|* 14 |        TABLE ACCESS FULL| T                  |     3 |    57 |     2   (0)| 00:00:01 |
|* 15 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 16 |        TABLE ACCESS FULL| T                  |     1 |    19 |     2   (0)| 00:00:01 |
|  17 |        TABLE ACCESS FULL| T_OTHERS           |     4 |    92 |     2   (0)| 00:00:01 |
|  18 |     TABLE ACCESS FULL   | T_FURTHER_INFO     |     4 |    52 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SEL$7C4216F7" "PT"@"SEL$1")
      LEADING(@"SEL$7C4216F7" "T"@"SEL$1" "PT"@"SEL$1")
      FULL(@"SEL$7C4216F7" "PT"@"SEL$1")
      FULL(@"SEL$7C4216F7" "T"@"SEL$1")
      USE_HASH(@"SEL$A4A33BE0" "T"@"SEL$1")
      LEADING(@"SEL$A4A33BE0" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SEL$A4A33BE0" "T"@"SEL$1")
      FULL(@"SEL$A4A33BE0" "PT"@"SEL$1")
      USE_HASH(@"SET$49E1C21B_3" "T"@"SEL$1")
      USE_MERGE_CARTESIAN(@"SET$49E1C21B_3" "PT"@"SEL$1")
      LEADING(@"SET$49E1C21B_3" "FI"@"SEL$1" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SET$49E1C21B_3" "T"@"SEL$1")
      FULL(@"SET$49E1C21B_3" "PT"@"SEL$1")
      FULL(@"SET$49E1C21B_3" "FI"@"SEL$1")
      USE_HASH(@"SEL$5FCD2D3C" "FI"@"SEL$1")
      LEADING(@"SEL$5FCD2D3C" "VW_JF_SET$FB5125FC"@"SEL$81DF0931" "FI"@"SEL$1")
      FULL(@"SEL$5FCD2D3C" "FI"@"SEL$1")
      NO_ACCESS(@"SEL$5FCD2D3C" "VW_JF_SET$FB5125FC"@"SEL$81DF0931")
      NO_ACCESS(@"SEL$67EF6547" "VW_ORE_67EF6547"@"SEL$67EF6547")
      OUTLINE(@"SET$49E1C21B_2")
      OUTLINE(@"SET$49E1C21B_1")
      OUTLINE(@"SEL$1")
      FACTORIZE_JOIN(@"SET$49E1C21B"("FI"@"SET$49E1C21B_2" "FI"@"SET$49E1C21B_1"))
      OUTLINE(@"SET$0E101D56")
      OUTLINE(@"SEL$81DF0931")
      OUTLINE(@"SEL$5AB42CD1")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE(@"SET$49E1C21B")
      OUTLINE_LEAF(@"SEL$67EF6547")
      FACTORIZE_JOIN(@"SET$49E1C21B"("FI"@"SET$49E1C21B_2" "FI"@"SET$49E1C21B_1"))
      OUTLINE_LEAF(@"SET$0E101D56")
      OUTLINE_LEAF(@"SEL$5FCD2D3C")
      OUTLINE_LEAF(@"SET$FB5125FC")
      OUTLINE_LEAF(@"SEL$A4A33BE0")
      OUTLINE_LEAF(@"SEL$7C4216F7")
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      ALL_ROWS
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PT"."ID"="T"."ID" AND "FI"."ID"="T"."ID")
   5 - filter("FI"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   7 - filter(LNNVL("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   8 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   9 - access("FI"."ID"="ITEM_1")
  12 - access("PT"."ID"="T"."ID")
  13 - filter("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  14 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
  15 - access("PT"."ID"="T"."ID")
  16 - filter("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


This is a wonderful demonstration of how brilliant the optimizer can be.  The query has gone through several transformations and two of them have very high visibility. First, you can see the name VW_ORE_67EF6547 at operation 1. This is a view name that Oracle generates to express (cost-based) OR-Expansion” so the optimizer has clearly obeyed my hint. As a consequence of OR-expansion we can also see several examples of the lnnvl() function appearing in the Predicate Information section of the output.; we can also see the hint re-appearing in the completed form of OR_EXPAND(@”SEL$1″ (1) (2) (3)) in the Outline Data.

However, we don’t have the union all of three pieces that we might have expected; we have a union all of two pieces and the second piece is a hash join between the table t_further_info and a view called VW_JF_SET$FB5125FC. This view is the result of “join factorization”. The optimizer has taken the 2nd and 3rd sections of our union all view and decided that it would be cost-effective to “factor out” a common table, so this:

select from t_others, t, t_further_info ... where t_others.date_predicate ...
union all
select from t, t_others, t_further_info ... where t.date_predicate ...

changes to this:

select from 
        (
        select from t_others, t   ... where t_others.date_predicate ...
        union all
        select from t, t_others   ... where t.date_predicate ...
        ),
        t_further_info 
where   ...

Having said all that, I now have to change the code because the original query was written using “ANSI” style joins – like this:


explain plan for
select
        /*+ or_expand */
        * 
from
        t   
inner join 
        t_others       pt  
on      pt.id = t.id  
inner join 
        t_further_info fi  
on      fi.id = t.id  
where
        (
             t.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')   
         or pt.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
         or fi.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
        )
;

select  * from table(dbms_xplan.display(null,null,'outline'));


In the ANSI example I happen to have used the explicit “OR” list for the date predicates but that’s not really signficant . Here’s the plan produced by this query – and the first thing I’d like you to note is the Plan hash value:

Plan hash value: 3309788271

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                    |     3 |   270 |    17   (6)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_31069B60    |     3 |   270 |    17   (6)| 00:00:01 |
|   2 |   UNION-ALL             |                    |       |       |            |          |
|*  3 |    HASH JOIN            |                    |     1 |    55 |     6   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN|                    |     4 |   144 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T_FURTHER_INFO     |     1 |    13 |     2   (0)| 00:00:01 |
|   6 |      BUFFER SORT        |                    |     3 |    69 |     2   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL | T_OTHERS           |     3 |    69 |     2   (0)| 00:00:01 |
|*  8 |     TABLE ACCESS FULL   | T                  |     3 |    57 |     2   (0)| 00:00:01 |
|*  9 |    HASH JOIN            |                    |     2 |   182 |    11  (10)| 00:00:01 |
|  10 |     VIEW                | VW_JF_SET$997549B1 |     2 |   156 |     9  (12)| 00:00:01 |
|  11 |      UNION-ALL          |                    |       |       |            |          |
|* 12 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 13 |        TABLE ACCESS FULL| T_OTHERS           |     1 |    23 |     2   (0)| 00:00:01 |
|* 14 |        TABLE ACCESS FULL| T                  |     3 |    57 |     2   (0)| 00:00:01 |
|* 15 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 16 |        TABLE ACCESS FULL| T                  |     1 |    19 |     2   (0)| 00:00:01 |
|  17 |        TABLE ACCESS FULL| T_OTHERS           |     4 |    92 |     2   (0)| 00:00:01 |
|  18 |     TABLE ACCESS FULL   | T_FURTHER_INFO     |     4 |    52 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SEL$D12FC97A" "PT"@"SEL$1")
      LEADING(@"SEL$D12FC97A" "T"@"SEL$1" "PT"@"SEL$1")
      FULL(@"SEL$D12FC97A" "PT"@"SEL$1")
      FULL(@"SEL$D12FC97A" "T"@"SEL$1")
      USE_HASH(@"SEL$09C9729D" "T"@"SEL$1")
      LEADING(@"SEL$09C9729D" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SEL$09C9729D" "T"@"SEL$1")
      FULL(@"SEL$09C9729D" "PT"@"SEL$1")
      USE_HASH(@"SET$E8D85587_3" "T"@"SEL$1")
      USE_MERGE_CARTESIAN(@"SET$E8D85587_3" "PT"@"SEL$1")
      LEADING(@"SET$E8D85587_3" "FI"@"SEL$2" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SET$E8D85587_3" "T"@"SEL$1")
      FULL(@"SET$E8D85587_3" "PT"@"SEL$1")
      FULL(@"SET$E8D85587_3" "FI"@"SEL$2")
      USE_HASH(@"SEL$95B99BAF" "FI"@"SEL$2")
      LEADING(@"SEL$95B99BAF" "VW_JF_SET$997549B1"@"SEL$BB7F1ECF" "FI"@"SEL$2")
      FULL(@"SEL$95B99BAF" "FI"@"SEL$2")
      NO_ACCESS(@"SEL$95B99BAF" "VW_JF_SET$997549B1"@"SEL$BB7F1ECF")
      NO_ACCESS(@"SEL$31069B60" "VW_ORE_31069B60"@"SEL$31069B60")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$58A6D7F6")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SET$E8D85587_2")
      OUTLINE(@"SET$E8D85587_1")
      MERGE(@"SEL$58A6D7F6" >"SEL$3")
      OUTLINE(@"SEL$9E43CB6E")
      FACTORIZE_JOIN(@"SET$E8D85587"("FI"@"SET$E8D85587_2" "FI"@"SET$E8D85587_1"))
      OUTLINE(@"SET$6117B24C")
      OUTLINE(@"SEL$BB7F1ECF")
      OUTLINE(@"SEL$344003E3")
      OR_EXPAND(@"SEL$9E43CB6E" (1) (2) (3))
      OUTLINE(@"SET$E8D85587")
      OUTLINE_LEAF(@"SEL$31069B60")
      FACTORIZE_JOIN(@"SET$E8D85587"("FI"@"SET$E8D85587_2" "FI"@"SET$E8D85587_1"))
      OUTLINE_LEAF(@"SET$6117B24C")
      OUTLINE_LEAF(@"SEL$95B99BAF")
      OUTLINE_LEAF(@"SET$997549B1")
      OUTLINE_LEAF(@"SEL$09C9729D")
      OUTLINE_LEAF(@"SEL$D12FC97A")
      OUTLINE_LEAF(@"SET$E8D85587_3")
      ALL_ROWS
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("FI"."ID"="T"."ID" AND "PT"."ID"="T"."ID")
   5 - filter("FI"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   7 - filter(LNNVL("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   8 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   9 - access("FI"."ID"="ITEM_1")
  12 - access("PT"."ID"="T"."ID")
  13 - filter("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  14 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
  15 - access("PT"."ID"="T"."ID")
  16 - filter("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


Is the plan for the “ANSI” version of the query the same as the plan for the “traditional” version? How carefully have you checked – after simply noting that the two Plan hash values were different.

The plans are the same – in that the mechanics are exactly the same and that ought to be the criterion on which we should judge them. But the hash values are different because of the change from traditional to ANSI syntax. The traditional form of the query starts life with a single query block while the ANSI form is considered to be two query blocks, so the initial ANSI query goes through three stages:


1) select from t1 join t2 join t3

2) select /*+ qb_name(sel$2) */ from (select /* qb_name(sel$1) */ from t1, t2), t3

3) select /*+ qb_name(sel$9E43CB6E) */ from t1, t2, t3

So the query is rapidly transformed to the equivalent traditional syntax but we  now have a query block name of SEL$9E43CB6E instead of SEL$1 that the traditional query (in the absence of a /*+ qb_name() */ hint would have had. This is why you see the difference in the two or_expand() hints in the Outline Data section. One reads: OR_EXPAND(@SEL$1 (1) (2) (3)), the other reads OR_EXPAND(@”SEL$9E43CB6E” (1) (2) (3)), and all the subseqent query block name differences follow on from this initial transformation. (Note: the value “sel$9e43cb6e” is derived from the input query block names of sel$1 and sel$2 that the new query block is derived from)

You may decide that this mismatch isn’t really terribly important. If you’ve modified the code to switch to ANSI style joins then you may be prepared to put in a little extra effort to check the plan in detail to see that it hasn’t changed; but it would be a lot nicer if the hash value wasn’t dependent on generated view names. You may recall that at one time Oracle had problems with plans that used materialized CTEs (“with” subqueries) because the plan hash value was affected by object names like sys_temp_0fd9d6791_dfc12da. The same principle ought, I think, to apply here.

If you don’t mind the ANSI/tradiational switch though, you might find that you’re less happy when you upgrade to 19c, because the same effect appears there too, only it’s worse. Not only do “identical” traditional and ANSI plans have different hash values, they don’t match the values from 12c because the generated name for the join factorization views (VW_JF) change in the upgrade. So if you’re depending on SQL Plan Baselines to reproduce 12c plans on 19c when you upgrade you may find cases where you know the stored baseline is giving you the same plan but Oracle thinks it isn’t and refuses to use it.

tl;dr

Plans which are functionally identical can have different plan hash values because the plans were reached through a different series of tranformations. In particular if you rewrite queries from “traditional” Oracle syntax to “ANSI” syntax you will find cases where the plan doesn’t change but the plan hash value does thanks to a change in the names of views generated by some transformations.

More significantly, if you upgrade from 12c to 19c there are case where the names of views generated by transformations may change, which could cause the optimizer to discard some of your carefully constructed SQL Plan Baselines as “not reproducible”.

Footnote

For reference, if I add the hint /*+ opt_param(‘_optimizer_join_factorization’ ‘false’) */ to the “traditional query then I get the following plan which shows more clearly the three branches that the original query has been split into – each section starting with a different choice for the driving table:


-------------------------------------------------------------------------------------------
| Id  | Operation               | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                 |     3 |   270 |    19   (6)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_67EF6547 |     3 |   270 |    19   (6)| 00:00:01 |
|   2 |   UNION-ALL             |                 |       |       |            |          |
|*  3 |    HASH JOIN            |                 |     1 |    55 |     7  (15)| 00:00:01 |
|*  4 |     HASH JOIN           |                 |     1 |    32 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T               |     1 |    19 |     2   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL  | T_FURTHER_INFO  |     4 |    52 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL   | T_OTHERS        |     4 |    92 |     2   (0)| 00:00:01 |
|*  8 |    HASH JOIN            |                 |     1 |    55 |     6   (0)| 00:00:01 |
|   9 |     MERGE JOIN CARTESIAN|                 |     4 |   144 |     4   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS FULL  | T_OTHERS        |     1 |    23 |     2   (0)| 00:00:01 |
|  11 |      BUFFER SORT        |                 |     4 |    52 |     2   (0)| 00:00:01 |
|  12 |       TABLE ACCESS FULL | T_FURTHER_INFO  |     4 |    52 |     2   (0)| 00:00:01 |
|* 13 |     TABLE ACCESS FULL   | T               |     3 |    57 |     2   (0)| 00:00:01 |
|* 14 |    HASH JOIN            |                 |     1 |    55 |     6   (0)| 00:00:01 |
|  15 |     MERGE JOIN CARTESIAN|                 |     4 |   144 |     4   (0)| 00:00:01 |
|* 16 |      TABLE ACCESS FULL  | T_FURTHER_INFO  |     1 |    13 |     2   (0)| 00:00:01 |
|  17 |      BUFFER SORT        |                 |     3 |    69 |     2   (0)| 00:00:01 |
|* 18 |       TABLE ACCESS FULL | T_OTHERS        |     3 |    69 |     2   (0)| 00:00:01 |
|* 19 |     TABLE ACCESS FULL   | T               |     3 |    57 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SET$BB614FD2_3" "T"@"SET$BB614FD2_3")
      USE_MERGE_CARTESIAN(@"SET$BB614FD2_3" "PT"@"SET$BB614FD2_3")
      LEADING(@"SET$BB614FD2_3" "FI"@"SET$BB614FD2_3" "PT"@"SET$BB614FD2_3"
              "T"@"SET$BB614FD2_3")
      FULL(@"SET$BB614FD2_3" "T"@"SET$BB614FD2_3")
      FULL(@"SET$BB614FD2_3" "PT"@"SET$BB614FD2_3")
      FULL(@"SET$BB614FD2_3" "FI"@"SET$BB614FD2_3")
      USE_HASH(@"SET$BB614FD2_2" "T"@"SET$BB614FD2_2")
      USE_MERGE_CARTESIAN(@"SET$BB614FD2_2" "FI"@"SET$BB614FD2_2")
      LEADING(@"SET$BB614FD2_2" "PT"@"SET$BB614FD2_2" "FI"@"SET$BB614FD2_2"
              "T"@"SET$BB614FD2_2")
      FULL(@"SET$BB614FD2_2" "T"@"SET$BB614FD2_2")
      FULL(@"SET$BB614FD2_2" "FI"@"SET$BB614FD2_2")
      FULL(@"SET$BB614FD2_2" "PT"@"SET$BB614FD2_2")
      USE_HASH(@"SET$BB614FD2_1" "PT"@"SET$BB614FD2_1")
      USE_HASH(@"SET$BB614FD2_1" "FI"@"SET$BB614FD2_1")
      LEADING(@"SET$BB614FD2_1" "T"@"SET$BB614FD2_1" "FI"@"SET$BB614FD2_1"
              "PT"@"SET$BB614FD2_1")
      FULL(@"SET$BB614FD2_1" "PT"@"SET$BB614FD2_1")
      FULL(@"SET$BB614FD2_1" "FI"@"SET$BB614FD2_1")
      FULL(@"SET$BB614FD2_1" "T"@"SET$BB614FD2_1")
      NO_ACCESS(@"SEL$49E1C21B" "VW_ORE_67EF6547"@"SEL$67EF6547")
      OUTLINE(@"SEL$1")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$49E1C21B")
      OUTLINE_LEAF(@"SET$BB614FD2")
      OUTLINE_LEAF(@"SET$BB614FD2_1")
      OUTLINE_LEAF(@"SET$BB614FD2_2")
      OUTLINE_LEAF(@"SET$BB614FD2_3")
      ALL_ROWS
      OPT_PARAM('_optimizer_join_factorization' 'false')
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PT"."ID"="T"."ID")
   4 - access("FI"."ID"="T"."ID")
   5 - filter("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   8 - access("PT"."ID"="T"."ID" AND "FI"."ID"="T"."ID")
  10 - filter("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  13 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
  14 - access("PT"."ID"="T"."ID" AND "FI"."ID"="T"."ID")
  16 - filter("FI"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  18 - filter(LNNVL("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
  19 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))

88 rows selected.

Although the “traditional” and “ANSI” plans still show a difference in their Plan hash values when join factorization is blocked, the absence of the join factorization view means that the plan hash values are now consistent between 12c to 19c (the output above came from 19.3.0.0 as you can see in the Outline information).

October 11, 2019

v$session

Filed under: Execution plans,Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:29 pm BST Oct 11,2019

Here’s an odd, and unpleasant, detail about querying v$session in the “most obvious” way. (And if you were wondering what made me resurrect and complete a draft on “my session id” a couple of days ago, this posting is the reason). Specifically if you want to select some information for your own session from v$session the query you’re likely to use in any recent version of Oracle will probably be of the form:


select {list for columns} from v$session where sid = to_number(sys_context('userenv','sid'));

Unfortunately that one little statement hides two anomalies – which you can see in the execution plan. Here’s a demonstration cut from an SQL*Plus session running under 19.3.0.0:


SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  gcfrzq9knynj3, child number 0
-------------------------------------
select program from V$session where sid = sys_context('userenv','sid')

Plan hash value: 2422122865

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    33 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FULL       | X$KSLWT         |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    21 |     0   (0)|
|*  6 |    FIXED TABLE FULL       | X$KSUSE         |     1 |    21 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------ -------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid')))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter((BITAND("S"."KSUSEFLG",1)<>0 AND BITAND("S"."KSSPAFLG",1)<>0 AND 
              "S"."INDX"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))
              AND INTERNAL_FUNCTION("S"."CON_ID") AND "S"."INST_ID"=USERENV('INSTANCE')))

As you can see, v$session is a join of 3 separate structures – x$kslwt (v$session_wait), x$ksled (v$event_name), and x$ksuse (the original v$session as it was some time around 8i), and the plan shows two “full tablescans” and a Cartesian merge join. Tablescans and Cartesian merge joins are not necessarily bad – especially where small tables and tiny numbers of rows are concerned – but they do merit at least a brief glance.

x$ksuse is a C structure in the fixed SGA and that structure is a segmented array (which seems to be chunks of 126 entries in 19.3, and chunks of 209 entries in 12.2 – but that’s fairly irrelevant). The SID is simply the index into the array counting from 1, so if you have a query with a predicate like ‘SID = 99’ Oracle can work out the address of the 99th entry in the array and access it very quickly – which is why the SID column is reported as a “fixed index” column in the view v$indexed_fixed_column.

But we have two problems immediately visible:

  1. the optimizer is not using the “index” to access x$ksuse despite the fact that we’re giving it exactly the value we want to use (and we can see a suitable predicate at operation 6 in the plan)
  2. the optimizer has decided to start executing the query at the x$kslwt table

Before looking at why thing’s have gone wrong, let’s check the execution plan to see what would have happened if we’d copied the value from the sys_context() call into a bind variable and queried using the bind variable – which we’ll keep as a character type to make it a fair comparison:

SQL_ID  cm3ub1tctpdyt, child number 0
-------------------------------------
select program from v$session where sid = to_number(:v1)

Plan hash value: 1627146547

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    20 |     0   (0)|
|*  6 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(:V1))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter(("S"."INDX"=TO_NUMBER(:V1) AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))


When we have a (character) bind variable instead of a sys_context() value the optimizer manages to use the “fixed indexes”., but it’s still started executing at x$kslwt, and still doing a Cartesian merge join. The plan would be the same if the bind variable were a numeric type, and we’d still get the same plan if we replaced the bind variable with a literal number.

So problem number 1 is that Oracle only seems able to use the fixed index path for literal values and simple bind variables (plus a few “simple” functions). It doesn’t seem to use the fixed indexes for most functions (even deterministic ones) returning a value and the sys_context() function is a particular example of this.

Transitivity

Problem number 2 comes from a side-effect of something that I first described about 13 years ago – transitive closure. Take a look at the predicates in both the execution plans above. Where’s the join condition between x$ksuse and x$kslwt ? There should be one, because the underlying SQL defining [g]v$session  has the following joins:

from
      x$ksuse s,
      x$ksled e,
      x$kslwt w
where
      bitand(s.ksspaflg,1)!=0
and   bitand(s.ksuseflg,1)!=0
and   s.indx=w.kslwtsid       -- this is the SID column for v$session and v$session_wait
and   w.kslwtevt=e.indx
 

What’s happened here is that the optimizer has used transitive closure: “if a = b and b = c then a = c” to clone the predicate “s.indx = to_number(sys_context(…))” to “w.kslwtsid = to_number(sys_context(…))”. But at the same time the optmizer has eliminated the predicate “s.indx = w.kslwtsid”, which it shouldn’t do because this is 12.2.0.1 and ever since 10g we’ve had the parameter _optimizer_transitivity_retain = true — but SYS is ignoring the parameter setting.

So we no longer have a join condition between x$ksuse and x$kslwt – which means there has to be a cartesian merge join between them and the only question is whether this should take place before or after the join between x$kslwt and x$ksled. In fact, the order doesn’t really matter because there will be only one row identified in x$kslwt and one row in x$ksuse, and the join to x$ksled is simply a lookup (by undeclarable unique key) to translate an id into a name and it will take place only once whatever we do about the other two structures.

But there is a catch – especially if your sessions parameter is 25,000 (which it shouldn’t be) and the number of connected sessions is currently 20,000 (which it shouldn’t be) – the predicate against x$ksuse does a huge amount of work as it walks the entire array testing every row (and it doesn’t even do the indx test first – it does a couple of bitand() operations). Even then this wouldn’t be a disaster – we’re only talking a couple of hundredths of a second of CPU – until you find the applications that run this query a huge number of times.

We would prefer to avoid two full tablescans since the arrays could be quite large, and of the two it’s the tablescan of x$ksuse that is going to be the greater threat; so is there a way to bypass the threat?  Once we’ve identified the optimizer anomaly we’ve got a pointer to a solution. Transitivity is going wrong, so let’s attack the transitivity. Checking the hidden parameters we can find a parameter: _optimizer_generate_transitive_pred which defaults to true, so let’s set it to false for the query and check the plan:

select  /*+   opt_param('_optimizer_generate_transitive_pred','FALSE')
*/  program from  v$session where  sid = sys_context('userenv','sid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    28 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INDX"=TO_NUMBER(SYS_CONTEXT('user
              env','sid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


Although it’s not nice to insert hidden parameters into the optimizer activity we do have a result. We don’t have any filtering on x$kslwt – fortunately this seems to be limited in size (but see footnote) to the number of current sessions (unlike x$ksuse which has an array size defined by the sessions parameter or derived from the processes parameter). For each row in x$kslwt we do an access into x$ksuse using the “index” (note that we don’t see access predicates for the fixed tables, we just have to note the operation says FIXED INDEX and spot the “index-related” predicate in the filter predicate list), so this strategy has reduced the number of times we check the complex predicate on x$ksuse rows.

It’s still far from ideal, though. What we’d really like to do is access x$kslwt by index using the known value from sys_context(‘userenv’,’sid’). As it stands the path we get from using a hidden parameter which isn’t listed as legal for the opt_param() hint is a plan that we would get if we used an unhinted query that searched for audsid = sys_context(‘userenv’,’sessionid’).

SQL_ID  7f3f9b9f32u7z, child number 0
-------------------------------------
select  program from  v$session where  audsid =
sys_context('userenv','sessionid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     2 |    70 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     2 |    62 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    23 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."KSUUDSES"=TO_NUMBER(SYS_CONTEXT('
              userenv','sessionid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


The bottom line, then, seems to be that if you need a query by SID against v$session to be as efficient as possible then your best bet is to load a numeric variable with the sys_context(‘userenv’,’sid’) and then select where “sid = :bindvariable”.  Otherwise query by audsid, or use a hidden parameter to affect the optimizer.

Until the the SYS schema follows the _optimizer_transitivity_retain parameter or treats sys_context() the same way it treats a bind variable there is always going to be some unnecessary work when querying v$session and that excess will grow with either the number of connected sessions (if you optimize the query) or with the value of the sessions parameter.

Footnote

In (much) older versions of Oracle v$session_wait sat on top of x$ksusecst, which was part of the same C structure as x$ksuse. In newer versions of Oracle x$kslwt is a structure that is created on demand in the PGA/UGA – I hope that there’s a short cut that allows Oracle to find the waiting elements in x$ksuse[cst] efficiently, rather than requiring a walk through the whole thing, otherwise a tablescan of the (nominally smaller) x$kslwt structure will be at least as expensive as a tablescan of the x$ksuse structure.

Update (just a few minutes after posting)

Bob Bryla has pointed out in a tweet that there are many “bugs” not fixed until 19.1 for which the workaround is to set “_optimizer_transitivity_retain” to false. So maybe this isn’t an example of SYS doing something particularly strange – it may be part of a general reworking of the mechanism that still has a couple of undesirable side effects.

Bob’s comment prompted me to clone the x$ tables into real tables in a non-SYS schema and model the fixed indexes with primary keys, and I found that the resulting plan (though very efficient) still discarded the join predicate. So we may be seeing the side effects of a code enhancement relating to generating predicates that produce unique key access paths. (A “contrary” test, the one in the 2013 article I linked to, still retains the join predicate for the query that has non-unique indexes.)

 

September 28, 2019

Negative Offload

Filed under: Exadata,Execution plans,HCC,Oracle,Troubleshooting — Jonathan Lewis @ 5:38 pm BST Sep 28,2019

At the Trivadis Performance Days 2019 I did a presentation on using execution plans to understand what a query was doing. One of the examples I showed was a plan from an Exadata system (using 11.2.0.4) that needed to go faster. The plan was from the SQL Monitor report and all I want to show you is one line that’s reporting a tablescan. To fit the screen comfortably I’ve removed a number of columns from the output.

The report had been generated while the statement was still running (hence the “->” at the left hand edge) and the query had scanned 166 segments (with no partition elimination) of a table with 4,500 data segments (450 range partitions and 10 hash sub-partitions – note the design error, by the way, hash partitioning in Oracle should always hash for a powert of 2).


SQL Plan Monitoring Details (Plan Hash Value=3764612084)  
============================================================================================================================================
| Id   |           Operation            | Name  | Read  | Read  | Write | Write |   Cell   | Mem  | Activity |       Activity Detail       |  
|      |                                |       | Reqs  | Bytes | Reqs  | Bytes | Offload  |      |   (%)    |         (# samples)         |   
============================================================================================================================================
| -> 5 |      TABLE ACCESS STORAGE FULL | TXN   |  972K | 235GB |       |       | -203.03% |   7M |    63.43 | Cpu (1303)                  | 
|      |                                |       |       |       |       |       |          |      |          | cell smart table scan (175) | 
============================================================================================================================================

In the presentation I pointed out that for a “cell smart table scan” (note the Activity Detail colum) this line was using a surprisingly large amount of CPU.

We had been told that the table was using hybrid columnar compression (HCC) and had been given some figures that showed the compression factor was slightly better than 4. I had also pointed out that the typical size of a read request was 256KB. (Compare Read Reqs with Read Bytes)

To explain the excessive CPU I claimed that we were seeing “double decompression” – the cell was decompressing (uncompressing) compression units (CUs), finding that the resulting decompressed data was larger than the 1MB unit that Exadata allows and sending the original compressed CU to the database server where it was decompressed again – and the server side decompression was burning up the CPU.

This claim is (almost certainly) true – but the justification I gave for the claim was at best incomplete (though, to be brutally honest, I have to admit that I’d made a mistake): I pointed out that the Cell Offload was negative 200% and that this was what told us about the double decompression. While double decompression was probably happening the implication I had made was that a negative offload automatically indicated double decompression – and that’s was an incorrect assumption on my part. Fortunately Maurice Müller caught up with me after the session was over and pointed out the error then emailed me a link to a relevant article by Ahmed Aangour.

The Cell Offload is a measure of the difference between the volume of data read and the volume of data returned to the server. If the cell reads 256KB from disc, but the column and row selection means the cell returns 128KB the Cell Offload would be 50%; if the cell returns 64KB the Cell Offload would be 75% (100 * (1 – 64KB/256KB)). But what if you select all the rows and columns from a compressed table – the volume of data after decompression would be larger than the compressed volume the cell had read from disc – and in this case we knew that we were reading 256KB at a time and the compression factor was slightly greater than 4, so the uncompressed data would probably be around 1MB, giving us a Cell Offload of 100 * (1 – 1024KB / 256KB) = negative 300%

Key Point: Any time that decompression, combined with the row and column selection, produces more data than the volume of data read from disc the Cell Offload will go negative. A negative Cell Offload is not inherently a problem (though it might hint at a suboptimal use of compression).

Follow-up Analysis

Despite the error in my initial understanding the claim that we were seeing double decompression was still (almost certainly) true – but we need to be a little more sophisticated in the analysis. The clue is in the arithmetic a few lines further up the page. We can see that we are basically reading 256KB chunks of the table, and we know that 256KB will expand to roughly 1MB so we ought to see a Cell Offload of about -300%; but the Cell Offload is -200%. This suggests fairly strongly that on some of the reads the decompressed data is slightly less than 1MB, which allows the cell to return the decompressed data to the database server, while some of the time the decompressed data is greater than 1MB, forcing the cell to send the original (compressed) CU to the databsae server.

We may even be able work the arithmetic backwards to estimate the number of times double decompression appeared.  Assume that two-thirds of the time the cell decompressed the data and successfully sent (just less than) 1MB back to the database server and one-third of the time the cell decompressed the data and found that the result was too large and sent 256KB of compressed data back to the server, and let’s work with the 972,000 read requests reported to see what drops out of the arithmetic:

  • Total data read: 972,000 * 256KB = 243,000 MB
  • Data sent to db server:  648,000 * 1MB + 324,000 * 256KB = 729,000 MB
  • Cell Offload = 100 * (1 – 729/243) = -200%   Q.E.D.

Of course it would be nice to avoid guessing – and if we were able to check the session activity stats (v$sessstat) while the query was running (or after it had completed) we could pick up several numbers that confirmed our suspicion. For 11.2.0.4, for example, we would keep an eye on:

	cell CUs sent uncompressed
	cell CUs processed for uncompressed
	EHCC {class} CUs Decompressed

Differences between these stats allows you to work out the number of compression units that failed the 1MB test on the cell server and were sent to the database server to be decompressed. There is actually another statistic named “cell CUs sent compressed” which would make life easy for us, but I’ve not seen it populated in my tests – so maybe it doesn’t mean what it seems to say.

Here’s an example from an 11.2.0.4 system that I presented a few years ago showing some sample numbers.

cell CUs sent uncompressed              5,601
cell CUs processed for uncompressed     5,601

EHCC CUs Decompressed                  17,903
EHCC Query High CUs Decompressed       12,302 

This reveals an annoying feature of 11g (continued in 12.1) that results in double counting of the statistics, confusing the issue when you’re trying to analyze what’s going on. In this case the table consisted of 12,302 compression units, and the query was engineered to cause the performance problem to appear. The first two statistics show us how many CUs were decompressed successfully (we’ll see a change appearing there in 12.1). We then see that all 12,302 of the table’s “query high” compression units were decompressed – but the “total” of all CUs decompressed was 17.903.

It’s not a coincidence that 12,302 + 5,601 = 17,903; there’s some double counting going on. I don’t know how many of the statistics are affected in this way, but Oracle has counted the CUs that passsed decompression once as they were processed at the cell server and again as they arrived at the database server. In this example we can infer that 12,302 – 5,601 = 6,701 compression units failed decompression at the cell server and were sent to the database server in compressed form to be decompressed again.

Here’s a couple of sets of figures from some similar tests run on 12.1.0.2 – one with a table compressed to query high another compressed to query low. There is one critical difference from the 11g figures but the same double-counting seems to have happened. In both cases the “EHCC Query [Low|High] CUs Decompressed” show the correct number of CUs in each table. Note, though that the “cell CUs processed for uncompress” in 12.1 appear to report the number of attempted decompressions rather than 11g’s number of successful decompressions.


=========================

cell CUs sent uncompressed                     19,561	-- successful decompressions at cell server
cell CUs processed for uncompressed            19,564	=> 3 failures

EHCC CUs Decompressed                          39,125	=  2 * 19,561 successes + 3 db server decompression
EHCC Query High CUs Decompressed               19,564

=========================

cell CUs sent uncompressed                     80,037	-- successful decompressions at cell server
cell CUs processed for uncompressed            82,178	=> 2,141 failures

EHCC CUs Decompressed                         162,215	=  2 * 80,037 successes + 2,141 db server decompressions
EHCC Query Low CUs Decompressed                82,178

=========================

I’ve annotated the figures to explain the arithmetic.

There has been some significant renaming and separation of statistics in 12.2, as described in this post by Roger MacNicol, and the problems of double-counting should have disappeared. I haven’t yet tested my old models in the latest versions of Oracle, though, so can’t show you anyy figures to demonstrate the change.

Takeaways

There are 4 key points to note in this posting.

  • Hash (sub)partitioning should be based on powers of 2, otherwise some partitions will be twice size of others.
  • There is a 1MB limit on the “data packet” sent between the cell server and database server in Exadata.
  • If you select a large fraction of the rows and columns from an HCC compressed table you may end up decompressing a lot of your data twice if the decompressed data for a read request is larger than the 1MB unit (and the cost will be highly visible at the database server as CPU usage).
  • The Cell Offload figure for a tablescan (in particular) will go negative if the volume of data sent from the cell server to the database server is larger than the volume of data read from the disk- even if double decompression hasn’t been happening.

A little corollary to the third point: if you are writing to a staging table with the expectation of doing an unfiltered tablescan (or a select *), then you probably don’t want to use hybrid columnar compression on the table as you will probably end up using a lot of CPU at the database server to compress it, then do double-decompression using even more CPU on the database server.  It’s only if you really need to minimise disk usage and have lots of CPU capacity to spare that you have a case for using hybrid columnar compression for the table (and Oracle In-Memory features may also change the degree of desirability).

Footnote

I haven’t said anything about accessing table data by index when the table is subject to HCC compression. I haven’t tested the mechanism in recent versions of Oracle but it used to be the case that the cell server would supply the whole compression unit (CU) to the database server which would decompress it to construct the relevant row. One side effect of this was that the same CU could be decompressed (with a high CPU load) many times in the course of a single query.

 

August 23, 2019

Optimizer Tricks 1

Filed under: CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 12:39 pm BST Aug 23,2019

I’ve got a number of examples of clever little tricks the optimizer can do to transform your SQL before starting in on the arithmetic of optimisation. I was prompted to publish this one by a recent thread on ODC. It’s worth taking note of these tricks when you spot one as a background knowledge of what’s possible makes it much easier to interpret and trouble-shoot from execution plans. I’ve labelled this one “#1” since I may publish a few more examples in the future, and then I’ll have to catalogue them – but I’m not making any promises about that.

Here’s a table definition, and a query that’s hinted to use an index on that table.


rem
rem     Script:         optimizer_tricks_01.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem

create table t1 (
        v1      varchar2(10),
        v2      varchar2(10),
        v3      varchar2(10),
        padding varchar2(100)
);

create index t1_i1 on t1(v1, v2, v3);


explain plan for
select
        /*+ index(t1 (v1, v2, v3)) */
        padding 
from 
        t1
where
        v1 = 'ABC'
and     nvl(v3,'ORA$BASE') = 'SET2'
;

select * from table(dbms_xplan.display);

The query uses the first and third columns of the index, but wraps the 3rd column in an nvl() function. Because of the hint the optimizer will generate a plan with an index range scan, but the question is – what will the Predicate Information tell us about Oracle’s use of my two predicates:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC')
       filter(NVL("V3",'ORA$BASE')='SET2')

The nvl() test is used during the index range scan (from memory I think much older versions of Oracle would have postponed the predicate test until they had accessed the table itself). This means Oracle will do a range scan over the whole section of the index where v1 = ‘ABC’, testing every index entry it finds against the nvl() predicate.

But what happens if we modify column v3 to be NOT NULL? (“alter table t1 modify v3 not null;”) Here’s the new plan:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC' AND "V3"='SET2')
       filter("V3"='SET2')


The optimizer will decide that with the NOT NULL status of the column the nvl() function can be eliminated and the predicate can be replaced with a simple column comparison. At this point the v3 predicate can now be used to reduce the number of index entries that need to be examined by using a type of skip-scan/iterator approach, but Oracle still has to test the predciate against the index entries it walks through – so the predicate still appears as a filter predicate as well.

You might notice, by the way, that the Plan hash value does not change as the predicate use changes – even though the change in use of predicates could make a huge difference to the performance. (As indicated in the comments at the top of the script, I’ve run this model against 11.2.0.4 – which is the version used in the ODC thread – and 19.3.0.0: the behaviour is the same in both versions, and the Plan hash value doesn’t change from version to version.)

Footnote

The reason why I decided to publish this note is that the original thread on the ODC forums reported the Following contradictory details – an index definition and the optimizer’s use of that index as shown in the predicate section of the plan:


Index column name      Column position
---------------------- ----------------
FLEX_VALUE_SET_ID      1
PARENT_FLEX_VALUE      2
RANGE_ATTRIBUTE        3
CHILD_FLEX_VALUE_LOW   4
CHILD_FLEX_VALUE_HIGH  5
ZD_EDITION_NAME        6

---------------------------------------------------------------------------
|* 17 |      INDEX RANGE SCAN             | FND_FLEX_VALUE_NORM_HIER_U1   |
---------------------------------------------------------------------------
  17 - access("FLEX_VALUE_SET_ID"=:B1 AND NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2')  
       filter((NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2'  ..... lots more bits of filter predicate.

Since the expression nvl(zd_edition_name, ‘ORA$BASE’) = ‘SET2’ appears as an access predicate and a filter predicate it must surely be a column in the index. So either this isn’t the definition of the index being used or, somehow, there’s a trick that allows zd_edition_name to appear as a column name in the index when it really means nvl(zd_edition_name,’ORA$BASE’) at run-time. (And if there is I want to know what it is – edition-based redefinition and tricks with virtual columns spring to mind, but I avoid thinking about complicated explanations when a simpler one might be available.)

 

August 20, 2019

Join View

Filed under: constraints,Infrastructure,Joins,Oracle — Jonathan Lewis @ 12:39 pm BST Aug 20,2019

It’s strange how one thing leads to another when you’re trying to check some silly little detail. This morning I wanted to find a note I’d written about the merge command and “stable sets”, and got to a draft about updatable join views that I’d started in 2016 in response to a question on OTN (as it was at the time) and finally led to a model that I’d written in 2008 showing that the manuals were wrong.

Since the manual – even the 19c manual – is still wrong regarding the “Delete Rule” for updatable (modifiable) join views I thought I’d quickly finish off the draft and post the 2008 script. Here’s what the manual says about deleting from join views (my emphasis on “exactly”):

Rows from a join view can be deleted as long as there is exactly one key-preserved table in the join. The key preserved table can be repeated in the FROM clause. If the view is defined with the WITH CHECK OPTION clause and the key preserved table is repeated, then the rows cannot be deleted from the view.

But here’s a simple piece of code to model a delete from a join view that breaks the rule:


rem
rem     Script:         delete_join.sql 
rem     Dated:          Dec 2008
rem     Author:         J P Lewis
rem

create table source
as
select level n1
from dual
connect by level <= 10
/ 
 
create table search
as
select level n1
from dual
connect by level <= 10
/ 

alter table source modify n1 not null;
alter table search modify n1 not null;

create unique index search_idx on search(n1);
-- create unique index source_idx on source(n1)

I’ve set up a “source” and a “search” table with 10 rows each and the option for creating unique indexes on each table for a column that’s declared non-null. Initially, though, I’ve only created the index on search to see what happens when I run a couple of “join view” deletes using “ANSI” syntax.

prompt  ===============================
prompt  Source referenced first in ANSI
prompt  ===============================

delete from (select * from source s join search s1 on s.n1 = s1.n1);
select count(1) source_count from source;
select count(1) search_count from search;
rollback;
 
prompt  ===============================
prompt  Search referenced first in ANSI
prompt  ===============================

delete from (select * from search s join source s1 on s.n1 = s1.n1);
select count(1) source_count from source;
select count(1) search_count from search;
rollback;

With just one of the two unique indexes in place the order of the tables in the inline view makes no difference to the outcome. Thanks to the unique index on search any row in the inline view corresponds to exactly one row in the source table, while a single row in the search table could end up appearing in many rows in the view – so the delete implictly has to operate as “delete from source”. So both deletes will result in the source_count being zero, and the search_count remaining at 10.

If we now repeat the experiment but create BOTH unique indexes, both source and search will be key-preserved in the join. According to the manual the delete should produce some sort of error. In fact the delete works in both cases – but the order that the tables appear makes a difference. When source is the first table in the in-line view the source_count drops to zero and the search_count stays at 10; when search is the first table in the in-line view the search_count drops to zero and the source_count stays at 10.

I wouldn’t call this totally unreasonable – but it’s something you need to know if you’re going to use the method, and something you need to document very carefully in case someone editing your code at a later date (or deciding that they could add a unique index) doesn’t realise the significance of the table order.

This does lead on to another important test – is it the order that the tables appear in the from clause that matters, or the order they appear in the join order that Oracle uses to optimise the query. (We hope – and expect – that it’s the join order as written, not the join order as optimised, otherwise the effect of the delete could change from day to day as the optimizer chose different execution plans!). To confirm my expectation I switched to traditional Oracle syntax with hints (still with unique indexes on both tables), writing a query with search as the first table in the from clause, but hinting the inline view to vary the optimised join order.


prompt  ============================================
prompt  Source hinted as leading table in join order 
prompt  ============================================

delete from (
        select 
                /*+ leading(s1, s) */
                * 
        from 
                search s,
                source s1 
        where
                s.n1 = s1.n1
        )
;

select count(1) source_count from source; 
select count(1) search_count from search;
rollback;

prompt  ============================================
prompt  Search hinted as leading table in join order 
prompt  ============================================

delete from (
        select 
                /*+ leading(s, s1) */
                * 
        from 
                search s,
                source s1 
        where
                s.n1 = s1.n1
        )
;

select count(1) source_count from source; 
select count(1) search_count from search;
rollback;

In both cases the rows were deleted from search (the first table in from clause). And, to answer the question you should be asking, I did check the execution plans to make sure that the hints had been effective:


============================================
Source hinted as leading table in join order
============================================

------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | DELETE STATEMENT    |            |    10 |    60 |     1 |
|   1 |  DELETE             | SEARCH     |       |       |       |
|   2 |   NESTED LOOPS      |            |    10 |    60 |     1 |
|   3 |    INDEX FULL SCAN  | SOURCE_IDX |    10 |    30 |     1 |
|*  4 |    INDEX UNIQUE SCAN| SEARCH_IDX |     1 |     3 |       |
------------------------------------------------------------------

============================================
Search hinted as leading table in join order
============================================

------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | DELETE STATEMENT    |            |    10 |    60 |     1 |
|   1 |  DELETE             | SEARCH     |       |       |       |
|   2 |   NESTED LOOPS      |            |    10 |    60 |     1 |
|   3 |    INDEX FULL SCAN  | SEARCH_IDX |    10 |    30 |     1 |
|*  4 |    INDEX UNIQUE SCAN| SOURCE_IDX |     1 |     3 |       |
------------------------------------------------------------------

Summary

Using updatable join views to handle deletes can be very efficient but the manual’s statement of the “Delete Rule” is incorrect. It is possible to have several key-preserved tables in the view that you’re using, and if that’s the case you need to play safe and ensure that the table you want to delete from is the first table in the from clause. This means taking steps to eliminate the risk of someone editing some code at a later date without realising the importance of the table order.

Update (very shortly after publication)

Iduith Mentzel has pointed out in comment #1 below that the SQL Language Reference Guide and the DBA Administration Guide are not consistent in their descriptions of deleting from a join view, and that the SQL Language Reference Guide correctly states that the delete will be applied to the first mentioned key-preserved table.

 

 

August 8, 2019

Free Space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 1:27 pm BST Aug 8,2019

Several years ago I wrote a note about reporting dba_free_space and dba_extents to produce a map of the space usage in a tablespace in anticipation of messing about with moving or rebuilding objects to try and reduce the size of the files in the tablespace.  In the related page where I published the script I pointed out that a query against dba_extents would be expensive because it makes use of structure x$ktfbue which generates the information dynamically by reading segment header blocks. I also pointed out in a footnote to the original article that if you’ve enabled the recyclebin and have “dropped” some objects then there will be some space that is reported as free but is not quite free since the extents will still be allocated. This brings me to the topic for today’s blog.

While visiting a client site recently I came across an instance that was running a regular report to monitor available space in the database. Basically this was a query against view dba_free_space. Surprisingly it was taking a rather long time to complete – and the reason for this came in two parts. First, the recyclebin was enabled and had some objects in it and secondly there were no stats on the fixed object x$ktfbue.

In the case of the client the particular query produced a plan that included the following lines:


Id  Operation             Name              Rows    Bytes  Cost (%CPU)  Time
--  --------------------- ----------------  ----   ------  -----------  --------
63  HASH JOIN                               2785     212K     46  (85)  00:00:01
64    TABLE ACCESS FULL   RECYCLEBIN$       1589    20657      7   (0)  00:00:01
65    FIXED TABLE FULL    X$KTFBUE          100K    6347K     38 (100)  00:00:01 

This is part of the view where Oracle calculates the size of all the extents of objects in the recyclebin so that they can be reported as free space. Notice that in this plan (which is dependent on version, system stats, object_stats and various optimizer parameters) the optimizer has chosen to do a hash join between the recyclebin (recyclebin$) and the x$ structure – and that has resulted in a “full tablescan” of x$ktfbue, which means Oracle reads the segment header block of every single segment in the entire database. (I don’t know where the row stats came from as there were no stats on x$ktfbue, and this plan was pulled from the AWR history tables so the query had been optimised and captured some time in the past.)

If there had been nothing in the recyclebin the hash join and two tablescans wouldn’t have mattered, unfortunately the recyclebin had been enabled and there were a few rows in recyclebin$, so the “tablescan” happened. Here’s a cut-n-paste from a much simpler query run against a fairly new (no 3rd party app) database running 12.1.0.2 to give you some idea of the impact:


SQL> execute snap_events.start_snap

PL/SQL procedure successfully completed.

SQL> select count(*) from x$ktfbue;

  COUNT(*)
----------
      8774

1 row selected.

SQL> execute snap_events.end_snap
---------------------------------------------------------
Session Events - 01-Aug 21:28:13
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
Disk file operations I/O                              7           0           0        .018           1
db file sequential read                           5,239           0          14        .003           6
SQL*Net message to client                             7           0           0        .000           0
SQL*Net message from client                           7           0       1,243     177.562         572
events in waitclass Other                             3           1           0        .002           0

PL/SQL procedure successfully completed.

On my little laptop, with nothing else going on, I’ve managed to get away with “only” 5,239 single block reads, and squeezed them all into just 14 centiseconds (local SSD helps). The clients wasn’t so lucky – they were seeing tens of thousands of real physical reads.

The ideal solution, of course, was to purge the recyclebin and disable the feature – it shouldn’t be necessary to enable it on a production system – but that’s something that ought to require at least some paperwork. In the short term gathering stats on the fixed table helped because the plan changed from a hash join with “tablescan” of x$ktfbue to a nested loop with an “indexed” access path, looking more like the following (from a query against just recyclebin$ and x$ktfbue)

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |       |       |     4 (100)|          |
|   1 |  NESTED LOOPS            |                  |     7 |   182 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL      | RECYCLEBIN$      |     6 |    66 |     4   (0)| 00:00:01 |
|*  3 |   FIXED TABLE FIXED INDEX| X$KTFBUE (ind:1) |     1 |    15 |     0   (0)|          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("BUE"."KTFBUESEGBNO"="BLOCK#" AND "BUE"."KTFBUESEGTSN"="TS#" AND
              "BUE"."KTFBUESEGFNO"="FILE#"))

This was still fairly resource-intensive for the client, but was something of an improvement – they had a lot more than 6 items in their recyclebin.

Part of the problem, of course, is that x$ktfbue is one of the objects that Oracle skips when you gather “fixed object” stats – it can be a bit expensive for exactly the reason that querying it can be expensive, all those single block segment header reads.

If you want to check the stats and gather them (as a one-off, probably) here’s some suitable SQL:


select
        table_name, num_rows, avg_row_len, sample_size, last_analyzed
from
        dba_tab_statistics
where
        owner = 'SYS'
and     table_name = 'X$KTFBUE'
;

begin
        dbms_stats.gather_table_stats('SYS','X$KTFBUE');
end;
/

Summary

You probably shouldn’t have the recyclebin enabled in a production system; but if you do, and if you also run a regular report on free space (as many sites seem to do) make sure (a) you have a regular routine to minimise the number of objects that it accumulates and (b) gather statistics (occasionally) on x$ktfbue to minimise the overhead of the necessary join between recyclebin$ and x$ktfbue.

July 12, 2019

opt_estimate 5

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 10:28 am BST Jul 12,2019

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint, a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical features of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model then the first time you need to do apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

So here’s a simplified version of the problem – it involves pushing a predicate into a union all view. First some data and a baseline query:

rem
rem     Script:         opt_estimate_3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem

create table t1
as
select
        rownum                          id,
        100 * trunc(rownum/100)-1       id2,
        mod(rownum,1e3)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        dual
connect by
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create table t2a pctfree 75 as select * from t1;
create table t2b pctfree 75 as select * from t1;

create index t2ai on t2a(id);
create index t2bi on t2b(id);

explain plan for
select
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select 'a' flag, t2a.* from t2a
         union all
         select 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
/

select * from table(dbms_xplan.display(null,null,'outline alias'))
/


There is one row with t1.id = 99, and I would like the optimizer to use an indexed access path to select the one matching row from each of the two tables in the union all view. The smart execution plan would be a nested loop using a “pushed join predicate” – and that’s exactly what we get by default with this data set:


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     1 |    15 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     1 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     1 |    15 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1        / T1@SEL$1
   3 - SET$5715CE2E / T2@SEL$1
   4 - SET$5715CE2E
   5 - SEL$639F1A6F / T2A@SEL$2
   6 - SEL$639F1A6F / T2A@SEL$2
   7 - SEL$B01C6807 / T2B@SEL$3
   8 - SEL$B01C6807 / T2B@SEL$3

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
      INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SET$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
      OUTLINE_LEAF(@"SET$5715CE2E")
      OUTLINE_LEAF(@"SEL$B01C6807")
      OUTLINE_LEAF(@"SEL$639F1A6F")
      ALL_ROWS
      OPT_PARAM('_nlj_batching_enabled' 0)
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)
   6 - access("T2A"."ID"="T1"."N1")
   8 - access("T2B"."ID"="T1"."N1")

So that worked well – operation 2 predicts one row for the tablescan of t1, with a nested loop join and union all pushed predicate where an index range scan of t2a_i1 and t2b_i1 gives us one row from each table. The “Predicate Information” tells us that the t1.n1 join predicate has been pushed inside the view to both subqueries so we see “t2a.id = t1.n1”, and “t2b.id = t1.n1”.

So what if I want to tell Oracle that it will actually find 5 rows in the t2a range scan and table access and 7 rows in the t2b range scan and table access (perhaps in a more complex view that would persuade Oracle to use two different indexes to get into the view and change the join order and access method for the next few tables it accessed). Since I’ve recently just written about the nlj_index_scan option for opt_estimate() you might think that this is the one we need to use – perhaps something like:


opt_estimate(@sel$639f1a6f nlj_index_scan, t2a@sel$2 (t1), t2a_i1, scale_rows=5)
opt_estimate(@sel$b01c6807 nlj_index_scan, t2b@sel$3 (t1), t2b_i1, scale_rows=7)

You’ll notice I’ve been very careful to find the fully qualified aliases for t2a and t2b by looking at the “Query Block Name / Object Alias” section of the plan (if the view appeared as a result of Oracle using Concatenation or OR-Expansion you would find that you got two query block names that looked similar but had suffixes of “_1” and “_2”). But it wasn’t worth the effort, it didn’t work. Fiddling around with all the possible variations I could think of didn’t help (maybe I should have used set$5715ce2e as the query block target for both the hints – no; what if I …)

Of course if we look at the “Outline Data” we’d notice that the use_nl() hint in the outline says: “USE_NL(@SEL$1 T2@SEL$1)”, so we don’t have a nested loop into t2a and t2b, we have a nested loop into the  view t2. So I decided to forget the nested loop idea and just go for the indexes (and the tables, when I got to them) with the following hints (you’ll notice that during the course of my experiments I added my own query block names to the initial query blocks – so the generated query block names have changed):



explain plan for
select
        /*+
                qb_name(main)
                opt_estimate(@sel$f2bf1101, index_scan, t2a@subq_a, t2ai, scale_rows=5)
                opt_estimate(@sel$f2bf1101, table,      t2a@subq_a,       scale_rows=5)
                opt_estimate(@sel$f4e7a233, index_scan, t2b@subq_b, t2bi, scale_rows=7)
                opt_estimate(@sel$f4e7a233, table,      t2b@subq_b,       scale_rows=7)
        */
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select /*+ qb_name(subq_a) */ 'a' flag, t2a.* from t2a
         union all
         select /*+ qb_name(subq_b) */ 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
;

select * from table(dbms_xplan.display(null,null,'outline alias'));


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     5 |    75 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     5 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     7 |   105 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     7 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Excellent – we get the cardinalities we want to see for the tables – except the view operator doesn’t hold the sum of the table cardinalities, and the join doesn’t multiply up the estimates either. I couldn’t find a way of getting the view to show 12 rows (not even with a guessed – but presumably unimplemented – opt_estimate(view …) hint!), however during the course of my experiments I tried the hint: “opt_estimate(@main, table, t2@main, scale_rows=15)”. This didn’t have any visible effect in the plan but while searching through the 10053 trace file I found the following lines:

Table Stats::
  Table: from$_subquery$_002  Alias: T2  (NOT ANALYZED)
  #Rows: 20000  SSZ: 0  LGR: 0  #Blks:  37  AvgRowLen:  15.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 9

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 20000.000000 to 300000.000000 due to opt_estimate hint

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 12.000000 to 180.000000 due to opt_estimate hint

So at some point in the code path the optimizer is aware that 5 + 7 = 12, and that 12 * 15 = 180. But this doesn’t show up in the final execution plan. You might notice, by the way, that the scale_rows=15 has been applied NOT ONLY to the place where I was aiming – it’s also been applied to scale up the 20,000 rows that are estimated to be in the union all to 300,000 as the estimate for a tablescan of the two tables.

Possibly if I spent more time working through the 10053 trace file (which, as I’ve said before, I try to avoid doing) I might have found exactly which code path Oracle followed to get to the plan it produced and managed to tweak some hints to get the numbers I wanted to see. Possibly the optimizer was already following the code path that actually produced the numbers I wanted, then “forgot” to use them. One day, perhaps, I’ll tale another look at the problem – but since I wasn’t trying to solve a problem for a client (and given that there was an alternative workaround) I closed the 10053 trace file and put the model aside for a rainy day.

Footnote

One thought did cross my mind as a way of finding out if there was a real solution – and I offer this for anyone who wants to play: create a second data set that genuinely produces the 5 and 7 I want to see (and, check that the view reports the sum of the two components); then run the original query against the original data so that you’ve got the execution plan in memory, overwrite the original data with the new data set (without changing the statistics on the orginal). Then use the SQL Tuning Advisor to see if it produces a SQL profile for the captured SQL_ID that reproduces the correct plan for the second data set and check what opt_estimate() hints it uses.  (Warning – this may turn into a frustrating waste of time.)

Update Oct 2019

I’ve been saying for years that I don’t like the trick of pulling the Outline Information from an execution plan in memory and storing it in the database as an SQL Profile because that’s effectively storing an SQL Plan Baseline as an SQL Profile and there might be subtle and (potentially) misleading side effects of abusing the two mechanisms. Behind the argument I’ve also made the observation that while both mechamisms store hints, the hints for an SQL Profile are about statistics and the hints for an SQL Plan Baseline are about transformations, joins, and other mechanis.

However .;.

I’ve now down the test I described in the foot note above – created a table with data in it that made Oracle choose full tablescans for the t2a and t2b tables, then changed the data (without changing the object statistic) and run the SQL Tuning tool to see if the optimizer would suggest the plan I wanted and offer a profile to produce it.

I was successful – Oracle offered the profile, and when I looked at it (before accepting it) it looked like this:


         1 OPT_ESTIMATE(@"SEL$1", TABLE, "T2"@"SEL$1", SCALE_ROWS=200)
         1 OPT_ESTIMATE(@"SEL$1", JOIN, ("T2"@"SEL$1", "T1"@"SEL$1"), SCALE_ROWS=15)
         1 OPTIMIZER_FEATURES_ENABLE(default)
         1 IGNORE_OPTIM_EMBEDDED_HINTS

But when I accepted it and looked at it again it looked like this:


        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
        IGNORE_OPTIM_EMBEDDED_HINTS
        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
        INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
        USE_NL(@"SEL$1" "T2"@"SEL$1")
        LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
        NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
        FULL(@"SEL$1" "T1"@"SEL$1")
        OUTLINE(@"SET$1")
        OUTLINE(@"SEL$3")
        OUTLINE(@"SEL$2")
        OUTLINE_LEAF(@"SEL$1")
        PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
        OUTLINE_LEAF(@"SET$5715CE2E")
        OUTLINE_LEAF(@"SEL$B01C6807")
        OUTLINE_LEAF(@"SEL$639F1A6F")
        ALL_ROWS
        DB_VERSION('19.1.0')
        OPTIMIZER_FEATURES_ENABLE('19.1.0')
        INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))

In other words, Oracle has recorded something that looks like an SQL Plan Baseline and called it an SQL Profile.

July 1, 2019

opt_estimate 4

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:18 pm BST Jul 1,2019

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specify query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given an query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case where the hint is inside the query block it applies to the “@mat_cte” isn’t necessary).

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint and pointed out that there were case where you might also want to include the query_block option as well. Here’s a final example query showing the effect, with the scale_rows feature after creating a table t2 as a copy of t1 but setting pctfree 75 (to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and thanks to a Freudian slip while typing: “not sage”. You probably don’t know how do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values  is 14 (roughly half the actual in may case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complext view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

June 28, 2019

opt_estimate 3

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:12 pm BST Jun 28,2019

This is just a quick note to throw out a couple of of the lesser-known options for the opt_estimate() hint – and they may be variants that are likely to be most useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
)
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t2 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by options I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries, first the unhinted baseline (which uses the standard 5% on my having clause) then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4  (with means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of line for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

June 26, 2019

Glitches

Filed under: 12c,Bugs,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 5:11 pm BST Jun 26,2019

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a threat that the whole data set for ‘clerk’ will be accessed – and that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column which holds the single value you were probing the index for but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “first fetch” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

June 25, 2019

opt_estimate 2

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 8:22 pm BST Jun 25,2019

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t change the arithmetic the optimizer uses, but it does mean the presentation of the nested loop goes back to the original pre-9i form which makes it a little easier to see costs and cardinalities adding and multiplying their way through the plan. I do not do this in production systems.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2. The optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15 * 15 = 255 and the cost comes to: 2 (t1 cost) + (15 (t1 rows) * 16 (t2 unit cost)) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name   nlj_index_scan,   target_table_alias   (list of possible driving tables),   target_index,   numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two-columns before it can be used.

The things to note in this plan are: the index range scan at operation 5 has now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access is dependent on the index selectivity the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is: 2 (t1 cost) + (15 (t1 rows) * 2 (t2 unit cost)).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide to follow the path we expected and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality at operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side not: If you have access to MoS you’ll find that Doc ID: 2402821.1 “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins (where “multi” can be greater than 2).

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints and remember we’vs disabled “nlj batching“).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

WIth a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in all the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_ind_scan fix-up as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

June 17, 2019

Can’t Unnest

Filed under: Conditional SQL,Execution plans,Oracle,subqueries — Jonathan Lewis @ 3:35 pm BST Jun 17,2019

In an echo of a very old “conditional SQL” posting, a recent posting on the ODC general database discussion forum ran into a few classic errors of trouble-shooting. By a lucky coincidence this allowed me to rediscover and publish an old example of parallel execution gone wild before moving on to talk about the fundamental problem exhibited in the latest query.

The ODC thread started with a question along the lines of “why isn’t Oracle using the index I hinted”, with the minor variation that it said “When I hint my SQL with an index hint it runs quickly so I’ve created a profile that applies the hint, but the hint doesn’t get used in production.”

The query was a bit messy and, as is often the case with ODC, the formatting wasn’t particularly readable, so I’ve extracted the where clause from the SQL that was used to generate the profile and reformatted it below. See if you can spot the hint clue that tells you why there might be a big problem using this SQL to generate a profile to use in the production environment:


WHERE   
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE' 
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS' 
AND     MSG.SRCH_4_FLD_VAL = '123456'   
AND     (
            (    'INVOICENUMBER' = 'INVOICENUMBER' 
             AND MSG.MSG_ID IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('123456') AS TABLE_OF_VARCHAR)))
            ) 
         OR (    'INVOICENUMBER' = 'SIEBELORDERID' 
             AND MSG.SRCH_3_FLD_VAL IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('') AS TABLE_OF_VARCHAR)))
            )
        ) 
AND     MSG.MSG_ID = TRK.INV_NUM(+) 
AND     (   TRK.RESEND_DT IS NULL 
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)   
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1   
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

If the SQL by itself doesn’t give you an inportant clue, compare it with the Predicate Information from the “good” execution plan that it produced:


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("TRK"."RESEND_DT" IS NULL OR "TRK"."RESEND_DT"=))  
   8 - filter(("MSG"."SRCH_4_FLD_VAL"='123456' AND "MSG"."MSG_CAPTR_STG_CD"='PRE_BCS'))  
   9 - access("MSG"."MSG_ID"="COLUMN_VALUE" AND "MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
       filter("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
  10 - access("MSG"."MSG_ID"="TRK"."INV_NUM")  
  13 - access("TRK1"."INV_NUM"=:B1)  

Have you spotted the thing that isn’t there in the predicate information ?

What happened to the ‘INVOICENUMBER’ = ‘INVOICENUMBER’ predicate and the ‘INVOICENUMBER’ = ‘SIEBELORDERID’ predicate? They’ve disappeared because the optimizer knows that the first predicate is always true and doesn’t need to be tested at run-time and the second one is always false and doesn’t need to be tested at run-time. Moreover both predicates are part of a conjunct (AND) – so in the second case the entire two-part predicate can be eliminated; so the original where clause can immediately be reduced to:


WHERE   
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE' 
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS' 
AND     MSG.SRCH_4_FLD_VAL = '123456'   
AND     (
                 MSG.MSG_ID IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('123456') AS TABLE_OF_VARCHAR)))
        ) 
AND     MSG.MSG_ID = TRK.INV_NUM(+) 
AND     (   TRK.RESEND_DT IS NULL 
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)   
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1   
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

Looking at this reduced predicate you may note that the IN subquery referencing the fnm_gn_in_string_list() collection could now be unnested and used to drive the final execution plan, and the optimizer will even recognize that it’s a rowsource with at most one row. So here’s the “good” execution plan:


---------------------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                               | Name                  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |  
---------------------------------------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                        |                       |      1 |        |      2 |00:00:00.08 |      12 |      7 |       |       |          |  
|   1 |  SORT ORDER BY                          |                       |      1 |      1 |      2 |00:00:00.08 |      12 |      7 |  2048 |  2048 | 2048  (0)|  
|*  2 |   FILTER                                |                       |      1 |        |      2 |00:00:00.08 |      12 |      7 |       |       |          |  
|   3 |    NESTED LOOPS OUTER                   |                       |      1 |      1 |      2 |00:00:00.08 |      10 |      7 |       |       |          |  
|   4 |     NESTED LOOPS                        |                       |      1 |      1 |      2 |00:00:00.06 |       6 |      5 |       |       |          |  
|   5 |      VIEW                               | VW_NSO_1              |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |  
|   6 |       HASH UNIQUE                       |                       |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |  1697K|  1697K|  487K (0)|  
|   7 |        COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |  
|*  8 |      TABLE ACCESS BY INDEX ROWID        | FNM_VSBL_MSG          |      1 |      1 |      2 |00:00:00.06 |       6 |      5 |       |       |          |  
|*  9 |       INDEX RANGE SCAN                  | XIE2FNM_VSBL_MSG      |      1 |      4 |      4 |00:00:00.04 |       4 |      3 |       |       |          |  
|* 10 |     INDEX RANGE SCAN                    | XPKBCS_INV_RESEND_TRK |      2 |      1 |      2 |00:00:00.01 |       4 |      2 |       |       |          |  
|  11 |    SORT AGGREGATE                       |                       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
|  12 |     FIRST ROW                           |                       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
|* 13 |      INDEX RANGE SCAN (MIN/MAX)         | XPKBCS_INV_RESEND_TRK |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
---------------------------------------------------------------------------------------------------------------------------------------------------------------  

The plan looks great – Oracle predicts a single row driver (operation 5) which can use a very good index (XIE2FNM_VSBL_MSG) in a nested loop, followed by a second nested loop, followed by a filter subquery and a sort of a tiny amount of data. Predictions match actuals all the way down the plan, and the workload is tiny. So what goes wrong in production?

You’ve probably guessed the flaw in this test. Why would anyone include a predicate like ‘INVOICENUMBER’ = ‘INVOICENUMBER’ in production code, or even worse ‘INVOICENUMBER’ = ‘SIEBELORDERID’. The OP has taken a query using bind variables picked up the actual values that were peeked when the query was executed, and substituted them into the test as literals. This has allowed the optimizer to discard two simple predicates and one subquery when the production query would need a plan that catered for the possibility that the second subquery would be the one that had to be executed and the first one bypassed. Here’s the corrected where clause using SQL*Plus variables (not the substitution type, the proper type) for the original bind variables:


WHERE
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE'
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS'
AND     MSG.SRCH_4_FLD_VAL = :BindInvoiceTo
AND     (
            (    :BindSearchBy = 'INVOICENUMBER' 
             AND MSG.MSG_ID IN (
                        SELECT  *
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST(:BindInvoiceList) AS TABLE_OF_VARCHAR)))
            )
         OR (    :BindSearchBy = 'SIEBELORDERID' 
             AND MSG.SRCH_3_FLD_VAL IN (
                        SELECT  *
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST(:BindSeibelIDList) AS TABLE_OF_VARCHAR)))
            )
        )
AND     MSG.MSG_ID = TRK.INV_NUM(+)
AND     (   TRK.RESEND_DT IS NULL
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

And this, with the “once good” hint in place to force the use of the XIE2FNM_VSBL_MSG index, is the resulting execution plan


---------------------------------------------------------------------------------------------------------  
| Id  | Operation                           | Name                  | E-Rows |  OMem |  1Mem | Used-Mem |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                    |                       |        |       |       |          |  
|   1 |  SORT ORDER BY                      |                       |      1 | 73728 | 73728 |          |  
|*  2 |   FILTER                            |                       |        |       |       |          |  
|   3 |    NESTED LOOPS OUTER               |                       |      1 |       |       |          |  
|*  4 |     TABLE ACCESS BY INDEX ROWID     | FNM_VSBL_MSG          |      1 |       |       |          |  
|*  5 |      INDEX FULL SCAN                | XIE2FNM_VSBL_MSG      |   4975K|       |       |          |  
|*  6 |     INDEX RANGE SCAN                | XPKBCS_INV_RESEND_TRK |      1 |       |       |          |  
|*  7 |    COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |       |       |          |  
|*  8 |    COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |       |       |          |  
|   9 |    SORT AGGREGATE                   |                       |      1 |       |       |          |  
|  10 |     FIRST ROW                       |                       |      1 |       |       |          |  
|* 11 |      INDEX RANGE SCAN (MIN/MAX)     | XPKBCS_INV_RESEND_TRK |      1 |       |       |          |  
---------------------------------------------------------------------------------------------------------  
 
Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter((((:BINDSEARCHBY='INVOICENUMBER' AND  IS NOT NULL) OR  
              (:BINDSEARCHBY='SIEBELORDERID' AND  IS NOT NULL)) AND ("TRK"."RESEND_DT" IS NULL OR  
              "TRK"."RESEND_DT"=)))  
   4 - filter(("MSG"."SRCH_4_FLD_VAL"=:BINDINVOICETO AND "MSG"."MSG_CAPTR_STG_CD"='PRE_BCS'))  
   5 - access("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
       filter("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
   6 - access("MSG"."MSG_ID"="TRK"."INV_NUM")  
   7 - filter(VALUE(KOKBF$)=:B1)  
   8 - filter(VALUE(KOKBF$)=:B1)  
  11 - access("TRK1"."INV_NUM"=:B1)  

The “unnested driving subquery” approach can no longer be used – we now start with the fnm_vsbl_msg table (accessing it using a most inefficient execution path because that’s what the hint does for us, and we can obey the hint), and for each row check which of the two subqueries we need to execute. There is, in fact, no way we can hint this query to operate efficiently [at least, that’s my opinion, .I may be wrong].

The story so far

If you’re going to try to use SQL*Plus (or similar) to test a production query with bind variables you can’t just use a sample of literal values in place of the bind variables (though you may get lucky sometimes, of course), you should set up some SQL*Plus variables and assign values to them.

Though I haven’t said it presiously in this article this is an example where a decision that really should have been made by the front-end code has been embedded in the SQL and passed to the database as SQL which cannot be run efficiently. The front end code should have been coded to recognise the choice between invoice numbers and Siebel order ids and sent the appropriate query to the database.

Next Steps

WIthout making a significant change to the front-end mechanism wrapper is it possible to change the SQL so something the optimizer can handle efficiently? Sometimes the answer is yes; so I’ve created a simpler model to demonstrate the basic problem and supply a solution for cases like this one. The key issue is finding a way of working around the OR clauses that are trying to allow the optimizer to choose between two subqueries but make it impossible for either to be unnested into a small driving data set.

First, some tables:


rem
rem     Script:         or_in_twice.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,371)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,372)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table t3
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,373)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


Now a query – first setting up a variable in SQL*Plus to allow us to emulate a production query with bind variables. Since I’m only going to use Explain Plan the variable won’t be peekable, so there would still be some scope for this plan not matching a production plan, but it’s adequate to demonstrate the structural problem:


variable v1 varchar2(10)
exec :v1 := 'INVOICE'

explain plan for
select
        t1.v1 
from
        t1
where
        (
            :v1 = 'INVOICE' 
        and t1.id in (select id from t2 where n1 = 0)
        )
or      (
            :v1 = 'ORDERID' 
        and t1.id in (select id from t3 where n1 = 0)
        )
;

select * from table(dbms_xplan.display);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |   150 |    26   (4)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    26   (4)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL| T3   |     1 |     8 |    26   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:V1='INVOICE' AND  EXISTS (SELECT 0 FROM "T2" "T2" WHERE
              "ID"=:B1 AND "N1"=0) OR :V1='ORDERID' AND  EXISTS (SELECT 0 FROM "T3"
              "T3" WHERE "ID"=:B2 AND "N1"=0))
   3 - filter("ID"=:B1 AND "N1"=0)
   4 - filter("ID"=:B1 AND "N1"=0)

As you can see, thanks to the OR that effectively gives Oracle the choice between running the subquery against t3 or the one against t2, Oracle is unable to do any unnesting. (In fact different versions of Oracle allow different levels of sophistication with disjuncts (OR) of subqueries, so this is the kind of example that’s always useful to keep for tests against future versions.)

Since we know that we are going to use one of the data sets supplied in one of the subqueries and have no risk of double-counting or eliminating required duplicates, one strategy we could adopt for this query is to rewrite the two subqueries as a single subquery with a union all – because we know the optimizer can usually handle a single IN subquery very nicely. So let’s try the following:


explain plan for
select
        t1.v1
from
        t1
where
        t1.id in (
                select  id 
                from    t2 
                where   n1 = 0
                and     :v1 = 'INVOICE'
                union all
                select  id 
                from    t3 
                where   n1 = 0
                and     :v1 = 'ORDERID'
        )
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |    54 |  1512 |    77   (3)| 00:00:01 |
|*  1 |  HASH JOIN             |          |    54 |  1512 |    77   (3)| 00:00:01 |
|   2 |   VIEW                 | VW_NSO_1 |    54 |   702 |    51   (2)| 00:00:01 |
|   3 |    HASH UNIQUE         |          |    54 |   432 |    51   (2)| 00:00:01 |
|   4 |     UNION-ALL          |          |       |       |            |          |
|*  5 |      FILTER            |          |       |       |            |          |
|*  6 |       TABLE ACCESS FULL| T2       |    27 |   216 |    26   (4)| 00:00:01 |
|*  7 |      FILTER            |          |       |       |            |          |
|*  8 |       TABLE ACCESS FULL| T3       |    27 |   216 |    26   (4)| 00:00:01 |
|   9 |   TABLE ACCESS FULL    | T1       | 10000 |   146K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="ID")
   5 - filter(:V1='INVOICE')
   6 - filter("N1"=0)
   7 - filter(:V1='ORDERID')
   8 - filter("N1"=0)


Thanks to the FILTERs at operations 5 and 7 this plan will pick the data from just one of the two subqueries, reduce it to a unique list and then use that as the build table to a hash join. Of course, with different data (or suitable hints) that hash join could become a nested loop using a high precision index.

But there’s an alternative. We manually rewrote the two subqueries as a single union all subquery and as we did so we moved the bind variable comparisons inside their respective subqueries; maybe we don’t need to introduce the union all. What would happen if we simply take the original query and move the “constant” predicates inside their subqueries?


explain plan for
select
        t1.v1
from
        t1
where
        t1.id in (select id from t2 where n1 = 0 and :v1 = 'INVOICE')
or      t1.id in (select id from t3 where n1 = 0 and :v1 = 'ORDERID')
;

select * from table(dbms_xplan.display);


-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |    54 |  1512 |    77   (3)| 00:00:01 |
|*  1 |  HASH JOIN             |          |    54 |  1512 |    77   (3)| 00:00:01 |
|   2 |   VIEW                 | VW_NSO_1 |    54 |   702 |    51   (2)| 00:00:01 |
|   3 |    HASH UNIQUE         |          |    54 |   432 |    51   (2)| 00:00:01 |
|   4 |     UNION-ALL          |          |       |       |            |          |
|*  5 |      FILTER            |          |       |       |            |          |
|*  6 |       TABLE ACCESS FULL| T3       |    27 |   216 |    26   (4)| 00:00:01 |
|*  7 |      FILTER            |          |       |       |            |          |
|*  8 |       TABLE ACCESS FULL| T2       |    27 |   216 |    26   (4)| 00:00:01 |
|   9 |   TABLE ACCESS FULL    | T1       | 10000 |   146K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="ID")
   5 - filter(:V1='ORDERID')
   6 - filter("N1"=0)
   7 - filter(:V1='INVOICE')
   8 - filter("N1"=0)

In 12.2.0.1 and 18.3.0.0 it gets the same plan as we did with our “single subquery” rewrite – the optimizer is able to construct the union all single subquery (although the ordering of the subqueries has been reversed) and unnest without any other manual intervention. (You may find that earlier versions of Oracle don’t manage to do this, but you might have to go all the way back to 10g.

Conclusion

Oracle doesn’t like disjuncts (OR) and finds conjuncts (AND) much easier to cope with. Mixing OR and subqueries is a good way to create inefficient execution plans, especially when you try to force the optimizer to handle a decision that should have been taken in the front-end code. The optimizer, however, gets increasingly skilled at handling the mixture as you move through the newer versions; but you may have to find ways to give it a little help if you see it running subqueries as filter subqueries when you’re expecting it to unnest a subquery to produce a small driving data set.

 

Next Page »

Powered by WordPress.com.