Oracle Scratchpad

April 10, 2024

AWR Snap ID

Filed under: AWR,Oracle,Troubleshooting — Jonathan Lewis @ 9:17 am BST Apr 10,2024

What to do when you hit a problem (possibly after an incomplete recovery) that reports an “ORA-00001 unique key violation” on sys.wrm$_snapshot_pk – as reported recently in this thread on the MOSC SQL Performance forum (needs a MOS account).

Snapshot ids are carefully sequenced, without gaps, so somehow the thing that controls the “current” sequence number has gone backwards and is trying to generate a value that is lower than the current highest value in wrm$_snapshot. The thread I referenced above does point to an article dated 2017 on Alibaba discussing methods of checking for corruption and clearing up messes; but as an extra option you could simply try hacking the control table to set the “last used” snapshot id to something higher than the highest value currently in wrm$_snapshot. The table you need to hack is wrm$_wr_control and here’s an example of its contents from an instance of 19.11 (preceded by a check of the current maximum snap_id in wrm$_snapshot):

SQL> select max(snap_id) max_snap_id, max(end_interval_time) max_snap_time from wrm$_snapshot;

MAX_SNAP_ID MAX_SNAP_TIME
----------- ---------------------------------------------------------------------------
       7304 09-APR-24 07.00.14.180 PM

SQL> execute print_table('select * from wrm$_wr_control')
DBID                           : 3158514872
SNAP_INTERVAL                  : +00000 01:00:00.0
SNAPINT_NUM                    : 3600
RETENTION                      : +00008 00:00:00.0
RETENTION_NUM                  : 691200
MOST_RECENT_SNAP_ID            : 7304
MOST_RECENT_SNAP_TIME          : 09-APR-24 07.00.15.169 PM
MRCT_SNAP_TIME_NUM             : 1712685600
STATUS_FLAG                    : 2
MOST_RECENT_PURGE_TIME         : 09-APR-24 08.35.57.430 AM
MRCT_PURGE_TIME_NUM            : 1712648156
MOST_RECENT_SPLIT_ID           : 7295
MOST_RECENT_SPLIT_TIME         : 1712648156
SWRF_VERSION                   : 30
REGISTRATION_STATUS            : 0
MRCT_BASELINE_ID               : 0
TOPNSQL                        : 2000000000
MRCT_BLTMPL_ID                 : 0
SRC_DBID                       : 3158514872
SRC_DBNAME                     : CDB$ROOT
T2S_DBLINK                     :
FLUSH_TYPE                     : 0
SNAP_ALIGN                     : 0
MRCT_SNAP_STEP_TM              : 1712685613
MRCT_SNAP_STEP_ID              : 0
TABLESPACE_NAME                : SYSAUX
-----------------
1 row(s) selected

PL/SQL procedure successfully completed.

Of course you ought to update the most_recent_snap_time as well, and the mrct_snap_time_num – which appears to be the number of seconds since 1st Jan 1970 GMT, i.e. the Unix epoch (on my instance the value 1712685600 decodes to 6:00 pm GMT / 7:00 pm BST on 9th April 2024, matching the most_recent_snap_time above).
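Pulling those pieces together, a minimal sketch of the hack might look like the following. This is untested, the epoch arithmetic is my assumption based on the values above, and you should only update a SYS-owned table with a backup in place (and, ideally, Oracle Support’s blessing). In a healthy instance most_recent_snap_id matches max(snap_id), so I’ve simply set them equal:

```sql
-- Untested sketch: push the control record up to the current high snap_id.
-- Assumes mrct_snap_time_num is seconds since 1st Jan 1970 GMT.
update wrm$_wr_control
set     most_recent_snap_id   = (select max(snap_id) from wrm$_snapshot),
        most_recent_snap_time = systimestamp,
        mrct_snap_time_num    = round(
                (cast(sys_extract_utc(systimestamp) as date)
                 - date '1970-01-01') * 86400
        )
where   dbid = (select dbid from v$database)
/

commit;
```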

Then there’s the interpretation and sanity checking of the other “most recent / mrct” columns to worry about, and the possibility of PDBs vs. CDBs – but those are topics that I’m going to leave to someone else to worry about.

April 2, 2024

Index Usage – 3

Filed under: Indexing,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 4:34 pm BST Apr 2,2024

In the second part of this series I described some of the technicalities of Index Usage Tracking and showed an example of what I was doing to test the feature. In this episode I’ll describe some of the index access methods I’ve tested and report the results. I’ve listed the tests I’ve planned so far and will show results as I find time to run the tests – if you can think of more cases add them in the comments and I’ll extend the list. (If you think a test is a very good idea, “upvote” it in the comments and I’ll try to run it sooner.)

Before I’ve done all the tests I’ll add a section on Conclusions and Recommendations, addressing the questions: “Is the feature worth using?” and “What’s the best way to use it?” This section of the note may develop over time as special cases or boundary conditions show up.

Setup (recap from pt.2)

I’ll be using three sessions to do my testing:

  • Session 1 logged in as the test user to execute test statements and query dba_index_usage.
  • Session 2 logged in as sys in the same pdb to query x$keiut and x$keiut_info/v$index_usage_info (real time summaries)
  • Session 3 logged in as the oracle s/w owner and connected as sys in the cdb$root to call keiutFlush to make mmon flush x$keiut to disk. The session may have to “alter session set container = {the test pdb}” to flush the right x$keiut.

The working sequence for each test is:

  • CDB SYS session
    • call keiutFlush twice to clear all active elements from x$keiut_info/v$index_usage_info (check contents)
  • PDB SYS session
    • check x$keiut is empty
  • User session:
    • check dba_index_usage for targeted index(es)
    • execute some statements forcing use of index and check execution plan
  • PDB sys session:
    • check contents of x$keiut and x$keiut_info/v$index_usage_info
  • CDB sys session
    • call keiutFlush
  • End user session
    • Check contents of dba_index_usage (report changes)
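Each “check contents” step in that sequence is just a query against the relevant structure. As a sketch (column names taken from the 19c dictionary; adjust for your version):

```sql
-- PDB SYS session: the in-memory (unflushed) tracking entries
select objname, num_starts, num_execs, rows_returned
from   x$keiut;

-- PDB SYS session: the real-time summary (active_elem_count should be
-- zero after the double keiutFlush that starts each test)
select active_elem_count, flush_count, last_flush_time
from   v$index_usage_info;

-- End-user session: the persisted figures after the final flush
select name, total_access_count, total_exec_count,
       total_rows_returned, last_used
from   dba_index_usage
where  owner = user;
```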

Presentation

For each test I’ll show (in collapsed form) the SQL I used to define the objects being tested, and describe the purpose and method of the test. Then I’ll simply show the critical changes in dba_index_usage and, in some cases, x$keiut_info/v$index_usage_info, x$keiut as a result of the test.

Initial list of tests

  • Gathering index stats
  • “Foreign Key” indexes and primary key indexes during referential integrity
  • Single table, single column index, multiple statements
    • Unique index (equality and range predicates)
    • non-unique index covering unique constraint
    • non-unique index generally
  • Inlist iterator and union all.
  • Nested loop joins
  • Single table, Multi-column index (access/filter, and skip scan)
  • Index-only query (range scan, full scan, fast full scan, index joins)
  • Bitmap access (with multiple bitmaps per key value)
    • Bitmap and / or / minus
  • IOT accessed by primary key
  • IOT accessed by secondary key – ensuring “misses on guesses”
  • DML – single table access
  • DML – with subquery / join methods embedded
  • Locally partitioned index (single table partition, multiple table partition)
  • Globally partitioned index

Test Data

I’ve used the following script to generate common data for many of the initial tests in the list above – I may extend this script as I complete more and more of the tests, though I may also introduce separate scripts for some specific cases:

rem
rem     Script:         iut_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2024
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem     To be tested
rem             23.3.0.0
rem             12.2.0.1
rem
rem     Notes:
rem     Create some test data to demonstrate index usage tracking
rem

create table t1 as
select
        rownum - 1                              id,
        mod(rownum-1,10000)                     n1,
        trunc((rownum - 1)/20)                  col1,
        trunc((rownum - 1)/10)                  col2,
        rownum - 1                              col3,
        round(100 * dbms_random.normal())       rand,
        cast(rpad(rownum,25) as varchar2(25))   v1,
        cast(rpad('x',80,'x') as varchar2(80))  padding
from
        all_objects
where
        rownum <= 50000
/

create table t2 as select * from t1;

-- single column primary key index (unique)

alter table t1 add constraint t1_pk primary key (id) 
        using index (
        create unique index t1_pk on t1(id)
        );

create index t1_col3 on t1(col3);
alter table t1 add constraint t1_uk unique (col3);

create index t1_3col on t1(col1, col2, col3) compress 2;
create index t1_rand on t1(rand);

-- multi-column primary key index (unique)

alter table t2 add constraint t2_pk primary key (col1, id) 
        using index (
        create unique index t2_pk on t2(col1, id) compress
        );

-- "foreign key index" (one to many)

alter table t2 add constraint t2_fk_t1 foreign key (col3) references t1(id);
create index t2_col3 on t2(col3);

Gathering Stats

The SQL used to gather index stats from any of the calls to dbms_stats.gather_xxx_stats() is a simple select statement hinted to access the index. In my case the indexes were all fairly small – smaller than the critical number of blocks that triggers the sampling methods – so Oracle examined every block and row in the index, leading to index usage stats looking like the following t1 primary key example:

OBJECT_ID                      : 209180
NAME                           : T1_PK
OWNER                          : TEST_USER
TOTAL_ACCESS_COUNT             : 1
TOTAL_EXEC_COUNT               : 1
TOTAL_ROWS_RETURNED            : 50000
BUCKET_1000_PLUS_ACCESS_COUNT  : 1
BUCKET_1000_PLUS_ROWS_RETURNED : 50000
LAST_USED                      : 01-apr-2024 13:48:51

So gathering stats does flag the index as used – but with the enhanced tracking it’s going to be relatively easy to spot cases where a small number of accesses account for a large number of rows – hinting that these accesses may be only for stats collection.

If you’ve read the linked article you’ll have seen how Oracle’s sampling strategy has changed in recent years, so a check that will be relevant to some of your indexes is whether or not the average number of rows returned is equivalent to roughly 1,140 leaf blocks. Another simple check that might be appropriate: is the very large access happening just once per day or week, in line with your expected stats collection strategy?

Of course, if large access counts are happening fairly frequently you can raise the question: does this look like an appropriate result for the table or index, or does it hint at an index that is used when it should not be, or an index that needs to be refined (e.g. by the addition of extra columns to allow better elimination before visiting the table)?
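As a sketch of that kind of screening – assuming, as discussed, that stats gathering shows up as a small number of very large accesses – you might start with something like:

```sql
-- Indexes whose recorded usage falls entirely in the 1000+ bucket:
-- candidates for "only ever read by stats gathering".
select  owner, name,
        total_access_count,
        round(bucket_1000_plus_rows_returned /
              nullif(bucket_1000_plus_access_count, 0)) avg_big_access,
        last_used
from    dba_index_usage
where   bucket_1000_plus_access_count = total_access_count
and     total_access_count > 0
order by
        avg_big_access desc
/
```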

Referential Integrity (and DML)

With referential integrity in place Oracle will (in the most commonly used setup)

  • check that matching child rows do not exist when you try to delete a parent or modify its key value – will either operation flag a supporting “foreign key” index as used?
  • check that the parent row exists if you try to insert a child row or update a child row to associate it with a different parent value – will either operation flag the parent primary key index as used?

In my test script t1 is the parent and t2 is the child. The referential integrity is from t2.col3 to t1.id and the relevant indexes are t2_col3 (non-unique) and t1_pk (unique) respectively. Test statements are:

-- Delete parent when child exists
-- Delete parent when child deleted
-- --------------------------------

delete from t1 where id = 17000;
-- ORA-02292: integrity constraint (TEST_USER.T2_FK_T1) violated - child record found

delete from t2 where col3 = 17000;
delete from t1 where id   = 17000;

commit;

-- Insert into child without parent
-- Insert into child with    parent
-- --------------------------------

insert into t2 values(60000, 1,1,1,60000,1,'x','x');
-- ORA-02291: integrity constraint (TEST_USER.T2_FK_T1) violated - parent key not found

insert into t2 values (18000,1,1,1,15000,1,'x','x');

commit;

-- update parent to "abandon" child
-- --------------------------------

update t1 set id = 60000 where id = 25000;
-- ORA-02292: integrity constraint (TEST_USER.T2_FK_T1) violated - child record found

-- update child to change to non-existent parent
-- update child to change to pre-existing parent
-- ---------------------------------------------

update t2 set col3 = 60000 where id = 25000;
-- ORA-02291: integrity constraint (TEST_USER.T2_FK_T1) violated - parent key not found

update t2 set col3 = 30000 where id = 25000;

commit;

After calling keiutFlush and checking that there were no active elements in x$keiut_info/v$index_usage_info, and no rows in x$keiut, I executed all the statements above one after the other (some failed, of course, with the errors shown above). Nothing was captured in x$keiut.

Apart from the implications of “foreign key” indexes not being flagged as used during referential integrity checks, the tests above also show us that updates and deletes driven by index access do not show the driving index flagged as used: t1.id = constant, t2.col3 = constant, t2.id = constant (which used an index skip scan on t2_pk).
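Given that risk, it would be sensible to cross-check any “unused” index against the foreign key definitions before dropping it. Here’s a sketch (single-column foreign keys only, ignoring the usual subtleties of column ordering in multi-column keys, and assuming you can see dba_index_usage):

```sql
-- "Foreign key" indexes with no entry in dba_index_usage: possibly
-- unused, but possibly needed to avoid FK locking/performance problems.
select  ic.table_name, ic.index_name, ic.column_name
from    user_constraints  c,
        user_cons_columns cc,
        user_ind_columns  ic
where   c.constraint_type  = 'R'
and     cc.constraint_name = c.constraint_name
and     ic.table_name      = cc.table_name
and     ic.column_name     = cc.column_name
and     ic.column_position = 1
and     not exists (
            select  null
            from    dba_index_usage u
            where   u.owner = user
            and     u.name  = ic.index_name
        )
/
```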

Single table, single column index

This set of tests is close to a repeat of the first demonstration in part 2. Here’s the list of statements aimed at index t1_pk. Note that t1_pk(id) is a unique index on a primary key constraint, t1_col3(col3) is a non-unique index covering the unique constraint t1_uk, and t2_col3(col3) is a single column non-unique index.

--
--      Out of range, no rows
--
select v1 from t1 where id = -1;
select v1 from t1 where id = 60001;

--
--      Single row
--
select v1 from t1 where id = 1000;
select v1 from t1 where id = 2000;

--
--      multiple rows: 5, 50, 500, 1500
--
select max(v1) from t1 where id between 11 and 15;
select max(v1) from t1 where id between 101 and 150;
select max(v1) from t1 where id between 1001 and 1500;
select max(v1) from t1 where id between 10001 and 11500;

Here are the results after the sequence: call keiutFlush, execute test SQL, report x$keiut, call keiutFlush, report change in dba_index_usage:

SQL> select objname, num_starts, num_execs, rows_returned from x$keiut;

OBJNAME                          NUM_STARTS  NUM_EXECS ROWS_RETURNED
-------------------------------- ---------- ---------- -------------
TEST_USER.T1_PK                           8          8          2057


SQL> host expand temp1.lst
OBJECT_ID                      : 209180
NAME                           : T1_PK
OWNER                          : TEST_USER
TOTAL_ACCESS_COUNT             : 8
TOTAL_EXEC_COUNT               : 8
TOTAL_ROWS_RETURNED            : 2057
BUCKET_0_ACCESS_COUNT          : 2
BUCKET_1_ACCESS_COUNT          : 2
BUCKET_2_10_ACCESS_COUNT       : 1
BUCKET_2_10_ROWS_RETURNED      : 5
BUCKET_11_100_ACCESS_COUNT     : 1
BUCKET_11_100_ROWS_RETURNED    : 50
BUCKET_101_1000_ACCESS_COUNT   : 1
BUCKET_101_1000_ROWS_RETURNED  : 500
BUCKET_1000_PLUS_ACCESS_COUNT  : 1
BUCKET_1000_PLUS_ROWS_RETURNED : 1500
LAST_USED                      : 02-apr-2024 15:29:06

You’ll note that I executed 8 select statements, expecting a total of 2,057 rows (index rowids) to be passed to the table access operation, and the changes in stats shown in dba_index_usage are an exact match for the predictions above the listed SQL statements.

Since t1.col3 is an exact match of t1.id, and since t2 is a duplicate of t1, it seems likely that tests that start by cloning the SQL and changing the column or table name as appropriate would give matching results – and they do, so I won’t bother to print them all out.

There is one final test of a single column index before I move on to simple queries targeting a multi-column index. Here’s the statement I want to test to confirm an important point:

select v1 from t1 where id between 101 and 150 and mod(id,10) = 0;

This query will scan through 50 index entries, discarding all but 5 of them, and return 5 rows from the table. The key question is this: will dba_index_usage report 50 rows accessed or 5? Here’s what the change in stats looked like after the test:

TOTAL_ACCESS_COUNT             : 1
TOTAL_EXEC_COUNT               : 1
TOTAL_ROWS_RETURNED            : 5
BUCKET_0_ACCESS_COUNT          : 0
BUCKET_1_ACCESS_COUNT          : 0
BUCKET_2_10_ACCESS_COUNT       : 1
BUCKET_2_10_ROWS_RETURNED      : 5
BUCKET_11_100_ACCESS_COUNT     : 0
BUCKET_11_100_ROWS_RETURNED    : 0
BUCKET_101_1000_ACCESS_COUNT   : 0
BUCKET_101_1000_ROWS_RETURNED  : 0
BUCKET_1000_PLUS_ACCESS_COUNT  : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0

This is a very important point: the stats in dba_index_usage do not tell us how many rows (index entries) we visited in the index, they tell us how many rowids (or, possibly, key values) survived to be passed to the parent operation (typically the table access). So you might look at some stats that say “25,000 executions, all in the 2 to 10 range – good index” when you’ve actually got a badly designed index that does 90% of the total work of a query and discards 2,000 index entries for each rowid it uses to access the table.

Inlist Iterators, Union All and multiple references

Here’s a sample query (with its result set, and actual execution plan pulled from memory) accessing the t1 table through the index on column rand. If you try to repeat this example it probably won’t give you exactly the same results because I used Oracle’s random number generator to generate a normal distribution of integer values (with mean zero and standard deviation of 100), but there’s a reasonable chance that you’ll see similar numbers in your output as I’ve been careful to pick three values that should return significantly different numbers of rows:

select
        rand, count(*) 
from    t1 
where   rand in (100, 200, 300)
and     v1 is not null
group by 
        rand 
order by 
        count(*)
/


      RAND   COUNT(*)
---------- ----------
       300          2
       200         25
       100        114

3 rows selected.

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------
SQL_ID  fbwmnax39jqtq, child number 0
-------------------------------------
select         rand, count(*) from    t1 where   rand in (100, 200,
300) and     v1 is not null group by         rand order by
count(*)

Plan hash value: 874747030

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |      1 |        |      3 |00:00:00.01 |     143 |       |       |          |
|   1 |  SORT ORDER BY                 |         |      1 |      3 |      3 |00:00:00.01 |     143 |  2048 |  2048 | 2048  (0)|
|   2 |   SORT GROUP BY NOSORT         |         |      1 |      3 |      3 |00:00:00.01 |     143 |       |       |          |
|   3 |    INLIST ITERATOR             |         |      1 |        |    141 |00:00:00.01 |     143 |       |       |          |
|*  4 |     TABLE ACCESS BY INDEX ROWID| T1      |      3 |    220 |    141 |00:00:00.01 |     143 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | T1_RAND |      3 |    220 |    141 |00:00:00.01 |       5 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("V1" IS NOT NULL)
   5 - access(("RAND"=100 OR "RAND"=200 OR "RAND"=300))

This plan is probably the simplest demonstration of the difference between total_exec_count and total_access_count. Here’s the output from x$keiut after this test, followed by the details available from dba_index_usage after calling keiutFlush:

OBJNAME                          NUM_STARTS  NUM_EXECS ROWS_RETURNED
-------------------------------- ---------- ---------- -------------
TEST_USER.T1_RAND                         3          1           141

Given that x$keiut has reported three “starts”, and we can see that the plan shows three “Starts” with an “A-Rows” of 141, and the output shows three rows totalling 141, we might hope that dba_index_usage is going to show us exactly what happened with this query. Here are the results:

SQL> execute print_table(q'(select * from dba_index_usage where name = 'T1_RAND')')
OBJECT_ID                      : 209738
NAME                           : T1_RAND
OWNER                          : TEST_USER
TOTAL_ACCESS_COUNT             : 3
TOTAL_EXEC_COUNT               : 1
TOTAL_ROWS_RETURNED            : 141
BUCKET_0_ACCESS_COUNT          : 0
BUCKET_1_ACCESS_COUNT          : 0
BUCKET_2_10_ACCESS_COUNT       : 0
BUCKET_2_10_ROWS_RETURNED      : 0
BUCKET_11_100_ACCESS_COUNT     : 3
BUCKET_11_100_ROWS_RETURNED    : 141
BUCKET_101_1000_ACCESS_COUNT   : 0
BUCKET_101_1000_ROWS_RETURNED  : 0
BUCKET_1000_PLUS_ACCESS_COUNT  : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0
LAST_USED                      : 02-apr-2024 19:02:03

The three “starts” from x$keiut show up in the total_access_count, but the three separate accesses have been accumulated and averaged to appear as 3 accesses in the 11-100 range (141 / 3 = 47). So there’s some loss of precision in the information. It’s better than the old “yes/no” report, and in many cases it’s likely to give you numbers close enough to the right ballpark to be useful, but there will be some cases where the more detailed information would be beneficial. It’s worth mentioning that the total_access_count (hence the averaging) will include “Starts” where no rows were returned.

Once upon a time, of course, a query involving an inlist might have been rewritten as a query with union all, such as:

select
        rand, count(v1)
from    t1
where   rand = 100
group by
        rand
union all
select
        rand, count(v1)
from    t1
where   rand = 200
group by
        rand
union all
select
        rand, count(v1)
from    t1
where   rand = 300
group by
        rand
;

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |      3 |00:00:00.01 |     144 |
|   1 |  UNION-ALL                    |         |      1 |        |      3 |00:00:00.01 |     144 |
|   2 |   SORT GROUP BY NOSORT        |         |      1 |    128 |      1 |00:00:00.01 |     113 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1      |      1 |    128 |    114 |00:00:00.01 |     113 |
|*  4 |     INDEX RANGE SCAN          | T1_RAND |      1 |    128 |    114 |00:00:00.01 |       2 |
|   5 |   SORT GROUP BY NOSORT        |         |      1 |     61 |      1 |00:00:00.01 |      27 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T1      |      1 |     61 |     25 |00:00:00.01 |      27 |
|*  7 |     INDEX RANGE SCAN          | T1_RAND |      1 |     61 |     25 |00:00:00.01 |       2 |
|   8 |   SORT GROUP BY NOSORT        |         |      1 |     61 |      1 |00:00:00.01 |       4 |
|   9 |    TABLE ACCESS BY INDEX ROWID| T1      |      1 |     61 |      2 |00:00:00.01 |       4 |
|* 10 |     INDEX RANGE SCAN          | T1_RAND |      1 |     61 |      2 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("RAND"=100)
   7 - access("RAND"=200)
  10 - access("RAND"=300)

Does this make any difference to the way that Oracle reports the executions, starts, and rows returned? Here’s the change in dba_index_usage due to this query:

NAME                           : T1_RAND
TOTAL_ACCESS_COUNT             : 3
TOTAL_EXEC_COUNT               : 3
TOTAL_ROWS_RETURNED            : 141
BUCKET_0_ACCESS_COUNT          : 0
BUCKET_1_ACCESS_COUNT          : 0
BUCKET_2_10_ACCESS_COUNT       : 1
BUCKET_2_10_ROWS_RETURNED      : 2
BUCKET_11_100_ACCESS_COUNT     : 1
BUCKET_11_100_ROWS_RETURNED    : 25
BUCKET_101_1000_ACCESS_COUNT   : 1
BUCKET_101_1000_ROWS_RETURNED  : 114
BUCKET_1000_PLUS_ACCESS_COUNT  : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0

In this case three separate accesses through the t1_rand index have been recorded individually in the correct buckets – with one interesting anomaly: the total execution count is 3 rather than 1. This suggests that the “total_exec_count” is a little more complicated than I originally thought. For each execution of a single query the “total_exec_count” seems to be a “total_operation_count” – and possibly, for “conditional SQL”, a count of only those operations that were actually called.

Nested Loops

After seeing how Oracle handled the inlist iterator, the obvious next step was to repeat the experiment using a table to hold the values I’d selected for the inlist and forcing a nested loop join. Here’s the table creation followed by the query and its execution plan:

create table t3 (n1 number);

insert into t3 values(100);
insert into t3 values(200);
insert into t3 values(300);
commit;

execute dbms_stats.gather_table_stats(user,'t3')

alter table t3 add constraint t3_uk unique(n1);

select
        /*+
                leading(t3 t1)
                use_nl(t1)
                index(t1(rand))
        */
        t3.n1,
        max(t1.v1)
from
        t3, t1
where
        t1.rand = t3.n1
group by
        t3.n1
/


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  7mxmx715yfb73, child number 0
-------------------------------------
select  /*+   leading(t3 t1)   use_nl(t1)   index(t1(rand))  */  t3.n1,
 max(t1.v1) from  t3, t1 where  t1.rand = t3.n1 group by  t3.n1

Plan hash value: 3159923631

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |      3 |00:00:00.01 |     147 |
|   1 |  SORT GROUP BY NOSORT         |         |      1 |      3 |      3 |00:00:00.01 |     147 |
|   2 |   NESTED LOOPS                |         |      1 |    220 |    141 |00:00:00.01 |     147 |
|   3 |    NESTED LOOPS               |         |      1 |    220 |    141 |00:00:00.01 |       9 |
|   4 |     INDEX FULL SCAN           | T3_UK   |      1 |      3 |      3 |00:00:00.01 |       2 |
|*  5 |     INDEX RANGE SCAN          | T1_RAND |      3 |     73 |    141 |00:00:00.01 |       7 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T1      |    141 |     73 |    141 |00:00:00.01 |     138 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T1"."RAND"="T3"."N1")

As you can see, the plan involves two indexes, and (after the usual preliminary call to keiutFlush) executing the query ends up with x$keiut reporting two rows:

OBJNAME                          NUM_STARTS  NUM_EXECS ROWS_RETURNED
-------------------------------- ---------- ---------- -------------
TEST_USER.T1_RAND                         3          1           141
TEST_USER.T3_UK                           1          1             3

The num_starts and rows_returned happily agree with the Starts and A-rows values in the plan and after a second call to keiutFlush the changes in dba_index_usage are as follows:

NAME                           : T1_RAND
TOTAL_ACCESS_COUNT             : 3
TOTAL_EXEC_COUNT               : 1
TOTAL_ROWS_RETURNED            : 141
BUCKET_0_ACCESS_COUNT          : 0
BUCKET_1_ACCESS_COUNT          : 0
BUCKET_2_10_ACCESS_COUNT       : 0
BUCKET_2_10_ROWS_RETURNED      : 0
BUCKET_11_100_ACCESS_COUNT     : 3
BUCKET_11_100_ROWS_RETURNED    : 141
BUCKET_101_1000_ACCESS_COUNT   : 0
BUCKET_101_1000_ROWS_RETURNED  : 0
BUCKET_1000_PLUS_ACCESS_COUNT  : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0

NAME                           : T3_UK
TOTAL_ACCESS_COUNT             : 1
TOTAL_EXEC_COUNT               : 1
TOTAL_ROWS_RETURNED            : 3
BUCKET_0_ACCESS_COUNT          : 0
BUCKET_1_ACCESS_COUNT          : 0
BUCKET_2_10_ACCESS_COUNT       : 1
BUCKET_2_10_ROWS_RETURNED      : 3
BUCKET_11_100_ACCESS_COUNT     : 0
BUCKET_11_100_ROWS_RETURNED    : 0
BUCKET_101_1000_ACCESS_COUNT   : 0
BUCKET_101_1000_ROWS_RETURNED  : 0
BUCKET_1000_PLUS_ACCESS_COUNT  : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0

Again we see that the figures agree with the execution plan but, just as we saw with the inlist iterator, the level of detail reported for rows returned does not go down to the individual accesses; Oracle simply averages (for the execution) the total rows returned across the total access count, assigning all the work done in index t1_rand to the 11 to 100 bucket.

Prompted by the reporting of the union all query in the previous section, it’s worth looking at a slightly more complex join to clarify what an “exec” is. Using the same t1 and t2 tables, here’s a suitably silly query:

select
        count(t2.v1)
from
        t2,
        t1 t1a,
        t1 t1b
where
        t1a.id = t2.n1
and     t1b.id = 10 * t2.n1
/

SQL_ID  917pamcvnmtsk, child number 0
-------------------------------------
select  count(t2.v1) from  t2,  t1 t1a,  t1 t1b where  t1a.id = t2.n1
and t1b.id = 10 * t2.n1

Plan hash value: 1623429891

----------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |      1 |00:00:00.24 |    5306 |
|   1 |  SORT AGGREGATE      |       |      1 |      1 |      1 |00:00:00.24 |    5306 |
|   2 |   NESTED LOOPS       |       |      1 |  50000 |  25000 |00:00:00.24 |    5306 |
|   3 |    NESTED LOOPS      |       |      1 |  50000 |  50000 |00:00:00.13 |    1662 |
|   4 |     TABLE ACCESS FULL| T2    |      1 |  50000 |  50000 |00:00:00.02 |     960 |
|*  5 |     INDEX UNIQUE SCAN| T1_PK |  50000 |      1 |  50000 |00:00:00.07 |     702 |
|*  6 |    INDEX UNIQUE SCAN | T1_PK |  50000 |      1 |  25000 |00:00:00.08 |    3644 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T1A"."ID"="T2"."N1")
   6 - access("T1B"."ID"=10*"T2"."N1")

You’ll notice that there are two “index unique scan” operations, both starting 50,000 times; and that the second operation returns a row in only 25,000 of those starts thanks to the definition of n1 and the “10 x n1” in my predicate. Here’s the corresponding report from x$keiut.

OBJNAME                          NUM_STARTS  NUM_EXECS ROWS_RETURNED
-------------------------------- ---------- ---------- -------------
TEST_USER.T1_PK                      100000          2         75000

We see num_execs = 2 – i.e. one for each time the index appears in an operation in the plan. Then, as expected, num_starts and rows_returned match the sum of the two Starts and A-Rows. But here’s the change in dba_index_usage – spot the oddity:

NAME                           : T1_PK
OWNER                          : TEST_USER
TOTAL_ACCESS_COUNT             : 100000
TOTAL_EXEC_COUNT               : 2
TOTAL_ROWS_RETURNED            : 75000
BUCKET_0_ACCESS_COUNT          : 50000
BUCKET_1_ACCESS_COUNT          : 50000

Oracle reports the “execute count” of 2, and the total rows returned as 75,000 – but it seems that the 75,000 is made up of 50,000 accesses that returned 1 row (which we know happened), and 50,000 accesses that returned 0 rows (which is not the way things went).

It’s going to be very easy for me to fool myself that I know what Oracle is doing here – it’s going to take several more experiments to get a good idea of what information Oracle is recording and when and how it’s accumulating it (or not) between calls to keiutFlush.

In this simple case where we’ve executed one query with two operations it would be easy to assume that Oracle has identified and summarised the two operations separately, introducing some simple rounding errors.

  • The first operation starts 50,000 times and returns 50,000 rows so, on average, it should be counted in the “1 access” bucket 50,000 times.
  • The second operation starts 50,000 times and returns 25,000 rows so, on average and with rounding (down), that’s 0 rows per start. Hence a count of 50,000 in the “0 access” bucket.

It seems like a viable explanation – but we might want to fiddle with slightly different queries, and do some tests of multiple executions of the same query (while the data changes), and some tests with slightly different but similarly structured queries, to see if Oracle is accumulating on a “per query execution basis”, or doing something more complex, or something more naive.
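As a sanity check on that hypothesis, here's a small sketch of the apparent aggregation and bucket arithmetic (Python, purely illustrative – this is my reading of the numbers above, not anything Oracle publishes):

```python
# Illustrative sketch (not Oracle code): per-operation (starts, rows) figures
# taken from the Starts / A-Rows columns of the plan above.
ops = [(50_000, 50_000),    # t1a access: one row per start
       (50_000, 25_000)]    # t1b access: a row on only half the starts

num_execs     = len(ops)                  # one "exec" per plan operation
num_starts    = sum(s for s, _ in ops)
rows_returned = sum(r for _, r in ops)

# Hypothesis: all of an operation's starts land in the bucket given by the
# rounded-down average rows-per-start for that operation.
buckets = {}
for starts, rows in ops:
    avg = rows // starts                  # integer division, rounds down
    buckets[avg] = buckets.get(avg, 0) + starts

print(num_execs, num_starts, rows_returned)   # 2 100000 75000
print(buckets)                                # {1: 50000, 0: 50000}
```

This reproduces the 50,000 in bucket_1_access_count and the (counter-intuitive) 50,000 in bucket_0_access_count.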

The other thing to note about the way that dba_index_usage has captured the statistics of this report is that it looks as if it depends on either rowsource execution statistics or SQL Monitor statistics being gathered on every execution – and that could lead to significant overheads (in CPU, most probably).

Conclusions and Suggestions

Calls to gather index stats execute queries that will be captured by index usage tracking, so you need a strategy to help you ignore “BUCKET_1000_PLUS_ROWS_RETURNED” accesses that are about stats collection, while taking action when they are indications of index usage that could be made more efficient.

Indexes used during referential integrity checks are not flagged as used. This is probably not much of a threat for the relevant (primary/unique key) indexes on the parent end since you are unlikely to drop indexes that enforce uniqueness even if they don’t appear to be used; however it does mean that there is some risk of dropping a “foreign key” index that is needed to avoid locking problems.

Indexes used to drive delete and update statements are not captured by the new mechanism – at least for simple “delete from single_table where …” and “update table where …”. This could result in indexes being dropped that have been created to enhance DML performance. (There are outstanding tests for DML using index-driven subqueries or join views to see if indexes used to access other tables in the statement do get flagged.)

The statistics about “rows returned” tell you how many index entries are passed to the parent operation, not about the number of index entries examined; so a frequently used index that consistently reports a very small number of “rows returned” is not necessarily an efficient, well-designed index.

The stats on inlist iterators are a little disappointing: while the number of iterations appears as the total_access_count, the accesses are all allocated to the same bucket on the basis of total_rows_returned / total_access_count, and no allowance is made for different values in the list returning significantly different numbers of rows. The same is true of the index accesses for the second table of a nested loop join. On the other hand, if an inlist query is rewritten as a union all query then the stats for each branch of the union all are captured separately – this means access counts and rows returned are reported against the correct bucket, but as a side effect the execution count is reported as the number of branches that used the index rather than the number of executions of the query.

Tests with slightly more complex queries show that the total_exec_count relates to the number of times an index is named in an execution plan operation, not just the number of times the query is executed; and the slightly more complex queries make it easy to see side effects of the rounding of “total_rows_returned / total_access_count” (particularly the “0 bucket” and “1 bucket” stats) that make the overall results look strangely inconsistent. In my (current) opinion, though, it’s probably the case that the 0/1 level where the inconsistency is most visible is where it’s also least important.

March 13, 2024

Querying LONGs

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 2:19 pm GMT Mar 13,2024

Update for 23c: If your only need for using LONGs in predicates is to query the partitioning views by high_value you won’t need to read this note as the views now expose columns high_value_clob and high_value_json. (See comment #3 below.)

Despite their continued presence in the Oracle data dictionary, LONG columns are not an option that anyone should choose; you can’t do much with them and they introduce a number of strange space management problems. Nevertheless a recent thread on the Oracle database forum started with the following question: “How do you use LONG columns in a WHERE clause?”. The basic answer is: “You don’t”.

This user wanted to query all_tab_partitions for a given table_name and high_value, and the high_value is (still) a LONG, so attempts to use it resulted in Oracle error “ORA-00997: illegal use of LONG datatype”. A possible, and fairly obvious but undesirable, solution to the requirement is to write a PL/SQL function that reads the current row from all_tab_partitions and returns the first N characters of the high_value as a varchar2(). Here’s a version (not quite the one I posted) of such a function, with a sample of use:

rem
rem     Script:         get_high_value.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2024
rem 

create or replace function get_high_value (
        i_tab_owner varchar2,
        i_tab_name varchar2, 
        i_part_name varchar2,
        i_part_posn number
)
return varchar2
is
        v1 varchar2(4000);
begin
        select  atp.high_value 
        into    v1 
        from    all_tab_partitions atp
        where   atp.table_owner         = upper(i_tab_owner)
        and     atp.table_name          = upper(i_tab_name)
        and     atp.partition_name      = upper(i_part_name)
        and     atp.partition_position  = i_part_posn
        ;
        return v1;
end;
/

select
        apt.table_owner, apt.table_name, 
        apt.tablespace_name, 
        apt.partition_name, apt.partition_position,
        apt.high_value
from
        all_tab_partitions      apt
where
        apt.table_owner = 'TEST_USER'
and     apt.table_name  = 'PT_RANGE'
and     get_high_value(
                apt.table_owner, 
                apt.table_name, 
                apt.partition_name,
                apt.partition_position
        ) = '200'
/

This seemed to work quite well and sufficiently rapidly – but I only had two partitioned tables in my schema and a total of 12 partitions, so it’s not sensible to look at the clock to see how efficient the query is.

Another possible solution introduced me to a function that has been around for years (and many versions) which I had never come across: sys_dburigen(). PaulZip supplied the following code (which I’ve altered cosmetically and edited to pick up a table in my schema):

select  *
from    (
        select 
                dbms_lob.substr(
                        sys_dburigen (
                                atp.table_owner, 
                                atp.table_name, 
                                atp.partition_name, 
                                atp.partition_position, 
                                atp.high_value, 
                                'text()'
                ).getclob(), 4000, 1)           high_value_str,
                atp.table_owner, atp.table_name, atp.partition_name,
                atp.tablespace_name, atp.high_value
        from    all_tab_partitions atp
        where   atp.table_owner = 'TEST_USER'
        and     atp.table_name  = 'PT_RANGE'
        )
where   high_value_str = '200'
/

This was so cute, and looked like a much nicer (i.e. “legal”) solution than my PL/SQL hacking, that I had to take a closer look at sys_dburigen() – first to understand what it was supposed to achieve (yes, I do RTFM), then to see how it actually worked.

Something I did first was simply to strip back the layers of the expression used to supply the high_value_str, which took me through the following four combinations (with and without ‘text()’, with and without getclob()). Each expression is followed by the result for the row selected above:

sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value, 'text()').getclob()

200

---

sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value).getclob()

<?xml version="1.0"?><HIGH_VALUE>200</HIGH_VALUE>

--

sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value, 'text()')

DBURITYPE('/PUBLIC/ALL_TAB_PARTITIONS/ROW[TABLE_OWNER=''TEST_USER'' and TABLE_NAME=''PT_RANGE'' and PARTITION_NAME=''P200'' and PARTITION_POSITION=''1'']/HIGH_VALUE/text()', NULL)

--

sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value)

DBURITYPE('/PUBLIC/ALL_TAB_PARTITIONS/ROW[TABLE_OWNER=''TEST_USER'' and TABLE_NAME=''PT_RANGE'' and PARTITION_NAME=''P200'' and PARTITION_POSITION=''1'']/HIGH_VALUE', NULL)

Working from the bottom pair up we see that we start by generating a dburitype which defines the type of thing we want to query and the restriction we want to use while querying. The ‘text()’ option simply adds an extra detail to the dburitype.

The top pair shows us that the get_clob() will then return the value we have requested, either as an XML value, or as the text value described by the XML value if we’ve supplied the ‘text()’ option.

Our call to sys_dburigen() has specified an object we want to access, and 4 columns in that object that will identify a unique row in that object, and a fifth column that we want returned either as an XML value or as a text value.

Tracing

I actually worked through the analysis in the opposite direction to the one I’ve been showing. When I saw the call to sys_dburigen() I suspected that it might be doing the same thing as my PL/SQL function call, so I ran the two queries with SQL tracing enabled to see what activity took place in the database.

Ignoring the driving query against all_tab_partitions the content of the trace for the PL/SQL version was basically 3 executions (I had 3 partitions in the pt_range table) of:

SELECT ATP.HIGH_VALUE
FROM
 ALL_TAB_PARTITIONS ATP WHERE ATP.TABLE_OWNER = UPPER(:B4 ) AND
  ATP.TABLE_NAME = UPPER(:B3 ) AND ATP.PARTITION_NAME = UPPER(:B2 ) AND
  ATP.PARTITION_POSITION = UPPER(:B1 )

The content of the trace for the sys_dburigen() version was 3 executions of a query like:

SELECT alias000$."HIGH_VALUE" AS HIGH_VALUE
FROM
 "ALL_TAB_PARTITIONS" alias000$ WHERE 1 = 1 AND ((((alias000$."TABLE_OWNER"=
  'TEST_USER') AND (alias000$."TABLE_NAME"='PT_RANGE')) AND
  (alias000$."PARTITION_NAME"='P200')) AND (alias000$."PARTITION_POSITION"=
  '1'))

Note particularly the literal values in the predicates in lines 4, 5 and 6. This code using sys_dburigen() has to generate and optimise (hard parse) a new SQL statement for every partition in the table referenced in the driving query. For a table with a large number of partitions, and a system with a large number of partitioned tables, the disruption of the shared pool that this might cause could be severe if (as the user said at one point) “we will be frequently selecting from all_tab_partitions”. [Damage limitation: if the session sets cursor_sharing to FORCE temporarily the generated SQL will be subject to bind variable substitution – but that’s not an ideal workaround.]
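The shared pool impact is easy to visualise: every partition produces a distinct literal SQL text, hence a distinct parent cursor and a hard parse, where bind variables would let all the executions share one cursor. A rough sketch (Python, purely illustrative – the SQL texts are abbreviated):

```python
# Illustrative only: one literal text per partition vs. one bound text shared.
partitions = [("P200", 1), ("P400", 2), ("P600", 3)]

literal_texts = {
    f"SELECT ... WHERE PARTITION_NAME='{name}' AND PARTITION_POSITION='{posn}'"
    for name, posn in partitions
}
bound_texts = {
    "SELECT ... WHERE PARTITION_NAME=:B2 AND PARTITION_POSITION=:B1"
    for _ in partitions
}

print(len(literal_texts), len(bound_texts))   # 3 1
```

With thousands of partitions that becomes thousands of single-use cursors competing for space in the library cache.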

Summary

Using LONG columns in SQL predicates is not nice – and not likely to be efficient – but there are ways of working around the limitations of LONGs. It’s undesirable to use PL/SQL that calls SQL inside a SQL statement, but we can use a PL/SQL function to return a string from a LONG in the current row – and since that’s pretty much what Oracle seems to be doing with its call to sys_dburigen() it’s hard to insist that the PL/SQL strategy is inappropriate. (But maybe the call to sys_dburigen() in this context would be considered an abuse of a feature anyway – even though it seems much more elegant and flexible once you’ve learned a little about how it works.)

Footnote

As another detail on analysing the cost/benefit of different approaches – it would be possible to avoid creating the pl/sql function by embedding it in the SQL as a “with function” clause:

with function get_high_value (
        i_tab_owner varchar2,
        i_tab_name varchar2, 
        i_part_name varchar2,
        i_part_posn number
)
return varchar2
is
        v1 varchar2(4000);
begin
        select  atp.high_value 
        into    v1 
        from    all_tab_partitions atp
        where   atp.table_owner         = upper(i_tab_owner)
        and     atp.table_name          = upper(i_tab_name)
        and     atp.partition_name      = upper(i_part_name)
        and     atp.partition_position  = i_part_posn
        ;
        return v1;
end;
select
        apt.table_owner, apt.table_name, 
        apt.tablespace_name, 
        apt.partition_name, apt.partition_position,
        apt.high_value
from
        all_tab_partitions      apt
where
        apt.table_owner = 'TEST_USER'
and     apt.table_name  = 'PT_RANGE'
and     get_high_value(
                apt.table_owner, 
                apt.table_name, 
                apt.partition_name,
                apt.partition_position
        ) = '200'
/

Footnote 2

I have asked the user why they want to query all_tab_partitions by high_value since it seems to be a slightly odd thing to do and there may be a better way of doing whatever it is that this query is supposed to support. They haven’t responded to the question, so I’ll take a guess that they want to rename (or move etc.) partitions that they don’t know the name for – perhaps because they are using interval partitioning or automatic list partitioning. If the guess is correct then the solutions offered are irrelevant – you don’t need to know the name of a partition to manipulate it, you need only know some value that is a legal member of the partition:

SQL> select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position;

PARTITION_NAME
----------------------
P200
P400
P600

3 rows selected.

SQL> alter table pt_range rename partition for (199) to pt_0200;

Table altered.

SQL> select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position;

PARTITION_NAME
----------------------
PT_0200
P400
P600

3 rows selected.

March 12, 2024

Missing Partition

Filed under: CBO,Execution plans,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 12:20 pm GMT Mar 12,2024
Tags: ,

Here’s a silly little detail about execution plans on (interval) partitioned tables that I hadn’t noticed until it showed up on this thread on a public Oracle forum: it’s an execution plan that claims that Oracle will be visiting a partition that clearly won’t be holding the data requested.

Here’s the starting section of a demonstration script – mostly by Solomon Yakobson with minor tweaks and additions from me:

rem
rem     Script:         non_existent_partition.sql
rem     Author:         Solomon Yakobson / Jonathan Lewis
rem     Dated:          Mar 2024
rem
rem     Last tested 
rem             19.11.0.0
rem

create table invoices_partitioned(
        invoice_no   number not null,
        invoice_date date   not null,
        comments     varchar2(500)
)
partition by range (invoice_date)
interval (interval '3' month)
(
        partition invoices_past values less than (date '2023-01-01')
);

insert into invoices_partitioned
select  level,
        date '2023-01-01' + numtoyminterval(3 * (level - 1),'month'),
        null
from  dual
connect by level <= 6
/

insert into invoices_partitioned select * from invoices_partitioned;
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/

commit
/

execute dbms_stats.gather_table_stats(user,'invoices_partitioned')

set linesize 156
column high_value format a80

select  partition_position, num_rows,
        partition_name,
        high_value
  from  user_tab_partitions
  where table_name = 'INVOICES_PARTITIONED'
  order by partition_position
/

alter table invoices_partitioned drop partition for (date'2023-09-01');
purge recyclebin;

select  partition_position, num_rows,
        partition_name,
        high_value
  from  user_tab_partitions
  where table_name = 'INVOICES_PARTITIONED'
  order by partition_position
/

The script creates an interval partitioned table, with an interval of 3 months, then inserts 131,072 rows per partition (the strange re-execution of “insert into x select from x” was my lazy way of increasing the volume of data from the original one row per partition without having to think too carefully).
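For what it’s worth, the doubling arithmetic checks out – a trivial sketch:

```python
# The connect-by insert creates 6 rows (one per invoice_date value); the
# "insert ... select" then executes 17 times in total (the typed statement
# plus 16 "/" re-executions), doubling the row count each time, giving
# 131,072 rows per distinct value, i.e. per partition.
rows = 6
for _ in range(17):
    rows *= 2

print(rows, rows // 6)   # prints: 786432 131072
```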

After creating the data we report the partition names and high values in order, then drop (and purge) the partition that should hold the value 1st Sept 2023 (which will be the partition with the high_value of 1st Oct 2023) and report the partition names and high values again so that you can see the “gap” in the high values and the adjustment to the partition_position values. Here are the “before” and “after” outputs:

PARTITION_POSITION   NUM_ROWS PARTITION_NAME         HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
                 1          0 INVOICES_PAST          TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 2     131072 SYS_P39375             TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 3     131072 SYS_P39376             TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 4     131072 SYS_P39377             TO_DATE(' 2023-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 5     131072 SYS_P39378             TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 6     131072 SYS_P39379             TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 7     131072 SYS_P39380             TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

7 rows selected.



PARTITION_POSITION   NUM_ROWS PARTITION_NAME         HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
                 1          0 INVOICES_PAST          TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 2     131072 SYS_P39375             TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 3     131072 SYS_P39376             TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 4     131072 SYS_P39378             TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 5     131072 SYS_P39379             TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 6     131072 SYS_P39380             TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

6 rows selected.

Now we check the execution plan for a query that would have accessed the partition we’ve just dropped:

explain plan for
select  *
  from  invoices_partitioned
  where invoice_date = date '2023-09-01';

select  *
  from  dbms_xplan.display();


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
Plan hash value: 1148008570

---------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |   109K|  1173K|   104   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|                      |   109K|  1173K|   104   (1)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | INVOICES_PARTITIONED |   109K|  1173K|   104   (1)| 00:00:01 |     4 |     4 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The execution plan says it’s going to visit partition number 4 (pstart/pstop) – which we know definitely cannot be holding any relevant data. If this were an ordinary range-partitioned table – as opposed to interval partitioned – position 4 would be the correct partition for 1st Sept 2023, of course, but it isn’t, so it feels like the pstart/pstop ought to say something like “non-existent” and all the numeric estimates should be zero.

A quick trick for making an interval partition appear without inserting data into it is to issue a “lock table … partition for () …” statement (See footnote to this blog note). I did wonder if the attempt to explain a plan that needed a non-existent partition had actually had the same effect of making Oracle create the partition, so I ran the query against user_tab_partitions again just to check that this hadn’t happened.

So what’s going to happen at run-time: is this an example of “explain plan” telling us a story that’s not going to match what shows up in v$sql_plan (dbms_xplan.display_cursor)? Let’s run the query (with rowsource execution stats enabled) and find out:


set serveroutput off
alter session set statistics_level = all;
alter session set "_rowsource_execution_statistics"=true;

select  *
  from  invoices_partitioned
  where invoice_date = date '2023-09-01';


select  *
  from  table(dbms_xplan.display_cursor(format=>'allstats last partition'));


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  d42kw12htubhn, child number 0
-------------------------------------
select  *   from  invoices_partitioned   where invoice_date = date
'2023-09-01'

Plan hash value: 1148008570

---------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |      1 |        |       |       |      0 |00:00:00.01 |
|   1 |  PARTITION RANGE SINGLE|                      |      1 |    109K|     4 |     4 |      0 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL    | INVOICES_PARTITIONED |      0 |    109K|     4 |     4 |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

It’s the same plan with the same “wrong” partition identified, and the same estimate for rows returned – but the access never actually happened: Starts = 0 on the table access full.

My hypothesis about this misleading reporting is that Oracle knows from the table definition everything about every partition that might eventually exist – the high_value for the “anchor” partition is known and the interval is known so the appropriate partition number for any partition key value can be derived [ed: see comment #4 for details]. Then, at some point, a disconnect appears between the theoretical partition position and the set of physically instantiated partitions, so the optimizer gets the message “theoretically it’s in the 4th partition” and collects the stats from “partition_position = 4” to do the arithmetic and produce the plan. [ed: a later thought – the 109K estimate in this particularly simple example is based on {number of rows in table} / {number of distinct values in table}]
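That hypothesised arithmetic is easy to sketch (Python, illustrative only – Oracle’s internal calculation isn’t published):

```python
from datetime import date

def theoretical_position(anchor, interval_months, key):
    """Position of the interval partition that would hold `key`, counting
    the "values less than" anchor partition as position 1.
    (A sketch: it ignores day-of-month details, which is good enough for
    month-start boundaries like the ones in this example.)"""
    if key < anchor:
        return 1
    months = (key.year - anchor.year) * 12 + (key.month - anchor.month)
    return 2 + months // interval_months

# 1st Sept 2023 maps to position 4 - the Pstart/Pstop the plan reported -
# whether or not that physical partition still exists.
print(theoretical_position(date(2023, 1, 1), 3, date(2023, 9, 1)))   # 4

# And the 109K estimate looks like rows / NDV on the remaining data:
print(655_360 // 6)   # 109226
```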

Fortunately there’s some metadata somewhere that means the run-time engine doesn’t try to access the wrong partition, so this little glitch doesn’t really matter for this simple query – beyond its ability to cause a little confusion.

It’s possible, though, that this type of behaviour leaves the optimizer with another reason for getting the arithmetic wrong and picking the wrong path if you have a number of “missing” partitions in an interval partitioned table that you’re querying with a range-based predicate that crosses several (non-existent) partitions. So treat this as a warning/clue if you recognise that pattern in any of your partitioned tables.

Disable oradebug

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 10:43 am GMT Mar 12,2024
Tags: ,

Here’s a little detail that I discovered recently when I wanted to do something a little exotic in 23c on the official Virtualbox VM. There’s been a hidden parameter to disable oradebug since (at least) 12.2.0.1. The clue is in the name:

_disable_oradebug_commands

The valid values for this parameter are none, restricted, and all. The default value in 12c, 18c, and 19c is none; the default value in 21.3 and 23.3 is restricted. This means some of the investigative code you may have used in the earlier versions may produce an error in the newer versions. To change the value you have to restart the database. Here are a couple of the error messages you might see:

_disable_oradebug_commands=all

SQL> oradebug setmypid
ORA-32519: insufficient privileges to execute ORADEBUG command: execution of ORADEBUG commands is disabled for this instance
SQL> 


_disable_oradebug_commands=restricted

SQL> oradebug setmypid
Statement processed.

SQL> oradebug peek 0x000000008E65E1D8 16
[08E65E1D8, 08E65E1E8) = 99FC0501 00000004 8E519690 00000000

SQL> oradebug call keiutFlush
ORA-32519: insufficient privileges to execute ORADEBUG command: execution of ORADEBUG commands is disabled for this instance

SQL> 


The “restricted” example above is the one that led me to the parameter when I was testing 23c. However, setting the value to “none” in the spfile and restarting the instance didn’t help. This is what I saw when I tried running my code from a PDB:

SQL> oradebug setmypid
Statement processed.
SQL> oradebug call keiutFlush
ORA-32519: insufficient privileges to execute ORADEBUG command: OS debugger privileges required for client
SQL> 

I have no idea how to give myself OS debugger privileges. It’s a nuisance, but since I’m running the whole thing from the server and have to have at least two sessions active for the testing I’m doing, it’s not a big issue to have one more session running from the oracle (OS) account, connected to the cdb$root, to call the flush command.

Footnote

To check for the list of valid values for string parameters, you can query v$parameter_valid_values, but that view won’t show you the “underscore” parameters (the commented where clause above is the text in v$fixed_view_definition that allows Oracle to hide the hidden parameter). To see the valid values for the hidden parameters you need to access the x$ structure underneath the v$, and I wrote a little script (that has to be run by sys) to do that a long time ago.

November 20, 2023

Session Activity Stats

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 3:35 pm GMT Nov 20,2023

A little detail to remember when trouble-shooting at the session level – some of the information summarised in the Instance Activity figures (v$sysstat) is not available in the Session Activity figures (v$sesstat / v$mystat). The difference goes right down to the x$ objects, and here are two versions of a simple query I wrote for 19c to check for the missing statistics (you’ll have to be able to connect as SYS to get results from the first one):

rem
rem     Script:         mystat_missing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem     Purpose:        
rem
rem     Last tested 
rem             23.3.0.0
rem             19.11.0.0
rem             12.2.0.1

spool mystat_missing

set linesize 132
set pagesize 60
set trimspool on
set tab off


select 
        usd.indx, usd.ksusdnam, usd.ksusdcls
from 
        x$ksusd usd
where
        usd.indx not in (
                select  sta.indx 
                from    x$ksumysta      sta
        )
/

select
        sys.statistic#, sys.name, sys.class
from
        v$sysstat sys
where
        sys.statistic# not in (
                select  mst.statistic# 
                from    v$mystat        mst
        )
/

spool off

If you run the script against 12.2.0.1 you’ll find that there are no “missing” session stats, but when you upgrade to 19c (in my case 19.11.0.0) you’ll find a couple of dozen statistics reported. This was the output I got:

STATISTIC# NAME                                                                  CLASS
---------- ---------------------------------------------------------------- ----------
      2010 Instance Statistic test case                                            384
      2011 cell RDMA reads unused                                                  264
      2012 cell RDMA writes                                                        264
      2013 cell pmem cache read hits unused                                        264
      2014 cell pmem cache writes                                                  264
      2015 NS transactions                                                         257
      2016 NS transactions timed out                                               257
      2017 NS transactions interrupted                                             257
      2018 NS transactions not using all standbys                                  257
      2019 NS transactions skipping send                                           257
      2020 NS transaction setups                                                   257
      2021 NS transaction bytes loaded                                             320
      2022 NS transaction bytes to network                                         320
      2023 NS transaction bytes relogged                                           322
      2024 NS transaction bytes logged                                             322
      2025 NS transaction send time                                                320
      2026 NS transaction setup time                                               320
      2027 NS transaction confirm time                                             320
      2028 NS recovery timeout interrupts                                          384
      2029 NS recovery DTC full interrupts                                         384
      2030 NS recovery fetch requests made                                         384
      2031 NS recovery fetch ranges received                                       384
      2032 NS recovery fetch requested bytes                                       384
      2033 NS recovery fetch received bytes                                        384
      2034 NS recovery fetch received time                                         384
      2035 NS recovery fetch requests remade                                       384

26 rows selected.

Running the query against 23c Free (23.3) on the Developer VM created by Oracle, the number of “missing” statistics jumped to 1,052 – so I won’t list them. Given the size of the list I did a quick check to remind myself of how many statistics were actually present in v$sysstat, and that produced an interesting comparison.

--
--      19.11.0.0
--
SQL> select count(*) ct_19c from v$sysstat;

    CT_19C
----------
      2036

--
--      23.3.0.0 (on premises)
--
SQL> select count(*) ct_23c from v$sysstat;

    CT_23C
----------
      2661

So there are 600-ish extra stats available in 23c, but 1,000-ish stats that don’t appear at the session level. This means that if you’ve been taking advantage of some of the “new” 19c stats to help with troubleshooting you may find that they disappear on the upgrade. More on that later.

If you look at the output I’ve listed above you might spot that all the missing stats satisfy the predicate “class >= 256”. In fact, the class is a bit vector, and a more appropriate predicate would be: “bitand(class,256) = 256”. Either option gives you a fairly lazy way to do any checks you might be interested in. For example, after setting up a database link from a 19c instance to a 23c instance, I ran the following query to find out how many statistics that were visible in the 19c v$sesstat had changed their class to become invisible in the 23c v$sesstat.

select name from v$sysstat where class < 256
intersect
select name from V$sysstat@oracle23 where class >= 256
/


NAME
----------------------------------------------------------------
...
SMON posted for dropping temp segment
SMON posted for instance recovery
SMON posted for txn recovery for other instances
SMON posted for undo segment recovery
SMON posted for undo segment shrink
TBS Extension: bytes extended
TBS Extension: files extended
TBS Extension: tasks created
TBS Extension: tasks executed
...
commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanout failures: delayed log
commit cleanout failures: flashback
commit cleanout failures: hot backup in progress
commit cleanout failures: write disabled
...
db corrupt blocks detected
db corrupt blocks recovered
...


502 rows selected.

Of the 502 stats a very large fraction were about In Memory (prefixed IM), and there were a number that looked as if they were only relevant to background processes and therefore (to some extent, possibly) not of interest when debugging user activity. It’s also possible that some of the statistics would fall into different classes if certain features (e.g. hybrid columnar compression) were linked in to the executable.
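
Since the class column is a bit vector, the safest membership test is a bitwise AND rather than a magnitude comparison. Here’s a minimal Python sketch of the idea – the class values below are invented for illustration, and the magnitude test happens to agree only because 256 is the highest class bit on show in my listings:

```python
HIDDEN = 256   # the class bit that appears to hide a stat from v$sesstat

def is_hidden(stat_class: int) -> bool:
    # equivalent to the SQL predicate: bitand(class, 256) = 256
    return stat_class & HIDDEN == HIDDEN

# a stat can belong to several classes at once, e.g. 1 (user) + 256 = 257
print(is_hidden(257))   # True
print(is_hidden(256))   # True
print(is_hidden(255))   # False - every bit below 256 set, but not bit 256
```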

Another query that might be of interest is a typical “what’s changed” query. What statistics are available in 23c that aren’t in 19c (and vice versa):

select
        name, version
from    (
        select name, '23c' version from v$sysstat@oracle23
        minus
        select name, '23c' from V$sysstat
        union all
        select name, '19c' version from v$sysstat
        minus
        select name, '19c' from V$sysstat@oracle23
        )
order by
        version desc, name
/

Again there are so many that I don’t think it would be helpful to reproduce my results, but I’ll just say that there were 663 stats in 23c that weren’t in 19c, and 38 stats in 19c that weren’t in 23c. Of course, it’s possible (and I didn’t check carefully) that some of these new/”disappearing” statistics show up only because they’ve had a name change as the version changed.
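
The minus/union all construct is, of course, just a two-way set difference; for anyone who prefers to think in sets, here’s the same logic in Python (the statistic names, apart from “user commits PL/SQL”, are invented for illustration):

```python
# Two-way set difference, mirroring the minus / union all query above.
stats_19c = {"user commits", "user rollbacks", "some stat renamed in 23c"}
stats_23c = {"user commits", "user rollbacks", "user commits PL/SQL"}

new_in_23c  = stats_23c - stats_19c   # the first "minus" branch of the query
gone_in_23c = stats_19c - stats_23c   # the second "minus" branch

print(sorted(new_in_23c))    # ['user commits PL/SQL']
print(sorted(gone_in_23c))   # ['some stat renamed in 23c']
```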

A couple of the new 23c stats that I like the look of (and would like to see at the session level) are:

user commits PL/SQL
user commits Autonomous

I’m sure that there are more that will reveal themselves as (occasionally) useful over time, and further investigation is left to the reader.

October 27, 2023

SQL_trace 23

Filed under: Execution plans,Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 12:46 pm BST Oct 27,2023

That’s not the 23rd article I’ve written on sql_trace, it’s just that there’s a little “catch-up” detail about sql_trace that I recently discovered in 23.2 (though it might also be working in 21c or even in recent versions of 19c).

Although levels 4 and 8 (binds and waits) of the SQL trace event are commonly known, and the more modern style of calls to sql_trace with the added plan_stat option is fairly well known, it’s only in 23c (possibly 21c) that all the plan_stat options get implemented in dbms_session and dbms_monitor. From some time back in the 11g timeline we’ve been able to execute calls like:

alter system set events '
        sql_trace[SQL:9tz4qu4rj9rdp]
        bind=false,
        wait=true,
        plan_stat=all_executions
'
;

Taking advantage of the “oradebug doc” command (which can only be run by a highly privileged user), you could discover the available options for the plan_stat:

SQL> oradebug doc event name sql_trace

sql_trace: event for sql trace

Usage
-------
sql_trace
   wait       < false | true >,
   bind       < false | true >,
   plan_stat  < never | first_execution | all_executions | adaptive >,
   level      <ub4>

The default value for plan_stat is first_execution, which is usually adequate, but if you were trying to find out why a particular query sometimes runs very slowly compared to usual you might want to set it to all_executions. If the query executes extremely frequently, though, and produces a fairly lengthy execution plan, you might decide to set the value to adaptive (see end notes), which ought to limit the frequency with which the execution plan is dumped into the trace file.

Note: the threat of very long plans for very quick executions of an extremely popular statement comes from two directions – the extra “execution” time to get the plan written to the trace file, and the total size of all the files dumped to the trace directory. Depending on your requirements you might be able to use the “UTS trace” setup to limit the volume kept on disc.

If you check the packages dbms_session and dbms_monitor you will find the following procedures:

SQL> desc dbms_session
...
PROCEDURE SESSION_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT
...

SQL> desc dbms_monitor
...
PROCEDURE CLIENT_ID_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 CLIENT_ID                      VARCHAR2                IN
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE DATABASE_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 INSTANCE_NAME                  VARCHAR2                IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE SERV_MOD_ACT_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SERVICE_NAME                   VARCHAR2                IN
 MODULE_NAME                    VARCHAR2                IN     DEFAULT
 ACTION_NAME                    VARCHAR2                IN     DEFAULT
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 INSTANCE_NAME                  VARCHAR2                IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE SESSION_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSION_ID                     BINARY_INTEGER          IN     DEFAULT
 SERIAL_NUM                     BINARY_INTEGER          IN     DEFAULT
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT
...

Chris Antognini wrote an article some years ago listing the “10046 trace” levels and the relationships between the available numeric levels and the named parameter format (adaptive = 64). In the note he pointed out that adaptive was not available as a plan_stat value in dbms_session and dbms_monitor, and that Enhancement Request #14499199 had been raised to correct this omission. Since then I’ve run a little script from time to time to check if the ER has been fulfilled:

column  sid     new_value my_sid
column  serial# new_value my_serial

select  sid, serial#
from    v$session       ses
where   ses.sid = (
                select  ms.sid
                from    v$mystat ms
                where   rownum = 1
        )
;

execute dbms_monitor.session_trace_enable( &my_sid, &my_serial, waits=>false, binds=>false, plan_stat=>'adaptive')

execute dbms_session.session_trace_enable(waits=>false, binds=>false, plan_stat=>'adaptive');

execute dbms_monitor.session_trace_disable( &my_sid, &my_serial)
execute dbms_session.session_trace_disable;

This depends on the user having suitable privileges on the packages and on a couple of dynamic performance views, and all that happens is that it succeeds (or fails) to enable tracing. Here’s the output from 19.11.0.0 of the two calls to enable:

BEGIN dbms_monitor.session_trace_enable(         40,      56799, waits=>false, binds=>false, plan_stat=>'adaptive'); END;

*
ERROR at line 1:
ORA-30449: syntax error in parameter
ORA-06512: at "SYS.DBMS_MONITOR", line 123
ORA-06512: at line 1


BEGIN dbms_session.session_trace_enable(waits=>false, binds=>false, plan_stat=>'adaptive'); END;

*
ERROR at line 1:
ORA-30449: syntax error in parameter
ORA-06512: at "SYS.DBMS_SESSION", line 343
ORA-06512: at line 1

Then on the upgrade to 23.2.0.0, the boring but satisfying:

PL/SQL procedure successfully completed.

PL/SQL procedure successfully completed.

Readers with versions in the gap between 19.11.0.0 and 23.2.0.0 are invited to run a check and report the results in the comments to narrow down when the enhancement became available. [Update: see comments #1 and #2 – versions up to 19.21 show the same error; and comment #5 shows the same error in 21.3]

plan_stat=adaptive

There are a couple of notes on MOS about the meaning/effect of this setting. They’re not 100% consistent with each other.

Doc ID 21154.1 says:

This dumps the STAT information if a SQL took more than about 1 minute thereby giving information for the more expensive SQLs and for different executions of such SQLs.

Doc ID 8328200 (relating to Bug 8328200) says:

“adaptive” mode dumps STAT lines in SQL trace for every minute of dbtime per shared cursor.

As you can see, you can read the two statements as trying to say the same thing, but there is some ambiguity in both, and some need for clarification of terms. So I’ve done a couple of simple experiments – basically running a PL/SQL loop that executes the same statement 120 times, where each execution takes a little less than 3 seconds to complete.

Cutting the top and tail from each trace file left me with 120 “FETCH” lines; a few of these lines were immediately followed by STAT lines, but most were followed by CLOSE lines with no STAT lines reported for the execution. Here are the first few lines of the results from a trace file generated by 23.2 from a call to: “grep -A+1 FETCH {tracefile}”:

FETCH #140175454862744:c=2737978,e=2814244,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36476287935
STAT #140175454862744 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=277822 pr=0 pw=0 str=1 time=2814245 dr=0 dw=0 us)'
--
FETCH #140175454862744:c=2758633,e=2885235,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36479173822
CLOSE #140175454862744:c=2,e=1,dep=1,type=3,tim=36479173972
--

Piping this into “grep -B+1 STAT” extracted just the FETCH/STAT pairs, which I could then pass through “grep FETCH” to get the output that I wanted to see: the “tim=” values for only those FETCH calls that were followed by STAT lines:

grep -A+1 FETCH {tracefile}  | grep -B+1 STAT  |  grep FETCH
 
FETCH #140175454862744:c=2737978,e=2814244,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36476287935
FETCH #140175454862744:c=2716296,e=2782323,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36541290925
FETCH #140175454862744:c=2804165,e=2916326,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36610338064
FETCH #140175454862744:c=2677000,e=2737197,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36676533249
FETCH #140175454862744:c=2722436,e=2796966,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36744048403
FETCH #140175454862744:c=2725066,e=2792661,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36811536449

If you do the arithmetic with the tim= values you’ll find that the time between the executions that reported their execution plan in STAT lines varies from about 65 seconds to 69 seconds. The results from 19.11 were similar, varying from 64.9 seconds to 67.9 seconds. So the dumping of STAT lines is dependent on spending (approximately) one minute of dbtime in execution (as both MOS Documents indicate), but it’s once every minute, and it’s time that can be accumulated over many executions of the statement. Part of the deviation from one minute comes, of course, from the fact that the STAT lines are dumped only as the cursor closes.
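
For the record, this is the arithmetic on the six tim= values above (tim= is reported in microseconds):

```python
# tim= values (microseconds) from the FETCH lines followed by STAT lines
tims = [36476287935, 36541290925, 36610338064,
        36676533249, 36744048403, 36811536449]

gaps = [(b - a) / 1e6 for a, b in zip(tims, tims[1:])]   # gaps in seconds
print([round(g, 1) for g in gaps])   # [65.0, 69.0, 66.2, 67.5, 67.5]
```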

The other detail that needs investigation is the interpretation of the clause “per shared cursor”. If I ran my test from two separate sessions concurrently would I be accumulating “db minutes” twice as fast and dumping STAT lines twice as frequently? The answer was yes – provided the two sessions acquired the same child cursor in v$sql.

To demonstrate this, I ran two concurrent copies of the PL/SQL loop, starting one session about 65 seconds after the other. My first trace file showed its second set of STAT lines after about 63 seconds then roughly every 33 seconds, and the second trace file (starting about one minute late) showed its second set of STAT lines after about 33 seconds, repeating that interval between dumps until nearly the end of its run when the last gap (after the first session had finished) stretched to 50 seconds.
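
To make the accumulation idea concrete, here’s a toy model in Python – my guess at the mechanism, not Oracle’s actual code. Each execution adds its db time to a counter on the shared cursor, and once the counter has reached 60 seconds the STAT lines are dumped at the cursor close (the detail of resetting the counter to zero after a dump is also a guess):

```python
def dump_points(exec_time, n_execs, sessions=1, threshold=60.0):
    """Return the indices (in arrival order across all sessions) of the
    executions that would dump STAT lines, assuming each execution takes
    exec_time seconds of db time against one shared child cursor."""
    dumps, acc = [], 0.0
    for i in range(n_execs * sessions):
        acc += exec_time              # db time added to the shared cursor
        if acc >= threshold:          # another minute of db time accumulated
            dumps.append(i)
            acc = 0.0                 # the reset rule is an assumption
    return dumps

one = dump_points(2.8, 120)              # dumps every 22nd execution (61.6s)
two = dump_points(2.8, 120, sessions=2)  # shared cursor: twice as many dumps
print(len(one), len(two))                # 5 10
```

With two sessions sharing the child cursor the db time accumulates twice as fast per unit of wall-clock time, so the wall-clock gap between dumps roughly halves – consistent with the ~33 second intervals reported above.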

Is it good enough?

Maybe, maybe not. It gives you a sample of execution plans for queries that accumulate “a lot” of time while using a particular child cursor, and that might be sufficient for some purposes.

If you have a frequently executed query that usually takes a couple of seconds but occasionally takes 90 seconds then that 90 second execution will show up in the trace file, either because it’s a different plan, and gets dumped on the first execution, or because it will have increased the shared execution time by more than 60 seconds.

If you have a query that executes extremely frequently and takes a couple of hundredths of a second each time but occasionally takes 5 seconds, the adaptive option may not help. As with the 90 second/change of plan case you’ll see the plan if the plan changes; but if it doesn’t change, all the option does is improve your chances of seeing the stats of that one slow execution – it has to take place in the little window where its execution time just tips the running total over the next 60 second limit.

August 19, 2023

force_match patch

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 2:56 pm BST Aug 19,2023

A recent post on one of the forums (needs a MOS account) asked if it was possible to change a batch of SQL Plan Profiles from “exact match” to “force match” without going through the hassle of dropping and recreating them. This took me back a couple of years to an occasion when I was asked if it was possible to make an SQL Patch “force match” – the answer was “yes” and the method is pretty simple (once you’ve done the research). There are two methods – the dangerous one and the “safe” one, but both are hacks and there’s no reason for Oracle Corp. to support you if either goes wrong.

Background

When you create an SQL Patch Oracle calls the same PL/SQL code that it uses to create SQL Plan Profiles, so the method applies to both features.

  • Patches and profiles are referenced through an SQL_Handle, but the handle is derived from a signature, and the signature comes from v$sql which holds two columns exact_matching_signature and force_matching_signature,
  • All the components of the profile or patch are keyed through the signature – and if a profile is set to force_match Oracle uses the force_matching_signature as the signature, otherwise it uses the exact_matching_signature. Patches, of course, only ever use the exact_matching_signature.
  • The SQL_handle consists of a text prefix attached to the “rawtohex()” representation of the signature, viz:
sql_handle = 'SQL_' || to_char(signature,'fm0xxxxxxxxxxxxxxx')
  • Finally, for SQL Plan Profiles there is a flag on the profile “header” record that tells Oracle whether it is exact matching or force matching.
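
The handle arithmetic is easy to check outside the database: it’s just the signature rendered as 16 zero-padded lowercase hex digits with an ‘SQL_’ prefix. A quick Python verification, using the two signature/handle pairs that appear in the v$sql query in the demonstration below:

```python
def sql_handle(signature: int) -> str:
    # equivalent to: 'SQL_' || to_char(signature, 'fm0xxxxxxxxxxxxxxx')
    return f"SQL_{signature:016x}"

# signature / handle pairs taken from the v$sql query in the demonstration
print(sql_handle(14028721741576327483))   # SQL_c2b00818bb74d13b
print(sql_handle(2882966267652493885))    # SQL_28025a8bbadc523d
```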

So, to convert an SQL Plan Profile or an SQL Patch from exact matching to force matching all you have to do is find the two signatures for the statement, then update the data dictionary to change the SQL_Handle and flag on the header, and change the signature on all the connected tables!

If you’re not keen on updating the data dictionary directly there’s a “safer” option.

  • create the appropriate staging table (see dbms_sqldiag (patches) or dbms_spm (profiles))
  • pack (export to) the staging table
  • update the staging table with relevant signature, handle and flag values.
  • unpack (import from) the staging table (the “replace” option doesn’t seem to work correctly for SQL Patches so you’ll have to delete the existing SQL Patches)

Demonstration (sql_patch)

rem
rem     Script:         patch_hack_force_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct  2021 / Nov 2022
rem

create table t1 as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4    -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(n1, n2);

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format=>'-hint_report'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------

SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

Plan hash value: 2432955788

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=15 AND "N2"=15)


I’ve started by creating an indexed table and executing a query that clearly ought to use the index. I’ve flushed the shared_pool [see warning] before running the query and reported the actual plan with the “-hint_report” format so that I don’t get a complaint about my “find” and “this” hints.

column exact_signature new_value m_exact_signature
column force_signature new_value m_force_signature

column exact_handle new_value m_exact_handle
column force_handle new_value m_force_handle

column sql_id new_value m_sql_id

select
        sql_id, child_number,
        to_char(exact_matching_signature,'999999999999999999999') exact_signature,
        'SQL_' || to_char(exact_matching_signature,'fm0xxxxxxxxxxxxxxx') exact_handle,
        to_char(force_matching_signature,'999999999999999999999') force_signature,
        'SQL_' || to_char(force_matching_signature,'fm0xxxxxxxxxxxxxxx') force_handle
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

SQL_ID        CHILD_NUMBER EXACT_SIGNATURE        EXACT_HANDLE          FORCE_SIGNATURE        FORCE_HANDLE
------------- ------------ ---------------------- --------------------- ---------------------- ---------------------
ayxnhrqzd38g3            0   14028721741576327483 SQL_c2b00818bb74d13b     2882966267652493885 SQL_28025a8bbadc523d


After checking the plan (I have plenty of “prompt” comments and “pause” lines in the full text) the next step is to pick out the sql_id and two signatures from v$sql – reporting the signatures both in their original numeric form and after “conversion” to SQL Handle form. Then I create a patch; and the best patch I can make to demonstrate the point is one that forces the optimizer into a full tablescan. You’ll notice that I’ve started by dropping the patch with a “when others then null” exception – that’s just a lazy bit of code to deal with the patch pre-existing if I’ve crashed the program on a previous run.

set serveroutput on

begin
        begin
                dbms_sqldiag.drop_sql_patch('patch_hack_force');
        exception
                when others then null;
        end;

        dbms_output.put_Line(
                'Patch result is: ' ||
                dbms_sqldiag.create_sql_patch(
                        sql_id  => '&m_sql_id',
                        name    => 'patch_hack_force',
                        hint_text => 'full(t1@sel$1)',
                        validate => true
                )
        );

end;
/

Patch result is: patch_hack_force

PL/SQL procedure successfully completed.

So now we need to see what Oracle has stored as the patch, and whether it works.

column signature format 999999999999999999999

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

FOR              SIGNATURE SQL_TEXT
--- ---------------------- --------------------------------------------------------------------------------
NO    14028721741576327483 select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

1 row selected.

So we have an SQL Patch in the data dictionary; it’s not “force matching”, and its signature does match the value we reported above as the “exact signature”. Now we can test the effects of the patch – starting with a flush of the shared_pool [see warning] so that we don’t pick up the previous execution plan by accident.

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

We re-execute the original query to see if the plan changes and the SQL Patch is reported, then execute a “similar enough” query to see if the patch has any effect on it:

  COUNT(*)
----------
         1

1 row selected.

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    25 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("N1"=15 AND "N2"=15))

Note
-----
   - SQL patch "patch_hack_force" used for this statement


23 rows selected.


  COUNT(*)
----------
         1

1 row selected.


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16

Plan hash value: 2432955788

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N1"=16 AND "N2"=16)

As you can see the plan for the original query is now using a full tablescan and the SQL Patch is reported as used; but only for an exact matching query. The extremely similar query where the 15’s have been changed to 16’s still uses an index range scan and (trust me) didn’t have any note mentioning the SQL Patch.

Time for hacking

execute dbms_sqldiag.create_stgtab_sqlpatch(table_name         => 'my_sql_patches')
execute dbms_sqldiag.pack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force')

select signature, sql_handle, sqlflags from my_sql_patches;

             SIGNATURE SQL_HANDLE                       SQLFLAGS
---------------------- ------------------------------ ----------
  14028721741576327483 SQL_c2b00818bb74d13b                    0

update my_sql_patches set
        signature = &m_force_signature,
        sql_handle = '&m_force_handle',
        sqlflags = 1
where
        signature = &m_exact_signature
and     sql_handle = '&m_exact_handle'
;

1 row updated.

You’ll notice that I’ve been cautious with my update to the staging table – I’ve updated it only if the signature and sql_handle both match the values I extracted with my query against v$sql above.

So now we need to import the contents of the staging table and overwrite the original SQL Patch. Unfortunately for patches it seems as if you have to drop the original – using the “replace” option without first dropping it resulted in error ORA-13841:

ORA-13841: SQL profile named patch_hack_force already exists for a different signature/category pair

You might note particularly that this refers to an SQL profile, not an SQL patch. Just one of several indications that the SQL Profile and SQL Patch code have a significant overlap.

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
end;
/

execute dbms_sqldiag.unpack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force', replace=>true)

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

FOR              SIGNATURE SQL_TEXT
--- ---------------------- --------------------------------------------------------------------------------
YES    2882966267652493885 select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

After the unpack you see that the patch is now force matching, and its signature matches the force_matching_signature from the original query against v$sql above. The only thing left to do is check whether the “similar enough” query now uses the same SQL Patch (flushing the shared_pool [see warning] first to avoid confusion):

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

  COUNT(*)
----------
         1

1 row selected.


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    25 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("N1"=16 AND "N2"=16))

Note
-----
   - SQL patch "patch_hack_force" used for this statement

Footnotes

It’s worth reminding everyone that force-matching doesn’t take place for statements that have a mixture of bind variables and literals (even if the bind variables are only in the select list) until 19.12.

You need various privileges to run through the example I’ve shown. To make sure everything worked as expected I dropped and created a specific user to run the tests each time. Here’s the script I used to do this:

drop user patch_hacker cascade;

create user patch_hacker identified by patch_hacker;

grant create session to patch_hacker;
grant create table to patch_hacker;

alter user patch_hacker default tablespace test_8k_assm;
alter user patch_hacker quota unlimited on test_8k_assm;

grant execute on dbms_sqldiag to patch_hacker;
grant administer sql management object to patch_hacker;

grant select on dba_sql_patches to patch_hacker;

grant execute on dbms_xplan to patch_hacker;
grant select on v_$sql to patch_hacker;
grant select on v_$session to patch_hacker;
grant select on v_$sql_plan to patch_hacker;
grant select on v_$sql_plan_statistics_all to patch_hacker;

grant alter session to patch_hacker;
grant alter system to patch_hacker;

The privileges to select on the various v$ (v_$) views are necessary to take full advantage of dbms_xplan.display_cursor(). The alter system/session privileges were for testing only (enabling trace, flushing the shared pool). And, of course, you’ll need to pick your own tablespace for setting a quota and making default.

The only thing that worries me about using this strategy is the privilege on “administer sql management object” – is this something that requires the diagnostic and performance packs?

For convenience, the following drop-down includes the complete SQL script, with pauses, comments, prompts, etc. (and a call to setenv.sql which simply sets up a load of column and format definitions, purges the recyclebin, and does general tidying).

rem
rem     Script:         patch_hack_force_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2021 / Nov 2022
rem     Purpose:        
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem     Not tested
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1
rem
rem     Notes:
rem     How to hack an SQL Patch to be "force matching"
rem
rem     See also:
rem     https://jonathanlewis.wordpress.com/2023/08/19/force_match-patch/
rem     https://community.oracle.com/mosc/discussion/4547670/change-existing-sql-profile-s-to-force-match-true-without-recreating
rem

drop table t1;
drop table my_sql_patches;

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
--        when others then raise;
end;
/

@@setup

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4    -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(n1, n2);

spool patch_hack_force_2a.lst

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format=>'-hint_report'));

-- pause Press return for sql_id etc.

column exact_signature new_value m_exact_signature
column force_signature new_value m_force_signature

column exact_handle new_value m_exact_handle
column force_handle new_value m_force_handle

column sql_id new_value m_sql_id

select
        sql_id, child_number,
        to_char(exact_matching_signature,'999999999999999999999') exact_signature,
        'SQL_' || to_char(exact_matching_signature,'fm0xxxxxxxxxxxxxxx') exact_handle,
        to_char(force_matching_signature,'999999999999999999999') force_signature,
        'SQL_' || to_char(force_matching_signature,'fm0xxxxxxxxxxxxxxx') force_handle
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

-- pause press return

set serveroutput on

begin
        begin
                dbms_sqldiag.drop_sql_patch('patch_hack_force');
        exception
                when others then null;
        end;

        dbms_output.put_Line(
                'Patch result is: ' ||
                dbms_sqldiag.create_sql_patch(
                        sql_id  => '&m_sql_id',
                        name    => 'patch_hack_force',
                        hint_text => 'full(t1@sel$1)',
                        validate => true
                )
        );

end;
/

column signature format 999999999999999999999

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

pause Did we create a patch ?

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

prompt  ==============================
prompt  Patch NOT used for 16/16 query
prompt  ==============================

pause press return

prompt  About to create patch table
execute dbms_sqldiag.create_stgtab_sqlpatch(table_name         => 'my_sql_patches')

prompt  About to pack patch table
execute dbms_sqldiag.pack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force')


prompt  ===============================================
prompt  check and update the patch in the staging table
prompt  ===============================================

select signature, sql_handle, sqlflags from my_sql_patches;

-- set verify on

update my_sql_patches set
        signature = &m_force_signature,
        sql_handle = '&m_force_handle',
        sqlflags = 1
where
        signature = &m_exact_signature
and     sql_handle = '&m_exact_handle'
;

commit;

prompt  =================================================
prompt  Dropping and replacing patch that has been hacked
prompt  =================================================

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
end;
/

execute dbms_sqldiag.unpack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force', replace=>true)

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

pause   Is the patch now force matching?

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

prompt  ==============================
prompt  Patch now used for 16/16 query
prompt  ==============================

drop table my_sql_patches purge;

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
end;
/


spool off

set echo off
set autotrace off
set serveroutput on

alter session set events '10046 trace name context off';
alter session set statistics_level = typical;
alter session set tracefile_identifier = '';

begin
        dbms_application_info.set_module(
                module_name     => null,
                action_name     => null
        );

        dbms_application_info.set_client_info(
                client_info     => null
        );
end;
/


set doc off
doc


#

Warning

A reader sent me a private email about this post, highlighting the dangers of flushing the shared_pool on a production system, and the potential for massive performance problems over an extended period as all the production SQL was re-optimised – with all the threats of dynamic sampling, statistics feedback, inflection point derivation etc. The “flush shared_pool” calls in this note are for testing and demonstration purposes only to avoid any possibility of being fooled by results which were (e.g.) the side effect of cached cursors.

Do not test on production, do not flush the shared_pool on production (or even on a shared dev/test/uat where other people could get very upset) unless you have a very good reason to do so. An alternative in this case would be to gather stats on the table (which no-one else should be using) with “no_invalidate=>false”. Again, though, on a shared system this could have side effects since all the cursors relating to that table will be invalidated. [Ed: see comment #3 below – reader izp offers an alternative for invalidating all SQL relating to this table by issuing a “zero-impact” DDL on it – add a comment to it.]
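The “zero-impact” DDL approach mentioned above can be as simple as re-stating the comment on the table – a sketch (the comment text itself is irrelevant; it’s the fact that it is DDL against the table that invalidates the dependent cursors):

```sql
-- Re-issuing a table comment is DDL on the table, so it invalidates
-- dependent cursors without touching the data or the statistics.
comment on table t1 is 'cursor invalidation marker';
```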

I had considered purging all the cursors (dbms_shared_pool.purge()) that might be affected by the patch (both before and after) by using a pl/sql loop based on the force_matching_signature; but my correspondent pointed out that this wouldn’t purge cursors that were executing, and didn’t invalidate them either. However, this prompted me to test what happened at two particular points in the code, with the following results:

  • At the “drop_sql_patch()” the exact_matching_signature cursor was invalidated
  • At the call to unpack the staging table any force_matching_signature cursors were invalidated.
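For reference, the purge loop I had considered would have looked something like this – a sketch only, and as noted above it wouldn’t touch cursors that were currently executing (&m_force_signature is the substitution variable captured earlier in the script):

```sql
-- Sketch: purge every parent cursor sharing a given force_matching_signature.
-- dbms_sqldiag.purge() takes "address,hash_value" and 'C' for cursors.
begin
        for r in (
                select  address, hash_value
                from    v$sqlarea
                where   force_matching_signature = &m_force_signature
        ) loop
                dbms_shared_pool.purge(r.address || ',' || r.hash_value, 'C');
        end loop;
end;
/
```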

This removes the need for any flushing of the shared_pool in your production systems – but it’s worth thinking about the edge cases (and I’ve seen a few) where a very large number of similar cursors are generated dynamically and all need to be invalidated. The contents of v$sql are not “indexed” by either signature, so the call to unpack will require at least one “full scan” of the library cache, and invalidating a large number of cursors may produce some shared pool contention.

July 13, 2023

Parallel DML

Filed under: Oracle,Parallel Execution,Troubleshooting — Jonathan Lewis @ 5:35 pm BST Jul 13,2023

Here’s a note that was prompted by a report about a delete that started up lots of PX processes but apparently did a serial delete even though parallel DML had been forced (alter session force parallel dml parallel 10). The description of “serial” was subsequently refined to explain that “only one of the 48 spawned sessions was doing all the work / IO etc.”, so this was not a case of Oracle deliberately disabling parallel DML and leaving the QC to do the work.

Unfortunately the OP didn’t supply an execution plan (despite 3 people suggesting that this would be helpful), so I’ve dug out an old script that might explain the observed behaviour. It’s based on a simple copy of data from the view all_objects and tries to delete a large fraction of that data using a simple parallel delete statement.

rem
rem     Script:         para_dml_lock.sql
rem     Dated:          June 2010
rem     Author:         Jonathan Lewis
rem

create table t1 as select * from all_objects;
-- create index t1_i1 on t1(owner, object_id, object_name);
-- create index t1_i2 on t1(object_name, owner, object_id);

alter system flush buffer_cache;

spool para_dml_lock

set linesize 180
set pagesize 60
set trimspool on
set serveroutput off

alter session force parallel dml;
alter session set statistics_level = all;
-- alter session set events '10046 trace name context forever, level 8';

delete
         /*+ 
                parallel(t1,2)
                gather_plan_statistics 
        */
from    t1 
where   owner = 'SYS'
;

select * from table(dbms_xplan.display_cursor(format=>'parallel allstats'));

break on  qcsid skip 1 on server_group on server_set on server# on sid skip 1

select
        ses.qcsid,
        ses.server_group,
        ses.server_set,
        ses.server#,
        lck.sid,
        lck.type,
        lck.id1, lck.id2,
        lck.lmode, lck.request,
        lck.ctime, lck.block
from
        V$lock          lck,
        v$px_session    ses
where
        ses.sid = lck.sid
-- and     lck.sid in (select sid from v$session where username = 'TEST_USER')
order by
        case when ses.qcsid = ses.sid then 0 else 1 end,
        ses.qcsid,ses.server_group, ses.server_set, ses.server#,
        lck.sid, lck.type desc , lck.id1, lck.id2
;

spool off

set serveroutput on
alter session set events '10046 trace name context off';
alter session set statistics_level = typical;

You’ll see that I’ve allowed for code to create a couple of indexes as that may affect the way that Oracle behaves; I’ve also enabled and reported rowsource execution statistics so that I can get an idea of the volume of data moving through each process. The query against v$lock and v$px_session is the reason I wrote the script 13 years ago – there are some unexpected locking details when parallel DML is involved, but I won’t be looking at those in this note.

The Tablescan path

Here’s the execution plan from 19.11.0.0 when there are no indexes on the table. (The table has about 64,000 rows spread over nearly 1,300 blocks):

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |          |      1 |        |        |      |            |      4 |00:00:00.49 |      13 |      3 |
|   1 |  PX COORDINATOR       |          |      1 |        |        |      |            |      4 |00:00:00.49 |      13 |      3 |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |      0 |   2465 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
|   3 |    DELETE             | T1       |      2 |        |  Q1,00 | PCWP |            |      0 |00:00:00.94 |   54061 |   1288 |
|   4 |     PX BLOCK ITERATOR |          |      2 |   2465 |  Q1,00 | PCWC |            |  44733 |00:00:00.14 |    1345 |   1267 |
|*  5 |      TABLE ACCESS FULL| T1       |     26 |   2465 |  Q1,00 | PCWP |            |  44733 |00:00:00.05 |    1345 |   1267 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER"='SYS')

Note
-----
   - Degree of Parallelism is 2 because of hint

There’s nothing terribly interesting in this plan. We can see that the delete (operation 3) took place in parallel because it appears as a child to the PX send QC (operation 2) which means the PX processes did the deletion before sending the query coordinator messages to tell it what they’d done. We can also infer (before reading the Note) that the degree of parallelism was 2 because the delete operation reports 2 in the Starts column. The 26 starts for the Table Access Full (operation 5) is also something we might expect (for tablescans and index fast full scans) as Oracle has a basic target of 13 rowid ranges per process.

One important “anomaly” is that the plan reports a final A-Time of 0.49 seconds while the delete at operation 3 reports 0.94 seconds. Time collected by enabling rowsource execution statistics should always be treated a little cautiously, but in this case the explanation of the contradiction is simple: 0.94 is the sum of the elapsed times seen by the two PX processes, 0.49 is the elapsed time from the viewpoint of the query coordinator.

Index Effects

Here’s the plan if we repeat the experiment but include the first of the two index creation statements:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT                    |          |      1 |        |        |      |            |      2 |00:00:00.66 |      10 |      3 |       |       |          |
|   1 |  PX COORDINATOR                     |          |      1 |        |        |      |            |      2 |00:00:00.66 |      10 |      3 |   337K|   337K|     1/0/0|
|   2 |   PX SEND QC (RANDOM)               | :TQ10002 |      0 |   2465 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    INDEX MAINTENANCE                | T1       |      2 |        |  Q1,02 | PCWP |            |      2 |00:00:00.31 |    2290 |      9 |       |       |          |
|   4 |     PX RECEIVE                      |          |      2 |   2465 |  Q1,02 | PCWP |            |  44735 |00:00:00.15 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE                  | :TQ10001 |      0 |   2465 |  Q1,01 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       DELETE                        | T1       |      2 |        |  Q1,01 | PCWP |            |      0 |00:00:01.14 |   53228 |   1068 |       |       |          |
|   7 |        PX RECEIVE                   |          |      2 |   2465 |  Q1,01 | PCWP |            |  44733 |00:00:00.18 |       0 |      0 |       |       |          |
|   8 |         PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |   2465 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   9 |          PX SELECTOR                |          |      2 |        |  Q1,00 | SCWC |            |  44733 |00:00:00.04 |     380 |    380 |       |       |          |
|* 10 |           INDEX RANGE SCAN          | T1_I1    |      1 |   2465 |  Q1,00 | SCWP |            |  44733 |00:00:00.02 |     380 |    380 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - access("OWNER"='SYS')

Note
-----
   - Degree of Parallelism is 2 because of hint

The first thing to notice, perhaps, is that the driving operation for the delete is an index range scan rather than a full tablescan. The range scan is called by a PX SELECTOR operation, which means only one PX process is used to run it – so the range scan is, indeed, serialized, and in the past (prior to 12c, I think) we might have seen serialization like this handled by the query coordinator. Having walked the index in order, the single PX process distributes by hashing (PX SEND HASH (BLOCK ADDRESS)) the block ids that it has derived from the rowids it identified in the index – it’s at this point that the work switches from serial to parallel (IN-OUT showing S->P).

The purpose of the hash distribution by block id is to ensure that the PX processes that do the actual deletion don’t end up colliding on table blocks and introducing buffer busy waits and problems with ITL waits. Because I flushed the buffer cache I can say that the 1,068 blocks read by the delete operation correspond to the number of table blocks that hold rows where owner = ‘SYS’ and if I enable the extended SQL trace (10046, level 8) I can see that the blocks read by the two relevant PX processes are made up of two disjoint sets.

This pattern of behaviour may, of course, explain why the OP thought the delete was running serially and only one PX process was working: with 48 processes live (which means DOP 24 for this index-driven plan) and only one PX process doing a range scan and spreading table blocks across 24 processes, a relatively small delete might give the impression that the one process called by the PX Selector was the only one doing anything.

Moving on up the plan we can see that we now do a range-based distribution to the “next” set of PX processes, which handle the index maintenance. Range-based so that the two processes don’t collide on index leaf blocks (except, perhaps, for one leaf block where the two ranges meet – that’s a detail to be investigated). Once the index has been updated, each PX process sends a completion message to the QC.

It’s interesting to note that when we delete 44,733 rows from the table it takes 53,228 buffer gets, but when we maintain (delete) the same number of index entries it takes only 2,290 buffer gets. This suggests that the table deletes might be operating as single-row deletes while the index deletes are array (block) based.
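A quick bit of arithmetic on those figures makes the contrast obvious:

```sql
-- Roughly 1.19 buffer gets per row deleted from the table,
-- but only about 0.05 buffer gets per index entry maintained.
select  round(53228/44733, 2)   table_gets_per_row,
        round(2290/44733, 3)    index_gets_per_row
from    dual;
```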

There are always more things to check, of course, when investigating a mechanism – taking a snapshot of the session stats is a necessity if, for example, you want to find out what the buffer gets were for. I didn’t do that, but one other thing I did check was the contents of the view v$pq_tqstat so that I could address the question of whether the PX SEND HASH (BLOCK ADDRESS) operation actually spread the work well or whether all the rowid information went to a single PX process in the next set, making the plan effectively operate like a serial plan.

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P000                 44733    2877765    100.00     100.00          0          0           0
                                             1 P001                     0         48      0.00       0.00          0          0           0
                      Consumer               1 P002                 23394    1499012     52.30      52.09         29         26           0
                                             1 P003                 21339    1378801     47.70      47.91         30         27           0

                    1 Ranger                 1 QC                       2     180625    100.00     100.00          5          2           0
                      Producer               1 P002                 24565    1968590     52.30      52.14          2          1           0
                                             1 P003                 22407    1806894     47.70      47.86          2          1           0
                      Consumer               1 P000                 22353    1749518     49.97      48.67         34         27           0
                                             1 P001                 22382    1845341     50.03      51.33         42         35           0

                    2 Producer               1 P000                     1        154     50.00      50.00         13          4           0
                                             1 P001                     1        154     50.00      50.00         11          4           0
                      Consumer               1 QC                       2        308    100.00     100.00         63         19           0


Reading the table queue IDs (TQ_ID) from top to bottom we can see that P000 was obviously the selected process that ran the index range scan, and it split the data a little unevenly between P002 and P003. It’s not immediately obvious how P002 and P003 can report more rows forwarded (46,972) to P000 and P001 for index maintenance than originally arrived (44,733), but it seems a little suspicious that the excess is extremely close to 5%, and the excess disappears when you check the rows consumed by P000 and P001.

Check the number of rows and bytes consumed by the QC in the guise of a Ranger, though: 180,625 is a lot of bytes for 2 rows, so maybe that’s actually 2 packets (one per producer PX process), and maybe it’s the QC that has consumed the 5% excess as its ranging sample – as you can check by doing the arithmetic for the bytes reported in tq_id 1: P002 + P003 = P000 + P001 + QC.
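The arithmetic for tq_id 1 does balance – producers and consumers both account for 3,775,484 bytes:

```sql
-- P002 + P003 (producers) should equal P000 + P001 + QC (consumers).
select  1968590 + 1806894               producer_bytes,
        1749518 + 1845341 + 180625      consumer_bytes
from    dual;
```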

More Indexes

Naturally there are all sorts of variations on this delete that we could investigate – what happens with partitioned tables, what if they have local indexes, what about global and globally partitioned indexes? Do IOTs and bitmap indexes have any interesting effects? What about domain indexes of different types? I’m not going to push the envelope in any exciting direction, but it’s worth checking what happens in the very common case where there are multiple indexes – how much changes when I re-run the test but create the second index as well.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT                    |          |      1 |        |        |      |            |      2 |00:00:01.10 |      10 |      3 |       |       |          |
|   1 |  PX COORDINATOR                     |          |      1 |        |        |      |            |      2 |00:00:01.10 |      10 |      3 |   337K|   337K|     2/0/0|
|   2 |   PX SEND QC (RANDOM)               | :TQ10002 |      0 |   2465 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    INDEX MAINTENANCE                | T1       |      2 |        |  Q1,02 | PCWP |            |      2 |00:00:00.97 |    4836 |    482 |       |       |          |
|   4 |     PX RECEIVE                      |          |      2 |   2465 |  Q1,02 | PCWP |            |  89468 |00:00:00.30 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE                  | :TQ10001 |      0 |   2465 |  Q1,01 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       DELETE                        | T1       |      2 |        |  Q1,01 | PCWP |            |      0 |00:00:01.57 |   53207 |   1068 |       |       |          |
|   7 |        PX RECEIVE                   |          |      2 |   2465 |  Q1,01 | PCWP |            |  44733 |00:00:00.18 |       0 |      0 |       |       |          |
|   8 |         PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |   2465 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   9 |          PX SELECTOR                |          |      2 |        |  Q1,00 | SCWC |            |  44733 |00:00:00.04 |     380 |    380 |       |       |          |
|* 10 |           INDEX RANGE SCAN          | T1_I1    |      1 |   2465 |  Q1,00 | SCWP |            |  44733 |00:00:00.02 |     380 |    380 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The only significant differences between this plan and the previous plan appear at the operations relating to index maintenance. We see that the A-Rows value for the PX RECEIVE at operation 4 has doubled (there are now 2 sets of index entries to handle rather than 1), and the numbers of buffer visits and disk reads have gone up from 2,290 and 9 respectively to 4,836 and 482 as the 2nd index has to be updated (remember that the first index would have been well cached at this point because it was used to identify the rows to be deleted).

Summary

There are many reasons why Oracle will refuse to use parallel DML, even when hinted or “forced”. You can, however, be fooled into thinking that it’s not using parallel DML when it is, because the pattern of the data and work distribution introduces (possibly temporarily) an extreme skew in the diagnostic data you’re looking at. It’s best to model what you’re doing at a small scale so that you can investigate the options that you are likely to see before you run into an unusual case in production.

Footnote

I said that I’d written this little script some time ago to check on the locking that took place for operations like a parallel update etc. The results may be a little surprising if you’ve not thought about them before. What locks can you hold on a table if someone else is holding an exclusive lock? You might think there was a clue in the word “exclusive”, but perhaps not really. Here’s the state of the locks I was holding for the indexed case after I had done the delete but before the commit (the tablescan case is very similar but reports only 2 PX sessions):

     QCSID SERVER_GROUP SERVER_SET    SERVER#        SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- ------------ ---------- ---------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
        30                                            30 TX     589843      21080          6          0          1          0
                                                         TM     181671          0          6          0          1          0
                                                         PS          1          0          4          0          1          0
                                                         PS          1          1          4          0          1          0
                                                         PS          1          2          4          0          1          0
                                                         PS          1          3          4          0          1          0
                                                         AE        134    3792595          4          0          7          0

                      1          1          1        192 TX     458756      18683          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          0          4          0          1          0

                                            2         17 TX     262167      18430          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          1          4          0          1          0

                                 2          1         41 TX     196620      20448          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          2          4          0          1          0

                                            2        207 TX     655383      20651          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          3          4          0          1          0

The AE lock is the Editioning lock and isn’t currently of interest.

The QC and the 4 PX sessions all hold separate TX (transaction) locks in exclusive mode. The QC also holds 4 PS locks, which represent the locks on the PX processes (the id1 / id2 values correspond to the inst_id / server_name columns in gv$px_process). These PS locks are held in share mode and are also held by the PX sessions – one each – identifying which process they correspond to. Finally, the QC is holding an exclusive lock on the table (TM / 181671, mode 6).
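As a sketch of how you might confirm that mapping, the following query joins v$lock to gv$px_process – it assumes that id1 matches inst_id and that id2 matches the numeric part of server_name (so 0 corresponds to P000), an assumption worth verifying on your own system:

```sql
-- Hypothetical decode of PS locks back to the PX processes they represent.
select  lck.sid, lck.id1, lck.id2, prc.server_name
from    v$lock          lck,
        gv$px_process   prc
where   lck.type = 'PS'
and     prc.inst_id = lck.id1
and     to_number(substr(prc.server_name, 2)) = lck.id2
order by
        lck.sid, lck.id2
;
```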

Despite the “exclusiveness” of the QC’s TM lock, the PX sessions are also holding locks on that table – in fact two locks each, one in mode 1, the other in mode 4. However, you will note that the mode 1 lock is held on a TM lock with the (usual) id2 value of 0, while the mode 4 lock is held on a TM lock with the (unusual) id2 value of 1.

The main point of this footnote – which was once planned as the complete article based on the script – is simply to point out that when you start playing around with features that are new to you, it’s a good idea to investigate some of the basic v$ views to find out what unusual variations you are likely to see if you put the feature into production.

July 12, 2023

Parallel DML – not

Filed under: Oracle,Parallel Execution,Troubleshooting — Jonathan Lewis @ 2:21 pm BST Jul 12,2023

A fairly common question about parallel DML is: “Why isn’t it happening?” A fairly common (correct) answer to this question is: “Because you haven’t enabled it.” Unlike parallel query and parallel DDL, which are enabled by default, parallel DML can cause strange deadlocking side-effects, so you have to enable it explicitly before you use it (and disable it afterwards) with statements like:

alter session enable parallel dml;
alter session force parallel dml;
alter session force parallel dml parallel 3;
alter session disable parallel dml;
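One quick way to see whether your session currently has parallel DML enabled, forced, or disabled is to check its own row in v$session – a sketch (column names as in recent versions):

```sql
-- PDML_STATUS reports DISABLED / ENABLED / FORCED for the current session.
select  pq_status, pdml_status, pddl_status
from    v$session
where   sid = sys_context('userenv', 'sid');
```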

There are, however, many reasons why Oracle will disallow parallel DML (e.g. this example from 2013) and some of them are listed in the documentation (Link is for 19c). Bear in mind that some things that are listed are no longer true following updates, or are only true in particular circumstances; and some things that block parallel DML are not listed, or are described in such a way that they don’t appear to be listed.

To keep myself up to date I search the executable from time to time for “revealing phrases”, starting with a Linux execution of “strings -a oracle”. For parallel DML I piped the results through “grep -n -i PDML”. I won’t show you the entire (307 line) output from the call to grep, but I will pick out a set of consecutive lines which look very promising:

1827860:PDML disabled in session
1827861:no intrapart PDML and instead-of trigger on view
1827862:no intrapart PDML and only one partition
1827863:no intrapart PDML and two partitions or greater
1827865:PDML disabled for MERGE using conventional insert
1827866:no PDML for domain index during delete/merge/update 
1827867:no PDML on global temp table
1827868:default PDML not allowed
1827869:no PDML for domain indexes that is not parallel ready 
--      always PIDL on cursor temp table  
1827871:no PDML. IPD disabled on table with bitmap index
1827872:PDML disabled due to MTI with conventional load
1827873:PDML enabled in session
--      non-partitioned IOT
--      table locks disabled
1827876:DML violated a PDML restriction
1827877:kkfdtpdml
1827878:ctxflg:%x ctxxyfl:%x pdml %sallowed
1827879:object reference. PDML disallowed. kctflg:0x%x
1827880:arrays. PDML disallowed. kctflg:0x%x
1827881:nested tables. PDML disallowed. kctflg:0x%x
1827882:adts. PDML disallowed. kctflg:0x%x
1827883:delete with retry on row change hint. PDML disallowed
1827884:PDML disallowed for insert-values
1827885:PDML disallowed: Memoptimize for Write insert
--      MTI uniq constr violation                                   [ed: MTI = Multi-table insert ??]
1827887:PDML disallowed: conv insert into temp tab
1827888:PDML disallowed:conv insert into single frag && !IPD
1827889:ignore/change dupkey hint on INSERT. PDML disallowed
1827890:kkfdpdml_restrictions_10
1827891:PDML disallowed: MTI and HWM brokering not allowed
1827892:Complex ADT: PDML disallowed.
1827893:arrays. PDML disallowed. kctflg: 0x%x
--      arrays. PIDL disallowed. Not stored as securefile.
1827895:nested tables. PDML disallowed. kctflg: 0x%x
1827896:typed subquery. PDML disallowed. kctflg: 0x%x
1827897:object references. PDML disallowed. kctflg: 0x%x
1827898:kkfdpdml_restrictions: PDML allowed
1827899:PDML disallowed: Transaction duration Private Temp Table
1827900:Table with ADT columns and VPD Policy: PDML disallowed
1827901:update with retry on row change hint. PDML disallowed
1827902:serializable update. PDML disallowed
1827903: not in parallel long to lob. PDML disallowed
1827904:kkfdpdml_restrictions
--      Basicfile LOB column
1827906:remote table. PDML disallowed
1827907:fro is a view. PDML disallowed
1827908:PDML disallowed: query block can produce at most one row

You’ll notice that there are a number of occurrences of “PDML disallowed” in this list, which makes it look as if it may be the definitive list of notes that appear either in the 10053 trace file, in the Notes section of the execution plan, or in the PX trace. There were, however, a few gaps as far as “consecutive” lines were concerned, so I went back to the strings output to fill them – these are the lines marked with “--”.
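For anyone who wants to repeat the exercise, the pipeline is just two commands. Since the oracle executable obviously isn’t available here, the demonstration below uses a fabricated file standing in for it:

```shell
# The real search (Linux):
#   strings -a "$ORACLE_HOME"/bin/oracle | grep -n -i pdml
#
# Demonstration on a fabricated binary-style file:
printf 'junk\0PDML disabled in session\0other\0no PDML on global temp table\0' > demo.bin
strings -a demo.bin | grep -n -i pdml
# -> 2:PDML disabled in session
# -> 4:no PDML on global temp table
# strings keeps printable runs of 4+ characters; grep -n numbers the
# surviving lines, which is where the 1827860-style prefixes come from.
```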

I’m not going to expand on the content or relevance of the list; it’s just something to leave in place as a quick clue to check if you ever find that your DML isn’t executing in parallel when you expect it to – do your circumstances seem to match any of the above conditions?

Addendum

You’ll note that a number of the strings reported above include the text: “kctflg: 0x%x”. Having noticed that I was getting some “PDML disabled” comments that weren’t in the list above I did a further text search in the executable and found a set of consecutive strings that might be the values that can go into the place-holders, and so may be further clues when you fail to see parallel DML (or, in some of the cases below, direct path loads):

delete cascade used
workspaces used
deferrable constraints are used
parent referential constraints are present
child constraint is deferrable
returning clause is present
array binds are used
triggers are defined
non partitioned or single fragment IOT used
remote table used
clustered table used
IOT with rowid mapping table used
partitioned IOT with MV dependecy used
table has one or more nested table columns
table has one or more array columns
table has one or more object reference columns
ADTs are used
temporary tables with indexes used
partitioned temporary tables used
LOBs are used with row movement possible
bitmap indexes are defined with row movement possible
bitmap indexes are defined
child ref-partitioned table may need cascaded migration
table has one or more lob columns
merge statement used on an IOT
table has one or more opaque column
cube organized table used
function is not pure and not declared parallel enabled
query block can produce at the most one row
transaction isolation level is serializable
retry on row change hint used
typed subquery used
ignore or change duplicate key hint was used
insert values was used
single fragment or non partitioned table used
MTI used on tables with unique constraints
parallel execution either disabled or not enabled in session
object is not decorated with parallel clause
policies with referential integrity constraints were added
domain indexes are defined
merge statement with conventional insert used
table locks have been disabled
index is not parallel ready
PDML is disabled in current session
table has one or more XMLType columns
instead of triggers defined
unique constraints are enforced by non-unique index
object has FGA policy
no append hint given and not executing in parallel
merge statement has only an update branch
function or domain index defined on the LOB column
check constraints are defined
noappend hint used
insert values with no append values hint used
_direct_path_insert_features parameter used to disable IDLV
row_locking parameter value is not INTENT

June 26, 2023

Case Study

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 11:32 am BST Jun 26,2023

Here’s an example of reading an execution plan that appeared on one of the Oracle forums recently. It’s a simple example, made harder by the initial (cosmetic) presentation of the problem. Essentially the owner has said:

  • Here’s a query that’s taking too long.
  • Here’s a bit of the query that seems to be the slow bit.
  • Here’s the live execution plan with rowsource execution stats enabled.

The query hadn’t arrived in an easily readable format – possibly because the OP didn’t know how to get a fixed font code display in the forum – and to make things harder the select list included several scalar subqueries and the text used the “commas at the start of line” convention. (Moreover it wasn’t immediately obvious that the first few lines of SQL shown were the first few lines of the problematic scalar subquery and not the start of the entire query.)

The OP had, however, supplied a link to a fixed format text of the execution plan, including the Query Block / Alias information, the Outline information (which showed they were running 19c with optimizer features set to 11.2.0.4), the Predicate information, and the Column Projection information.

I’ve extracted the key details from this output showing just the operations that claim to be responsible for most of the time of the execution. As you might guess, this is the sub-plan for the in-line scalar subquery that the OP had identified as the problem part, and in a follow-up posting had commented that with this subquery in place the query took one hour but with the subquery removed it “completed faster”.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name                          | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  25 |  SORT UNIQUE                                  |                               |   8149 |      1 |    46 |   145   (2)| 00:00:01 |   1414 |00:19:44.48 |      91M|    176K|  2048 |  2048 |     2/0/0|
|  26 |   WINDOW SORT                                 |                               |   8149 |      1 |    46 |   145   (2)| 00:00:01 |  37811 |00:19:44.47 |      91M|    176K|   214K|   214K|     2/0/0|
|  27 |    NESTED LOOPS                               |                               |   8149 |        |       |            |          |  37811 |00:19:58.14 |      91M|    176K|       |       |          |
|  28 |     NESTED LOOPS                              |                               |   8149 |      1 |    46 |   143   (0)| 00:00:01 |     17M|00:14:44.21 |      73M|  90450 |       |       |          |
|  29 |      TABLE ACCESS BY INDEX ROWID              | OE_ORDER_LINES_ALL            |   8149 |      1 |    31 |   139   (0)| 00:00:01 |     17M|00:10:17.69 |      19M|   8604 |       |       |          |
|* 30 |       INDEX RANGE SCAN                        | XXOM_OE_ORDER_LINES_N100      |   8149 |    142 |       |     4   (0)| 00:00:01 |     17M|00:00:20.96 |     100K|   6570 |       |       |          |
|* 31 |      INDEX RANGE SCAN                         | XXOM_3LP_SYM_ORA_ORDER_N11    |     17M|      1 |       |     3   (0)| 00:00:01 |     17M|00:03:45.52 |      54M|  81846 |       |       |          |
|* 32 |     TABLE ACCESS BY INDEX ROWID               | XXOM_3LP_SYM_ORA_ORDER_LINES  |     17M|      1 |    15 |     4   (0)| 00:00:01 |  37811 |00:05:59.72 |      19M|  86126 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  30 - access("OL"."INVENTORY_ITEM_ID"=:B1 AND "OL"."SHIP_FROM_ORG_ID"=:B2)
  31 - access("ML"."LINE_ID"="OL"."LINE_ID" AND "ML"."ORG_ID"="OL"."ORG_ID")
  32 - filter("MODEL_STRING" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
  25 - (#keys=1) FIRST_VALUE("MODEL_STRING") OVER ( PARTITION BY "OL"."INVENTORY_ITEM_ID" ORDER BY "OL"."CREATION_DATE" RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW )[240]

  26 - (#keys=3) "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "MODEL_STRING"[VARCHAR2,240], "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "ML"."LINE_ID"[NUMBER,22],
       "ML"."ORG_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22], "ML".ROWID[ROWID,10], FIRST_VALUE("MODEL_STRING") OVER ( PARTITION BY "OL"."INVENTORY_ITEM_ID" ORDER BY
       "OL"."CREATION_DATE" RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW )[240]

  27 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22],
       "ML".ROWID[ROWID,10], "ML"."ORG_ID"[NUMBER,22], "MODEL_STRING"[VARCHAR2,240], "ML"."LINE_ID"[NUMBER,22]
  28 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22],
       "ML".ROWID[ROWID,10], "ML"."LINE_ID"[NUMBER,22], "ML"."ORG_ID"[NUMBER,22]
  29 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22]
  30 - "OL".ROWID[ROWID,10], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22]
  31 - "ML".ROWID[ROWID,10], "ML"."LINE_ID"[NUMBER,22], "ML"."ORG_ID"[NUMBER,22]
  32 - "ML".ROWID[ROWID,10], "MODEL_STRING"[VARCHAR2,240]

As I said in my reply, you have to be careful of two traps with execution plans when rowsource execution stats (gather_plan_statistics) are enabled – first, that enabling the feature can add a lot of overhead and be very misleading if the query includes a large number of very short operations; second, that the workload and time information doesn’t always appear exactly where you feel it ought to be.

In this case we’ve got the OP saying the query completed in an hour while the report suggests that this part of the plan took 19 and three-quarter hours to complete! It seems likely that the discrepancy is a reporting error due to the side effect of the 17 million starts of operations 31 and 32, with a granularity error far in excess of the actual run-time.

If you want to check the rest of the execution plan, the driving body of the query (operations 123 to 175) has a few steps that report a few seconds with a total time of 26.90 seconds, and none of the other scalar subquery subplans report more than a few seconds. So it seems fairly clear that the necessary strategy for improving the performance of the whole thing is to find a way of improving the performance of this scalar subquery.

Note: addressing just the scalar subquery “in a vacuum” may be possible and may be sufficient. If this doesn’t work then the next – and more complex – strategy would be to take a wider view of the query and see if there were some way to get the optimizer to treat the scalar subquery differently – e.g. passing more access/filter predicates into the subquery, or finding a way of unnesting the subquery so that a brute force mechanism could be used once rather than the basic nested loop being used 8,149 times.

Analysis of subquery

I’ve shown the extracted plan for the critical subquery further up the page; here’s the in-line scalar subquery (with some cosmetic editing) that appeared in the original posting:

(
select distinct 
        first_value(model_string) over (
                partition by ol.inventory_item_id 
                order by ol.creation_date
        )
from
        xxom.xxom_3lp_sym_ora_order_lines       ml,
        apps.oe_order_lines_all                 ol
where 
        ol.inventory_item_id = pl.item_id                       -- po_lines_all
and     ol.ship_from_org_id  = pll.ship_to_organization_id      -- po_line_locations_all
and     ml.line_id           = ol.line_id
and     ml.org_id            = ol.org_id
and     model_string is not null
)       model_string,

I’ve highlighted the lines with the predicates that correlate the subquery back to the main query. You’ll note that there are two tables driving the subquery which means, in theory, that the join order and join method of those tables in the main query could (thanks to the way that scalar subquery caching is implemented) affect the number of times the subquery was called. For the purposes of this note I’m not going to worry about that detail, though.

The plan shows a simple nested loop, followed by a window sort (for the first_value() call), then a sort unique (for the distinct). We can see from the starts column that the subquery is called 8,149 times, and something that stands out quite sharply is that the index probe and table access for the second table in the nested loop (xxom_3lp_sym_ora_order_lines) happen roughly 17 million times, i.e. each execution of the subquery identifies an average of roughly 2,000 rows in oe_order_lines_all, each of which drives a probe of xxom_3lp_sym_ora_order_lines. Any tiny operation happening millions of times is likely to introduce some problematic side-effects, so it’s clearly worth having a look at this area of the plan (the fact that the values reported in the A-Time column are extreme just makes it stand out even more).

You have to be very cautious about the A-Time column when enabling rowsource execution statistics; it’s notorious for introducing double-counting and measurement error. For my own purposes I would ignore the silly figures at this point, but for the purposes of the case study I’ll make the suggestion that the 19 hours 58 minutes at operation 27 (nested loop) is the 14 hours and 44 minutes at operation 28 plus the 5 hours and 59 minutes at operation 32 (effectively double counting); then (leaving you to find the numbers and operations) 14:44 is 10:17 + 3:45 (double counting (again) with granularity errors); then 10:17 is double counting (yet again) 3:45 and 5:59; finally those last two figures have a measurement/granularity overhead that is larger than the actual time spent anyway. So ignore the exact details of the timing (once it’s given you a pointer), and look at the work done and output achieved.

Critically, we access xxom_3lp_sym_ora_order_lines 17 million times by rowid at operation 32 to return just 37,811 rows, and we also note that these 37,811 rows are the (8,149 versions of the) result set that is sorted for the first_value() function and then sorted again for the uniqueness operation. So a basic optimization strategy would be to see if we can reduce that 17 million to 37,811 earlier in the plan. Possibly there’s a different index that we could use that would allow us either to visit only the table rows we need (dropping the table access starts down to 37,811) or even to avoid visiting the table at all.

Option 1

Aiming for the simplest option – what do we do at the table? We have a filter predicate filter(“MODEL_STRING” IS NOT NULL), and the projection information says we’re only interested in “ML”.ROWID[ROWID,10] and “MODEL_STRING”[VARCHAR2,240]. To sum up – if we can add model_string as a column of the index currently used then we could eliminate operation 32 entirely. The Outline Data (which I haven’t reproduced) tells us that the index in question has three columns (line_id, org_id, assembly_type), so we would have to decide whether to include model_string as column 3 or column 4 – which we can’t do at this point because we have no information about how this index might otherwise be used.
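
For illustration, the “column 4” version of that change might look like the following. This is a sketch only – the index name and existing column list are taken from the Outline Data as reported, and on a production system you’d normally want an online rebuild:

```sql
-- Recreate the existing index with model_string appended as column 4,
-- so that operation 32 (table access by rowid) can be eliminated.
drop index xxom_3lp_sym_ora_order_n11;

create index xxom_3lp_sym_ora_order_n11
        on xxom_3lp_sym_ora_order_lines(line_id, org_id, assembly_type, model_string)
        online
;
```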

Option 2

Adding a column to the index may eliminate some 17M table accesses, 86K physical reads and 19M buffer gets, but we still probe the index 17M times, taking 81K physical reads and 54M buffer gets; we also introduce the possibility of other queries performing less well because of the impact of the size and placement of the extra column – so we could consider a better approach: can we introduce a function-based index, or an index on virtual columns, that would be much safer and also much smaller, and then modify the code to join through this index?

We access the table 17M times and find that the model_string is null in all but a few thousand cases. Does this give us a true picture of the whole table, or is it just a co-incidence in this case? If this is indicative of the whole table then an index defined as follows could be very effective:

(
        case when model_string is null then null else line_id end,
        case when model_string is null then null else org_id end,
        model_string
)

(For all recent versions of Oracle I would prefer to create two invisible, virtual columns, then create the index on the virtual columns and model_string. I would also investigate the meaning of the line_id and org_id columns since there may be scope for basic index compression on one or both of these columns, which might dictate a better order for the first two columns).

With this definition a row with a null model_string will not appear in the index; then, with a corresponding modification to the SQL, the work of probing the index would be reduced to a minimum – though it would still have to happen 17M times!
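
Putting the virtual column idea together – again a sketch only, with invented names for the virtual columns and the index, and the corresponding change to the subquery’s join predicates:

```sql
alter table xxom_3lp_sym_ora_order_lines add (
        nn_line_id invisible generated always as (
                case when model_string is null then null else line_id end
        ) virtual,
        nn_org_id  invisible generated always as (
                case when model_string is null then null else org_id end
        ) virtual
);

create index xxom_3lp_sym_nn_i1 on xxom_3lp_sym_ora_order_lines(
        nn_line_id, nn_org_id, model_string
);

-- The subquery then joins through the virtual columns; the
-- "model_string is not null" predicate becomes redundant because
-- rows with a null model_string simply don't appear in the index:
--      and  ml.nn_line_id = ol.line_id
--      and  ml.nn_org_id  = ol.org_id
```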

Option 3

It’s possible that the performance will be good enough after eliminating the probe of the second table and the corresponding nested loop operation, and reducing the work needed for the index probe; but it may be necessary to do far more. The fact that the available information suggests the model_string is often null, and results in almost all the 17M probes returning no data, prompts what might be the next (in terms of complexity) strategy to investigate – would it be possible to visit the xxom_3lp_sym_ora_order_lines table first and the oe_order_lines_all table second, and only (we hope) a few thousand times?

Option 4

It doesn’t look as if a simple rewrite would be possible, but it’s very hard to tell given the number of tables in the query and the number of scalar subqueries in the select list and my ignorance of what all the tables represent and what the code is supposed to do, so I’m not going to think about that.

Instead I’ll go back to a point that I made at the start of this note: the system is 19c but the optimizer features have been set back to 11.2.0.4. In 12.2.0.1 (possibly 12.1.0.1) Oracle introduced code to unnest scalar subqueries in the select list, and it’s possible that an internal rewrite might manage to do something sensible with this subquery.

If it hadn’t been done already I’d try testing this query with the default optimizer features. If that was a disaster I’d repeat the exercise but start adding /*+ no_unnest */ hints to the scalar subqueries – maybe all but the critical one as a first test. I’d probably have to use some of the hints from the Outline Data of the 11g plan to disable some of the more esoteric 19c features.

If I didn’t get anywhere with the default 19c features I’d also do a few experiments with optimizer_features_enable set to 12.2.0.1 (which, I think, is the earliest version that supported scalar subquery unnesting – though it might have been 12.1.x.x).

Conclusion

A detailed execution plan, with runtime statistics generated through the gather_plan_statistics hint, gives a very clear indication of where most of the query time is (apparently) spent. When a very small operation executes a very large number of times the effect of granularity errors often exaggerates the times, so a little caution is always needed – especially where nested loops are concerned.

In the interests of keeping any changes as simple as possible a nice strategy is to ask two questions:

  • can I make this operation happen less frequently?
  • can I make this operation less resource-intensive each time it happens?

In our case, adding a column to an existing index could make one of the critical operations (table access by rowid) disappear completely; then, in a refinement of that localised approach, creating a function-based index (or index on virtual columns) might also make the remaining operation (index range scan) much less expensive.

Beyond that it was not immediately obvious how a simple rewrite could make the whole query more efficient, but allowing the optimizer to use some of its newer features might allow it to produce an internal rewrite that could transform a “row by row” process of calling scalar subqueries with embedded nested loops into a “bulk” process that unnested the subquery and allowed Oracle to find the relevant data for all the rows in a single operation before combining with a hash join.

Footnote

There is a little anomaly with the query that may have had some impact on the optimizer’s choice of execution path, but I didn’t want to get stuck into a detail that might be completely irrelevant when looking at a plan that flags up such a huge timing skew. The oddity appears in the Predicate information at operation 141 of the execution plan:

141 - filter(("PH"."ORG_ID"=9082 AND ((-1)=:BUYER OR "PH"."AGENT_ID"=:BUYER) AND (SYS_OP_C2C(COALESCE("PH"."CLOSED_CODE",'OPEN'))=U'OPEN' OR
SYS_OP_C2C(COALESCE("PH"."CLOSED_CODE",'OPEN'))=U'CLOSED')))

The sys_op_c2c() function calls tell us that there must be some “national language” activity going on somewhere in this query, and looking back at the full text of the query we can see this in the predicates:

AND COALESCE(ph.closed_code, 'OPEN') IN (N'OPEN',N'CLOSED')
AND PLL.SHIP_TO_ORGANIZATION_ID IN (N'12389',N'12390',N'12469',N'9083',N'12549')

If you’re wondering why I haven’t said anything about any sys_op_c2c() function calls appearing elsewhere in the predicate information, it’s because they don’t. Here (at operation 152) is the ship_to_organization_id predicate showing the character literals being coerced to numeric values:

 152 - access("MP"."ORGANIZATION_ID"="PLL"."SHIP_TO_ORGANIZATION_ID")
       filter(("MP"."ORGANIZATION_ID"=9083 OR "MP"."ORGANIZATION_ID"=12389 OR "MP"."ORGANIZATION_ID"=12390 OR "MP"."ORGANIZATION_ID"=12469 OR "MP"."ORGANIZATION_ID"=12549))

Character-set conversions and storing numbers as characters are mechanisms that make it hard for the optimizer to produce a reasonable cardinality estimate – and bad estimates usually produce bad plans.

June 13, 2023

Case Study

Filed under: Oracle,Performance,Problem Solving,Troubleshooting,Tuning — Jonathan Lewis @ 12:50 pm BST Jun 13,2023

Here’s a problem that appeared on the Oracle-L list server a little while ago. It features an Exadata system with a “very large” GTT (global temporary table) and a pair of queries that are repeated frequently, using a different literal value on each repetition of the pair.

The queries, with sample execution plans, have been reported using the SQL Monitor feature and are available on github, but I’ll repeat them here, individually, for convenience. The first query is a very simple single-table with aggregation:

SELECT FG.FID,   FG.DCD, FG.MDT,1.00 , SUM (FG.N_CD), SUM (FG.d_nsd), and several more summed columns
    FROM F_GTT FG
   WHERE FG.FID = {literal constant}
GROUP BY FG.FID, FG.DCD, FG.MDT
ORDER BY 1, 3


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)           
 Instance ID         :  3                         
 SQL ID              :  gmm2v6dtvu0yb             
 SQL Execution ID    :  50331648                  
 Execution Started   :  06/01/2023 14:52:21       
 First Refresh Time  :  06/01/2023 14:52:25       
 Last Refresh Time   :  06/01/2023 14:53:11       
 Duration            :  50s                       
 Fetch Calls         :  1                         

Global Stats
==================================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  |  Other   | Fetch | Buffer | Read  | Read  | Uncompressed |  Offload   |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes |    Bytes     | Elig Bytes | Returned Bytes |
==================================================================================================================================================
|      50 |      48 |     0.68 |        0.01 |     0.00 |     0.72 |     1 |     3M | 45632 |  22GB |         22GB |       22GB |           22GB |
==================================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3209725901)
==============================================================================================================================================================================
| Id |          Operation           |      Name      |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  | Activity |      Activity Detail      |
|    |                              |                | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |   (%)    |        (# samples)        |
==============================================================================================================================================================================
|  0 | SELECT STATEMENT             |                |         |      |        47 |     +4 |     1 |       31 |       |       |     . |          |                           |
|  1 |   SORT GROUP BY              |                |     217 | 792K |        47 |     +4 |     1 |       31 |       |       | 14336 |          |                           |
|  2 |    TABLE ACCESS STORAGE FULL | F_GTT          |   63836 | 792K |        50 |     +1 |     1 |    37491 | 45632 |  22GB |  14MB |    97.96 | Cpu (46)                  |
|    |                              |                |         |      |           |        |       |          |       |       |       |          | cell smart table scan (2) |
==============================================================================================================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - storage("FG"."FID"={literal value})
       filter("FG"."FID"={literal value})
	   

It’s not a difficult query, and it’s easy to see that almost all the time is spent in a tablescan that reads 22GB to identify 37,491 rows, with fractions of a second spent aggregating those rows. And of the time spent in the tablescan almost all of it is CPU time (rather than waits for I/O).

Details to note:

  • We’ve done 45,000 read requests for 22GB of data read, which suggests we’re doing reads of 500KB (rather than the 1MB that you might have expected).
  • We’ve done 3M buffer gets – which is a good match for 22GB of data read with an 8KB block size, so it looks like we’re doing a simple tablescan into buffer cache. (This is a little unexpected since we spend time on “cell smart tablescan” so it would be nice to see the session statistics for one execution so that we could see the underlying reason for these buffer gets).
  • We don’t seem to have saved any data traffic through storage indexes (but that might be because Oracle doesn’t create storage indexes for temporary tablespaces – that’s a detail I’d have to check).

It would be nice to know, of course, how many rows there were in the table, how many distinct values of FID there were, and how variable the number of rows per FID was across the table. Based on the little we know, though, and assuming that this single execution represents a fairly typical value of FID, the basic question seems to be: would we benefit from defining an appropriate index on the table? There are, though, several other ways that you could choose to improve the performance if you had a completely free hand and could make whatever simple changes you liked in the surrounding code.

We can note, of course, that with a suitable index we might have done about 37,500 (single block) read requests compared to 45,000 multi-block read requests, which looks like a small win on I/O but with a potentially big win on CPU. The obvious index is on (FID), defined with “compress 1”, but if we created the index on (FID, MDT, DCD) we might get some extra benefit since Oracle could walk the index using a “SORT GROUP BY NOSORT” operation to aggregate and return the data in the right order.

Note: the SQL says: select fid, dcd, mdt … group by fid, dcd, mdt order by 1,3 – but Oracle has assumed that the order by clause can be ignored because of the group by (the optimizer’s allowed to do that, programmers aren’t). This means the optimizer has recognized that it could rewrite the query as group by fid, mdt, dcd and order by fid, mdt to produce the same result.

Since the initial 37,491 rows for the FID collapsed to 31 rows after aggregation, we might find that we could even create the larger index with “compress 3”.
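
In SQL terms the two candidates look like this (the index names are my invention):

```sql
create index f_gtt_i1 on f_gtt(fid) compress 1;

-- or, matching the group by columns so that Oracle can use a
-- "sort group by nosort" and avoid sorting on every execution:
create index f_gtt_i2 on f_gtt(fid, mdt, dcd) compress 3;
```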

Here’s another thought that goes with the idea of creating the perfect index for walking the data and avoiding sorting for every execution of the query: this table is re-populated before the queries are executed, so could we populate the table with pre-ordered data so that all the rows for any individual FID are packed into the smallest number of blocks?
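
If the population code can be modified, the change might be as small as adding an order by to the existing insert. A sketch only – the source name is invented, and I’ve shown just the first few columns from the query:

```sql
insert /*+ append */ into f_gtt
select  fid, dcd, mdt, n_cd, d_nsd   -- plus the other summed columns
from    f_gtt_source                 -- invented name for the real source
order by
        fid, mdt, dcd                -- cluster all rows for each fid together
;
```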

Of course, changes like adding an index and pre-sorting data might have significant side effects in other parts of the code. What we’re trying to do here is find big wins using simple strategies that avoid complicated code changes – we need to be sure that we haven’t introduced side-effects that mean we’ve moved the problem rather than solving it.

We should keep our thoughts about the first query in mind as we move on to the second query. This one is a very simple join and aggregation (via distinct) which features the same very large tablescan:

SELECT  DISTINCT ft.fT_NM, FGTT.DCD, ft.COl5, ft.FT_CID, RGTT.mcg_ind,   C.D_CD
    FROM rt_gtt         RGTT,
         F_GTT          FGTT,
         RD_FTIN ft,
         RDC           c
   WHERE     FGTT.eid = RGTT.nid
         AND FGTT.etyp = RGTT.nety
         AND FGTT.FID = ft.ftid
         AND FGTT.FID = {literal value}
         AND FGTT.DCD = c.DCD
         AND ft.FT_CID = RGTT.mcid
ORDER BY ft.fT_NM, DCD

	   
	   Global Information
------------------------------
 Status              :  DONE (ALL ROWS)           
 Instance ID         :  3                         
 SQL ID              :  c3y3n73an8b69             
 SQL Execution ID    :  50331648                  
 Execution Started   :  06/01/2023 14:54:39       
 First Refresh Time  :  06/01/2023 14:54:43       
 Last Refresh Time   :  06/01/2023 14:55:17       
 Duration            :  38s                       
 Fetch Calls         :  1                         

Global Stats
==================================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  |  Other   | Fetch | Buffer | Read  | Read  | Uncompressed |  Offload   |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes |    Bytes     | Elig Bytes | Returned Bytes |
==================================================================================================================================================
|      39 |      37 |     0.57 |        0.01 |     0.00 |     0.59 |     1 |     4M | 45640 |  22GB |         22GB |       22GB |           22GB |
==================================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=891026795)
========================================================================================================================================================================================
| Id |             Operation              |             Name             |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  | Activity | Activity Detail |
|    |                                    |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |   (%)    |   (# samples)   |
========================================================================================================================================================================================
|  0 | SELECT STATEMENT                   |                              |         |      |         1 |    +38 |     1 |        1 |       |       |     . |          |                 |
|  1 |   SORT UNIQUE                      |                              |   19196 |   6M |         1 |    +38 |     1 |        1 |       |       |  2048 |          |                 |
|  2 |    NESTED LOOPS                    |                              |         |      |         1 |    +38 |     1 |      380 |       |       |     . |          |                 |
|  3 |     NESTED LOOPS                   |                              |   19196 |   6M |         1 |    +38 |     1 |      380 |       |       |     . |          |                 |
|  4 |      NESTED LOOPS                  |                              |      94 |   6M |         1 |    +38 |     1 |        2 |       |       |     . |          |                 |
|  5 |       NESTED LOOPS                 |                              |      1M |   4M |         1 |    +38 |     1 |        2 |       |       |     . |          |                 |
|  6 |        VIEW                        | VW_DTP_E7E675E0              |      3M | 815K |         1 |    +38 |     1 |        2 |       |       |     . |          |                 |
|  7 |         HASH UNIQUE                |                              |      3M | 815K |        35 |     +4 |     1 |        2 |       |       |  11MB |          |                 |
|  8 |          TABLE ACCESS STORAGE FULL | F_GTT                        |      3M | 792K |        39 |     +0 |     1 |    27239 | 45638 |  22GB |  14MB |   100.00 | Cpu (38)        |
|  9 |        TABLE ACCESS BY INDEX ROWID | rt_gtt                       |       1 |    2 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 10 |         INDEX RANGE SCAN           | rt_gtt_IX5                   |       1 |    1 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 11 |       TABLE ACCESS BY INDEX ROWID  | RD_FTIN                      |       1 |    3 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 12 |        INDEX RANGE SCAN            | RD_FTIN_IX1                  |       1 |    2 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 13 |      INDEX RANGE SCAN              | RDC_IX1                      |     206 |    1 |         1 |    +38 |     2 |      380 |     2 | 16384 |     . |          |                 |
| 14 |     TABLE ACCESS BY INDEX ROWID    | RDC                          |     204 |  169 |         1 |    +38 |   380 |      380 |       |       |     . |          |                 |
========================================================================================================================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - storage("FGTT"."FID"={literal value})
       filter("FGTT"."FID"={literal value})
   9 - filter("ITEM_3"="RGTT"."nety")
  10 - access("ITEM_4"="RGTT"."nid")
  11 - filter("FT"."FT_CID"="RGTT"."mcid")
  12 - access("ITEM_2"="FT"."ftid")
  13 - access("ITEM_1"="C"."DCD")

Again we have a query that selects all the rows for a given FID from the global temporary table, but this time it joins them to three other tables before eliminating duplicates. Unsurprisingly a very large fraction of the time is spent on the tablescan with the same 45,000 read requests pushing 22GB across the interconnect to the buffer cache, with most of the time recorded as CPU time not I/O wait time.

There’s a very interesting feature of this plan, though. The SQL statement says “join then make distinct”, but the plan reports “make distinct then join” – applying a hash unique operation to the GTT data reducing 27,239 rows to 2 rows before joining the other three tables. (I’ve written about the “group by placement” transformation in the past, this is the equivalent transformation for select distinct.)

Note: it’s a little odd that two queries running just one and a half minutes apart from each other should have estimates which are out by a factor of two in one case and 100 in the other for a simple “column = constant” predicate. This might be something to do with “out of range” values, or a side effect of a hybrid histogram with an unlucky sample size.

The only thing that really matters in this query is the tablescan of the GTT, and obviously we could again consider creating an index on (FID) as a possible way to reduce the workload, giving us a single extra index that might benefit both queries. In this case, though, there are (only) three other columns from the GTT appearing in the query, so we could create an index that included those columns so that Oracle didn’t have to visit the table at all. For example (FID, eid, etyp, dcd) – with some level of compression that we’d have to decide on by reviewing the data.
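A sketch of that covering index (the column names come from the predicate section of the plan; the compression level is a guess that would need checking against the data):

```sql
-- Hypothetical: with fid, eid, etyp and dcd all in the index the
-- query never has to visit the GTT itself, so the 22GB tablescan
-- disappears.
create index f_gtt_cover_ix on f_gtt(fid, eid, etyp, dcd) compress 1;
```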

Of course that does leave us wanting two new indexes on the same large table that have to be maintained as the data is loaded (remember that you can’t create indexes on GTT’s after the data has been inserted). So we ought to look for alternative strategies.

Again the reduction from tens of thousands of rows to “just a few” rows prompts an idea. I’m guessing that Oracle aggregates (distinct) the GTT table down with the equivalent of an inline view: “select distinct fid, eid, etyp, dcd from … “ (the 10053 (CBO) trace might show this as one of the “unparsed” queries). If that’s the case, why not have a summary GTT that emulates a “refresh complete materialized view” on the main GTT and populate it with the result of that select distinct immediately after populating the main GTT; then reference the summary GTT in the join query instead of the main GTT. With a reduction by a factor of around 10,000 in the number of rows and a further reduction due to the column restriction, the tablescan might drop from 22GB to just a couple of (cached) megabytes.
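A sketch of the “summary GTT” idea – the table name and datatypes here are invented for illustration, and the real declarations would have to match the main GTT:

```sql
-- Hypothetical emulation of a "refresh complete materialized view"
-- on the main GTT.
create global temporary table f_gtt_distinct (
        fid     number,
        eid     number,
        etyp    varchar2(10),
        dcd     varchar2(10)
)
on commit preserve rows
/

-- Immediately after populating the main GTT, and before the loop
-- through the FIDs starts:
insert into f_gtt_distinct
select  distinct fid, eid, etyp, dcd
from    f_gtt
/
```

The join query would then reference f_gtt_distinct instead of the main GTT.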

With the “materialized view” concept in mind, we could revisit the first query. The big problem with the current coding strategy is that we populate a large table once, then keep executing a pair of queries that are interested in one FID at a time. Maybe we could put a wrapper around the whole thing to populate two pre-calculated, small, summary tables, then tweak the two problem queries to reference their corresponding “materialized views”.

Summary

The system populates a very large GTT, then seems to be running a loop to execute two queries per “FID” value. Both queries spend most of their time doing a tablescan of the large GTT before aggregating a few thousand rows down to a handful of rows.

A basic strategy for improving performance would be to create two new indexes on this table (though one might be “good enough” for both queries in the short term); the indexing strategy could be made more effective if the code to load the GTT could be modified to pre-sort the data before loading it.

An alternative strategy that takes advantage of the massive drop in volume as the data is aggregated is to pre-load two (much smaller) “summary” GTTs after loading the main GTT and before starting the loop through the FIDs. This would require the two queries to be edited to reference their respective summary tables. (The summary tables could, of course, be indexed to add a little extra benefit.)

January 28, 2023

Case Study

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 12:46 pm GMT Jan 28,2023

This example appeared quite recently on one of the public Oracle Forums, and the person who posted it had already worked out (the important bit of) what was going on before anyone asked for more information or supplied any suggestions for action (although there was one response pointing to a MOS Note that might have had some relevance and did contain some interesting, though unrelated, background information).

The title of the question was: “KTSJ process running select dbms_rowid.rowid_type(rowid) causing huge load”, and the opening post contained a number of bits of information about what KTSJ was and how it relates to SMCO. The critical details, though, were the SQL, and what it was running on.

The system was a 4 node RAC, running 19.12. The problem query was running multiple times between 4:00 a.m. and 9:00 a.m. and the table it references is about 4 GB and “not fragmented”. The table is “basic compressed” and partitioned on a date column, there are no LOB columns, and there are about 200,000 updates on the table “each morning” (but no comment about whether this is before, during, or after the performance problem). The query was simple:

select dbms_rowid.rowid_type(rowid) from <owner>.<table_name> where rownum=1;

In a follow-up post we got the following information about a typical run of the query:

Global Stats
============================================================================
| Elapsed |    Cpu  |      IO  |  Cluster | Fetch | Buffer |  Read |  Read |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |   Gets |  Reqs | Bytes |
============================================================================
|     123 |      16 |       94 |       13 |     1 |     1M | 81215 |  10GB |
============================================================================

If you’re familiar with the SQL Monitor you’ll recognise this as the summary information from an SQL Monitor report – which means, of course, that the OP also has an execution plan for the query.

It’s easy of course to do the Bart Simpson “I knew that” claim after the OP has given the answer, but even with this tiny amount of information there are several clues to tell you what’s probably going on and why.

Inferences

In this case the query is so simple there are virtually no options for the optimizer. It’s a reference to a single table, it’s asking only for the rowid, and it’s going to stop after one row is returned – there are only three possible (unhinted) execution plans: a tablescan, an index fast full scan, or an index full scan – all with a stopkey. In fact, it’s only in earlier versions of Oracle that the optimizer would choose an index full scan over an index fast full scan.

The idea of a segment scan (whether tablescan or index fast full scan) may sound a bit extreme as a path to return just one row, but the optimizer knows that it’s going to stop after one row and adjusts the cost accordingly. Since indexes on a table are (almost always) smaller than the table itself we will probably see an index fast full scan on whichever index reports the smallest number of leaf blocks (and includes at least one column declared not null), and we should see the scan stopping after a very small number of block visits.

The Global Stats tell us a completely different story. We might be expecting one multiblock read of 128 blocks, and possibly a few more single block reads for read-consistency (undo records applied) reasons; we’ve actually done 81,215 Read Requests totalling about 10GB (which really means 9.5GB to 10.5GB) or roughly 1.3 million blocks. This average read size of fractionally over 16 suggests a huge segment scan that didn’t stop early. As a quick sanity check you’ll note we’ve also reported 1M Buffer Gets (which means 0.5M to 1.5M) which is consistent with a segment scan into the buffer cache.
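The arithmetic is easy to check from SQL*Plus – assuming, as a sketch, the default 8KB block size:

```sql
select
        round(10 * power(1024,3) / 8192)          blocks_read,     -- ~1.3M blocks
        round(10 * power(1024,3) / 8192 / 81215)  blocks_per_read  -- ~16: multiblock reads
from
        dual
/
```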

  • Question: Why would Oracle do a huge segment scan when we know it should have stopped scanning almost immediately?
  • Answer: there was a huge amount of empty space at the “start” of the segment.

One detail the OP told us was that the table was only 4GB and “not fragmented” – I don’t think you’d say that if there were 10GB of empty blocks in the table before a scan got to the first row. On the other hand, the OP didn’t say anything about the size or state of any indexes and we’re expecting an index fast full scan of the “smallest” index on the table.

So how does the smallest index on a 4GB table turn into a 10GB fast full scan? The answer is in three parts:

  • First, the optimizer uses user_indexes.leaf_blocks to calculate the cost of a fast full scan.
  • Second, the code that generates the value of leaf_blocks doesn’t see empty leaf blocks.
  • Third, there are patterns of data processing that can leave an index in a state where it holds a huge number of empty leaf blocks that haven’t been flagged for re-use.

The OP said there were about 200K updates on the table each morning – what if the activity was something like:

  • Insert a batch of rows with a status column holding a starting value
  • The column is indexed to allow rapid access to “new” (and other interesting, “rare”) status values
  • The column cycles through a few different status values before getting to a “final” status.

Depending on batch sizes, timing, concurrency effects, version of Oracle, and the possible values for status you could end up with an index that had a huge number of empty leaf blocks at the “low value” end and all the populated leaf blocks at the “high value” end. It’s not supposed to happen, but occasionally it does. If you gathered index stats Oracle would count only the blocks that held the few high values but when you did an index fast full scan you’d have to scan through a load of empty leaf blocks before you found one with an index entry.
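If you suspected this pattern you could check the state of the index directly. A sketch ({index_name} is a placeholder), bearing in mind that validate structure locks the underlying table for the duration, and that index_stats holds just one row, visible only to the session that ran the analyze:

```sql
analyze index {index_name} validate structure;

-- A large gap between lf_blks and the leaf_blocks statistic in
-- user_indexes (or a very low pct_used) points at wasted space.
select
        lf_rows, lf_blks, del_lf_rows, br_blks, pct_used
from
        index_stats
/
```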

Summary

Realistically the correct diagnostic method for this problem would have been to check the plan (and activity) from the SQL Monitor report, notice the index fast full scan and (if they were visible) multiblock read waits, then check the state of the index. This note was just a little lesson on the way in which you can make reasonable guesses (inferences / deductions) from insufficient data so that you have some idea of what might have gone wrong.

In this case the steps were:

  1. Know that there are (realistically) only two possible execution paths for the stated query
  2. Do some arithmetic to recognise the activity as a large segment scan
  3. Eliminate the tablescan because the description of its state doesn’t allow for the observed phenomenon
  4. Recognise the potential for an index to get into a state that matches the phenomenon

Lagniappe

The OP had already replied to his own question explaining that he had identified an index that had grown to 13GB (when the table was only 4GB), and that the index was only 0.03GB (30MB) after a rebuild, and the problem disappeared.

This suggests a need for two further investigations:

  • Why does the index grow so much, and does it need to be coalesced or shrunk on a very regular basis (e.g. daily, or once per week).
  • An average table row is 133 (= 4 / 0.03) times as long as its index entry. That sounds like the rows may be rather long (at least 1,470 bytes, since the minimum Oracle allows for an index entry is 11 bytes). Has the table’s pctfree been set for optimum storage, and might the table be suffering from lots of migrated rows?
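The regular maintenance might be as simple as one of the following ({index_name} is again a placeholder; coalesce and shrink can run while DML continues, a rebuild needs the online keyword to avoid blocking it):

```sql
alter index {index_name} coalesce;
alter index {index_name} shrink space;     -- coalesce, then return freed extents
alter index {index_name} rebuild online;   -- the heavyweight option
```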

January 15, 2023

Quiz Night

Filed under: Infrastructure,Oracle,Parallel Execution,Troubleshooting — Jonathan Lewis @ 6:25 pm GMT Jan 15,2023

Here’s a little extract from one of my “snap instance activity stats” packages picking out the figures where the upper case of the statistic name is like ‘%PARALLEL%’. I was very careful that nothing except one of my SQL*Plus sessions had done anything in the few seconds between the start and end snapshots so there’s no “(un)lucky timing” to explain the anomaly in these figures.

The question is: how can Oracle manage to claim hundreds of “checkpoint buffers written for parallel query” when there had been no parallel statements executing around the time the snapshots were taken?

Name                                                        Value
----                                                   ----------
DBWR parallel query checkpoint buffers written              1,430
queries parallelized                                            0
DML statements parallelized                                     0
DDL statements parallelized                                     0
DFO trees parallelized                                          0
Parallel operations not downgraded                              0
Parallel operations downgraded to serial                        0
Parallel operations downgraded 75 to 99 pct                     0
Parallel operations downgraded 50 to 75 pct                     0
Parallel operations downgraded 25 to 50 pct                     0
Parallel operations downgraded 1 to 25 pct                      0

Here’s a little background information if you don’t know why this is a puzzle.

When you start executing a parallel tablescan (or index fast full scan) the first step your session takes is to tell DBWR to write to disc any dirty blocks for the object you’re about to scan, and to wait for DBWR to confirm that the blocks have been written. The session does this because a parallel scan will bypass the buffer cache and read directly from disc to the PGA, so if there were any dirty blocks for the object in the buffer cache the query would not see them and could not produce a correctly read-consistent result. The blocks written by DBWR in this case would (should) be reported under “DBWR parallel query checkpoint buffers written”.

Answer

It didn’t take long for the correct answer to appear in the comments. I updated a large number of rows in a table, then I set the parameter “_serial_direct_read” to “always” in my session and executed a query that did a tablescan of that table.

My session called the database writer to do an “object checkpoint”, then started waiting on event “enq: KO – fast object checkpoint” before reading the table using direct path reads of (just under) 1MB each.
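The test needs nothing more sophisticated than the following sketch – t1 (with its padding column) is a stand-in for my table, and since the statistic is incremented by DBWR it’s best checked from v$sysstat on an otherwise idle instance:

```sql
update t1 set padding = padding;   -- dirty a large number of buffers
commit;

alter session set "_serial_direct_read" = always;

select count(*) from t1;           -- serial direct path scan: KO checkpoint first

select  name, value
from    v$sysstat
where   name = 'DBWR parallel query checkpoint buffers written'
/
```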

I published this note because a question came up on the Oracle developer forum which (as a side detail) mentioned the apparent contradiction in a Statspack report, and I thought it was a useful lesson covering two important points.

First: the Oracle code keeps evolving, and the instrumentation and statistics don’t always keep up; in this case I think there’s a code path that says “fast object checkpoint”, but doesn’t have a flag that separates the newer serial scan option from the original parallel scan option – hence serial scans triggering “parallel query checkpoints”.

Secondly: when you see some performance figures that don’t seem to make sense or contradict your previous experience, it’s worth thinking around the oddity to see if you can come up with a feasible (and testable) explanation along the lines of “what if Oracle’s doing something new but calling some old instrumentation”.

Footnote

One of the comments includes some questions about how the whole parallel query / read-consistency / checkpointing works if more DML is happening as the query runs. I’ll come back to that in a couple of days.

November 21, 2022

Row_number() sorts

Filed under: Oracle,Troubleshooting,Tuning,Upgrades — Jonathan Lewis @ 5:47 pm GMT Nov 21,2022

An email on the Oracle-L list server a few days ago described a performance problem that had appeared after an upgrade from 11.2.0.4 to 19c (19.15). A long running statement (insert as select, running parallel 16) that had run to completion in 11g using about 20GB of temporary space (with 50GB read and written) had failed after running for a couple of hours in 19c and consuming 2.5 TB of temporary space even when the 11g execution plan was recreated through an SQL Profile.

When I took a look at the SQL Monitor report for 19c it turned out that a large fraction of the work done was in an operation called WINDOW CHILD PUSHED RANK which was there to deal with a predicate:

row_number() over(partition by t.ds_no, t.c_nbr order by c.cpcl_nbr desc) = 1

Checking the successful 11g execution, this operation had taken an input rowsource of 7 billion rows and produced an output rowsource of 70 million rows.

Checking the SQL Monitor report for the failed executions in 19c the “pure” 19c plan had reported 7 billion input rows, 6GB memory allocated and 1TB of temp space at the same point, the plan with the 11g profile had reported 10 billion input rows, but the operation had not yet reported any output rows despite reporting 9GB as the maximum memory allocation and 1TB as the maximum temp space usage. (Differences in row counts were probably due to the report being run for different dates.)

So, the question to the list server was: “is this a bug in 19c?”

Modelling

It’s a little unfortunate that I couldn’t model the problem in 19c at the time because my 19c VM kept crashing; but I built a very simple model to allow me to emulate the window sort and row_number() predicate in an 11g instance, then re-played the model in an instance of 21c.

For the model data I took 50 copies of the first 50,000 rows from view all_objects to produce a table of 2,500,000 rows covering 35,700 blocks and 279 MB, (55,000 blocks / 430 MB in 21c); then I ran the query below and reported its execution plan with a basic call to dbms_xplan.display_cursor():

select
        /*+ dynamic_sampling(0) */
        owner, max(object_name)
from    (
        select 
                /*+ no_merge */
                owner, object_name 
        from    (
                select 
                        owner, object_name,
                        row_number() over (partition by object_name order by object_type desc) orank 
                from 
                        t1
                )  where orank= 1
        )
group by 
        owner
order by
        owner
/

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |       |       |       | 29491 (100)|          |
|   1 |  SORT GROUP BY             |      |     8 |   184 |       | 29491   (9)| 00:02:28 |
|   2 |   VIEW                     |      |  2500K|    54M|       | 28532   (6)| 00:02:23 |
|*  3 |    VIEW                    |      |  2500K|   112M|       | 28532   (6)| 00:02:23 |
|*  4 |     WINDOW SORT PUSHED RANK|      |  2500K|    95M|   124M| 28532   (6)| 00:02:23 |
|   5 |      TABLE ACCESS FULL     | T1   |  2500K|    95M|       |  4821   (8)| 00:00:25 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ORANK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "OBJECT_NAME" ORDER BY
              INTERNAL_FUNCTION("OBJECT_TYPE") DESC )<=1)

Oracle 21c produced the same execution plan – though the row estimate for the VIEW operations (numbers 2 and 3) was a more realistic 46,236 (num_distinct recorded for object_name) compared to the unchanged 2,500,000 from 11g. (Of course it should have been operation 4 that showed the first drop in cardinality.)

With my first build, the timings weren’t what I expected: in 21c the query completed in 3.3 seconds, in 11g it took 11.7 seconds. Most of the difference was due to a large (55MB) spill to temp space that appeared in 11g but not in 21c. This would have been because my 11g session wasn’t allowed a large enough PGA, so I set the workarea_size_policy to manual and the sort_area_size to 100M, which looks as if it should have been enough to cover the 11g requirement – it wasn’t, and I had to grow the sort_area_size to 190MB before the 11g operation completed in memory, allocating roughly 155MB. By comparison 21c reported an increase of only 19MB of PGA to run the query, claiming that it needed only 4.7MB to handle the critical operation.
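For reference, the manual workarea settings were along these lines (sort_area_size is set in bytes and applies to workareas allocated after the change):

```sql
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 199229440;    -- roughly 190MB
-- (the first attempt, 100M = 104857600, wasn't enough)
```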

For comparison purposes here are the two run-time execution plans, with rowsource execution stats (which messed the timing up a little) and the column projection information:

Results for 11g

-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |       |       | 29491 (100)|      8 |00:00:03.96 |   35513 |       |       |          |
|   1 |  SORT GROUP BY             |      |      1 |      8 |   184 |       | 29491   (9)|      8 |00:00:03.96 |   35513 |  3072 |  3072 | 2048  (0)|
|   2 |   VIEW                     |      |      1 |   2500K|    54M|       | 28532   (6)|  28575 |00:00:04.07 |   35513 |       |       |          |
|*  3 |    VIEW                    |      |      1 |   2500K|   112M|       | 28532   (6)|  28575 |00:00:03.93 |   35513 |       |       |          |
|*  4 |     WINDOW SORT PUSHED RANK|      |      1 |   2500K|    95M|   124M| 28532   (6)|   1454K|00:00:08.82 |   35513 |   189M|  4615K|  168M (0)|
|   5 |      TABLE ACCESS FULL     | T1   |      1 |   2500K|    95M|       |  4821   (8)|   2500K|00:00:10.85 |   35513 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ORANK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "OBJECT_NAME" ORDER BY INTERNAL_FUNCTION("OBJECT_TYPE") DESC )<=1)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) "OWNER"[VARCHAR2,30], MAX("OBJECT_NAME")[30]
   2 - "OWNER"[VARCHAR2,30], "OBJECT_NAME"[VARCHAR2,30]
   3 - "OWNER"[VARCHAR2,30], "OBJECT_NAME"[VARCHAR2,30], "ORANK"[NUMBER,22]
   4 - (#keys=2) "OBJECT_NAME"[VARCHAR2,30], INTERNAL_FUNCTION("OBJECT_TYPE")[19], "OWNER"[VARCHAR2,30], ROW_NUMBER() OVER ( PARTITION BY
       "OBJECT_NAME" ORDER BY INTERNAL_FUNCTION("OBJECT_TYPE") DESC )[22]
   5 - "OWNER"[VARCHAR2,30], "OBJECT_NAME"[VARCHAR2,30], "OBJECT_TYPE"[VARCHAR2,19]

It’s an interesting oddity, and possibly a clue about the excess memory and temp space, that the A-Rows column for the Window Sort operation reports 1,454K rows output when it surely ought to be the final 45,982 at that point. It’s possible to imagine a couple of strategies that Oracle might be following to do the window sort that would result in the excess volume appearing, and I’ll leave it to the readers to investigate that.

Results for 21c

--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |       |       | 48864 (100)|     12 |00:00:02.98 |   54755 |  54750 |       |       |          |
|   1 |  SORT GROUP BY             |      |      1 |     12 |   852 |       | 48864   (1)|     12 |00:00:02.98 |   54755 |  54750 |  5120 |  5120 | 4096  (0)|
|   2 |   VIEW                     |      |      1 |  46236 |  3205K|       | 48859   (1)|  45982 |00:00:02.97 |   54755 |  54750 |       |       |          |
|*  3 |    VIEW                    |      |      1 |  46236 |  6547K|       | 48859   (1)|  45982 |00:00:02.97 |   54755 |  54750 |       |       |          |
|*  4 |     WINDOW SORT PUSHED RANK|      |      1 |   2500K|   131M|   162M| 48859   (1)|  45982 |00:00:02.97 |   54755 |  54750 |  5297K|   950K| 4708K (0)|
|   5 |      TABLE ACCESS FULL     | T1   |      1 |   2500K|   131M|       | 15028   (1)|   2500K|00:00:00.28 |   54755 |  54750 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ORANK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "OBJECT_NAME" ORDER BY INTERNAL_FUNCTION("OBJECT_TYPE") DESC )<=1)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1; rowset=256) "OWNER"[VARCHAR2,128], MAX("OBJECT_NAME")[128]
   2 - (rowset=256) "OWNER"[VARCHAR2,128], "OBJECT_NAME"[VARCHAR2,128]
   3 - (rowset=256) "OWNER"[VARCHAR2,128], "OBJECT_NAME"[VARCHAR2,128], "ORANK"[NUMBER,22]
   4 - (#keys=2; rowset=256) "OBJECT_NAME"[VARCHAR2,128], "OBJECT_TYPE"[VARCHAR2,23], "OWNER"[VARCHAR2,128], ROW_NUMBER() OVER ( PARTITION BY
       "OBJECT_NAME" ORDER BY INTERNAL_FUNCTION("OBJECT_TYPE") DESC )[22]
   5 - (rowset=256) "OWNER"[VARCHAR2,128], "OBJECT_NAME"[VARCHAR2,128], "OBJECT_TYPE"[VARCHAR2,23]

In this case we see the A-rows from the Window Sort meeting our expectations – but that may be a beneficial side effect of the operation completing in memory.

Given the dramatically different demands for memory for a query that ought to do the same thing in both versions it looks as if 21c may be doing something clever that 11g doesn’t do, or maybe doesn’t do very well, or maybe tries to do but has a bug that isn’t dramatic enough to be obvious unless you’re looking closely.

Modelling

Here’s a script that I used to build the test data, with scope for a few variations in testing. You’ll notice that the “create table” includes an “order by” clause that is close to the sorting requirement of the over() clause that appears in the query. The results I’ve shown so far were for data that didn’t have this clause in place.

rem
rem     Script:         analytic_sort_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2022
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1 nologging 
as
select 
        ao.*
from
        (select * from all_objects where rownum <= 50000) ao,
        (select rownum from dual connect by rownum <= 50)
order by
        object_name, object_type -- desc
/

--
--      Stats collection to get histograms
--

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 254'
        );
end;
/

--
-- reconnect here to maximise visibility of PGA allocation
--

connect xxxxxxxx/xxxxxxxx

set linesize 180
set trimspool on
set tab off

-- alter session set workarea_size_policy = manual;
-- alter session set sort_area_size = 199229440;

alter session set events '10046 trace name context forever, level 8';
-- alter session set statistics_level = all;
-- alter session set "_rowsource_execution_statistics"= true;

spool analytic_sort_2

select
        /*  monitoring */
        owner, max(object_name)
from    (
        select 
                /*+ no_merge */
                owner, object_name 
        from    (
                select 
                        owner, object_name,
                        row_number() over (partition by object_name order by object_type desc) orank 
                from 
                        t1
                )  where orank = 1
        )
group by 
        owner
order by
        owner
/

select * from table(dbms_xplan.display_cursor(format=>'cost bytes allstats last projection'));

alter session set events '10046 trace name context off';
alter session set "_rowsource_execution_statistics"= false;
alter session set statistics_level = typical;
alter session set workarea_size_policy = auto;

spool off

The results I’m going to comment on now are the ones I got after running the script with the order by clause in place, then reconnecting and flushing the shared pool before repeating the second half of the script (i.e. without recreating the table).

In 11g, going back to the automatic workarea sizing, the session used 37MB of memory and then spilled (only) 3MB to temp. The run time was approximately 3 seconds – which is a good match for the “unsorted” 21c run time. As with the original tests, the value reported in A-Rows is larger than we would expect (in this case suspiciously close to twice the correct value – but that’s more likely to be a coincidence than a clue). Interestingly, when I switched to the manual workarea_size_policy and set the sort_area_size to 190MB Oracle said “that’s the optimum memory” and used nearly all of it to complete in memory – for any value less than that (even down to 5MB) Oracle spilled just 3MB to disk in a one-pass operation. So it looks as if Oracle “knows” it doesn’t need to sort the whole data set, but still uses as much memory as is available to do something before it starts to get clever.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |       |       | 29491 (100)|      8 |00:00:01.76 |   35523 |   2145 |    331 |       |       |          |         |
|   1 |  SORT GROUP BY             |      |      1 |      8 |   184 |       | 29491   (9)|      8 |00:00:01.76 |   35523 |   2145 |    331 |  2048 |  2048 | 2048  (0)|         |
|   2 |   VIEW                     |      |      1 |   2500K|    54M|       | 28532   (6)|  28575 |00:00:02.00 |   35523 |   2145 |    331 |       |       |          |         |
|*  3 |    VIEW                    |      |      1 |   2500K|   112M|       | 28532   (6)|  28575 |00:00:01.83 |   35523 |   2145 |    331 |       |       |          |         |
|*  4 |     WINDOW SORT PUSHED RANK|      |      1 |   2500K|    95M|   124M| 28532   (6)|  57171 |00:00:02.10 |   35523 |   2145 |    331 |  2979K|   768K|   37M (1)|    3072 |
|   5 |      TABLE ACCESS FULL     | T1   |      1 |   2500K|    95M|       |  4821   (8)|   2500K|00:00:11.84 |   35513 |   1814 |      0 |       |       |          |         |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In 21c there’s essentially no difference between the sorted and unsorted tests, which suggests that with my data the session had been able to apply its optimisation strategy at the earliest possible moment rather than waiting until it had no alternative but to spill to disc.

--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |       |       | 48864 (100)|     12 |00:00:00.98 |   54753 |  54748 |       |       |          |
|   1 |  SORT GROUP BY             |      |      1 |     12 |   852 |       | 48864   (1)|     12 |00:00:00.98 |   54753 |  54748 |  4096 |  4096 | 4096  (0)|
|   2 |   VIEW                     |      |      1 |  46236 |  3205K|       | 48859   (1)|  45982 |00:00:00.97 |   54753 |  54748 |       |       |          |
|*  3 |    VIEW                    |      |      1 |  46236 |  6547K|       | 48859   (1)|  45982 |00:00:00.97 |   54753 |  54748 |       |       |          |
|*  4 |     WINDOW SORT PUSHED RANK|      |      1 |   2500K|   131M|   162M| 48859   (1)|  45982 |00:00:00.97 |   54753 |  54748 |  5155K|   940K| 4582K (0)|
|   5 |      TABLE ACCESS FULL     | T1   |      1 |   2500K|   131M|       | 15028   (1)|   2500K|00:00:00.42 |   54753 |  54748 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Bug description: possibly

Given the way that 11g reports a very small spill to disc, which stays fairly constant in size no matter how large or small the available PGA allocation is, when the input data is sorted to help the over() clause, and given how large the spill to disc can become when the data is not sorted, I feel that Oracle has an optimisation that discards input rows early in the analytic window sort. But we also have some evidence of a flaw in the code in versions prior to 21c that means Oracle fails to re-use memory that becomes available from rows that have been discarded.
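A minimal sketch of what such an early-discard might look like, assuming (as the sorted-input tests suggest) that the input arrives ordered by (partition key, sort key descending) – this is my own illustration of the principle, not Oracle’s algorithm:

```python
def top1_streaming(sorted_rows):
    # input ordered by (partition_key, sort_key DESC): the first row
    # seen for each partition is its rank-1 row, so every later row of
    # that partition can be discarded the moment it arrives and the
    # operation never needs to buffer more than the surviving rows
    out, current = [], object()
    for key, val in sorted_rows:
        if key != current:
            out.append((key, val))
            current = key
    return out

rows = [('A', 3), ('A', 1), ('B', 9), ('B', 2), ('B', 1), ('C', 5)]
print(top1_streaming(rows))   # [('A', 3), ('B', 9), ('C', 5)]
```

An implementation working this way would need almost no sort memory at all once the input was sorted – which is the 21c behaviour – while a version that discarded rows but failed to release or re-use the freed memory would show exactly the “large allocation, small spill” pattern of the earlier versions.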

This means the OP’s problem may have been just bad luck in terms of available memory and (relatively) tiny variations in demands for space between the 11g and 19c instances, perhaps due to differences in the quantity or distribution of data.

Although the impact was dramatic in this case, a query that is supposed to return 70 million rows (irrespective of how many it starts with) is an extreme case, and one that deserves a strong justification and a significant investment in time spent on finding cunning optimisation strategies.

So maybe this is a bug that doesn’t usually get noticed that will go away on an upgrade to 21c; and maybe there’s a backport and patch already available if you can find a bug number in the 21c patch release notes.

Strategy

I’ve said in the past that if you’re using analytic functions you ought to minimise the size of the data you’re processing before you apply the analytic part. Another step that can help is to make sure you’ve got the data into a (fairly well) sorted order before you reach the analytic part.
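A rough way to see why the pre-sort matters – again a toy model rather than a claim about Oracle’s code: with unsorted input even a perfect early-discard has to remember a best-row-so-far for every partition it has met, so its working set grows with the number of distinct partition keys (around 46,000 object_names in my data), whereas sorted input needs only the current partition in memory at any moment:

```python
import random, string

def top1_unsorted(rows):
    # unsorted input: the best row so far must be retained for every
    # partition key encountered, so memory grows with the number of
    # distinct keys; with input sorted on (key, val desc) only the
    # current partition's row would ever need to be held
    best = {}
    for key, val in rows:
        if key not in best or val > best[key]:
            best[key] = val
    return best

random.seed(0)
keys = [''.join(random.choices(string.ascii_lowercase, k=6)) for _ in range(46_000)]
rows = [(k, random.random()) for k in keys for _ in range(3)]
random.shuffle(rows)
retained = top1_unsorted(rows)
# 'retained' holds one entry per distinct key for the whole run
```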

In the case of versions of Oracle prior to 21c, it also seems to make sense (if you can arrange it) to minimise the amount of memory the session is allowed to use for a sort operation, as this will reduce the CPU used by the session and avoid grabbing a large amount of memory that could be used more effectively by other sessions.

Addendum

Just before publishing I found a way of keeping my 19.11.0.0 instance alive long enough to run the tests, then also ran them on an instance of 12.2.0.1. Both versions showed the same pattern of doing a large allocation of memory and large spill to disc when the data was not sorted, and a large allocation of memory but a small spill to disc when the data was sorted.

As a little sanity check I also exported the 19c data and imported it to 21c in case it was a simple variation in the data that allowed 21c to operate more efficiently than 19c. The change in data made no difference to the way in which 21c handled it: in both cases it called for a small allocation of memory with no spill to disc.
