Oracle Scratchpad

August 28, 2020

FK on delete

Filed under: constraints,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 8:51 am BST Aug 28,2020

This is part 1 of a short reference note about the work load created by referential integrity constraints when you delete from a parent table. It was prompted by a question on the Oracle Developer Community forum about how to handle a very large delete from a table which (a) included a column of type CLOB and (b) had 9 child tables.

The 9 referential integrity constraints were declared with “on delete cascade”, but the delete was taking too long even though all the related child data had been deleted before the parent delete. In outline the process was designed to operate in batches as follows (there’s a sketch of the SQL after the list):

  • populate global temporary table with a small set of IDs
  • delete from 9 child tables where parent_id in (select id from gtt)
  • delete from parent where id in (select id from gtt);
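In SQL terms each pass of the loop would look something like the following sketch – the table and column names here are invented for illustration, they are not the OP’s actual objects:

insert into gtt (id)
select  id
from    parent
where   rownum <= 1000          -- whatever batch-selection logic is appropriate
;

delete from child_1 where parent_id in (select id from gtt);
-- ... and similarly for child_2 through child_9 ...

delete from parent where id in (select id from gtt);

commit;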

The process was running very slowly. At first sight – and without examining any trace files or workload statistics – the “obvious” guess would be that this was something to do with the CLOBs – but a CLOB delete is usually a “logical” delete, i.e. it sets a flag and shouldn’t really be doing a lot of work actually deleting and freeing space, and it doesn’t generate undo for the CLOB itself. Examination of the session statistics showed that the problem was with the work that Oracle was doing to handle the referential integrity – even though all related rows had been deleted in advance of the parent delete.

A highly suggestive statistic from the session stats (v$sesstat for the session) for a controlled test that deleted 1,170 parent rows was the “execute count”, which was 10,892.  Why would you execute that many statements when all you’re doing is a simple “delete from parent where id in (select id from gtt)”? Check the arithmetic: 1,170 * 9 = 10,530, which is a fairly good indicator that every row deleted from the parent results in 9 statements being executed – one against each of the 9 child tables.
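If you want to pick up that figure for your own session, a minimal sketch of the sort of query you could run (joining v$sesstat to v$statname) is:

select  sn.name, ss.value
from    v$sesstat       ss,
        v$statname      sn
where   ss.statistic#   = sn.statistic#
and     ss.sid          = sys_context('userenv','sid')
and     sn.name         = 'execute count'
;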

Foreign Key options

Oracle is a little restrictive in how it allows you to define foreign key constraints – MySQL, for example, allows you to specify the action the database should take on update or delete of a parent row, and it allows 5 possible actions, so you have:


on update [restrict | cascade | set null | no action | set default]
on delete [restrict | cascade | set null | no action | set default]

In comparison Oracle only implements “on delete”, and the only options it allows are “cascade”, “set null”, and “no action” (which is the default, though you can’t actually specify it).
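For reference – and using the table and column names that will appear in the test script below – the three Oracle variants would be declared as follows (you would pick just one of them, of course):

alter table child add constraint chi_fk_par
        foreign key(n1) references parent
        on delete cascade;

alter table child add constraint chi_fk_par
        foreign key(n1) references parent
        on delete set null;

-- the default ("no action"): simply omit the "on delete" clause
alter table child add constraint chi_fk_par
        foreign key(n1) references parent;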

Side note: “No Action” and “Restrict” are very similar – if you try to delete a parent for which a child row exists then the attempt will raise an error. The difference between the two operations is that “no action” will delete the parent and then roll back when it finds the child, while “restrict” will check whether a child exists before attempting to delete the parent. (See footnote 1 for a detailed examination of how Oracle handles “no action”.)

As an initial investigation of the work that Oracle does to handle referential integrity and deletes I want to examine one simple but extreme example. I’m going to create a parent/child pair of tables with “on delete no action”, and delete one parent row. The thing that makes the demonstration “extreme” is that I will first create, and then delete, a large number of child rows for that parent before deleting the parent row. Here’s the initial data setup:

rem
rem     Script:         fk_lock_stress.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem
rem     Last tested 
rem             19.3.0.0
rem

create table child
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        trunc((rownum-1)/1200)  n1,
        lpad('x',80)            idx_padding,
        lpad(rownum,8,'0')      small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 6e4 -- > comment to avoid wordpress format issue
;

create index child_ix on child(n1, idx_padding) pctfree 95;

create table parent as
select
        id,
        lpad(rownum,8,'0')      small_vc,
        lpad('x',80)            padding
from    (
        select 
                distinct n1             id
        from
                child
        )
;

alter table parent add constraint par_pk primary key(id);
alter table child add constraint chi_fk_par foreign key(n1) references parent;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PARENT',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'CHILD',
                method_opt       => 'for all columns size 1'
        );
end;
/


select  index_name, num_rows, distinct_keys, leaf_blocks, avg_leaf_blocks_per_key 
from    user_indexes
where   table_name in ('PARENT','CHILD')
;

select  object_id, object_name
from    user_objects
order by
        object_id
;

delete from child where n1 = 10;
commit;
delete from child where n1 = 20;
commit;
delete from child where n1 = 30;
commit;
delete from child where n1 = 40;
commit;

execute dbms_stats.gather_table_stats(user, 'child', cascade=>true)

alter system flush buffer_cache;

For each parent row there are 1,200 child rows, and I’ve given the “foreign key” index on child a pctfree of 95% which has resulted in the index needing roughly 600 leaf blocks per key. (When I first created this test my value for x$kcbbf (buffer pins) was 500 and I wanted to see what would happen if I needed to pin more buffers than could fit in the array).

After creating the index and gathering stats I’ve deleted all the rows for 4 of the possible parent values from the child table because I want to see how much work it takes to delete a parent for which there are no longer any child rows, as follows:

execute snap_events.start_snap
execute snap_my_stats.start_snap

alter session set events '10046 trace name context forever, level 8';
alter session set tracefile_identifier = 'del';

delete from parent where id = 10;
commit;

delete from parent where id = 40;
commit;

alter session set tracefile_identifier = '';
alter session set events '10046 trace name context off';

execute snap_my_stats.end_snap
execute snap_events.end_snap

The “snap_my_stats / snap_events” packages are just my usual mechanism for capturing workload information over an interval. I’ve enabled extended tracing at level 8 (waits) so that I can see if Oracle runs any interesting recursive SQL and, since I’ve flushed the buffer cache, this will also let me see which data blocks Oracle has to read from disk.

Here are the key results from a test run on 19.3.0.0:


Statistic                                                Value
---------                                                -----
physical reads                                           1,225

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                           1,225           0           0.86        .001           1
events in waitclass Other                            11           0           0.01        .001          73

Is it a coincidence that I’ve done 1,225 single block reads when the number of leaf blocks per key in the child index is a little over 600 and I’ve just deleted two parent key values? (Spoiler: NO).

When we apply tkprof to the trace file we find the following summary of the first delete:


SQL ID: 0u6t174agkq27 Plan Hash: 3366423708

delete from parent
where
 id = 10


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.01          0          0          1           0
Execute      1      0.00       0.02        608          1        609           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.03        608          1        610           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 107
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  PARENT (cr=1 pr=608 pw=0 time=23591 us starts=1)
         1          1          1   INDEX UNIQUE SCAN PAR_PK (cr=1 pr=1 pw=0 time=60 us starts=1 cost=0 size=3 card=1)(object id 94905)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                       608        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************

The “Row Source Operation” statistics tell us that it has taken 608 single block reads to access a single row from the parent table by unique index and delete it. So let’s look at the content of the trace file – looking for any WAIT line that is a “db file sequential read” wait relating to this cursor. (I started by looking for the statement in the trace file so that I could extract its cursor number before using the following grep command):


grep "#139773970345352" or19_ora_17541_del.trc | grep "WAIT.*db file sequential read" | sed "s/^.*obj#=//" | sed "s/ .*$//" | sort | uniq -c | sort -n

      2 0
    604 94903
      1 94904
      1 94905

If you’re wondering which objects these numbers correspond to (though you may have made a reasonable guess by now), here are the results from my earlier query against user_objects:


 OBJECT_ID OBJECT_NAME
---------- --------------------
     94902 CHILD
     94903 CHILD_IX
     94904 PARENT
     94905 PAR_PK

Oracle visits one block in the parent (primary key) index, then one block in the parent table, then 604 blocks in the child index to check that there are no current, or potential, index entries for the parent value in any leaf block. It does this with single block reads, walking the child index in order, and it does it without showing any SQL that might help me spot that it’s happening.

Exactly the same mechanism then comes into play when deleting the parent row with id = 40: effectively Oracle does a “silent” index range scan of the child index where n1 = 40 to make sure that there are no current child rows.

When I first created this test it was because I wanted to see if Oracle would pin all the buffers holding the current image of the relevant child index leaf blocks in exclusive mode as it walked through the index – this seemed to be the obvious way to ensure that no other session could insert a child row into a leaf block that had already been checked – but when I enabled pin tracing I found 600+ pairs of “Acquire/Release”. This may help to explain Oracle’s choice of “No Action” (rather than “Restrict”): if the parent row is locked and marked for deletion then there is no need to keep the child leaf blocks pinned exclusively, as any attempt by another session to introduce a new child would first check the parent, discover that it was marked for deletion, and wait for the deleting transaction to commit or roll back.

There is a gap in this hypothesis, though, which shows up when there is no appropriate index on the table. (Which includes the case where the “foreign key” index has been created as a bitmap instead of a B-tree, or where there’s a B-tree index that includes the foreign key column(s) but not at the start of the index definition, or if the foreign key columns are at the start of the index but declared as descending).
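To make the point concrete, here are a few sketches of index definitions – each considered in isolation, and based on the demo objects above – that would NOT qualify as a suitable “foreign key” index:

create bitmap index child_bx  on child(n1);                    -- bitmap, not B-tree
create        index child_ix2 on child(idx_padding, n1);       -- n1 not at the start of the index
create        index child_ix3 on child(n1 desc, idx_padding);  -- leading column declared descending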

If there is no appropriate “foreign key” index there are two changes in the activity. First, the child table will temporarily be locked in mode 4 (share mode) or mode 5 (share row exclusive mode) for the duration of the referential integrity check. Secondly you will see the text of the recursive SQL statement that does the check; here, for example, is the SQL extracted from a 19.3 trace file in a few variants of the original test:


select /*+ all_rows */ count(1)
from
 "TEST_USER"."CHILD" where "N1" = :1

Oracle’s treatment of this statement is cost-based, and in my case, depending on the choice of index, level of compression, and settings for system statistics I found that I could get any one of the following plans:


Bitmap index
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=3 pr=2 pw=0 time=139 us starts=1)
         0          0          0   BITMAP CONVERSION COUNT (cr=3 pr=2 pw=0 time=132 us starts=1 cost=2 size=3600 card=1200)
         0          0          0    BITMAP INDEX RANGE SCAN CHILD_IX (cr=3 pr=2 pw=0 time=128 us starts=1)(object id 95098)

No index at all
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=826 pr=412 pw=0 time=4880 us starts=1)
         0          0          0   TABLE ACCESS FULL CHILD (cr=826 pr=412 pw=0 time=4870 us starts=1 cost=114 size=3600 card=1200)

Index on (idx_padding, n1)
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=826 pr=412 pw=0 time=5601 us starts=1)
         0          0          0   TABLE ACCESS FULL CHILD (cr=826 pr=412 pw=0 time=5591 us starts=1 cost=114 size=3600 card=1200)

Index on (idx_padding, n1) compress 1 (and tweaking MBRC)
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=128 pr=126 pw=0 time=2467 us starts=1)
         0          0          0   INDEX SKIP SCAN CHILD_IX (cr=128 pr=126 pw=0 time=2460 us starts=1 cost=122 size=3600 card=1200)(object id 95111)

You’ll note that I have two cases where the strategy chosen for the query seems to be a close match to the operation performed with the “proper foreign key index” in place – moreover the number of blocks read for the tablescan strategy is less than the number of blocks read when we had the “foreign key” index in place. So what’s the difference that makes it possible for Oracle to avoid locking the table in mode 4/5 when we have that index?

These results, by the way, explain why Mikhail Velikikh (in comment #1) can say that large numbers of buffer gets and disk reads for the parent delete are NOT an indication of a missing foreign key index … when the index is missing the large numbers are correctly reported against the SQL that checks the child, not against the parent delete.

Summary

Even if you’ve deleted all the child data before attempting to delete a parent row, and YOU know that there’s no child data, when ORACLE is told to delete the parent row it still has to check that there’s no child data; and in a large scale delete (particularly where there may be many child rows per parent) you may end up seeing a surprising amount of I/O on single block reads of the “foreign key” indexes on any child tables.

This means that for a very large “house-keeping” or archiving delete with a limited time-window you may want to take special steps, such as rebuilding “foreign key” indexes after deleting large numbers of child rows and before deleting the parent rows. You may even want to disable/drop the foreign key constraint before the parent delete and re-enable it afterwards if you think you can avoid getting any bad data into the system while the job is running.
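As a sketch – using the object names from the demo above, and assuming you’ve weighed the risk of bad data arriving while the job runs – the sort of steps I mean are:

-- rebuild the "foreign key" index after the child deletes, before the parent delete
alter index child_ix rebuild online;

-- or, more drastically, disable the constraint for the duration of the parent delete
alter table child disable constraint chi_fk_par;
-- ... run the parent delete here ...
alter table child enable novalidate constraint chi_fk_par;

I’ve shown “enable novalidate” to avoid the cost of re-checking every existing child row; a plain “enable” would validate all the child data as part of re-enabling the constraint.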

In the next article: on delete cascade, and on delete set null.

Footnote 1

To demonstrate Oracle’s behaviour for the default “no action” foreign key constraint when you attempt to delete a parent row for which child data exists (which will result in raising an Oracle error of the form: “ORA-02292: integrity constraint (TEST_USER.CHI_FK_PAR) violated – child record found”) you need only set up a small amount of data, attempt the delete and then dump the redo for the transaction – if nothing else is going on you’ll find the following set of redo change vectors (taken from a trace file generated in 19.3, using the command “grep OP {filename}”):

CHANGE #1 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00084 OBJ:94828 SCN:0x00000b860f303fa4 SEQ:2 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:35 AFN:17 DBA:0x04400110 OBJ:4294967295 SCN:0x00000b860f303f98 SEQ:2 OP:5.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c0009c OBJ:94829 SCN:0x00000b860f303fa4 SEQ:2 OP:10.4 ENC:0 RBL:0 FLG:0x0000
CHANGE #4 CON_ID:3 TYP:0 CLS:36 AFN:17 DBA:0x04404564 OBJ:4294967295 SCN:0x00000b860f303f98 SEQ:3 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #5 CON_ID:3 TYP:0 CLS:36 AFN:17 DBA:0x04404564 OBJ:4294967295 SCN:0x00000b860f304062 SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c0009c OBJ:94829 SCN:0x00000b860f304062 SEQ:1 OP:10.5 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:36 AFN:17 DBA:0x04404564 OBJ:4294967295 SCN:0x00000b860f304062 SEQ:2 OP:5.6 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00084 OBJ:94828 SCN:0x00000b860f304062 SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:35 AFN:17 DBA:0x04400110 OBJ:4294967295 SCN:0x00000b860f304062 SEQ:1 OP:5.11 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:35 AFN:17 DBA:0x04400110 OBJ:4294967295 SCN:0x00000b860f304062 SEQ:2 OP:5.4 ENC:0 RBL:0 FLG:0x0000

11.3  delete row piece
5.2   start transaction
10.4  delete leaf row    (ed: this is for the primary key index)
5.1   create undo record for table block
5.1   create undo record for index leaf block

10.5  restore leaf row during rollback
5.6   mark index undo as applied

11.2  insert rowpiece
5.11  mark table undo as applied

5.4   commit;

So Oracle deletes the parent, (discovers the child) then rolls back the parent delete.


August 24, 2020

Flashback Bug

Filed under: 18c,Bugs,Oracle,redo,Troubleshooting — Jonathan Lewis @ 9:34 am BST Aug 24,2020

Here’s a problem with the “flashback versions” technology that showed up at the end of last week. There’s a thread about it on the Oracle Developer community forum, and a chain of tweets that was my initial response to a twitter alert about it that Daniel Stein posted.

The problem appears somewhere in the 18c timeline – it doesn’t seem to be present in 12.2.0.1 – so if you’re running any versions 18+ here’s a modified version of the test case supplied by Daniel Stein to demonstrate the issue.

rem
rem     Script:         flashback_bug.sql
rem     Author:         Jonathan Lewis / Daniel Stein
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0        Wrong result
rem             12.2.0.1        Correct result
rem

create table t1 (n1 number(4)) 
segment creation immediate
;

prompt  =============================================================
prompt  Sleeping 10 seconds after create table to avoid subsequent
prompt  ORA-01466: unable to read data - table definition has changed
prompt  =============================================================


execute dbms_lock.sleep(10)

prompt  =========
prompt  Start SCN
prompt  =========

column current_scn new_value scn_vorher2 
select to_char(current_scn,'9999999999999999') current_scn from V$DATABASE;

insert into t1(n1) values (1);
insert into t1(n1) values (2);
insert into t1(n1) values (3);
insert into t1(n1) values (4);
insert into t1(n1) values (5);
insert into t1(n1) values (6);
insert into t1(n1) values (7);
insert into t1(n1) values (8);
insert into t1(n1) values (9);
insert into t1(n1) values (10);
insert into t1(n1) values (11);
insert into t1(n1) values (12);

delete from t1 where n1 in (2, 5, 8, 11);

commit;

prompt  =======
prompt  End SCN
prompt  =======

column current_scn new_value scn_nachher
select to_char(current_scn,'9999999999999999') current_scn from V$DATABASE;

prompt  =============
prompt  Version Query 
prompt  =============

column VERSIONS_STARTSCN        format 9999999999999999 heading "Start SCN"
column VERSIONS_ENDSCN          format 9999999999999999 heading "End SCN"
column VERSIONS_OPERATION       format A9               heading "Operation"
column SCN_RANGE                format A35              heading "SCN Range"

select 
        n1, rowid, 
        versions_operation, versions_startscn, versions_endscn,
        &scn_vorher2||' and '||&scn_nachher scn_range
from
        t1 versions between scn &scn_vorher2 and &scn_nachher
order by 
        rowid, versions_startscn, versions_operation, versions_endscn
;

prompt  ==================
prompt  Current Table data
prompt  ==================

select n1,rowid from t1;


alter system dump redo scn min &scn_vorher2 scn max &scn_nachher;

I’ve created a table then inserted a few rows and deleted some of them in the same transaction. I’ve captured the database SCN at the start and end of the transaction and then tried to run a “versions between” query across that range of SCNs. Here are the results from the “versions between” query and the final query of the current contents of the table on a test on a database running 19.3:

        N1 ROWID              Operation         Start SCN           End SCN SCN Range
---------- ------------------ --------- ----------------- ----------------- -----------------------------------
         1 AAAXFOAATAAAACDAAA I            12670408139684                   12670408139679 and 12670408139687
         3 AAAXFOAATAAAACDAAC I            12670408139684                   12670408139679 and 12670408139687
         4 AAAXFOAATAAAACDAAD I            12670408139684                   12670408139679 and 12670408139687
         5 AAAXFOAATAAAACDAAE I            12670408139684                   12670408139679 and 12670408139687
         6 AAAXFOAATAAAACDAAF I            12670408139684                   12670408139679 and 12670408139687
         7 AAAXFOAATAAAACDAAG I            12670408139684                   12670408139679 and 12670408139687
         8 AAAXFOAATAAAACDAAH I            12670408139684                   12670408139679 and 12670408139687
         9 AAAXFOAATAAAACDAAI I            12670408139684                   12670408139679 and 12670408139687
        10 AAAXFOAATAAAACDAAJ I            12670408139684                   12670408139679 and 12670408139687
        11 AAAXFOAATAAAACDAAK I            12670408139684                   12670408139679 and 12670408139687

10 rows selected.

        N1 ROWID
---------- ------------------
         1 AAAXFOAATAAAACDAAA
         3 AAAXFOAATAAAACDAAC
         4 AAAXFOAATAAAACDAAD
         6 AAAXFOAATAAAACDAAF
         7 AAAXFOAATAAAACDAAG
         9 AAAXFOAATAAAACDAAI
        10 AAAXFOAATAAAACDAAJ
        12 AAAXFOAATAAAACDAAL

8 rows selected.

There’s a trap in this demonstration. If you follow the link to the forum thread you’ll find that Daniel Stein says there’s a problem with Oracle 19.6 and upwards (but not 12.2) – then various other people test on their versions and find that 19.7, 18.5, and 19.3 are broken. That last one is what made me run the test on the 19.3 I had at hand – and I got the right result, unlike the results above.

If you look at the output from this test run you’ll see that the versioning query says that in the SCN range I’ve inserted 10 rows, and that rows 2 and 12 have never been inserted. Given the SQL I’d run I was expecting to see 16 rows in the output, 12 of them saying I’d inserted values 1 through 12, and then 4 saying I’d deleted 2, 5, 8 and 11 – but (when it works correctly) Oracle manages to combine the insert and delete in the same transaction to cancel the entries. So the versioning query ought to be showing exactly the 8 rows that appear in the current data query.

So I had two puzzles:

  • What’s gone wrong for everyone else ?
  • Why hasn’t it gone wrong for me ?

To answer the second question first: I took a look at my instance parameters and found that I had set statistics_level to all. Clearly this ought to have nothing to do with an error in flashback version queries, but I changed it to typical anyway and re-ran the test: and got the (now expected) wrong results. If you read the twitter thread you’ll find that Oracle support had had an SR opened on this problem for a few days but had been unable to reproduce the problem – perhaps they, too, had a system with statistics_level set to all.

Getting back to the first question: what’s gone wrong. I couldn’t think of a good reason why setting the statistics_level should make a difference, so I took the view that the versioning query is running through the redo log to find all the changes that took place in the range, so maybe the answer will appear in the redo. That’s why the last line in the demo above is a command to dump the redo generated between the two SCNs.

Short-circuiting the details of what I looked at I’m just going to show you the layer 11 (table) redo change vectors for my DML, which I extracted from the trace file using the command grep “OP:11” {filename}.

First the set when statistics_level = all:


CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1b SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1b SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:4 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:5 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:6 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:7 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:8 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:9 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:10 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:2 CLS:1 AFN:10 DBA:0x01009c57 OBJ:720 SCN:0x00000b860f2d3cb8 SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:11 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:1 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:2 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:3 OP:11.3 ENC:0 RBL:0 FLG:0x0000

What you can see are 13 op codes “11.2” – which is “insert row piece”. Most of these are for OBJ 94544, which is the object_id for my table, and one of them is for OBJ 720, which is a table owned by sys called exp_head$ (which is something to do with “expression tracking”). After the inserts there are 4 op codes “11.3”, which is “delete row piece”. So there’s no obvious error or complication in the redo.

And now when we set statistics_level to typical and re-run the script (which means we get a new OBJ number):

CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8a SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8a SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:4 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:5 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:6 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:7 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:8 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:9 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:10 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:2 CLS:1 AFN:10 DBA:0x01009c57 OBJ:720 SCN:0x00000b860f2d3d1d SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:11 OP:11.12 ENC:0 RBL:0 FLG:0x0000

In this case we get 13 “insert row piece” Op Codes, and then a single “11.12” which is “delete multiple rows”. So there IS a difference in the redo, and something about that difference is probably behind the error coming and going.

Taking a closer look at the 11.12 and one of the 11.2 redo change vectors, this is what we see:


CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:11 OP:11.12 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C  uba: 0x04400a24.0672.3c
KDO Op code: QMD row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x04c00083  hdba: 0x04c00082
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 lock: 1 nrow: 4
slot[0]: 1
slot[1]: 4
slot[2]: 7
slot[3]: 10


CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:3 OP:11.3 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C  uba: 0x04400467.0595.2c
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x04c00083  hdba: 0x04c00082
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 10(0xa)

The 11.12 op code shows us a list of 4 entries in the block’s “row directory” for “table 0”. The 11.3 op code shows us just one – it’s the last of the four deletes so it’s reporting entry 10 in the “row directory”; the previous three op code 11.3 vectors were reporting slots 1, 4, and 7 respectively.

Obviously we can’t draw any firm conclusions about what’s going wrong; we simply have a few observations that might give us some idea of the type of error. After a few more experiments I decided that the code handling flashback versioning was misinterpreting the “delete multiple rows” record: it consistently seemed to delete the first row identified by the slot[0] entry, then jump straight to the last slot but add one to that index entry before attempting to apply the vector.

Footnote

I’ve used dbms_lock.sleep() to pause for 10 seconds after creating the table. The sleep() procedure was copied into the dbms_session package in recent versions of Oracle so that end-user code could access it without having to receive privileges on the rather more dangerous dbms_lock package.
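So on recent versions the pause could be written as the one-liner below (from 18c onwards, if I remember the release correctly), with no extra privileges needed:

execute dbms_session.sleep(10)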

I’ve referenced the statistics_level parameter as the one affecting the generation of the “delete multiple row” redo vector. In fact you can get the same effect by setting the hidden parameter “_rowsource_execution_statistics” to true. Setting statistics_level to all has the effect of enabling rowsource execution statistics so this isn’t surprising.
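For reference, the two session-level settings I’m talking about are as follows (the second is a hidden parameter, so use it only on a test system):

alter session set statistics_level = all;
alter session set "_rowsource_execution_statistics" = true;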

I pointed out that the flashback versioning code seemed to “add 1” to the final slot[n] identifier before producing the flashback output. Think about what that means if you change the demonstration to delete the LAST row in the list of rows inserted. Here’s what my versioning query did in a test when I changed 11 to 12 in my delete statement:


ERROR at line 2:
ORA-00600: internal error code, arguments: [ktrvtgr_2], [79691907], [0], [11], [], [], [], [], [], [], [], []

79691907 is the block number of the block where the table rows are stored.
11 is the correct row directory index for the last row (12) in the table – but my hypothesis is that the flashback code was trying to find (non-existent) directory index 12 because it adds one to the supplied index entry.

Update

In the interval between completing the final draft yesterday and publishing the note this morning, Oracle support replied to the forum thread with a bug number (28910586 – not yet publicly visible) and a note that a patch could be available on request if an SR were raised giving the Oracle version and platform. It will be interesting to see whether the patch blocks the creation of the 11.12 op codes or whether it corrects the interpretation of 11.12 codes by the flashback versions code.

August 18, 2020

Explain Rewrite

Filed under: Materialized view,Oracle,Troubleshooting — Jonathan Lewis @ 7:07 pm BST Aug 18,2020

This is one of those notes that I thought I’d written years ago. It answers two questions:

  • what can I do with my materialized view?
  • why isn’t the optimizer using my materialized view for query rewrite?

I’ve actually supplied an example of code to address the first question as a throwaway comment in a blog that dealt with a completely different problem, but since the two questions above go together, and the two answers depend on the same package, I’m going to repeat the first answer.

The reason for writing this note now is that the question “why isn’t this query using my materialized view” came up on the Oracle Developer community forum a few days ago – and I couldn’t find the article that I thought I’d written.

Note: a couple of days after I started drafting this note Franck Pachot tweeted links to a couple of extensive posts on materialized views that included everything I had to say (and more) about “what can my materialized view do”. Fortunately he didn’t get on to the second question – so I decided to publish the whole of my note anyway as the two questions go well together.

The key feature is the dbms_mview package, and the two procedures (both overloaded) explain_mview() and explain_rewrite(). To quote the 12.2 “PL/SQL Supplied Packages” manual page, the first procedure:

“explains what is possible with a materialized view or potential [ed: my emphasis] materialized view”

note particularly that the materialized view doesn’t need to have been created before you run the package – the second procedure:

“explains why a query failed to rewrite or why the optimizer chose to rewrite a query with a particular materialized view or materialized views”.

Here’s the relevant extract from the SQL*Plus describe of the package –

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             VARCHAR2                IN
 STMT_ID                        VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             CLOB                    IN
 STMT_ID                        VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             VARCHAR2                IN
 MSG_ARRAY                      EXPLAINMVARRAYTYPE      IN/OUT

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             CLOB                    IN
 MSG_ARRAY                      EXPLAINMVARRAYTYPE      IN/OUT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          VARCHAR2                IN
 MV                             VARCHAR2                IN     DEFAULT
 STATEMENT_ID                   VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          CLOB                    IN
 MV                             VARCHAR2                IN     DEFAULT
 STATEMENT_ID                   VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          VARCHAR2                IN
 MV                             VARCHAR2                IN     DEFAULT
 MSG_ARRAY                      REWRITEARRAYTYPE        IN/OUT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          CLOB                    IN
 MV                             VARCHAR2                IN     DEFAULT
 MSG_ARRAY                      REWRITEARRAYTYPE        IN/OUT

As you can see there are 4 overloaded versions of explain_mview() and 4 of explain_rewrite(): both procedures take an input of an SQL statement (parameter mv for explain_mview(), parameter query for explain_rewrite()) – which can be either a varchar2() or a CLOB, and both procedures supply an output which can be written to a table or to an “in/out” pl/sql array.

Two possible input options times two possible output options gives 4 overloaded versions of each procedure. In this note I’ll restrict myself to the first version of the two procedures – varchar2() input, writing to a pre-created table.

Here’s a simple demo script. Before we do anything else we need to call a couple of scripts in the $ORACLE_HOME/rdbms/admin directory to create the target tables. (If you want to do something a little clever you could tweak the scripts to create them as global temporary tables in the sys schema with a public synonym – just like the plan_table used in calls to explain plan.)

rem
rem     Script:         c_explain_mv.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem

@$ORACLE_HOME/rdbms/admin/utlxmv.sql
@$ORACLE_HOME/rdbms/admin/utlxrw.sql

-- @$ORACLE_HOME/sqlplus/demo/demobld.sql

create materialized view dept_cost
refresh complete on demand
enable query rewrite
as
select 
        d.deptno,sum(e.sal) 
from 
        emp e,dept d
where 
        e.deptno = d.deptno
group by 
        d.deptno
;
 
set autotrace traceonly explain
 
select 
        d.deptno,sum(e.sal) 
from 
        emp e,dept d
where 
        e.deptno = d.deptno
and     d.deptno=10
group by 
        d.deptno
;

set autotrace off

/*

Execution Plan
----------------------------------------------------------
Plan hash value: 3262931184

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |     7 |     2   (0)| 00:00:01 |
|*  1 |  MAT_VIEW REWRITE ACCESS FULL| DEPT_COST |     1 |     7 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DEPT_COST"."DEPTNO"=10)

*/

My original script is rather old and in any recent version of Oracle the Scott tables are no longer to be found in demobld.sql, so I’ve added them at the end of this note.

After creating the emp and dept tables I’ve created a materialized view and then enabled autotrace to see if a suitable query will use the materialized view – and as you can see from the execution plan the materialized view can be used.

So let’s do a quick analysis of the view and the rewrite:


column audsid new_value m_audsid
select sys_Context('userenv','sessionid') audsid from dual;

begin
        dbms_mview.explain_mview(
--              mv      => 'DEPT_COST',
                mv      => q'{
                        create materialized view dept_cost
                        refresh complete on demand
                        enable query rewrite
                        as
                        select 
                                d.deptno,sum(e.sal) 
                        from 
                                emp e,dept d
                        where 
                                e.deptno = d.deptno
                        group by 
                                d.deptno
                }',
                stmt_id         => '&m_audsid'
        );
end;
/
 
set linesize 180

column cap_class noprint
column capability_name format a48
column related_text format a15
column short_msg format a90

break on cap_class skip 1
 
select
        substr(capability_name,1,3) cap_class,
        capability_name, possible, related_text, msgno, substr(msgtxt,1,88) short_msg
from
        mv_capabilities_table
where
        mvname       = 'DEPT_COST'
and     statement_id = '&m_audsid'
order by
        substr(capability_name,1,3), related_num, seq
;


I’ve captured the session’s audsid because the mv_capabilities_table table and the rewrite_table table both have a statement_id column and the audsid is a convenient value to use to identify the most recent data you’ve created if you’re sharing the table. Then I’ve called dbms_mview.explain_mview() indicating two possible strategies (with one commented out, of course).

I could pass in a materialized view name as the mv parameter, or I could pass in the text of a statement to create a materialized view (whether or not I have previously created it). As you can see, I’ve also used a substitution variable to pass in my audsid as the statement id.

After setting up a few SQL*Plus format options I’ve then queried some of the columns from the mv_capabilities_table table, with the following result:


CAPABILITY_NAME                                  P RELATED_TEXT    MSGNO SHORT_MSG
------------------------------------------------ - --------------- ----------------------------------------
PCT_TABLE                                        N EMP             2068 relation is not a partitioned table
PCT_TABLE_REWRITE                                N EMP             2068 relation is not a partitioned table
PCT_TABLE                                        N DEPT            2068 relation is not a partitioned table
PCT_TABLE_REWRITE                                N DEPT            2068 relation is not a partitioned table
PCT                                              N

REFRESH_FAST_AFTER_ONETAB_DML                    N SUM(E.SAL)      2143 SUM(expr) without COUNT(expr)
REFRESH_COMPLETE                                 Y
REFRESH_FAST                                     N
REFRESH_FAST_AFTER_INSERT                        N TEST_USER.EMP   2162 the detail table does not have a materialized view log
REFRESH_FAST_AFTER_INSERT                        N TEST_USER.DEPT  2162 the detail table does not have a materialized view log
REFRESH_FAST_AFTER_ONETAB_DML                    N                 2146 see the reason why REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ONETAB_DML                    N                 2142 COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ONETAB_DML                    N                 2143 SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML                       N                 2161 see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_PCT                                 N                 2157 PCT is not possible on any of the detail tables in the materialized view

REWRITE                                          Y
REWRITE_FULL_TEXT_MATCH                          Y
REWRITE_PARTIAL_TEXT_MATCH                       Y
REWRITE_GENERAL                                  Y
REWRITE_PCT                                      N                 2158 general rewrite is not possible or PCT is not possible on any of the detail tables


20 rows selected.

If you want the complete list of possible messages, the msgno is interpreted from $ORACLE_HOME/rdbms/mesg/qsmus.msg (which is shared with the explain_rewrite() procedure). Currently (19.3) it holds a slightly surprising 796 messages. A call to “oerr qsm nnnnn” will translate the number to the message text.

In a similar vein we can call dbms_mview.explain_rewrite(). Before we do so I’m going to do something that will stop the rewrite from taking place, then call the procedure:

update emp set ename = 'JAMESON' where ename = 'JAMES';
commit;

begin
        dbms_mview.explain_rewrite ( 
                query => q'{
                        select 
                                d.deptno,sum(e.sal) 
                        from 
                                emp e,dept d
                        where 
                                e.deptno = d.deptno
                        and     d.deptno=10
                        group by 
                                d.deptno
                        }',
                mv => null,
                statement_id => '&m_audsid'
        );
end;
/

column message format a110

select
        sequence, message 
from 
        rewrite_table
where
        statement_id = '&m_audsid'
order by
        sequence
/

You’ll notice that there’s an input parameter of mv – there may be cases where the optimizer has a choice of which materialized view to use to rewrite a query, and you could supply the name of a materialized view for this parameter to find out why Oracle didn’t choose the materialized view you were expecting. (In this case mv has to supply a materialized view name, not the text to create a materialized view.)

In my example I get the following results:

  SEQUENCE MESSAGE
---------- --------------------------------------------------------------------------------------------------------------
         1 QSM-01150: query did not rewrite
         2 QSM-01106: materialized view, DEPT_COST, is stale with respect to some partition(s) in the base table(s)
         3 QSM-01029: materialized view, DEPT_COST, is stale in ENFORCED integrity mode

3 rows selected.

The 2nd and 3rd messages are a bit of a clue – so let’s change the session’s query_rewrite_integrity parameter to stale_tolerated and re-execute the procedure.
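The change is just the standard session parameter:

alter session set query_rewrite_integrity = stale_tolerated;

Re-running the call to explain_rewrite() and the query against rewrite_table then gets us to: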


  SEQUENCE MESSAGE
---------- --------------------------------------------------------------------------------------------------------------
         1 QSM-01151: query was rewritten
         2 QSM-01033: query rewritten with materialized view, DEPT_COST

Footnote:

You’ll notice that I’ve used quote-escaping in my inputs for the materialized view definition and query. This is a convenience that lets me easily cut and paste a statement into the scripts – or even use the @@script_name mechanism – writing a query (without a trailing slash, semi-colon, or any blank lines) in a separate file and then citing the script name between the opening and closing quote lines, e.g.

begin
        dbms_mview.explain_rewrite ( 
                query        => q'{
@@test_script
                                }',
                mv           => null,
                statement_id => '&m_audsid'
        );
end;
/

It’s worth noting that the calls to dbms_mview.explain_mview() and dbms_mview.explain_rewrite() don’t commit the rows they write to their target tables, so you ought to include a rollback; at the end of your script to avoid any confusion as you test multiple views and queries.

Footnote:

Here’s the text of the demobld.sql script as it was when I copied it in the dim and distant past. I’m not sure which version it came from – but I don’t think it’s the original v4/v5 release which I’m fairly sure had only the emp and dept tables. (For reference, and hash joins, there used to be a MOS note explaining that EMP was the big table and DEPT was the small table ;)


CREATE TABLE EMP (
        EMPNO           NUMBER(4) NOT NULL,
        ENAME           VARCHAR2(10),
        JOB             VARCHAR2(9),
        MGR             NUMBER(4),
        HIREDATE        DATE,
        SAL             NUMBER(7, 2),
        COMM            NUMBER(7, 2),
        DEPTNO          NUMBER(2)
);

insert into emp values
        (7369, 'SMITH',  'CLERK',     7902,
        to_date('17-DEC-1980', 'DD-MON-YYYY'),  800, NULL, 20);

insert into emp values
        (7499, 'ALLEN',  'SALESMAN',  7698,
        to_date('20-FEB-1981', 'DD-MON-YYYY'), 1600,  300, 30);

insert into emp values
        (7521, 'WARD',   'SALESMAN',  7698,
        to_date('22-FEB-1981', 'DD-MON-YYYY'), 1250,  500, 30);

insert into emp values
        (7566, 'JONES',  'MANAGER',   7839,
        to_date('2-APR-1981', 'DD-MON-YYYY'),  2975, NULL, 20);

insert into emp values
        (7654, 'MARTIN', 'SALESMAN',  7698,
        to_date('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);

insert into emp values
        (7698, 'BLAKE',  'MANAGER',   7839,
        to_date('1-MAY-1981', 'DD-MON-YYYY'),  2850, NULL, 30);

insert into emp values
        (7782, 'CLARK',  'MANAGER',   7839,
        to_date('9-JUN-1981', 'DD-MON-YYYY'),  2450, NULL, 10);

insert into emp values
        (7788, 'SCOTT',  'ANALYST',   7566,
        to_date('09-DEC-1982', 'DD-MON-YYYY'), 3000, NULL, 20);

insert into emp values
        (7839, 'KING',   'PRESIDENT', NULL,
        to_date('17-NOV-1981', 'DD-MON-YYYY'), 5000, NULL, 10);

insert into emp values
        (7844, 'TURNER', 'SALESMAN',  7698,
        to_date('8-SEP-1981', 'DD-MON-YYYY'),  1500,    0, 30);

insert into emp values
        (7876, 'ADAMS',  'CLERK',     7788,
        to_date('12-JAN-1983', 'DD-MON-YYYY'), 1100, NULL, 20);

insert into emp values
        (7900, 'JAMES',  'CLERK',     7698,
        to_date('3-DEC-1981', 'DD-MON-YYYY'),   950, NULL, 30);

insert into emp values
        (7902, 'FORD',   'ANALYST',   7566,
        to_date('3-DEC-1981', 'DD-MON-YYYY'),  3000, NULL, 20);

insert into emp values
        (7934, 'MILLER', 'CLERK',     7782,
        to_date('23-JAN-1982', 'DD-MON-YYYY'), 1300, NULL, 10);

CREATE TABLE DEPT(
        DEPTNO  NUMBER(2),
        DNAME   VARCHAR2(14),
        LOC     VARCHAR2(13) 
);

insert into dept values (10, 'ACCOUNTING', 'NEW YORK');
insert into dept values (20, 'RESEARCH',   'DALLAS');
insert into dept values (30, 'SALES',      'CHICAGO');
insert into dept values (40, 'OPERATIONS', 'BOSTON');

CREATE TABLE BONUS
        (
        ENAME VARCHAR2(10)      ,
        JOB VARCHAR2(9)  ,
        SAL NUMBER,
        COMM NUMBER
);

CREATE TABLE SALGRADE
      ( GRADE NUMBER,
        LOSAL NUMBER,
        HISAL NUMBER 
);

INSERT INTO SALGRADE VALUES (1,700,1200);
INSERT INTO SALGRADE VALUES (2,1201,1400);
INSERT INTO SALGRADE VALUES (3,1401,2000);
INSERT INTO SALGRADE VALUES (4,2001,3000);
INSERT INTO SALGRADE VALUES (5,3001,9999);
COMMIT;

 

July 12, 2020

Massive Deletes

Filed under: 12c,Infrastructure,Oracle,Upgrades — Jonathan Lewis @ 7:36 pm BST Jul 12,2020

One of the recurrent questions on the Oracle Developer Community forum is:

What’s the best way to delete millions of rows from a table?

There are an enormous number of relevant details that you need to know before you can give the “right” answer to this question, e.g.

  • Which version of Oracle
  • Standard or Enterprise Edition
  • Is “millions” a tiny percentage of the table or a large percentage
  • Are there any referential integrity constraints in place
  • Does the system have to keep running while the deletion takes place
  • Is the table compressed
  • How many indexes are there – and can you drop some of them
  • How much space do you have to do this job
  • How much time do you have to do this job

One of the most important ones, of course, is “Which version of Oracle?” because it can make an enormous difference to the range of possible strategies. I’m writing this particular note because the question came up a little while ago where the user wanted to delete all the data from 2008 through to the end of 2018, keeping only the last 18 months of data.

That sounds like the volume of data to be deleted (11 years) is very much larger than the volume of data to be kept (1.5 years) – but we can’t be sure of that since businesses tend to grow over time so that last 18 months of data might actually be just as big as the previous 11 years.

As usually happens in response to this question there were suggestions to “create a new table selecting the data you want to keep”, “use dbms_parallel_execute to delete by rowid ranges in parallel”, and a relatively new one “convert to a partitioned table so that the data you want to keep is in its own partition and drop the other partition(s)”. 

I wrote a note a few years ago giving an example of converting a simple heap table to a partitioned table while eliminating the data you don’t want to keep (so there’s no need to waste resources copying redundant data), maintaining indexes (choosing between local and global), and doing the whole job online.  So, after learning that the OP was running 12.2 Enterprise Edition with the Partitioning option, I suggested that (s)he convert the table into a hash partitioned table with a single partition – which should (for purposes of optimisation) behave just like a simple heap table – using the “including rows” clause to copy only the last 18 months of data.
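As a sketch, the sort of command I had in mind looked something like the following – the table and column names are invented, and you should check the exact clause ordering against the 12.2 SQL Reference before relying on it:

alter table big_table modify
        partition by hash (id) partitions 1
        online
        including rows where date_col >= date '2019-01-01'
        update indexes
;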

I pointed out that their version of Oracle (EE + PO) gave them the 2nd best option – because I knew that in 19c you could simply do something like:

rem
rem     Script:         122_move.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             12.2.0.1
rem

create table t1
as
select  *
from    all_objects
where   rownum <= 50000
;

alter table t1 add constraint t1_pk primary key(object_id);

alter table t1 move
        including rows where owner != 'SYS'
        online
;

It wasn’t until a little later that a tiny nagging doubt crept into my mind: maybe this enhancement to the basic move command had appeared at the same time as the modify partition enhancement – in other words in 12.2.0.1; so I ran the test above and found that it did actually seem to work. (And I haven’t yet found any bugs on MOS suggesting that it shouldn’t be used.)

Having discovered that the command was available I thought that I’d better check whether it was also documented, and found that it was in the 12.2 SQL Reference Manual (though not the 12.1 reference manual – for the obvious reason) under Alter Table. Page down to the “tram-tracks” for the Alter Table command and follow the link for the “move_table_clause”, and from there follow the link for “filter_condition”.

Note:

This option is not available in 12.1, and if you run the test there Oracle will raise error “ORA-25184: column name expected” at the point where the word “rows” appears. This may look somewhat counter-intuitive, but in that version of Oracle a command like “alter table TabX move including ColY online;” is how you would rebuild an index organized table (IOT) with all columns up to ColY in the “IOT_TOP” segment.

Update [The following morning]

Once you’ve got the framework of a test in place it really doesn’t take very long to start running through “what if” cases or potential boundary conditions.  So this morning I added one very obvious test – what happens if you have referential integrity declared between two tables and try to move both of them including a subset of rows from each that ensures that the referential integrity is still in place:


rem
rem     Script:         122_move_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0        Parent can't move
rem             12.2.0.1        Parent can't move
rem

create table parent
as
select  *
from    all_objects
where   rownum <= 50000
;

alter table parent add constraint par_pk primary key(object_id);


create table child
as
select  *
from    parent
order by
        dbms_random.value
;


alter table child add constraint chi_pk primary key(object_id);
alter table child add constraint chi_fk_par foreign key(object_id) references parent;

I’ve created the child table from the parent data, with random ordering. Now I’m going to delete all the child rows where owner = ‘PUBLIC’ using an online move, then I’ll try and do the same for the parent.


alter table child move
        including rows where owner != 'PUBLIC'
        online
;

-- Child move succeeds (of course)

alter table parent move
        including rows where owner != 'PUBLIC'
        online
;

--
-- Trying to do the matching move on the parent results in:
-- ORA-02266: unique/primary keys in table referenced by enabled foreign keys
--

So there’s a gap in the functionality that makes it less desirable than the simplest case suggests. The referential integrity constraint has to be disabled before the parent table can be restructured.

But something that merits a little further investigation is the option to set the foreign key to “disable validate” (which is sufficient to allow the parent move to take place) and then to set the constraint back to “enable”. When I tried this I had expected Oracle to do a lot of work to revalidate the constraint before enabling it, but I couldn’t find any indication that any such work had taken place.
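
For reference, the outline of that test (using the constraint name from the script above – this is a sketch rather than a full cut-and-paste of what I ran) was:

alter table child modify constraint chi_fk_par disable validate;

alter table parent move
        including rows where owner != 'PUBLIC'
        online
;

alter table child modify constraint chi_fk_par enable;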

February 4, 2020

maxquerylen

Filed under: Bugs,Infrastructure,Oracle,undo — Jonathan Lewis @ 3:50 pm GMT Feb 4,2020

The view v$undostat holds summary information about undo activity that can be used by the automatic undo mechanism to optimise the undo retention time (and hence undo space allocation). The view holds one row for every ten minute interval in the last 4 days (96 hours) and includes two columns called maxquerylen and maxqueryid – which tell you something about the query that was considered to be the longest running query active in the interval.

In this note I want to explain why the contents of these two columns are sometimes (possibly often) completely irrelevant despite there being a few notes on the internet about how you should investigate them to help you decide on a suitable setting for the undo_retention.

The descriptions in the 19c reference manual for these columns are as follows:

  • maxquerylen – Identifies the length of the longest query (in seconds) executed in the instance during the period. You can use this statistic to estimate the proper setting of the UNDO_RETENTION initialization parameter. The length of a query is measured from the cursor open time to the last fetch/execute time of the cursor. Only the length of those cursors that have been fetched/executed during the period are reflected in the view.
  • maxqueryid – SQL identifier of the longest running SQL statement in the period

It would be hard to misunderstand the meaning of the second column – if the first column tells us that Oracle has spotted a “longest query” then the second column gives us the sql_id so we can check v$sql to find out what it was. But what sort of queries are going to show up as the “longest query” in the interval?

Here’s an example from a few hours of a nearly idle instance, querying the begin and end times (formatted to show just day of month + time), with the two critical columns:


select  begin_time, end_time, maxquerylen, maxqueryid
from    v$undostat
order by
    begin_time
;

BEGIN_TIME	END_TIME    MAXQUERYLEN MAXQUERYID
--------------- ----------- ----------- -------------
...
04 10:50:18	04 11:00:18	      0
04 11:00:18	04 11:10:18	    356 f3yfg50ga0r8n
04 11:10:18	04 11:20:18	    883 25u1mbkcr9rnu
04 11:20:18	04 11:30:18	   1486 25u1mbkcr9rnu
04 11:30:18	04 11:40:18	   2090 25u1mbkcr9rnu
04 11:40:18	04 11:50:18	      0
04 11:50:18	04 12:00:18	   3299 25u1mbkcr9rnu
04 12:00:18	04 12:10:18	   3903 25u1mbkcr9rnu
04 12:10:18	04 12:20:18	   4507 25u1mbkcr9rnu
04 12:20:18	04 12:30:18	      0
04 12:30:18	04 12:40:18	      0
04 12:40:18	04 12:50:18	      0
04 12:50:18	04 13:00:18	      0
04 13:00:18	04 13:10:18	      0
04 13:10:18	04 13:20:18	      0
04 13:20:18	04 13:30:18	      0
04 13:30:18	04 13:37:27	   9035 25u1mbkcr9rnu

173 rows selected.

Notice, particularly, that the SQL_ID 25u1mbkcr9rnu disappears from the 11:40 interval, then re-appears at 11:50, then disappears again from 12:20 through 13:20 (lunchtime), then reappears again at 13:30. And when it reappears after an absence the query length has increased in a way that’s consistent with the gap. So it looks as if the query wasn’t running during the gap, but turns out to have been running after the gap ended. (Are these quantum queries?)

The explanation is in the detail of the definition: “from the cursor open time to the last fetch/execute time”. From an SQL*Plus session I “set pause on” then executed the query “select * from all_objects” and hit return a few times to get a few pages of output. Then, spread over the next couple of hours, I hit return a few more times to get a few more pages of output. Each time I hit return my session does another fetch call, and the code behind v$undostat notices that my query is still active.
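
If you want to reproduce the effect the framework is trivial – here’s a sketch of the sort of thing I did (any query that takes many fetches to complete would do):

set pause on
set pagesize 40

select * from all_objects;

-- hit return occasionally over the next hour or two, then check
-- v$undostat (e.g. from another session) to see the effect:

select  begin_time, end_time, maxquerylen, maxqueryid
from    v$undostat
order by
        begin_time
;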

I don’t know exactly how Oracle is keeping track of “active” statements because there seem to be some inconsistencies in the reporting (and I’ll comment on those later), but as a first approximation, until you close a cursor (either explicitly or implicitly) some piece of Oracle’s code registers the fact that the query might do further fetches, which means it might need to apply more undo to make current data read-consistent with the cursor’s start SCN, which means that it should take note of the time the cursor has been open because the undo retention time might need to be that long.

Inconsistencies.

I said there were some inconsistencies in the reporting. I’ve noticed three anomalies – perhaps due to the extreme idleness of the instance I was using for testing.

  1. At about 12:45 I hit return a few times to get the maxquerylen and maxqueryid refreshed – but the code didn’t seem to notice that I had done a few more fetches of a long running query. So it seems to be possible for Oracle to miss the query that should be reported.
  2. At about 11:52 I hit return a few times again, and you can see that the two critical columns were refreshed, but if you do the arithmetic Oracle has attributed 600 seconds to the query – the whole of the 10 minute refresh interval, not just the time up to the moment of the last fetch that I had done in that interval.
  3. I didn’t hit return again until about 12:45 (the moment in point 1 above when the two columns didn’t refresh the way I thought they should), but the columns kept updating through the 12:00 and 12:10 intervals anyway before disappearing from the 12:20 interval. So it looks like queries can be reported as long running even when they haven’t been executing or fetching.

There is one last anomaly – and this relates to the reason I started looking at these columns. If you check the entry for 11:00 you’ll see that the SQL Id ‘f3yfg50ga0r8n’ has been reported as “running” for 356 seconds. But here’s what I found in v$sql for that query:


select  executions, parse_calls, fetches, end_of_fetch_count, elapsed_time, cpu_time, rows_processed, sql_text
from    v$sql
where   sql_id = 'f3yfg50ga0r8n'
;

EXECUTIONS PARSE_CALLS	  FETCHES END_OF_FETCH_COUNT ELAPSED_TIME   CPU_TIME ROWS_PROCESSED SQL_TEXT
---------- ----------- ---------- ------------------ ------------ ---------- -------------- ------------------------------------------
        79          79         79                  0        20487      10667             79 select obj# from obj$ where dataobj# = :1

The SQL looks like a sys-recursive statement which, in 79 executions, has accumulated 20 milliseconds of elapsed time (rather than 356 seconds – but that difference could just be related to one or other of the anomalies I reported earlier). The key thing to note is that the value of column end_of_fetch_count is zero: this looks like a statement where Oracle has simply not bothered to fetch all the data and has then not bothered to close the cursor. As a result it’s possible that each time the statement is executed (note that parse_calls = executions, so we’re not looking at a “held” cursor) the code behind v$undostat looks back at the time the cursor was initially opened to measure the query length, rather than looking at the time the statement was re-executed.

This may go a long way to answering the question that came up on Oracle-l earlier on today as follows:

The following query (sql_id is 89w8y2pgn25yd) was recorded in v$undostat.maxqueryid in a 12.2. database during a period of a high undo usage: select ts# from sys.ts$ where ts$.online$ != 3 and bitand(flags,2048) != 2048;

select
        undoblks,txncount,maxquerylen,maxconcurrency,activeblks
from    v$undostat u
where   maxqueryid='89w8y2pgn25yd'
;

UNDOBLKS  TXNCOUNT  MAXQUERYLEN  MAXCONCURRENCY  ACTIVEBLKS
--------  --------  -----------  --------------  ----------
   39199      4027         1378               5     2531960

What is this query, and how does it manage to report a maximum query length of 1,378 seconds? Just like the one above it’s a sys-recursive query – this one appears when you query dba_tablespaces – and it executes once for every row, taking just a fraction of a second per execution. But if you trace a query like “select tablespace_name from dba_tablespaces” you’ll see that every time the query is called the trace file will show lines for: “Parse, Exec, Fetch, Close” until the last call – which doesn’t report a “Close”.

Just like my original “select * from all_objects” there’s a query dangling with an open cursor, leaving Oracle with the opportunity to go back to the moment it was opened and report it as a “long running query”.
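
If you want to check the Parse/Exec/Fetch/Close pattern for yourself, a minimal sketch of the test is just:

alter session set events '10046 trace name context forever, level 8';

select tablespace_name from dba_tablespaces;

alter session set events '10046 trace name context off';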

tl;dr

The maxquerylen and maxqueryid in v$undostat don’t tell you about statements that have taken a long time to change a lot of data and generate a lot of undo; they tell you about statements that might need to apply a lot of undo to see read-consistent data and therefore might become victims in a “ORA-01555: snapshot too old” event.

The statements reported may be completely irrelevant because there are various reasons (some, possibly, bug-related) why a cursor may be treated as if it opened a long time in the past when it was actually a re-use of an existing “dangling” cursor. It’s also possible that some bad client code will treat cursors in a way that does no harm to the client program but hides a more interesting query that would otherwise have been reported by these two columns.

Update Feb 2020

I don’t know why I didn’t think of searching MOS before poking around at this issue, but I’ve just done a search on “maxquerylen maxqueryid” and one of the first Knowledgebase articles to show up – dated September 2019 – was Doc ID 2005931.1: Wrongly calculated MAXQUERYLEN Causing ORA-01555 or ORA-30036 Errors.

The document reports several bugs all relating to the same issue, and all closed as duplicates of unpublished bug 12314489: OCIRELEASESTMT() TO PUT THE CURSORS IN CURBOUND STATE WHEN PLACING THEM IN CACHE. This has in turn been closed and replaced by bug 26833932, which is also unpublished and doesn’t even have a description.

January 27, 2020

ANSI flashback

Filed under: ANSI Standard,Bugs,Flashback,Oracle — Jonathan Lewis @ 9:45 am GMT Jan 27,2020

I am seeing “traditional” Oracle SQL syntax being replaced by “ANSI”-style far more frequently than I used to – so I thought I’d just flag up another reminder that you shouldn’t be too surprised if you see odd little glitches showing up in ANSI style that don’t show up when you translate to traditional; so if your SQL throws an unexpected error (and if it’s only a minor effort to modify the code for testing purposes) it might be a good idea to see if the problem goes away when you switch styles. Today’s little glitch is one that showed up on the Oracle-l listserver 7 years ago running 11.2.0.3 but the anomaly still exists in 19c.

As so often happens it’s a problem that appears in one of the less commonly used Oracle features – in this case flashback queries. We’ll start by creating a couple of tables, then pausing for thought (Note: this code is little more than a cosmetic rewrite of the original posting on Oracle-l):


rem
rem     Script:         ansi_flashback_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 
as
select  * 
from    all_objects
where   rownum <= 10000 -- > comment to avoid wordpress format issue
;

create table t2
as
select  *
from    t1
where   rownum <= 10 -- > comment to avoid wordpress format issue
;
 
prompt  =======================
prompt  Sleeping for 10 seconds
prompt  =======================

execute dbms_lock.sleep(10)

column current_scn new_value m_scn format a15

select  to_char(current_scn,'99999999999999') current_scn 
from    v$database
/

 
select 
        v1.object_name
from 
        t1 as of scn &m_scn v1
join 
        t2 as of scn &m_scn v2
on 
        v2.object_id = v1.object_id
/

I’ve created a couple of tables then introduced a 10 second sleep before checking the database SCN. The sleep is there because I want to be able to query the tables “as of SCN” and if I don’t pause for a little while (typically about 5 seconds) the code will probably raise Oracle error ORA-01466: unable to read data – table definition has changed.

The query I want to use references both tables as of the same SCN, using “ANSI” syntax to do the join. The query behaves perfectly reasonably when run from SQL*Plus; the problem starts to appear when I try to embed the query as a cursor in a PL/SQL procedure. First I’ll copy the SQL exactly as it is (with substitution variable) into a procedure declaration. The variable will be replaced in both cases by an actual value before the procedure is created, as the subsequent check of user_source will show:


create or replace procedure p1( myscn in varchar2 ) as

        cursor c1 is 
                select  v1.object_name
                from 
                        t1 as of scn &m_scn v1
                join 
                        t2 as of scn &m_scn v2
                on 
                        v2.object_id = v1.object_id
        ;

        l_row c1%rowtype;

begin
        open c1;
        fetch c1 into l_row;
        dbms_output.put_line(l_row.object_name);
        close c1;
end;
/
 
select  text 
from    user_source
where   type = 'PROCEDURE'
and     name = 'P1'
order by 
        line
/

execute p1 ('0')

The procedure compiles successfully and the query against user_source shows it stored as follows (note, particularly, an actual value has been stored for the SCN):


procedure p1( myscn in varchar2 ) as

        cursor c1 is
                select  v1.object_name
                from
                        t1 as of scn  12670394063090 v1
                join
                        t2 as of scn  12670394063090 v2
                on
                        v2.object_id = v1.object_id
        ;

        l_row c1%rowtype;

begin
        open c1;
        fetch c1 into l_row;
        dbms_output.put_line(l_row.object_name);
        close c1;
end;

Next we recreate the procedure but replace the substitution variable with the name of the incoming formal parameter:


create or replace procedure p1( myscn in varchar2 ) as

        cursor c1 is
                select  v1.object_name
                from 
                        t1 as of scn myscn v1
                join 
                        t2 as of scn myscn v2
                on 
                        v2.object_id = v1.object_id
        ;

        l_row c1%rowtype;

begin
        open c1;
        fetch c1 into l_row;
        dbms_output.put_line(l_row.object_name);
        close c1;
end;
/
 
show errors

You’ll notice that instead of doing a test execution of the procedure I’ve called “show errors”. This is because the procedure won’t compile and reports “Warning: Procedure created with compilation errors” with the following output from the call to show errors:


Errors for PROCEDURE P1:

LINE/COL ERROR
-------- -----------------------------------------------------------------
3/9      PLS-00341: declaration of cursor 'C1' is incomplete or malformed
4/3      PL/SQL: ORA-00984: column not allowed here
4/3      PL/SQL: SQL Statement ignored
13/8     PL/SQL: Item ignored
17/2     PL/SQL: SQL Statement ignored
17/16    PLS-00320: the declaration of the type of this expression is
         incomplete or malformed

18/2     PL/SQL: Statement ignored
18/23    PLS-00320: the declaration of the type of this expression is
         incomplete or malformed

So we check to see if the same warning and list of errors appear if I switch to “traditional” Oracle syntax:


create or replace procedure p1( myscn in varchar2 ) as

        cursor c1 is
                select  v1.object_name
                from 
                        t1 as of scn myscn v1,
                        t2 as of scn myscn v2
                where 
                        v2.object_id = v1.object_id
        ;

        l_row c1%rowtype;

begin
        open c1;
        fetch c1 into l_row;
        dbms_output.put_line(l_row.object_name);
        close c1;
end;
/
 
execute p1 (&m_scn)

The answer is no. This version of the query is accepted by the PL/SQL compiler, and the call to execute it with the SCN supplied in the substitution variable produces the expected results.

Is there anything we can do to stick with ANSI style syntax? Almost invariably the answer will be yes. Here’s a little workaround in this case:

create or replace procedure p1( myscn in varchar2 ) as

        cursor c1 is
                select  v1.object_name
                from 
                        (select * from t1 as of scn myscn) v1
                join 
                        (select * from t2 as of scn myscn) v2
                on 
                        v2.object_id = v1.object_id
        ;

        l_row c1%rowtype;

begin
        open c1;
        fetch c1 into l_row;
        dbms_output.put_line(l_row.object_name);
        close c1;
end;
/
 
execute p1 (&m_scn)

We simply embed each “as of scn” clause inside an inline view and then join the views. If you enable the CBO (10053) trace before executing this version of the procedure you’ll find that the final “unparsed” SQL produced by the optimizer has, of course, been translated back into the traditional syntax.

Warning: it’s fairly likely that this workaround will do what you want, but it’s possible that in a few cases it may result in a different execution plan from the one you were expecting (or would get from traditional syntax).

 

January 23, 2020

WITH Subquery

Filed under: 12c,Infrastructure,Oracle — Jonathan Lewis @ 8:37 pm GMT Jan 23,2020

Here’s another anomaly that appears when you mix and match Oracle features. In this case it’s “With” subqueries (common table expressions / CTEs) and Active Dataguard (ADG) Standby databases. The problem appeared on the Oracle-l listserver and luckily for the OP another member of the list had seen it before and could point to a relevant MOS document id which explained the issue and supplied a workaround.

The OP had their standby database opened read-only for reporting and found the following oddity in the extended SQL trace file for one of their reports:


WAIT #140196872952648: nam='db file scattered read' ela= 1588 file#=4097 block#=579715946 blocks=128 obj#=-39778567 tim=17263910670242
WAIT #140196872952648: nam='db file scattered read' ela= 1495 file#=4097 block#=579715947 blocks=128 obj#=-39778567 tim=17263910672065
WAIT #140196872952648: nam='db file scattered read' ela= 1671 file#=4097 block#=579715948 blocks=128 obj#=-39778567 tim=17263910674042
WAIT #140196872952648: nam='db file scattered read' ela= 1094 file#=4097 block#=579715949 blocks=128 obj#=-39778567 tim=17263910675443

Before pointing out the oddity (if you haven’t spotted it already) I’ll just explain a few of the numbers that are a little unusual.

  • File# = 4097: the user has parameter db_files = 4096, so this is the first Temp file.
  • Block# = 579,715,946: the database is 120TB, and the temporary tablespace is a “bigfile” tablespace so it’s okay for the file to hold more than 579M blocks.
  • Obj# < 0: Negative object numbers are a characteristic of materialized CTEs: if you look at the execution plan a materialized CTE will be reported as a table with a name like  SYS_TEMP_FDA106F9_E259E68.  If you take the first hexadecimal number and treat it as a 32-bit signed integer you get the value that would be reported as the obj# in the trace file.  (Converting to decimal and subtracting power(2,32) is one way of doing the arithmetic – see the sketch after this list.)
  • tim= nnnnnnnn:  this is the timestamp (usually in microseconds), and we can see intervals of roughly 1,400 to 2,000 microseconds between these lines.
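
For example, a quick sanity check of the arithmetic from SQL, using the hex string from the table name I quoted above:

select  to_number('FDA106F9','xxxxxxxx') - power(2,32)  obj_number
from    dual
;

OBJ_NUMBER
----------
 -39778567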

So here’s the oddity: in this set of 4 consecutive waits we’re waiting for multiblock reads of 128 blocks – but each read starts one block after the previous read. It’s as if Oracle is reading 128 blocks and forgetting everything after the first one. And the timestamps are significant because they tell us that this isn’t a case of Oracle spending so much time between reads that the other blocks fall off the end of  the buffer cache before the query reaches them.

I think I’ve seen a pattern like this once before but it would have been quite a long time ago and I can’t find any notes I might have made about it (and it turns out that my previous experience was not relevant to this case). Fortunately another member of Oracle-l had also seen the pattern and supplied the solution through a reference to a MOS document that led to: Doc ID 2251339.1 With Subquery Factorization Temp Table Does Not Cache in Standby in 12.1.0.2.

It’s not a bug – Oracle is supposed to do this if you manage to materialize a CTE in a Read-only Standby database. I don’t understand exactly why there’s a problem but, thanks to some feature of how consistent reads operate and how block SCNs are generated when you populate the blocks of the global temporary table (GTT) that is your materialized CTE, it’s possible for Oracle to produce the wrong results if it re-visits blocks that have been read into the cache from the GTT. So when you do a multiblock read during a tablescan of the GTT Oracle can use the first block it has read (presumably because it’s immediately pinned), but can’t use the remaining 127 – and so you get the odd pattern of consecutive blocks appearing at the start of consecutive multiblock reads.

This raises a couple of interesting (and nasty) questions.

  • First – does every 128 block read get read to the middle of the buffer cache, pushing another 128 blocks out of the buffer cache, or does Oracle automatically read the blocks to the “cold” end of the LRU, minimising the impact on the rest of the cache? We hope it’s the latter.
  • Second – if I use a small fetch size while running my query might I find that I have to re-read the same block (with its 127 neighbours) many times, because Oracle releases any pinned blocks at the end of each fetch and has to re-acquire the blocks on the next fetch?

If anyone wants to test the second question by running a query from SQL*Plus with extended trace enabled the following simple query should answer the question:

alter session set events '10046 trace name context forever, level 8';
set arraysize 2

with v1 as (select /*+ materialize */ * from all_objects)
select object_name from v1;

Workarounds

There is a workaround to the issue – you can add the hint /*+ inline */ to the query to ensure that the CTE is not materialized. There is a bit of a catch to this, though (on top of the fact that you might then need to have two slightly different versions of the code if you want to run the query on production and standby) – if Oracle places the subquery text inline the optimizer may manage to merge it into the rest of the query and come up with a bad execution plan. Again you can probably work around this threat by extending the hint to read: /*+ inline no_merge */. Even then the optimizer could decide it has better statistics about the “real” table columns that it might have lost when it materialized the subquery, so it could still produce a different execution plan from the materialized plan.
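
Using the trivial query above as a template, a sketch of the hinted version would look something like this:

with v1 as (select /*+ inline no_merge */ * from all_objects)
select object_name from v1;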

As an alternative (and somewhat more brutal) workaround you could set the hidden parameter “_with_subquery” to inline either at the session or system level, or in the startup parameter file.
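
For example – and I’d check the exact behaviour on your own system before depending on a hidden parameter – at the session level that might be:

alter session set "_with_subquery" = inline;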

 

December 24, 2019

Flashback Archive

Filed under: Bugs,Flashback,Oracle — Jonathan Lewis @ 8:33 pm GMT Dec 24,2019

A classic example of Oracle’s “mix and match” problem showed up on the Oracle Developer Forum a few days ago. Sometimes you see two features that are going to be really helpful in your application – and when you combine them something breaks. In this case it was the combination of Virtual Private Database (VPD/FGAC/RLS) and Flashback Data Archive (FDA/FBA) that resulted in the security predicate not being applied the way you would expect, hence allowing users to see data they were not supposed to see.

The OP supplied us with a model (based in part on Tim Hall’s FDA article) to demonstrate the issue on 11.2.0.4, and I’ve hacked it about a bit to explain it here, and to test it on 12.2.0.1 and 19.3.0.0 where the same failure occurs.

I’m going to start with just the VPD part of the setup before adding in the FDA. Most of the code has been written to run as the SYS user and it creates a new tablespace and a couple of users so you may want to do some editing before you try any tests. There’s also a short script at the end of the blog to remove the flashback data archive, tablespace, and users – again, something to be run by SYS.

You’ll note that this script assumes you already have a tablespace called test_8k_assm, and a temporary tablespace called temp. The latter may well be a valid assumption, the former probably isn’t.

rem
rem     Script:         vpd_fda_bug.sql
rem     Author:         Duncs (ODC)
rem     Dated:          Dec 2019
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem
rem     Notes
rem     Has to be run as SYS
rem
rem     See also
rem     https://community.oracle.com/thread/4307453
rem


create user vpd_test_data_owner identified by Password_1234 
        default tablespace test_8k_assm
        temporary tablespace temp 
        quota unlimited on test_8k_assm
;
 
grant 
        connect,
        resource, 
        create any context
to
        vpd_test_data_owner
;

grant
        execute on dbms_rls
to
        vpd_test_data_owner
;
 
 
create table vpd_test_data_owner.person (
        person_id       number, 
        surname         varchar2(30), 
        unit_id         number
);

insert into  vpd_test_data_owner.person values (-1, 'One',  -1);
insert into  vpd_test_data_owner.person values (-2, 'Two',  -2);
insert into  vpd_test_data_owner.person values (-3, 'Three',-3);
insert into  vpd_test_data_owner.person values (-4, 'Four', -4);
insert into  vpd_test_data_owner.person values (-5, 'Five', -5);

commit;

create user vpd_test_function_owner identified by Password_1234
        default tablespace test_8k_assm 
        temporary tablespace temp 
        quota unlimited on test_8k_assm
;
 
grant 
        connect, 
        resource
to 
        vpd_test_function_owner
;
 
prompt  ============================================
prompt  Create a packaged function to set a context
prompt  that we will use in a VPD security predicate
prompt  ============================================

create or replace package vpd_test_function_owner.context_api_pkg AS

procedure set_parameter(
        p_name  in  varchar2,
        p_value in  varchar2
);

end context_api_pkg;
/
 
create or replace package body vpd_test_function_owner.context_api_pkg IS
 
procedure set_parameter (
        p_name  in  varchar2,
        p_value in  varchar2
) is
begin
        dbms_session.set_context('my_vpd_context', p_name, p_value);
end set_parameter;

end context_api_pkg;
/

prompt  ======================================================
prompt  Allow public to set the context value.  (Not sensible)
prompt  ======================================================

grant execute on vpd_test_function_owner.context_api_pkg to public;

prompt  ===============================================================
prompt  Create a context that can only be set by our packaged procedure
prompt  ===============================================================

create or replace context my_vpd_context 
        using vpd_test_function_owner.context_api_pkg
;

prompt  =====================================================
prompt  Create a security function that generates a predicate
prompt  based on our context, then create a policy to connect
prompt  the function to the test table for select statements.
prompt  =====================================================
 
create or replace function vpd_test_function_owner.test_vpd_function (
    p_schema  in varchar2 default null
  , p_object  in varchar2 default null
)
return varchar2
as
    lv_unit_id number := nvl(sys_context('my_vpd_context','unit_id'), -1);
begin
    return 'unit_id = ' || lv_unit_id;
end test_vpd_function;
/

begin
      dbms_rls.add_policy (
               object_schema    => 'vpd_test_data_owner'
             , object_name      => 'person'
             , policy_name      => 'test_vpd_policy'
             , function_schema  => 'vpd_test_function_owner'
             , policy_function  => 'test_vpd_function'
             , statement_types  => 'select'
      );
end;
/


There are several quick and dirty bits to the script – you shouldn’t be using the connect and resource roles, for example; they exist only for backwards compatibility and don’t even manage that very well any more. Any grants made should be carefully chosen to be the minimum necessary to achieve the required functionality, and you should be defining roles of your own rather than using pre-existing ones.

Generally you don’t expect to set up a security policy that stops the owner of the data from seeing all the data – and I’ve left the policy to default to dynamic, which means the function will execute on every parse and execute of a statement accessing the table (and that’s something to avoid if you can). For convenience I’ve also allowed the owner of the data to execute the function that changes the context that is used by the predicate function – and you don’t really want to allow anyone who is constrained by a security policy to be able to modify their own access rights like this.

Since the code allows a deliberately lax setup on VPD you could at this point do something like the following to check that VPD is actually working before moving on to test the effect of FDA:

connect vpd_test_data_owner/Password_1234
select * from person;

execute vpd_test_function_owner.context_api_pkg.set_parameter('unit_id',-2)
select * from person;

The first execution of the query should show you only the row where unit_id = -1, as “unit_id = -1” is the default return value from the security function. The second execution should return only the row where unit_id = -2, as the call to set_parameter() changes the context value so that when the security function re-executes it generates a new security predicate “unit_id = -2”. (It’s worth noting that one of the options for security policies is to make them context-dependent so that they re-execute only when the relevant context is changed – but in this case the policy defaults to “re-execute the function on every parse and execute”.)  [NOTE: for some clues on the possible performance impact of badly defined VPD, check the comments made in response to this blog note]
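
As a sketch of what that context-dependent option looks like (this is not what the test above uses – you’d have to drop and re-create the policy to try it):

begin
        dbms_rls.drop_policy (
                 object_schema    => 'vpd_test_data_owner'
               , object_name      => 'person'
               , policy_name      => 'test_vpd_policy'
        );

        dbms_rls.add_policy (
                 object_schema    => 'vpd_test_data_owner'
               , object_name      => 'person'
               , policy_name      => 'test_vpd_policy'
               , function_schema  => 'vpd_test_function_owner'
               , policy_function  => 'test_vpd_function'
               , statement_types  => 'select'
               , policy_type      => dbms_rls.context_sensitive
        );
end;
/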

Once you’re satisfied that the security policy is working correctly you can move on to the second feature – flashback data archive. Logging on as SYS once again, execute the following code – which, amongst other things, creates a new tablespace. You’ll notice that I’ve got three lines in the “create tablespace” statement naming a datafile (though one of them doesn’t actually supply a name). The names (or absence thereof) correspond to the default naming conventions I have for my sandbox 11g, 12c, and 19c instances in that order. You will want to adjust according to your file-naming conventions.


prompt  ============================
prompt  Setting up Flashback Archive
prompt  ============================

create tablespace fda_ts 
        datafile        
--              no name needed if OMF
--              '/u01/app/oracle/oradata/orcl12c/orcl/fda_ts.dbf'
--              '/u02/data/OR19/orclpdb/fda_ts.dbf'
        size 1m autoextend on next 1m
;

alter user vpd_test_data_owner quota unlimited on fda_ts;

create flashback archive default fda_1year tablespace fda_ts
quota 1g retention 1 year;
 
grant flashback archive on fda_1year to vpd_test_data_owner;
grant flashback archive administer to vpd_test_data_owner;
grant execute on dbms_flashback_archive to vpd_test_data_owner;
 
prompt  Sleeping for 1 minute before adding table to flashback archive
execute dbms_lock.sleep(60);
alter table vpd_test_data_owner.person flashback archive fda_1year;

prompt  Sleeping for 1 minute before updating the date
execute dbms_lock.sleep(60);
update vpd_test_data_owner.person set surname = upper(surname);

commit;

prompt  Sleeping for 5 minutes to give FDA a chance to do its thing.
execute dbms_lock.sleep(300);
alter system flush shared_pool;

prompt  ==================================================
prompt  Now connect to the data owner schema and run the 
prompt  original query then a couple of flashback queries, 
prompt  pulling their plans from memory
prompt  ==================================================

connect vpd_test_data_owner/Password_1234

set linesize 120
set pagesize 50
set trimspool on
set serveroutput off

spool vpd_fda_bug.lst
-- set autotrace on explain

select * from vpd_test_data_owner.person;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-1/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-2/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-3/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-4/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-5/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-6/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-7/1440;  
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-8/1440;  
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-9/1440;  
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-10/1440; 
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-15/1440; 
set autotrace off
spool off 

I’ve created a tablespace that I’m going to reserve for the flashback archive and given my data owner a quota on that tablespace; then I’ve created a flashback archive in that tablespace and granted various privileges relating to flashback archive to my data owner.

The next few lines of code include a couple of calls to dbms_lock.sleep() because I want to avoid the risk of getting an Oracle error ORA-01466: unable to read data – table definition has changed, but all I’ve done otherwise is modify the person table to be archiving and then made a little change that will eventually be recorded as part of the archive.

I’ve then introduced a 5 minute wait as it seems to take about 5 minutes before the flashback process takes any action to capture the original table data and copy any related undo; but after that 5 minutes is up I’ve queried the person table directly (which should show you the one row where unit_id = -1), then gradually gone backwards in time re-querying the data.

You should see the same result being produced for a few minutes, then a version of the “pre-update” data (upper case ‘ONE’ changing to mixed case ‘One’), and then you will (I hope) see the entire original data set appearing and finally you should see Oracle raising error “ORA-01466: unable to read data – table definition has changed” when your “as of timestamp” goes back beyond the moment you created the archive. (Except that that doesn’t happen with 11.2.0.4, which manages to report the data as if it had existed long before you created it).

I’ve commented out the “set autotrace on explain” in the above, but if you leave it in, or introduce it for one of the queries, you’ll see what’s going on that allows flashback data archive to show you data that should have been hidden by the security predicate. Here’s the execution plan for one run:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                     |     2 |    86 |    17  (12)| 00:00:01 |       |       |
|   1 |  VIEW                     |                     |     2 |    86 |    17  (12)| 00:00:01 |       |       |
|   2 |   UNION-ALL               |                     |       |       |            |          |       |       |
|*  3 |    FILTER                 |                     |       |       |            |          |       |       |
|   4 |     PARTITION RANGE SINGLE|                     |     1 |    71 |     7   (0)| 00:00:01 |   KEY |     1 |
|*  5 |      TABLE ACCESS FULL    | SYS_FBA_HIST_353151 |     1 |    71 |     7   (0)| 00:00:01 |   KEY |     1 |
|*  6 |    FILTER                 |                     |       |       |            |          |       |       |
|   7 |     MERGE JOIN OUTER      |                     |     1 |  2083 |    10  (20)| 00:00:01 |       |       |
|   8 |      SORT JOIN            |                     |     1 |    55 |     7  (15)| 00:00:01 |       |       |
|*  9 |       TABLE ACCESS FULL   | PERSON              |     1 |    55 |     6   (0)| 00:00:01 |       |       |
|* 10 |      SORT JOIN            |                     |     5 | 10140 |     3  (34)| 00:00:01 |       |       |
|* 11 |       TABLE ACCESS FULL   | SYS_FBA_TCRV_353151 |     5 | 10140 |     2   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111)<12670390363943)
   5 - filter("ENDSCN" .le. 12670390363943 AND ("OPERATION" IS NULL OR "OPERATION"<>'D') AND
              "ENDSCN">"TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111) AND ("STARTSCN" IS
              NULL OR "STARTSCN" .le. "TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111)))
   6 - filter("STARTSCN"<="TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111) OR
              "STARTSCN" IS NULL)
   9 - filter("UNIT_ID"=(-1) AND ("VERSIONS_OPERATION" IS NULL OR "VERSIONS_OPERATION"<>'D') 
             AND ("VERSIONS_STARTSCN" IS NULL OR "VERSIONS_STARTSCN".le."TIMESTAMP_TO_SCN(SYSDATE@!-.004861111111111111111111111111111111111111))
             AND ("VERSIONS_ENDSCN" IS NULL OR "VERSIONS_ENDSCN">"TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111)))
  10 - access("RID"(+)=ROWIDTOCHAR("T".ROWID))
       filter("RID"(+)=ROWIDTOCHAR("T".ROWID))
  11 - filter(("ENDSCN"(+) IS NULL OR "ENDSCN"(+)>12670390363943) AND ("STARTSCN"(+) IS NULL OR
              "STARTSCN"(+)<12670390363943))

Note
-----
   - dynamic sampling used for this statement (level=2)

Notice that the predicate “unit_id = -1” appears on the full table scan of person at operation 9 – that’s Oracle applying the security predicate to the person table. But the flashback code has replaced the person table with a union all of (some partitions of) the SYS_FBA_HIST_353151 table and a join between the person table and the SYS_FBA_TCRV_353151 table. And the code path that attaches the security predicate fails to attach it to the history table.

tl;dr

VPD (virtual private database) does not seem to be aware of the query rewrite that takes place if a table has an associated FDA (flashback data archive), so a flashback query may report rows from the “history” table that should have been blocked by the VPD security policy.

Lagniappe

There is another little problem with FDA that might affect you if you try to optimize flashback queries by creating SQL Plan Baselines. If you create a baseline on a test system (that isn’t a backup copy of the production system) and use the export/import facility to move the baseline to production then the baseline won’t work, because the sys_fba_hist_nnnnn and sys_fba_tcrv_nnnnn table names are constructed from the object_id of the base table – which means the archive table names (and associated baseline hints) in the test system are probably going to have different names from the production system.
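
If you want to check the generated names on any given system, I believe the view dba_flashback_archive_tables reports them – something like the following (using the schema from the test above as an illustration):

select  owner_name, table_name, flashback_archive_name, archive_table_name
from    dba_flashback_archive_tables
where   owner_name = 'VPD_TEST_DATA_OWNER'
;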

Housekeeping

To clean up the database after you’ve done all this testing, run the following script (modified to match any changes you’ve made in the test) after logging on as SYS:


alter table vpd_test_data_owner.person no flashback archive;

drop flashback archive fda_1year;

drop USER VPD_TEST_FUNCTION_OWNER cascade;
drop USER VPD_TEST_DATA_OWNER cascade;

drop tablespace fda_ts including contents and datafiles;

 

 

 

 

November 9, 2019

Nested Tables

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 11:28 am GMT Nov 9,2019

This note is a little side effect of answering a question about the total space associated with a table, including its indexes, LOB columns and nested tables. The first suggestion I published failed to take account of the fact that nested tables can contain their own nested tables so I had to modify the code.

The basic change was easy – in the original code I had joined to the view dba_nested_tables to translate between a table_name and its parent_table_name. The only change I needed to make was to replace the reference to the view with a different view that joined a table_name to its “oldest ancestor”. To achieve this I had to do two things: first, create a table with multiple levels of nesting, then create a suitable view definition.  For reference – because it may help somebody – I’ve published the two stages here.

A revolting nested table:

The following code creates three table types; the third is a simple table of objects, the second is a table of objects that include a column of the third table type, and the first is a table of objects that include a column of the second table type – so a table with a column of the first table type ends up with three levels of nesting:


rem
rem     Script:         nested_tables.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);
/

create or replace type jpl_tab3_type as table of jpl_item3;
/

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type);
/

create or replace type jpl_tab2_type as table of jpl_item2;
/

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type);
/

create or replace type jpl_tab1_type as table of jpl_item1;
/

create table demo_nest_2 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type
)
segment creation immediate
nested table jpl1 store as t_jpl1
        (
        nested table jpl2  store as t_jpl2
                (
                nested table jpl3 store as t_jpl3 
                return as locator
                )
        return as locator
        )
return as locator
/

I’ve never seen nested tables manually created in a production system, though I believe they are likely to appear (along with varrays and LOBs) as a side effect of some XML or JSON mechanisms, but many years ago (in Practical Oracle 8i, possibly) I discussed the pros and cons of returning them “by value” or (as I have here) “by reference”. As you can see, you need to exercise some care with brackets and the layout of the text as you try to define multiple levels of nesting.

The Ancestor View

We’ll take this in three steps – first, report from user_nested_tables. (The final script for reporting space used dba_nested_tables, with a predicate on the owner column, but I don’t give myself DBA privileges while writing quick and dirty bits of SQL.)


select 
        parent_table_name, parent_table_column, table_name, 
        ltrim(storage_spec) storage_spec,       -- the DBA view definition includes lpad( , 30) !!
        ltrim(return_type ) return_type         -- the DBA view definition includes lpad( , 20) !!
from 
        user_nested_tables
order by
        parent_table_name, parent_table_column
/

PARENT_TABLE_NAME    PARENT_TABLE_COLUMN  TABLE_NAME           STORAGE_SPEC         RETURN_TYPE
-------------------- -------------------- -------------------- -------------------- --------------------
DEMO_NEST_2          JPL1                 T_JPL1               USER_SPECIFIED       LOCATOR
T_JPL1               JPL2                 T_JPL2               USER_SPECIFIED       LOCATOR
T_JPL2               JPL3                 T_JPL3               DEFAULT              LOCATOR

You’ll notice the odd ltrim() – I have no idea why the view defines these columns to be left-padded the way they are; possibly it dates back to the time when something like cast(… as varchar2(30)) wasn’t a possible option.

Next, a simple “connect by” query that uses the above list in a materialized “with” subquery (CTE):


with my_nested_tables as (
select
        /*+ materialize */
        parent_table_name, table_name
from
        user_nested_tables
)
select
        parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
order by
        parent_table, child_table, pathname
/

PARENT_TABLE         CHILD_TABLE          PATHNAME
-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3
T_JPL1               T_JPL2               /T_JPL2
T_JPL1               T_JPL3               /T_JPL2/T_JPL3
T_JPL2               T_JPL3               /T_JPL3


As required this shows me demo_nest_2 as the owning ancestor of t_jpl1, t_jpl2 and t_jpl3. Unfortunately it has also produced three rows that we don’t want to see in our final space-summing code. But it’s easy enough to get rid of those – the only rows we want are the rows with a parent_table that doesn’t appear as a child_table:


with my_nested_tables as (
select
        /*+ materialize */
        parent_table_name, table_name
from
        user_nested_tables
)
select  parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
where   (parent_table) not in (
                select table_name
                from   my_nested_tables
        )
order by
        parent_table, child_table, pathname
/

PARENT_TABLE         CHILD_TABLE          PATHNAME
-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3

3 rows selected.


And there’s the result we wanted to see. A first simple corroboration of the fact is that the output corresponds with the “NESTED TABLE” segments reported by user_segments.

Of course, having written a query that gets the right result from the table definition we used to help us define the query in the first place, we now ought to create a few more tables with different structures to see if the query continues to give the right results in more complex cases.

What happens, for example, if I create a table with two columns of nested tables, and one of the nested table types also contains two nested tables? What happens if the base table is an index organized table?

It’s easy to do the second test – just add “organization index” immediately after “segment creation immediate” in the table creation statement. The correct results drop out.

As for the first test – here’s the SQL to create one example (and the query still gets the right answers – even if you change the table to be index organized).


drop type jpl_tab1_type;
drop type jpl_item1;

drop type jpl_tab2_type;
drop type jpl_item2;

drop type jpl_tab3_type;
drop type jpl_item3;

purge recyclebin;

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);
/

create or replace type jpl_tab3_type as table of jpl_item3;
/

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type, jpl3x jpl_tab3_type);
/

create or replace type jpl_tab2_type as table of jpl_item2;
/

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type)
/

create or replace type jpl_tab1_type as table of jpl_item1;
/

create table demo_nest_3 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type,
        jpl1a   jpl_tab1_type
)
segment creation immediate
-- organization index
nested table jpl1 store as t_jpl1
        (
        nested table jpl2  store as t_jpl2
                (
                nested table jpl3 store as t_jpl3 
                return as locator
                nested table jpl3x store as t_jpl3x 
                return as locator
                )
        return as locator
        )
return as locator
nested table jpl1a store as t_jpl1a
        (
        nested table jpl2  store as t_jpl2a
                (
                nested table jpl3 store as t_jpl3a
                return as locator
                nested table jpl3x store as t_jpl3xa
                return as locator
                )
        return as locator
        )
return as locator
/

All that remains now is to modify the code to use the equivalent DBA views, with the addition of the owner column, then slot the resulting code into the original query in place of the simple references to dba_nested_tables. If you go to the original posting you’ll see that I’ve done this by wrapping the code into a CTE so that the only changes to the rest of the code appear as two (flagged) changes where the CTE is then used.
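
As a sketch of the shape that takes (the &m_owner substitution variable is just something I’ve invented for the illustration – it isn’t in the original script):

with my_nested_tables as (
select
        /*+ materialize */
        owner, parent_table_name, table_name
from
        dba_nested_tables
where
        owner = upper('&m_owner')
)
select  parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
where   (parent_table) not in (
                select table_name
                from   my_nested_tables
        )
order by
        parent_table, child_table, pathname
/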

October 31, 2019

IOT Hash

Filed under: Execution plans,Hash Join,Infrastructure,IOT,Joins,Oracle,Troubleshooting — Jonathan Lewis @ 2:59 pm GMT Oct 31,2019

It’s another of my double-entendre titles. The optimizer can turn a hash join involving an index-organized table into a real performance disaster (though you may have to help it along the way by using a silly definition for your primary key columns). This post was inspired by a question posted on the Oracle Developer Community forum recently so the table and column names I’ve used in my model reflect (almost, with a few corrections) the names used in the post.

We start with a simple requirement expressed through the following SQL:


rem
rem     Script:         iot_hash.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

insert
        /*+
                qb_name(insert)
        */
into t_iot(
        id, inst_id, nr_time,
        o_time, st, null_col, svname
)
select
        /*+
                qb_name(main)
                unnest(@subq)
                leading(@sel$a93afaed apar@main ob@subq)
                use_hash(@sel$a93afaed ob@subq)
                swap_join_inputs(@sel$a93afaed ob@subq)
                index_ss_asc(@sel$a93afaed ob@subq (t_iot.st t_iot.inst_id t_iot.svname))
        */
        apar.id,
        'UP',
        to_date('2019-10-24','yyyy-mm-dd'),
        to_date('1969-12-31','yyyy-mm-dd'),
        'IDLE',
        null,
        'tkt007.jj.bb.com'
from
        t_base apar
where
        apar.id not in (
                select
                        /*+
                                qb_name(subq)
                        */
                        id
                from
                        t_iot ob
                where
                        inst_id = 'UP'
        )
and     nvl(apar.gp_nm,'UA') = 'UA'
and     rownum <= 5000
/

The requirement is simple – insert into table t_iot a set of values dictated by a subset of the rows in table t_base if they do not already exist in t_iot. To model the issue that appeared I’ve had to hint the SQL to get the following plan (which I pulled from memory after enabling rowsource execution stats):


---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |             |      1 |        |   296 (100)|      0 |00:00:00.03 |     788 |    148 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL | T_IOT       |      1 |        |            |      0 |00:00:00.03 |     788 |    148 |       |       |          |
|*  2 |   COUNT STOPKEY          |             |      1 |        |            |    100 |00:00:00.03 |      99 |     91 |       |       |          |
|*  3 |    HASH JOIN RIGHT ANTI  |             |      1 |    100 |   296   (2)|    100 |00:00:00.03 |      99 |     91 |    14M|  1843K|   15M (0)|
|*  4 |     INDEX SKIP SCAN      | T_IOT_STATE |      1 |  12614 |   102   (0)|  10000 |00:00:00.01 |      92 |     91 |       |       |          |
|*  5 |     TABLE ACCESS FULL    | T_BASE      |      1 |    100 |     2   (0)|    100 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<=5000)
   3 - access("APAR"."ID"="ID")
   4 - access("INST_ID"='UP')
       filter("INST_ID"='UP')
   5 - filter(NVL("APAR"."GP_NM",'UA')='UA')

The optimizer has unnested (as hinted) the subquery and converted it to an anti-join using a right hash anti-join. Take a look at the Used-Mem for the hash join – would it surprise you to learn that the IOT (not compressed in any way), all its indexes, and the t_base table together total less than 4MB? Something dramatically awful has happened in the hash join to generate a requirement of 14MB. (In the case of the OP this appeared as an unexpected 5GB written to the temporary tablespace.)

Before I address the source of the high memory usage, take a close look at the Predicate Information, particularly operation 3, and ask yourself what the definition of index t_iot_state might be. The predicate joins t_base.id to t_iot.id, and here’s the code to create both tables and all the indexes.

create table t_iot (
        nr_time         timestamp,
        id              varchar2(1024),
        inst_id         varchar2(200),
        o_time          timestamp,
        st              varchar2(200),
        null_col        varchar2(100),
        svname          varchar2(200),
        constraint t_iot_pk primary key(nr_time, id, inst_id)
)
organization index
/

insert into t_iot
select
        sysdate,
        dbms_random.string('l',10),
        'UP',
        sysdate,
        'IDLE',
        null,
        rpad('x',25,'x')
from
        all_objects
where
        rownum <= 1e4 -- > hint to avoid wordpress format issue
/

create index t_iot_state on t_iot(st, inst_id, svname); 
create index idx2        on t_iot(id, inst_id, svname);

create table t_base(
        id              varchar2(400) not null,
        gp_nm           varchar2(200)
)
/

insert into t_base
select
        dbms_random.string('l',10),
        'UA'
from
        all_objects
where
        rownum <= 100 -- > hint to avoid wordpress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 't_iot',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 't_base',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/


The index t_iot_state that Oracle has used in the hash join is defined on the columns (st, inst_id, svname) – so the predicate is doing a comparison with a column that’s not in the index! At least, it’s not visibly declared in the index; but this is a secondary index on an IOT, and IOTs don’t have “normal” rowids, the rowid in a secondary index is the value of the primary key (plus a “block guess”). So the columns in the index (even though not declared in the index) are: (st, inst_id, svname, {nr_time, id, inst_id, blockguess}). So this index does supply the required id column.
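
As a quick cross-check against the data dictionary, a simple query like the following sketch shows that only the declared columns are listed for each index – the primary key columns carried in the “rowid” of the secondary indexes don’t appear:

select
        index_name, column_position, column_name
from
        user_ind_columns
where
        table_name = 'T_IOT'
order by
        index_name, column_position
;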

Side note: you’ll see in the list of columns above that inst_id appears twice. In fact (since Oracle 9, I think) the code to handle secondary indexes has been smart enough to avoid this duplication. If the secondary index contains columns from the primary key then the “rowid” doesn’t store those columns; the code knows how to construct the primary key value from the stored PK columns combined with the needed columns from the index entry. This can make IOTs a very nice choice of implementation for “intersection” tables that are used to represent many-to-many joins between two other tables.

Unfortunately this “rowid” is the explanation for the massive memory demand. Take a look at the “Column Projection Information” for my execution plan:


Column Projection Information (identified by operation id):
-----------------------------------------------------------
   2 - "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200], ROWNUM[8]
   3 - (#keys=1) "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200]
   4 - "OB".ROWID[ROWID,1249], "NR_TIME"[TIMESTAMP,11], "ID"[VARCHAR2,1024], "INST_ID"[VARCHAR2,200], "OB".ROWID[ROWID,1249]
   5 - "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200]

The interesting line is operation 4. A hash join takes the rowsource from its first child (the build table) and creates an in-memory hash table (which may spill to disc, of course), so if I see an unreasonable memory allocation (or unexpected spill to disc) a good starting point is to look at what the first child is supplying. In this case the first child seems to be saying that it’s supplying (or allowing for) nearly 3,700 bytes to be passed up to the hash join.

On closer inspection we can see it’s reporting the “rowid” twice, and also reporting the three columns that make up that rowid. I think it’s reasonable to assume that it’s only supplying the rowid once, and maybe it’s not even supplying the other three columns because they are embedded in the rowid. Doing a quick arithmetic check, let’s multiply the size of the rowid by the value of A-rows: 1,249 * 10,000 = 12,490,000. That’s pretty close to the 14MB reported by the hash join in operation 3.

Hypothesis – to get at the id column, Oracle has used this index (actually a very bad choice of those available) to extract the rowid and then passed the rowid up to the parent in a (length padded) fixed format. Oracle has then created a hash table by extracting the id column from the rowid and building the hash table on it but also carrying the length-padded rowid into the hash table.  Possible variants on this theme are that some or all of the other columns in the Column Projection Information are also passed upwards so that the id doesn’t have to be extracted, but if they are they are not padded to their maximum length.

A simple test that this is broadly the right assumption is to re-run the model making the declared length of the rowid much larger to see what happens to the memory allocation. Changing the inst_id declaration from 200 bytes to 1000 bytes (note the stored value is only the 2 bytes needed for the value ‘UP’) the Used-mem jumps to 23 MB (which is an increment very similar to 800 * 10,000).  You’ll note that I chose to experiment with a column that wasn’t the column used in the join. It was a column in the secondary index definition, though, so another test would be to change the nr_time column from a timestamp (11 bytes) to a large varchar2, so I re-ran the test declaring the nr_time as a varchar2(1000) – reverting the inst_id to varchar2(200) – and the Used-mem increased to 25MB.

Preliminary Conclusion

If Oracle uses the contents of the rowid of a secondary index on an IOT in a join then it constructs a fixed format version for the rowid by padding every column in the primary key to its maximum length and concatenating the results. This can have catastrophic side effects on performance if you’ve declared some very long columns “just in case”. Any time you use index organized tables you should remember to check the Column Projection Information in any execution plans that use secondary indexes in case they are passing a large, padded, primary key through the plan to a point where a blocking operation (such as a hash join or merge join) has to accumulate a large number of rows.
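
If you want to make that check easy to repeat, dbms_xplan will include the projection information when you add “+projection” to the format string – a minimal sketch, assuming you’ve enabled rowsource execution stats (e.g. with the gather_plan_statistics hint or statistics_level = all):

set serveroutput off

-- run the statement you want to check, then:
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last +projection'));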

Footnote

In my test case I had to hint the query heavily to force Oracle into the path I wanted to demonstrate.

It’s surprising that the optimizer should have chosen this path in the OP’s system, given that there’s another secondary index that contains the necessary columns in its definition. (So one thought is that there’s a statistics problem to address, or possibly the “good” index is subject to updates that make it become very inefficient (large) very quickly.)

Another oddity of the OP’s system was that Oracle should have chosen to do a right hash anti-join when it looked as if joining the tables in the opposite order would produce a much smaller memory demand and lower cost – there was an explicit swap_join_inputs() hint in the Outline Information (so copying the outline into the query and changing that to no_swap_join_inputs() might have been another viable workaround.) In the end the OP hinted the query to use a nested loop (anti-)join from t_base to t_iot – which is another way to avoid the hash table threat with padded rowids.

 

October 11, 2019

v$session

Filed under: Execution plans,Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:29 pm BST Oct 11,2019

Here’s an odd, and unpleasant, detail about querying v$session in the “most obvious” way. (And if you were wondering what made me resurrect and complete a draft on “my session id” a couple of days ago, this posting is the reason). Specifically if you want to select some information for your own session from v$session the query you’re likely to use in any recent version of Oracle will probably be of the form:


select {list for columns} from v$session where sid = to_number(sys_context('userenv','sid'));

Unfortunately that one little statement hides two anomalies – which you can see in the execution plan. Here’s a demonstration cut from an SQL*Plus session running under 19.3.0.0:


SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  gcfrzq9knynj3, child number 0
-------------------------------------
select program from V$session where sid = sys_context('userenv','sid')

Plan hash value: 2422122865

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    33 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FULL       | X$KSLWT         |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    21 |     0   (0)|
|*  6 |    FIXED TABLE FULL       | X$KSUSE         |     1 |    21 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid')))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter((BITAND("S"."KSUSEFLG",1)<>0 AND BITAND("S"."KSSPAFLG",1)<>0 AND 
              "S"."INDX"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))
              AND INTERNAL_FUNCTION("S"."CON_ID") AND "S"."INST_ID"=USERENV('INSTANCE')))

As you can see, v$session is a join of 3 separate structures – x$kslwt (v$session_wait), x$ksled (v$event_name), and x$ksuse (the original v$session as it was some time around 8i), and the plan shows two “full tablescans” and a Cartesian merge join. Tablescans and Cartesian merge joins are not necessarily bad – especially where small tables and tiny numbers of rows are concerned – but they do merit at least a brief glance.

x$ksuse is a C structure in the fixed SGA and that structure is a segmented array (which seems to be chunks of 126 entries in 19.3, and chunks of 209 entries in 12.2 – but that’s fairly irrelevant). The SID is simply the index into the array counting from 1, so if you have a query with a predicate like ‘SID = 99’ Oracle can work out the address of the 99th entry in the array and access it very quickly – which is why the SID column is reported as a “fixed index” column in the view v$indexed_fixed_column.
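
If you want to see which columns of which x$ structures can be accessed this way you can query v$indexed_fixed_column – a quick sketch (the exact content varies with version):

select
        table_name, index_number, column_name, column_position
from
        v$indexed_fixed_column
where
        table_name in ('X$KSUSE', 'X$KSLWT', 'X$KSLED')
order by
        table_name, index_number, column_position
;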

But we have two problems immediately visible:

  1. the optimizer is not using the “index” to access x$ksuse despite the fact that we’re giving it exactly the value we want to use (and we can see a suitable predicate at operation 6 in the plan)
  2. the optimizer has decided to start executing the query at the x$kslwt table

Before looking at why things have gone wrong, let’s check the execution plan to see what would have happened if we’d copied the value from the sys_context() call into a bind variable and queried using the bind variable – which we’ll keep as a character type to make it a fair comparison:

SQL_ID  cm3ub1tctpdyt, child number 0
-------------------------------------
select program from v$session where sid = to_number(:v1)

Plan hash value: 1627146547

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    20 |     0   (0)|
|*  6 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(:V1))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter(("S"."INDX"=TO_NUMBER(:V1) AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))


When we have a (character) bind variable instead of a sys_context() value the optimizer manages to use the “fixed indexes”, but it’s still started executing at x$kslwt, and still doing a Cartesian merge join. The plan would be the same if the bind variable were a numeric type, and we’d still get the same plan if we replaced the bind variable with a literal number.

So problem number 1 is that Oracle only seems able to use the fixed index path for literal values and simple bind variables (plus a few “simple” functions). It doesn’t seem to use the fixed indexes for most functions (even deterministic ones) returning a value and the sys_context() function is a particular example of this.

Transitivity

Problem number 2 comes from a side-effect of something that I first described about 13 years ago – transitive closure. Take a look at the predicates in both the execution plans above. Where’s the join condition between x$ksuse and x$kslwt ? There should be one, because the underlying SQL defining [g]v$session  has the following joins:

from
      x$ksuse s,
      x$ksled e,
      x$kslwt w
where
      bitand(s.ksspaflg,1)!=0
and   bitand(s.ksuseflg,1)!=0
and   s.indx=w.kslwtsid       -- this is the SID column for v$session and v$session_wait
and   w.kslwtevt=e.indx
 

What’s happened here is that the optimizer has used transitive closure: “if a = b and b = c then a = c” to clone the predicate “s.indx = to_number(sys_context(…))” to “w.kslwtsid = to_number(sys_context(…))”. But at the same time the optimizer has eliminated the predicate “s.indx = w.kslwtsid”, which it shouldn’t do because this is 12.2.0.1 and ever since 10g we’ve had the parameter _optimizer_transitivity_retain = true — but SYS is ignoring the parameter setting.

So we no longer have a join condition between x$ksuse and x$kslwt – which means there has to be a cartesian merge join between them and the only question is whether this should take place before or after the join between x$kslwt and x$ksled. In fact, the order doesn’t really matter because there will be only one row identified in x$kslwt and one row in x$ksuse, and the join to x$ksled is simply a lookup (by undeclarable unique key) to translate an id into a name and it will take place only once whatever we do about the other two structures.

But there is a catch – especially if your sessions parameter is 25,000 (which it shouldn’t be) and the number of connected sessions is currently 20,000 (which it shouldn’t be) – the predicate against x$ksuse does a huge amount of work as it walks the entire array testing every row (and it doesn’t even do the indx test first – it does a couple of bitand() operations). Even then this wouldn’t be a disaster – we’re only talking a couple of hundredths of a second of CPU – until you find the applications that run this query a huge number of times.

We would prefer to avoid two full tablescans since the arrays could be quite large, and of the two it’s the tablescan of x$ksuse that is going to be the greater threat; so is there a way to bypass the threat?  Once we’ve identified the optimizer anomaly we’ve got a pointer to a solution. Transitivity is going wrong, so let’s attack the transitivity. Checking the hidden parameters we can find a parameter: _optimizer_generate_transitive_pred which defaults to true, so let’s set it to false for the query and check the plan:

select  /*+   opt_param('_optimizer_generate_transitive_pred','FALSE')
*/  program from  v$session where  sid = sys_context('userenv','sid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    28 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INDX"=TO_NUMBER(SYS_CONTEXT('user
              env','sid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


Although it’s not nice to insert hidden parameters into the optimizer activity we do have a result. We don’t have any filtering on x$kslwt – fortunately this seems to be limited in size (but see footnote) to the number of current sessions (unlike x$ksuse which has an array size defined by the sessions parameter or derived from the processes parameter). For each row in x$kslwt we do an access into x$ksuse using the “index” (note that we don’t see access predicates for the fixed tables, we just have to note the operation says FIXED INDEX and spot the “index-related” predicate in the filter predicate list), so this strategy has reduced the number of times we check the complex predicate on x$ksuse rows.

It’s still far from ideal, though. What we’d really like to do is access x$kslwt by index using the known value from sys_context(‘userenv’,’sid’). As it stands the path we get from using a hidden parameter which isn’t listed as legal for the opt_param() hint is a plan that we would get if we used an unhinted query that searched for audsid = sys_context(‘userenv’,’sessionid’).

SQL_ID  7f3f9b9f32u7z, child number 0
-------------------------------------
select  program from  v$session where  audsid =
sys_context('userenv','sessionid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     2 |    70 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     2 |    62 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    23 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."KSUUDSES"=TO_NUMBER(SYS_CONTEXT('
              userenv','sessionid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


The bottom line, then, seems to be that if you need a query by SID against v$session to be as efficient as possible then your best bet is to load a numeric variable with the sys_context(‘userenv’,’sid’) and then select where “sid = :bindvariable”.  Otherwise query by audsid, or use a hidden parameter to affect the optimizer.
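
In SQL*Plus that strategy might look like the following sketch – copy the sid into a numeric bind variable once, then use the bind variable in the query:

variable v_sid number
execute :v_sid := to_number(sys_context('userenv','sid'))

select program from v$session where sid = :v_sid;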

Until the SYS schema follows the _optimizer_transitivity_retain parameter or treats sys_context() the same way it treats a bind variable there is always going to be some unnecessary work when querying v$session and that excess will grow with either the number of connected sessions (if you optimize the query) or with the value of the sessions parameter.

Footnote

In (much) older versions of Oracle v$session_wait sat on top of x$ksusecst, which was part of the same C structure as x$ksuse. In newer versions of Oracle x$kslwt is a structure that is created on demand in the PGA/UGA – I hope that there’s a short cut that allows Oracle to find the waiting elements in x$ksuse[cst] efficiently, rather than requiring a walk through the whole thing, otherwise a tablescan of the (nominally smaller) x$kslwt structure will be at least as expensive as a tablescan of the x$ksuse structure.

Update (just a few minutes after posting)

Bob Bryla has pointed out in a tweet that there are many “bugs” not fixed until 19.1 for which the workaround is to set “_optimizer_transitivity_retain” to false. So maybe this isn’t an example of SYS doing something particularly strange – it may be part of a general reworking of the mechanism that still has a couple of undesirable side effects.

Bob’s comment prompted me to clone the x$ tables into real tables in a non-SYS schema and model the fixed indexes with primary keys, and I found that the resulting plan (though very efficient) still discarded the join predicate. So we may be seeing the side effects of a code enhancement relating to generating predicates that produce unique key access paths. (A “contrary” test, the one in the 2013 article I linked to, still retains the join predicate for the query that has non-unique indexes.)

 

October 9, 2019

My SID

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 12:03 pm BST Oct 9,2019

Here’s a little note that’s been hanging around as a draft for more than eight years according to the OTN (as it was) posting that prompted me to start writing it. At the time there were still plenty of people using Oracle 10g, so the question didn’t seem entirely inappropriate:

On 10g R2 when I open a sqlplus session how can I know my session SID ? I’m not DBA then can not open as sysdba and query v$session.

In all fairly recent versions of Oracle, of course, we have the option to use the sys_context() function to get the SID, but this specific option didn’t appear until some time in the 10g timeline – so you might have spent years “knowing” that you could get the audsid through sys_context(‘userenv’,’sessionid’) while there was no equivalent way to get the sid. Now, of course, and even in the timeline of the original posting, the simplest solution to the requirement is to execute:


select sys_context('userenv','sid') from dual;

But there are a number of alternatives – which may occasionally do a better job (and sometimes are just plain silly). It’s also worth noting that even in 19c Oracle still doesn’t have access to v$session.serial# through sys_context() and, anyway, sys_context() behaves like an unpeekable bind variable – which can be a problem.

So here’s the first of several options:

select sid from V$mystat where rownum = 1;

You’ll need SYS to grant you select on v_$mystat to use this one, of course, but v$mystat is a very convenient view giving you the session activity stats since logon for your own session – so there ought to be some mechanism that allows you to see some form of it in place anyway (ideally including the join to v$statname).

One of the oldest ways of getting access to your session ID without having access to any of the dynamic performance views was through the dbms_support package:

variable v1 varchar2(32)
execute :v1 := dbms_support.mysid
execute dbms_output.put_line(:v1)

Again you’ll need SYS to grant you extra privileges, in this case execute on the dbms_support package – worse still, the package is not installed by default. In fact (after installing it) if you call dbms_support.package_version it returns the value: “DBMS_SUPPORT Version 1.0 (17-Aug-1998) – Requires Oracle 7.2 – 8.0.5” – which gives you some idea of how old it is. It used to be useful for the start_trace_in_session() procedure it contains but that procedure has been superseded by many newer mechanisms. If you enable SQL tracing to see what’s happening under the covers when you call dbms_support.mysid you’ll see that the function actually runs the query I showed above against v$mystat .

Unlike dbms_support, the dbms_session package is installed automatically with the privilege to execute granted to public, and this gives you a function to generate a “unique session id”. The notes in the script $ORACLE_HOME/rdbms/admin/dbmssess.sql that creates the package say that the return value can be up to 24 bytes long, but so far the maximum I’ve seen is 12.


select dbms_session.unique_session_id from dual;
UNIQUE_SESSION_ID
--------------------------
00FF5F980001


select
        to_number(substr(dbms_session.unique_session_id,1,4),'XXXX') sid,
        to_number(substr(dbms_session.unique_session_id,5,4),'XXXX') serial#,
        to_number(substr(dbms_session.unique_session_id,9,4),'XXXX') instance
from
        dual
;

       SID    SERIAL# INSTANCE
---------- ---------- --------
       255      24472        1

As you can see, the unique_session_id can be decoded to produce three useful bits of information, and the nice thing about this call is that it gives you the session serial# at the same time as the SID. This query may not be as efficient as it could be, though – there’s some scope for writing a query that uses a non-mergeable in-line view to call the function just once, then split the result into three pieces (see the sketch below).
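
Here’s a sketch of that idea – the no_merge hint on the in-line view should stop Oracle from rewriting it into three separate calls to the function:

select
        to_number(substr(usid,1,4),'XXXX') sid,
        to_number(substr(usid,5,4),'XXXX') serial#,
        to_number(substr(usid,9,4),'XXXX') instance
from    (
        select  /*+ no_merge */
                dbms_session.unique_session_id usid
        from    dual
        )
;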

While we’re on the unique_session_id, the dbms_pipe package also has a “unique identifier” function unique_session_name():

SQL> select dbms_pipe.unique_session_name from dual;

UNIQUE_SESSION_NAME
------------------------
ORA$PIPE$00FF5F980001

It doesn’t take a lot of effort to spot that the “unique session name” is the “unique session id” of dbms_session prefixed with the text “ORA$PIPE$”. It’s convenient for the dbms_pipe package to be able to generate a unique name so that one session can create a safely named pipe and tell another session about it. Anyone using pipes should take advantage of this function for its original purpose. Unlike dbms_session you’ll need to be granted the privilege to execute this package, it’s not available to public. Interestingly the script that creates dbms_pipe says that this function could return 30 bytes – since it appears to be 9 bytes prepended to the (“could be 24 bytes”) dbms_session.unique_session_id you have to wonder whether there’s something more subtle that could happen.

There may be many more mechanisms available as built-ins, but the last one I know of is in the dbms_debug_jdwp package (another package with execute privilege already granted to public and the ability to supply both the sid and serial#):

SQL> select
  2          dbms_debug_jdwp.current_session_id     sid,
  3          dbms_debug_jdwp.current_session_serial serial#
  4  from dual
  5  /

       SID    SERIAL#
---------- ----------
       255      24472

There is a reason why I’ve decided to resurrect this list of ways of getting at a session’s SID, but that’s the topic of another blog note.

 

 

October 7, 2019

Resumable

Filed under: Infrastructure,Oracle,Tuning — Jonathan Lewis @ 10:31 am BST Oct 7,2019

There are two questions about temporary space that appear fairly regularly on the various Oracle forums. One is of the form:

From time to time my temporary tablespace grows enormously (and has to be shrunk), how do I find what’s making this happen?

The other follows the more basic pattern:

My process sometimes crashes with Oracle error: “ORA-01652: unable to extend temp segment by %n in tablespace %s” how do I stop this happening?

Before moving on to the topic of the blog, it’s worth pointing out two things about the second question:

  • First, it’s too easy to get stuck at the word temp and leap to the conclusion that the problem is about the temporary tablespace without noticing that the error message includes the specific tablespace that’s raised the problem. If, for example, you rebuild an index in a nominated tablespace Oracle first creates the index as a temporary segment (with a name like {starting_file_number}.{starting_block_number}) in that tablespace then renames it to match the original index name once the rebuild is complete and drops the old index.
  • Secondly a process that raises ORA-01652 isn’t necessarily the guilty party – it may be the victim of some other process hogging all the available space when it shouldn’t. Moreover that other process may have completed and released its space by the time you start looking for the problem – causing extra confusion because your process seems to have crashed without a cause. Taking my example of an index rebuild – your index rebuild may fail because someone else was rebuilding a different index at the same time in the same tablespace; but when you check the tablespace all the space from their original index is now free as their rebuild completed in the interim.

So, before you start chasing something that you think is a problem with your code, pause a moment to double-check the error message and think about whether you could have been the victim of some concurrent, but now complete, activity.

I’ve listed the two questions as variants on the same theme because the workaround to one of them introduces the risk of the other – if you want to avoid ORA-01652 you could make all your data files and temp files “autoextensible”, but then there may be occasions when they extend far too much and you need to shrink them down again (and that’s not necessarily easy if it’s not the temporary tablespace). Conversely, if you think your data or temp files randomly explode to ludicrous sizes you could decide on a maximum size for your files and disable autoextension – then handle the complaints when a user reports an ORA-01652.

There are various ways you could monitor your system in near real time to spot the threat as it builds, of course; and there are various ways to identify potentially guilty SQL after the event. You could keep an eye on various v$ dynamic performance views or dba_ administrative views to try and intercept a problem; you could set event 1652 to dump an errorstack (or even systemstate) for post-crash analysis to see what that reported. Neither is an ideal solution – one requires you to pay excessive attention to the system, the other is designed to let the problem happen then leave you to clean up afterwards.  There is, however, a strategy that may stop the problem from appearing without requiring constant monitoring. The strategy is to enable (selectively) resumable operations.

If a resumable operation needs to allocate space but is unable to do so – i.e. it would normally be about to raise ORA-01652 – it will suspend itself for a while, going into the wait state “statement suspended, wait error to be cleared”, which will show up as the event in v$session_wait, timing out every 2 seconds. The session will also be reporting its current action in the view v$resumable or, for slightly more information, dba_resumable. As it suspends the session will also write a message to the alert log, but you can also create an “after suspend” database trigger to alert you that a problem has occurred.

If you set the resumable timeout to a suitable value then you may find:

  • the problem goes away of its own accord and the session resumes before the timeout is reached

or

  • you receive a warning and have some time to identify the source of the problem and take the minimum action needed to allow the session to resume

Implementation

The parameter resumable_timeout is a general control for resumable sessions if you don’t handle the feature at a more granular level than the system.

By default this parameter is set to zero, which translates into a default value of 7,200 seconds, but that default doesn’t come into effect unless a session declares itself resumable. If you set the parameter to a non-zero value all sessions will automatically be operating as resumable sessions – and you’ll soon hear why you don’t want to do that.

The second enabling feature for resumable sessions is the resumable privilege – a session can’t control its own resumability unless the schema has been granted the resumable privilege – which may be granted through a role. If a session has the privilege it may set its own resumable_timeout, even if the system value is zero.

Assume we have set resumable_timeout to 10 (seconds) through the instance parameter file and restarted the instance. If we now issue (for example) the following ‘create table’ statement:


create table t1 (n1, v1 ) 
pctfree 90 pctused 10
tablespace tiny
as
select 
        rownum, cast(lpad('x',800) as varchar2(1000))
from    all_objects
where   rownum <= 20000
/

This will attempt to allocate 1 row per block for 20,000 blocks (plus about 1.5% for bitmap space management blocks) – and tablespace tiny lives up (or down) to its name, consisting of a single file of only 10,000 Oracle blocks. Shortly after starting, the session will hit Oracle error “ORA-01652: unable to extend temp segment by 128 in tablespace TINY”, but it won’t report it; instead it will suspend itself for 10 seconds before failing and reporting the error. This will happen whether or not the session has the resumable privilege – in this case the behaviour is dictated by our setting the system parameter. If you look in the alert log after the session finally errors out you will find text like the following:

2019-10-04T14:01:11.847943+01:00
ORCL(3):ORA-1652: unable to extend temp segment by 128 in tablespace TINY [ORCL] 
ORCL(3):statement in resumable session 'User TEST_USER(138), Session 373, Instance 1' was suspended due to
ORCL(3):    ORA-01652: unable to extend temp segment by 128 in tablespace TINY
2019-10-04T14:01:23.957586+01:00
ORCL(3):statement in resumable session 'User TEST_USER(138), Session 373, Instance 1' was timed out

Note that there’s a 10 (plus a couple) second gap between the point where the session reports that it is suspending itself and the point where it fails with a timeout. The two extra seconds appear because the session polls every 2 seconds to see whether the problem is still present or whether it has spontaneously disappeared, so allowing the session to resume.

Let’s change the game slightly; let’s try to create the table again, but this time execute the following statement first:

alter session enable resumable timeout 60 name 'Help I''m stuck';

The initial response to this will be Oracle error “ORA-01031: insufficient privileges” because the session doesn’t have the resumable privilege, but after granting resumable to the user (or a relevant role) we try again and find we will be allowed a little extra time before the CTAS times out. Our session now overrides the system timeout and will wait 60 seconds (plus a bit) before failing. The “timeout” clause is optional and if we omit it the session will use the system value; similarly the “name” clause is optional though there’s no default for it – it’s just a message that will get into various views and reports.

There are several things you might check in this 60 second grace period. The session wait history will confirm that your session has been timing out every two seconds (as will the active session history if you’re licensed to use it):


select seq#, event, wait_time from v$session_wait_history where sid = 373

      SEQ# EVENT							     WAIT_TIME
---------- ---------------------------------------------------------------- ----------
	 1 statement suspended, wait error to be cleared			   204
	 2 statement suspended, wait error to be cleared			   201
	 3 statement suspended, wait error to be cleared			   201
	 4 statement suspended, wait error to be cleared			   201
	 5 statement suspended, wait error to be cleared			   200
	 6 statement suspended, wait error to be cleared			   200
	 7 statement suspended, wait error to be cleared			   202
	 8 statement suspended, wait error to be cleared			   200
	 9 statement suspended, wait error to be cleared			   200
	10 statement suspended, wait error to be cleared			   200

Then there’s a special dynamic performance view, v$resumable which I’ve reported below using a print_table() procedure that Tom Kyte wrote many, many years ago to report rows in a column format:

SQL> set serveroutput on
SQL> execute print_table('select * from v$resumable where sid = 373')

ADDR                          : 0000000074515B10
SID                           : 373
ENABLED                       : YES
STATUS                        : SUSPENDED
TIMEOUT                       : 60
SUSPEND_TIME                  : 10/04/19 14:26:20
RESUME_TIME                   :
NAME                          : Help I'm stuck
ERROR_NUMBER                  : 1652
ERROR_PARAMETER1              : 128
ERROR_PARAMETER2              : TINY
ERROR_PARAMETER3              :
ERROR_PARAMETER4              :
ERROR_PARAMETER5              :
ERROR_MSG                     : ORA-01652: unable to extend temp segment by 128 in tablespace TINY
CON_ID                        : 0
-----------------
1 rows selected

Notice how the name column reports the name I supplied when I enabled the resumable session. The view also tells us when the critical statement was suspended and how long it is prepared to wait (in total) – leaving us to work out from the current time how much time we have left to work around the problem.
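
As a sketch of that arithmetic – assuming the suspend_time string really is in the “MM/DD/YY HH24:MI:SS” format shown above – you could run something like:

select
        name,
        timeout - round((sysdate - to_date(suspend_time,'mm/dd/yy hh24:mi:ss')) * 86400) seconds_left
from
        v$resumable
where
        status = 'SUSPENDED'
;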

There’s also a dba_resumable variant of the view which is slightly more informative (though the sample below is not consistent with the one above because I ran the CTAS several times, editing the blog as I did so):

SQL> execute print_table('select * from dba_resumable where session_id = 373')

USER_ID                       : 138
SESSION_ID                    : 373
INSTANCE_ID                   : 1
COORD_INSTANCE_ID             :
COORD_SESSION_ID              :
STATUS                        : SUSPENDED
TIMEOUT                       : 60
START_TIME                    : 10/04/19 14:21:14
SUSPEND_TIME                  : 10/04/19 14:21:16
RESUME_TIME                   :
NAME                          : Help I'm stuck
SQL_TEXT                      : create table t1 (n1, v1 ) pctfree 90 pctused 10 tablespace tiny as  select rownum, 
                                cast(lpad('x',800) as varchar2(1000)) from all_objects where rownum <= 20000
ERROR_NUMBER                  : 1652
ERROR_PARAMETER1              : 128
ERROR_PARAMETER2              : TINY
ERROR_PARAMETER3              :
ERROR_PARAMETER4              :
ERROR_PARAMETER5              :
ERROR_MSG                     : ORA-01652: unable to extend temp segment by 128 in tablespace TINY
-----------------
1 rows selected

This view includes the text of the statement that has been suspended and shows us when it started running (so that we can decide whether we really want to rescue it, or might be happy to kill it to allow some other suspended session to resume).

If you look at the alert log in this case you’ll see that the name has been reported there instead of the user, session and instance – which means you might want to think carefully about how you use the name option:


2019-10-04T14:21:16.151839+01:00
ORCL(3):statement in resumable session 'Help I'm stuck' was suspended due to
ORCL(3):    ORA-01652: unable to extend temp segment by 128 in tablespace TINY
2019-10-04T14:22:18.655808+01:00
ORCL(3):statement in resumable session 'Help I'm stuck' was timed out

Once your resumable task has completed (or timed out and failed) you can stop the session from being resumable with the command:

alter session disable resumable;

And it’s important that every time you enable resumability you should disable it as soon as the capability is no longer needed. Also, be careful about when you enable it – don’t be tempted to make every session resumable. Use it only for really important cases. Once a session is resumable virtually everything that goes on in that session is deemed to be resumable, and this has side effects.
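
One way to reduce the risk of leaving a session resumable by accident is to wrap the critical step in a little PL/SQL that always disables resumability on the way out – a sketch (the table names are purely hypothetical):

begin
        execute immediate q'{alter session enable resumable timeout 300 name 'nightly load'}';

        insert /*+ append */ into big_table            -- hypothetical target
        select * from staging_table;                   -- hypothetical source
        commit;

        execute immediate 'alter session disable resumable';
exception
        when others then
                -- make sure the session doesn't stay resumable after a failure
                execute immediate 'alter session disable resumable';
                raise;
end;
/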

The first side effect that may spring to mind is the impact of the view v$resumable – it’s a memory structure in the SGA so that everyone can see it and all the resumable sessions can populate and update it. That means there’s got to be some latch (or mutex) protection going on – and if you look at v$latch you’ll discover that there’s just a single (child) latch doing the job, so resumability can introduce a point of contention. Here’s a simple script (using my “start_XXX” strategy) to “select 1 from dual;” one thousand times, with calls to check the latch activity:

set termout off
set serveroutput off
execute snap_latch.start_snap

@start_1000

set termout on
set serveroutput on
execute snap_latch.end_snap(750)

And here are the results of running the script – reporting only the latches with more than 750 gets in the interval – first without and then with a resumable session:

---------------------------------
Latch waits:-   04-Oct 15:04:31
Lower limit:-  750
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
session idle bit                  6,011           0          0          0           0         0       0     0      .0
enqueue hash chains               2,453           0          0          0           0         0       0     0      .0
enqueue freelist latch                1           0          0          0       2,420         0       0     0      .0
JS queue state obj latch          1,176           0          0          0           0         0       0     0      .0

SQL> alter session enable resumable;

SQL> @test
---------------------------------
Latch waits:-   04-Oct 15:04:46
Lower limit:-  750
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
session idle bit                  6,011           0          0          0           0         0       0     0      .0
enqueue hash chains               2,623           0          0          0           0         0       0     0      .0
enqueue freelist latch                1           0          0          0       2,588         0       0     0      .0
resumable state object            3,005           0          0          0           0         0       0     0      .0
JS queue state obj latch          1,260           0          0          0           0         0       0     0      .0

PL/SQL procedure successfully completed.

SQL> alter session disable resumable;

That’s 1,000 selects from dual – 3,000 latch gets on a single child latch. It looks like every call to the database results in a latch get and an update to the memory structure. (Note: You wouldn’t see the same effect if you ran a loop inside an anonymous PL/SQL block since the block would be the single database call).
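
For comparison, the same workload issued as a single database call from an anonymous PL/SQL block would look like the following sketch (wrapped in the same snap_latch calls if you want to measure it):

declare
        m_dummy number;
begin
        for i in 1..1000 loop
                select 1 into m_dummy from dual;
        end loop;
end;
/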

For other side effects with resumability think about what else is going on around your session. If you allow a session to suspend for (say) 3600 seconds and it manages to resume just in time to avoid a timeout it now has 3,600 seconds of database changes to unwind if it’s trying to produce a read-consistent result; so not only do you have to allow for increasing the size of the undo tablespace and increasing the undo retention time, you have to allow for the fact that when the process resumes it may run much more slowly than usual because it spends more of its time trying to see the data as it was before it suspended, which may require far more single block reads of the undo tablespace – and the session may then crash anyway with an Oracle error ORA-01555 (which is so well-known that I won’t quote the text).

In the same vein – if a process acquires a huge amount of space in the temporary tablespace (in particular) and fails instantly because it can’t get any more space it normally crashes and releases the space. If you allow that process to suspend for an hour it’s going to hold onto that space – which means other processes that used to run safely may now crash because they find there’s no free space left for them in the temporary tablespace.

Be very cautious when you introduce resumable sessions – you need to understand the global impact, not just the potential benefit to your session.

Getting Alerts

Apart from the (passive) views telling you that a session has suspended it’s also possible to get some form of (active) alert when the event happens. There’s an “after suspend” event that you can use to create a database trigger to take some defensive action, e.g.:

create or replace trigger call_for_help
after suspend
on test_user.schema
begin
        if sysdate between trunc(sysdate) and trunc(sysdate) + 3/24 then
                null;
                -- use utl_mail, utl_smtp et. al. to page the DBA
        end if;
end;
/

This trigger is restricted to the test_user schema, and (code not included) sends a message to the DBA’s pager only between the hours of midnight and 3:00 a.m. Apart from the usual functions in dbms_standard that return error codes, names of objects and so on, you might want to take a look at the dbms_resumable package for the “helper” functions and procedures it supplies.
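
For example, an “after suspend” trigger could check what has actually run out of space and adjust the timeout only for the cases you care about – a sketch, assuming the documented dbms_resumable.space_error_info() and set_timeout() calls:

create or replace trigger adjust_resumable_timeout
after suspend
on test_user.schema
declare
        v_error_type    varchar2(64);
        v_object_type   varchar2(64);
        v_owner         varchar2(128);
        v_tablespace    varchar2(128);
        v_object        varchar2(128);
        v_subobject     varchar2(128);
begin
        if dbms_resumable.space_error_info(
                v_error_type, v_object_type, v_owner,
                v_tablespace, v_object, v_subobject
        ) then
                if v_tablespace = 'TEMP' then
                        -- give temp-space problems a little longer to clear
                        dbms_resumable.set_timeout(600);
                end if;
        end if;
end;
/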

For further information on resumable sessions here’s a link to the 12.2 manual to get you started.

October 3, 2019

Trace Files

Filed under: 12c,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:38 pm BST Oct 3,2019

A recent blog note by Martin Berger about reading trace files in 12.2 popped up in my twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the tracefile generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user to allow the user to create the view in the user’s schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem
rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem
rem     Last tested
rem             12.2.0.1

create or replace view my_trace_file as
select 
        *
from 
        v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from 
                        v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;


create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, the privileges you could grant to a user from SYS so that they could create their own view:


grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;
and optionally one of:
        grant select on v_$mystat to some_user;
        grant execute on dbms_support to some_user;
                but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from 12.2.0.1 as a demo:


alter system flush shared_pool;
alter session set sql_trace true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

select
        line_number,
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
/

alter session set sql_trace false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (execution plans seems like a likely guess) which happens to pick out all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text ” id=1 “ (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:


LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'


         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'


         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'


         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'


        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'


        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.
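
If you want a quick overview of where the content of a trace file is coming from you can aggregate on a couple of those columns – a simple sketch using the view above:

select
        section_name, file_name, count(*) line_count
from
        my_trace_file
group by
        section_name, file_name
order by
        count(*) desc
;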

 

October 2, 2019

_cursor_obsolete_threshold

Filed under: 12c,Infrastructure,Oracle,Performance — Jonathan Lewis @ 2:39 pm BST Oct 2,2019

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea ?

The answer is PDBs – specifically the much larger number of PDBs a single CDB can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical company running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.
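For reference, here’s a sketch of the sort of check and change involved. The query against x$ksppi / x$ksppcv has to be run as SYS, and – as always with underscore parameters – you should confirm the change with Oracle Support before making it on a production system:

rem     check the current value and whether it is still at its default

select
        ksp.ksppinm     parameter_name,
        kcv.ksppstvl    current_value,
        kcv.ksppstdf    is_default
from
        x$ksppi         ksp,
        x$ksppcv        kcv
where
        kcv.indx = ksp.indx
and     ksp.ksppinm = '_cursor_obsolete_threshold'
;

rem     set it back to the 12.1 default in the spfile, then restart the instance

alter system set "_cursor_obsolete_threshold" = 1024 scope = spfile;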

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation on his work. It addresses the following question:

If you have an application that repeatedly issues parse calls (explicitly or implicitly) but doesn’t take advantage of the session cursor cache, it has to search the library cache by hash_value / sql_id for the parent cursor, then walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0?

Here’s the complete code for the test:


create table t1 as
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple: a query hinted to return that one row by a primary key index range scan.

I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.
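For anyone who wants to reproduce the test without the original package, here’s a minimal sketch of an equivalent – it’s not the package I actually use, just something with the same interface: start_snap captures v$mystat, end_snap prints the non-zero differences (you’ll need select privileges on v_$mystat and v_$statname):

create or replace package snap_my_stats as
        procedure start_snap;
        procedure end_snap;
end;
/

create or replace package body snap_my_stats as

        type stat_tab is table of number index by pls_integer;
        m_start         stat_tab;

        procedure start_snap is
        begin
                -- remember the current value of every session statistic
                m_start.delete;
                for r in (select statistic#, value from v$mystat) loop
                        m_start(r.statistic#) := r.value;
                end loop;
        end start_snap;

        procedure end_snap is
                m_delta         number;
        begin
                -- report every statistic that has changed since start_snap
                for r in (
                        select  sn.name, ms.statistic#, ms.value
                        from    v$mystat ms, v$statname sn
                        where   sn.statistic# = ms.statistic#
                        order by ms.statistic#
                ) loop
                        if m_start.exists(r.statistic#) then
                                m_delta := r.value - m_start(r.statistic#);
                        else
                                m_delta := r.value;
                        end if;
                        if m_delta != 0 then
                                dbms_output.put_line(
                                        rpad(r.name, 58) || to_char(m_delta, '999,999,999')
                                );
                        end if;
                end loop;
        end end_snap;

end snap_my_stats;
/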

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping them all in memory.
  • The second part of the test simply repeats the loop, but only for the first 1,024 child cursors. At this point the child cursors exist, so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again, but only for the last 1,024 child cursors. Again they should exist and be usable, so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8,191 you will have to follow 512 segment pointers plus 16 child pointers per segment (a total of 8,704 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One peripheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array (i.e. will searching for child cursor 0 be the cheapest search or the most expensive one?).

And the results, limited to just the second and third parts, with just a couple of small edits, are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is virtually zero in both cases – this is all about the work of searching a long chain for the right child cursor, not about re-optimisation.

You might question the 2,048-ish “parse count (total)” – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why the parse count doubles, although the “alter session” doesn’t then report as an “execute count”.
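If you wanted to confirm that, a quick cross-check (just a sketch) would be to aggregate the cursors for the “alter session” statements after the test – each distinct literal value produces its own single-execution cursor:

select
        count(*)                cursor_count,
        sum(parse_calls)        total_parses,
        sum(executions)         total_executions
from
        v$sql
where
        sql_text like 'alter session set optimizer_index_cost_adj%'
;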

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication” – so I ran a similar test executing the last two loops a second time: there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test with the loop for the last 1,024 child cursors running before the loop for the first 1,024 child cursors. Again this made no significant difference to the results – the low numbered child cursors take less CPU to find than the high numbered ones.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the roughly 5 seconds difference in CPU time appearing in this test corresponds purely to the cost of walking an extra 7,000 or so steps down the chain on each execution – 5.4 seconds spread over 1,024 executions is about 5 milliseconds per execution.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.
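As a quick sanity check on one of your own application sessions, a sketch like the following compares the parse activity with the session cursor cache statistics for the current session – if the cache hit figures are tiny compared to the parse count (and session_cached_cursors is set to a reasonable value) the application probably isn’t taking advantage of the cache:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'parse count (total)',
                'parse count (hard)',
                'session cursor cache hits',
                'session cursor cache count'
        )
order by
        sn.name
;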

Footnote 1

The odd thing about this “cursor obsolete” feature is that I have a distinct memory that when PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explanations of the different reasons captured in this view MOS Doc ID 296377.1 “Troubleshooting: High Version Count Issues” is a useful reference.
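As a sketch of that follow-up (the exact set of columns varies with version, so check the view definition on your own system), something like the following reports a few of the more commonly interesting flags for every child of the suspect statement – &m_sql_id is just a substitution variable for illustration:

select
        child_number,
        optimizer_mismatch,
        optimizer_mode_mismatch,
        bind_mismatch,
        bind_length_upgradeable,
        roll_invalid_mismatch,
        purged_cursor
from
        v$sql_shared_cursor
where
        sql_id = '&m_sql_id'
order by
        child_number
;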
