Oracle Scratchpad

June 20, 2022

ORA-29857

Filed under: Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 3:30 pm BST Jun 20,2022

This is another little “case study” on solving a problem. It’s based on a question on the MOSC DBA Forum (needs an account) about trying to drop a tablespace and seeing the error “ORA-29857: domain indexes and/or secondary objects exist in the tablespace” when (apparently) there are no domain indexes or secondary objects in the tablespace according to the queries that have been used to try and locate the problem objects.

Not being completely familiar with all the possibilities that might be relevant to that particular error, but having a vague memory of seeing an oddity once before, I thought I’d do a quick check on a couple of possible causes.

  • Case 1, which I assumed ought to raise the error because the tablespace held some objects that met the requirements for the error: I created a table in a tablespace called users and, for the domain indexes, a couple of Context (Oracle Text) indexes in a tablespace called test_8k_assm. Then I tried to drop the tablespace holding the context indexes. Here’s the relevant cut-n-paste (19.11.0.0):
SQL> drop tablespace test_8k_assm including contents and datafiles;
drop tablespace test_8k_assm including contents and datafiles
*
ERROR at line 1:
ORA-29857: domain indexes and/or secondary objects exist in the tablespace

  • Case 2, which I thought would drop the tablespace because the model didn’t meet the requirement of the error message: I swapped the locations of the table and its indexes, creating the table in the tablespace called test_8k_assm and the Context indexes in the tablespace called users, then I tried to drop the tablespace holding only the table. Here’s the relevant cut-n-paste (19.11.0.0):
SQL> drop tablespace test_8k_assm including contents and datafiles;
drop tablespace test_8k_assm including contents and datafiles
*
ERROR at line 1:
ORA-29857: domain indexes and/or secondary objects exist in the tablespace

So, the error message is misleading and, possibly, this might explain why the queries the OP had posted weren’t reporting any guilty objects. Whatever the error message says, the domain indexes that you need to deal with may not be in the tablespace you want to drop.

I also have a question-mark over “secondary objects” – I don’t know what objects (other than the bits and pieces that make up domain indexes) are deemed to be secondary.

Investigating

It’s at this point that the purpose of the blog note starts to appear – I don’t know the answer to the OP’s question and I’ve discovered a gap in my knowledge so I’m willing to spend a little time extending my knowledge and trying to give a little help to the OP.

We can start with the view dba_secondary_objects (based on secobj$ from script dcore.bsq) which tells us that there are two possible sources of secondary objects – index types and statistics types:

SQL> select dbms_metadata.get_ddl('VIEW','DBA_SECONDARY_OBJECTS') view_text from dual;

-- with a little cosmetic tidying:

VIEW_TEXT
-------------------------------------------------------------------------------------------
CREATE OR REPLACE FORCE NONEDITIONABLE VIEW "SYS"."DBA_SECONDARY_OBJECTS" ("INDEX_OWNER", "INDEX_NAME", "SECONDARY_OBJECT_OWNER", "SECONDARY_OBJECT_NAME", "SECONDARY_OBJDATA_TYPE")
AS
select u.name, o.name, u1.name, o1.name, 
       decode(s.spare1, 0, 'FROM INDEXTYPE',1, 'FROM STATISTICS TYPE')
from   sys.user$ u, sys.obj$ o, sys.user$ u1, sys.obj$ o1, sys.secobj$ s
where  s.obj# = o.obj# and o.owner# = u.user# and
       s.secobj# = o1.obj#  and  o1.owner# = u1.user#;

With this view definition I’d be tempted to do a little drilling down, with queries that return increasing amounts of data, and then pick subsets to expand to higher precision, e.g.:

select  secondary_objdata_type, count(*)
from    dba_secondary_objects
group by
        secondary_objdata_type
/

Are there any secondary objects that aren’t from domain indexes? If there are just a few then query the view for those to find out what they are and who owns them:

select  *
from    dba_secondary_objects
where   secondary_objdata_type = 'FROM STATISTICS TYPE'
/

In fact if the total number of secondary objects was small I might report the details about all of them and try to track them down, or if there was only one schema using that tablespace I might restrict my queries to the subset of secondary objects owned by that schema:

select  distinct index_owner, index_name
from    dba_secondary_objects
where   secondary_object_owner = 'TEST_USER'
;

The OP had a rather larger task than my little model – there were 14,499 segments in the tablespace, and a query for (segment_owner, segment_type, count(*)) returned 332 rows – so had taken a similar approach to mine to try and minimise the amount of work needed, starting with the following queries (which I’ve enhanced slightly and given a cosmetic makeover):

prompt  ==================================
prompt  Domain indexes (in the tablespace)
prompt  ==================================

select  owner, index_name, table_owner, table_name, tablespace_name
from    dba_indexes 
where   index_type = 'DOMAIN'
-- and  tablespace_name = 'TEST_8K_ASSM'        -- error: domain indexes don't have tablespaces
;

prompt  ==========================================
prompt  Domain indexes on tables in the tablespace
prompt  ==========================================

select  t.table_name, t.tablespace_name, i.index_name, i.tablespace_name
from    dba_indexes i, dba_tables t
where   t.table_name = i.table_name
and     t.owner = i.owner
and     i.index_type = 'DOMAIN'
and     t.tablespace_name = 'TEST_8K_ASSM'
;

prompt  ===================================================
prompt  Secondary objects that are tables in the tablespace
prompt  ===================================================

select  index_owner, index_name, secondary_object_owner, secondary_object_name, secondary_objdata_type
from    dba_secondary_objects
where   secondary_object_name in (
                select table_name 
                from dba_tables 
                where tablespace_name = 'TEST_8K_ASSM'
        )
;

prompt  =====================================================
prompt  Indexes in the tablespace that have secondary objects
prompt  =====================================================

select  index_owner, index_name, secondary_object_owner, secondary_object_name, secondary_objdata_type
from    dba_secondary_objects
where   index_name in (
                select index_name 
                from dba_indexes 
                where tablespace_name = 'TEST_8K_ASSM'
        )
;
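
As an aside, the (segment_owner, segment_type, count(*)) summary mentioned above would be something like the following sketch against dba_segments:

select  owner, segment_type, count(*)
from    dba_segments
where   tablespace_name = 'TEST_8K_ASSM'
group by
        owner, segment_type
order by
        owner, segment_type
;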

The first of the four queries had an error – domain indexes do not report a tablespace_name in dba_indexes, so with the tablespace predicate in place (the line I’ve commented out) it would inevitably return no rows.

The second query would report any domain indexes on tables that were in the given tablespace – which would be useful for the simple case where everything was created in the user’s default tablespace.

The third query reports any tables in the given tablespace that are the physical tables instantiating the “logical” domain index – which is particularly important for my example of the indexes being in the given tablespace when the table itself was in some other tablespace.

The final query is trying to report domain indexes that have corresponding physical objects in the given tablespace – but this isn’t going to help (at least with my Context indexes) because the index_name in dba_secondary_objects will be a name that appears in dba_indexes as index_type = ‘DOMAIN’ and won’t have a value stored in the tablespace_name column.
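
One way to plug that gap would be to pick up the ordinary indexes whose base tables are secondary objects – a sketch joining dba_secondary_objects to dba_indexes:

select  so.index_owner, so.index_name domain_index,
        i.owner, i.index_name physical_index, i.tablespace_name
from    dba_secondary_objects so, dba_indexes i
where   i.table_owner     = so.secondary_object_owner
and     i.table_name      = so.secondary_object_name
and     i.tablespace_name = 'TEST_8K_ASSM'
;

This would report the physical indexes on the DR$ tables that live in the target tablespace even though the “logical” domain index reports no tablespace at all.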

Here’s the output I get from the four queries (preceded by a full report of the view dba_secondary_objects in my otherwise empty little database) when I had the table in test_8k_assm and the indexes in tablespace users.

SQL> select * from dba_secondary_objects;

INDEX_OWNER     INDEX_NAME           SECONDARY_OBJECT_OWNER           SECONDARY_OBJECT_NAME            SECONDARY_OBJDATA_TY
--------------- -------------------- -------------------------------- -------------------------------- --------------------
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$I                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$K                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$N                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$U                  FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$I                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$K                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$N                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$U                FROM INDEXTYPE

8 rows selected.

==================================
Domain indexes (in the tablespace)
==================================

OWNER           INDEX_NAME           TABLE_OWNER     TABLE_NAME                TABLESPACE_NAME
--------------- -------------------- --------------- ------------------------- ------------------------------
TEST_USER       SH_IX_NAME           TEST_USER       SCRIPT_HANDLES
TEST_USER       SH_IX_HANDLE         TEST_USER       SCRIPT_HANDLES

2 rows selected.

==========================================
Domain indexes on tables in the tablespace
==========================================

TABLE_NAME                TABLESPACE_NAME                INDEX_NAME           TABLESPACE_NAME
------------------------- ------------------------------ -------------------- ------------------------------
SCRIPT_HANDLES            TEST_8K_ASSM                   SH_IX_NAME
SCRIPT_HANDLES            TEST_8K_ASSM                   SH_IX_HANDLE

2 rows selected.

===================================================
Secondary objects that are tables in the tablespace
===================================================

no rows selected

=====================================================
Indexes in the tablespace that have secondary objects
=====================================================

no rows selected

As you can see I have two Context indexes, each represented by 4 secondary tables (and that’s not the complete story because those tables have their own indexes and a couple of them have LOB columns, and one of the secondary tables is actually an index-organized table).

And this is the result when the base table is in tablespace users and the indexes are in test_8k_assm:

==================================
Domain indexes (in the tablespace)
==================================

OWNER           INDEX_NAME           TABLE_OWNER     TABLE_NAME                TABLESPACE_NAME
--------------- -------------------- --------------- ------------------------- ------------------------------
TEST_USER       SH_IX_NAME           TEST_USER       SCRIPT_HANDLES
TEST_USER       SH_IX_HANDLE         TEST_USER       SCRIPT_HANDLES

2 rows selected

==========================================
Domain indexes on tables in the tablespace
==========================================

no rows selected

===================================================
Secondary objects that are tables in the tablespace
===================================================

INDEX_OWNER     INDEX_NAME           SECONDARY_OBJECT_OWNER           SECONDARY_OBJECT_NAME            SECONDARY_OBJDATA_TY
--------------- -------------------- -------------------------------- -------------------------------- --------------------
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$I                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$K                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$U                  FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$I                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$K                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$U                FROM INDEXTYPE

6 rows selected

=====================================================
Indexes in the tablespace that have secondary objects
=====================================================

no rows selected

So one of the middle two queries ought to return some rows if there are any Context indexes in the tablespace or on tables in the tablespace. (And for the final option – i.e. the table and its Context indexes both being in the tablespace – both queries will return rows.) The OP got no rows from any of the 4 queries.

What next?

It looks like the OP must have some secondary objects that aren’t playing by the rules if they’re “from index types”, or there are some secondary objects that are “from statistics types” and don’t get picked up by the queries. There is a view dba_ustats for “user stats types” which has interesting column names, but at this point I decided to follow a strategy that I’ve used many times in the past when Oracle raises an unexpected error on a repeatable process: repeat the process but enable SQL trace and see what Oracle was doing just before it raised the error.
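
In outline, something like this sketch (the basic level of tracing is all that’s needed):

alter session set sql_trace true;

drop tablespace test_8k_assm including contents and datafiles;

alter session set sql_trace false;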

Here’s the critical line in the trace file I produced, with a couple of the preceding lines:

CLOSE #140708355472384:c=1860,e=1512,dep=1,type=0,tim=31701605233
EXEC #140708354845632:c=68018,e=68006,p=0,cr=1276,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=31701605472
ERROR #140708354845632:err=29857 tim=31701605485

A search backwards up the trace file showed that cursor #140708354845632 was for the “drop tablespace” statement. Just above the ERROR and EXEC lines we see cursor #140708355472384 closing. It’s reasonably likely (though not 100% certain) that this is the cursor representing the last SQL statement that Oracle executed before raising the error, so searching backwards for “PARSING IN CURSOR #140708355472384” was my next step (and if that statement didn’t look promising I’d have searched backwards from the ERROR line to find the cursor number from the most recent “EXEC” line). In this case (with my data) the CLOSE took me to the following (cosmetically cleaned) SQL:

select null from  (
             select o.obj# from obj$ o, tab$ t      where t.ts#=:1 and t.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select o.obj# from obj$ o, tabpart$ tp      where tp.ts#=:1 and tp.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select o.obj# from obj$ o, tabsubpart$ tsp      where tsp.ts#=:1 and tsp.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select o.obj# from obj$ o, ind$ i      where i.ts#=:1 and i.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select o.obj# from obj$ o, indpart$ ip      where ip.ts#=:1 and ip.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select o.obj# from obj$ o, indsubpart$ isp      where isp.ts#=:1 and isp.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select o.obj# from obj$ o, clu$ c      where c.ts#=:1 and c.obj#=o.obj# and            (decode(bitand(o.flags, 16), 16, 1, 0)=1)    
union all    select i.obj# from ind$ i, tab$ t, tabpart$ tp      where tp.ts#=:1 and tp.bo# = t.obj# and t.obj#=i.bo# and i.type#=9    
union all    select i.obj# from ind$ i, tab$ t, tabcompart$ tcp,tabsubpart$ tsp      where tsp.ts#=:1 and tsp.pobj#=tcp.obj# and tcp.bo#=t.obj# and            t.obj#=i.bo# and i.type#=9    
union all    select i.obj# from ind$ i, tab$ t      where t.ts#=:1 and t.obj#=i.bo# and i.type#=9
)

Every single line of the UNION ALL is addressing the same tablespace by number (:1) looking for objects where bit 5 is set in the obj$.flags column (i.e. secondary object) or indexes where the ind$.type# is 9 (“cooperative index method”).
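
As an aside, the tablespace number that appears as the bind variable is easy to check – it’s the ts# column of v$tablespace:

select  ts#
from    v$tablespace
where   name = 'TEST_8K_ASSM'
;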

That UNION ALL looks like a good bet for being the code that Oracle uses to check if there are any objects that should block the drop, so I changed the bind variable to the tablespace number (from v$tablespace), changed the “select null” to “select obj#” and stuck the resulting query block into an IN subquery against dba_objects (this may not be the most efficient of queries):

select 
        secondary, owner, object_name, object_type, object_id
from    dba_objects
where   object_id in (
                select obj# from  (
                                  select o.obj# from obj$ o, tab$ t where t.ts#=6 and t.obj#=o.obj# and (decode(bitand(o.flags, 16), 16, 1, 0)=1)
                        union all ...
                        union all select i.obj# from ind$ i, tab$ t where t.ts#=6 and t.obj#=i.bo# and i.type#=9
                )
        )
order by secondary, owner, object_name
/

S OWNER           OBJECT_NAME                      OBJECT_TYPE              OBJECT_ID
- --------------- -------------------------------- ----------------------- ----------
N TEST_USER       SH_IX_HANDLE                     INDEX                       123956
N TEST_USER       SH_IX_NAME                       INDEX                       123944
Y TEST_USER       DR$SH_IX_HANDLE$I                TABLE                       123957
Y TEST_USER       DR$SH_IX_HANDLE$K                TABLE                       123960
Y TEST_USER       DR$SH_IX_HANDLE$KD               INDEX                       123966
Y TEST_USER       DR$SH_IX_HANDLE$KR               INDEX                       123967
Y TEST_USER       DR$SH_IX_HANDLE$U                TABLE                       123963
Y TEST_USER       DR$SH_IX_HANDLE$X                INDEX                       123965
Y TEST_USER       DR$SH_IX_NAME$I                  TABLE                       123945
Y TEST_USER       DR$SH_IX_NAME$K                  TABLE                       123948
Y TEST_USER       DR$SH_IX_NAME$KD                 INDEX                       123954
Y TEST_USER       DR$SH_IX_NAME$KR                 INDEX                       123955
Y TEST_USER       DR$SH_IX_NAME$U                  TABLE                       123951
Y TEST_USER       DR$SH_IX_NAME$X                  INDEX                       123953
Y TEST_USER       SYS_C0016528                     INDEX                       123952
Y TEST_USER       SYS_C0016536                     INDEX                       123964
Y TEST_USER       SYS_IL0000123945C00006$$         INDEX                       123947
Y TEST_USER       SYS_IL0000123957C00006$$         INDEX                       123959
Y TEST_USER       SYS_IOT_TOP_123949               INDEX                       123950
Y TEST_USER       SYS_IOT_TOP_123961               INDEX                       123962

20 rows selected.

This is the list I got when the base table and the Context indexes were all created in the tablespace I wanted to drop – notice that the “logical” indexes (the first two items, flagged as secondary=’N’) are reported along with all the physical objects relating to the context indexes.

When I created the indexes in the target tablespace but the base table in an alternative tablespace the query reported only the physical objects making up the Context indexes; when I created the base table in the target tablespace and the Context indexes in an alternative tablespace the query reported only the “logical” indexes.

Conclusion

As a last resort, when the simple queries you find in the manuals or on MOS (or elsewhere on the internet) don’t seem to supply all the answers, the strategy of finding and hacking the query that Oracle runs just before raising the error is a good generic method for getting a better handle on what’s causing the problem.

In this example of Oracle error ORA-29857 you can get a list of the objects (logical or physical or both) that are blocking a “drop tablespace” command and that makes it easier to find out what else you need to drop before you can drop the tablespace.

Bear in mind that I’ve just demonstrated the approach while experimenting with a couple of Context indexes, where the information I needed was available through a couple of ordinary data dictionary views – I only dived down into the internals because the data dictionary views weren’t giving the OP a complete answer.

I don’t know what was in the OP’s database to make this problem happen (it may simply be that the database was version 11.2.0.4 while mine was 19.11.0.0 and there’s a bug that’s been fixed since 11g), so the OP may find that after they’ve cleared all the domain indexes indicated by this query they may still see ORA-29857 and find that there’s another query that Oracle uses to check for other conditions (maybe those “from statistics type” secondary objects) that also needs to be hijacked and subverted to complete the job. I know that when I traced a successful “drop tablespace” there were 13 more statements after the one I’ve shown that looked as if they might be checking for the (non-)existence of other types of object before the tablespace could be dropped.

Update

The OP has reported back that following this strategy they’ve chased the problem through (for their system) three further sets of objects, with the following error messages:

ORA-23515: materialized views and/or their indices exist in the tablespace

ORA-25183: index-organized table top index segment is in a different tablespace

ORA-02449: unique/primary keys in table referenced by foreign keys

February 18, 2022

Index ITL Limit

Filed under: deadlocks,Indexing,Infrastructure,Locks,Oracle,trace files — Jonathan Lewis @ 5:02 pm GMT Feb 18,2022

Here’s a little script that I wrote more than 10 years ago when investigating some undesirable behaviour with indexes. I’ve just rediscovered it after seeing it mentioned in a comment to an old article that I had been prompted to revisit. This isn’t going to help you solve any specific problem, but it might give you some feel for how much work Oracle has to do to cater for efficient index maintenance.

The script is just a dirty little hack to crash a session by calling a procedure recursively until something breaks – in this case when all the ITL slots of an index leaf block are full and the block doesn’t split for the next insert (which was a little surprising in its own right).

rem
rem     Script:         itl_limit.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2010
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.1.0.6
rem  

create table itl_limit(n1 number) pctfree 0;
create unique index il_01 on itl_limit(n1) pctfree 0;

create or replace procedure recursive_itl(i_tx_count number)
as
        pragma autonomous_transaction;
begin
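        -- Each call opens a new autonomous transaction that stays open until
        -- the recursion unwinds, so there's one concurrent (uncommitted)
        -- transaction per level of recursion, all inserting into the same block.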
        if i_tx_count != 0 then
                insert into itl_limit values(200 - i_tx_count);
                recursive_itl(i_tx_count - 1);
                commit;
        end if;
end;
/

alter session set events '10046 trace name context forever, level 8';

execute recursive_itl(200);

alter system checkpoint;

alter session set events '10046 trace name context off';

prompt  ==========================================
prompt  If there is no index on the table then you
prompt  should see 169 rows in one block and 31 in
prompt  the other. But if there is an index there
prompt  should be no rows thanks to the rollback
prompt  caused by the error.
prompt  ==========================================

select 
        dbms_rowid.rowid_block_number(rowid), count(*) 
from 
        itl_limit
group by 
        dbms_rowid.rowid_block_number(rowid)
;

prompt  =================================
prompt  Try for a tree dump on the index
prompt  after which you can dump the root
prompt  block to see the ITL entries
prompt  =================================

column object_id new_value m_object_id

select  object_id, object_type, object_name
from    user_objects
where   object_name = 'IL_01'
/

alter session set events 'immediate trace name treedump level &m_object_id ';

If you comment out the creation of the index il_01 then the script completes very quickly (complaining, of course, about the attempt to do a treedump with a null level – ORA-49100: Failed to process event statement). Assuming you’re using an 8KB block size the rowid count query will show that you’ve got 169 rows in one block and 31 rows in the other – and if you dump the block with 169 rows you will find that (a) there’s loads of empty space in the block, and (b) the number of ITL entries has reached 169, and that’s the limiting factor that restricted the number of rows we could insert.

If you create the index then you’ll have to wait roughly 142 seconds for the procedure call to fail (with an ORA-00060: deadlock detected error) and the script to complete. And when you generate the tkprof output from the trace file you’ll find that most of the time is spent in the following statement:

INSERT INTO ITL_LIMIT
VALUES
(200 - :B1 )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    169      0.96     143.69         11      56324       4939         168
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      170      0.96     143.69         11      56324       4939         168

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 138     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  ITL_LIMIT (cr=5 pr=0 pw=0 time=192 us starts=1)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                        11        0.00          0.00
  enq: TX - allocate ITL entry                   48        5.01        142.49
********************************************************************************

We’ve executed 169 times, but reported only 168 rows, which suggests something went wrong on one insert. The 48 waits for “enq: TX - allocate ITL entry” invite further investigation, of course – so let’s get back to the raw trace file and find them.

grep -T -n  "WAIT.*alloc" or19_ora_15879.trc

 571   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 1000682 name|mode=1415053316 usn<<16 | slot=10420232 sequence=2 obj#=94939 tim=5285776687
 574   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 1016171 name|mode=1415053316 usn<<16 | slot=1310726 sequence=6209 obj#=94939 tim=5286793280
 577   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 1001580 name|mode=1415053316 usn<<16 | slot=10223624 sequence=2 obj#=94939 tim=5287795235
...
 709   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 4999483 name|mode=1415053316 usn<<16 | slot=11468804 sequence=2 obj#=94939 tim=5423905081
42381  :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 2999710 name|mode=1415053316 usn<<16 | slot=9633800 sequence=2 obj#=94939 tim=5426905028

All 48 waits occur after the 169th attempt to insert a row. Oracle rotates through 12 ITL slots waiting one second on each, then goes round the loop again waiting 2 seconds on each, then 4 seconds, then 5 seconds – except it doesn’t wait on the 12th ITL on the final loop, instead it reports “DEADLOCK DETECTED (ORA-00060)” in the trace file and dumps a Deadlock graph of the form:

Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00930008-00000002-0039DED3-00000000         44      49     X        18979      44      49           S  18979

Note the self-deadlock – the holder and waiter are the same session. After the deadlock graph we get the usual stack dump and after 48,000 lines of trace we see the message “Attempting to break deadlock by signaling ORA-00060” after which the session waits on the 12th ITL for 3 seconds and then the whole stack of autonomous transactions rolls back:

WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 2999710 name|mode=1415053316 usn<<16 | slot=9633800 sequence=2 obj#=94939 tim=5426905028
EXEC #139861052053576:c=429678,e=142567211,p=0,cr=504,cu=11,mis=0,r=0,dep=1,og=1,plh=0,tim=5427342894
ERROR #139861052053576:err=60 tim=5427342914
CLOSE #139861052053576:c=1,e=1,dep=1,type=3,tim=5427343141
XCTEND rlbk=1, rd_only=0, tim=5427343240
XCTEND rlbk=1, rd_only=0, tim=5427343336
...
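
(A quick sanity check on the arithmetic: that’s 12 + 12 + 12 + 11 + 1 = 48 waits, totalling 12 x 1 + 12 x 2 + 12 x 4 + 11 x 5 + 1 x 3 = 142 seconds – matching the 48 waits and 142.49 seconds of “enq: TX - allocate ITL entry” time in the tkprof summary, and the roughly 142 seconds you have to wait for the procedure call to fail.)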

My test script reports the object_id (not the data_object_id) of the index and does a treedump of it, which should show just a root block (which is also a leaf block) with no entries. The treedump will show something like:

----- begin tree dump
leaf: 0x400008b 67109003 (0: row:0.0 avs:3988)
----- end tree dump

Notice how the available space (avs) in this root/leaf block is only 3988 bytes rather than roughly 8,000 for an 8KB block size. That’s because the ITL area has taken advantage of its ability to grow to half the size of the block, and once it has grown it doesn’t shrink (Note: there is a hard limit of 255 which will only become apparent with larger block sizes – and if you want to test that you’ll have to edit my script to change both occurrences of the constant 200 to (at least) 256).
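
(A rough check of the space arithmetic, assuming the usual 24 bytes per ITL slot: 169 slots account for 4,056 bytes; subtract that, plus roughly 150 bytes of block and header overhead, from the 8,192 byte block and you arrive at the 3,988 bytes reported as avs.)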

The data block address (DBA) of the root/leaf block is given in both hexadecimal and decimal; but if you can’t work out which file and block number this represents (I happen to know it’s file 16, and I can convert 0x8b to 139 decimal in my head) then you can check for the segment header block and add one to the block number; or use calls to the dbms_utility package to translate the DBA before doing a block dump:

SQL> select
  2          dbms_utility.data_block_address_file(67109003) file#,
  3          dbms_utility.data_block_address_block(67109003) block#
  4  from
  5          dual
  6  /

     FILE#     BLOCK#
---------- ----------
        16        139

1 row selected.

SQL> alter system flush buffer_cache;

System altered.

SQL> alter system dump datafile 16 block 139;

System altered.

Here’s the section of the resulting trace file that shows you how bad things had got before the error was raised:

 seg/obj: 0x172dd  csc:  0x0000000001311be1  itc: 169  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x4000088 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000001311be1
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x04   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
...
0xa8   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0xa9   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
Leaf block dump

A final comment: if you’re wondering why the table (without an index) can get to 169 rows while the index achieves 168 rows and fails on the 169th – there’s one ITL in the index ITL area that is reserved for the “service ITL” (see answer 2), the one that Oracle uses to register and hold the block on an “index node split”.

Footnote

One of the entertaining things about poking around with Oracle is the way that you discover unexpected details – some of which are side-effects that you realise you should have predicted, some of which are just a little bizarre. In this case the easter egg in the trace file was the following statement (reproduced from the tkprof output, and made highly visible because I used the sort=execnt option):

update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,
  undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13
where
 us#=:1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      336      0.00       0.00          0          0          0           0
Execute    336      0.02       0.02          3        336        531         336
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      672      0.03       0.03          3        336        531         336

The predictable side effect was that Oracle was going to create a number of new undo segments: as a standard detail of minimising contention Oracle tries to give every concurrent transaction (which means every one of my autonomous transactions) its own undo segment.

The surprising detail was the /*+ rule */ hint – still there in 19.11.0.0. I guess it was there to ensure that Oracle executed the update through an index access – though a suitable /*+ index() */ hint would possibly be more appropriate for almost every version of Oracle.

There were a number of other segment/undo related statements that operated 140+ times in the course of this test – which did make me wonder if a very busy OLTP system (doing lots of tiny, concurrent transactions) with a lot of undo segments could spend a significant amount of its time managing undo segments – but that’s a thought for another day. Another “look at that some time” thing that appeared was the large number of selects and inserts to a table called undohist$.

August 31, 2020

Tracing Errors

Filed under: Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 10:16 am BST Aug 31,2020

This is a little lesson in trouble-shooting. It assumes you have the privilege to generate and edit trace files, and all I’m going to do is show how I worked out the answer to a fairly simple question that appeared recently on the Oracle Developer Community forum in a thread with the title Cannot drop table after start dropping unused columns checkpoint.

I have a table t1 which is reasonably large (1M rows) with a column v30, and I’ve issued the command:

rem
rem     Script:         drop_column.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2003
rem

alter table t1 set unused column v30;

After this I’ve issued another command, pressed return, and immediately hit ctrl-C – I’ve got a brief window for this interrupt as the command is going to generate roughly 230MB of redo. If you want to try the experiment it might take a couple of attempts to get the timing right.

alter table t1
        drop unused columns
        checkpoint 1000
/
{ctrl-C}

Then I’ve tried to drop the table with the following result:

drop table t1
           *
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-12986: columns in partially dropped state. Submit ALTER TABLE DROP COLUMNS CONTINUE

This emulates the problem that appeared on the forum but the OP didn’t really want to issue the “continue” command because their table was “large” and the drop had been running for 24 hours when it was interrupted. It’s worth noting that four columns had been specified as unused, which means the drop command was probably generating roughly 1KB of redo per row. It’s also worth mentioning that the table was 600 columns wide – so there may have been a few performance side effects due to the multiple (minimum 3) row-pieces per row and row-chaining.

Ignoring any questions about the possible impact of having 600 columns, the key questions in this case are:

  • Why isn’t it possible to drop the table (the manuals suggest it should be possible at this point)?
  • Is there a way to bypass the problem?

This is a repeatable error – we can try to drop the table as many times as we like and we will get the same error reported in fractions of a second, so an obvious strategy is to enable tracing and see if the trace file can tell us anything about why the error is happening. This is a particularly sensible strategy to follow since error ORA-00604 is telling us that there’s something wrong in the recursive SQL, which means there’s a very good chance that we will actually find an SQL statement in the trace file that is the recursive statement triggering the error.

So, enable sql_trace (or set event 10046), or do whatever you like to do to enable basic tracing (no need for anything above level 1 just yet); try to execute the drop; then search the trace file for the word ERROR at the start of a line (“^ERROR”).
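
For example, a minimal sketch:

alter session set sql_trace true;
drop table t1;
alter session set sql_trace false;

Here’s what I found as the first match in my trace file: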

ERROR #139987889121800:err=12986 tim=424276176462

Note that the err= value is the 12986 that was reported as the error under the ORA-00604 error. Sometimes it’s necessary to search backwards in the trace file until you find the matching cursor number (#139987889121800), but in this case it was already visible just a few lines further up the file. So here’s the fragment of the file around that ERROR line:


PARSING IN CURSOR #139987889121800 len=72 dep=1 uid=0 oct=15 lid=0 tim=424276175961 hv=1894278903 ad='75c06e38' sqlid='4wv7gq1sfhtrr'
ALTER TABLE "TEST_USER"."T1" RENAME TO "BIN$rhxBELdQGMXgUwEAAH93eQ==$0"
END OF STMT
PARSE #139987889121800:c=2896,e=3035,p=0,cr=4,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=424276175960
EXEC #139987889121800:c=186,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=424276176418
ERROR #139987889121800:err=12986 tim=424276176462
CLOSE #139987889121800:c=10,e=9,dep=1,type=0,tim=424276176712
EXEC #139987891478792:c=34686,e=35859,p=0,cr=73,cu=12,mis=0,r=0,dep=0,og=1,plh=0,tim=424276176774
ERROR #139987891478792:err=604 tim=424276176808

The error has come from an SQL statement that is trying to rename my table to some weird and wonderful name which starts with the characters “BIN$” – that’s a “drop” command trying to move a table into the recycle bin by renaming it – and you’re not allowed to rename a table that is in a partially dropped state. So that’s why we get the error; and the obvious way to bypass it is: “drop table t1 purge;” – which works.

You’ll notice that I’ve included a couple of lines after the first ERROR. This is to show you the line that generated the ORA-00604 (err=604) error. It comes from cursor #139987891478792, and searching backwards up the file for that cursor number I get to:

PARSING IN CURSOR #139987891478792 len=13 dep=0 uid=107 oct=12 lid=107 tim=424276140765 hv=202649412 ad='7f517dd3fe00' sqlid='d47kdkn618bu4'
drop table t1
END OF STMT

That’s not a surprise, of course, but it is important to cross-check that you haven’t been chasing the wrong error. There are some cases where the Oracle code does something to see if an error will occur but has an exception handler that means the error doesn’t end up reaching the application, so you do need to check that the error you found first was the one that floated up to the top of the stack.

Footnote

From Oracle 12.2.0.1 you could arrange to read your own trace file – while your session is connected – through the dynamic performance view v$diag_trace_file_contents.
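
For example – a minimal sketch, assuming the session can pick out its own trace file name through v$diag_info:

select  payload
from    v$diag_trace_file_contents
where   (adr_home, trace_filename) = (
                select
                        substr(value, 1, instr(value, '/trace/') - 1),
                        substr(value, instr(value, '/', -1) + 1)
                from    v$diag_info
                where   name = 'Default Trace File'
        )
order by
        line_number
;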

March 12, 2020

Sequence Costs

Filed under: Bugs,Oracle,Performance,trace files,Troubleshooting — Jonathan Lewis @ 8:18 pm GMT Mar 12,2020

[Update: make sure you read the comments – the behaviour described here is a bug, related to the setting of audit_trail, introduced in 19c, fixed by 19.11 and fixed in 21.3]

You’re probably aware of the “identity” construct that appeared in 12.1 and uses Oracle’s sequence mechanism to model the Autonumber or Identity columns that other databases have. A posting from Clay Jackson on the Oracle-L list server suggests that something about their code path has introduced a surprising overhead in 19c … when you’re not using them.

The following code is a slightly modified version of a test case that Clay Jackson posted to demonstrate a strange difference in performance between 12.2 and 19.3

rem
rem     Script:         19c_sequence_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2020
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem

drop table test_tab2 purge;
drop table test_tab1 purge;

drop sequence test_seq1;
drop sequence test_seq2;

create sequence test_seq1;
create sequence test_seq2;

create table test_tab1 (
   id number primary key,
   c1 varchar2(1000)
);

create table test_tab2 (
   id number,
   t1_id number,
   c2 varchar2(1000),
   constraint test_tab2_fk foreign key (t1_id)
        references test_tab1 (id) on delete cascade
);

spool 19c_sequence_cost.lst

set feedback off
set pagesize 0
set serveroutput off

execute snap_my_stats.start_snap
alter session set events '10046 trace name context forever';

prompt  ==============================================================================
prompt  start_1000.sql cascades to start_1.sql
prompt  start_1.sql holds:
prompt
prompt  insert into test_tab1 values (test_seq1.nextval,'hello');
prompt  insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
prompt  commit;
prompt
prompt  ==============================================================================

start start_1000

set serveroutput on
set pagesize 60
alter session set events '10046 trace name context off';
execute snap_my_stats.end_snap

spool off

I’ve got a couple of tables with columns that I plan to generate from sequences, and I’ve got a referential integrity constraint between those tables. I’ll be using nextval from one sequence to populate the first table, then use currval from the same sequence for the foreign key and nextval from the other sequence as a unique identifier for the second table.

I’ve used my “repeater” code to run a short script from the SQL*Plus prompt 1,000 times (as suggested by Clay Jackson in his posting). I’ve also added lines to enable SQL_trace at the basic level, and taken a snapshot of the session activity stats.
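
The repeater itself is nothing subtle – this sketch shows one possible structure (an assumption on my part; all that matters is that start_1.sql ends up being executed 1,000 times):

rem     start_1000.sql  -- ten lines of:  start start_100
rem     start_100.sql   -- ten lines of:  start start_10
rem     start_10.sql    -- ten lines of:  start start_1
rem
rem     start_1.sql     -- the script shown in the prompts above:

insert into test_tab1 values (test_seq1.nextval,'hello');
insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
commit;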

Just as Clay Jackson had reported – 19.3.0.0 took significantly longer than 12.2.0.1 to complete the 1,000 calls to the script. (Though in absolute terms we’re only talking fractions of a second for a small single user test.) Examination of the stats – which prompted me to add the 10046 trace and repeat – made it very clear where the difference lay. Here are a few figures picked from the session activity stats:

Version    Statistic                        Value
========   ========================         =====
12.2.0.1   CPU used by this session            53
19.3.0.0   CPU used by this session           114

12.2.0.1   recursive cpu usage                 18
19.3.0.0   recursive cpu usage                 44

12.2.0.1   recursive calls                  1,182
19.3.0.0   recursive calls                  7,215

12.2.0.1   execute count                    2,137
19.3.0.0   execute count                    8,151

Clearly it’s worth finding out what was happening in those 6,000 extra recursive calls. And this is what the 19.3 trace file (after passing through tkprof … sort=execnt) showed:


select count(1)
from
 sys.idnseq$ where seqobj# = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   6000      0.07       0.09          0          0          0           0
Fetch     6000      0.03       0.04          0       6000          0        6000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    12001      0.10       0.14          0       6000          0        6000

If you can’t guess what the table idnseq$ is about, the following definition appears in $ORACLE_HOME/rdbms/admin/dcore.bsq:

create table idnseq$     /* stores table identity column to sequence mapping */
( obj#         number not null,                       /* table object number */
  intcol#      number not null,                    /* identity column number */
  seqobj#      number not null,                    /* sequence object number */
  startwith    number not null,                   /* sequence starting value */
  supplemental log group idnseq$_log_grp (obj#, intcol#, seqobj#) always
)
/

We’re not using identity columns in this test, but every time we run the script we do six checks against this data dictionary table to see (presumably) if we’re using a sequence that is internally associated with an identity column. The requirement doesn’t seem completely unreasonable – but it’s a bit surprising that we do every check twice. (Note – we have to check test_seq1 for both statements but test_seq2 only for the second statement, which is how we get 6,000 executions: 2 * (2 checks for test_seq1 + 1 check for test_seq2) * 1,000 executions of the script.)

The doubling looks like a bug, and you have to wonder why a suitable piece of information isn’t kept in the dictionary cache anyway to allow Oracle to avoid executing the recursive statement. I should point out that the surprisingly large impact is visible because I’ve executed 2,000 separate statements from the client side; mechanisms like array inserts and pl/sql loops might only have to do this check once for each array or pl/sql call. (Tests of this hypothesis are left as an exercise for the interested reader.)
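
If you want to test the hypothesis, a sketch of the pl/sql variant (assuming the tables and sequences created by the script above) would be:

begin
        for i in 1..1000 loop
                insert into test_tab1 values (test_seq1.nextval, 'hello');
                insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
                commit;
        end loop;
end;
/

Comparing the “execute count” and “recursive calls” session statistics (or the idnseq$ line in the tkprof output) with the figures above would show how much of the overhead disappears.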

October 3, 2019

Trace Files

Filed under: 12c,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:38 pm BST Oct 3,2019

A recent blog note by Martin Berger about reading trace files in 12.2 popped up in my twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the tracefile generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user to allow the user to create the view in their own schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem
rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem
rem     Last tested
rem             12.2.0.1

create or replace view my_trace_file as
select 
        *
from 
        v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from 
                        v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;


create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, the privileges you could grant to a user from SYS so that they could create their own view:


grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;
and optionally one of:
        grant select on v_$mystat to some_user;
        grant execute on dbms_support to some_user;
                but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from 12.2.0.1 as a demo:


alter system flush shared_pool;
alter session set sql_trace true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

select
        line_number,
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
/

alter session set sql_trace false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (execution plans seems like a likely guess) which happens to pick out all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text “ id=1 ” (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:


LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'


         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'


         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'


         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'


        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'


        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.
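
For example, a quick sketch to see which source files and functions appear in a trace before narrowing down to one of them:

select  file_name, function_name, count(*)
from    my_trace_file
group by
        file_name, function_name
order by
        count(*) desc
/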


March 1, 2019

Hash Optimisation

Filed under: Oracle,Parallel Execution,Performance,trace files — Jonathan Lewis @ 4:39 pm GMT Mar 1,2019

Franck Pachot did an interesting presentation at the OBUG (Belgium user group) Tech Days showing how to use one of the O/S debug/trace tools to step through the function calls that Oracle made during different types of joins. This prompted me to ask him a question about a possible optimisation of hash joins as follows:

The hash join operation creates an in-memory hash table from the rowsource produced by its first child operation then probes the hash table with rows from the row source produced by the second child operation; but if there are no rows in the first row source then there’s no need to acquire rows from the second row source, so Oracle doesn’t call the second child operation.

Does the hash join adopt a “symmetrical” strategy – starting the second child operation prematurely to find the first couple of rows before executing the first child operation and building the in-memory hash table ?

Franck got back to me before the end of the event with the answer – which was “no”. (This makes sense, of course, because if there’s no data in the second rowsource – which is supposed to be the “big” rowsource – the peek itself could mean scanning the whole rowsource: a lot of work that you wouldn’t have needed to do if you’d checked the first rowsource properly first.) (Thought – might Oracle look for the first row from the first rowsource, then check the second rowsource for a first row, then finish the first rowsource and build? How difficult do you want to make your life?)

So Franck got me an answer by looking at function calls – but I’d only asked him because I thought he might have done the experiment already and might be able to give me an answer immediately, rather than having to waste his valuable Belgian Beer Drinking time doing tests for me. (And I’ve never got around to playing with strace, ptrace, dtrace, truss, et al., so I didn’t fancy doing the tracing for myself, of course.) But this particular question doesn’t need any examination of function calls to get an answer: an extended trace file would be sufficient. Here’s how to start:


rem
rem     Script:         hash_join_opt.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2019
rem
rem     Last tested
rem             11.2.0.4
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,20)                  n20,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table t2
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,10)                  n10,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );
end;
/

set serveroutput off

alter system flush buffer_cache;
execute dbms_lock.sleep(2)

alter session set statistics_level = all;
alter session set events '10046 trace name context forever, level 8';

select
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n20 = 19
and     t2.id = t1.id
and     t2.n10 = 7.5
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set events '10046 trace name context off';
alter session set statistics_level = typical;

I’ve created a pair of tables with similar volumes of data and then executed a query that should do a hash join between the tables. The t2 table will return no rows since the predicate on n10 is asking for a value that doesn’t exist but the optimizer will think it will supply 10% of the table since I’ve avoided creating a histogram on the column and the num_distinct for the n10 column is 10. You’ll notice that I’ve flushed the buffer_cache (with a couple of seconds pause, just in case) before executing the query. Here’s the execution plan with rowsource execution stats:


SQL_ID  bscu1r7cu36ur, child number 0
-------------------------------------
select  t1.v1, t2.v1 from  t1, t2 where  t1.n20 = 19 and t2.id = t1.id
and t2.n10 = 7.5

Plan hash value: 1838229974

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      0 |00:00:00.04 |     352 |    348 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    500 |      0 |00:00:00.04 |     352 |    348 |  1519K|  1519K| 1487K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    500 |    500 |00:00:00.03 |     176 |    174 |       |       |          |
|*  3 |   TABLE ACCESS FULL| T2   |      1 |   1000 |      0 |00:00:00.01 |     176 |    174 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   2 - filter("T1"."N20"=19)
   3 - filter("T2"."N10"=7.5)

As you can see, the tablescan of t2 returned no rows so the hash join returned no rows, but the stats show 174 blocks read from both t1 and t2. The fact that we read 174 blocks from t2 doesn’t mean we didn’t do a pre-emptive check, of course, as those reads could (by the hypothesis) have been the reads required by the pre-emptive check. The fact that we did a complete scan of both tables, though, tells us that the reads on t2 (probably) weren’t from a pre-emptive check (if they were then Oracle shouldn’t have scanned the whole of t1).

We can get extra corroborative evidence though when we check the trace file – which shows us the following waits:

PARSING IN CURSOR #139927916187824 len=88 dep=0 uid=62 oct=3 lid=62 tim=1551445144496821 hv=3651246935 ad='97353e10' sqlid='bscu1r7cu36ur'
select
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n20 = 19
and     t2.id = t1.id
and     t2.n10 = 7.5
END OF STMT

PARSE #139927916187824:c=3000,e=2760,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1838229974,tim=1551445144496811
EXEC #139927916187824:c=0,e=78,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1838229974,tim=1551445144497155
WAIT #139927916187824: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=351199 tim=1551445144497224

WAIT #139927916187824: nam='db file sequential read' ela= 634 file#=5 block#=130 blocks=1 obj#=351198 tim=1551445144498386
WAIT #139927916187824: nam='db file scattered read' ela= 397 file#=5 block#=131 blocks=5 obj#=351198 tim=1551445144498971
WAIT #139927916187824: nam='db file scattered read' ela= 1752 file#=5 block#=136 blocks=8 obj#=351198 tim=1551445144502411
WAIT #139927916187824: nam='db file scattered read' ela= 426 file#=5 block#=145 blocks=7 obj#=351198 tim=1551445144504056
WAIT #139927916187824: nam='db file scattered read' ela= 590 file#=5 block#=152 blocks=8 obj#=351198 tim=1551445144505655
WAIT #139927916187824: nam='db file scattered read' ela= 8063 file#=5 block#=161 blocks=7 obj#=351198 tim=1551445144514574
WAIT #139927916187824: nam='db file scattered read' ela= 402 file#=5 block#=168 blocks=8 obj#=351198 tim=1551445144515741
WAIT #139927916187824: nam='db file scattered read' ela= 803 file#=5 block#=177 blocks=7 obj#=351198 tim=1551445144517495
WAIT #139927916187824: nam='db file scattered read' ela= 423 file#=5 block#=184 blocks=8 obj#=351198 tim=1551445144518743
WAIT #139927916187824: nam='db file scattered read' ela= 422 file#=5 block#=193 blocks=7 obj#=351198 tim=1551445144520696
WAIT #139927916187824: nam='db file scattered read' ela= 266 file#=5 block#=200 blocks=8 obj#=351198 tim=1551445144521460
WAIT #139927916187824: nam='db file scattered read' ela= 307 file#=5 block#=209 blocks=7 obj#=351198 tim=1551445144522249
WAIT #139927916187824: nam='db file scattered read' ela= 313 file#=5 block#=216 blocks=8 obj#=351198 tim=1551445144523166
WAIT #139927916187824: nam='db file scattered read' ela= 247 file#=5 block#=225 blocks=7 obj#=351198 tim=1551445144523927
WAIT #139927916187824: nam='db file scattered read' ela= 315 file#=5 block#=232 blocks=8 obj#=351198 tim=1551445144524646
WAIT #139927916187824: nam='db file scattered read' ela= 326 file#=5 block#=241 blocks=7 obj#=351198 tim=1551445144525565
WAIT #139927916187824: nam='db file scattered read' ela= 186 file#=5 block#=248 blocks=8 obj#=351198 tim=1551445144526255
WAIT #139927916187824: nam='db file scattered read' ela= 341 file#=5 block#=258 blocks=55 obj#=351198 tim=1551445144527363
----------------
WAIT #139927916187824: nam='db file sequential read' ela= 201 file#=5 block#=386 blocks=1 obj#=351199 tim=1551445144531165
WAIT #139927916187824: nam='db file scattered read' ela= 221 file#=5 block#=387 blocks=5 obj#=351199 tim=1551445144531492
WAIT #139927916187824: nam='db file scattered read' ela= 194 file#=5 block#=392 blocks=8 obj#=351199 tim=1551445144531750
WAIT #139927916187824: nam='db file scattered read' ela= 301 file#=5 block#=401 blocks=7 obj#=351199 tim=1551445144532271
WAIT #139927916187824: nam='db file scattered read' ela= 272 file#=5 block#=408 blocks=8 obj#=351199 tim=1551445144532756
WAIT #139927916187824: nam='db file scattered read' ela= 258 file#=5 block#=417 blocks=7 obj#=351199 tim=1551445144533218
WAIT #139927916187824: nam='db file scattered read' ela= 242 file#=5 block#=424 blocks=8 obj#=351199 tim=1551445144533704
WAIT #139927916187824: nam='db file scattered read' ela= 232 file#=5 block#=433 blocks=7 obj#=351199 tim=1551445144534125
WAIT #139927916187824: nam='db file scattered read' ela= 213 file#=5 block#=440 blocks=8 obj#=351199 tim=1551445144534506
WAIT #139927916187824: nam='db file scattered read' ela= 241 file#=5 block#=449 blocks=7 obj#=351199 tim=1551445144534914
WAIT #139927916187824: nam='db file scattered read' ela= 221 file#=5 block#=456 blocks=8 obj#=351199 tim=1551445144535258
WAIT #139927916187824: nam='db file scattered read' ela= 311 file#=5 block#=465 blocks=7 obj#=351199 tim=1551445144536518
WAIT #139927916187824: nam='db file scattered read' ela= 215 file#=5 block#=472 blocks=8 obj#=351199 tim=1551445144536906
WAIT #139927916187824: nam='db file scattered read' ela= 164 file#=5 block#=481 blocks=7 obj#=351199 tim=1551445144537343
WAIT #139927916187824: nam='db file scattered read' ela= 160 file#=5 block#=488 blocks=8 obj#=351199 tim=1551445144537624
WAIT #139927916187824: nam='db file scattered read' ela= 253 file#=5 block#=497 blocks=7 obj#=351199 tim=1551445144538041
WAIT #139927916187824: nam='db file scattered read' ela= 252 file#=5 block#=504 blocks=8 obj#=351199 tim=1551445144538456
WAIT #139927916187824: nam='db file scattered read' ela= 947 file#=5 block#=514 blocks=55 obj#=351199 tim=1551445144539690

FETCH #139927916187824:c=25996,e=43227,p=348,cr=352,cu=0,mis=0,r=0,dep=0,og=1,plh=1838229974,tim=1551445144540502

I’ve inserted a few gaps into this section of the trace file but haven’t deleted any lines. As you can see there’s a series of reads for obj# 351198 (table t1), followed by a series of reads of obj# 351199. We don’t do anything subtle like peeking at t1, switching to peek at t2, then continuing with t1: it’s a simple end to end brute force scan of each table in turn.

But there’s more …

Before I created the test above I decided to check whether I had already done the test some time in the past and written about it. A search through my blog didn’t turn up any notes about this question, but it did uncover the following footnote to an earlier posting about hash joins:

taking a closer look at the set of trace files generated in the broadcast test I discovered that the first set of slaves start their parallel tablescan of t1 but stop after just one read from each slave, then the second set of slaves scans and builds the hash table from t2 before calling for further data from t1.

Naturally I included a link to the footnote in the draft notes for this blog entry and promptly forgot about writing the blog note until a couple of days ago, when Chinar Aliyev produced a note on the hash join buffered operation in which he had included an execution plan similar to the broadcast plan from my earlier blog note. So I thought I’d take another look at it, because it looks like some versions of the parallel hash join can do exactly the pre-emptive test that the serial execution plan doesn’t.

So, here’s a query to run against the same data set, and the resulting execution plan – pulled from memory after executing the query:


select
        /*+
                parallel(2)
                leading(t1 t2)
                use_hash(t2)
                pq_distribute(t2 none broadcast)
        */
        t1.v1, t2.v1
--      t1.v1, count(*)
from
        t1, t2
where
        t1.n20 = 19
and     t2.id = t1.id
and     t2.n10 = 7
;

----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |        |      |            |      0 |00:00:00.22 |      10 |      2 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |        |      |            |      0 |00:00:00.22 |      10 |      2 |       |       |          |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 |      0 |    500 |  Q1,01 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN             |          |      1 |    500 |  Q1,01 | PCWP |            |      0 |00:00:00.12 |     128 |    173 |  1483K|  1483K|     2/0/0|
|   4 |     PX BLOCK ITERATOR    |          |      2 |    500 |  Q1,01 | PCWC |            |    500 |00:00:00.03 |     248 |    173 |       |       |          |
|*  5 |      TABLE ACCESS FULL   | T1       |     25 |    500 |  Q1,01 | PCWP |            |    500 |00:00:00.02 |     248 |    173 |       |       |          |
|   6 |     BUFFER SORT          |          |      2 |        |  Q1,01 | PCWC |            |   2000 |00:00:00.07 |       0 |      0 |   108K|   108K|     2/0/0|
|   7 |      PX RECEIVE          |          |      2 |   1000 |  Q1,01 | PCWP |            |   2000 |00:00:00.06 |       0 |      0 |       |       |          |
|   8 |       PX SEND BROADCAST  | :TQ10000 |      0 |   1000 |  Q1,00 | P->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   9 |        PX BLOCK ITERATOR |          |      2 |   1000 |  Q1,00 | PCWC |            |   1000 |00:00:00.05 |     248 |    173 |       |       |          |
|* 10 |         TABLE ACCESS FULL| T2       |     25 |   1000 |  Q1,00 | PCWP |            |   1000 |00:00:00.04 |     248 |    173 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Now the rule for reading a DFO tree in a parallel plan is to follow the table queues (a.k.a. DFO numbers / TQ numbers) – but in this case we have a bit of a mixture as the table scan of t1 isn’t used as the input to a distribution so it’s hard to know whether the first physical activity is supposed to be the tablescan of t2 (feeding the zeroth table queue) or the tablescan of t1 which doesn’t feed a table queue until after the hash join.

Given that t2 is being broadcast we should probably expect to see the first set of parallel query slaves starting first with a scan of “random” sections of t1 to build an in-memory hash table, and then see the second set of parallel query slaves scanning t2 and broadcasting the results to the first set of slaves (every slave gets a copy of every row) to do the probe. We can try to confirm this by looking at the trace files. First we extract “db file scattered read” lines from the four trace files and then we sort them by timestamp.
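
(A sketch of the mechanics, in case you want to reproduce it – assuming the slave trace files follow the test_p00N_{pid}.trc naming that you can see in the output below: the extract is a simple grep across the four files, and the sorted list of timestamps further down can be generated by isolating the obj#/tim= pairs and sorting on the timestamp.)

# extract - grep prefixes each line with the name of the file it came from
grep "db file scattered read" test_p00*.trc

# reduce each line to "tim=... obj#=..." and sort by timestamp
grep -h "db file scattered read" test_p00*.trc |
sed 's/.*\(obj#=[0-9]*\) \(tim=[0-9]*\)$/\2 \1/' |
sort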

test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 677 file#=5 block#=220 blocks=4 obj#=351218 tim=1551450113700076
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 214 file#=5 block#=225 blocks=3 obj#=351218 tim=1551450113701131
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 238 file#=5 block#=161 blocks=7 obj#=351218 tim=1551450113702026
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 318 file#=5 block#=177 blocks=6 obj#=351218 tim=1551450113703464
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 371 file#=5 block#=190 blocks=2 obj#=351218 tim=1551450113705168
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 227 file#=5 block#=193 blocks=5 obj#=351218 tim=1551450113705690
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 225 file#=5 block#=138 blocks=6 obj#=351218 tim=1551450113706850
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 487 file#=5 block#=131 blocks=5 obj#=351218 tim=1551450113708512
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 228 file#=5 block#=136 blocks=2 obj#=351218 tim=1551450113709161
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 219 file#=5 block#=294 blocks=7 obj#=351218 tim=1551450113710099
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 150 file#=5 block#=184 blocks=6 obj#=351218 tim=1551450113711398
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 229 file#=5 block#=273 blocks=7 obj#=351218 tim=1551450113712456
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 223 file#=5 block#=280 blocks=7 obj#=351218 tim=1551450113713829
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 253 file#=5 block#=168 blocks=7 obj#=351218 tim=1551450113715124
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 1014 file#=5 block#=308 blocks=5 obj#=351218 tim=1551450113717046
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 266 file#=5 block#=235 blocks=5 obj#=351218 tim=1551450113718684
test_p000_5963.trc:WAIT #140673176773096: nam='db file scattered read' ela= 208 file#=5 block#=241 blocks=2 obj#=351218 tim=1551450113719321
----
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 326 file#=5 block#=287 blocks=7 obj#=351218 tim=1551450113700003
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 147 file#=5 block#=198 blocks=2 obj#=351218 tim=1551450113701394
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 72 file#=5 block#=200 blocks=5 obj#=351218 tim=1551450113701840
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 459 file#=5 block#=153 blocks=7 obj#=351218 tim=1551450113703442
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 308 file#=5 block#=259 blocks=7 obj#=351218 tim=1551450113705132
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 313 file#=5 block#=266 blocks=7 obj#=351218 tim=1551450113706540
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 304 file#=5 block#=243 blocks=5 obj#=351218 tim=1551450113707925
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 219 file#=5 block#=248 blocks=2 obj#=351218 tim=1551450113708505
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 590 file#=5 block#=228 blocks=4 obj#=351218 tim=1551450113709705
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 178 file#=5 block#=232 blocks=3 obj#=351218 tim=1551450113710270
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 142 file#=5 block#=205 blocks=3 obj#=351218 tim=1551450113711046
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 98 file#=5 block#=209 blocks=4 obj#=351218 tim=1551450113711457
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 179 file#=5 block#=213 blocks=3 obj#=351218 tim=1551450113712308
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 296 file#=5 block#=216 blocks=4 obj#=351218 tim=1551450113712967
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 188 file#=5 block#=301 blocks=7 obj#=351218 tim=1551450113714238
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 207 file#=5 block#=250 blocks=6 obj#=351218 tim=1551450113715393
test_p001_5967.trc:WAIT #140483692405224: nam='db file scattered read' ela= 567 file#=5 block#=146 blocks=6 obj#=351218 tim=1551450113717686
----
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 260 file#=5 block#=529 blocks=7 obj#=351219 tim=1551450113698677
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 196 file#=5 block#=522 blocks=7 obj#=351219 tim=1551450113722013
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 145 file#=5 block#=564 blocks=5 obj#=351219 tim=1551450113723239
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 317 file#=5 block#=491 blocks=5 obj#=351219 tim=1551450113724578
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 128 file#=5 block#=497 blocks=2 obj#=351219 tim=1551450113725217
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 495 file#=5 block#=402 blocks=6 obj#=351219 tim=1551450113726250
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 168 file#=5 block#=394 blocks=6 obj#=351219 tim=1551450113727928
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 249 file#=5 block#=424 blocks=7 obj#=351219 tim=1551450113730384
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 120 file#=5 block#=476 blocks=4 obj#=351219 tim=1551450113731225
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 485 file#=5 block#=481 blocks=3 obj#=351219 tim=1551450113732064
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 594 file#=5 block#=387 blocks=5 obj#=351219 tim=1551450113734168
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 323 file#=5 block#=392 blocks=2 obj#=351219 tim=1551450113735040
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 121 file#=5 block#=506 blocks=6 obj#=351219 tim=1551450113736253
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 285 file#=5 block#=440 blocks=6 obj#=351219 tim=1551450113739084
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 345 file#=5 block#=433 blocks=6 obj#=351219 tim=1551450113740940
----
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 261 file#=5 block#=550 blocks=7 obj#=351219 tim=1551450113698687
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 155 file#=5 block#=484 blocks=4 obj#=351219 tim=1551450113721738
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 131 file#=5 block#=488 blocks=3 obj#=351219 tim=1551450113722260
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 143 file#=5 block#=557 blocks=7 obj#=351219 tim=1551450113723245
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 138 file#=5 block#=454 blocks=2 obj#=351219 tim=1551450113724545
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 124 file#=5 block#=456 blocks=5 obj#=351219 tim=1551450113724954
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 243 file#=5 block#=469 blocks=3 obj#=351219 tim=1551450113725947
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 523 file#=5 block#=472 blocks=4 obj#=351219 tim=1551450113726745
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 178 file#=5 block#=409 blocks=7 obj#=351219 tim=1551450113727781
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 230 file#=5 block#=543 blocks=7 obj#=351219 tim=1551450113729470
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 488 file#=5 block#=515 blocks=7 obj#=351219 tim=1551450113731618
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 356 file#=5 block#=461 blocks=3 obj#=351219 tim=1551450113733840
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 185 file#=5 block#=465 blocks=4 obj#=351219 tim=1551450113735195
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 284 file#=5 block#=536 blocks=7 obj#=351219 tim=1551450113736172
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 204 file#=5 block#=499 blocks=5 obj#=351219 tim=1551450113737957
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 306 file#=5 block#=504 blocks=2 obj#=351219 tim=1551450113738871
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 281 file#=5 block#=446 blocks=2 obj#=351219 tim=1551450113739710
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 158 file#=5 block#=449 blocks=5 obj#=351219 tim=1551450113740159
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 99 file#=5 block#=417 blocks=7 obj#=351219 tim=1551450113741709


tim=1551450113698677 obj#=351219
tim=1551450113698687 obj#=351219
----
tim=1551450113700003 obj#=351218
tim=1551450113700076 obj#=351218
tim=1551450113701131 obj#=351218
tim=1551450113701394 obj#=351218
tim=1551450113701840 obj#=351218
tim=1551450113702026 obj#=351218
tim=1551450113703442 obj#=351218
tim=1551450113703464 obj#=351218
tim=1551450113705132 obj#=351218
tim=1551450113705168 obj#=351218
tim=1551450113705690 obj#=351218
tim=1551450113706540 obj#=351218
tim=1551450113706850 obj#=351218
tim=1551450113707925 obj#=351218
tim=1551450113708505 obj#=351218
tim=1551450113708512 obj#=351218
tim=1551450113709161 obj#=351218
tim=1551450113709705 obj#=351218
tim=1551450113710099 obj#=351218
tim=1551450113710270 obj#=351218
tim=1551450113711046 obj#=351218
tim=1551450113711398 obj#=351218
tim=1551450113711457 obj#=351218
tim=1551450113712308 obj#=351218
tim=1551450113712456 obj#=351218
tim=1551450113712967 obj#=351218
tim=1551450113713829 obj#=351218
tim=1551450113714238 obj#=351218
tim=1551450113715124 obj#=351218
tim=1551450113715393 obj#=351218
tim=1551450113717046 obj#=351218
tim=1551450113717686 obj#=351218
tim=1551450113718684 obj#=351218
tim=1551450113719321 obj#=351218
--
tim=1551450113721738 obj#=351219
tim=1551450113722013 obj#=351219
tim=1551450113722260 obj#=351219
tim=1551450113723239 obj#=351219
tim=1551450113723245 obj#=351219
tim=1551450113724545 obj#=351219
tim=1551450113724578 obj#=351219
tim=1551450113724954 obj#=351219
tim=1551450113725217 obj#=351219
tim=1551450113725947 obj#=351219
tim=1551450113726250 obj#=351219
tim=1551450113726745 obj#=351219
tim=1551450113727781 obj#=351219
tim=1551450113727928 obj#=351219
tim=1551450113729470 obj#=351219
tim=1551450113730384 obj#=351219
tim=1551450113731225 obj#=351219
tim=1551450113731618 obj#=351219
tim=1551450113732064 obj#=351219
tim=1551450113733840 obj#=351219
tim=1551450113734168 obj#=351219
tim=1551450113735040 obj#=351219
tim=1551450113735195 obj#=351219
tim=1551450113736172 obj#=351219
tim=1551450113736253 obj#=351219
tim=1551450113737957 obj#=351219
tim=1551450113738871 obj#=351219
tim=1551450113739084 obj#=351219
tim=1551450113739710 obj#=351219
tim=1551450113740159 obj#=351219
tim=1551450113740940 obj#=351219
tim=1551450113741709 obj#=351219

I’ve got four slave processes p000 through to p003, and in the top part of the output you can see that p000/p001 scan object 351218 (after several experiments the object_id for t1 has climbed a bit), and p002/p003 scan object 351219 (t2). Sorting by the timestamps (tim=), though, we can see that there are two reads for t2 (351219) followed by many reads of t1 (351218), finishing with a load of reads of t2. Cross-checking the timestamps on the sorted output with the original extract we can see that the two initial reads of t2 are the first reads by p002 and p003 respectively.

tim=1551450113698677 obj#=351219
test_p002_5971.trc:WAIT #139968162111976: nam='db file scattered read' ela= 260 file#=5 block#=529 blocks=7 obj#=351219 tim=1551450113698677

tim=1551450113698687 obj#=351219
test_p003_5975.trc:WAIT #140601219817960: nam='db file scattered read' ela= 261 file#=5 block#=550 blocks=7 obj#=351219 tim=1551450113698687

It really does look as if at run time Oracle does the scan and build for t1 first but starts with a quick peek at t2 to check that it’s going to return some data. We can be a little more fussy with this test – let’s change some of the data in t2 so that Oracle doesn’t find any rows in the first two scattered reads and see if anything changes.

This is going to take a little fiddling but in my case I can see that the first reads by slaves p002/p003 are looking at file 5, blocks 529 – 535 and blocks 550 – 556; and broadly speaking the scattered reads are working backwards down the table – so I’m going to update a load of data that appears in the second half of the table on the next run of the test with a simple command to put the n10 column value out of range:


update t2 set n10 = 0 where n10 = 7 and id > 750;
commit;

-- test the query

update t2 set n10 = 0 where n10 = 7;
commit;

-- test the query

After the first update the pattern of reads (in time order) changed to:

  • 22 scattered reads of t2
  • 34 scattered reads of t1
  • 12 scattered reads of t2

This really does look like Oracle checking whether t2 will have any data before doing the full scan and build from t1.

Unfortunately when I did the final update that eliminated all the interesting data from t2 the pattern (in time order) turned into:

  • 34 scattered reads of t2
  • 34 scattered reads of t1

In other words – having discovered NO data in t2 that could join Oracle still scanned and built the hash table from t1. So maybe there’s a reason other than a sanity check for peeking at the second table, or maybe there’s a bug in the extreme case where the second table produces no data at all. (For reference you might also want to see an older note on how a parallel hash join doesn’t skip the probe table when the build table is empty.)

Just as a quick check that I wasn’t chasing an old bug at this point I repeated the tests on 18.3.0.0 – the same patterns emerged.

 

May 9, 2016

RI Locks

Filed under: deadlocks,Indexing,Locks,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 12:24 pm BST May 9,2016

RI = Referential Integrity: also known informally as parent/child integrity, and primary (or unique) key/foreign key checking.

I’m on a bit of a roll with things that I must have explained dozens or even hundreds of times in different environments without writing them up on my blog, so here’s a blog note I could have used to answer a question that came up over the weekend on the Oracle database forum.

What happens in the following scenario:


create table parent (
        id        number(8,0),
        constraint par_pk primary key(id)
);

create table child  (
        id_p      number(8,0) not null references parent,
        id_c      number(8,0) not null,
        constraint child_pk primary key(id_p, id_c)
)
;

--
--      session 1
--

insert into parent values(1);

--
--      session 2
--

insert into child values(1,1);

Since the parent row corresponding to the child row doesn’t seem to exist as far as session 2 is concerned you might expect session 2 to respond immediately with an error message like:

ERROR at line 1:
ORA-02291: integrity constraint (TEST_USER.SYS_C0017926) violated - parent key not found

In fact, although the end-user session is not allowed to see the uncommitted parent row, the user’s shadow process can see the uncommitted row and will wait until session 1 commits or rolls back – so if you examine v$lock for the current locks for the two sessions you’d see something like this:

  1  select  sid, type, id1, id2, lmode, request, ctime, block
  2  from    V$lock
  3  where   sid in (select sid from V$session where username = 'TEST_USER')
  4  and     type != 'AE'
  5  order by
  6*         sid, type desc
  7  /

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
         3 TX     327709      12584          6          0        283          1
           TM     143734          0          2          0        283          0
           TM     143732          0          3          0        283          0

       250 TX     589829      12877          6          0        240          0
           TX     327709      12584          0          4        240          0
           TM     143734          0          3          0        240          0
           TM     143732          0          3          0        240          0


7 rows selected.

In the above, SID 250 is session 2: it’s holding a transaction lock (TX) in mode 6 (exclusive) because it has acquired an undo segment and has generated some undo, it’s also waiting for a transaction lock in mode 4 (share) and – checking id1 and id2 – we can see that the transaction table entry it’s waiting for is held by SID 3 (session 1)  in mode 6 (and we also note that the lock held by SID 3 is marked as a blocker).
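
(If you just want to see who is blocking whom without decoding v$lock, a quick sketch against v$session – using the blocking_session column available in recent versions – does the same job:)

select  sid, event, blocking_session, seconds_in_wait
from    v$session
where   username = 'TEST_USER'
order by
        sid
;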

If SID 3 commits (thus releasing the transaction lock) SID 250 will continue processing the insert; if SID 3 rolls back SID 250 will raise error ORA-02291 and roll back its insert statement. (Note: if this were a multi-statement transaction it would only be the insert into child that would be rolled back; that’s another one of those details that is important but often isn’t stated explicitly, leaving people believing that the entire transaction would be rolled back.)

Updates and deletes can produce the same effects. Imagine that we have just created the two tables, and then run the following:



--
--      session 1
--

insert into parent values(1);
commit;
delete from parent where id = 1;

--
--      session 2
--

insert into child values(1,1);

Again session 2 will wait for session 1 to commit or roll back. In this case if session 1 commits session 2 will raise Oracle error ORA-02291, if session 1 rolls back session 2 will continue with the insert.

Deadlocks

Whenever you can demonstrate a way of producing a wait chain you can also manage to produce a deadlock. Consider the following (starting, again, from empty tables):


-- (1) session 1
insert into parent values(1);

-- (2) session 2
insert into parent values(2);

-- (3) session 1
insert into child values(2,2);

-- (4) session 2
insert into child values(1,1);

Session 1 will start waiting for session 2 to commit (or rollback) at step 3, then session 2 will start to wait for session 1 at step 4 – with the result that session 1 will recognise the deadlock after about three seconds and roll back its last statement, raising exception ORA-00060 and dumping a trace file. (Note: session 1 will not, as many people think, roll back the entire transaction; it will roll back only the single statement that allowed the deadlock to develop.) Session 2 will still be waiting for session 1 to commit or rollback its insert into parent.

Contrary to the popular claim, Oracle will not “resolve” the deadlock; it will simply break the deadlock, leaving one session waiting for the other session to respond appropriately to the deadlock error.

For reference, here’s the deadlock graph (from a 12c trace file) produced by session 1 (SID = 3) for this demo:


Deadlock graph:
                                          ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                             process session holds waits  process session holds waits
TX-00010017-000026C7-00000000-00000000          6       3     X             33     250           S
TX-000A000D-000026F8-00000000-00000000         33     250     X              6       3           S

session 3: DID 0001-0006-00000004       session 250: DID 0001-0021-00000041
session 250: DID 0001-0021-00000041     session 3: DID 0001-0006-00000004

Rows waited on:
  Session 3: no row
  Session 250: no row

When you see a deadlock graph with TX waits of type S (share, mode 4) it’s a very good bet that the wait has something to do with indexes – which may mean referential integrity as discussed here, but may mean collisions on primary keys, and may mean something to do with simple collisions on index-organized tables. You’ll notice that the “Rows waited on:” section shows no row – unfortunately in earlier versions of Oracle you may find a spurious row entry here because the wait information from some other (block) wait has been left in the relevant columns in v$session.
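
(For reference, the “relevant columns” are the row_wait_% columns of v$session, so a sketch of the check you might run while a session is waiting would be:)

select  sid, row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row#
from    v$session
where   sid in (3, 250)         -- the two SIDs from the demo above
;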

January 26, 2016

Trace file size

Filed under: 12c,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 8:30 am GMT Jan 26,2016

Here’s a convenient enhancement for tracing that came up on Twitter a few days ago – first in a tweet that I retweeted, then in a question from Christian Antognini based on this bit of the 12c Oracle documentation. The question was – does it work for you ?

The new description for max_dump_file_size says that for large enough values Oracle will split the file into multiple chunks of a few megabytes, using a suffix to identify the sequence of the chunks, keeping only the first chunk and the most recent chunks. Unfortunately this doesn’t seem to be true. However, prompted by Chris’ question I ran a quick query against the full parameter list looking for parameters with the word “trace” in their name:

rem
rem     Script:         params.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 1998 (and tweaks)
rem

define m_search = '&1'

set linesize 512
set pagesize 60
set trimspool on

column  name            format a57 heading "Parameter"
column  ses_val         format a30 word wrapped heading "Session Value"
column  sys_val         format a30 word wrapped heading "System Value"
column  description     format a70 heading "Description"
 
select
        /*+
                leading(nam val val2)
                full(name)
                full(val)  use_hash(val)  no_swap_join_inputs(val)
                full(val2) use_hash(val2) no_swap_join_inputs(val2)
        */
        nam.ksppinm                             name,
        val.ksppstvl                            ses_val,
        val2.ksppstvl                           sys_val,
        nam.ksppdesc                            description,
        nam.indx+1                              numb,
        nam.ksppity                             type,
        val.ksppstdf                            is_def,
        decode(bitand(nam.ksppiflg/256,1),
                1,'True',
                  'False'
        )                                       ses_mod,
        decode(bitand(nam.ksppiflg/65536,3),
                1,'Immediate',
                2,'Deferred' ,
                3,'Immediate',
                  'False'
        )                                       sys_mod,
        decode(bitand(val.ksppstvf,7),
                1,'Modified',
                4,'System Modified',
                  'False'
        )                                       is_mod,
        decode(bitand(val.ksppstvf,2),
                2,'True',
                  'False'
        )                                       is_adj,
        val.ksppstcmnt                          notes
from
        x$ksppi         nam,
        x$ksppcv        val,
        x$ksppsv        val2
where
        nam.indx = val.indx
and     val2.indx = val.indx
and     upper(ksppinm) like upper('%&m_search.%') escape '\'
order by
        nam.ksppinm
;
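
For reference, the script takes the search string as its first parameter, so (assuming it’s saved as params.sql) the call for that search would simply have been:

@params.sql trace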

Glancing through the result I spotted a couple of interesting parameters with the letters “uts” in their names, so re-ran my query looking for all the “uts” parameters, getting the following (edited) list:


NAME                           SYS_VAL         DESCRIPTION    
------------------------------ --------------- ---------------------------------------------
_diag_uts_control              0               UTS control parameter
_uts_first_segment_retain      TRUE            Should we retain the first trace segment
_uts_first_segment_size        0               Maximum size (in bytes) of first segments 
_uts_trace_disk_threshold      0               Trace disk threshold parameter
_uts_trace_segment_size        0               Maximum size (in bytes) of a trace segment
_uts_trace_segments            5               Maximum number of trace segments 

Note particularly the “first segment size” and “trace segment size” – defaulting to zero (which often means a hidden internal setting, though that doesn’t seem to be the case here, but maybe that’s what the “diag control” is for). I haven’t investigated all the effects, but after a little experimentation I found that all I needed to do to get the behaviour attributed to max_dump_file_size was to set the following two parameters – which I could do at the session level.


alter session set "_uts_first_segment_size" = 5242880;
alter session set "_uts_trace_segment_size" = 5242880;

The minimum value for these parameters is the one I’ve shown above (5120 KB) and with the default value for _uts_trace_segments you will get a maximum of 5 trace files with sequential names like the following:

ls -ltr *4901*.trc

-rw-r----- 1 oracle oinstall 5243099 Jan 26 08:15 orcl_ora_4901_1.trc
-rw-r----- 1 oracle oinstall 5243064 Jan 26 08:15 orcl_ora_4901_12.trc
-rw-r----- 1 oracle oinstall 5243058 Jan 26 08:15 orcl_ora_4901_13.trc
-rw-r----- 1 oracle oinstall 5242993 Jan 26 08:15 orcl_ora_4901_14.trc
-rw-r----- 1 oracle oinstall 1363680 Jan 26 08:15 orcl_ora_4901.trc

As you can see I’m currently generating my 15th trace, and Oracle has kept the first one and the previous three. It’s always working on a file with no suffix to its name, but as soon as that file hits its limiting size (plus or minus a few bytes) it gets its appropriate suffix, the oldest of the rotating segments is deleted (the first segment survives, which is presumably what _uts_first_segment_retain is about), and a new trace file without a suffix is started.

Apart from the usual header information the trace files start and end with lines like:

*** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_11.trc ***
  
*** TRACE SEGMENT RENAMED TO /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_12.trc ***

There is one little trap to watch out for: if you set either of these parameters to be larger than max_dump_file_size, tracing stops as soon as one of the segments hits the max_dump_file_size and that trace file ends with the usual “overflow” message – e.g. when I changed the max_dump_file_size to 4M in mid-session:

*** DUMP FILE SIZE IS LIMITED TO 4194304 BYTES ***

In my case I had started with max_dump_file_size set to 20M, so I got lucky with my choice of 5M as the segment size.
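
(So, as a sketch, if you want to experiment safely the ordering matters – push max_dump_file_size up, or to unlimited, before setting the segment sizes:)

alter session set max_dump_file_size = unlimited;
alter session set "_uts_first_segment_size" = 5242880;
alter session set "_uts_trace_segment_size" = 5242880;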

Further investigation is left as an exercise to the interested reader.

 

May 18, 2015

Migrated rows

Filed under: Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 6:43 pm BST May 18,2015

I received an email recently describing a problem with a query which was running a full tablescan but: “almost all the waits are on ‘db file sequential read’ and the disk read is 10 times the table blocks”.  Some further information supplied was that the tablespace was using ASSM and 16KB block size; the table had 272 columns (a single row piece can hold at most 255 columns, so every row must be stored as at least two row pieces – which means the problem is probably about chained rows, not migrated rows) and the Oracle version was 11.2.0.4.

In his researches he had read my article on wide rows, and had picked out of one of the comments the line: “the very bad thing about chained rows and direct reads that is that finding the rest of row by ‘db file sequential read’ is never cached”, but he wasn’t sure that this was the problem he was seeing so, very sensibly, he had re-run the query with extended tracing available, and dumped (and formatted/edited) a couple of blocks from the table. He then sent me the trace file and block dump. Since the problem had some relevance to one of my older blog notes I’ve written up a few notes on it.

Here’s a carefully edited subset of the block dump – showing all the pertinent information:


Start dump data blocks tsn: 99 file#:100 minblk 2513181 maxblk 2513181

Block header dump:  0x1926591d
 Object id on Block? Y
 seg/obj: 0x1652a7  csc: 0x53.891880b8  itc: 12  flg: E  typ: 1 - DATA
     brn: 1  bdba: 0x1965b70c ver: 0x01 opc: 0
     inc: 84  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0010.01d.0000dff9  0x2b442286.3469.09  C---    0  scn 0x0053.891880b1
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x04   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x05   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x06   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x07   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x08   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x09   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x0a   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x0b   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x0c   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
bdba: 0x1926591d

data_block_dump,data header at 0x11083f154
===============
tsiz: 0x1ea8
hsiz: 0x26
pbl: 0x11083f154
     76543210
flag=--------
ntab=1
nrow=10
frre=-1
fsbo=0x26
fseo=0x4c5
avsp=0x49f
tosp=0x49f
0xe:pti[0]	nrow=10	offs=0
0x12:pri[0]	offs=0x1c15
0x14:pri[1]	offs=0x197b
0x16:pri[2]	offs=0x16e1
0x18:pri[3]	offs=0x1448
0x1a:pri[4]	offs=0x11b8
0x1c:pri[5]	offs=0xf1f
0x1e:pri[6]	offs=0xc85
0x20:pri[7]	offs=0x9ec
0x22:pri[8]	offs=0x752
0x24:pri[9]	offs=0x4c5
block_row_dump:
tab 0, row 0, @0x1c15
tl: 659 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 1, @0x197b
tl: 666 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 2, @0x16e1
tl: 666 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 3, @0x1448
tl: 665 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 4, @0x11b8
tl: 656 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 5, @0xf1f
tl: 665 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 7, @0x9ec
tl: 665 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 8, @0x752
tl: 666 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 9, @0x4c5
tl: 653 fb: -----L-- lb: 0x0  cc: 255

In the ITL (Interested Transaction List) you can see 10 entries with the flag set to “C—-” (Committed) with no XID or SCN; that’s consistent with 10 rows migrating into the block in a single transaction. In the row directory you can see the block holds 10 rows, and in the body of the block you can see the header for each of those 10 rows with 255 columns (presumably the 2nd section of each row of 272 columns) with the flag byte set to “—–L–“ (the Last piece of a chained – so not just a purely migrated – row).

So the block dump is consistent with the possibility of a direct path read of some other block (with 10 head pieces) having to read this block 10 times shortly afterwards. Can we find further corroboration in the trace file? The block dump was for block 0x1926591d = 421943581 decimal.
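
(A quick sanity check on that conversion – dbms_utility has a pair of functions to decompose a data block address:)

select
        dbms_utility.data_block_address_file(to_number('1926591d','xxxxxxxx'))  file#,
        dbms_utility.data_block_address_block(to_number('1926591d','xxxxxxxx')) block#
from
        dual
;

-- should report file# = 100, block# = 2513181 - matching the waits below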


PARSE #4573135368:c=29,e=48,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4116693033,tim=79008343283418
EXEC #4573135368:c=53,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4116693033,tim=79008343283607
WAIT #4573135368: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=15477650 tim=79008343283636
WAIT #4573135368: nam='Disk file operations I/O' ela= 38 FileOperation=2 fileno=101 filetype=2 obj#=1462951 tim=79008343283973
WAIT #4573135368: nam='direct path read' ela= 8991 file number=100 first dba=947580 block cnt=13 obj#=1462951 tim=79008343293041

WAIT #4573135368: nam='db file sequential read' ela= 4934 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298032
WAIT #4573135368: nam='db file sequential read' ela= 155 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298216
WAIT #4573135368: nam='db file sequential read' ela= 127 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298378
WAIT #4573135368: nam='db file sequential read' ela= 125 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298526
WAIT #4573135368: nam='db file sequential read' ela= 128 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298677
WAIT #4573135368: nam='db file sequential read' ela= 123 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298826
WAIT #4573135368: nam='db file sequential read' ela= 134 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298983
WAIT #4573135368: nam='db file sequential read' ela= 129 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343299135
WAIT #4573135368: nam='db file sequential read' ela= 180 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343299341
WAIT #4573135368: nam='db file sequential read' ela= 133 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343299497

WAIT #4573135368: nam='db file sequential read' ela= 11039 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343310565
WAIT #4573135368: nam='db file sequential read' ela= 133 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343310730
WAIT #4573135368: nam='db file sequential read' ela= 139 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343310895
WAIT #4573135368: nam='db file sequential read' ela= 124 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311045
WAIT #4573135368: nam='db file sequential read' ela= 122 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311190
WAIT #4573135368: nam='db file sequential read' ela= 127 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311339
WAIT #4573135368: nam='db file sequential read' ela= 125 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311490
WAIT #4573135368: nam='db file sequential read' ela= 134 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311647
WAIT #4573135368: nam='db file sequential read' ela= 128 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311797
WAIT #4573135368: nam='db file sequential read' ela= 124 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311947

WAIT #4573135368: nam='db file sequential read' ela= 10592 file#=100 block#=2513309 blocks=1 obj#=1462951 tim=79008343322564
WAIT #4573135368: nam='db file sequential read' ela= 142 file#=100 block#=2513309 blocks=1 obj#=1462951 tim=79008343322740
WAIT #4573135368: nam='db file sequential read' ela= 126 file#=100 block#=2513309 blocks=1 obj#=1462951 tim=79008343322889

There are a couple of interesting details in this trace file.

First we note (as the OP said) there are very few direct path read waits – but direct path reads can be asynchronous with several running concurrently, which means that Oracle may report a wait for one direct path read while the data from other reads arrives with no recorded wait time. (You’ll have to take my word for the sparseness of direct path reads – there were 5, reading a total of 58 blocks from the object, compared to 50,000 db file sequential reads.)

Then you can see that although each block that was subject to a “db file sequential read” wait is reported 10 times, the first read is much slower than the subsequent ones – a fairly good indication that the later reads are coming from a cache somewhere. (The 50,000 reads consisted of roughly 3,300 blocks being read 10 times, 1,400 blocks being read 9 times, 460 blocks being read 8 times, and a few blocks being read 7 or fewer times.)
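
(Getting those counts is easy to script – a sketch, with a made-up trace file name: pull out the block# from every single-block read wait, then count the duplicates:)

grep "db file sequential read" or19_ora_12345.trc |
sed 's/.*block#=\([0-9]*\).*/\1/'                 |
sort | uniq -c | sort -rn | head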

You might also notice the “coincidental” jump of 64 blocks between the sets of 10 reads – this appears fairly frequently, and it’s the type of pattern you might expect to see when a serial process is allocating blocks for use in a clean ASSM tablespace after the extent sizes have become fairly large (possibly the 64MB size that eventually appears with system managed extent sizes). There’s a “pseudo-random” choice of block within the extent, dictated by the process id, which spreads the work done by a single process steadily through the extent. Having filled 2513181, 2513245, 2513309 and so on for 16 steps the trace file comes back to 2513182, 2513246, 2513310 and so on.

It’s interesting (and time-consuming) to check the patterns, but what we really need next – and don’t have – to check the theory is the set of 13 blocks dictated by the first direct path read:

WAIT #4573135368: nam='direct path read' ela= 8991 file number=100 first dba=947580 block cnt=13 obj#=1462951 tim=79008343293041

It’s likely that somewhere in the 13 blocks in the range 947580 onwards we would find the 10 row head pieces pointing to block 2513181; then the 10 row head pieces pointing to block 2513245, and so on – and I’d hope that we might see a pattern of many consecutive (or near-consecutive) rows in each originating block pointing to the same “next block”. In fact, with a few blocks in the early range, we might even get some idea of how the application was loading and updating data and be able to make some suggestions for changing the strategy to avoid row chaining.
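
(If we had access to the system, dumping that range would be a one-liner – the direct path read covered blocks 947580 to 947592:)

alter system dump datafile 100 block min 947580 block max 947592;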

Footnote

The OP also had a follow-up question which was: “One question for the block dump is why there is no hrid in it since the row pieces are the second row pieces and the flag bit is ‘—–L–‘?”  It would be nice to see this, of course – then we wouldn’t need to see the 947580-947592 range to see what had been happening to the data – but that’s not the way Oracle works, as I’ve pointed out above; but since the answer was in another posting of mine I simply emailed the relevant URL to the OP.

May 22, 2014

sql_trace

Filed under: Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:24 pm BST May 22,2014

Here’s a convenient aid to trouble-shooting that appeared in 11g with its enhancements to setting events. It’s a feature that may help you to work out (among other things) why a given statement seems to have a highly variable performance profile. If you can find the SQL_ID for a parent cursor you can enable tracing for just that cursor whenever it executes, whoever executes it.
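
(If you only know the text of the statement, a sketch of finding the SQL_ID of the parent cursor:)

select  sql_id, child_number, executions, plan_hash_value
from    v$sql
where   sql_text like 'select count(*) from all_objects%'
;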

rem
rem     Script:         sql_trace_11g.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2012
rem
rem     Last tested:
rem           19.11.0.0
rem           11.2.0.4
rem

define m_sql_id = '&1'
define m_sql_id = '9tz4qu4rj9rdp'

alter system set events
        '
        sql_trace[SQL: &m_sql_id ]
/*
        plan_stat=all_executions,
        wait=false,
        bind=false
*/
        '
;

pause Press return to stop tracing

alter system set events
        '
        sql_trace[SQL: &m_sql_id ]
        off
        '
;


This is the whole of a little script I’ve got – the generic &1 is how I normally use it, I’ve just included a specific value (which is the SQL_ID for “select count(*) from all_objects;” as an example) that starts tracing across the entire system, but only for a given SQL_ID. On a production system, if I think I really need to do this, I would check the expected frequency of execution for the statement and wait enough time to capture a few occurrences before disabling the trace. Each session generates its own trace file, but if you’ve been sufficiently patient you get a reasonable sampling of the different workloads – and if you’ve captured the bind variables as well, this may give you some clues about why different workloads can appear.

A nice detail about this feature is that if the SQL_ID is for a pl/sql block, all the SQL executing inside the block is traced as part of the trace on the block (and that was particularly helpful the last time I had to make use of the feature) and that makes it easy to see the effects of SQL statements calling PL/SQL functions that contain SQL statements. In the sample I’ve given I also found that some recursive SQL – relating to the xmlschema object types defined in the view – was also traced as the main statement was traced …

NOTE: the pingback in comment #1 will lead you to a note on how to use a similar method to force Oracle to dump the CBO (10053) trace file for a given SQL_ID every time it is hard parsed.

Update for 12c [Dec 2017]

Another little detail (which I hadn’t considered until a posting on the Oracle forums today reminded me of the possibility): if you enable this trace for a statement, the resulting trace file will also include the dynamic sampling SQL used by the optimizer in 12c as it tries to optimize the statement. That’s a real boon if you’re trying to answer the question: “Why does it take so long to get going the first time I run this query?” when all the usual answers don’t apply.

Update Aug 2021

A question on the Oracle developers’ forum about getting the execution plans for all the SQL called by a table() operation reminded me that this is (usually) just a special case of tracing a pl/sql function. For example:


select * from table(dbms_xplan.display_cursor(format => 'outline'));

If you enable tracing for this statement (its SQL_ID is '04j57fnt6uzg5') then the trace file will contain all the SQL executed by the display_cursor() function of the dbms_xplan package whenever anyone executes the statement.
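
Following the pattern of the script above, the enable/disable pair for that statement would look something like this:

alter system set events 'sql_trace[SQL:04j57fnt6uzg5]';

rem     ... wait for a few executions to accumulate ...

alter system set events 'sql_trace[SQL:04j57fnt6uzg5] off';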

Footnote

You can use similar syntax to enable tracing in your own session while limiting the content of the trace file to just the specific SQL_ID you’re particularly interested in. Just change “system” to “session”, e.g.

create table t1 as select * from all_objects where rownum <= 50000;

alter session set events
        '
        sql_trace[SQL:g2y0pjcy1hnqr]
        plan_stat=all_executions,
        wait=false,
        bind=true
        '
;

declare 
        n1 number;
begin
        for r in (select distinct owner from t1) loop
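                -- this select is the statement with SQL_ID g2y0pjcy1hnqr (the one being traced)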

                select  count(*) 
                into    n1
                from    t1 
                where   owner = r.owner
                ;

                dbms_output.put_line(r.owner || chr(9) || n1);

        end loop;
end;
/

alter session set events
        '
        sql_trace[SQL:g2y0pjcy1hnqr]
        off
        '
;

The trace file contains nothing but the details about executing the “select count(*) …” query, and the output from tkprof for the trace file shows:

SQL ID: g2y0pjcy1hnqr Plan Hash: 2432955788

SELECT COUNT(*)
FROM
 T1 WHERE OWNER = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute     13      0.00       0.00          0          0          0           0
Fetch       13      0.00       0.00          0        134          0          13
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       26      0.00       0.00          0        134          0          13

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 112     (recursive depth: 1)
Number of plan statistics captured: 13

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=10 pr=0 pw=0 time=502 us starts=1)
         6       3846      42034   INDEX RANGE SCAN T1_I1 (cr=10 pr=0 pw=0 time=732 us starts=1 cost=1 size=30 card=6)(object id 168462)

Note how the “Number of plan statistics captured” line reports 13 sets of plan statistics (one for each execution), and the Rows (1st) / (avg) / (max) figures are not identical for every line of the plan. The stats in the Row Source Operation column come from the first execution, so if you want to see the effects of the varying :B1 bind variable you have to examine the raw trace file – or play games with awk et al.

July 12, 2013

Wrong Index 2

Filed under: CBO,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 5:17 pm BST Jul 12,2013

A couple of days ago I wrote an article about Oracle picking the “wrong index” after an index rebuild, and I mentioned that the sample data I had generated looked a little odd because it came from a script I had been using to investigate a completely different problem. This note describes that other problem, which appeared on the Oracle-L mailing list last month.

Stripped to a bare minimum, here’s the issue: we have a simple query against a single table with two indexes IDX2_AUFTRAG(arsAuftragsNr, dategAuftragsNr) and IDX7_AUFTRAG(arsAuftragsNr), and a predicate “arsAuftragsNr = {constant}”. Since the second column of the two-column index is irrelevant (we can’t use it to avoid visiting the table, and it’s not part of a group by or order by clause), and since adding a column to an index is likely to increase the clustering_factor and leaf block count of the index, we would probably expect to see Oracle pick the single-column index as the path to the table – but it doesn’t: it picks the two-column index.
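
To make the shape of the problem concrete, here’s a sketch – the index definitions come from the post, but the table name, constant, and query are my hypothetical reconstruction:

create index idx2_auftrag on auftrag(arsAuftragsNr, dategAuftragsNr);
create index idx7_auftrag on auftrag(arsAuftragsNr);

select  *
from    auftrag
where   arsAuftragsNr = 'ABC123'
;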

(more…)

July 10, 2013

12c First N

Filed under: 12c,Oracle,trace files,Tuning — Jonathan Lewis @ 5:59 pm BST Jul 10,2013

There have been a couple of nice posts about the “Top N” (or Fetch First N / Next N) syntax that has appeared in 12c, here and here for example. I particularly like the first set of examples because they include some execution plans that give you a good idea of what’s going on under the covers. “Under the covers” is important, because if you don’t actually have a large data set to test on you might not realise what impact a “Fetch First” query might have on a full-size production data set.
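
For reference, the shape of the new syntax – a minimal sketch against an assumed table t1:

select  *
from    t1
order by
        object_id
fetch first 10 rows only
;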

(more…)

July 7, 2013

Cursor Sharing

Filed under: CBO,Oracle,Statistics,trace files — Jonathan Lewis @ 5:49 pm BST Jul 7,2013

Here’s a couple of extracts from a trace file after I’ve set optimizer_dynamic_sampling to level 3 (the session-level command is sketched below). I’ve run two, very similar, SQL statements that both require dynamic sampling according to the rules for the parameter – but take a look at the different ways that sampling has happened, and ask yourself what’s going on:
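
A minimal sketch of the setting used:

alter session set optimizer_dynamic_sampling = 3;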

(more…)

June 23, 2013

Index Hints

Filed under: CBO,Hints,Indexing,Oracle,trace files — Jonathan Lewis @ 6:04 pm BST Jun 23,2013

In my last post I made a comment about how the optimizer will use the new format of the index hint to identify an index that is an exact match if it can, and any index that starts with the same columns (in the right order) if it can’t find an exact match. It’s fairly easy to demonstrate the behaviour in 11g by examining the 10053 (CBO) trace file generated by a simple, single table, query – in fact, this is probably a case that Doug Burns might want to cite as an example of how, sometimes, the 10053 is easy to interpret (in little patches):
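
As a reminder, the “new” format of the hint describes the target index by its (leading) columns rather than by name – a minimal sketch with assumed names; the 10053 demonstration follows:

select
        /*+ index(t1 (col1, col2)) */
        *
from    t1
where   col1 = 99
;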

(more…)

August 30, 2012

Surprises

Filed under: Bugs,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 5:55 pm BST Aug 30,2012

I’ve given examples in the past of how you can be surprised by Oracle when a mechanism that has “always worked” in the past suddenly just doesn’t work because some unexpected internal coding mechanism hits a boundary condition. One example I gave of this was rebuilding indexes online – where a limitation on the key size of index organized tables made it impossible to do an online rebuild of an index on a heap table because of an ORA-01450 (maximum key length) error raised against the (index-organized) journal table that Oracle creates internally to support the rebuild.

Slightly more recently the topic of the sys_op_map_nonnull() function came up in some comments about the ora_hash function – and I made a passing comment about the function appearing in materialized view refreshes and then never followed it up. Fortunately I got an email from Jared Still and Don Seiler a few days ago which contained a nice short demonstration of the problem so, with their permission, I’ve published it below.
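
As a quick reminder before you read on: sys_op_map_nonnull() is an internal, undocumented function that maps its argument to a raw value, turning NULL into a real (comparable) value so that internal SQL – such as a materialized view refresh – can treat two NULLs as equal. A minimal sketch:

select
        sys_op_map_nonnull('A')                 mapped_a,
        sys_op_map_nonnull(to_char(null))       mapped_null
from    dual;
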
(more…)
