Oracle Scratchpad

June 20, 2022

ORA-29857

Filed under: Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 3:30 pm BST Jun 20,2022

This is another little “case study” on solving a problem. It’s based on a question on the MOSC DBA Forum (needs an account) about trying to drop a tablespace and seeing the error “ORA-29857: domain indexes and/or secondary objects exist in the tablespace” when (apparently) there are no domain indexes or secondary objects in the tablespace according to the queries that have been used to try and locate the problem objects.

Not being completely familiar with all the possibilities that might be relevant to that particular error, but having a vague memory of seeing an oddity once before, I thought I’d do a quick check on a couple of possible causes.

  • Case 1, which I assumed ought to raise the error because the tablespace held some objects that met the requirements for the error: I created a table in a tablespace called users and, for the domain indexes, a couple of Context (Oracle Text) indexes in a tablespace called test_8k_assm. Then I tried to drop the tablespace holding the Context indexes. (A sketch of the setup appears after the two cases.) Here’s the relevant cut-n-paste (19.11.0.0):
SQL> drop tablespace test_8k_assm including contents and datafiles;
drop tablespace test_8k_assm including contents and datafiles
*
ERROR at line 1:
ORA-29857: domain indexes and/or secondary objects exist in the tablespace

  • Case 2, which I thought would drop the tablespace because the model didn’t meet the requirements of the error message: I swapped the locations of the table and its indexes, creating the table in the tablespace called test_8k_assm and the Context indexes in the tablespace called users, then tried to drop the tablespace holding only the table. Here’s the relevant cut-n-paste (19.11.0.0):
SQL> drop tablespace test_8k_assm including contents and datafiles;
drop tablespace test_8k_assm including contents and datafiles
*
ERROR at line 1:
ORA-29857: domain indexes and/or secondary objects exist in the tablespace
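
For anyone who wants to recreate the tests, here’s a minimal sketch of the setup. The object names are my own, Oracle Text must be installed, and the basic_storage preference is just one (illustrative) way of steering the DR$ secondary objects into a chosen tablespace:

create table script_handles (
        name    varchar2(60),
        handle  varchar2(60)
)
tablespace users                        -- case 1: table in users
;

begin
        -- illustrative storage preference: put the secondary objects
        -- in test_8k_assm (swap the tablespaces around for case 2)
        ctx_ddl.create_preference('my_storage', 'basic_storage');
        ctx_ddl.set_attribute('my_storage', 'i_table_clause', 'tablespace test_8k_assm');
        ctx_ddl.set_attribute('my_storage', 'k_table_clause', 'tablespace test_8k_assm');
        ctx_ddl.set_attribute('my_storage', 'n_table_clause', 'tablespace test_8k_assm');
        ctx_ddl.set_attribute('my_storage', 'i_index_clause', 'tablespace test_8k_assm');
end;
/

create index sh_ix_name on script_handles(name)
        indextype is ctxsys.context
        parameters ('storage my_storage')
;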

So, the error message is misleading and, possibly, this might explain why the queries the OP had posted weren’t reporting any guilty objects. Whatever the error message says, the domain indexes that you need to deal with may not be in the tablespace you want to drop.

I also have a question-mark over “secondary objects” – I don’t know what objects (other than the bits and pieces that make up domain indexes) are deemed to be secondary.

Investigating

It’s at this point that the purpose of the blog note starts to appear – I don’t know the answer to the OP’s question, and I’ve discovered a gap in my knowledge, so I’m willing to spend a little time extending my knowledge and trying to give a little help to the OP.

We can start with the view dba_secondary_objects (based on secobj$ from script dcore.bsq) which tells us that there are two possible sources of secondary objects – index types and statistics types:

SQL> select dbms_metadata.get_ddl('VIEW','DBA_SECONDARY_OBJECTS') view_text from dual;

-- with a little cosmetic tidying:

VIEW_TEXT
-------------------------------------------------------------------------------------------
CREATE OR REPLACE FORCE NONEDITIONABLE VIEW "SYS"."DBA_SECONDARY_OBJECTS" ("INDEX_OWNER", "INDEX_NAME", "SECONDARY_OBJECT_OWNER", "SECONDARY_OBJECT_NAME", "SECONDARY_OBJDATA_TYPE")
AS
select u.name, o.name, u1.name, o1.name, 
       decode(s.spare1, 0, 'FROM INDEXTYPE',1, 'FROM STATISTICS TYPE')
from   sys.user$ u, sys.obj$ o, sys.user$ u1, sys.obj$ o1, sys.secobj$ s
where  s.obj# = o.obj# and o.owner# = u.user# and
       s.secobj# = o1.obj#  and  o1.owner# = u1.user#;

With this view definition I’d be tempted to do a little drilling down, starting with a broad summary and then expanding any interesting subsets in more detail, e.g.:

select  secondary_objdata_type, count(*)
from    dba_secondary_objects
group by
        secondary_objdata_type
/

Are there any secondary objects that don’t come from index types (i.e. that aren’t the bits and pieces of domain indexes)? If there are only a few, query the view for just those rows to find out what they are and who owns them:

select  *
from    dba_secondary_objects
where   secondary_objdata_type = 'FROM STATISTICS TYPE'
/

In fact, if the total number of secondary objects was small I might report the details of all of them and try to track them down; or, if there was only one schema using that tablespace, I might restrict my queries to the subset of secondary objects owned by that schema:

select  distinct index_owner, index_name
from    dba_secondary_objects
where   secondary_object_owner = 'TEST_USER'
;

The OP had a rather larger task than my little model – there were 14,499 segments in the tablespace, and a query for (segment_owner, segment_type, count(*)) returned 332 rows – so they had taken a similar approach to mine to try to minimise the amount of work needed, starting with the following queries (which I’ve enhanced slightly and given a cosmetic makeover):

prompt  ==================================
prompt  Domain indexes (in the tablespace)
prompt  ==================================

select  owner, index_name, table_owner, table_name, tablespace_name
from    dba_indexes 
where   index_type = 'DOMAIN'
-- and  tablespace_name = 'TEST_8K_ASSM'        -- error: domain indexes don't have tablespaces
;

prompt  ==========================================
prompt  Domain indexes on tables in the tablespace
prompt  ==========================================

select  t.table_name, t.tablespace_name, i.index_name, i.tablespace_name
from    dba_indexes i, dba_tables t
where   t.table_name = i.table_name
and     t.owner = i.owner
and     i.index_type = 'DOMAIN'
and     t.tablespace_name = 'TEST_8K_ASSM'
;

prompt  ===================================================
prompt  Secondary objects that are tables in the tablespace
prompt  ===================================================

select  index_owner, index_name, secondary_object_owner, secondary_object_name, secondary_objdata_type
from    dba_secondary_objects
where   secondary_object_name in (
                select table_name 
                from dba_tables 
                where tablespace_name = 'TEST_8K_ASSM'
        )
;

prompt  =====================================================
prompt  Indexes in the tablespace that have secondary objects
prompt  =====================================================

select  index_owner, index_name, secondary_object_owner, secondary_object_name, secondary_objdata_type
from    dba_secondary_objects
where   index_name in (
                select index_name 
                from dba_indexes 
                where tablespace_name = 'TEST_8K_ASSM'
        )
;

The first query had an error – domain indexes do not report a tablespace_name in dba_indexes, so with the tablespace predicate in place (as the OP originally had it) this query inevitably returned no rows.

The second query would report any domain indexes on tables that were in the given tablespace – which would be useful for the simple case where everything was created in the user’s default tablespace.

The third query reports any tables in the given tablespace that are the physical tables instantiating the “logical” domain index – which is particularly important for my example of the indexes being in the given tablespace when the table itself was in some other tablespace.

The final query is trying to report domain indexes that have corresponding physical objects in the given tablespace – but this isn’t going to help (at least with my Context indexes) because the index_name in dba_secondary_objects will be a name that appears in dba_indexes with index_type = ‘DOMAIN’, and a domain index won’t have a value stored in the tablespace_name column.
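
Given that gap, a more direct check might work from the segments in the tablespace back to dba_secondary_objects – just a sketch, and note that it still wouldn’t catch the indexes, LOB segments and IOT segments that hang off the DR$ tables, because those don’t appear in dba_secondary_objects at all:

select  s.segment_type, s.owner, s.segment_name
from    dba_segments s
where   s.tablespace_name = 'TEST_8K_ASSM'
and     (s.owner, s.segment_name) in (
                select  secondary_object_owner, secondary_object_name
                from    dba_secondary_objects
        )
;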

Here’s the output I get (preceded by a full report of the view dba_secondary_objects in my otherwise empty little database) when I had the table in test_8k_assm and the indexes in tablespace users.

SQL> select * from dba_secondary_objects;

INDEX_OWNER     INDEX_NAME           SECONDARY_OBJECT_OWNER           SECONDARY_OBJECT_NAME            SECONDARY_OBJDATA_TY
--------------- -------------------- -------------------------------- -------------------------------- --------------------
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$I                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$K                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$N                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$U                  FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$I                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$K                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$N                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$U                FROM INDEXTYPE

8 rows selected.

==================================
Domain indexes (in the tablespace)
==================================

OWNER           INDEX_NAME           TABLE_OWNER     TABLE_NAME                TABLESPACE_NAME
--------------- -------------------- --------------- ------------------------- ------------------------------
TEST_USER       SH_IX_NAME           TEST_USER       SCRIPT_HANDLES
TEST_USER       SH_IX_HANDLE         TEST_USER       SCRIPT_HANDLES

2 rows selected.

==========================================
Domain indexes on tables in the tablespace
==========================================

TABLE_NAME                TABLESPACE_NAME                INDEX_NAME           TABLESPACE_NAME
------------------------- ------------------------------ -------------------- ------------------------------
SCRIPT_HANDLES            TEST_8K_ASSM                   SH_IX_NAME
SCRIPT_HANDLES            TEST_8K_ASSM                   SH_IX_HANDLE

2 rows selected.

===================================================
Secondary objects that are tables in the tablespace
===================================================

no rows selected

=====================================================
Indexes in the tablespace that have secondary objects
=====================================================

no rows selected

As you can see I have two Context indexes, each represented by 4 secondary tables (and that’s not the complete story because those tables have their own indexes and a couple of them have LOB columns, and one of the secondary tables is actually an index-organized table).
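
Since dba_objects flags all of these physical objects with secondary = ‘Y’, a quick sketch to list the full set in my test schema is:

select  object_name, object_type
from    dba_objects
where   owner = 'TEST_USER'
and     secondary = 'Y'
order by
        object_name
;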

And this is the result when the base table is in tablespace users and the indexes are in test_8k_assm:

==================================
Domain indexes (in the tablespace)
==================================

OWNER           INDEX_NAME           TABLE_OWNER     TABLE_NAME                TABLESPACE_NAME
--------------- -------------------- --------------- ------------------------- ------------------------------
TEST_USER       SH_IX_NAME           TEST_USER       SCRIPT_HANDLES
TEST_USER       SH_IX_HANDLE         TEST_USER       SCRIPT_HANDLES

2 rows selected

==========================================
Domain indexes on tables in the tablespace
==========================================

no rows selected

===================================================
Secondary objects that are tables in the tablespace
===================================================

INDEX_OWNER     INDEX_NAME           SECONDARY_OBJECT_OWNER           SECONDARY_OBJECT_NAME            SECONDARY_OBJDATA_TY
--------------- -------------------- -------------------------------- -------------------------------- --------------------
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$I                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$K                  FROM INDEXTYPE
TEST_USER       SH_IX_NAME           TEST_USER                        DR$SH_IX_NAME$U                  FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$I                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$K                FROM INDEXTYPE
TEST_USER       SH_IX_HANDLE         TEST_USER                        DR$SH_IX_HANDLE$U                FROM INDEXTYPE

6 rows selected

=====================================================
Indexes in the tablespace that have secondary objects
=====================================================

no rows selected

So one of the middle two queries ought to return some rows if there are any Context indexes in the tablespace or on tables in the tablespace. (And for the final option – i.e. the table and its Context indexes being in the tablespace – both queries will return rows). The OP got no rows returned on all 4 queries.

What next?

It looks like the OP must have some secondary objects that aren’t playing by the rules if they’re “from index types”, or there are some secondary objects that are “from statistics types” and don’t get picked up by the queries. (There is a view dba_ustats for “user-defined statistics” which has some interesting column names.) At this point I decided to follow a strategy that I’ve used many times in the past when Oracle raises an unexpected error on a repeatable process: repeat the process with SQL trace enabled and see what Oracle was doing just before it raised the error.
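
In SQL*Plus that can be as simple as the following sketch – level 4 includes bind values, which turns out to matter because the critical statement uses a bind variable for the tablespace number:

alter session set events '10046 trace name context forever, level 4';

drop tablespace test_8k_assm including contents and datafiles;

alter session set events '10046 trace name context off';

select  value
from    v$diag_info
where   name = 'Default Trace File'
;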

Here’s the critical line in the trace file I produced, with a couple of the preceding lines:

CLOSE #140708355472384:c=1860,e=1512,dep=1,type=0,tim=31701605233
EXEC #140708354845632:c=68018,e=68006,p=0,cr=1276,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=31701605472
ERROR #140708354845632:err=29857 tim=31701605485

A search backwards up the trace file showed that cursor #140708354845632 was for the “drop tablespace” statement. Just above the ERROR and EXEC lines we see cursor #140708355472384 closing. It’s reasonably likely (though not 100% certain) that this is the cursor representing the last SQL statement Oracle executed before raising the error, so searching backwards for “PARSING IN CURSOR #140708355472384” was my next step (and if that statement hadn’t looked promising I’d have searched backwards from the ERROR line to find the cursor number in the most recent “EXEC” line). In this case (with my data) the CLOSE took me to the following (cosmetically cleaned) SQL:

select null from (
             select o.obj# from obj$ o, tab$ t           where t.ts# = :1   and t.obj# = o.obj#   and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select o.obj# from obj$ o, tabpart$ tp      where tp.ts# = :1  and tp.obj# = o.obj#  and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select o.obj# from obj$ o, tabsubpart$ tsp  where tsp.ts# = :1 and tsp.obj# = o.obj# and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select o.obj# from obj$ o, ind$ i           where i.ts# = :1   and i.obj# = o.obj#   and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select o.obj# from obj$ o, indpart$ ip      where ip.ts# = :1  and ip.obj# = o.obj#  and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select o.obj# from obj$ o, indsubpart$ isp  where isp.ts# = :1 and isp.obj# = o.obj# and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select o.obj# from obj$ o, clu$ c           where c.ts# = :1   and c.obj# = o.obj#   and (decode(bitand(o.flags, 16), 16, 1, 0) = 1)
union all    select i.obj# from ind$ i, tab$ t, tabpart$ tp
             where tp.ts# = :1 and tp.bo# = t.obj# and t.obj# = i.bo# and i.type# = 9
union all    select i.obj# from ind$ i, tab$ t, tabcompart$ tcp, tabsubpart$ tsp
             where tsp.ts# = :1 and tsp.pobj# = tcp.obj# and tcp.bo# = t.obj# and t.obj# = i.bo# and i.type# = 9
union all    select i.obj# from ind$ i, tab$ t           where t.ts# = :1   and t.obj# = i.bo#    and i.type# = 9
)

Every single branch of the UNION ALL is addressing the same tablespace by number (the bind variable :1), looking either for objects where bit 5 (decimal 16) of the obj$.flags column is set (i.e. secondary objects) or for indexes where ind$.type# is 9 (“cooperative index method”, i.e. domain indexes).
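
For reference, the tablespace number to substitute for the bind variable :1 can be read from v$tablespace – its ts# is the number recorded in tab$, ind$ and the other dictionary tables:

select  ts#, name
from    v$tablespace
where   name = 'TEST_8K_ASSM'
;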

That looks like a good bet for being the code that Oracle uses to check if there are any objects that should block the drop, so I changed the bind variable to the tablespace number (from v$tablespace), changed the “select null” to “select obj#”, and stuck the resulting query block into an IN subquery against dba_objects (this may not be the most efficient of queries):

select 
        secondary, owner, object_name, object_type, object_id
from    dba_objects
where   object_id in (
                select obj# from  (
                                  select o.obj# from obj$ o, tab$ t where t.ts#=6 and t.obj#=o.obj# and (decode(bitand(o.flags, 16), 16, 1, 0)=1)
                        union all ...
                        union all select i.obj# from ind$ i, tab$ t where t.ts#=6 and t.obj#=i.bo# and i.type#=9
                )
        )
order by secondary, owner, object_name
/

S OWNER           OBJECT_NAME                      OBJECT_TYPE              OBJECT_ID
- --------------- -------------------------------- ----------------------- ----------
N TEST_USER       SH_IX_HANDLE                     INDEX                       123956
N TEST_USER       SH_IX_NAME                       INDEX                       123944
Y TEST_USER       DR$SH_IX_HANDLE$I                TABLE                       123957
Y TEST_USER       DR$SH_IX_HANDLE$K                TABLE                       123960
Y TEST_USER       DR$SH_IX_HANDLE$KD               INDEX                       123966
Y TEST_USER       DR$SH_IX_HANDLE$KR               INDEX                       123967
Y TEST_USER       DR$SH_IX_HANDLE$U                TABLE                       123963
Y TEST_USER       DR$SH_IX_HANDLE$X                INDEX                       123965
Y TEST_USER       DR$SH_IX_NAME$I                  TABLE                       123945
Y TEST_USER       DR$SH_IX_NAME$K                  TABLE                       123948
Y TEST_USER       DR$SH_IX_NAME$KD                 INDEX                       123954
Y TEST_USER       DR$SH_IX_NAME$KR                 INDEX                       123955
Y TEST_USER       DR$SH_IX_NAME$U                  TABLE                       123951
Y TEST_USER       DR$SH_IX_NAME$X                  INDEX                       123953
Y TEST_USER       SYS_C0016528                     INDEX                       123952
Y TEST_USER       SYS_C0016536                     INDEX                       123964
Y TEST_USER       SYS_IL0000123945C00006$$         INDEX                       123947
Y TEST_USER       SYS_IL0000123957C00006$$         INDEX                       123959
Y TEST_USER       SYS_IOT_TOP_123949               INDEX                       123950
Y TEST_USER       SYS_IOT_TOP_123961               INDEX                       123962

20 rows selected.

This is the list I got when the base table and the Context indexes were all created in the tablespace I wanted to drop – notice that the “logical” indexes (the first two items, flagged as secondary=’N’) are reported along with all the physical objects relating to the context indexes.

When I created the indexes in the target tablespace but the base table in an alternative tablespace the query reported only the physical objects making up the Context indexes; when I created the base table in the target tablespace and the Context indexes in an alternative tablespace the query reported only the “logical” indexes.

Conclusion

As a last resort, when the simple queries you find in the manuals or on MOS (or elsewhere on the internet) don’t seem to supply all the answers, the strategy of finding and hacking the query that Oracle runs just before raising the error is a good generic method for getting a better handle on what’s causing the problem.

In this example of Oracle error ORA-29857 you can get a list of the objects (logical or physical or both) that are blocking a “drop tablespace” command, which makes it easier to find out what else you need to drop before you can drop the tablespace.

Bear in mind that I’ve just demonstrated the approach while experimenting with a couple of Context indexes, where the information I needed was available through a couple of ordinary data dictionary views – I only dived down into the internals because the data dictionary views weren’t giving the OP a complete answer.

I don’t know what was in the OP’s database to make this problem happen (it may simply be that the database was version 11.2.0.4 while mine was 19.11.0.0 and there’s a bug that’s been fixed since 11g), so the OP may find that after they’ve cleared all the domain indexes indicated by this query they still see ORA-29857, and that there’s another query Oracle uses to check for other conditions (maybe those “from statistics type” secondary objects) that also needs to be hijacked and subverted to complete the job. I know that when I traced a successful “drop tablespace” there were 13 more statements after the one I’ve shown that looked as if they might be checking for the (non-)existence of other types of object before the tablespace could be dropped.

Update

The OP has reported back that following this strategy they’ve chased the problem through (for their system) three further sets of objects, with the following error messages:

ORA-23515: materialized views and/or their indices exist in the tablespace

ORA-25183: index-organized table top index segment is in a different tablespace

ORA-02449: unique/primary keys in table referenced by foreign keys

June 17, 2022

Case Study

Filed under: Execution plans,Oracle,Parallel Execution,Performance,Problem Solving,Troubleshooting,Tuning — Jonathan Lewis @ 1:25 pm BST Jun 17,2022

The question “How do you trouble-shoot a performance problem” came up in an online session I did for the AIOUG (All-India OUG) today. It’s a very broad question and the only possible answers are either extremely generic, or very specific – so here’s a specific example that I talked about to give some idea of the processes I go through. It’s an example from the Oracle-l list server asking the (paraphrased) question:

I’ve got a parallel query that takes 24 seconds to return 2,500 rows for 2018. The same query for 2019 should return a similar amount of data but consumes a lot of TEMP space before failing; it takes 45 minutes to complete if I remove the parallel hint. The only difference between the two queries is a change to a single predicate: “opclf.year_number = to_number(‘YYYY’)” and the statistics are up to date – what’s going on?

The ease of diagnosing a problem depends on the ease of access to all the right information, and you don’t always know initially what the right information might be. In this case the OP had started by posting to GitHub the SQL Monitor reports for the two queries, which had been generated automatically because the queries were using parallel execution.

Here are the first few sections of the 2019 output:
SQL Text
------------------------------
SELECT /*+ PARALLEL(8) */ DISTINCT rd.document_id AS doc_id ,'firm_id' AS criteria_key ,opf.ultimate_parent_firm_id AS series_id ,period_number FROM ( SELECT /*+ PARALLEL(8) */ DISTINCT rd.document_id ,rd.client_role_id ,( CASE WHEN MONTHS_BETWEEN(TO_DATE('04/28/2022', 'MM/DD/YYYY'), TO_DATE('04/01/2017', 'MM/DD/YYYY')) > 12 THEN TRUNC(dc.date_value, 'YEAR') ELSE TRUNC(dc.date_value, 'MONTH') END ) period_number FROM REPORT_ENGINE.date_code dc ,REPORT_ENGINE.lit_fact_bd rd INNER JOIN
report_engine.firm FIRM ON rd.firm_id = FIRM.firm_id WHERE dc.date_value BETWEEN TO_DATE('04/01/2017', 'MM/DD/YYYY') AND TO_DATE('04/28/2022', 'MM/DD/YYYY') AND rd.publication_date_id = dc.date_id AND rd.year_number = to_number('2019') AND (FIRM.ultimate_parent_firm_id IN (to_number('252094'))) ) rd INNER JOIN report_engine.opposing_counsel op ON rd.client_role_id = op.legal_role_id INNER JOIN report_engine.lit_fact_bd opclf ON opclf.document_id = rd.document_id AND op.opposing_counsel_role_id
= opclf.client_role_id AND opclf.year_number = to_number('2019') INNER JOIN report_engine.firm opf ON opclf.firm_id = opf.firm_id AND opf.firm_id >= 1000 WHERE 1 = 1

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  SYS (647:28741)
 SQL ID              :  3cjg20q2vw41f
 SQL Execution ID    :  16777216
 Execution Started   :  06/09/2022 05:08:24
 First Refresh Time  :  06/09/2022 05:08:25
 Last Refresh Time   :  06/09/2022 05:08:27
 Duration            :  221s
 Module/Action       :  sqlplus@c111dhw (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  sqlplus@c111dhw (TNS V1-V3)

Global Stats
====================================================================
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Write | Write |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes |
====================================================================
|     222 |     179 |       28 |       15 |    49M | 11624 |   2GB |
====================================================================

Parallel Execution Details (DOP=8 , Servers Allocated=16)
=====================================================================================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |  Other   | Buffer | Write | Write |         Wait Events         |
|                |       |         | Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes |         (sample #)          |
=====================================================================================================================================
| PX Coordinator | QC    |         |    0.26 |    0.26 |          |     0.00 |  12144 |       |     . |                             |
| p000           | Set 1 |       1 |         |         |          |          |        |       |     . |                             |
| p001           | Set 1 |       2 |         |         |          |          |        |       |     . |                             |
| p002           | Set 1 |       3 |         |         |          |          |        |       |     . |                             |
| p003           | Set 1 |       4 |         |         |          |          |        |       |     . |                             |
| p004           | Set 1 |       5 |         |         |          |          |        |       |     . |                             |
| p005           | Set 1 |       6 |         |         |          |          |        |       |     . |                             |
| p006           | Set 1 |       7 |         |         |          |          |        |       |     . |                             |
| p007           | Set 1 |       8 |         |         |          |          |        |       |     . |                             |
| p008           | Set 2 |       1 |     220 |     177 |       28 |       15 |    48M | 11624 |   2GB | direct path write temp (28) |
| p009           | Set 2 |       2 |         |         |          |          |        |       |     . |                             |
| p010           | Set 2 |       3 |         |         |          |          |        |       |     . |                             |
| p011           | Set 2 |       4 |    1.71 |    1.70 |          |     0.01 |   595K |       |     . |                             |
| p012           | Set 2 |       5 |         |         |          |          |        |       |     . |                             |
| p013           | Set 2 |       6 |         |         |          |          |        |       |     . |                             |
| p014           | Set 2 |       7 |         |         |          |          |        |       |     . |                             |
| p015           | Set 2 |       8 |         |         |          |          |        |       |     . |                             |
=====================================================================================================================================

You’ll note that I haven’t got as far as the body of the execution plan yet; the line to pick out is the one for p008, in the middle of the summary of activity for the parallel execution processes. There are 8 servers in each of two sets (we’re running parallel 8) and p008 is the first server of the second set. The reason for picking it out is that it’s the one server doing almost all the work – effectively the query (at some point in the plan) is running serially.

So, a first thought: maybe we’ve just been unlucky, and running parallel 7 (say) would result in a better distribution of data across the parallel servers, allowing each of 7 processes to do whatever they had to do to a much smaller amount of data. Maybe a change to the distribution method (the pq_distribute() hint) would spread the workload more evenly. In either case the N smaller workloads might still use a lot of TEMP in total, but possibly no individual process would fail, and the job would complete nearly N times as quickly.
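
To give an idea of the syntax involved (a sketch against two dummy tables t1 and t2 of my own, not a tested fix for the OP’s query), a distribution hint for a hash join might look like this:

select  /*+
                parallel(4)
                leading(t1 t2)
                use_hash(t2)
                pq_distribute(t2 hash hash)     -- distribute both rowsources by hash
        */
        t1.id, t2.padding
from    t1, t2
where   t2.id = t1.id
;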

Note: the 2018 Monitor report showed an equivalent skew in the data distribution, but the amount of work needed was much smaller with a read/write load of only 143MB compared to 2GB for the 2019 report. The OP did try running parallel 7, but with no change to the overall effect.

Let’s take a quick glance at the plan body:
SQL Plan Monitoring Details (Plan Hash Value=1862006233)
=========================================================================================================================================================================================================================
| Id    |                     Operation                      |           Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Write | Write | Mem | Temp | Activity |        Activity Detail       |
|       |                                                    |                           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes |     |      |   (%)     |         (# samples)         |
=========================================================================================================================================================================================================================
|     0 | SELECT STATEMENT                                   |                           |         |      |           |        |     3 |          |       |       |     |      |   |                             |
|     1 |   PX COORDINATOR                                   |                           |         |      |           |        |     3 |          |       |       |     |      |   |                             |
|     2 |    PX SEND QC (RANDOM)                             | :TQ10003                  |     781 |  153 |           |        |       |          |       |       |     |      |   |                             |
|     3 |     HASH UNIQUE                                    |                           |     781 |  153 |           |        |       |          |       |       |     |      |   |                             |
|     4 |      PX RECEIVE                                    |                           |     781 |  151 |           |        |       |          |       |       |     |      |   |                             |
|     5 |       PX SEND HASH                                 | :TQ10002                  |     781 |  151 |           |        |     2 |          |       |       |     |      |   |                             |
|  -> 6 |        BUFFER SORT                                 |                           |     781 |  153 |       219 |     +3 |     2 |        0 | 11624 |   2GB | 61M |   2G |    26.21 | Cpu (26)                     |
|       |                                                    |                           |         |      |           |        |       |          |       |       |     |      |   | direct path write temp (28) |
|  -> 7 |         NESTED LOOPS                               |                           |     781 |  151 |       219 |     +3 |     2 |      39M |       |       |     |      |   |                             |
|  -> 8 |          NESTED LOOPS                              |                           |     781 |  151 |       219 |     +3 |     2 |      39M |       |       |     |      |     0.49 | Cpu (1)                      |
|  -> 9 |           NESTED LOOPS                             |                           |     777 |   44 |       219 |     +3 |     2 |      39M |       |       |     |      |   |                             |
| -> 10 |            NESTED LOOPS                            |                           |      41 |   26 |       219 |     +3 |     2 |     6463 |       |       |     |      |   |                             |
| -> 11 |             HASH JOIN                              |                           |      41 |   21 |       219 |     +3 |     2 |     6463 |       |       |  6M |      |   |                             |
|    12 |              BUFFER SORT                           |                           |         |      |         1 |     +3 |     2 |    36855 |       |       |     |      |   |                             |
|    13 |               PX RECEIVE                           |                           |      87 |   19 |         1 |     +3 |     2 |    36855 |       |       |     |      |   |                             |
|    14 |                PX SEND HASH                        | :TQ10000                  |      87 |   19 |         1 |     +3 |     1 |    38694 |       |       |     |      |   |                             |
|    15 |                 NESTED LOOPS                       |                           |      87 |   19 |         1 |     +3 |     1 |    38694 |       |       |     |      |   |                             |
|    16 |                  NESTED LOOPS                      |                           |      87 |   19 |         1 |     +3 |     1 |    38694 |       |       |     |      |   |                             |
|    17 |                   TABLE ACCESS BY INDEX ROWID      | FIRM                      |       1 |    2 |         1 |     +3 |     1 |       43 |       |       |     |      |   |                             |
|    18 |                    INDEX RANGE SCAN                | FIRM_ULT_PARENT_FIRM_IDX1 |       1 |    1 |         1 |     +3 |     1 |       43 |       |       |     |      |   |                             |
|    19 |                   PARTITION RANGE SINGLE           |                           |         |      |         1 |     +3 |    43 |    38694 |       |       |     |      |   |                             |
|    20 |                    BITMAP CONVERSION TO ROWIDS     |                           |         |      |         1 |     +3 |    43 |    38694 |       |       |     |      |   |                             |
|    21 |                     BITMAP INDEX SINGLE VALUE      | LIT_FACT_BD_IDX09         |         |      |         1 |     +3 |    43 |       49 |       |       |     |      |   |                             |
|    22 |                  TABLE ACCESS BY LOCAL INDEX ROWID | LIT_FACT_BD               |      63 |   19 |         3 |     +1 | 38694 |    38694 |       |       |     |      |     0.49 | Cpu (1)                      |
|    23 |              PX RECEIVE                            |                           |      20 |    2 |         1 |     +3 |     2 |        2 |       |       |     |      |   |                             |
|    24 |               PX SEND HASH                         | :TQ10001                  |      20 |    2 |           |        |       |          |       |       |     |      |   |                             |
|    25 |                PX BLOCK ITERATOR                   |                           |      20 |    2 |           |        |       |          |       |       |     |      |   |                             |
|    26 |                 TABLE ACCESS FULL                  | OPPOSING_COUNSEL          |      20 |    2 |           |        |       |          |       |       |     |      |   |                             |
| -> 27 |             TABLE ACCESS BY INDEX ROWID            | DATE_CODE                 |       1 |      |       219 |     +3 |  6465 |     6463 |       |       |     |      |   |                             |
| -> 28 |              INDEX UNIQUE SCAN                     | PK_DATE_CODE              |       1 |      |       219 |     +3 |  6465 |     6465 |       |       |     |      |   |                             |
| -> 29 |            PARTITION RANGE SINGLE                  |                           |      19 |      |       219 |     +3 |  6465 |      39M |       |       |     |      |   |                             |
| -> 30 |             TABLE ACCESS BY LOCAL INDEX ROWID      | LIT_FACT_BD               |      19 |      |       220 |     +2 |  6465 |      39M |       |       |     |      |    35.92 | Cpu (74)                     |
| -> 31 |              INDEX RANGE SCAN                      | LIT_FACT_BD_IDX20         |       1 |      |       219 |     +3 |  6465 |      39M |       |       |     |      |     9.22 | Cpu (19)                     |
| -> 32 |           INDEX UNIQUE SCAN                        | PK_FIRM                   |       1 |      |       219 |     +3 |   39M |      39M |       |       |     |      |    10.68 | Cpu (22)                     |
| -> 33 |          TABLE ACCESS BY INDEX ROWID               | FIRM                      |       1 |      |       219 |     +3 |   39M |      39M |       |       |     |      |    16.99 | Cpu (35)                     |
=========================================================================================================================================================================================================================

You can see from the “->” symbols at the left hand side of the plan that this report was generated while the plan was still running. The thing that leaps out as you glance down the page is the value in the “Rows (Actual)” column at operations 7-9 (the rowsources generated by some nested loop joins), and at operations 29, 32 and 33, which tell us something about how those rowsources were generated.

Operation 29 has executed (Execs) 6,465 times so far, producing a total of 39M rows, and operations 32 and 33 have each executed 39M times, producing 39M rows apiece (an index unique scan followed by a table access by rowid).

The plan for the 2018 data was similar, though the join order for DATE_CODE, LIT_FACT_BD and FIRM was different (and it was the join to LIT_FACT_BD that increased the row count dramatically – so hinting it to be the last table in the join order might help a bit). But the largest rowcount for the 2018 query was only 3M rows, not the 39M that had appeared after only 6,465 rows of a possible driving 39,855 in the 2019 query.

So it’s not surprising that the query could take much longer for 2019: it’s not the volume of output that matters, it’s the volume of intermediate data that has to be generated and crunched along the way.

Let’s think about that volume, though: the 2018 plan generated 3M rows and then crunched them down to 2,500 rows and the 2019 plan was supposed to produce a similar sized output (from 39M+ rows). Could we rewrite the query in some way that made it do some intermediate aggregation so that the volume of data to be aggregated was never enormous?

Let’s take a look at the plan from the 2018 Monitor report:
SQL Plan Monitoring Details (Plan Hash Value=472871521)
=======================================================================================================================================================================================================================================
| Id |                     Operation                      |           Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Temp  | Activity |       Activity Detail       |
|    |                                                    |                           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) |   (%)     |        (# samples)         |
=======================================================================================================================================================================================================================================
|  0 | SELECT STATEMENT                                   |                           |         |      |         1 |    +24 |    17 |     2613 |      |       |       |       |       |       |   |                            |
|  1 |   PX COORDINATOR                                   |                           |         |      |         1 |    +24 |    17 |     2613 |      |       |       |       |       |       |   |                            |
|  2 |    PX SEND QC (RANDOM)                             | :TQ10003                  |       1 |   39 |         1 |    +24 |     8 |     2613 |      |       |       |       |       |       |   |                            |
|  3 |     HASH UNIQUE                                    |                           |       1 |   39 |         9 |    +16 |     8 |     2613 |      |       |       |       |    9M |       |     6.90 | Cpu (2)                     |
|  4 |      PX RECEIVE                                    |                           |       1 |   38 |         9 |    +16 |     8 |       3M |      |       |       |       |       |       |   |                            |
|  5 |       PX SEND HASH                                 | :TQ10002                  |       1 |   38 |        12 |    +14 |     8 |       3M |      |       |       |       |       |       |     3.45 | Cpu (1)                     |
|  6 |        BUFFER SORT                                 |                           |       1 |   39 |        23 |     +2 |     8 |       3M | 4584 | 143MB |   703 | 143MB |  151M |  151M |    34.48 | Cpu (2)                     |
|    |                                                    |                           |         |      |           |        |       |          |      |       |       |       |       |       |   | direct path read temp (6)  |
|    |                                                    |                           |         |      |           |        |       |          |      |       |       |       |       |       |   | direct path write temp (2) |
|  7 |         NESTED LOOPS                               |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
|  8 |          NESTED LOOPS                              |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
|  9 |           NESTED LOOPS                             |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
| 10 |            NESTED LOOPS                            |                           |       1 |   38 |        15 |     +2 |     8 |       3M |      |       |       |       |       |       |   |                            |
| 11 |             HASH JOIN                              |                           |      41 |   21 |        15 |     +2 |     8 |    19334 |      |       |       |       |    7M |       |   |                            |
| 12 |              BUFFER SORT                           |                           |         |      |        13 |     +2 |     8 |    19233 |      |       |       |       |    1M |       |   |                            |
| 13 |               PX RECEIVE                           |                           |      89 |   19 |        13 |     +2 |     8 |    19233 |      |       |       |       |       |       |   |                            |
| 14 |                PX SEND HASH                        | :TQ10000                  |      89 |   19 |         1 |     +1 |     1 |    19233 |      |       |       |       |       |       |   |                            |
| 15 |                 NESTED LOOPS                       |                           |      89 |   19 |         1 |     +1 |     1 |    19233 |      |       |       |       |       |       |   |                            |
| 16 |                  NESTED LOOPS                      |                           |      89 |   19 |         1 |     +1 |     1 |    19233 |      |       |       |       |       |       |   |                            |
| 17 |                   TABLE ACCESS BY INDEX ROWID      | FIRM                      |       1 |    2 |         1 |     +1 |     1 |       43 |      |       |       |       |       |       |   |                            |
| 18 |                    INDEX RANGE SCAN                | FIRM_ULT_PARENT_FIRM_IDX1 |       1 |    1 |         1 |     +1 |     1 |       43 |      |       |       |       |       |       |   |                            |
| 19 |                   PARTITION RANGE SINGLE           |                           |         |      |         1 |     +1 |    43 |    19233 |      |       |       |       |       |       |   |                            |
| 20 |                    BITMAP CONVERSION TO ROWIDS     |                           |         |      |         1 |     +1 |    43 |    19233 |      |       |       |       |       |       |   |                            |
| 21 |                     BITMAP INDEX SINGLE VALUE      | LIT_FACT_BD_IDX09         |         |      |         1 |     +1 |    43 |       51 |      |       |       |       |       |       |   |                            |
| 22 |                  TABLE ACCESS BY LOCAL INDEX ROWID | LIT_FACT_BD               |      64 |   19 |         1 |     +1 | 19233 |    19233 |      |       |       |       |       |       |   |                            |
| 23 |              PX RECEIVE                            |                           |      20 |    2 |        15 |     +2 |     8 |       20 |      |       |       |       |       |       |   |                            |
| 24 |               PX SEND HASH                         | :TQ10001                  |      20 |    2 |         1 |    +14 |     8 |       20 |      |       |       |       |       |       |   |                            |
| 25 |                PX BLOCK ITERATOR                   |                           |      20 |    2 |         1 |    +14 |     8 |       20 |      |       |       |       |       |       |   |                            |
| 26 |                 TABLE ACCESS FULL                  | OPPOSING_COUNSEL          |      20 |    2 |         1 |    +14 |     3 |       20 |      |       |       |       |       |       |   |                            |
| 27 |             PARTITION RANGE SINGLE                 |                           |       1 |      |        15 |     +2 | 19334 |       3M |      |       |       |       |       |       |   |                            |
| 28 |              TABLE ACCESS BY LOCAL INDEX ROWID     | LIT_FACT_BD               |       1 |      |        16 |     +1 | 19334 |       3M |      |       |       |       |       |       |    17.24 | Cpu (5)                     |
| 29 |               INDEX RANGE SCAN                     | LIT_FACT_BD_IDX20         |       1 |      |        15 |     +2 | 19334 |       3M |      |       |       |       |       |       |   |                            |
| 30 |            TABLE ACCESS BY INDEX ROWID             | DATE_CODE                 |       1 |      |        15 |     +2 |    3M |       3M |      |       |       |       |       |       |    10.34 | Cpu (3)                     |
| 31 |             INDEX UNIQUE SCAN                      | PK_DATE_CODE              |       1 |      |        16 |     +1 |    3M |       3M |      |       |       |       |       |       |     6.90 | Cpu (2)                     |
| 32 |           INDEX UNIQUE SCAN                        | PK_FIRM                   |       1 |      |        23 |     +2 |    3M |       3M |      |       |       |       |       |       |     6.90 | Cpu (2)                     |
| 33 |          TABLE ACCESS BY INDEX ROWID               | FIRM                      |       1 |      |        16 |     +1 |    3M |       3M |      |       |       |       |       |       |    13.79 | Cpu (4)                     |
=======================================================================================================================================================================================================================================

We see from operations 3 – 7 that the 3M rows generated by the nested loop joins pass up through a buffer sort operation before being crunched down to 2,613 rows. It’s probably the buffer sort (which is buffering but not sorting) that has mostly passed through a single server and spilled to disc in the 2019 report. We just don’t want those 39M+ rows ever to exist.

So how easy will it be to change the SQL? Here’s the query, reformatted:
SELECT
        /*+ PARALLEL(8) */
        DISTINCT rd.document_id AS doc_id
        ,'firm_id' AS criteria_key
        ,opf.ultimate_parent_firm_id AS series_id
        ,period_number
FROM (
        SELECT
                /*+ PARALLEL(8) */
                DISTINCT rd.document_id
                ,rd.client_role_id
                ,(
                        CASE 
                                WHEN MONTHS_BETWEEN(TO_DATE('04/28/2022', 'MM/DD/YYYY'), TO_DATE('04/01/2017', 'MM/DD/YYYY')) > 12
                                        THEN TRUNC(dc.date_value, 'YEAR')
                                ELSE TRUNC(dc.date_value, 'MONTH')
                                END
                        ) period_number
        FROM REPORT_ENGINE.date_code dc
                ,REPORT_ENGINE.lit_fact_bd rd
        INNER JOIN report_engine.firm FIRM ON rd.firm_id = FIRM.firm_id
        WHERE dc.date_value BETWEEN TO_DATE('04/01/2017', 'MM/DD/YYYY')
                        AND TO_DATE('04/28/2022', 'MM/DD/YYYY')
                AND rd.publication_date_id = dc.date_id
                AND rd.year_number = to_number('2019')
                AND (FIRM.ultimate_parent_firm_id IN (to_number('252094')))
        ) rd
INNER JOIN report_engine.opposing_counsel op ON rd.client_role_id = op.legal_role_id
INNER JOIN report_engine.lit_fact_bd opclf ON opclf.document_id = rd.document_id
        AND op.opposing_counsel_role_id = opclf.client_role_id
        AND opclf.year_number = to_number('2019')
INNER JOIN report_engine.firm opf ON opclf.firm_id = opf.firm_id
        AND opf.firm_id >= 1000
WHERE 1 = 1;

Lines 7-10 and 27 of the SQL tell us we already have an inline view where we’re doing a “select distinct” and, unwinding the mix of “Oracle” and “ANSI” syntax, we can see that it joins DATE_CODE, LIT_FACT_BD and FIRM, and we know that one of those tables explodes the intermediate data size to something enormous. So it looks like the original author of this code had already worked out that the query needed to aggregate early.

Checking back to the original plans we note that there’s only one “hash unique” operation, and there’s no sign of a “view” operation, so maybe the performance problem is a result of the optimizer suddenly deciding it can do complex view merging with this inline view, and perhaps all we need to do is add the hint /*+ no_merge */ to the inline view and see what happens.
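
The effect of the hint is easy to demonstrate in isolation. In this self-contained sketch (my own trivial query, nothing to do with the OP’s SQL) the no_merge hint keeps the inline view as a separate query block, which shows up as a VIEW operation in the resulting plan:

select  v.owner, count(*)
from    (
        select  /*+ no_merge */
                distinct owner, object_type
        from    all_objects
        ) v
group by
        v.owner
;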

Here’s the plan after adding the hint:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -----
| Id  | Operation                                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -----
|   0 | SELECT STATEMENT                                     |                           |       |       |   314 (100)|          |       |       |        |      | |
|   1 |  PX COORDINATOR                                      |                           |       |       |            |          |       |       |        |      | |
|   2 |   PX SEND QC (RANDOM)                                | :TQ10005                  |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,05 | P->S | QC (RAN D)  |
|   3 |    HASH UNIQUE                                       |                           |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,05 | PCWP | |
|   4 |     PX RECEIVE                                       |                           |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,05 | PCWP | |
|   5 |      PX SEND HASH                                    | :TQ10004                  |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,04 | P->P | HASH |
|   6 |       HASH UNIQUE                                    |                           |    23 |  2599 |   314   (1)| 00:00:06 |       |       |  Q1,04 | PCWP | |
|   7 |        NESTED LOOPS                                  |                           |    23 |  2599 |   313   (1)| 00:00:06 |       |       |  Q1,04 | PCWP | |
|   8 |         NESTED LOOPS                                 |                           |    23 |  2599 |   313   (1)| 00:00:06 |       |       |  Q1,04 | PCWP | |
|   9 |          NESTED LOOPS                                |                           |    23 |  2323 |   310   (1)| 00:00:06 |       |       |  Q1,04 | PCWP | |
|* 10 |           HASH JOIN                                  |                           |   388 | 21340 |   148   (1)| 00:00:03 |       |       |  Q1,04 | PCWP | |
|  11 |            PX RECEIVE                                |                           |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,04 | PCWP | |
|  12 |             PX SEND BROADCAST                        | :TQ10002                  |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,02 | P->P | BROADCA ST  |
|  13 |              PX BLOCK ITERATOR                       |                           |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWC | |
|* 14 |               TABLE ACCESS FULL                      | OPPOSING_COUNSEL          |    20 |   160 |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWP | |
|  15 |            VIEW                                      |                           |   835 | 39245 |   146   (1)| 00:00:03 |       |       |  Q1,04 | PCWP | |
|  16 |             HASH UNIQUE                              |                           |   835 | 63460 |   146   (1)| 00:00:03 |       |       |  Q1,04 | PCWP | |
|  17 |              PX RECEIVE                              |                           |   835 | 63460 |   145   (0)| 00:00:03 |       |       |  Q1,04 | PCWP | |
|  18 |               PX SEND HASH                           | :TQ10003                  |   835 | 63460 |   145   (0)| 00:00:03 |       |       |  Q1,03 | P->P | HASH |
|* 19 |                HASH JOIN BUFFERED                    |                           |   835 | 63460 |   145   (0)| 00:00:03 |       |       |  Q1,03 | PCWP | |
|  20 |                 BUFFER SORT                          |                           |       |       |            |          |       |       |  Q1,03 | PCWC | |
|  21 |                  PX RECEIVE                          |                           |   835 | 52605 |   136   (0)| 00:00:03 |       |       |  Q1,03 | PCWP | |
|  22 |                   PX SEND HASH                       | :TQ10000                  |   835 | 52605 |   136   (0)| 00:00:03 |       |       |        | S->P | HASH |
|  23 |                    NESTED LOOPS                      |                           |   835 | 52605 |   136   (0)| 00:00:03 |       |       |        |      | |
|  24 |                     NESTED LOOPS                     |                           |   835 | 52605 |   136   (0)| 00:00:03 |       |       |        |      | |
|  25 |                      TABLE ACCESS BY INDEX ROWID     | FIRM                      |     1 |    12 |     2   (0)| 00:00:01 |       |       |        |      | |
|* 26 |                       INDEX RANGE SCAN               | FIRM_ULT_PARENT_FIRM_IDX1 |     1 |       |     1   (0)| 00:00:01 |       |       |        |      | |
|  27 |                      PARTITION RANGE SINGLE          |                           |       |       |            |          |    30 |    30 |        |      | |
|  28 |                       BITMAP CONVERSION TO ROWIDS    |                           |       |       |            |          |       |       |        |      | |
|* 29 |                        BITMAP INDEX SINGLE VALUE     | LIT_FACT_BD_IDX09         |       |       |            |          |    30 |    30 |        |      | |
|* 30 |                     TABLE ACCESS BY LOCAL INDEX ROWID| LIT_FACT_BD               |   598 | 30498 |   136   (0)| 00:00:03 |    30 |    30 |        |      | |
|  31 |                 PX RECEIVE                           |                           |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,03 | PCWP | |
|  32 |                  PX SEND HASH                        | :TQ10001                  |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,01 | P->P | HASH |
|  33 |                   PX BLOCK ITERATOR                  |                           |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,01 | PCWC | |
|* 34 |                    TABLE ACCESS FULL                 | DATE_CODE                 |  1854 | 24102 |     9   (0)| 00:00:01 |       |       |  Q1,01 | PCWP | |
|  35 |           PARTITION RANGE SINGLE                     |                           |     1 |    46 |     0   (0)|          |    30 |    30 |  Q1,04 | PCWP | |
|* 36 |            TABLE ACCESS BY LOCAL INDEX ROWID         | LIT_FACT_BD               |     1 |    46 |     0   (0)|          |    30 |    30 |  Q1,04 | PCWP | |
|* 37 |             INDEX RANGE SCAN                         | LIT_FACT_BD_IDX20         |     1 |       |     0   (0)|          |    30 |    30 |  Q1,04 | PCWP | |
|* 38 |          INDEX UNIQUE SCAN                           | PK_FIRM                   |     1 |       |     0   (0)|          |       |       |  Q1,04 | PCWP | |
|  39 |         TABLE ACCESS BY INDEX ROWID                  | FIRM                      |     1 |    12 |     0   (0)|          |       |       |  Q1,04 | PCWP | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note particularly that operations 15 and 16 tell us that we’ve forced the optimizer into treating the inline view as a separate query block and we now have two aggregation steps, one inside the view, and another after joining FIRM (again) and LIT_FACT_BD (again) to the inline view.

Unfortunately the plan shown here is pulled from memory using dbms_xplan.display_cursor() after execution, so it includes the various parallel execution columns (TQ, IN-OUT, PQ Distrib) but doesn’t have the rowsource execution stats enabled, so we can’t see the actual workload and data volume. In the words of the OP, though: “adding no_merge hint did the trick and now the SQL is just executing fine”.

Summary

The steps for solving the performance problems of a specific SQL statement are very fluid. For a long-running or parallel statement the SQL Monitor report will automatically be created (though there are limits on the size of the plan that may disable the feature) and this is the easiest source of useful information, though you might also need to pull the execution plan from v$sql_plan to get details about parallel execution and partitioning at the same time.

If you’re not licensed for the diagnostic and performance packs, though, enabling SQL Trace to get the plan and waits gets you a lot of information, and querying (g)v$pq_tqstat immediately after running the query can fill in the parallel traffic details.
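For reference, here’s the sort of query I mean – a minimal sketch, run from the same session immediately after the parallel statement completes (use gv$pq_tqstat instead if the work may have spread across RAC instances):

select
        dfo_number, tq_id, server_type, instance, process, num_rows, bytes
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

A skewed distribution shows up as one consumer process reporting far more rows than its siblings for the same table queue (tq_id).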

In this example the SQL Monitor report showed a highly skewed distribution, which might have been fixable by changing the PQ distribution, or even by simply changing the degree of parallelism.

Further examination of the report showed that the query generated an enormous rowsource which it then crunched down to a small result set. Comparing the 2018 and 2019 plans (which were not quite identical but were sufficiently similar to justify comparison) the same skew and explosion of rowsource were visible in both, though the data size involved in the 2018 plan made it seem that the plan was a “good” one when it really wasn’t.

The obvious target for tuning was to see if the explosion in volume could be reduced or eliminated by writing the query with some intermediate non-mergeable view(s), and it turned out that the query had been written with that intent in its original form but without a hint to block complex view merging. After adding the hint the performance was acceptable.
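To show the shape of the fix – a sketch only, where t1 and t2 are hypothetical tables standing in for the OP’s data; the point is simply where the hint goes – the no_merge hint sits inside the inline view to stop the optimizer merging the aggregation into the outer query block:

select
        t1.owner, v1.ct
from
        t1,
        (
        select  /*+ no_merge */         -- block complex view merging
                t2.object_type, count(*) ct
        from    t2
        group by
                t2.object_type
        ) v1
where
        v1.object_type = t1.object_type
;

With the hint in place the plan should show a separate VIEW operation with the aggregation inside it, as in the plan above.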

May 20, 2022

Quiz Night

Filed under: Infrastructure,Oracle,Problem Solving,Statistics — Jonathan Lewis @ 6:09 pm BST May 20,2022

It’s a long time since I’ve done a quiz night – but here’s something that fooled me (briefly) when it appeared on the Oracle Developers’ Forum. Here’s a table definition – and I’m not going to make it easy by giving you a “create table” statement – but it’s just a simple heap table:

sql> desc interr_skuplannparam
 name                             null?    type
 -------------------------------- -------- -------------------
 atpdur                                    number(38)
 depdmdopt                                 number(38)
 externalskusw                             number(1)
 firstreplendate                           date
 lastfrzstart                              date
 lastplanstart                             date
 plandur                                   number(38)
 planleadtime                              number(38)
 planleadtimerule                          number(38)
 planshipfrzdur                            number(38)
 restrictdur                               number(38)
 allocbatchsw                              number(1)
 cmpfirmdur                                number(38)
 custservicelevel                          float(126)
 maxchangefactor                           float(126)
 mfgleadtime                               number(38)
 recschedrcptsdur                          number(38)
 cpppriority                               number(38)
 cpplocksw                                 number(1)
 criticalmaterialsw                        number(1)
 aggexcesssupplyrule                       number(38)
 aggundersupplyrule                        number(38)
 bufferleadtime                            number(38)
 maxoh                                     float(126)
 maxcovdur                                 number(38)
 drpcovdur                                 number(38)
 drpfrzdur                                 number(38)
 drprule                                   number(38)
 drptimefencedate                          date
 drptimefencedur                           number(38)
 incdrpqty                                 float(126)
 mindrpqty                                 float(126)
 mpscovdur                                 number(38)
 mfgfrzdur                                 number(38)
 mpsrule                                   number(38)
 mpstimefencedate                          date
 mpstimefencedur                           number(38)
 incmpsqty                                 float(126)
 minmpsqty                                 float(126)
 shrinkagefactor                           number(38)
 item                                      varchar2(50 char)
 loc                                       varchar2(50 char)
 expdate                                   date
 atprule                                   number(38)
 prodcal                                   varchar2(50 char)
 prodstartdate                             date
 prodstopdate                              date
 orderingcost                              float(126)
 holdingcost                               float(126)
 eoq                                       float(126)
 ff_trigger_control                        number(38)
 workingcal                                varchar2(50 char)
 lookaheaddur                              number
 orderpointrule                            number
 orderskudetailsw                          number(1)
 supsdmindmdcovdur                         number(38)
 orderpointminrule                         number(38)
 orderpointminqty                          float(126)
 orderpointmindur                          number(38)
 orderuptolevelmaxrule                     number(38)
 orderuptolevelmaxqty                      float(126)
 orderuptolevelmaxdur                      number(38)
 aggskurule                                number(38)
 fwdbuymaxdur                              number(38)
 costuom                                   number(38)
 cumleadtimedur                            number(38)
 cumleadtimeadjdur                         number(38)
 cumleadtimerule                           number(38)
 roundingfactor                            float(126)
 limitplanarrivpublishsw                   number(1)
 limitplanarrivpublishdur                  number
 maxohrule                                 number(1)
 integration_stamp                         date
 integration_jobid                not null varchar2(32 char)
 error_str                                 varchar2(2000 char)
 error_stamp                               date

The column integration_jobid (the single “not null” column) has been defined with the default value of “INT_JOB”, which takes 7 bytes to store. What’s the result of the query at the end of this little script:

truncate table interr_skuplannparam;

insert into interr_skuplannparam (atpdur) 
select   0 
from     all_objects 
where    rownum <= 10000
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM' -- added later
;

Hint: the value zero is represented internally as a single byte holding the value 0x80 (decimal 128).
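You can check internal storage sizes like this for yourself with dump() – a quick sanity check using literals:

select dump(0)         zero_rep from dual;
select dump('INT_JOB') job_rep  from dual;

The first reports Typ=2 Len=1: 128, the second Typ=96 Len=7 – matching the single byte for zero and the seven bytes for ‘INT_JOB’ mentioned above.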

Answer

It’s fairly common knowledge that Oracle includes a “length byte” in the estimates for average column length and average row length. It’s also fairly well known that “trailing nulls take no space” in a row so don’t need a length byte. There are a couple of “less common knowledge” details to add, though:

  1. any column that is null in every row gets a zero for the avg_col_len even if it isn’t a “trailing null”.
  2. but point 1 doesn’t apply to date columns that are always null – they get a 1 for the avg_col_len even if the column is a “trailing null”. The same is true for the various timestamp and interval types.
  3. for columns that hold at least one value the avg_col_len is the average over all rows of the actual space used by that column’s data, rounded up, plus 1 for the length byte.
  4. the avg_row_len is not the sum(avg_col_len); it is based on the average of the summed column lengths for each row, plus the count of the length bytes recorded.
  5. user-defined types, LOBs, varray types etc. introduce all sorts of other strange effects. (But that’s probably “more common” knowledge.)

So what does that mean in my example where there’s a declared not null column near the end of the row, with two trailing columns and with every column except the first and the non-null column set to null for every single row in the table? The easy option is to create the model and show you the results of querying user_tab_cols.

break on report 
compute sum of avg_col_len on report

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM'
;

select 
        column_id, segment_column_id, column_name, data_type, avg_col_len 
from
         user_tab_cols 
where 
        table_name = 'INTERR_SKUPLANNPARAM'
and     avg_col_len != 0
order by
        column_id
;


============

AVG_ROW_LEN
-----------
         20

1 row selected.


 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME              DATA_TYPE                 AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
         1                 1 ATPDUR                   NUMBER                              2
         4                 4 FIRSTREPLENDATE          DATE                                1
         5                 5 LASTFRZSTART             DATE                                1
         6                 6 LASTPLANSTART            DATE                                1
        29                29 DRPTIMEFENCEDATE         DATE                                1
        36                36 MPSTIMEFENCEDATE         DATE                                1
        43                43 EXPDATE                  DATE                                1
        46                46 PRODSTARTDATE            DATE                                1
        47                47 PRODSTOPDATE             DATE                                1
        73                73 INTEGRATION_STAMP        DATE                                1
        74                74 INTEGRATION_JOBID        VARCHAR2                            8
        76                76 ERROR_STAMP              DATE                                1
                                                                                -----------
sum                                                                                      20

12 rows selected.


My query of user_tab_cols orders by column_id; technically it should order by segment_column_id to show the physical ordering in the data segment, allowing for all the strange effects you can get in more complex scenarios, but in this very simple case the two values are the same.

As you can see, every date type (including the trailing error_stamp) has an avg_col_len of 1, even though all the dates are null in every row. Column atpdur has avg_col_len = 2, which is 1 byte for storing zero plus a length byte, and integration_jobid has avg_col_len = 8, which is 7 bytes for storing ‘INT_JOB’ plus a length byte.

In this case where every single row is identical there are no rounding effects due to the calculation of average column length (the column data stored is the same in every row) so the avg_row_len = sum(avg_col_len).

Conclusion

This was an extreme (though real) production case. I wrote this note because someone was asking why there was such a massive difference between their estimate of the space needed for a table and the actual space taken by the table.

There are many reasons why this could happen, of course, but in their case their estimate was based on multiplying the avg_row_len by the number of rows in the table – and they had a large number of columns that were always null but still required space for the length byte, so the avg_row_len was about half the actual stored space for the average row. (And then the avg_row_len doesn’t allow for the row overhead and row directory entry, which total another 5 bytes per row).
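If you’re faced with the same question, a quick cross-check of estimate against reality is easy enough – a sketch, remembering that user_segments reports allocated space, which includes any free space above the highwater mark:

select
        t.num_rows * t.avg_row_len      est_bytes,
        s.bytes                         seg_bytes,
        s.blocks                        seg_blocks
from
        user_tables     t,
        user_segments   s
where
        t.table_name   = 'INTERR_SKUPLANNPARAM'
and     s.segment_name = t.table_name
;

A large gap between est_bytes and seg_bytes is the prompt to start looking at length bytes, row overheads and block-level free space.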

Lagniappe

It wasn’t part of the quiz, but once you’ve got a simple model you can always start playing with it to find out a little more, and to check how well your guesswork holds up as you make the scenario more complicated. For example, let’s update a few columns in just one row of the table, gather stats again, and report the row and column lengths.

update interr_skuplannparam
set 
        shrinkagefactor = 1234567890,
        item            = 'xxx',
        expdate         = sysdate
where
        rownum = 1
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM'
;

select 
        column_id, segment_column_id, column_name, data_type, avg_col_len 
from
        user_tab_cols 
where 
        table_name = 'INTERR_SKUPLANNPARAM'
and     avg_col_len != 0
order by
        column_id
;

============================

AVG_ROW_LEN
-----------
         22

1 row selected.


 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME              DATA_TYPE                 AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
         1                 1 ATPDUR                   NUMBER                              2
         4                 4 FIRSTREPLENDATE          DATE                                1
         5                 5 LASTFRZSTART             DATE                                1
         6                 6 LASTPLANSTART            DATE                                1
        29                29 DRPTIMEFENCEDATE         DATE                                1
        36                36 MPSTIMEFENCEDATE         DATE                                1
        40                40 SHRINKAGEFACTOR          NUMBER                              2
        41                41 ITEM                     VARCHAR2                            2
        43                43 EXPDATE                  DATE                                2
        46                46 PRODSTARTDATE            DATE                                1
        47                47 PRODSTOPDATE             DATE                                1
        73                73 INTEGRATION_STAMP        DATE                                1
        74                74 INTEGRATION_JOBID        VARCHAR2                            8
        76                76 ERROR_STAMP              DATE                                1
                                                                                -----------
sum                                                                                      25


The total of the avg_col_len has gone up from 20 to 25 – this is two bytes each for the shrinkage_factor and item columns (a tiny average for the stored data, plus 1 for a length byte), and one extra byte for the expdate column (a tiny average for the stored data). All three values rounded up from “1 and a bit” to 2.

The avg_row_len, however, has gone up by only 2 – which I am going to assume is the two new length bytes, with no allowance for the impact of the one row in 10,000 that is now a few bytes longer. It looks as if the rounding rules for the row length may be different from the rounding (up) rules for the column length.

May 5, 2022

Lag/Lead slow

Filed under: Bugs,Oracle,Performance,Problem Solving,Troubleshooting,Tuning — Jonathan Lewis @ 10:05 am BST May 5,2022

This note is about a surprising performance difference between the lead() and lag() analytic functions (which turns out to be due to the behaviour of the nth_value() function) when the option to “ignore nulls” is included in their use (jump to conclusion). The detail I’ll be writing about was highlighted in a thread on the Oracle developer forum about a requirement to add a number of analytic columns to a dataset of 156 million rows using a statement of the following shape:

create table tb_target_ignore
as
select
        pat_id,
        visit_date_dt,
        ed_ucc_dt,
        lag (ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as prev_ed_ucc_dt,
        lead(ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as next_ed_ucc_dt,
        row_number() over (partition by pat_id order by visit_date_dt) as row_num
from
        tb_source
;

You’ll notice that I’ve introduced a row_number(), and both a lead() and a lag() of a column called ed_ucc_dt. All three analytic columns use the same partitioning and ordering, though, so Oracle will only be doing one “window sort” in the execution plan. Part of the performance problem, of course, was that with 156M rows of a couple of dozen existing columns and adding a dozen new columns, the workload due to sorting was bound to be very large, so there were various suggestions of how to minimise that part of the workload.

However Solomon Yakobson pointed out that the code was using the “ignore nulls” option and there was a bug in 11g that made lead() and lag() very slow when this option was used. He subsequently reported that this defect was still present in 19c, restricted it to just the lag() function, and cited a MOS document ID referencing the problem: LAG Function is slow when using ignore nulls (Doc ID 2811596.1). The implication of the MOS note is that we shouldn’t expect this to change.

A follow-up posting by User_H3J7U gave us a reason for the slowness of the lag() function by running a sample query through dbms_utility.expand_sql(). Oracle rewrites the query to use variants of the nth_value() function when you use “ignore nulls”, but rewrites it to use variants of first_value() when you aren’t using the “ignore nulls” option. This isn’t a complete explanation of why lag() should be slow while lead() is not – but it’s a significant pointer towards a possible implementation issue and is a good clue about working around the problem. So let’s build a model of the situation.
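If you want to reproduce the check, here’s a minimal sketch of the call – the documented name of the entry point in recent versions is dbms_utility.expand_sql_text(), which takes an input CLOB and returns the expanded text in an output CLOB (remember to set serveroutput on):

declare
        m_in    clob :=
                'select lag(ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) prev_dt from tb_source';
        m_out   clob;
begin
        -- expand the query to the form the optimizer actually sees
        dbms_utility.expand_sql_text(m_in, m_out);
        dbms_output.put_line(m_out);
end;
/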

The basic model

rem
rem     Script:         lag_ignore_nulls.sql
rem     Author:         Jonathan Lewis / Sam P
rem     Dated:          May 2022
rem     Purpose:     
rem
rem     Last tested
rem             19.11.0.0
rem

create table tb_source (
        pat_id,
        visit_date_dt,
        ed_ucc_dt
)
as
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 1e4    --> comment to avoid wordpress format issue
)
select
        g1.id,
        to_date('01-Apr-2022') + dbms_random.value(0,100),
        to_date('01-Apr-2022') + dbms_random.value(5,105)
--      to_date(null)
from
        generator g1,
        generator g2
where
        g2.id <= 20     --> comment to avoid wordpress format issue
order by
        dbms_random.value
/


spool lag_ignore_nulls.lst

set serveroutput off
set timing on

prompt  ======================
prompt  Without "ignore nulls"
prompt  (My time 0.61 seconds)
prompt  ======================

create table tb_target_no_ignore
as
select
        pat_id,
        visit_date_dt,
        ed_ucc_dt,
        lag (ed_ucc_dt, 1) over (partition by pat_id order by visit_date_dt) as prev_ed_ucc_dt,
        lead(ed_ucc_dt, 1) over (partition by pat_id order by visit_date_dt) as next_ed_ucc_dt,
        row_number() over (partition by pat_id order by visit_date_dt) as row_num
from
        tb_source
;

prompt  ======================
prompt  With "ignore nulls"
prompt  (My time 0.88 seconds)
prompt  ======================

create table tb_target_ignore
as
select
        pat_id,
        visit_date_dt,
        ed_ucc_dt,
        lag (ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as prev_ed_ucc_dt,
        lead(ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as next_ed_ucc_dt,
        row_number() over (partition by pat_id order by visit_date_dt) as row_num
from
        tb_source
;

I’ve created a source table with 200,000 rows, consisting of 10,000 pat_id values, and 20 rows per pat_id. The 20 rows for a pat_id (probably) each have a different visit_date_dt and a different ed_ucc_dt.

After creating the data set I’ve created two more tables using the lead() and lag() functions to generate a previous (lag) and next (lead) ed_ucc_dt for each row, partitioning by pat_id, ordering by visit_date_dt. One statement includes the “ignore nulls” option the other doesn’t and, as you can see, the time to create the “no ignore” table was 0.61 seconds while the time to create the “ignore null” table was 0.88 seconds.

The variation isn’t dramatic – but this is just 200,000 rows, in memory, with only a few columns and only two columns added through lead and lag.

After the baseline test I tweaked the statement that created the table with the “ignore nulls” option to get three more times.

  • With neither lead() nor lag() the time was 0.29 seconds
  • With just the lead() column the time was 0.46 seconds – an increase of 0.17 seconds
  • With just the lag() column the time was 0.71 seconds – an increase of 0.42 seconds

You might note that 0.29 + 0.17 + 0.42 = 0.88 (the time I got for adding both columns) – it’s a little lucky that it looks like a perfect match, but even matching within a couple of hundredths of a second would have been a nice detail. It certainly seems that lag() – with my test data – consumes more resources than lead() for a pair of operations that look as if they should produce the same workloads.

Internal Rewrite

The next step was to check what the internal rewrite of the code looked like, so I passed the select part of the statements (the procedure won’t accept “create as select”) through dbms_utility.expand_sql() and reformatted the results. Here are the two rewritten statements – first without “ignore nulls”:

select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
        decode(
                count(*) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 preceding and 1 preceding
                        ),
                 1,     first_value(a1.ed_ucc_dt) over (
                                partition by a1.pat_id order by a1.visit_date_dt
                                rows between 1 preceding and 1 preceding
                        ),
                        null
        ) prev_ed_ucc_dt,
        decode(
                count(*) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 following and 1 following
                        ),
                 1,     first_value(a1.ed_ucc_dt) over (
                                partition by a1.pat_id order by a1.visit_date_dt
                                rows between 1 following and 1 following
                        ),
                        null
        ) next_ed_ucc_dt,
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1


The code looks a little long and messy, but that’s mainly because just about everything it does happens twice. The lag() function (prev column) turns into a first_value() function that looks at the row preceding the current row in the partition (rows between 1 preceding and 1 preceding). However it first has to count over the same clause to see if a row exists, and then either report its value or report a null – hence the structure decode(count(), 1, first_value(), null).

Note: the full lag() function call is: “lag(expression, offset, default)” where the offset (number of rows to lag) defaults to 1 and the default is the value you want reported when there is no matching row, and defaults to null.
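For example, a hypothetical call reporting the value from two rows back, with a recognisable default when there is no such row:

lag(ed_ucc_dt, 2, date '1900-01-01') over (partition by pat_id order by visit_date_dt)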

The call to lead() basically does the same thing, but uses (rows between 1 following and 1 following) to access the next row in the partition.

On the other hand this is the SQL that Oracle generates when we include the “ignore nulls” clause (which means Oracle can’t restrict the row range to just one preceding or following row):

select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
        nvl(
                nth_value(a1.ed_ucc_dt, 1) from last ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between unbounded preceding and 1 preceding
                ),
                 null
        ) prev_ed_ucc_dt,
        nvl(
                nth_value(a1.ed_ucc_dt, 1)           ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 following and unbounded following
                ),
                null
        ) next_ed_ucc_dt,
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1

Both lag() and lead() turn into nth_value() with a second parameter of 1 (i.e. nth == 1st … which makes you wonder why Oracle isn’t using first_value()), and we can also see the “ignore nulls” still being used.

The lag() call now uses the range (rows between unbounded preceding and 1 preceding) i.e. everything from the start of partition to the previous row, while the lead() call uses the range (rows between 1 following and unbounded following) i.e. from the next row to the end of partition.

The other important detail to note is that the translation of the lag() call also includes the clause “from last” – in other words we want the first row when reading the partition in reverse order, and that might have something to do with the extra time it takes to operate the (translated) lag() function.

Workaround

Oracle is using a generic nth_value() to translate a generic lead()/lag(), but we’re in the special case where we know n = 1, which means we (and Oracle) could use first_value()/last_value(). It’s perfectly reasonable for Oracle’s internal code to avoid special cases if it makes no difference to performance, of course, but maybe in this case we could imitate Oracle’s rewrite to get some benefit.

  • Step 1 – change nth_value() to the appropriate first/last.
  • Step 2 – get rid of the “from last” which won’t be needed with last_value()
  • Step 3 – move the “ignore nulls” to the spot that Oracle wants to see it with first/last

Here’s the resulting SQL – I’ve left the nvl(expression, null) wrapper in place, but if you wanted a null as the default return value for the original lead()/lag() calls you could simplify the code a little further.

create table tb_target 
as
select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
        nvl(
                last_value(a1.ed_ucc_dt ignore nulls) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between unbounded preceding and 1 preceding
                ),
                 null
        ) prev_ed_ucc_dt,
        nvl(
                first_value(a1.ed_ucc_dt ignore nulls) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 following and unbounded following
                ),
                null
        ) next_ed_ucc_dt,
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1
/

  • Run-time: 0.61 seconds.
  • Run-time with just last_value() / preceding: 0.47 seconds
  • Run time with just first_value() / following: 0.43 seconds

There still seems to be a little discrepancy between accessing the preceding data and accessing the following data, but there’s a much better balance than before.

One more edit – taking out the nvl() construct because the original lead()/lag() calls didn’t have a non-null default supplied:

create table tb_target 
as
select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
--
        last_value(a1.ed_ucc_dt ignore nulls) over (
                partition by a1.pat_id order by a1.visit_date_dt
                rows between unbounded preceding and 1 preceding
        ) prev_ed_ucc_dt,
--
        first_value(a1.ed_ucc_dt ignore nulls) over (
                partition by a1.pat_id order by a1.visit_date_dt
                rows between 1 following and unbounded following
        ) next_ed_ucc_dt,
--
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1
/

With this simplification the time dropped by a further couple of hundredths of a second, hovering between 0.57 and 0.58 seconds.

There was one last detail about the test code that should be mentioned – do the changes in code still produce the same results? As a quick and dirty check I ran the following query after each variant of creating the tb_target table (tb_target_ignore is the table created using the original lead()/lag() code with “ignore nulls”):

select  *
from   (
        select * from tb_target minus select * from tb_target_ignore
        union all
        select * from tb_target_ignore minus select * from tb_target
        )
/

The result was always zero rows.

Hypothesis

I raised the idea that the difference in timing for the lead() and lag() functions might have something to do with the volume of data that Oracle could be processing to find the one row it needed.

My thinking was that the lead() rewrite – the call to nth_value(ed_ucc_dt,1) – would simply be looking at the next row in the partition (if it existed) because my data has no nulls that need to be ignored, while the rewrite of the lag() function with its “from last” requirement could be making Oracle re-read the entire preceding section of the partition before starting to process it backwards.

As a possible check to see if this was a viable hypothesis I ran one more test – visible in the initial declaration of tb_source – I created the data with ed_ucc_dt set to null in every single row, so that Oracle would be forced to process from the current position to whichever end of the partition was relevant regardless of whether it was calling lead() or lag().

With this change in place the timings for the lead() only and lag() only statements were nearly identical – which is weak support for the hypothesis.

And once I’d done that test the next obvious test was to increase the size of each partition (using non-null values for ed_ucc_dt) to see if larger partitions would increase the difference between the forward and backward tests. To do this I changed the script to create the tb_source table to produce 5,000 pat_id values with 40 rows per pat_id by changing the where clause to:

where
        g2.id <= 40     --> comment to avoid wordpress format issue
and     g1.id <= 5e3    --> comment to avoid wordpress format issue

With this change in place the timings for the original form of the lead()/lag() statement were:

  • With both lead() and lag() in place the time was 1.05 seconds
  • With neither lead() nor lag() the time was 0.25 seconds
  • With just the lead() column the time was 0.41 seconds – an increase of 0.16 seconds
  • With just the lag() column the time was 0.98 seconds – an increase of 0.73 seconds

So the lag() time (ballpark figures) nearly doubles as the partition size doubles but the lead() time stays pretty much the same.

The results of these two tests do tend to suggest that the generic nth_value() implementation can do some short-circuiting when working “forwards”, using a mechanism that isn’t available when the “from last” clause requires it to work “backwards”.

Writing the previous paragraph prompted me to do one last test – it wouldn’t produce the same results, of course, but I ought to check the performance when I moved the “from last” clause out of the “prev”/lag() column expression into the “next”/lead() column expression in Oracle’s original translation to confirm that the problem was associated with the “from last” and not with the choice of “preceding” or “following” in the row range section of the over() clause. (It was the “from last” that made the difference.)

tl;dr

If you’re using the lag() or lead() functions with “ignore nulls” on a very large dataset you may find that you can rewrite the code with first_value() or last_value() calls that use less CPU. The benefit is probably only significant on fairly large data sets, and may be particularly noticeable for cases where the same over() clause is used many times and the partition sizes are more than a couple of dozen rows each.

The potential for excess CPU usage comes from the effect of a generic internal rewrite using the nth_value() function with the “from last” clause even for the special case where your lead()/lag() have an offset (i.e. n) of 1 which would allow for the use of first_value()/last_value().

To get the correct rewrite you can use dbms_utility.expand_sql() to generate a suitable statement from which you can extract and edit the relevant pieces of text.

Footnote

A further point raised by Solomon Yakobson was that in the generic case (where you can’t use first_value() / last_value() to rewrite the SQL because you want to use an offset greater than one) it’s worth noting that the following two expressions are effectively identical:

        nvl(
                nth_value(a1.ed_ucc_dt, 2) from last ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between unbounded preceding and 1 preceding
                ),
                 null
        ) prev_ed_ucc_dt

        nvl(
                nth_value(a1.ed_ucc_dt, 2)           ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt desc
                        rows between 1 following and unbounded following
                ),
                 null
        ) prev_ed_ucc_dt
/

Note particularly that to avoid the issue of scanning the data “from last”, I’ve changed the ordering in the over() clause to descending and replaced “unbounded preceding and 1 preceding” with “1 following and unbounded following” – reversing the positions of 1 and unbounded. (The same strategy can be applied for the lag() function in the original code.)

In general it’s likely that lead() will be more efficient than lag(), so this change could make a big difference in performance. It’s worth noting, however, that if you’ve been using both lead() and lag() as my example did then your execution plan will (truthfully) show two “window sort” operations, and the second (i.e. lower number in the plan) window sort will also be sorting the columns added by the first window sort. Depending on the nature of your data the additional sort might increase the workload by more than the benefit you get from eliminating a lag().

In an example created by Solomon Yakobson the effect of using lag( ,5) was catastrophic, and the change that introduced the extra sort to get a lead( ,5) made a huge difference (40 seconds down to 0.15 seconds – and I haven’t made a mistake in the decimal point there); in my original test case the improvement I got from applying the same strategy was small (0.88 seconds down to 0.86 seconds). The critical difference between the test cases was that one had a single partition of 20,000 rows, the other had 10,000 partitions of only 20 rows.

(Footnote to the footnote – it did cross my mind that if you had a performance catastrophe with the simple lag(,n) and with the extra sort from reversing the order to use a lead(,n) would Oracle do something clever if you found a way to rewrite the query with a cascade of inline-views that all used last_value() – but I decided I wasn’t going to look at that option until someone paid me to. More seriously I did wonder about the option for using match_recognize to bypass the lag() problem, and may get around to thinking about that more carefully at some point.)

September 5, 2021

Quiz Night

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 6:35 pm BST Sep 5,2021

This little observation came from running a couple of tests while looking at a problem on OTN – hence the odd bit of PL/SQL.

declare
        cursor c1
        is
        select * from t1;

        rec c1%rowtype;
        ch char(1);

begin
        open c1;
        loop
                fetch c1 into rec;
                exit when c1%notfound;
                select null into ch from t1 where id = rec.id;
                select null into ch from t1 where id = rec.id;
                select null into ch from t1 where id = rec.id;
        end loop;
        close c1;
end;

There is a simple B-tree index on t1(id), and t1 was populated with the values from 1 to 10,000 in order. A query against user_tables (immediately after gathering stats) reported 20 blocks and 10,000 rows. This is NOT an attempt to trick anyone; it’s simply demonstrating a surprising result.

The trace file shows a tablescan for the driving cursor and an index(-only) path for the three identical queries inside the loop.

Question: if you look at the session activity stats (v$mystat) from running this anonymous PL/SQL block, what values (ballpark figures) would you expect to see for the following statistics:

table scans (short tables)
table scan rows gotten
table scan blocks gotten

If it’s any help: when I counted rows per block I had 15 blocks with 657 rows and one block with 145 rows. (The 20 blocks reported in the table stats is the number of formatted blocks below the highwater mark, but some may be empty and some may be space management (bitmap) blocks).
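If you want to test your answer before reading on, here’s a sketch of a query to report those three statistics for your own session – run it before and after the PL/SQL block and take the difference:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'table scans (short tables)',
                'table scan rows gotten',
                'table scan blocks gotten'
        )
order by
        sn.name
;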

Answers

For my test data, block size, and segment space management the results I got for 19.11.0.0 (and they’d be pretty much the same for any version back to 10g) were as follows:

table scans (short tables)                  1
table scan rows gotten              6,495,760
table scan blocks gotten               10,000

Sneak insight of where that 6.5M comes from – I said I had 15 blocks with 657 rows and one with 145 rows:

15 * (657 * 657) + (145 * 145) = 6,495,760

When the PL/SQL loop is doing single row fetches it releases the table block after each row, so it has to do 10,000 gets on the table – which are all counted towards the “table scan blocks gotten”. Then, to fetch one row from a block the code seems to believe that it has to visit every single row in the block, hence the bit of arithmetic that produced a row scan count of around 6.5M for this “short table” tablescan.

When I ran up this test initially the three indexed accesses in the middle were emulating the code supplied in the original statement of the problem but, as Viacheslav Andzhich noted in his follow-up to comment #3, they don’t have anything to do with this effect.

We could change from an explicit cursor loop to an implicit cursor loop as follows, of course:

declare
        cursor c1
        is
        select * from t1;

        ch char(10) := '0';

begin
        for rec in c1 loop
                  select null into ch from t1 where id  = rec.id;
                  select null into ch from t1 where id  = rec.id;
                  select null into ch from t1 where id  = rec.id;
        end loop;
end;
/

While you might think this won’t make much (or any) difference you need to remember that quite a long time ago Oracle introduced a PL/SQL optimisation behind the scenes that turns the single row loop into an array fetch of 100 rows (see footnote), so the table scan results I got were as follows:

table scans (short tables)                 1
table scan rows gotten                74,531
table scan blocks gotten                 115

Again the numbers are higher than we might expect but rather more realistic. Again the results are due to double (and treble, and quadruple…) counting at the start of each array fetch, but I haven’t yet tried to work out the exact numbers in detail – though you can see that since I have roughly 7 arrays per block it’s not entirely surprising that the “blocks gotten” should be somewhere in the region of 7 * “data blocks below hwm”.

Addendum

It’s hard to leave out a little detail when the numbers don’t quite fit expectations – so I enabled the 10202 trace at level 4 while running the second loop and checked the resulting trace file for the buffer header dumps as the tablescan took place. (I commented out the 3 indexed accesses before I did this). Here’s a sample of the dump for one buffer access:

buffer tsn: 6 rdba: 0x03400084 (13/132)
scn: 0x82af1e seq: 0x02 flg: 0x04 tail: 0x0201002c
frmt: 0x02 chkval: 0x716f type: 0x06=trans data
Hex dump of corrupt header 4 = CORRUPT
Dump of memory from 0x000000009B0BE000 to 0x000000009B0BE014
09B0BE000 0000A206 03400084 0082AF1E 04020000  [......@.........]
09B0BE010 0000716F                             [oq..]
Consistent read complete...

Once I’d got this trace file I just extracted all the lines starting with “buffer tsn”, did a sort with count on them, and then worked out (and annotated each block address in turn) how each block had contributed to multiples of 100 rows. The results were as follows – and the answer was 115 buffer visits as reported:

grep "^buffer tsn" or21_ora_10141.trc | sort | uniq -c >temp.txt
vi temp.txt

      7 buffer tsn: 6 rdba: 0x03400084 (13/132)              600 + 57
      8 buffer tsn: 6 rdba: 0x03400085 (13/133)         43 + 600 + 14
      7 buffer tsn: 6 rdba: 0x03400086 (13/134)         86 + 500 + 71
      8 buffer tsn: 6 rdba: 0x03400087 (13/135)         29 + 600 + 28
      7 buffer tsn: 6 rdba: 0x03400088 (13/136)         72 + 500 + 85
      8 buffer tsn: 6 rdba: 0x03400089 (13/137)         15 + 600 + 42
      7 buffer tsn: 6 rdba: 0x0340008a (13/138)         58 + 500 + 99
      8 buffer tsn: 6 rdba: 0x0340008b (13/139)          1 + 600 + 56
      8 buffer tsn: 6 rdba: 0x0340008c (13/140)         44 + 600 + 13
      7 buffer tsn: 6 rdba: 0x0340008d (13/141)         87 + 500 + 70
      8 buffer tsn: 6 rdba: 0x0340008e (13/142)         30 + 600 + 27
      7 buffer tsn: 6 rdba: 0x0340008f (13/143)         73 + 500 + 84
      8 buffer tsn: 6 rdba: 0x03400090 (13/144)         16 + 600 + 41
      7 buffer tsn: 6 rdba: 0x03400091 (13/145)         59 + 500 + 98
      8 buffer tsn: 6 rdba: 0x03400092 (13/146)          2 + 600 + 55
      2 buffer tsn: 6 rdba: 0x03400093 (13/147)         45 + 100

To finish off the arithmetic, this also shows 113 visits to blocks with 657 rows and 2 visits to blocks with 145 rows: (113 * 657) + (2 * 145) = 74,531 (as reported by “table scan rows gotten”).

Setup

If you want to do some experimentation on variations, here’s the code I used to create my test data.

rem
rem     Script:         tablescan_joke.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2021
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem
rem     Notes:
rem     When you do a tablescan with single row fetches
rem     inside a pl/sql loop, every row fetch is a block
rem     fetch and the row count increments by the number
rem     of rows in the block.
rem
rem     In 19.3 a call to rowid_count gave me the result
rem     ROWS_STARTING_IN_BLOCK     BLOCKS
rem     ---------------------- ----------
rem                        145          1
rem                        657         15
rem     So total rows =
rem             657 * (10000 - 145) +
rem             145 * 145
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(id);

select  blocks, num_rows 
from    user_tables
where   table_name = 'T1'
;

The code I used to check the number of rows per block is at this URL.

Footnote

In 10g Oracle introduced the plsql_optimize_level parameter to affect the way in which the PL/SQL optimizer was allowed to manipulate your code. The default level is 2, and at this level an implicit cursor for loop will use an array fetch size of 100 behind the scenes. If you want to disable this optimisation you need to set the parameter value down to 1 (or zero).
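As a sketch of the test – the session setting affects code compiled after the change, which includes subsequent anonymous blocks:

alter session set plsql_optimize_level = 1;

-- re-run the implicit cursor-loop block here: the table scan
-- statistics should revert to the single-row-fetch figures

alter session set plsql_optimize_level = 2;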

August 5, 2021

Finding SQL

Filed under: dbms_xplan,Oracle,Problem Solving,Troubleshooting — Jonathan Lewis @ 6:24 pm BST Aug 5,2021

There are some questions about performance issues for which there is no easy answer, and sometimes the best you can do is try to generate an approximate answer then examine the results to eliminate the innocent.

Three such problems – essentially similar – appeared recently on the Oracle Developer Forum, and in this note I’ll supply a mechanism that may be a good enough step in the right direction for at least two of them. The questions were:

  • How do I find who’s been using up all the temporary space recently?
  • How do I find the execution plans for all the SQL that is called by a procedure?
  • What SQL is responsible for generating most redo?

A basic (but incomplete) strategy for attacking these questions is to think of a way to identify the sql_id and child_number for statements that might be contributing to the problem. If you can think of a suitable attack then you can get all the execution plans (or SQL Monitor reports) for those statements and examine them further.

The strategy is incomplete on two counts – first that you won’t find the perfect set of statements: you’ll get more than you need but still miss some that are relevant; second that you’ll probably have to run secondary queries to get extra details about statements that look like good culprits for the problem you’re trying to solve.

Who’s been using the temporary space

The way this question was actually posed was more like:

I’ve got a query that started crashing with Oracle error “ORA-01652: unable to extend temp segment by 256 in tablespace TEMP”, but when I check the contents of the temporary tablespace (TEMP) there’s plenty of space available. How do I fix this problem?

This often means your query has changed execution plan and picked a very bad join order with some hash (or merge) joins that have dumped huge amounts of data to disc; but it may mean that the space was being taken up by some other activity that doesn’t usually happen when you’re running your query.

So if you find that the SQL Monitor report (or a simple call to dbms_xplan.display_cursor) makes you think that your query wasn’t doing anything differently, a step that may help is to find all the SQL in the library cache that might have been using a lot of temporary space.

For temporary space we can always check v$sql_workarea to identify operations from plans that ran as one-pass or multi-pass operations, and we can check their most recent “tempseg size” or maximum tempseg size. Hence:

rem
rem     Script:         check_full_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2021
rem
rem     Columns of interest for temporary space:
rem     onepass_executions, multipasses_executions, last_tempseg_size, max_tempseg_size

select  
        distinct sql_id, child_number
from    v$sql_workarea 
where   onepass_executions != 0
or      multipasses_executions != 0
-- or      max_tempseg_size > 1e7
/


select
        distinct
        sql.sql_id, sql.child_number
from
        v$sql           sql,
        v$sql_workarea  war
where
        (   war.onepass_executions != 0
         or war.multipasses_executions != 0
        )
and     sql.sql_id = war.sql_id
and     sql.child_number = war.child_number
and     sql.last_active_time > sysdate - 15/1440
;

The first query here simply scans through the v$sql_workarea structure (which means it will actually thrash its way through the library cache) looking for operations that spilled to disk or (commented out) have used at least some specified amount of memory.

The second variation joins v$sql_workarea to v$sql so that it can restrict the chosen statements to those that were active some time in the last 15 minutes.

Obviously you will be able to think of other ways of tweaking these statements, and once you’ve got a statement expressing a suitable set of criteria you can embed it into a query that calls dbms_xplan.display_cursor() – as I demonstrated about 10 years ago – or dbms_sql_monitor.report_sql_monitor() if you’re suitably licensed.

set linesize 230
set trimspool on
set pagesize 90
set tab off

set long 20000


select
        plan_table_output
from    (
        select  
                distinct sql_id, child_number
        from    v$sql_workarea 
        where   onepass_executions != 0
        or      multipasses_executions != 0
        ) v,
        table(dbms_xplan.display_cursor(v.sql_id, v.child_number, 'memstats'))
;


select
        dbms_sql_monitor.report_sql_monitor(
                sql_id             => v.sql_id,
                start_time_filter  => sysdate - 15/1440,
                type               =>'TEXT'
        ) text_line
from    (
        select
                distinct sql.sql_id
        from
                v$sql           sql,
                v$sql_workarea  war
        where
                (   war.onepass_executions != 0
                or war.multipasses_executions != 0
                )
        and     sql.sql_id = war.sql_id
        and     sql.child_number = war.child_number
        and     sql.last_active_time > sysdate - 15/1440
        ) v
/



A couple of points to note. I’ve included the MEMSTATS format option in the call to dbms_xplan.display_cursor() so that it shows some summary information from v$sql_workarea. However this does have a defect: it doesn’t show space used in temp by materialized “with” subqueries (CTEs) – which is where the call to dbms_sql_monitor.report_sql_monitor() helps, because if the execution was captured it will show writes to disc in the “LOAD AS SELECT” operation under the TEMPORARY TABLE TRANSFORMATION operation.

Here’s a sample of output I got from the two queries after forcing a nasty plan to do a big hash join that ultimately produced a small result set.

First the output from the query using package dbms_xplan:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------
with ttemp as (  select /*+ materialize */ * from t1 ) select  /*+
no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */
t1a.object_type,  max(t1a.object_name) from  ttemp t1a, ttemp t1b where
 t1a.object_id = t1b.object_id group by  t1a.object_type order by
t1a.object_type

Plan hash value: 1682228242

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                      | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem |  O/1/M   | Max-Tmp |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                           |      1 |        |     45 |00:00:16.24 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                           |      1 |        |     45 |00:00:16.24 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6645_5FF4FE |      1 |        |      0 |00:00:02.53 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                        |      1 |   1154K|   1154K|00:00:00.55 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                           |      1 |     45 |     45 |00:00:13.72 | 11264 | 11264 |     1/0/0|         |
|*  5 |    HASH JOIN                             |                           |      1 |     18M|     18M|00:00:08.63 |    58M|    12M|          |      71M|
|   6 |     VIEW                                 |                           |      1 |   1154K|   1154K|00:00:01.03 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6645_5FF4FE |      1 |   1154K|   1154K|00:00:01.03 |       |       |          |         |
|   8 |     VIEW                                 |                           |      1 |   1154K|   1154K|00:00:00.49 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6645_5FF4FE |      1 |   1154K|   1154K|00:00:00.43 |       |       |          |         |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


30 rows selected.



Note the absence of any numbers in the Max-Tmp column for operation 2 (the LOAD AS SELECT).

Then compare with the results below from the query using package dbms_sql_monitor:

TEXT_LINE
-----------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
with ttemp as ( select /*+ materialize */ * from t1 ) 
select /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ t1a.object_type, max(t1a.object_name) 
from ttemp t1a, ttemp t1b where t1a.object_id = t1b.object_id 
group by t1a.object_type order by t1a.object_type

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SYS (10:47180)
 SQL ID              :  1cwabt12zq6zb
 SQL Execution ID    :  16777216
 Execution Started   :  08/05/2021 17:24:56
 First Refresh Time  :  08/05/2021 17:25:00
 Last Refresh Time   :  08/05/2021 17:25:12
 Duration            :  16s
 Module/Action       :  MyModule/MyAction
 Service             :  orclpdb
 Program             :  sqlplus@linux183.localdomain (TNS V1-V3)
 Fetch Calls         :  4

Global Stats
================================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes | Reqs  | Bytes |
================================================================================
|      16 |      14 |     2.23 |     4 |  84677 |  986 | 253MB |   866 | 219MB |
================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1682228242)
=====================================================================================================================================================================================================================
| Id |                 Operation                  |           Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Temp  | Activity | Activity Detail |
|    |                                            |                           | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) |   (%)    |   (# samples)   |
=====================================================================================================================================================================================================================
|  0 | SELECT STATEMENT                           |                           |         |       |         1 |    +16 |     1 |       45 |      |       |       |       |     . |     . |          |                 |
|  1 |   TEMP TABLE TRANSFORMATION                |                           |         |       |         1 |    +16 |     1 |       45 |      |       |       |       |     . |     . |          |                 |
|  2 |    LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6645_5FF4FE |         |       |         5 |     +0 |     1 |        2 |      |       |   351 | 176MB |     . |     . |          |                 |
|  3 |     TABLE ACCESS FULL                      | T1                        |      1M |  3166 |         1 |     +4 |     1 |       1M |      |       |       |       |     . |     . |          |                 |
|  4 |    SORT GROUP BY                           |                           |      45 | 17727 |        11 |     +6 |     1 |       45 |      |       |       |       | 10240 |     . |          |                 |
|  5 |     HASH JOIN                              |                           |     18M |  9804 |        13 |     +4 |     1 |      18M |  561 |  66MB |   561 |  66MB |  59MB |  71MB |          |                 |
|  6 |      VIEW                                  |                           |      1M |  2966 |         1 |     +4 |     1 |       1M |      |       |       |       |     . |     . |          |                 |
|  7 |       TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6645_5FF4FE |      1M |  2966 |         2 |     +3 |     1 |       1M |  351 | 176MB |       |       |     . |     . |          |                 |
|  8 |      VIEW                                  |                           |      1M |  2966 |         3 |     +6 |     1 |       1M |      |       |       |       |     . |     . |          |                 |
|  9 |       TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6645_5FF4FE |      1M |  2966 |         5 |     +4 |     1 |       1M |   74 |  12MB |       |       |     . |     . |          |                 |
=====================================================================================================================================================================================================================


1 row selected.

In this report you can see that the materialization resulted in 176MB of data being written to temp at operation 2 (highlighted line 43) – which is more than the 71MB the hash join wrote to temp when it spilled from memory.

There are a number of defects in this call to dbms_sql_monitor – including the need for an extra-cost license. Most particularly (a) the plan is captured only if the query runs for more than 6 seconds or has a parallel component, and (b) we’re only passing in the SQL_ID with a time limit, so we could get several executions reported. We could refine the inputs, though, by including the sql_plan_hash_value in our query against v$sql, and we might include an end-time filter.
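
For example – purely a sketch, not the query from the original test, and you should check the parameter list of dbms_sql_monitor.report_sql_monitor in your version – the driving query might become something like:

select
        dbms_sql_monitor.report_sql_monitor(
                sql_id       => sq.sql_id,
                type         => 'TEXT',
                report_level => 'ALL'
        )       text_line
from    (
        select  distinct
                sql_id
        from    v$sql
        where   sql_id           = '1cwabt12zq6zb'
        and     plan_hash_value  = 1682228242
        and     last_active_time > sysdate - 10/1440
        ) sq
/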

Whatever we do to minimise the number of plans reported, the point will almost certainly come where we have to eyeball the data to see if we can identify the queries that were running and using the temp space we needed.
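
One thing that can cut down the eyeballing – a sketch only, and bear in mind that querying v$active_session_history requires the diagnostic pack licence – is that each ASH sample records the session’s temp_space_allocated, so you can aggregate by sql_id across the critical interval:

select
        sql_id,
        round(max(temp_space_allocated)/1048576)        temp_mb
from
        v$active_session_history
where
        sample_time between timestamp'2021-08-05 17:24:00'
                        and timestamp'2021-08-05 17:26:00'
group by
        sql_id
order by
        temp_mb desc
;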

How do I find the execution plans for all the SQL that is called by a procedure?

To be written

What SQL is responsible for generating most redo?

To be written

April 13, 2021

CSV

Filed under: Infrastructure,Oracle,Problem Solving — Jonathan Lewis @ 3:00 pm BST Apr 13,2021

An old question with a little extra twist came up on the Oracle Developer forum recently – someone wanted to export a table into a CSV file but wanted to do a character set conversion at the same time because various “non-printable” characters were getting corrupted.

Looking at the code they had supplied they were running from SQL*Plus and were trying to use a select statement to generate a script that, for each row in a table, produced a statement of the form:

insert into tableX({list of columns}) values({list of values});

This worked reasonably well when they didn’t attempt to get a UTF8 output, but when they modified their client O/S environment to make SQL*Plus produce the desired UTF8 output the operation was much slower, increasing from about 1 minute to more than half an hour. (This may simply be the overhead of SQL*Net translating each line of output from the server character set to the client character set.)

Since the system in question was running on 12.2 I suggested a change of strategy so that they could at least take advantage of a built-in CSV generator to see how this behaved. Oracle extended the “set markup” command in 12c to include a CSV option:

set markup csv on delimiter {character} quote on

If this behaves nicely with the user’s attempt to get the output in their chosen character set then a change of strategy – generating a “proper” CSV file and reloading it through an external table rather than generating and executing a list of “insert ….” statements – might do a lot less work. (Unfortunately they had a requirement to generate the insert statements rather than adopting this bulk strategy – but I wasn’t going to let that stop me finishing this note.)

One detail to note with this markup option is that, unlike the “delimiter” character, the “quote” character cannot be specified and will always be the double-quote (ASCII character 34). If this character appears in any of the reported strings Oracle will “escape” it by duplicating it. Here’s a little example to demonstrate the effects – first a little data with a basic output (running 12.2.0.1):

rem
rem     Script:         markup.csv.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem

create table t1 (v1 varchar2(10), n1 number, n2 number, v2 varchar2(10));

insert into t1 values('abcdef',1,1,'qwer');
insert into t1 values('abc"def',2,null,'asdf');
insert into t1 values (null,3,3,null);

commit;

prompt  ==================
prompt  markup not yet set
prompt  ==================

select * from t1;


V1                 N1         N2 V2
---------- ---------- ---------- ----------
abcdef              1          1 qwer
abc"def             2            asdf
                    3          3

3 rows selected.

I’ve included one row with a double-quote in one of the columns, and a couple of rows with NULLs in various columns. Now I just switch markup to CSV using its default options:

prompt  ==================
prompt  Default CSV markup
prompt  ==================

set markup csv on
select * from t1;


"V1","N1","N2","V2"
"abcdef",1,1,"qwer"
"abc""def",2,,"asdf"
,3,3,

3 rows selected.


As you can see this has behaved nicely – every column is present (even when empty – judging by the number of delimiters), character strings have been quoted with the double-quote character, and the stored double-quote in the second row has been duplicated. The default option for CSV should be nearly perfect for use with an external table – though we probably want to get rid of column headings, page breaks, feedback etc.

Before creating a suitable external table, let’s play around with a couple of variations for the options:

prompt  ===========
prompt  delimiter =
prompt  ===========

set heading off
set pagesize 0
set feedback off

set markup csv on delimiter =
select * from t1;

"abcdef"=1=1="qwer"
"abc""def"=2=="asdf"
=3=3=

prompt  ================================
prompt  quote off (delimiter remembered)
prompt  ================================

set markup csv on quote off
select * from t1;

abcdef=1=1=qwer
abc"def=2==asdf
=3=3=
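
Before reading anything back I spooled the default CSV output to a file – something like the following sketch; the file name is the one that appears in the location clause of the external table below, and note that the delimiter has to be set back to the comma explicitly because SQL*Plus remembers the previous setting:

set markup csv on delimiter , quote on
set heading off
set pagesize 0
set feedback off

spool markup_csv.lst
select * from t1;
spool off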

Now to read back one of the outputs – I’ll pick the default output for this (but with headers etc. off to leave just three lines of data). Here’s how I can define my external table to read back the values from the file:

create or replace directory ext_tab as '/mnt/working/';
create or replace directory ext_log as '/mnt/working/';

create table ext(
        v1 varchar2(10), 
        n1 number, 
        n2 number, 
        v2 varchar2(10)
)
organization external(
        type oracle_loader
        default directory ext_tab
        access parameters (
                records delimited by newline 
                discardfile     ext_log:'exttab_dis.txt'
                badfile         ext_log:'exttab_bad.txt'
                logfile         ext_log:'exttab_log.txt'
        fields 
                csv 
                with embedded
        missing field values are null
        )
        location ('markup_csv.lst')
)
reject limit 10
;

select * from ext;

V1                 N1         N2 V2
---------- ---------- ---------- ----------
abcdef              1          1 qwer
abc"def             2            asdf
                    3          3

3 rows selected.

I’ve created a couple of directories to get started – linking a physical location to a logical name that I can use in the external table definition.

The key details that allow Oracle to read the file correctly appear in the “fields” specification: it’s a pity that the output from the default CSV markup doesn’t become the default expectation for the external table CSV option, but I needed to use the “with embedded” to adjust the duplicated “double-quote” characters and then add “missing field values are null” to cope with a null column at the end of the line.

Summary

With a simple call to “set markup csv on” (and a handful of other set {formatting} commands) it’s very easy to turn a table into a flat CSV file that is comma separated and (double)quote delimited.

It’s then very easy – but requires two non-default settings – to read this flat file back into an Oracle database using “fields CSV” as the basic formatting directive.

Footnote

I’ve kept the example very simple, but as soon as you start dealing with dates and timestamps – and even very large numbers – you’ll have to worry about formatting when generating the flat file (set nls_date_format, column n1 format 99999, etc.) to make sure you don’t lose precision on the output; and for dates and timestamps you’ll have to include the equivalent date_format options in the external table definition.
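
For example – a sketch only, assuming the table also held a date column called d1 (a name I’ve invented for the illustration) – you might fix the date format for the spool and then echo it in an explicit field list in the access parameters:

alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';

rem     and in the external table access parameters:
rem
rem     fields
rem             csv
rem             with embedded
rem             missing field values are null
rem             (
rem                     v1 char(10),
rem                     n1 char(20),
rem                     n2 char(20),
rem                     v2 char(10),
rem                     d1 char(19) date_format date mask "yyyy-mm-dd hh24:mi:ss"
rem             )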

I’ve only demonstrated how to reload the file back into Oracle as an external table. If you wanted to use this markup option to prepare the data for an Excel spreadsheet all you have to do is spool to a file with the “.csv” extension and Excel will be able to open the output file and parse the data correctly into a multi-column display. (But see comment #1 below for a warning from Jason Bucata.)

March 12, 2021

Distributed Sequences

Filed under: distributed,Execution plans,Oracle,Performance,Problem Solving — Jonathan Lewis @ 9:09 am GMT Mar 12,2021

A request for help came up some time ago on ODC reporting a query that was hanging when it included a sequence.nextval. In fact the initial “query” was an “insert as select” with a select that was a join of two remote tables.

Making the fairly automatic assumption that many people say “hanging” when they really mean “hasn’t finished yet”, my first thought about the structure of the statement was that it was just a variation of the standard problem of distributed DML. I haven’t previously written about how using sequences can introduce the problem, so here’s a note to demonstrate the issue and suggest a workaround:

We start with a database link and a few tables:


rem
rem     Script:         distributed_sequence.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1
rem

rem     create public database link orcl@loopback using 'orcl';

define m_target=orcl@loopback

create sequence s1 cache 10000;
select s1.nextval from dual;

create table t1
segment creation immediate
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n2,
        lpad(rownum,10,'0')             v2,
        lpad('x',100,'x')               padding
from
        generator       v1
;

create table t3(
        id1     number(6,0),
        id2     number(6,0),
        n0      number(6,0),
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(10),
        v2      varchar2(10)
)
segment creation immediate
;

create or replace view v1 as
select
        t1.id id1,
        t2.id id2,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1, t2
where
        t2.id = t1.id
;

The command to create a public database link (one example of the many optional commands in my original source) has to be run by a suitably privileged schema as a one-off event. The definition of the substitution variable m_target that I’ll be using as my database link (again with many possibilities in my original script) means I only have to edit my choice of database link once in my script as I change Oracle instances. You’ll notice I’ve done my usual trick of using a loopback database link to emulate a distributed system.

I’ve then created two populated tables (t1, t2) and a third empty table (t3) that will be the target of an insert. I’ve also created a view (v1) that joins the first two tables and a sequence (s1) that I’ve primed with a single call to nextval. When I get to the tests I’ll be using t1, t2 and v1 as if they had been created in the remote database (referenced through my loopback database link) while t3 and s1 will be local objects.

So let’s run a couple of statements and see what their execution plans look like:


set serveroutput off

prompt  =======================
prompt  Insert without sequence
prompt  =======================

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        t1.id,
        t2.id,
        0,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1@&m_target    t1,
        t2@&m_target    t2
where
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));

prompt  =======================
prompt  Insert WITH sequence
prompt  =======================

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        t1.id,
        t2.id,
        s1.nextval,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1@&m_target    t1,
        t2@&m_target    t2
where
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;

The code joins t1 and t2, selects a couple of columns, and inserts them into t3 – first without, then with, a call to s1.nextval (the local sequence). Here’s the output (with minor cosmetic changes) from executing this code under 19.3.0.0 (the effects are the same in 18.3.0.0 and 12.2.0.1):


=======================
Insert without sequence
=======================

10000 rows created.


SQL_ID  373nz810u3frv, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  t1.id,  t2.id,
0,  t1.n1,  t2.n2,  t1.v1,  t2.v2 from  t1@orclpdb@loopback t1,
t2@orclpdb@loopback t2 where  t2.id = t1.id

-----------------------------------------------------------------
| Id  | Operation                | Name | Cost  | Inst   |IN-OUT|
-----------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |        |      |
|   2 |   REMOTE                 |      |       | ORCLP~ | R->S |
-----------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - SELECT "A2"."ID","A1"."ID",0,"A2"."N1","A1"."N2","A2"."V1","A1"."
       V2" FROM "T1" "A2","T2" "A1" WHERE "A1"."ID"="A2"."ID" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - cpu costing is off (consider enabling it)


=======================
Insert WITH sequence
=======================

10000 rows created.


SQL_ID  8jg23arujnh01, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  t1.id,  t2.id,
s1.nextval,  t1.n1,  t2.n2,  t1.v1,  t2.v2 from  t1@orclpdb@loopback
t1,  t2@orclpdb@loopback t2 where  t2.id = t1.id

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    54 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|*  3 |    HASH JOIN             |      | 10000 |   937K|    54   (8)| 00:00:01 |        |      |
|   4 |     REMOTE               | T1   | 10000 |   468K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
|   5 |     REMOTE               | T2   | 10000 |   468K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"="T1"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","N1","V1" FROM "T1" "A2" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

   5 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","N2","V2" FROM "T2" "A1" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

The key thing to notice is that when we want to insert the local sequence as a column in the select list Oracle breaks the hash join into two separate accesses to the remote database and pulls all the data we might need from the two tables before trying to join them locally. When the query is “fully remote” the local database can let the remote database deal with the join, when the query is distributed – which is a side effect of introducing the sequence – the local site becomes the driving site and has to work out the least worst way of handling the join, which might be much slower than the remote join.  (It’s an odd little quirk that when the select is fully remote the optimizer thinks that it’s not using CPU costing. Possibly that’s because all the arithmetic happens somewhere else and the local cost of the query never gets above zero.)

This is one of those cases where we might work around the problem by creating a remote view to handle the join – hence the creation of the view v1; here’s a suitable statement and the resulting execution plan:

prompt  ==============================
prompt  Insert using view and sequence
prompt  ==============================

set serveroutput off

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        v1.id1,
        v1.id2,
        s1.nextval,
        v1.n1,
        v1.n2,
        v1.v1,
        v1.v2
from
        v1@&m_target v1
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;



==============================
Insert using view and sequence
==============================

10000 rows created.


SQL_ID  4tz0rrqt87nb8, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  v1.id1,  v1.id2,
s1.nextval,  v1.n1,  v1.n2,  v1.v1,  v1.v2 from  v1@orclpdb@loopback v1

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    27 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|   3 |    REMOTE                | V1   | 10000 |   937K|    27   (8)| 00:00:01 | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID1","ID2","N1","N2","V1","V2" FROM "V1" "V1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

As you can see, the local optimizer doesn’t know enough about the remote view to be able to split it into components and make a mess of the execution plan; it simply sees a query against a “single table” and leaves the remote database to worry about optimising it. But, of course, we don’t always have the luxury of being able to create objects on someone else’s database, so what’s the alternative?

Try rewriting the query to use an inline view with the /*+ no_merge() */ hint:


prompt  =======================================
prompt  Insert from a no-merge inline view with
prompt  a sequence.nextval in the outer query
prompt  ========================================

set serveroutput off

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        id1, id2, s1.nextval, n1, n2, v1, v2
from    (
        select  /*+ no_merge */
                t1.id   id1,    
                t2.id   id2,
                0       n0,
                t1.n1,
                t2.n2,
                t1.v1,
                t2.v2
        from
                t1@&m_target    t1,
                t2@&m_target    t2
        where
                t2.id = t1.id
        )       v1
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;


=======================================
Insert from a no-merge inline view with
a sequence.nextval in the outer query
========================================

10000 rows created.


SQL_ID  20z81g550tbsk, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  id1, id2,
s1.nextval, n1, n2, v1, v2 from (  select /*+ no_merge */   t1.id id1,
 t2.id id2,   0 n0,   t1.n1,   t2.n2,   t1.v1,   t2.v2  from
t1@orclpdb@loopback t1,   t2@orclpdb@loopback t2  where   t2.id = t1.id
 ) v1

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    54 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|   3 |    VIEW                  |      | 10000 |   937K|    54   (8)| 00:00:01 |        |      |
|   4 |     REMOTE               |      |       |       |            |          | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   4 - SELECT /*+ NO_MERGE */ "A2"."ID","A1"."ID",0,"A2"."N1","A1"."N2","A2"."V1","A1"."V
       2" FROM "T1" "A2","T2" "A1" WHERE "A1"."ID"="A2"."ID" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )


The hint has done exactly what we needed: it has passed the text of the inline view to the remote database for optimisation so the join happens remotely, and the sequence number is then included after the result set comes back from the remote database. The SQL is a little messier, of course, mostly thanks to the doubled appearance of the columns in the select list.

Summary

Using a local sequence in DML that accesses a remote database makes the optimizer treat the underlying query as a distributed query, and this may mean it can’t find an efficient execution path unless you do some re-engineering of the code. If you can make the insert efficient by substituting a constant for the sequence then using that version of the code as an in-line no_merge view, with one extra layer that brings the sequence into play, may be all you need to do to make the DML operate efficiently.

November 26, 2020

Row sizes 3

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 12:08 pm GMT Nov 26,2020

Several years ago I published a couple of examples of scripts that I’d been using since Oracle 6 (with minor changes over time) to analyse the content of tables in some detail. The first was a breakdown of the lengths of the rows in the table; the second was a map showing the distribution of nulls in the rows. There used to be a third component of the analysis which produced a report of the number of non-null entries for each column in the table, but by the time I published the first two parts there didn’t seem to be much point in publishing the third since you could get a sufficiently accurate picture by querying view user_tab_columns after gathering stats:


break on num_rows

select
        tab.num_rows, col.column_name , col.num_nulls
from 
        user_tables             tab
join
        user_tab_columns        col
on
        col.table_name = tab.table_name
where
        tab.table_name = 'T1' 
order by 
        col.num_nulls desc, col.column_id
/


  NUM_ROWS COLUMN_NAME           NUM_NULLS
---------- -------------------- ----------
     56777 EDITION_NAME              56777
           CREATED_APPID             56777
           CREATED_VSNID             56777
           MODIFIED_APPID            56777
           MODIFIED_VSNID            56777
           SUBOBJECT_NAME            56570
           DATA_OBJECT_ID            55353
           DEFAULT_COLLATION         51058
           EDITIONABLE               40216
           OWNER                         0
           OBJECT_NAME                   0
           OBJECT_ID                     0
           OBJECT_TYPE                   0
           CREATED                       0
           LAST_DDL_TIME                 0
           TIMESTAMP                     0
           STATUS                        0
           TEMPORARY                     0
           GENERATED                     0
           SECONDARY                     0
           NAMESPACE                     0
           SHARING                       0
           ORACLE_MAINTAINED             0
           APPLICATION                   0
           DUPLICATED                    0
           SHARDED                       0


In this example t1 is a copy of view all_objects and I’ve listed the columns in order of most frequently null with the table’s num_rows reported at the top for comparison purposes. If you were going to do something that made it a good idea to rebuild the table you could use this report as a guide to re-arranging the column order to position the most frequently null columns towards the end of the row (saving the “trailing nulls take no space” length bytes).
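
Here’s a sketch of what that rebuild might look like for this copy of all_objects (t1_new is just a name I’ve invented, and the ordering comes straight from the num_nulls figures above – zero-null columns first, always-null columns last):

create table t1_new
as
select
        owner, object_name, object_id, object_type, created,
        last_ddl_time, timestamp, status, temporary, generated,
        secondary, namespace, sharing, oracle_maintained,
        application, duplicated, sharded,
        editionable,            -- 40,216 nulls
        default_collation,      -- 51,058 nulls
        data_object_id,         -- 55,353 nulls
        subobject_name,         -- 56,570 nulls
        edition_name, created_appid, created_vsnid,
        modified_appid, modified_vsnid          -- all 56,777 nulls
from
        t1
;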

This approach depends on the stats being up to date and accurate, of course, so if you didn’t have them, and didn’t want to collect them, another strategy would be to run a query like the following:


select
        to_char(count(OWNER),'999,999,999,999,999') OWNER,
        to_char(count(OBJECT_NAME),'999,999,999,999,999') OBJECT_NAME,
        to_char(count(SUBOBJECT_NAME),'999,999,999,999,999') SUBOBJECT_NAME,
        to_char(count(OBJECT_ID),'999,999,999,999,999') OBJECT_ID,
        to_char(count(DATA_OBJECT_ID),'999,999,999,999,999') DATA_OBJECT_ID,
        to_char(count(OBJECT_TYPE),'999,999,999,999,999') OBJECT_TYPE,
        to_char(count(CREATED),'999,999,999,999,999') CREATED,
        to_char(count(LAST_DDL_TIME),'999,999,999,999,999') LAST_DDL_TIME,
        to_char(count(TIMESTAMP),'999,999,999,999,999') TIMESTAMP,
        to_char(count(STATUS),'999,999,999,999,999') STATUS,
        to_char(count(TEMPORARY),'999,999,999,999,999') TEMPORARY,
        to_char(count(GENERATED),'999,999,999,999,999') GENERATED,
        to_char(count(SECONDARY),'999,999,999,999,999') SECONDARY,
        to_char(count(NAMESPACE),'999,999,999,999,999') NAMESPACE,
        to_char(count(EDITION_NAME),'999,999,999,999,999') EDITION_NAME,
        to_char(count(SHARING),'999,999,999,999,999') SHARING,
        to_char(count(EDITIONABLE),'999,999,999,999,999') EDITIONABLE,
        to_char(count(ORACLE_MAINTAINED),'999,999,999,999,999') ORACLE_MAINTAINED,
        to_char(count(APPLICATION),'999,999,999,999,999') APPLICATION,
        to_char(count(DEFAULT_COLLATION),'999,999,999,999,999') DEFAULT_COLLATION,
        to_char(count(DUPLICATED),'999,999,999,999,999') DUPLICATED,
        to_char(count(SHARDED),'999,999,999,999,999') SHARDED,
        to_char(count(CREATED_APPID),'999,999,999,999,999') CREATED_APPID,
        to_char(count(CREATED_VSNID),'999,999,999,999,999') CREATED_VSNID,
        to_char(count(MODIFIED_APPID),'999,999,999,999,999') MODIFIED_APPID,
        to_char(count(MODIFIED_VSNID),'999,999,999,999,999') MODIFIED_VSNID,
        to_char(count(*),'999,999,999,999,999') row_count 
from
         t1
;

You don’t really need the to_char() function, but it’s a convenience for what I’m going to do with the SQL. Obviously it would be a bit tedious to create this statement by hand so, as I did in the previous “rowsize” notes, I’ve written some code to generate it for me:


rem
rem     row_size_2b.sql
rem     
rem     Generate SQL to report counts of 
rem     non-null columns in a table.
rem

set linesize 32000
set feedback off
define m_table = '&1'

declare
        m_string        varchar2(32000) := 'select ';
        m_cursor        sys_refcursor;

begin
        for r in (
                select 
                        column_name, data_type
                from    user_tab_columns
                where   table_name = upper('&m_table')
        ) loop
                m_string := m_string ||  
                                ' to_char(count(' || trim(r.column_name) || '),''999,999,999,999,999'') ' ||
                                trim(r.column_name) || ',' || chr(10) ;
        end loop;

        m_string := m_string || ' to_char(count(*),''999,999,999,999,999'') row_count from &m_table';

--      dbms_output.put_line(m_string);
        print_table(m_string);

end;
/

set linesize 156

The script accepts a table name from the user’s schema (you could edit this to query dba_tab_columns), constructs a string (as shown above – though I’ve done a little cosmetic work on it), and then passes the resulting string to a version of Tom Kyte’s print_table() routine which produces the following output:


OWNER                         :               56,777
OBJECT_NAME                   :               56,777
SUBOBJECT_NAME                :                  207
OBJECT_ID                     :               56,777
DATA_OBJECT_ID                :                1,424
OBJECT_TYPE                   :               56,777
CREATED                       :               56,777
LAST_DDL_TIME                 :               56,777
TIMESTAMP                     :               56,777
STATUS                        :               56,777
TEMPORARY                     :               56,777
GENERATED                     :               56,777
SECONDARY                     :               56,777
NAMESPACE                     :               56,777
EDITION_NAME                  :                    0
SHARING                       :               56,777
EDITIONABLE                   :               16,561
ORACLE_MAINTAINED             :               56,777
APPLICATION                   :               56,777
DEFAULT_COLLATION             :                5,719
DUPLICATED                    :               56,777
SHARDED                       :               56,777
CREATED_APPID                 :                    0
CREATED_VSNID                 :                    0
MODIFIED_APPID                :                    0
MODIFIED_VSNID                :                    0
ROW_COUNT                     :               56,777

-----------------

1 rows selected

As with many of the scripts I’ve published, the task is one I rarely do, so I haven’t bothered to add in all the extra little refinements that might make it foolproof and totally self-contained. In this case, for example, I’d have to go through a final couple of manual steps to edit the output, deleting the last few lines and then (since I use vi) issuing “!%!sort -k3 -n -r” which would sort the results in reverse order, numerically, according to the third field.

 

print_table()

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 9:40 am GMT Nov 26,2020

Many years ago Tom Kyte published a small PL/SQL procedure to output each row in a table as a list of (column_name , value). I often use the original version, even though Tom refined it several times over the years. From time to time the links in my blog that I’ve set up to point to Tom’s code have simply stopped working, so I’ve decided to publish a reference copy here in case newer readers need to find a copy when the current link above stops working.

rem
rem     Script:         print_table.sql
rem     Dated:          Nov 2001
rem     Author:         Tom Kyte - from Expert One on One Oracle.
rem
rem     Purpose:
rem     Prints the results of a query one column per line.
rem
rem     Created as an 'authid current_user' procedure so that it can
rem     access any table that the caller is allowed to access
rem     from the SQL*Plus prompt.
rem
rem     A suitably privileged user could also make this publicly
rem     available through the public synonym and grant at the end
rem     of the script
rem
rem     Minor modifications by Jonathan Lewis to separate rows
rem     in the result set, and report the number of rows returned
rem

create or replace procedure print_table( p_query in varchar2 )
authid current_user
is
        l_theCursor     integer default dbms_sql.open_cursor;
        l_columnValue   varchar2(4000);
        l_status        integer;
        l_descTbl       dbms_sql.desc_tab;
        l_colCnt        number;
        l_rowcnt        number := 0;
begin
        execute immediate 'alter session set nls_date_format=''dd-mon-yyyy hh24:mi:ss'' ';

        dbms_sql.parse(  l_theCursor,  p_query, dbms_sql.native );
        dbms_sql.describe_columns( l_theCursor, l_colCnt, l_descTbl );

        for i in 1 .. l_colCnt loop
                dbms_sql.define_column(l_theCursor, i, l_columnValue, 4000 );
        end loop;

        l_status := dbms_sql.execute(l_theCursor);

        while ( dbms_sql.fetch_rows(l_theCursor) > 0 ) loop
                for i in 1 .. l_colCnt loop
                        dbms_sql.column_value( l_theCursor, i, l_columnValue );
                        dbms_output.put_line(
                                rpad( l_descTbl(i).col_name, 30 )
                                || ' : ' || l_columnValue
                        );
                end loop;
                dbms_output.new_line;
                dbms_output.put_line( '-----------------' );
                dbms_output.new_line;
                l_rowcnt := l_rowcnt + 1;
        end loop;

        dbms_output.put_line(l_rowcnt || ' rows selected');

        execute immediate 'alter session set nls_date_format=''dd-mon-rr'' ';

exception
        when others then
                execute immediate 'alter session set nls_date_format=''dd-mon-rr'' ';
        raise;
end;
/

-- create public synonym print_table for print_table;
-- grant execute on print_table to public;
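
A quick usage example (any simple query will do – remember to enable serveroutput first):

set serveroutput on
execute print_table('select owner, object_name, object_type from all_objects where rownum <= 2')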

The reason I’m publishing this now is that I’m about to publish the third variant of a piece of code I wrote many years ago, and first blogged about in 2012.

October 5, 2020

Direct Path

Filed under: Oracle,Performance,Problem Solving,RAC,Troubleshooting,Tuning — Jonathan Lewis @ 11:29 am BST Oct 5,2020

Here’s a note that I might have written once already – but I can’t find it and I’ve just been reminded about what it (might have) said by a posting that came up on the Oracle database forum in the last few days.

The posting in question is asking why, after setting the hidden parameter _serial_direct_read to ‘always’ a particular query is now taking hours to complete when it used to complete in a minute or so.

The answer is partly “because you’ve forced direct path serial reads”, partly “because you’re running on RAC” and (most directly) because the optimizer is using a really bad execution plan for that query and the direct path reads have had a massive impact as a consequence. (It turns out, after modelling, that the answer might also include “because you’re running 11.2.0.4”)

I’m going to demonstrate the issue by forcing a very simple query to take a very bad execution plan.

rem
rem     Script:         ko.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem 

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

create table t2
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

select  table_name, blocks 
from    user_tables
where   table_name in ('T1','T2')
;

alter system flush buffer_cache;

execute snap_events.start_snap
execute snap_my_stats.start_snap
alter session set "_serial_direct_read"=always;


select
        /*+ 
                leading(t1 t2)
                use_nl(t2)
                full(t2)
        */
        t1.object_type,
        t2.object_type
from
        t1, t2
where
        t2.object_id = t1.object_id + 0.5
;

execute snap_my_stats.end_snap
execute snap_events.end_snap

alter session set "_serial_direct_read"=auto;


My query very carefully ensures that it’s not going to return any rows; but it’s going to do a lot of work finding no data because I’ve forced Oracle into doing a tablescan of t2 for every row in t1 – so 10,000 scans of a table of 140 – 190 (depending on version) data blocks.

The snap_my_stats and snap_events packages are a couple of my simple diagnostic packages that allow me to find the change in some v$ content between the start and end snapshots. In this case it’s v$mystat and v$session_event for the session.
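
If you don’t have a similar pair of packages to hand, the underlying idea is nothing more than taking a “before” and “after” copy of the session’s figures and subtracting – here’s a sketch of the basic query for the statistics side:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     ms.value != 0
order by
        sn.name
;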

On a test using 11.2.0.4 the query ran for about 41 seconds with the following wait events reported:

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.11        .054           1
direct path read                                  6,527           0         368.30        .056           6
SQL*Net message to client                            13           0           0.00        .000           0
SQL*Net message from client                          13           0      10,689.26     822.251      10,689

The 6,500 “direct path read” waits corresponded to 1.33M “physical reads direct” reported in the session activity stats. Although the t2 table was fairly small Oracle was forced to use direct path reads for every single cycle through the nested loop. As a quick comparison, here are the figures if I don’t force direct path serial scans.

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.09        .047           2
db file scattered read                               34           0           2.45        .072           0
SQL*Net message to client                            11           0           0.01        .001           0
SQL*Net message from client                          11           0         174.36      15.851      82,849



We’ve waited for only 34 “db file scattered reads” and 2.45 centiseconds as we read the t2 (and the t1) tables into the cache for the first time, but then we’ve been able to revisit the blocks in the cache. We also saw a reduction in CPU usage and the total run time dropped from 41 seconds to about 22 seconds.

In this tiny example it hasn’t made a staggering difference to the overall run time, but the OP wasn’t that lucky with his “couple of minutes” to “hours”.

If you look at the fragment of the SQL Monitor report supplied by the OP you’ll see that they have an operation which shows:

=====================================================================================================================================================================================================================================
| Id    |                  Operation                    |            Name              | Rows    | Cost |  Time     |  Start | Execs |    Rows  | Read | Read  | Cell   | Mem | Activity |            Activity Detail               |
|       |                                               |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload |    |    (%)   |              (# samples)                 |
=====================================================================================================================================================================================================================================
| -> 25 |       INDEX STORAGE FAST FULL SCAN            | TMRC_IX1                     |      1  |      |     10745 |     +0 |  268K |     112K | 267K |   2GB |  96.53% | 1M |    99.74 | enq: KO - fast object checkpoint (4783)  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | Cpu (753)                                |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | gcs drm freeze in enter server mode (25) |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | latch free (1)                           |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | reliable message (3505)                  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | cell smart index scan (1635)             |
=====================================================================================================================================================================================================================================

The “index fast full scan” is an example of a “segment scan” and will be subject to direct path reads, just like a tablescan. We don’t really know how big this index is – but we can see that we have read it 268K times (Execs column) reading approximately 2GB after 267K read requests. This suggests the index is at most 1MB in size since it can be read in a single read request, and may consist of only one populated block (2,000,000,000/267,000 = 7,490 bytes). Despite this small size the total time sampled for all these scans is more than 10,600 seconds – roughly 39 milliseconds per scan. That’s not very good.

Looking more closely at the sampled time we notice 3 key features:

============================================
|            Activity Detail               |
|              (# samples)                 |
============================================
| enq: KO - fast object checkpoint (4783)  |
| Cpu (753)                                |
| gcs drm freeze in enter server mode (25) |
| latch free (1)                           |
| reliable message (3505)                  |
| cell smart index scan (1635)             |
============================================
  • A massive fraction of the time was spent on “enq: KO – fast object checkpoint”
  • A large fraction of the time was spent on “reliable message”
  • There was some time (relatively small, but large in absolute terms) for “gcs drm freeze …”

The last of these three is about global cache services and the distributed resource manager, and is a clear indication that the system is running RAC and that there is some hefty competition for “object mastering” between instances. But the actual time lost there is relatively small – though the associated chatter between instances could be significant.

But what’s the “KO enqueue”? Every time an instance starts a direct path segment scan it has to get a message to the database writer (possibly via the checkpoint process) – hence the “reliable message” waits – to copy every dirty block for that segment from the buffer cache down to disc and it has to wait for the write to complete. This is necessary to ensure that the tablescan doesn’t miss any changes that have been made in memory without yet being written to disc.

The KO enqueue synchronises this activity – I haven’t worked out the complete chain of events, but the enqueue is negotiated between the session and the checkpoint process – and if you’re running RAC every instance has to write any dirty blocks it is holding for the segment, so you have to have a degree of cross-instance chatter to make this happen.

Thanks to the enforced serial direct reads the OP’s plan – which, surely, shouldn’t expect to do 267K index fast full scans – has a massive overhead thanks to the need for the repeated object checkpoints.

You may ask, at this point, why I didn’t see any KO enqueue waits in my test results – the answer is simple, I’d flushed the buffer cache before I started the test, so there were no dirty blocks for the session to worry about. Let’s see what happens if I introduce a little activity to keep dirtying a few blocks in the t2 table. Here’s a little loop that will update a few rows once per second:

begin
        for i in 1..1 loop
                update t2 set data_object_id = 0 where mod(object_id,1000) = i;
                dbms_output.put_line(sql%rowcount);
                commit;
                dbms_lock.sleep(1);
        end loop;
end;
/

You’ll notice the code runs through the loop just once – I started with a loop count of 30, and discovered it wasn’t necessary, but the option remains for further testing.

If I execute this anonymous block (which updates about 10 rows each time through the loop) from another session just after SQL*Plus reports my “alter session”, so that it runs just after the query starts, this is what the session event report looks like if I run the test against 11.2.0.4 (the version reported by the OP):

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
enq: KO - fast object checkpoint                  7,645           0         290.34        .038           6
db file sequential read                               2           0           0.13        .066           2
direct path read                                 10,714           0         675.50        .063           6
SQL*Net message to client                            14           0           0.00        .000           0
SQL*Net message from client                          14           0       1,556.23     111.160     101,653
events in waitclass Other                         5,607           0         218.04        .039           8


Suddenly we see a lot of time spent on the KO enqueue waits and the “events in waitclass Other” (which turn out to be “reliable message” waits). Apparently the session keeps finding dirty t2 blocks in the cache and telling the database writer they need to be written to disc before the next tablescan of t2 can take place.

There’s something odd here, though, and I’ll introduce it with this comment: when I repeated the test on 19.3 (even with the constant trickle of updates once per second) we only see a tiny number of KO enqueues and reliable message waits – the fact that we see a huge number of them in 11g is a defect in the design of the code.

Think about what’s happening with the KO enqueue: when you start the first tablescan of t2 you force every dirty block for that segment to be copied from the cache to the disc.

As the tablescan proceeds you may have to apply some undo change vectors to the blocks you’re reading to take them back to the SCN as at the start of query execution, but you know that any data that had been committed before the query started is on disc (even if it has been over-written by committed changes made after the query started, or by uncommitted changes made before the query started). What’s on the disc right now will be usable to get the correct read-consistent version of the data for the duration of the query run, no matter how many newer changes are made, whether or not they over-write the disc blocks before the query ends. There is no need to force write any dirty blocks as the tablescan is repeated and, it seems, by 19.3 the code has been adjusted to accommodate that fact.

Footnote

After I had posted this comment on the forum, the OP raised the question of whether or not the fix might apply to 12c as well – so I ran up a VM of 12.1.0.2 and 12.2.0.1 and re-ran the tests. The results were initially promising – neither version reported an extreme number of KO enqueue waits or reliable message waits.

However when I changed the loop counter from 1 back to 30 I found that the waits re-appeared – though the numbers were significantly less than those from 11g – so perhaps there’s a timing element involved that might need further investigation and stress testing even for 19.3.

Footnote 2

Another snapshot I took in testing was from v$enqueue_stat – which showed that (approximately) for every KO enqueue wait my session reported, the instance reported about 10 – 12 KO enqueue requests.

When checking v$enqueue_stat it’s important to remember that the session activity stats (v$sesstat) report both “enqueue requests” and “enqueue conversions”. In v$enqueue_stat the conversions aren’t reported separately; the view simply adds the two figures together under “requests”.
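
For reference, this is a sketch of the instance-level check (against v$enqueue_stat – check the column list in your version):

select
        eq_type, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time
from
        v$enqueue_stat
where
        eq_type = 'KO'
;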

In the 11g test the session reported 7,645 KO enqueue waits, but the session activity stats reported 19,399 enqueue requests and 38,796 enqueue conversions, while v$enqueue_stat reported 96,990 KO enqueue requests. The remaining 38,796 KO enqueue requests were made by the checkpoint process (CKPT) – and it was only my session that reported any waits for KO enqueue requests.

Without further low-level investigation this is what leads me to believe that the session sends CKPT a message that it wants an object-level checkpoint performed and waits for the message to be acknowledged (reliable message) before trying to convert a low-level KO enqueue to an exclusive one. But CKPT has acquired and converted the same KO enqueue before acknowledging the message from the session and will only release the enqueue when the database writer acknowledges that the checkpoint request has been completed. So this mechanism ensures that the session will have to wait until the checkpoint is complete and CKPT has released the enqueue before it can continue processing.

(P.S. If you enable event 10704 you will find that there seem to be two KO enqueues involved – one with id2 = 1, one with id2 = 2.)

August 31, 2020

Tracing Errors

Filed under: Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 10:16 am BST Aug 31,2020

This is a little lesson in trouble-shooting. It assumes you have the privilege to generate and edit trace files, and all I’m going to do is show how I worked out the answer to a fairly simple question that appeared recently on the Oracle Developer Community forum in a thread with the title Cannot drop table after start dropping unused columns checkpoint.

I have a table t1 which is reasonably large (1M rows) with a column v30, and I’ve issued the command:

rem
rem     Script:         drop_column.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2003
rem

alter table t1 set unused column v30;

After this I’ve issued another command, pressed return, and immediately hit ctrl-C – I’ve got a brief window for this interrupt as the command is going to generate roughly 230MB of redo. If you want to try the experiment it might take a couple of attempts to get the timing right.

alter table t1
        drop unused columns
        checkpoint 1000
/
{ctrl-C}

Then I’ve tried to drop the table with the following result:

drop table t1
           *
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-12986: columns in partially dropped state. Submit ALTER TABLE DROP COLUMNS CONTINUE

This emulates the problem that appeared on the forum but the OP didn’t really want to issue the “continue” command because their table was “large” and the drop had been running for 24 hours when it was interrupted. It’s worth noting that four columns had been specified as unused, which means the drop command was probably generating roughly 1KB of redo per row. It’s also worth mentioning that the table was 600 columns wide – so there may have been a few performance side effects due to the multiple (minimum 3) row-pieces per row and row-chaining.

Ignoring any questions about the possible impact of having 600 columns, the key questions in this case are:

  • Why isn’t it possible to drop the table (the manuals suggest it should be possible at this point)?
  • Is there a way to bypass the problem?

This is a repeatable error – we can try to drop the table as many times as we like and we will get the same error reported in fractions of a second. So an obvious strategy is to enable tracing and see if the trace file can tell us anything about why the error is happening. This is a particularly sensible strategy to follow since error ORA-00604 is telling us that there’s something wrong in the recursive SQL, which means there’s a very good chance that we will actually find the SQL statement in the trace file that is the recursive statement triggering the error.

So, enable sql_trace (or set event 10046), or use whatever method you prefer to enable basic tracing (no need for anything above level 1 just yet); try to execute the drop; then search the trace file for the word ERROR at the start of a line ("^ERROR").
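Putting that into a concrete form – a minimal sketch (the v$diag_info query, available from 11g onwards, tells you the name of your session's trace file):

alter session set sql_trace = true;

drop table t1;          -- reproduces the ORA-00604 / ORA-12986

alter session set sql_trace = false;

select  value
from    v$diag_info
where   name = 'Default Trace File'
;

Here's what I found as the first match in my trace file: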

ERROR #139987889121800:err=12986 tim=424276176462

Note that the err= value is the 12986 that was reported as the error under the ORA-00604 error. Sometimes it’s necessary to search backwards in the trace file until you find the matching cursor number (#139987889121800), but in this case it was already visible just a few lines further up the file. So here’s the fragment of the file around that ERROR line:


PARSING IN CURSOR #139987889121800 len=72 dep=1 uid=0 oct=15 lid=0 tim=424276175961 hv=1894278903 ad='75c06e38' sqlid='4wv7gq1sfhtrr'
ALTER TABLE "TEST_USER"."T1" RENAME TO "BIN$rhxBELdQGMXgUwEAAH93eQ==$0"
END OF STMT
PARSE #139987889121800:c=2896,e=3035,p=0,cr=4,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=424276175960
EXEC #139987889121800:c=186,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=424276176418
ERROR #139987889121800:err=12986 tim=424276176462
CLOSE #139987889121800:c=10,e=9,dep=1,type=0,tim=424276176712
EXEC #139987891478792:c=34686,e=35859,p=0,cr=73,cu=12,mis=0,r=0,dep=0,og=1,plh=0,tim=424276176774
ERROR #139987891478792:err=604 tim=424276176808

The error has come from an SQL statement that is trying to rename my table to some weird and wonderful name which starts with the characters "BIN$" – that's a "drop" command trying to move a table into the recycle bin by renaming it – and you're not allowed to rename a table that is in a partially dropped state. So that's why we get the error; and the obvious way to bypass it is: "drop table t1 purge;" – which works.

You’ll notice that I’ve included a couple of lines after the first ERROR. This is to show you the line that generated the ORA-00604 (err=604) error. It comes from cursor #139987891478792, and seraching backwards up the file for that cursor number I get to:

PARSING IN CURSOR #139987891478792 len=13 dep=0 uid=107 oct=12 lid=107 tim=424276140765 hv=202649412 ad='7f517dd3fe00' sqlid='d47kdkn618bu4'
drop table t1
END OF STMT

That’s not a suprise, of course, but it is important to cross-check that you haven’t been chasing the wrong error. There are some cases where the Oracle code does something to see if an error will occur but has an exception handler that means the error doesn’t end up reaching the application, so you do need to do a check that the error you found first was the one that floated up to the top of the stack.

Footnote

From Oracle 12.2.0.1 you could arrange to read your own trace file – while your session is connected – through the dynamic performance view v$diag_trace_file_contents.
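For example – a sketch that assumes the trace_filename and payload columns of the view (check the full column list for your version), with an illustrative file name:

select  payload
from    v$diag_trace_file_contents
where   trace_filename = 'my_session_trace_file.trc'
and     payload like 'ERROR%'
;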

July 15, 2020

Fetch First vs. Rownum

Filed under: Execution plans,Oracle,Partitioning,Performance,Problem Solving,Tuning — Jonathan Lewis @ 10:11 am BST Jul 15,2020

I’ve pointed out fairly frequently that if you’re running Standard Edition but would like to take advantage of a few features of the Partitioning option then you might be able to do something appropriate with Partition Views (but I’ve just discovered while searching my blog for a suitable item to link to that I haven’t published any of my PV notes on the blog).

I’ve also pointed out that while 12c allows you to use “fetch first N rows” instead of “where rownum <= N” there’s a hidden threat to using the feature because “fetch first N” turns into a hidden row_number() over() analytic function.

Today’s note is a combination of these two topics, prompted by a request to solve a particular coding issue that has appeared a few times on the Oracle Developer Forum and is likely to be recognisable to a number of people.

I’ll start with a very simple model demonstrateing the simplest use of “fetch first N”:


rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);

With this data set I want to write a query that selects rows for class A where id > 9500, ordered by id – but I only want the first two rows. Here’s a very simple query that gets the result I want, followed by the execution plan from 12.2.0.1 (the A-Rows and E-Rows from 19.3 are slightly different):

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)

The plan – with its stats – shows us that we've done an index range scan of an index which will return the data in exactly the order we want, and the "fetch first 2 rows" has been translated into the row_number() over() that we expected; but to our great joy the "window nosort stopkey" makes the processing stop very early because Oracle recognises that the base data is arriving in the right order so it isn't necessary to fetch all of it and sort it. The A-Rows column confirms this interpretation of what has happened.

You might notice, by the way, that the optimizer has costed the query as if it were fetching all the rows even though it “knows” that it’s going to fetch only the first two rows. That’s why I had to include the index hint to make the optimizer use the obvious index – a popular alternative is to use the /*+ first_rows(N) */ hint where N matches, or is similar to, the number of rows you want to fetch. If I had omitted the hint the optimizer would have done a full tablescan and then applied a “window sort pushed rank” operation to sort and limit the result to 2 rows.
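For example, I could have replaced the index() hint along the following lines (a sketch – I haven't shown the resulting plan):

select  /*+ first_rows(2) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch
        first 2 rows only
;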

So now we come to the real problem: the user has a “current” table and an identical “history” table, and would like to replace the table reference with a reference to a union all view for their clients on Standard Edition, or to a partitioned table for clients running Enterprise Edition – and they don’t really want to do any other code changes. So let’s see what happens when we model the union all. I started with a table called t_odd that held only odd values for id, so I’m going to add a table called t_even that holds only even values for id.

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

As you can see t_even is an identically structured table with similar data, and I’ve created a union all view on top of the two tables, changing the query to reference the view rather than referencing a table. Thanks to the costing anomaly (combined with the small size of the tables) I’ve had to supply a couple of “global” hints to tell the optimizer to use the indexes to access the two tables. So how well does the optimizer do its job when we have a union all view?


----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)

Answer: Bad luck, the optimizer isn't smart enough to find a cheap way through this query. It's fetched all the relevant data from the two tables before applying the window sort (which it does with some efficiency – the pushed rank) to produce the right answer. As you can see from the A-Rows column, though, it's had to acquire a couple of hundred rows from each table before getting down to the 2 rows we wanted.

Partitioned Tables

So let’s try to solve the problem by buying into the partitioning option and creating a list-partitioned table with two partitions, one flagged for current data and one flagged for history data – or ‘O’dd and ‘E’ven data as I’ve created them in my model.



create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')

Note particularly that I have created a local index on this partitioned table – so there's a very close correspondence between the two tables in the previous example and the two partitions in this example. Here's the plan when I query the partitioned table for the first two rows:


select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)

The optimizer has let us down again. The plan shows us that we have to acquire all the relevant data from the two partitions before applying the row_number() analytic function and discarding all but the two rows we want. (Again we can check the A-Rows column to see that we have started by fetching a total of 404 rows from the table.)

But what happens if we fall back to the good old-fashioned (non-standard) rownum method:


select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)

Oracle really knows how to use rownum well – notice how there is a count stopkey operation as a child to the partition list all operation, and that’s where our rownum <= 2 predicate is first applied. For each partition Oracle finds “the first two rows” and after it has collected two rows from every partition it sorts them (again with a stopkey) to find the top two in that subset. Check the A-Rows column – we selected a total of 4 rows from the table (2 per partition) and reduced that to 2 rows at operation 3.

Conclusion

There are some patterns of processing where partitioned tables can be a lot friendlier to coders than partition views; if you do have to stick with Standard Edition you can usually get what you want but the coding investment may be significantly higher. Even with partitioned tables, though, there are some “old-fashioned” Oracle methods that do a much nicer job than some of the new-fangled “ANSI” mechanisms.

Footnote

Part of the problem presented here revolves around the desire to keep a pattern of SQL generation that already exists, doing nothing more than replacing a table name with a view (or partitioned table) name.

As we’ve seen, if you start with a simple heap table and try to replace it with a partitioned table you have to use the rownum mechanism rather than the fetch first N rows mechanism.

If you’re running with Standard Edition you can’t do anything simple to replace a table name with the name of a union all view; you’d have to change your code generator to apply all the predicates twice (once for each table) and then apply the rownum predicate or fetch first directive again outside the union all. In other words you have to emulate exactly what Oracle EE manages to do with partitioned tables and rownum.

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)


As you can see, the E-Rows still predicts a lot of work, but the A-Rows tells us the work was kept to the minimum we want.

 

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.
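For reference – if you're suitably licensed you can generate the text version of a SQL Monitor report like the one below with a call along these lines (the sql_id is something you have to supply, and you'll want to set long to a suitably large value first):

select
        dbms_sqltune.report_sql_monitor(
                sql_id => '&m_sql_id',
                type   => 'TEXT'
        )       report
from
        dual
;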

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minutes that I spent confirming the OP's hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan was telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols changed to .le. / .ge. to avoid WordPress format issue.

The first thing to note is that the “Time (active)” shown at the top line is about 12,000 seconds – so it’s a long running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including 1,200 seconds of CPU time. This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the  Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn’t matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare "Rows (Estim)" and "Rows (Actual)" for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the "Execs" column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So let's look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with “Rows (actual)” = 1.
  • Operation 11 (Buffer Sort of data from STAGE) is predicted to return 84 rows – with "Rows (actual)" = 1,000

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs this tells us that the optimizer's estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Even if we can't fix this mis-estimate we may be able to do something that improves the estimate for STAGE to something sufficiently large that it will trigger the switch to a hash join. So let's look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there’s a nearly perfect three-column index on REF1, but the optimizer can’t use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It’s possible, of course, that this might be the first three columns of a 4, or more, column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.

STAGE Predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT='I' (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a significant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.
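In outline the fix might look something like the following – a sketch only: I've assumed both tables are owned by USER1 (as REF1 appeared in the predicate section), and taken advantage of the fact that a column group named in method_opt will be created automatically if it doesn't already exist:

begin
        dbms_stats.gather_table_stats(
                ownname    => 'USER1',
                tabname    => 'REF1',
                method_opt => 'for columns (puser, name) size 254'
        );

        dbms_stats.gather_table_stats(
                ownname    => 'USER1',
                tabname    => 'STAGE',
                method_opt => 'for columns (name, stat) size 254'
        );
end;
/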

What Happened Next?

I made the suggestions about column groups and histogram to the OP – without the level of detail that I’ve given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group, (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

November 6, 2019

Table Space

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 5:59 pm GMT Nov 6,2019

There’s a request on the Oracle Developer Forum for assistance to write a report that shows the total space allocation for a table – including its indexes and LOB columns. There is no requirement to allow for partitioned tables, but it’s not entirely clear whether the code should allow for index-organized tables and clustered tables, and there’s no comment about nested tables.

The OP has made an initial attempt to write a suitable query, but the query joins dba_tables to dba_indexes then to dba_lobs then three times to dba_segments (all the joins being outer joins) before aggregating on table name. Unsurprisingly this produces the wrong results because (for example) if a table has two indexes the join from dba_tables to dba_indexes will double the bytes reported for the table. As Andrew Sayer points out in the thread, it would be better to aggregate on each of the separate component types before joining.

This introduces an important, generic, principle for writing code.

  • If it’s difficult to write a single statement can you break the task down into separate components that are easier to handle.
  • If you can express the problem as a small set of simpler components can you then combine the components
  • If this approach works is any loss of efficiency balanced by a gain in clarity and maintainability.

In the case of the OP this isn't a query that's going to be running every few seconds – maybe it will be once per day, maybe three or four times per day – so there's no need to be worried about making it as efficient as possible, and we can go for simplicity.

Here’s a query that reports the space allocated for a simple heap table. It references dba_tables and dba_segments, so has to be run by a user with DBA privileges, and at the moment it’s restricted to reporting a single user.

Wait – before I write the query I ought to create a testbed to see if the query is working. So let's take an empty schema and create a few objects. Let's create:

  • a simple heap table with two indexes and two LOB columns.
  • an Index Organized Table (IOT) with an overflow and two LOB columns.
  • a table with two columns that are a nested table type

Here’s some suitable code:


rem
rem     Script:         sum_table_space.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

prompt  ============================================================
prompt  Simple heap table - copy of all_objects - then add two CLOBs
prompt  ============================================================

create table test_user.t1 as select * from all_objects;

alter table test_user.t1 add (c1 clob) lob (c1) store as basicfile t1_lob1 (disable storage in row);
alter table test_user.t1 add (c2 clob) lob (c2) store as basicfile t1_lob2 (disable storage in row);

update test_user.t1 set c1 = 'x', c2 = 'x' where rownum <= 125;

create index test_user.t1_i1 on test_user.t1(object_name, object_type, object_id);
create index test_user.t1_i2 on test_user.t1(object_type, object_name, object_id);

prompt  =========================================================
prompt  Simple IOT table - with a couple of CLOBs in the overflow
prompt  =========================================================

create table test_user.iot1 (
        id              number          not null,
        ind_pad         varchar2(20)    not null,
        n2              number          not null,
        n3              number          not null,
        padding         varchar2(50),
        ct1             clob,
        ct2             clob,
        constraint iot1_pk primary key (id, ind_pad)
)
organization index
        overflow
        including n3
lob(ct1) store as basicfile iot_lob1(
        disable storage in row
)
lob(ct2) store as basicfile iot_lob2(
        disable storage in row
)
;

insert into test_user.iot1
select
        rownum,
        rpad(rownum,20),
        rownum,
        rownum,
        rpad('x',50,'x'),
        case when mod(rownum,100) = 0 then rpad('x',100,'x') end,
        case when mod(rownum,100) = 0 then rpad('x',100,'x') end
from
        all_objects
where
        rownum <= 10000
;

commit;

prompt  ====================================================
prompt  Now create types to allow creation of a nested table
prompt  ====================================================

create type test_user.jpl_item as object (n1 number, v1 varchar2(32));
/

create type test_user.jpl_tab_type as table of jpl_item;
/

create table test_user.demo_nest (
        id      number,
        nest1   test_user.jpl_tab_type,
        nest2   test_user.jpl_tab_type
)
nested table nest1 store as t_nested1
return as locator
nested table nest2 store as t_nested2
return as locator
;

create unique index test_user.tn1_pk on test_user.t_nested1(nested_table_id, n1) compress 1;
create unique index test_user.tn2_pk on test_user.t_nested2(nested_table_id, n1) compress 1;

create index  test_user.tn1_v1 on test_user.t_nested1(v1);
create index  test_user.tn2_v1 on test_user.t_nested2(v1);

insert into test_user.demo_nest values (
        1,
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item One one'),
                test_user.jpl_item(2,'Item One two')
        ),
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item One one'),
                test_user.jpl_item(2,'Item One two')
        )
);

insert into test_user.demo_nest values (
        2,
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item Two one'),
                test_user.jpl_item(2,'Item Two two')
        ),
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item Two one'),
                test_user.jpl_item(2,'Item Two two')
        )
);

commit;

You’ll notice that I’ve prefixed every table, index and type name with a schema name. This is because I set up this test to run as a DBA so I’m creating the objects in one schema while connected to a different schema.

Of course, before trying to report allocations summed by base table, it would be sensible to produce a simple list of the segments we should see so that we know when we’ve reported all of them. So let’s start with that very simple report:


column bytes format 999,999,999,999
column segment_name format a25

break on report on owner on object_name skip 1
compute sum of bytes on report

select
        segment_name, segment_type, bytes
from
        dba_segments
where
        owner = 'TEST_USER'
order by
        segment_type, segment_name
;

SEGMENT_NAME              SEGMENT_TYPE                  BYTES
------------------------- ------------------ ----------------
IOT1_PK                   INDEX                     1,048,576
SYS_C008380               INDEX                     1,048,576
SYS_C008381               INDEX                     1,048,576
SYS_FK0000075579N00002$   INDEX                     1,048,576
SYS_FK0000075579N00004$   INDEX                     1,048,576
T1_I1                     INDEX                     5,242,880
T1_I2                     INDEX                     5,242,880
TN1_PK                    INDEX                     1,048,576
TN1_V1                    INDEX                     1,048,576
TN2_PK                    INDEX                     1,048,576
TN2_V1                    INDEX                     1,048,576
SYS_IL0000075565C00027$$  LOBINDEX                  1,048,576
SYS_IL0000075565C00028$$  LOBINDEX                  1,048,576
SYS_IL0000075572C00006$$  LOBINDEX                  1,048,576
SYS_IL0000075572C00007$$  LOBINDEX                  1,048,576
IOT_LOB1                  LOBSEGMENT                1,048,576
IOT_LOB2                  LOBSEGMENT                1,048,576
T1_LOB1                   LOBSEGMENT                2,097,152
T1_LOB2                   LOBSEGMENT                2,097,152
T_NESTED1                 NESTED TABLE              1,048,576
T_NESTED2                 NESTED TABLE              1,048,576
DEMO_NEST                 TABLE                     1,048,576
SYS_IOT_OVER_75572        TABLE                     1,048,576
T1                        TABLE                    12,582,912
                                             ----------------
sum                                                47,185,920

So we now have a list of segments and a sum of bytes to aim for. One thing you might notice, though, is that the name "IOT1" has disappeared; instead the thing we see labelled as a TABLE is its overflow segment, called SYS_IOT_OVER_75572 (the number being the object_id of the table we originally defined). We will have to do something in our code to translate that "sys_iot_over_75572" to "iot1" if we want to make our final report easy to understand.

There are a few other oddities in the list but some of them we may be familiar with already – the indexes with names like SYS_IL0000075565C00027$$ are the LOB indexes for the four LOBs we created. (I deliberately created basicfile LOBs in case the OP wasn't using the securefile LOBs that most systems are now using.) The 75565 in the example I just quoted is the object_id of the base table, and the C00027 tells us that the LOB is column 27 of the table definition.
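You can confirm the mapping from LOB segment and LOB index back to base table and column by querying dba_lobs, for example:

select  table_name, column_name, segment_name, index_name
from    dba_lobs
where   owner = 'TEST_USER'
order by
        table_name, column_name
;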

You may be wondering about the indexes like SYS_C008380 and SYS_FK0000075579N00002$. We may need to work on those – but I’ll tell you the answer straight away – the former is a unique index on the “nest1” column  in the demo_nest table, the latter is a “foreign key” index on the (invisible) nested_table_id column in the nested tables.

Side note: although the sys_fk0000075579N00002$ index looks as if it's supposed to be a "foreign key" index, as suggested by the name, the dependency, and the function, there's no foreign key constraint declared between the parent table and the nested table. Also, if you check dba_indexes, the generated column has the value 'N', even though the name of the index was clearly generated by Oracle internal code. (The index at the other end of the join – sys_c008380 – is, however, flagged with generated = 'Y'.)
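The generated flag is easy to check, for example:

select  index_name, generated
from    dba_indexes
where   table_owner = 'TEST_USER'
and     index_name in ('SYS_FK0000075579N00002$', 'SYS_C008380')
;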

Let’s start building our solution with the easiest bit.


select 
        tab.owner, 
        cast('TABLE' as varchar2(11))   object_type,
        tab.table_name,
        nvl(seg_tab.bytes,0)            bytes  
from
        dba_tables        tab  
left outer join
        dba_segments      seg_tab  
on
        seg_tab.owner      = tab.owner
and     seg_tab.segment_name = tab.table_name
where
        tab.owner          = 'TEST_USER'
/


OWNER           OBJECT_TYPE TABLE_NAME                      BYTES
--------------- ----------- -------------------- ----------------
TEST_USER       TABLE       T1                         12,582,912
                TABLE       SYS_IOT_OVER_75572          1,048,576
                TABLE       DEMO_NEST                   1,048,576
                TABLE       IOT1                                0
***************                                  ----------------
sum                                                    14,680,064

Immediately we have to correct two problems – we can see the original iot1 table but it doesn't have an associated data segment; it has a "top", which is the primary key index segment, and an "overflow" which is designated by Oracle as a table segment. So we need to check the columns in dba_tables to work out how to get the more appropriate name reported for the table segment, and add a predicate to discard the row with the original name. Here's "tables only mark 2":

select 
        tab.owner, 
        cast('TABLE' as varchar2(11))   object_type,
        case
                 when tab.iot_type = 'IOT_OVERFLOW' 
                        then tab.iot_name
                else tab.table_name
        end                             object_name,
        case
                 when tab.iot_type = 'IOT_OVERFLOW' 
                        then tab.table_name
        end                             auxiliary_name,
        nvl(seg_tab.bytes,0)            bytes  
from
        dba_tables        tab  
left outer join
        dba_segments      seg_tab  
on
        seg_tab.owner        = tab.owner
and     seg_tab.segment_name = tab.table_name
where
        tab.owner            = 'TEST_USER'
and     nvl(tab.iot_type,'NORMAL') != 'IOT'


OWNER           OBJECT_TYPE OBJECT_NAME          AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       TABLE       T1                                                   12,582,912
TEST_USER       TABLE       IOT1                 SYS_IOT_OVER_75572               1,048,576
TEST_USER       TABLE       DEMO_NEST                                             1,048,576

I’ve identified the table type using the iot_type column in dba_tables, eliminated rows where the iot_type is ‘IOT’, and reported the iot_name (with table_name as an auxiliary name) when the iot_type is ‘IOT_OVERFLOW’. And we can now check that the result is consistent with the rows of dba_segments where segment_type = ‘TABLE’.
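
The cross-check is just a matter of restricting dba_segments to the right segment_type – something like this sketch:

select
        segment_name, segment_type, bytes
from
        dba_segments
where
        owner        = 'TEST_USER'
and     segment_type = 'TABLE'
order by
        bytes
/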

So let’s move on to indexes.

select 
        ind.table_owner                 owner, 
        cast('INDEX' as varchar2(11))   object_type,
        ind.table_name,
        index_name                      auxiliary_name,
        nvl(seg_ind.bytes,0)            bytes  
from
        dba_indexes       ind  
left outer join 
        dba_segments      seg_ind  
on
        ind.owner      = seg_ind.owner
and     ind.index_name = seg_ind.segment_name  
where
        ind.table_owner = 'TEST_USER'


OWNER           OBJECT_TYPE TABLE_NAME           AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       INDEX       T1                   SYS_IL0000075565C00027$$         1,048,576
TEST_USER       INDEX       T1                   SYS_IL0000075565C00028$$         1,048,576
TEST_USER       INDEX       T1                   T1_I1                            5,242,880
TEST_USER       INDEX       T1                   T1_I2                            5,242,880
TEST_USER       INDEX       IOT1                 IOT1_PK                          1,048,576
TEST_USER       INDEX       IOT1                 SYS_IL0000075572C00007$$         1,048,576
TEST_USER       INDEX       IOT1                 SYS_IL0000075572C00006$$         1,048,576
TEST_USER       INDEX       T_NESTED1            SYS_FK0000075579N00002$          1,048,576
TEST_USER       INDEX       T_NESTED2            SYS_FK0000075579N00004$          1,048,576
TEST_USER       INDEX       DEMO_NEST            SYS_C008380                      1,048,576
TEST_USER       INDEX       DEMO_NEST            SYS_C008381                      1,048,576
TEST_USER       INDEX       T_NESTED1            TN1_PK                           1,048,576
TEST_USER       INDEX       T_NESTED2            TN2_PK                           1,048,576
TEST_USER       INDEX       T_NESTED1            TN1_V1                           1,048,576
TEST_USER       INDEX       T_NESTED2            TN2_V1                           1,048,576

A quick check shows that we’ve picked up 15 indexes that can be matched against the segments reported by dba_segments – although my query labelled them all as just “INDEX” while dba_segments labels LOB indexes differently. As you can see I’ve reported the table name in each case, and the index name under the heading auxiliary_name. This will be a convenience when I’m summing up the space associated with each table.
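
A sketch of the labelling difference – dba_segments reports the LOB indexes with segment_type = 'LOBINDEX' rather than 'INDEX':

select
        segment_type, count(*), sum(bytes)
from
        dba_segments
where
        owner        = 'TEST_USER'
and     segment_type in ('INDEX', 'LOBINDEX')
group by
        segment_type
/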

On to the LOBs – also easy:

select 
        lob.owner, 
        cast('LOB' as varchar2(11))   object_type,
        lob.table_name,
        lob.column_name               auxiliary_name,
        nvl(seg_lob.bytes,0)          bytes  
from
        dba_lobs    lob
left outer join 
        dba_segments      seg_lob
on
        seg_lob.owner        = lob.owner
and     seg_lob.segment_name = lob.segment_name
where 
        lob.owner            = 'TEST_USER'


OWNER           OBJECT_TYPE TABLE_NAME           AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       LOB         T1                   C1                               2,097,152
TEST_USER       LOB         T1                   C2                               2,097,152
TEST_USER       LOB         IOT1                 CT1                              1,048,576
TEST_USER       LOB         IOT1                 CT2                              1,048,576

And finally the nested tables:


select 
        nst.owner, 
        cast('NESTED' as varchar2(11))   object_type,
        nst.parent_table_name            object_name,
        table_name                       auxiliary_name,
        nvl(seg_nst.bytes,0)             bytes  
from
        dba_nested_tables    nst  
left outer join 
       dba_segments          seg_nst  
on
        seg_nst.owner        = nst.owner
and     seg_nst.segment_name = nst.table_name  
where
        nst.owner            = 'TEST_USER'

OWNER           OBJECT_TYPE OBJECT_NAME          AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       NESTED      DEMO_NEST            T_NESTED1                        1,048,576
TEST_USER       NESTED      DEMO_NEST            T_NESTED2                        1,048,576

A quick check tells us that the four pieces of code produce an exact match for all the segments reported in dba_segments – so all we have to do now is stitch the four queries together with UNION ALL then sum(bytes) by owner and table_name.
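
The quick check need be nothing cleverer than comparing totals – if the grand total from dba_segments matches the 47,185,920 bytes we started with, nothing has been missed or double-counted:

select
        sum(bytes)
from
        dba_segments
where
        owner = 'TEST_USER'
/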

There are three little problems to deal with, though.

  • We have the predicate “where {owner} = ‘TEST_USER'” appearing 4 times in total, which is a bit messy. If we put this outside the UNION ALL, will Oracle be smart enough to push it back inside the UNION ALL as part of its query transformation? It seems to: the plan (which is several hundred lines long – so I’m not going to look too closely) does change a little, but on my little test system it made little difference to the performance. Initially it looks as if it might be okay to create a view from this query with no restriction on the table owner and allow users to query the view for a single schema (see the sketch after this list).
  • The code to report indexes reports the indexes on the nested tables under t_nested1 and t_nested2 – we want the indexes to be associated with table demo_nest, so we need to refine that bit of the code. It needs an outer join to dba_nested_tables to supply the parent_table_name if it exists.
  • On my little system the query takes several seconds to run. Should I worry about that? Not until I’ve tested it on a much bigger system, and not until I know how frequently it needs to run. It may be good enough as it stands, and the ease with which I modified the first version of my code to handle the nested-table indexing issue is an indication of the benefit of keeping a complex job as the sum of its parts when it’s reasonable to do so.
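
As a sketch of how you might test the view idea from the first bullet (the view name table_space_usage is my own invention, and I’m assuming it has been created from the UNION ALL query with every owner predicate removed): generate the execution plan and look for your owner predicate repeated against each branch in the Predicate Information section:

explain plan for
select * from table_space_usage where owner = 'TEST_USER';

select * from table(dbms_xplan.display(null, null, 'basic +predicate'));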

Let’s go straight to the final (so far) SQL:


select
        owner, object_name, sum(bytes) 
from    (
        select 
                tab.owner, 
                cast('TABLE' as varchar2(11))   object_type,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.iot_name
                        else tab.table_name
                end                             object_name,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.table_name
                end                             auxiliary_name,
                nvl(seg_tab.bytes,0)            bytes  
        from
                dba_tables        tab  
        left outer join
                dba_segments      seg_tab  
        on
                seg_tab.owner        = tab.owner
        and     seg_tab.segment_name = tab.table_name
        where
                tab.owner            = 'TEST_USER'
        and     nvl(tab.iot_type,'NORMAL') != 'IOT'
        union all
        select 
                ind.table_owner                  owner, 
                cast('INDEX' as varchar2(11))    object_type,
                nvl(
                        nst.parent_table_name,
                        ind.table_name
                )                                table_name,
                index_name                       auxiliary_name,
                nvl(seg_ind.bytes,0)             bytes  
        from
                dba_indexes       ind  
        left outer join
                dba_nested_tables nst
        on
                nst.owner       = ind.table_owner
        and     nst.table_name  = ind.table_name
        left outer join 
                dba_segments      seg_ind  
        on
                seg_ind.owner         = ind.owner
        and     seg_ind.segment_name  = ind.index_name 
        where
                ind.table_owner = 'TEST_USER'
        union all
        select 
                lob.owner, 
                cast('LOB' as varchar2(11))   object_type,
                lob.table_name,
                lob.column_name               auxiliary_name,
                nvl(seg_lob.bytes,0)          bytes  
        from
                dba_lobs    lob
        left outer join 
                dba_segments      seg_lob
        on
                seg_lob.owner        = lob.owner
        and     seg_lob.segment_name = lob.segment_name
        where 
                lob.owner            = 'TEST_USER'
        union all
        select 
                nst.owner, 
                cast('NESTED' as varchar2(11))   object_type,
                nst.parent_table_name            object_name,
                table_name                       auxiliary_name,
                nvl(seg_nst.bytes,0)             bytes  
        from
                dba_nested_tables    nst  
        left outer join 
               dba_segments          seg_nst  
        on
                seg_nst.owner        = nst.owner
        and     seg_nst.segment_name = nst.table_name
        where
                nst.owner         = 'TEST_USER'
        )
where
        owner = 'TEST_USER'
group by
        owner, object_name
order by
        sum(bytes)
/

OWNER           OBJECT_NAME                SUM(BYTES)
--------------- -------------------- ----------------
TEST_USER       IOT1                        6,291,456
TEST_USER       DEMO_NEST                  11,534,336
TEST_USER       T1                         29,360,128
                                     ----------------
sum                                        47,185,920


I wrote this note because my response to the original query was an informal suggestion to take this approach. Then I thought it might be worth writing a general comment about solving big problems by starting with little problems and stitching the pieces together, using this case as a concrete example – and then I just went ahead and finished the thing off within the boundaries of the original requirements.

There’s a reason, though, why I tend to avoid publishing SQL that might be useful – it’s always possible to overlook bits and end up with people trying to use code that’s not totally appropriate to their system. In this case, of course, there’s a total (but deliberate) failure to handle clusters, partitions and subpartitions, but I don’t know how many of the other “less commonly used” bits of Oracle I haven’t considered. I do know that I haven’t touched on domain indexes (such as text indexes with all their convoluted collections of tables and indexes) but there may be some oddities of (e.g.) advanced queueing, replication, and audit vault that fall outside the range of the possibilities I’ve covered above.

Update 7th Nov 2019

The OP from OTN reported seeing some nested tables in the output – which I thought I’d catered for – so I modified the code I have on my sandbox to add a nested table to the IOT, added a couple of extra indexes on the parent of the nested table, and changed a couple of column and object names to quoted mixed case; everything still seemed to work okay.

It turned out that the OP had nested tables that were nested 5 deep – which meant I needed a hierarchical query to connect the 2nd to 5th (or nth) layer of nesting back to the top-level table so that I could report the total volume against that table. To deal with this problem I modified the query in the smallest possible way.

I had two references to the view dba_nested_tables, so I created a hierarchical query based on dba_nested_tables that returned the “oldest ancestor” with each child table rather than the immediate parent. (I’ve written a separate note showing the derivation and testing). Then I turned this query into a “with subquery” (CTE) called top_nested_tables and changed the two calls that the main query made to dba_nested_tables  so that they referenced the CTE instead:


with my_nested_tables as (
select
        /*+ materialize */
        owner, parent_table_name, table_name
from
        dba_nested_tables
where   owner = 'TEST_USER'
),
top_nested_tables as (                                                             -- ***** Replacement CTE defined
select  /*+ materialize */
        owner, parent_table parent_table_name, table_name
from    (
        select
                owner,
                level,
                connect_by_root parent_table_name parent_table,
                table_name 
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
where   (owner, parent_table) not in (
                select owner, table_name
                from   my_nested_tables
        )
)
select                                                                             -- ***** Main select
        owner, object_name, sum(bytes)  total_bytes
from    (
        select 
                tab.owner, 
                cast('TABLE' as varchar2(11))   object_type,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.iot_name
                        else tab.table_name
                end                             object_name,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.table_name
                end                             auxiliary_name,
                nvl(seg_tab.bytes,0)            bytes  
        from
                dba_tables        tab  
        left outer join
                dba_segments      seg_tab  
        on
                seg_tab.owner        = tab.owner
        and     seg_tab.segment_name = tab.table_name
        where
                tab.owner            = 'TEST_USER'
        and     nvl(tab.iot_type,'NORMAL') != 'IOT'
        union all
        select 
                ind.table_owner                  owner, 
                cast('INDEX' as varchar2(11))    object_type,
                nvl(
                        nst.parent_table_name,
                        ind.table_name
                )                                table_name,
                index_name                       auxiliary_name,
                nvl(seg_ind.bytes,0)             bytes  
        from
                dba_indexes       ind  
        left outer join
                top_nested_tables nst                                              -- ***** view replacement 
        on
                nst.owner       = ind.table_owner
        and     nst.table_name  = ind.table_name
        left outer join 
                dba_segments      seg_ind  
        on
                seg_ind.owner         = ind.owner
        and     seg_ind.segment_name  = ind.index_name 
        where
                ind.table_owner = 'TEST_USER'
        union all
        select 
                lob.owner, 
                cast('LOB' as varchar2(11))   object_type,
                lob.table_name,
                lob.column_name               auxiliary_name,
                nvl(seg_lob.bytes,0)          bytes  
        from
                dba_lobs    lob
        left outer join 
                dba_segments      seg_lob
        on
                seg_lob.owner        = lob.owner
        and     seg_lob.segment_name = lob.segment_name
        where 
                lob.owner            = 'TEST_USER'
        union all
        select 
                nst.owner, 
                cast('NESTED' as varchar2(11))   object_type,
                nst.parent_table_name            object_name,
                table_name                       auxiliary_name,
                nvl(seg_nst.bytes,0)             bytes  
        from
                top_nested_tables nst                                              -- ***** view replacement 
        left outer join 
                dba_segments          seg_nst  
        on
                seg_nst.owner        = nst.owner
        and     seg_nst.segment_name = nst.table_name
        where
                nst.owner         = 'TEST_USER'
        )
where
        owner = 'TEST_USER'
group by
        owner, object_name
order by
        sum(bytes)
;

If you followed the link to the original Oracle Developer Forum thread you’ll see that the OP has made much more use of the “with subquery” (CTE) construct – including a rather nice touch of a CTE that reads “select cast(‘MY_SCHEMA’ as varchar2(30)) from dual”. This is particularly cute because it means you only have to supply a fixed schema name once (rather than repeating it many times) in the query, and then join to the CTE everywhere the name is needed.
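
In case you haven’t met the trick before, here’s a minimal sketch of the pattern (the CTE name and column name are my own invention):

with target_schema as (
        select cast('MY_SCHEMA' as varchar2(30)) schema_name from dual
)
select
        tab.table_name
from
        dba_tables      tab,
        target_schema   ts
where
        tab.owner = ts.schema_name
/

Change the literal in that one place and every reference in the query picks up the new schema name.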

 
