Oracle Scratchpad

April 13, 2021

CSV

Filed under: Infrastructure,Oracle,Problem Solving — Jonathan Lewis @ 3:00 pm BST Apr 13,2021

An old question with a little extra twist came up on the Oracle Developer forum recently – someone wanted to export a table into a CSV file but wanted to do a character set conversion at the same time because various “non-printable” characters were getting corrupted.

Looking at the code they had supplied they were running from SQL*Plus and were trying to use a select statement to generate a script that, for each row in a table, produced a statement of the form:

insert into tableX({list of columns}) values({list of values});

This worked reasonably well when they didn’t attempt to get a UTF8 output, but when they modified their client O/S environment to make SQL*Plus produce the desired UTF8 output the operation was much slower, increasing from about 1 minute to more than half an hour. (This may simply be the overhead of SQL*Net translating each line of output from the server character set to the client character set.)

Since the system in question was running on 12.2 I suggested a change of strategy so that they could at least take advantage of a built-in CSV generator to see how this behaved. Oracle extended the “set markup” command in 12c to include a CSV option:

set markup csv on delimiter {character} quote on

If this behaves nicely with the user’s attempt to get the output in their chosen characterset then a change in strategy to generate a “proper” CSV file and reload it through an external table, rather than generating and executing a list of “insert ….” statements might do a lot less work. (Unfortunately they had a requirement to generate the insert statements rather than adopting this bulk strategy – but I wasn’t going to let that stop me finishing this note.)

One detail to note with this markup option is that, unlike the “delimiter” character, the “quote” character cannot be specified and will always be the double-quote (ASCII character 34). If this character appears in any of the reported strings Oracle will “escape” it by duplicating it. Here’s a little example to demonstrate the effects – first a little data with a basic output (running 12.2.0.1):

rem
rem     Script:         markup.csv.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem

create table t1 (v1 varchar2(10), n1 number, n2 number, v2 varchar2(10));

insert into t1 values('abcdef',1,1,'qwer');
insert into t1 values('abc"def',2,null,'asdf');
insert into t1 values (null,3,3,null);

commit;

prompt  ==================
prompt  markup not yet set
prompt  ==================

select * from t1;


V1                 N1         N2 V2
---------- ---------- ---------- ----------
abcdef              1          1 qwer
abc"def             2            asdf
                    3          3

3 rows selected.

I’ve included one row with a double-quote in one of the columns, and a couple of rows with NULLs in various columns. Now I just switch markup to CSV using its default options:

prompt  ==================
prompt  Default CSV markup
prompt  ==================

set markup csv on
select * from t1;


"V1","N1","N2","V2"
"abcdef",1,1,"qwer"
"abc""def",2,,"asdf"
,3,3,

3 rows selected.


As you can see this has behaved nicely – every column is present (even when empty – judging by the number of delimiters) and character strings have been quoted with the double-quote character, and the stored double-quote in the second row has been duplicated. The default option for CSV should be nearly perfect for use as an extrernal table – though we probably want to get rid of column headings, page breaks, and feedback etc.

Before creating a suitable external table, let’s play around with a couple of variations for the options:

prompt  ===========
prompt  delimiter =
prompt  ===========

set heading off
set pagesize 0
set feedback off

set markup csv on delimiter =
select * from t1;

"abcdef"=1=1="qwer"
"abc""def"=2=="asdf"
=3=3=

prompt  ================================
prompt  quote off (delimiter remembered)
prompt  ================================

set markup csv on quote off
select * from t1;

abcdef=1=1=qwer
abc"def=2==asdf
=3=3=

Now to read back one of the outputs – I’ll pick the default output for this (but with headers etc. off to leave just three lines of data). Here’s how I can define my external table to read back the values from the file:

create or replace directory ext_tab as '/mnt/working/';
create or replace directory ext_log as '/mnt/working/';

create table ext(
        v1 varchar2(10), 
        n1 number, 
        n2 number, 
        v2 varchar2(10)
)
organization external(
        type oracle_loader
        default directory ext_tab
        access parameters (
                records delimited by newline 
                discardfile     ext_log:'exttab_dis.txt'
                badfile         ext_log:'exttab_bad.txt'
                logfile         ext_log:'exttab_log.txt'
        fields 
                csv 
                with embedded
        missing field values are null
        )
        location ('markup_csv.lst')
)
reject limit 10
;

select * from ext;

V1                 N1         N2 V2
---------- ---------- ---------- ----------
abcdef              1          1 qwer
abc"def             2            asdf
                    3          3

3 rows selected.

I’ve created a couple of directories to get started – linking a physical location to a logical name that I can use in the external table definition.

The key details that allow Oracle to read the file correctly appear in the “fields” specification: it’s a pity that the output from the default CSV markup doesn’t become the default expectation for the external table CSV option, but I needed to use the “with embedded” to adjust the duplicated “double-quote” characters and then add “missing field values are null” to cope with a null column at the end of the line.

Summary

With a simple call to “set markup csv on” (and a handful of other set {formatting} commands) it’s very easy to turn a table into a flat CSV file that is comma separated, (double)quote deliminted.

It’s then very easy – but requires two non-default settings – to read this flat file back into an Oracle database using “fields CSV” as the basic formatting directive.

Footnote

I’ve kept the example very simple, but as soon as you start dealing with dates and timestamps – and even very large numbers – you’ll have to worry about formatting when generate the flat file (set nls_date_format, column n1 format 99999, etc.) to make sure you don’t lose precision on the output; and for dates and timestamps you’ll have to include the equivalent date_format options in the external table definition.

I’ve only demonstrated how to reload the file back into Oracle as an external table. If you wanted to use this markup option to prepare the data for an Excel spreadsheet all you have to do is spool to a file with the “.csv” extension and Excel will be able to open the output file and parse the data correctly into a multi-column display. (But see comment #1 below for a warning from Jason Bucata.)

March 12, 2021

Distributed Sequences

Filed under: distributed,Execution plans,Oracle,Performance,Problem Solving — Jonathan Lewis @ 9:09 am GMT Mar 12,2021

A request for help came up some time ago on ODC reporting a query that was hanging when it included a sequence.nextval. In fact the intial “query” was an “insert / select” with a select that was a join of two remote tables. Making the fairly automatic assumption that many people say “hanging” when they really mean “hasn’t finished yet” the first thought I had about the structure of the statement was that it was just a variation of the standard problem of distributed DML. I haven’t written anything previously about how using sequences can introduce the problem so here’s a note to demonstrate the issue and suggest a workaround:

We start with a database link and a few tables:


rem
rem     Script:         distributed_sequence.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1
rem

rem     create public database link orcl@loopback using 'orcl';

define m_target=orcl@loopback

create sequence s1 cache 10000;
select s1.nextval from dual;

create table t1
segment creation immediate
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n2,
        lpad(rownum,10,'0')             v2,
        lpad('x',100,'x')               padding
from
        generator       v1
;

create table t3(
        id1     number(6,0),
        id2     number(6,0),
        n0      number(6,0),
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(10),
        v2      varchar2(10)
)
segment creation immediate
;

create or replace view v1 as
select
        t1.id id1,
        t2.id id2,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1, t2
where
        t2.id = t1.id
;

The step to create a public database link (one of many in my original source) has to be run by a suitably privileged schema as a one-off event. The definition of the substitution variable m_target that I’ll be using as my database link (again with many possibilities in my original script) means I only have to edit my choice of database link once in my script as I change Oracle instances. You’ll notice I’ve done my usual trick of using a loopback database link to emulate a distributed system.

I’ve then created two populated tables and a third empty table that will be the target of an insert. I’ve also created a view that joins the first two tables and a sequence number that I’ve primed with a single call to nextval. When I get to the tests I’ll be using t1, t2 and v1 as if they had been created in the remote database (referenced through my loopback database link) while t3 and s1 will be local objects.

So let’s run a couple of statements and see what their execution plans look like:


set serveroutput off

prompt  =======================
prompt  Insert without sequence
prompt  =======================

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        t1.id,
        t2.id,
        0,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1@&m_target    t1,
        t2@&m_target    t2
where
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));

prompt  =======================
prompt  Insert WITH sequence
prompt  =======================

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        t1.id,
        t2.id,
        s1.nextval,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1@&m_target    t1,
        t2@&m_target    t2
where
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;

So we join t1 and t2, select a couple of columns, and insert into t3 without or with a call to s1.nextval – the local sequence. Here’s the output (with minor cosmetic changes) running this chunk of code under 19.3.0.0 (the effects are the same in 18.3.0.0 and 12.2.0.1):


=======================
Insert without sequence
=======================

10000 rows created.


SQL_ID  373nz810u3frv, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  t1.id,  t2.id,
0,  t1.n1,  t2.n2,  t1.v1,  t2.v2 from  t1@orclpdb@loopback t1,
t2@orclpdb@loopback t2 where  t2.id = t1.id

-----------------------------------------------------------------
| Id  | Operation                | Name | Cost  | Inst   |IN-OUT|
-----------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |        |      |
|   2 |   REMOTE                 |      |       | ORCLP~ | R->S |
-----------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - SELECT "A2"."ID","A1"."ID",0,"A2"."N1","A1"."N2","A2"."V1","A1"."
       V2" FROM "T1" "A2","T2" "A1" WHERE "A1"."ID"="A2"."ID" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - cpu costing is off (consider enabling it)


=======================
Insert WITH sequence
=======================

10000 rows created.


SQL_ID  8jg23arujnh01, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  t1.id,  t2.id,
s1.nextval,  t1.n1,  t2.n2,  t1.v1,  t2.v2 from  t1@orclpdb@loopback
t1,  t2@orclpdb@loopback t2 where  t2.id = t1.id

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    54 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|*  3 |    HASH JOIN             |      | 10000 |   937K|    54   (8)| 00:00:01 |        |      |
|   4 |     REMOTE               | T1   | 10000 |   468K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
|   5 |     REMOTE               | T2   | 10000 |   468K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"="T1"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","N1","V1" FROM "T1" "A2" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

   5 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","N2","V2" FROM "T2" "A1" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

The key thing to notice is that when we want to insert the local sequence as a column in the select list Oracle breaks the hash join into two separate accesses to the remote database and pulls all the data we might need from the tables before trying to join them locally. When the query is “fully remote” the local database can let the remote database deal with the join, when the query is distributed – which is a side effect of introducing the sequence – the local site becomes the driving site and has to work out the least worst way of handling the join, which might be much slower than the remote join.  (It’s an odd little quirk that when the select is fully remote the optimizer thinks that it’s not using CPU costing. Possibly that’s because all the arithmetic happens somewhere else and the local cost of the query never gets above zero.)

This is one of those cases where we might work around the problem by creating a remote view to handle the join – hence the creation of the view v1; here’s a suitable statement and the resulting execution plan:

prompt  ==============================
prompt  Insert using view and sequence
prompt  ==============================

set serveroutput off

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        v1.id1,
        v1.id2,
        s1.nextval,
        v1.n1,
        v1.n2,
        v1.v1,
        v1.v2
from
        v1@&m_target v1
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;



==============================
Insert using view and sequence
==============================

10000 rows created.


SQL_ID  4tz0rrqt87nb8, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  v1.id1,  v1.id2,
s1.nextval,  v1.n1,  v1.n2,  v1.v1,  v1.v2 from  v1@orclpdb@loopback v1

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    27 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|   3 |    REMOTE                | V1   | 10000 |   937K|    27   (8)| 00:00:01 | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID1","ID2","N1","N2","V1","V2" FROM "V1" "V1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

As you can see, the local optimizer doesn’t know enough about the remote view to be able to split it into components and make a mess of the execution plan, it simply sees a query against a “single table” and let’s the remote database worry about optimising it. But, of course, we don’t always have the luxury of being able to create objects on someone else’s database, so what’s the alternative?

Try rewriting the query to use an inline view with the /*+ no_merge() */ hint:


prompt  =======================================
prompt  Insert from a no-merge inline view with
prompt  a sequence.nextval in the outer query
prompt  ========================================

set serveroutput off

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        id1, id2, s1.nextval, n1, n2, v1, v2
from    (
        select  /*+ no_merge */
                t1.id   id1,    
                t2.id   id2,
                0       n0,
                t1.n1,
                t2.n2,
                t1.v1,
                t2.v2
        from
                t1@&m_target    t1,
                t2@&m_target    t2
        where
                t2.id = t1.id
        )       v1
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;


=======================================
Insert from a no-merge inline view with
a sequence.nextval in the outer query
========================================

10000 rows created.


SQL_ID  20z81g550tbsk, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  id1, id2,
s1.nextval, n1, n2, v1, v2 from (  select /*+ no_merge */   t1.id id1,
 t2.id id2,   0 n0,   t1.n1,   t2.n2,   t1.v1,   t2.v2  from
t1@orclpdb@loopback t1,   t2@orclpdb@loopback t2  where   t2.id = t1.id
 ) v1

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    54 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|   3 |    VIEW                  |      | 10000 |   937K|    54   (8)| 00:00:01 |        |      |
|   4 |     REMOTE               |      |       |       |            |          | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   4 - SELECT /*+ NO_MERGE */ "A2"."ID","A1"."ID",0,"A2"."N1","A1"."N2","A2"."V1","A1"."V
       2" FROM "T1" "A2","T2" "A1" WHERE "A1"."ID"="A2"."ID" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )


The hint has done exactly what we needed: it has passed the text of the inline view to the remote database for optimisation so the join happens remotely, and the sequence number is then included after the result set comes back from the remote database. The SQL is a little messier, of course, mostly thanks to the doubled appearance of the columns in the select list.

Summary

Using a local sequence in DML that accesses a remote database makes the optimizer treats the underlying query as a distributed query, and this may mean it can’t find an efficient execution path unless you do some re-engineering of the code. If you can manage to make an insert with a constant efficient then using that version of the code as an in-line no_merge view with one extra layer that brings the sequence into play may be all you need to do to make the DML operate efficiently.

November 26, 2020

Row sizes 3

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 12:08 pm GMT Nov 26,2020

Several years ago I published a couple of examples of scripts that I’d been using since Oracle 6 (with minor changes over time) to analyse the content of tables in some detail. The first was a breakdown of the lengths of the rows in the table, the second was a map showing the distribution of nulls in the rows. There used to be a third component of the analysis which produced a report of the number of non-null entries for each column in the table,  but by the time I published the first two parts there didn’t seem to be much point in publishing the third since you could get a sufficiently accurate picture by querying view user_tab_columns after gathering stats:


break on num_rows

select
        tab.num_rows, col.column_name , col.num_nulls
from 
        user_tables             tab
join
        user_tab_columns        col
on
        col.table_name = tab.table_name
where
        tab.table_name = 'T1' 
order by 
        col.num_nulls desc, col.column_id
/


  NUM_ROWS COLUMN_NAME           NUM_NULLS
---------- -------------------- ----------
     56777 EDITION_NAME              56777
           CREATED_APPID             56777
           CREATED_VSNID             56777
           MODIFIED_APPID            56777
           MODIFIED_VSNID            56777
           SUBOBJECT_NAME            56570
           DATA_OBJECT_ID            55353
           DEFAULT_COLLATION         51058
           EDITIONABLE               40216
           OWNER                         0
           OBJECT_NAME                   0
           OBJECT_ID                     0
           OBJECT_TYPE                   0
           CREATED                       0
           LAST_DDL_TIME                 0
           TIMESTAMP                     0
           STATUS                        0
           TEMPORARY                     0
           GENERATED                     0
           SECONDARY                     0
           NAMESPACE                     0
           SHARING                       0
           ORACLE_MAINTAINED             0
           APPLICATION                   0
           DUPLICATED                    0
           SHARDED                       0


In this example t1 is a copy of view all_objects and I’ve listed the columns in order of most frequently null with the table’s num_rows reported at the top for comparison purposes. If you were going to do something that made it a good idea to rebuild the table you could use this report as a guide to re-arranging the column order to position the most frequently null columns towards the end of the row (saving the “trailing nulls take no space” length bytes).

This approach depends on the stats being up to date and accurate, of course, so if you didn’t have them, and didn’t want to collect them, another strategy would be to run a query like the following:


select
        to_char(count(OWNER),'999,999,999,999,999') OWNER,
        to_char(count(OBJECT_NAME),'999,999,999,999,999') OBJECT_NAME,
        to_char(count(SUBOBJECT_NAME),'999,999,999,999,999') SUBOBJECT_NAME,
        to_char(count(OBJECT_ID),'999,999,999,999,999') OBJECT_ID,
        to_char(count(DATA_OBJECT_ID),'999,999,999,999,999') DATA_OBJECT_ID,
        to_char(count(OBJECT_TYPE),'999,999,999,999,999') OBJECT_TYPE,
        to_char(count(CREATED),'999,999,999,999,999') CREATED,
        to_char(count(LAST_DDL_TIME),'999,999,999,999,999') LAST_DDL_TIME,
        to_char(count(TIMESTAMP),'999,999,999,999,999') TIMESTAMP,
        to_char(count(STATUS),'999,999,999,999,999') STATUS,
        to_char(count(TEMPORARY),'999,999,999,999,999') TEMPORARY,
        to_char(count(GENERATED),'999,999,999,999,999') GENERATED,
        to_char(count(SECONDARY),'999,999,999,999,999') SECONDARY,
        to_char(count(NAMESPACE),'999,999,999,999,999') NAMESPACE,
        to_char(count(EDITION_NAME),'999,999,999,999,999') EDITION_NAME,
        to_char(count(SHARING),'999,999,999,999,999') SHARING,
        to_char(count(EDITIONABLE),'999,999,999,999,999') EDITIONABLE,
        to_char(count(ORACLE_MAINTAINED),'999,999,999,999,999') ORACLE_MAINTAINED,
        to_char(count(APPLICATION),'999,999,999,999,999') APPLICATION,
        to_char(count(DEFAULT_COLLATION),'999,999,999,999,999') DEFAULT_COLLATION,
        to_char(count(DUPLICATED),'999,999,999,999,999') DUPLICATED,
        to_char(count(SHARDED),'999,999,999,999,999') SHARDED,
        to_char(count(CREATED_APPID),'999,999,999,999,999') CREATED_APPID,
        to_char(count(CREATED_VSNID),'999,999,999,999,999') CREATED_VSNID,
        to_char(count(MODIFIED_APPID),'999,999,999,999,999') MODIFIED_APPID,
        to_char(count(MODIFIED_VSNID),'999,999,999,999,999') MODIFIED_VSNID,
        to_char(count(*),'999,999,999,999,999') row_count 
fromi
         t1
;

You don’t really need the to_char() function, but it’s a conveience for what I’m going to do with the SQL. Obviouslty it would be a bit tedious to create this statement by hand so, as I did in the previous “rowsize” notes, I’ve written some code to generate it for me:


rem
rem     row_size_2b.sql
rem     
rem     Generate SQL to report counts of 
rem     non-null columns in a table.
rem

set linesize 32000
set feedback off
define m_table = '&1'

declare
        m_string        varchar2(32000) := 'select ';
        m_cursor        sys_refcursor;

begin
        for r in (
                select 
                        column_name, data_type
                from    user_tab_columns
                where   table_name = upper('&m_table')
        ) loop
                m_string := m_string ||  
                                ' to_char(count(' || trim(r.column_name) || '),''999,999,999,999,999'') ' ||
                                trim(r.column_name) || ',' || chr(10) ;
        end loop;

        m_string := m_string || ' to_char(count(*),''999,999,999,999,999'') row_count from &m_table';

--      dbms_output.put_line(m_string);
        print_table(m_string);

end;
/

set linesize 156

The script accepts a table name from the user’s schema (you could edit this to query dba_tab_columns), constructs a string (as shown above – though I’ve done a little cosmetic work on it), and then passes the resulting string to a verion of Tom Kyte’s print_table() routine which produces the following output:


OWNER                         :               56,777
OBJECT_NAME                   :               56,777
SUBOBJECT_NAME                :                  207
OBJECT_ID                     :               56,777
DATA_OBJECT_ID                :                1,424
OBJECT_TYPE                   :               56,777
CREATED                       :               56,777
LAST_DDL_TIME                 :               56,777
TIMESTAMP                     :               56,777
STATUS                        :               56,777
TEMPORARY                     :               56,777
GENERATED                     :               56,777
SECONDARY                     :               56,777
NAMESPACE                     :               56,777
EDITION_NAME                  :                    0
SHARING                       :               56,777
EDITIONABLE                   :               16,561
ORACLE_MAINTAINED             :               56,777
APPLICATION                   :               56,777
DEFAULT_COLLATION             :                5,719
DUPLICATED                    :               56,777
SHARDED                       :               56,777
CREATED_APPID                 :                    0
CREATED_VSNID                 :                    0
MODIFIED_APPID                :                    0
MODIFIED_VSNID                :                    0
ROW_COUNT                     :               56,777

-----------------

1 rows selected

As with many of the scripts I’ve pubished, the task is one I rarely do, so I haven’t bothered to add in all the extra little refinements that might make it foolproof and totally self-contained. In this case, for example I’d have to go through a final couple of manual steps to edit the output by deleteing the last few lines then (since I use vi) issuing “!%!sort -k3 -n -r” which would sort the results in reverse order, numerically, according to the third field.

 

print_table()

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 9:40 am GMT Nov 26,2020

Many years ago Tom Kyte published a small PL/SQL procedure to output each row in a table as a list of (column_name , value). I often use the original version, even though Tom refined it several times over the years. From time to time the links in my blog that I’ve set up to point to Tom’s code have simply stopped working so I’ve decided to publish a reference copy on my blog in case newer readers need to find a copy and the current link above stops  working.

rem
rem     Script:         print_table.sql
rem     Dated:          Nov 2001
rem     Author:         Tom Kyte - from Expert One on One Oracle.
rem
rem     Purpose:
rem     Prints the results of a query one column per line.
rem
rem     Created as a 'authid current_user' so that it can
rem     access any table that the caller is allowed to access
rem     from the SQL*Plus prompt.
rem
rem     A suitably privilefed user could also make this publicly
rem     available through the public synonym and grant at the end
rem     of the script
rem
rem     Minor modifications by Jonathan Lewis to separate rows
rem     in the result set, and report the number of rows returned
rem

create or replace procedure print_table( p_query in varchar2 )
authid current_user
is
        l_theCursor     integer default dbms_sql.open_cursor;
        l_columnValue   varchar2(4000);
        l_status        integer;
        l_descTbl       dbms_sql.desc_tab;
        l_colCnt        number;
        l_rowcnt        number := 0;
begin
        execute immediate 'alter session set nls_date_format=''dd-mon-yyyy hh24:mi:ss'' ';

        dbms_sql.parse(  l_theCursor,  p_query, dbms_sql.native );
        dbms_sql.describe_columns( l_theCursor, l_colCnt, l_descTbl );

        for i in 1 .. l_colCnt loop
                dbms_sql.define_column(l_theCursor, i, l_columnValue, 4000 );
        end loop;

        l_status := dbms_sql.execute(l_theCursor);

        while ( dbms_sql.fetch_rows(l_theCursor) > 0 ) loop
                for i in 1 .. l_colCnt loop
                        dbms_sql.column_value( l_theCursor, i, l_columnValue );
                        dbms_output.put_line(
                                rpad( l_descTbl(i).col_name, 30 )
                                || ' : ' || l_columnValue
                        );
                end loop;
                dbms_output.new_line;
                dbms_output.put_line( '-----------------' );
                dbms_output.new_line;
                l_rowcnt := l_rowcnt + 1;
        end loop;

        dbms_output.put_line(l_rowcnt || ' rows selected');

        execute immediate 'alter session set nls_date_format=''dd-mon-rr'' ';

exception
        when others then
                execute immediate 'alter session set nls_date_format=''dd-mon-rr'' ';
        raise;
end;
/

-- create public synonym print_table for print_table;
-- grant execute on print_table to public;

The reason I’m publishing this now is that I’m about to publish the third variant of a piece of code I wrote many years ago, and first blogged about in 2012.

October 5, 2020

Direct Path

Filed under: Oracle,Performance,Problem Solving,RAC,Troubleshooting,Tuning — Jonathan Lewis @ 11:29 am BST Oct 5,2020

Here’s a note that I might have written once already – but I can’t find it and I’ve just been reminded about what it (might have) said by a posting that came up on the Oracle database forum in the last few days.

The posting in question is asking why, after setting the hidden parameter _serial_direct_read to ‘always’ a particular query is now taking hours to complete when it used to complete in a minute or so.

The answer is partly “because you’ve forced direct path serial reads”, partly “because you’re running on RAC” and (most directly) because the optimizer is using a really bad execution plan for that query and the direct path reads have had a massive impact as a consequence. (It turns out, after modelling, that the answer might also include “because you’re running 11.2.0.4”)

I’m going to demonstrate the issue by forcing a very simple query to take a very bad execution plan.

rem
rem     Script:         ko.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem 

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

create table t2
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

select  table_name, blocks 
from    user_tables
where   table_name in ('T1','T2')
;

alter system flush buffer_cache;

execute snap_events.start_snap
execute snap_my_stats.start_snap
alter session set "_serial_direct_read"=always;


select
        /*+ 
                leading(t1 t2)
                use_nl(t2)
                full(t2)
        */
        t1.object_type,
        t2.object_type
from
        t1, t2
where
        t2.object_id = t1.object_id + 0.5
;

execute snap_my_stats.end_snap
execute snap_events.end_snap

alter session set "_serial_direct_read"=auto;


My query very carefully ensures that it’s not going to return any rows; but it’s going to do a lot of work finding no data because I’ve forced Oracle into doing a tablescan of t2 for every row in t1 – so 10,000 scans of a table of 140 – 190 (depending on version) data blocks.

The snap_my_stats and snap_events packages are a couple of my simple diagnostic packages that allows me to find the change in some v$ content between the start and end snapshots. In this case it’s v$mystat and v$session_event for the session.

On a test using 11.2.0.4 the query ran for about 41 seconds with the following wait events reported:

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.11        .054           1
direct path read                                  6,527           0         368.30        .056           6
SQL*Net message to client                            13           0           0.00        .000           0
SQL*Net message from client                          13           0      10,689.26     822.251      10,689

The 6,500 “direct path read” waits corresponded to 1.33M “physical reads direct” reported in the session activity stats. Although the t2 table was fairly small Oracle was forced to use direct path reads for every single cycle through the nested loop. As a quick comparison, here are the figures if I don’t force direct path serial scans.

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.09        .047           2
db file scattered read                               34           0           2.45        .072           0
SQL*Net message to client                            11           0           0.01        .001           0
SQL*Net message from client                          11           0         174.36      15.851      82,849



We’ve waited for only 34 “db file scattered reads” and 2.45 centiseconds as we read the t2 (and the t1) tables into the cache for the first time, but then we’ve been able to revisit the blocks in the cache. We also saw a reduction in CPU usage and the total run time dropped from 41 seconds to about 22 seconds.

In this tiny example it hasn’t made a staggering difference to the overall run time, but the OP wasn’t that lucky with his “couple of minutes” to “hours”.

If you look at the fragment of the SQL Monitor report supplied by the OP you’ll see that they have an operation which shows:

=====================================================================================================================================================================================================================================
| Id    |                  Operation                    |            Name              | Rows    | Cost |  Time     |  Start | Execs |    Rows  | Read | Read  | Cell   | Mem | Activity |            Activity Detail               |
|       |                                               |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload |    |    (%)   |              (# samples)                 |
=====================================================================================================================================================================================================================================
| -> 25 |       INDEX STORAGE FAST FULL SCAN            | TMRC_IX1                     |      1  |      |     10745 |     +0 |  268K |     112K | 267K |   2GB |  96.53% | 1M |    99.74 | enq: KO - fast object checkpoint (4783)  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | Cpu (753)                                |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | gcs drm freeze in enter server mode (25) |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | latch free (1)                           |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | reliable message (3505)                  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | cell smart index scan (1635)             |
=====================================================================================================================================================================================================================================

The “index fast full scan” is an example of a “segment scan” and will be subject to direct path reads, just like a tablescan. We don’t really know how big this index is – but we can see that we have read it 268K times (Execs column) reading approximately 2GB after 267K read requests. This suggests the index is at most 1MB in size since it can be read in a single read request, and may consist of only one populated block (2,000,000,000/267,000 = 7,490 bytes. Despite this small size the total time sampled for all these scans is more than 10,600 seconds – roughly 39 millisecond per scan. That’s not very good.

Looking more closely at the sampled time we notice 3 key features:

============================================
|            Activity Detail               |
|              (# samples)                 |
============================================
| enq: KO - fast object checkpoint (4783)  |
| Cpu (753)                                |
| gcs drm freeze in enter server mode (25) |
| latch free (1)                           |
| reliable message (3505)                  |
| cell smart index scan (1635)             |
============================================
  • A massive fraction of the time was spent on “enq: KO – fast object checkpoint”
  • A large fraction of the time was spent on “reliable message”
  • There was some time (relatively small, but large in absolute terms) for “gcs drm freeze …”

The last of these three is about global cache services, distributed resource manager” and is a clear indication that the system is running RAC and we have some hefty competition for “object mastering” between instances. But the actual time lost there is relatively small – though associated chatter between instances could be significant.

But what’s the “KO enqueue”? Every time an instance starts a direct path segment scan it has to get a message to the database writer (possibly via the checkpoint process) – hence the “reliable message” waits – to copy every dirty block for that segment from the buffer cache down to disc and it has to wait for the write to complete. This is necessary to ensure that the tablescan doesn’t miss any changes that have been made in memory without yet being written to disc.

The KO enqueue synchronises this activity – I haven’t worked out the complete chain of events, but the enqueue is negotiated between the session and the checkpoint process – and if you’re running RAC every instance has to write any dirty blocks it is holding for the segment, so you have to have a degree of cross-instance chatter to make this happen.

Thanks to the enforced serial direct reads the OP’s plan – which, surely, shouldn’t expect to do 267K index fast full scans – has a massive overhead thanks to the need for the repeated object checkpoints.

You may ask, at this point, why I didn’t see any KO enqueue waits in my test results – the answer is simple, I’d flushed the buffer cache before I started the test, so there were no dirty blocks for the session to worry about. Let’s see what happens if I introduce a little activity to keep dirtying a few blocks in the t2 table. Here’s a little loop that will update a few rows once per second:

begin
        for i in 1..1 loop
                update t2 set data_object_id = 0 where mod(object_id,1000) = i;
                dbms_output.put_line(sql%rowcount);
                commit;
                dbms_lock.sleep(1);
        end loop;
end;
/

You’ll notice the code runs through the loop just once – I started with a loop count of 30, and discovered it wasn’t necessary, but the option remains for further testing.

If I execute this anonymous block (which updates about 10 rows each time through the loop) from another session just after SQL*Plus reports my “alter session”, so that it runs just after the query starts, this is what the session event report looks like if I run the test against 11.2.0.4 (the version reported by the OP):

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
enq: KO - fast object checkpoint                  7,645           0         290.34        .038           6
db file sequential read                               2           0           0.13        .066           2
direct path read                                 10,714           0         675.50        .063           6
SQL*Net message to client                            14           0           0.00        .000           0
SQL*Net message from client                          14           0       1,556.23     111.160     101,653
events in waitclass Other                         5,607           0         218.04        .039           8


Suddenly we see a lot of time spent on the KO enqueue waits and the “events in waitclass Other” (which turn out to be “reliable message” waits). Apparently the session keeps finding dirty t2 blocks in the cache and telling the database writer they need to be written to disc before the next tablescan of t2 can take place.

There’s something odd here, though and I’ll introduce it with this comment: when I repeated the test on 19.3 (even with with the constant trickle of updates once per second), we only see a tiny number of KO enqueues and reliable message waits – the fact that we see a huge number of them in 11g is a defect in the design of the code.

Think about what’s happening with the KO enqueue: when you start the first tablescan of t2 you force every dirty block for that segment to be copied from the cache to the disc.

As the tablescan proceeds you may have to apply some undo change vectors to the blocks you’re reading to take them back to the SCN as at the start of query execution, but you know that any data that had been committed before the query started is on disc (even if it has been over-written by committed changes made after the query started, or by uncommitted changes made before the query started). What’s on the disc right now will be usable to get the correct read-consistent version of the data for the duration of the query run, no matter how many newer changes are made, whether or not they over-write the disc blocks before the query ends. There is no need to force write any dirty blocks as the tablescan is repeated and, it seems, by 19.3 the code has been adjusted to accomodate that fact.

Footnote

After I had posted this comment on the forum, the OP raised the question of whether or not the fix might apply to 12c as well – so I ran up a VM of 12.1.0.2 and 12.2.0.1 and re-ran the tests. The results were initially promising – neither version reported an extreme number of KO enqueue waits or reliable message waits.

However when I changed the loop counter from 1 back to 30 I found that the waits re-appeared – though the numbers were significantly less than those from 11g – so perhaps there’s a timing element involved that might need further investigation and stress testing even for 19.3.

Footnote 2

Another snapshot I took in testing was from the v$enqueue stats – which showed that (approximately) for every KO enqueue wait my session reported, the instance reported about 10 – 12 KO enqueue requests.

When checking v$enqueue_stats it’s important to remember that session activity stats (v$sesstat) report both “enqueue requests” and “enqueue conversions”. In v$enqueue_stats the conversions aren’t reported separately the view simply adds the two figures together under “requests”.

In the 11g test the session reported 7,645 KO enqueue waits, but the session activity stats reported 19,399 enqueue requests and 38,796 enqueue conversions; while v$enqueue_stats reported 96,990 KO enqueue requests. The remaining 38,796 KO enqueue requests were made by the checkpoint process (CKPT) – and it was only my session that repoted any waits for KO enqueue requests.

Without further low-level investigation this is what leads me to believe that the session sends CKPT a message that it wants an object-level checkpoint performed and waits for the message to be acknowledged (reliable message) before trying to convert a low-level KO enqueue to an exclusive one. But CKPT has acquired and converted the same KO enqueue before acknowledging the message from the session and will only release the enqueue when the database writer acknowledges that the checkpoint request has been completed. So this mechanism ensures that the session will have to wait until the checkpoint is complete and CKPT has released the enqueue before it can continue processing.

(P.S. If you enable event 10704 you will find that there seem to be two KO enqueues involved – one with id2 = 1, one with id2 = 2.)

August 31, 2020

Tracing Errors

Filed under: Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 10:16 am BST Aug 31,2020

This is a little lesson in trouble-shooting. It assumes you have the privilege to generate and edit trace files, and all I’m going to do is show how I worked out the answer to a fairly simple question that appeared recently on the Oracle Developer Community forum in a thread with the title  Cannot drop table after start dropping unused columns checkpoint.

I have a table t1 which is reasonably large (1M rows) with a column v30, and I’ve issued the command.

rem
rem     Script:         drop_column.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2003
rem

alter table t1 set unused column v30;

After this I’ve issued another command, pressed return, and immediately hit ctrl-C – I’ve got a brief window for this interrupt as the command is going to generate roughly 230MB of redo. If you want to try the experiment it might take a couple of attempts to get the timing right.

alter table t1
        drop unused columns
        checkpoint 1000
/
{ctrl-C}

Then I’ve tried to drop the table with the following result:

drop table t1
           *
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-12986: columns in partially dropped state. Submit ALTER TABLE DROP COLUMNS CONTINUE

This emulates the problem that appeared on the forum but the OP didn’t really want to issue the “continue” command because their table was “large” and the drop had been running for 24 hours when it was interrupted. It’s worth noting that four columns had been specified as unused, which means the drop command was probably generating roughly 1KB of redo per row. It’s also worth mentioning that the table was 600 columns wide – so there may have been a few performance side effects due to the multiple (minimum 3) row-pieces per row and row-chaining.

Ignoring any questions about the possible impact of having 600 columns, the key question in this case is:

  • Why isn’t it possible to drop the table (the manuals suggest it should be possible at this point)
  • Is there a way to bypass the problem?

This is a repeatable error – we can try to drop the table as many times as we like and we will get the same error reported in fractions of a second. so an obvious strategy is to enable tracing and see if the trace file can tell us anything about why the error is happening. This is a particularly sensible strategy to follow since error ORA-00604 is telling us that there’s something wrong in the recursive SQL, which means there’s a very good chance that we will actually find an SQL statement in the trace file that is the rescursive statement triggering the error.

So, enable sql_trace (or set event 10046), or do whatever you like to do to enable basic tracing (no need for anything above level 1 just yet); try to execute the drop; then search the trace file for the word ERROR at the start of a line (“^ERROR”). Here’s what I found as the first match in my trace file:

ERROR #139987889121800:err=12986 tim=424276176462

Note that the err= value is the 12986 that was reported as the error under the ORA-00604 error. Sometimes it’s necessary to search backwards in the trace file until you find the matching cursor number (#139987889121800), but in this case it was already visible just a few lines further up the file. So here’s the fragment of the file around that ERROR line:


PARSING IN CURSOR #139987889121800 len=72 dep=1 uid=0 oct=15 lid=0 tim=424276175961 hv=1894278903 ad='75c06e38' sqlid='4wv7gq1sfhtrr'
ALTER TABLE "TEST_USER"."T1" RENAME TO "BIN$rhxBELdQGMXgUwEAAH93eQ==$0"
END OF STMT
PARSE #139987889121800:c=2896,e=3035,p=0,cr=4,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=424276175960
EXEC #139987889121800:c=186,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=424276176418
ERROR #139987889121800:err=12986 tim=424276176462
CLOSE #139987889121800:c=10,e=9,dep=1,type=0,tim=424276176712
EXEC #139987891478792:c=34686,e=35859,p=0,cr=73,cu=12,mis=0,r=0,dep=0,og=1,plh=0,tim=424276176774
ERROR #139987891478792:err=604 tim=424276176808

The error has come from an SQL statement that is trying to rename my table to some wierd and wonderful name which starts with the characters “BIN$” – that’s a “drop” command trying to move a table into the recycle bin by renaming it – and you’re not allowed to rename a table that is in a partially dropped state. So that’s why we get the error; and the obvious way to bypass it is: “drop table t1 purge;” – which works.

You’ll notice that I’ve included a couple of lines after the first ERROR. This is to show you the line that generated the ORA-00604 (err=604) error. It comes from cursor #139987891478792, and seraching backwards up the file for that cursor number I get to:

PARSING IN CURSOR #139987891478792 len=13 dep=0 uid=107 oct=12 lid=107 tim=424276140765 hv=202649412 ad='7f517dd3fe00' sqlid='d47kdkn618bu4'
drop table t1
END OF STMT

That’s not a suprise, of course, but it is important to cross-check that you haven’t been chasing the wrong error. There are some cases where the Oracle code does something to see if an error will occur but has an exception handler that means the error doesn’t end up reaching the application, so you do need to do a check that the error you found first was the one that floated up to the top of the stack.

Footnote

From Oracle 12.2.0.1 you could arrange to read your own trace file – while your session is connected – through the dynamic performance view v$diag_trace_file_contents.

July 15, 2020

Fetch First vs. Rownum

Filed under: Execution plans,Oracle,Partitioning,Performance,Problem Solving,Tuning — Jonathan Lewis @ 10:11 am BST Jul 15,2020

I’ve pointed out fairly frequently that if you’re running Standard Edition but would like to take advantage of a few features of the Partitioning option then you might be able to do something appropriate with Partition Views (but I’ve just discovered while searching my blog for a suitable item to link to that I haven’t published any of my PV notes on the blog).

I’ve also pointed out that while 12c allows you to use “fetch first N rows” instead of “where rownum <= N” there’s a hidden threat to using the feature because “fetch first N” turns into a hidden row_number() over() analytic function.

Today’s note is a combination of these two topics, prompted by a request to solve a particular coding issue that has appeared a few times on the Oracle Developer Forum and is likely to be recognisable to a number of people.

I’ll start with a very simple model demonstrateing the simplest use of “fetch first N”:


rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);

With this data set I want to write a query that selects rows for class A where id > 9500, ordered by id – but I only want the first two rows. Here’s a very simple query that gets the result I want, followed by the execution plan from 12.2.0.1 (the A-Rows and E-Rows from 19.3 are slightly different):

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)

The plan – with its stats – shows us that we’ve done an index range scan of an index which will return the data in exactly the order we want, and the “fetch first 2 rows” has been translated into the row_number() over() that we expected; but to our great joy the “window sort stopkey” makes the processing stop very early because Oracle recognises that the base data is arriving in the right order so it isn’t necessary to fetch all of it and sort it. The A-Rows column confirms this interpretation of what has happened.

You might notice, by the way, that the optimizer has costed the query as if it were fetching all the rows even though it “knows” that it’s going to fetch only the first two rows. That’s why I had to include the index hint to make the optimizer use the obvious index – a popular alternative is to use the /*+ first_rows(N) */ hint where N matches, or is similar to, the number of rows you want to fetch. If I had omitted the hint the optimizer would have done a full tablescan and then applied a “window sort pushed rank” operation to sort and limit the result to 2 rows.

So now we come to the real problem: the user has a “current” table and an identical “history” table, and would like to replace the table reference with a reference to a union all view for their clients on Standard Edition, or to a partitioned table for clients running Enterprise Edition – and they don’t really want to do any other code changes. So let’s see what happens when we model the union all. I started with a table called t_odd that held only odd values for id, so I’m going to add a table called t_even that holds only even values for id.

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

As you can see t_even is an identically structured table with similar data, and I’ve created a union all view on top of the two tables, changing the query to reference the view rather than referencing a table. Thanks to the costing anomaly (combined with the small size of the tables) I’ve had to supply a couple of “global” hints to tell the optimizer to use the indexes to access the two tables. So how well does the optimizer do its job when we have a union all view?


----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)

Answer: Bad luck, the optimizer isn’t smart enought to find a cheap way through this query. It’s fetched all the relevant data from the two tables before applying the window sort (which it does with some efficiency – the pushed rank) to produce the right answer. As you can see from the A-Rows column, though, it’s had to acquire a couple of hundred rows from each table before getting down to the 2 rows we wanted.

Partitioned Tables

So let’s try to solve the problem by buying into the partitioning option and creating a list-partitioned table with two partitions, one flagged for current data and one flagged for history data – or ‘O’dd and ‘E’ven data as I’ve created them in my model.



create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')

Note particularly that I have created a local index on this partitioned table – so there’s a very close correspondance between the two tables in the previous example and the two partitions in this example. Here’s the plan when I query the partitioned table for the first two rows:


select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)

The optimizer has let us down again. The plan shows us that we have to acquire all the relevant data from the two partitions before applying the row_number() analytic function and discarding all but the two rows we want. (Again we can check the A-Rows column to see that we have started by fetching a total of 404 rows from the table.)

But what happens if we fall back to the good old-fashioned (non-standard) rownum method:


select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)

Oracle really knows how to use rownum well – notice how there is a count stopkey operation as a child to the partition list all operation, and that’s where our rownum <= 2 predicate is first applied. For each partition Oracle finds “the first two rows” and after it has collected two rows from every partition it sorts them (again with a stopkey) to find the top two in that subset. Check the A-Rows column – we selected a total of 4 rows from the table (2 per partition) and reduced that to 2 rows at operation 3.

Conclusion

There are some patterns of processing where partitioned tables can be a lot friendlier to coders than partition views; if you do have to stick with Standard Edition you can usually get what you want but the coding investment may be significantly higher. Even with partitioned tables, though, there are some “old-fashioned” Oracle methods that do a much nicer job than some of the new-fangled “ANSI” mechanisms.

Footnote

Part of the problem presented here revolves around the desire to keep a pattern of SQL generation that already exists, doing nothing more than replacing a table name with a view (or partitioned table) name.

As we’ve seen, if you start with a simple heap table and try to replace it with a partitioned table you have to use the rownum mechanism rather than the fetch first N rows mechanism.

If you’re running with Standard Edition you can’t do anything simple to replace a table name with the name of a union all view; you’d have to change your code generator to apply all the predicates twice (once for each table) and then apply the rownum predicate or fetch first directive again outside the union all. In other words you have to emulate exactly what Oracle EE manages to do with partitioned tables and rownum.

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)


As you can see, the E-Rows still predicts a lot of work, but the A-Rows tells us the work was kept to the minimum we want.

 

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minute that I spent confirming the OP’s hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan wass telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols changed to .le. / .ge. to avoid WordPress format issue.

The first thing to note is that the “Time (active)” shown at the top line is about 12,000 seconds – so it’s a long running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including 1,200 seconds of CPU time. This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the  Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn’t matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare “Rows (Estim)” and “Rows (Actual)” for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the “Execs” column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So lets look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with “Rows (actual)” = 1.
  • Operation 11 (Buffer Sort of data from STAGE1) is predicted to return 84 rows – with “Rows (actual)” = 1,000

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs this tells us that the optimizer’s estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Wven if we can’t fix this mis-estimate we may be able to do something that improves the estimate for STAGE1 to something sufficienlty large that it will trigger the switch to a hash join. So let’s look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there’s a nearly perfect three-column index on REF1, but the optimizer can’t use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It’s possible, of course, that this might be the first three columns of a 4, or more, column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.

STAGE1 Predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT=’I’ (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a signficant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.

What Happened Next?

I made the suggestions about column groups and histogram to the OP – without the level of detail that I’ve given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group, (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

November 6, 2019

Table Space

Filed under: Oracle,Problem Solving — Jonathan Lewis @ 5:59 pm GMT Nov 6,2019

There’s a request on the Oracle Developer Forum for assistance to write a report that shows the total space allocation for a table – including its indexes and LOB columns. There is no requirement to allow for partitioned tables, but it’s not entirely clear whether the code should allow for index-organized tables and clustered tables, and there’s no comment about nested tables.

The OP has made an initial attempt to write a suitable query, but the query joins dba_tables to dba_indexes then to dba_lobs then three times to dba_segments (all the joins being outer joins) before aggregating on table name. Unsurprisingly this produces the wrong results because (for example) if a table has two indexes the join from dba_tables to dba_indexes will double the bytes reported for the table. As Andrew Sayer points out in the thread, it would be better to aggregate on each of the separate component types before joining.

This introduces an important, generic, principle for writing code.

  • If it’s difficult to write a single statement can you break the task down into separate components that are easier to handle.
  • If you can express the problem as a small set of simpler components can you then combine the components
  • If this approach works is any loss of efficiency balanced by a gain in clarity and maintainability.

In the case of the OP this isn’t a query that’s going to be runing every few seconds – maybe it will be once per day, maybe three or four times per day. so there’s no need to be worried about making it as efficient as possible – so let’s go for simplicity.

Here’s a query that reports the space allocated for a simple heap table. It references dba_tables and dba_segments, so has to be run by a user with DBA privileges, and at the moment it’s restricted to reporting a single user.

Wait – before I write the query I ought to create a testbed to see if the query is working. So let’s take an empty schema and create a few objects. Let’s create

  • a simple heap table with two indexes and two LOB columns.
  • an Index Organized Table (IOT) with an overflow and two LOB columns.
  • a table with two columns that are a nested table type

Here’s some suitable code:


rem
rem     Script:         sum_table_space.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

prompt  ============================================================
prompt  Simple heap table - copy of all_objects - then add two CLOBs
prompt  ============================================================

create table test_user.t1 as select * from all_objects;

alter table test_user.t1 add (c1 clob) lob (c1) store as basicfile t1_lob1 (disable storage in row);
alter table test_user.t1 add (c2 clob) lob (c2) store as basicfile t1_lob2 (disable storage in row);

update test_user.t1 set c1 = 'x', c2 = 'x' where rownum <= 125;

create index test_user.t1_i1 on test_user.t1(object_name, object_type, object_id);
create index test_user.t1_i2 on test_user.t1(object_type, object_name, object_id);

prompt  =========================================================
prompt  Simple IOT table - with a couple of CLOBs in the overflow
prompt  =========================================================

create table test_user.iot1 (
        id              number          not null,
        ind_pad         varchar2(20)    not null,
        n2              number          not null,
        n3              number          not null,
        padding         varchar2(50),
        ct1             clob,
        ct2             clob,
        constraint iot1_pk primary key (id, ind_pad)
)
organization index
        overflow
        including n3
lob(ct1) store as basicfile iot_lob1(
        disable storage in row
)
lob(ct2) store as basicfile iot_lob2(
        disable storage in row
)
;

insert into test_user.iot1
select
        rownum,
        rpad(rownum,20),
        rownum,
        rownum,
        rpad('x',50,'x'),
        case when mod(rownum,100) = 0 then rpad('x',100,'x') end,
        case when mod(rownum,100) = 0 then rpad('x',100,'x') end
from
        all_objects
where
        rownum <= 10000
;

commit;

prompt  ====================================================
prompt  Now create types to allow creation of a nested table
prompt  ====================================================

create type test_user.jpl_item as object (n1 number, v1 varchar2(32));
/

create type test_user.jpl_tab_type as table of jpl_item;
/

create table test_user.demo_nest (
        id      number,
        nest1   test_user.jpl_tab_type,
        nest2   test_user.jpl_tab_type
)
nested table nest1 store as t_nested1
return as locator
nested table nest2 store as t_nested2
return as locator
;

create unique index test_user.tn1_pk on test_user.t_nested1(nested_table_id, n1) compress 1;
create unique index test_user.tn2_pk on test_user.t_nested2(nested_table_id, n1) compress 1;

create index  test_user.tn1_v1 on test_user.t_nested1(v1);
create index  test_user.tn2_v1 on test_user.t_nested2(v1);

insert into test_user.demo_nest values (
        1,
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item One one'),
                test_user.jpl_item(2,'Item One two')
        ),
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item One one'),
                test_user.jpl_item(2,'Item One two')
        )
);

insert into test_user.demo_nest values (
        2,
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item Two one'),
                test_user.jpl_item(2,'Item Two two')
        ),
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item Two one'),
                test_user.jpl_item(2,'Item Two two')
        )
);

commit;

You’ll notice that I’ve prefixed every table, index and type name with a schema name. This is because I set up this test to run as a DBA so I’m creating the objects in one schema while connected to a different schema.

Of course, before trying to report allocations summed by base table, it would be sensible to produce a simple list of the segments we should see so that we know when we’ve reported all of them. So let’s start with that very simple report:


column bytes format 999,999,999,999
column segment_name format a25

break on report on owner on object_name skip 1
compute sum of bytes on report

select
        segment_name, segment_type, bytes
from
        dba_segments
where
        owner = 'TEST_USER'
order by
        segment_type, segment_name
;

SEGMENT_NAME              SEGMENT_TYPE                  BYTES
------------------------- ------------------ ----------------
IOT1_PK                   INDEX                     1,048,576
SYS_C008380               INDEX                     1,048,576
SYS_C008381               INDEX                     1,048,576
SYS_FK0000075579N00002$   INDEX                     1,048,576
SYS_FK0000075579N00004$   INDEX                     1,048,576
T1_I1                     INDEX                     5,242,880
T1_I2                     INDEX                     5,242,880
TN1_PK                    INDEX                     1,048,576
TN1_V1                    INDEX                     1,048,576
TN2_PK                    INDEX                     1,048,576
TN2_V1                    INDEX                     1,048,576
SYS_IL0000075565C00027$$  LOBINDEX                  1,048,576
SYS_IL0000075565C00028$$  LOBINDEX                  1,048,576
SYS_IL0000075572C00006$$  LOBINDEX                  1,048,576
SYS_IL0000075572C00007$$  LOBINDEX                  1,048,576
IOT_LOB1                  LOBSEGMENT                1,048,576
IOT_LOB2                  LOBSEGMENT                1,048,576
T1_LOB1                   LOBSEGMENT                2,097,152
T1_LOB2                   LOBSEGMENT                2,097,152
T_NESTED1                 NESTED TABLE              1,048,576
T_NESTED2                 NESTED TABLE              1,048,576
DEMO_NEST                 TABLE                     1,048,576
SYS_IOT_OVER_75572        TABLE                     1,048,576
T1                        TABLE                    12,582,912
                                             ----------------
sum                                                47,185,920

So we now have a list of segments and a sum of bytes to aim for. One thing you might notice, though, is that the name “IOT1”  has disappeared; instead the thing we see labelled as a TABLE is its overflow segment, called SYS_IOT_OVER_75572 (the number being the object_id of the table we originally defined.  We will have to do something in our code to translate that sys_iot_over_75572″ to iot1″ if we want to make our final report easy to understand.

There are a few other oddities in the list but some of them we may be familiar with already – the indexes with names like SYS_IL0000075565C00027$$ are the lob indexes for the four lobs we created. (I deliberately created basicfile LOBs in case the OP wasn’t using the securefile LOBs that most systems are now using).  The 75565 in the example I just quoted is the object_id of the base table the C00027 tells us that the LOB is column 27 of the table definition.

You may be wondering about the indexes like SYS_C008380 and SYS_FK0000075579N00002$. We may need to work on those – but I’ll tell you the answer straight away – the former is a unique index on the “nest1” column  in the demo_nest table, the latter is a “foreign key” index on the (invisible) nested_table_id column in the nested tables.

Side note: although the sys_fk0000075579N000025 index looks as if it’s supposed to be a “foreign key” index, as suggested by the name, the dependency, and the function, there’s no foreign key constraint declared between the parent table and the nested table. Also, if you check dba_indexes, the generated column has the value ‘N’, even though the name of the index was clearly generated by Oracle internal code.  (The index at the other end of the join – sys_c008380 – is, however, flagged with generated = ‘Y’)

Let’s start building our solution with the easiest bit.


select 
        tab.owner, 
        cast('TABLE' as varchar2(11))   object_type,
        tab.table_name,
        nvl(seg_tab.bytes,0)            bytes  
from
        dba_tables        tab  
left outer join
        dba_segments      seg_tab  
on
        seg_tab.owner      = tab.owner
and     seg_tab.table_name = tab.segment_name  
where
        tab.owner          = 'TEST_USER'
/


OWNER           OBJECT_TYPE TABLE_NAME                      BYTES
--------------- ----------- -------------------- ----------------
TEST_USER       TABLE       T1                         12,582,912
                TABLE       SYS_IOT_OVER_75572          1,048,576
                TABLE       DEMO_NEST                   1,048,576
                TABLE       IOT1                                0
***************                                  ----------------
sum                                                    14,680,064

Immediately we have to correct two problems – we can see the original Iiot1 table – but it doesn’t have an associated data segment, it has a “top”, which is the primary key index segment, and an “overflow” which is designated by Oracle as a table segment. So we need to check the columns in dba_tables to work out how to get the more appropriate name reported for the table segment, and add a predicate to discard the row with the original name. Here’s “tables only mark 2”:

select 
        tab.owner, 
        cast('TABLE' as varchar2(11))   object_type,
        case
                 when tab.iot_type = 'IOT_OVERFLOW' 
                        then tab.iot_name
                else tab.table_name
        end                             object_name,
        case
                 when tab.iot_type = 'IOT_OVERFLOW' 
                        then tab.table_name
        end                             auxiliary_name,
        nvl(seg_tab.bytes,0)            bytes  
from
        dba_tables        tab  
left outer join
        dba_segments      seg_tab  
on
        seg_tab.owner        = tab.owner
and     seg_tab.table_name   = tab.segment_name  
where
        tab.owner            = 'TEST_USER'
and     nvl(tab.iot_type,'NORMAL') != 'IOT'


OWNER           OBJECT_TYPE OBJECT_NAME          AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       TABLE       T1                                                   12,582,912
TEST_USER       TABLE       IOT1                 SYS_IOT_OVER_75572               1,048,576
TEST_USER       TABLE       DEMO_NEST                                             1,048,576

I’ve identified the table type using the iot_type column in dba_tables, eliminated rows where the iot_type is ‘IOT’, and reported the iot_name (with table_name as an auxiliary name) when the iot_type is ‘IOT_OVERFLOW’. And we can now check that the result is consistent with the content of dba_segments that lists segment_type = ‘TABLE’.

So let’s move on to indexes.

select 
        ind.table_owner                 owner, 
        cast('INDEX' as varchar2(11))   object_type,
        ind.table_name,
        index_name                      auxiliary_name,
        nvl(seg_ind.bytes,0)            bytes  
from
        dba_indexes       ind  
left outer join 
        dba_segments      seg_ind  
on
        ind.owner      = seg_ind.owner
and     ind.index_name = seg_ind.segment_name  
where
        ind.table_owner = 'TEST_USER'


OWNER           OBJECT_TYPE TABLE_NAME           AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       INDEX       T1                   SYS_IL0000075565C00027$$         1,048,576
TEST_USER       INDEX       T1                   SYS_IL0000075565C00028$$         1,048,576
TEST_USER       INDEX       T1                   T1_I1                            5,242,880
TEST_USER       INDEX       T1                   T1_I2                            5,242,880
TEST_USER       INDEX       IOT1                 IOT1_PK                          1,048,576
TEST_USER       INDEX       IOT1                 SYS_IL0000075572C00007$$         1,048,576
TEST_USER       INDEX       IOT1                 SYS_IL0000075572C00006$$         1,048,576
TEST_USER       INDEX       T_NESTED1            SYS_FK0000075579N00002$          1,048,576
TEST_USER       INDEX       T_NESTED2            SYS_FK0000075579N00004$          1,048,576
TEST_USER       INDEX       DEMO_NEST            SYS_C008380                      1,048,576
TEST_USER       INDEX       DEMO_NEST            SYS_C008381                      1,048,576
TEST_USER       INDEX       T_NESTED1            TN1_PK                           1,048,576
TEST_USER       INDEX       T_NESTED2            TN2_PK                           1,048,576
TEST_USER       INDEX       T_NESTED1            TN1_V1                           1,048,576
TEST_USER       INDEX       T_NESTED2            TN2_V1                           1,048,576

A quick check shows that we’ve picked up 15 indexes that can be matched against the segments reported by dba_segments – although my query labelled them all as just “INDEX” while dba_segments labels LOB indexes differently. As you can see I’ve reported the table name in each case and reported the index name under the heading auxiliary”_name. This will be a convenience when I’m summing up the space associated with each table.

On to the LOBs – also easy:

select 
        lob.owner, 
        cast('LOB' as varchar2(11))   object_type,
        lob.table_name,
        lob.column_name               auxiliary_name,
        nvl(seg_lob.bytes,0)          bytes  
from
        dba_lobs    lob
left outer join 
        dba_segments      seg_lob
on
        seg_lob.owner        = lob.owner
and     seg_lob.segment_name = lob.segment_name
where 
        lob.owner            = 'TEST_USER'


OWNER           OBJECT_TYPE TABLE_NAME           AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       LOB         T1                   C1                               2,097,152
TEST_USER       LOB         T1                   C2                               2,097,152
TEST_USER       LOB         IOT1                 CT1                              1,048,576
TEST_USER       LOB         IOT1                 CT2                              1,048,576

And finally the nested tables:


select 
        nst.owner, 
        cast('NESTED' as varchar2(11))   object_type,
        nst.parent_table_name            object_name,
        table_name                       auxiliary_name,
        nvl(seg_nst.bytes,0)             bytes  
from
        dba_nested_tables    nst  
left outer join 
       dba_segments          seg_nst  
on
        seg_nst.owner        = nst.owner
and     seg_nst.segment_name = nst.table_name  
where
        nst.owner            = 'TEST_USER'

OWNER           OBJECT_TYPE OBJECT_NAME          AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       NESTED      DEMO_NEST            T_NESTED1                        1,048,576
TEST_USER       NESTED      DEMO_NEST            T_NESTED2                        1,048,576

A quick check tells us that the four pieces of code produce an exact match for all the segments reported in dba_segments – so all we have to do now is stitch the four queries together with UNION ALL then sum(bytes) by owner and table_name.

There are 3 little problems to deal with though.

  • We have the predicate “where {owner} = ‘TEST_USER'” appearing 4 times in total, which is a bit messy. If we put this outside the UNION ALL will Oracle be smart enough to push it back inside the UNION ALL as part of its query transformation. It seems to, but the plan (which is several hundred lines long – so I’m not going to look too closely) does change a little, but on my little test system it didn’t make much difference to the performance. Initially it looks as if it might be okay to create a view from this query with no restricition on table owner and allow users to query the view with a schema name.
  • The code to report indexes reports the indexes on the nested tables under t_nested1 and t_nested2 – we want the indexes to be associated with table demo_nest, so we need to refine that bit of the code. It needs an outer join to dba_nested_tables to supply the parent_table_name if it exists.
  • On my little system the query takes several seconds to run. Should I worry about that. Not until I’ve tested it on a much bigger system, and not until I know how frequently it needs to run. It may be good enough as it stands, and the ease with which I actually modified the first version of my code to handle the nested tables indexing issue is an indication of the benefits of keeping a complex job as the sum of its parts if it’s reasonable to do so.

Let’s go straight to the final (so far) SQL:


select
        owner, object_name, sum(bytes) 
from    (
        select 
                tab.owner, 
                cast('TABLE' as varchar2(11))   object_type,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.iot_name
                        else tab.table_name
                end                             object_name,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.table_name
                end                             auxiliary_name,
                nvl(seg_tab.bytes,0)            bytes  
        from
                dba_tables        tab  
        left outer join
                dba_segments      seg_tab  
        on
                seg_tab.owner        = tab.owner
        and     seg_tab.segment_name = tab.table_name
        where
                tab.owner            = 'TEST_USER'
        and     nvl(tab.iot_type,'NORMAL') != 'IOT'
        union all
        select 
                ind.table_owner                  owner, 
                cast('INDEX' as varchar2(11))    object_type,
                nvl(
                        nst.parent_table_name,
                        ind.table_name
                )                                table_name,
                index_name                       auxiliary_name,
                nvl(seg_ind.bytes,0)             bytes  
        from
                dba_indexes       ind  
        left outer join
                dba_nested_tables nst
        on
                nst.owner       = ind.table_owner
        and     nst.table_name  = ind.table_name
        left outer join 
                dba_segments      seg_ind  
        on
                seg_ind.owner         = ind.owner
        and     seg_ind.segment_name  = ind.index_name 
        where
                ind.table_owner = 'TEST_USER'
        union all
        select 
                lob.owner, 
                cast('LOB' as varchar2(11))   object_type,
                lob.table_name,
                lob.column_name               auxiliary_name,
                nvl(seg_lob.bytes,0)          bytes  
        from
                dba_lobs    lob
        left outer join 
                dba_segments      seg_lob
        on
                seg_lob.owner        = lob.owner
        and     seg_lob.segment_name = lob.segment_name
        where 
                lob.owner            = 'TEST_USER'
        union all
        select 
                nst.owner, 
                cast('NESTED' as varchar2(11))   object_type,
                nst.parent_table_name            object_name,
                table_name                       auxiliary_name,
                nvl(seg_nst.bytes,0)             bytes  
        from
                dba_nested_tables    nst  
        left outer join 
               dba_segments          seg_nst  
        on
                seg_nst.owner        = nst.owner
        and     seg_nst.segment_name = nst.table_name
        where
                nst.owner         = 'TEST_USER'
        )
where
        owner = 'TEST_USER'
group by
        owner, object_name
order by
        sum(bytes)
/

OWNER           OBJECT_NAME                SUM(BYTES)
--------------- -------------------- ----------------
TEST_USER       IOT1                        6,291,456
TEST_USER       DEMO_NEST                  11,534,336
TEST_USER       T1                         29,360,128
                                     ----------------
sum                                        47,185,920


I wrote this note because my response to the original query was an informal suggestion of taking this approach; then I thought it might be worth writing a general comment about solving big problems by starting with little problems and stitching the pieces together using this case as a concrete example – then I just went ahead to finish the thing off within the boundaries of the original requirements.

There’s a reason, though, why I tend to avoid publishing SQL that might be useful – it’s always possible to overlook bits and end up with people trying to use code that’s not totally appropriate to their system. In this case, of course, there’s a total (but deliberate) failure to handle clusters, partitions and subpartitions, but I don’t know how many other of the “less commonly used” bits of Otacle I haven’t considered. I do know that I haven’t touched on domain indexes (such as text indexes with all their convoluted collections of tables and indexes) but there may be some oddities of (e.g.) advanced queueing, replication, and audit vault that fall outside the range of the possibilities I’ve covered above.

Update 7th Nov 2019

The OP from OTN reported seeing some nested tables in the output – which I thought I’d catered for so I modified the code I have on my sandbox to add a nested table to the IOT, and added a couple of extra indexes on the parent of the nested table, changed a couple of columns and object names to quoted mixed case, and everything still seemed to work okay.

It turned out that the OP had nested tables that were nested 5 deep – which means I needed a hierarchical query to connect the 2nd to 5th (or nth) layer of nesting back to the parent to report the total volume against the top level table. In order to deal with this problem I modified the query in the smallest possible way.

I had two references to the view dba_nested_tables, so I created a hierarchical query based on dba_nested_tables that returned the “oldest ancestor” with each child table rather than the immediate parent. (I’ve written a separate note showing the derivation and testing). Then I turned this query into a “with subquery” (CTE) called top_nested_tables and changed the two calls that the main query made to dba_nested_tables  so that they referenced the CTE instead:


with my_nested_tables as (
select
        /*+ materialize */
        owner, parent_table_name, table_name
from
        dba_nested_tables
where   owner = 'TEST_USER'
),
top_nested_tables as (                                                             -- ***** Replacement CTE defined
select  /*+ materialize */
        owner, parent_table parent_table_name, table_name
from    (
        select
                owner,
                level,
                connect_by_root parent_table_name parent_table,
                table_name 
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
where   (owner, parent_table) not in (
                select owner, table_name
                from   my_nested_tables
        )
)
select                                                                             -- ***** Main select
        owner, object_name, sum(bytes)  total_bytes
from    (
        select 
                tab.owner, 
                cast('TABLE' as varchar2(11))   object_type,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.iot_name
                        else tab.table_name
                end                             object_name,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.table_name
                end                             auxiliary_name,
                nvl(seg_tab.bytes,0)            bytes  
        from
                dba_tables        tab  
        left outer join
                dba_segments      seg_tab  
        on
                seg_tab.owner        = tab.owner
        and     seg_tab.segment_name = tab.table_name
        where
                tab.owner            = 'TEST_USER'
        and     nvl(tab.iot_type,'NORMAL') != 'IOT'
        union all
        select 
                ind.table_owner                  owner, 
                cast('INDEX' as varchar2(11))    object_type,
                nvl(
                        nst.parent_table_name,
                        ind.table_name
                )                                table_name,
                index_name                       auxiliary_name,
                nvl(seg_ind.bytes,0)             bytes  
        from
                dba_indexes       ind  
        left outer join
                top_nested_tables nst                                              -- ***** view replacement 
        on
                nst.owner       = ind.table_owner
        and     nst.table_name  = ind.table_name
        left outer join 
                dba_segments      seg_ind  
        on
                seg_ind.owner         = ind.owner
        and     seg_ind.segment_name  = ind.index_name 
        where
                ind.table_owner = 'TEST_USER'
        union all
        select 
                lob.owner, 
                cast('LOB' as varchar2(11))   object_type,
                lob.table_name,
                lob.column_name               auxiliary_name,
                nvl(seg_lob.bytes,0)          bytes  
        from
                dba_lobs    lob
        left outer join 
                dba_segments      seg_lob
        on
                seg_lob.owner        = lob.owner
        and     seg_lob.segment_name = lob.segment_name
        where 
                lob.owner            = 'TEST_USER'
        union all
        select 
                nst.owner, 
                cast('NESTED' as varchar2(11))   object_type,
                nst.parent_table_name            object_name,
                table_name                       auxiliary_name,
                nvl(seg_nst.bytes,0)             bytes  
        from
                top_nested_tables nst                                              -- ***** view replacement 
        left outer join 
                dba_segments          seg_nst  
        on
                seg_nst.owner        = nst.owner
        and     seg_nst.segment_name = nst.table_name
        where
                nst.owner         = 'TEST_USER'
        )
where
        owner = 'TEST_USER'
group by
        owner, object_name
order by
        sum(bytes)
;

If you followed the link to the original Oracle Developer Forum thread you’ll see that the OP has made much more use of the “with subquery” (CTE) construct, – including a rather nice touch of having a CTE that reads “select cast(‘MY_SCHEMA’ as varchar2(30)) from dual” – this is particularly cute because it means you only have to supply a fixed schema name once (rather than repeating it many times) in the query and then using a join to the CTE everywhere the name is needed.

 

July 13, 2012

Recursive SQL

Filed under: CBO,Infrastructure,Oracle,Partitioning,Problem Solving,Troubleshooting — Jonathan Lewis @ 4:41 pm BST Jul 13,2012

A question came up on the OTN database forum a few days ago asking why a very simple query against a hash partitioned table was doing so much physical I/O to get a very small amount of data from a single partition:

I have table hash-partitioned on an indexed column “id” which is non-unique and part of my primary key. Inside each partition, rows with the same id are located close to each other which is done by dbms_redefinition reorg using orderby_cols. The intention is to reduce the amount of physical reads since there are no queries that don’t filter on the id column.

What I’m seeing is a lot of physical reads though. The first partition has roughly 80K rows, an average row length of 347, block size of 8K and compression … resulting in 821 blocks. And when (after flushing buffered cache and shared pool) submit a query that filters on “id” only and results is 106 selected rows, I see roughly 1400 physical reads.

The poster has clearly applied some careful thought to minimising the work required – so what’s gone wrong ? To help us analyse the problem he posted the results from running a query through autotrace:

(more…)

Website Powered by WordPress.com.