Oracle Scratchpad

August 26, 2019

Troubleshooting

Filed under: CBO,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:19 pm BST Aug 26,2019

A recent thread on the Oracle Developer Community starts with the statement that a query is taking a very long time (with the question “how do I make it go faster?” implied rather than asked). It’s 12.1.0.2 (not that that’s particularly relevant to this blog note), and we have been given a number that quantifies “very long time” (again not particularly relevant to this blog note – but worth mentioning because your “slow” might be my “wow! that was fast” and far too many people use qualitative adjectives when the important detail is quantative). The query had already been running for 15 hours – and here it is:


SELECT 
        OWNER, TABLE_NAME 
FROM
        DBA_LOGSTDBY_NOT_UNIQUE 
WHERE
        (OWNER, TABLE_NAME) NOT IN (
                SELECT 
                        DISTINCT OWNER, TABLE_NAME 
                        FROM     DBA_LOGSTDBY_UNSUPPORTED
        ) 
AND     BAD_COLUMN = 'Y'

There are many obvious suggestions anyone could make for things to do to investigate the problem – start with the execution plan, check whether the object statistics are reasonably representative, run a trace with wait state tracing enabled to see where the time goes; but sometimes that are a couple of very simple observation you can make that point you to simple solutions.

Looking at this query we can recognise that it’s (almost certainly) about a couple of Oracle data dictionary views (which means it’s probably very messy under the covers with a horrendous execution plan) and, as I’ve commented from time to time in the past, Oracle Corp. developers create views for their own purposes so you should take great care when you re-purpose them. This query also has the very convenient feature that it looks like two simpler queries stitched together – so a very simple step in trouble-shooting, before going into any fine detail, is to unstitch the query and run the two parts separately to see how much data they return and how long they take to complete:


SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'

SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED

It’s quite possble that the worst case scenario for the total run time of the original query could be reduced to the sum of the run time of these two queries. One strategy to achieve this would be a rewrite of the form:

select  * 
from    (
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'
        minus
        SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED
)

Unfortunately the immediately obvious alternative may be illegal thanks to things like duplicates (which disappear in MINUS operations) or NULLs (which can make ALL the data “disappear” in some cases). In this case the original query might be capable of returning duplicates of (owner, table_name) from dba_lgstdby_not_unique which would collapse to a single ocurrence each in my rewrite – so my version of the query is not logically equivalent (unless the definition of the view enforces uniqueness); on the other hand tracking, back through the original thread to the MoS article where this query comes from, we can see that even if the query could return duplicates we don’t actually need to see them.

And this is the point of the blog note – it’s a general principle (that happens to be a very obvious strategy in this case): if a query takes too long, how does it compare with a simplified version of the query that might be a couple of steps short of the final target. If it’s easy to spot the options for simplification, and if the simplified version operates efficiently, them isolate it (using a no_merge hint if necessary), and work forwards from there. Just be careful that your rewrite remains logically equivalent to the original (if it really needs to).

In the case of this query, the two parts took 5 seconds and 9 seconds to complete, returning 209 rows and 815 rows respectively. Combining the two queries with a minus really should get the required result in no more than 14 seconds.

Footnote

The “distinct” in the second query is technically redundant as the minus operation applies a sort unique operation to both the two intermediate result sets before comparing them.  Similarly the  “distinct” was also redundant when the second query was used for the “in subquery” construction – again there would be an implied uniqueness operation if the optimizer decided to do a simple unnest of the subquery.

 

 

 

 

August 23, 2019

Optimizer Tricks 1

Filed under: CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 12:39 pm BST Aug 23,2019

I’ve got a number of examples of clever little tricks the optimizer can do to transform your SQL before starting in on the arithmetic of optimisation. I was prompted to publish this one by a recent thread on ODC. It’s worth taking note of these tricks when you spot one as a background knowledge of what’s possible makes it much easier to interpret and trouble-shoot from execution plans. I’ve labelled this one “#1” since I may publish a few more examples in the future, and then I’ll have to catalogue them – but I’m not making any promises about that.

Here’s a table definition, and a query that’s hinted to use an index on that table.


rem
rem     Script:         optimizer_tricks_01.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem

create table t1 (
        v1      varchar2(10),
        v2      varchar2(10),
        v3      varchar2(10),
        padding varchar2(100)
);

create index t1_i1 on t1(v1, v2, v3);


explain plan for
select
        /*+ index(t1 (v1, v2, v3)) */
        padding 
from 
        t1
where
        v1 = 'ABC'
and     nvl(v3,'ORA$BASE') = 'SET2'
;

select * from table(dbms_xplan.display);

The query uses the first and third columns of the index, but wraps the 3rd column in an nvl() function. Because of the hint the optimizer will generate a plan with an index range scan, but the question is – what will the Predicate Information tell us about Oracle’s use of my two predicates:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC')
       filter(NVL("V3",'ORA$BASE')='SET2')

The nvl() test is used during the index range scan (from memory I think much older versions of Oracle would have postponed the predicate test until they had accessed the table itself). This means Oracle will do a range scan over the whole section of the index where v1 = ‘ABC’, testing every index entry it finds against the nvl() predicate.

But what happens if we modify column v3 to be NOT NULL? (“alter table t1 modify v3 not null;”) Here’s the new plan:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC' AND "V3"='SET2')
       filter("V3"='SET2')


The optimizer will decide that with the NOT NULL status of the column the nvl() function can be eliminated and the predicate can be replaced with a simple column comparison. At this point the v3 predicate can now be used to reduce the number of index entries that need to be examined by using a type of skip-scan/iterator approach, but Oracle still has to test the predciate against the index entries it walks through – so the predicate still appears as a filter predicate as well.

You might notice, by the way, that the Plan hash value does not change as the predicate use changes – even though the change in use of predicates could make a huge difference to the performance. (As indicated in the comments at the top of the script, I’ve run this model against 11.2.0.4 – which is the version used in the ODC thread – and 19.3.0.0: the behaviour is the same in both versions, and the Plan hash value doesn’t change from version to version.)

Footnote

The reason why I decided to publish this note is that the original thread on the ODC forums reported the Following contradictory details – an index definition and the optimizer’s use of that index as shown in the predicate section of the plan:


Index column name      Column position
---------------------- ----------------
FLEX_VALUE_SET_ID      1
PARENT_FLEX_VALUE      2
RANGE_ATTRIBUTE        3
CHILD_FLEX_VALUE_LOW   4
CHILD_FLEX_VALUE_HIGH  5
ZD_EDITION_NAME        6

---------------------------------------------------------------------------
|* 17 |      INDEX RANGE SCAN             | FND_FLEX_VALUE_NORM_HIER_U1   |
---------------------------------------------------------------------------
  17 - access("FLEX_VALUE_SET_ID"=:B1 AND NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2')  
       filter((NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2'  ..... lots more bits of filter predicate.

Since the expression nvl(zd_edition_name, ‘ORA$BASE’) = ‘SET2’ appears as an access predicate and a filter predicate it must surely be a column in the index. So either this isn’t the definition of the index being used or, somehow, there’s a trick that allows zd_edition_name to appear as a column name in the index when it really means nvl(zd_edition_name,’ORA$BASE’) at run-time. (And if there is I want to know what it is – edition-based redefinition and tricks with virtual columns spring to mind, but I avoid thinking about complicated explanations when a simpler one might be available.)

 

August 21, 2019

sql_patch

Filed under: Infrastructure,Oracle,Tuning — Jonathan Lewis @ 4:49 pm BST Aug 21,2019

This note is a short follow-up to a note I wrote some time ago about validating foreign key constraints where I examined the type of SQL Oracle generates internally to do the validation between parent and child tables.  In that article I suggested (before testing) that you could create an SQL patch for the generated SQL to over-ride the plan taken by Oracle – a plan dictated to some extent by hints (including a “deprecated” ordered hint) embedded in the code. I did say that the strategy might not work for SQL optimised by SYS, but it turned out that it did.

Here’s a little script I ran to test a few variations on the theme:


declare
        v1      varchar2(128);
begin
        v1 :=   dbms_sqldiag.create_sql_patch(
                        sql_id  => 'g2z10tbxyz6b0',
                        name    => 'validate_fk',
                        hint_text => 'ignore_optim_embedded_hints'
--                      hint_text => 'parallel(a@sel$1 8)'      -- worked
--                      hint_text => 'parallel(8)'              -- worked
--                      hint_text => q'{opt_param('_fast_full_scan_enabled' 'false')}'  -- worked
                );
        dbms_output.put_line(v1);
end;
/

I’ve tested this on 12.2.0.1 and 19.3.0.0, but for earlier versions of Oracle, and depending what patches you’ve applied, you will need to modify the code.

The SQL_ID represents the query for my specific tables, of course, so you will have to do a test run to find the query and SQL_ID for the validation you want to do. This is what the statement for my parent/child pair looked like (cosmetically adjusted):

select /*+ all_rows ordered dynamic_sampling(2) */ 
        A.rowid, :1, :2, :3
from
        "TEST_USER"."CHILD" A , 
        "TEST_USER"."PARENT" B 
where
        ("A"."OBJECT_ID" is not null) 
and     ("B"."OBJECT_ID"(+) = "A"."OBJECT_ID")
and     ("B"."OBJECT_ID" is null)

The patch that the script creates simply tells Oracle to ignore the embedded hints (in particular I don’t want that ordered hint), but I’ve left a few other options in the text, commenting them out.

Without the patch I got the following plan:.


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  NESTED LOOPS ANTI (cr=399 pr=279 pw=0 time=47801 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=250 pr=247 pw=0 time=19943 us starts=1 cost=32 size=1700000 card=100000)(object id 73191)
     10000      10000      10000   INDEX UNIQUE SCAN PAR_PK (cr=149 pr=32 pw=0 time=3968 us starts=10000 cost=0 size=49995 card=9999)(object id 73189)

Rerunning the validation test with the patch in place I got the following plan – clearly the patch had had an effect.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  HASH JOIN RIGHT ANTI (cr=246 pr=242 pw=0 time=96212 us starts=1 cost=39 size=22000 card=1000)
     10000      10000      10000   INDEX FAST FULL SCAN PAR_PK (cr=24 pr=23 pw=0 time=1599 us starts=1 cost=4 size=50000 card=10000)(object id 73235)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=222 pr=219 pw=0 time=27553 us starts=1 cost=32 size=1700000 card=100000)(object id 73237)
(object id 73229)

Don’t worry too much about the fact that in my tiny example, and with a very new, nicely structured, data set the original plan was a little faster. In a production environment creating a hash table from the parent keys and probing it with the child keys may reduce the CPU usage and random I/O quite dramatically.

Bear in mind that the best possible plan may depend on many factors, such as the number of child rows per parent, the degree to which the parent and child keys arrive in sorted (or random) order, and then you have to remember that Oracle gets a little clever with the original anti-join (note that there are only 10,000 probes for 100,000 child rows – there’s an effect similar to the scalar subquery caching going on there), so trying to patch the plan the same way for every parent/child pair may not be the best strategy.

If you want to drop the patch after playing around with this example a call to execute dbms_sqldiag.drop_sql_patch(name=>’validate_fk’) will suffice.

 

August 20, 2019

Join View

Filed under: constraints,Infrastructure,Joins,Oracle — Jonathan Lewis @ 12:39 pm BST Aug 20,2019

It’s strange how one thing leads to another when you’re trying to check some silly little detail. This morning I wanted to find a note I’d written about the merge command and “stable sets”, and got to a draft about updatable join views that I’d started in 2016 in response to a question on OTN (as it was at the time) and finally led to a model that I’d written in 2008 showing that the manuals were wrong.

Since the manual – even the 19c manual – is still wrong regarding the “Delete Rule” for updatable (modifiable) join views I thought I’d quickly finish off the draft and post the 2008 script. Here’s what the manual says about deleting from join views (my emphasis on “exactly”):

Rows from a join view can be deleted as long as there is exactly one key-preserved table in the join. The key preserved table can be repeated in the FROM clause. If the view is defined with the WITH CHECK OPTION clause and the key preserved table is repeated, then the rows cannot be deleted from the view.

But here’s a simple piece of code to model a delete from a join view that breaks the rule:


rem
rem     Script:         delete_join.sql 
rem     Dated:          Dec 2008
rem     Author:         J P Lewis
rem

create table source
as
select level n1
from dual
connect by level <= 10
/ 
 
create table search
as
select level n1
from dual
connect by level <= 10
/ 

alter table source modify n1 not null;
alter table search modify n1 not null;

create unique index search_idx on search(n1);
-- create unique index source_idx on source(n1)

I’ve set up a “source” and a “search” table with 10 rows each and the option for creating unique indexes on each table for a column that’s declared non-null. Initially, though, I’ve only created the index on search to see what happens when I run a couple of “join view” deletes using “ANSI” syntax.

prompt  ===============================
prompt  Source referenced first in ANSI
prompt  ===============================

delete from (select * from source s join search s1 on s.n1 = s1.n1);
select count(1) source_count from source;
select count(1) search_count from search;
rollback;
 
prompt  ===============================
prompt  Search referenced first in ANSI
prompt  ===============================

delete from (select * from search s join source s1 on s.n1 = s1.n1);
select count(1) source_count from source;
select count(1) search_count from search;
rollback;

With just one of the two unique indexes in place the order of the tables in the inline view makes no difference to the outcome. Thanks to the unique index on search any row in the inline view corresponds to exactly one row in the source table, while a single row in the search table could end up appearing in many rows in the view – so the delete implictly has to operate as “delete from source”. So both deletes will result in the source_count being zero, and the search_count remaining at 10.

If we now repeat the experiment but create BOTH unique indexes, both source and search will be key-preserved in the join. According to the manual the delete should produce some sort of error. In fact the delete works in both cases – but the order that the tables appear makes a difference. When source is the first table in the in-line view the source_count drops to zero and the search_count stays at 10; when search is the first table in the in-line view the search_count drops to zero and the source_count stays at 10.

I wouldn’t call this totally unreasonable – but it’s something you need to know if you’re going to use the method, and something you need to document very carefully in case someone editing your code at a later date (or deciding that they could add a unique index) doesn’t realise the significance of the table order.

This does lead on to another important test – is it the order that the tables appear in the from clause that matters, or the order they appear in the join order that Oracle uses to optimise the query. (We hope – and expect – that it’s the join order as written, not the join order as optimised, otherwise the effect of the delete could change from day to day as the optimizer chose different execution plans!). To confirm my expectation I switched to traditional Oracle syntax with hints (still with unique indexes on both tables), writing a query with search as the first table in the from clause, but hinting the inline view to vary the optimised join order.


prompt  ============================================
prompt  Source hinted as leading table in join order 
prompt  ============================================

delete from (
        select 
                /*+ leading(s1, s) */
                * 
        from 
                search s,
                source s1 
        where
                s.n1 = s1.n1
        )
;

select count(1) source_count from source; 
select count(1) search_count from search;
rollback;

prompt  ============================================
prompt  Search hinted as leading table in join order 
prompt  ============================================

delete from (
        select 
                /*+ leading(s, s1) */
                * 
        from 
                search s,
                source s1 
        where
                s.n1 = s1.n1
        )
;

select count(1) source_count from source; 
select count(1) search_count from search;
rollback;

In both cases the rows were deleted from search (the first table in from clause). And, to answer the question you should be asking, I did check the execution plans to make sure that the hints had been effective:


============================================
Source hinted as leading table in join order
============================================

------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | DELETE STATEMENT    |            |    10 |    60 |     1 |
|   1 |  DELETE             | SEARCH     |       |       |       |
|   2 |   NESTED LOOPS      |            |    10 |    60 |     1 |
|   3 |    INDEX FULL SCAN  | SOURCE_IDX |    10 |    30 |     1 |
|*  4 |    INDEX UNIQUE SCAN| SEARCH_IDX |     1 |     3 |       |
------------------------------------------------------------------

============================================
Search hinted as leading table in join order
============================================

------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | DELETE STATEMENT    |            |    10 |    60 |     1 |
|   1 |  DELETE             | SEARCH     |       |       |       |
|   2 |   NESTED LOOPS      |            |    10 |    60 |     1 |
|   3 |    INDEX FULL SCAN  | SEARCH_IDX |    10 |    30 |     1 |
|*  4 |    INDEX UNIQUE SCAN| SOURCE_IDX |     1 |     3 |       |
------------------------------------------------------------------

Summary

Using updatable join views to handle deletes can be very efficient but the manual’s statement of the “Delete Rule” is incorrect. It is possible to have several key-preserved tables in the view that you’re using, and if that’s the case you need to play safe and ensure that the table you want to delete from is the first table in the from clause. This means taking steps to eliminate the risk of someone editing some code at a later date without realising the importance of the table order.

Update (very shortly after publication)

Iduith Mentzel has pointed out in comment #1 below that the SQL Language Reference Guide and the DBA Administration Guide are not consistent in their descriptions of deleting from a join view, and that the SQL Language Reference Guide correctly states that the delete will be applied to the first mentioned key-preserved table.

 

 

August 14, 2019

gather_system_stats

Filed under: CBO,Exadata,Oracle,Statistics,System Stats — Jonathan Lewis @ 2:20 pm BST Aug 14,2019

What happens when you execute dbms_stats.gather_system_stats() with the ‘Exadata’ option ?

Here’s what my system stats look like (12.2.0.1 test results) after doing so. (The code to generate the two different versions is at the end of the note).


System Stats
============
Status: COMPLETED
Timed: 13-Aug-2019 15:00:00 - 13-Aug-2019 15:00:00
--------------------------------------------------
CPUSPEED        :
CPUSPEEDNW      :          918
IOSEEKTIM       :           10
IOTFRSPEED      :      204,800
MAXTHR          :
MBRC            :          128
MREADTIM        :
SLAVETHR        :
SREADTIM        :

PL/SQL procedure successfully completed.

MBRC       :          128
MREADTIM   :
SREADTIM   :
CPUSPEED   :
CPUSPEEDNW :          918
IOSEEKTIM  :           10
IOTFRSPEED :      204,800
MAXTHR     :
SLAVETHR   :

PL/SQL procedure successfully completed.

All the code does is set the MBRC, IOSEEKTIM, and IOTFRSPEED to fixed values and the only real gather is the CPUSPEEDNW. The parameters showing blanks are deliberately set null by the procedure – before I called the gather_system_stats() every parameter had a value. You could also check the SQL trace file (with bind captured enabled) to see the statements that deliberately set those parameters to null if you want more proof.

What are the consequences of this call (assuming you haven’t also done something with the calibrate_io() procedure? Essentially Oracle now has information that says a single (8KB) block read request will take marginally over 10 milliseconds, and a multiblock read request of 1MB will take just over 15 milliseconds: in other words “tablescans are great, don’t use indexes unless they’re really precisely targetted”. To give you a quantitative feel for the numbers: given the choice between doing a tablescan of 1GB to pick 1,500 randomly scattered rows and using a perfect index the optimizer would choose the index.

To explain the time calculations: Oracle has set an I/O seek time of 10 ms, and a transfer rate of 204,800 bytes per ms (200 MB/s), with the guideline that a “typical” multiblock read is going to achieve 128 blocks. So the optimizer believes a single block read will take 10 + 8192/204800 ms = 10.04ms, while a multiblock read request for 1MB will take 10 + 1048576/204800 ms = 15.12 ms.

It’s also important to note that Oracle will use the 128 MBRC value in its calculation of the cost of the tablescan – even if you’ve set the db_file_mulitblock_read_count parameter for the session or system to something smaller; and if you have set the db_file_multiblock_read_count that’s the maximum size of multiblock read that the run-time engine will use.

Code

Here are the two procedures I used to report the values above. You will only need the privilege to execute the dbms_stats package for the second one, but you’ll need the privilege to access the SYS table aux_stats$ to use the first. The benefit of the first one is that it can’t go out of date as versions change.


rem
rem     Script:         get_system_stats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

set linesize 180
set trimspool on
set pagesize 60
set serveroutput on

spool get_system_stats

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        for r1 in (
                select  rownum rn, pname
                from    sys.aux_stats$
                where   sname = 'SYSSTATS_MAIN'
        ) loop
                dbms_stats.get_system_stats(m_status, m_start, m_stop, r1.pname, m_value);
                if r1.rn = 1 then
                        dbms_output.put_line('System Stats');
                        dbms_output.put_line('============');
                        dbms_output.put_line('Status: ' || m_status);
                        dbms_output.put_line(
                                'Timed: ' ||
                                to_char(m_start,'dd-Mon-yyyy hh24:mi:ss') ||
                                ' - ' ||
                                to_char(m_stop ,'dd-Mon-yyyy hh24:mi:ss')
                        );
                        dbms_output.put_line('--------------------------------------------------');
                end if;
                dbms_output.put_line(rpad(r1.pname,15) ||  ' : ' || to_char(m_value,'999,999,999'));
        end loop;
end;
/

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MBRC', m_value);
        dbms_output.put_line('MBRC       : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MREADTIM', m_value);
        dbms_output.put_line('MREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SREADTIM', m_value);
        dbms_output.put_line('SREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEED', m_value);
        dbms_output.put_line('CPUSPEED   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEEDNW', m_value);
        dbms_output.put_line('CPUSPEEDNW : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOSEEKTIM', m_value);
        dbms_output.put_line('IOSEEKTIM  : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOTFRSPEED', m_value);
        dbms_output.put_line('IOTFRSPEED : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MAXTHR', m_value);
        dbms_output.put_line('MAXTHR     : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SLAVETHR', m_value);
        dbms_output.put_line('SLAVETHR   : ' || to_char(m_value,'999,999,999'));
end;
/

spool off

August 13, 2019

Multi-table

Filed under: Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 1:34 pm BST Aug 13,2019

Here’s a problem (and I think it should be called a bug) that I first came across about 6 years ago, then forgot for a few years until it reappeared some time last year and then again a few days ago. The problem has been around for years (getting on for decades), and the first mention of it that I’ve found is MoS Bug 2891576, created in 2003, referring back to Oracle 9.2.0.1, The problem still exists in Oracle 19.2 (tested on LiveSQL).

Here’s the problem: assume you have a pair of tables (call them parent and child) with a referential integrity constraint connecting them. If the constraint is enabled and not deferred then the following code may fail, and if you’re really unlucky it may only fail on rare random occasions:


insert all
        into parent({list of parent columns}) values({list of source columns})
        into child ({list of child columns})  values({list of source columns})
select
        {list of columns}
from    {source}
;

The surprising Oracle error is “ORA-02291: integrity constraint ({owner.constraint_name}) violated – parent key not found”, and the reason is simple (and documented in MoS note 265826.1 Multi-table Insert Can Cause ORA-02291: Integrity Constraint Violated for Master-Detail tables: the order in which the insert operations take place is “indeterminate” so that child rows may be inserted before their parent rows (and for the multi-table insert the constraint checks are not postponed until the statement completes as they are, for instance, for updates to a table with a self-referencing RI constraint).

Two possible workarounds are suggested in Doc ID 265826.1

  • drop the foreign key constraint and recreate it after the load,
  • make the foreign key constraint deferrable and defer it before the insert so that it is checked only on commit (or following an explicit call to make it immediate)

The second option would probably be preferable to the first but it’s still not a very nice thing to do and could leave your database temporarily exposed to errors that are hard to clean up. There are some details of the implementation of deferrable constraints in the comments of this note on index rebuilds if you’re interested in the technicalities.

A further option which seems to work is to create a (null) “before row insert” trigger on the parent table – this appears to force the parent into a pattern of single row inserts and the table order of insertion then seems to behave. Of course you do pay the price of an increase in the volume of undo and redo. On the down-side Bug 2891576 MULTITABLE INSERT FAILS WITH ORA-02291 WHEN FK & TRIGGER ARE PRESENT can also be fouind on MoS, leading 265826.1 to suggests disabling triggers if their correctness is in some way dependent on the order in which your tables are populated. That dependency threat should be irrelevant if the trigger is a “do nothing” trigger. Sadly there’s a final note that I should mention: Bug 16133798 : INSERT ALL FAILS WITH ORA-2291 reports the issue as “Closed: not a bug”

There is a very simple example in the original bug note demonstrating the problem, but it didn’t work on the version of Oracle where I first tested it, so I’ve modified it slightly to get it working on a fairly standard install. (I suspect the original was executed on a database with a 4KB block size.)


drop table child purge;
drop table parent purge;

create table parent (id number primary key);

create table child  (id number, v1 varchar2(4000),v2 varchar2(3920));
alter table child add constraint fk1 foreign key (id) references parent (id);
 
create or replace trigger par_bri
before insert on parent
for each row
begin
        null;
end;
.

insert all
        into parent ( id ) values ( id )
        into child  ( id ) values ( id )
select  100 id from dual
;

In the model above, and using an 8KB block in ASSM, the code as is resulted in an ORA-02991 error. Changing the varchar2(3920) to varchar2(3919) the insert succeeded, and when I kept the varchar2(3920) but created the trigger the insert succeeded.

Fiddling around in various ways and taking some slightly more realistic table definitions here’s an initial setup to demonstrate the “randomness” of the failure (tested on various versions up to 18.3.0.0):


rem
rem     Script:         insert_all_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             10.2.0.5
rem              9.2.0.8
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             small_vc,
        lpad(rownum,100,'0')            medium_vc,
        lpad(rownum,200,'0')            big_vc
from
        generator       v1
;

create table parent(
        id              number,
        small_vc        varchar2(10),
        medium_vc       varchar2(100),
        big_vc          varchar2(200),
        constraint par_pk primary key(id)
)
segment creation immediate
;

create table child(
        id              number,
        small_vc        varchar2(10),
        medium_vc       varchar2(100),
        big_vc          varchar2(200),
        constraint chi_pk primary key(id),
        constraint chi_fk_par foreign key (id) references parent(id)
)
segment creation immediate
;

create table child2(
        id              number,
        small_vc        varchar2(10),
        medium_vc       varchar2(100),
        big_vc          varchar2(200),
        constraint ch2_pk primary key(id),
        constraint ch2_fk_par foreign key (id) references parent(id)
)
segment creation immediate
;

I’ve created a “source” table t1, and three “target” tables – parent, child and child2. Table parent has a declared primary key and both child and child2 have a referential integrity constraint to parent. I’m going to do a multi-table insert selecting from t1 and spreading different columns across the three tables.

Historical note: When I first saw the “insert all” option of multi-table inserts I was delighted with the idea that it would let me query a de-normalised source data set just once and insert the data into a normalised set of tables in a single statement – so (a) this is a realistic test from my perspective and (b) it has come as a terrible disappointment to discover that I should have been concerned about referential integrity constraints (luckily very few systems had them at the time I last used this feature in this way).

The multi-table insert I’ve done is as follows:


insert all
        into parent(id, small_vc)  values(id, small_vc)
        into child (id, medium_vc) values(id, medium_vc)
        into child2(id, medium_vc) values(id, medium_vc)
--      into child2(id, big_vc)    values(id, big_vc)
select
        id, small_vc, medium_vc, big_vc
from
        t1
where
        rownum <= &m_rows_to_insert
;

You’ll notice that I’ve allowed user input to dictate the number of rows selected for insertion and I’ve also allowed for an edit to change the column that gets copied from t1 to child2. Althought it’s not visible in the create table statements I’ve also tested the effect of varying the size of the big_vc column in t1.

Starting with the CTAS and multi-table insert as shown the insert runs to completion if I select 75 rows from t1, but if I select 76 rows the insert fails with “ORA-02991: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”. If I change the order of the inserts into child1 and child2 the violated constraint is TEST_USER.CH2_FK_PAR – so Oracle appears to be applying the inserts in the order they appear in the statement in some circumstances.

Go back to the original order of inserts for child1 and child2, but use the big_vc option for child2 instead of the medium_vc. In this case the insert succeeds for 39 rows selected from t1, but fails reporting constraint TEST_USER.CH2_FK_PAR when selecting 40 rows. Change the CTAS and define big_vc with as lpad(rownum,195) and the insert succeeds with 40 rows selected and fails on 41 (still on the CH2_FK_PAR constraint); change big_vc to lpad(rownum,190) and the insert succeeds on 41 rows selected, fails on 42.

My hypothesis on what’s happening is this: each table in the multitable insert list gets a buffer of 8KB (maybe matching one Oracle block if we were to try different block sizes). As the statement executes the buffers will fill and, critically, when the buffer is deemed to be full (or full enough) it is applied to the table – so if a child buffer fills before the parent buffer is full you can get child rows inserted before their parent, and it looks like Oracle isn’t postponing foreign key checking to the end of statement execution as it does with other DML – it’s checking as each array is inserted.

Of course there’s a special boundary condition, and that’s why the very first test with 75 rows succeeds – neither of the child arrays gets filled before we reach the end of the t1 selection, so Oracle safely inserts the arrays for parent, child and child2 in that order. The same boundary applies occurs in the first of every other pair of tests that I’ve commented on.

When we select 76 rows from t1 in the first test the child and child2 arrays hit their limit and Oracle attempts to insert the child1 rows first – but the parent buffer is far from full so its rows are not inserted and the attempted insert results in the ORA-02991 error. Doing a bit of rough arithmetic the insert was for 76 rows totalling something like: 2 bytes for the id, plus a length byte, plus 100 bytes for the medium_vc plus a length byte, totalling 76 * 104 =7,904 bytes.

When we switch to using the big_vc for child2 the first array to fill is the child2 array, and we have 3 sets of results as we shorten big_vc:

  • 40 * ((1 + 2) + (1 + 200)) = 8160
  • 41 * ((1 + 2) + (1 + 195)) = 8159
  • 42 * ((1 + 2) + (1 + 190)) = 8148

While I’m fairly confident that my “8KB array” hypothesis is in the right ballpark I know I’ve still got some gaps to explain – I don’t like the fact that I’ve got a break point around 7,900 in the first example and something much closer to 8,192 in the other three examples.  I could try to get extra precision by running up a lot more examples with different numbers and lengths of columns to get a better idea of where the error is appearing – but I’m sufficiently confident that the idea is about right so I can’t persuade myself to make the effort to refine it. An example of an alternative algorithm (which is actually a better fit though a little unexpected) is to assume that the normal 5 byte row overhead (column count, lock byte, flags and 2-byte row directory entry) has been included in the array sizing code, and the insert takes place at the point incoming row breaks, or just touches, the limit. In this case our 4 results would suggest the following figures:

  • 75 * 109 = 8175
  • 39 * 209 = 8151
  • 40 * 204 = 8160
  • 41 * 199 = 8159

With these numbers we can see 8KB (8,192 bytes) very clearly, and appreciate that the one extra row would take us over the critical limit, hence triggering the insert and making the array space free to hold the row.

Bottom Line

If you’re using the multi-table “insert all” syntax and have referential integrity declared between the various target tables then you almost certainly need to ensure that the foreign key constraints are declared as deferrable and then deferred as the insert takes place otherwise you may get random (and, until now, surprisingly inexplicable) ORA-02991 foreign key errors.

A possible alternative workaround is to declare a “do nothing” before row insert trigger on the top-level as this seems to switch the process into single row inserts on the top-most parent that force the other array inserts to take place with their parent row using small array sizes and protecting against the foreign key error. This is not an officially sanctioned workaround, though, and may only have worked by accident in the examples I tried.

It is possible, if the 8KB working array hypothesis is correct, that you will never see the ORA-02991 if the volume of data (number of rows * row length) for the child rows of any given parent row is always less than the size of the parent row – but that might be a fairly risky thing to hope for in a production system. It might be much better to pay the overhead of deferred foreign key checking than having a rare, unpredictable error appearing.

 

August 11, 2019

Troubleshooting

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 9:28 pm BST Aug 11,2019

Here’s a question to provoke a little thought if you’ve got nothing more entertaining to do on a Sunday evening.  What threats do you think of when you see a statement like the following in (say) an AWR report, or in a query against v$sql ?

update tableX set
        col001 = :1, col002 = :2, col003 = :3, ...
        -- etc. 
        -- the names are supposed to indicate that the statement updates 302 columns
        -- etc.
        col301 = :301, col302 = :302
where
        pk_col = :303
;

I’ll be writing up some suggestions tomorrow (Monday, UK BST), possible linking to a few other articles for background reading.

Update

The first three comments have already hit the high points, but I’m going to jot down a few notes anyway.

The first two things that really (should) make an impact are:

  • There’s a table in the application with (at least) 303 columns – anything over 255 is a potential disaster area
  • An update statement that updates 302 columns is probably machine generated by a non-scalable application

A secondary detail that might be useful is recognising the pattern of the text – lower case for column names, simple “:nnn” for bind variables.  As it stands I don’t recognise the possible source for this style, but I know it’s not SQL embedded in PL/SQL (which would be all capitals with “:Bnnn” as bind variable names) and it’s not part of a distributed query from a remote database (which would be in capitals with quoted names, and table aliases like “A1, A2, …”), and it’s not “raw” Hiberbate code which produces meaningless table and column aliases based on numbers with a “t” for table and “c” for column.

So let’s think about possible problems and symptoms relating to the two obvious issues:

Wide tables

Once you have more than 255 (real) columns in a table – even if that count includes columns that have been marked unused – Oracle will have to split rows into “rowpieces” that do not exceed 255 columns and chain those pieces together. Oracle will try to be as efficient as possible – with various damage-limiting code changes appearing across versions – attempting store these row pieces together and keeping the number to a minimum, but there are a number of anomalies that can appear that have a significant impact on performance.

Simply having to visit two row pieces to pick up a column in the 2nd row piece (even if it is in the same block) adds to the cost of processing; but when you have to visit a second block to acquire a 2nd (or 3rd, or 4th) rowpiece the costs can be significant. As a quirky little extra, Oracle’s initial generation of row-pieces creates them from the end backwards – so a row with 256 columns starts with a row-piece of one column following by a rowpiece of 255 columns: so you may find that you have to fetch multiple row pieces for virtually every row you access.

It’s worth noting that a row splitting is based only on columns that have been used in the row. If your data is restricted to the first 255 column of a row then the entire row can be stored as a single row piece (following the basic rule that “trailing nulls take no space”); but as soon as you start to update such a row by populating columns past the 255 boundary Oracle will start splitting from the end – and it may create a new trailing row-piece each time you populate a column past the current “highest” column.  In an extreme case I’ve managed to show an example of a single row consisting of 746 row pieces, each in a different block (though that’s a bug/feature that’s been addressed in very recent versions of Oracle).

If rows have been split across multiple blocks then one of the nastier performance problems appears with direct path read tablescans. As Oracle follows the pointer to a secondary row piece it will do a physical read of the target block then immediately forget the target block so, for example, if you have inserted 20 (short) rows into a block then updated all of them in a way that makes them split and all their 2nd row pieces go to the same block further down the table you can find that Oracle does a single direct path read that picks up the head pieces, then 20 “db file sequential read” calls to the same block to pick up the follow-on pieces. (The same effect appears with simple migrated rows.) Contrarily, if you did the same tablescan using “db file scattered read” requests then Oracle might record a single, highly deceptive “table fetch continued row” because it managed to pin the target block and revisit it 20 times.

Often a very wide row (large number of columns) means the table is trying to hold data for multiple types of object. So a table of 750 columns may use the first 20 columns for generic data, columns 21 to 180 for data for object type A, 181 to 395 for data for object type B, and so on.  This can lead to rows with a couple of hundred used columns and several hundred null columns in the middle of each row – taking one byte per null column and burning up lots of CPU as Oracle walks a row to find a relevant column. A particularly nasty impact can appear from this type of arrangement when you upgrade an applications:  imagine you have millions of rows of the “type A” above which use only the first 180 columns.  For some reason the application adds one new “generic” column that (eventually) has to be populated for each row – as the column is populated for a type A row the row grows by 520 (null counting) bytes and splits into at least 3 pieces. The effect could be catastrophic for anyone who had been happy with their queries reporting type A data.

One of the difficulties of handling rows that are chained due to very high column counts is that the statistics can be quite confusing (and subject to change across versions). The most important clue comes from “table fetch continued row”; but this can’t tell you whether your “continued” rows are migrated or chained (or both), which table they come from, and whether you’ve been fetching the same small number multiple times or many different rows. Unfortunately the segment statistics (v$segstat / v$segment_statistics) don’t capture the number of continued fetches by segment – it would be nice if they did since it ought to be a rare (and therefore low-cost) event. The best you can do, probably, is to look at the v$sql_monitor report for queries that report tablescans against large tables but report large numbers of single block reads in the tablescan – and for repeatable cases enable SQL trace with wait tracing enabled against suspect queries to see if they show the characteristic mix of direct path reads and repeated db file sequential reads.

Update every column

The key performance threat in statements that update every column – including the ones that didn’t change – is that Oracle doesn’t compare before and after values when doing the update. Oracle’s code path assumes you know what you’re doing so it saves every “old” value to an undo record (which gets copied to the redo) and writes every “new” value to a redo change vector.  (Fortunately Oracle does check index definitions to see which index entries really have suffered changes, so it doesn’t visit index leaf blocks unnecessarily). It’s possible that some front-end tool that adopts this approach has a configuration option that switches from “one SQL statement for all update” to “construct minimal statement based on screen changes”.

The simple trade-off between these two options is the undo/redo overhead vs. parsing and optimisation overheads as the tool creates custom statements on demand. In the case of the table with more than 255 columns, of course, there’s the added benefit that an update of only the changed columns might limit the update to columns that are in the first rowpiece, eliminating the need (some of the time) to chase pointers to follow-up pieces.

Limiting the update can help with undo and redo, of course, but if the tool always pulls the entire row to the client anyway you still have network costs to consider. With the full row pulled and then updated you may find it takes several SQL*Net roundtrips to transfer the whole row between client and server.  In a quick test on a simple 1,000 column table with an update that set every column in a single row to null (using a bind variables) I found that the a default setup couldn’t even get 1,000 NULLs (let alone “real values”) across the network without resorting to one wait on “SQL*Net more data from client”


variable b1 number
exec :b1 := null;

update t1 set
        col0001 = :b1,
        col0002 = :b1,
        ...
        col1000 = :b1
;

Although “SQL*Net message to/from client” is typically (though not always correctly) seen as an ignorable wait, you need to be a little more suspicious of losing time to “SQL*Net more data to/from client”. The latter two waits mean you’re sending larger volumes of information across the network and maybe you can eliminate some of the data or make the transfer more efficient – perhaps a simple increase in the SDU (session data unit) in the tnsnames.ora, listener.ora, or sqlnet.ora (for default_sdu_size) might be helpful.

Warning

One of the features of trouble-shooting from cold is that you don’t know very much about the system you’re looking at – so it’s nice to be able to take a quick overview of a system looking for any potentially interesting anomalies and take a quick note of what they are and what thoughts they suggest before you start asking questions and digging into a system. This article is just a couple of brief notes along the lines of: “that doesn’t look nice- what questions does it prompt”.

August 9, 2019

Troubleshooting

Filed under: Philosophy — Jonathan Lewis @ 1:33 pm BST Aug 9,2019

An anecdote with a moral.

Many years ago – in the days of Oracle 7.2.3, when parallel query and partition views were in their infancy and when RAC was still OPS (Oracle Parallel Server), I discovered a bug that caused parallel queries against partition views to crash (Maxim – mixing two new features is a good way to find bugs). I no longer recall the details of exact syntax but the problem revolved around the way Oracle used to rewrite queries for parallel execution. In outline it was something like the following:


create or replace view v1
as
select * from t1
union all
select * from t2
;

select  /*+ parallel(v1 2) */ * 
from    v1 
where   pv_col between 1 and 10
and     date_col = to_date('1-Apr-1999','dd-mm-yyyy')

If you had followed the rules about partition views then Oracle would generate some code that managed to combine the partitioning definitions with the view definition and query predicates and come up with rewritten code for the parallel query slaves that looked something like (e.g.)

select  {list of columns}
from    t1
where   pv_col between 1 and 10 
and     pv_col >= 0 and pv_col < 3000 -- > comment to avoid wordpress format issue 
and     date_col = to_date(:SYS_B1,'dd-mm-yyyy') 
union all 
select  {list of columns} 
from    t2 
where   pv_col between 1 and 10 
and     pv_col >= 3000 and pv_col < 6000 -- > comment to avoid wordpress format issue
and     date_col = to_date(:SYS_B2,'dd-mm-yyyy')

In this case I’ve defined my partitions (using constraints on the underlying tables) so that t1 will hold rows where pv_col >= 0 and pv_col < 3000, and t2 will hold rows where pv_col >= 3000 and pv_col < 6000. As you can see the optimizer has expanded the query with the view text and  pulled the constraints into the query and will be able to bypass all access to t2 because it can reduce the pv_col predicates on t2 into the contradiction “1 = 0”. Here’s the basic form of the execution plan we’d be looking for with partition elimination:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    15 |  1500 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |    15 |  1500 |     2   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    VIEW                 | V1       |    15 |  1500 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|   4 |     UNION-ALL           |          |       |       |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR  |          |    12 |  1308 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL | T1       |    12 |  1308 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |      FILTER             |          |       |       |            |          |  Q1,00 | PCWC |            |
|   8 |       PX BLOCK ITERATOR |          |     1 |   109 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  9 |        TABLE ACCESS FULL| T2       |     1 |   109 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------


Note the FILTER at operation 7 – the predicate there is actually “NULL IS NOT NULL” for new versions of Oracle, but would have been “1 = 0” for older versions, so operations 8 and 9 would not get executed.

Notice, also, that my date predicate changes from a literal to a bind variable. This just happened to be the way things were done for parallel query in that version of Oracle. So now we can get to the bug. Our data was actually partitioned by day using a date(-only) column, and the initial target was to keep two years of data. The trouble was that certain queries kept crashing with Oracle error “ORA-12801: error signaled in parallel query server”.

Naturally I created a simplified model (smaller tables, fewer partitions in the views) to try and track down the problem – and the problem disappeared. So I took a crashing query from production, and started creating partition views with fewer and fewer table until the query stopped crashing, and what I discovered was the following:

  • If you had 2 dates in the query it crashed if the view held 128 or more tables
  • If you had 3 dates in the query it crashed if the view held 86 or more tables

Think about the arithmetic for a moment: 2 * 128 = 256, 3 * 86 = 258. Does this give you a clue ?

What it suggested to me was that someone in Oracle Development had used a single byte to index the array of bind variables that they defined for use with the generated parallel query text, so when you had 2 dates and needed 256 bind variable the counter overflowed, when you had 3 dates and needed 258 bind variables the counter overflowed. Having made the hypothesis I predicted that a query would crash if there were 4 dates and 64 partitions, but survive if there were only 63 partitions. (And I was right.)

When I called this in to Oracle support (who remembers those early days when you were supposed to “phone” Oracle support on a “land-line”) and suggested the source of the problem I was told that there was “no limit on the number of bind variables a query could handle in Oracle”. Notice how this is essentially a true statement – but has nothing to do with my suggestion.

Several months (maybe even a couple of years) later – long after the client had upgraded to 7.3.2 then 7.3.4 and I was no long on site – I got a call from Oracle support who wanted to close the TAR (as SR’s used to be known) because they’d discovered the problem and it was … see above. I got the feeling that no-one had considered my suggestion for a long time because they “knew” it had to be  wrong.

The moral(s) of the story

  • Listen to the question carefully – you may not  have heard what you were assuming you would hear.
  • Listen to the answer carefully – it may sound like a convincing response to your question while not being relevant to the actual question.
  • “It’s not supposed to do that” isn’t the same as “That didn’t happen” or (to mis-quote a well-known philosophical problem): “you can’t turn an ‘ought not’ into a ‘did not”

One thing that’s worth emphasising is that everyone (and that does include me) will occasionally hear what they’re expecting to hear and completely misunderstand the point of the question. So when someone says something which is clearly very silly pause for thought then ask them, with care and precision, if they just said what you thought they said – maybe what they said and what you heard were very different. (The same is true for twitter, list servers, forums etc. – it’s very easy to misinterpret a short piece of text, and it may be the way it’s been written but it may be the way it’s been read.)

 

Split Partition

Filed under: Oracle,Partitioning,Performance,Tuning — Jonathan Lewis @ 1:02 pm BST Aug 9,2019

This is a little case study on “pre-emptive trouble-shooting”, based on a recent question on the ODC database forum asking about splitting a range-based partition into two at a value above the current highest value recorded in a max_value partition.

The general principle for splitting (range-based) partitions is that if the split point is above the current high value Oracle will recognise that it can simply rename the existing partition and create a new, empty partition, leaving all the indexes (including the global and globally partitioned indexes) in a valid state. There are, however, three little wrinkles to this particular request:

  • first is that the question relates to a system running 10g
  • second is that there is a LOB column in the table
  • third is that the target is to have the new (higher value) partition(s) in a different tablespace

It’s quite possible that 10g won’t have all the capabilities of partition maintenance of newer versions, and if anything is going to go wrong LOBs are always a highly dependable point of failure, and since all the examples in the manuals tend to be very simple examples maybe any attempt to introduce complications like tablespace specification will cause problems.

So, before you risk doing the job in production, what are you going to test?

In Oracle terms we want to check the following

  • Will Oracle have silently copied/rebuilt some segments rather than simply renaming old segments and creating new, empty segments.
  • Will the segments end up where we want them
  • Will all the indexes stay valid

To get things going, the OP had supplied a framework for the table and told us about two indexes, and had then given us two possible SQL statements to do the split, stating they he (or she) had tested them and they both worked. Here’s the SQL (with a few tweaks) that creates the table and indexes. I’ve also added some data – inserting one row into each partition.

rem
rem     Script:         split_pt_lob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem
rem     Last tested 
rem             12.2.0.1
rem             10.2.0.5
rem

define m_old_ts = 'test_8k'
define m_new_ts = 'assm_2'

drop table part_tab purge;

create table part_tab(
  pt_id            NUMBER,
  pt_name          VARCHAR2(30),
  pt_date          DATE default SYSDATE,
  pt_lob           CLOB,
  pt_status        VARCHAR2(2)
)
tablespace &m_old_ts
lob(pt_lob) store as (tablespace &m_old_ts)
partition by range (pt_date)
(
  partition PRT1 values less than (TO_DATE('2012-01-01', 'YYYY-MM-DD')),
  partition PRT2 values less than (TO_DATE('2014-09-01', 'YYYY-MM-DD')),
  partition PRT_MAX values less than (MAXVALUE)
)
/

alter table part_tab
add constraint pt_pk primary key(pt_id)
/

create index pt_i1 on part_tab(pt_date, pt_name) local
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    1,'one',to_date('01-Jan-2011'),rpad('x',4000),'X'
)
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    2,'two',to_date('01-Jan-2013'),rpad('x',4000),'X'
)cascade=>trueee
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    3,'three',to_date('01-Jan-2015'),rpad('x',4000),'X'
)
/

commit;

execute dbms_stats.gather_table_stats(null,'part_tab',cascade=>true,granularity=>'ALL')

We were told that

The table has
– Primary Key on pt_id column with unique index (1 Different table has FK constraint that refers to this PK)
– Composite index on pt_date and pt_name columns

This is why I’ve added a primary key constraint (which will generate a global index) and created an index on (pt_date,pt_name) – which I’ve created as a local index since it contains the partitioning column.

The description of the requirement was:

  • The Task is to split partition(PRT_MAX) to a different tablespace
  • New partition(s) won’t have data at the moment of creation

And the two “tested” strategies were:

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts,
        PARTITION PRT_MAX tablespace &m_new_ts
);

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts LOB (pt_lob) store as (TABLESPACE &m_old_ts), 
        PARTITION PRT_MAX tablespace &m_new_ts LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
;
 

If we’re going to test these strategies properly we will need queries similar to the following:


break on object_name skip 1
select object_name, subobject_name, object_id, data_object_id  from user_objects order by object_name, subobject_name;

break on index_name skip 1
select index_name, status from user_indexes;
select index_name, partition_name, status from user_ind_partitions order by index_name, partition_name;

break on segment_name skip 1
select segment_name, partition_name, tablespace_name from user_segments order by segment_name, partition_name;

First – what are the object_id and data_object_id for each object before and after the split. Have we created new “data objects” while splitting, or has an existing data (physical) object simply changed its name.

Secondly – are there any indexes or index partitions that are no longer valid

Finally – which tablespaces do physical objects reside in.

On a test run of the first, simpler, split statement here are the before and after results for the object_id and data_object_id, followed by the post-split results for index and segment details:


Before Split
============

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT_MAX                     23679          23679
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT_MAX                     23692          23692
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P254                 23687          23687

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P251                23683          23683
                                                             23680          23680

After split
===========

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT3                        23693          23679
                                 PRT_MAX                     23679          23694
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT3                        23700          23692
                                 PRT_MAX                     23699          23699
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P257                 23697          23687
                                 SYS_IL_P258                 23698          23698

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P255                23695          23683
                                 SYS_LOB_P256                23696          23696
                                                             23680          23680


INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023676C00004$$         N/A


INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023676C00004$$         SYS_IL_P252            USABLE
                                 SYS_IL_P253            USABLE
                                 SYS_IL_P257            USABLE
                                 SYS_IL_P258            USABLE


SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023676C00004$$  SYS_IL_P252            TEST_8K
                          SYS_IL_P253            TEST_8K
                          SYS_IL_P257            TEST_8K
                          SYS_IL_P258            TEST_8K

SYS_LOB0000023676C00004$$ SYS_LOB_P249           TEST_8K
                          SYS_LOB_P250           TEST_8K
                          SYS_LOB_P255           TEST_8K
                          SYS_LOB_P256           TEST_8K

Before the split partition PRT_MAX – with 4 segments: table, index, LOB, LOBINDEX – has object_id = data_object_id, with the values: 23679 (table), 23692 (index), 23683 (LOB), 23687 (LOBINDEX); and after the split these reappear as the data_object_id values for partition PRT3 (though the object_id values are larger than the data_object_id values) – so we infer that Oracle has simply renamed the various PRT_MAX objects to PRT3 and created new, empty PRT_MAX objects.

We can also see that all the indexes (including the global primary key index) have remained valid. We also note that the data_object_id of the primary key index has not changed, so Oracle didn’t have to rebuild it to ensure that it stayed valid.

There is a problem, though, the LOB segment and LOBINDEX segments for the new PRT_MAX partition are not in the desired target tablespace. So we need to check the effects of the second version of the split command where we add the specification of the LOB tablespaces. This is what we get – after rerunning the entire test script from scratch:


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT_MAX                     23729          23729
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT_MAX                     23742          23742
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P274                 23737          23737

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P271                23733          23733
                                                             23730          23730


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT3                        23743          23743
                                 PRT_MAX                     23729          23744
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT3                        23750          23750
                                 PRT_MAX                     23749          23749
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P277                 23747          23747
                                 SYS_IL_P278                 23748          23748

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P275                23745          23745
                                 SYS_LOB_P276                23746          23746
                                                             23730          23730

INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            UNUSABLE
SYS_IL0000023726C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   UNUSABLE
                                 PRT_MAX                USABLE

SYS_IL0000023726C00004$$         SYS_IL_P272            USABLE
                                 SYS_IL_P273            USABLE
                                 SYS_IL_P277            USABLE
                                 SYS_IL_P278            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023726C00004$$  SYS_IL_P272            TEST_8K
                          SYS_IL_P273            TEST_8K
                          SYS_IL_P277            TEST_8K
                          SYS_IL_P278            ASSM_2

SYS_LOB0000023726C00004$$ SYS_LOB_P269           TEST_8K
                          SYS_LOB_P270           TEST_8K
                          SYS_LOB_P275           TEST_8K
                          SYS_LOB_P276           ASSM_2


Before looking at the more complex details the first thing that leaps out to hit the eye is the word UNUSABLE – which appears for the status of the (global) primary key index and the PRT3 subpartition. The (empty) PRT_MAX LOB and LOBINDEX partitions are where we wanted them, but by specifying the location we seem to have broken two index segments that will need to be rebuilt.

It gets worse, because if we check the data_object_id of the original PRT_MAX partition (23729) and its matching index partition (23742) we see that they don’t correspond to the (new) PRT3 data_object_id values which are 23743 and 23750 respectively – the data has been physically copied from one data object to another completely unnecessarily; moreover the same applies to the LOB and LOBINDEX segments – the data object ids for the PRT_MAX LOB and LOBINDEX partitions were 23733 and 23737, the new PRT3 data object ids are 23746 and 23747.

If you did a test with only a tiny data set you might not notice the implicit threat that these changes in data_object_id tell you about – you’re going to be copying the whole LOB segment when you don’t need to.

Happy Ending (maybe)

A quick check with 12.2 suggested that Oracle had got much better at detecting that it didn’t need to copy LOB data and invalidate indexes with the second form of the code; but the OP was on 10g – so that’s not much help. However it was the thought that Oracle might misbehave when you specifyied tablespaces that made me run up this test – in particular I had wondered if specifying a tablespace for the partition that would end up holding the existing data might trigger an accident, so here’s a third variant of the split statement I tested, with the results on the indexes, segments, and data objects. Note that I specify the tablespace only for the new (empty) segments:


alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
    PARTITION PRT3,
    PARTITION PRT_MAX tablespace &m_new_ts  LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
/

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT_MAX                     23754          23754
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT_MAX                     23767          23767
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P284                 23762          23762

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P281                23758          23758
                                                             23755          23755

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT3                        23768          23754
                                 PRT_MAX                     23754          23769
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT3                        23775          23767
                                 PRT_MAX                     23774          23774
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P287                 23772          23762
                                 SYS_IL_P288                 23773          23773

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P285                23770          23758
                                 SYS_LOB_P286                23771          23771
                                                             23755          23755
INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023751C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023751C00004$$         SYS_IL_P282            USABLE
                                 SYS_IL_P283            USABLE
                                 SYS_IL_P287            USABLE
                                 SYS_IL_P288            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023751C00004$$  SYS_IL_P282            TEST_8K
                          SYS_IL_P283            TEST_8K
                          SYS_IL_P287            TEST_8K
                          SYS_IL_P288            ASSM_2

SYS_LOB0000023751C00004$$ SYS_LOB_P279           TEST_8K
                          SYS_LOB_P280           TEST_8K
                          SYS_LOB_P285           TEST_8K
                          SYS_LOB_P286           ASSM_2

All the index and index partitions stay valid; the new empty segments all end up in the target tablespace, and all the data object ids for the old PRT_MAX partitions becaome the data object ids for the new PRT3 partitions. Everything we want, and no physical rebuilds of any data sets.

Moral:

When you’re testing, especially when you’re doing a small test while anticipating a big data set, don’t rely on the clock; check the data dictionary (and trace files, if necessary) carefully to find out what activity actually took place.

Footnote:

It’s possible that there are ways to fiddle around with the various default attributes of the partitioned table to get the same effect – but since 12.2 is much better behaved anyway there’s no point in me spending more time looking for alternative solutions to a 10g problem.

 

August 8, 2019

Free Space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 1:27 pm BST Aug 8,2019

Several years ago I wrote a note about reporting dba_free_space and dba_extents to produce a map of the space usage in a tablespace in anticipation of messing about with moving or rebuilding objects to try and reduce the size of the files in the tablespace.  In the related page where I published the script I pointed out that a query against dba_extents would be expensive because it makes use of structure x$ktfbue which generates the information dynamically by reading segment header blocks. I also pointed out in a footnote to the original article that if you’ve enabled the recyclebin and have “dropped” some objects then there will be some space that is reported as free but is not quite free since the extents will still be allocated. This brings me to the topic for today’s blog.

While visiting a client site recently I came across an instance that was running a regular report to monitor available space in the database. Basically this was a query against view dba_free_space. Surprisingly it was taking a rather long time to complete – and the reason for this came in two parts. First, the recyclebin was enabled and had some objects in it and secondly there were no stats on the fixed object x$ktfbue.

In the case of the client the particular query produced a plan that included the following lines:


Id  Operation             Name              Rows    Bytes  Cost (%CPU)  Time
--  --------------------- ----------------  ----   ------  -----------  --------
63  HASH JOIN                               2785     212K     46  (85)  00:00:01
64    TABLE ACCESS FULL   RECYCLEBIN$       1589    20657      7   (0)  00:00:01
65    FIXED TABLE FULL    X$KTFBUE          100K    6347K     38 (100)  00:00:01 

This is part of the view where Oracle calculates the size of all the extents of objects in the recyclebin so that they can be reported as free space. Notice that in this plan (which is dependent on version, system stats, object_stats and various optimizer parameters) the optimizer has chosen to do a hash join between the recyclebin (recyclebin$) and the x$ structure – and that has resulted in a “full tablescan” of x$ktfbue, which means Oracle reads the segment header block of every single segment in the entire database. (I don’t know where the row stats came from as there were no stats on x$ktfbue, and this plan was pulled from the AWR history tables so the query had been optimised and captured some time in the past.)

If there had been nothing in the recyclebin the hash join and two tablescans wouldn’t have mattered, unfortunately the recyclebin had been enabled and there were a few rows in recyclebin$, so the “tablescan” happened. Here’s a cut-n-paste from a much simpler query run against a fairly new (no 3rd party app) database running 12.1.0.2 to give you some idea of the impact:


SQL> execute snap_events.start_snap

PL/SQL procedure successfully completed.

SQL> select count(*) from x$ktfbue;

  COUNT(*)
----------
      8774

1 row selected.

SQL> execute snap_events.end_snap
---------------------------------------------------------
Session Events - 01-Aug 21:28:13
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
Disk file operations I/O                              7           0           0        .018           1
db file sequential read                           5,239           0          14        .003           6
SQL*Net message to client                             7           0           0        .000           0
SQL*Net message from client                           7           0       1,243     177.562         572
events in waitclass Other                             3           1           0        .002           0

PL/SQL procedure successfully completed.

On my little laptop, with nothing else going on, I’ve managed to get away with “only” 5,239 single block reads, and squeezed them all into just 14 centiseconds (local SSD helps). The clients wasn’t so lucky – they were seeing tens of thousands of real physical reads.

The ideal solution, of course, was to purge the recyclebin and disable the feature – it shouldn’t be necessary to enable it on a production system – but that’s something that ought to require at least some paperwork. In the short term gathering stats on the fixed table helped because the plan changed from a hash join with “tablescan” of x$ktfbue to a nested loop with an “indexed” access path, looking more like the following (from a query against just recyclebin$ and x$ktfbue)

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |       |       |     4 (100)|          |
|   1 |  NESTED LOOPS            |                  |     7 |   182 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL      | RECYCLEBIN$      |     6 |    66 |     4   (0)| 00:00:01 |
|*  3 |   FIXED TABLE FIXED INDEX| X$KTFBUE (ind:1) |     1 |    15 |     0   (0)|          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("BUE"."KTFBUESEGBNO"="BLOCK#" AND "BUE"."KTFBUESEGTSN"="TS#" AND
              "BUE"."KTFBUESEGFNO"="FILE#"))

This was still fairly resource-intensive for the client, but was something of an improvement – they had a lot more than 6 items in their recyclebin.

Part of the problem, of course, is that x$ktfbue is one of the objects that Oracle skips when you gather “fixed object” stats – it can be a bit expensive for exactly the reason that querying it can be expensive, all those single block segment header reads.

If you want to check the stats and gather them (as a one-off, probably) here’s some suitable SQL:


select
        table_name, num_rows, avg_row_len, sample_size, last_analyzed
from
        dba_tab_statistics
where
        owner = 'SYS'
and     table_name = 'X$KTFBUE'
;

begin
        dbms_stats.gather_table_stats('SYS','X$KTFBUE');
end;
/

Summary

You probably shouldn’t have the recyclebin enabled in a production system; but if you do, and if you also run a regular report on free space (as many sites seem to do) make sure (a) you have a regular routine to minimise the number of objects that it accumulates and (b) gather statistics (occasionally) on x$ktfbue to minimise the overhead of the necessary join between recyclebin$ and x$ktfbue.

August 6, 2019

Parse Solution

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 1:14 pm BST Aug 6,2019

In the “Parse Puzzle” I posted a couple of days ago I showed a couple of extracts from an AWR report that showed contradictory results about the time the instance spent in parsing and hard parsing, and also showed an amazing factor of 4 difference between the DB Time and the “SQL ordered by Elapsed Time”. My example was modelling a real world anomaly I had come across, but was engineered to exaggerate the effect to make it easy to see what was going on.

The key feature was VPD (virtual private database) a.k.a. FGAC (find grained access control) or RLS (row-level security). I’ve created a “policy function” (the thing that generates the “security predicate”) to execute an extremely expensive SQL query; then I’ve created a policy with policy_type = ‘DYNAMIC’ so that the function gets executed every time a query against a particular table is executed. In fact my example holds three tables, and each table has its own policy function, and each policy function calls the same very expensive piece of SQL.

To see the effect this has on the AWR report I’ll now supply the contents of the “SQL ordered by Elapsed Time” and work through the list (though not in the order shown) explaining what each statement represents:


SQL ordered by Elapsed Time               DB/Inst: OR18/or18  Snaps: 2059-2060
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
-> %Total - Elapsed Time  as a percentage of Total DB time
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for  302.6% of Total DB Time (s):             158
-> Captured PL/SQL account for  101.7% of Total DB Time (s):             158

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           156.8              1        156.84   99.4   99.8     .0 1ubpdythth4q1
Module: SQL*Plus
select id, f_rls(n1, n2, n3) from fgac_base where rownum .le. 10                   -- edited to avoid WP  format issue

           156.8             33          4.75   99.3   99.8     .0 9dhvggqtk2mxh
Module: SQL*Plus
 select count(*) X from waste_cpu connect by n .gt. prior n start with n = 1       -- edited to avoid WP  format issue

            53.2             10          5.32   33.7   99.8     .0 5531kmrvrzcxy
Module: SQL*Plus
SELECT COUNT(*) FROM FGAC_REF3 WHERE ID = :B1

            53.2             11          4.83   33.7   99.8     .0 8g2uv26waqm8g
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE3"(:sn, :on); end;

            52.7             10          5.27   33.4   99.7     .0 awk070fhzd4vs
Module: SQL*Plus
SELECT COUNT(*) FROM FGAC_REF1 WHERE ID = :B1

            52.7             11          4.79   33.4   99.7     .0 c8pwn9j11gw5s
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE1"(:sn, :on); end;

            50.9             10          5.09   32.3   99.9     .0 964u0zv0rwpw1
Module: SQL*Plus
SELECT COUNT(*) FROM FGAC_REF2 WHERE ID = :B1

            50.9             11          4.63   32.3   99.9     .0 bgqf405f34u4v
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE2"(:sn, :on); end;

             2.8              1          2.79    1.8   98.4     .0 fhf8upax5cxsz
BEGIN sys.dbms_auto_report_internal.i_save_report (:rep_ref, :snap_id, :pr_class
, :rep_id, :suc); END;

             2.6              1          2.64    1.7   98.4     .0 0w26sk6t6gq98
SELECT XMLTYPE(DBMS_REPORT.GET_REPORT_WITH_SUMMARY(:B1 )) FROM DUAL

             2.4              1          2.43    1.5   98.3     .0 1q1spprb9m55h
WITH MONITOR_DATA AS (SELECT INST_ID, KEY, NVL2(PX_QCSID, NULL, STATUS) STATUS,
FIRST_REFRESH_TIME, LAST_REFRESH_TIME, REFRESH_COUNT, PROCESS_NAME, SID, SQL_ID,
 SQL_EXEC_START, SQL_EXEC_ID, DBOP_NAME, DBOP_EXEC_ID, SQL_PLAN_HASH_VALUE, SQL_
FULL_PLAN_HASH_VALUE, SESSION_SERIAL#, SQL_TEXT, IS_FULL_SQLTEXT, PX_SERVER#, PX

                          ------------------------------------------------------

The first statement is an SQL statement that calls a PL/SQL function f_rls() for 10 consecutive rows in an “ordinary table”. This is the query that actuallly takes 157 seconds to complete as far as the client SQL Plus session is concerned.

The function (called 10 times) passed in three values n1, n2, n3. The function uses n1 to query table FGAC_REF1, n2 to query table FGAC_REF2, and n3 to query FGAC_REF3 – and we can see those three queries appearing as statements 5, 7, and 3 (in that order) in the output. The main query takes 157 seconds to complete because each of those statements takes approximately 52 seconds to complete 10 executions each.

But each of those three statements references a table with a policy that causes a predicate function to be executed for each parse and execute of the statement (one parse, 10 executes) and we can see 11 executions each of calls to fgac_pack.fgac_predicateN (N in 1,2,3), which take roughly 4.8 seconds each, for a total of about 52 seconds across 11 executions.

But those calls to the functions (11 each) all execute the same “connect by” query that appears as statement 2 in the list – for a total off 33 calls of a SQL statement that averages 4.75 seconds – and almost all of the “real” database time is in that 33 calls (33 * 4.75 = 156.75).

So we count 157 seconds because that’s the time spent in the “connect by” queries”, but we count that time again (but under PL/SQL execution) because it’s called from the policy functions, then count it again (under SQL execution) because the functions are called by the “select count(*) from fgac_refN” queries, then count it one final time (under SQL execution) for the driving query. This gives us a total 300% of the actual database time apparently being spent in SQL and 100% apparently being spent in PL/SQL.

You’ll notice that “real” optimisation of the SQL that does run would have taken just fractions of a second (as we saw in the Instance Activity Statistics); but one execution of the “connect by” query would have been associated with the first parse call of each of the fgac_refN queries, so 15 seconds of policy function / connect by query time would have been accounted under the parse time elapsed / hard parse time elapsed we saw in the Time Model statistics.

One of the strangest things about the reporting showed up in the ASH figures – which only ever managed to sample three SQL_IDs, reporting them as “on CPU” in every single sample, and those three SQL_IDs were for the “select count(*) from fgac_refN” queries; the “connect by” queries – the real consumer of CPU resource – didn’t get into the ASH sample at all.

Footnote

I did two things to make an important anomaly very obvious – I added a CPU intensive query to the policy function so that it was easy to see where the time was really going, and I made the VPD policy_type “dynamic” so that the policy function would execute on every parse and execute against the underlying table.

In real life the typcial SQL called within a policy function is very lightweight, and policies tend to be declared with type “context_sensitive”, and this combination minimises the number of calls to the function and the cost of each call. Unfortunately there are 3rd party applications on the market that include complex PL/SQL frameworks in their policy functions and then have the application server reset the session context for every call to the database.

And that’s where I came in – looking at a system where 10% of the CPU was being spent on parsing that apparently couldn’t possibly be happening.

August 4, 2019

Parse Puzzle

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 10:46 am BST Aug 4,2019

Here are some details from an AWR report covering a few minutes in the lifetime of an instance of 18.3. It’s a carefully constructed demonstration and all I’ve done is take an AWR snapshot, execute a single SQL statement, then take another snapshot, so the only thing captured by the report is the work done in that brief time interval. The purpose of the exercise is to demonstrate how some Oracle features can make a complete nonsense of the AWR. (I have, as I often do, produced a model that reproduces an affect that can appear in production but exaggerates the effect to make it more clearly visible.)

First the Time Model statistics:

                                                                % of  % of Total
Statistic Name                                       Time (s) DB Time   CPU Time
------------------------------------------ ------------------ ------- ----------
sql execute elapsed time                                157.8   100.0
DB CPU                                                  157.5    99.8       97.3
parse time elapsed                                       13.6     8.6
hard parse elapsed time                                  13.6     8.6
PL/SQL execution elapsed time                             0.3      .2
PL/SQL compilation elapsed time                           0.0      .0
hard parse (sharing criteria) elapsed time                0.0      .0
repeated bind elapsed time                                0.0      .0
DB time                                                 157.8
background elapsed time                                   7.9
background cpu time                                       4.4                2.7
total CPU time                                          161.9
                          ------------------------------------------------------


Note particularly the parse times – the Time Model show 13.6 seconds spent in (hard) parsing.

Note also that (with a small error) DB time = DB CPU = SQL execute elapsed time, and the background time is very small (in particular it’s less than the parse time). This background time, by the way, is probably related to things that Oracle does behind the scenes when you take an AWR snapshot or run an AWR report.

Given the significant amount of time spent in hard parsing let’s take a look at the Instance Activity statistics – picking only the statistics relating to parse calls:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
parse count (describe)                            0            0.0           0.0
parse count (failures)                            0            0.0           0.0
parse count (hard)                              325            2.1         325.0
parse count (total)                           1,662           10.5       1,662.0
parse time cpu                                   39            0.3          39.0
parse time elapsed                               42            0.3          42.0

Although the Time Model thinks Oracle has spent 13.6 seconds in (hard) parse time, the Instance Activity Statistics says it has only spent 0.42 seconds (the statistic is repored in hundredths) That a fairly significant difference of opinion. So let’s see if we can find out more from the “SQL ordered by …”, and I’m only going to show you one heading as a teaser for the rest of the weekend:


SQL ordered by Elapsed Time               DB/Inst: OR18/or18  Snaps: 2059-2060
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
-> %Total - Elapsed Time  as a percentage of Total DB time
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for  302.6% of Total DB Time (s):             158
-> Captured PL/SQL account for  101.7% of Total DB Time (s):             158

How do you squeeze 400% of the elapsed time into SQL and PL/SQL executions? (Observation: it’s not an IBM P9 taking advantage of SMT/4)

One last set of stats – which will have some room for statistical error since they come from v$active_session_history:


select
        in_parse, in_hard_parse, sql_id, sql_exec_id, count(*)
from
        v$active_session_history  ash
where
        session_id = &m_session_id
and     sample_id between &m_start_sample_id and &m_end_sample_id
group by
        in_parse, in_hard_parse, sql_id, sql_exec_id
order by
        count(*)
/

I I SQL_ID	  SQL_EXEC_ID	COUNT(*)
- - ------------- ----------- ----------
Y Y 964u0zv0rwpw1		       3
Y Y awk070fhzd4vs		       4
Y Y 5531kmrvrzcxy		       5

N N 5531kmrvrzcxy		      42
N N 964u0zv0rwpw1		      42
N N awk070fhzd4vs		      51

6 rows selected.

So the ASH information seems to agree (reasonably closely) with the Time Model statistics – in the interval of the snapshot it’s noted 12 samples (which statistically represents 12 seconds) of hard parse time (and though my query doesn’t show it, the “event” is  null, i.e. “on CPU”).

 

To be continued, some time later this week …

August 1, 2019

LOB reads

Filed under: Infrastructure,LOBs,Oracle — Jonathan Lewis @ 9:24 pm BST Aug 1,2019

This is a blog note that started life in September 2001 (which was before I started blogging, I think), got drafted as a blog in January 2014 because of a question on the Oracle-L list server, and has finally got to publication following a question on the ODC database forum. (But the comments in the blog don’t look as if they are relevant to the question.)

The question on Oracle-l was:  Why is the same CLOB datablock read multiple times?

The answer is basically: client programs tend to set a “sensible” fetch buffer size for LOBs and if the buffer size is less than the size of the LOB the block(s) the LOB is in have to be visited multiple times and for nocache LOBs that means the block(s) will be read multiple times.

This can be seen quite easily in SQL*Plus which has two helpful parameters (one dating back to at least v5):

set long N
set longchunksize M

The “long” setting tells SQL*Plus how much of the LOB to display on screen, the “longchunksize” tells SQL*Plus how much of the LOB to fetch from the database on each call. The default value is 80 for both settings. So let’s set up some data and do a few simple demonstrations using nothing more subtle than autotrace.


rem
rem     Script:         lob_fetch_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2012
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.1.0.2
rem             11.2.0.2
rem

create table t1 (
        id      number , 
        v1      varchar2(60),
        c1      clob, 
        c2      clob
)
lob(c1) store as (enable storage in row cache)
lob(c2) store as (enable storage in row cache)
;

begin
        for i in 1..1000 loop
                insert into t1 values(i, lpad(i,60), lpad(i,60), empty_clob());
        end loop;
        commit;
end;
/

execute dbms_stats.gather_table_stats(null,'t1')

I’ve got a table with two CLOB columns, both declared to enable storage in row and cache. I’ve also got a varchar2() column, and I’ve loaded one of the CLOB columns and the varchar2() column with a 60 character string, setting the other CLOB column explicitly to the empty_clob() value – which is not the same as setting it to NULL.

Now I’ll do 3 selects, fetching the id column and one other column, showing only the autotrace statistics:


set long 80
set longchunksize 80

set arraysize 5000

set autotrace traceonly statistics

prompt  ==============
prompt  Varchar2 fetch
prompt  ==============

select id, v1 from t1;

prompt  ================
prompt  Small CLOB fetch
prompt  ================

select id, c1 from t1;

prompt  ================
prompt  Empty CLOB fetch
prompt  ================

select id, c2 from t1;

set autotrace off

I’ve explicitly set long and longchunksize to 80 (the defaults) just to make it clear that that’s what they are for this test; and I’ve set the arraysize to 5000 so that I can get the 1,000 rows of data in the smallest number of fetches. Here are the three sets of autotrace statistics:


==============
Varchar2 fetch
==============

1000 rows selected.


Statistics
----------------------------------------------------------
         11  recursive calls
         11  db block gets
         75  consistent gets
          0  physical reads
       2040  redo size
      70545  bytes sent via SQL*Net to client
        624  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

================
Small CLOB fetch
================

1000 rows selected.


Statistics
----------------------------------------------------------
          4  recursive calls
          5  db block gets
       2036  consistent gets
          0  physical reads
        992  redo size
     707454  bytes sent via SQL*Net to client
     296624  bytes received via SQL*Net from client
       2002  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

================
Enpty CLOB fetch
================

1000 rows selected.


Statistics
----------------------------------------------------------
          3  recursive calls
          5  db block gets
       2036  consistent gets
          0  physical reads
       1048  redo size
     585454  bytes sent via SQL*Net to client
     296624  bytes received via SQL*Net from client
       2002  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Key points to note (ignoring minor details like the recursive calls to parse the statements):

  • The query for the the varchar2() column did two SQL*Net roundtrips and 75 consistent gets.
  • The query for the small, inline, CLOB did 2,002 SQL*Net roundtrips and 2,036 consistent gets
  • The query for the empty CLOB did exactly the same number of SQL*Net roundtrips and consistent gets as for the small CLOB

Because we were selecting a LOB column SQL*Plus switched from array fetches to single row fetches. In the first fetch of each pair of fetches it fetched the non-CLOB columns and the “LOB locator”; then in the second fetch of each pair it fetched the CLOB data – and it did this even when the LOB locator could (in principle) have told it that there was no data to fetch. (If you select both clob columns in the same query the number of SQL*Net roundtrips will go up to 3 per row in this test.)

Remember I said there was a difference between empty_clob() and NULL ? If you update the table to set c2 to null before running the query to select c2 you’ll see only 1,000 (and a few) SQL*Net roundtrips – you still get single row processing because you’re selecting a LOB column but when the NULL  arrives at SQL*Plus the code will know that there is no LOB data to retrieve.

Now try this:


truncate table t1;

begin
        for i in 1..1000 loop
                insert into t1 values(i, lpad(i,60), lpad(i,180), empty_clob());
        end loop;
        commit;
end;
/

execute dbms_stats.gather_table_stats(null,'t1')

set long 170
select id, c1 from t1;

Pause for thought as you work out what the stats will look like
.
.
.
.
.
The longchunksize (still at the default of 80) is now too small to collect all the data that should be displayed in one SQL*Net roundtrip – it’s going to take 3 roundtrips to get 170 bytes, so we might expect to see about 4,000 SQL*Net roundtrips and a similar number of consistent gets. This is what the stats look like:


Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
       4030  consistent gets
          0  physical reads
          0  redo size
    1485454  bytes sent via SQL*Net to client
     866624  bytes received via SQL*Net from client
       4002  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Sure enough we do single row fetches, one SQL*Net roundtrip for each row with its LOB locator, then three more roundtrips for each LOB for a total of 4,000 round trips and (approximately) 4,000 consistent gets.

One last test – let’s change the c1 clob to nocache and disable storage in row then repeat the last experiment where the long setting is larger than the longchunksize setting.


alter table t1 move lob(c1) store as (disable storage in row nocache);
execute dbms_stats.gather_table_stats(null,'t1')
select id, c1 from t1;

Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
       4001  consistent gets
       6000  physical reads
          0  redo size
    1485454  bytes sent via SQL*Net to client
     866624  bytes received via SQL*Net from client
       4002  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Now there’s a surprise!  I was expecting to see 3000 physical reads as each “out of row, nocache” LOB was accessed three times to pick up the three pieces of longchunksize using direct path reads. So why have I got 6,000 physical reads? Time to enable extended tracing (event 10046, level 8) and repeat.

Here’s a sample from the trace file – this trace is from 18.3, but the LOBREAD lines are a feature that appeared some time around 11.2.0.2.

FETCH #140355181475400:c=58,e=57,p=0,cr=1,cu=0,mis=0,r=1,dep=0,og=1,plh=3617692013,tim=5288719901
WAIT #140355181475400: nam='SQL*Net message from client' ela= 78 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288720015

WAIT #0: nam='direct path read' ela= 12 file number=2 first dba=2867 block cnt=1 obj#=132008 tim=5288720198
WAIT #0: nam='direct path read' ela= 8 file number=2 first dba=2867 block cnt=1 obj#=132008 tim=5288720288
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288720326
LOBREAD: type=PERSISTENT LOB,bytes=80,c=247,e=246,p=2,cr=1,cu=0,tim=5288720350

WAIT #0: nam='SQL*Net message from client' ela= 82 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288720483
WAIT #0: nam='direct path read' ela= 12 file number=2 first dba=2867 block cnt=1 obj#=132008 tim=5288720733
WAIT #0: nam='direct path read' ela= 8 file number=2 first dba=2867 block cnt=1 obj#=132008 tim=5288720836
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288720873
LOBREAD: type=PERSISTENT LOB,bytes=80,c=0,e=261,p=2,cr=1,cu=0,tim=5288720898

WAIT #0: nam='SQL*Net message from client' ela= 121 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288721071
WAIT #0: nam='direct path read' ela= 11 file number=2 first dba=2867 block cnt=1 obj#=132008 tim=5288721216
WAIT #0: nam='direct path read' ela= 8 file number=2 first dba=2867 block cnt=1 obj#=132008 tim=5288721300
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288721335
LOBREAD: type=PERSISTENT LOB,bytes=20,c=0,e=236,p=2,cr=1,cu=0,tim=5288721359

WAIT #0: nam='SQL*Net message from client' ela= 73 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288721506
WAIT #140355181475400: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288721664
FETCH #140355181475400:c=51,e=51,p=0,cr=1,cu=0,mis=0,r=1,dep=0,og=1,plh=3617692013,tim=5288721695
WAIT #140355181475400: nam='SQL*Net message from client' ela= 74 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288721801

WAIT #0: nam='direct path read' ela= 12 file number=2 first dba=2868 block cnt=1 obj#=132008 tim=5288721939
WAIT #0: nam='direct path read' ela= 47 file number=2 first dba=2868 block cnt=1 obj#=132008 tim=5288722065
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288722122
LOBREAD: type=PERSISTENT LOB,bytes=80,c=357,e=357,p=2,cr=1,cu=0,tim=5288722204

WAIT #0: nam='SQL*Net message from client' ela= 81 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288722351
WAIT #0: nam='direct path read' ela= 10 file number=2 first dba=2868 block cnt=1 obj#=132008 tim=5288722489
WAIT #0: nam='direct path read' ela= 11 file number=2 first dba=2868 block cnt=1 obj#=132008 tim=5288722671
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288722720
LOBREAD: type=PERSISTENT LOB,bytes=80,c=0,e=349,p=2,cr=1,cu=0,tim=5288722746

WAIT #0: nam='SQL*Net message from client' ela= 76 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288722874
WAIT #0: nam='direct path read' ela= 11 file number=2 first dba=2868 block cnt=1 obj#=132008 tim=5288723013
WAIT #0: nam='direct path read' ela= 10 file number=2 first dba=2868 block cnt=1 obj#=132008 tim=5288723160
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=132008 tim=5288723199
LOBREAD: type=PERSISTENT LOB,bytes=20,c=0,e=302,p=2,cr=1,cu=0,tim=5288723224

As you can see, the client seems to FETCH one row then do two direct path reads to read 80 bytes from a LOB, then another two direct path reads for the next 80 bytes, then another two direct path reads for the last 20 bytes. So the 6,000 reads is (strangely) doubling the 3,000 I was expecting to see and the trace file seems to say it’s really happening that way.

Unfortunately I decided to take snapshots of the session stats at this point – and that made things even worse (claiming, for example, 4,000 “consistent gets” but 7,000 “no work consistent gets”), the stats seem to be rather messed up and the whole thing looked very different when I ran this particular test on 12.1.0.2 (which claimed only one direct path read per CLOB rather than the three I was expecting).

Bottom line, though: notwithstanding any oddities of reporting, when your client progam is fetching LOBs from the database, you probably have to put up with:
single row fetches – that get the LOB locator then multiple fetches for each LOB, with the number of LOBREAD calls dependent on the buffer size the client allocates for pulling LOB data from the database.

So when handling LOBs make sure you know how the client can be configured to minimise SQL*Net roundtrips, and check a few simple examples looking at the the trace files, session stats and session events before you start loading data at production volumes.

 

July 12, 2019

opt_estimate 5

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 10:28 am BST Jul 12,2019

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint, a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical features of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model then the first time you need to do apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

So here’s a simplified version of the problem – it involves pushing a predicate into a union all view. First some data and a baseline query:

rem
rem     Script:         opt_estimate_3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem

create table t1
as
select
        rownum                          id,
        100 * trunc(rownum/100)-1       id2,
        mod(rownum,1e3)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        dual
connect by
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create table t2a pctfree 75 as select * from t1;
create table t2b pctfree 75 as select * from t1;

create index t2ai on t2a(id);
create index t2bi on t2b(id);

explain plan for
select
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select 'a' flag, t2a.* from t2a
         union all
         select 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
/

select * from table(dbms_xplan.display(null,null,'outline alias'))
/


There is one row with t1.id = 99, and I would like the optimizer to use an indexed access path to select the one matching row from each of the two tables in the union all view. The smart execution plan would be a nested loop using a “pushed join predicate” – and that’s exactly what we get by default with this data set:


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     1 |    15 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     1 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     1 |    15 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1        / T1@SEL$1
   3 - SET$5715CE2E / T2@SEL$1
   4 - SET$5715CE2E
   5 - SEL$639F1A6F / T2A@SEL$2
   6 - SEL$639F1A6F / T2A@SEL$2
   7 - SEL$B01C6807 / T2B@SEL$3
   8 - SEL$B01C6807 / T2B@SEL$3

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
      INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SET$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
      OUTLINE_LEAF(@"SET$5715CE2E")
      OUTLINE_LEAF(@"SEL$B01C6807")
      OUTLINE_LEAF(@"SEL$639F1A6F")
      ALL_ROWS
      OPT_PARAM('_nlj_batching_enabled' 0)
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)
   6 - access("T2A"."ID"="T1"."N1")
   8 - access("T2B"."ID"="T1"."N1")

So that worked well – operation 2 predicts one row for the tablescan of t1, with a nested loop join and union all pushed predicate where an index range scan of t2a_i1 and t2b_i1 gives us one row from each table. The “Predicate Information” tells us that the t1.n1 join predicate has been pushed inside the view to both subqueries so we see “t2a.id = t1.n1”, and “t2b.id = t1.n1”.

So what if I want to tell Oracle that it will actually find 5 rows in the t2a range scan and table access and 7 rows in the t2b range scan and table access (perhaps in a more complex view that would persuade Oracle to use two different indexes to get into the view and change the join order and access method for the next few tables it accessed). Since I’ve recently just written about the nlj_index_scan option for opt_estimate() you might think that this is the one we need to use – perhaps something like:


opt_estimate(@sel$639f1a6f nlj_index_scan, t2a@sel$2 (t1), t2a_i1, scale_rows=5)
opt_estimate(@sel$b01c6807 nlj_index_scan, t2b@sel$3 (t1), t2b_i1, scale_rows=7)

You’ll notice I’ve been very careful to find the fully qualified aliases for t2a and t2b by looking at the “Query Block Name / Object Alias” section of the plan (if the view appeared as a result of Oracle using Concatenation or OR-Expansion you would find that you got two query block names that looked similar but had suffixes of “_1” and “_2”). But it wasn’t worth the effort, it didn’t work. Fiddling around with all the possible variations I could think of didn’t help (maybe I should have used set$5715ce2e as the query block target for both the hints – no; what if I …)

Of course if we look at the “Outline Data” we’d notice that the use_nl() hint in the outline says: “USE_NL(@SEL$1 T2@SEL$1)”, so we don’t have a nested loop into t2a and t2b, we have a nested loop into the  view t2. So I decided to forget the nested loop idea and just go for the indexes (and the tables, when I got to them) with the following hints (you’ll notice that during the course of my experiments I added my own query block names to the initial query blocks – so the generated query block names have changed):



explain plan for
select
        /*+
                qb_name(main)
                opt_estimate(@sel$f2bf1101, index_scan, t2a@subq_a, t2ai, scale_rows=5)
                opt_estimate(@sel$f2bf1101, table,      t2a@subq_a,       scale_rows=5)
                opt_estimate(@sel$f4e7a233, index_scan, t2b@subq_b, t2bi, scale_rows=7)
                opt_estimate(@sel$f4e7a233, table,      t2b@subq_b,       scale_rows=7)
        */
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select /*+ qb_name(subq_a) */ 'a' flag, t2a.* from t2a
         union all
         select /*+ qb_name(subq_b) */ 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
;

select * from table(dbms_xplan.display(null,null,'outline alias'));


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     5 |    75 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     5 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     7 |   105 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     7 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Excellent – we get the cardinalities we want to see for the tables – except the view operator doesn’t hold the sum of the table cardinalities, and the join doesn’t multiply up the estimates either. I couldn’t find a way of getting the view to show 12 rows (not even with a guessed – but presumably unimplemented – opt_estimate(view …) hint!), however during the course of my experiments I tried the hint: “opt_estimate(@main, table, t2@main, scale_rows=15)”. This didn’t have any visible effect in the plan but while searching through the 10053 trace file I found the following lines:

Table Stats::
  Table: from$_subquery$_002  Alias: T2  (NOT ANALYZED)
  #Rows: 20000  SSZ: 0  LGR: 0  #Blks:  37  AvgRowLen:  15.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 9

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 20000.000000 to 300000.000000 due to opt_estimate hint

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 12.000000 to 180.000000 due to opt_estimate hint

So at some point in the code path the optimizer is aware that 5 + 7 = 12, and that 12 * 15 = 180. But this doesn’t show up in the final execution plan. You might notice, by the way, that the scale_rows=15 has been applied NOT ONLY to the place where I was aiming – it’s also been applied to scale up the 20,000 rows that are estimated to be in the union all to 300,000 as the estimate for a tablescan of the two tables.

Possibly if I spent more time working through the 10053 trace file (which, as I’ve said before, I try to avoid doing) I might have found exactly which code path Oracle followed to get to the plan it produced and managed to tweak some hints to get the numbers I wanted to see. Possibly the optimizer was already following the code path that actually produced the numbers I wanted, then “forgot” to use them. One day, perhaps, I’ll tale another look at the problem – but since I wasn’t trying to solve a problem for a client (and given that there was an alternative workaround) I closed the 10053 trace file and put the model aside for a rainy day.

Footnote

One thought did cross my mind as a way of finding out if there was a real solution – and I offer this for anyone who wants to play: create a second data set that genuinely produces the 5 and 7 I want to see (and, check that the view reports the sum of the two components); then run the original query against the original data so that you’ve got the execution plan in memory, overwrite the original data with the new data set (without changing the statistics on the orginal). Then use the SQL Tuning Advisor to see if it produces a SQL profile for the captured SQL_ID that reproduces the correct plan for the second data set and check what opt_estimate() hints it uses.  (Warning – this may turn into a frustrating waste of time.)

Update Oct 2019

I’ve been saying for years that I don’t like the trick of pulling the Outline Information from an execution plan in memory and storing it in the database as an SQL Profile because that’s effectively storing an SQL Plan Baseline as an SQL Profile and there might be subtle and (potentially) misleading side effects of abusing the two mechanisms. Behind the argument I’ve also made the observation that while both mechamisms store hints, the hints for an SQL Profile are about statistics and the hints for an SQL Plan Baseline are about transformations, joins, and other mechanis.

However .;.

I’ve now down the test I described in the foot note above – created a table with data in it that made Oracle choose full tablescans for the t2a and t2b tables, then changed the data (without changing the object statistic) and run the SQL Tuning tool to see if the optimizer would suggest the plan I wanted and offer a profile to produce it.

I was successful – Oracle offered the profile, and when I looked at it (before accepting it) it looked like this:


         1 OPT_ESTIMATE(@"SEL$1", TABLE, "T2"@"SEL$1", SCALE_ROWS=200)
         1 OPT_ESTIMATE(@"SEL$1", JOIN, ("T2"@"SEL$1", "T1"@"SEL$1"), SCALE_ROWS=15)
         1 OPTIMIZER_FEATURES_ENABLE(default)
         1 IGNORE_OPTIM_EMBEDDED_HINTS

But when I accepted it and looked at it again it looked like this:


        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
        IGNORE_OPTIM_EMBEDDED_HINTS
        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
        INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
        USE_NL(@"SEL$1" "T2"@"SEL$1")
        LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
        NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
        FULL(@"SEL$1" "T1"@"SEL$1")
        OUTLINE(@"SET$1")
        OUTLINE(@"SEL$3")
        OUTLINE(@"SEL$2")
        OUTLINE_LEAF(@"SEL$1")
        PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
        OUTLINE_LEAF(@"SET$5715CE2E")
        OUTLINE_LEAF(@"SEL$B01C6807")
        OUTLINE_LEAF(@"SEL$639F1A6F")
        ALL_ROWS
        DB_VERSION('19.1.0')
        OPTIMIZER_FEATURES_ENABLE('19.1.0')
        INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))

In other words, Oracle has recorded something that looks like an SQL Plan Baseline and called it an SQL Profile.

July 9, 2019

Assumptions

Filed under: Oracle,Philosophy — Jonathan Lewis @ 11:47 am BST Jul 9,2019

Over the last few days I’ve been tweeting little extracts from Practical Oracle 8i, and one of the tweets contained the following quote:

This lead to the question:

Good question! The whole undo/redo infrastructure in Oracle is probably the most astounding technological achievement in the entire code base – so would you test it to see that it was working properly and if you could break it ? Probably not – although if you were about to recreate your undo tablespace with a 32KB block size you might test to see if the change would produce any surprise side-effects); or you might wonder if anything funny could happen to the redo generation  if you created all your varchar2() columns as 4000 bytes “just in case”. or possibly you’d check for undo or redo anomalies if you were told to create a table with more than 255 columns.

I don’t know quite what I was trying to imply (20 years ago) when I wrote the quoted sentence. Possibly I was trying to avoid saying “new features”, because it’s not just the new features you need to test. I was probably trying to suggest the flavour of “exotic”, “high-tech”, “exciting” – which basically comes down to the things where you think you might be (h/t Martin Widlake) a “thought leader” or ground-breaker.  If very few people have used some feature of Oracle you might be the first person to use that feature in a specific fashion – so if there’s a surprise (or bug) waiting to be found you’ll be the first to find it and you don’t want to find it in production.

Anything in Oracle might have an odd boundary condition, and life (or the project life-cycle) is too short to test everything – but almost any time you feel you may be going beyond “common usage”, it’s worth thinking about what might go wrong.

As a closing item of entertainment – here’s a little demonstration (last run on 19.2):


rem
rem     Script:         assumptions.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem     Purpose:
rem
rem     Last tested
rem             19.2.0.0        (LiveSQL)
rem             18.3.0.0
rem
rem     Notes:
rem     Add the predicate "where rownum <= 1600" to test on LiveSSQL
rem

create table t1
as
select * from all_objects
/

create table pt1(
        OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID constraint pt1_pk primary key using index local,
        DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
        TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE,
        ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,
        SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID
)
partition by hash (object_id) (
        partition p1,
        partition p2,
        partition p3,
        partition p4
)
as
select * from all_objects
/

create table iot1 (
        OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID constraint iot1_pk primary key,
        DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
        TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE,
        ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,
        SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID
)
organization index
as
select * from all_objects
/

create table ptiot1 (
        OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID constraint ptiot1_pk primary key,
        DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
        TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE,
        ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,
        SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID
)
organization index
partition by hash (object_id) (
        partition p1,
        partition p2,
        partition p3,
        partition p4
)
as
select * from all_objects
/

alter table t1 move online;

alter table pt1 move partition p1 online;

alter table iot1 move online;

alter table ptiot1 move partition p1 online;

It's a simple test – copying data from view all_objects I've created:

  • A simple heap table
  • A hash partitioned heap table – with locally partitioned primary key index
  • A simple index-organized table
  • A hash partitioned indesc organized table

Then I’ve issued an online move command for each table. I often lose track of which enhancements to features appeared in which version of Oracle, but I think the following is correct:

  • alter table t1 move online – the online option became possible in 12.2
  • alter table pt1 move partition online – the online option because possible in 12.1
  • alter table iot1 move online – the online option became possible in Oracle 8i (and gets a mention in Practical Oracle 8i)
  • alter table ptiot1 move partition online – any guesses ?)

In the absence of my bait-and-switch lead-up to the final question I think you could be forgiven for assuming that you would be able to move a partition of a partitioned index-organized table online – but even in 19.2 you’ll end up with error message: ORA-14808: table does not support ONLINE MOVE PARTITION.

In a vacuum it’s okay to make the mistake – on the other hand if someone suggested changing a partitioned table in your production system into a partitioned IOT it ought to be one of the first things you’d check (on a small model). Sadly I have been in design meetings where weeks of effort have been spent on producing a detailed design that can’t possibly work because no-one checked to see if some critical detail (like online move of IOT partitions) was actually possible – and that’s the background for the statement:

“If you’re going to depend on a technological feature of Oracle, you need to make sure that you have tried to break it, in half a dozen ways, before you use it in production.”

There are many technological features of Oracle that you can assume (safely) have been tested by many other people – when you get to the edge of the know universe your watchword should be: Here be Dragons.

 

« Previous PageNext Page »

Powered by WordPress.com.