Oracle Scratchpad

July 12, 2016

Union All MV

Filed under: Infrastructure,Materialized view,Oracle — Jonathan Lewis @ 10:10 am BST Jul 12,2016

In an article I wrote last week about Bloom filters disappearing as you changed a SELECT to a (conventional) INSERT/SELECT I suggested using the subquery_pruning() hint to make the optimizer fall back to an older strategy of partition pruning. My example showed this working with a range partitioned table but one of the readers reported a problem when trying to apply the strategy to a composite range/hash partitioned table and followed this up with an execution plan of a select statement with a Bloom filter where the subquery_pruning() hint didn’t introduced subquery pruning when the select was used for an insert.

A couple of standard ways to work around this probelm are to embed the select statement in a pipeline function so that we can “insert into table select from table(pipeline_function)”, or to write a pl/sql block that opens a cursor to do a select with bulk collect and loops through an array insert. The overhead in both cases is likely to be relatively small (especially when compared with the overhead of failing to filter). In this case, however, the reader suggested that maybe the problem appeared because the driving table (i.e. the one that would have been query to derive the pruning values) was actually an inline view with a union all.

After modifying my working model to try a couple of different tests I was inclined to agree. Since the two tables in the view looked as if they were likely to be relatively tiny and static I suggested that it would be safe to create a materialized view defined to “refresh on commit” and then use the materialized view explicitly in the query. This, finally, brings me to the point of today’s article – how do you create such a materialized view ?

I’m going to start by creating a couple of small base tables from a familiar object:


create table tt as select * from all_objects where object_type = 'TABLE';
create table tv as select * from all_objects where object_type = 'VIEW';

alter table tt add constraint tt_pk primary key (object_id);
alter table tv add constraint tv_pk primary key (object_id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'TT',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'TV',
                method_opt       => 'for all columns size 1'
        );
end;
/

Assume, now, that I need an inline view that is interested in the things you will recognise from the above as the tables owned by OUTLN (which will apper in tt) and the views owned by SYSTEM (which will appear in tv) – in the 11.2.0.4 system I’m playing on at the moment that’s three rows from each of the two tables). Here’s the SQL I’d put into the inline view:


select
        object_id, object_type, object_name
from    tt
where   owner = 'OUTLN'
union all
select
        object_id, object_type, object_name
from    tv
where   owner = 'SYSTEM'
;

Since this view won’t give me partition pruning I have to replace it with a table and because I want to ensure that the table is always up to date I have to generate it as the container for a materialized view with refresh on commit. First I need some materialized view logs so that I can do a fast refresh:


create materialized view log on tt
with
        rowid, primary key
        (object_type, object_name, owner)
including new values
;

create materialized view log on tv
with
        rowid, primary key
        (object_type, object_name, owner)
including new values
;

I’ve included the primary key in the definition because I happen to want the object_id column in the log – but I could just have included it as a column in the filter list. I’ve included the rowid in the definition because Oracle needs the rowid if it’s going to be able to do a fast refresh. I can now create a materialized view:


create materialized view mv_t
        build immediate
        refresh fast on commit
as
select
        'T' mv_marker,
        rowid rid,
        object_id, object_type, object_name
from    tt
where   owner = 'OUTLN'
union all
select
        'V' mv_marker,
        rowid rid,
        object_id, object_type, object_name
from    tv
where   owner = 'SYSTEM'
;

I’ve taken the option to “build immediate” and specified – most importantly for my needs – “refresh on commit”. You’ll notice I haven’t chosen to “enable query rewrite”; for the purposes of this demo I don’t need that particular feature.

There are two key features to the materialized view that are a little special – first I’ve included the rowid of each source table as a named column in the materialized view; as I mentioned above Oracle will not allow the view to be fast refreshable without the rowid. The second feature is that I’ve introduced a literal value into the view which I’ve named mv_marker; this makes it easy to see which table a row comes from when you query the materialized view … and Oracle needs to see this.

That’s the job done. Just to demonstrate that my materialized view is working as required here’s a little more SQL (following by the output):


select * from mv_t;

delete from tt where object_name = 'OL$';
update tv set object_name = 'PRODUCT_PRIVILEGES' where object_name = 'PRODUCT_PRIVS';

commit;

select * from mv_t;

=======================================

M RID                 OBJECT_ID OBJECT_TYPE         OBJECT_NAME
- ------------------ ---------- ------------------- --------------------------------
T AAA6tXAAFAAAAEBAAI        471 TABLE               OL$
T AAA6tXAAFAAAAEBAAJ        474 TABLE               OL$HINTS
T AAA6tXAAFAAAAEBAAK        478 TABLE               OL$NODES
V AAA6tWAAFAAAACgABI       8260 VIEW                SCHEDULER_PROGRAM_ARGS
V AAA6tWAAFAAAACgABJ       8261 VIEW                SCHEDULER_JOB_ARGS
V AAA6tWAAFAAAACuAA7      14233 VIEW                PRODUCT_PRIVS

6 rows selected.

2 rows deleted.


1 row updated.


Commit complete.


M RID                 OBJECT_ID OBJECT_TYPE         OBJECT_NAME
- ------------------ ---------- ------------------- --------------------------------
T AAA6tXAAFAAAAEBAAJ        474 TABLE               OL$HINTS
T AAA6tXAAFAAAAEBAAK        478 TABLE               OL$NODES
V AAA6tWAAFAAAACgABI       8260 VIEW                SCHEDULER_PROGRAM_ARGS
V AAA6tWAAFAAAACgABJ       8261 VIEW                SCHEDULER_JOB_ARGS
V AAA6tWAAFAAAACuAA7      14233 VIEW                PRODUCT_PRIVILEGES

5 rows selected.

If you’re wondering why you see “2 rows deleted” but a reduction by just one row in the final output, remember that we’re deleting from table tt but the materialized view holds information about just the subset of tables owned by OUTLN – I happen to have a row in tt that says SYSTEM also owns a table called OL$.

Assistance

If you have trouble working out why your attempts to create a particular materialized view aren’t working the dbms_mview package has a procedure called explain_mview that may give you enough ideas to work out what you’re doing wrong. For example, here’s how I could find out that I needed a literal column to tag the two parts of my union all view:


@$ORACLE_HOME/rdbms/admin/utlxmv.sql

begin
        dbms_mview.explain_mview (
                q'{
                create materialized view mv_t
                        build immediate
                        refresh fast
                        enable query rewrite
                as
                select  -- 'T' mv_marker,
                        rowid rid,
                        object_id, object_type, object_name from tt
                union all
                select  -- 'V' mv_marker,
                        rowid rid,
                        object_id, object_type, object_name from tv
                }'
        );
end;
/

column cap_class noprint
column related_text format a7
column short_msg format a72
break on cap_class skip 1

select
        substr(capability_name,1,3) cap_class,
        capability_name, possible, related_text, substr(msgtxt,1,70) short_msg
from
        mv_capabilities_table
where
        mvname = 'MV_T'
order by
        substr(capability_name,1,3), related_num, seq
;

The first line calls a supplied script to create a table called mv_capabilities_table in the current schema. The call to dbms_mview.explain_mview passes the text of a “create materialized view” statement to the procedure (there are a couple of variations possible) then, after a couple of SQL*Plus formatting commands I’ve queried the table to see Oracle’s analysis for the statement. (You can tag each call to this procedure using a second parameter that I haven’t bothered to use.)

Here’s the output for the failed attempt above, which has commented out the literals that tag the two parts of the UNION ALL:

CAPABILITY_NAME                POS RELATED SHORT_MSG
------------------------------ --- ------- ------------------------------------------------------------------------
PCT_TABLE                      N   TT      relation is not a partitioned table
PCT_TABLE_REWRITE              N   TT      relation is not a partitioned table
PCT_TABLE                      N   TV      relation is not a partitioned table
PCT_TABLE_REWRITE              N   TV      relation is not a partitioned table
PCT                            N

REFRESH_COMPLETE               Y
REFRESH_FAST                   N
REFRESH_FAST_AFTER_INSERT      N           the materialized view does not have a UNION ALL marker column
REFRESH_FAST_AFTER_INSERT      N           set operator in a context not supported for fast refresh
REFRESH_FAST_AFTER_ONETAB_DML  N           see the reason why REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ANY_DML     N           see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_PCT               N           PCT FAST REFRESH is not possible if query has set operand query blocks

REWRITE                        Y
REWRITE_FULL_TEXT_MATCH        Y
REWRITE_PARTIAL_TEXT_MATCH     N           set operator encountered in mv
REWRITE_GENERAL                N           set operator encountered in mv
REWRITE_PCT                    N           general rewrite is not possible or PCT is not possible on any of the d


17 rows selected.

The query manages to split the output into three sections (but that depends on a side-effect in a way that I would normally call bad design): elements relating to “Partition Change Tracking”, elements relating to “Materialized View Refresh” and elements relating to “Query Rewrite”. You’ll notice that the rewrite section tells me that (even though I haven’t chosen to enable it) my view could be enabled to do query rewrite.

Critically, though, this version of the materialized view can’t be fast refreshed, and we see the key reason in the first “Refresh fast after insert” line: “the materialized view does not have a UNION ALL marker column”. That’s how I know I have to include a literal column that has a different value in each of the two parts of the UNION ALL.

June 24, 2016

Never …

Filed under: Infrastructure,Oracle,Philosophy — Jonathan Lewis @ 1:15 pm BST Jun 24,2016

From time to time a question comes up on OTN that results in someone responding with the mantra: “Never do in PL/SQL that which can be done in plain  SQL”. It’s a theme I’ve mentioned a couple of times before on this blog, most recently with regard to Bryn Llewellyn’s presentation on transforming one table into another and Stew Ashton’s use of Analytic functions to solve a problem that I got stuck with.

Here’s a different question that challenges that mantra. What’s the obvious reason why someone might decide to produce the following code rather than writing a simple “insert into t1 select * from t2;”:


declare

        cursor c1 is
        select * from t2
        ;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

begin

        open c1;
        loop
                fetch c1
                bulk collect into m_tab limit 100;

                begin
                        forall i in 1..m_tab.count
                                insert into t1 values m_tab(i);
                exception
                        when others
                                then begin
                                        --  proper exception handling should go here
                                        dbms_output.put_line(m_tab(1).id);
                                        dbms_output.put_line(sqlerrm);
                                end;
                end;

                exit when c1%notfound;

        end loop;
        close c1;
end;
/

There is a very good argument for this approach.

Follow-up (Saturday 25th)

As Andras Gabor pointed out in one of the comments, there are documented scenarios where the execution plan for a simple select statement is not legal for the select part of an “insert into .. select …” statement. Specifically, if you have a distributed query the most efficient execution plan may require the remote site to be the driving site, but the plan for a CTAS or insert/select is required to use the local site as the driving site.

There are workarounds – if you’re allowed to use them – such as creating a view at the remote site and selecting from the view, or you could create a pipelined function locally and select from the pipelined function (but that’s going to be writing PL/SQL anyway, and you’d have to create one or two object types in the database to implement it).s

Another example of plan limitations, that I had not seen before (but have now found documented as “not a bug in MoS note 20112932”), showed up in a comment from Louis: a select statement may run efficiently because the plan uses a Bloom filter, but the filter disappears when the statement is used in insert/select.

These limitations, however, were not the point I had in mind. The “obvious” reason for taking the pl/sql approach is error handling. What happens if one of the rows in your insert statement raises an Oracle exception ? The entire statement has to rollback. If you adopt the PL/SQL array processing approach then you can trap each error as it occurs and decide what to do about it – and there’s an important detail behind that statement that is really important: the PL/SQL can operate at virtually the same speed as the simple SQL statement once you’ve set the arraysize to a value which allows each insert to populate a couple of blocks.

Let me emphasise the critical point of the last sentence:  array inserts in PL/SQL operate at (virtually) the speed of the standard SQL insert / select.

As it stands I don’t think the exception handler in my code above could detect which row in the batch had caused the error – I’ve just printed the ID from the first row in the batch as a little debug detail that’s only useful to me because of my knowledge of the data. Realistically the PL/SQL block to handle the inserts might look more like the following:


-- In program declaration section

        dml_errors      exception;
        pragma exception_init(dml_errors, -24381);

        m_error_pos     number(6,0)     := 0;

-- ------------------------------

                begin
                        forall i in 1..m_tab.count save exceptions
                                insert into t1 values m_tab(i);
                exception
                        when dml_errors then begin

                                for i in 1..sql%bulk_exceptions.count loop

                                        dbms_output.put_line(
                                                'Array element: ' ||
                                                        sql%bulk_exceptions(i).error_index || ' ' ||
                                                        sqlerrm(-sql%bulk_exceptions(i).error_code)
                                        );

                                        m_error_pos := sql%bulk_exceptions(i).error_index;
                                        dbms_output.put_line(
                                                'Content: ' || m_tab(m_error_pos).id || ' ' || m_tab(m_error_pos).n1
                                        );

                                end loop;
                        end;

                        when others then raise;
                end;


You’ll notice that I’ve added the SAVE EXCEPTIONS clause to the FORALL statement. This allows Oracle to trap any errors that occur in the array processing step and record details of the guilty array element as it goes along, storing those details in an array calls SQL%BULK_EXCEPTIONS. My exception handler then handles the array processing exception by walking through that array.

I’ve also introduced an m_error_pos variable (which I could have declared inside the specific exception handler) to remove a little of the clutter from the line that shows I can identify exactly which row in the source data caused the problem. With a minimum of wasted resources this code now inserts all the valid rows and reports the invalid rows (and, if necessary, could take appropriate action on each invalid row as it appears).

If you’ve got a data loading requirement where almost all the data is expected to be correct but errors occasionally happen, this type of coding strategy is likely to be the most efficient thing you could do to get your data into the database. It may be slightly slower when there are no errors, but that’s a good insurance premium when compared with the crash and complete rollback that occurs if you take the simple approach – and there are bound to be cases where a pre-emptive check of all the data (that would, probably, make the insert safe) would add far more overhead than the little bit of PL/SQL processing shown here.

Results

It’s obviously a little difficult to produce any time-based rates that demonstrate the similarity in performance of the SQL and PL/SQL approaches – the major time component in a little demo I built was about the I/O rather than the the CPU (which, in itself, rather validates the claim anyway). But if you want to do some testing here’s my data model with some results in the following section:


rem
rem     Script: plsql_loop_insert.sql
rem     Author: Jonathan Lewis
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        2 * trunc(dbms_random.value(1e10,1e12))         n1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
create table t2
nologging
noparallel
as
select
        /*+ no_parallel(t1) */
        id + 1e6        id,
        n1 - 1          n1,
        rpad('x',100,'x') padding
from t1 
;

-- update t2 set n1 = n1 + 1 where id = 2e6;
-- update t2 set n1 = n1 + 1 where id = 2e6 - 10;
-- update t2 set n1 = n1 + 1 where id = 2e6 - 20;
-- update t2 set n1 = n1 + 1 where id = 1750200;
-- update t2 set n1 = n1 + 1 where id = 1500003;
-- update t2 set n1 = n1 + 1 where id = 1500001;
commit;

alter system checkpoint;
alter system switch logfile;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

create unique index t1_i1 on t1(n1) nologging;
create unique index t1_pk on t1(id) nologging;
alter table t1 add constraint t1_pk primary key(id);


I’ve generated 1 million rows with an id column and a random integer – picking the range of the random numbers to give me a very good chance (that worked) of getting unique set of values. I’ve doubled the random values I use for t1 so that I can substract 1 and still guarantee uniqueness when I generate the t2 values (I’ve also added 1 million to the id value for t2 for the same uniqueness reasons).

The optional update to add 1 to a scattering of rows in t2 ensures that those values go back to their original t1 values so that they can cause “duplicate key” errors. The SQL insert was a simple insert into t1 select * from t2 (ensuring that parallel query didn’t come into play), and the PL/SQL detail I used was as follows:

declare

        cursor c1 is
        select /*+ no_parallel(t2) */ * from t2
        ;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

        dml_errors      exception;
        pragma exception_init(dml_errors, -24381);

        m_error_pos     number(6,0)     := 0;

begin

        open c1;
        loop
                fetch c1
                bulk collect
                into m_tab limit 100;

                begin
                        forall i in 1..m_tab.count save exceptions
                                insert into t1 values m_tab(i);

                exception
                        when dml_errors then begin

                                for i in 1..sql%bulk_exceptions.count loop

                                        dbms_output.put_line(
                                                'Array element: ' ||
                                                        sql%bulk_exceptions(i).error_index || ' ' ||
                                                        sqlerrm(-sql%bulk_exceptions(i).error_code)
                                        );

                                        m_error_pos := sql%bulk_exceptions(i).error_index;
                                        dbms_output.put_line(
                                                'Content: ' || m_tab(m_error_pos).id || ' ' || m_tab(m_error_pos).n1
                                        );

                                end loop;
                        end;

                        when others then raise;

                end;

                exit when c1%notfound;  -- when fetch < limit

        end loop;
        close c1;
end;
/

The PL/SQL output with one bad row (2e6 – 20) looked like this:

Array element: 80 ORA-00001: unique constraint (.) violated
Content: 1999980 562332925640

Here are some critical session statistics for different tests in 11g:


No bad data, insert select
--------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                             944
CPU used by this session                               944
DB time                                              1,712
redo entries                                     1,160,421
redo size                                      476,759,324
undo change vector size                        135,184,996

No bad data, PL/SQL loop
------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                             990
CPU used by this session                               990
DB time                                              1,660
redo entries                                     1,168,022
redo size                                      478,337,320
undo change vector size                        135,709,056


Duplicate Key (2e6-20), insert select (with huge rollback)
----------------------------------------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                           1,441
CPU used by this session                             1,440
DB time                                              2,427
redo entries                                     2,227,412
redo size                                      638,505,684
undo change vector size                        134,958,012
rollback changes - undo records applied          1,049,559

Duplicate Key (2e6-20), PL/SQL loop - bad row reported
------------------------------------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                             936
CPU used by this session                               936
DB time                                              1,570
redo entries                                     1,168,345
redo size                                      478,359,528
undo change vector size                        135,502,488
rollback changes - undo records applied                 74

Most of the difference between CPU time and DB time in all the tests was file I/O time (in my case largerly checkpoint wait time, I had small log files, but in larger systems it’s quite common to see a lot of time spent on db file sequential reads as index blocks are read for update). You can see that there’s some “unexpected” variation in CPU time – I wasn’t expecting the PL/SQL loop that failed after nearly 1M inserts to use less CPU than anything else – but the CPU numbers fluctuated a few hundredths of a second across tests, this just happened to be particularly noticeable with the first one I did – so to some extent this was probably affected by background activity relating to space management, job queue processing and all the other virtual machines on the system.

Critically I think it’s fair to say that the differences in CPU timing are not hugely significant across a reasonably sized data set, and most importantly the redo and undo hardly vary at all between the successful SQL and both PL/SQL tests. The bulk processing PL/SQL approach doesn’t add a dramatic overhead – but it clearly does bypass the threat of a massive rollback.

Footnote:

You might want to argue the case for using basic SQL with the log errors clause. The code method is simple and it gives you a table of rows which have caused exceptions as the insert executed – and that may be sufficient for your purposes; but there’s a problem until you upgrade to 12c.

Here’s how I had to modify my test case to demonistrate the method:


begin
        dbms_errlog.create_error_log('t1');
end;
/

insert into t1 select * from t2
log errors
reject limit unlimited
;

The procedure call creates a table to hold the bad rows, by default it’s name will be err$_t1, and it will be a clone of the t1 table with changes to column types (which might be interseting if you’ve enable 32K columns in 12c — to be tested) and a few extra columns:


SQL> desc err$_t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$                        NUMBER
 ORA_ERR_MESG$                          VARCHAR2(2000)
 ORA_ERR_ROWID$                         ROWID
 ORA_ERR_OPTYP$                         VARCHAR2(2)
 ORA_ERR_TAG$                           VARCHAR2(2000)
 ID                                     VARCHAR2(4000)
 N1                                     VARCHAR2(4000)
 PADDING                                VARCHAR2(4000)

SQL> execute print_table('select * from err$_t1')
ORA_ERR_NUMBER$               : 1
ORA_ERR_MESG$                 : ORA-00001: unique constraint (TEST_USER.T1_I1) violated

ORA_ERR_ROWID$                :
ORA_ERR_OPTYP$                : I
ORA_ERR_TAG$                  :
ID                            : 1999980
N1                            : 562332925640
PADDING                       : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

SO what’s the problem with logging errors ? Here are the sets of session stats corresponding to the ones that I reported above for the SQL and PL/SQL options. The first set comes from running this test on 11.2.0.4, the second from 12.1.0.2:


11g results
===========
Name                                                 Value
----                                                 -----
CPU used when call started                           1,534
CPU used by this session                             1,534
DB time                                              2,816
redo entries                                     3,113,105
redo size                                      902,311,860
undo change vector size                        269,307,108

12c results
===========
Name                                                 Value
----                                                 -----
CPU used when call started                             801
CPU used by this session                               801
DB time                                              3,061  -- very long checkpoint waits !!
redo entries                                     1,143,342
redo size                                      492,615,336
undo change vector size                        135,087,044


Ihe 12c stats are very sinilar to the stats from the perfect SQL run and the two PL/SQL runs – but if you look at the 11g stats you’ll see that they’re completely different from all the other stats. The number of redo entries (if nothing else) tells you that Oracle has dropped back from array processing to single row processing in order to be able to handle the error logging (1 million rows, one entry for each row, it’s PK index entry, and the unique key index entry.)

Until 12c error logging is just row by row processing.

Footnote:

As far as I can tell, I first pointed out this “single row processing” aspect of the log errors option some time around December 2005.

Late Entry:

While looking for a posting about efficient updates  I came across another of my posting that compares SQL with PL/SQL for updates – it’s worth a read.

 

May 23, 2016

Virtual Partitions

Filed under: 12c,CBO,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 1:16 pm BST May 23,2016

Here’s a story of (my) failure prompted by a recent OTN posting.

The OP wants to use composite partitioning based on two different date columns – the table should be partitioned by range on the first date and subpartitioned by month on the second date. Here’s the (slightly modified) table creation script he supplied:


rem
rem     Script: virtual_partition.sql
rem     Dated:  May 2016
rem

CREATE TABLE M_DTX
(
        R_ID    NUMBER(3),
        R_AMT   NUMBER(5),
        DATE1   DATE,
        DATE2   DATE,
        VC GENERATED ALWAYS AS (EXTRACT(MONTH FROM DATE2))
)
PARTITION BY RANGE (DATE1) interval (numtoyminterval(1,'MONTH'))
SUBPARTITION BY LIST (VC)
        SUBPARTITION TEMPLATE (
                SUBPARTITION M1 VALUES (1),
                SUBPARTITION M2 VALUES (2),
                SUBPARTITION M3 VALUES (3),
                SUBPARTITION M4 VALUES (4),
                SUBPARTITION M5 VALUES (5),
                SUBPARTITION M6 VALUES (6),
                SUBPARTITION M7 VALUES (7),
                SUBPARTITION M8 VALUES (8),
                SUBPARTITION M9 VALUES (9),
                SUBPARTITION M10 VALUES (10),
                SUBPARTITION M11 VALUES (11),
                SUBPARTITION M12 VALUES (12)
        )
        (
        PARTITION M_DTX_2015060100 VALUES LESS THAN (TO_DATE('2015-06-01 00:00:01', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
        )
;

There’s nothing particularly exciting about this – until you get to the query requirement – the user wants to query on date1 and date2, and doesn’t know about the virtual month column, e.g. (and, I know that there should be a to_date() or ANSI equivalent here):

SELECT * FROM m_dtx WHERE date1 = trunc(sysdate) AND date2 = '01-Jun-2016';

Now, as a general rule, you don’t expect partition elimination to occur unless the partitioning column appears with a predicate that make elimination possible, so your first response to this query is that it could eliminate on date1, but can’t possibly eliminiate on vc because vc isn’t in the where clause. However it’s possible that the partitioning code might be coded to recognise that the subpartition is on a virtual column that is derived from date2, so perhaps it could generate a new predicate before optimising, for example:

date2 = '01-Jun-2016'  => vc = 6

Unfortunately, your first response is correct – the optimizer doesn’t get this clever, and doesn’t do the sub-partition elimination. Here’s the execution plan from 12.1.0.2 for the sample query, followed by the execution plan when I explicitly add the predicate vc = 6.


SQL_ID  8vk1a05uv16mb, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy')

Plan hash value: 3104206240

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST ALL   |       |     1 |    57 |    15   (7)| 00:00:01 |     1 |    12 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))



SQL_ID  33q012bdhjrpn, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy') and vc
= 6

Plan hash value: 938710559

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |     6 |     6 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))


Note how the predicate vc = 6  doesn’t show up in the predicate section in either case, but the execution plan shows PARTITION LIST ALL at operation 2 when we omit the predicate and PARTITION LIST SINGE when we include it (with suitable values also appearing for Pstart and Pstop). (The cost, by the way, is the cost of scanning a whole (range)partition whether or not the optimizer expects to restrict that scan to just one sub-partition.)

So the optimizer isn’t quite clever enough (yet). BUT … the optimizer can be very clever with constraints, combining constraints with predicates and applying transitive closure to produce new predicates – so maybe we could get the optimizer to do this if we helped it a little bit. Given the table definition supplied I’m going to assume that the date2 column is supposed to be non-null, so let’s add some truthful constraints/declarations to the table definition:


alter table m_dtx modify date2 not null;
alter table m_dtx modify vc  not null;
alter table m_dtx add constraint md_ck_vc check (vc = extract(month from date2));

Alas, this didn’t make any difference to the execution plan. But it did do something surprising to my attempts to load data into the table:


insert into m_dtx (r_id, r_amt, date1, date2)
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum, 1000),
        rownum,
        trunc(sysdate,'yyyy') + dbms_random.value(0,365),
        trunc(sysdate,'yyyy') + dbms_random.value(0,365)
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-01400: cannot insert NULL into (???)

So the array insert with the virtual column doesn’t like the NOT NULL constraint on the virtual column because vc is, presumably, still null when the constraint is checked (though there’s no problem with single row inserts with the values() clause – I wonder what happens with the PL/SQL “FORALL” clause) – so let’s remove the not null constraint on vc and see what happens.


insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-02290: check constraint (TEST_USER.MD_CK_VC) violated

Unsurprisingly, given the fact that Oracle didn’t like the not null constraint, the critical check constraint also fails. This, by the way, is odd because a check constraint should accept a row when the constraint doesn’t evaluate to FALSE, so (a) vc can’t have been evaluated at this point or the constraint would evaluate to TRUE – which is not FALSE, and (b) vc at this point can no longer be null or the constraint would evaluate to NULL – which is not FALSE: so what “value” has vc got that makes the constraint check return FALSE ?

Bottom line:

I can see some scope for an optimizer enhancement that tries to find eliminating predicates from virtual columns; and I think there’s a need for ensuring that we can safely add constraints to virtual columns – after all we might want to create an index on a virtual column and sometimes we need a NOT NULL declaration to ensure that an index-only execution path can be found. Unfortunately I have to end this blog without finding an immediate solution for the OP.

Despite this failure, though, there are cases (as I showed a couple of years ago) where the optimizer in 12c can get clever enough to recognize the connection between a queried date column and the virtual partitioning column based on that date column.

May 9, 2016

RI Locks

Filed under: deadlocks,Indexing,IOT,Locks,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 12:24 pm BST May 9,2016

RI = Referential Integrity: also known informally as parent/child integrity, and primary (or unique) key/foreign key checking.

I’m on a bit of a roll with things that I must have explained dozens or even hundreds of times in different environments without ever formally explaining them on my blog. Here’s a blog item I could have done with to response to  a question that came up on the OTN database forum over the weekend.

What happens in the following scenario:


-- session 1

create table parent (
        id        number(8,0),
        constraint par_pk primary key(id)
);

create table child  (
        id_p      number(8,0) not null references parent,
        id_c      number(8,0) not null,
        constraint child_pk primary key(id_p, id_c)
)
;

insert into parent values(1);

-- session 2
insert into child values(1,1);

Since the parent row corresponding to the child row doesn’t (yet) seem to exist as far as session 2 is concerned you might expect session 2 to respond immediately with an error message like:

ERROR at line 1:
ORA-02291: integrity constraint (TEST_USER.SYS_C0017926) violated - parent key not found

In fact, although the end-user is not allowed to see the uncommitted parent row, the user’s process can see the uncommitted row and will wait until session 1 commits or rolls back – so if you examine v$lock for the current locks for the two sessions you’d see something like this:

  1  select  sid, type, id1, id2, lmode, request, ctime, block
  2  from    V$lock
  3  where   sid in (select sid from V$session where username = 'TEST_USER')
  4  and     type != 'AE'
  5  order by
  6*         sid, type desc
  7  /

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
         3 TX     327709      12584          6          0        283          1
           TM     143734          0          2          0        283          0
           TM     143732          0          3          0        283          0

       250 TX     589829      12877          6          0        240          0
           TX     327709      12584          0          4        240          0
           TM     143734          0          3          0        240          0
           TM     143732          0          3          0        240          0


7 rows selected.

In the above, SID 250 is session 2: it’s holding a transaction lock (TX) in mode 6 because it has acquired an undo segment and has generated some undo, it’s also waiting for a transaction lock in mode 4 (share) and – checking id1 and id2 – we can see that the transaction table entry it’s waiting for is held by session 3 in mode 6 (and we also note that the lock held by session 3 is marked as a blocker).

If session 3 commits (thus releasing the transaction lock) session 250 will continue processing the insert; if session 3 rolls back session 250 will raise error ORA-02291 and roll back its insert statement. (Note: if this were a multi-statement transaction it would only be the insert into child that would be rolled back; that’s another one of those details that is important but often isn’t stated explicitly, leaving people believing that the entire transaction would be rolled back.)

Updates and deletes can produce the same effects. Imagine that we have just created the two tables, and then run the following:


-- session 1
insert into parent values(1);
commit;
delete from parent where id = 1;

-- session 2
insert into child values(1,1);

Again session 2 will wait for session 1 to commit or roll back. In this case if session 1 commits session 2 will raise Oracle error ORA-02291, if session 1 rolls back session 2 will continue with the insert.

Deadlocks

Whenever you can demonstrate a way of producing a wait chain you can also manage to produce a deadlock. Consider the following (starting, again, from empty tables);


-- (1) session 1
insert into parent values(1);

-- (2) session 2
insert into parent values(2);

-- (3) session 1
insert into child values(2,2);

-- (4)session 2
insert into child values(1,1);

Session 1 will start waiting for session 2 to commit (or rollback) at step 3, then session 2 will start to wait for session 1 at step 4 – with the result that session 1 will recognise the deadlock after about three seconds and rollback its last statement, raising exception ORA-00060 and dumping a trace file. (Note: session 1 will not, as many people think, roll back the entire transaction, it will only roll back the statement that allowed the deadlock to develop). Session 2 will still be waiting for session 1 to commit or rollback its insert into parent. Contrary to the popular claim, Oracle will not “resolve” the deadlock, it will simply break the deadlock leaving one session waiting for the other session to respond appropriately to the deadlock error.

For reference, here’s the deadlock graph (from a 12c trace file) produced by session 1 (SID = 3) for this demo:


Deadlock graph:
                                          ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                             process session holds waits  process session holds waits
TX-00010017-000026C7-00000000-00000000          6       3     X             33     250           S
TX-000A000D-000026F8-00000000-00000000         33     250     X              6       3           S

session 3: DID 0001-0006-00000004       session 250: DID 0001-0021-00000041
session 250: DID 0001-0021-00000041     session 3: DID 0001-0006-00000004

Rows waited on:
  Session 3: no row
  Session 250: no row

When you see a deadlock graph with TX waits of type S (share, mode 4) it’s a very good bet that the wait has something to do with indexes – which may mean referential integrity as discussed here, but may mean collisions on primary keys, and may mean something to do with simple collisions on index-organized tables. You’ll notice that the “Rows waited on:” section shows no row – unfortunately in earlier versions of Oracle you may find a spurious row entry here because the wait information from some other (block) wait has been left in the relevant columns in v$session.

May 3, 2016

Debugging

Filed under: CBO,compression,Execution plans,Infrastructure,Oracle,Uncategorized — Jonathan Lewis @ 8:11 am BST May 3,2016

The OTN database forum supplied a little puzzle a few days ago – starting with the old, old, question: “Why is the plan with the higher cost taking less time to run?”

The standard (usually correct) answer to this question is that the optimizer doesn’t know all it needs to know to predict what’s going to happen, and even if it had perfect information about your data the model used isn’t perfect anyway. This was the correct answer in this case, but with a little twist in the tail that made it a little more entertaining. Here’s the query, with the two execution plans and the execution statistics from autotrace:


SELECT  /* INDEX(D XPKCLIENT_ACCOUNT) */ 
        E.ECID,A.acct_nb
FROM    
        client_account d, 
        client         e, 
        account        a
where
        A.acct_nb ='00000000000000722616216'</li>


AND     D.CLNT_ID = E.CLNT_ID
AND     D.ACCT_ID=A.ACCT_ID;

Plan (A) with a full tablescan of client_account – cost 808, runtime 1.38 seconds, buffer gets 17,955


-------------------------------------------------------------------------------------------------
| Id | Operation                      | Name           | Rows  | Bytes  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                |     1 |    59  |   808 (14) | 00:00:10 |
|  1 |  NESTED LOOPS                  |                |     1 |    59  |   808 (14) | 00:00:10 |
|  2 |   NESTED LOOPS                 |                |     1 |    59  |   808 (14) | 00:00:10 |
|* 3 |    HASH JOIN                   |                |     1 |    42  |   806 (14) | 00:00:10 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT        |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT    |     1 |        |     4  (0) | 00:00:01 |
|  6 |     TABLE ACCESS FULL          | CLIENT_ACCOUNT |  9479K|   108M |   763 (10) | 00:00:09 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT      |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT         |     1 |    17  |     2  (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 17955  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Plan (B) with an index fast full scan on a client_account index – cost 1,190, runtime 0.86 seconds, buffer gets 28696


----------------------------------------------------------------------------------------------------
| Id | Operation                      | Name              | Rows  | Bytes  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  1 |  NESTED LOOPS                  |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  2 |   NESTED LOOPS                 |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|* 3 |    HASH JOIN                   |                   |     1 |    42  |  1188  (8) | 00:00:14 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT           |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT       |     1 |        |     4  (0) | 00:00:01 |
|  6 |     INDEX FAST FULL SCAN       | XPKCLIENT_ACCOUNT | 9479K |   108M |  1145  (5) | 00:00:13 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT         |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT            |     1 |    17  |     2  (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 28696  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Note, particularly, that the two plans are the same apart from operation 6 where a full tablescan changes to an index fast full scan, predicting the same number of rows but with an increase of 50% in the cost; the increase in cost is matched by an increase in the reported workload – a 60% increase in the number of consistent reads and no disk reads or recursive SQL in either case. Yet the execution time (on multiple repeated executions) dropped by nearly 40%.

So what’s interesting and informative about the plan ?

The cost of a tablescan or an index fast full scan is easy to calculate; broadly speaking it’s “size of object” / “multiblock read count” * k, where k is some constant relating to the hardware capability. The costs in these plans and the autotrace statistics seem to be telling us that the index is bigger than the table, while the actual run times seem to be telling us that the index has to be smaller than the table.

It’s easy for an index to be bigger than its underlying table, of course; for example, if this table consisted of nothing but two short columns the index could easily be bigger (even after a rebuild) because it would be two short columns plus a rowid. If that were the case here, though, we would expect the time to fast full scan the index to be higher than the time to scan the table.

So two thoughts crossed my mind as I looked at operation 6:

  • Mixing block sizes in a database really messes up the optimizer costing, particularly for tablescans and index fast full scans. Maybe the table had been built in a tablespace using 32KB  blocks while the index had been built in a tablespace using the more common 8KB blocksize – I didn’t want to start working out the arithmetic but that might be just enough to produce the contradiction.
  • Maybe the table was both bigger AND smaller than the index – bigger because it held more data, smaller because it had been compressed. If so then the difference in run-time would be the overhead of decompressing the rows before projecting and comparing the data.

Conveniently the OP has included an extract from the 10053 trace:


Table Stats::
  Table: CLIENT_ACCOUNT  Alias:  D
    #Rows: 9479811  #Blks:  18110  AvgRowLen:  71.00  ChainCnt:  0.00
  Column (#1): CLNT_ID(
    AvgLen: 6 NDV: 1261035 Nulls: 0 Density: 0.000001 Min: 0 Max: 4244786
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 239
  Column (#2): ACCT_ID(
    AvgLen: 6 NDV: 9479811 Nulls: 0 Density: 0.000000 Min: 1 Max: 22028568
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255

Index Stats::
  Index: XPKCLIENT_ACCOUNT  Col#: 1 2
    LVLS: 2  #LB: 28543  #DK: 9479811  LB/K: 1.00  DB/K: 1.00  CLUF: 1809449.00

Note that the index is called xpclient_account – which suggests “primary key” –  and the number of distinct keys in the index (#DK) matches the number of rows in the table(#Rows). The index and table stats seem to be consistent so we’re not looking at a problem of bad statistics.

Now to do some simple (ballpark) arithmetic: for the table can we check if  “rows * average row length / 8K =  blocks”. We can read the numbers directly from the trace file:  9,500,000 * 71 / 8,000 = 84,000.  It’s wrong by a factor of about 4 (so maybe it’s a 32K block, and maybe I could rule out that possibility by including more detail in the arithmetic – like allowing properly for the block header, row overheads, pctfree etc).

For the index – we believe it’s the primary key, so we know the number of rows in the index – it’s the same as the number of distinct keys. As for the length of an index entry, we have the index definition (col#: 1 2) and we happen to have the column stats about those columns so we know their average length. Allowing for the rowid and length bytes we can say that the average index entry is (6 +1) + (6 + 1) + 6 = 20 bytes.  So the number of leaf blocks should be roughy 9,500,000 * 20 / 8,000 = 23,750. That’s close enough given the reported 28,543 and the fact that I haven’t bothered to worry about row overheads, block overheads and pctfree.

The aritmetic provides an obvious guess – which turned out to be correct: the table is compressed, the index isn’t. The optimizer hasn’t allowed for the CPU cost of decompressing the compressed rows, so the time required to decompress 9.5M rows doesn’t appear in the execution plan.

Footnote.

Looking at the column stats, it looks like there are roughly 8 acct_ids for each clnt_id, so it would probably be sensible to compress the primary key index (clnt_id, acct_id) on the first column as this would probably reduce the size of the index by about 20%.

Better still – the client_account table has very short rows – it looks like a typical intersection table with a little extra data carried. Perhaps this is a table that should be an index-organized table with no overflow. It looks like there should also be an index (acct_id, clnt_id) on this table to optimse the path from account to client and this would become a secondary index – interestingly being one of those rare cases where the secondary index on an IOT might actually be a tiny bit smaller than the equivalent index on a heap table because (in recent versions of Oracle) primary key columns that are included in the secondary key are not repeated in the index structure. (It’s a little strange that this index doesn’t seem to exist already – you might have expected it to be there given the OP’s query, and given that it’s an “obvious” requirement as an index to protect the foreign key.)

The only argument against the IOT strategy is that the table clearly compresses very well as a heap table, so a compressed heap table plus two B-tree indexes might be more cost-effective than an IOT with a single secondary index.

 

April 27, 2016

Stats History

Filed under: Histograms,Infrastructure,Oracle,Statistics — Jonathan Lewis @ 1:09 pm BST Apr 27,2016

From time to time we see a complaint on OTN about the stats history tables being the largest objects in the SYSAUX tablespace and growing very quickly, with requests about how to work around the (perceived) threat. The quick answer is – if you need to save space then stop holding on to the history for so long, and then clean up the mess left by the history that you have captured; on top of that you could stop gathering so many histograms because you probably don’t need them, they often introduce instability to your execution plans, and they are often the largest single component of the history (unless you are using incremental stats on partitioned objects***)

For many databases it’s the histogram history – using the default Oracle automatic stats collection job – that takes the most space, here’s a sample query that the sys user can run to get some idea of how significant this history can be:


SQL> select table_name , blocks from user_tables where table_name like 'WRI$_OPTSTAT%HISTORY' order by blocks;

TABLE_NAME                           BLOCKS
-------------------------------- ----------
WRI$_OPTSTAT_AUX_HISTORY                 80
WRI$_OPTSTAT_TAB_HISTORY                244
WRI$_OPTSTAT_IND_HISTORY                622
WRI$_OPTSTAT_HISTHEAD_HISTORY          1378
WRI$_OPTSTAT_HISTGRM_HISTORY           2764

5 rows selected.

As you can see the “histhead” and “histgrm” tables (histogram header and histogram detail) are the largest stats history tables in this (admittedly very small) database.

Oracle gives us a couple of calls in the dbms_stats package to check and change the history setting, demonstrated as follows:


SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31

1 row selected.

SQL> execute dbms_stats.alter_stats_history_retention(7)

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                          7

1 row selected.

Changing the retention period doesn’t reclaim any space, of course – it simply tells Oracle how much of the existing history to eliminate in the next “clean-up” cycle. This clean-up is controllled by a “savtime” column in each table:

SQL> select table_name from user_tab_columns where column_name = 'SAVTIME' and table_name like 'WRI$_OPTSTAT%HISTORY';

TABLE_NAME
--------------------------------
WRI$_OPTSTAT_AUX_HISTORY
WRI$_OPTSTAT_HISTGRM_HISTORY
WRI$_OPTSTAT_HISTHEAD_HISTORY
WRI$_OPTSTAT_IND_HISTORY
WRI$_OPTSTAT_TAB_HISTORY

5 rows selected.

If all you wanted to do was stop the tables from growing further you’ve probably done all you need to do. From this point onwards the automatic Oracle job will start deleting the oldest saved stats and re-using space in the existing table. But you may want to be a little more aggressive about tidying things up, and Oracle gives you a procedure to do this – and it might be sensible to use this procedure anyway at a time of your own choosing:


SQL> execute dbms_stats.purge_stats(sysdate - 7);

Basically this issues a series of delete statements (including a delete on the “stats operation log (wri$_optstat_opr)” table that I haven’t previously mentioned) – here’s an extract from an 11g trace file of a call to this procedure (output from a simple grep command):


delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_tab_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_ind_history h        where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_aux_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_opr                  where start_time < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history     where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histgrm_history      where savtime < :1 and rownum <= NVL(:2, rownum)

Two points to consider here: although the appearance of the rownum clause suggests that there’s a damage limitation strategy built into the code I only saw one commit after the entire delete cycle, and I never saw a limiting bind value being supplied. If you’ve got a large database with very large history tables you might want to delete one day (or even just a few hours) at a time. The potential for a very long, slow, delete is also why you might want to do a manual purge at a time of your choosing rather than letting Oracle do the whole thing on auto-pilot during some overnight operation.

Secondly, even though you may have deleted a lot of data from these table you still haven’t reclaimed the space – so if you’re trying to find space in the sysaux tablespace you’re going to have to rebuild the tables and their indexes. Unfortunately a quick check of v$sysaux_occupants tells us that there is no official “move” producedure:


SQL> execute print_table('select occupant_desc, move_procedure, move_procedure_desc from v$sysaux_occupants where occupant_name = ''SM/OPTSTAT''')

OCCUPANT_DESC                 : Server Manageability - Optimizer Statistics History
MOVE_PROCEDURE                :
MOVE_PROCEDURE_DESC           : *** MOVE PROCEDURE NOT APPLICABLE ***

So we have to run a series of explicit calls to alter table move and alter index rebuild. (Preferably not when anyone is trying to gather stats on an object). Coding that up is left as an exercise to the reader, but it may be best to move the tables in the order of smallest table first, rebuilding indexes as you go.

Footnote:

*** Incremental stats on partitioned objects: I tend to assume that sites which use partitioning are creating very large databases and have probably paid a lot more attention to the details of how to use statistics effectively and successfully; that’s why this note is aimed at sites which don’t use partitioning and therefore think that the space taken up by the stats history significant.

March 11, 2016

Wrong Results ?

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 9:18 am BST Mar 11,2016

I gather that journalistic style dictates that if the headline is a question then the answer is no. So, following on from a discussion of possible side effects of partition exchange, let’s look at an example which doesn’t involve partitions.  I’ve got a schema that holds nothing by two small, simple heap tables, parent and child, (with declared primary keys and the obvious referential integrity constraint) and I run a couple of very similar queries that produce remarkably different results:


select
        par.id      parent_id,
        chi.id      child_id,
        chi.name    child_name
from
        parent  par,
        child   chi
where
        chi.id_p = par.id
order by
        par.id, chi.id
;

 PARENT_ID   CHILD_ID CHILD_NAME
---------- ---------- ----------
         1          1 Simon
         1          2 Sally
         2          1 Janet
         2          2 John
         3          1 Orphan

5 rows selected.

Having got this far with my first query I’ve decided to add the parent name to the report:


select
        par.id      parent_id,
        par.name    parent_name,
        chi.id      child_id,
        chi.name    child_name
from
        parent  par,
        child   chi
where
        chi.id_p = par.id
order by
        par.id, chi.id
;

 PARENT_ID PARENT_NAM   CHILD_ID CHILD_NAME
---------- ---------- ---------- ----------
         1 Smith2              1 Simon
         1 Smith               1 Simon
         1 Smith2              2 Sally
         1 Smith               2 Sally
         2 Jones               1 Janet
         2 Jones               2 John

6 rows selected.

How could adding a column to the select list result in one child row disappearing and two child rows being duplicated; and is this a bug ?

To avoid any confusion, here’s the complete script I used for creating the schema owner, in 11.2.0.4, with no extra privileges granted to PUBLIC:


create user u1
        identified by u1
        default tablespace test_8k
        quota unlimited on test_8k
;

grant
        create session,
        create table
to
        u1
;


Update

It didn’t take long for a couple of people to suggest that the oddity was the consequence of constraints that had not been enabled and validated 100% of the time, but the suggestions offered were a little more convoluted than necessary. Here’s the code I ran from my brand new account before running the two select statements:


create table parent (
        id      number(4),
        name    varchar2(10),
        constraint par_pk primary key (id)
        rely disable novalidate
)
;

create table child(
        id_p    number(4)
                constraint chi_fk_par
                references parent
                on delete cascade
                rely disable novalidate,
        id      number(4),
        name    varchar2(10),
        constraint chi_pk primary key (id_p, id)
                rely disable novalidate
)
;

insert into parent values (1,'Smith');
insert into parent values (1,'Smith2');
insert into parent values (2,'Jones');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

insert into child values(2,1,'Janet');
insert into child values(2,2,'John');

insert into child values(3,1,'Orphan');

commit;

begin
        dbms_stats.gather_table_stats(user,'child');
        dbms_stats.gather_table_stats(user,'parent');
end;
/


In a typical data warehouse frame of mind I’ve added plenty of constraints, but left them all disabled and novalidated, but told Oracle to rely on them for optimisation strategies. This means all sorts of incorrect data could get into the tables, with all sorts of unexpected side effects on reporting. The example above shows duplicates on primary keys (and if you checked the table definition you’d find that the primary key columns were nullable as well), child rows with no parent key.

In fact 11g and 12c behave differently – the appearance of the Orphan row in the first sample query is due, as Chris_cc pointed out in the first comment, to the optimizer deciding that it could use join elimination because it was joining to a single-column primary key without selecting any other columns from the referenced table. In 12c the optimizer doesn’t use join elimination for this query, so both queries have the same (duplicated) output.

Update:

Make sure you read the articles linked to by Dani Schneider’s comment below, and note especially the impact on the query_rewrite_integrity parameter.

February 26, 2016

Partition Limit

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 2:19 pm BST Feb 26,2016

A tweet from Connor McDonald earlier on today reminded me of a problem I managed to pre-empt a couple of years ago.

Partitioning is wonderful if done properly but it’s easy to get a little carried away and really foul things up. So company “X” decided they were going to use range/hash composite partitioning and, to minimise contention and (possibly) reduce the indexing overheads, they decided that they would create daily partitions with 1,024 subpartitions.

This, in testing, worked very well, and the idea of daily/1024 didn’t seem too extreme given the huge volume of data they were expecting to handle. There was, however, something they forgot to test; and I can demonstrate this on 12c with an interval/hash partitioned table:


rem     Script:         pt_interval_hash_2.sql
rem     Author:         Jonathan Lewis

column start_date new_value m_start_date
select to_char(trunc(sysdate),'dd-mon-yyyy') start_date from dual;

create table interval_hash (
        d1      date            not null,
        n1      number(10,0)    not null,
        v1      varchar2(100)
)
tablespace test_8k_assm_2
partition by range (d1) interval (numtodsinterval(1,'DAY'))
subpartition by hash (n1) subpartitions 1024
        (
                partition p0 values less than (to_date('&m_start_date','dd-mon-yyyy'))
        )
;

insert into interval_hash values (trunc(sysdate) + 1023, 1, rpad('x',100));
insert into interval_hash values (trunc(sysdate) + 1022, 1, rpad('x',100));
insert into interval_hash values (trunc(sysdate) + 1021, 1, rpad('x',100));
commit;

I’ve started the partition range on trunc(sysdate), so it’s easy to pick a few very important future dates for insertion.

So what do you think is going to happen as I try to insert those three rows ?


insert into interval_hash values (trunc(sysdate) + 1023, 1, rpad('x',100))
            *
ERROR at line 1:
ORA-14300: partitioning key maps to a partition outside maximum permitted number of partitions


insert into interval_hash values (trunc(sysdate) + 1022, 1, rpad('x',100))
*
ERROR at line 1:
ORA-14299: total number of partitions/subpartitions exceeds the maximum limit


1 row created.

One of them succeeds – the other two fail with DIFFERENT error messages.

The limit on the number of segments allowed for a single table (or index) is 2^20 – 1 (i.e. 1,048575); and with 1,024 hash subpartitions I’m only allowed 1,023 partitions. I think that we get two different error messages because Oracle can work out that the first subpartition that it would create for sysdate + 1023 will exceed the segment limit, but it doesn’t “realise” that it’s going to go beyond the limit with sysdate + 1022 until it’s too late and it’s started working on creating the segments.

In fact, in its attempt to create the sysdate + 1022 partition Oracle actually temporarily allocated 1GB of space (I had a 1MB uniform extent size for the tablespace) and updated various bits of the data dictionary generating a total of 13,000 redo entries and 2.3MB of redo – for a single row insert!

So – if you want 1,024 hash partitions you’re only allowed 3 years of daily partitions. If you want a longer history you need fewer hash partitions (and don’t forget to stick to a power of 2) or longer durations for each partition. If 3 years is sufficient, though, make sure you test how long it takes to drop old partitions from the start of a table that is getting close to the limit.

 

February 11, 2016

Quiz Night

Filed under: 12c,Infrastructure,LOBs,Oracle,Statistics — Jonathan Lewis @ 3:28 pm BST Feb 11,2016

I was setting up a few tests on a copy of 12.1.0.2 recently when I made a mistake creating the table – I forgot to put in a couple of CAST() calls in the select list, so I just patched things up with a couple of “modify column” commands. Since I was planning to smash the table in all sorts of ways and it had taken me several minutes to create the data set (10 million rows) I decided to create a clean copy of the data so that I could just drop the original table and copy back the clean version – and after I’d done this I noticed something a little odd.

Here’s the code (cut down to just 10,000 rows), with a little output:

--
--      Script:  12c_vc32767_b.sql
--

create table t1 (
        id      not null,
        date_open,
        date_closed,
        deal_type,
        client_ref,
        small_vc,
        padding
)
nologging
as
select
        rownum                                          id,
        trunc(
                add_months(sysdate, -120) +
                        (rownum)* 3652 / 1e7
        )                                               date_open,
        trunc(
                add_months(
                        add_months(sysdate, -120) +
                                (rownum) * 3652 / 1e7,
                        12 * trunc(dbms_random.value(1,6))
                )
        )                                               date_closed,
        dbms_random.string('U',1)                       deal_type,
        dbms_random.string('U',4)                       client_ref,
        lpad(rownum,10)                                 vc_small,
        rpad('x',100,'x')                               vc_padding
from
        dual
connect by
        level <= 10000
;
alter table t1 modify deal_type varchar2(1);
alter table t1 modify client_ref varchar2(4);

create table t2 nologging as select * from t1;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt=>'for all columns size 1'
        );
end;
/

select table_name, num_rows, blocks, avg_row_len from user_tables;

TABLE_NAME             NUM_ROWS     BLOCKS AVG_ROW_LEN
-------------------- ---------- ---------- -----------
T1                        10000        304         139
T2                        10000        218         139

There’s no hidden code – this is the code I ran, and the tables both went into the same tablespace- yet one table has used about 30% more blocks than the other on a simple “create as select”, even though the row lengths and row counts are the same!

When I first did this test it just so happened that the only place in the database I had to make the t2 copy was in a tablespace using freelist management, and I’d created the original table in a tablespace using ASSM, so I wasted a little time examining the ASSM space management (bitmap) blocks before I realised what had happened.

Suggestions about what has happened, and how to confirm your hypothesis, invited in the comments.

Update (following morning)

The explanation is in the comments: I had enabled extended (32,767 byte) varchar2().  My calls to dbms_random.string() had returned character values, and the return type defaults to the maximum character length, which meant they were declared as varchar2(32767) and those are implemented as (unhackable) CLOBs in 12.1.0.2.

When I modified the table to change the varchar2(32767) to shorter lengths Oracle let me get away with it because none of the actual stored values at that time was longer than my target declaration – but it didn’t physically change the stored column values to “short” varchar2(), it left them as CLOBs, and a “short” CLOB includes a LOB Locator which starts out at about 30 bytes.

When I created t2 from t1 Oracle first copied the (short) column definitions, and then copied the data, reading and converting the CLOBs to normal varchar2() storage, and that explains why t2 needed fewer blocks.  The avg_row_len for the two tables matched because the code to gather stats simply applies the sys_op_opnsize() function to the varchar2() defined in t1, and doesn’t give you any indication about the LOB locator.

(If you use the extremely deprecated analyze command to gather stats on the two tables you’ll find that the avg_row_len of table t1 then allows for the LOB locator size.)

“Workaround:”

I should have used:  cast(dbms_random.string(‘U’,4) as varchar2(4)) and the column definition would have been a proper varchar2(4) from the start.  (As noted in a comment from Ivica Arsov below, substr(…, 1, 4) would also have worked.  Of course I should have used cast() for the numerics as well so that I could declare them as things like number(8,0) rather than the slightly dangerous “number”.

Note:  I’ve put “workaround” in quotes because it’s not really a workaround to a problem – it’s just an example of doing it right.

Footnote:

Here’s a symbolic dump of a single character (‘T’) being stored as a varchar2(32767):


LOB
Locator:
  Length:        84(31)
  Version:        1
  Byte Length:    1
  LobID: 00.00.00.01.00.00.01.6b.ba.d7
  Flags[ 0x01 0x0c 0x00 0x80 ]:
    Type: BLOB
    Storage: SecureFile
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: ReadWrite
  SecureFile Header:
    Length:   11
    Old Flag: 0x48 [ DataInRow SecureFile ]
    Flag 0:   0x90 [ INODE Valid ]
    Layers:
      Lengths Array: INODE:5
      INODE:
        00 00 01 01 54

Note the 84(31) at line 3: this is a reminder of how big a LOB locator could get for an out of line LOB, compared to the current size of the LOB locator. Rows that hold out of line LOBs can actually be much longer than avg_row_len tells you.

Addendum

Another side effect of enabling extended varchar2() types is the surprise you get when you use error logging in SQL (i.e. syntax like: “insert into t1 select … log errors”). When you create the “clone” table for the log it changes every column from the original into a varchar2(32767) e.g.:

desc target
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ID                                                                                NUMBER
 N1                                                                                NUMBER
 N2                                                                                NUMBER
 SMALL_VC                                                                          VARCHAR2(10)
 PADDING                                                                           VARCHAR2(100)


begin
        dbms_errlog.create_error_log('TARGET');
end;
/

desc err$_TARGET
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ORA_ERR_NUMBER$                                                                   NUMBER
 ORA_ERR_MESG$                                                                     VARCHAR2(2000)
 ORA_ERR_ROWID$                                                                    ROWID
 ORA_ERR_OPTYP$                                                                    VARCHAR2(2)
 ORA_ERR_TAG$                                                                      VARCHAR2(2000)
 ID                                                                                VARCHAR2(32767)
 N1                                                                                VARCHAR2(32767)
 N2                                                                                VARCHAR2(32767)
 SMALL_VC                                                                          VARCHAR2(32767)
 PADDING                                                                           VARCHAR2(32767)

You probably hope that you’re not going to see many rows inserted into the error log table, so the extra space taken up in the table by error rows probably won’t be a problem – but the fact that you get two segments (the LOB and the LOBINDEX) created for every column in the original table might cause a problem – even if they are completely empty.

February 2, 2016

Partitioned Bitmap Join

Filed under: bitmaps,Bugs,Indexing,Infrastructure,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 8:32 am BST Feb 2,2016

If you don’t want to read the story, the summary for this article is:

If you create bitmap join indexes on a partitioned table and you use partition exchanges to load data into the table then make sure you create the bitmap join indexes on the loading tables in exactly the same order as you created them on the partitioned table or the exchange will fail with the (truthful not quite complete) error: ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION.

My story starts with this OTN posting from John Hall where he found after a year of successful batch loading one of his partition exchanges was raising error 14098. After an exchange of ideas, user rp0428 came up with a query against sys.jijoin$ (one of the tables behind bitmap join indexes) that allowed John Hall to see that the indexes on the exchange table had been created in a different order from that of the partitioned table. I did a quick test to see if this might be relevant (it shouldn’t be, it isn’t with “normal” indexes or function-based indexes, or virtual columns) and didn’t manage to reproduce the problem with two dimension tables and two bitmap join indexes.

Fortunately John didn’t take my word for it and tested the idea on a clone of the production system – and found that the order of creation did matter. His system, however, had 9 dimension tables and 33 bitmap join indexes – which shouldn’t have made any difference in principle, but maybe it was something to do with having several indexes on the same table,  maybe it was something to do with have far more tables or far more indexes than I had. So I built a larger test case with 6 dimension tables and six indexes per table – and reproduced the problem.

Then I started cutting back to see where the problem appeared, and found that all it took was one dimension with two indexes, or two dimensions with one index each – whatever I had done in my “quick test” I had clearly done it too quickly and done something wrong. (Unfortunately I had overwritten most of the code from the original quick test while building the larger test, so I couldn’t go back and see where the error was.)

Here, then, is the minimal test case that I finally ran to demonstrate that switching the order of index creation on the exchange table causes the exchange to fail:


drop table pt_range purge;
drop table t1 purge;
drop table dim_1 purge;
drop table dim_2 purge;

prompt  =================
prompt  Partitioned table
prompt  =================

create table pt_range (
        id,
        grp1,
        grp2,
        padding
)
nologging
partition by range(id) (
        partition p2001 values less than (2001),
        partition p4001 values less than (4001),
        partition p6001 values less than (6001),
        partition p8001 values less than (8001)
)
as
select
        rownum                          id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 8000
;

prompt  ================================================
prompt  Exchange table - loaded to match partition p8001
prompt  ================================================

alter table pt_range 
add constraint pt_pk primary key (id) using index local;

create table t1 (
        id,
        grp1,
        grp2,
        padding
)
as 
select
        rownum + 6000                   id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 2000
;

alter table t1
add constraint t1_pk primary key (id);

execute dbms_stats.gather_table_stats(user,'pt_range')
execute dbms_stats.gather_table_stats(user,'t1')

prompt  ================
prompt  dimension tables
prompt  ================

create table dim_1 
as 
select distinct 
        grp1, 
        cast('A'||grp1 as varchar2(3)) agrp1,
        cast('B'||grp1 as varchar2(3)) bgrp1
from
        t1
;

create table dim_2 as select * from dim_1;

prompt  ===============================
prompt  Primary keys required for BMJIs
prompt  ===============================

alter table dim_1 add constraint d1_pk primary key (grp1);
alter table dim_2 add constraint d2_pk primary key (grp1);

execute dbms_stats.gather_table_stats(user,'dim_1')
execute dbms_stats.gather_table_stats(user,'dim_2')

prompt  ============================
prompt  Creating bitmap join indexes
prompt  ============================

create bitmap index pt_1a on pt_range(d1.agrp1) from pt_range pt, dim_1 d1 where d1.grp1 = pt.grp1 local ;
create bitmap index pt_2a on pt_range(d2.agrp1) from pt_range pt, dim_2 d2 where d2.grp1 = pt.grp2 local ;

prompt  ====================================================
prompt  Pick your index creation order on the exchange table
prompt  ====================================================

create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;
create bitmap index t1_2a on t1(d2.agrp1) from t1, dim_2 d2 where d2.grp1 = t1.grp2 ;
-- create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;

prompt  ==================
prompt  Exchanging (maybe)
prompt  ==================

alter table pt_range
        exchange partition p8001 with table t1
        including indexes
        without validation
;

I’ve got the same create statement twice for one of the bitmap join indexes – as it stands the indexes will be created in the right order and the exchange will work; if you comment out the first t1_1a create and uncomment the second the exchange will fail. (If you comment out the ‘including indexes’ then the exchange will succeed irrespective of the order of index creation, but that rather defeats the point of being able to exchange partitions.)

I’ve reproduced the problem in 12.1.0.2, 11.2.0.4 and 10.2.0.5

Footnote: running an extended trace didn’t help me work out how Oracle is detecting the mismatch, presumably it’s something that gets into the dictionary cache in a general “load the index definition” step; but it did show me that (in the “without validation” case) the code seems to check the correctness of the exchange table’s primary key data BEFORE checking whether the indexes match properly.

January 28, 2016

Bitmap Efficiency

Filed under: bitmaps,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 1:02 pm BST Jan 28,2016

An interesting observation came up on the Oracle-L list server a few days ago that demonstrated how clever the Oracle software is at minimising run-time work, and how easy it is to think you know what an execution plan means when you haven’t actually thought through the details – and the details might make a difference to performance.

The original question was about a very large table with several bitmap indexes, and an anomaly that appeared as a query changed its execution plan.  Here are the critical sections from the plans (extracted from memory with rowsource execution statistics enabled):

--------------------------------------------------------------------------------------------------------
|  Id |Operation                        | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
|   6 |    TABLE ACCESS BY INDEX ROWID  |       FACT |      1 |      1 |     24 |00:00:00.01 |      31 |
|   7 |     BITMAP CONVERSION TO ROWIDS |            |      1 |        |     24 |00:00:00.01 |       7 |
|   8 |      BITMAP AND                 |            |      1 |        |      1 |00:00:00.01 |       7 |
|*  9 |       BITMAP INDEX SINGLE VALUE |     FACT_0 |      1 |        |      1 |00:00:00.01 |       3 |
|* 10 |       BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      1 |        |      4 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C0"=243001)
    10 - access("FACT"."C1"="DIMENSION1"."ID")


-------------------------------------------------------------------------------------------------------
|  Id | Operation                      | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
|   7 |    BITMAP CONVERSION TO ROWIDS |            |      5 |        |      8 |00:00:00.01 |     119 |
|   8 |     BITMAP AND                 |            |      5 |        |      1 |00:00:00.01 |     119 |
|*  9 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      5 |        |     20 |00:00:00.01 |      28 |
|* 10 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM2 |      5 |        |    140 |00:00:00.01 |      78 |
|* 11 |      BITMAP INDEX SINGLE VALUE |     FACT_0 |      5 |        |      5 |00:00:00.01 |      13 |
|  12 |   TABLE ACCESS BY INDEX ROWID  |       FACT |      8 |      1 |      8 |00:00:00.01 |       8 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C1"="DIMENSION1"."ID")
    10 - access("FACT"."C2"="DIMENSION2"."ID")
    11 - access("FACT"."C0"=243001)

The first plan shows the steps leading to a single access (Starts = 1) to the FACT table after combining two bitmap indexes; the second shows the second child of a nested loop join where Oracle has combined three bitmaps indexes to access the FACT table – operation 7 (and its descendants) execute 5 times in this case. I’ve included the related parts of the predicate section so that you can see that the predicates at operations 9 and 10 of the first plan are the same as the predicates at operations 9 and 11 of the second plan.

So here’s the question – if one access to fact_dim1 requires 4 buffer visits, why does it take 28 buffer visits to do the same thing 5 times (and it is with the same value every time); conversely if one access to fact_0 requires 3 buffer visits, why do 5 visits to do the same thing take only 13 buffer visits. (Note: the arithmetic is made a little more obscure by the way in which index branch blocks may be pinned during nested loop joins.)

Then there’s a further question – not visible in the plan – the A-Rows in the “BITMAP INDEX SINGLE VALUE” operation is the number of bitmap sections in the rowsource, and we can see that the key values for index fact_dim2 have a significant number of bitmap chunks for a single key (5 executions returned 140 bitmap chunks). This scale, though, is true of all three indexes – in fact a follow-up email pointed out that a typical key value in EVERY ONE of the three indexes consisted of about 100 bitmap chunks, so why can’t we see those hundreds in the execution plan ?

So this is where we’re at: we have an execution plan where we haven’t visited all the bitmap chunks for a bitmap key, and the order in which the bitmap indexes are used in the plan seems to have some effect on the choice of leaf-blocks you visit when accessing the chunks. So (a) could a change in the order of indexes make a significant difference to the number of bitmap chunks you visit and the resulting performance, and (b) is there a way to control the order in which you visit the indexes. That’s where the note starts to get a bit technical – if you don’t want to read any more the answers are: (a) yes but probably not significantly and (b) yes.

Demo

To investigate what goes on inside a “BITMAP AND” I created a table with two bitmap indexes and used a very large setting for pctfree for the indexes so that they had to be stored with a large number of bitmap chunks per key. Here’s the code that I used, with some results from an instance of 12.1.0.2:


create table people
nologging
as
with generator as (
        select  --+ materialize 
                rownum id 
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum-1, 1e2)      id_town_home,
        trunc((rownum-1)/1e4)   id_town_work,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PEOPLE',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index pe_home on people(id_town_home) nologging pctfree 95;
create bitmap index pe_work on people(id_town_work) nologging pctfree 95;

select
        index_name, distinct_keys, num_rows, leaf_blocks, avg_leaf_blocks_per_key
from
        user_indexes
where
        table_name = 'PEOPLE'
order by
        index_name
;


INDEX_NAME           DISTINCT_KEYS   NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
-------------------- ------------- ---------- ----------- -----------------------
PE_HOME                        100      30399       15200                     152
PE_WORK                        100       1800         907                       9

As you can see I’ve generated two columns (id_town_home, id_town_work) with 100 distinct values and 10,000 rows each, but with very different data distributions – the rows for any given value for id_town_home are uniformly spread across the entire table, every hundredth row; while the rows for any given value of id_town_work are very tightly clustered as a group of 10,000 consecutive rows. As a consequence the index entry (bitmap string) for a typical key value for id_town_home is enormous and has to be broken into 304 chunks spread across 152 leaf blocks (2 index entries per leaf block), while the index entry for a typical key value for id_town_work is much shorter, but still requires 18 chunks spread across 9 leaf blocks.

So what will I see if I run the following query, and force it to use a BITMAP AND of the two indexes, in the two different orders:

select
        /*+ index_combine(pe) */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;

Based on a very simple interpretation of the typical execution plan and using the index stats shown above we might expect to see roughly A-Rows = 18 with 9 buffer gets (plus a few more for segment headers and branch blocks) on the id_town_work index and A-Rows = 304 with 152 buffer gets on the id_town_home index to allow Oracle to generate and compare the two bit strings – but here are the two plans with their execution stats, generated in 12.1.0.2, and each run after flushing the buffer cache:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     118 |    117 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     118 |    117 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     118 |    117 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      18 |     17 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      18 |     17 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      4 |00:00:00.01 |       4 |      4 |
-------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     122 |    120 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     122 |    120 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     122 |    120 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      22 |     20 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      22 |     20 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      5 |00:00:00.01 |       8 |      7 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
-------------------------------------------------------------------------------------------------------------------

We have NOT touched anything like the entire bit-string for the id_town_home index – a bit-string that spans 152 leaf blocks! Clearly Oracle is doing something clever to minimise the work, and it’s so clever that switching the order of these two extremely different indexes in the plan has made virtually no difference to the work done. Obviously I can’t tell you exactly what the code is doing, but I think I can produce a reasonable guess about what’s going on.

The pe_work index has the smaller number of leaf blocks per key, which makes it the better starting choice for the AND in this case, so the optimizer’s default starting action was to pick the first couple of chunks of that index key value; and Oracle immediately sees that the first rowid that it could possibly need in its result set is roughly in the middle of the table – remember that the “key” columns of a bitmap index are (real_key, first_rowid_of chunk, last_rowid_of_chunk, compressed_bitstring).

Since it now knows the lowest possible rowid that it could need Oracle can now probe the pe_home index by (id_town_home=50, {target_rowid}) – which will let it go to a bitmap index chunk that’s roughly in the middle of the full range of 152. Then Oracle can expand the bitstrings from the chunks it has, reading new chunks as needed from each of the indexes until the 18 chunks / 9 leaf block from the pe_work index have been used up (and that range would have aligned with just two or three chunks from the pe_home index) at which point Oracle can see there’s no more rows in the table that could match both predicates and it doesn’t need to read the next 75 chunks of the pe_home index.

Conversely, when I forced Oracle to use the (inappropriate) pe_home index first, it read the first couple of chunks, then read the first couple of chunks of the pe_work index, at which point it discovered that it didn’t need any of the pe_home index prior to (roughly) chunk 75, so it jumped straight to the right chunk to align with pe_work and carried on from there. That’s why the forced, less efficient, plan that visited pe_home first visited just a couple more leaf blocks than the plan the optimizer selected for itself.

Bottom line on performance (tl;dr) – Oracle is sufficiently smart about checking the start and end ranges on bitmap indexes (rather then arbitrarily expanding the entire bitmap for each key) that even for very large bitmap index entries it will probably only access a couple of “redundant” leaf blocks per index even if it picks the worst possible order for using the indexes. You’re far more likely to notice Oracle picking the wrong indexes (because you know the data better) than you are to spot it using the right indexes in the wrong order – and given that bitmap indexes tend to be relatively small and well buffered (compared to the tables), and given the relatively large number of rows we pick by random I/O from fact tables, a little extra work in the bitmap indexes is unlikely to make a significant difference to the performance of most queries.

Closing fact: in the unlikely circumstances that you do spot the special case where it will make a difference (and it will probably be a difference in CPU usage) then you can dictate the order of the indexes with the undocumented bitmap_tree() hint.  I may get round to writing up the variations one day but, for this simple case, the index_combine() hint that I used to force the BITMAP AND turned into the following bitmap_tree() hint in the outline:

bitmap_tree(@sel$1 pe@sel$1 and((people.id_town_work) (people.id_town_home)))

bitmap_tree( @query_block     table_name@query_block     and( ({first index definition}) ({second index definition}) ) )

Obviously not suitable to throw into production code casually – check with Oracle support if you think it’s really necessary – but if you wanted to reverse the order of index usage in this case you could just swap the order of the index definitions. If you thought there was a third index that should be used you could include its definition (note that it’s table_name.column_name – the index definition – in the brackets).

My reference: bitmap_control_02.sql

January 18, 2016

Drop Column

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 8:14 am BST Jan 18,2016

I published a note on AllthingsOracle a few days ago discussing the options for dropping a column from an existing table. In a little teaser to a future article I pointed out that dropping columns DOESN’T reclaim space; or rather, probably doesn’t, and even if it did you probably won’t like the way it does it.

I will  be writing about “massive deletes” for AllthingsOracle in the near future, but I thought I’d expand on the comment about not reclaiming space straight away. The key point is this – when you drop a column you are probably dropping a small fraction of each row. (Obviously there are some extreme variants on the idea – for example, you might have decided to move a large varchar2() to a separate table with shared primary key).

If you’ve dropped a small fraction of each row you’ve freed up a small fraction of each block, which probably means the block hasn’t been identified as having available free space for inserts. In many cases this is probably  a good thing – because it’s quite likely the if every block in your table is suddenly labelled as having sufficient free space for new row then you could end up with a difficult and ongoing performance problem.

Many large tables have a “time-based” component to their usage – as time passes the most recently entered rows are the ones that get most usage, and older rows are no longer accessed; this means you get a performance benefit from caching because the most useful fractions of such tables are often well cached and the “interesting” data is fairly well clustered.

In a case like this, imagine what will happen if EVERY block in your table suddenly acquires enough free space to accept a couple of new rows – over the next few days the incoming data will be spread across the entire length of the table, and for the next couple of months, or years, you will have to keep the entire table cached in memory if the performance is to stay constant; moreover the clustering_factor of the most useful indexes is likely to jump from “quite small” to “absolutely massive”, and the optimizer will start changing lots of plans because it will decide that your favourite indexes are probably much to expensive to user.

I am, of course, painting a very grim picture – but it is a possible scenario that should be considered before you drop a column from a table. Combined with my observations about the locking and overheads of dropping a column you might (probably ought to) decide that you should never drop a column you should only mark it as unused or (better still if you’re on 12c) mark it invisible for a while before marking it unused. You can worry about space reclamation at a later date when you considered all the ramifications of how it might impact on performance.

Footnote: If you’re still using freelist management then dropping a column won’t put a block on the freelist until the total used space in the block falls below the value dictated by pctused (default 40%); if you’re using ASSM then the block doesn’t become available for reuse until (by default) the free space exceeds 25% of the block’s usable space.

 

 

November 3, 2015

Nul points

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 8:16 am BST Nov 3,2015

(To understand the title, see this Wikipedia entry)

The title could also be: “Do as I say, don’t do as I do”, because I want to remind you of an error that I regularly commit in my demonstrations. Here’s an example:

 
SQL> create table t (n number); 

Table created 

Have you spotted the error yet ? Perhaps this will help:

SQL> insert into t select 1 - 1/3 * 3 from dual; 

1 row created. 

SQL> insert into t select 1 - 3 * 1/3 from dual; 

1 row created. 

SQL> column n format 9.99999999999999999999999999999999999999999 
SQL> select * from t; 

                                           N
--------------------------------------------
  .00000000000000000000000000000000000000010
  .00000000000000000000000000000000000000000

2 rows selected. 

Spotted the error yet ? If not then perhaps this will help:

SQL> select * from dual where 3 * 1/3 = 1/3 * 3;

no rows selected 

SQL> select * from dual where 3 * (1/3) = (1/3) * 3; 

D
-
X

1 row selected. 

Computers work in binary, people (tend to) work in decimal. 10 = 2 * 5, and 5 (more precisely, dividing by 5) is something that a computer cannot do accurately. So when you do arbitrary arithmetic you should use some method to deal with tiny rounding errors.

In Oracle this means you ought to define all numbers with a precision and scale. Look on it as another form of constraint that helps to ensure the correctness of your data as well as improving performance and reducing wasted storage space.

October 12, 2015

Read Consistency

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 7:25 pm BST Oct 12,2015

I posted a note a few days ago about read consistency, the Cross Session PL/SQL Function Result Cache, deterministic functions, and scalar subqueries. The intent of the article was to make clear the point that while you might think that declaring a PL/SQL function to be deterministic or in the PL/SQL Result Cache might make a query that calls the function perform faster, if that function contained its own SQL statement then your code might not be producing self-consistent results and (even worse) if you had used the Result Cache option your code might actually cause other session to get wrong results if you tried to “set transaction read only” or “alter session set isolation_level = serializable”

It occurred to me a couple of days after writing that article that perhaps there’s a much more basic point I ought to make as well. I’ll make it about PL/SQL but it’s something that applies across all programming languages – it merits being italicised and emphasised, and possibly even the addition of 5 exclamation marks:

Any time you execute more than one SQL statement in a PL/SQL procedure the results of executing that procedure may not be self-consistent unless you have explicitly locked your session SCN to a fixed value!!!!!

Similarly: if you run a report from SQL*Plus which goes: “select this, select that, select the total of the other” – your report may not be self-consistent; if you have a web-application which goes: “select the customer balance, select the customer orders outstanding, select the customer credit notes” – your screenful may not be self-consistent.

By default Oracle operates at statement-level read-consistency for SQL. In a multi-user system if you run two SQL statements and someone else changes the data and commits between your two statement executions then the result from the second statement may not be consistent with the result from the first. Here’s a sillly little bit of code you can use to demonstrate the concept:

set serveroutput on

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 
) 
select 
        rownum id,    
        rownum n1,
        rownum n2,
        rpad('x',10,'x')  small_vc,
        rpad('x',100,'x') padding 
from 
        generator v1 
; 


create table t2 ( 
        id, 
        v1, 
        constraint t2_pk primary key(id) 
) 
organization index 
as 
select 
        cast(1 as number (6,0)), 
        cast('one' as varchar2(10)) 
from dual 
;
 
begin 
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

/*
   ----------------------------------------------------
   Table t3 only relevant to "serializable" experiments
   ----------------------------------------------------

create table t3 (
        id      number,
        v1      varchar2(10),
        padding varchar2(100) default rpad('x',100,'x')
)
initrans 4
;

alter table t3 add constraint t3_pk primary key(id) using index(
        create unique index t3_pk on t3(id) initrans 4
)
;

insert into t3 values (0,'0',rpad('x',100,'x'));
commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T3',
                method_opt       => 'for all columns size 1'
        );
end;
/

   ----------------------------------------------------
   End of optional t3 definition
   ----------------------------------------------------
*/

prompt  ===========================================================
prompt  Four second pause - only needed to avoid flashback problems
prompt  ===========================================================

execute dbms_lock.sleep(4)

-- ------------------------------------------------------------------------------
-- Options for "fixing" the SCN - plenty of scope for ORA-08177 when serializable
-- ------------------------------------------------------------------------------

-- set transaction read only;
-- execute dbms_flashback.enable_at_time(systimestamp)
-- execute dbms_flashback.enable_at_system_change_number(dbms_flashback.get_system_change_number)
-- alter session set isolation_level = serializable;

prompt  =====================
prompt  Starting PL/SQL block
prompt  =====================

declare
        m_v1    varchar2(10);
begin
        for r in (select * from t1 where rownum <= 10) loop

                select  v1
                into    m_v1
                from    t2
                where   t2.id = r.n2 - r.n1 + 1
                ;

--              insert into t3 values(r.id, m_v1, rpad('x',100));

                dbms_output.put_line(r.id || ' - ' || m_v1);
                dbms_lock.sleep(1);

        end loop;

end;
/

execute dbms_flashback.disable;
commit;

--
--      Code to be run from a second session once the 
--      first session displays the "loop running" banner
--

begin
        for r in 1..20 loop
                update t2 set v1 = lpad(r,10,0);
                commit;
                dbms_lock.sleep(0.5);
        end loop;
end;
.

The code covers several different tests – the basic test and the three “read-only” tests don’t need table t3 at all, which exists only so that we have something to insert into (and show a possible ORA-08177) for the serializable test. If you do the serializable test you might want to note what happens if you eliminate the initial insert of id = 0; you may also want to increase the number of rows selected, the number of cycles through the update loop, and the sleep times.

In principle the code emulates a classic “nested loop join done in SQL”, with one row in table t2, and a CURSOR FOR LOOP select from t1 being used to drive a select from t2 “for each row”. The peculiar predicate “t2.id = r.n2 – r.n1 + 1” inside the loop is an attempt to minimise the risk of any future release of the PL/SQL optimizer getting so clever that it puts the “constant” query for “t2.id = 1” outside the loop.

As the main loop cycles once per second, selecting the same row from t2 on each cycle, another session updates the row and commits twice every second. You might have thought that the main loop would keep returning the same value on every cycle – but each select starts at a new SCN. If you don’t take some sort of defensive action (set transaction read only, dbms_flashback) then you’ll get output similar to the following:


=================
Four second pause
=================

PL/SQL procedure successfully completed.

=====================
Starting PL/SQL block
=====================
1 - one
2 - 0000000002
3 - 0000000004
4 - 0000000006
5 - 0000000008
6 - 0000000010
7 - 0000000012
8 - 0000000014
9 - 0000000016
10 - 0000000018

PL/SQL procedure successfully completed.

I have seen many sites where this type of code is used. Typically it’s justified because the referenced table is a “very static” code table or a table that’s not supposed to change while the main loop is running – but eventually Murphy’s law will raise its ugly head.

(My favourite invocation of Murphy’s law came from my A-level physics teacher with respect to experimentation: “Constants aren’t and variables don’t”.)

September 28, 2015

Result Cache 2

Filed under: 12c,Flashback,Oracle,Performance — Jonathan Lewis @ 8:50 am BST Sep 28,2015

Following on from my earlier posting of problems with temporary table and the PL/SQL result cache (a combination which the manuals warn you against) here’s another problem – again, to a large degree, self-inflicted.

Imagine you have a complex report involving a large number of financial transactions with a need to include calculations about current exchange rates. Unfortunately the rules about calculating the appropriate exchange rate for any transaction are complex and you find you have a choice between adding 6 tables with outer joins and a couple of aggregate (max) subqueries to the base query or calling a PL/SQL function to calculate the exchange rate for each row. I’m going to create an extremely simplified model of this requirement:

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        case mod(rownum,2) when 0 then 'USD' else 'GBP' end v1,
        case mod(rownum,2) when 0 then 'GBP' else 'USD' end v2
from
        generator       v1
where
        rownum <= 1e3
;

create table t2 (
        v1      varchar2(3),
        v2      varchar2(3),
        cvt     number(10,6),
        constraint t2_pk primary key(v1,v2)
)
organization index
;

insert into t2 values('USD','GBP',0);
insert into t2 values('GBP','USD',1);

commit;

create or replace function plsql_ordinary(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number is
        m_ret number;
begin
        select /*+ ordinary trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_ordinary;
/

execute dbms_stats.gather_table_stats(user,'t2')

My t1 table represents the set of transactions but only has to worry about two exchange rates, the dollar/sterling and its inverse. My t2 table is an exchange rate table and I’ve loaded it with the two exchange rates I’m interested in. My function plsql_ordinary() takes two currency codes as inputs and returns the exchange rate.

Here’s the modelling query, with a little infrastructure to examine the workload. Note that this will only run on 12c because of the inline PL/SQL function I’ve used for slowing the query down.

set pagesize 20
set arraysize 6

set serveroutput off
alter system flush shared_pool;
alter session set statistics_level = all;

with
        function wait_row_n (
                i_secs          number,
                i_return        number default -1
        ) return number
        is
        begin
                dbms_lock.sleep(i_secs);
                return i_return;
        end wait_row_n;
select
        /*+ driver trace_this */
        wait_row_n(0.3,id),
        plsql_ordinary(v1,v2),
        (select /*+ scalar trace_this */ t2.cvt from t2 where t2.v1 = t1.v1 and t2.v2 = t1.v2) scalar_sub
from
        t1
where
        rownum <= 50
;

set pagesize 40

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select  sql_id, executions, fetches, rows_processed, sql_text
from    v$sql
where   lower(sql_text) like 'select%trace_this%'
and     lower(sql_text) not like '%v$sql%'
;

The query includes a scalar subquery in the select list to get the same data as the PL/SQL function, and you’ll see the point of that in a while. Because of the arraysize and input parameters to wait_row_n() the query will produce output in batches of 6 rows roughly every two seconds for a total of about 18 seconds – which will give me plenty of time to play around in another session. Before I try to do any damage, though, let’s check the execution plan of the report and the statistics of the queries with “trace_this” in their text:


--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |     50 |00:00:00.01 |      11 |
|*  1 |  INDEX UNIQUE SCAN | T2_PK |      2 |      1 |      2 |00:00:00.01 |       2 |
|*  2 |  COUNT STOPKEY     |       |      1 |        |     50 |00:00:00.01 |      11 |
|   3 |   TABLE ACCESS FULL| T1    |      1 |     50 |     50 |00:00:00.01 |      11 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."V1"=:B1 AND "T2"."V2"=:B2)
   2 - filter(ROWNUM<=50)


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------
f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

Looking at operation 1 in the plan you can see that the scalar subquery has started just twice (once for each distinct combination of currency codes).
Looking at the results from v$sql you can see that the query in the PL/SQL function was executed 50 times – once for each row. Functions like this can be a massive drain of resources (typically CPU, but also through latch contention on buffer cache and shared pool latches).

Conveniently we realise that in our system the derived exchange rates change very slowly – so how about telling Oracle that the exchange rate function is a deterministic function (it’s nearly true), or better still, perhaps, experiment with the PL/SQL Function Result Cache.

(Note very carefully, however, that the Database Administrators’ Manual for 11.2 makes the following comment about using the deterministic keyword with PL/SQL functions)

DETERMINISTIC

Tells the optimizer that the function returns the same value whenever it is invoked with the same parameter values (if this is not true, then specifying DETERMINISTIC causes unpredictable results).

Look carefully at that “unpredictable” – it’s true … but you might not realise it until too late. Our PL/SQL function is NOT deterministic – after all a function that queries the database to produce a result may produce a different result every time it executes if someone keeps changing the underlying data – but we might wave our hands a bit and say that the table isn’t going to change while we’re running our report so it’s okay to pretend it’s deterministic, we might even make it a read-only table for the duration. Similar thoughts should cross our minds about declaring a PL/SQL function to the result cache – even though the manual doesn’t say anything quite so explicitly threatening about the result cache. But what the heck, let’s just do it and see what happens:

create or replace function plsql_result_cache(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number
result_cache
is
        m_ret number;
begin
        select /*+ result cache trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_result_cache;
/

create or replace function plsql_deterministic(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number
deterministic
is
        m_ret number;
begin
        select /*+ det trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_deterministic;
/

...
select
        /*+ driver trace_this */
        wait_row_n(0.3,id),
        plsql_ordinary(v1,v2),
        plsql_deterministic(v1,v2),
        plsql_result_cache(v1,v2),
        (select /*+ scalar trace_this */ t2.cvt from t2 where t2.v1 = t1.v1 and t2.v2 = t1.v2) scalar_sub
from
...

All three functions returned the same set of results as the scalar subquery – and here’s the output from v$sql showing the impact of declaring a deteministic function and a result cache function (note that “result cache” is not a hint in the first statement, it’s just a convenient label):


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg          2          2              2
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1


The simple function ran 50 times, the deteministic function ran 18 times, and the result cache function ran twice. It required just two executions to get the two distinct results needed, after which the session used the result cache rather than calling the function again.

The deterministic function only remembers its results for the duration of a single database call – in this case the fetch – so on each fetch the session has to re-populate the session’s “deterministic cache”, which takes 2 calls for each fetch, a total of 9 fetches * 2 calls = 18 calls.

Clearly, if I can get away with it safely, the PL/SQL function result cache looks like a real winner, with the deterministic option coming a close second (given that real life ought to be using a significantly larger fetch arraysize). So what could possibly go wrong ? Two things – first, the results … and if the results can go wrong there’s not much point in talking about the second thing at the moment.

My query runs for 18 seconds, I’m going to start another session while it runs and update one of the rows in the t2 table a few seconds after my long-running query starts. Here’s the SQL I’ve run, an extract from the query output, and the results from v$sql:


update  t2 set
        cvt = 2
where   v1 = 'USD' 
;

commit;


WAIT_ROW_N(0.3,ID) PLSQL_ORDINARY(V1,V2) PLSQL_DETERMINISTIC(V1,V2) PLSQL_RESULT_CACHE(V1,V2) SCALAR_SUB
------------------ --------------------- -------------------------- ------------------------- ----------
                 1                     1                          1                         1          1
                 2                     0                          0                         0          0
                 3                     1                          1                         1          1
                 4                     0                          0                         0          0
                 5                     1                          1                         1          1
                 6                     0                          0                         0          0
                 7                     1                          1                         1          1
                 8                     0                          0                         0          0
                 9                     1                          1                         1          1
                10                     2                          0                         2          0
                11                     1                          1                         1          1
                12                     2                          0                         2          0
                13                     1                          1                         1          1
                14                     2                          2                         2          0
                15                     1                          1                         1          1
                16                     2                          2                         2          0


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg          4          4              4
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

The most important point is that we’ve got results that are not self-consistent – except for the scalar subquery results.

The SQL statements that are executed inside the PL/SQL functions do not run at the SCN of the driving query, each individual statement executes at its own starting SCN. This is an important point that is often overlooked when people write PL/SQL functions that are then called from SQL. The inline scalar subquery, on the other hand, always runs as at the start SCN of the driving query no matter how many times or how late in the lifetime of the driving query it runs.

If we examine the results we can see that the ordinary PL/SQL function and the result cache PL/SQL function changed their output the moment the commit took place (you’ll have to take my word on that, of course), but the deterministic function seemed to delay slightly. We can also see that the number of executions for the ordinary and deterministic functions didn’t change, but the result cache function doubled its executions.

Because of the way I’ve created my data and defined the function, the ordinary function executes its SQL once every row while the deterministic function executes its SQL twice every fetch (once for each pair of input values (though the first fetch from SQL*Plus is a special case) and then remembers the outputs for the rest of the fetch – this is why there is a delay before the deterministic function changes its output and doesn’t introduce any extra calls to the SQL – it was going to re-execute for both values on the fetch starting at id 13 whatever went on around it; the result cache function gets an invalidation warning the moment the other session commits, so re-executes its SQL as many times as needed to repopulate the bits of the cache that have been discarded – and that’s why we see the number of SQL calls doubling, the relevant part of the cache was identified by some sort of hash value for the statement with SQL_ID = ’49buxp3gba3cg’ so both results were discarded and reloaded even though only one of them actually had to change.

Critically every execution of the recursive statements runs at the then-current SCN – so when the underlying data changes our report sees those changes, the report is not self-consistent.

Fortunately there’s something we can do about this – if we want the whole report to operate at the same SCN all we need to do is freeze our session at a specific point in time with the command “set transaction read only;”. This has to be executed as the first statement of a transaction but if we can slip it in just before we start running our report we can ensure that all our SQL statements (including the recursive ones) run as at the same SCN until we issue a commit. When I repeated the data-changing experiment after setting the transaction read only the report ran to completion showing the correct results.

But there was a significant change in the output from v$sql:


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg         44         44             44
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

I did my update just after the first batch of rows had appeared on screen – notice how the result cache SQL has executed 44 times instead of 2 (or 4) times. When I set my transaction to read only it looks as if my session stopped using the result cache the moment the other session commited – and that’s a GOOD THING. If my session were able to continue using the result cache that would mean one of two things, either I would be seeing a result created by another user – which would be too new for me, or every other session would be seeing the results from my session – which would (typically) be out of date for them. The session seems to have protected itself from the risk of a result cached PL/SQL function producing inconsistent results – but the workload changed the moment another session committed a change to the data we were interested in.

At that point I stopped investigating “set transaction read only” until a couple of days later when I realised that there was one more thing I had to test – when I changed the data from another session I didn’t check to see what that session would see when it executed the cached function, so I modified the code run by the other session to do the following:


update  t2 set 
        cvt = 2 
where   v1 = 'USD' 
;

commit;

execute dbms_lock.sleep(6)
execute dbms_output.put_line(plsql_result_cache('USD','GBP'))

The other session goes to sleep for a while (to let the reporting session get through a little more work) and then calls the function. I was very pleased to see that the session returned the correct result – the value 2 that it had just written to the table. But what I got from the reporting session wasn’t so good:

WAIT_ROW_N(0.3,ID) PLSQL_ORDINARY(V1,V2) PLSQL_DETERMINISTIC(V1,V2) PLSQL_RESULT_CACHE(V1,V2) SCALAR_SUB
------------------ --------------------- -------------------------- ------------------------- ----------
                 1                     1                          1                         1          1
                 2                     0                          0                         0          0
                 3                     1                          1                         1          1
                 4                     0                          0                         0          0
                 5                     1                          1                         1          1
                 6                     0                          0                         0          0
                 7                     1                          1                         1          1
                 8                     0                          0                         0          0
...
                24                     0                          0                         0          0
                25                     1                          1                         1          1
                26                     0                          0                         0          0
                27                     1                          1                         1          1
                28                     0                          0                         0          0
                29                     1                          1                         1          1
                30                     0                          0                         2          0
                31                     1                          1                         1          1
                32                     0                          0                         2          0

SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ---------- ---------- -------------- --------------------------------
49buxp3gba3cg         32         32             32 SELECT /*+ result cache trace_th
                                                   is */ CVT FROM T2 WHERE V1 = :B2
                                                    AND V2 = :B1

49buxp3gba3cg          1          1              1 SELECT /*+ result cache trace_th
                                                   is */ CVT FROM T2 WHERE V1 = :B2
                                                    AND V2 = :B1

2sh7bm59dkwhw         18         18             18 SELECT /*+ det trace_this */ CVT
                                                    FROM T2 WHERE V1 = :B2 AND V2 = 
                                                    :B1    

f1bz07bk5rbth         50         50             50 SELECT /*+ ordinary trace_this * 
                                                   / CVT FROM T2 WHERE V1 = :B2 AND
                                                    V2 = :B1

I changed t2 just after the first batch of rows had appeared (just after id 6), then called the function a few seconds later – and as I called the function from the other session it queried the data (the second child to 49buxp3gba3cg, executed just once above) and reloaded the result cache. At that moment (visible at id 30) the first session found data in the result cache and stopped re-executing its queries. When my session went read only it protected other sessions from the past by NOT re-populating the result cache as it ran its queries – but if it found something in the result cache it used it (notice how it has recorded 32 executions of the query rather than 44 – it found something in the result cache on the last 12 calls of the function). The protection only goes one way.

Observations

Using PL/SQL functions in the select list to encapsulate complex calculations that query the database is not something you can do casually. You have no guarantee that you will end up with a self-consistent result unless you take action to deal with the threats introduced by concurrent activity – ideally all tables involved should be set to read-only (which is only possible in 12c [Ed: see comment below] became possible from 11.1 onwards, though you can set a whole tablespace readonly in earlier versions: neither strategy is commonly viable). If you decide that you can work around those threats you still have the performance threat implied by the need to do some complex work for each driving row of your result set. For a report the simple solution to consistency is to “freeze” the session SCN by setting your session (transaction) into read only mode.

Once you’ve dealt with the consistency problem you can start to address the performance problen by claiming that you were using deterministic functions. You might be tempted to use the PL/SQL Result Cache to give you an even larger performance boost, but if you do you really have to make the table (or tablespace) read-only to be protected from read-consistency problems. The deterministic strategy may not be as dramatic in its effects as the result cache strategy but, given a reasonably sized fetch array, the performance benefit you get may be all you really need.

Whatever else you do, there’s an important underlying threat to remember. The moment you create a PL/SQL function that uses the result cache or deterministic option you have to ensure that nobody uses that function without ensuring that their code has handled the consistency threat properly. It’s too easy to forget, with the passing of time, that certain procedures have to be adopted when particular coding strategies are used.

Left as Exercise

I was planning to write a further article going into some detail about using dbms_flashback.enable_at_time(systimestamp) instead of “set transaction read only” – a mechanism that might be used to achieve the same read-consistency requirement though, perhaps, used less frequently than the older, less flexible option. I was also planning to include notes in the same araticle about the effect of “alter session set isolation_level = serializable” which some systems probably use to get read-consistency across multiple statements while writing results back to the database.

Both strategies run into the same issue as “set transaction read only”, with the added effect that your session (the one that has tried to “fix” its SCN) will repopulate the cache, so not only could you see newer results from other sessions in the cache; other sessions could see older results because you put them into the cache.

I’m not going to write up these demonstrations (which require fairly simple modifications to the code supplied above) as all the phenomena have been recorded as bugs on MoS (along with the GTT problem from my previous post, and a documentation bug for the PL/SQL manual to clarify the various threats):

  • Bug 21905592 : PL/SQL RESULT CACHE MUST BE BYPASSSED WHEN SESSION SCN NOT CURRENT
  • Bug 21905695 : BAD X-SESSION RESULTS W/ RESULT-CACHED PL/SQL FN AND “SET TXN READ ONLY”
  • Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT
  • Bug 21885173 : THE LNPLS SECTION “RESTRICTIONS ON RESULT-CACHED FUNCTIONS” NEEDS RE-WRITE

Footnote

I haven’t mentioned it here, but another strategy for reducing the number of PL/SQL calls is simply to wrap the function call inside a scalar subquery, along the lines of:

select
       (select plsql_ordinary(v1, v2) from dual),
       ...

Provided you don’t have many combinations of (v1,v2) to handle, and with a little luck with Oracle’s internal hashing function, you could find that scalar subquery caching reduces your execution count from once per row to once per combination. Note that the function is the “ordinary” function, not the “fake deterministic” one; internally Oracle uses the same hashing strategy for remembering the results, but the duration of the scalar subquery cache is the statement rather than the fetch.

 

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 6,550 other followers