Oracle Scratchpad

December 2, 2016

Histogram Upgrade

Filed under: Oracle,Statistics,Histograms — Jonathan Lewis @ 3:02 pm GMT Dec 2,2016

I’ve written notes in the past about the improvements 12c introduces for histograms – particularly the frequency and top-N histograms which can be generated “free of charge” while the basic “approximate NDV” scans are taking place to gather stats. Gathering histograms in 12c is much safer than it used to be in earlier versions of Oracle even in the case of the new hybrid histograms (which are still sampled on a very small sample and therefore still a stability risk).

There is a threat, though, recently highlighted by Franck Pachot, that sneaked in at 11.2.0.4 – the way in which the calculation of endpoint values for histograms on char() and nchar() columns has changed. When you upgrade from anything prior to 11.2.0.4 to either 11.2.0.4 or 12c you need to recreate any historgams on those columns; note that this does not apply to varchar2() and nvarchar2() columns, just the fixed length character types. If you fail to do this then you may find that Oracle produces some very silly estimates of cardinality, which could result in some very inefficient tablescans – in particular you are likely to find (as Franck showed) cases where you “know” that a particular value is IN the histogram but the optimizer behaves as if it isn’t – which means it uses the “half the least popular” estimate for the cardinality.

Here’s a little demo to show the underlying difference:


rem
rem     Script:         histogram_change_11204.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem     Purpose:
rem

create table t1 (v1 varchar2(32), c1 char(32));
insert into t1
select
        case when rownum <= 100 then 'N' else 'Y' end,
        case when rownum <= 100 then 'N' else 'Y' end
from
        all_objects
where
        rownum <= 1000
;
begin
        dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 10'
        );
end;
/

column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999
break on column_name skip 1

select
        column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_number
;

Here are the results from an instance of 11.1.0.7 (though anything up to 11.2.0.3 should produce the same), and 11.2.0.4 (and later – including 12.2):

Results 11.1.0.7
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  405,650,737,316,592,000,000,000,000,000,000,000    4e20202020203a7bb119d5f6000000
                                1000  462,766,002,760,475,000,000,000,000,000,000,000    59202020202034d998ff0b5ae00000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Results 11.2.0.4
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Look particularly at the first 6 bytes of the Hex version of the endpoint values for the char() column c1. In 11.1.0.7 you see “4e2020202020”, “592020202020” – that’s ASCII ‘N’ and ‘Y’ respectively, padded to 6 characters with spaces. In 11.2.0.4 the spaces have disappeared – the char() columns are now padded to 6 characters with zeros (which is how varchar2() columns have always been treated).

In 11.1.0.7 the optimizer will find a histogram entry for c1 = ‘Y’ and produce a cardinality of 900; if you upgrade the database to 11.2.0.4 without recreating the histograms the optimizer won’t find a histogram entry for the predicate and will produce a cardinality of 50 (i.e. 100 / 2).

Footnote

There’s a brief summary of the algorithm Oracle uses to generate values for character-based histograms at this URL.

 

November 29, 2016

Delete/Insert #2

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 4:33 pm GMT Nov 29,2016

In the previous post I threw out a couple of options for addressing the requirement to transfer data from one table to another (“cut and paste” rather than just “copy”) without running into odd inconsistency errors. This triggered of a wonderful comment trail of alternatives based on how large the volume might be, how relaxed the concurrency requirements might be, and so on.

A comment by SydOracle1 picked up on my failure to get Oracle working with the “as of SCN” syntax because it kept reporting ORA-08187 and suggested a straightforward use of the VERSIONS strategy. I thought it was quite surprising that this could work given that “as of SCN” didn’t, so I whipped up a quick test to check it – adding a couple of little refinements to the supplied sample – and it worked.

create table t1
as
select	object_id, object_name, owner
from	all_objects
;

create table t2
as
select	* from t1
where	rownum = 0
;

alter table t1 add constraint t1_pk primary key(object_id);

execute dbms_stats.gather_table_stats(user,'t1')

execute dbms_lock.sleep(5)

That dbms_lock.sleep() is very important for the purposes of this demonstration; it has to be just a few seconds otherwise the references back to earlier SCNs could report error: “ORA-01466: unable to read data – table definition has changed”. This is probably a side effect due to the 3 second interval in the capture that Oracle uses in the table smon_scn_time.

So now we do the following:

  • find the current SCN,
  • start a transaction,
  • get the transaction id,
  • delete the data from the source table,
  • find the current SCN again,
  • insert into the target table the data that was deleted by our transaction in the interval spanned by the two SCNs.

 


rem
rem     insert_delete_2.sql
rem     Jonathan Lewis
rem     Nov 2016
rem

column current_scn new_value m_start_scn 
select to_char(current_scn,'FM999999999999999999999') current_scn from v$database;

variable m_tx_id varchar2(20)
exec :m_tx_id := dbms_transaction.local_transaction_id(true)

column xid new_value m_xid
select xid from v$transaction where xidusn || '.' || xidslot || '.' || xidsqn = :m_tx_id;

delete from t1 where owner = 'SYSTEM';
commit;

column current_scn new_value m_end_scn format 999999999999999999
select to_char(current_scn,'FM999999999999999999999') current_scn from v$database;
 
insert	into t2
select	r.*
from	t1  versions between scn &m_start_scn and &m_end_scn r
where 
	versions_operation = 'D'
and	versions_xid = '&m_xid'
and	owner = 'SYSTEM'
;

commit;

There are a couple of variants on getting the transaction ID – I decided to use a function call to start a transaction without doing any work rather than doing the delete and then finding the transaction id that the delete initiated (I could have linked v$session for my SID to v$transaction after the delete). Because of the choice I made I have to do a little bit of messing around in the subsequent code – the function call returns the transaction ID in the form 31.16.19111 (that’s undo segment, slot number, sequence) but the VERSIONS mechanism wants a transaction ID in its HEX form which, for the example shown, would be ‘1F001000A74A0000’. I could have converted the three part form to the other using a messy bit of to_char(,’XXXXXXXX’) code, but I was feeling a little lazy.

To my surprise I didn’t see any ORA-08187 errors – which made me look back at the notes I had jotted down on the couple of tests I’d initially tried to find out what I had been doing wrong. My first attempt did the insert first then tried to do the delete “as of SCN” and failed, so my second attempt tried to do the delete first just in case the problem related to using “as of SCN” in the middle of a transaction:


column current_scn new_value m_scn

prompt  ============
prompt  Insert first
prompt  ============

select to_char(current_scn,'FM999999999999999999') current_scn from V$database;

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

delete from t1 as of scn &m_scn r
where owner = 'SQLTXADMIN'
;

-- ORA-08171: snapshot expression not allowed here

rollback;

prompt  ============
prompt  Delete first
prompt  ============

select to_char(current_scn,'FM999999999999999999') current_scn from V$database;

delete from t1 as of scn &m_scn r
where owner = 'SQLTXADMIN'
;

-- ORA-08171: snapshot expression not allowed here

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

rollback;

Clearly “delete as of scn” is illegal.

Of course, if I’d gone a little further with this idea I might have tried starting with a delete that didn’t use “as of SCN”, and then the code would have succeeded. In fact, though, this wouldn’t be a perfect solution because it would allow a window for error: some other session might delete a relevant row between my call for SCN and my delete, which means my insert would insert a row deleted by another user.

The code could be modified though in its choice of SCN. Provided I started my transaction with the delete I could then query v$transaction for the start SCN for the transaction, and use that as the “as of” SCN for the insert:


delete from t1
where owner = 'SQLTXADMIN'
;

select
        to_char(
                start_scnw * power(2,32) + start_scnb,
                'FM999999999999999999'
        )       current_scn
from
        v$transaction
where   ses_addr = (
                select  saddr
                from    v$session
                where   sid = (
                        select  sid
                        from    V$mystat
                        where   rownum = 1
                )
        )
;

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

By using the delete to initiate the transaction and set the SCN I think we block any window of inconsistency and, apart from the messy little bit of code that finds the transaction entry, we have an even simpler piece of code than the example give by SydOracle.

Collaboration or, kicking ideas around, is a wonderful way to learn.

 

November 22, 2016

Delete/Insert

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 12:59 pm GMT Nov 22,2016

Many of the questions that appear on OTN are deceptively simple until you start thinking carefully about the implications; one such showed up a little while ago:

What i want to do is to delete rows from table where it matches condition upper(CATEGORY_DESCRIPTION) like ‘%BOOK%’.

At the same time i want these rows to be inserted into other table.

The first problem is this: how carefully does the requirement need to be stated before you can decide how to address it? Trying to imagine awkward scenarios, or boundary conditions, can help to clarify the issue.

If you delete before you insert, how do you find the data to insert ?

If you insert before you delete, what happens if someone updates a row you’ve copied so that it no longer matches the condition. Would it matter if the update changes the row in a way that leaves it matching the condition (what you’ve inserted is not totally consistent with what you’ve deleted).

If you insert before you delete, and someone executes some DML that makes another row match the requirement should you delete it (how do you avoid deleting it) or leave it in place.

Once you start kicking the problem about you’ll probably come to the conclusion that the requirement is for the delete and insert to be self-consistent – in other words what you delete has to be an exact match for what you insert as at the time you inserted it. You’ll ignore rows that come into scope in mid-process due to other activity, and you’ll have to stop people changing rows that are being transferred (in case there’s an audit trail that subsequently says that there was, at some point in time, a row that matched the condition but never arrived – and a row that has arrived that didn’t match the final condition of the rows that disappeared).

Somehow your code needs to lock the set of rows to be transferred and then transfer those rows and eliminate them. There are two “obvious” and simple strategies – readers are invited to propose others (or criticise the two I – or any of the comments – suggest). I’ll start with a simple data setup for testing:


create table t1
as
select  object_id, object_name, owner
from    all_objects
;

alter table t1 add constraint t1_pk primary key(object_id);

create table t2
as
select  * from t1
where   rownum = 0
;

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

Option 1:

The simplest approach is often the best – until, perhaps, you spot the drawbacks – do a basic delete of the data to be transferred (which handles the locking) but wrap the statement in a PL/SQL block that captures the data (using the returning clause) and then inserts it into the target table as efficiently as possible. With thanks to Andrew Sayer who prompted this blog post:

declare
        type t1_rows is table of t1%rowtype;
        t1_deleted t1_rows;

begin
        delete from t1 where owner = 'SYSTEM'
        returning object_id, object_name, owner bulk collect into t1_deleted;

        forall i in 1..t1_deleted.count
                insert into t2 values t1_deleted(i);

        commit;
end;
/

The drawback to this, of course, is that if the volume to be transferred is large (where “large” is probably a fairly subjective measure) then you might not want to risk the volume of memory (PGA) it takes to gather all the data with the bulk collect.

Option 2:

For large volumes of data we could reduce the threat to the PGA by gathering only the rowids of the rows to be transferred (locking the rows as we do so) then do the insert and delete based on the rowids:

declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;
        fetch c1 bulk collect into r;
        close c1;

        forall i in 1..r.count
                insert into t2 select * from t1 where rowid = r(i);

        forall i in 1..r.count
                delete from t1 where rowid = r(i);

        commit;
end;
/

Note, particularly, the “for update” in the driving select.

Inevitably there is a drawback to this strategy as well (on top of the threat that the requirement for memory might still be very large even when the return set is restricted to just rowids). We visit the source data (possibly through a convenient index and avoid visiting the table, [silly error deleted – see comment 8]) to collect rowids; then we visit the data again by rowid (which is usually quite efficient) to copy it, then we visit it again (by rowid) to delete it. That’s potentially a significant increase in buffer cache activity (especially latching) over the simple “delete returning” strategy; moreover the first strategy gives Oracle the option to use the index-driven optimisation for maintaining indexes and this method doesn’t. You might note, by the way, that you could include an “order by rowid” clause on the select; depending on your data distribution and indexes this might reduce the volume of random I/O you have to do as Oracle re-visits the table for the inserts and deletes.

We can address the PGA threat, of course, by fetching the rowids with a limit:


declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;

--      dbms_lock.sleep(60);

        loop
                fetch c1 bulk collect into r limit 5;

                forall i in 1..r.count
                        insert into t2 select * from t1 where rowid = r(i);

                forall i in 1..r.count
                        delete from t1 where rowid = r(i);

                exit when r.count != 5;
        end loop;

        close c1; 

        commit;
end;
/

One thing to be aware of is that even though we fetch the rowids in small batches we lock  all the relevant rows when we open the cursor, so we don’t run into the problem of inserting thousands of rows into t2 and then finding that the next batch we select from t1 has been changed or deleted by another session. (The commented out call to dbms_lock.sleep() was something I included as a way of checking that this claim was true.) This doesn’t stop us running into a locking (or deadlocking) problem, of course; if it takes us 10 seconds to lock 1M rows in our select for update another user might manage to lock our millionth row before we get there; if, a few seconds later, it then gets stuck in a TX/6 wait trying to lock one of our locked rows after we start waiting in a TX/6 wait for our millionth row our session will time out after 3 further seconds with an ORA-00060 deadlock error.

The limit of 5 is just for demonstration purposes, of course – there were 9 rows in all_objects that matched the select predicate; in a production system I’d probably raise the limit as high as 255 (which seems to be the limit of Oracle’s internal array-processing).

You’ll notice, of course, that we can’t use this limited fetch approach with the delete command – the entire delete would take place as we opened the equivalent cursor and, though we can use the bulk collect with the returning clause, there is no syntax that allows something like the fetch with limit to take place.

Discarded option

My first thought was to play around with the AS OF SCN clause.  Select the current SCN from v$database and then do things like delete “as of scn”, or “select for update as of scn” – there were ways of getting close, but invariably I ended up running into Oracle error: “ORA-08187: snapshot expression not allowed here”. But maybe someone else can come up with a way of doing this that doesn’t add significant overheads and doesn’t allow for inconsistent results.

November 14, 2016

Distributed Trap

Filed under: distributed,Oracle — Jonathan Lewis @ 1:19 pm GMT Nov 14,2016

Here’s an interesting (and potentially very useful) observation from an OTN database forum thread that appeared at the end of last week. It concerns the problems of pulling data from remote systems, and I’ll start by building some data:

rem
rem     Script:         remote_insert_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(sysdate) + rownum d1,
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',50)            padding
from
        generator
where
        rownum <= 500
;

alter table t2 add constraint t2_pk primary key(id);

create table t3
as
select sysdate d1, t1.* from t1
where rownum = 0
;

--  Now gather stats if you need to (depending on version)

I’ve created three tables. Table t3 is an empty copy of table t1 with a date column added, and t2 is some sort of reference table that looks rather like table t1 but has a lot less data. Now I’m going to pretend that t1 and t2 are in a remote database while t3 is in the local database and copy data from t1 to t3, omitting any data that is referenced in t2. The SQL is simple:


define m_target=test@loopback

insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

----------------------------------------------------------------------
| Id  | Operation                | Name | Cost (%CPU)| Inst   |IN-OUT|
----------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     0   (0)|        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |            |        |      |
|   2 |   REMOTE                 |      |            |   TEST | R->S |
----------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - EXPLAIN PLAN INTO PLAN_TABLE@! FOR SELECT
       "A1"."ID","A1"."SMALL_VC","A1"."PADDING" FROM "T1" "A1" WHERE
       "A1"."ID"<>ALL (SELECT "A2"."ID" FROM "T2" "A2") (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

I’ve set up an SQL*Plus substitution variable to hold a database link name (and used a loopback qualifier so that I can pretend t1 and t2 are in a remote database. The execution plan (taken from an explain plan, but confirmed by running the query and calling dbms_xplan.display_cursor) shows that Oracle has executed select part of the insert as a “fully remote” statement – which is nice.

Unfortunately I forgot to include a datestamp as I copied the data over. So let’s modify the query to do that:


insert into t3(
        d1,
        id, small_vc, padding
)
select
        sysdate,
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

--------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |      1 |        |   123 (100)|      0 |00:00:27.42 |   10908 |
|   1 |  LOAD TABLE CONVENTIONAL |      |      1 |        |            |      0 |00:00:27.42 |   10908 |
|*  2 |   FILTER                 |      |      1 |        |            |  49500 |00:00:26.51 |       0 |
|   3 |    REMOTE                | T1   |      1 |  50000 |   113   (3)|  50000 |00:00:00.33 |       0 |
|   4 |    REMOTE                | T2   |  50000 |      1 |     0   (0)|    500 |00:00:25.29 |       0 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM  "T2" WHERE "T2"."ID"=:B1))

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","SMALL_VC","PADDING" FROM "T1" "T1"
       (accessing 'TEST.LOCALDOMAIN@LOOPBACK' )

   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID" FROM "T2" "T2" WHERE "ID"=:1 (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

Whoops, the plan just changed – it also took 27.4 seconds instead of the 1.1 seconds that it used to – that’s because of the 50,000 remote calls to execute a remote query for the subquery filter. The addition of sysdate (which is the local sysdate@!) to the select list has made the select statement distributed instead of fully remote, and the query for a CTAS or “insert/select” has to be driven from the local site if it’s a distributed query.

Okay, plan (b), don’t insert sysdate, add it to the table as a default:


alter table t3 modify d1 default sysdate;

This doesn’t help; even though the query doesn’t mention sysdate explicitly the query is still treated as disrtibuted query.

Okay, plan (c) – don’t insert sysdate, add a trigger to the table:

alter table t3 modify d1 default null;

create or replace trigger t3_bri
before insert on t3
for each row
begin
        :new.d1 := sysdate;
end;
/


And this works (in 11.2.0.4 and 12.1.0.2, at least, which are the versions I tested).

I could have made the trigger a little more sophisticated, of course, but the point of the post was simply to demonstrate a problem and a simple workaround.

There are probably several other commonly used features (various sys_context() calls, perhaps) that have the same effect.

 

November 7, 2016

Reorg

Filed under: fragmentation,humour,Oracle — Jonathan Lewis @ 5:31 pm GMT Nov 7,2016

A current question on the OTN database forum asks: “What’s the difference between object and tablespace reorganization?” Here’s an analogy to address the question.

I have three crates of Guiness in the boot (trunk) of my car, one crate has 4 bottles left, one has 7 bottles left and one has 2 bottles. I also have two cases of Louis Roederer Brut NV champagne, one case has 2 bottles left and one has only one. (I have two objects in my tablespace – one of type Beer, one of type Champagne – and my boot requires manual free space management .)

I move all the Guiness bottles into a single crate and all the champagne bottles into a single case. That’s a couple of “shrink space compact” calls – I’ve re-organised the objects to get all the bottles in each object close to each other, but the crates are still taking up space in the boot.

I take the two empty crates and the empty case out of the boot. That’s a couple of “resize” (or “shrink space” without “compact”) calls that free up space in the boot.

I now want to load a port barrel into car, but it won’t fit until I slide the remaining beer crate and champagne case together at one side of the boot. That’s a couple of “move” commands that have reorganized the boot (tablespace) to make the free space usable.

 

Filter Subquery

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 1:04 pm GMT Nov 7,2016

There’s a current thread on the OTN database forum showing an execution plan with a slightly unusual feature. It looks like this:

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2 |   HASH JOIN                              |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |    TABLE ACCESS FULL                     | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |    HASH JOIN                             |                                |    11M|  1837M|   810M| 57206 |       |       |  
|   5 |     INDEX FAST FULL SCAN                 | PDTCOST_BILL_INV_TRACK         |    29M|   475M|       | 16107 |       |       |  
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID    | BILL_INVOICE_DETAIL            |  5840K|   478M|       |     2 |       |       |  
|   7 |      NESTED LOOPS                        |                                |    11M|  1634M|       |     6 |       |       |  
|   8 |       NESTED LOOPS                       |                                |     2 |   120 |       |     3 |       |       |  
|   9 |        TABLE ACCESS FULL                 | JDL_WORK_LIST                  |     2 |    96 |       |     2 |       |       |  
|  10 |        PARTITION RANGE ITERATOR          |                                |       |       |       |       |   KEY |   KEY |  
|  11 |         TABLE ACCESS BY LOCAL INDEX ROWID| BILL_INVOICE                   |     1 |    12 |       |     1 |   KEY |   KEY |  
|* 12 |          INDEX UNIQUE SCAN               | BILL_INVOICE_XSUM_BILL_REF_NO  |     1 |       |       |       |   KEY |   KEY |  
|  13 |       PARTITION RANGE ITERATOR           |                                |       |       |       |       |   KEY |   KEY |  
|* 14 |        INDEX RANGE SCAN                  | BILL_INVOICE_DETAIL_PK         |    32 |       |       |     1 |   KEY |   KEY |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  

Spot the oddity ? If not, here’s a collapsed version of the plan that makes it easier to see – if you were viewing this plan through OEM or one of the other GUI interfaces to execution plans you’d probably be able to do this by clicking on some sort of  “+/-”  symbol by operation 4:

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2 |   HASH JOIN                              |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |    TABLE ACCESS FULL                     | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |    HASH JOIN                             |                                |    11M|  1837M|   810M| 57206 |       |       |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  

How often have you seen a HASH JOIN (operation 2) with three child operations (3, 4, 15) ?

It’s not a formatting error – but since I’ve shown neither the Predicate section of the report nor the original query it’s a little difficult to recognise what’s going on, so here’s the critical part of the original WHERE clause:

AND     P.TRACKING_ID      = PCM.TRACKING_ID  
AND     P.TRACKING_ID_SERV = PCM.TRACKING_ID_SERV  
AND     (   (P.BILLING_INACTIVE_DT IS NULL AND PCM.INACTIVE_DT IS NULL)  
         OR (PCM.ACTIVE_DT = (SELECT MAX(ACTIVE_DT) FROM PDTCOST_CHARGE_MAP PCM1 ))
        )
;  

Operation 4 produces a set of rows derived by joining table P (an alias for pdtcost) to a couple of other tables, and operation 2 joins this to PCM (an alias for pdtcost_change_map) with a simple two-column equality and then introduces a pair of problems: first an “OR SUBQUERY” construct, secondly a predicate that requires data from both tables to be examined before any more rows can be discarded.

Just to clarify the performance implication of this combination of predicates:

If we start from pdtcost (p):

  • If the billing_inactive_dt is null we don’t discard it because satisfires a predicate and we need to check the matching pcm.inactive_dt.
  • If the billing_inactive_dt is NOT null we still can’t discard it because the matching pcm.active_dt may satisfy the subquery predicate.
  • Whatever the state of billing_inactive_dt we have to find the matching pcm row(s)

Starting from pdtcost_charge_map (pcm):

  • We can’t unnest the subquery and use it to drive into p (because of the OR), so we have to scan pcm to apply the subquery.
  • If the active_dt satisfies the subquery we have to find the matching p row.
  • If the active_dt doesn’t satisfy the subquery but pcm_inactive_dt is null we still have to find the matching p row to check the billing_dt.
  • The only time we don’t need to probe p for a match is if the active_dt doesn’t match the subquery and the inactive_dt is not null – which tells us that for a very specific data pattern we have the potential for a (relatively) efficient access path; however this path would require the optimizer to test one part of an OR predicate at one operation in the plan and the second part of the OR predicate at a different operation of the plan and it’s not programmed to do that, so the entire compound predicate test is always run late.

Returning to the question of interpreting this plan with three child operations for a hash join – what does it mean and how does it work ? In effect the plan is the wrong shape – it has concealed a filter operation.  As the join between the two tables takes place the rows are tested against the simple filter condition – each row that satisfies the predicate is passed to the next operation of the plan; for any row doesn’t satisfy the simple filter predicate the subquery is executed to provide a check against active_dt (fortunately, since this is a “constant” subquery we benefit enormously from scalar subquery caching and the subquery will run a most once in the lifetime of the whole query.)

The plan would probably be easier to understand if it looked like this (which may actually be how it would have looked in Oracle 8i):

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2a|   FILTER                                 |                                |   140K|    26M|  1293M|   133K|       |       |  
|*  2b|    HASH JOIN                             |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |     TABLE ACCESS FULL                    | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |     HASH JOIN                            |                                |    11M|  1837M|   810M| 57206 |       |       |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  
 

Predicate Information (identified by operation id):
---------------------------------------------------
   2a - filter("P"."BILLING_INACTIVE_DT" IS NULL AND "PCM"."INACTIVE_DT" IS NULL
               OR "PCM"."ACTIVE_DT"= (SELECT MAX("ACTIVE_DT") FROM "PDTCOST_CHARGE_MAP" "PCM1"))

   2b - access("P"."TRACKING_ID"="PCM"."TRACKING_ID" AND
               "P"."TRACKING_ID_SERV"="PCM"."TRACKING_ID_SERV")

This modified plan makes it clear that the hash join (2b) is followed by execution of the filter (2a) subquery (though we can safely infer that the subquery runs only for join rows where at least one of p.billing_inactive_dt or pcm.inactive_dt is not null).

You might wonder whether Oracle actually runs the subquery once at a very early point in the query so that it can, effectively, turn the subquery predicate into “active_dt = {derived constant}” – it’s fairly easy to show that this isn’t the case. Perhaps the most obvious way to do this is to run the query with rowsource execution stats enabled after setting billing_inactive_dt and inactive_dt to null for every row in their respective tables – because if you do that the subquery won’t be run at all.

If you want to experiment with this problem, here’s some code to model it:


drop table pdtcost purge;
drop table pdtcost_charge_map purge;

create table pdtcost
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum,100)                 filter_col,
        rownum                          tracking_id,
        rownum                          tracking_id_serv,
        decode(
                mod(rownum,97),
                0 , trunc(sysdate),
                    null
        )                               billing_inactive_dt,
/*
        to_date(null)                   billing_inactive_dt,
*/
        lpad('x',100,'x')               padding
from
        generator       v2
where
        rownum <= 1e4
;

alter table pdtcost add constraint pdt_pk primary key(tracking_id, tracking_id_serv);

create table pdtcost_charge_map
nologging
as
with generator as (
        select
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum                          tracking_id,
        rownum                          tracking_id_serv,
        decode(
                mod(rownum,93), 
                0 , trunc(sysdate),
                    null
        )                               inactive_dt,
/*
        to_date(null)                   inactive_dt,    
*/
        trunc(sysdate + dbms_random.value(-100,0))      active_dt,
        lpad('x',100,'x')               padding
from
        generator       v2
where
        rownum <= 1e4
;

alter table pdtcost_charge_map add constraint pcm_pk primary key(tracking_id, tracking_id_serv, active_dt);
-- create index pcm_act_dt on pdtcost_charge_map(active_dt);

-- gather basic table stats if your version needs to.

select
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where
        p.filter_col = 0
and     p.tracking_id      = pcm.tracking_id
and     p.tracking_id_serv = pcm.tracking_id_serv
and     (   (p.billing_inactive_dt is null and pcm.inactive_dt is null)
         or (pcm.active_dt = (select max(active_dt) from pdtcost_charge_map pcm1 ))
        )
;

The original question started with a table of 30 million rows and a result set of only 450 rows – suggesting that there ought to be a lot of scope for finding ways to eliminate data early. One possibility, assuming the appropriate indexes exist (which is why I have defined, but commented out, the pcm_act_dt index above), is to convert this query into a union all (taking care to eliminate duplication in the result set) in the following way:

select
        /*+ leading(p pcm) use_nl(pcm) */
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where
        p.filter_col = 0
and     pcm.tracking_id      = p.tracking_id
and     pcm.tracking_id_serv = p.tracking_id_serv
and     (p.billing_inactive_dt is null and pcm.inactive_dt is null)
union all
select
        /*+ leading(p pcm) use_nl(pcm) */
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where   
        p.filter_col = 0
and     pcm.tracking_id      = p.tracking_id   
and     pcm.tracking_id_serv = p.tracking_id_serv   
and     (p.billing_inactive_dt is not null or pcm.inactive_dt is not null)
and     pcm.active_dt = (select /*+ unnest */ max(active_dt) from pdtcost_charge_map pcm1)
;

Here is the resulting execution plan when the pcm_act_dt index exists. I had to hint the table order and join mechanism because my tables were rather small and the selectivity relatively high – it’s probably safe to assume that selectivities are much better on the original data set and that a path like this is more likely to be chosen unhinted (the full tablescan on pdtcost is irrelevant in the context of the demonstration):


---------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                    |      1 |        |     98 |00:00:00.06 |     657 |
|   1 |  UNION-ALL                     |                    |      1 |        |     98 |00:00:00.06 |     657 |
|   2 |   NESTED LOOPS                 |                    |      1 |     99 |     98 |00:00:00.05 |     386 |
|   3 |    NESTED LOOPS                |                    |      1 |     99 |     99 |00:00:00.05 |     287 |
|*  4 |     TABLE ACCESS FULL          | PDTCOST            |      1 |     99 |     99 |00:00:00.01 |     173 |
|*  5 |     INDEX RANGE SCAN           | PCM_PK             |     99 |      1 |     99 |00:00:00.01 |     114 |
|*  6 |    TABLE ACCESS BY INDEX ROWID | PDTCOST_CHARGE_MAP |     99 |      1 |     98 |00:00:00.01 |      99 |
|   7 |   NESTED LOOPS                 |                    |      1 |      2 |      0 |00:00:00.01 |     271 |
|   8 |    NESTED LOOPS                |                    |      1 |    100 |      1 |00:00:00.01 |     270 |
|*  9 |     TABLE ACCESS FULL          | PDTCOST            |      1 |    100 |    100 |00:00:00.01 |     166 |
|* 10 |     INDEX UNIQUE SCAN          | PCM_PK             |    100 |      1 |      1 |00:00:00.01 |     104 |
|  11 |      SORT AGGREGATE            |                    |      1 |      1 |      1 |00:00:00.01 |       2 |
|  12 |       INDEX FULL SCAN (MIN/MAX)| PCM_ACT_DT         |      1 |      1 |      1 |00:00:00.01 |       2 |
|* 13 |    TABLE ACCESS BY INDEX ROWID | PDTCOST_CHARGE_MAP |      1 |      1 |      0 |00:00:00.01 |       1 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("P"."FILTER_COL"=0 AND "P"."BILLING_INACTIVE_DT" IS NULL))
   5 - access("PCM"."TRACKING_ID"="P"."TRACKING_ID" AND
              "PCM"."TRACKING_ID_SERV"="P"."TRACKING_ID_SERV")
   6 - filter("PCM"."INACTIVE_DT" IS NULL)
   9 - filter("P"."FILTER_COL"=0)
  10 - access("PCM"."TRACKING_ID"="P"."TRACKING_ID" AND
              "PCM"."TRACKING_ID_SERV"="P"."TRACKING_ID_SERV" AND "PCM"."ACTIVE_DT"=)
  13 - filter(("P"."BILLING_INACTIVE_DT" IS NOT NULL OR "PCM"."INACTIVE_DT" IS NOT NULL))


You’ll notice that this plan also displays an interesting little quirk – at operation 10 we can see the index unique scan of index pcm_act_dt that occurs once for each row returned from pdtcost; but each unique scan is preceded by a call to run the subquery (except that scalar subquery caching means the subquery runs only once in total) to supply a value for active_dt that can be used in the unique scan. (In the absence of the pcm_act_dt index the full scan min/max would be a fast full scan of the primary key.)

With a little luck the OP will be able to apply the same strategy to his query, though it may be a little harder to get the desired plan since the original query includes 6 tables; but the principle doesn’t change.

Footnote:

various people on the OTN thread have pointed out that there are some odd details about the optimizers cardinality predictions which may mean that part of the problem is simply an issue of misleading (possibly out of date) object statistics. It’s possible that with better estimates the optimizer may change the plan so much that even the strategy of getting all the rows from pdtcost_charge_map related to the rows acquired from pdtcost and then eliminating based on a late filter may be efficient enough for the OP.  By changing the data volume and distribution in my test case one of the plans (which predicted 100 rows from 100,000) was as follows:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |      1 |        |     98 |00:00:01.58 |    2307 |
|   1 |  NESTED LOOPS                |                    |      1 |     98 |     98 |00:00:01.58 |    2307 |
|   2 |   NESTED LOOPS               |                    |      1 |    100 |    100 |00:00:00.01 |    1814 |
|*  3 |    TABLE ACCESS FULL         | PDTCOST            |      1 |    100 |    100 |00:00:00.01 |    1699 |
|*  4 |    INDEX RANGE SCAN          | PCM_PK             |    100 |      1 |    100 |00:00:00.01 |     115 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| PDTCOST_CHARGE_MAP |    100 |      1 |     98 |00:00:01.57 |     493 |
|   6 |    SORT AGGREGATE            |                    |      1 |      1 |      1 |00:00:01.57 |     393 |
|   7 |     INDEX FAST FULL SCAN     | PCM_PK             |      1 |    100K|    100K|00:00:01.02 |     393 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("P"."FILTER_COL"=0)
   4 - access("P"."TRACKING_ID"="PCM"."TRACKING_ID" AND
              "P"."TRACKING_ID_SERV"="PCM"."TRACKING_ID_SERV")
   5 - filter((("P"."BILLING_INACTIVE_DT" IS NULL AND "PCM"."INACTIVE_DT" IS NULL) OR
              "PCM"."ACTIVE_DT"=))

October 24, 2016

Anniversary OICA

Filed under: CBO,Oracle,Performance,Statistics,Troubleshooting — Jonathan Lewis @ 1:00 pm GMT Oct 24,2016

Happy anniversary to me!

On this day 10 years ago I published the first article in my blog. It was about the parameter optimizer_index_cost_adj (hence OICA), a parameter that has been a  source of many performance problems and baffled DBAs over the years and, if you read my first blog posting and follow the links, a parameter that should almost certainly be left untouched.

It seems appropriate to mention it today because I recently found a blog posting (dated 3rd May 2013) on the official Oracle Blogs where the director for Primavera advises setting this parameter to 1 (and the optimizer_index_caching parameter to 90) for the Primavera P6 OLTP (PMDB) database. The recommendation is followed by a fairly typical “don’t blame me” warning, viz: “As with any changes that affect query optimization, it is paramount to TEST, TEST and TEST again. At least these settings are easily adjusted or change back to the original value”.

Here’s a thought, though: setting the optimizer_index_cost_adj to the extreme value 1 is a catastrophic change so don’t suggest it unless you are extremely confident that it’s almost certain to be the right thing to do. If you’re confident that it’s a good idea to reduce the parameter to a much smaller value than the default then suggest a range of values that varies from “ideal if it works, but high risk” to “low risk and mostly helpful”. Maybe a suggestion like: “Primavera P6 OLTP (PMDB) tends to work best with this parameter set to a value in the range of 1 to 15” would be a more appropriate comment from someone in a position of authority.

Here’s another thought: if you work for Oracle you could always contact the optimizer group to present them with your argument for the strategy and see what they think about it. Then you can include their opinion when you offer your suggestion.

For what it’s worth, here’s my opinion: as a general rule you shouldn’t be working around performance issues by fiddling with the optimizer_index_cost_adj; as a specific directive do not set it to 1. If you want to encourage Oracle to be enthusiastic about indexes in general then adjust the system statistics (preferably with a degree of truth). If you need to persuade Oracle that particular indexes are highly desirable than you can use dbms_stats.set_index_stats() to adjust the clustering_factor (and avg_data_blocks_per_key) of those indexes. If you are running 11.2.0.4 or later then you can use dbms_stats.set_table_prefs() to set the “table_cached_blocks” parameter for tables where you think Oracle should be particularly keen on using indexes but isn’t; and if your queries are suffering from bad cardinality estimates because of a pattern of multi-column filter predicates create some column group (extended) statistics.

Why am I so firmly set against setting the optimizer_index_cost_adj to 1 ? Because it doesn’t tell Oracle to “use indexes instead of doing tablescans”, it tells Oracle that every index is just about as good as every other index for almost any query. Here’s a pdf file of an article (formerly published on DBAZine and then on my old website) I wrote over twelve years ago explaining the issue. Various links in the article no longer work, and the data pattern was generated to display the problem in 8i and 9i and you would need to modify the data to display the same effect in newer versions of Oracle – but the principle remains the same.

If you would like to see a slightly newer example of how the parameter causes problems. Here’s a thread dated April 2012 from the OTN database forum where a SYS-recursive query caused a performance problem because the parameter was set 1.

 

October 20, 2016

Conjuctive Normal Form

Filed under: 12c,Exadata,in-memory,Oracle — Jonathan Lewis @ 1:00 pm GMT Oct 20,2016

I recently tweeted about a comment I’d picked up at the Trivadis performance days regarding tablescans and performance.

“If you can write your SQL in conjunctive normal form it can help the optimizer to offload more predicates”

Inevitably someone asked me if I had an example to demonstrate this – I didn’t, and still don’t really, but here’s an interesting demo based on an example from the Oracle In-Memory blog showing how the optimizer will rearrange your filter predicates before passing them to the tablescan code for evaluation against an inmemory table.


rem
rem     Script:         in_memory_conjunctive.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        trunc(dbms_random.value(1,501)) qty,
        mod(rownum,200) + 1             part_no,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7
;
prompt  ==========
prompt  Base query
prompt  ==========

select
        count(v1)
from
        t1
where
        (qty > 495 or (qty < 3 and part_no = 50))
;
prompt  ===============
prompt  predicate added
prompt  ===============

select
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;
prompt  =================
prompt  Ordered predicate
prompt  =================

select  /*+ ordered_predicates */
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;

The 2nd and 3rd queries add a predicate to the first query – which, unfortunately, changes the estimated cardinality even though it has no effect on the result. This predicate is one that would be added by the inmemory code path if the table were declared to be inmemory. I’ve got two versions of the query, one with the (deprecated) ordered_predicates hint because in my initial tests the optimizer swapped the order of the predicates and I wanted to see if the ordering was at all critical.

Here’s the plan for the base query – first before declaring the table inmemory, then after declaring the table inmemory:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14739 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  1862K| 14739   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |       |       |  1974 (100)|          |
|   1 |  SORT AGGREGATE             |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   100K|  1862K|  1974  (44)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))) filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))

And here, after putting the table back to no inmemory are the plans for the second and third queries; note, particularly the different order of the predicates in the predicate section: the predicate order matches the inmemory predicate order only if I use the ordered_predicates hint:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)) AND ("QTY">495
              OR "QTY"<3)))
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND
              "PART_NO"=50))))

Finally the run times – after running the queries a few times each to check for consistency:

  • Base query: 0.82 seconds
  • Query with extra predicate: 0.86 seconds
  • Query with extra predicate and forced order of predicate evaluation: 0.71 seconds

The query with the predicate arrangement matching the inmemory rewrite actually ran 13% faster than the original. Unfortunatly the rewrite without the ordered_predicates hint ran slower – which is a bit of a shame but understandable – the first predicate is the more complex, and then the code has to run a completely redundant second predicate; I was a little surprised at how much slower it was, but the table is 10M rows and we’re only looking at sub-second times anyway.

My table was fully cached and just under 112,000 blocks, so not very large, and this was running a serial query on a basic Oracle instance. Nevetheless there is a difference in execution time that is more than just “random noise” – If this is an indication of how a little unsightly tweaking of SQL for small data sets can make a difference, you can imagine that there might be a worthwhile benefit to considering ways of tweaking your predicates that make a significant difference to execution time if the extra predicates end up being pushed down to storage on an Exadata machine.

Footnote:

Another “not quite” example I happen to have written about a few months ago is a case where rewriting “not exists() OR not exists() OR not exists()” as “not (exists() AND exists() AND exists())” allowed Oracle to rewrite three subqueries as a single subquery with three-table join.

 

October 17, 2016

Fixed Stats

Filed under: Execution plans,Oracle,Statistics,Statspack,Troubleshooting — Jonathan Lewis @ 12:43 pm GMT Oct 17,2016

There are quite a lot of systems around the world that aren’t using the AWR (automatic workload repository) and ASH (active session history) tools to help them with trouble shooting because of the licensing requirement – so I’m still finding plenty of sites that are using Statspack and I recently came across a little oddity at one of these sites that I hadn’t noticed before: one of the Statspack snapshot statements was appearing fairly regularly in the Statspack report under the “SQL Ordered by Elapsed Time” section – even when the application had been rather busy and had generated lots of other work that was being reported. It was the following statement – the collection of file-level statistics:


select
       ts.name      tsname
     , df.name      filename
     , fs.phyrds
     , fs.phywrts
     , fs.readtim
     , fs.writetim
     , fs.singleblkrds
     , fs.phyblkrd
     , fs.phyblkwrt
     , fs.singleblkrdtim
     , fw.count     wait_count
     , fw.time      time
     , df.file#
  from x$kcbfwait   fw
     , v$filestat   fs
     , v$tablespace ts
     , v$datafile   df
 where ts.ts#    = df.ts#
   and fs.file#  = df.file#
   and fw.indx+1 = df.file#
;

The execution plan didn’t look very friendly, and the volume of I/O it generated (several hundred thousand disk reads) was surprising. The reason why the statement stood out so much in this case was that there was a fairly large number of files in the database (over 1,000) and the default execution plan was showing very bad cardinality estimates that resulted in highly inappropriate cartesian merge joins. At best the statement was taking around 2 minutes to run, at worst it was much, much worse.

This system was running 10g – also something which is still fairly common, though becoming much scarcer – which produced the following execution plan (which  I’ve recreated on a much smaller system):

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   535 |     2 (100)| 00:00:01 |
|*  1 |  HASH JOIN                    |                 |     1 |   535 |     2 (100)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN        |                 |     5 |  1190 |     0   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |                 |     1 |   199 |     0   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN      |                 |     1 |   173 |     0   (0)| 00:00:01 |
|*  5 |      FIXED TABLE FULL         | X$KCCTS         |     1 |    43 |     0   (0)| 00:00:01 |
|   6 |      BUFFER SORT              |                 |     1 |   130 |     0   (0)| 00:00:01 |
|*  7 |       FIXED TABLE FULL        | X$KCFIO         |     1 |   130 |     0   (0)| 00:00:01 |
|*  8 |     FIXED TABLE FIXED INDEX   | X$KCCFE (ind:1) |     1 |    26 |     0   (0)| 00:00:01 |
|   9 |    BUFFER SORT                |                 |   100 |  3900 |     0   (0)| 00:00:01 |
|  10 |     FIXED TABLE FULL          | X$KCBFWAIT      |   100 |  3900 |     0   (0)| 00:00:01 |
|* 11 |   VIEW                        | GV$DATAFILE     |     1 |   297 |     1 (100)| 00:00:01 |
|  12 |    SORT ORDER BY              |                 |     1 |   957 |     1 (100)| 00:00:01 |
|  13 |     NESTED LOOPS              |                 |     1 |   957 |     0   (0)| 00:00:01 |
|  14 |      NESTED LOOPS             |                 |     1 |   647 |     0   (0)| 00:00:01 |
|  15 |       NESTED LOOPS            |                 |     1 |   371 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FULL       | X$KCCFN         |     1 |   323 |     0   (0)| 00:00:01 |
|* 17 |        FIXED TABLE FIXED INDEX| X$KCVFH (ind:1) |     1 |    48 |     0   (0)| 00:00:01 |
|* 18 |       FIXED TABLE FIXED INDEX | X$KCCFE (ind:1) |     1 |   276 |     0   (0)| 00:00:01 |
|* 19 |      FIXED TABLE FULL         | X$KCCFN         |     1 |   310 |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TSTSN"="TS#" AND "K"."KCFIOFNO"="FILE#" AND "FILE#"="FW"."INDX"+1)
   5 - filter("TSTSN"<>(-1) AND "INST_ID"=USERENV('INSTANCE'))
   7 - filter("K"."INST_ID"=USERENV('INSTANCE'))
   8 - filter("F"."FEDUP"<>0 AND "F"."FENUM"="K"."KCFIOFNO")
  11 - filter("INST_ID"=USERENV('INSTANCE'))
  16 - filter("FN"."FNNAM" IS NOT NULL AND "FN"."FNTYP"=4 AND BITAND("FN"."FNFLG",4)<>4)
  17 - filter("FN"."FNFNO"="FH"."HXFIL")
  18 - filter("FE"."FEDUP"<>0 AND "FN"."FNFNO"="FE"."FENUM" AND
              "FE"."FEFNH"="FN"."FNNUM" AND "FE"."FETSN"<>(-1))
  19 - filter("FE"."FEPAX"<>65535 AND "FE"."FEPAX"<>0 AND "FE"."FEPAX"="FNAUX"."FNNUM"
              OR ("FE"."FEPAX"=0 OR "FE"."FEPAX"=65535) AND "FE"."FENUM"="FNAUX"."FNFNO" AND
              "FNAUX"."FNTYP"=4 AND "FNAUX"."FNNAM" IS NOT NULL AND BITAND("FNAUX"."FNFLG",4)<>4 AND
              "FE"."FEFNH"="FNAUX"."FNNUM")

Note particularly the two Cartesian merge joins and the very late filter at operation 1.

Note also the number of times the cardinality estimate is 1 – always a bit of a threat when the query gets complicated: “anything goes following a one for Rows”.

The easy (first thought) solution was simply to gather stats on all the fixed objects in this query:


begin
        dbms_stats.gather_table_stats('sys','x$kcbfwait',method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccfe',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccfn',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccts',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kcfio',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kcvfh',   method_opt=>'for all columns size 1');
end;
/

The option to gather fixed objects stats individually with a call to dbms_stats.gather_table_stats() is not commonly known, but it does work.

Here’s the plan (again from the small system) after stats collection:

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |    29 | 10411 |     3 (100)| 00:00:01 |
|*  1 |  HASH JOIN                     |                 |    29 | 10411 |     3 (100)| 00:00:01 |
|*  2 |   HASH JOIN                    |                 |     1 |   350 |     2 (100)| 00:00:01 |
|   3 |    MERGE JOIN CARTESIAN        |                 |    25 |  1325 |     0   (0)| 00:00:01 |
|   4 |     NESTED LOOPS               |                 |     4 |   148 |     0   (0)| 00:00:01 |
|*  5 |      FIXED TABLE FULL          | X$KCFIO         |   200 |  6200 |     0   (0)| 00:00:01 |
|*  6 |      FIXED TABLE FIXED INDEX   | X$KCCFE (ind:1) |     1 |     6 |     0   (0)| 00:00:01 |
|   7 |     BUFFER SORT                |                 |     7 |   112 |     0   (0)| 00:00:01 |
|*  8 |      FIXED TABLE FULL          | X$KCCTS         |     7 |   112 |     0   (0)| 00:00:01 |
|*  9 |    VIEW                        | GV$DATAFILE     |     1 |   297 |     1 (100)| 00:00:01 |
|  10 |     SORT ORDER BY              |                 |     1 |   316 |     1 (100)| 00:00:01 |
|  11 |      NESTED LOOPS              |                 |     1 |   316 |     0   (0)| 00:00:01 |
|  12 |       NESTED LOOPS             |                 |     1 |   248 |     0   (0)| 00:00:01 |
|  13 |        NESTED LOOPS            |                 |     1 |   226 |     0   (0)| 00:00:01 |
|* 14 |         FIXED TABLE FULL       | X$KCCFE         |     4 |   612 |     0   (0)| 00:00:01 |
|* 15 |         FIXED TABLE FIXED INDEX| X$KCCFN (ind:1) |     1 |    73 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FIXED INDEX | X$KCVFH (ind:1) |     1 |    22 |     0   (0)| 00:00:01 |
|* 17 |       FIXED TABLE FULL         | X$KCCFN         |     1 |    68 |     0   (0)| 00:00:01 |
|  18 |   FIXED TABLE FULL             | X$KCBFWAIT      |   400 |  3600 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("FILE#"="FW"."INDX"+1)
   2 - access("TSTSN"="TS#" AND "K"."KCFIOFNO"="FILE#")
   5 - filter("K"."INST_ID"=USERENV('INSTANCE'))
   6 - filter("F"."FEDUP"<>0 AND "F"."FENUM"="K"."KCFIOFNO")
   8 - filter("TSTSN"<>(-1) AND "INST_ID"=USERENV('INSTANCE'))
   9 - filter("INST_ID"=USERENV('INSTANCE'))
  14 - filter("FE"."FEDUP"<>0 AND "FE"."FETSN"<>(-1))
  15 - filter("FN"."FNTYP"=4 AND "FN"."FNNAM" IS NOT NULL AND BITAND("FN"."FNFLG",4)<>4
              AND "FN"."FNFNO"="FE"."FENUM" AND "FE"."FEFNH"="FN"."FNNUM")
  16 - filter("FN"."FNFNO"="FH"."HXFIL")
  17 - filter("FE"."FEPAX"<>65535 AND "FE"."FEPAX"<>0 AND "FE"."FEPAX"="FNAUX"."FNNUM" OR
              ("FE"."FEPAX"=0 OR "FE"."FEPAX"=65535) AND "FE"."FENUM"="FNAUX"."FNFNO" AND
              "FNAUX"."FNTYP"=4 AND "FNAUX"."FNNAM" IS NOT NULL AND BITAND("FNAUX"."FNFLG",4)<>4 AND
              "FE"."FEFNH"="FNAUX"."FNNUM")

Note the changes in cardinality estimates: they now look a little more realistic and we’re down to one cartesian merge join which (if you have a rough idea of what your X$ tables hold) still looks a little surprising at first sight but not completely unreasonable. A change of plan doesn’t necessarily mean much without the data and time behind it, of course, so here are the two sets of results from a 10g database with a handful of datafiles and tablespaces showing the Row Source Operation sections from the tkprof output before and after stats collection:

Before stats collection:

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  HASH JOIN  (cr=0 pr=0 pw=0 time=1957860 us)
  16800   MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=33855662 us)
     42    NESTED LOOPS  (cr=0 pr=0 pw=0 time=73795 us)
   1400     MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=21555 us)
      7      FIXED TABLE FULL X$KCCTS (cr=0 pr=0 pw=0 time=3204 us)
   1400      BUFFER SORT (cr=0 pr=0 pw=0 time=7233 us)
    200       FIXED TABLE FULL X$KCFIO (cr=0 pr=0 pw=0 time=1210 us)
     42     FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=1859092 us)
  16800    BUFFER SORT (cr=0 pr=0 pw=0 time=67643 us)
    400     FIXED TABLE FULL X$KCBFWAIT (cr=0 pr=0 pw=0 time=2008 us)
      6   VIEW  GV$DATAFILE (cr=0 pr=0 pw=0 time=68087 us)
      6    SORT ORDER BY (cr=0 pr=0 pw=0 time=68065 us)
      6     NESTED LOOPS  (cr=0 pr=0 pw=0 time=65989 us)
      6      NESTED LOOPS  (cr=0 pr=0 pw=0 time=56632 us)
      6       NESTED LOOPS  (cr=0 pr=0 pw=0 time=47217 us)
      6        FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=19830 us)
      6        FIXED TABLE FIXED INDEX X$KCVFH (ind:1) (cr=0 pr=0 pw=0 time=25568 us)
      6       FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=9849 us)
      6      FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=9715 us)

After stats collection:


Rows     Row Source Operation
-------  ---------------------------------------------------
      6  HASH JOIN  (cr=0 pr=0 pw=0 time=196576 us)
      6   HASH JOIN  (cr=0 pr=0 pw=0 time=195829 us)
     42    MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=4390 us)
      6     NESTED LOOPS  (cr=0 pr=0 pw=0 time=7810 us)
    200      FIXED TABLE FULL X$KCFIO (cr=0 pr=0 pw=0 time=1224 us)
      6      FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=150150 us)
     42     BUFFER SORT (cr=0 pr=0 pw=0 time=1574 us)
      7      FIXED TABLE FULL X$KCCTS (cr=0 pr=0 pw=0 time=1353 us)
      6    VIEW  GV$DATAFILE (cr=0 pr=0 pw=0 time=41058 us)
      6     SORT ORDER BY (cr=0 pr=0 pw=0 time=41005 us)
      6      NESTED LOOPS  (cr=0 pr=0 pw=0 time=39399 us)
      6       NESTED LOOPS  (cr=0 pr=0 pw=0 time=34229 us)
      6        NESTED LOOPS  (cr=0 pr=0 pw=0 time=15583 us)
      6         FIXED TABLE FULL X$KCCFE (cr=0 pr=0 pw=0 time=1124 us)
      6         FIXED TABLE FIXED INDEX X$KCCFN (ind:1) (cr=0 pr=0 pw=0 time=15067 us)
      6        FIXED TABLE FIXED INDEX X$KCVFH (ind:1) (cr=0 pr=0 pw=0 time=18971 us)
      6       FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=5581 us)
    400   FIXED TABLE FULL X$KCBFWAIT (cr=0 pr=0 pw=0 time=1615 us)

The execution time has dropped from about 2 seconds to less than 1/10th of a second – and all I’ve got is 6 or 7 files and tablespaces.  (Some of the “time=” values in the first plan are very odd, but the final time figure is about right.)

Generating an intermediate result set of 16,800 rows for a tiny number of files is not a good thing – just imagine how big that number would get with 1,000 files and a couple of hundred tablespaces.

I have to say that (for a couple of tiny databases) the 11.2.0.4 and 12.1.0.2 systems I checked this query on had no problem and immediately picked a sensible path. It’s possible that the definition of some of the v$ objects has actually changed or that the optimizer features have changed (some assistence from complex view merging, perhaps) – but if you are still running Statspack, even if it’s on 11g or 12c, then it’s worth checking from time to time how much work goes into executing the snapshot and seeing if you need some fixed object stats to make things a little more efficient.

Footnote:

Several years ago I wrote a short note about how Statspack actually captured its own execution time (from 10g onwards) and how you could run a report on it to check the run time. It’s worth running that report from time to time. I’ve recently updated that note to add the equivalent query against the AWR.

October 11, 2016

#ThanksOTN

Filed under: Non-technical,Oracle — Jonathan Lewis @ 6:57 pm GMT Oct 11,2016

To mark the OTN Appreciation Day I’d like to offer this thought:

“Our favourite feature is execution plans … execution plans and rowsource execution statistics … rowsource execution statistics and execution plans …  our two favourite features and rowsource execution stats and execution plans … and ruthless use of SQL monitoring …. Our three favourite features are rowsource execution stats, execution plans, ruthless use of SQL monitoring and an almost fanatical devotion to the Cost Based Optimizer …. Our four … no … amongst our favourite features  are such elements as rowsource execution statistics, execution plans …. I’ll come in again.”

With apologies to Monty Python.

 

 

 

October 10, 2016

InMemory Bonus

Filed under: 12c,in-memory,Oracle — Jonathan Lewis @ 1:13 pm GMT Oct 10,2016

It should be fairly well known by now that when you enable the 12c InMemory (Columnar Store) option (and set the inmemory_size) your SQL may take advantage of a new optimizer transformation know as the Vector Transformation, including Vector Aggregation. You may be a little surprised to learn, though, that some of your plans may change even when they don’t produce any sign of a vector transformation as a consequence. This is because In-Memory Column Store isn’t just about doing tablescans very quickly it’s also about new code paths for doing clever things with predicates to squeeze all the extra benefits from the technology. Here’s an example:


rem
rem     Script:         12c_inmemory_surprise.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2016
rem
rem     Last tested
rem             12.1.0.2
rem

drop table t2 purge;
drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create index t1_n1   on t1(n1)   nologging;
create index t2_rand on t2(rand) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns (n1,n2) size 1'
        );
end;
/

There’s nothing particularly special about these two tables – I engineered them for a different demo, and the call to gather extended stats on the column group (n1, n2) is just a minor detail in getting better cardinality estimates in the upcoming plans. At this point I haven’t specified that either table should be in memory, so let’s see what plan I get from dbms_xplan.display_cursor() when I run a query that should do a hash join using t1 as the build table and t2 as the probe table:


select
        /*+
                qb_name(main)
        */
        count(*)
from    (
        select
                /*+ qb_name(inline) */
                distinct t1.v1, t2.v1
        from
                t1,t2
        where
                t1.n1 = 50
        and     t1.n2 = 50
        and     t2.rand = t1.id
        )
;

select * from table(dbms_xplan.display_cursor);

Plan hash value: 1718706536

-------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |       |       |       |  2441 (100)|          |
|   1 |  SORT AGGREGATE                         |       |     1 |       |            |          |
|   2 |   VIEW                                  |       | 10001 |       |  2441   (4)| 00:00:01 |
|   3 |    HASH UNIQUE                          |       | 10001 |   351K|  2441   (4)| 00:00:01 |
|*  4 |     HASH JOIN                           |       | 10001 |   351K|  2439   (4)| 00:00:01 |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | T1_N1 |   100 |       |     1   (0)| 00:00:01 |
|   7 |      TABLE ACCESS FULL                  | T2    |  1000K|    14M|  2416   (4)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   5 - filter("T1"."N2"=50)
   6 - access("T1"."N1"=50)

Thanks to the column group the optimizer has estimated correctly that the number of rows selected from t1 would be 100. Beyond that there’s very little exciting about this execution plan.

So let’s modify t2 to be in-memory and see how the plan changes as we re-execute the query:


alter table t2 inmemory;

select
        /*+ qb_name(main) */
        count(*)
...

select * from table(...);


Plan hash value: 106371239

----------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |         |       |       |   259 (100)|          |
|   1 |  SORT AGGREGATE                          |         |     1 |       |            |          |
|   2 |   VIEW                                   |         | 10001 |       |   259  (27)| 00:00:01 |
|   3 |    HASH UNIQUE                           |         | 10001 |   351K|   259  (27)| 00:00:01 |
|*  4 |     HASH JOIN                            |         | 10001 |   351K|   257  (26)| 00:00:01 |
|   5 |      JOIN FILTER CREATE                  | :BF0000 |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1      |   100 |  2100 |     3   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_N1   |   100 |       |     1   (0)| 00:00:01 |
|   8 |      JOIN FILTER USE                     | :BF0000 |  1000K|    14M|   234  (20)| 00:00:01 |
|*  9 |       TABLE ACCESS INMEMORY FULL         | T2      |  1000K|    14M|   234  (20)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   6 - filter("T1"."N2"=50)
   7 - access("T1"."N1"=50)
   9 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))

The cost of the tablescan drops dramatically as the optimizer assumes the table will be in the in-memory column store (IMCS) but (in manys way more significantly) we suddenly have a (serial) Bloom filter for a hash join – which eliminates (most of) the data that wouldn’t have survived the hash join without having to use the CPU that would normally be spent probing the build table.

This is an interesting example of what the in-memory code path can do for us. There’s no point in using the Bloom filter for a serial hash join in “classic” Oracle because the Bloom filter is basically the bitmap of the hash table that is the first thing Oracle examines when probing the hash table – but with “in-memory” Oracle there are some enhancements to the Bloom filter and the code path that make using the Bloom filter an effective strategy. Most significant, perhaps, is that the in-memory code path can use SIMD instructions to perform multiple probes from t2 simultaneously, so not only do we get the benefits of avoiding disk access, buffer activity and row-by-row access to columns, we also reduce the CPU time spent on making the first-stage comparisons of the hash join. (And shared columnar dictionaries in 12.2 could reduce this even further!)

Footnote: I also have a note I scribbled dowsn at the Trivadis performance days last month that the Bloom filter used with IMCS carries the actual low and high values from the build table.  I may have misinterpreted this as I wrote it, but if that’s correct then it’s another step in eliminating data very quickly and very early when using IMCS (or Exadata if the same feature exists in the Bloom filters that get pushed to the storage servers.)

 

October 6, 2016

My session workload

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 1:19 pm GMT Oct 6,2016

My old website (www.jlcomp.demon.co.uk) will be disappearing in a couple of weeks – but there are a couple of timeless articles on it that are worth saving and although the popularity of this one has probably been surpassed by Tanel Poder’s Snapper script, or other offerings by Tom Kyte or Adrian Billington, it’s still one of those useful little things to have around – it’s a package to takes a snapshot of your session stats.

The package depends on a view created in the SYS schema, and the package itself has to be installed in the SYS schema – which is why other strategies for collecting the information have become more popular; but if you want to have it handy, here are the two scripts:

rem
rem     Script:         c_mystats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem     Purpose:        Put names to v$mystat
rem
rem     Last tested
rem             12.1.0.2        -- naming issue
rem             11.2.0.4
rem             10.2.0.5
rem             10.1.0.4
rem              9.2.0.8
rem              8.1.7.4
rem
rem     Notes:
rem     Should be run by SYS - which means it has to be re-run
rem     on a full database export/import
rem
rem     It looks as if it is illegal to create a view with a
rem     name starting with v$ in the sys account as from 12c.
rem     (ORA-00999: invalid view name). Hence the JV$ name.
rem
rem     But you can create a public synonym starting "v$"
rem

create or replace view jv$my_stats
as
select
        /*+
                first_rows
                ordered
        */
        ms.sid,
        sn.statistic#,
        sn.name,
        sn.class,
        ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
;

drop public synonym v$my_stats;
create public synonym v$my_stats for jv$my_stats;
grant select on v$my_stats to public;

rem
rem	Script:		snap_myst.sql
rem	Author:		Jonathan Lewis
rem	Dated:		March 2001
rem	Purpose:	Package to get snapshot start and delta of v$mystat
rem
rem	Last tested
rem		12.1.0.2
rem		11.2.0.4
rem		10.2.0.5
rem		10.1.0.4
rem		 9.2.0.8
rem		 8.1.7.4
rem
rem	Notes
rem	Has to be run by SYS to create the package
rem	Depends on view (j)v$my_stats (see c_mystats.sql)
rem
rem	Usage:
rem		set serveroutput on size 1000000 format wrapped
rem		set linesize 120
rem		set trimspool on
rem		execute snap_my_stats.start_snap
rem		-- do something
rem		execute snap_my_stats.end_snap
rem

create or replace package snap_my_stats as
	procedure start_snap;
	procedure end_snap (i_limit in number default 0);
end;
/

create or replace package body snap_my_stats as

cursor c1 is
	select 
		statistic#, 
		name,
		value
	from 
		v$my_stats
	where
		value != 0
	;


	type w_type is table of c1%rowtype index by binary_integer;
	w_list		w_type;
	empty_list	w_type;

	m_start_time	date;
	m_start_flag	char(1);
	m_end_time	date;

procedure start_snap is
begin

	m_start_time := sysdate;
	m_start_flag := 'U';
	w_list := empty_list;

	for r in c1 loop
		w_list(r.statistic#).value := r.value;
	end loop;

end start_snap;


procedure end_snap (i_limit in number default 0) 
is
begin

	m_end_time := sysdate;

	dbms_output.put_line('---------------------------------');

	dbms_output.put_line('Session stats - ' ||
				to_char(m_end_time,'dd-Mon hh24:mi:ss')
	);

	if m_start_flag = 'U' then
		dbms_output.put_line('Interval:-  '  || 
				trunc(86400 * (m_end_time - m_start_time)) ||
				' seconds'
		);
	else
		dbms_output.put_line('Since Startup:- ' || 
				to_char(m_start_time,'dd-Mon hh24:mi:ss')
		);
	end if;

	if (i_limit != 0) then
		dbms_output.put_line('Lower limit:-  '  || i_limit);
	end if;

	dbms_output.put_line('---------------------------------');

	dbms_output.put_line(
		rpad('Name',60) ||
		lpad('Value',18)
	);

	dbms_output.put_line(
		rpad('----',60) ||
		lpad('-----',18)
	);

	for r in c1 loop
		if (not w_list.exists(r.statistic#)) then
		    w_list(r.statistic#).value := 0;
		end if;

		if (
		       (r.value > w_list(r.statistic#).value + i_limit)
		) then
			dbms_output.put(rpad(r.name,60));
			dbms_output.put(to_char(
				r.value - w_list(r.statistic#).value,
					'9,999,999,999,990')
			);
			dbms_output.new_line;
		end if;
	end loop;

end end_snap;

begin
	select
		logon_time, 'S'
	into
		m_start_time, m_start_flag
	from
		v$session
	where
		sid = 	(
				select /*+ no_unnest */ sid 
				from v$mystat 
				where rownum = 1
			);

end snap_my_stats;
/

drop public synonym snap_my_stats;
create public synonym snap_my_stats for snap_my_stats;
grant execute on snap_my_stats to public;

One point to be cautious about with this package: do not embed it inside anonymous pl/sql blocks, e.g.

begin

        snap_my_stats.start_snap;

        -- some other code

        snap_my_stats.end_snap;

end;
/

There are some statistics in v$my_stats (v$mystat / v$sesstat) which are not updated until the end of a database call – and calling an anonymous pl/sql block counts as a single database call, so some of your statistics (for example “CPU used by this session”) will report misleading values.

October 3, 2016

Kill CPU

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 8:58 am GMT Oct 3,2016

My old website (www.jlcomp.demon.co.uk) will be disappearing in a couple of weeks – but there are a couple of timeless articles on it that are worth saving and a method for soaking up all the CPU on your system with a simple SQL statement against a small data set is, surely, one of them. Here, then is a little script that I wrote (or, at least, formalised) 15 years ago to stress out a CPU:


rem
rem     Script:         kill_cpu.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem     Purpose:        Exercise CPU and latches
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.1.0.7
rem             10.2.0.5
rem             10.1.0.4
rem              9.2.0.8
rem              8.1.7.4
rem
rem     Notes:
rem     The count(*) will return power((2,n-1))
rem     To run from Oracle 9 and later we have to set parameter
rem             _old_connect_by_enabled = true;
rem
rem     Base calculation (historical):
rem     ==============================
rem     Rule of thumb - ca. 10,000 logical I/Os per sec per 100 MHz of CPU.
rem     (Modern versions of Oracle on modern CPUs - about twice that, maybe
rem     due to the introduction of the "fastpath" consistent gets with the
rem     elimination of some logging activity that used to exist.)
rem
rem     With the value of 23 shown we do 6M buffer visits of which 4M
rem     are "conistent read gets", and 2M are "buffer is pinned count".
rem     (That's power(2,23-1) and power(2,23-2) respectively). For each
rem     row you add to the kill_cpu table you double the run-time.
rem
rem      This is an example of SQL that can take MUCH longer when run
rem      with rowsource_execution_statistics enabled. Mostly spent on 
rem      CPU calling the O/S timer. (On my last test, using a 12c VM
rem      the time jumped from 6 seconds - 23 rows - to 75 seconds when
rem      I set statistics_level to all; but half would be the effect of
rem      running through the VM.)
rem

drop table kill_cpu;

begin

        begin           execute immediate 'purge recyclebin';
        exception       when others then null;
        end;

        begin           execute immediate 'alter session set "_old_connect_by_enabled"=true';
        exception       when others then null;
        end;

end;
/

create table kill_cpu(n, primary key(n))
organization index
as
select  rownum n
from    all_objects
where   rownum <= 23
;
execute snap_my_stats.start_snap 

set timing on 
set serveroutput off

-- alter session set statistics_level = all;

spool kill_cpu

select  count(*) X
from    kill_cpu 
connect by 
        n > prior n
start with 
        n = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

set serveroutput on
execute snap_my_stats.end_snap

spool off

set timing off
alter session set statistics_level = typical;

The calls to snap_my_stats use a package (owned by sys) that I wrote a long time ago for taking a snapshot of v$mystats; many people use Tanel Poder’s “Snapper” script or Tom Kyte’s script instead.

September 29, 2016

IOT limitation

Filed under: Execution plans,Infrastructure,IOT,Oracle — Jonathan Lewis @ 10:17 am GMT Sep 29,2016

In the right circumstances Index Organized Tables (IOTs) give us tremendous benefits – provided you use them in the ideal fashion. Like so many features in Oracle, though, you often have to compromise between the benefit you really need and the cost of the side effect that a feature produces.

The fundamental design targets for an IOT are that you have short rows and only want to access them through index range scans of primary key. The basic price you pay for optimised access is the extra work you have to do as you insert the data. Anything you do outside the two specific targets is likely to lead to increased costs of using the IOT – and there’s one particular threat that I’ve mentioned twice in the past (here and here). I want to mention it one more time with a focus on client code and reporting.


create table iot1 (
        id1     number(7.0),
        id2     number(7.0),
        v1      varchar2(10),
        v2      varchar2(10),
        padding varchar2(500),
        constraint iot1_pk primary key(id1, id2)
)
organization index
including id2
overflow
;

insert into iot1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum,311)                 id1,
        mod(rownum,337)                 id2,
        to_char(mod(rownum,20))         v1,
        to_char(trunc(rownum/100))      v2,
        rpad('x',500,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 ; commit; begin dbms_stats.gather_table_stats( ownname => user,
                tabname          => 'IOT1'
                method_opt       => 'for all columns size 1'
        );
end;
/

alter system flush buffer_cache;

select table_name, blocks from user_tables where table_name = 'IOT1' or table_name like 'SYS_IOT_OVER%';
select index_name, leaf_blocks from user_indexes where table_name = 'IOT1';

set autotrace traceonly
select max(v2) from iot1;
set autotrace off

I’ve created an index organized table with an overflow. The table definition places all columns after the id2 column into the overflow segment. After collecting stats I’ve then queried the table with a query that, for a heap table, would produce a tablescan as the execution plan. But there is no “table”, there is only an index for an IOT. Here’s the output I get (results from 11g and 12c are very similar):

TABLE_NAME               BLOCKS
-------------------- ----------
SYS_IOT_OVER_151543        8074
IOT1

INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
IOT1_PK                      504

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |     4 | 99793   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE       |         |     1 |     4 |            |          |
|   2 |   INDEX FAST FULL SCAN| IOT1_PK |   100K|   390K| 99793   (1)| 00:00:04 |
---------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     100376  consistent gets
       8052  physical reads

The index segment has 504 leaf blocks, the overflow segment has 8,074 used blocks below the high water mark. The plan claims an index fast full scan of the index segment – but the physical reads statistic looks more like a “table” scan of the overflow segment. What’s actually happening ?

The 100,000+ consistent reads should tell you what’s happening – we really are doing an index fast full scan on the index segment, and for each index entry we go to the overflow segment to find the v2 value. Oracle doesn’t have a mechanism for doing a “tablescan” of just the overflow segment – even though the definition of the IOT ought (apparently) to be telling Oracle exactly which columns are in the overflow.

In my particular test Oracle reported a significant number of “db file scattered read” waits against the overflow segment, but these were for “prefetch warmup”; in a normal system with a buffer cache full of other data this wouldn’t have happened. The other interesting statistic that showed up was “table fetch continued row” – which was (close to) 100,000, again highlighting that we weren’t doing a normal full tablescan.

In terms of normal query processing this anomaly of attempted “tablescans” being index driven probably isn’t an issue but, as I pointed out in one of earlier posts on the topic, when Oracle gathers stats on the “table” it will do a “full tablescan”. If you have a very large table with an overflow segment it could be a very slow process – especially if you’ve engineered the IOT for the right reason, viz: the data arrives in the wrong order relative to the order you want to query it, and you’ve kept the rows in the IOT_TOP short by dumping the rarely used data in the overflow. With this in mind you might want to make sure that you write a bit of special code that gathers stats only on the columns you know to be in the IOT_TOP, creates representative numbers for the other columns, then locks the stats until the next time you want to refresh them.

 

September 26, 2016

PK Histogram

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 10:25 am GMT Sep 26,2016

One of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t need a histogram on a single column unique/primary key”.

Not only can a histogram be helpful on a column that’s declared to hold unique values, the optimizer may even spot the need automatically. It’s a little unusual (and probably the result of poor programming practice) but it does happen. Here’s an example demonstrating the principle:


rem
rem     Script:         pk_histogram.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.1.0.7
rem             10.2.0.5
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum  id
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into t1 select id + 1e6 from t1;
insert into t1 select id + 1e7 from t1;

alter table t1 add constraint t1_pk primary key(id);

select
        /*+ dynamic_sampling(0) */
        *
from    t1
where
        id between 12000 and 13000
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1'
        );
end;
/


select
        column_name, sample_size,
        num_distinct, num_nulls, density,
        histogram, num_buckets
from   
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
;

I’ve created a small data set some large gaps in the ranges of values used, then queried the data with a range-based predicate that will return no rows. After that I’ve let Oracle do it’s own thing with gathering stats (I think all the preferences are at their defaults). This is the result I got:

COLUMN_NAME          SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS
-------------------- ----------- ------------ ---------- ---------- --------------- -----------
ID                          5555        40000          0 .000024752 HEIGHT BALANCED         254

Oracle will gather a histogram automatically if it can see that the distribution of the data you are querying is skewed; but people tend to think of “skewed” as meaning there is wide variation in the number of rows for a given value and forget that it can also mean a variation in the number of rows within a given size of range. All I’ve done with my example is cue Oracle to the fact that I have big gaps in the range of known values – so it has produced a histogram that let’s it know where the gaps are.

Note: The results above are from 11.2.0.4, in 12.1.0.2 I got a HYBRID histogram, on 10.2.0.5 the sample size was 40,000

Next Page »

Blog at WordPress.com.