Oracle Scratchpad

September 12, 2018

Column Stats

Filed under: 12c,extended stats,Oracle,Statistics — Jonathan Lewis @ 1:46 pm BST Sep 12,2018

A little while ago I added a postscript about gathering stats on a virtual column to a note I’d written five years ago and then updated with a reference to a problem on the Oracle database forum that complained that stats collection had taken much longer after the addition of a function-based index. The problem related to the fact that the function-based index was supported by a virtual column that used an instr() function on a CLOB (XML) column – and gathering stats on the virtual column meant applying the function to every CLOB in the table.

So my post-script, added about a month ago, suggested adding a preference (dbms_stats.set_table_prefs) to avoid gathering stats on that column. There’s a problem with this suggestion – it doesn’t work

Oracle doesn’t play nicely when you try to limit the stats collection to a few columns – even in version 18.3. Here’s a demonstration of the effect. First we create a table that includes a column group (extended stats), a virtual column, and a function-based index – i.e. the three different ways of generating user-related virtual columns.


rem
rem     Script:         stats_struggle_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2018
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad(rownum,10,'0')             v2
from
        generator       v1
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

execute dbms_stats.delete_table_stats(user,'t1')

begin
        dbms_output.put_line(
                dbms_stats.create_extended_stats(
                        ownname         => user,
                        tabname         => 'T1',
                        extension       => '(v1, v2)'
                )
        );
end;
/

alter table t1 add id_12 
        generated always as (mod(id,12)) virtual
;

create index t1_id on t1(mod(id,10));


Since I’ve run this on 12c and 18c I’ve included a call to delete table stats after creating the table. So the next step is to enable SQL trace and see what Oracle does under the covers when we try to gather stats on just a couple of columns in the table:


alter session set events '10046 trace name context forever';

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 't1',
                method_opt  => 'for columns size 1 id v1',
                cascade     => false
        );
end;
/

alter session set events '10046 trace name context off';

column column_name  format a32
column data_default format a32

select 
        column_name, data_default,
        num_nulls, num_distinct, to_char(last_analyzed,'hh24:mi:ss') gathered
from    user_tab_cols 
where   table_name = 'T1' 
order by 
        internal_column_id
;

COLUMN_NAME                      DATA_DEFAULT                      NUM_NULLS NUM_DISTINCT GATHERED
-------------------------------- -------------------------------- ---------- ------------ --------
ID                                                                         0        10000 16:13:12
V1                                                                         0        10000 16:13:12
V2
SYS_STUIBQVZ_50PU9_NIQ6_G6_2Y7   SYS_OP_COMBINED_HASH("V1","V2")
ID_12                            MOD("ID",12)
SYS_NC00006$                     MOD("ID",10)

According to the output of the last query we’ve gathered stats only on the two columns specified. But have we really avoided the work ? Here, with some cosmetic tidying, is the SQL executed by the package:

select 
        /*+
                full(t) no_parallel(t) no_parallel_index(t) dbms_stats
                cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
                xmlindex_sel_idx_tbl no_substrb_pad 
         */
        to_char(count(ID)),
        substrb(dump(min(ID),16,0,64),1,240),
        substrb(dump(max(ID),16,0,64),1,240),
        to_char(count(V1)),
        substrb(dump(min(V1),16,0,64),1,240),
        substrb(dump(max(V1),16,0,64),1,240),
        to_char(count(V2)),
        to_char(count(SYS_STUIBQVZ_50PU9_NIQ6_G6_2Y7)),
        to_char(count(ID_12)),
        to_char(count(SYS_NC00006$))
from
        TEST_USER.T1 t  /* NDV,NIL,NIL,NDV,NIL,NIL,ACL,ACL,ACL,ACL*/

We can see that Oracle has done a count(), min() and max() on id and v1, and the “comment” at the end of the text tells us that it’s applied the approximate_ndv mechanism to the first two columns queried but not the rest. However it has count()ed all the other columns – which means it’s evaluated their underlying expressions. So if you were hoping that limiting the columns gathered would avoid a really expensive function call, bad luck.

Threat / Bug alert

A further irritation showed up when I ran a test case that used a deterministic PL/SQL function to generate a virtual column: in 12.1.0.2 the function was called once per row (possibly because every row had a different value) whether or not it was in the list of columns for gathering stats; in 18.3 the function was called nearly twice per row when I didn’t specificy stats gathering for the column and nearly 4 times per row when I did. This looks like it might be a change (possibly accidental) to how deterministic functions can cache their inputs and outputs – possibly something as “minor” as the size of the cache. To be continued when time permits …

 

 

September 10, 2018

Stats time

Filed under: 12c,Oracle,Statistics — Jonathan Lewis @ 1:37 pm BST Sep 10,2018

I wrote a note a couple of years ago explaining how I used to get a rough idea (with some errors) of how much time was spent in the overnight stats collection by each object. One of the nice little enhancements that appeared in 12c was the appearance of a couple of functions that can report information about this type of thing, and more. These are the dbms_stats function report_stats_operations() and report_single_stats_operation() with the following definitions:


function report_stats_operations(
        detail_level  varchar2                  default 'TYPICAL',
        format        varchar2                  default 'TEXT', 
        latestN       number                    default null,
        since         timestamp with time zone  default null,
        until         timestamp with time zone  default null,
        auto_only     boolean                   default false,
        container_ids dbms_utility.number_array default dbms_stats.NULL_NUMTAB
) return clob;

function report_single_stats_operation(
        opid         number,
        detail_level varchar2 default 'TYPICAL', 
        format       varchar2 default 'TEXT', 
        container_id number   default null
) return clob;

As you can see, there are lots of options to generating the report of stats operations, and you can check the manuals or $ORACLE_HOME/rdbms/admin/dbmsstat.sql for information about how you can use it. One of the simplest options would be to run from SQL*Plus:

set long 1000000

set pagesize    0
set linesize  255
set trimspool on

column text_line format a254

select
        dbms_stats.report_stats_operations(
                since => sysdate - 3
        ) text_line
from dual
;

Of course you wouldn’t be able to pick the option that limited the report to just the auto gather stats jobs (auto_only => true) as SQL doesn’t a boolean type, so you would have to write a little PL/SQL wrapper to capture just those details. Here’s a sample of the (rather wide) output:


select
        dbms_stats.report_single_stats_operation(25809) text_line
from dual
;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Operation Id | Operation             | Target                             | Start Time          | End Time            | Status      | Total Tasks | Successful Tasks | Failed Tasks | Active Tasks |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25811        | purge_stats           |                                    | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 0           | 0                | 0            | 0            |
|              |                       |                                    | 01.47.37.764146 PM  | 01.47.38.405437 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25810        | purge_stats           |                                    | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 0           | 0                | 0            | 0            |
|              |                       |                                    | 01.47.35.827284 PM  | 01.47.37.763926 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25809        | gather_database_stats | AUTO                               | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 285         | 282              | 3            | 0            |
|              | (auto)                |                                    | 01.46.31.672033 PM  | 01.47.35.826873 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25807        | gather_table_stats    | TEST_USER.T1                       | 08-SEP-18           | 08-SEP-18           | COMPLETED   | 1           | 1                | 0            | 0            |
|              |                       |                                    | 12.59.57.704111 PM  | 12.59.57.822695 PM  |             |             |                  |              |              |
|              |                       |                                    | +01:00              | +01:00              |             |             |                  |              |              |

etc.

You’ll notice in this little sample that operation 25809 is an (auto) gather_database_stats operation which ran 285 tasks, failing on 3 and succeeding on 282 – so lets run the “single stats operation” report to find out more.


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Operation Id | Operation                    | Target | Start Time                      | End Time                        | Status    | Total Tasks | Successful Tasks | Failed Tasks | Active Tasks |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25809        | gather_database_stats (auto) | AUTO   | 08-SEP-18 01.46.31.672033 PM    | 08-SEP-18 01.47.35.826873 PM    | COMPLETED | 285         | 282              | 3            | 0            |
|              |                              |        | +01:00                          | +01:00                          |           |             |                  |              |              |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                                                                                                                                     |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|                                                                                              T A S K S                                                                                              |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | Target                                                         | Type            | Start Time                          | End Time                            | Status                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.RECYCLEBIN$                                                | TABLE           | 08-SEP-18 01.46.50.719791 PM +01:00 | 08-SEP-18 01.46.51.882418 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.RECYCLEBIN$_OBJ                                            | INDEX           | 08-SEP-18 01.46.51.273134 PM +01:00 | 08-SEP-18 01.46.51.773297 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.RECYCLEBIN$_TS                                             | INDEX           | 08-SEP-18 01.46.51.777032 PM +01:00 | 08-SEP-18 01.46.51.787730 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
...
...
...
|    | SYS.WRH$_SEG_STAT_PK.WRH$_SEG_ST_3089296639_5150               | INDEX PARTITION | 08-SEP-18 01.47.35.409615 PM +01:00 | 08-SEP-18 01.47.35.483637 PM +01:00 | COMPLETED                  |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.X$LOGMNR_CONTENTS                                          | TABLE           | 08-SEP-18 01.47.35.520504 PM +01:00 | 08-SEP-18 01.47.35.696953 PM +01:00 | FAILED                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.X$LOGMNR_REGION                                            | TABLE           | 08-SEP-18 01.47.35.699253 PM +01:00 | 08-SEP-18 01.47.35.722545 PM +01:00 | FAILED                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|    | SYS.X$DRC                                                      | TABLE           | 08-SEP-18 01.47.35.725003 PM +01:00 | 08-SEP-18 01.47.35.801384 PM +01:00 | FAILED                     |    |
|    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
|                                                                                                                                                                                                     |
|                                                                                                                                                                                                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I’ve trimmed out most of the 285 entries, of course, showing that the last three in the list failed; but with no indication why they failed. Fortunately we could have called the report with “detail_level => ‘ALL'” – so let’s see what that gives us:

select
        dbms_stats.report_single_stats_operation(
                opid         => 25809,
                detail_level => 'ALL'
        ) text_line
from dual
;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Operation | Operation             | Target | Start Time      | End Time        | Status    | Total    | Successful | Failed   | Active   | Job Name | Session  | Additional Info             |
| Id        |                       |        |                 |                 |           | Tasks    | Tasks      | Tasks    | Tasks    |          | Id       |                             |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 25809     | gather_database_stats | AUTO   | 08-SEP-18       | 08-SEP-18       | COMPLETED | 285      | 282        | 3        | 0        |          | 250      | Parameters: [block_sample:  |
|           | (auto)                |        | 01.46.31.672033 | 01.47.35.826873 |           |          |            |          |          |          |          | FALSE] [cascade: NULL]      |
|           |                       |        | PM +01:00       | PM +01:00       |           |          |            |          |          |          |          | [concurrent: FALSE]         |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [degree:                    |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_DEGREE_VALUE]       |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [estimate_percent:          |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_ESTIMATE_PERCENT]   |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [granularity:               |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_GRANULARITY]        |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [method_opt:                |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DEFAULT_METHOD_OPT]         |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [no_invalidate:             |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | DBMS_STATS.AUTO_INVALIDATE] |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [reporting_mode: FALSE]     |
|           |                       |        |                 |                 |           |          |            |          |          |          |          | [stattype: DATA]            |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                                                                                                                                     |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|                                                                                              T A S K S                                                                                              |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|       | Target       | Type         | Start Time   | End Time     | Status    | Rank  | Job Name | Estimated    | Batching     | Histogram    | Extended     | Reason Code  | Additional   |        |
|       |              |              |              |              |           |       |          | Cost         | Info         | Columns      | Stats        |              | Info         |        |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|       | SYS.RECYCLEB | TABLE        | 08-SEP-18 01 | 08-SEP-18 01 | COMPLETED | 1     |          | N/A          | N/A          |              |              | stale stats  |              |        |
|       | IN$          |              | .46.50.71979 | .46.51.88241 |           |       |          |              |              |              |              |              |              |        |
|       |              |              | 1 PM +01:00  | 8 PM +01:00  |           |       |          |              |              |              |              |              |              |        |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
...
...
...
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|       | SYS.X$DRC    | TABLE        | 08-SEP-18 01 | 08-SEP-18 01 | FAILED    | 151   |          | N/A          | N/A          |              |              | no stats     | ORA-20000:   |        |
|       |              |              | .47.35.72500 | .47.35.80138 |           |       |          |              |              |              |              |              | Unable to    |        |
|       |              |              | 3 PM +01:00  | 4 PM +01:00  |           |       |          |              |              |              |              |              | analyze      |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | TABLE "SYS". |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | "X$DRC", log |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | miner or     |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | data guard   |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | must be      |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | started      |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | before       |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | analyzing    |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | this fixed   |        |
|       |              |              |              |              |           |       |          |              |              |              |              |              | table"       |        |
|       --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        |
|                                                                                                                                                                                                     |
|                                                                                                                                                                                                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------




So we can now see that stats collection failed on the one object I’ve left in the extract because it’s an X$ object that only exists when LogMiner is running. You’ll notice that the we also get some information about things like input parameters to calls and reasons why objects were selected (“stale stats” in the first item in this list).

It’s a great convenience – but it’s always possible to grumble: I’d rather like to see the elapsed time for each operation, or even a filter to limit the report to any operation that took more than X seconds. However, if I want to do a quick check on a client site I’d rather not have to type in the code to query the base tables by hand.

September 5, 2018

Subquery Order

Filed under: 12c,Execution plans,Hints,Oracle,subqueries — Jonathan Lewis @ 1:09 pm BST Sep 5,2018

From time to time I’ve wanted to optimize a query by forcing Oracle to execute existence (or non-existence) subqueries in the correct order because I know which subquery will eliminate most data most efficiently, and it’s always a good idea to look for ways to eliminate early. I’ve only just discovered (which doing some tests on 18c) that Oracle 12.2.0.1 introduced the /*+ order_subq() */ hint that seems to be engineered to do exactly that.

Here’s a very simple (and completely artificial) demonstration of use.


rem
rem     Script:         122_order_subq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2018
rem

create table t1 as select * from all_objects;
create table t2 as select * from all_objects;
create table t3 as select * from all_objects;

create index t2_i1 on t2(object_id);
create index t3_i1 on t3(object_id);

prompt  =============================
prompt  order_subq(@main subq2 subq3)
prompt  =============================

explain plan for
select
        /*+
                qb_name(main)
                no_unnest(@subq2)
                no_unnest(@subq3)
                order_subq(@main subq2 subq3)
        */
        t1.object_name, t1.object_type
from
        t1
where
        exists (
                select
                        /*+ qb_name(subq2) */
                        null
                from    t2
                where   t2.object_id = t1.object_id * 5
        )
and     exists (
                select
                        /*+ qb_name(subq3) */
                        null
                from    t3
                where   t3.object_id = t1.object_id * 13
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

=============================
order_subq(@main subq2 subq3)
=============================

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2585036931

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    53 | 51090   (1)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1    | 61765 |  3196K|   163   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T2_I1 |     1 |     5 |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | T3_I1 |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
...
      ORDER_SUBQ(@"MAIN" "SUBQ2" "SUBQ3")
...
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST QB_NAME ("SUBQ2") */ 0 FROM
              "T2" "T2" WHERE "T2"."OBJECT_ID"=:B1*5) AND  EXISTS (SELECT /*+
              NO_UNNEST QB_NAME ("SUBQ3") */ 0 FROM "T3" "T3" WHERE
              "T3"."OBJECT_ID"=:B2*13))
   3 - access("T2"."OBJECT_ID"=:B1*5)
   4 - access("T3"."OBJECT_ID"=:B1*13)

I’ve blocked subquery unnesting for the purposes of the demo and given a query block name to the two subqueries (using a name that identifies the associated table). As you can see, the execution plan uses the subqueries as filter subqueries, operating them in the order I’ve specified in my hint. You can also see that the hint is echoed down into the Outline section of the plan.

It’s possible that this is the plan that the optimizer would have chosen without the order_subq hint, so I ought to see if I can also use the hint to make the subqueries filter in the oppostie order. I happen to know that executing the subquery against t3 is likely to eliminate more rows that executing the subquery against t2. (The “* 13” compared to the “* 5” is significant) so I really want the subqueries to be used in the opposite order anyway – so here’s what happens when I reverse the order in the hint:


prompt  =============================
prompt  order_subq(@main subq3 subq2)
prompt  =============================

explain plan for
select
        /*+
                qb_name(main)
                no_unnest(@subq2)
                no_unnest(@subq3)
                order_subq(@main subq3 subq2)
        */
        t1.object_name, t1.object_type
from
        t1
where
        exists (
                select
                        /*+ qb_name(subq2) */
                        null
                from    t2
                where   t2.object_id = t1.object_id * 5
        )
and     exists (
                select
                        /*+ qb_name(subq3) */
                        null
                from    t3
                where   t3.object_id = t1.object_id * 13
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

=============================
order_subq(@main subq2 subq3)
=============================

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3585049451

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    53 | 51090   (1)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1    | 61765 |  3196K|   163   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T3_I1 |     1 |     5 |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | T2_I1 |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
...
      ORDER_SUBQ(@"MAIN" "SUBQ3" "SUBQ2")
...
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST QB_NAME ("SUBQ3") */ 0 FROM
              "T3" "T3" WHERE "T3"."OBJECT_ID"=:B1*13) AND  EXISTS (SELECT /*+
              NO_UNNEST QB_NAME ("SUBQ2") */ 0 FROM "T2" "T2" WHERE
              "T2"."OBJECT_ID"=:B2*5))
   3 - access("T3"."OBJECT_ID"=:B1*13)
   4 - access("T2"."OBJECT_ID"=:B1*5)

With the modified hint in place the order of the filter subqueries is reversed. Notice how the Predicate section also echoes the ordering of the subqueries.

Footnote

It should be noted that the order_subq() hint doesn’t get mentioned in the 18c SQL Language Reference “Alphabetical List of Hints”. If it were then one of the little oddities that might get a mention is that the optimizer seems to ignore the hint if you disable CPU costing. (not that anyone should be doing that since 10g).

August 29, 2018

RTFM ?

Filed under: Non-technical,Philosophy — Jonathan Lewis @ 11:04 am BST Aug 29,2018

My entrance at the Polish Oracle User Group conference 2017 has just resurfaced on Twitter. There is a back-story to this which contains an allegorical lesson in using Oracle. As I said in the opening remarks in the subsequent presentation: “The lesson began before I got to the stage”.

Like all the other speakers at POUG2017 I had received an email asking me to name a tune for a playlist. Having decided that Beethoven, Brahms and Tchaikowski were not the sort of composers the organisers had in mind I nominated (with a touch of irony) “Who wants to live forever?” by Queen (despite Richard Foote’s strenuous efforts to turn the Oracle world to David Bowie).

When the conference started I noticed two things: first, that the tunes for the “playlist” were actually being used to accompany speakers to the stage, secondly that the admin staff were all wearing monk-like gowns and hoods. So I asked for my tune to be changed to “The Imperial March” (Darth Vader’s theme) and borrowed the gown from the tallest admin person, with the results you see in the video clip.

So what’s the Oracle allegory ?

First you read the manuals, then you observe how it really works before you go live.

 

August 24, 2018

Error Logging

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 11:19 am BST Aug 24,2018

Error logging is a topic that I’ve mentioned a couple of times in the past, most recently as a follow-up in a discussion of the choices for copying a large volume of data from one table to another, but originally in an addendum about a little surprise you may get when you use extended strings (max_string_size = EXTENDED).

If you use the default call to dbms_errlog.create_error_log() to create an error logging table then Oracle will create a table with a few columns of its own plus every column (name) that you have in your original table – but it will create your columns as varchar2(4000), or nvarchar2(2000), or raw(2000) – unless you’ve set the max_string_size to extended.  Here’s a simple  demo script with results from two different systems, one with the default setting the other with the extended setting (note, there’s a little inconsistency in handling raw() columns.


rem
rem     Script:         log_errors_min.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem

create table t1 (
        v1      varchar2(10),
        n1      number(2,0),
        d1      date,
        nv1     nvarchar2(10),
        r1      raw(10)
);


execute dbms_errlog.create_error_log('t1')

desc err$_t1


max_string_size = STANDARD
--------------------------
 Name			       Null?	Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$			NUMBER
 ORA_ERR_MESG$				VARCHAR2(2000)
 ORA_ERR_ROWID$ 			ROWID
 ORA_ERR_OPTYP$ 			VARCHAR2(2)
 ORA_ERR_TAG$				VARCHAR2(2000)
 V1					VARCHAR2(4000)
 N1					VARCHAR2(4000)
 D1					VARCHAR2(4000)
 NV1					NVARCHAR2(2000)
 R1					RAW(2000)


max_string_size = EXTENDED
--------------------------
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$                        NUMBER
 ORA_ERR_MESG$                          VARCHAR2(2000)
 ORA_ERR_ROWID$                         ROWID
 ORA_ERR_OPTYP$                         VARCHAR2(2)
 ORA_ERR_TAG$                           VARCHAR2(2000)
 V1                                     VARCHAR2(32767)
 N1                                     VARCHAR2(32767)
 D1                                     VARCHAR2(32767)
 NV1                                    NVARCHAR2(16383)
 R1                                     RAW(32767)

Every single “original” column that appears in this table will be a LOB, with an inline LOB locator of 30 or more bytes. (At least, that’s the 12.1.0.2 implementation, I haven’t checked for 12.2 or 18.3).

If this is going to be a problem (e.g. you have a table defined with 500 columns but only use 120 of them) you can create a minimalist error logging table. Provided you create it with the ora_err% columns suitably defined you can add only those columns you’re really interested in (or feel threatened by), and you don’t have to declare them at extreme lengths. e.g.


create table err$_special (
        ora_err_number$         number,
        ora_err_mesg$           varchar2(2000),
        ora_err_rowid$          rowid,
        ora_err_optyp$          varchar2(2),
        ora_err_tag$            varchar2(2000),
        n1                      varchar2(128)
)
;

insert into t1 values(1,'abc','02-jan-1984',sys_op_c2c('abc'),hextoraw('0xFF')) 
log errors into err$_special
reject limit unlimited
;

execute print_table('select * from err$_special')


ORA_ERR_NUMBER$               : 1722
ORA_ERR_MESG$                 : ORA-01722: invalid number

ORA_ERR_ROWID$                :
ORA_ERR_OPTYP$                : I
ORA_ERR_TAG$                  :
N1                            : abc


If you try to create an error logging table that doesn’t include the 5 critical columns you’ll see Oracle error ORA-38900: missing mandatory column “ORA_ERR_{something}” of error log table “{your logging table name}” when you try to log errors into it, and the 5 critical columns have to be the first 5 columns (in any order) in the table or you’ll get Oracle error ORA-38901: column “ORA_ERR_{something}$” of table “{your logging table name}” when you try to log errors into it.

August 22, 2018

Descending bug

Filed under: Bugs,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:20 pm BST Aug 22,2018

Following on from Monday’s posting about reading execution plans and related information, I noticed a question on the ODC database forum asking about the difference between “in ({list of values})” and a list of “column = {constant}” predicates connected by OR. The answer to the question is that there’s essentially no difference as you would be able to see from the predicate section of an execution plan:


SELECT  c1, c2, c3, c4, c5, c6, c7, c8..  
FROM    TAB1  
WHERE   STS IN ( 'A', 'B')  
AND     cnt < '4'  
AND     dt < sysdate  
and     rownum <=1;  
  
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 33399   (1)| 00:03:14 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   114K|   201M| 33399   (1)| 00:03:14 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   114K|       |   723   (1)| 00:00:05 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  

Note how the predicate section tells you that the original “sts in ( ‘A’, ‘B’ )” has been transformed into “sts = ‘A’ or sts = ‘B'”.

A further point I made about IN-lists in Monday’s post was that as one step in the transformation Oracle would sort the list and eliminate duplicates, and it suddenly occurred to me to wonder whether Oracle would sort the list in descending order if the only relevant index were defined to start with a descending column. Naturally I had to try it so here’s a suitable script to prepare some data:

rem
rem     Script:         descending_bug_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0        Crashes
rem             12.2.0.1        Crashes
rem             12.1.0.2        Crashes
rem             11.2.0.4        Bad Plan
rem

create table t1
nologging
pctfree 95 pctused 5
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        case mod(rownum,1000)
                when 0 then 'A'
                when 3 then 'B'
                when 6 then 'C'
                       else 'D'
        end                             sts,
        case mod(rownum,1000)
                when 0 then '1'
                when 3 then '2'
                when 6 then '3'
                       else '4'
        end                             cnt,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

create index t1_i1a on t1(sts) nologging;
create index t1_i1d on t1(sts desc) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size skewonly'
        );
end;
/

There is one oddity in this script – if you’ve got every column in an index declared as DESC you’ve made a mistake and none of the columns should be declared as DESC. The feature is relevant only if you want a mixture of ascending and descending column in a single index.

An important detail of the script is that I’ve gathered stats AFTER creating the objects – it’s important to do this, even in 18.3, because (a) creating the “descending” index will result in a hidden virtual column being created to represent the descending column and I want make sure I have stats on that column and (b) the “stats on creation” code doesn’t generate histograms and I want a (frequency) histogram on columns sts and the hidden, virtual, descending version of the column.

After generating the data and checking that I have the correct histograms for sts and sys_nc00006$ (the relevant hidden column) I can then run the following test:

set serveroutput off
alter session set statistics_level = all;

alter index t1_i1d invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter index t1_i1d   visible;
alter index t1_i1a invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
set serveroutput on


The code makes one index invisible then runs a query that should use an inlist iterator; then it switches indexes making the invisible one visible and vice versa and repeats the query. I’ve enabled rowsource execution statistics and pulled the execution plans from memory to make sure I don’t get fooled by any odd glitches that might exist within “explain plan”. Here are the results from 11.2.0.4 – normal index, then descending index – with a little cosmetic cleaning:


S   COUNT(*)
- ----------
B        100
C        100

SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.01 |       5 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.01 |       5 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    200 |00:00:00.01 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |      2 |    178 |    200 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))


Index altered.
Index altered.


S   COUNT(*)
- ----------
C        100
B        100


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.02 |     198 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.02 |     198 |
|*  2 |   INDEX FULL SCAN    | T1_I1D |      1 |   1000 |    200 |00:00:00.02 |     198 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


As expected we see counts of 100 for ‘B’s and ‘C’s, and we also see that the “sort group by nosort” operation with the descending index has produced the results in reverse order. The problem though is that the optimizer has decided to use an “index full scan” on the descending index, and the estimate of the rows returned is terribly wrong (and seems to be the common “5% guess”, used once for each target value), and the number of buffer visits is huge compared to the result from the normal index – Oracle really did walk every leaf block in the index to get this result. The predicate section also looks rather silly – why hasn’t the optimizer produced predicates more like: “sys_nc00006$ = sys_op_descend(‘B’)” ?

In passing you’ll notice that the estimated rows in the plan using the normal index is a little low. This is the result of Oracle using a small sample (ca. 5,500 rows) in 11g to gather histogram stats. 12c will do better for a frequency histogram with the fast algorithm it uses for a 100% (auto) sample size.

So 11g doesn’t do very well but we’ve got 12.1.0.2, 12.2.0.1, and (in the last couple of weeks) 18.3 to play with. Here’s the result from 12.1.0.2 and 12.2.0.1 for the query that should use the descending index:


select  sts, count(*)
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qernsRowP], [1], [], [], [], [], [], [], [], [], [], []


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      0 |00:00:00.01 |       0 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      0 |00:00:00.01 |       0 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    101 |00:00:00.03 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1D |      2 |    200 |    101 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR
              "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))

The query crashed! The plan, however, did look appropriate – the optimizer picked an inlist iterator, picked an index range scan, got the correct estimate of rows (index entries), and did better with the predicate section (though having used a sensible predicate for the access predciate it then used the bizarre version as the filter predicate). Judging from the A-rows column the query seems to have crashed at roughly the point where the optimizer was switching from the range scan for the first iteration into the range scan for the second iteration.

And then there’s Oracle 18.3 – which does the same as the 12c versions :(

To make sure that my silly “single column so it shouldn’t be declared descending” index was the sole cause of the problem I repeated the tests using a two-column index on (sts, cnt).

Conclusion:

Descending indexes or (to be more accurate) indexes with descending columns can still produce problems even in the very latest version of Oracle.

Footnote

Oracle MoS has the wonderful “ORA-600/ORA-7445/ORA-700 Error Look-up Tool (Doc ID 153788.1)” (which doesn’t yet allow you to choose 18.3 as a version) so I used this to do a look up for ORA-00600 errors with first paremeter qernsRowP in 12.2.0.1 and got the following suggestion from doc ID 285913.1: “set event:10119 to disable no-sort fetch and then reparse the failing SQL.” The example suggested setting the event to level 12, and this solved the problem for all three failing versions – but the suggestion came with a warning: “Setting this event at system level may impact the performance of database.” The execution plan (taken, in this case, from 18.2) may explain the warning:

 

S   COUNT(*)
- ----------
B	 100
C	 100

SQL_ID	f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	    |	   1 |	      |      2 |00:00:00.01 |	    4 |       |       | 	 |
|   1 |  HASH GROUP BY	   |	    |	   1 |	    2 |      2 |00:00:00.01 |	    4 |  1558K|  1558K|  659K (0)|
|   2 |   INLIST ITERATOR  |	    |	   1 |	      |    200 |00:00:00.01 |	    4 |       |       | 	 |
|*  3 |    INDEX RANGE SCAN| T1_I1D |	   2 |	  200 |    200 |00:00:00.01 |	    4 |       |       | 	 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


The plan has changed from using a “sort group by nosort” – which effectively means just keeping a running count as you go – to a real “hash group by” which means you have to do the hashing arithmetic for every value (though maybe there’s a deterministic trick that means Oracle won’t do the arithmetic if the next value to be hashed is the same as the previous value) and the actual memory used (659K) does seem a little extreme for counting two distinct values.

 

August 20, 2018

Masterclass – 1

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 10:42 am BST Aug 20,2018

A recent thread on the Oracle developer community database forum raised a fairly typical question with a little twist. The basic question is “why is this (very simple) query slow on one system when it’s much faster on another?” The little twist was that the original posting told use that “Streams Replication” was in place to replicate the data between the two systems.

To make life easy for remote trouble-shooters the poster had supplied (for each system) the output from SQL Monitor when running the query, the autotrace output (which shows the predicate section that SQL Monitor doesn’t report), and the session statistics for the query run, plus some statistics about the single table in the query, the index used in the plan, and the column on which that index was based.

Here, with a little cosmetic editing (and a query that has clearly been camouflaged by the OP), is the information supplied for the faster database, where the query took about 30 seconds to complete.


SELECT c1, c2, c3, c4, c5, c6, c7, c8..  
FROM TAB1  
WHERE STS IN ( 'A', 'B')  
AND cnt < '4'  
AND dt < sysdate  
and rownum <=1;  
  
Sql_monitor and stats from DB1  
******************************  
  
Global Information  
------------------------------  
 STS              :  DONE (ALL ROWS)             
 Instance ID         :  1                           
 Execution Started   :  08/17/2018 08:31:22         
 First Refresh Time  :  08/17/2018 08:31:22         
 Last Refresh Time   :  08/17/2018 08:31:53         
 Duration            :  31s                         
 Program             :  sqlplus.exe                 
 Fetch Calls         :  1                           
  
Global Stats  
===============================================================================  
| Elapsed |   Cpu   |    IO    | Concurrency | Fetch | Buffer | Read  | Read  |  
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Calls |  Gets  | Reqs  | Bytes |  
===============================================================================  
|      33 |    3.00 |       30 |        0.08 |     1 |   102K | 38571 | 301MB |  
===============================================================================  
  
SQL Plan Monitoring Details (Plan Hash Value=715774357)  
======================================================================================================================================================================================  
| Id |            Operation            |          Name           |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Activity |       Activity Detail        |  
|    |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes |   (%)    |         (# samples)          |  
======================================================================================================================================================================================  
|  0 | SELECT STATEMENT                |                         |         |       |         1 |    +31 |     1 |        1 |       |       |          |                              |  
|  1 |   COUNT STOPKEY                 |                         |         |       |         1 |    +31 |     1 |        1 |       |       |          |                              |  
|  2 |    INLIST ITERATOR              |                         |         |       |         1 |    +31 |     1 |        1 |       |       |          |                              |  
|  3 |     TABLE ACCESS BY INDEX ROWID | TAB1                    |    114K | 33399 |        32 |     +0 |     2 |        1 | 38377 | 300MB |    96.77 | Cpu (1)                      |  
|    |                                 |                         |         |       |           |        |       |          |       |       |          | db file sequential read (16) |  
|    |                                 |                         |         |       |           |        |       |          |       |       |          | read by other session (13)   |  
|  4 |      INDEX RANGE SCAN           | TAB1_STS_IDX            |    115K |   723 |        30 |     +2 |     2 |     118K |   194 |   2MB |     3.23 | read by other session (1)    |  
======================================================================================================================================================================================  
  
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 33399   (1)| 00:03:14 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   114K|   201M| 33399   (1)| 00:03:14 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   114K|       |   723   (1)| 00:00:05 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  
  
  
Table stats 
-----------
table_name    num_rows        blocks  empty_blocks  chain_cnt  avg_row_len             
TAB1        79,654,925    22,416,917             0          0        1,847          
  
column_stats(STS)
------------------
table_name  column_name  num_distinct  num_nulls    density  avg_col_len   
TAB1        STS                     5          0  6.2049E-9            2         
  
Index_stats(on STS)
-------------------
index_name    leaf_blocks  distinct_keys  avg_leaf_blocks_per_key  avg_data_blocks_per_key  clustering_factor    num_rows
TAB1_STS_IDX      487,939              5                   97,587                4,458,874         22,294,372  78,308,939   
 
Session stats
-------------
process last non-idle time              1,534,508,966
session connect time                    1,534,508,966
logical read bytes from cache             839,663,616
cell physical IO interconnect bytes       316,055,552
physical read bytes                       316,055,552
physical read total bytes                 316,055,552
file io wait time                          17,044,083
session pga memory                          8,643,880
session pga memory max                      8,643,880
temp space allocated (bytes)                4,194,304
session uga memory                          1,755,696
session uga memory max                      1,755,696
buffer is pinned count                        135,743
table fetch by rowid                          117,519
non-idle wait count                           107,301
session logical reads                         102,500
consistent gets                               102,450
consistent gets from cache                    102,448
no work - consistent read gets                102,368
buffer is not pinned count                    101,741
free buffer inspected                          43,458
free buffer requested                          38,592
physical read total IO requests                38,581
physical read IO requests                      38,581
physical reads                                 38,581
physical reads cache                           38,579
hot buffers moved to head of LRU               37,258
bytes sent via SQL*Net to client                7,370
bytes received via SQL*Net from client          6,869
redo size                                       5,536
undo change vector size                         4,432
DB time                                         3,166
non-idle wait time                              2,962
user I/O wait time                              2,954
table fetch continued row                       2,423


And here’s the equivalent information from the slower database where the query took more than 9 times as long (4 minutes 42 seconds) to complete.


Global Information  
------------------------------  
 STS              :  DONE (ALL ROWS)           
 Instance ID         :  1                         
 Execution Started   :  08/17/2018 08:21:47       
 First Refresh Time  :  08/17/2018 08:21:47       
 Last Refresh Time   :  08/17/2018 08:26:29       
 Duration            :  282s                      
 Module/Action       :  SQL*Plus/-                
 Program             :  sqlplus.exe               
 Fetch Calls         :  1                         
  
Global Stats  
================================================================  
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |  
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |  
================================================================  
|     287 |    8.76 |      278 |     1 |   110K | 110K | 858MB |  
================================================================  
  
SQL Plan Monitoring Details (Plan Hash Value=715774357)  
======================================================================================================================================================================================  
| Id |            Operation            |          Name           |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |        Activity Detail        |  
|    |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |          (# samples)          |  
======================================================================================================================================================================================  
|  0 | SELECT STATEMENT                |                         |         |       |         1 |   +282 |     1 |        1 |      |       |          |                               |  
|  1 |   COUNT STOPKEY                 |                         |         |       |         1 |   +282 |     1 |        1 |      |       |          |                               |  
|  2 |    INLIST ITERATOR              |                         |         |       |         1 |   +282 |     1 |        1 |      |       |          |                               |  
|  3 |     TABLE ACCESS BY INDEX ROWID | TAB1                    |    142K | 40211 |       282 |     +1 |     2 |        1 | 109K | 854MB |   100.00 | db file sequential read (277) |  
|  4 |      INDEX RANGE SCAN           | TAB1_STS_IDX            |    142K |   892 |       280 |     +3 |     2 |     118K |  491 |   4MB |          |                               |  
======================================================================================================================================================================================  
  
Execution Plan (autotrace) 
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 40211   (1)| 00:08:03 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   141K|   249M| 40211   (1)| 00:08:03 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   141K|       |   892   (1)| 00:00:11 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  
            
Table stats
-----------
table_name    num_rows        blocks  empty_blocks  chain_cnt  avg_row_len             
TAB1        79,447,350   22,318,667            710    537,597        1,847  
  
column_stats(STS)
-----------------
table_name  column_name  num_distinct  num_nulls    density  avg_col_len  
TAB1        STS                     5          0  6.1789E-9            2  
  
Index_stats(on STS)
-------------------
index_name    leaf_blocks  distinct_keys  avg_leaf_blocks_per_key  avg_data_blocks_per_key  clustering_factor    num_rows
TAB1_STS_IDX      493,152              5                   98,630                4,382,625         21,913,127  79,106,263   


Session Stats
-------------
process last non-idle time              1,534,508,200
session connect time                    1,534,508,200
logical read bytes from cache             903,790,592
physical read total bytes                 899,629,056
cell physical IO interconnect bytes       899,629,056
physical read bytes                       899,629,056
file io wait time                         277,881,742
session pga memory                          8,586,744
session pga memory max                      8,586,744
temp space allocated (bytes)                4,194,304
session uga memory max                      1,690,184
session uga memory                          1,690,184
buffer is pinned count                        129,148
table fetch by rowid                          117,521
session logical reads                         110,326
consistent gets                               110,276
consistent gets from cache                    110,276
no work - consistent read gets                110,071
non-idle wait count                           109,879
free buffer requested                         109,830
physical read IO requests                     109,818
physical reads cache                          109,818
physical reads                                109,818
physical read total IO requests               109,818
buffer is not pinned count                    109,577
free buffer inspected                          70,740
hot buffers moved to head of LRU               31,910
DB time                                        28,203
non-idle wait time                             27,788
user I/O wait time                             27,788
dirty buffers inspected                        19,067
bytes sent via SQL*Net to client               14,927
bytes received via SQL*Net from client         10,607
redo size                                       5,440
undo change vector size                         4,432
table fetch continued row                       3,660

There are all sorts of noteworthy details in these two sets of information – some of the “how to see what’s in front of you” type, some of the “be careful, Oracle can deceive you” type. So I’m going to walk though the output picking up a number of background thoughts before commenting on the answer to the basic question.

We’ll start with the object statistics, then we’ll look at the SQL Monitor plan to see if we can determine where the extra time was spent, then we’ll try to work out what else the plan might be telling us about the code and data, then we’ll summarise my observations to make a claim about the difference in behaviour.

Object statistics

The table has 79M rows with average length of 1,847 bytes, using 22M blocks. With an 8KB block size and that average row size we would expect to see about 3 rows per block, and that’s fairly consistent with the value of rows / blocks.  We don’t know what the sample size was for this stats collection, but it might have been a “small” sample size rather than the the 100% you would get from using auto_sample_size, so that might also explain some discrepancy between the two different views on the figures.

We note that the secondary system reports a chain_cnt in excess of 500,000 rows. The only (unhacked) way that this figure could be set would be through a call to analyze statistics, and once the figure is there it won’t go away unless you use the analyze command again to delete statistics.  We don’t know the history of how and when the figure got there so it doesn’t guarantee that there are any chained or migrated rows, nor does the zero in the table stats on the primary system guarantee that it doesn’t have any chained or migrated rows – all it tells us is that at some time someone used the wrong command to gather stats and there were some (less than 1%) migrated or chained rows in the table at the time. (The optimizer will use this figure in its arithmetic if it is set, by the way, so it may affect some of the cost calculations – but not by a huge amount.)

The column sts reports 5 distinct values, no nulls, and a density of 6.2e-9 which is roughly half of 1/79M: so we have a frequency histogram on the column (in the absence of a histogram the density would be 1/5, and it’s reasonable to assume that the number of buckets was either the default or set to something larger than 5).  We were told that the system was running 11.2.0.4 – so we have to be a little suspicious about the accuracy of this histogram since it will have been sampled with a very small sample if the stats collection had used auto_sample_size. (12c will use a specially optimized 100% sample for frequency and top-N histograms when using auto_sample_size)

The index on sts has a clustering_factor of around 22M which is similar to the number of blocks in the table – and that’s not too surprising if there are are only a very small number of distinct values in the column – especially when the presence of the histogram suggest that there’s a skew in the data distribution. (There’s more to come on that point.) The number of leaf blocks is about 500,000 (being lazy about arithmetic) – just as a side note this suggests the index is running in a fairly inefficient state (and probably hasn’t been created with the compress keyword).

Doing a rough estimate of the index arithmetic :  the avg_col_len for sts is 2, so the space required for each index entry will be 13 bytes (2 for the column, 7 for the rowid content, 2 for the row header, 2 for the row directory entry).  Take off the block overhead, and assume the index is running at a “typical” 70% space usage per leaf block and you might expect 5,600 bytes used per leaf block for current index data and that works out to about 430 index entries per leaf block.  With 79M rows in the table that should lead to 79M/430 leaf blocks – i.e. roughly 184,000 leaf blocks, not 493,000 leaf blocks.  However it’s not unusual to see an index with extremely repetitive values operating at something like 50% utilisation, which would bring our estimate to about 310 rows per leaf block and 255,000 leaf blocks – which is still off by a factor of nearly 2 compared to what we’ve actually got. Again, of course, we have to be a little bit cautious about these statistics – we don’t know the sample size, and Oracle uses a surprisingly small number of blocks to sample the stats for an index.

Where’s the time.

The SQL Monitor gives us a very clear report of where most of the time went – almost all of it was spent in I/O waits, and almost all of the wait time was in the “table access by index rowid” opration in both cases; but the primary system did 38,377 read requests while the secondary did 109,000 read requests in that line of the plan. It is significant, though, that quite a lot (40%) of the ASH samples for that operation on the primary system were for “read by other session” rather than “db file sequential read”:  in other words some other session(s) were doing a lot of work to pull the data we wanted into the buffer cache at the same time. Apart from the fact that a wait for “read by other session” often means we spend less time waiting than if we’d had to do the read ourselves, the presence of this wait suggests that other sessions may be “pre-caching” data for us so that we end up having to read far fewer blocks than would otherwise be the case.

It’s important to note at the same time that the difference in Buffer Gets for the two systems was small – 102K vs. 110K – and the “Rows (actual)” was the same in both cases – 118K entries returned by the index range scan.  Both systems did similar amounts of “logical” work, to process similar amounts of data; the difference was the fraction of the work that required a buffer get to turn into a disc read or a “wait for other read”.

We might want to pick up a few more numbers to corroborate the view that the only significant difference was in the volume of data cached and not some more esoteric reason.  Some of the session statistics should help.


DB1:  table fetch by rowid                          117,519
DB2:  table fetch by rowid                          117,521

DB1:  undo change vector size                         4,432
DB2:  undo change vector size                         4,432

DB1:  redo size                                       5,536
DB2:  redo size                                       5,440

DB1:  session logical reads                         102,500
DB2:  session logical reads                         110,326

DB1:  no work - consistent read gets                102,368
DB2:  no work - consistent read gets                110,071

DB1:  table fetch continued row                       2,423
DB2:  table fetch continued row                       3,660

The number of rows fetched by rowid is virtually identical and we have done (virtually) no work that generates undo or redo – such as delayed block cleanout; there are no statistics shown for “%undo record applied” so we probably haven’t done very much work to get a read consistent view of the data though we can’t be sure that the OP simply failed to copy that stat into list supplied (but then the similarity of “session logical reads” to “no work – consistent read gets” confirms the hypothesis that we didn’t do any (significant) work on visiting undo blocks.

We do see a few percent increase in the number of buffer gets (“session logical reads”) – but this may reflect the fact that the actual pattern of data in one table is slightly different from the pattern in the other – thanks to ASSM the process id of the process that inserts a row into a table can affect (within a small range, usually) the block into which the row is inserted; but at 102,000 / 110,000 buffer gets to visit 117,500 rows in the table we can see that there must be some table blocks that hold two (or more) rows that are identified as consecutive in the index – leading to some row visits being achieved through a buffer pin and without a fresh buffer get. You’ll note that this argument is consistent with the small variation in clustering_factor (not that we entirely trust those figures) for the two indexes – the system with the lower clustering_factor for the index has done fewer buffer gets to acquire the same number of rows from the table – by definition that means (assuming default setup) that there are more cases where “the next table row” is in the same block as the current row.

The final figure I’ve shown is the “table fetch continued rows”: according to the table stats (which we don’t necessarily trust completely) 500K out of 79M rows are chained/migrated which is roughly 0.6%. We know that we’re visiting about 117K table rows so might expect (on average) roughly the same percentage migrated/chained viz: 0.6% of 117K = 743, so there’s a little anomaly there (or an error in our assumption about “average” behaviour.  It’s worth noting, though, that a “continued fetch” would have to do an extra buffer visit (and maybe an extra physical read).  You might wonder, of course, how there could be any chained or migrated rows when the average row length is 1,847 bytes but in a follow-up post the OP did say there were 3 BLOB columns in the table, which can cause havoc with interpreting stats for all sorts of reasons. We don’t have any information about the table structure – particularly whether the columns in the query appear before or after the BLOB columns in the table definition – and we don’t know what processing takes place (for example, maybe the 3rd BLOB is only updated after the sts column has been changed to a value other than A or B which would help to explain why we shouldn’t be using the 0.6% calculation above as a table-wide average), so we’re not in a position to say why any of the continued fetches appear but there are several guesses we could make and they’re all easy to check.

Plan observations

If we examine row estimates we see that it 114K for the faster plan and 141K for the slower plan (with a closely corresponding variation in cost). The difference in estimates simply tells us that the histogram gathering was probably a small sample size and subject to a lot of variation. The scale of the estimates tells us that the A and B rows are probably rare – call it 125K out of 79M rows, about 0.16% of the total rows in the table, so it would not be surprising to see consecutive samples for the histogram producing significant variations in estimates.

The other interesting thing we can note in the SQL Monitor plan is that the Starts column for the index range scan / table access operations in both plans shows the value 2: this means that there are no “A” rows that match the other predicates:  Oracle has run the “A” iteration to completion then started the “B” iteration and found a row on the second iteration. Is this a coincidence, or should it always happen, or is it only fairly likely to happen; is it possible to find times when there are no suitable “B” rows but plenty of suitable “A” rows. The final predicate in the query is “rownum <= 1” – so the query is picking one row with no explicit strategy for choosing a specific row when there are multiple appropriate rows, does this mean that we could optimizer the query by rewriting it as a “union all” that searched for B rows first and A rows second ? We just don’t know enough about the processing.

In passing, we can’t get Oracle to search the B rows first by changing the order of the in-list.  If you have a predicate like “where sts in ({list of literal values})” the optimizer will sort the list to eliminate duplicates before rewriting the predicate as a list of disjuncts, and then (if the path uses an iterator) iterate through the list in the resulting order.

In the absence of information about the way in which the data is processed we can only say that we need to avoid visiting the table so frequently. To do this we will need to add one or both of the columns from the other predicates to the index – this might double the size of the index, but eliminate 95% of the potential I/O.  For example if we discover that A and B rows are initially created “into the future” and this query is looking for a row whose “time has come” so that it can be processed and changed to an X row (say) then there may only ever be a tiny number of rows where the “sts = A and the dt < sysdate” and an index on (sts, dt) would be a perfect solution (especially if it were compressed on at least the first column).

The OP has declared a reluctance to add an index to the table – but there are two points to go with this indexing strategy. Since we know there’s a frequency histogram and the A and B rows appear to be rare values – what benefit is there in having an index that covers the other values (unless 2 of the remaining 3 are also rare).  How about creating a function-based index that represents only the rare values and modifying this code to use that index – e.g.

create index t1_id on t1 (
        case sts when 'A' then sts when 'B' then sts end,
        case sts when 'A' then dt  when 'B' then dt  end
) compress 1
;

select  *
from    t1
where   case sts when 'A' then sts when 'B' then sts end in ('A','B')
and     case sts when 'A' then dt  when 'B' then dt  end < sysdate
and     cnt < '4'
and     rownum <= 1
/


You might be able to replace a huge index (79M rows worth) with this small one (120K rows worth) unless there’s too much other code in the system that has to be adjusted or the sts column is actually the target of a referential integrity constraint; at worst you could add this index knowing that it’s generally not going to consume much in the way of extra space or processing resources and is going to save you a lot of work for this query.

Summary

The execution plan from SQL Monitor points very strongly to the fast system benefiting from having a lot of the relevant data cached and constantly being reloaded into the cache by other sessions while the slow system has to acquire almost all of its data by real phyiscal reads. Most of the reads address the table so engineering an index that is low-cost and (fairly) high precision is the most significant strategy for reducing the workload and time on the slow system.

The fact that all the potential A rows fail to match the full predicate set suggests that there MAY be some aspect of the processing that means it would be more efficient to check for B rows before checking for A rows.

Given the massive skew in the data distribution a function-based index that hides all the non-popular values (or even all the values that are not of interest to this query) may be the most cost-effective way of adding a very effective index to the table with minimal resource requirements.

And finally

It’s taken me more than 4 hours to write this note after spending about 10 minutes reading through the information supplied by the OP and identifying and cross-checking details. A small fraction of the 4+ hours was spent creating a little model to check something I had to say about in-lists, the rest of it was trying to write up a coherent description covering all the details.

That’s it for today, but I may have missed a couple of points that I noticed as I read the OP’s posting; and I will want to do a little cosmetic work on this article and check grammar and spelling over the next couple of days.

<h3<Update (already)

Shortly after I posted this blog note the owner of the question reported the following as the distribution of values for the sts column:

 STS   COUNT(*)
---- ----------
   A          6
   E        126
   D        866
   C   80212368
   B     117631

Two things stand out about these figures – first it’s an ideal example of a case where it would be nice avoid having index entries for the 80 million ‘C’ rows. Depending on the coding and testing costs, the supportability of the application and the possible side effects this could be done with a function-based index, or by introducing a virtual column that hides the ‘C’s behing a NULL, or by changing the code to use NULL instead of ‘C’.

Secondly – I made a comment about rewriting the code to query the B’s before the A’s. But we saw that Oracle worked through about 117,000 rows before returning a result: so the fitures above tell us that it must have worked through almost all the B’s and the handful of A’s was just a tiny little blip before it got to the B iteration – so there’s no point in making that change.

My suggestion for the function-based index above could be modified in two ways, of course – add two more “when”s to each “case” to capture the D and E rows, or take the opposite viewpoint and create an index on expressions like: “case sts when ‘C’ then to_char(null) else sts end”. The benefit of the latter approach is that you don’t have to modify the index definition (and rebuild the index) if a new legal value for sts appears.

August 2, 2018

Extended Histograms – 2

Filed under: extended stats,Histograms,Oracle,Statistics — Jonathan Lewis @ 2:13 pm BST Aug 2,2018

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Here’s the starting code that generates the data, and creates histograms on all the columns (I’ve run this against 12.1.0.2 and 12.2.0.1 so far):


rem
rem     Script:         histogram_hack_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to deal with wordpress format issue.
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to deal with wordpress format issue.
;

variable v1 varchar2(128)

begin
        :v1 := dbms_stats.create_extended_stats(null,'t1','(c2,c3)');
        dbms_output.put_line(:v1);
end;
/

execute dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 10');

In a variation from the previous version of the code I’ve used the “create_extended_stats()” function so that I can return the resulting virtual column name (also known as an “extension” name) into a variable that I can use later in an anonymous PL/SQL block.

Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago:


select
        endpoint_value
from 
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = :v1
;

select 
        distinct c2, c3, 
        mod(sys_op_combined_hash(c2,c3),9999999999) endpoint_value
from t1
;

ENDPOINT_VALUE
--------------
    4794513072
    6030031083

2 rows selected.


C C ENDPOINT_VALUE
- - --------------
N Y     4794513072
Y N     6030031083

2 rows selected.

So we have a method of generating the values that Oracle should store in the histogram; now we need to generate 4 values and supply them to a call to dbms_stats.set_column_stats() in the right order with the frequencies we want to see:


declare
        l_distcnt number;
        l_density number;
        l_nullcnt number;
        l_avgclen number;

        l_srec  dbms_stats.statrec;
        n_array dbms_stats.numarray;

begin
        dbms_stats.get_column_stats (
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

        l_srec.novals := dbms_stats.numarray();
        l_srec.bkvals := dbms_stats.numarray();

        for r in (
                select
                        mod(sys_op_combined_hash(c2,c3),9999999999) hash_value, bucket_size
                from    (
                        select 'Y' c2, 'Y' c3, 1 bucket_size from dual
                        union all
                        select 'N' c2, 'N' c3, 1 from dual
                        union all
                        select 'Y' c2, 'N' c3, 71482 from dual
                        union all
                        select 'N' c2, 'Y' c3, 1994 from dual
                        )
                order by hash_value
        ) loop
                l_srec.novals.extend;
                l_srec.novals(l_srec.novals.count) := r.hash_value;

                l_srec.bkvals.extend;
                l_srec.bkvals(l_srec.bkvals.count) := r.bucket_size;
        end loop;

        n_array := l_srec.novals;

        l_distcnt  := 4;
        l_srec.epc := 4;

--
--      For 11g rpcnts must not be mentioned
--      For 12c is must be set to null or you
--      will (probably) raise error:
--              ORA-06533: Subscript beyond count
--

        l_srec.rpcnts := null;

        dbms_stats.prepare_column_values(l_srec, n_array);

        dbms_stats.set_column_stats(
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

end;

The outline of the code is simply: get_column_stats, set up a couple of arrays and simple variables, prepare_column_values, set_column_stats. The special detail that I’ve included here is that I’ve used a “union all” query to generate an ordered list of hash values (with the desired frequencies), then grown the arrays one element at a time to copy them in place. (That’s not the only option at this point, and it’s probably not the most efficient option – but it’s good enough). In the past I’ve used this type of approach but used an analytic query against the table data to produce the equivalent of 12c Top-frequency histogram in much older versions of Oracle.

A couple of important points – I’ve set the “end point count” (l_srec.epc) to match the size of the arrays, and I’ve also changed the number of distinct values to match. For 12c to tell the code that this is a frequency histogram (and not a hybrid) I’ve had to null out the “repeat counts” array (l_srec.rpcnts). If you run this on 11g the reference to rpcnts is illegal so has to be commented out.

After running this procedure, here’s what I get in user_tab_histograms for the column:


select
        endpoint_value                          column_value,
        endpoint_number                         endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                               prev_endpoint,
                endpoint_value
        from
                user_tab_histograms
        where
                table_name  = 'T1'
        and     column_name = :v1
        )
order by endpoint_number
;

COLUMN_VALUE ENDPOINT_NUMBER  FREQUENCY
------------ --------------- ----------
   167789251               1          1
  4794513072            1995       1994
  6030031083           73477      71482
  8288761534           73478          1

4 rows selected.


It’s left as an exercise to the reader to check that the estimated cardinality for the predicate “c2 = ‘N’ and c3 = ‘N'” is 1 with this histogram in place.

July 31, 2018

Extended Histograms

Filed under: CBO,extended stats,Histograms,Oracle,Statistics — Jonathan Lewis @ 11:05 pm BST Jul 31,2018

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

  • Create simple stats (no histograms) on c2 and c3.
  • Create frequency histograms on c2 and c3
  • Create a column group (extended stats) on (c2,c3) but no histograms
  • Create a column group (extended stats) on (c2,c3) with a histogram on (c2, c3)

If you do these tests you’ll find the estimated cardinalities are (from 12.1.0.2):

  • 18,369 – derived as 73,476 / 4  … total rows over total possible combinations
  • 1,940   – derived as 73,476 * (1,994/73,476) * (71,482/73,476) … total rows * fraction where c2 = ‘N’ * fraction where c3 = ‘N’
  • 36,738 – derived as 73,476 / 2 … total rows / number of distinct combinations of (c2, c3)
  • 997      – derived as 1,994 / 2 … half the frequency of the least frequently occurring value in the histogram

The last algorithm appeared in 10.2.0.4; prior to that a “value not in frequency histogram” would have been given an estimated cardinality of 1 (which is what the person on Oracle-L wanted to see).

In fact the optimizer’s behaviour can be reverted to the 10.2.0.3 mechanism by setting fix-control 5483301 to zero (or off), either with an “alter session” call or inside the /*+ opt_param() */ hint. There is, however, another option – if you get the column stats, then immediately set them (dbms_stats.get_column_stats(), dbms_stats.set_column_stats()) the optimizer defines the stats as “user defined” and (for reasons I don’t know – perhaps it’s an oversight) reverts to the 10.2.0.3 behaviour. Here’s some code to demonstrate the point; as the srcipt header says, I’ve tested it on versions up to 18.1


rem
rem     Script:         histogram_hack_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             18.1.0.0        via LiveSQL (with some edits)
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to avoid format issue
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 10 for columns (c2,c3) size 10');

column column_name format a128 new_value m_colname

select  column_name
from    user_tab_cols
where   table_name = 'T1'
and     column_name not in ('C2','C3')
;

set autotrace traceonly explain
select /* pre-hack */ * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

declare
        l_distcnt number default null;
        l_density number default null;
        l_nullcnt number default null;
        l_srec    dbms_stats.statrec;
        l_avgclen number default null;

begin

        dbms_stats.get_column_stats (
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

        dbms_stats.set_column_stats(
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

end;
/

set autotrace traceonly explain
select /* post-hack */  * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

I’ve created a simple table for the data and collected stats including histograms on the two columns and on the column group. I’ve taken a simple strategy to find the name of the column group (I could have used the function dbms_stats.create_extended_stats() to set an SQL variable to the name of the column group, of course), and then run a little bit of PL/SQL that literally does nothing more than copy the column group’s stats into memory then write them back to the data dictionary.

Here are the “before” and “after” execution plans that we get from autotrace:

BEFORE
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   997 |  3988 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   997 |  3988 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')


AFTER
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     4 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     4 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')

As required – the estimate for the (‘N’,’N’) rows drops down to (the optimizer’s best approximation to ) zero.

Footnote:

An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) giving it a frequency of 1 – a method I’ve suggested in the past  but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the value for the (‘N’,’N’) combination – and I’ve written about that topic in the past as well.

You might wonder why the optimizer is programmed to use “half the least popular” for predicates references values not in the index. Prior to 12c it’s easy to make an argument for the algorithm. Frequency histograms used to be sampled with a very small sample size, so if you were unlucky a “slightly less popular” value could be missed completely in the sample; if you were requesting a value that didn’t appear in the histogram then (presumably) you knew it should exist in the data, so guessing a cardinality somewhat less than the least popular must have seemed like a good idea.

In 12c, of course, you ought to be taking advantage of the “approximate NDV” implementation for using a 100% sample to generate frequency (and Top-N / Top-Frequency histograms). If you’ve got a 12c frequency histogram then the absence of a value in the histogram means the data really wasn’t there so a cardinality estimate of 1 makes more sense. (Of course, you might have allowed Oracle to gather the histogram at the wrong time – but that’s a different issue). If you’ve got a Top-N histogram then the optimizer will behave as if a “missing” value is one of those nominally allowed for in the “low frequency” bucket and use neither the 1 nor the “half the least popular”.

So, for 12c and columns with frequency histograms it seems perfectly reasonably to set the fix control to zero – after getting approval from Oracle support, of course.

 

July 16, 2018

Direct IOT

Filed under: 12c,Infrastructure,IOT,Oracle — Jonathan Lewis @ 1:02 pm BST Jul 16,2018

A recent (automatic ?) tweet from Connor McDonald highlighted an article he’d written a couple of years ago about an enhancement introduced in 12c that allowed for direct path inserts to index organized tables (IOTs). The article included a demonstration seemed to suggest that direct path loads to IOTs were of no benefit, and ended with the comment (which could be applied to any Oracle feature): “Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.”

Clearly it’s necessary to pose the question – “so when would direct mode insert be a good option for IOTs?” – because if it’s never a good option you have to wonder why it has been implemented. This naturally leads on to thinking about which tests have not yet been done – what aspects of IOTs did Connor not get round to examining in his article. (That’s a standard principle of trouble-shooting, or testing, or investigation: when someone shows you a test case (or when you think you’ve finished testing) one thing you should do before taking the results as gospel is to ask yourself what possible scenarios have not been covered by the test.)

So if you say IOT what are the obvious tests once you’ve got past the initial step of loading the IOT and seeing what happens. First, I think, would be “What if the IOT weren’t empty before the test started”; second would be “IOTs can have overflow segments, what impact might one have?”; third would be “Do secondary indexes have any effects?”; finally “What happens with bitmap indexes and the requirement for a mapping table?” (Then, of course, you can worry about mixing all the different possibilities together – but for the purposes of this note I’m just going to play with two simple examples: non-empty starting tables, and overflow segments.)

Here’s some code to define a suitable table:


create table t2 
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	3 * rownum			id,
	lpad(rownum,10,'0')		v1,
	lpad('x',50,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
order by
	dbms_random.value
;

begin
	dbms_stats.gather_table_stats(
		ownname     => null,
		tabname     => 'T2',
		method_opt  => 'for all columns size 1'
	);
end;
/

create table t1(
	id,
	v1,
	padding,
	constraint t1_pk primary key(id)
)
organization index
-- including v1
-- overflow
nologging
as
select * from t2
;

begin
	dbms_stats.gather_table_stats(
		ownname     => null,
		tabname     => 'T1',
		method_opt  => 'for all columns size 1'
	);
end;
/

I’ve created a heap table t2 that holds 100,000 rows with an id column that is randomly ordered; then I’ve used this ta1ble as a source to create an IOT, with the option to have an overflow segment that contains just the 100 character padding columns. I’ve used 3 * rownum to define the id column for t2 so that when I insert another copy of t2 into t1 I can add 1 (or 2) to the id and interleave the new data with the old data. (That’s another thought about IOT testing – are you loading your data in a pre-existing order that suits the IOTs or is it arriving in a way that’s badly out of order with respect to the IOT ordering; and does your data go in above the current high value, or spread across the whole range, or have a partial overlap with the top end of the range and then run on above it.)

Have created the starting data set, here’s the test:


execute snap_my_stats.start_snap
execute snap_events.start_snap

insert 
	/*  append */
into t1
select
	id + 1, v1, padding
from
	t2
;


execute snap_events.end_snap
execute snap_my_stats.end_snap

All I’m doing is using a couple of my snapshot packages to check the work done and time spent while insert 100,000 interleaved rows – which are supplied out of order – into the existing table. As shown the “append” is a comment, not a hint, so I’ll be running the test case a total of 4 times: no overflow, with and without the hint – then with the overflow, with and without the hint. (Then, of course, I could run the test without the overflow but an index on v1).

Here are some summary figures from the tests – first from the test without an overflow segment:

                                      Unhinted       With Append
                                  ============      ============
CPU used when call started                 153               102
CPU used by this session                   153               102
DB time                                    166               139

redo entries                           130,603            42,209
redo size                           78,315,064        65,055,376

sorts (rows)                                30           100,031

You’ll notice that with the /*+ append */ hint in place there’s a noticeable reduction in redo entries and CPU time, but this has been achieved at a cost of sorting the incoming data into order. The reduction in redo (entries and size) is due to an “array insert” effect that Oracle can take advantage of with the delayed index maintenance that takes place when the append hint is legal (See the section labelled Option 4 in this note). So even with an IOT with no overflow there’s a potential benefit to gain from direct path loading that depends on how much the new data overlaps the old data, and there’s a penalty that depends on the amount of sorting you’d have to do.

What happens in my case when I move the big padding column out to an overflow segment – here are the equivalent results:


Headline figures                      Unhinted       With Append
================                  ============      ============
CPU used when call started                 158                52
CPU used by this session                   158                52
DB time                                    163                94
redo entries                           116,669            16,690
redo size                           51,392,748        26,741,868
sorts (memory)                               4                 5
sorts (rows)                                33           100,032

Interestingly, comparing the unhinted results with the previous unhinted results, there’s little difference in the CPU usage between having the padding column in the “TOP” section of the IOT compared to having it in the overflow segment, though there is a significant reduction in redo (the index entries are still going all over the place one by one, but the overflow blocks are being pinned and packed much more efficiently). The difference between having the append hint or not, though, is damatic. One third of the CPU time (despited still having 100,000 rows to sort), and half the redo. One of the side effects of the overflow, of course, is that the things being sorted are much shorted (only the id and v1 columns that go into the TOP section, and not the whole IOT row.

So, if you already have an overflow segment that caters for a significant percentage of the row, it looks as if the benefit you could get from using the /*+ append */ hint would far outweigh the penalty of sorting you have to pay. Of course, an IOT with a large overflow doesn’t look much different from a heap table with index – so perhaps that result isn’t very surprising.

I’ll close by re-iterating Connor’s closing comment:

Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.

Before you dive in and embrace it, or ruthlessly push it to one side, make sure you do some testing that reflects the situations you have to handle.

July 14, 2018

Quiz Night

Filed under: Execution plans,Oracle,Performance,sorting — Jonathan Lewis @ 7:07 pm BST Jul 14,2018

Here’s a question prompted by a recent thread on the ODevCom database forum – how many rows will Oracle sort (assuming you have enough rows to start with in all_objects) for the final query, and how many sort operations will that take ?


drop table t1 purge;

create table t1 nologging as select * from all_objects where rownum < 50000;

select owner, count(distinct object_type), count(distinct object_name) from t1 group by owner;

Try to resist the temptation of doing a cut-n-paste and running the code until after you’ve thought about the answer.

And the answer is:

It was nice to see a few ideas being volunteered in response to this question; I think that getting a diverse set of comments makes a nice point about how it’s always worth spending a little time to think along the lines of: “If I do X how might Oracle handle it”. Having the ideas before trying to check the effects can make it a lot easier to understand what’s happening and, sometimes, how to take advantage of what Oracle does to improve the way you design a query.

The first point to make, as Michael D O’Shea  pointed out in comment #2, is that computer systems don’t usually “sort” data – they tend to create pointers to data and shuffle the pointers in some way. In Oracle’s case “sorting” used to mean inserting pointers into a balanced binary tree, and aggregating used to be a case of accumulating values at the leaf nodes of the insertion tree. Then in 10g Oracle introduced a new sorting algorithm that often works more efficiently than the binary insertion tree. I’m still going to refer to the binary tree method as “sorting”, though.

Looking at the query we can see that there is no “order by” clause so it’s possible that Oracle will do whatever it does using hash aggregation throughout and no sorting, but that leaves open the question of how a hash table on owner can also record a distinct count of both object_type and object_name because every single owner hash bucket would have to link to its own hash tables for object_type and object_name and do a sort of “recursive hash aggregation” which starts to sound a little complicated. Maybe the alternative suggested by Kaley in comment #1 is closer to the truth – maybe Oracle just “buckets” all the data by owner and then sorts within each owner twice to do the count distincts, but then we’re still going to be hanging on to a lot of data, doing a two-level open-ended process.

Having waved hands for a little bit to try and head in the direction of possible solutions we need to look for clues that tell us whether we ought to eliminate or refine some of our guesses. There are several bits of information we could look at and running the query (although I asked you not to) is the next step we have to take. But when we run the query we want to see the session statistics, pick up the actual execution plan with rowsource execution statistics, and enable the 10032 and 10033 (sort) traces. So let’s fold the query into a longer script, something like:


set linesize 255
set trimspool on
set pagesize 60

set serveroutput off
alter session set statistics_level = all;

execute snap_my_stats.start_snap

alter session set events '10032 trace name context forever';
alter session set events '10033 trace name context forever';

select owner ... etc.

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set events '10033 trace name context off';
alter session set events '10032 trace name context off';

execute snap_my_stats.end_snap

To avoid blurring around the edges we may have to isolate the three different tests – the query against dbms_xplan.display_cursor(), for example, is obviously going to have some impact on the session stats – and we may then want to run each test twice in succession so that any warm-up or parsing activities don’t confuse the issue. It would also be a good idea to run the tests after creating a new session in case there are some distracting side effects from creating the data set. But with these details addressed, here are a few results:

First the execution plan (I got these results from 12.2.0.1, all recent versions of Oracle behave similarly):


----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     16 |00:00:00.34 |    1018 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |     16 |     16 |00:00:00.34 |    1018 |  5014K|  1445K| 4456K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  50000 |  50000 |00:00:00.07 |    1018 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Oracle uses a SORT GROUP BY, not a HASH GROUP BY, and the indications are that we used about 4.4MB of memory to sort 50,000 rows. Then there’s the session statistics:


Name                                                     Value
----                                                ----------

session uga memory max                               3,255,648
session pga memory max                               3,211,264

table scan rows gotten                                  50,000

sorts (memory)                                               1
sorts (rows)                                           150,000

This says we did just one sort operation and sorted 150,000 rows using an excess of 3.2MB over the starting pga/uga (rather than the 4.4MB suggested by the plan); possibly the variation in apparent memory usage could be explained by the way that Oracle allocates reasonably large chunks to grow the PGA when you grow a workarea but since I’m taking deltas there’s also some scope for being misled by the change in maximum pga/uga memory.

Finally the 10032 trace file shows the following (we didn’t spill to disc, so the 10033 trace wasn’t triggered):


---- Sort Parameters ------------------------------
sort_area_size                    4562944
sort_area_retained_size           4562944
sort_multiblock_read_count        7
max intermediate merge width      38

---- Sort Statistics ------------------------------
Input records                             150000
Output records                            49907
Total number of comparisons performed     1945863
  Comparisons performed by in-memory sort 1945863
Total amount of memory used               4562944
Uses version 1 sort
---- End of Sort Statistics -----------------------

These figures are ones we have to trust – it seems we really did do one sort operation of 150,000 rows, and we did allocate 4.5MB of memory. There’s an obvious guess for the 150,000 input rows – Oracle has turned every row into three rows – the original row, a row to count the object_type, and a row to count the object_name – and with that in mind maybe we will be able to make sense of getting an output of 49,907 rows using 4.5M of memory. Let’s create a query that could produce the right numbers but with a differently arranged output:


select distinct owner, 0, null        from t1
union all
select distinct owner, 1, object_type from t1
union all
select distinct owner, 2, object_name from t1
order by 1, 2, 3
;

For my data set this query produced 49,907 rows of output (which is a number we wanted to see) and here are the first 9 rows of output – followed by the first row of output from the original query:


OWNER                    0 NULL
--------------- ---------- ---------------------
APPQOSSYS                0

APPQOSSYS                1 SYNONYM
APPQOSSYS                1 TABLE

APPQOSSYS                2 DBMS_WLM
APPQOSSYS                2 WLM_CLASSIFIER_PLAN
APPQOSSYS                2 WLM_FEATURE_USAGE
APPQOSSYS                2 WLM_METRICS_STREAM
APPQOSSYS                2 WLM_MPA_STREAM
APPQOSSYS                2 WLM_VIOLATION_STREAM


OWNER           COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOBJECT_NAME)
--------------- -------------------------- --------------------------
APPQOSSYS                                2                          6

Spot the pattern ? All I have to do after this “union all with simple sort” is walk the result set in order accumulating distinct counts as I do so.

But what about the memory requirements ? Check back to the original 10032 trace file, it reported 4562944 bytes as the total memory needed, but it also reported using a “Version 1” sort – so before I ran my union all query I set “_newsort_enabled”=false, to get the following 10032 trace details:


---- Sort Statistics ------------------------------
Input records                             49907
Output records                            49907
Total number of comparisons performed     730667
  Comparisons performed by in-memory sort 730667
Total amount of memory used               4562944
Uses version 1 sort
---- End of Sort Statistics -----------------------

The memory used is exactly the number I wanted to see. (The  version 2 sort got exactly the same result using 3.5MB of memory, so I don’t know why it’s not used at this point – but maybe that’s because the implementation isn’t quite what I think.)

So, hypothesis (to date, until a reader shows me that my answer is incomplete – or wrong):

As the “SORT GROUP BY” operation accepts each row from the “TABLE ACCESS FULL” operation it converts each row into N rows (one base row and one for each column for which there is a count(distinct)). The base row holds just the set of “group by” columns and is tagged with a zero, each subsequent row holds the “group by” columns, a tag value to identify which column it carries, and one of the “count(distinct)” columns. Oracle then operates the normal “group by” aggregation mechanism but is actually aggregating on the “group by” columns plus the “tag” column. So each leaf node in the binary tree ends up holding {owner_value, tag_value, count}, and once all the data has been aggregated into the binary tree Oracle can walk the tree and perform a pivot to turn three rows for each owner value into a single row.

If you start thinking about nasty scenarios you will realise that the upshot of this implementation (if my hypothesis is correct) is that if you have a query with a long “group by” list, and several columns where there are lots of distinct values for each combination of the “group by” list then the volume of the B-tree could actually be much larger than the volume of the original table, and the amount of memory and CPU needed to build that tree (before collapsing it) could be huge.

Footnote:

There is one special case with this count(distinct …) query. If you have only ONE distinct operation in the query Oracle can use the “distinct aggregation” transformation with “view merging” to produce a completely different plan.

July 13, 2018

pushing predicates

Filed under: CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:05 pm BST Jul 13,2018

I came across this odd limitation (maybe defect) with pushing predicates (join predicate push down) a few years ago that made a dramatic difference to a client query when fixed but managed to hide itself rather cunningly until you looked closely at what was going on. Searching my library for something completely different I’ve just rediscovered the model I built to demonstrate the issue so I’ve tested it against a couple of newer versions  of Oracle (including 18.1) and found that the anomaly still exists. It’s an interesting little detail about checking execution plans properly so I’ve written up the details. The critical feature of the problem is a union all view:


rem
rem	Script:		push_pred_limitation.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jan 2015
rem
rem	Last tested 
rem		18.1.0.0	via LiveSQL
rem		12.2.0.1
rem		12.1.0.2
rem		11.2.0.4
rem

create table t1
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t2
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t3
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1 for columns owner size 254'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T3',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t2_id on t2(object_id);
-- create index t2_id_ot on t2(object_id, object_type);

create index t3_name_type on t3(object_name, object_type);

create or replace view v1
as
select 
	/*+ qb_name(part1) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.created	date_2,
	t3.created	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
union all
select
	/*+ qb_name(part2) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.last_ddl_time	date_2,
	t3.last_ddl_time	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
;

Two points to note so far: first, the view is basically joining the same two tables in the same way twice but selecting different columns. It’s a close model of what the client was doing but so much simpler that it wouldn’t be hard to find a different way of getting the same result: the client’s version would have been much far harder to rewrite. Secondly, I’ve listed two possible indexes for table t2 but commented one of them out. The indexing will make a difference that I’ll describe later.

So here’s the query with execution plan (from explain plan – but pulling the plan from memory gives the same result):


select
	/*+ qb_name(main) */
	t1.object_name, t1.object_type,
	v1.object_id, v1.date_2, v1.date_3
from
	t1,
	v1
where
	v1.object_id = t1.object_id
and	v1.object_type_2 = t1.object_type
and	v1.object_type_3 = t1.object_type
and	t1.owner = 'OUTLN'
;

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   588 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   588 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|*  3 |   VIEW                                   | V1           |     1 |    44 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   3 - filter("V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

The execution plan appears to be fine – we can see at operation 4 that the union all view has been access with the pushed predicate option and that the subsequent sub-plan has
used index driven nested loop joins in both branches – until we look a little more closely and examine the Predicate section of the plan. What, exactly, has been pushed ?

Look at the predicate for operation 3: “V1″.”OBJECT_TYPE_2″=”T1″.”OBJECT_TYPE”. It’s a join predicate that hasn’t been pushed into the view. On the other hand the original, and similar, join predicate v1.object_type_3 = t1.object_type has been pushed into the view, appearing at operations 9 and 15. There is a difference, of course, the object_type_3 column appears as the second column of the index on table t3.

Two questions then: (a) will the object_type_2 predicate be pushed if we add it to the relevant index on table t2, (b) is there a way to get the predicate pushed without adding it to the index. The answer to both questions is yes. First the index – re-run the test but create the alternative index on t2 and the plan changes to:

Plan hash value: 497545587

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

Notice how the predicate at operation 3 has disappeared, and the access predicate at operation 8 now includes the predicate “T2″.”OBJECT_TYPE”=”T1″.”OBJECT_TYPE”.

Alternatively, don’t mess about with the indexes – just tell Oracle to push the predicate. Normally I would just try /*+ push_pred(v1) */ as the hint to do this, but the Outline section of the original execution plan already included a push_pred() hint that looked like this: PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 1), so I first copied exactly that into the SQL to see if it would make any difference. It did – I got the following plan (and the hint in the outline changed to PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 2 1) so this may be a case where the plan produced by a baseline will perform better than the plan that the produced the baseline!):

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|* 13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='TEST_USER')
   7 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  13 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

In this case we see that the critical late-joining predicate has disappeared from operation 3 and re-appeared as a filter predicate at operation 7 In many cases you may find that the change in predicate use makes little difference to the performance – in my example the variation in run time over several executions of each query was larger than the average run time of the query; nevertheless it’s worth noting that the delayed use of the predicate could have increased the number of probes into table t3 for both branches of the union all and resulted in redundant data passing up through several layers of the call stack before being eliminated … and “eliminate early” is one of the major commandments of optimisation.

You might notice that the Plan Hash Value for the hinted execution plan is the same as for the original execution plan: the hashing algorithm doesn’t take the predicates into account (just one of many points that Randolf Geist raised in a blog post several years ago). This is one of the little details that makes it easy to miss the little changes in a plan that can make a big difference in performance.

Summary

If you have SQL that joins simple tables to set based (union all, etc.) views and you see the pushed predicate option appearing take a little time to examine the predicate section of the execution plan to see if the optimizer is pushing all the join predicates that it should and, if it isn’t, test the effects of pushing more predicates.

In many cases adding the hint /*+ push_pred(your_view_name) */ at the top of the query may be sufficient to get the predicate pushing you need, but you may need to look at the outline section of the execution plan and add a series of more complicated push_pred() and no_push_pred() hints because the push_pred hint has evolved over time to deal with increasingly complicated transformations.

 

July 12, 2018

Cardinality Puzzle

Filed under: Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 12:57 pm BST Jul 12,2018

One of the difficulties of being a DBA and being required to solve performance problems is that you probably never have enough time to think about how you got to a solution and why the solution works; and if you don’t learn about the process itself , you just don’t get better at it. That’s why I try (at least some of the time) to write articles and books (as I did with CBO Fundamentals) that

  1. explain simple details that can be used as background facts
  2. walk through the steps of solving a problem

So here’s an example from a question on the ODC database forum asking about the cause and workaround for a bad cardinality estimate that is producing a poorly performing execution plan. It’s actually a type of problem that comes up quite frequently on large data sets and explains why a simple “gather stats” is almost guaranteed to leave you with a few headaches (regardless of whether or not you choose to include histograms as part of the process). I’m not going to offer “the answer” – I’m just going to talk about the inferences we can make from the facts supplied and where we have to go from there.

The DBA has a table holding 80,000,000,000 rows. It is list/hash partitioned with 2 partitions and 1,024 sub-partitions (per partition) but neither of the partitioning key columns appears in the query. The query runs parallel and the optimizer (presumably thanks to the specific settings of various parameters related to parallel execution uses dynamic sampling at level 3).

There is an inline view defined in the query and the DBA has isolated this as a key component of the problem and supplied a query and plan (from “explain plan”) against that view.


select * from TAB2 T
WHERE T.DT = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');
 
------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                          |   479M|    76G|  1756K (14)| 05:51:14 |       |       |        |      |            |
|   1 |  PX COORDINATOR              |                          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000                 |   479M|    76G|  1756K (14)| 05:51:14 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL     |                          |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  1024 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS STORAGE FULL| TAB1                     |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  2048 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - storage(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))
       filter(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))

Note
-----
   - dynamic sampling used for this statement (level=3)

The DBA’s problem is that if the estimated cardinality of this extract goes over roughly 500M the optimizer chooses a bad plan for the overall query – and on occasion this extract has given an estimate of 5 billion rows. Moreover, the actual number of rows returned by this extract is typically in the order of 40M, so the estimate is a long way off even when it’s “good enough”.

So where do we start looking to work out what’s going wrong? You’ll note, of course, that after text expansion the user’s single predicate has changed, and an extra predicate (previously hidden inside the view) has appeared; instead of just T.DT = to_date(:b1,’MM/DD/YYYY HH24:MI:SS’) we now have (cosmetically adjusted):

        COALESCE(DFG,'N')='N' 
AND     TRUNC(DT_TM)=TO_DATE(:B1,'MM/DD/YYYY HH24:MI:SS')

There are two immediately obvious threats here – first that the combination of predicates means Oracle is likely to make a mistake because it will check the individual selectivities and multiply them together to get the combined selectivity, second that the appearance of predicates of the form “function(column) = constant” means that Oracle will guess 1% as the individual selectivities.

Without checking more details we might assume that a possible quick fix (that would require no changes to existing code) would be to create a couple of virtual columns (or extended stats) to represent the two expressions and gather stats on the resulting columns – though it is a restriction of extended stats that you can’t “double up” and create a column group on the two column expressions, so there’s still some scope for a cardinality estimate that is still sufficiently bad even with this approach. We also note that if we can change the coalesce(DFG,’N’) that must have been hidden in the view to nvl(DFG,’N’) then Oracle would be able to “or expand” the nvl() and use a more appropriate selectivity for that part of the predicate.

However, the points I’ve covered so far tend to produce estimates that are too small and often much too small. So maybe the key to the problem is in the Note section that tells us that Oracle has (successfully) used dynamic sampling for this statement. In other words, all the theory of how the optimizer calculates selectivity may be irrelevant – the estimate will be based on the luck of the sample.

So let’s take a look at the (slightly edited) table stats we’ve been given:

column_name data_type num_distinct low_value      high_value     density   num_null  histogram
DT_TM       DATE           6179571 78740B1E0A383C 7876020B01262B 1.6182E-7 0         NONE
DFG         VARCHAR2             1 4E             4E             1         0         NONE

Notice that the DFG (apparently) has the value ‘N’ for every row in the table (low_value = high_value = 0x4E, num_nulls = 0). The date range is 30-Nov-2016 to 11-Feb-2018, with no histogram but 6.18M distinct values for 80 Billion rows. Neither column has a histogram.

A little arithmetic tells us that (on average) there ought to be about 182M (= 80B / 438 days) rows for any one day – and that’s worth thinking about for three separate reasons.

First, an estimate of 479M against an average of 182M isn’t too surprising if it’s based on a fairly small sample, it’s only out by a factor of 2.6. On the other hand, getting an an estimate of 5 billion – which can happen on bad days – is extremely unlikely if the data is uniformly distributed across dates.

Secondly, the DBA supplied us with some data from the recent past with an aggregate query for “trunc(dt_tm)”, with the following results:

TRUNC(DT_TM)   COUNT(*)
------------ ----------
01-FEB-18    44,254,425
02-FEB-18    46,585,349
03-FEB-18    43,383,099
04-FEB-18    32,748,364
05-FEB-18    37,993,126
06-FEB-18    39,708,994
07-FEB-18    38,696,777
08-FEB-18    41,871,780
09-FEB-18    46,702,852
10-FEB-18    42,744,870
11-FEB-18    34,971,845
12-FEB-18    37,165,983

Recent data seems to follow an average of around 40M rows per day, so the estimate of 182M that we can derive from the stored statistics is a long way off: the present is behaving very differently from the past and that’s a relatively common problem with very large data sets – though it’s more usual for rolling averages to increase from the past to the present because the data is often representing the growth of a business over time. Can we create a hypothesis to explain the discrepancy, and could that hypothesis also account for the sample producing some very strange estimates ?

Finally, slightly more subtle and only included for completeness, if this column is supposed to hold date and time to the nearest second – which is what you might expect from an Oracle date type – there are 38 million possible values (438 x 86,400) it could be holding, and that’s more than the actual number of distinct values by a factor of 6. We can also work out that 80 billion rows over 438 days is 2,000 rows per second (on average). Averages are often misleading, of course, many systems have a pattern where a working day shows most of the data created in a 12 – 16 hour window with a couple of hours of more intense activity. For reference, though: average rows per second for the recent data is roughly 40M/86400 = 460; while the average we derive from the stored statistics is 80B / 6M = 13000 rows per second; this unlikely pattern needs a “non-uniform” explanation.

How do these three thoughts help us to understand or, to be more accurate, to make a sensible guess about why the optimizer can use dynamic sampling and get a wildly variable estimate which can be 1 or 2 orders of magnitude wrong. (varying between 479M and 5,000M compared to the recent actual 40M)?

Here’s one simple idea: extrapolate the 40M rows per day over 80B rows: that’s 2,000 days (possibly rather more since businesses tend to grow). What if the dt_tm is the timestamp for the moment the row was loaded into the database, and a couple of years ago (maybe around “30th Nov 2016”) the data was restructured and the existing five years of data was loaded over a very short period of time – let’s say one week. This would leave you with 17B rows of “new” data with a dt_tm spread at 40M rows per day for most of 438 days, and 63B rows of “historic” data packed into 7 days (at 9B rows per day).

I don’t know how Oracle would have randomly selected its sample from an extremely large table with 2,048 physical data segments but it’s totally believable that a small, widely scattered sample could end up with an extremely unrepresentative subset of the data. A completely random sample of the data would produce an estimate of around 500M rows for the predicate; but it would only take a fairly small variation in the sample (taking a few too many “historic” rows) to produce a large enough change in the estimate to change the execution plan, and a rare, but not extreme variation could easily take the estimate up to 5B.

Next Steps

It would be at this point in a performance assignment that I’d be asking around to find out if my guess about a massive data load operation was correct – if I couldn’t get the answer by talking to people I’d run a query against the whole data set to check the hypothesis, because there’s clearly some sort of skew in the data that’s causing a problem. I’d also run the critical part of the query a couple of times with events 10046/level 4 and 10053 set (but only fetching the first few rows) to find out from the trace file how large a sample Oracle was using, and then run the sampling query a few times to see what the sampled results looked like. Depending on the results I’d either find a way to stop Oracle from sampling for this query or I might create a virtual column (or just extended stats since it’s 11g) on just the trunc(dt_tm), possibly with a histogram in place (maybe coded by hand) if that could isolate the special dates and leave Oracle with a better estimate of the typical date. I might find I had to change the coalesce() to an nvl() as well – or create a virtual  column – to stop the sampling.

Finally, it’s worth noting that in 11g it’s possible to create pending (table preference “PUBLISH” = FALSE) stats for testing purposes; it’s also worth noting that the default histogram on trunc(dt_tm) would be a height-balanced histogram while we could create a frequency histogram in 12c since 12c allows us to specify up to 2,048 buckets.

Footnote

If you check the ODC thread you’ll see that the OP has marked as correct a suggestion to change:

    TRUNC (TB1.DT_TM)  = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');  

to

    dt_tm >= trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))
and dt_tm <  trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))+1

Note that that’s “greater than or equal to” at one end and “strictly less than” at the other when using “date + 1”.

This has the effect of giving the optimizer a chance of using the low/high values of the column to produce a better (though perhaps still overlarge) and consistent estimate of the rows in the date range; and it may also stop the optimizer from doing dynamic sampling at level 3 (the “I’m guessing, let’s check” level) though it’s possible that the sampling would be disabled only if the coalesce() were changed to an nvl() as well.

Of course, from the information supplied, this looks like the OP would have to change a view definition and the run-time code to achieve the result. But in an ideal world doing things that avoid confusing the optimizer is usually the sensible strategy provided it doesn’t take an extreme amount of coding and testing.

 

July 10, 2018

Validate FK

Filed under: constraints,Infrastructure,Oracle — Jonathan Lewis @ 10:42 am BST Jul 10,2018

A comment arrived yesterday on an earlier posting about an enhancement to the truncate command in 12c that raised the topic of what Oracle might do to validate a foreign key constraint. Despite being sure I had the answer written down somewhere (maybe on a client site or in a report to a client) I couldn’t find anything I’d published about it, so I ran up a quick demo script to show that all Oracle does is construct a simple SQL statement that will do check the data – and then do whatever the optimizer does to produce the fastest possible plan.

Here’s the script – with a few variations to show what happens if you start tweaking features to change the plan.

rem
rem     Script:         validate_fk.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table parent 
as 
select  * 
from    all_Objects 
where   rownum <= 10000 -- > comment to avoid wordpress format issue
;

alter table parent add constraint par_pk primary key(object_id);

execute dbms_stats.gather_table_stats(null, 'parent', cascade=>true)


create table child 
as 
select par.* 
from    (select rownum from dual connect by level <= 10) v1, --> comment to avoid wordpress format issue
        parent par
; 

alter table child add constraint chi_fk_par foreign key(object_id) references parent enable novalidate; 
create index chi_fk_par on child(object_id); 
execute dbms_stats.gather_table_stats(null, 'child', cascade=>true)


-- alter table child modify object_id null;
-- alter table child parallel(degree 8);
-- alter session set "_fast_full_scan_enabled" = FALSE;
-- alter session set "_optimizer_outer_to_anti_enabled" = false;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 12';
alter table child modify constraint chi_fk_par validate;
alter session set events '10046 trace name context off;

All I’ve done is created parent table with a primary key, and a child table with 10 rows per parent. I’ve created a foreign key constraint on the child table, enabled it (so future data will be checked) but not validated it (so there’s no enforced guarantee that the existing data is correct). Then I’ve issued a command to validate the foreign key.

The flush of the buffer cache is to allow me to see the I/O that takes place and will also (usually) let me see some if there are any strange issues due to any recursive SQL Oracle. As you can see I’ve also got a couple of commented commands that might cause a couple of variations in behaviour.

Here’s the critical content from the output of the trace file summary from tkprof (in versions from 11.2.0.4 to 12.2.0.1):


select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3
from
 "TEST_USER"."CHILD" A , "TEST_USER"."PARENT" B where( "A"."OBJECT_ID" is not
  null) and( "B"."OBJECT_ID" (+)= "A"."OBJECT_ID") and( "B"."OBJECT_ID" is
  null)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.02        241        373          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.02       0.02        241        373          0           0

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  NESTED LOOPS ANTI (cr=373 pr=241 pw=0 time=21779 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=17753 us starts=1 cost=32 size=1700000 card=100000)(object id 104840)
     10000      10000      10000   INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=4494 us starts=10000 cost=0 size=49995 card=9999)(object id 104838)

As you can see, Oracle writes SQL for an outer join with an “is null” predicate on the outer table – which the optimizer converts to an anti-join, running a nested loop in this case. It’s an interesting little oddity that the code includes the predicate “A”.”OBJECT_ID” is not null given that the column is declared as not null – but this is presumably a developer deciding to re-use code even if it then includes a redundant predicate (which is effectively zero cost – since the optimizer can use transitive closure to eliminate it).

Given that Oracle has converted an outer join to an anti join I obviously had to check what would happen if I disabled this conversion by altering the “_optimizer_outer_to_anti_enabled” parameter to false. The optimizer obeyed the session setting with the following plan in the trace:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  FILTER  (cr=373 pr=241 pw=0 time=226926 us starts=1)
    100000     100000     100000   NESTED LOOPS OUTER (cr=373 pr=241 pw=0 time=177182 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000    INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=40811 us starts=1 cost=32 size=1700000 card=100000)(object id 104848)
    100000     100000     100000    INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=119363 us starts=100000 cost=0 size=49995 card=9999)(object id 104846)

The significant difference is in the CPU usage, of course, and to a degree the magnitude of the change is dictated by the pattern and distribution of the data. The number of CR gets hasn’t changed as the number of index probes jumps from 10,000 to 100,000 because Oracle will have pinned index blocks (There’s a very old article on my old website if you want to read more about buffer pins).

The original question was about the effect of a local session setting that disabled index fast full scans, and followed up with a question on parallelism. After seeing the effect of changing one optimizer parameter at the session level you probably won’t be surprised by the following two results.  First, when the only change I make is the setting of the “_index_fast_full_scan_enabled” parameter, and then when the only change is the declared parallelism of the child table.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  MERGE JOIN ANTI (cr=240 pr=240 pw=0 time=120163 us starts=1 cost=247 size=22000 card=1000)
    100000     100000     100000   INDEX FULL SCAN CHI_FK_PAR (cr=218 pr=218 pw=0 time=20314 us starts=1 cost=222 size=1700000 card=100000)(object id 104852)
    100000     100000     100000   SORT UNIQUE (cr=22 pr=22 pw=0 time=81402 us starts=100000 cost=25 size=50000 card=10000)
     10000      10000      10000    INDEX FULL SCAN PAR_PK (cr=22 pr=22 pw=0 time=1185 us starts=1 cost=22 size=50000 card=10000)(object id 104850)


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=15 pr=2 pw=0 time=722483 us starts=1)
         0          0          0   PX SEND QC (RANDOM) :TQ10001 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0    HASH JOIN ANTI BUFFERED (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0     PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0      TABLE ACCESS FULL CHILD (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0      PX SEND BROADCAST :TQ10000 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0       PX SELECTOR  (cr=0 pr=0 pw=0 time=0 us starts=0)
         0          0          0        INDEX FAST FULL SCAN PAR_PK (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)(object id 104854)

In the first case my version of Oracle has switched to a merge anti-join with an index full scan (not FAST full scan). It’s interesting to note that the merge join anti hasn’t been as clever as the nested loop anti in avoiding probes of the second data source as it walks the foreign key index (note how starts=100000 in the SORT UNIQUE line).

In the second case all the work was done by the parallel query slaves – and the PX SELECTOR line tells you that this plan must have come from 12c. As you can see we’re still doing an anti-join but this time we do a parallel tablescan of the child table (as we haven’t enabled the index for parallel execution – if we had altered the index to parellel(degree 8) as well we would have seen a parallel index fast full scan instead of the parallel tablescan.)

Bottom line: the SQL executed to validate a foreign key constraint is essentially a join between the parent and child tables, Oracle will simply optimize that statement to the best of its abilities based on the current session settings. If you want to test on a clone (or accurate model) of the tables you may find that you can create an sql_patch that works (even though the necessary SQL will be optimised as SYS – though so far I’ve only tried this with a couple of variants of the parallel() hint on 12.2.0.1)

<h3>Footnote</h3>

If you were wondering what the three bind variables in the query were, this is the relevant extract from the 10046 trace file with bind variable tracing enabled:

 Bind#0
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c38  bln=32  avl=09  flg=05
  value="TEST_USER"
 Bind#1
  oacdty=01 mxl=32(05) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c00  bln=32  avl=05  flg=05
  value="CHILD"
 Bind#2
  oacdty=01 mxl=32(10) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359bc8  bln=32  avl=10  flg=05
  value="CHI_FK_PAR"

The values are the owner, table, and constraint names. (Though you have to modify the code a little to show that the last one is the constraint name and not the index name).

 

 

July 9, 2018

Historic Stats

Filed under: Oracle,Statistics — Jonathan Lewis @ 12:45 pm BST Jul 9,2018

If you want to examine historic object stats Oracle gives you a few procedures in the dbms_stats package to compare sets of stats captured at two different time periods, but there’s no view that you can query to get an idea of how a table’s stats have changed over time. This is a problem that can be addressed when you discover two things:

  • There are views to report pending table, index, column and histogram stats.
  • Pending stats are stored stored as “historic” stats with a future date.

Once you’ve spotted the second detail, you can acquire the SQL to generate the pending stats views:


SQL> select view_name from dba_views where view_name like 'DBA%PENDING%STAT%';

VIEW_NAME
--------------------------------------------------------------------------------------------------------------------------------
DBA_TAB_PENDING_STATS
DBA_IND_PENDING_STATS
DBA_COL_PENDING_STATS
DBA_TAB_HISTGRM_PENDING_STATS


SQL> set long 20000
SQL> set pagesize 0
SQL> select dbms_metadata.get_ddl('VIEW','DBA_TAB_PENDING_STATS','SYS') from dual;

  CREATE OR REPLACE FORCE NONEDITIONABLE VIEW "SYS"."DBA_TAB_PENDING_STATS" ("OW
NER", "TABLE_NAME", "PARTITION_NAME", "SUBPARTITION_NAME", "NUM_ROWS", "BLOCKS",
 "AVG_ROW_LEN", "IM_IMCU_COUNT", "IM_BLOCK_COUNT", "SCAN_RATE", "SAMPLE_SIZE", "
LAST_ANALYZED") AS
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
	 h.im_imcu_count, h.im_block_count, h.scanrate, h.samplesize, h.analyzet
ime
  from	 sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime > systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
	 h.avgrln, h.im_imcu_count, h.im_block_count, h.scanrate, h.samplesize,
	 h.analyzetime
  from	 sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime > systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
	 h.blkcnt, h.avgrln, h.im_imcu_count, h.im_block_count, h.scanrate,
	 h.samplesize, h.analyzetime
  from	sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
	sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
	tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime > systimestamp

Notice the critical predicate repeated across the UNION ALL: “and h.savtime > systimestamp” – all we have to do is change that to “less than or equal to” (or just delete it if we’re not fussy about reporting pending stats along with historic) then add a few columns reporting the available statistics and we can create a view that we can query for the historic stats.


rem
rem     Script:         optstat_table_history.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Notes
rem     Have to be connected as SYS, or have directly
rem     granted privileges on the sys tables to do this
rem

create or replace force view sys.jpl_tab_history_stats (
        owner, table_name, partition_name, subpartition_name,
        num_rows, blocks, avg_row_len, sample_size, last_analyzed
)
as
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
         h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
         h.avgrln, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
         h.blkcnt, h.avgrln, h.samplesize, h.analyzetime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime <= systimestamp
;

drop public synonym jpl_tab_history_stats;
create  public synonym jpl_tab_history_stats for sys.jpl_tab_history_stats;
grant select on jpl_tab_history_stats to public;

Now you can query historic stats for any schema and table.

If you don’t want to create a view in the sys schema (and don’t want to create a permanent object at all) you can always use subquery factoring as an easy way of editing the metadata into a suitable query. You have to be connected as a user with privileges (that could be through a role) to view the relevant sys tables, though.


with jpl_tab_history (
        owner, table_name, partition_name, subpartition_name,
        num_rows, blocks, avg_row_len, sample_size, last_analyzed
) as (
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
         h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
         h.avgrln, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
         h.blkcnt, h.avgrln, h.samplesize, h.analyzetime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime <= systimestamp
)
select
        table_name, blocks, num_rows, avg_row_len, sample_size, last_analyzed
from
        jpl_tab_history
where
        owner = 'TEST_USER'
and     table_name = 'T1'
order by
        last_analyzed
;

TABLE_NAME               BLOCKS   NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE LAST_ANAL
-------------------- ---------- ---------- ----------- ----------- ---------
T1                          188      10000         120	     10000 04-JUL-18
T1                          322      18192         118	     18192 06-JUL-18

Matching code for indexes, columns and histograms is left as an exercise to the interested reader.

Next Page »

Powered by WordPress.com.