Oracle Scratchpad

March 23, 2017

min/max Upgrade

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle,Troubleshooting — Jonathan Lewis @ 8:53 am BST Mar 23,2017

A question came up on the OTN database forum a little while ago about a very simple query that was taking different execution paths on two databases with the same table and index definitions and similar data. In one database the plan used the “index full scan (min/max)” operation while the other database used a brute force “index fast full scan” operation.

In most circumstances the starting point for a question like this is to check whether some configuration details, or some statistics, or the values used in the query are sufficiently different to result in a significant change in costs. The first simple procedure you can follow is to hint each database to use the plan from the opposite database to see if this produces any clues about the difference – and it’s a good idea when doing this test to use one of the more verbose formatting options for the call to dbms_xplan.
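For example, after hinting the query you might pull the plan with the “advanced” format, which includes the outline and peeked bind values (a sketch – any of the more verbose format options would do):

select * from table(dbms_xplan.display_cursor(null, null, 'advanced'));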

In this case, though, the OP discovered a note on MoS reporting exactly the problem he was seeing:

Doc ID 2144428.1: Optimizer Picking Wrong ‘INDEX FAST FULL SCAN’ Plan vs Correct ‘INDEX FULL SCAN (MIN/MAX)’

which referred to

Bug 22662807: OPTIMIZER PICKING INDEX FFS CAN INSTEAD OF MIN/MAX

Conveniently the document suggested a few workarounds:

  • alter session set optimizer_features_enable = '11.2.0.3';
  • alter session set "_fix_control" = '13430622:off';
  • delete object stats [Ed: so that dynamic sampling takes place … maybe a /*+ dynamic_sampling(alias level) */ hint would suffice].

Of the three options my preference would (at least in the short term) be the _fix_control one. Specifically, from the v$system_fix_control view, we can see that it addresses the problem very precisely with the description: “index min/max cardinality estimate fix for filter predicates”.
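You can check the description (and current setting) for yourself by querying the view directly – a sketch:

select  bugno, value, sql_feature, description
from    v$system_fix_control
where   bugno = 13430622;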

The example in the bug note showed a very simple statement (even more simple than the OP’s query which was only a single table query anyway), so I thought I’d build a model and run a few tests to see what was going on. Luckily, before I’d started work, one of the other members of the Oak Table network sent an email to the list asking if anyone knew how the optimizer was costing an example he’d constructed – and I’ve finally got around to looking at his example, and here’s the model and answer(s), starting with the data set:


rem
rem     Script:         test_min_max.sql
rem     Dated:          March 2017
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.2.0.3
rem

create table min_max_test nologging
as
with ids as (
        select /*+ Materialize */ rownum  id from dual connect by rownum <= 50000 -- > comment to protect formatting
),
line_nrs as (
        select /*+ Materialize */  rownum line_nr from dual connect by rownum <= 20 -- > comment to protect formatting
)
select
        id, line_nr ,rpad(' ', 800, '*') data
from
        line_nrs, ids
order by
        line_nr, id
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'min_max_test',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index mmt_ln_id on min_max_test (line_nr, id) nologging;
create index mmt_id    on min_max_test (id)          nologging;

The table has two critical columns: each id has 20 line_nr values associated with it, but the way the data was generated means that the line numbers for a given id are scattered across 20 separate table blocks.

There are two indexes – one on id, which will allow us to find all the rows for a given id as efficiently as possible, and one (slightly odd-looking in this context) that would allow us to find a specific row for a given line_nr and id very efficiently. Two things about these indexes: in a live application they should both be compressed on the first column (the only column, in the case of index mmt_id), and the necessity of the mmt_id index is questionable – it might be an index you could drop if you reversed the order of the columns in mmt_ln_id. The thing about these indexes, though, is that they allow us to demonstrate a problem.
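For reference, the compressed definitions would look like this (a sketch – I’ve kept the uncompressed versions above so the figures later in the note match my test):

create index mmt_ln_id on min_max_test (line_nr, id) compress 1 nologging;
create index mmt_id    on min_max_test (id)          compress 1 nologging;

So let’s query the data – twice, hinting each index in turn: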


variable b1 number;
exec :b1 := 50000;

set serveroutput off

select
        /*+ index(t(id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

select
        /*+ index(t(line_nr, id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

It’s fairly safe to make a prediction about the execution plan and cost of the first query – it’s likely to be a range scan that accesses a couple of branch blocks, a leaf block and 20 separate table blocks followed by a “sort aggregate” – with a cost of about 23.

It’s a little harder to make a prediction about the second query. The optimizer could infer that the min(line_nr) has to be close to the left hand section of the index, and could note that the number of rows in the table is the same as the product of the number of distinct values of the two separate columns, and it might note that the id column is evenly distributed (no histogram) across the data, so it might “guess” that it need only range scan all the entries for the first line_nr to find the appropriate id. So perhaps the optimizer will use the index min/max range scan with a cost that is roughly 2 branch blocks plus total leaf blocks / 20 (since there are 20 distinct values for line_nr); maybe it would divide the leaf block estimate by two because “on average” – i.e. for repeated random selections of value for id – it would have to scan half the leaf blocks. There were 2,618 leaf blocks in my index, so the cost should be close to either 133 or 68.
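A quick sanity check of those two candidate costs (assuming a blevel of 2 and the 2,618 leaf blocks):

select
        2 + ceil(2618/20)       cost_whole_linenr,      -- all the entries for one line_nr
        2 + ceil(2618/40)       cost_half_linenr        -- half of them "on average"
from
        dual;

-- cost_whole_linenr = 133, cost_half_linenr = 68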

Here are the two plans – range scan first, min/max second:


select  /*+ index(t(id)) */  min(line_nr) from  min_max_test t where id = :b1
-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       |    23 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MIN_MAX_TEST |    20 |   160 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | MMT_ID       |    20 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"=:B1)

select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |    22 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |    22   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Spot on with the estimate for the simple range scan – but what did we do wrong with the estimate for the min/max scan ? You might notice in the first example the “table access by rowid batched” and realise that this is running on 12c. Here’s the plan I get if I set the optimizer_features_enable back to 11.2.0.3 before running the second query again:


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   136 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |   136   (1)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |   136   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Using the 11.2.0.3 optimizer model the plan has a cost that’s very close to our prediction – we’ll see why there’s a slight difference in a moment. If we set the optimizer_features_enable to 11.2.0.4 the cost drops back to 22. So for our example 11.2.0.3 will use the simple “index range scan” and an upgrade to 11.2.0.4 (or higher) will switch to the “index full scan (min/max)”. If you look at the OTN posting the impact of the change in costing is exactly the other way around – 11.2.0.3 uses the min/max path, 11.2.0.4 uses the simple index range scan.

The techy bit

You really don’t need to know this – experimenting with the optimizer_features_enable (or _fix_control) will give you plans that show you all the numbers you need to see to check whether or not you’ve run into this particular problem – but if you’re interested here’s a little bit from the two 10053 trace files. We need only look at a few critical lines. From the 11.2.0.3 costing for the min/max scan:


Index Stats::
  Index: MMT_ID  Col#: 1
  LVLS: 2  #LB: 2202  #DK: 50000  LB/K: 1.00  DB/K: 20.00  CLUF: 1000000.00  NRW: 1000000.00
  Index: MMT_LN_ID  Col#: 2 1
  LVLS: 2  #LB: 2618  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 125000.00  NRW: 1000000.00

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 135.000000  resc_cpu: 961594
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 135.697679  Resp: 135.697679  Degree: 1

I was running 12.1.0.2 so there were a few extra bits and pieces that I’ve deleted (mostly about SQL Plan Directives and in-memory). Critically we can see that the stats collection has a small error for the ID column – 50,536 distinct values (NDV) instead of exactly 50,000. This seems to have given us a cost for the expected index range of: 2 (blevel) + ceiling(2618 (leaf blocks) * 50536 / 1000000) = 2 + ceil(132.3) = 135, to which we add a bit for the CPU and get to 136. (Q.E.D.)
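You can reproduce that arithmetic with a one-liner:

select 2 + ceil(2618 * 50536 / 1000000) io_cost from dual;      -- = 135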

Then we switch to costing for 11.2.0.4:


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 21.787874  resc_cpu: 156872
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 22.324608  Resp: 22.324608  Degree: 1

We still have the small error in the number of distinct values for id, so the estimated number of rows that we need to access from the table for a given id (before “aggregating” to find its minimum line_nr) is 19.787874 (Computed: / Non Adjusted:) rather than exactly 20. Notice, then, that the cost of using the index is 19.787874 + 2 which looks suspiciously like adding the blevel to the number of table blocks to get a cost and forgetting that we might have to kiss a lot of frogs before we find the prince. Basically, in this example at least, it looks like the costing algorithm has NOTHING to do with the mechanics of what actually has to happen at run-time.

Footnote

This is only an initial probe into what’s going on with the min/max scan; there are plenty more patterns of data that would need to be tested before we could have any confidence that we had produced a generic model of how the optimizer does its calculations. The only thing to note so far is that there IS a big change as you move from 11.2.0.3 to later versions: the case on OTN showed the min/max scan disappearing on the upgrade, while the example above shows the min/max scan disappearing on the downgrade – either change could be bad news for parts of a production system.

There are a couple of related bugs that might also be worth reviewing.

  • Bug 11834402 : CBO CHOOSES A SLOW INDEX FULL SCAN OVER A MUCH FASTER INDEX RANGE SCAN
  • Bug 13430622 : INDEX SCAN IN VERY SLOW FOR ONE PREDICATE AND FAST FOR OTHERS

There is a note, though, that this last bug was fixed in 12.1.

Footnote 2

When experimenting, one idea to pursue as the models get more complex and you’re using indexes with more than two columns is to test whether the presence of carefully chosen column group statistics might make a difference to the optimizer’s estimates of cardinality (hence cost) of the min/max scan.
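For example, a column group on (line_nr, id) for the table above could be created like this (a sketch – whether it actually changes the min/max estimates is exactly what you’d be testing):

select
        dbms_stats.create_extended_stats(
                ownname   => user,
                tabname   => 'MIN_MAX_TEST',
                extension => '(line_nr, id)'
        )
from    dual;

-- then re-gather table stats so the new extension acquires statistics
execute dbms_stats.gather_table_stats(user, 'MIN_MAX_TEST', method_opt => 'for all columns size 1')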

January 10, 2017

Join Elimination 12.2

Filed under: 12c,Bugs,Execution plans,Oracle — Jonathan Lewis @ 1:03 pm BST Jan 10,2017

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases where a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. when all other factors are equal) is dictated by the order the tables appear in the from clause; so if you have enough tables in the from clause it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

As you will see later on, the three tables grandparent, parent and child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so the new “multi-column primary key” join elimination has worked, but the old “single-column” elimination of grandparent hasn’t appeared. On the other hand, I’ve not followed my usual rules for writing SQL, so let’s try again. If I follow the pattern I usually follow, my from clause would have been in the order child -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this visual change to the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:


rem
rem	Script:		join_elimination_12c2.sql
rem	Author:		Jonathan Lewis
rem	

-- Environment details eliminated

define m_pad=100

/*
	IDs will be 1 to 1000
*/

create table grandparent 
as
select 
	rownum			id,
	trunc((rownum-1)/5)	small_num_g,
	rpad(rownum,10)		small_vc_g,
	rpad(rownum,&m_pad)	padding_g
from 
	all_objects 
where 
	rownum <= 1000
;

/*
	Each GP has two (scattered) children here
	Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
	1+mod(rownum,1000)	id_g,
	rownum			id,
	trunc((rownum-1)/5)	small_num_p,
	rpad(rownum,10)		small_vc_p,
	rpad(rownum,&m_pad)	padding_p
from 
	all_objects 
where 
	rownum <= 2000
;

/*
	Simple trick to get 5 (clustered) children per parent
	Child IDs are 1 to 12,000
*/

create table child 
as
select 
	id_g,
	id			id_p,
	rownum			id,
	trunc((rownum-1)/5)	small_num_c,
	rpad(rownum,10)		small_vc_c,
	rpad(rownum,&m_pad)	padding_c
from 
	parent	p,
	(
		select /*+ no_merge */ 
			rownum 
		from	parent p 
		where	rownum <= 5
	)	d
;

create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem	Don't need to collect stats because it's 12c
rem

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered gp -> p -> c
prompt	The final plan is GP->C; the optimizer eliminated P before
prompt	considering GP
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered c -> p -> gp
prompt	The final plan is a tablescan of C only. The optimizer managed 
prompt	to eliminate GP first and P second
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	child		c,
	parent		p,
	grandparent	g 
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	==================================================
prompt	Convert to ANSI standard in the order gp -> p -> c
prompt	and both gp and p eliminated.
prompt	==================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g
join
	parent		p
on	p.id_g = g.id
join
	child		c
on	c.id_p = p.id
and	c.id_g = p.id_g
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===================================================
prompt	Convert to ANSI standard in the order c -> p -> gp
prompt	and only p is eliminated. 
prompt	===================================================

select 
	count(c.small_vc_c)
from 
	child		c
join
	parent		p
on      p.id   = c.id_p 
and	p.id_g = c.id_g 
join
	grandparent	g
on	g.id = p.id_g 
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

It’s possible, of course, that with different system stats, I/O calibration, extent sizes, segment space management, block sizes, or sundry other parameter details you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.

Footnote:

If you’re wondering why the “traditional” and “ANSI” syntax should exhibit this flaw for joins in the opposite direction – remember that ANSI SQL is first transformed into an equivalent Oracle form and, in the simple cases, the first two tables form the first query block, then each table after that introduces a new query block; so the optimizer strategy does (approximately) the following translation:


select ... from grandparent join parent join child

==>

select ... from (select ... from grandparent join parent) join child

The optimizer then optimizes the inline query, which eliminates grandparent leaving a join between parent and child, which then allows parent to be eliminated.

Conversely we get:

select ... from child join parent join grandparent

==>

select ... from (select ... from child join parent) join grandparent

In this form the optimizer eliminates parent from the inline view and is left with a join between child and grandparent – so no further elimination.

August 31, 2016

Parallel_index hint

Filed under: Bugs,CBO,Oracle,Parallel Execution — Jonathan Lewis @ 1:28 pm BST Aug 31,2016

Prompted by a recent OTN posting I’ve dug out from my library the following demonstration of an anomaly with the parallel_index() hint. This note is a warning about how little we understand hints and what they’re supposed to mean, and how we can be caught out by an upgrade. We’ll start with a data set which – to match a comment made in the original posting rather than as a necessity for the demonstration – has an index that I’ve manipulated to be larger than the underlying table:


rem
rem     Script:         parallel_index_hint_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          December 2005
rem

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1e4)         modded,
        lpad(rownum,10,'0')     v1,
        lpad('x',30,'x')        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(modded) pctfree 75;
alter table t1 modify modded not null;
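A quick way to check the relative sizes after the build (the index picks up its own statistics as it is created):

select blocks      from user_tables  where table_name = 'T1';
select leaf_blocks from user_indexes where index_name = 'T1_I1';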

Your figures may vary slightly if you try to recreate this model, but according to my stats blocks=7876 for the table and leaf_blocks=8054 for the index. Now I’m going to generate the execution plans for a simple query – count(modded) from the table – with a varying selection of hints.


select /*+ index_ffs(t1 t1_i1) */ count(modded) from t1;

select /*+ parallel_index(t1 t1_i1 20) */ count(modded) from t1;

select /*+ index_ffs(t1 t1_i1) parallel_index(t1 t1_i1 20) */ count(modded) from t1;

select /*+ parallel_index(t1 t1_i1 20) parallel(t1 10) */ count(modded) from t1;

Pause for thought

  • which of the four statements will have a plan that uses an index fast full scan ?
  • which of the four statements will have a plan that indicates parallel execution ?
  • which of the four statements will indicate a parallel index fast full scan ?
  • why are two of the plans going to be identical but with different costs ?

The most interesting bit of this note is in the last question because it’s also the answer to a more subtle “why didn’t Oracle do what I thought it should” question. Here are the four plans I got from an instance of 11.2.0.4 (with a little labelling to remind us about the hinting):


==================================================
index_ffs() on its own - get index fast full scan
==================================================

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |  1043   (4)| 00:00:06 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|  1043   (4)| 00:00:06 |
-----------------------------------------------------------------------

=====================================================
parallel_index() on its own
Get serial tablescan which is cheaper than serial FFS
=====================================================

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  1027   (4)| 00:00:06 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  1027   (4)| 00:00:06 |
-------------------------------------------------------------------

========================================================
parallel_index() with index_ffs()
Get parallel fast full scan - at same cost as SERIAL FFS
========================================================

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |  1043   (4)| 00:00:06 |        |      |            |
|   1 |  SORT AGGREGATE           |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000 |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |          |  1000K|  1043   (4)| 00:00:06 |  Q1,00 | PCWC |            |
|   6 |       INDEX FAST FULL SCAN| T1_I1    |  1000K|  1043   (4)| 00:00:06 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------

===============================================
parallel_index() with parallel()
Get parallel fast full scan - costed correctly.
(Not costed at the degree given for table).
===============================================

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |    58   (4)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE           |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000 |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |          |  1000K|    58   (4)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       INDEX FAST FULL SCAN| T1_I1    |  1000K|    58   (4)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------

First plan – we supplied the index_ffs() hint, it’s valid: so Oracle did an index fast full scan. The serial cost was 1,043.

Second plan – we supplied the parallel_index() hint: but the optimizer apparently didn’t consider the cost of a parallel index fast full scan (which ought to have cost something like (1043/20)/0.9 = 58); instead it chose the serial tablescan at a cost of 1,027 (cheaper than a serial index fast full scan because of the way I had defined a very large index).

Third plan – we supplied the parallel_index() hint with an explicit index_ffs() hint: both were legal so the optimizer obeyed the hints and produced a parallel index fast full scan (which is what we might have hoped would happen automatically for the second query). But the cost of the query is 1,043 – the cost of the serial index fast full scan.

Final plan – we didn’t hint an index_ffs(); we hinted parallel() and parallel_index(): “strangely” Oracle has selected the parallel index fast full scan – and this time the cost is 58 (which, as I pointed out above, is (1043/20)/0.9, the value it’s “supposed” to be). It looks as if Oracle uses the parallel scaling factor on the index only if the table is ALSO hinted (or defined) to use parallel execution.

Note, by the way, that I’ve hinted the table parallel 10 so that the cost of the parallel tablescan (ca. (1027/10)/0.9 = 114) will be larger than the cost of the parallel index fast full scan. Normally you’re likely to have the same degree hinted for table and index, and the index is likely to be smaller than the table which means that if you see this anomaly in production it’s more likely to be a case of Oracle using a parallel tablescan when you’re expecting a parallel index fast full scan. Oracle will have compared the serial cost of an index fast full scan against the parallel cost of the table.
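If you want to see where those numbers come from, the parallel cost in each case is just the serial cost divided by the hinted degree and scaled up by the usual 90% factor – a sketch:

select
        1043                    serial_index_ffs,
        round((1043/20)/0.9)    parallel_index_ffs,     -- index hinted at degree 20: 58
        round((1027/10)/0.9)    parallel_tablescan      -- table hinted at degree 10: 114
from
        dual;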

Footnote:

When I ran the script on 12c there was one important change. The cost of the “correctly costed” index fast full scan came out at 103 rather than 58. Oracle has used the degree from the parallel() hint for the tablescan to calculate the cost of the parallel index fast full scan. If you are unlucky enough to have some code that has conflicting degrees hinted (or defined) across tables and indexes then you may see some plans change because parallel index fast full scans suddenly change their cost.

Addendum (prompted by Randolf Geist’s comment): The plan in 12c also had a Note: “Degree of Parallelism is 4 because of table property”. This was a little odd because the degree of the table was 1 and the hint said 10, but the arithmetic of the tablescan had (as indicated) used a DOP of 4 – despite reporting Degree=0 in the 10053 trace file. The only way I could get the number 4 was to look at the CPU parameters – cpu_count * parallel_threads_per_cpu = 2 * 2 = 4 – bearing in mind the critical parallel parameters:

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     2
parallel_threads_per_cpu             integer     2

parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_io_cap_enabled              boolean     FALSE

The anomaly of “degree 10” appearing in the costing for the index fast full scan was echoed in the 10053 trace: the file reported “Degree: 20”, then showed the cost being divided by 10.

At run-time, though, the query DID run parallel 20 when hinted with both the parallel() and parallel_index() hints, and ran parallel 10 (while costing and repeating the note about parallel 4) when modified to have only the /*+ parallel(t1 10) */ hint.
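If you want to check the runtime degree for yourself, one option (a sketch – query it from the same session immediately after running the statement) is v$pq_tqstat, which reports one row per PX server per table queue:

select  dfo_number, tq_id, server_type, count(*) num_servers
from    v$pq_tqstat
group by
        dfo_number, tq_id, server_type
order by
        dfo_number, tq_id, server_type;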

July 26, 2016

Lost Concatenation

Filed under: Bugs,CBO,Oracle — Jonathan Lewis @ 10:46 am BST Jul 26,2016

This note models one feature of a problem that came up at a client site recently from a system running 12.1.0.2 – a possible bug in the way the optimizer handles a multi-column in-list that can lead to extremely bad cardinality estimates.

The original query was a simple three table join which produced a bad plan with extremely bad cardinality estimates; there was, however, a type-mismatch in one of the predicates (of the form “varchar_col = numeric”), and when this design flaw was addressed the plan changed dramatically and produced good cardinality estimates. The analysis of the plan, 10053 trace, and 10046 trace files done in-house suggested that the problem might relate in some way to an error in the handling of SQL Plan Directives to estimate cardinalities.

This was one of my “solve it in a couple of hours over the internet” assignments and I’d been sent a sample of the original query with the 10046 and 10053 trace files, and a modified version of the query that bypassed the problem, again including the 10046 and 10053 trace files, with a request to explain the problem and produce a simple test case to pass to Oracle support.

The first thing I noticed was that there was something very strange about the execution plan. Here’s the query and plan from my simplified model, showing the same anomaly:


select  /*+ no_expand */
        count(*)
from    t1, t2
where
        t2.shipment_order_id = t1.order_id
and     (t1.id, t2.v1) in ( (5000, 98), (5000, 99))
;

-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |       |       |       |   331 (100)|          |
|   1 |  SORT AGGREGATE                       |       |     1 |    19 |       |            |          |
|*  2 |   HASH JOIN                           |       |     1 |    19 |  2056K|   331   (5)| 00:00:01 |
|   3 |    TABLE ACCESS FULL                  | T2    |   100K|   878K|       |   219   (3)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100K|   976K|       |     2   (0)| 00:00:01 |
|   5 |     BITMAP CONVERSION TO ROWIDS       |       |       |       |       |            |          |
|   6 |      BITMAP OR                        |       |       |       |       |            |          |
|   7 |       BITMAP CONVERSION FROM ROWIDS   |       |       |       |       |            |          |
|*  8 |        INDEX RANGE SCAN               | T1_PK |       |       |       |     1   (0)| 00:00:01 |
|   9 |       BITMAP CONVERSION FROM ROWIDS   |       |       |       |       |            |          |
|* 10 |        INDEX RANGE SCAN               | T1_PK |       |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."SHIPMENT_ORDER_ID"="T1"."ORDER_ID")
       filter((
                  (TO_NUMBER("T2"."V1")=98 AND "T1"."ID"=5000) 
               OR (TO_NUMBER("T2"."V1")=99 AND "T1"."ID"=5000)
       ))
   8 - access("T1"."ID"=5000)
  10 - access("T1"."ID"=5000)

Before going on I need to remind you that this is modelling a production problem. I had to use a hint to block a transformation that the optimizer wanted to do with my data set and statistics, I’ve got a deliberate type-mismatch in the data definitions, and there’s a simple rewrite of the SQL that would ensure that Oracle does something completely different.

The thing that caught my eye was the use of the bitmap transformation (operations 5,7,9) using exactly the same index range scan twice (operations 8,10). Furthermore, though not visible in the plan, the index in question was (as the name suggests) the primary key index on the table and it was a single column index – and “primary key = constant” should produce an “index unique scan” not a range scan.

Once you’ve added in the fact that operations 8 and 10 are the same “primary key = constant” predicates, you can also pick up on the fact that the cardinality calculation for the table access to table t1 can’t possibly produce more than one row – but it’s reporting a cardinality estimate of 100K rows (which happens to be the number of rows in the table).

As a final point, you can see that there are no “Notes” about Dynamic Statistics or SQL Directives – this particular issue is not caused by anything to do with 12c sampling. In fact, having created the model, I ran it on 11.2.0.4 and got the same strange bitmap conversion and cardinality estimate. In the case of the client, the first pass the optimizer took went through exactly the same sort of process and produced a plan which was (probably) appropriate for a query where the driving table was going to produce (in their case) an estimated 4 million rows – but not appropriate for the actual 1 row that should have been identified.

In my example, if I allowed concatenation (i.e. removed the no_expand hint) I got the following plan:


------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |       |       |     8 (100)|          |
|   1 |  SORT AGGREGATE                        |       |     1 |    19 |            |          |
|   2 |   CONCATENATION                        |       |       |       |            |          |
|   3 |    NESTED LOOPS                        |       |     1 |    19 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID        | T1    |     1 |    10 |     2   (0)| 00:00:01 |
|*  5 |      INDEX UNIQUE SCAN                 | T1_PK |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |     9 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |    NESTED LOOPS                        |       |     1 |    19 |     4   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID        | T1    |     1 |    10 |     2   (0)| 00:00:01 |
|* 10 |      INDEX UNIQUE SCAN                 | T1_PK |     1 |       |     1   (0)| 00:00:01 |
|* 11 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |     9 |     2   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1"."ID"=5000)
   6 - filter(TO_NUMBER("T2"."V1")=99)
   7 - access("T2"."SHIPMENT_ORDER_ID"="T1"."ORDER_ID")
  10 - access("T1"."ID"=5000)
  11 - filter((TO_NUMBER("T2"."V1")=98 AND (LNNVL(TO_NUMBER("T2"."V1")=99) OR
              LNNVL("T1"."ID"=5000))))
  12 - access("T2"."SHIPMENT_ORDER_ID"="T1"."ORDER_ID")

This is a much more appropriate plan – and similar to the type of plan the client saw when they eliminated the type-mismatch problem (I got a completely different plan when I used character values ’98’ and ’99’ in the in-list or when I used a numeric column with numeric literals).
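For reference, the two type-consistent variants would look like this (the numeric version uses the n1 column you’ll find in the table definitions below):

select  /*+ no_expand */
        count(*)
from    t1, t2
where
        t2.shipment_order_id = t1.order_id
and     (t1.id, t2.v1) in ( (5000, '98'), (5000, '99'))
;

select  /*+ no_expand */
        count(*)
from    t1, t2
where
        t2.shipment_order_id = t1.order_id
and     (t1.id, t2.n1) in ( (5000, 98), (5000, 99))
;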

Examining my 10053 trace file I found the following:

  • In the BASE STATISTICAL INFORMATION, the optimizer had picked up column statistics about the order_id column, but not about the id column in the in-list – this explained why the cardinality estimate was 100K, Oracle had “lost” the predicate.
  • In the “SINGLE TABLE ACCESS PATH”, the optimizer had acquired the statistics about the id column and calculated the cost of using the t1_pk index to access the table for a single key (AllEqUnique), then calculated the cost of doing a bitmap conversion twice (remember we have two entries in the in-list – it looks like the optimizer has “rediscovered” the predicate). But it had still kept the table cardinality of 100K.

After coming up with a bad plan thanks to this basic cardinality error, the 10053 trace file for the client’s query then went on to consider or-expansion (concatenation). Looking at this part of their trace file I could see that the BASE STATISTICAL INFORMATION now included the columns relevant to the in-list and the SINGLE TABLE ACCESS PATH cardinalities were suitable. Moreover when we got to the GENERAL PLANS the join to the second table in the join order showed a very sensible cost and cardinality – unfortunately, having been sensible up to that point, the optimizer then decided that an SQL Plan Directive should be used to generate a dynamic sampling query to check the join cardinality and the generated query again “lost” the in-list predicate, resulting in a “corrected” cardinality estimate of 6M instead of a correct cardinality estimate of 1. As usual, this massive over-estimate resulted in Oracle picking the wrong join method with a huge cost for the final join in the client’s query – so the optimizer discarded the or-expansion transformation and ran with the bad bitmap/hash join plan.

Bottom line for the client – we may have seen the same “lose the predicate” bug appearing in two different ways, or we may have seen two different “lose the predicate” bugs – either way a massive over-estimate due to “lost” predicates during cardinality calculations resulted in Oracle picking a very bad plan.

Footnote:

If you want to do further testing on the model, here’s the code to generate the data:


create table t1
nologging
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  id,
        rownum                                  order_id,
        rpad('x',100)                           padding
from
        generator, generator
where
        rownum <= 1e5
;

execute dbms_stats.gather_table_stats(user,'t1')

alter table t1 modify order_id not null;
alter table t1 add constraint t1_pk primary key(id);


create table t2
nologging
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  shipment_order_id,
        mod(rownum-1,1000)                      n1,
        cast(mod(rownum-1,1000) as varchar2(6)) v1,
        rpad('x',100)                           padding
from
        generator, generator
where
        rownum <= 1e5
;

execute dbms_stats.gather_table_stats(user,'t2')

alter table t2 modify shipment_order_id not null;
create index t2_i1 on t2(shipment_order_id);

The interesting question now is WHY does Oracle lose the predicate – unfortunately my model may be too simplistic to allow us to work that out, but it might be sufficient to make it easy for an Oracle developer to see what’s going on and how best to address it. There is one bug on MoS (23343961) that might be related in some way, but I wasn’t convinced that the description was really close enough.

Update

This issue is now recorded on MoS as: Bug 24350407 : WRONG CARDINALITY ESTIMATION IN PRESENCE OF BITMAP OR

 

July 8, 2016

DML and Bloom

Filed under: Bugs,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 1:01 pm BST Jul 8,2016

One of the comments on my recent posting about “Why use pl/sql bulk strategies over simple SQL” pointed out that it’s not just distributed queries that can change plans dramatically when you change from a simple select to “insert into … select …”; there’s a similar problem with queries that use Bloom filters – the filter disappears when you change from the query to the DML.

This seemed a little bizarre, so I did a quick search on MoS (using the terms “insert select Bloom Filter”) to check for known bugs and then tried to run up a quick demo. Here’s a summary of the related bugs that I found through my first simple search:

Bug 20112932 : BLOOM FILTER IS NOT CHOOSED WHEN EXECUTING INSERT SELECT STATEMENT
11.2.0.4 - Nov 2014: "not a bug".
This cites three more documents:
  Bug 13801198  BLOOM PRUNING/FILTER NOT USED IN DML STATEMENTS 
    11.2.0.3 - March 2012 - Duplicate, base bug: 18949550
    18949550: MERGE QUERY NOT USING BLOOM FILTER
    11.2.0.3 - June 2014: fixed in 12.2 (no patches yet)

  Bug 14325392  BLOOM FILTER NOT SUPPORTED IN PARTITION WISE JOIN 
    11.2.0.3 - July 2012 - fixed in 12.1 
    (but I think this is a PX filter, not a partition filter)

  Bug 17716301  BLOOM FILTER NOT USED FOR OUTER JOIN

    Aug 2014 - hidden bug note. Patches available for 11.2.0.3
    Note 1919508.1 describes the problem.

And here’s some code to demonstrate the problem – running 11.2.0.4 (with similar results on 12.1.0.2).


rem
rem     Script:         dml_bloom_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016
rem

create table pt_hash (
        id,
        grp,
        small_vc,
        padding
)
nologging
pctfree 90 pctused 10
partition by hash(grp)
(
        partition p1,
        partition p2,
        partition p3,
        partition p4
)
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        trunc(rownum/50)                grp,
        to_char(trunc(rownum/20))       small_vc,
        rpad('x',100)                   padding
from
        generator, generator
where
        rownum <= 1e5
;

create table t1
as
select
        rownum  id,
        rownum  n1,
        lpad(rownum,10) v1,
        rpad('x',100) padding
from
        dual
connect by
        rownum <= 4
;

create table target (
        n1      number,
        id      number
)
;

-- gather stats on t1 and pt_hash (for all columns size 1, otherwise default)

There’s nothing fancy in this code, just a hash partitioned table with 100,000 rows (and a lot of wasted space), and a “driving” table with 4 rows that I can use in a very simple join to the partitioned table. I’ve also got an empty table ready for an insert statement.

So here’s a query with its execution plan (with rowsource execution statistics) that (based on the content of the t1 table) should select from at most two of the four partitions. I’ve hinted a hash join which is where Bloom filtering can be very effective with partitioned table joins:


set serveroutput off
alter session set statistics_level = all;

select
        /*+
                leading(t1 h1)
                use_hash(h1)    no_swap_join_inputs(h1)
        */
        t1.n1,
        h1.id
from
        t1,
        pt_hash h1
where
        t1.id between 2 and 3
and     h1.grp = t1.n1
and     h1.id <= 100
;

select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));

------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |      1 |        |  2115 (100)|       |       |      1 |00:00:00.16 |    8374 |   8367 |       |       |          |
|*  1 |  HASH JOIN                  |         |      1 |      3 |  2115   (2)|       |       |      1 |00:00:00.16 |    8374 |   8367 |  2440K|  2440K|  777K (0)|
|   2 |   PART JOIN FILTER CREATE   | :BF0000 |      1 |      3 |     2   (0)|       |       |      2 |00:00:00.01 |       2 |      0 |       |       |          |
|*  3 |    TABLE ACCESS FULL        | T1      |      1 |      3 |     2   (0)|       |       |      2 |00:00:00.01 |       2 |      0 |       |       |          |
|   4 |   PARTITION HASH JOIN-FILTER|         |      1 |    100 |  2113   (2)|:BF0000|:BF0000|     51 |00:00:00.16 |    8372 |   8367 |       |       |          |
|*  5 |    TABLE ACCESS FULL        | PT_HASH |      2 |    100 |  2113   (2)|:BF0000|:BF0000|     51 |00:00:00.16 |    8372 |   8367 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Predicates and Outline redacted

The most significant points to notice are:

  • Operation 2 – Part Join Filter Created: we’ve created a Bloom filter
  • Operation 4 – Partition Hash Join – Filter: we’ve used the Bloom filter, Pstart/Pstop = :BF0000
  • Operations 5 – Table Access Full: starts twice, the two partitions we need

And now we use the same SQL statement but insert the result set into the holding table:

insert into target(n1, id)
select
        /*+
                ordered
                use_hash(h1)    no_swap_join_inputs(h1)
        */
        t1.id,
        h1.id
from
        t1,
        pt_hash h1
where
        t1.id between 2 and 3
and     h1.grp = t1.n1
and     h1.id <= 100
;

select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));

------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |         |      1 |        |  2115 (100)|       |       |      0 |00:00:00.02 |   16698 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |         |      1 |        |            |       |       |      0 |00:00:00.02 |   16698 |       |       |          |
|*  2 |   HASH JOIN              |         |      1 |      3 |  2115   (2)|       |       |      1 |00:00:00.02 |   16689 |  1969K|  1969K|  723K (0)|
|*  3 |    TABLE ACCESS FULL     | T1      |      1 |      3 |     2   (0)|       |       |      2 |00:00:00.01 |       2 |       |       |          |
|   4 |    PARTITION HASH ALL    |         |      1 |    100 |  2113   (2)|     1 |     4 |    100 |00:00:00.02 |   16687 |       |       |          |
|*  5 |     TABLE ACCESS FULL    | PT_HASH |      4 |    100 |  2113   (2)|     1 |     4 |    100 |00:00:00.02 |   16687 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------

-- Predicate and Outline sections redacted

In this case we don’t see any operation to create a Bloom filter, and operation 5 reports 4 starts, one for each partition, with the Pstart/Pstop columns showing 1 – 4. As a cross-check you can also see that the number of buffers visited has doubled and the time (though don’t trust it too much on such a small test) has increased.

To my surprise the outline information for the first plan didn’t report a Bloom filter hint (px_join_filter), but I tried adding one to the insert statement anyway – and nothing changed. However, there is a “subquery pruning” mechanism that’s been available to the optimizer for a very long time now, so I decided to try hinting that for both statements with /*+ subquery_pruning(@sel$1 h1@sel$1 partition) */. This had no effect on the select statement, but produced the following plan for the insert:


-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT          |         |      1 |        |  2115 (100)|       |       |      0 |00:00:00.01 |    8390 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL  |         |      1 |        |            |       |       |      0 |00:00:00.01 |    8390 |       |       |          |
|*  2 |   HASH JOIN               |         |      1 |      3 |  2115   (2)|       |       |      1 |00:00:00.01 |    8381 |  1969K|  1969K|  829K (0)|
|*  3 |    TABLE ACCESS FULL      | T1      |      1 |      3 |     2   (0)|       |       |      2 |00:00:00.01 |       2 |       |       |          |
|   4 |    PARTITION HASH SUBQUERY|         |      1 |    100 |  2113   (2)|KEY(SQ)|KEY(SQ)|     51 |00:00:00.01 |    8379 |       |       |          |
|*  5 |     TABLE ACCESS FULL     | PT_HASH |      2 |    100 |  2113   (2)|KEY(SQ)|KEY(SQ)|     51 |00:00:00.01 |    8377 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see operation 4 now reports Partition Hash Subquery – Oracle has “pre-queried” the driving table (you can see the statement in the 10046 trace file if you enable SQL trace) to discover which partitions it will have to visit – and operation 5 starts twice with a Pstart/Pstop of KEY(SQ).
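
If you want to see the pruning query for yourself, here’s a minimal sketch of the tracing steps – this is just the standard event syntax, nothing specific to this example:


alter session set events '10046 trace name context forever, level 4';

-- re-run the hinted insert statement at this point

alter session set events '10046 trace name context off';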

Interestingly the hint didn’t end up in the Outline section of the execution plan, so this is an example of a query which you can’t fix by capturing the SQL Plan Baseline for one statement against the text for another – though you can hack the hint into an SQL Profile or (my preference in this case, I think) an SQL Patch.
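
For reference, here’s a minimal sketch of creating such a SQL Patch on 11.2 / 12.1 – the patch name is my own invention, and the sql_text parameter has to hold the complete text of the statement, which I’ve abbreviated here:


begin
        sys.dbms_sqldiag_internal.i_create_patch(
                sql_text  => 'insert into ... {rest of the statement text} ...',
                hint_text => 'subquery_pruning(@sel$1 h1@sel$1 partition)',
                name      => 'pt_hash_subq_prune'
        );
end;
/


(From 12.2 onwards the supported interface is the dbms_sqldiag.create_sql_patch() function.)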

Footnote:

After publishing this note I started to browse through the comments on the article I’d linked to when referencing subquery pruning, and discovered that an example of this insert/select issue appeared there with a surprising workaround, which was to use the “multi-table insert” (with just a single table). At the moment, though, I haven’t managed to get this working with this example.

Footnote 2:

I’ve just been reminded by email of another detail mentioned in the comments of the earlier post – this problem isn’t present for direct path (i.e. /*+ append */) inserts – and I have run a quick test on 12.1.0.2 to check that this is indeed the case (though the fix isn’t present in 11.2.0.4).  Don’t forget to check the listed Bug notes on MoS, in case there’s a back-port available for your version of Oracle.
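
As a sketch of the sort of test I mean – the target table name and select list here are stand-ins, since the full insert statement isn’t reproduced at this point:


insert  /*+ append */
into    target_table                    -- hypothetical name
select
        h1.*                            -- select list abbreviated
from
        pt_hash h1,
        t1
where
        h1.grp  = t1.n1
and     h1.id  <= 100
;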

 

July 7, 2016

Invisible Bug

Filed under: 12c,Bugs,CBO,Indexing,Oracle — Jonathan Lewis @ 5:27 pm BST Jul 7,2016

At this Wednesday’s Oracle Midlands event someone asked me if Oracle would use the statistics on invisible indexes for the index sanity check. I answered that there had been a bug in the very early days of invisible indexes when the distinct_key statistic on the index could be used even though the index itself would not be considered as a candidate in the plan (and the invisible index is still used to avoid foreign key locking – even in 12c – it’s only supposed to be invisible to the optimizer).

The bug was fixed quite a long time ago – but a comment on the “Index Sanity” article has introduced me to a related bug that is still present in 11.2.0.4 where the presence of an invisible index can affect an execution plan. Here’s a little model (run under 11.2.0.4) to demonstrate:

rem
rem     Script:         invisible_index_bug.sql
rem     Author:         Jonathan Lewis
rem

execute dbms_random.seed(0)

drop table t2;
drop table t1;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(mod(rownum,1000) as number(8,0))           n1,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table t2
as
select
        rownum id,
        trunc(dbms_random.value(0,10000)) n1
from
        dual
connect by
        level <= 100
;
begin 
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

column n1 new_value m_n1
select n1 from t2 where id = 50;
clear columns

set autotrace traceonly explain

select
        t1.*
from
        t1, t2
where
        t2.n1 = &m_n1
;

create unique index t2_i1 on t2(n1)
-- invisible
;

select
        t1.*
from
        t1, t2
where
        t2.n1 = &m_n1
;

set autotrace off

All I’ve done is create a couple of tables then do a join that we might expect to see executed as a cartesian merge join; at one point I was going to make the data more complicated and include a join condition, but decided to keep things small and simple, so it’s a silly example, but it is sufficient to make the point. The funny little bit about selecting an n1 value from t2 was also in anticipation of a more complex example, but it does, at least, ensure I query for a value that is in range.

Here are the two execution plans from 11.2.0.4 – the key feature is that the plan changes after the invisible index is created:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|   1 |  MERGE JOIN CARTESIAN|      |  1000K|   119M|  2263   (3)| 00:00:12 |
|*  2 |   TABLE ACCESS FULL  | T2   |     1 |     4 |     2   (0)| 00:00:01 |
|   3 |   BUFFER SORT        |      |  1000K|   115M|  2261   (3)| 00:00:12 |
|   4 |    TABLE ACCESS FULL | T1   |  1000K|   115M|  2261   (3)| 00:00:12 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N1"=5308)


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|   1 |  NESTED LOOPS      |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |     4 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   115M|  2261   (3)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N1"=5308)

Notice how the plan without the invisible index does a “sort” (actually a “buffer sort” so it’s just copying the data into local memory), while the plan with the not quite invisible enough index in place gets away with just a full tablescan. This is bug 16564891, associated with base bug 16544878.

The bug notes say “fixed in 12.2”, but in Oracle 12.1.0.2 the first plan appears in both cases, and we have to make the index visible to get the second plan. (Take note of the need for the “negative” test to prove the point; the fact that the same plan appears for both cases doesn’t, by itself, prove that the bug was fixed, we have to show that the plan would have changed if the bug had still been present).
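
The toggle for that negative test is trivial, of course – something like:


alter index t2_i1 invisible;

-- repeat the query: 12.1.0.2 reports the merge join cartesian

alter index t2_i1 visible;

-- repeat the query: 12.1.0.2 switches to the nested loop plan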

I believe the problem isn’t that Oracle is using the statistics when it shouldn’t; the change appears because in 11g Oracle incorrectly allows itself to see the uniqueness of the index and infer that table t2 is a “single row” table. In 12c the optimizer calculates that there will probably be only one row but that doesn’t stop it choosing the merge join cartesian as the “insurance bet” against having to do more than one tablescan of the t1 table. We can see this difference in the 10053 trace files; the 11g file has an entry for the “Single Table Access Path” for t2 that reads:

1-ROW TABLES:  T2[T2]#0
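
If you want to check your own version, a minimal recipe for generating the 10053 trace file (an explain plan is enough to force the optimisation pass that triggers the trace):


alter session set events '10053 trace name context forever, level 1';

explain plan for
select t1.* from t1, t2 where t2.n1 = 5308;

alter session set events '10053 trace name context off';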

If you read the bug note for bug 16564891 you’ll see that it has a more realistic example of the problem – and it may give you some idea of where you might run into the bug. In general I don’t think many people are likely to come across the problem since it revolves around uniqueness, which is rather an important property, and there can’t be many occasions when someone decides to add (or test dropping) a unique index. Given that the example in the bug looks like “add a unique index to a dimension table that’s joining to a fact table” that may be a good pointer to where you’re most likely to run into the problem — when you’re trying to enforce data correctness in a data warehouse.

 

May 16, 2016

Cursor_Sharing problem

Filed under: 12c,Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 12:49 pm BST May 16,2016

Here’s a possible bug (though maybe “not a bug”) that came up over the weekend on the OTN database forum. An application generating lots of “literal string” SQL was tested with cursor_sharing set to force. This successfully forced the use of bind variable substitution, but a particular type of simple insert statement started generating very large numbers of child cursors – introducing a lot of mutex waits and library cache contention. Here’s a (substituted) statement that was offered as an example of the problem:


INSERT INTO schema.tableName (column1,columns2,..)
VALUES (:"SYS_B_0",:"SYS_B_1",:"SYS_B_2",:"SYS_B_3",:"SYS_B_4",:"SYS_B_5",:"SYS_B_6",timestamp:"SYS_B_7",SYSTIMESTAMP,:"SYS_B_8")

Note particularly the slightly odd-looking detail: timestamp:"SYS_B_7"; this is how bind variable substitution looks if you’ve used the “compact” ANSI mechanism for handling datetime literals. We were told, in fact, that the tables had only number, varchar2, and date columns – so it looks a little suspicious when we see timestamp values being inserted, but the implied coercion wasn’t the source of the problem. Here’s all it takes to see the problem (tested only on 12.1.0.2):


rem
rem     Script:         ansi_datetime_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem

create table t1 (
        n1      number(8,0),
        v1      varchar2(10),
        d1      date,
        t1      timestamp
);

insert into t1 values(-1,'x',sysdate, systimestamp);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

set serveroutput off
alter session set cursor_sharing = force;

prompt  ============================
prompt  Testing DATE literals
prompt  Expect to see child number 2
prompt  ============================

insert into t1 values(1, 'A', date'2016-01-01', null);
insert into t1 values(1, 'A', date'2016-01-02', null);
insert into t1 values(1, 'A', date'2016-01-03', null);

select * from table(dbms_xplan.display_cursor(null,null,'-note -plan_hash'));

prompt  ============================
prompt  Testing TIMESTAMP literals
prompt  Expect to see child number 2
prompt  ============================

insert into t1 values(1, 'A', null, timestamp'2016-01-01 00:00:00');
insert into t1 values(1, 'A', null, timestamp'2016-01-02 00:00:00');
insert into t1 values(1, 'A', null, timestamp'2016-01-03 00:00:00');

select * from table(dbms_xplan.display_cursor(null,null,'-note -plan_hash'));

prompt  ===============================
prompt  Need privilege to see this view
prompt  Pre-coded for the sql_ids above
prompt  ===============================

break on sql_id skip 1

select
        sql_id, child_number, hash_match_failed
from
        v$sql_shared_cursor
where
        sql_id in ('58udhcm270bhn', 'gssz5cbnt7mgn')
order by
        sql_id, child_number
;

A couple of points – there are several more tests in the script demonstrating things that do NOT cause multiple child cursors to appear. I probably didn’t cover all the options that I could have covered, but I hit a number of common cases to check that it wasn’t simply a case of cursor_sharing being severely broken in 12c. I’ve also allowed a side effect to demonstrate the presence of multiple child cursors rather than explicitly listing the child cursors. If the three statements (of each type) had produced shareable cursors then the child number reported by dbms_xplan.display_cursor() would have been zero in both cases. Here are the actual outputs:


SQL_ID  58udhcm270bhn, child number 2
-------------------------------------
insert into t1 values(:"SYS_B_0", :"SYS_B_1", date:"SYS_B_2", null)

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | T1   |       |
-------------------------------------------------


SQL_ID  gssz5cbnt7mgn, child number 2
-------------------------------------
insert into t1 values(:"SYS_B_0", :"SYS_B_1", null, timestamp:"SYS_B_2")


-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | T1   |       |
-------------------------------------------------


And, having pre-coded the script with the SQL_IDs of the two guilty statements, here’s the output identifying the cause of the failure to share from v$sql_shared_cursor:


SQL_ID        CHILD_NUMBER H
------------- ------------ -
58udhcm270bhn            0 N
                         1 Y
                         2 Y

gssz5cbnt7mgn            0 N
                         1 Y
                         2 Y

There is a bug on MoS relating to timestamp columns and failure to share cursors – it doesn’t really look like the same problem but it could be related in some way: Bug 13407937 : HIGH VERSION COUNT FOR INSERT WITH CURSOR_SHARING=FORCE AND TIMESTAMP COLUMN. It’s described as “not a bug” :(
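
If you want a quick check for statements suffering this kind of version-count explosion on your own system, a simple starting point (the threshold is arbitrary) would be something like:


select
        sql_id, version_count, executions
from
        v$sqlarea
where
        version_count > 20
order by
        version_count desc
;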

March 8, 2016

Wrong Results

Filed under: Bugs,Hints,Indexing,Oracle,Partitioning — Jonathan Lewis @ 6:57 pm BST Mar 8,2016

Just in – a post on the Oracle-L mailing lists asks: “Is it a bug if a query returns one answer if you hint a full tablescan and another if you hint an indexed access path?” And my answer is, I think: “Not necessarily”:


SQL> select /*+ full(pt_range)  */ n2 from pt_range where n1 = 1 and n2 = 1;

        N2
----------
         1
SQL> select /*+ index(pt_range pt_i1) */ n2 from pt_range where n1 = 1 and n2 = 1;

        N2
----------
         1
         1

The index is NOT corrupt.

The reason why I’m not sure you should call this a bug is that it is a side effect of putting the database into an incorrect state. You might have guessed from the name that the table is a (range) partitioned table, and I’ve managed to get this effect by doing a partition exchange with the “without validation” option.


create table t1 (
        n1      number(4),
        n2      number(4)
);

insert into t1
select  rownum, rownum
from    all_objects
where   rownum <= 5
;

create table pt_range (
        n1      number(4),
        n2      number(4)
)
partition by range(n1) (
        partition p10 values less than (10),
        partition p20 values less than (20)
)
;

insert into pt_range
select
        rownum, rownum
from
        all_objects
where
        rownum <= 15
;
create index pt_i1 on pt_range(n1,n2);

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'PT_RANGE',
                method_opt => 'for all columns size 1'
        );
end;
/

alter table pt_range
exchange partition p20 with table t1
including indexes
without validation
update indexes
;

The key feature (in this case) is that the query can be answered from the index without reference to the table. When I force a full tablescan Oracle does partition elimination and looks at just one partition; when I force the indexed access path Oracle doesn’t eliminate rows that belong to the wrong partition – though technically it could (because it could identify the target partition by the partition’s data_object_id which is part of the extended rowid stored in global indexes).
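
In fact you can demonstrate the point about the rowids with a hedged little check – this assumes the hinted path stays index-only when you add the rowid to the select list, in which case the two rows should report different data object ids:


select
        /*+ index(pt_range pt_i1) */
        dbms_rowid.rowid_object(rowid)  data_object_id,
        n2
from
        pt_range
where
        n1 = 1
and     n2 = 1
;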

Here are the two execution plans (from 11.2.0.4) – notice how the index operation has no partition elimination while the table operation prunes partitions:


select /*+ full(pt_range)  */ n2 from pt_range where n1 = 1 and n2 = 1

---------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |     2 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|          |     1 |     6 |     2   (0)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS FULL    | PT_RANGE |     1 |     6 |     2   (0)| 00:00:01 |     1 |     1 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=1 AND "N2"=1))


select /*+ index(pt_range pt_i1) */ n2 from pt_range where n1 = 1 and n2 = 1

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     1 (100)|          |
|*  1 |  INDEX RANGE SCAN| PT_I1 |     1 |     6 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("N1"=1 AND "N2"=1)


Note: If I had a query that did a table access by (global) index rowid after the index range scan it WOULD do partition elimination and visit just the one partition – never seeing the data in the wrong partition.

So is it a bug ? You told Oracle not to worry about bad data – so how can you complain if it reports bad data ?

Harder question – which answer is the “right” one – the answer which shows you all the data matching the query, or the answer which shows you only the data that is in the partition it is supposed to be in ?

February 2, 2016

Partitioned Bitmap Join

Filed under: bitmaps,Bugs,Indexing,Infrastructure,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 8:32 am BST Feb 2,2016

If you don’t want to read the story, the summary for this article is:

If you create bitmap join indexes on a partitioned table and you use partition exchanges to load data into the table then make sure you create the bitmap join indexes on the loading tables in exactly the same order as you created them on the partitioned table or the exchange will fail with the (truthful but not quite complete) error: ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION.

My story starts with this OTN posting from John Hall where he found after a year of successful batch loading one of his partition exchanges was raising error 14098. After an exchange of ideas, user rp0428 came up with a query against sys.jijoin$ (one of the tables behind bitmap join indexes) that allowed John Hall to see that the indexes on the exchange table had been created in a different order from that of the partitioned table. I did a quick test to see if this might be relevant (it shouldn’t be, it isn’t with “normal” indexes or function-based indexes, or virtual columns) and didn’t manage to reproduce the problem with two dimension tables and two bitmap join indexes.

Fortunately John didn’t take my word for it and tested the idea on a clone of the production system – and found that the order of creation did matter. His system, however, had 9 dimension tables and 33 bitmap join indexes – which shouldn’t have made any difference in principle, but maybe it was something to do with having several indexes on the same table, maybe it was something to do with having far more tables or far more indexes than I had. So I built a larger test case with 6 dimension tables and 6 indexes per table – and reproduced the problem.

Then I started cutting back to see where the problem appeared, and found that all it took was one dimension with two indexes, or two dimensions with one index each – whatever I had done in my “quick test” I had clearly done it too quickly and done something wrong. (Unfortunately I had overwritten most of the code from the original quick test while building the larger test, so I couldn’t go back and see where the error was.)

Here, then, is the minimal test case that I finally ran to demonstrate that switching the order of index creation on the exchange table causes the exchange to fail:


drop table pt_range purge;
drop table t1 purge;
drop table dim_1 purge;
drop table dim_2 purge;

prompt  =================
prompt  Partitioned table
prompt  =================

create table pt_range (
        id,
        grp1,
        grp2,
        padding
)
nologging
partition by range(id) (
        partition p2001 values less than (2001),
        partition p4001 values less than (4001),
        partition p6001 values less than (6001),
        partition p8001 values less than (8001)
)
as
select
        rownum                          id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 8000
;

prompt  ================================================
prompt  Exchange table - loaded to match partition p8001
prompt  ================================================

alter table pt_range 
add constraint pt_pk primary key (id) using index local;

create table t1 (
        id,
        grp1,
        grp2,
        padding
)
as 
select
        rownum + 6000                   id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 2000
;

alter table t1
add constraint t1_pk primary key (id);

execute dbms_stats.gather_table_stats(user,'pt_range')
execute dbms_stats.gather_table_stats(user,'t1')

prompt  ================
prompt  dimension tables
prompt  ================

create table dim_1 
as 
select distinct 
        grp1, 
        cast('A'||grp1 as varchar2(3)) agrp1,
        cast('B'||grp1 as varchar2(3)) bgrp1
from
        t1
;

create table dim_2 as select * from dim_1;

prompt  ===============================
prompt  Primary keys required for BMJIs
prompt  ===============================

alter table dim_1 add constraint d1_pk primary key (grp1);
alter table dim_2 add constraint d2_pk primary key (grp1);

execute dbms_stats.gather_table_stats(user,'dim_1')
execute dbms_stats.gather_table_stats(user,'dim_2')

prompt  ============================
prompt  Creating bitmap join indexes
prompt  ============================

create bitmap index pt_1a on pt_range(d1.agrp1) from pt_range pt, dim_1 d1 where d1.grp1 = pt.grp1 local ;
create bitmap index pt_2a on pt_range(d2.agrp1) from pt_range pt, dim_2 d2 where d2.grp1 = pt.grp2 local ;

prompt  ====================================================
prompt  Pick your index creation order on the exchange table
prompt  ====================================================

create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;
create bitmap index t1_2a on t1(d2.agrp1) from t1, dim_2 d2 where d2.grp1 = t1.grp2 ;
-- create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;

prompt  ==================
prompt  Exchanging (maybe)
prompt  ==================

alter table pt_range
        exchange partition p8001 with table t1
        including indexes
        without validation
;

I’ve got the same create statement twice for one of the bitmap join indexes – as it stands the indexes will be created in the right order and the exchange will work; if you comment out the first t1_1a create and uncomment the second the exchange will fail. (If you comment out the ‘including indexes’ then the exchange will succeed irrespective of the order of index creation, but that rather defeats the point of being able to exchange partitions.)

I’ve reproduced the problem in 12.1.0.2, 11.2.0.4 and 10.2.0.5
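
If you need to check the creation order of the indexes on a live system without querying sys.jijoin$ directly, a hedged alternative is to compare object ids, which (barring drops and re-creates) reflect creation order:


select
        object_name, object_id, created
from
        user_objects
where
        object_type = 'INDEX'
and     object_name in ('PT_1A', 'PT_2A', 'T1_1A', 'T1_2A')
order by
        object_id
;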

Footnote

Running an extended trace didn’t help me work out how Oracle is detecting the mismatch – presumably it’s something that gets into the dictionary cache in a general “load the index definition” step; but it did show me that (in the “without validation” case) the code seems to check the correctness of the exchange table’s primary key data BEFORE checking whether the indexes match properly.

January 11, 2016

Subquery Effects

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 12:50 pm BST Jan 11,2016

Towards the end of last year I used a query with a couple of “constant” subqueries as a focal point for a blog note on reading parallel execution plans. One of the comments on that note raised a question about cardinality estimates and, coincidentally, I received an email about the cost calculations for a similar query a few days later.

Unfortunately there are all sorts of anomalies, special cases, and changes that show up across versions when subqueries come into play – it’s only in recent versions of 11.2, for example, that a very simple example I’ve got of three equivalent statements producing the same execution plan also reports the same costs and cardinality. (The queries are: table with IN subquery, table with EXISTS subquery, table joined to “manually unnested” subquery – the three plans take the unnested subquery shape.)

I’m just going to pick out one particular anomaly, which is a costing error with multiple subqueries when “OR-ed”. Here’s my sample data set:


create table t1
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 20000
;


create table t2
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 25000
;

create table t3
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 30000
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t2',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t3',
                method_opt       => 'for all columns size 1'
        );
end;
/

The three tables are slightly different sizes so that it will be easy to see the different costs of the tablescans, and there are no indexes, so everything I do in the queries will be tablescans. Here are six queries I’m going to test – they all scan t1, with “constant” subqueries against t2 and/or t3. The first pair is just to show you the basic cost of the query with a single subquery; the second pair shows you the default action with two subqueries in two different orders; the final pair shows you what happens with two subqueries when you block subquery pushing.


select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n2 > (select avg(t2.n2) from t2)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n3 > (select avg(t3.n3) from t3)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select avg(t2.n2) from t2)
         or t1.n3 > (select avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select avg(t3.n3) from t3)
         or t1.n2 > (select avg(t2.n2) from t2)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
         or t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
         or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
        )
;

Here are the first two plans, pulled from memory (which you might have guessed thanks to the “disappearing subquery predicate” in the predicate section). These examples came from 12.1.0.2, but the same happens in 11.2.0.4:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N2">))

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   123 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N3">))

As you can see, the cost of the query is the cost of the t1 tablescan plus the cost of running the t2 or t3 subquery once: 111 = 49 + 62, and 123 = 49 + 74.

(As a general guideline, recent versions of the optimizer tend to allow for subqueries by including “cost of subquery” * “number of times the optimizer thinks it will execute” – in this case the optimizer knows that the subquery will run exactly once).

But what happens when we test the query that applies BOTH subqueries to the tablescan ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N2"> OR "T1"."N3">)))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N3"> OR "T1"."N2">)))

The cost of the query in both cases is just the cost of the tablescan of t1 – the subqueries are, apparently, free. You can check from the predicate section, by the way, that the subqueries are applied in the order they appear in the original statement.

Does anything change if the subqueries are not pushed ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N2"> OR "T1"."N3">))
   3 - filter("T1"."N1">10000)

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   124 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N3"> OR "T1"."N2">))
   3 - filter("T1"."N1">10000)

The two plans have different costs – and the cost is the cost of the tablescan of t1 plus the cost of just the first subquery in the filter predicate list.

The non-pushed subqueries show up another anomaly: you’ll notice that the t1 tablescan reports 10,001 rows cardinality, but the FILTER operation doesn’t have an associated cardinality so we can’t see how many rows the optimizer thinks will survive the subqueries. So let’s run a query that allows us to see the surviving row estimate:


select
        max(n1)
from
        (
        select
                /*+ no_eliminate_oby */
                t1.n1
        from
                t1
        where
                t1.n1 > 10000
        and     (
                   t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
                or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
                )
        order by
                n1
        )
;

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |       |       |   126 (100)|          |
|   1 |  SORT AGGREGATE        |      |     1 |    13 |            |          |
|   2 |   VIEW                 |      | 10001 |   126K|   126   (5)| 00:00:01 |
|   3 |    SORT ORDER BY       |      | 10001 |   146K|   126   (5)| 00:00:01 |
|*  4 |     FILTER             |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   6 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |       TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   8 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   9 |       TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."N3"> OR "T1"."N2">))
   5 - filter("T1"."N1">10000)

As you can see, the SORT ORDER BY operation thinks it’s going to handle 10,001 rows – it looks as if the optimizer arithmetic hasn’t applied the usual subquery guess of 5% for the two subqueries. (When the subqueries were automatically pushed you saw a cardinality of 975 – which is 5% for subquery t2 plus (due to OR) 5% for subquery t3 minus 5% of 5% (=25) for the overlap – this is the standard OR arithmetic.)
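
For anyone who wants to check that arithmetic, here it is written out, starting from the 10,001 rows that survive the n1 predicate:


selectivity = 0.05 + 0.05 - (0.05 * 0.05) = 0.0975
cardinality = round(0.0975 * 10001)       = 975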

tl;dr

Although the optimizer code has been enhanced in many places for dealing with subquery estimates, there are still some odd errors and inconsistencies that you need to be aware of. The examples I’ve shown may not be particularly significant in terms of what they do, but the pattern is one that you may recognise in more complex queries.

 

Reference script: subq_cost_anomaly_2.sql

 

January 6, 2016

NLS Mess

Filed under: Bugs,CBO,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:18 pm BST Jan 6,2016

The Oracle database has all sorts of little details built into it to help it deal with multi-national companies, but since they’re not commonly used you can find all sorts of odd “buggy” bits of behaviour when you start to look closely. I have to put “buggy” in quotes because some of the reported oddities are the inevitable consequences of (for example) how multi-byte character sets have to work; but some of the oddities look as if they simply wouldn’t be there if the programmer writing the relevant bit of code had remembered that they also had to cater for some NLS feature.

Here’s an example of the type of unexpected behaviour that can appear. There probably are some bugs in the area I’m going to demonstrate but, at first glance, I thought I was looking at an acceptable limitation imposed by a generic requirement. The example came from AskTom, which is why the data set isn’t my usual “t1” generation (and the formatting and capitalisation aren’t according to my usual standards).

The problem involves Case Insensitive indexing.


ALTER session SET nls_sort=binary_ci;
ALTER session SET nls_comp=linguistic;

CREATE TABLE log_data(
  account_id NUMBER,
  log_type NUMBER,
  sys_name VARCHAR2(30),
  log_time TIMESTAMP,
  msg varchar2(4000)
)
nologging
;

insert /*+ append */ into log_data(
  account_id,
  log_type,
  sys_name,
  log_time,
  msg
)
select
        5,
        2,
        dbms_random.string('a',1),
        sysdate + dbms_random.value,
        rpad('x',200)
from
        dual
connect by
        level <= 26000
;


create index log_date on log_data (
        account_id, 
        log_type, 
--      sys_name,
        NLSSORT(sys_name,'NLS_SORT=BINARY_CI'),
        log_time
)
nologging
;
  
rem     ======================================================================
rem     Need to gather stats AFTER index creation because of the hidden column
rem     ======================================================================
  
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'LOG_DATA',
                method_opt       => 'for all columns size 1'
        );
end;
/

And here’s the query I want to optimize:


SELECT 
        *
FROM
  (
    SELECT
        sys_name, log_time,  substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      log_time  desc
  )
WHERE
  rownum <= 10
;

The requirement of the query is that we see the ten most recent entries for a given combination of account_id, log_type and sys_name (ignoring case in sys_name). The original table has tens of millions of rows, of course, with many combinations, and some of the combinations have a very large number of entries – hence the desire to find an access path that gets just the 10 rows we want without getting all the rows for a combination and sorting them before returning the ten.

Normally we would just create an index that started with the 3 columns used in the equality predicates and ended with the column in the order by clause, and that would be enough for the optimizer to see the option for a “sort order by nosort” operation to get the required data through an index range scan; so that’s the index the code sample creates, except that since we’ve enabled case insensitive sorting we need to use a function-based index to hold the case-insensitive version of sys_name.
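
As a quick sanity check after creating the function-based index you can see the hidden virtual column it adds to the table – the generated column name will vary from system to system:


select
        column_name, hidden_column, data_default
from
        user_tab_cols
where
        table_name = 'LOG_DATA'
order by
        internal_column_id
;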

Here’s the execution plan we would get if we DIDN’T use the nlssort() function in the index – I’ve run the query in 11.2.0.4 and pulled the plan from memory with rowsource execution stats enabled:


---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   605 (100)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.02 |    1065 |       |       |          |
|   2 |   VIEW                         |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY       |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| LOG_DATA |      1 |    500 |   603   (1)|    966 |00:00:00.01 |    1065 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | LOG_DATE |      1 |    500 |   103   (3)|    966 |00:00:00.01 |     100 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2)
       filter(NLSSORT("SYS_NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6100') )

Notice particularly the filter predicate at operation 5: that’s the thing we need to get into the index before we can avoid picking up excess data and sorting it. Notice also in the A-Rows column that we acquired 966 rows from the table before sorting and discarding all but 10 of them at operation 3.

Notice especially how important it is to look at the predicate section of an execution plan to gain a full understanding of what’s happening.

So here’s the execution plan we get by default with the function-based index in place:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |    13 (100)|     10 |00:00:00.01 |     969 |       |       |          |
|*  1 |  COUNT STOPKEY                  |          |      1 |        |            |     10 |00:00:00.01 |     969 |       |       |          |
|   2 |   VIEW                          |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY        |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|    966 |00:00:00.01 |     969 |       |       |          |
|*  5 |      INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|    966 |00:00:00.01 |       5 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

It didn’t work ! (Check the A-Rows at operations 4 and 5, and the sort that we didn’t want at operation 3, where the data is finally reduced to 10 rows.)

But there’s something odd going on here – look at the predicate section: our three predicates are all access predicates for the index range scan descending. We are doing exactly what we want to do with the index, but we’re not stopping after the 10 rows that we need, we’re getting all of them (in the order we want) and then doing a trivial sort and discard. Look at the Cost column – the cost at operation 4 is exactly what we might expect for the 10 rows we want to see, and the E-rows at line 5 is clearly based on our “first 10 rows” requirement.

This raises two questions:

  1. What’s gone wrong ?
  2. Can we work around the problem ?

The answer to (1) is, I think, that there’s a bug in the code. Looking at the 10053 trace file I can see the optimizer correctly handling the arithmetic of the virtual column (the sys_nc00006$) representing the function in the index and then getting to the point where it goes into a code section relating to “Recost for ORDER BY”, and brings back the original function as a filter predicate – I think that in the recosting it may be losing track of the fact that sys_nc00006$ and nlssort(sys_name, ‘nls_sort=binary_ci’) are the same thing and therefore can’t apply the rule about “Equality on 1st N columns, order by on the remainder”.

There are several answers to (2).

Workarounds

The honest hack

The first one is simply to fall back to the old (probably version 7, possibly version 8) requirement for getting the “sort order by nosort” operation – put all the index columns into the order by clause. Unfortunately the optimizer then did a tablescan rather than an index range scan because my data set was so small, so I had to hack the system stats temporarily to make the tablescan very expensive:


begin
        dbms_stats.set_system_stats('MBRC',2);
        dbms_stats.set_system_stats('MREADTIM',20); 
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000); 
end;
/

... order by account_id desc, log_type desc, sys_name desc, log_time desc

Unfortunately the optimizer still went wrong – it did an ASCENDING index range scan, sorting all the data. I actually had to hint the code to use the index in descending order to get the following execution plan:


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |  1215 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |   1000 |  1215   (1)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |  1006   (1)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |   1000 |     5   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

The A-Rows tells us we’ve accessed the minimum data set, and the absence of the SORT ORDER BY STOPKEY operation tells us that we’ve avoided doing the sort. Notice, though, that the cost is the cost that would have been appropriate if we had accessed all 1,000 rows that matched the equality predicates. This is an example of a plan that you couldn’t really trust if all you had done was an “explain plan” rather than running the query and checking the rowsource execution stats. If you ignore the A-Rows it looks as if the plan WOULD get all the data in order and only eliminate the redundant rows at operation 1.
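
For completeness, here’s my reconstruction of the hinted query – index_desc() is the documented hint for demanding a descending range scan, but treat the exact text as a sketch rather than a copy of the statement I ran:


SELECT
        *
FROM
  (
    SELECT
        /*+ index_desc(log_data log_date) */
        sys_name, log_time, substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      account_id desc, log_type desc, sys_name desc, log_time desc
  )
WHERE
  rownum <= 10
;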

The silly surprise

The original author of the problem came up with this one. Put in two predicates which, between them, are equivalent to the original requirement:


where ...
and     sys_name >= 'a'
and     sys_name <= 'a'

Clearly this is totally silly – the optimizer can fold this pair of predicates into the single predicate “sys_name = ‘a'”, so it shouldn’t make any difference. But here’s the execution plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

Yes, it’s (structurally) exactly the same plan, with exactly the same predicate section except that (a) it gets there without being hinted, (b) the Cost column looks appropriate all down the line, and (c) the E-Rows value for the VIEW operator would have helped us appreciate that the correct elimination was (probably) going to happen if all we had done was the Explain Plan.

The dirty hack

I know the name of the hidden column that’s causing the problem, and I know how to generate the value it has to be – so let’s give Oracle exactly what it needs to see rather than allowing its internal transformation to rewrite the SQL:

...
AND sys_nc00006$ = nlssort('a','nls_sort=binary_ci')
...


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "SYS_NC00006$"=HEXTORAW('6100') )

We get exactly the plan we need – and the silly thing about this example is that it’s a case where we get the plan we want by EXPLICITLY transforming the SQL to reproduce the transformation that Oracle had done IMPLICITLY and then messed up !

Final Choice

Of the three options – the dirty hack is definitely a no-no in production; the “double the predicate” trick is undesirable because it may depend in some unexpected way on a particular optimizer bug or on some statistical detail that could change; so I’d choose the hinted path with the (nominally) redundant columns.

One final point about this solution: we actually needed to include only the sys_name in the order by clause to use the descending range scan and early stop – which is basically another indication that it’s something about the function-based column that is breaking the normal code path.
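
In other words (hedging slightly, since I haven’t reproduced that final test here) the inner order by clause could be cut back to something like:


    ORDER BY
      sys_name desc, log_time desc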

Reference Script: nls_sort_anomaly.sql

January 4, 2016

ANSI bug

Filed under: ANSI Standard,Bugs,Oracle — Jonathan Lewis @ 1:12 pm BST Jan 4,2016

In almost all cases the SQL you write using the ANSI (SQL-92) standard syntax is transformed into a statement using Oracle’s original syntax before being optimised – and there are still odd cases where the translation is not ideal. This can result in poor performance; it can even result in wrong results. The following examples arrived in my in-tray a couple of weeks ago:

with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select *
    from (
        select sum(table3.table2.my_number) the_answer
            from table1
            left join table2 on table1.my_number = table2.my_number
            group by table1.my_number
        );


with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select sum(table3.table2.my_number) the_answer
    from table1
    left join table2 on table1.my_number = table2.my_number
    group by table1.my_number;

Notice the reference to table3.table2.my_number in the select list of both queries – where does the “table3” bit come from ? These queries should result in Oracle error ORA-00904: “TABLE3″.”TABLE2″.”MY_NUMBER”: invalid identifier.

If you’re running 11.2.0.4 (and, probably, earlier versions) both queries produce the following result:


THE_ANSWER
----------
         1

1 row selected.

If you’re running 12.1.0.2 the first query produces the ORA-00904 error that it should do, but the second query still survives to produce the same result as 11.2.0.4.

Update (8th July 2016)

I’ve just seen a reference to this blog note on the OTN database forum that prompted me to point out that the problem doesn’t occur if you switch to traditional Oracle syntax – the expected Oracle error message appears in 11.2.0.4:

SQL> ed temp

with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select *
    from (
        select sum(table3.table2.my_number) the_answer
/*
            from table1
            left join table2 on table1.my_number = table2.my_number
*/
            from table1, table2
            where table2.my_number(+) = table1.my_number
            group by table1.my_number
        )
 15  ;
        select sum(table3.table2.my_number) the_answer
                   *
ERROR at line 6:
ORA-00904: "TABLE3"."TABLE2"."MY_NUMBER": invalid identifier


SQL> ed temp3

with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select *
    from (
        select sum(table3.table2.my_number) the_answer
--            from table1
--            left join table2 on table1.my_number = table2.my_number
            from table1, table2
            where table2.my_number(+) = table1.my_number
            group by table1.my_number
        )
 13  ;
        select sum(table3.table2.my_number) the_answer
                   *
ERROR at line 6:
ORA-00904: "TABLE3"."TABLE2"."MY_NUMBER": invalid identifier

November 24, 2015

Table Expansion

Filed under: 12c,Bugs,Oracle,Partitioning — Jonathan Lewis @ 12:13 pm BST Nov 24,2015

I’ve often found that while I’m investigating one Oracle feature I get waylaid by noticing anomalies in other parts of the code. I was caught by one of these events a little while ago while experimenting with the new (12.1.0.2) Inmemory Columnar Store.  After reading a posting by Martin Bach I asked the question:

“If you have a partitioned table with a local index and one of the table partitions has been declared INMEMORY, would a query that could use that index be able to apply table expansion to produce a plan that did a tablescan on the in-memory partition and an indexed access path on the partitions that weren’t in-memory?”

The answer was yes, with one important caveat – the first test I built to investigate the question did something very silly and produced the wrong results. In fact the error turned out to have nothing to do with the inmemory columnar store; it also appeared when I switched the inmemory feature off and nudged the optimizer into table expansion by making one of the index partitions unusable. Here’s the demo, followed by a few notes, if you want to test it for yourselves:


create table t1 (
        id,
        n1,
        padding
)
partition by range (id) interval (250000) (
        partition p_start values less than (250000)
        segment creation immediate
        inmemory priority critical
)
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum - 1              id,
        mod(rownum - 1,1250)    n1,             -- 200 rows per segment
        rpad('x',400,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(n1) local nologging;

I’ve created an interval-partitioned table holding a million rows with an interval of 250,000, and I’ve defined the first partition with an upper bound of 250,000 and the inmemory attribute, leaving Oracle to add the other three partitions which won’t have the inmemory attribute set.
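If you want to check what Oracle has added behind the scenes, a query like the following (a quick sketch) should list four partitions and show that only p_start has the inmemory attribute set:

select  partition_name, high_value, inmemory
from    user_tab_partitions
where   table_name = 'T1'
order by
        partition_position
;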

I’ve created a local index on the n1 column, with 200 rows (1,250 distinct values) per partition. Because of the way I’ve defined n1 I can easily recreate the table to adjust the relative cost of the indexed access path and the tablescan path by adjusting the value I use in the mod() function. The use of the mod() function also means that every partition holds the same volume of data (with exactly the same degree of scattering) for any specific value of n1.
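A quick sanity check of that claim, using partition-extended syntax (a sketch – it should report 200 for any legal value of n1):

select  count(*)
from    t1 partition (p_start)
where   n1 = 1
;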

To test the option for table expansion I’ve then checked the execution plan for (and run) four queries:


select
        /*+ full(t1) */
        id
from
        t1
where
        n1 = 1
and     id < 250000
;

select
        /*+ index(t1) */
        id
from
        t1
where
        n1 = 1
and     id < 250000 
;
 
select
        /*+ full(t1) */
        id 
from    t1
where   n1 = 1
and     id >= 250000
and     id <  500000
;

select
        id
from
        t1
where
        n1 = 1
;

The first two queries give me the cost of accessing the inmemory partition by tablescan compared to the cost of accessing it by index. The third query is there to demonstrate that the non-inmemory tablescan is a LOT more expensive than the inmemory one. The final query accesses all four partitions to see if the inmemory partition is treated differently from the other three. Here are the four plans:

Default tablescan when limited to the in-memory partition
----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   200 |  1800 |   184   (9)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE     |      |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory("N1"=1 AND "ID"<250000)
       filter("N1"=1 AND "ID"<250000)



Index access is more expensive than the inmemory tablescan
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   200 |  1800 |   201   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |       |   200 |  1800 |   201   (0)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   200 |  1800 |   201   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   200 |       |     1   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<250000)
   3 - access("N1"=1)



Tablescan on a non-inmemory partition is much more expensive than inmemory (and indexed access)
-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |   200 |  1800 |   1891  (2)| 00:00:01 |       |       |
|   1 | PARTITION RANGE SINGLE|       |   200 |  1800 |   1891  (2)| 00:00:01 |     2 |     2 |
|*  2 | TABLE ACCESS FULL     | T1    |   200 |  1800 |   1891  (2)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"=1 AND "ID">=250000 AND "ID"<500000)



Access all four partitions - no table expansion. Cost = 4 x cost of single partition indexed access path.
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   800 |  7200 |   807   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL                       |       |   800 |  7200 |   807   (1)| 00:00:01 |     1 |1048575|
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   800 |  7200 |   807   (1)| 00:00:01 |     1 |1048575|
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   800 |       |     6   (0)| 00:00:01 |     1 |1048575|
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=1)

So we can see that the inmemory tablescan of a single segment is cheaper than the indexed access path, and much cheaper than the tablescan of a single non-inmemory segment. When we execute the query that accesses all four segments, though, the optimizer chooses to use the same access path for all four partitions rather than splitting the query into one tablescan and three indexed accesses.

This is just a starting point for testing, though – there are transformations where the optimizer will only use a particular transformation if the transformed query has a cost that is “X%” lower than it would be without the transformation (and some of these transformations have associated parameters – usually hidden – that can be set to adjust the percentage). Perhaps if I adjusted the data so that the relative benefit of a single inmemory scan was larger; or if I created a table with more partitions and had two inmemory and three not inmemory, or three inmemory and four not inmemory; or if I simply had more inmemory partitions than not – then maybe the optimizer would spontaneously do table expansion.
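If you want to hunt for the associated parameters, the usual starting point is a query like the following sketch against the x$ structures (run as SYS; the like-pattern is a guess, and exact parameter names vary with version):

select
        x.ksppinm       name,
        y.ksppstvl      value
from
        x$ksppi         x,
        x$ksppcv        y
where
        y.indx = x.indx
and     x.ksppinm like '%table_expansion%'
;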

Rather than fiddle about with the data my next step was to hint the final query with /*+ expand_table(t1) */. Here’s the resulting execution plan:


------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |       |       |  1797 (100)|          |       |       |
|   1 |  VIEW                                        | VW_TE_1 |  1600 | 20800 |  1797   (1)| 00:00:01 |       |       |
|   2 |   UNION-ALL                                  |         |       |       |            |          |       |       |
|   3 |    PARTITION RANGE SINGLE                    |         |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|*  4 |     TABLE ACCESS INMEMORY FULL               | T1      |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|   5 |    PARTITION RANGE ITERATOR                  |         |   600 |  5400 |   806   (1)| 00:00:01 |     2 |     4 |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |   600 |  5400 |   806   (1)| 00:00:01 |     2 |     4 |
|*  7 |      INDEX RANGE SCAN                        | T1_I1   |   800 |       |     5   (0)| 00:00:01 |     2 |     4 |
|   8 |    PARTITION RANGE INLIST                    |         |   800 | 16800 |   807   (1)| 00:00:01 |       |       |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |   800 | 16800 |   807   (1)| 00:00:01 |       |       |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |   800 |       |     6   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - inmemory(("N1"=1 AND "T1"."ID"<250000))
       filter(("N1"=1 AND "T1"."ID"<250000)) 6 - filter(("T1"."ID">=250000 AND "T1"."ID"<1000000))
   7 - access("N1"=1)
  10 - access("N1"=1)

Points to notice:
Table expansion has occurred – we did an inmemory full tablescan on the inmemory partition at a cost of 184, and index range scans on the other three partitions at a cost of 806 (which looks more like the cost of visiting four partitions), expecting 800 rowids from the local index (a failure of partition elimination) and 600 rows from the table. The partition start/stop columns do show that the optimizer expects to visit just the three correct partitions, although some of the cost and cardinality numbers seem to be appropriate to four partitions.

Worse still we see a third branch to the UNION ALL / table expansion – operations 8 to 10 – which don’t report a partition start and stop. What’s that doing? Sometimes, of course, we see sections of plan that don’t actually run – usually preceded by a FILTER operation that can evaluate to FALSE – so maybe that’s what’s happening here. Re-running the query with rowsource execution stats enabled showed that the PARTITION RANGE INLIST started once, and operations 9 and 10 didn’t operate at all – so that’s okay.
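For reference, this is the mechanism I mean by “rowsource execution stats” (a sketch – you could use the gather_plan_statistics hint in the query instead of the session setting):

alter session set statistics_level = all;

-- run the query, then:

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));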

But what happens if I execute a query that should only access the first two partitions? Here’s the run-time plan to answer that question:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  5dgp982ffsph8, child number 0
-------------------------------------
select  /*+ expand_table(t1) gather_plan_statistics */         id from
       t1 where  n1 = 1 and id < 500000

Plan hash value: 2876620067

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |      1 |        |    800 |00:00:00.05 |     649 |
|   1 |  VIEW                                        | VW_TE_1 |      1 |   1200 |    800 |00:00:00.05 |     649 |
|   2 |   UNION-ALL                                  |         |      1 |        |    800 |00:00:00.01 |     649 |
|   3 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |       3 |
|*  4 |     TABLE ACCESS INMEMORY FULL               | T1      |      1 |    200 |    200 |00:00:00.01 |       3 |
|   5 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |     215 |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      1 |    200 |    200 |00:00:00.01 |     215 |
|*  7 |      INDEX RANGE SCAN                        | T1_I1   |      1 |    200 |    200 |00:00:00.01 |      15 |
|   8 |    PARTITION RANGE INLIST                    |         |      1 |    800 |    400 |00:00:00.01 |     431 |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      2 |    800 |    400 |00:00:00.01 |     431 |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |      2 |    800 |    400 |00:00:00.01 |      31 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - inmemory(("N1"=1 AND "T1"."ID"<250000))
       filter(("N1"=1 AND "T1"."ID"<250000))
   6 - filter(("T1"."ID"<500000 AND "T1"."ID">=250000))
   7 - access("N1"=1)
  10 - access("N1"=1)

My query should have returned 400 rows, accessing only the first two partitions. It returned 800 rows, accessing all 4 partitions. (Notice how there are no predicates on ID for operations 9 and 10).

Since I had started my investigation with the inmemory option I assumed at this point that the problem was somehow related to that feature; which shows how easy it is to get too focused and jump to conclusions. After raising the problem with Oracle I got a reply that the problem wasn’t about the inmemory columnar store – and here’s the next little change to test to demonstrate that point:


alter index t1_i1 indexing partial;

alter table t1 modify partition p_start indexing off;
alter table t1 modify partition p_start no inmemory;

select  partition_name, indexing, inmemory 
from    user_tab_partitions 
where   table_name = 'T1';

select  partition_name, segment_created    
from    user_ind_partitions 
where   index_name = 'T1_I1';

Enable partial indexing for the index, switch off the index on the first partition then disable the inmemory option for the partition. This is the plan I got from re-running the two-partition query:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |      1 |        |    800 |00:00:00.08 |   15370 |  14706 |
|   1 |  VIEW                                        | VW_TE_1 |      1 |   1200 |    800 |00:00:00.08 |   15370 |  14706 |
|   2 |   UNION-ALL                                  |         |      1 |        |    800 |00:00:00.07 |   15370 |  14706 |
|   3 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |     216 |      0 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      1 |    200 |    200 |00:00:00.01 |     216 |      0 |
|*  5 |      INDEX RANGE SCAN                        | T1_I1   |      1 |    200 |    200 |00:00:00.01 |      16 |      0 |
|   6 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.04 |   14723 |  14706 |
|*  7 |     TABLE ACCESS FULL                        | T1      |      1 |    200 |    200 |00:00:00.04 |   14723 |  14706 |
|   8 |    PARTITION RANGE INLIST                    |         |      1 |    800 |    400 |00:00:00.01 |     431 |      0 |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      2 |    800 |    400 |00:00:00.01 |     431 |      0 |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |      2 |    800 |    400 |00:00:00.01 |      31 |      0 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."ID"<500000 AND "T1"."ID">=250000))
   5 - access("N1"=1)
   7 - filter(("N1"=1 AND "T1"."ID"<250000))
  10 - access("N1"=1)


Again I accessed all 4 partitions and returned 800 rows. As an odd little detail the plan reversed the order of access of partitions 1 and 2. The problem isn’t about the inmemory option, it’s a problem with generic table expansion.

Continuing Tests
I won’t go into all the details of what I did next – once you get started it’s hard to stop, and easy to lose track of what you’ve done. So here’s a quick summary.

Given the problem appeared without the inmemory columnar store enabled, I switched back to 11.2.0.4 (where table expansion is also possible) and emulated the problem by setting the first index partition unusable (since partial indexing is a 12c feature). Table expansion did not occur even when hinted: the plan was a full tablescan on both partitions.
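The emulation step was just the following statement (a sketch, assuming the local index partition has inherited the table partition name):

alter index t1_i1 modify partition p_start unusable;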

So I switched the table from being interval partitioned to simple range partitioned, creating all 4 partitions as I created the data. In 11g I got table expansion and the correct answer without the extra branch to the union all; so I went back to 12c and did the same – pure range partitioning, partial indexing, and got table expansion with the correct result and no spare branches to the plan.

Tentative Conclusion

Clearly my testing is by no means exhaustive – but 12c seems to allow table expansion for interval partitioning in cases where 11g does not; unfortunately it is possible for the table expansion code to go wrong for interval partitioning in cases where simple range partitioning does not. In my case this led to wrong results.

 

Reference Script: in_memory_table_expansion_*.sql

 

 

November 9, 2015

Wrong Results

Filed under: 12c,Bugs,Oracle — Jonathan Lewis @ 6:23 am BST Nov 9,2015

Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result depends on the SQL*Plus arraysize!

The email had a short description, and a script to create a small data set that would demonstrate the problem. I’m not going to show you the query, or the result set, but here’s a sample of the output from an SQL*Plus session after creating the data. This is, by the way, on a “single-user” system – there is no way that some other session is changing the data – especially after the opening “set transaction”:

SQL> set transaction read only;

Transaction set.

SQL> set arraysize 1
SQL> select ...

...

541 rows selected.

SQL> set arraysize 4
SQL> select ...

...

599 rows selected.

SQL> set arraysize 10
SQL> select ...

...

620 rows selected.

SQL> set arraysize 32
SQL> select ...

...

616 rows selected.

The correct result set should have had the 616 rows reported when the arraysize was set to 32 (of course, it’s possible that with an arraysize of 32 the 616 rows returned weren’t the correct 616 rows – rows seemed to get multiplied or deleted fairly arbitrarily as the arraysize changed).

The execution plan was a little unusual in that it forced a nested loop join with a tablescan on the inner table; and when I ran the query with rowsource execution statistics enabled the number of starts of the inner tablescan was 1,597 but the number of rows actually returned varied. My first thought was that some new mechanical optimisation of the tablescan code was losing track of where it had got to in the repeated tablescans – but it turned out I was wrong.

Here’s the execution plan (with camouflage) – the key detail is in a section I didn’t look at initially, the column projection:


select * from table(dbms_xplan.display_cursor('0dh0kh9qa88mz',1,'-note +projection'));

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       | 14118 (100)|          |
|   1 |  NESTED LOOPS       |      |    29 |  2958 | 14118   (2)| 00:00:01 |
|*  2 |   HASH JOIN         |      |   892 | 57088 |    35   (3)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ABC  |   549 | 21411 |    17   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| DEF  |   892 | 22300 |    17   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | XYZ  |     1 |    38 |    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEF"."ABC_FK"="ABC"."ABC_ID")
   5 - filter(("DEF"."COL0"="XYZ"."COL0" AND "XYZ"."COL1"="ABC"."COL1"
              AND "XYZ"."COL2"="ABC"."COL2"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20], "XYZ"."COL1"[NUMBER,22],
       "XYZ"."COL2"[NUMBER,22], "XYZ"."COL0"[VARCHAR2,20]
   2 - (#keys=1) "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20]
   3 - (rowset=200) "ABC"."ABC_ID"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22]
   4 - (rowset=200) "DEF"."ABC_FK"[NUMBER,22], "DEF"."COL0"[VARCHAR2,20]
   5 - "XYZ"."COL1"[NUMBER,22], "XYZ"."COL2"[NUMBER,22],
       "XYZ"."COL0"[VARCHAR2,20]

The predicate section is irrelevant in this case, and I’ve camouflaged the names of the tables and columns – the only interesting bit is the appearance of the (rowset=200) in the projection information. This is reporting a feature new in 12c (and not to be confused with Oracle Java Rowsets) that should improve the performance of some queries.

I didn’t actually look at the projection information until after I’d asked the Oak Table members if they had ever seen this type of anomaly before – and Stefan Koehler emailed back to suggest that the problem might be related to rowsets (there are a couple of similar bugs on MoS, e.g: 17016479 and 20960570) – so I checked the projection, then repeated my tests after disabling the feature with a call to: ‘alter session set “_rowsets_enabled”=false;’

Problem solved – although I’ve told the person who emailed me to report this discovery and workaround to Oracle support and see what they supply as the approved solution.

It is possible to affect the feature through event 10055 – different levels disable it at different locations in the code; the list of options is given in the $ORACLE_HOME/rdbms/mesg/oraus.msg file (if you’re not running Windows):


//            Level:
//            0x00000001 - turn off for table scan
//            0x00000002 - turn off for hash join consume
//            0x00000004 - turn off for hash join produce
//            0x00000008 - turn off for group by
//            0x00000010 - turn off for sort
//            0x00000020 - turn off for table-queue out
//            0x00000040 - turn off for table-queue in
//            0x00000080 - turn off for identity
//            0x00000100 - turn off for granule iterator
//            0x00000200 - turn off for EVA functions
//            0x00000400 - turn off for PL/SQL
//            0x00000800 - turn off for upgrade
//            0x00001000 - turn off for database startup
//            0x00002000 - turn off for blobs and clobs
//            0x00004000 - turn off for tracing row source
//            0x00008000 - turn off rowset information in explain plan
//            0x00010000 - disable hash join rowsets fast path
//            0x00020000 - turn off for bloom create
//            0x00040000 - turn off for bloom use
//            0x00080000 - disable prefetch for hash join
//            0x00100000 - disable prefetch for bloom
//            0x00200000 - disable semi blocking hash join
//            0x00400000 - turn off rowset for fixed table

I tried the first few levels and found that both levels 1 and 2 eliminated the problem (and eliminated the appearance of the (rowset=200) entry in the projection information). Given the shape of the plan I had thought that just one of 1, 2 or 4 might have been relevant, so I was a little surprised to find that both 1 and 2 were effective – but that’s probably just a question of interpretation of the brief descriptions.
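For anyone who wants to repeat the tests, this is the shape of the session-level syntax for the event (shown here with level 1, “turn off for table scan”):

alter session set events '10055 trace name context forever, level 1';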

Update

I’ve asked the owner of the problem if it’s okay to post the script to create the tables and data – and the answer was yes: the content was already heavily camouflaged anyway. So here’s a file you can download if you want to test other environments: insert_script_12c_bug

It’s declared as a “.doc” file to get past the upload process, but it’s really a flat text file.

Update 16th Nov

The official “minimum impact” workaround is to set event 10055 at level 2097152 (disable semi blocking hash join). Alternatively there is also a patch available. See Mike Dietrich’s blog for details: https://blogs.oracle.com/UPGRADE/entry/update_for_switch_off_rowsets

Update 7th Dec

Mike Dietrich has updated his second posting to add details of a patch for this bug.

 

October 23, 2015

Histogram Limit

Filed under: Bugs,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:03 pm BST Oct 23,2015

A surprising question came up on OTN a couple of days ago:

Why does a query for “column = 999999999999999999” run slower than a query for “column > 999999999999999998” (that’s 18 digit numbers, if you don’t want to count them)? With the equality predicate the query is very slow; with the range-based predicate performance is good.

In the absence of further information there are various reasons why this is possible – but the example in question was about a “versioning” table where the single very large value was used as the “not yet ended” value for the history of an id. So, at a minimum, the table held columns (id, nstart, nend, other); each id could appear many times, with pairs of start and end values supplying non-overlapping, covering ranges, plus one row with the very large number as the end value.

Let’s jot down a few ideas about what the data (and stats) might look like.

Assuming every id appears “a few” times and every id has to have a “still valid” row, this means that a very large fraction (say 10% to 25%, if “a few” means 4 to 9) of the rows hold the value 999999999999999999.

If you gather stats without a histogram then you should get the low and high, finding that the high is 999999999999999999 and that the range is enormous, and so the predicates “column = {high value}” and “column > {high value} – 1” should give very similar cardinalities.

If you collect stats with a histogram you should find the very popular high value even in a very small sample set (which is what happens with histogram collection in 11g, and even in 12c for hybrid histograms). In this case the histogram should spot the significance of the high value and again the two predicates should have very similar cardinalities.

At first sight there doesn’t seem to be a feasible way that the two cardinalities could be sufficiently different to cause a problem – so maybe there’s something about character conversion or maybe bind variable usage that hasn’t been mentioned. So to test a couple of the less likely ideas I built a data set using 11.2.0.4 – and found a bug:


create table t1 (
        id              number(18),
        nend            number(18),
        n1              number(18),
        small_vc        varchar2(10)
)
nologging
;

insert /*+ append */ into t1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum -1               id,
        case mod(rownum - 1,4)
                when 0  then 999999999999999999
--              when 0  then 999999999999999
                        else mod(rownum - 1, 250000)
        end                     nend,
        rownum - 1              n1,
        rpad('x',10,'x')        small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

commit;

select * from t1 where nend = 0;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size auto'
        );
end;
/

I have an nend column that is set to 999999999999999999 every 4th row in the table and otherwise has 4 rows per value for 187,500 other (relatively low) values. It’s probably a reasonable initial model of the original data. I’ve run a query with a predicate referencing nend before gathering stats so that the (default) auto option will build a histogram for nend.
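As a cross-check that it really is the pre-stats query that triggers the histogram, from 11.2.0.2 onwards you can report the column usage information that the “size auto” option reads (a sketch):

set long 20000

select  dbms_stats.report_col_usage(user, 't1')
from    dual
;

Then I’ve checked the execution plans for two critical queries: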


explain plan for
select  *
from    t1
where   nend = 999999999999999999
;

select * from table(dbms_xplan.display);

explain plan for
select  *
from    t1
where   nend > 999999999999999998
;

select * from table(dbms_xplan.display);

And here’s the surprise – the two plans, in order:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |   108 |   625   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |   108 |   625   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND"=999999999999999999)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250K|  6591K|   627   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |   250K|  6591K|   627   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND">999999999999999998)

Clearly this should not happen – the equality test is way off, the inequality test is correct. The obvious first guess is that something funny has happened with the statistics so let’s see what they look like – the column stats (user_tab_cols) and the histogram stats (user_tab_histograms) seem like a good starting point:


select
        sample_size, num_distinct, histogram, num_buckets,
        substr(low_value,1,26) low_value, substr(high_value,1,26) high_value
from
        user_tab_cols
where
        table_name = 'T1'
and     column_name = 'NEND'
;

column endpoint_value format 999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;

And here are the results (with a couple of hundred uninteresting rows eliminated from the histogram):


      Sample     Distinct HISTOGRAM          Buckets LOW_VALUE                  HIGH_VALUE
------------ ------------ --------------- ---------- -------------------------- --------------------------
       5,541      190,176 HEIGHT BALANCED        254 C102                       C9646464646464646464

ENDPOINT_NUMBER               ENDPOINT_VALUE
--------------- ----------------------------
              0                           19
              1                        1,225
              2                        2,503
              3                        3,911
              4                        4,806
...
            188                      247,479
            189                      248,754
            190                      249,862
            254    1,000,000,000,000,000,000

Oracle will have started with a 100% sample to collect stats on all the columns, but will have taken a small sample to test the need for a histogram on the nend column – which is why the sample size of 5,541 has appeared; but that’s not relevant to the problem in hand. The big question comes from endpoint_number 254 – why is the highest value in the histogram 1e18 when we know (and the column stats show) that the highest value is actually 999999999999999999?!

It’s a question to which I don’t have an answer – but I do know that

  • if your high value is 15 digits long (all 9s) then the histogram shows the right high value
  • if your high value is more than 15 9s then the histogram shows the high value plus 1
  • the value collected by the query that Oracle runs is the actual value (i.e. 18 9s)
  • if you use set_column_stats to set 18 9s as the high value you still get 1e18 in the histogram

Once you see the stats you can understand why the OP sees the odd performance problem: the histogram identifies 1e18 as a (very) popular value, which leaves 999999999999999999 looking like an “average” value with only 4 rows; the query for greater than 999999999999999998, on the other hand, can see that there really are 250K rows with higher values.

Footnote:

Interestingly 12c does the same with the stats – introducing the 1e18 in the histogram – but still manages, somehow, to calculate the correct cardinality in the equality case. (There is one slight difference in 12c: the histogram is a hybrid histogram, not a height-balanced histogram.)

Update 24th Oct 2015:

I realised late last night that I had written about this behaviour before – though possibly just as a response on OTN or in some unpublished notes. According to the notes in a script called histogram_numeric_bug.sql that I found on my laptop this is “Bug 18514507 : WRONG CARDINALITY ESTIMATES WHEN NUMERIC VALUE IS LONGER THAN 15 BYTES” (except that the base bugs it is linked to are probably not the same bug).

I suspect the problem is related to the way that character histograms are built based on a numeric representation of the string: Oracle takes the first few characters of the string, treats the resulting N bytes as a hex number, converts to decimal, and then applies round(N, -21) to restrict the precision stored. The effect with strings is that (broadly speaking) you get fifteen digits (sometimes reduced to 14) of precision – which is exactly what I seem to be seeing with numbers.
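You can mimic that 15-significant-digit effect on the 18-digit value with a simple test – round(x, -3) keeps the top 15 of the 18 digits (a sketch):

select  round(999999999999999999, -3)  rounded
from    dual
;

-- returns 1000000000000000000, i.e. 1e18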

The reason that 12c can get the right answers despite storing the wrong endpoint_value is that it’s also storing to_char() of the right value as the endpoint_actual_value – possibly doing this any time it has had to round the endpoint_value (for character strings the endpoint_actual_value was only populated if two entries in the endpoint_value were the same).
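In 12c you can see the two values side by side with a minor change to the earlier histogram query (a sketch):

select  endpoint_number, endpoint_value, endpoint_actual_value
from    user_tab_histograms
where   table_name  = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;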

