Oracle Scratchpad

January 10, 2017

Join Elimination 12.2

Filed under: 12c,Bugs,Execution plans,Oracle — Jonathan Lewis @ 1:03 pm GMT Jan 10,2017

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases were a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. when all other factors are equal) is dictated by the order the tables appear in the from clause so if you have enough tables in the from clause it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

As you will see later on the three tables grandparent, parent, child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so that looks like “single column primary key” join elimination has worked, but “multi-column” join elimination hasn’t appeared. On the other hand, I’ve not followed my usual rules for writing SQL so let’s try again. If I follow the pattern I usually follow, my from clause would have been in the order child  -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this visual change the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:


rem
rem	Script:		join_elimination_12c2.sql
rem	Author:		Jonathan Lewis
rem	

-- Environment details eliminated

define m_pad=100

/*
	IDs will be 1 to 1000
*/

create table grandparent 
as
select 
	rownum			id,
	trunc((rownum-1)/5)	small_num_g,
	rpad(rownum,10)		small_vc_g,
	rpad(rownum,&m_pad)	padding_g
from 
	all_objects 
where 
	rownum <= 1000
;

/*
	Each GP has two (scattered) children here
	Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
	1+mod(rownum,1000)	id_g,
	rownum			id,
	trunc((rownum-1)/5)	small_num_p,
	rpad(rownum,10)		small_vc_p,
	rpad(rownum,&m_pad)	padding_p
from 
	all_objects 
where 
	rownum <= 2000
;

/*
	Simple trick to get 5 (clustered) children per parent
	Child IDs are 1 to 12,000
*/

create table child 
as
select 
	id_g,
	id			id_p,
	rownum			id,
	trunc((rownum-1)/5)	small_num_c,
	rpad(rownum,10)		small_vc_c,
	rpad(rownum,&m_pad)	padding_c
from 
	parent	p,
	(
		select /*+ no_merge */ 
			rownum 
		from	parent p 
		where	rownum <= 5
	)	d
;

create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem	Don't need to collect stats because it's 12c
rem

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered gp -> p -> c
prompt	The final plan is GP->C, The optimizer eliminated P before
prompt	considering GP
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered c -> p -> gp
prompt	The final plan is a tablescan of C only. The optimizer managed 
prompt	to eliminate GP first and P second
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	child		c,
	parent		p,
	grandparent	g 
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	==================================================
prompt	Convert to ANSI standard in the order gp -> p -> c
prompt	and both gp and p eliminated.
prompt	==================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g
join
	parent		p
on	p.id_g = g.id
join
	child		c
on	c.id_p = p.id
and	c.id_g = p.id_g
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===================================================
prompt	Convert to ANSI standard in the order c -> p -> gp
prompt	and only p is eliminated. 
prompt	===================================================

select 
	count(c.small_vc_c)
from 
	child		c
join
	parent		p
on      p.id   = c.id_p 
and	p.id_g = c.id_g 
join
	grandparent	g
on	g.id = p.id_g 
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

It’s possible, of course, that with different system stats, or I/O calibration, or extent sizes, or segment space management, or block sizes, sundry other parameter details that you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.

Footnote:

If you’re wondering why the “traditional” and “ANSI” syntax should exhibit this flaw for joins in the opposite direction – remember that ANSI SQL is first transformed into an equivalent Oracle form and – in the simple cases – the first two tables form the first query block then each table after that introduces a new query block, so the optimizer strategy does (approximately) the following translation:


select ... from grandparent join parent join child

==>

select ... from (select ... from grandparent join parent) join child

The optimizer then optimizes the inline query, which eliminates grandparent leaving a join between parent and child, which then allows parent to be eliminated.

Conversely we get:

select ... form child join parent join grandparent 

==>

select ... from (select ... from child join parent) join grandparent

In this form the optimizer eliminates parent from the inline view and is left with a join between child and grandparent – so no further elimination.

December 13, 2016

Index Compression

Filed under: 12c,Indexing,Oracle — Jonathan Lewis @ 1:11 pm GMT Dec 13,2016

Richard Foote has published a couple of articles in the last few days on the new (licensed under the advanced compression option) compression mechanism in 12.2 for index leaf blocks. The second of these pointed out that the new “high compression” mechanism was even able to compress single-column unique indexes – a detail that doesn’t make sense and isn’t allowed for the older style “leading edge deduplication” mechanism for index compression.

In 12.2 an index can be created (or rebuilt) with the option “compress advanced high” – and at the leaf-block level this will create “compression units” (possibly just one per leaf block – based on my early testing) that takes the complexity of compression far beyond the level of constructing a directory of prefixes. Richard demonstrated the benefit by creating a table with a numeric unique index – then compressed the index, reducing its size from 2,088 leaf blocks to 965 leaf blocks, which is pretty dramatic difference.

It crossed my mind, though, to wonder whether the level of compression was a side effect of the very straightforward code that Richard had used to create the data set: the table was a million rows with a primary key that had been generated as the rownum selected from the now-classic “connect by..” query against dual, and the row length happened to allow for 242 rows per 8KB table block.

If you pause to think about this data set you realise that if you pick the correct starting point and walk through 242 consecutive entries of the index you will be walking through 242 consecutive rows in the table starting from the zeroth row in a particular table block and ending at the 241st row in that block. A rowid (as stored by Oracle in a simple B-tree index) consists of 6 bytes and the first five bytes of the rowid will be the same for the first 256 rows in any one table block (and the first four will still be the same for the remaining rows in the block). Richard’s data set will be very close to ideal for any byte-oriented, or bit-oriented, compression algorithm because (to use Oracle terminology) the data will have a perfect clustering_factor. (He might even have got a slightly better compression ratio if he’d used an /*+ append */ on the insert, or done a CTAS, and reduced the rowsize to get a uniform 256 rows per table block.)

So how do things change if you randomise the ordering of the key ? Here’s a variant on Richard’s code:


rem
rem	Script:		index_compression_12c_2.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Dec 2016
rem
rem	Last tested 
rem		12.2.0.1
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
	rownum				id,
	lpad('x',130,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
order by
	dbms_random.value
;

select table_name, blocks, round(num_rows/blocks,0) rpb from user_tables where table_name = 'T1';

drop index t1_i1;
create unique index t1_i1 on t1(id);
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

drop index t1_i1;
create unique index t1_i1 on t1(id) compress advanced high;
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

The initial drop index is obviously redundant, and the calls to gather_index_stats should also be redundant – but they’re there just to make it obvious I haven’t overlooked any checks for correctness in the build and stats.

You’ll notice that my row length is going to be slightly more “real-world” than Richard’s so that the degree of compression I get from nearly identical rowid values is likely to be reduced slightly, and I’ve completely randomised the order of key values.

So what do the results like ?

With the default pctfree = 10, and in a tablespace of uniform 1MB extents, 8KB blocks, utilising ASSM I get this:


TABLE_NAME               BLOCKS        RPB
-------------------- ---------- ----------
T1                        19782         51

INDEX_NAME           COMPRESSION     PCT_FREE LEAF_BLOCKS
-------------------- ------------- ---------- -----------
T1_I1                DISABLED              10        2088
T1_I1                ADVANCED HIGH         10        1303

Unsurprisingly the uncompressed index is exactly the same size as Richard’s (well, it was just the integers from 1 to 1M in both cases) but the compression ratio is significantly less – though still pretty impressive.

Of course, for this type of index my example probably goes to the opposite extreme from Richard’s. Realistically if you have a sequence based key with an OLTP pattern of data arrival then consecutive key values are likely to be scattered within a few blocks of each other rather than being scattered complely randomly across the entire width of the table; so a more subtle model (using a suitable number of concurrent processes to insert ids based on a sequence, perhaps) would probably get a better compression ratio than I did, though a worse one than Richard’s.There’s also the issue of the size of the key value itself – once you get to values in the order of 10 million to 100 million you’re looking at mostly 4 bytes (internal format) storage where for large runs of values the first 3 bytes match, possibly leading to a better compression ratio.

Of course the question of globally partitioned indexes will be relevant for some people since the principle reason for global indexes on partitioned tables is to enforce uniqueness on column combinations that don’t include the partition key, and that introduces another possible benefit – the rowid goes up to 10 bytes, of which the first 4 bytes are the object id of the table partition: depending on the nature of the partitioning that repeated 4 bytes per row may be close to non-existent after compression, giving you a better compression ratio on globally partitioned than you get on any other type of single column unique index.

Once you start looking at the details there are a surpising number of factors that can affect how well the advanced compression high can work.

Footnote:

Once you’ve created the index, you can start poking around in all sorts of ways to try and work out what the compression algorithm does. A simple block dump is very informative, with lots of clues in the descriptive details – and lots of puzzles when you start looking too closely. There are hints that this type of index compression adopts a strategy similar to “oltp comprssion” for tables in that compression occurs only as the leaf block becomes full – and possibly allows some sort of batching process within a leaf block before finally compressing to a single contiguous unit. (This is just conjecture, at present: the only firm statement I’ll make about the actual implementation of index compression is that it uses a lot of CPU; in my example the baseline create index took about 1.5 seconds of CPU, the compressed create took about 4.5 seconds of CPU.)

There are also a couple of amusing side effects that may confound those who use the old “validate index / query index_stats” two-step to decide when to rebuild indexes. Here’s what I got on the compressed index:


SQL> validate index t1_i1;

SQL> select blocks, lf_rows, lf_rows_len, btree_space, used_space, pct_used from index_stats;

    BLOCKS    LF_ROWS LF_ROWS_LEN BTREE_SPACE USED_SPACE   PCT_USED
---------- ---------- ----------- ----------- ---------- ----------
      1408    1000000	 14979802    10416812	14994105	144

My index is using 144% of the space that it has been allocated. You don’t have to be very good at maths (or math, even) to realise that something’s wrong with that number.

October 20, 2016

Conjuctive Normal Form

Filed under: 12c,Exadata,in-memory,Oracle — Jonathan Lewis @ 1:00 pm GMT Oct 20,2016

I recently tweeted about a comment I’d picked up at the Trivadis performance days regarding tablescans and performance.

“If you can write your SQL in conjunctive normal form it can help the optimizer to offload more predicates”

Inevitably someone asked me if I had an example to demonstrate this – I didn’t, and still don’t really, but here’s an interesting demo based on an example from the Oracle In-Memory blog showing how the optimizer will rearrange your filter predicates before passing them to the tablescan code for evaluation against an inmemory table.


rem
rem     Script:         in_memory_conjunctive.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        trunc(dbms_random.value(1,501)) qty,
        mod(rownum,200) + 1             part_no,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7
;
prompt  ==========
prompt  Base query
prompt  ==========

select
        count(v1)
from
        t1
where
        (qty > 495 or (qty < 3 and part_no = 50))
;
prompt  ===============
prompt  predicate added
prompt  ===============

select
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;
prompt  =================
prompt  Ordered predicate
prompt  =================

select  /*+ ordered_predicates */
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;

The 2nd and 3rd queries add a predicate to the first query – which, unfortunately, changes the estimated cardinality even though it has no effect on the result. This predicate is one that would be added by the inmemory code path if the table were declared to be inmemory. I’ve got two versions of the query, one with the (deprecated) ordered_predicates hint because in my initial tests the optimizer swapped the order of the predicates and I wanted to see if the ordering was at all critical.

Here’s the plan for the base query – first before declaring the table inmemory, then after declaring the table inmemory:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14739 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  1862K| 14739   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |       |       |  1974 (100)|          |
|   1 |  SORT AGGREGATE             |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   100K|  1862K|  1974  (44)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))) filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))

And here, after putting the table back to no inmemory are the plans for the second and third queries; note, particularly the different order of the predicates in the predicate section: the predicate order matches the inmemory predicate order only if I use the ordered_predicates hint:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)) AND ("QTY">495
              OR "QTY"<3)))
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND
              "PART_NO"=50))))

Finally the run times – after running the queries a few times each to check for consistency:

  • Base query: 0.82 seconds
  • Query with extra predicate: 0.86 seconds
  • Query with extra predicate and forced order of predicate evaluation: 0.71 seconds

The query with the predicate arrangement matching the inmemory rewrite actually ran 13% faster than the original. Unfortunatly the rewrite without the ordered_predicates hint ran slower – which is a bit of a shame but understandable – the first predicate is the more complex, and then the code has to run a completely redundant second predicate; I was a little surprised at how much slower it was, but the table is 10M rows and we’re only looking at sub-second times anyway.

My table was fully cached and just under 112,000 blocks, so not very large, and this was running a serial query on a basic Oracle instance. Nevetheless there is a difference in execution time that is more than just “random noise” – If this is an indication of how a little unsightly tweaking of SQL for small data sets can make a difference, you can imagine that there might be a worthwhile benefit to considering ways of tweaking your predicates that make a significant difference to execution time if the extra predicates end up being pushed down to storage on an Exadata machine.

Footnote:

Another “not quite” example I happen to have written about a few months ago is a case where rewriting “not exists() OR not exists() OR not exists()” as “not (exists() AND exists() AND exists())” allowed Oracle to rewrite three subqueries as a single subquery with three-table join.

 

October 10, 2016

InMemory Bonus

Filed under: 12c,in-memory,Oracle — Jonathan Lewis @ 1:13 pm GMT Oct 10,2016

It should be fairly well known by now that when you enable the 12c InMemory (Columnar Store) option (and set the inmemory_size) your SQL may take advantage of a new optimizer transformation know as the Vector Transformation, including Vector Aggregation. You may be a little surprised to learn, though, that some of your plans may change even when they don’t produce any sign of a vector transformation as a consequence. This is because In-Memory Column Store isn’t just about doing tablescans very quickly it’s also about new code paths for doing clever things with predicates to squeeze all the extra benefits from the technology. Here’s an example:


rem
rem     Script:         12c_inmemory_surprise.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2016
rem
rem     Last tested
rem             12.1.0.2
rem

drop table t2 purge;
drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create index t1_n1   on t1(n1)   nologging;
create index t2_rand on t2(rand) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns (n1,n2) size 1'
        );
end;
/

There’s nothing particularly special about these two tables – I engineered them for a different demo, and the call to gather extended stats on the column group (n1, n2) is just a minor detail in getting better cardinality estimates in the upcoming plans. At this point I haven’t specified that either table should be in memory, so let’s see what plan I get from dbms_xplan.display_cursor() when I run a query that should do a hash join using t1 as the build table and t2 as the probe table:


select
        /*+
                qb_name(main)
        */
        count(*)
from    (
        select
                /*+ qb_name(inline) */
                distinct t1.v1, t2.v1
        from
                t1,t2
        where
                t1.n1 = 50
        and     t1.n2 = 50
        and     t2.rand = t1.id
        )
;

select * from table(dbms_xplan.display_cursor);

Plan hash value: 1718706536

-------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |       |       |       |  2441 (100)|          |
|   1 |  SORT AGGREGATE                         |       |     1 |       |            |          |
|   2 |   VIEW                                  |       | 10001 |       |  2441   (4)| 00:00:01 |
|   3 |    HASH UNIQUE                          |       | 10001 |   351K|  2441   (4)| 00:00:01 |
|*  4 |     HASH JOIN                           |       | 10001 |   351K|  2439   (4)| 00:00:01 |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | T1_N1 |   100 |       |     1   (0)| 00:00:01 |
|   7 |      TABLE ACCESS FULL                  | T2    |  1000K|    14M|  2416   (4)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   5 - filter("T1"."N2"=50)
   6 - access("T1"."N1"=50)

Thanks to the column group the optimizer has estimated correctly that the number of rows selected from t1 would be 100. Beyond that there’s very little exciting about this execution plan.

So let’s modify t2 to be in-memory and see how the plan changes as we re-execute the query:


alter table t2 inmemory;

select
        /*+ qb_name(main) */
        count(*)
...

select * from table(...);


Plan hash value: 106371239

----------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |         |       |       |   259 (100)|          |
|   1 |  SORT AGGREGATE                          |         |     1 |       |            |          |
|   2 |   VIEW                                   |         | 10001 |       |   259  (27)| 00:00:01 |
|   3 |    HASH UNIQUE                           |         | 10001 |   351K|   259  (27)| 00:00:01 |
|*  4 |     HASH JOIN                            |         | 10001 |   351K|   257  (26)| 00:00:01 |
|   5 |      JOIN FILTER CREATE                  | :BF0000 |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1      |   100 |  2100 |     3   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_N1   |   100 |       |     1   (0)| 00:00:01 |
|   8 |      JOIN FILTER USE                     | :BF0000 |  1000K|    14M|   234  (20)| 00:00:01 |
|*  9 |       TABLE ACCESS INMEMORY FULL         | T2      |  1000K|    14M|   234  (20)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   6 - filter("T1"."N2"=50)
   7 - access("T1"."N1"=50)
   9 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))

The cost of the tablescan drops dramatically as the optimizer assumes the table will be in the in-memory column store (IMCS) but (in manys way more significantly) we suddenly have a (serial) Bloom filter for a hash join – which eliminates (most of) the data that wouldn’t have survived the hash join without having to use the CPU that would normally be spent probing the build table.

This is an interesting example of what the in-memory code path can do for us. There’s no point in using the Bloom filter for a serial hash join in “classic” Oracle because the Bloom filter is basically the bitmap of the hash table that is the first thing Oracle examines when probing the hash table – but with “in-memory” Oracle there are some enhancements to the Bloom filter and the code path that make using the Bloom filter an effective strategy. Most significant, perhaps, is that the in-memory code path can use SIMD instructions to perform multiple probes from t2 simultaneously, so not only do we get the benefits of avoiding disk access, buffer activity and row-by-row access to columns, we also reduce the CPU time spent on making the first-stage comparisons of the hash join. (And shared columnar dictionaries in 12.2 could reduce this even further!)

Footnote: I also have a note I scribbled dowsn at the Trivadis performance days last month that the Bloom filter used with IMCS carries the actual low and high values from the build table.  I may have misinterpreted this as I wrote it, but if that’s correct then it’s another step in eliminating data very quickly and very early when using IMCS (or Exadata if the same feature exists in the Bloom filters that get pushed to the storage servers.)

 

August 2, 2016

Adaptive mayhem

Filed under: 12c,Oracle — Jonathan Lewis @ 4:29 pm GMT Aug 2,2016

So you run a query and it gives you a plan with a note that says “This is an adaptive plan”.

So you run it again and the plan changes,  with a note that says “Statistics feedback used for this statement”

So you pause to think for a bit, then run the query again and the plan changes, with a note that says “One SQL Directive used, dynamic statistics used”. (You waited too long and the internal re-optimization hints got flushed down into an SQL directive.)

So you decide to think about it the following morning when you’re feeling bright and fresh, and when you run it you get another plan because overnight the automatic stats job gathered stats on the critical table and created a column group that was indicated by the (now defunct) directive.

Happy optimisation!

See also:

Update 1:

This just in from fellow Oak Table member Stefan Koehler:If the automatic stats collection task doesn’t finish inside its window that you may get unlucky and find that Oracle has created extended stats on a table, failed to complete the stats collection, and dropped the extended stats. Bug 19450314 mentioned above avoids the problems of unnecessary invalidation when the extended stats are added and collected – it doesn’t deal with the problem if the stats are dropped.

Update 2(Sep 2016):

Fresh news from OpenWorld 2016 on 12.2, courtesy of DBI Services’ Franck Pachot: The parameter contolling adaptive optimisation has been split into two parts, one for the adaptive plans the other for adaptive statistics – so you can officially switch of the bit that’s more likely to cause unexpected side effects while leaving the bit that’s more likely to be beneficial active.

Update 3 (Nov 2016):

The latest news from Stefan Koelher on Bug 19450314 (see update 1 above) is that the unfinished bits of the fix for this bug are now known as bug 21418655. “Not done were the following: MODIFY, RENAME and DROP column (including work on Drop_Extended_Stats).”

Update 4 (Dec 2016)

And now a potential threat (about column groups in general) described in a series of three blog posts from Magnus Johansson. With reference to a relevant MoS Doc Id and patch.

July 7, 2016

Invisible Bug

Filed under: 12c,Bugs,CBO,Indexing,Oracle — Jonathan Lewis @ 5:27 pm GMT Jul 7,2016

At this Wednesday’s Oracle Midlands event someone asked me if Oracle would use the statistics on invisible indexes for the index sanity check. I answered that there had been a bug in the very early days of invisible indexes when the distinct_key statistic on the index could be used even though the index itself would not be considered as a candidate in the plan (and the invisible index is still used to avoid foreign key locking – even in 12c – it’s only supposed to be invisible to the optimizer).

The bug was fixed quite a long time ago – but a comment on the “Index Sanity” article has introduced me to a related bug that is still present in 11.2.0.4 where the presence of an invisible index can affect an execution plan. Here’s a little model (run under 11.2.0.4) to demonstrate:

rem
rem     Script:         invisible_index_bug.sql
rem     Author:         Jonathan Lewis
rem

execute dbms_random.seed(0)

drop table t2;
drop table t1;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(mod(rownum,1000) as number(8,0))           n1,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table t2
as
select
        rownum id,
        trunc(dbms_random.value(0,10000)) n1
from
        dual
connect by
        level <= 100
;
begin 
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

column n1 new_value m_n1
select n1 from t2 where id = 50;
clear columns

set autotrace traceonly explain

select
        t1.*
from
        t1, t2
where
        t2.n1 = &m_n1
;

create unique index t2_i1 on t2(n1)
-- invisible
;

select
        t1.*
from
        t1, t2
where
        t2.n1 = &m_n1
;

set autotrace off

All I’ve done is create a couple of tables then do a join that we might expect to see executed as a cartesian merge join; at one point I was going to make the data more complicated and include a join condition, but decided to keep things small and simple so it’s a silly example but it is sufficient to make the point. The funny little bit about selecting an n1 value from t2 was also in anticipation of a more complex example but it does, at least, ensure I query for a value that is in range.

Here are the two execution plans from 11.2.0.4 – the key feature is that the plan changes after the invisible index is created:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|   1 |  MERGE JOIN CARTESIAN|      |  1000K|   119M|  2263   (3)| 00:00:12 |
|*  2 |   TABLE ACCESS FULL  | T2   |     1 |     4 |     2   (0)| 00:00:01 |
|   3 |   BUFFER SORT        |      |  1000K|   115M|  2261   (3)| 00:00:12 |
|   4 |    TABLE ACCESS FULL | T1   |  1000K|   115M|  2261   (3)| 00:00:12 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N1"=5308)


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|   1 |  NESTED LOOPS      |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |     4 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   115M|  2261   (3)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N1"=5308)

Notice how the plan without the invisible index does a “sort” (actually a “buffer sort” so it’s just copying the data into local memory), while the plan with the not quite invisible enough index in place gets away with just a full tablescan. This is bug 16564891, associated with base bug 16544878.

The bug notes say “fixed in 12.2”, but in Oracle 12.1.0.2 the first plan appears in both cases, and we have to make the index visible to get the second plan. (Take note of the need for the “negative” test to prove the point; the fact that the same plan appears for both cases doesn’t, by itself, prove that the bug was fixed, we have to show that the plan would have changed if the bug had still been present).

I believe the problem isn’t the problem of Oracle using the statistics when it shouldn’t; the change appears because in 11g Oracle incorrectly allows itself to see the uniqueness of the index and infer that table t2 is a “single row” table. In 12c the optimizer calculates that there will probably be only one row but that doesn’t stop it choosing the merge join cartesian as the “insurance bet” against having to do more than one tablescan of the t1 table. We can see this difference in the 10053 trace files, the 11g file has an entry for the “Single Table Access Path” for t2 that reads:

1-ROW TABLES:  T2[T2]#0

If you read the bug note for bug 16564891 you’ll see that it has a more realistic example of the problem – and it may give you some idea of where you might run into the bug. In general I don’t think many people are likely to come across the problem since it revolves around uniqueness, which is rather an important property, and there can’t be many occasions when someone decides to add (or test dropping) a unique index. Given that the example in the bug looks like “add a unique index to a dimension table that’s joining to a fact table” that may be a good pointer to where you’re most likely to run into the problem — when you’re trying to enforce data correctness in a data warehouse.

 

June 10, 2016

Uniquely parallel

Filed under: 12c,distributed,Execution plans,Oracle,Parallel Execution,Performance,Upgrades — Jonathan Lewis @ 7:36 am GMT Jun 10,2016

Here’s a surprising (to me) execution plan from 12.1.0.2 – parallel execution to find one row in a table using a unique scan of a unique index – produced by running the following script (data creation SQL to follow):


set serveroutput off
set linesize 180
set trimspool on
set pagesize 60

alter session set statistics_level = all;

variable b1 number
exec :b1 := 50000

select /*+ parallel (3) */ id, v1 from t2 where id=:b1;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

break on dfo_number skip 1 on tq_id skip 1 on server_type

select
        dfo_number, tq_id, server_type, instance, process, num_rows
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

All I’ve done is enable rowsource execution statistics, set a bind variable to a value, query a table with a /*+ parallel(3) */ hint to find the one row that will be identified by primary key, and then reported the actual execution plan. When I first ran the test Oracle didn’t report the execution statistics correctly so I’ve also queried v$pq_tqstat to show the PX servers used and the flow of data through the plan. Here’s the plan, followed by the  results from v$pq_tqstat:


SQL_ID  0dzynh9d29pt9, child number 0
-------------------------------------
select /*+ parallel (3) */ id,v1 from t2 where id=:b1

Plan hash value: 247082613

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   1 |  PX COORDINATOR                   |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |  Q1,01 | P->S | QC (RAND)  |      0 |00:00:00.01 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | T2       |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   4 |     BUFFER SORT                   |          |      0 |        |  Q1,01 | PCWC |            |      0 |00:00:00.01 |
|   5 |      PX RECEIVE                   |          |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |
|   7 |        PX SELECTOR                |          |      0 |        |  Q1,00 | SCWC |            |      0 |00:00:00.01 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |      0 |      1 |  Q1,00 | SCWP |            |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("ID"=:B1)

Note
-----
   - Degree of Parallelism is 3 because of hint

DFO_NUMBER      TQ_ID SERVER_TYP   INSTANCE PROCES   NUM_ROWS
---------- ---------- ---------- ---------- ------ ----------
         1          0 Producer            1 P003            0
                                          1 P004            1
                                          1 P005            0
                      Consumer            1 P000            0
                                          1 P001            1
                                          1 P002            0

                    1 Producer            1 P000            0
                                          1 P001            1
                                          1 P002            0
                      Consumer            1 QC              1

As you can see the table access follows a unique scan of an index and, although the rowsource execution stats report zero starts for the unique scan, we can see from v$pq_tqstat that slave P004 acquired a “row” (actually a rowid) and passed it to slave P001 which then acquired a row from the table and passed that row to the query coordinator. Oracle really did execute a parallel query, starting and stopping a total of 6 sessions to perform a single unique index access.

You’ll notice operation 7 is one you’ve only seen in the latest version of Oracle. The PX SELECTOR was introduced in 12c to reduce the number of times a complex parallel query would funnel into the query coordinator (parallel to serial) and then fan out again (serial to parallel) generating a new data flow operation tree (DFO tree) spawning one or two new parallel server groups as it did so. To stop this happening a step that needs to serialise in a 12c parallel plan can nominate one of the existing PX server processes (from each set, if necessary) to do the job so that the same set of PX servers can carry on running the query without the need for a new DFO tree to appear.

This enhancement to parallel execution plans is a good idea – except when it appears in my silly little query and turns something that ought to be quick and cheap into a job that is far more resource-intensive than it should be.

At this point, of course, you’re probably wondering what kind of idiot would put a parallel() hint into a query that was doing nothing but selecting one row by primary key – the answer is: “the Oracle optimizer in 12c”. I discovered this anomaly while creating a demonstration of the way that a distributed parallel query has to serialise through a single database link even if the operations at the two ends of the link run parallel. Here’s the SQL I wrote for the full demonstration:


rem     Script:         distributed_pq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016

define m_remote='test@loopback'
define m_remote='orcl@loopback'

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,8,'0') as varchar2(8))         v1,
        cast(rpad('x',100) as varchar2(100))            padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

create table t2
as
select  *
from    t1
where   mod(id,100) = 0
;

alter table t2 add constraint t2_pk primary key(id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

set serveroutput off

select
        /*+ parallel(3) */
        t1.v1, t2.v1
from
        t1,
        t2@&m_remote
where
        mod(t1.id,10) = 0
and     t2.id = t1.id
and     mod(to_number(t2.v1),10) = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

If you want to run this demo you’ll need to do something about formatting the output; more importantly you’ll have to create a database link (with a loopback link) and set up a define identifying it at the line where I’ve got orcl@loopback and test@loopback (which are my 12c and 11g loopback links respectively).

Here’s the plan (with rowsource stats) I got from the 12c test:


----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   1 |  NESTED LOOPS         |          |      1 |     10 |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   2 |   PX COORDINATOR      |          |      1 |        |        |      |            |  10000 |00:00:00.11 |       7 |      0 |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |      0 |   1000 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
|   4 |     PX BLOCK ITERATOR |          |      3 |   1000 |  Q1,00 | PCWC |            |  10000 |00:03:17.72 |    1745 |   1667 |
|*  5 |      TABLE ACCESS FULL| T1       |     39 |   1000 |  Q1,00 | PCWP |            |  10000 |00:00:00.06 |    1745 |   1667 |
|   6 |   REMOTE              | T2       |  10000 |      1 |        |      |            |      0 |00:01:14.44 |       0 |      0 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(:Z>=:Z AND :Z<=:Z)
       filter(MOD("T1"."ID",10)=0)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   6 - SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1
       (accessing 'ORCL@LOOPBACK' )

Note
-----
   - Degree of Parallelism is 3 because of hint

I have hacked this output a little – the “Remote SQL” section didn’t get reported by display_cursor(), so I’ve inserted the remote sql I got from a call to dbms_xplan.display() after using explain plan to generate a plan. Note the /*+ shared(3) */ hint that appears in the remote SQL – that’s the internal version of a parallel(3) hint.

In 11g the query complete in 2.4 seconds, in 12c the query took nearly 75 seconds to run thanks to the 12c enhancement that allowed it to obey the hint! Looking at the time column (and ignoring the anomalous 3:17 at operation 4 – which might roughly be echoing 3 * 1:14) we can see that the time goes on the calls to the remote database (and a check of v$session_event shows this time spent in “SQL*Net message from db link”), so the obvious thing to do is check what actually happened at the remote database and we can do that by searching the library cache for a recognizable piece of the remote SQL – here’s the SQL to do that, with the results from 11g followed by the results from 12c:


SQL> select sql_id, child_number, executions, px_servers_executions, sql_text from v$sql
  2  where sql_text like '%SHARED%' and sql_text not like 'select sql_id%';

11g results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
c0f292z5czhwk            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE MOD(TO_NUMBER("V1"),10)=1 AND "ID"=:1


12c results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
7bk51w7vtagwd            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

7bk51w7vtagwd            1          0                 59995
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

It’s not surprising to see that the query has executed 10,000 times – that’s what we were told by the Starts statistic from dbms_output.display_cursor(), but 12c has 60,000 (with a little error) PX Servers executions of the statement. That’s 10,000 executions * degree 3 * the 2 slave sets we saw in my original execution plan. (It’s an odd little quirk of the two versions of Oracle that the order of predicates in the remote SQL was reversed between 11g and 12c – leading to two different SQL_IDs).

By enabling rowsource execution stats at the system level I was able to capture the remote execution plan with its stats:


SQL_ID  7bk51w7vtagwd, child number 0
-------------------------------------
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND
MOD(TO_NUMBER("V1"),10)=1

--------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  PX COORDINATOR                   |          |      0 |        |      0 |00:00:00.01 |       0 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  3 |    TABLE ACCESS BY INDEX ROWID    | T2       |  29983 |      1 |      0 |00:00:22.21 |    1000 |
|   4 |     BUFFER SORT                   |          |  29995 |        |    999 |00:00:21.78 |       0 |
|   5 |      PX RECEIVE                   |          |  29924 |      1 |    994 |00:00:21.21 |       0 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |
|   7 |        PX SELECTOR                |          |  29993 |        |    999 |00:00:06.08 |   19992 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |  29999 |      1 |   1000 |00:00:00.24 |   20000 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MOD(TO_NUMBER("V1"),10)=1)
   8 - access("ID"=:1)

Unlike the test case I started with, this output did show the number of starts (with a few missing) and the work done across the slaves. Our index probe had to do two buffer gets on every execution, and we have 10,000 executions of the query so 20,000 buffer gets on the index unique scan. Even though only one slave actually does any work with the PX Selector, all three slaves in that set seem to “start” the relevant operations. The definition of the data meant that only one index probe in 10 was successful so we only visited 1,000 buffers from the table. If you’re familiar with reading plans with rowsource execution stats you’ll appreciate that something has gone wrong in the reporting here – that 1,000 at operation 3 should read 21,000 because it ought to include the 20,000 from the index scan (at least, that’s what a serial plan would do).

If you’re still wondering why running this query as a parallel query should take so long – after all it’s only 10,000 executions in 70 seconds – bear in mind that Oracle has to allocate and deallocate 6 PX servers to new sessions each time it starts; the instance activity stats showed “logons cumulative” going up by 60,000 each time I ran the driving query: that’s about 850 logons (and log offs) per second. I don’t think my test machine would give a realistic impression of the impact of a couple of copies of this query running simultaneously, but when I tried the contention introduce increased the run time to 93 seconds.

tl;dr

Watch out for poor performance becoming disastrous for distributed parallel queries when you upgrade from 11g to 12c

 

 

.

June 7, 2016

Quiz Night

Filed under: 12c,Execution plans,Oracle,subqueries — Jonathan Lewis @ 7:35 pm GMT Jun 7,2016

Here’s an execution plan from a recent OTN database forum posting:

 
------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |                    |     1 |   231 |   160   (6)| 00:00:02 |
|   1 |  UPDATE                   | GS_TABLE           |       |       |            |          |
|*  2 |   HASH JOIN SEMI          |                    |     1 |   231 |   130   (0)| 00:00:02 |
|*  3 |    TABLE ACCESS FULL      | GS_TABLE           |     5 |   895 |   123   (0)| 00:00:02 |
|*  4 |    TABLE ACCESS FULL      | UPDATEDPROGRAMCODE |   850 | 44200 |     7   (0)| 00:00:01 |
|*  5 |   VIEW                    |                    |    11 |  2024 |     8  (13)| 00:00:01 |
|*  6 |    WINDOW SORT PUSHED RANK|                    |    11 |   440 |     8  (13)| 00:00:01 |
|*  7 |     FILTER                |                    |       |       |            |          |
|*  8 |      TABLE ACCESS FULL    | UPDATEDPROGRAMCODE |    11 |   440 |     7   (0)| 00:00:01 |
|   9 |   VIEW                    |                    |   850 |  1138K|     9  (23)| 00:00:01 |
|  10 |    SORT ORDER BY          |                    |   850 |   685K|     9  (23)| 00:00:01 |
|* 11 |     VIEW                  |                    |   850 |   685K|     8  (13)| 00:00:01 |
|  12 |      WINDOW SORT          |                    |   850 | 47600 |     8  (13)| 00:00:01 |
|* 13 |       FILTER              |                    |       |       |            |          |
|* 14 |        TABLE ACCESS FULL  | UPDATEDPROGRAMCODE |   850 | 47600 |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Unfortunately the originator of this plan declined to show us the query or answer any questions about where the work was going, but did confirm a speculative comment I had made that the instance was 12c. So the question is this: can you spot what it was that made me think that the plan came from 12c ?

I have to say, by the way, that there may be ways to get this plan from 11g, it was just that my first impression was that it was probably 12c and I didn’t attempt to come up with a way of getting a similar plan from 11g. (And, as far as the general shape of the plan is concerned, I can think of two different types of query that could produce it.)

Footnote

You are allowed to prove me wrong.

Answer

(Which might be me showing ignorance rather than inspiration)

The basic shape of the plan suggests to me that the query is of the form:

update gs_table 
set     col1 = (select from updatedprogramcode),
        col2 = (select from updatedprogramcode)
where   exists (select from updatedprogramcode)         -- possibly "where IN (subquery)"
;

There are a couple of variations in how the “set” subquery content might vary, and I’ll write up a short blog about that later.

Having noted this basic shape, I then noted that the subqueries involved analytic functions – as indicated by the WINDOW SORT operations; moreover one of them used a PUSHED RANK option and the other was embedded in a non-mergeable VIEW (operation 11). Updates with subqueries generally involve correlated columns – and prior to 12c there are some restrictions on how far up the tree the correlation can go. Here’s a sample query (using two tables that I’ve cloned from all_objects) to demonstrate:


update t1 set
        data_object_id = (
                select  objno
                from    (
                        select
                                object_id objno,
                                row_number() over (order by  object_id desc) rn
                        from
                                t2
                        where
                                t2.object_type = t1.object_type
                        )
                where rn = 1
        )
/

We need to embed the inner select statement in an inline view because we want to use the result of the row_number() analytic function in a filter predicate, but in Oracle 11g the reference to t1.object_id can’t correlate back to the outer t1 table, while in 12c this isn’t a problem. Here’s the 12c plan, followed by the 11g error:


12c Plan (autotrace)
----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |      |  5000 | 45000 | 70012  (15)| 00:04:34 |
|   1 |  UPDATE                   | T1   |       |       |            |          |
|   2 |   TABLE ACCESS FULL       | T1   |  5000 | 45000 |    12   (0)| 00:00:01 |
|*  3 |   VIEW                    |      |     1 |    26 |    13   (8)| 00:00:01 |
|*  4 |    WINDOW SORT PUSHED RANK|      |   556 |  6116 |    13   (8)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL     | T2   |   556 |  6116 |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("RN"=1)
   4 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID")
              DESC )<=1)
   5 - filter("T2"."OBJECT_TYPE"=:B1)


11g Error
---------
                                t2.object_type = t1.object_type
                                                 *
ERROR at line 11:
ORA-00904: "T1"."OBJECT_TYPE": invalid identifier

Notice, by the way that my predicate “rn = 1” has resulted in the WINDOW SORT PUSHED RANK that appeared in the original plan.

In case I haven’t said it enough times: this is a just a rapid inference I drew from looking briefly at the plan and I haven’t tried hard to work out whether there is a way to get a plan like this in 11g. It was nice being proved right by the follow-up post from the OP, but my guess may have been right by accident – I’d rather be proved wrong than carry on thinking I’d got it right when I hadn’t … so feel free to supply an example in the comments.

 

May 23, 2016

Virtual Partitions

Filed under: 12c,CBO,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 1:16 pm GMT May 23,2016

Here’s a story of (my) failure prompted by a recent OTN posting.

The OP wants to use composite partitioning based on two different date columns – the table should be partitioned by range on the first date and subpartitioned by month on the second date. Here’s the (slightly modified) table creation script he supplied:


rem
rem     Script: virtual_partition.sql
rem     Dated:  May 2016
rem

CREATE TABLE M_DTX
(
        R_ID    NUMBER(3),
        R_AMT   NUMBER(5),
        DATE1   DATE,
        DATE2   DATE,
        VC GENERATED ALWAYS AS (EXTRACT(MONTH FROM DATE2))
)
PARTITION BY RANGE (DATE1) interval (numtoyminterval(1,'MONTH'))
SUBPARTITION BY LIST (VC)
        SUBPARTITION TEMPLATE (
                SUBPARTITION M1 VALUES (1),
                SUBPARTITION M2 VALUES (2),
                SUBPARTITION M3 VALUES (3),
                SUBPARTITION M4 VALUES (4),
                SUBPARTITION M5 VALUES (5),
                SUBPARTITION M6 VALUES (6),
                SUBPARTITION M7 VALUES (7),
                SUBPARTITION M8 VALUES (8),
                SUBPARTITION M9 VALUES (9),
                SUBPARTITION M10 VALUES (10),
                SUBPARTITION M11 VALUES (11),
                SUBPARTITION M12 VALUES (12)
        )
        (
        PARTITION M_DTX_2015060100 VALUES LESS THAN (TO_DATE('2015-06-01 00:00:01', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
        )
;

There’s nothing particularly exciting about this – until you get to the query requirement – the user wants to query on date1 and date2, and doesn’t know about the virtual month column, e.g. (and, I know that there should be a to_date() or ANSI equivalent here):

SELECT * FROM m_dtx WHERE date1 = trunc(sysdate) AND date2 = '01-Jun-2016';

Now, as a general rule, you don’t expect partition elimination to occur unless the partitioning column appears with a predicate that make elimination possible, so your first response to this query is that it could eliminate on date1, but can’t possibly eliminiate on vc because vc isn’t in the where clause. However it’s possible that the partitioning code might be coded to recognise that the subpartition is on a virtual column that is derived from date2, so perhaps it could generate a new predicate before optimising, for example:

date2 = '01-Jun-2016'  => vc = 6

Unfortunately, your first response is correct – the optimizer doesn’t get this clever, and doesn’t do the sub-partition elimination. Here’s the execution plan from 12.1.0.2 for the sample query, followed by the execution plan when I explicitly add the predicate vc = 6.


SQL_ID  8vk1a05uv16mb, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy')

Plan hash value: 3104206240

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST ALL   |       |     1 |    57 |    15   (7)| 00:00:01 |     1 |    12 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))



SQL_ID  33q012bdhjrpn, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy') and vc
= 6

Plan hash value: 938710559

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |     6 |     6 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))


Note how the predicate vc = 6  doesn’t show up in the predicate section in either case, but the execution plan shows PARTITION LIST ALL at operation 2 when we omit the predicate and PARTITION LIST SINGE when we include it (with suitable values also appearing for Pstart and Pstop). (The cost, by the way, is the cost of scanning a whole (range)partition whether or not the optimizer expects to restrict that scan to just one sub-partition.)

So the optimizer isn’t quite clever enough (yet). BUT … the optimizer can be very clever with constraints, combining constraints with predicates and applying transitive closure to produce new predicates – so maybe we could get the optimizer to do this if we helped it a little bit. Given the table definition supplied I’m going to assume that the date2 column is supposed to be non-null, so let’s add some truthful constraints/declarations to the table definition:


alter table m_dtx modify date2 not null;
alter table m_dtx modify vc  not null;
alter table m_dtx add constraint md_ck_vc check (vc = extract(month from date2));

Alas, this didn’t make any difference to the execution plan. But it did do something surprising to my attempts to load data into the table:


insert into m_dtx (r_id, r_amt, date1, date2)
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum, 1000),
        rownum,
        trunc(sysdate,'yyyy') + dbms_random.value(0,365),
        trunc(sysdate,'yyyy') + dbms_random.value(0,365)
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-01400: cannot insert NULL into (???)

So the array insert with the virtual column doesn’t like the NOT NULL constraint on the virtual column because vc is, presumably, still null when the constraint is checked (though there’s no problem with single row inserts with the values() clause – I wonder what happens with the PL/SQL “FORALL” clause) – so let’s remove the not null constraint on vc and see what happens.


insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-02290: check constraint (TEST_USER.MD_CK_VC) violated

Unsurprisingly, given the fact that Oracle didn’t like the not null constraint, the critical check constraint also fails. This, by the way, is odd because a check constraint should accept a row when the constraint doesn’t evaluate to FALSE, so (a) vc can’t have been evaluated at this point or the constraint would evaluate to TRUE – which is not FALSE, and (b) vc at this point can no longer be null or the constraint would evaluate to NULL – which is not FALSE: so what “value” has vc got that makes the constraint check return FALSE ?

Bottom line:

I can see some scope for an optimizer enhancement that tries to find eliminating predicates from virtual columns; and I think there’s a need for ensuring that we can safely add constraints to virtual columns – after all we might want to create an index on a virtual column and sometimes we need a NOT NULL declaration to ensure that an index-only execution path can be found. Unfortunately I have to end this blog without finding an immediate solution for the OP.

Despite this failure, though, there are cases (as I showed a couple of years ago) where the optimizer in 12c can get clever enough to recognize the connection between a queried date column and the virtual partitioning column based on that date column.

May 16, 2016

Cursor_Sharing problem

Filed under: 12c,Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 12:49 pm GMT May 16,2016

Here’s a possible bug (though maybe “not a bug”) that came up over the weekend on the OTN database forum. An application generating lots of “literal string” SQL was tested with cursor_sharing set to force. This successfully forced the use of bind variable substitution, but a particular type of simple insert statement started generating very large numbers of child cursors – introducing a lot of mutex waits and library cache contention. Here’s a (substituted) statement that was offered as an example of the problem:


INSERT INTO schema.tableName (column1,columns2,..)
VALUES (:"SYS_B_0",:"SYS_B_1",:"SYS_B_2",:"SYS_B_3",:"SYS_B_4",:"SYS_B_5",:"SYS_B_6",timestamp:"SYS_B_7",SYSTIMESTAMP,:"SYS_B_8")

Note particularly the slightly odd looking detail: timestamp:”SYS_B_7″; this is how bind variable substitution looks if you’ve used the “compact” ANSI mechanism for handling datetime literals. We were told, in fact, that the tables had only number, varchar2, and date columns – so it looks a little suspicious when see timestamp values being inserted but the implied coercion wasn’t the source of the problem. Here’s all it takes to see the problem (tested only on 12.1.0.2):


rem
rem     Script:         ansi_datetime_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem

create table t1 (
        n1      number(8,0),
        v1      varchar2(10),
        d1      date,
        t1      timestamp
);

insert into t1 values(-1,'x',sysdate, systimestamp);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

set serveroutput off
alter session set cursor_sharing = force;

prompt  ============================
prompt  Testing DATE literals
prompt  Expect to see child number 2
prompt  ============================

insert into t1 values(1, 'A', date'2016-01-01', null);
insert into t1 values(1, 'A', date'2016-01-02', null);
insert into t1 values(1, 'A', date'2016-01-03', null);

select * from table(dbms_xplan.display_cursor(null,null,'-note -plan_hash'));

prompt  ============================
prompt  Testing TIMESTAMP literals
prompt  Expect to see child number 2
prompt  ============================

insert into t1 values(1, 'A', null, timestamp'2016-01-01 00:00:00');
insert into t1 values(1, 'A', null, timestamp'2016-01-02 00:00:00');
insert into t1 values(1, 'A', null, timestamp'2016-01-03 00:00:00');

select * from table(dbms_xplan.display_cursor(null,null,'-note -plan_hash'));

prompt  ===============================
prompt  Need privilege to see this view
prompt  Pre-coded for the sql_ids above
prompt  ===============================

break on sql_id skip 1

select
        sql_id, child_number, hash_match_failed
from
        v$sql_shared_cursor
where
        sql_id in ('58udhcm270bhn', 'gssz5cbnt7mgn')
order by
        sql_id, child_number
;

A couple of points – there are several more tests in the script demonstrating things that do NOT cause multiple child cursors to appear. I probably didn’t cover all the options that I could have covered but I hit a number of common cases to check that it wasn’t simply that cursor_sharing being severely broken in 12c. I’ve also allowed a side effect to demonstrate the presence of multiple child cursors rather than explcitly listing the child cursors. If the three statements (of each type) had produced shareable cursors then the child number reported by dbms_xplan.display_cursor() would have been zero in both cases. Here are the actual outputs:


SQL_ID  58udhcm270bhn, child number 2
-------------------------------------
insert into t1 values(:"SYS_B_0", :"SYS_B_1", date:"SYS_B_2", null)

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | T1   |       |
-------------------------------------------------


SQL_ID  gssz5cbnt7mgn, child number 2
-------------------------------------
insert into t1 values(:"SYS_B_0", :"SYS_B_1", null, timestamp:"SYS_B_2")


-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | T1   |       |
-------------------------------------------------


And, having pre-coded the script with the SQL_IDs of the two guilty statements, here’s the output identifying the cause of the failure to share from v$sql_shared_cursor:


SQL_ID        CHILD_NUMBER H
------------- ------------ -
58udhcm270bhn            0 N
                         1 Y
                         2 Y

gssz5cbnt7mgn            0 N
                         1 Y
                         2 Y

There is a bug on MoS relating to timestamp columns and failure to share cursors – it doesn’t really look like the same problem but it could be related in some way: Bug 13407937 : HIGH VERSION COUNT FOR INSERT WITH CURSOR_SHARING=FORCE AND TIMESTAMP COLUMN. It’s described as “not a bug” :(

April 25, 2016

DDL logging

Filed under: 12c,Oracle,Partitioning — Jonathan Lewis @ 1:05 pm GMT Apr 25,2016

I was presenting at the UKOUG event in Manchester on Thursday last week (21st April 2016), and one of the sessions I attended was Carl Dudley’s presentation of some New Features in 12c. The one that caught my eye in particular was “DDL Logging” because it’s a feature that has come up fairly frequently in the past on OTN and other Oracle forums.

So today I decided to write a brief note about DDL Logging – and did a quick search of my blog to see if I had mentioned it before: and I found this note that I wrote in January last year but never got around to publishing – DDL Logging is convenient, but doesn’t do the one thing that I really want it to do:

DDL Logging – 12c

One of the little new features that should be most welcome in 12c is the ability to capture all DDL executed against the database. All it takes is a simple command (if you haven’t set the relevant parameter in the parameter file):

alter system set enable_ddl_logging = true;

All subsequent DDL will be logged to two different places (in two formats)

  • $ORACLE_BASE/diag/rdbms/{database}/{instance}/log/ddl/log.xml
  • $ORACLE_BASE/diag/rdbms/{database}/{instance}/log/ddl_{instance}.log

Unfortunately the one thing I really wanted to see doesn’t appear – probably because it doesn’t really count as DDL –  it’s the implicit DDL due to inserting into not-yet-existing partitions of an interval partitioned table.

Note: If you’re using a container database with pluggable databases then the DDL for all the pluggable databases goes into the same log file.

Update – Licensing

The following text in the Oracle 12c Database Licensing document has just been brought to my attention:

Licensed Parameters
The init.ora parameter ENABLE_DDL_LOGGING is licensed as part of the Database Lifecycle Management Pack when set to TRUE. When set to TRUE, the database reports schema changes in real time into the database alert log under the message group schema_ddl. The default setting is FALSE.

The licensing document is also linked to from the 12c online html page for the parameter.

The 11g parameter definition makes no mention of licensing, and the 11g “New Features” manual don’t mention the feature at all, but the parameter does get a special mention in the 11g licensing document where it is described as being part of the Change Management Pack.

init.ora Parameters
The use of the following init.ora parameter is licensed under Oracle Change Management Pack:
■ ENABLE_DDL_LOGGING: when set to TRUE (default: FALSE)

 

April 1, 2016

Set Operations

Filed under: 12c,CBO,Execution plans,Oracle — Jonathan Lewis @ 2:20 pm GMT Apr 1,2016

A recent post on the OTN database forum highlights a couple of important points ideas for optimising SQL. There are: (a) is there a logically equivalent way of stating the SQL and (b) is there a different “natural language” way of posing the problem.

The posting starts with a query, part of an execution plan, and a request to “get rid of the tablescan”. I guessed originally that the query came from an 11g instance, and the OP gave us some code to create the tables and indexes, so I’ve modelled the tables to get the indicated plan (then filled in the original numbers). This is the query, and my cosmetically adjusted version of the plan output that the OP probably got:


SELECT a.hotel_code
  FROM lf_hotel_temp a
WHERE a.service_id = : p_service_id
       AND (NOT EXISTS (SELECT *
          FROM lf_ts_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code)
        or NOT EXISTS (SELECT *
          FROM lf_gta_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code) 
       or  NOT EXISTS (SELECT *
          FROM lf_hb_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code))

-------------------------------------------------------------------------------
| Id  | Operation          | Name                     | Rows  |  Bytes | Cost |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                          | 12613 | 113517 |  135 |
|*  1 |  FILTER            |                          |       |        |      |
|*  2 |   TABLE ACCESS FULL| LF_HOTEL_TEMP            | 88433 | 795897 |  135 |
|*  3 |   INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX  |     1 |      7 |    1 |
|*  4 |   INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX |     1 |      9 |    1 |
|*  5 |   INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX  |     2 |     14 |    3 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
              "B"."HOTEL_CODE"=:B1) OR  NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
              WHERE "B"."HOTEL_CODE"=:B2) OR  NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
              "B" WHERE "B"."HOTEL_CODE"=:B3))
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   3 - access("B"."HOTEL_CODE"=:B1)
   4 - access("B"."HOTEL_CODE"=:B1)
   5 - access("B"."HOTEL_CODE"=:B1)

We were told in the original posting that there’s a primary key on lf_hotel_temp declared on (hotel_code, service_id), and we were given the definitions, sizes, and index declarations of all the table in a follow-up posting. It turns out that lf_hotel_temp consists of just those two columns and holds 278,000 rows: the optimizer’s estimate for the number of rows identified by a single service_id is over 88,000, and the nature of the query tells us that the optimizer would have to examine every one of those rows to check if it satisfied any of the three subqueries.

So how might Oracle access the rows ?  Given that the only columns used will all be in the primary key index (which implies not null constraints) there are four basic options: tablescan, index fast full scan, index full scan, and index skip scan. Given the most likely data content (i.e. lots of different hotel_codes), we can assume the skip scan would be a very bad idea. We can be sure that an index fast full scan will be lower cost than an index full scan – for anything except tiny indexes. Ultimately the question is really “why a tablescan instead of an index fast full scan?”. As I pointed out, though, the table consists of just those two columns – which means it’s perfectly reasonable for the index to be larger than the table as each entry of the index will consist of the two columns AND a rowid.

The first interesting bit

The question of why the access to lf_hotel_temp was by tablescan rather than some indexed method isn’t really interesting. The interesting bit is how (in principle) we might make the plan more efficient (if it really needs it); and this leads to two key, and general purpose, observations. As Andrew Sayer pointed out on the thread, we have a compound predicate:

    (not exists A OR not exists B OR not exists C)

and this is logically equivalent to

   not (exists A AND exists B AND exists C)

If we rewrite the query to reflect this equivalence could the optimizer find a different, better way of executing it:


select  /*+ dynamic_sampling(0) */
        a.hotel_code
from    lf_hotel_temp a
where
        a.service_id = :p_service_id
and     not(
                exists (
                        select  null
                        from    lf_ts_roomtype_properties ts
                        where   ts.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_gta_roomtype_properties gta
                        where   gta.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_hb_roomtype_properties hb
                        where   hb.hotel_code = a.hotel_code
                )
        )
;

Of course, I didn’t have the original data; so I copied the DDL supplied in the OTN thread and added a little DML to insert a few rows in the tables. The data I used looked like this:


insert into lf_hotel_temp (hotel_code, service_id) values ('A',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('B',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('C',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('D',1);

-- insert into lf_ts_roomtype_properties values ( 'A','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_gta_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_gta_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_hb_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'B','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_hb_roomtype_properties values ( 'D','x','x',0,1,'x');
commit;

It’s possible that with different data volumes you’d get different execution plans, but in 11g the optimizer transformed my query back into the original form – in other words it recognised the equivalence of “not (A and B and C)” and rewrote it as “(not A or not B or not C)” !

However, I also have 12c available, and I had created a script to build a model, so I ran the test on 12c. Both versions of the query produced the following plan:


----------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|   2 |   VIEW                | VW_SQ_1                    |    82 |   984 |     6   (0)| 00:00:01 |
|*  3 |    HASH JOIN SEMI     |                            |    82 |  2952 |     6   (0)| 00:00:01 |
|*  4 |     HASH JOIN         |                            |    82 |  1968 |     4   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES |    82 |   984 |     2   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| LF_HB_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL | LF_TS_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|*  8 |   INDEX FULL SCAN     | LF_HOTEL_TEMP_PK           |   101 |   198K|     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("VW_COL_1"="A"."HOTEL_CODE")
   3 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("TS"."HOTEL_CODE"))
   4 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("GTA"."HOTEL_CODE"))
   8 - access("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))

Ignore the numbers (I hadn’t collected stats, which is why I added the /*+ dynamic_sampling(0) */ hint – with stats in place 12c produced the FILTER plan that 11g had produced) the key feature is that Oracle has managed to transform my three filter subqueries into a single join subquery and then transformed the resulting subquery into an anti-join. It’s a pretty amazing transformation – the optimizer did it automatically in 12c, but if you are aware of the logical equivalence then you may find cases where you can turn “OR’s” into “AND’s” and help the optimizer to find transformations that it can’t find automatically.

The second interesting bit

If you think about the meaning behind the query (prompted, perhaps, by the logical equivalence described above) you might rephrase the question as “find me the hotel codes that fail to appear in all three related tables” – in English this is ambigious and open to catastrophic mis-interpretation so you might have another go and say “find me the hotel codes that appear in every one of the three related tables – those are the hotel codes I don’t want”. This latter expression, of course, is exactly what Oracle is doing by joining the three tables and then doing the “not exists”/anti-join against the result. Obviously you could translate the new English form into SQL by hand, with a three table join in a “not exists” subquery.

I actually took a different approach (which might, or might not, be efficient – depending on the actual data and indexes).  I translated the new English statement into the following:


rem
rem     Script:         minus_intersect.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

select  /*+ dynamic_sampling(0) */
        hotel_code
from    lf_hotel_temp
where   service_id = :p_service_id
minus   (
        select  hotel_code
        from    lf_ts_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_gta_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_hb_roomtype_properties
        where   hotel_code is not null
        )
;

The three way intersection gets me the list of hotels that appear in all three tables; the minus operator takes the list of hotel with the correct service_id and eliminates from it the hotels that appear in the intersection – giving me the result I want.

For my tiny data set, this is the plan I got:

--------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                          |     1 |  2159 |     8  (50)| 00:00:01 |
|   1 |  MINUS                |                          |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT  |                          |     1 |  2015 |     2  (50)| 00:00:01 |
|*  3 |    INDEX FULL SCAN    | LF_HOTEL_TEMP_PK         |     1 |  2015 |     1   (0)| 00:00:01 |
|   4 |   INTERSECTION        |                          |       |       |            |          |
|   5 |    INTERSECTION       |                          |       |       |            |          |
|   6 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  7 |      INDEX FULL SCAN  | LF_TS_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
|   8 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  9 |      INDEX FULL SCAN  | LF_GTA_ROOMTYPE_PROP_IDX |     4 |    48 |     1   (0)| 00:00:01 |
|  10 |    SORT UNIQUE NOSORT |                          |     4 |    48 |     2  (50)| 00:00:01 |
|* 11 |     INDEX FULL SCAN   | LF_HB_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   7 - filter("HOTEL_CODE" IS NOT NULL)
   9 - filter("HOTEL_CODE" IS NOT NULL)
  11 - filter("HOTEL_CODE" IS NOT NULL)

Important note: I am not claiming that this use of set operators will be more efficient than a filter subquery or anti-join/semi-join approach, performance ultimately depends on the volume and patterns in the data combined with the available indexing. In this case you can almost see the classic performance compromise that we often see in Oracle – even in the trade-off between something as simple as choosing between a hash join and a nested loop join – should we operate this query as a tiny number of “bulk” operations, or as a (potentially) large number of tiny, high-precision operations.

If the original query was spending all it’s time on CPU running lots of subqueries, or doing lots of single block random I/Os because of the random ordering of the subqueries, then perhaps a couple of brute force “db file parallel read” index full scans would be a friendlier use of the available resources, run more quickly, and have less impact on every other user.

 

March 29, 2016

Index Usage

Filed under: 12c,Exadata,HCC,in-memory,Indexing,Oracle,Performance — Jonathan Lewis @ 10:53 am GMT Mar 29,2016

There are some questions about Oracle that are like the mythical Hydra – you think you’ve killed it, but for every head you cut off another two grow. The claim that “the optimizer will switch between using an index and doing a tablescan when you access more than X% of the data” re-appeared on the OTN database forum a little while ago – it doesn’t really matter what the specific value of X was – and it’s a statement that needs to be refuted very firmly because it’s more likely to cause problems than it is to help anyone understand what’s going on.

At a very informal level we may have an intuitive feeling that for a “precise” query accessing a “small” amount of data an indexed access path should make sense while for a “big” query accessing a “large” amount of data we might expect to see a tablescan, but any attempt to give a meaning to “small” and “large” that is both general purpose and strictly quantified will be wrong: there are too many variables involved.

Just as a quick demonstration of how badly we can be misled by a simple numeric claim here’s a quick test I created on a newly installed instance of 11.2.0.4, which I happened to set up with a locally defined tablespace using uniform extents of 1MB using the default 8KB blocksize but with manual (freelist) space management:


rem
rem     Script:   index_usage_pct.sql
rem     Dated:    March 2016
rem     Author:   J P Lewis
rem

drop table t1;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4
)
select
        cast(rownum as number(8,0))                              id,
        cast(trunc(dbms_random.value(0,1e6)) as number(8,0))     n1,
        lpad(rownum,6,'0')              v1,
        rpad('x',10,'x')                small_vc,
        rpad('x',151,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(id);

spool index_usage_pct.lst

select  num_rows, blocks, avg_row_len, round(num_rows/blocks) rows_per_block
from    user_tables
where   table_name = 'T1'
;

set autotrace on explain
select count(v1) from t1 where id between 1 and 245000;
set autotrace off

spool off

I’ve created a table with 1 million rows; the rows are about 180 bytes long (you’ll see the sizes a few lines further down the page), so it’s not an unreasonable model for lots of tables in typical systems – if you want to experiment further you can adjust the rpad() in the padding column; and I’ve created an index on a sequentially  (rownum) generated column. My call to autotrace will produce a truthful execution plan for the query supplied – there’s no risk of unexpected type conversion and no problems from bind variable peeking. As you can easily infer, my query will access 245,000 rows in the table of 1,000,000 – nearly a quarter of the table. Would you expect to see Oracle use the index ?

Here’s the output from the script on MY brand new database, instance, and tablespace:


  NUM_ROWS     BLOCKS AVG_ROW_LEN ROWS_PER_BLOCK
---------- ---------- ----------- --------------
   1000000      25642         180             39

1 row selected.


 COUNT(N1)
----------
    245000

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 269862921

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    10 |  6843   (1)| 00:01:23 |
|   1 |  SORT AGGREGATE              |       |     1 |    10 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   245K|  2392K|  6843   (1)| 00:01:23 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |   245K|       |   553   (1)| 00:00:07 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">=1 AND "ID"<=245000)


There are no tricks involved here, no cunning fiddling with data structures or parameters – this is just a simple, straightforward, test.

Of course the result is probably a little counter-intuitive; 24.5% of the data seems a lot for the optimizer to pick an index. There are many reasons for this, the first being that the data is very well clustered relative to the index – the index’s clustering_factor is the smallest it could be for a B-tree indexing every row in this table.

Another important feature, though, is that I haven’t done anything with the system statistics so the optimizer was using various default values which tell it that a multiblock read will be quite small (eight blocks) and a lot slower than a single block read (26 ms vs. 12 ms). One simple change that many people might have made during or shortly after installation (though it shouldn’t really be done in any modern version of Oracle) is to set the db_file_multiblock_read_count parameter to 128 – with just this change the optimizer would assume that a multiblock read really would be 128 blocks, but that it would now take 266 ms. That means the optimizer will assume that the read will be ten times slower than it was, but will read 32 times as much data – a fairly significant relative improvement thanks to which the access path for my initial query will switch to a full tablescan and won’t switch back to an index range scan until I reduce the range from 245,000 to something like 160,000.

I can go further, of course. With a few background queries running to exercise the database I executed the dbms_stats.gather_system_stats() procedure with the ‘start’ and ‘stop’ options to collect some figures about the hardware and expected workload. This gave me the following results,  interpreted from the sys.aux_stats$ table:


MBRC       :126
MREADTIM   :0.902
SREADTIM   :0.386
CPUSPEED   :976

With the optmizer using these figures to compare the relative speed and size of single and multiblock reads I had to reduce my selected range to roughly 51,000 before the optimizer would choose the index range scan.

I could go on to demonstrate the effects of the dbms_resource_manager.calibrate_io procedure and the effects of allowing different degrees of parallelism with different system stats, but I think I’ve probably made the point that there’s a lot of variation in the break point between index range scans and tablescans EVEN when you don’t change the data. With this very well-ordered (perfect clustering_factor) data I’ve seen the break point vary between 51,000 rows and 245,000 rows (5% and 25%).

And finally …

Let’s just finish with a last (and probably the most important) variation:  changing the pattern in the data we want from perfectly clustered to extremely scattered. If you check the query that generated the data you’ll see that we can do this by creating an index on column n1 instead of column id, and changing the where clause in the test query to n1 between 1 and 4500 (which, in my case, returned slightly more that 4,500 rows thanks to a small amount of duplication generated by the call to dbms_random.value()). With my most recent settings for the system statistics the optimizer chose to use a tablescan at slightly under 0.5% of the data.

Remember, there are many factors involved in the optimizer choosing between a tablescan and an index range scan and one of the most significant factors in the choice is the (apparent) clustering of the data so, if you haven’t come across it before, you should examine the “table_cached_blocks” option that appeared in 11.2.0.4 for the procedure dbms_stats.set_table_prefs() as this allows you to give the optimizer a better idea of how well your data really is clustered.

Addendum (April 2016)

Following on from the demonstration of how changes in parameters, patterns and statistics can make a difference in what we (or the optimizer) might consider a “small” amount of data and whether an indexed access path would be appropriate, it’s worth mentioning that the Exadata technologies of smart scans and hybrid columnar compression and Oracle’s latest technology of In-Memory Colum Store do not change the way you think about indexes – they only change the (unspecifiable) volume at which an index ceases to be the better option to use.

 

February 11, 2016

Quiz Night

Filed under: 12c,Infrastructure,LOBs,Oracle,Statistics — Jonathan Lewis @ 3:28 pm GMT Feb 11,2016

I was setting up a few tests on a copy of 12.1.0.2 recently when I made a mistake creating the table – I forgot to put in a couple of CAST() calls in the select list, so I just patched things up with a couple of “modify column” commands. Since I was planning to smash the table in all sorts of ways and it had taken me several minutes to create the data set (10 million rows) I decided to create a clean copy of the data so that I could just drop the original table and copy back the clean version – and after I’d done this I noticed something a little odd.

Here’s the code (cut down to just 10,000 rows), with a little output:

--
--      Script:  12c_vc32767_b.sql
--

create table t1 (
        id      not null,
        date_open,
        date_closed,
        deal_type,
        client_ref,
        small_vc,
        padding
)
nologging
as
select
        rownum                                          id,
        trunc(
                add_months(sysdate, -120) +
                        (rownum)* 3652 / 1e7
        )                                               date_open,
        trunc(
                add_months(
                        add_months(sysdate, -120) +
                                (rownum) * 3652 / 1e7,
                        12 * trunc(dbms_random.value(1,6))
                )
        )                                               date_closed,
        dbms_random.string('U',1)                       deal_type,
        dbms_random.string('U',4)                       client_ref,
        lpad(rownum,10)                                 vc_small,
        rpad('x',100,'x')                               vc_padding
from
        dual
connect by
        level <= 10000
;
alter table t1 modify deal_type varchar2(1);
alter table t1 modify client_ref varchar2(4);

create table t2 nologging as select * from t1;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt=>'for all columns size 1'
        );
end;
/

select table_name, num_rows, blocks, avg_row_len from user_tables;

TABLE_NAME             NUM_ROWS     BLOCKS AVG_ROW_LEN
-------------------- ---------- ---------- -----------
T1                        10000        304         139
T2                        10000        218         139

There’s no hidden code – this is the code I ran, and the tables both went into the same tablespace- yet one table has used about 30% more blocks than the other on a simple “create as select”, even though the row lengths and row counts are the same!

When I first did this test it just so happened that the only place in the database I had to make the t2 copy was in a tablespace using freelist management, and I’d created the original table in a tablespace using ASSM, so I wasted a little time examining the ASSM space management (bitmap) blocks before I realised what had happened.

Suggestions about what has happened, and how to confirm your hypothesis, invited in the comments.

Update (following morning)

The explanation is in the comments: I had enabled extended (32,767 byte) varchar2().  My calls to dbms_random.string() had returned character values, and the return type defaults to the maximum character length, which meant they were declared as varchar2(32767) and those are implemented as (unhackable) CLOBs in 12.1.0.2.

When I modified the table to change the varchar2(32767) to shorter lengths Oracle let me get away with it because none of the actual stored values at that time was longer than my target declaration – but it didn’t physically change the stored column values to “short” varchar2(), it left them as CLOBs, and a “short” CLOB includes a LOB Locator which starts out at about 30 bytes.

When I created t2 from t1 Oracle first copied the (short) column definitions, and then copied the data, reading and converting the CLOBs to normal varchar2() storage, and that explains why t2 needed fewer blocks.  The avg_row_len for the two tables matched because the code to gather stats simply applies the sys_op_opnsize() function to the varchar2() defined in t1, and doesn’t give you any indication about the LOB locator.

(If you use the extremely deprecated analyze command to gather stats on the two tables you’ll find that the avg_row_len of table t1 then allows for the LOB locator size.)

“Workaround:”

I should have used:  cast(dbms_random.string(‘U’,4) as varchar2(4)) and the column definition would have been a proper varchar2(4) from the start.  (As noted in a comment from Ivica Arsov below, substr(…, 1, 4) would also have worked.  Of course I should have used cast() for the numerics as well so that I could declare them as things like number(8,0) rather than the slightly dangerous “number”.

Note:  I’ve put “workaround” in quotes because it’s not really a workaround to a problem – it’s just an example of doing it right.

Footnote:

Here’s a symbolic dump of a single character (‘T’) being stored as a varchar2(32767):


LOB
Locator:
  Length:        84(31)
  Version:        1
  Byte Length:    1
  LobID: 00.00.00.01.00.00.01.6b.ba.d7
  Flags[ 0x01 0x0c 0x00 0x80 ]:
    Type: BLOB
    Storage: SecureFile
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: ReadWrite
  SecureFile Header:
    Length:   11
    Old Flag: 0x48 [ DataInRow SecureFile ]
    Flag 0:   0x90 [ INODE Valid ]
    Layers:
      Lengths Array: INODE:5
      INODE:
        00 00 01 01 54

Note the 84(31) at line 3: this is a reminder of how big a LOB locator could get for an out of line LOB, compared to the current size of the LOB locator. Rows that hold out of line LOBs can actually be much longer than avg_row_len tells you.

Addendum

Another side effect of enabling extended varchar2() types is the surprise you get when you use error logging in SQL (i.e. syntax like: “insert into t1 select … log errors”). When you create the “clone” table for the log it changes every column from the original into a varchar2(32767) e.g.:

desc target
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ID                                                                                NUMBER
 N1                                                                                NUMBER
 N2                                                                                NUMBER
 SMALL_VC                                                                          VARCHAR2(10)
 PADDING                                                                           VARCHAR2(100)


begin
        dbms_errlog.create_error_log('TARGET');
end;
/

desc err$_TARGET
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ORA_ERR_NUMBER$                                                                   NUMBER
 ORA_ERR_MESG$                                                                     VARCHAR2(2000)
 ORA_ERR_ROWID$                                                                    ROWID
 ORA_ERR_OPTYP$                                                                    VARCHAR2(2)
 ORA_ERR_TAG$                                                                      VARCHAR2(2000)
 ID                                                                                VARCHAR2(32767)
 N1                                                                                VARCHAR2(32767)
 N2                                                                                VARCHAR2(32767)
 SMALL_VC                                                                          VARCHAR2(32767)
 PADDING                                                                           VARCHAR2(32767)

You probably hope that you’re not going to see many rows inserted into the error log table, so the extra space taken up in the table by error rows probably won’t be a problem – but the fact that you get two segments (the LOB and the LOBINDEX) created for every column in the original table might cause a problem – even if they are completely empty.

January 26, 2016

Trace file size

Filed under: 12c,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 8:30 am GMT Jan 26,2016

Here’s a convenient enhancement for tracing that came up on Twitter a few days ago – first in a tweet that I retweeted, then in a question from Christian Antognini based on this bit of the 12c Oracle documentation (opens in separate tab). The question was – does it work for you ?

The new description for max_dump_file_size says that for large enough values Oracle will split the file into multiple chunks of a few megabytes, using a suffix to identify the sequence of the chunks, keeping only the first chunk and the most recent chunks. Unfortunately this doesn’t seem to be true. However, prompted by Chris’ question I ran a quick query against the full parameter list looking for parameters with the word “trace” in their name:


select
        /*+
                leading(nam val val2)
                full(name)
                full(val)  use_hash(val)  no_swap_join_inputs(val)
                full(val2) use_hash(val2) no_swap_join_inputs(val2)
        */
        nam.ksppinm                             name,
        val.ksppstvl                            ses_val,
        val2.ksppstvl                           sys_val,
        nam.ksppdesc                            description,
        nam.indx+1                              numb,
        nam.ksppity                             type,
        val.ksppstdf                            is_def,
        decode(bitand(nam.ksppiflg/256,1),
                1,'True',
                  'False'
        )                                       ses_mod,
        decode(bitand(nam.ksppiflg/65536,3),
                1,'Immediate',
                2,'Deferred' ,
                3,'Immediate',
                  'False'
        )                                       sys_mod,
        decode(bitand(val.ksppstvf,7),
                1,'Modified',
                4,'System Modified',
                  'False'
        )                                       is_mod,
        decode(bitand(val.ksppstvf,2),
                2,'True',
                  'False'
        )                                       is_adj,
        val.ksppstcmnt                          notes
from
        x$ksppi         nam,
        x$ksppcv        val,
        x$ksppsv        val2
where
        nam.indx = val.indx
and     val2.indx = val.indx
and     ksppinm like '%&m_search.%'
order by
        nam.ksppinm
;

Glancing through the result I spotted a couple of interesting parameters with the letters “uts” in their names, so re-ran my query looking for all the “uts” parameters, getting the following (edited) list:


NAME                           SYS_VAL         DESCRIPTION    
------------------------------ --------------- ---------------------------------------------
_diag_uts_control              0               UTS control parameter
_uts_first_segment_retain      TRUE            Should we retain the first trace segment
_uts_first_segment_size        0               Maximum size (in bytes) of first segments 
_uts_trace_disk_threshold      0               Trace disk threshold parameter
_uts_trace_segment_size        0               Maximum size (in bytes) of a trace segment
_uts_trace_segments            5               Maximum number of trace segments 

Note particularly the “first segment size” and “trace segment size” – defaulting to zero (which often means a hidden internal setting, though that doesn’t seem to be the case here, but maybe that’s what the “diag control” is for). I haven’t investigated all the effects, but after a little experimentation I found that all I needed to do to get the behaviour attributed to max_dump_file_size was to set the following two parameters – which I could do at the session level.


alter session set "_uts_first_segment_size" = 5242880;
alter session set "_uts_trace_segment_size" = 5242880;

The minimum value for these parameters is the one I’ve shown above (5120 KB) and with the default value for _uts_trace_segments you will get a maximum of 5 trace files with sequential names like the following:

ls -ltr *4901*.trc

-rw-r----- 1 oracle oinstall 5243099 Jan 26 08:15 orcl_ora_4901_1.trc
-rw-r----- 1 oracle oinstall 5243064 Jan 26 08:15 orcl_ora_4901_12.trc
-rw-r----- 1 oracle oinstall 5243058 Jan 26 08:15 orcl_ora_4901_13.trc
-rw-r----- 1 oracle oinstall 5242993 Jan 26 08:15 orcl_ora_4901_14.trc
-rw-r----- 1 oracle oinstall 1363680 Jan 26 08:15 orcl_ora_4901.trc

As you can see I’m currently generating my 15th trace, and Oracle has kept the first one and the previous three. It’s always working on a file with no suffix to its name but as soon as that file hits its limiting size (plus or minus a few bytes) it gets its appropriate suffix, the oldest file is deleted, and a new trace file without a suffix is started.

Apart from the usual header information the trace files start and end with lines like:

*** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_11.trc ***
  
*** TRACE SEGMENT RENAMED TO /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_12.trc ***

There is one little trap to watch out for: if you set either of these parameters to be larger than max_dump_file_size tracing stops as soon as one of the segments hits the max_dump_file_size and that trace file ends with the usual “overflow” message – e.g, when I changed the max_dump_file_size to 4M in mid-session:

*** DUMP FILE SIZE IS LIMITED TO 4194304 BYTES ***

In my case I had started with max_dump_file_size set to 20M, so I got lucky with my choice of 5M as the segment size.

Further investigation is left as an exercise to the interested reader.

 

Next Page »

Powered by WordPress.com.