Oracle Scratchpad

September 2, 2015

IN/EXISTS bugs

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 8:11 am BST Sep 2,2015

Here’s a simple data set – I’m only interested in three of the columns in the work that follows, but it’s a data set that I use for a number of different models:


execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1000))	n_1000,
	trunc(dbms_random.value(0,750))		n_750,
	trunc(dbms_random.value(0,600))		n_600,
	trunc(dbms_random.value(0,400))		n_400,
	trunc(dbms_random.value(0,90))		n_90,
	trunc(dbms_random.value(0,72))		n_72,
	trunc(dbms_random.value(0,40))		n_40,
	trunc(dbms_random.value(0,3))		n_3
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;
create table t2 nologging 
as
select * from t1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

The columns I want to consider are n_3, n_400, and n_1000. As their names suggest the columns have 3, 400, and 1000 distinct values respectively and since I’ve used the dbms_random.value() function to generate the data the distinct values are fairly evenly spread across the million rows of the table.

Consider, then, the following two queries:


select
        *
from
        t1
where
        exists (
                select  null
                from    t2
                where   n_1000 = 0
                and     t2.n_400 = t1.n_400
                and     t2.n_3 = t1.n_3
        )
;


select
        *
from
        t1
where
        (t1.n_400, t1.n_3) in (
                select  t2.n_400, t2.n_3
                from    t2
                where   t2.n_1000 = 0
        )
;

The first point to check is that these two queries are logically equivalent.

Once you’re happy with that idea we can work out, informally, how many rows we should expect the queries to return: there are 1,200 combinations for (n_400, n_3) so each combination should return roughly 833 rows; if we pick 1,000 rows from the 1 million available we can expect to see 679 of those combinations (that’s Alberto Dell’Era’s “selection without replacement” formula that Oracle uses for adjusting num_distinct to allow for filter predicates). So we might reasonably suggest that the final number of rows is 833 * 679 = 565,607. It turns out that that’s a pretty good estimate – when I ran the query the result was actually 567,018 rows.
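The informal arithmetic above can be sketched in a few lines of Python. This is only a sketch: the function name is mine, and the form of the formula, D * (1 - (1 - s/N)^(N/D)), is my assumption about what the “selection without replacement” calculation looks like, though it does reproduce the 679 quoted above.

```python
def expected_distinct(num_distinct, num_rows, sample_rows):
    """Expected number of distinct values seen when sample_rows rows are
    picked from num_rows rows holding num_distinct evenly spread values.
    Assumed form of the "selection without replacement" formula."""
    rows_per_value = num_rows / num_distinct
    return num_distinct * (1 - (1 - sample_rows / num_rows) ** rows_per_value)


NUM_ROWS = 1_000_000
COMBINATIONS = 400 * 3                            # distinct (n_400, n_3) pairs
ROWS_PER_COMBINATION = NUM_ROWS // COMBINATIONS   # roughly 833 rows each

# Combinations we expect to survive the n_1000 = 0 filter (1,000 rows).
combos_seen = round(expected_distinct(COMBINATIONS, NUM_ROWS, 1_000))
informal_estimate = ROWS_PER_COMBINATION * combos_seen

print(combos_seen)         # 679
print(informal_estimate)   # 565607
```

The point of applying the formula to the 1,200 combinations as a whole, rather than to each column separately, becomes important when you look at what the optimizer does.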

So what does Oracle produce for the two execution plans – here are the results from 12c (EXISTS first, then IN):


===================
Multi-column EXISTS
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_400"="T1"."N_400" AND "T2"."N_3"="T1"."N_3")
   2 - filter("N_1000"=0)

===================
Equivalent IN query
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N_400"="T2"."N_400" AND "T1"."N_3"="T2"."N_3")
   2 - filter("T2"."N_1000"=0)

The first thing to note is that the shape of the plans is identical, and the predicate sections are identical – but the final cardinalities are different. Clearly at least one of the cardinalities has to be wrong by a significant amount (9.5% or 10.4%, depending which way round you want to look at it). If you run the test on 11.2.0.4 you find that both plans give the same estimated row count – and it’s the 920,000 rows; so arguably 12c has “fixed” the IN subquery calculation, bringing it closer to a reasonable prediction, but it hasn’t fixed the EXISTS subquery calculation. That 833K prediction, by the way, is what you would expect to see with this data with a basic join – and a semi-join shouldn’t be able to produce more data than a join.

But both predictions are way off the (informal) expectation, so how have they appeared ?

Working backwards it’s easy to spot that: 833K = 833 * 1,000: Oracle is behaving as if every single row identified in the subquery will produce a separate combination of (n_400, n_3). If we reverse engineer 920K we get: 920K / 833 = 1104 – it would appear that the optimizer thinks the 1,000 rows produced by the subquery will produce 1,104 distinct combinations of (n_400, n_3), so how did the impossible 1,104 appear in the arithmetic ?

If you apply the “selection without replacement” formula to picking 1,000 rows with 400 distinct values from 1,000,000 rows the expected number of distinct values (with rounding) will be 368; if you apply the formula for picking 1,000 rows with 3 distinct values from 1,000,000 rows the expected number will be 3. And 3 * 368 = 1,104. (Remember that in my original estimate I applied the formula after multiplying out the combination of distinct values). The optimizer is using its standard methods, but using intermediate results in an unsuitable fashion.
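Here is a hedged sketch of that reverse engineering in Python. The formula is the assumed form D * (1 - (1 - s/N)^(N/D)), and rounding up with ceil() is my guess at how 367.21 becomes the 368 quoted above; none of this is taken from Oracle's code.

```python
import math


def expected_distinct(num_distinct, num_rows, sample_rows):
    """Assumed form of the "selection without replacement" formula."""
    rows_per_value = num_rows / num_distinct
    return num_distinct * (1 - (1 - sample_rows / num_rows) ** rows_per_value)


NUM_ROWS = 1_000_000
SUBQUERY_ROWS = 1_000          # rows where n_1000 = 0
ROWS_PER_COMBINATION = 833     # 1,000,000 / (400 * 3), truncated

# Formula applied to each column separately (rounding up is an assumption).
ndv_n400 = math.ceil(expected_distinct(400, NUM_ROWS, SUBQUERY_ROWS))   # 368
ndv_n3 = math.ceil(expected_distinct(3, NUM_ROWS, SUBQUERY_ROWS))       # 3

derived_combinations = ndv_n400 * ndv_n3                      # 1104
exists_estimate = ROWS_PER_COMBINATION * derived_combinations
in_estimate = ROWS_PER_COMBINATION * SUBQUERY_ROWS

print(derived_combinations)   # 1104
print(exists_estimate)        # 919632, reported as 920K
print(in_estimate)            # 833000, reported as 833K
```

Note that 1,104 is more than the 1,000 rows the subquery is expected to return, which is why the figure is impossible as a count of distinct combinations.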

It’s impossible to say what the impact of this particular code path – and the change on the upgrade – might be. The optimizer has over-estimated by 47% in one case and 62% in the other but (a) there may be something about my data that exaggerated an effect that few people will see in the wild and (b) in many cases getting in the right ballpark is enough to get a reasonable plan, and a factor of 2 is the right ballpark.

Of course, a few people will be unlucky with a few queries on the upgrade where the estimate changes – after all a single row difference in the estimate can cause the optimizer to flip between a hash join and a nested loop – but at least you’ve got a little extra information that might help when you see a bad estimate on an important semi-join.

So is there a workaround ? Given that I’ve got 12c, the obvious thing to try is to create a column group at both ends of the semi-join and see what happens. It shouldn’t really make any difference because column groups are targeted at the problems of correlated columns – but we might as well try it:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns (n_400,n_3) size 1')
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for columns (n_400,n_3) size 1')

Unfortunately when I did this the final cardinality estimate for both queries dropped to just 833 (the absence of a K on the end isn’t a typo!).

Manually unnesting got me closer:


select
        *
from
        (
        select  distinct n_3, n_400
        from    t2
        where   n_1000 = 0
        )       sq,
        t1
where   
        sq.n_400 = t1.n_400
and     sq.n_3 = t1.n_3
;

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   649K|    33M|  1260  (11)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   649K|    33M|  1260  (11)| 00:00:01 |
|   2 |   VIEW               |      |   779 | 20254 |   612   (8)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |   779 |  8569 |   612   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SQ"."N_400"="T1"."N_400" AND "SQ"."N_3"="T1"."N_3")
   4 - filter("N_1000"=0)

The cardinality of 649K is (allowing for rounding) 833 * 779; so we need to know where the 779 came from. It’s the optimizer’s standard arithmetic for “distinct” – multiply the N individual selectivities together then divide by sqrt(2) “N-1” times. So we apply the “selection without replacement” formula twice:

  • adjusted selectivity of n_400 = 367.21
  • adjusted selectivity of n_3 = 3
  • 367.21 * 3 / sqrt(2) = 779
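The three steps above can be checked with a short Python sketch. As before, the formula's form is my assumption, and the function names are mine.

```python
import math


def expected_distinct(num_distinct, num_rows, sample_rows):
    """Assumed form of the "selection without replacement" formula."""
    rows_per_value = num_rows / num_distinct
    return num_distinct * (1 - (1 - sample_rows / num_rows) ** rows_per_value)


def distinct_estimate(adjusted_values):
    """Multiply the N adjusted values, then divide by sqrt(2) N-1 times."""
    product = 1.0
    for value in adjusted_values:
        product *= value
    return product / math.sqrt(2) ** (len(adjusted_values) - 1)


adj_n400 = expected_distinct(400, 1_000_000, 1_000)   # about 367.21
adj_n3 = expected_distinct(3, 1_000_000, 1_000)       # 3, to all intents

view_rows = round(distinct_estimate([adj_n400, adj_n3]))
join_rows = 833 * view_rows

print(round(adj_n400, 2))   # 367.21
print(view_rows)            # 779
print(join_rows)            # 648907, i.e. 649K allowing for rounding
```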

If you create column group statistics for (n_400, n_3) this doesn’t change the optimizer’s estimate for the number of distinct combinations after selection – maybe that’s another enhancement in the pipeline – but, at least in this case, the manual unnesting has got us a little closer to the right estimates without any statistical intervention.

Footnote:

Just for the sake of completeness, here are the plans (with yet more cardinality predictions) that you get if you block the unnesting:


select 
	*
from 
	t1 
where 
	exists (
		select	
			/*+ no_unnest */
			null  
		from	t2 
		where	n_1000 = 0 
		and	t2.n_400 = t1.n_400 
		and	t2.n_3 = t1.n_3
	)
;



---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1179 | 33012 |   766K (12)| 00:00:30 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    26M|   632  (11)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "T2" WHERE
              "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   3 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)



=====================================
Unnesting blocked and subquery pushed
=====================================
select 
	*
from 
	t1 
where 
	exists (
		select	
			/*+ no_unnest push_subq */
			null  
		from	t2 
		where	n_1000 = 0 
		and	t2.n_400 = t1.n_400 
		and	t2.n_3 = t1.n_3
	)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 50000 |  1367K|  1271  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | T1   | 50000 |  1367K|   632  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST */ 0 FROM "T2"
              "T2" WHERE "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   2 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)

The 1179 comes from the magic of sqrt(2):  1179 = 1,000,000 / (400 * 3 / sqrt(2)).

The 50,000 is just the basic “I dunno, let’s call it 5%”.
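For completeness, both footnote figures are easy to reproduce. A minimal Python check, using only the numbers quoted above:

```python
import math

NUM_ROWS = 1_000_000

# FILTER plan: the table's row count divided by the "distinct" arithmetic
# for (n_400, n_3), i.e. 400 * 3 divided by sqrt(2) once.
filter_rows = round(NUM_ROWS / (400 * 3 / math.sqrt(2)))

# Pushed subquery plan: the fixed 5% guess.
pushed_rows = NUM_ROWS * 5 // 100

print(filter_rows)   # 1179
print(pushed_rows)   # 50000
```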

 

Reference script: aggregate_selectivity_c.sql

 

September 1, 2015

Index Usage – 4

Filed under: CBO,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:41 pm BST Sep 1,2015

Here’s a thought that came to me while I was writing up a note about identifying redundant indexes a few minutes ago. Sometimes you end up supporting applications with unexpected duplication of data and indexes and need to find ways to reduce overheads. Here’s some code modelling a scenario that I’ve seen more often than I like (actually, just once would be more often than I’d like):


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e5
)
select
        rownum                                          id,
        trunc(sysdate,'MM') + (rownum-1)/1440           date_time,
        trunc(sysdate,'MM') + trunc((rownum-1)/1440)    date_only,
        rpad('x',100)                                   padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

I’ve got a table holding one row per minute since the start of the month; there’s a column which holds the date and time accurate to the minute, and another column which is supposed to hold just the date part. Is it possible to create a single index that allows Oracle to handle queries relatively efficiently whether they refer to date_time or date_only ? As a starting step could we get an index range scan on the same index for both of the following queries:


select
        max(id)
from
        t1
where
        date_only between sysdate-1 and sysdate
;


select
        max(id)
from
        t1
where
        date_time between sysdate-1 and sysdate
;

As Bob the Builder likes to say: “yes we can”.

There are a few lines of SQL between the table creation and the stats gathering that I didn’t show you. The first creates the constraint that describes the relationship between date_time and date_only – one is the truncated version of the other; the second defines the index we need, and the third (unfortunately) has to be there to declare the date_time column as a mandatory column:

alter table t1
        add constraint t1_trunc_date
        check(
                  date_only = trunc(date_time)
              and (   (date_only is null and date_time is null)
                   or (date_only is not null and date_time is not null)
              )
        )
;

create index t1_i1 on t1(trunc(date_time)) nologging;

alter table t1 modify (date_time not null);

(Given the requirement for date_time to be not null to get my indexing strategy to work, we could simplify the t1_trunc_date constraint to just (date_only = trunc(date_time)) if we declared date_only to be not null as well).

With the extra lines of SQL included here are the resulting execution plans for the two queries (running on 11.2.0.4, but you get the same plans on 12.1.0.2):


=======================================
date_only between sysdate-1 and sysdate
=======================================

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    21 |    92   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE               |       |     1 |    21 |            |          |
|*  2 |   FILTER                      |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T1    |  4306 | 90426 |    92   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_ONLY"<=SYSDATE@! AND "DATE_ONLY">=SYSDATE@!-1)
   4 - access(TRUNC(INTERNAL_FUNCTION("DATE_TIME"))>=SYSDATE@!-1 AND
              TRUNC(INTERNAL_FUNCTION("DATE_TIME"))<=SYSDATE@!)
=======================================
date_time between sysdate-1 and sysdate
=======================================

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    21 |    92   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE               |       |     1 |    21 |            |          |
|*  2 |   FILTER                      |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T1    |  1442 | 30282 |    92   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
   4 - access(TRUNC(INTERNAL_FUNCTION("DATE_TIME"))>=TRUNC(SYSDATE@!-1) AND
              TRUNC(INTERNAL_FUNCTION("DATE_TIME"))<=TRUNC(SYSDATE@!))

The optimizer has managed to generate extra predicates in both cases by applying transitive closure to the critical constraint to produce queries that can be addressed (with some inefficiencies) through the single index.

Within limits, therefore, I can reduce two indexes to a single index. The strategy isn’t ideal but it may be appropriate in a few special cases. There are several problems that should be considered carefully:

  • The date_time column has to be declared not null for this optimization strategy to appear – that’s going to limit its applicability.
  • You may have more complex code where the transformation simply can’t be made to appear.
  • The introduction of the trunc() function may change the optimizer’s arithmetic in ways that cause plans to change for the worse
  • (Most important) The index range scan is always a multiple of 24 hours, with the excess data discarded after you reach the table. If you have lots of time-based queries for short time intervals (e.g. less than 8 hours) then the extra work done may outweigh the benefit of reducing the number of indexes – especially if all the excess table visits turn into randomly scattered single block reads.
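To put a rough number on that last point, here is a small Python sketch of the overscan. It assumes the index is probed with a range of the form trunc(date_time) between trunc(start) and trunc(end), and that rows are spread evenly through time as they are in my model; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def index_scan_window(start, end):
    """Whole-day range the trunc(date_time) index has to cover for a
    query on date_time between start and end."""
    first_day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    last_day = end.replace(hour=0, minute=0, second=0, microsecond=0)
    return first_day, last_day + timedelta(days=1)


def overscan_factor(start, end):
    """Time range scanned in the index / time range actually requested."""
    scan_start, scan_end = index_scan_window(start, end)
    return (scan_end - scan_start) / (end - start)


# An 8-hour query inside one day still scans the whole day.
print(overscan_factor(datetime(2015, 9, 1, 9), datetime(2015, 9, 1, 17)))          # 3.0

# A 24-hour query that straddles midnight scans two whole days.
print(overscan_factor(datetime(2015, 9, 1, 10, 30), datetime(2015, 9, 2, 10, 30)))  # 2.0
```

With evenly spread data the factor is also the ratio of table rows visited to table rows kept, so a short window is where the strategy hurts most.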

Despite these drawbacks you may decide that you have a case where the strategy is “good enough” to help you reduce the workload on your system at some critical times during the day or night.

 

Index Usage – 3

Filed under: Indexing,Oracle,Tuning — Jonathan Lewis @ 5:52 pm BST Sep 1,2015

In my last note on index usage I introduced the idea of looking at v$segstat (or v$segment_statistics) and comparing the “logical reads” statistic with the “db block changes” statistic as an indicator of whether or not the index was used in execution plans. This week I’ll explain the idea and show you some results – with a little commentary – from a production system that was reported on the OTN database forum.

The idea is fairly simple (and simplistic). If you update a typical index you will traverse three blocks (root, branch, leaf) to find the index entry that has to be updated, so if the only reason you use an index is to find out which index entry has to be updated then the number of “db block changes” for that index will be (we hope) roughly one-third of the number of “logical reads” of the index.
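As a sketch of how the rule of thumb might be applied, here is a tiny Python function. The function name, the tolerance parameter and the sample figures are all hypothetical; the only thing taken from the text is the idea of comparing the ratio of changes to logical reads with 1/height.

```python
def looks_maintenance_only(logical_reads, db_block_changes, height=3, tolerance=0.05):
    """Flag an index whose change/read ratio is close to (or above) the
    1/height you would see if it were visited only to maintain itself.
    The tolerance is an arbitrary allowance, not a recommended value."""
    if logical_reads <= 0:
        return False
    cut_off = 1 / height
    return db_block_changes / logical_reads >= cut_off - tolerance


# Hypothetical figures: 10,000 changes, each needing a 3-block traversal.
print(looks_maintenance_only(30_000, 10_000))             # True
# Same changes, but ten times as many reads: the index is being queried.
print(looks_maintenance_only(300_000, 10_000))            # False
# With height = 2 the cut-off is 50%, so the same 33% is no longer suspicious.
print(looks_maintenance_only(30_000, 10_000, height=2))   # False
```

A True result is a reason to look more closely at an index, not a reason to drop it.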

We can do some testing of this hypothesis with some simple SQL:


create table t1 nologging as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                  id,
        trunc(dbms_random.value(0,333333))      n1,
        rpad('x',100)                           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

alter table t1 add constraint t1_pk primary key(id) using index nologging;
create index t1_i1 on t1(n1) nologging;

So I’ve got a table with a million rows, a primary key, and an index on a column of randomly generated data. Now all I need to do is run the following little script a few thousand times and check the segment stats – I’ve avoided using a pl/sql script because of all the special buffer-handling optimisations that could appear if I did:


exec :b1 := trunc(dbms_random.value(1,1000001))

update t1
        set n1 = trunc(dbms_random.value(0,333333))
        where   id = :b1;

commit;

There are various ways of checking the segment stats: you could simply launch an AWR snapshot (or statspack snapshot at level 7) before and after the test – the results from the “Segments by …” sections of the report should tell you all you need to know; or you could run a simple piece of SQL like the following before and after the test and then do some arithmetic:

select
        object_name, statistic_name, value 
from
       v$segment_statistics
where
       owner = {your user name here}
and    object_name in ('T1','T1_PK','T1_I1')
and    statistic_name in (
              'db block changes',
              'logical reads'
)
and     value != 0
order by
        object_name,
        statistic_name
;
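The “arithmetic” is just a subtraction of the two sets of results. If you had captured the query output into two dictionaries keyed by (object_name, statistic_name), a sketch of the delta might look like this in Python; the function name and the sample values are made up for illustration.

```python
def snapshot_delta(before, after):
    """Return after - before for every (object_name, statistic_name) key,
    keeping only the statistics that actually changed."""
    delta = {}
    for key, end_value in after.items():
        change = end_value - before.get(key, 0)   # missing key = zero at start
        if change != 0:
            delta[key] = change
    return delta


# Hypothetical snapshots taken before and after a test run.
before = {("T1_I1", "logical reads"): 1_200, ("T1_I1", "db block changes"): 400}
after = {
    ("T1_I1", "logical reads"): 61_200,
    ("T1_I1", "db block changes"): 20_400,
    ("T1_PK", "logical reads"): 30_000,
}

for (object_name, statistic_name), value in sorted(snapshot_delta(before, after).items()):
    print(f"{object_name:<8}{statistic_name:<20}{value:>10,}")
```

Because the query filters on value != 0, a statistic can be missing from the first snapshot and present in the second, which is why the sketch treats a missing key as zero.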

I happen to have some snapshot code in a little procedure that does the job I need, so my testbed code looks like this:

execute snap_my_stats.start_snap
execute snap_segstat.start_snap

set termout off
set serveroutput off

variable b1 number

@start_10000    -- invoke my script 10,000 times

spool test

set serveroutput on
set termout on

execute snap_segstat.end_snap
execute snap_my_stats.end_snap

spool off

The question is, what do we expect the results to look like, and what do they actually look like ? Given we have 10,000 updates going on we might expect something like the following:

  • T1_PK – index access by primary key, 10,000 * 3 logical I/Os
  • T1 – 10,000 logical I/Os as we find the rows then 10,000 db block changes
  • T1_I1 – index access to find entry to be deleted (10,000 * 3 logical I/Os), repeated to find leaf block for insertion of new entry (10,000 * 3 logical I/Os), with 10,000 * 2 db block changes for the delete/insert actions.
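Those predictions are simple enough to tabulate. A minimal Python sketch, assuming (as the list does) that both indexes have a height of 3:

```python
UPDATES = 10_000
HEIGHT = 3   # root, branch, leaf

expected = {
    # Primary key index: one traversal per update, nothing changed.
    "T1_PK": {"logical reads": UPDATES * HEIGHT, "db block changes": 0},
    # Table: one block visit and one change per update.
    "T1": {"logical reads": UPDATES, "db block changes": UPDATES},
    # n1 index: one traversal to delete the old entry, one to insert the new.
    "T1_I1": {"logical reads": 2 * UPDATES * HEIGHT, "db block changes": 2 * UPDATES},
}

for segment, stats in expected.items():
    for name, value in stats.items():
        print(f"{segment:<8}{name:<20}{value:>10,}")
```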

Here are a few results from 12.1.0.2 – if I don’t include a commit in the update script:


12.1.0.2 with no commit
Segment stats
=======================
T1
logical reads                               20,016
db block changes                            19,952

T1_PK
logical reads                               30,016
physical reads                                  19
physical read requests                          19

T1_I1
logical reads                               60,000
db block changes                            21,616

Session Stats
=============
Name                                         Value
----                                         -----
session logical reads                      110,919
consistent gets                             30,051
consistent gets examination                 30,037
db block gets                               80,868
db block changes                            81,989

Some of the figures match the predictions very nicely – in particular the logical reads and db block changes on the T1_I1 index are amazing (so good I feel I have to promise that I didn’t fake them, or wait until after the test to make my prediction;)

There are, however, some anomalies: why have I got 20,000 logical reads and db block changes on the table when I did only 10,000 updates ? I was surprised by this, but it is something I’ve seen before: Oracle was locking each row before updating it, so generating two changes and two redo entries (Op Codes 11.4 and 11.5). In the past I’d noticed this as a side effect of setting the audit_trail to DB, but it was happening here with audit_trail = none. (Something to add to my “todo” list – why is this happening, when did it appear.)

You’ll also notice that the session level stats for logical reads nearly match the table and index level (20K + 30K + 60K = ca. 110K) while the db block changes stats are out by a factor of 2. Don’t forget that for each change to a table or index we make a change to an undo block describing how to reverse that change so the 40,000 data changes are matched by a further 40,000 undo block changes; and on top of this every time we get the next undo block we change our transaction table entry in the undo segment header we’re using, and that accounts for most of the rest. The discrepancy in the number of logical reads is small because while we keep getting and releasing the table and index blocks, we pin the undo block from the moment we acquire it to the moment it’s full so we don’t record extra logical reads each time we modify it.
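A quick Python check of that comparison, using the figures from the “no commit” run (the variable names are mine):

```python
# Segment-level figures from the test without the commit.
segment_reads = {"T1": 20_016, "T1_PK": 30_016, "T1_I1": 60_000}
segment_changes = {"T1": 19_952, "T1_I1": 21_616}

# Session-level figures from the same test.
session_reads = 110_919
session_changes = 81_989

total_segment_reads = sum(segment_reads.values())
total_segment_changes = sum(segment_changes.values())

print(total_segment_reads)                                  # 110032
print(round(session_reads / total_segment_reads, 2))        # 1.01
print(total_segment_changes)                                # 41568
print(round(session_changes / total_segment_changes, 2))    # 1.97
```

The logical reads agree to within 1% while the changes differ by very nearly a factor of two, which is the pattern you would expect if each data block change has a matching undo block change.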

Big observation

Based on the figures above, we could probably say that, for an index with a blevel = 2 (height = 3), if the number of db block changes recorded is close to one-third of the logical reads recorded, then that index is a good candidate for review as it may be an index that is not used to access data, it may be an index that does nothing except use up resources to keep itself up to date.

Big problem

Take a look at the statistics when I included the commit in my test case:

12.1.0.2 with commit
Segment Stats
====================
T1
logical reads                               20,000

T1_PK
logical reads                               30,000

T1_I1
logical reads                                  512
db block changes                               160

Session Stats
=============
Name                                         Value
----                                         -----
session logical reads                       80,625
consistent gets                             30,106
consistent gets examination                 30,039
db block gets                               50,519
db block changes                            60,489

Apparently my session has made 60,000 changes – but none of them applied to the table or index! In fact I haven’t even accessed the T1_I1 index! The segment statistics have to be wrong. Moreover, if I commit every update I ought to change an undo segment header block at the start and end of every update, which means I should see at least 20,000 more db block changes in the session (not 20,000 less); and since I’m not pinning undo blocks for long transactions I should see about 10,000 extra logical reads as I acquire 10,000 undo blocks at the start of each short transaction. The session statistics have to be wrong as well!

A quick check on the redo stream shows exactly the change vectors I expect to see for these transactions:

  • 11.4 – lock row piece (table)
  • 5.2 – start transaction (update undo segment header)
  • 11.5 – update row piece (table)
  • 10.4 – delete leaf row (index)
  • 10.2 – insert leaf row (index)
  • 5.4 – commit (update undo segment header)
  • 5.1 – update undo block (op 11.1 – undo table row operation)
  • 5.1 – update undo block (op 11.1 – undo table row operation)
  • 5.1 – update undo block (op 10.22 – undo leaf operation)
  • 5.1 – update undo block (op 10.22 – undo leaf operation)

That’s a total of 10 changes per transaction – which means 100,000 db block changes  in total, not 60,000.
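The count is easy to confirm from the list of change vectors. A short Python tally, where the grouping labels are my own shorthand rather than Oracle terminology:

```python
from collections import Counter

TRANSACTIONS = 10_000

# (op code, what it changes) for one update-and-commit transaction.
change_vectors = [
    ("11.4", "table"),
    ("5.2", "undo header"),
    ("11.5", "table"),
    ("10.4", "index"),
    ("10.2", "index"),
    ("5.4", "undo header"),
    ("5.1", "undo block"),
    ("5.1", "undo block"),
    ("5.1", "undo block"),
    ("5.1", "undo block"),
]

per_target = Counter(target for _, target in change_vectors)
total_changes = len(change_vectors) * TRANSACTIONS

print(dict(per_target))   # {'table': 2, 'undo header': 2, 'index': 2, 'undo block': 4}
print(total_changes)      # 100000
```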

This anomaly is so large that it HAS to make my suggested use of the segment stats suspect.  Fortunately, though, the error is in a direction that, while sapping our confidence, doesn’t make checking the numbers a completely pointless exercise.  If the error is such that we lose sight of the work done in modifying the index then the figures remaining are such that they increase our perception of the index as one that is being used for queries as well – in other words the error doesn’t make an index that’s used for queries look like an index that’s only used for self-maintenance.

Case Study

The following figures were the results from the OTN database forum posting that prompted me to write this note and the previous one:

OTN

The poster has some code which gives a report of the indexes on a table (all 26 of them in this case) with their column definition and segment statistics. What (tentative) clues do we get about these indexes as far as this article is concerned ?

Conveniently the code arranges the indexes in order of “change percentage”, and we can see very easily that the first nine indexes in the list show “db block changes” > one-third of “logical reads”, the cut-off point for the article, so it’s worth taking a quick look at those indexes to see if they are suitable candidates for dropping. Inevitably the moment you start looking closely there are a number of observations to add to this starting point.

  1. Look at the number of changes in the first 12 indexes, notice how frequently numbers around 300,000 appear – perhaps that’s indicative of about 300,000 inserts taking place in the interval, in which case the first and 14th indexes (on (zcid) and (ps_spdh) respectively) must be on columns which are very frequently null and are therefore much smaller than the rest of the indexes. Even though the index on (zcid) is reported at 39%, perhaps this is an index with a blevel of 1 (height = 2) in which case its cut-off point would be 50% rather than 33% – which means it could well be used for a lot of queries.
  2. The tenth index on (dp_datetime) reports a “change percentage” of 26%, which is below the cut-off, but it’s worth noting that there are three other indexes (12, 13 and 21) on that table that start with a column called dp_datetime_date. Is dp_datetime_date the truncated value of dp_datetime and is it a real column or a virtual column ? Given my comments about the optimizer’s clever trick with indexes on trunc(date_column) in the second post in this series perhaps there’s scope here for getting rid of the dp_datetime index even though the simple numeric suggests that it probably is used for some queries.
  3. Of the three indexes starting with dp_datetime_date, one consists of just that single column – so perhaps (as suggested in the first post in this series) we could simply drop that too. Then, when we look at the other two (indexes 12 and 13) we note that index 13 is subject to five times as much change as index 12 (is that one insert plus 2 updates, given that an update means two changes), but fifteen times as much logical I/O. The extra LIO may be because the index is larger (so many more columns), it may be because the index is used very inefficiently – either way, we might look very carefully at the column ordering to see if index 13 could be rearranged to start the same way as index 12, and then drop index 12.  On top of everything else we might also want to check whether we have the right level of compression on the index – if it’s not very effective until we’ve selected on many columns then it must be subject to a lot of repetition in the first few columns.
  4. I gave a few examples in part one of reasons for dropping indexes based on similarity of columns used – the examples came from this output so I won’t repeat them, but if you refer back to them you will note that the desirability of some of the suggestions in the earlier article is re-inforced by the workload statistics – for example: the similarity of indexes 24 and 24, with an exact ordered match on the first 4 columns, suggests that we consider combining the two indexes into a single index: the fact that both indexes were subject to 2.7 million changes makes this look like a highly desirable target.

Summary

There are a lot of indexes on this table but it looks as if we might be able to drop nearly half of them, although we will have to be very careful before we do so and will probably want to make a couple at a time invisible (and we can make the change “online” in 12c) for a while before dropping them.

Remember, though, that everything I’ve said in this note is guesswork based on a few simple numbers, and I want to emphasise an important point – this note wasn’t trying to tell you how to decide if an index could be dropped, it was pointing out that there’s a simple way to focus your attention on a few places where you’re most likely to find some indexes that are worth dropping.  Run a report like this against the five biggest tables or the five busiest tables or the five tables with the most indexes and you’ll probably find a few easy wins as far as redundant indexes are concerned.

Footnote

While writing up my comments about the optimizer’s tricks with columns like dp_datetime and a virtual dp_datetime_date I had a sudden sneaky thought about how we could play games with the optimizer if both columns were real columns that were kept in synch with each other. If it works out I’ll write it up in a further blog.

August 29, 2015

Index Usage – 2

Filed under: 12c,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 11:33 am BST Aug 29,2015

I’ve been a little slow in the follow-up to my previous posting on possibly redundant indexes. Before going into the slightly more complex stuff, there’s another peripheral point (but a very important one) that’s worth raising about how clever the optimizer can be. Here’s some code for 11.2.0.4 to demonstrate the point:

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(sysdate,'MM') + (rownum-1)/1440	date_time,
	rpad('x',100)				padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;


alter table t1 
add (
        date_only
	generated always as (trunc(date_time)) virtual 
)
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_i1 on t1(date_only) nologging;

So, in a two-step process, I’ve got an indexed virtual column that holds the value of the date_time column truncated to just the date. Would you expect the optimizer to use the index to execute the following query efficiently:


select
        max(id)
from
        t1
where
        date_time between sysdate-1 and sysdate
;

Note that the query references the real date_time column not the virtual column date_only, and it’s not using the expression that defines the index – yet the plan reads as follows:


-----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |     1 |    21 |    86   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE                       |       |     1 |    21 |            |          |
|*  2 |   FILTER                              |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |  1442 | 30282 |    86   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
   4 - access("T1"."DATE_ONLY">=TRUNC(SYSDATE@!-1) AND
              "T1"."DATE_ONLY"<=TRUNC(SYSDATE@!))

It’s a little odd that even though the optimizer in the newer versions of Oracle treats many simple expressions on sysdate as constants it still checks (operation 2) that “sysdate >= sysdate – 1” but perhaps that’s just a case of a piece of generic code that isn’t worth the risk or effort of changing.

The key point, of course, is that Oracle has managed to generate some extra predicates that allow it to use the “wrong” index to get a first approximation of the result set fairly efficiently, and then used the original predicate to reduce the approximation down to the correct result set.

If you want a quick sanity check on the access predicates used for operation 4:

  • If date_time >= sysdate-1, then trunc(date_time) >= trunc(sysdate-1)
  • If date_time <= sysdate, then trunc(date_time) <= trunc(sysdate)

This style of predicate manipulation also works with numeric data types, but I think its greatest benefit (or convenience) is likely to come from date data types where the data has been created with a time component but there are frequent “date-only” queries. The days of creating two indexes as a workaround for handling generated code that wants to deal with both date_time and trunc(date_time) predicates should be numbered.
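
To make the transformation concrete, here is a sketch of the query you would otherwise have to write by hand to get the same effect. It assumes the t1 table and the date_only virtual column defined above. The extra predicate on date_only is logically redundant, because it is implied by the date_time predicate, but it is the one that makes the index usable; the optimizer is in effect adding it for you.

```sql
-- Hand-written equivalent of the predicates the optimizer generates.
-- The date_only predicate follows from the date_time predicate
-- (see the two sanity-check bullets above), so the result is unchanged.
select
        max(id)
from
        t1
where
        date_time between sysdate-1 and sysdate
and     date_only between trunc(sysdate-1) and trunc(sysdate)
;
```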

Footnote:

This enhancement probably appeared in 11.2.0.2, and I first saw it described in October 2013 in this blog note by Mohamed Houri; but 12c offers a delightful little enhancement – here’s what my table looks like in the 12c version of the code:


SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 DATE_TIME                              DATE
 PADDING                                VARCHAR2(100)

SQL> 

Where’s the virtual column ? The 12c version of my code had a slightly different definition for it:


alter table t1  
add (
        date_only
        invisible
        generated always as (trunc(date_time)) virtual
)
;

The transformation still works even when the virtual column is invisible. So (subject to searching for anomalies, boundary conditions and bugs) it looks as if you can change the table definition, and get the benefits of two indexes for the price of one without the application realising that anything has changed.
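
If you want to confirm that the invisible column really is still there, the data dictionary should show it. A minimal sketch, assuming the 12c version of t1 above and using user_tab_cols (which, unlike user_tab_columns, reports hidden columns):

```sql
-- List every column of T1, including hidden ones.
select
        column_name, hidden_column, virtual_column
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;
```

I would expect DATE_ONLY to appear in this output flagged as both hidden and virtual, even though "desc t1" doesn't list it.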

August 25, 2015

Truncate – 2

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 6:25 pm BST Aug 25,2015

Following on from my earlier comments about how a truncate works in Oracle, the second oldest question about truncate (and other DDL) appeared on the OTN database forum: “Why isn’t a commit required for DDL?”

Sometimes the answer to “Why” is simply “that’s just the way it is” – and that’s what it is in this case, I think.  There may have been some historic reason why Oracle Corp. implemented DDL the way they did (commit any existing transaction the session is running, then auto-commit when complete), but once the code has been around for a few years – and accumulated lots of variations – it can be very difficult to change a historic decision, no matter how silly it may now seem.

This posting isn’t about answering the question “why”, though; it’s about a little script I wrote in 2003 in response to a complaint from someone who wanted to truncate a table in the middle of a transaction without committing the transaction. Don’t ask why – you really shouldn’t be executing DDL as part of a transactional process (though a task like dropping and recreating indexes as part of a batch process is a reasonable strategy).

So if DDL always commits the current transaction how do you truncate a table without committing ? Easy – use an autonomous transaction. First a couple of tables with a little data, then a little procedure to do my truncate:


create table t1 (n1 number);
insert into t1 values(1);

create table t2 (n1 number);
insert into t2 values(1);

create or replace procedure truncate_t1
as
        pragma autonomous_transaction;
begin
        execute immediate 'truncate table t1';
end;
/

Then the code to demonstrate the effect:


prompt  ======================================
prompt  In this example we end up with no rows
prompt  in t1 and only the original row in t2,
prompt  the truncate didn't commit the insert.
prompt  ======================================

insert into t2 values(2);

execute truncate_t1;
rollback;

select * from t1;
select * from t2;


According to my notes, the last time I ran this code was on 9.2.0.3 but I’ve just tested it on 12.1.0.2 and it behaves in exactly the same way.
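
For contrast, here is a sketch of the same test with the truncate issued directly rather than through the autonomous procedure. It uses the same two tables. Since DDL commits the current transaction before it executes, the rollback should find nothing to undo and the new row should survive in t2.

```sql
insert into t2 values(3);

truncate table t1;      -- implicit commit: the insert into t2 is now permanent
rollback;               -- nothing left to roll back

select * from t2;       -- the row with n1 = 3 should still be here
```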

I’ve only tested the approach with “truncate” and “create table” apparently, and I haven’t made any attempt to see if it’s possible to cause major disruption with cunningly timed concurrent activity; but if you want to experiment you have a mechanism which Oracle could have used to avoid committing the current transaction – and you may be able to find out why it doesn’t, and why DDL is best “auto-committed”.

Truncate

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 8:39 am BST Aug 25,2015

The old question about truncate and undo (“does a truncate generate undo or not”) appeared on the OTN database forum over the week-end, and then devolved into “what really happens on a truncate”, and then carried on.

The quick answer to the traditional question is essentially this: the actual truncate activity typically generates very little undo (and redo) compared to a full delete of all the data because all it does is tidy up any space management blocks and update the data dictionary; the undo and redo generated is only about the metadata, not about the data itself.

Of course, a reasonable response to the quick answer is: “how do you prove that?” – so I suggested that all you had to do was “switch logfile, truncate a table, dump logfile”. Unfortunately I realised that I had never bothered to do this myself and, despite having far more useful things to do, I couldn’t resist wasting some of my evening doing it. Here’s the little script I wrote to help:


create table t2 (v1 varchar2(32));
insert into t2 values (rpad('A',32));
commit;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id, 
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

create index t1_i1 on t1(id);
alter system flush buffer_cache;
execute dbms_lock.sleep(3)

alter system switch logfile;

insert into t2 values(rpad('X',32));

truncate table t1;

insert into t2 values(rpad('Y',32));
commit;

execute dump_log
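
The source of dump_log isn't listed in this note, so the following is only a sketch of how such a procedure might look, not the actual code. It assumes a single-instance database, an owner who has been granted select directly on v_$log and v_$logfile, and an owner who holds the alter system privilege. The variable name m_member is an invention for the example.

```sql
create or replace procedure dump_log
as
        m_member        varchar2(513);
begin
        -- Pick one member of the current online redo log group.
        select  lf.member
        into    m_member
        from    v$log           lg,
                v$logfile       lf
        where   lg.status = 'CURRENT'
        and     lf.group# = lg.group#
        and     rownum = 1
        ;

        -- Dump the whole file to the session's trace file.
        execute immediate
                'alter system dump logfile ''' || m_member || '''';
end;
/
```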

Procedure dump_log simply dumps the current log file. The call to switch logfile keeps the dumped log file as small as possible; and I’ve flushed the buffer cache with a three second sleep to minimise the number of misleading “Block Written Record” entries that might otherwise appear in the log file after the truncate. There were all sorts of interesting little details in the resulting activity when I tested this on 12.1.0.2 – here’s one that’s easy to spot before you even look at the trace file:


SQL> select object_id, data_object_id, object_name from user_objects where object_name like 'T1%';

 OBJECT_ID DATA_OBJECT_ID OBJECT_NAME
---------- -------------- --------------------
    108705         108706 T1_I1
    108704         108707 T1

Notice how the data_object_id of the index is smaller than that of the table after the truncate ? Oracle truncates (and renumbers) the index before truncating the table.

The truncate activity was pretty much as I had assumed it would be – with one significant variation. The total number of change vectors reported was 272 in 183 redo records (your numbers may vary slightly if you try to reproduce the example), and here’s a summary of the redo OP codes that showed up in those change vectors in order of frequency:


Change operations
=================
  1 OP:10.25    Format root block
  1 OP:11.11    Insert multiple rows (table)
  1 OP:24.1     DDL
  1 OP:4.1      Block cleanout record
  2 OP:10.4     Delete leaf row
  2 OP:13.28    HWM on segment header block
  3 OP:10.2     Insert leaf row
  3 OP:17.28    standby metadata cache invalidation
  4 OP:11.19    Array update (index)
  4 OP:11.5     Update row (index)
 10 OP:13.24    Bitmap Block state change (Level 2)
 11 OP:23.1     Block written record
 12 OP:14.1     redo: clear extent control lock
 12 OP:22.5     File BitMap Block Redo
 14 OP:14.2     redo - lock extent (map)
 14 OP:14.4     redo - redo operation on extent map
 14 OP:5.4      Commit / Rollback
 15 OP:18.3     Reuse record (object or range)
 15 OP:22.16    File Property Map Block (FPM)
 22 OP:13.22    State on Level 1 bitmap block
 24 OP:22.2     File Space Header Redo
 29 OP:5.2      Get undo header
 58 OP:5.1      Update undo block

The line that surprised me was the 14 commit/rollback codes – a single truncate appears to have operated as 14 separate (recursive) transactions. I did start to walk through the trace file to work out the exact order of operation, but it’s really messy, and a tedious task, so I just did a quick scan to get the picture. I may have made a couple of mistakes in the following, but I think the steps were:

  • Start transaction
  • Lock the extent map for the index — no undo needed
  • Lock each bitmap (space management) block  — no undo needed
  • Reset each bitmap block — undo needed to preserve space management information
  • Reset highwater marks where relevant on bitmap and segment header block — undo needed
  • Clear segment header block — undo needed
  • Write all the updated space management blocks to disc (local write waits)
    • Log file records “Block Written Record”.
  • For each space management block in turn
    • Update space management blocks with new data object_id — undo needed
    • Write the updated block to disc (local write wait)
    • Log file records one “Block Written Record” for each block
  • Repeat all the above for the TABLE segment.
  • Start a recursive transaction
    • Insert a row into mon_mod$ — undo needed
    • recursive commit
  • Set DDL marker in redo log (possibly holding the text of the DDL statement, but it’s not visible in the dump)
  • Set object reuse markers in the redo log
  • update tab$  — needs undo, it’s just DML
  • update ind$ — needs undo, it’s just DML
  • update seg$  — needs undo, it’s just DML (twice – once for table once for index)
  • update obj$ — needs undo, it’s just DML (twice – ditto)
  • COMMIT — at last, with a change vector for a “Standby metadata cache invalidation” marker

The remaining 12 transactions look like things that could be delayed to tidy up things like space management blocks for the files and tablespaces and releasing “block locks”.

This first, long, transaction, is the thing that has to happen as an atomic event to truncate the table – and you can imagine that if the database crashed (or you crashed the session) in the middle of a very slow truncate then there seems to be enough information being recorded in the undo to allow the database to roll forward an incomplete truncate, and then roll back to before the truncate.

It would be possible to test whether or not this would actually work – but I wouldn’t want to do it on a database that anyone else was using.
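
If you don't want to dump the log file, a much coarser cross-check on the volume of redo and undo is available from the session statistics. This is a sketch, assuming the session is allowed to query v$mystat and v$statname: run it immediately before and after the truncate (and again around a delete of the same data, for comparison) and take the differences.

```sql
-- Session-level totals; only the change between two runs is meaningful.
select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'redo entries',
                'redo size',
                'undo change vector size'
        )
order by
        sn.name
;
```

This gives only totals for the session, so it says nothing about the order of events or the recursive transactions that the log dump reveals.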

August 19, 2015

Oops

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 10:02 am BST Aug 19,2015

I made a mistake a few days ago following up a question on the OTN database forum. The question was about a problem creating a hash/list composite partitioned table, and one of the respondents suggested that perhaps the problem appeared because hash/list wasn’t a legal combination.

Spot on: so I confirmed that observation and supplied a link to the official Oracle white paper that listed the combinations that were legal in 11.2 for composite partitioning.  In fact, although I was fairly sure that hash/list wasn’t legal, I had even run up a quick test to check that the attempt would fail before I’d searched online for the document.

Fortunately other people had different ideas about the original failure, and one of them supplied a link to a thread on AskTom which actually included some SQL to create a hash/list composite partitioned table. Naturally I tested it (even though it was from Tom Kyte) and obviously it worked (after all it was from Tom Kyte) – and then I spotted the syntax error in the example I had created for myself.

Trust but verify … and then …

I had fallen into two traps – and one of them was documented in my own “Trust” posting from 2006.

The white paper was dated September 2009 (Tom’s example was dated June 2013) and as it says in my Trust note:

  • If its date is more than about 18 months old – don’t assume it’s (still) true
  • If it’s not your exact version number – don’t assume it’s (still) true

The second trap was an example of confirmation bias: I was fairly sure that my test was supposed to fail with Oracle error “ORA-00922: missing or invalid option”, so when it failed with exactly that error I didn’t check why it had failed and didn’t notice that I had swapped the order of a couple of clauses in the create table statement. It’s very easy to think you’ve done enough when testing – especially when your test results match your expectation.

Update – later that day

So I’ve had an email asking me how I got the ORA-00922.  Here’s the SQL as I wrote it – don’t give the answer in the comments, but how quickly can you spot what I did wrong ?

create table t1
(
        object_name,
        object_type,
        created
)
partition by hash(object_name) partitions 4
subpartition by list(object_type)
subpartition template (
        subpartition sp1 values ('TABLE'),
        subpartition sp2 values ('INDEX'),
        subpartition sp3 values (default)
)
as
select  object_name, object_type, created
from    all_objects
;

It’s a deceptive error (to me, at least) because even though I know it’s wrong it still looks right.

August 17, 2015

Index Usage

Filed under: Indexing,Oracle,Tuning — Jonathan Lewis @ 4:25 pm BST Aug 17,2015

The question of how to identify indexes that could be dropped re-appeared (yet again) on the OTN database forum last week. It’s not really surprising that it recurs so regularly – the problem isn’t an easy one to solve but new (and even less new) users keep hoping that there’s a quick and easy solution.

There are, however, strategies and pointers that can help you to optimise the trade-off between effort, risk, and reward. Broadly the idea is to spend a small amount of effort finding a relatively small number of “expensive” indexes that might be safe to drop, so that when you do the detailed analysis you have a good chance that the time spent will be rewarded by a positive result.

Before we get to some results posted on OTN, it’s worth thinking about the global impact and what we’re trying to achieve, and the threats that go with our attempt to achieve it.

The key detail, of course, is that index maintenance is an expensive process. We could insert 1,000 rows into a table at a cost of writing about 25 table blocks plus a few undo blocks plus something like half a megabyte of redo (assuming, for the purposes of illustration that each row is about 200 bytes on insert). Add one index to the table and we might have to locate and modify 1,000 separate index leaf blocks. The increment on the redo might be about quarter of a megabyte and we may have to access 1,000 different undo blocks for read consistency reasons, but the simple fact that we may need 1,000 buffers to be able to maintain that index is likely to be a significant extra cost on the insert. Make that 10 indexes, or 70 (as one unhappy DBA once told me) and the probability of being able to do high-speed inserts becomes rather low.

Of course we hope that our indexes will allow our queries to operate efficiently with great precision, but inevitably we get to a point where the benefit of precision is outweighed by the cost of maintenance. Our target, then, is to design the set of indexes that makes it possible for the optimizer to find good paths for all the important queries and “good enough” paths for the rest. By the time the system is live, though, it’s too late for “proper design”, and the only option is for damage limitation, a bit of guesswork, and some live testing with fingers crossed (thank goodness for invisible indexes).

The starting point is usually an attempt to identify “the indexes we are not using”, which is typically translated into “the indexes that do not appear in execution plans” – but that’s not actually a good target, for various reasons:

  • Problem 1: If we are using an index it’s possible that we shouldn’t be and that there’s an alternative index available that ought to be more efficient. A corollary to this is that if you do identify and drop such an index you may find that the optimizer doesn’t use the alternative index you were expecting it to use until you take some action to help the optimizer recognise that the alternative is a good choice.
  • Problem 2: if we aren’t using a particular index then perhaps we should be using it and would use it if we dropped one of the other indexes on the table. (And there’s always the possibility that we didn’t happen to use it during the interval we were checking but do use it at some other times)
  • Problem 3: the optimizer is capable of using information about the number of distinct keys in a multi-column index to select an execution plan even though it may not use that index in the plan it finally chooses. We may be able to work around this problem in current versions of Oracle by creating a column group (extended statistics) that matches the definition of each index we drop – but there’s a limit of 20 column groups per table.
  • Problem 4: There are some indexes we might not be using but which must exist to avoid the “foreign key locking” problem. It should be easy enough to check, before dropping an index, whether it has to exist to match a foreign key; and even then it may be possible to show that nothing in the application would cause the locking problem to appear – and as a safety measure you could disable locks on the (child) table to ensure that the application doesn’t grind to a halt because of foreign key locking problems.
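
As an illustration of the workarounds mentioned under problems 3 and 4, here is a sketch using hypothetical names: a table orders whose dropped index was on (cust_id, status), and a child table order_lines. As I understand it, once table locks are disabled an attempt to take a table-level lock on the child fails with an error rather than queueing, and DDL against the table is blocked too, so this is something to try on a test system first.

```sql
-- Problem 3: create a column group so the optimizer still has a
-- distinct-key count for the column combination after the index goes.
select
        dbms_stats.create_extended_stats(
                ownname   => user,
                tabname   => 'ORDERS',
                extension => '(cust_id, status)'
        ) extension_name
from
        dual
;

-- Gather stats so that the new column group gets its statistics.
begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'ORDERS',
                method_opt => 'for all columns size 1'
        );
end;
/

-- Problem 4: prevent table-level locks on the (hypothetical) child table.
alter table order_lines disable table lock;
```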

Provided you remember that problems like these exist, and think carefully about the indexes that your strategy suggests, there are various ways you could approach the problem of identifying indexes that don’t get into execution plans.

v$object_usage

The ink had barely dried on the manual pages for this view before several people (including me) had written notes explaining why this view wasn’t particularly helpful. (I think I even said something about this in Practical Oracle 8i). I won’t repeat the discussion here but it revolves around the fact that an index is flagged as “used” even if it has only been used once in a single execution of a single statement – so you don’t get any idea of the real importance of the index.
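
For reference, this is the mechanism in question. A sketch, assuming an index called t1_i1 owned by the current user:

```sql
alter index t1_i1 monitoring usage;

-- ... run the workload for a representative period ...

select
        index_name, table_name, monitoring, used, start_monitoring
from
        v$object_usage
where
        index_name = 'T1_I1'
;

alter index t1_i1 nomonitoring usage;
```

The used column is a simple YES/NO flag, which is exactly the limitation described above: one use is enough to set it, and it tells you nothing about how often or how usefully the index was used.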

v$sql_plan et al.

If you review the set of in-memory execution plans (and the AWR or Statspack equivalents) you can identify indexes which definitely have been used – but (a) it’s expensive to scan v$sql_plan frequently and (b) the AWR/Statspack repositories only capture a subset of the more expensive plans, so it’s easy to miss indexes which have been used and are relatively important but aren’t in the repository and don’t happen to be in memory at the moments you look.
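
A minimal sketch of the in-memory check, assuming you are interested in indexes owned by the current user and can query v$sql_plan. It counts the distinct statements whose plans currently reference each index:

```sql
select
        object_owner, object_name,
        count(distinct sql_id)  statements
from
        v$sql_plan
where
        object_type like 'INDEX%'
and     object_owner = user
group by
        object_owner, object_name
order by
        statements desc
;
```

An index that is missing from this output has not been shown to be unused; its plans may simply have aged out of the library cache.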

Review the definitions

If you examine the index definitions you may spot indexes which look very similar. If one index starts with the same columns, in the same order, as another index, there is a good chance that you could reduce two indexes to one – especially if the whole of one of the indexes is the “leading edge” of the other – for example:

  • (dp_datetime_date)
  • (dp_datetime_date, dp_compid)

Even if the leading edges match and the trailing edges differ we might be able to collapse two indexes into one – depending on how selective the leading columns are and how the indexes are used – for example:

  • (dp_compid, ddzt, cirmhcx, ct_nxr_mhcx, dp_datetime_date)
  • (dp_compid, ddzt, cirmhcx, ct_nxr_mhcx, pnr_cfrqsj_date)

which could perhaps be replaced by one of :

  • (dp_compid, ddzt, cirmhcx, ct_nxr_mhcx, dp_datetime_date, pnr_cfrqsj_date)

or

  • (dp_compid, ddzt, cirmhcx, ct_nxr_mhcx, pnr_cfrqsj_date, dp_datetime_date)

Guessing about the use of a typical date column, though, it’s possible that in this example the current trailing date columns are used with a range-based predicate, so it’s possible that this strategy won’t be effective for this pair of indexes.

Even if the order of later columns in the index doesn’t match you may still find that a pair of indexes could be reduced to a single index – for example the pair:

  • (dp_datetime_date, dp_compid)
  • (dp_datetime_date, ddzdt, dp_compid, ct_nxrdh, ct_smsmobilno)

which could perhaps be replaced by just:

  • (dp_datetime_date, dp_compid, ddzdt, ct_nxrdh, ct_smsmobilno)

As a safety measure, of course, you would probably create a new index, then make the subject indexes invisible, and wait for at least a week to see whether any performance problems appear (remembering that one automatic performance threat would be the increase in workload as yet another index – temporarily – has to be maintained).
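
The sequence might look like the following sketch. The table name dp_tab and the index names are hypothetical (the OTN thread's object names aren't given here); the column list is the replacement suggested above.

```sql
-- 1. Create the proposed replacement index.
create index dp_tab_new_i on dp_tab(
        dp_datetime_date, dp_compid, ddzdt, ct_nxrdh, ct_smsmobilno
);

-- 2. Hide the two candidates from the optimizer.
--    They are still maintained by DML, so nothing is lost yet.
alter index dp_tab_old_i1 invisible;
alter index dp_tab_old_i2 invisible;

-- 3. If performance problems appear, fall back immediately.
alter index dp_tab_old_i1 visible;
alter index dp_tab_old_i2 visible;
```

Only after the trial period has passed without incident would you drop the two invisible indexes.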

The difficulty of eliminating indexes by examination is that it takes a lot of effort to investigate all the possibilities, so you really need some way of choosing a relatively small subset of indexes that might be worth the effort. This brings me to the principal topic of this posting – using segment statistics to help you pick which indexes might be worth the effort.

v$segstat / v$segment_statistics

Oracle records a number of workload statistics for each object in memory. The view v$segstat is an efficient version of these statistics, and v$segment_statistics is a friendlier version that joins v$segstat to tables user$, obj$ and ts$, with a filter against ind$ to turn meaningless numbers into names.

SQL> desc V$segstat
 Name                    Null?    Type
 ----------------------- -------- ----------------
 TS#                              NUMBER
 OBJ#                             NUMBER
 DATAOBJ#                         NUMBER
 STATISTIC_NAME                   VARCHAR2(64)
 STATISTIC#                       NUMBER
 VALUE                            NUMBER

SQL> desc V$segment_statistics
 Name                    Null?    Type
 ----------------------- -------- ----------------
 OWNER                            VARCHAR2(30)
 OBJECT_NAME                      VARCHAR2(30)
 SUBOBJECT_NAME                   VARCHAR2(30)
 TABLESPACE_NAME                  VARCHAR2(30)
 TS#                              NUMBER
 OBJ#                             NUMBER
 DATAOBJ#                         NUMBER
 OBJECT_TYPE                      VARCHAR2(18)
 STATISTIC_NAME                   VARCHAR2(64)
 STATISTIC#                       NUMBER
 VALUE                            NUMBER

For each segment Oracle records the following statistics (according to v$segstat_name – but there are a couple more hidden statistics reported in the underlying x$ksolsstat object):

NAME                             SAMPLED
-------------------------------- -------
logical reads                    YES
buffer busy waits                NO
gc buffer busy                   NO
db block changes                 YES
physical reads                   NO
physical writes                  NO
physical read requests           NO
physical write requests          NO
physical reads direct            NO
physical writes direct           NO
optimized physical reads         NO
optimized physical writes        NO
gc cr blocks received            NO
gc current blocks received       NO
ITL waits                        NO
row lock waits                   NO
space used                       NO
space allocated                  NO
segment scans                    NO

Both Statspack (at level 7) and the AWR report have several “Top N” sections for segment statistics. If we examine these stats for all the indexes on a given table we can get some clues about which indexes are likely to be worth further investigation to see if they could be dropped.
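
If you want the raw figures rather than the “Top N” report sections, a query along the following lines would pull out a few statistics for every index on one table. This is a sketch only, assuming a table called T1 in the current schema. The values are cumulative, so to see the workload for a period you need the difference between two snapshots.

```sql
select
        object_name, statistic_name, value
from
        v$segment_statistics
where
        owner = user
and     object_type = 'INDEX'
and     object_name in (
                select  index_name
                from    user_indexes
                where   table_name = 'T1'
        )
and     statistic_name in (
                'logical reads', 'db block changes', 'physical reads'
        )
order by
        object_name, statistic_name
;
```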

One very simple measure is the number of “physical reads” (which, for indexes, will generally be very similar to “physical read requests”). Since a (real) physical read is generally going to take a significant amount of time, segments with very large numbers of physical reads could be contributing a lot of time to the total database time – so it’s worth knowing why a segment is responsible for so many physical reads and worth cross-checking with v$sql_plan (and its historic equivalents) which statements seem to be using or modifying this index.

Even if it turns out that the index is absolutely necessary, you might still be able to spot opportunities to improve efficiency. If it is subject to a significant number of physical reads it may be that the index is just very large – could you make it smaller by rebuilding it with compression on some of the leading columns; is it an index which (for some reason you can identify) tends to degenerate over time and waste a lot of space, and should you rebuild it occasionally? It might be possible (depending on the predicates used) to re-arrange the column order in such a way that the activity is focused onto a particular section of the index rather than being spread across the entire index – or you could even find that by careful choice of global partitioning (which is legal on even a non-partitioned table) you might be able to isolate the activity to a small section of the index.
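
One way to get a first estimate of whether compression is worth trying is to validate the index and look at Oracle's own suggestion in index_stats. This is a sketch, assuming a hypothetical index called big_idx. Bear in mind that validate structure locks the table against DML while it runs, so this is not something to do casually on a busy production system, and that index_stats holds only the result of the session's most recent validation.

```sql
analyze index big_idx validate structure;

select
        name, lf_rows, lf_blks, pct_used,
        opt_cmpr_count, opt_cmpr_pctsave
from
        index_stats
;

-- If the figures look promising (say opt_cmpr_count = 1 with a
-- worthwhile opt_cmpr_pctsave), rebuild with that prefix length.
alter index big_idx rebuild compress 1;
```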

A more interesting measure, though, comes from comparing the “logical reads” with the number of “db block changes”; and that’s the point of this posting – except that I’ve spent so much time on it already that I’m going to have to write part 2 some time next week.

 

August 7, 2015

CBO catchup

Filed under: 12c,CBO,Oracle,Partitioning — Jonathan Lewis @ 1:10 pm BST Aug 7,2015

It’s interesting to watch the CBO evolving and see how an enhancement in one piece of code doesn’t necessarily echo through to all the other places it seems to fit. Here’s an example of an enhancement that spoiled (or, rather, made slightly more complicated) a little demonstration I had been running for about the last 15  years  – but (in a fashion akin to another partitioning limitation) doesn’t always work in exactly the way you might expect.

I wrote a note some time ago about the way that the optimizer could pre-compute the result of what I called a “fixed subquery” (such as “select 9100 from dual”) and take advantage of the value it derived to do a better job of estimating the cardinality for a query. That’s a neat feature (although it may cause some 3rd party applications a lot of pain as plans change on the upgrade to 11.2.0.4 or 12c) but it doesn’t work everywhere you might hope.

I’m going to create two (small) tables with the same data, but one of them is going to be a simple heap table and the other is going to be partitioned by range; then I’m going to run the same queries against the pair of them and show you the differences in execution plans. First the tables:


create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        generator       v1
where
        rownum <= 1e4
;

create table pt1(
        id, v1, padding
)
partition by range (id) (
        partition p02000 values less than ( 2001),
        partition p04000 values less than ( 4001),
        partition p06000 values less than ( 6001),
        partition p08000 values less than ( 8001),
        partition p10000 values less than (10001)
)
as
select * from t1
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PT1',
                granularity      =>'ALL',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

alter table  t1 add constraint  t1_pk primary key(id);
alter table pt1 add constraint pt1_pk primary key(id) using index local;

create or replace function f(i_in  number)
return number
is
begin
        return i_in;
end;
/

Note that I’ve used ‘ALL’ as my granularity option – for such small tables this should mean that the statistics at the partition and global level are as accurate as they can be. And since the data is defined to be uniform I don’t expect the partitioning to introduce any peculiarities in the optimizer’s calculations of selectivity and cardinality. I’ve created the indexes after gathering stats on the tables – this is 12c (and 11.2.0.4) so the index stats will be collected with a 100% sample as the indexes are created. Finally I’ve created a function that simply returns its numeric input.
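If you want to confirm the comment about the index statistics, a quick check along the following lines should do it. This is only a sketch: it assumes the tables and indexes were created in the current schema with the names used above, and it simply compares sample_size with num_rows in the standard dictionary views.

```sql
-- Index-level statistics: a 100% sample shows sample_size equal to num_rows
select  index_name, num_rows, sample_size, last_analyzed
from    user_indexes
where   table_name in ('T1','PT1')
order by
        index_name
;

-- Partition-level statistics for the local index on the partitioned table
select  index_name, partition_name, num_rows, sample_size
from    user_ind_partitions
where   index_name = 'PT1_PK'
order by
        partition_position
;
```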

Now let’s run a couple of queries against the simple table and check the cardinality (Rows) predicted by the optimizer – the two plans follow the code that generated them:

set serveroutput off

select  max(v1)
from    t1
where   id between (select 500 from dual)
           and     (select 599 from dual)
;

select * from table(dbms_xplan.display_cursor);

select  max(v1)
from    t1
where   id between (select f(500) from dual)
           and     (select f(599) from dual)
;

select * from table(dbms_xplan.display_cursor);

======================
Actual Execution Plans
======================

select max(v1) from t1 where id between (select 500 from dual)
  and     (select 599 from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   101 |  1515 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |   101 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

select max(v1) from t1 where id between (select f(500) from dual)
     and     (select f(599) from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    25 |   375 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |    45 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

In the first plan the optimizer has recognised the values 500 and 599, so its range-based calculation has produced a (matching, and nearly correct) prediction of 101 rows. In the second plan the function call has hidden the values so the optimizer has had to use the arithmetic for “ranges with unknown values” – which means it uses guesses for the selectivity of 0.45% for the index and 0.25% for the table. Maybe in a future release that f(500) will be evaluated in the same way that we can trigger in-list calculation with the precompute_subquery hint.
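The three estimates can be reproduced with a little arithmetic. The sketch below assumes the usual formula for a closed range with known values (required range divided by total range, plus 1/num_distinct for each closed end) and the fixed guesses of 0.0045 and 0.0025 for ranges with unknown values; the 10,000 rows and the low/high values of 1 and 10,000 come from the way the table was created.

```sql
select
        round(10000 * ((599 - 500) / (10000 - 1) + 2/10000))    known_range_rows,       -- 101
        round(10000 * 0.0045)                                   unknown_range_index,    -- 45
        round(10000 * 0.0025)                                   unknown_range_table     -- 25
from
        dual
;
```

The "+ 2/10000" term is what pushes the first estimate from 99 to 101, which is why the prediction is nearly, but not exactly, the 100 rows that the query actually finds.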

Now we repeat the query, but using the partitioned table – showing only the trimmed output from dbms_xplan.display_cursor():

select max(v1) from pt1 where id between (select 500 from dual)
   and     (select 599 from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     4 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |   101 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

select max(v1) from pt1 where id between (select f(500) from dual)
      and     (select f(599) from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     3 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |    45 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

It’s great to see that the predicted cardinalities match the simple heap version exactly – but can you see anything odd about either of these plans ?

 

 

Pause for thought …

 

 

There’s nothing odd about the second plan but there’s a little puzzle in the first.

In theory, it seems, the optimizer is aware that the first query covers the range 500 – 599; so why are the pstart/pstop columns for operations 2-4 showing KEY-KEY? That usually means the optimizer knows that it will have some partition key values at run-time and will be able to do run-time partition elimination, but it doesn’t know what those key values are at parse time.

In this very simple case it’s (probably) not going to make any difference to the performance – but it may be worth some careful experimentation in more complex cases where you might have been hoping to get strict identification of partitions and partition-wise joins taking place. Yet another topic to put on the todo list of “pre-emptive investigations” with a reminder to re-run the tests from time to time.
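One simple experiment of this kind is to rerun the query with literal values and compare the pstart/pstop columns. This is a sketch rather than a reported result: with literals I would expect the optimizer to identify the partition at parse time (a PARTITION RANGE SINGLE with pstart/pstop of 1 and 1, since ids 500 to 599 all fall in partition p02000), but that is an assumption you should confirm on your own version.

```sql
set serveroutput off

-- Same range, but supplied as literals rather than as fixed subqueries
select  max(v1)
from    pt1
where   id between 500 and 599
;

select * from table(dbms_xplan.display_cursor);
```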

 

 

August 5, 2015

Compression

Filed under: compression,Infrastructure,Oracle — Jonathan Lewis @ 1:06 pm BST Aug 5,2015

Originally published Jan 2013

Red Gate have asked me to write a few articles for their Oracle site, so I’ve sent them a short series on “traditional” compression in Oracle – which means I won’t be mentioning Exadata hybrid columnar compression (HCC a.k.a. EHCC). There will be five articles, published at the rate of one per week starting Tuesday (15th Jan). I’ll be supplying links for them as they are published.

Updated Aug 2015

Since I’ve got a little catalogue on compression started, I decided to add some items from my blog to the list, starting with a six-part series on HCC and compression units that I wrote (mostly) in the summer of 2012.

There are a few other items on my blog that I’ll add to this list eventually.

August 3, 2015

Demo data

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 1:26 pm BST Aug 3,2015

One of the articles I wrote for redgate’s AllthingsOracle site some time ago included a listing of the data distribution for some client data which I had camouflaged. A recent comment on the article asked how I had generated the data – of course the answer was that I hadn’t generated it, but I had done something to take advantage of its existence without revealing the actual values. This article is just a little note showing what I did; it’s not intended as an elegant and stylish display of perfectly optimised SQL, it’s an example of a quick and dirty one-off hack that wasn’t (in my case) a disaster to run.

I’ve based the demonstration on the view all_objects. We start with a simple query showing the distribution of the values of column object_type:


break on report
compute sum of count(*) on report

select
        object_type, count(*)
from
        all_objects
group by object_type
order by
        count(*) desc
;

OBJECT_TYPE           COUNT(*)
------------------- ----------
SYNONYM                  30889
JAVA CLASS               26447
...
JAVA RESOURCE              865
TRIGGER                    509
JAVA DATA                  312
...
JAVA SOURCE                  2
DESTINATION                  2
LOB PARTITION                1
EDITION                      1
MATERIALIZED VIEW            1
RULE                         1
                    ----------
sum                      76085

44 rows selected.

Starting from this data set I want 44 randomly generated strings and an easy way to translate the actual object type into one of those strings. There are various ways to do this but the code I hacked out put the original query into an inline view, surrounded it with a query that added a rownum to the result set to give each row a unique id, then used the well-known and much-loved “connect by level” query against dual to generate a numbered list of randomly generated strings as an inline view that I could use in a join to do the translation.


execute dbms_random.seed(0)

column random_string format a6

select
        generator.id,
        dbms_random.string('U',6)       random_string,
        sum_view.specifier,
        sum_view.ct                     "COUNT(*)"
from
        (
        select
                rownum  id
                from    dual
                connect by
                        level <= 100
        )       generator,
        (
        select
                rownum          id,
                specifier,
                ct
        from
                (
                select
                        object_type specifier, count(*) ct
                from
                        all_objects
                group by
                        object_type
                order by
                        count(*) desc
                )
        )       sum_view
where
        sum_view.id = generator.id
order by
        ct desc
;

        ID RANDOM SPECIFIER             COUNT(*)
---------- ------ ------------------- ----------
         1 BVGFJB SYNONYM                  30889
         2 LYYVLH JAVA CLASS               26447
...
         9 DNRYKC JAVA RESOURCE              865
        10 BEWPEQ TRIGGER                    509
        11 UMVYVP JAVA DATA                  312
...
        39 EYYFUJ JAVA SOURCE                  2
        40 SVWKRC DESTINATION                  2
        41 CFKBRX LOB PARTITION                1
        42 ZWVEVH EDITION                      1
        43 DDAHZX MATERIALIZED VIEW            1
        44 HFWZBX RULE                         1
                                      ----------
sum                                        76085

44 rows selected.

I’ve selected the id and original value here to show the correspondence, but didn’t need to show them in the original posting. I’ve also left the original (now redundant) “order by” clause in the main inline view, and you’ll notice that even though I needed only 44 distinct strings for the instance I produced the results on I generated 100 values as a safety margin for testing the code on a couple of other versions of Oracle.

A quick check for efficiency – a brief glance at the execution plan, which might have prompted me to add a couple of /*+ no_merge */ hints if they’d been necessary – showed that the work done was basically the work of the original query plus a tiny increment for adding the rownum and doing the “translation join”. Of course, if I’d then wanted to translate the full 76,000 row data set and save it as a table I’d have to join the result set above back to a second copy of all_objects – and it’s translating full data sets, while trying to deal with problems of referential integrity and correlation, where the time disappears when masking data.
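As a rough illustration of that "join back" step, here is one way it might be done. This is a sketch under assumptions, not the code I used: the table names obj_type_map and masked_objects are hypothetical, the translation is saved as a table first so that each object type gets exactly one random string, and nothing here guards against two types being given the same string (unlikely with six characters, but worth checking).

```sql
-- obj_type_map is a hypothetical name: one random string per distinct object type
create table obj_type_map
as
select
        specifier,
        dbms_random.string('U',6)       random_string
from    (
        select  distinct object_type specifier
        from    all_objects
        )
;

-- masked_objects is a hypothetical name: the full data set with the type translated
create table masked_objects
as
select
        ao.object_id,
        otm.random_string       masked_type
from
        all_objects     ao,
        obj_type_map    otm
where
        otm.specifier = ao.object_type
;
```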

It is a minor detail of this code that it produced fixed length strings (which matched the structure of the original client data). Had I felt the urge I might have used something like: dbms_random.string('U',trunc(dbms_random.value(4,21))) to give me a random distribution of string lengths between 4 and 20. Getting fussier I might have extracted the distinct values for object_type and then generated a random string that matched the length of the value it was due to replace. Fussier still I might have generated the right number of random strings matching the length of the longest value, sorted the original and random values into alphabetical order to align them, then trimmed each random value to the length of the corresponding original value.
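The first two of those variations might look something like the following sketch, which again uses all_objects as the source. The column aliases are just illustrative names.

```sql
select
        object_type,
        -- random length between 4 and 20: value(4,21) is >= 4 and < 21 before the trunc()
        dbms_random.string('U', trunc(dbms_random.value(4,21)))        random_length,
        -- same length as the value being replaced
        dbms_random.string('U', length(object_type))                   matched_length
from    (
        select  distinct object_type
        from    all_objects
        )
;
```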

It’s extraordinary how complicated it can be to mask data realistically – even when you’re looking at just one column in one table. And here’s a related thought – if an important type of predicate in the original application with the original data is where object_type like 'PACK%' how do you ensure that your masked data is consistent with the data that would be returned by this query and how do you determine the value to use instead of “PACK” as the critical input when you run the critical queries against the masked data ? (Being privileged may give you part of the answer, but bear in mind that the people doing the testing with that data shouldn’t be able to see the unmasked data or any translation tables.)

 

 

 

July 29, 2015

Existence

Filed under: Execution plans,Oracle,subqueries,Subquery Factoring,Tuning — Jonathan Lewis @ 1:05 pm BST Jul 29,2015

A recent question on the OTN Database Forum asked:

I need to check if at least one record present in table before processing rest of the statements in my PL/SQL procedure. Is there an efficient way to achieve that considering that the table is having huge number of records like 10K.

I don’t think many readers of the forum would consider 10K to be a huge number of records; nevertheless it is a question that could reasonably be asked, and should prompt a little discussion.

First question to ask, of course is: how often do you do this and how important is it to be as efficient as possible. We don’t want to waste a couple of days of coding and testing to save five seconds every 24 hours. Some context is needed before charging into high-tech geek solution mode.

Next question is: what’s wrong with writing code that just does the job, and if it finds that the job is complete after zero rows then you haven’t wasted any effort. This seems reasonable in (say) a PL/SQL environment where we might discuss the following pair of strategies:


Option 1:
=========
-- execute a select statement to see if any rows exist

if (flag is set to show rows) then
    for r in (select all the rows) loop
        do something for each row
    end loop;
end if;

Option 2:
=========
for r in (select all the rows) loop
    do something for each row;
end loop;

If this is the type of activity you have to do then it does seem reasonable to question the sense of putting in an extra statement to see if there are any rows to process before processing them. But there is a possible justification for doing this. The query to find just one row may produce a very efficient execution plan, while the query to find all the rows may have to do something much less efficient even when (eventually) it finds that there is no data. Think of the differences you often see between a first_rows_1 plan and an all_rows plan; think about how Oracle can use index-only access paths and table elimination – if you’re only checking for existence you may be able to produce a MUCH faster plan than you can for selecting the whole of the first row.

Next question, if you think that there is a performance benefit from the two-stage approach: is the performance gain worth the cost (and risk) of adding a near-duplicate statement to the code – that’s two statements that have to be maintained every time you make a change. Maybe it’s worth “wasting” a few seconds on every execution to avoid getting the wrong results (or an odd extra hour of programmer time) once every few months. Bear in mind, also, that the optimizer now has to optimize two statements instead of one – you may not notice the extra CPU usage in testing but perhaps in the live environment the execution benefit will be eroded by the optimization cost.

Next question, if you still think that the two-stage process is a good idea: will it result in an inconsistent database state ?! If you select and find a row, then run and find that there are no rows to process because something modified and “hid” the row you found on the first pass – what are you going to do? Will this make the program crash ? Will it produce an erroneous result on this run, or will a silent side effect be that the next run will produce the wrong results? (See Billy Verreynne’s comment on the original post). Should you set the session to “serializable” before you start the program, or maybe lock a critical table to make sure it can’t change?
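For reference, the two defensive options mentioned at the end of that paragraph might be coded as follows. This is only a sketch: t1 stands in for whatever table your driving query depends on, and both options have side effects (a serializable transaction that tries to change data modified by someone else can fail with ORA-08177, and a table lock will block other sessions' DML) that you would want to think through before using either.

```sql
-- Option A: make every query in the transaction see the data as of one point in time
-- (this has to be the first statement of the transaction)
set transaction isolation level serializable;

-- Option B: stop other sessions changing the critical table until we commit or roll back
-- (t1 is a stand-in for the table the driving query depends on)
lock table t1 in share mode;
```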

So, assuming you’ve decided that some form of “check for existence then do the job” is both desirable and safe, what’s the most efficient strategy? Here’s one of the smarter solutions that minimises risk and effort (in this case using a pl/sql environment).


select  count(*)
into    m_counter
from    dual
where   exists ({your original driving select statement})
;

if m_counter = 0 then
    null;
else
    for c1 in ({your original driving select statement}) loop
        -- do whatever
    end loop;
end if;

The reason I describe this solution as smarter, with minimum risk and effort, is that (a) you use EXACTLY the same SQL statement in both locations so there should be no need to worry about making the same effective changes twice to two slightly different bits of SQL and (b) the optimizer will recognise the significance of the existence test and run in first_rows_1 mode with maximum join elimination and avoidance of redundant table visits. Here’s a little data set I can use to demonstrate the principle:


create table t1
as
select
        mod(rownum,200)         n1,     -- scattered data
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 10000
;

delete from t1 where n1 = 100;
commit;

create index t1_i1 on t1(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                cascade => true,
                method_opt => 'for all columns size 1'
        );
end;
/

It’s just a simple table with index, but the index isn’t very good for finding the data – it’s repetitive data widely scattered through the table: 10,000 rows with only 200 distinct values. But check what happens when you do the dual existence test – first we run our “driving” query to show the plan that the optimizer would choose for it, then we run with the existence test to show the different strategy the optimizer takes when the driving query is embedded:


alter session set statistics_level = all;

select  *
from    t1
where   n1 = 100
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

select  count(*)
from    dual
where   exists (
                select * from t1 where n1 = 100
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

Notice how I’ve enabled rowsource execution statistics and pulled the execution plans from memory with their execution statistics. Here they are:


select * from t1 where n1 = 100

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    38 (100)|      0 |00:00:00.01 |     274 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |     50 |    38   (3)|      0 |00:00:00.01 |     274 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=100)

select count(*) from dual where exists (   select * from t1 where n1 = 100  )

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |     3 (100)|      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |            |      1 |00:00:00.01 |       2 |
|*  2 |   FILTER           |       |      1 |        |            |      0 |00:00:00.01 |       2 |
|   3 |    FAST DUAL       |       |      0 |      1 |     2   (0)|      0 |00:00:00.01 |       0 |
|*  4 |    INDEX RANGE SCAN| T1_I1 |      1 |      2 |     1   (0)|      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   4 - access("N1"=100)

For the original query the optimizer did a full tablescan – that was the most efficient path. For the existence test the optimizer decided it didn’t need to visit the table for “*” and it would be quicker to use an index range scan to access the data and stop after one row. Note, in particular, that the scan of the dual table didn’t even start – in effect we’ve got all the benefits of a “select {minimum set of columns} where rownum = 1” query, without having to work out what that minimum set of columns was.
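Putting the template and the demonstration table together, a complete anonymous block might look like the sketch below. It assumes the t1 table created above and uses the same driving query in both places; the "do whatever" step is left as a null statement.

```sql
declare
        m_counter       number;
begin
        -- existence test: count(*) from dual returns 0 or 1, never "no data found"
        select  count(*)
        into    m_counter
        from    dual
        where   exists (
                        select * from t1 where n1 = 100
                )
        ;

        if m_counter > 0 then
                for c1 in (select * from t1 where n1 = 100) loop
                        null;   -- do whatever
                end loop;
        end if;
end;
/
```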

But there’s an even more cunning option – remember that we didn’t scan dual when there were no matching rows:


for c1 in (

        with driving as (
                select  /*+ inline */
                        *
                from    t1
        )
        select  /*+ track this */
                *
        from
                driving d1
        where
                n1 = 100
        and     exists (
                        select
                                *
                        from    driving d2
                        where   n1 = 100
                )
) loop

    -- do your thing

end loop;

In this specific case the subquery would automatically go inline, so the hint here is actually redundant; in general you’re likely to find the optimizer materializing your subquery and bypassing the cunning strategy if you don’t use the hint. (One of the cases where subquery factoring doesn’t automatically materialize is when you have no WHERE clause in the subquery.)

Here’s the execution plan pulled from memory (after running this SQL through an anonymous PL/SQL block):


SQL_ID  7cvfcv3zarbyg, child number 0
-------------------------------------
WITH DRIVING AS ( SELECT /*+ inline */ * FROM T1 ) SELECT /*+ track
this */ * FROM DRIVING D1 WHERE N1 = 100 AND EXISTS ( SELECT * FROM
DRIVING D2 WHERE N1 = 100 )

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |    39 (100)|      0 |00:00:00.01 |       2 |
|*  1 |  FILTER            |       |      1 |        |            |      0 |00:00:00.01 |       2 |
|*  2 |   TABLE ACCESS FULL| T1    |      0 |     50 |    38   (3)|      0 |00:00:00.01 |       0 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |      1 |      2 |     1   (0)|      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("T1"."N1"=100)
   3 - access("T1"."N1"=100)

You’ve got just one statement – and you’ve only got one version of the complicated text because you put it into a factored subquery; but the optimizer manages to use one access path for one instantiation of the text and a different one for the other. You get an efficient test for existence and only run the main query if some suitable data exists, and the whole thing is entirely read-consistent.
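If you want to repeat the experiment, the “track this” pseudo-hint makes it easy to find the cursor after running the PL/SQL block. The following is a sketch that assumes you have select privileges on v$sql and that rowsource execution statistics were enabled (statistics_level = all) when the block ran; the sql_id shown is the one from my system and will probably be different on yours.

```sql
-- Find the cursor for the statement embedded in the PL/SQL block
select  sql_id, child_number
from    v$sql
where   sql_text like '%track this%'
and     sql_text not like '%v$sql%'     -- exclude this query itself
;

-- Pull the plan from memory using the sql_id and child_number reported above
select * from table(dbms_xplan.display_cursor('7cvfcv3zarbyg', 0, 'allstats last cost'));
```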

I have to say, though, I can’t quite make myself 100% enthusiastic about this code strategy – there’s just a nagging little doubt that the optimizer might come up with some insanely clever trick to try and transform the existence test into something that’s supposed to be faster but does a lot more work; but maybe that’s only likely to happen on an upgrade, which is when you’d be testing everything very carefully anyway (wouldn’t you) and you’ve got the “dual/exists” fallback position if necessary.

Footnote:

Does anyone remember the thing about reading execution plans “first child first” – this existence test is one of the interesting cases where it’s not the first child of a parent operation that runs first: it’s the case I call the “constant subquery”.

July 27, 2015

Subquery Factoring (10)

Filed under: Bugs,CBO,Oracle,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 1:26 pm BST Jul 27,2015

What prompted me to write my previous note about subquerying was an upgrade to 12c, and a check that a few critical queries would not do something nasty on the upgrade. As ever, it’s interesting how many little oddities you can discover while looking closely at some little detail of how the optimizer works. Here’s an oddity that came up in the course of my investigation in 12.1.0.2 – first some sample data:


create table t1
nologging
as
select * from all_objects;

create index t1_i1 on t1(owner) compress nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns owner size 254'
        );
end;
/

The all_objects view is convenient as a tool for modelling what I wanted to do since it has a column with a small number of distinct values and an extreme skew across those values. Here’s a slightly weird query that shows an odd costing effect:


with v1 as (
        select /*+ inline */ owner from t1 where owner > 'A'
)
select count(*) from v1 where owner = 'SYS'
union all
select count(*) from v1 where owner = 'SYSTEM'
;

Since the query uses the factored subquery twice and there’s a predicate on the subquery definition, I expect to see materialization – and that’s what happens (even though I’ve engineered the query so that materialization is more expensive than executing inline). Here are the two plans from 12.1.0.2 (the same pattern appears in 11.2.0.4, though the costs are a little less across the board):


=======================
Unhinted (materializes)
=======================

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     2 |   132 |    25  (20)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661B_876C2CB |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN    | T1_I1                      | 85084 |   498K|    21  (15)| 00:00:01 |
|   4 |   UNION-ALL                |                            |       |       |            |          |
|   5 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  6 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
|   8 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  9 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

=============
Forced inline
=============

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     2 |    12 |    22  (14)| 00:00:01 |
|   1 |  UNION-ALL             |       |       |       |            |          |
|   2 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  3 |    INDEX FAST FULL SCAN| T1_I1 | 38784 |   227K|    21  (15)| 00:00:01 |
|   4 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  5 |    INDEX RANGE SCAN    | T1_I1 |   551 |  3306 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

I’m not surprised that the optimizer materialized the subquery – as I pointed out in my previous article, the choice seems to be rule-based (heuristic) rather than cost-based. What surprises me is that the cost for the default plan is not self-consistent – the optimizer seems to have lost the cost of generating the temporary table. The cost of the materialized query plan looks as if it ought to be 21 + 13 + 13 = 47. Even if the optimizer were smart enough to assume that the temporary table would be in the cache for the second scan (and therefore virtually free to access) we ought to see a cost of 21 + 13 = 34. As it is we have a cost of 25, which is 13 + 13 (or, if you check the 10053 trace file, 12.65 + 12.65, rounded).
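
A throwaway query against dual makes that arithmetic explicit. The figures are the ones quoted from the plan and the 10053 trace; the column aliases are illustrative names only:

```sql
-- Figures taken from the materialized plan above; aliases are illustrative only
select
        21 + 13 + 13            load_plus_two_scans,    -- 47: load the temp table, then scan it twice
        21 + 13                 load_plus_one_scan,     -- 34: if the second scan were treated as free
        round(12.65 + 12.65)    two_scans_only          -- 25: the cost the plan actually reports
from
        dual
;
```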

Since the choice to materialize doesn’t seem to be cost-based (at present) this doesn’t really matter – but it’s always nice to see, and be able to understand, self-consistent figures in an execution plan.

Footnote

It is worth pointing out as a side note that materialization can actually be more expensive than running inline, even for very simple examples. Over recent releases subquery factoring seems to have become more robust, with more consistent execution plans when the subqueries are put back inline, but you still need to think a little before rewriting a query for cosmetic (i.e. totally valid “readability”) reasons, and check whether the resulting query is going to produce an unexpected, and unexpectedly expensive, materialization.

July 24, 2015

Subquery Factoring (9)

Filed under: CBO,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 12:34 pm BST Jul 24,2015

Several years ago (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where even with two uses of a factored subquery Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:


create table t1
as
select
        object_id, data_object_id, created, object_name, rpad('x',1000) padding
from
        all_objects
where
        rownum <= 10000
;

exec dbms_stats.gather_table_stats(user,'T1')

explain plan for
with gen as (
        select /*+ materialize */ object_id, object_name from t1
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);

You’ll notice that my original table has very wide rows, but my factored subquery selects a “narrow” subset of those rows. My target is to have an example where doing a tablescan is very expensive but the temporary table holding the extracted data is much smaller and cheaper to scan.

I’ve included a materialize hint in the SQL above, but you need to run the code twice, once with, and once without the hint. Here are the two plans – unhinted first:


============================
Unhinted - won't materialize
============================

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   468K|   428   (2)| 00:00:03 |
|*  1 |  HASH JOIN         |      | 10000 |   468K|   428   (2)| 00:00:03 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
---------------------------------------------------------------------------

==================================
Hinted to materialize - lower cost
==================================

--------------------------------------------------------------------------------------------------------- 
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |                            | 10000 |   585K|   227   (2)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6664_9DAAEB7 |       |       |            |          | 
|   3 |    TABLE ACCESS FULL       | T1                         | 10000 |   234K|   214   (2)| 00:00:02 | 
|*  4 |   HASH JOIN                |                            | 10000 |   585K|    13   (8)| 00:00:01 | 
|   5 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 | 
|   7 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Clearly the optimizer isn’t considering the costs involved. If I add the predicate “where object_id > 0” (which identifies ALL the rows in the table), materialization occurs unhinted (with the same costs reported as for the hinted plan above). My tentative conclusion is that the transformation is a heuristic one that follows the rule “two or more appearances of the subquery and some indication of row selection in the subquery rowsource”. (In fact if the rowsource is “select * from pipeline_function” the requirement for subsetting doesn’t seem to apply.)
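
As a sketch of that test, here is the earlier statement with the materialize hint removed and the always-true predicate added; nothing else is changed. The result may vary with version, so treat it as something to try rather than a guaranteed outcome:

```sql
explain plan for
with gen as (
        select  object_id, object_name
        from    t1
        where   object_id > 0           -- true for every row, but it looks like row selection
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);
```

If the heuristic applies, the plan should show the TEMP TABLE TRANSFORMATION operation even though no hint is present.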

The plans above came from 11.2.0.4 but I got the same result, with a slight difference in costs, in 12.1.0.2. It’s worth pointing out that despite Oracle apparently ignoring the costs when deciding whether or not to materialize, it still seems to report self-consistent values after materialization: the 227 for the plan above is the 214 for creating the temporary table plus the 13 for deriving the hash join of the two copies of the temporary table.

July 22, 2015

Invalidation

Filed under: ANSI Standard,Bugs,Oracle — Jonathan Lewis @ 12:45 pm BST Jul 22,2015

Someone who attended my sessions at the Bucharest Oracle Summit earlier this year sent me an example of a quirky little bug, possibly related to the newer “fine-grained” invalidation mechanisms, possibly related to ANSI syntax SQL, that’s very easy to reproduce. (That’s always nice for Oracle support – a perfect test case.)

All it takes is two tables and a packaged procedure that queries those tables. The package is coded to do something that should not be allowed in production code; but “should not” and “is not” are very different things. For anyone who wants to play with the example, here’s the script to create the necessary objects:


drop package pkg_test;
drop table t2 purge;
drop table t1 purge;

create table t1 (id1 number, val1 varchar2(10));
create table t2 (id2 number, val2 varchar2(10));

insert into t1 values(1,rpad('x',10,'x'));
insert into t2 values(1,rpad('x',10,'x'));

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

create or replace package pkg_test is
   procedure pr_call;
end pkg_test;
/

create or replace package body pkg_test as

   procedure pr_call is

      cursor cur_ids is
         select *                   -- Naughty !
           from t1
           join t2
             on t2.id2 = t1.id1
         ;

      rec_id cur_ids%rowtype := null;

   begin
      open cur_ids;
      fetch cur_ids into rec_id;
      close cur_ids;
      dbms_output.put_line(rec_id.val1 || '-' || rec_id.val2);

   exception
      when others then
         if cur_ids%isopen then
            close cur_ids;
         end if;
         raise;

   end pr_call;

end pkg_test;
/

Having created the procedure I’m now going to call it – and then add a column to table t1. What’s that going to do to a packaged procedure with a “select *”?

Pause for thought …

Here’s some SQL to run the test.


set serveroutput on

prompt   *** Make a first call to the procedure: no error ***
execute  pkg_test.pr_call

prompt   *** add a column to one of the tables
alter table t1 add col_test varchar2(20);

prompt   *** Make two more calls to the procedure: ouch! ***
execute  pkg_test.pr_call
execute  pkg_test.pr_call

prompt  *** Recompile before a third call ***
execute  dbms_ddl.alter_compile('package body', user, 'pkg_test')
execute  pkg_test.pr_call

Unless I’ve managed to cut-n-paste the wrong bits of code, you would have got the following error for the 2nd and 3rd calls to the package:


BEGIN
*
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got -
ORA-06512: at "TEST_USER.PKG_TEST", line 25
ORA-06512: at line 2

The package body should (I believe) have invalidated and recompiled itself for the second execution, and even if it failed on the first attempt surely it should have invalidated itself on the ORA-00932 and recompiled itself and succeeded on the third execution. (If you remove the exception clause you’ll find that the error is initially raised at the fetch, by the way.)
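
One way to see whether the package body is ever marked invalid is to query user_objects between the calls. This is a suggested check, not part of the original test script:

```sql
-- Run after the "alter table" and again after each failing call
select
        object_name, object_type, status, last_ddl_time
from
        user_objects
where
        object_name = 'PKG_TEST'
order by
        object_type
;
```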

If we change the “select *” to explicitly name the columns we want, viz: “select t1.id1, t1.val1, t2.id2, t2.val2”, we don’t get the ORA-00932 errors (just as we would probably expect). What we might not expect is that the errors also disappear if we leave the “select *” in place but change the query from ANSI syntax to traditional Oracle syntax.
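
For reference, the two variants of the cursor query described above might look like this; only the cursor definition inside pr_call changes, and the layout is illustrative:

```sql
-- Variant 1: ANSI join with an explicit column list
select  t1.id1, t1.val1, t2.id2, t2.val2
from    t1
join    t2
on      t2.id2 = t1.id1
;

-- Variant 2: "select *" left in place, traditional Oracle join syntax
select  *
from    t1, t2
where   t2.id2 = t1.id1
;
```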

Footnote:

Obviously you shouldn’t use the lazy “*” notation in any production code – it can cause several different problems (including the dangers of “whoops, I didn’t mean to make that one invisible”) – but if you do you may find that you end up with packaged procedures that crash for no apparent reason until you recompile them. Perhaps ORA-00932 is the only possible error message, but maybe it’s possible to cause other errors to appear. Even worse, though I haven’t tried to force it yet, you may find that you can construct cases where the package reports no error but modifies the wrong data.

I’ve tested this code on versions 11.2.0.4 and 12.1.0.2 and see the same results on both.
