Oracle Scratchpad

July 16, 2018

Direct IOT

Filed under: 12c,Infrastructure,IOT,Oracle — Jonathan Lewis @ 1:02 pm BST Jul 16,2018

A recent (automatic ?) tweet from Connor McDonald highlighted an article he’d written a couple of years ago about an enhancement introduced in 12c that allowed for direct path inserts to index organized tables (IOTs). The article included a demonstration that seemed to suggest that direct path loads to IOTs were of no benefit, and ended with the comment (which could be applied to any Oracle feature): “Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.”

Clearly it’s necessary to pose the question – “so when would direct mode insert be a good option for IOTs?” – because if it’s never a good option you have to wonder why it has been implemented. This naturally leads on to thinking about which tests have not yet been done – what aspects of IOTs did Connor not get round to examining in his article. (That’s a standard principle of trouble-shooting, or testing, or investigation: when someone shows you a test case (or when you think you’ve finished testing) one thing you should do before taking the results as gospel is to ask yourself what possible scenarios have not been covered by the test.)

So if you say IOT what are the obvious tests once you’ve got past the initial step of loading the IOT and seeing what happens? First, I think, would be “What if the IOT weren’t empty before the test started”; second would be “IOTs can have overflow segments, what impact might one have?”; third would be “Do secondary indexes have any effects?”; finally “What happens with bitmap indexes and the requirement for a mapping table?” (Then, of course, you can worry about mixing all the different possibilities together – but for the purposes of this note I’m just going to play with two simple examples: non-empty starting tables, and overflow segments.)

Here’s some code to define a suitable table:


create table t2 
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	3 * rownum			id,
	lpad(rownum,10,'0')		v1,
	lpad('x',50,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
order by
	dbms_random.value
;

begin
	dbms_stats.gather_table_stats(
		ownname     => null,
		tabname     => 'T2',
		method_opt  => 'for all columns size 1'
	);
end;
/

create table t1(
	id,
	v1,
	padding,
	constraint t1_pk primary key(id)
)
organization index
-- including v1
-- overflow
nologging
as
select * from t2
;

begin
	dbms_stats.gather_table_stats(
		ownname     => null,
		tabname     => 'T1',
		method_opt  => 'for all columns size 1'
	);
end;
/

I’ve created a heap table t2 that holds 100,000 rows with an id column that is randomly ordered; then I’ve used this table as a source to create an IOT, with the option to have an overflow segment that holds just the padding column. I’ve used 3 * rownum to define the id column for t2 so that when I insert another copy of t2 into t1 I can add 1 (or 2) to the id and interleave the new data with the old data. (That’s another thought about IOT testing – are you loading your data in a pre-existing order that suits the IOTs or is it arriving in a way that’s badly out of order with respect to the IOT ordering; and does your data go in above the current high value, or spread across the whole range, or have a partial overlap with the top end of the range and then run on above it?)

Having created the starting data set, here’s the test:


execute snap_my_stats.start_snap
execute snap_events.start_snap

insert 
	/*  append */
into t1
select
	id + 1, v1, padding
from
	t2
;


execute snap_events.end_snap
execute snap_my_stats.end_snap

All I’m doing is using a couple of my snapshot packages to check the work done and time spent while inserting 100,000 interleaved rows – which are supplied out of order – into the existing table. As shown the “append” is a comment, not a hint, so I’ll be running the test case a total of 4 times: no overflow, with and without the hint – then with the overflow, with and without the hint. (Then, of course, I could run the test without the overflow but with an index on v1).
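
For the direct path runs the only change is that the comment becomes a real hint, i.e.:

insert
	/*+ append */
into t1
select
	id + 1, v1, padding
from
	t2
;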

Here are some summary figures from the tests – first from the test without an overflow segment:

                                      Unhinted       With Append
                                  ============      ============
CPU used when call started                 153               102
CPU used by this session                   153               102
DB time                                    166               139

redo entries                           130,603            42,209
redo size                           78,315,064        65,055,376

sorts (rows)                                30           100,031

You’ll notice that with the /*+ append */ hint in place there’s a noticeable reduction in redo entries and CPU time, but this has been achieved at a cost of sorting the incoming data into order. The reduction in redo (entries and size) is due to an “array insert” effect that Oracle can take advantage of with the delayed index maintenance that takes place when the append hint is legal (See the section labelled Option 4 in this note). So even with an IOT with no overflow there’s a potential benefit to gain from direct path loading that depends on how much the new data overlaps the old data, and there’s a penalty that depends on the amount of sorting you’d have to do.
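
In passing, switching to the overflow tests needs nothing more than reinstating the two commented clauses in the original create table statement, so that only the id and v1 columns stay in the index (“TOP”) segment and the padding column drops into the overflow – a sketch of the revised definition:

create table t1(
	id,
	v1,
	padding,
	constraint t1_pk primary key(id)
)
organization index
including v1
overflow
nologging
as
select * from t2
;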

What happens in my case when I move the big padding column out to an overflow segment – here are the equivalent results:


Headline figures                      Unhinted       With Append
================                  ============      ============
CPU used when call started                 158                52
CPU used by this session                   158                52
DB time                                    163                94
redo entries                           116,669            16,690
redo size                           51,392,748        26,741,868
sorts (memory)                               4                 5
sorts (rows)                                33           100,032

Interestingly, comparing the unhinted results with the previous unhinted results, there’s little difference in the CPU usage between having the padding column in the “TOP” section of the IOT and having it in the overflow segment, though there is a significant reduction in redo (the index entries are still going all over the place one by one, but the overflow blocks are being pinned and packed much more efficiently). The difference between having the append hint or not, though, is dramatic: one third of the CPU time (despite still having 100,000 rows to sort), and half the redo. One of the side effects of the overflow, of course, is that the things being sorted are much shorter (only the id and v1 columns that go into the TOP section, and not the whole IOT row).

So, if you already have an overflow segment that caters for a significant percentage of the row, it looks as if the benefit you could get from using the /*+ append */ hint would far outweigh the penalty of sorting you have to pay. Of course, an IOT with a large overflow doesn’t look much different from a heap table with an index – so perhaps that result isn’t very surprising.

I’ll close by re-iterating Connor’s closing comment:

Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.

Before you dive in and embrace it, or ruthlessly push it to one side, make sure you do some testing that reflects the situations you have to handle.

July 14, 2018

Quiz Night

Filed under: Execution plans,Oracle,Performance,sorting — Jonathan Lewis @ 7:07 pm BST Jul 14,2018

Because it’s been a long time since the last quiz night. Here’s a question prompted by a recent thread on the ODC database forum – how many rows will Oracle sort (assuming you have enough rows to start with in all_objects) for the final query, and how many sort operations will that take ?


drop table t1 purge;

create table t1 nologging as select * from all_objects where rownum < 50000;

select owner, count(distinct object_type), count(distinct object_name) from t1 group by owner;

Try to resist the temptation of doing a cut-n-paste and running the code until after you’ve thought about the answer.

And the answer is:

It was nice to see a few ideas being volunteered in response to this question; I think that getting a diverse set of comments makes a nice point about how it’s always worth spending a little time to think along the lines of: “If I do X how might Oracle handle it”. Having the ideas before trying to check the effects can make it a lot easier to understand what’s happening, and sometimes how to take advantage of what Oracle does do to improve the way you design a query.

The first point to make, as Michael D O’Shea  pointed out in comment #2, is that computer systems don’t usually “sort data” – they tend to create pointers to data and shuffle the pointers in some way. In Oracle’s case “sorting” used to mean inserting pointers into a balanced binary tree, and aggregating used to be a case of accumulating values at the leaf nodes of the insertion tree. In 10g Oracle then introduced a new sorting algorithm that often works more efficiently than the binary insertion tree. I’m still going to refer to the binary tree method as “sorting”, though.

Looking at the query, we can see that there is no “order by” clause so it’s possible that Oracle will do whatever it does using hash aggregation throughout and no sorting, but that leaves open the question of how a hash table on owner can also record a distinct count of both object_type and object_name because every single owner hash bucket would have to link to its own hash tables for object_type and object_name and do a sort of “recursive hash aggregation” which starts to sound a little complicated. Maybe the alternative suggested by Kaley in comment #1 is closer to the truth – maybe Oracle just “buckets” all the data by owner and then sorts within each owner twice to do the count distincts, but then we’re still going to be hanging on to a lot of data, doing a two-level open-ended process.

Having waved hands for a little bit to try and head in the direction of possible solutions we need to look for clues that tell us whether we ought to eliminate or refine some of our guesses. There are several bits of information we could look at and running the query (although I asked you not to) is the next step we have to take. But when we run the query we want to see the session statistics, pick up the actual execution plan with rowsource execution statistics, and enable the 10032 and 10033 (sort) traces. So let’s fold the query into a longer script, something like:


set linesize 255
set trimspool on
set pagesize 60

set serveroutput off
alter session set statistics_level = all;

execute snap_my_stats.start_snap

alter session set events '10032 trace name context forever';
alter session set events '10033 trace name context forever';

select owner ... etc.

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set events '10033 trace name context off';
alter session set events '10032 trace name context off';

execute snap_my_stats.end_snap

Of course, to avoid blurring around the edges, we may have to isolate the three different tests – the query against dbms_xplan.display_cursor(), for example, is obviously going to have some impact on the session stats – and we may then want to run each test twice in succession so that any warm-up or parsing activities don’t confuse the issue. It would also be a good idea to run the tests after creating a new session in case there are some distracting side effects from creating the data set. But with these details addressed, here are a few results:

First the execution plan (I got these results from 12.2.0.1, all recent versions of Oracle behave similarly):


----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     16 |00:00:00.34 |    1018 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |     16 |     16 |00:00:00.34 |    1018 |  5014K|  1445K| 4456K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  50000 |  50000 |00:00:00.07 |    1018 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Oracle uses a SORT GROUP BY, not a HASH GROUP BY, and the indications are that we used about 4.4MB of memory to sort 50,000 rows. Then there are the session statistics:


Name                                                                         Value
---- 

session uga memory max                                                   3,255,648
session pga memory max                                                   3,211,264

table scan rows gotten                                                      50,000

sorts (memory)                                                                   1
sorts (rows)                                                               150,000

This says we did only one sort operation, and sorted 150,000 rows using an excess over starting pga/uga of 3.2MB (rather than the 4.4MB suggested by the plan); possibly the variation in apparent memory usage could be explained by the way that Oracle allocates reasonably large chunks to grow the PGA when you grow a workarea, but since I’m taking deltas there’s some scope for being misled by the change in maximum pga/uga memory.

Finally the 10032 trace file shows the following (we didn’t spill to disc, so the 10033 trace wasn’t triggered):


---- Sort Parameters ------------------------------
sort_area_size                    4562944
sort_area_retained_size           4562944
sort_multiblock_read_count        7
max intermediate merge width      38

---- Sort Statistics ------------------------------
Input records                             150000
Output records                            49907
Total number of comparisons performed     1945863
  Comparisons performed by in-memory sort 1945863
Total amount of memory used               4562944
Uses version 1 sort
---- End of Sort Statistics -----------------------

These figures are ones we have to trust – it seems we really did do one sort operation of 150,000 rows, and we did allocate 4.5MB of memory. There’s an obvious guess for the 150,000 input rows – Oracle has turned every row into three rows, the original row, one to count the object_type, and one to count the object_name, and with that in mind maybe we will be able to make sense of getting an output of 49,907 rows using 4.5M of memory. Let’s create a query that could produce the right numbers, though with a differently arranged output:


select distinct owner, 0, null        from t1
union all
select distinct owner, 1, object_type from t1
union all
select distinct owner, 2, object_name from t1
order by 1, 2, 3
;

For my data set this query produced 49,907 rows of output (which is a number we wanted to see) and here are the first 9 rows of output – followed by the first row of output from the original query:


OWNER                    0 NULL
--------------- ---------- ---------------------
APPQOSSYS                0

APPQOSSYS                1 SYNONYM
APPQOSSYS                1 TABLE

APPQOSSYS                2 DBMS_WLM
APPQOSSYS                2 WLM_CLASSIFIER_PLAN
APPQOSSYS                2 WLM_FEATURE_USAGE
APPQOSSYS                2 WLM_METRICS_STREAM
APPQOSSYS                2 WLM_MPA_STREAM
APPQOSSYS                2 WLM_VIOLATION_STREAM


OWNER           COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOBJECT_NAME)
--------------- -------------------------- --------------------------
APPQOSSYS                                2                          6

Spot the pattern ? All I have to do after this “union all with simple sort” is walk the result set in order accumulating distinct counts as I do so.

But what about the memory requirements ? Check back to the original 10032 trace file, it reported 4562944 bytes as the total memory needed, but it also reported using a “Version 1” sort – so before I ran my query I set “_newsort_enabled”=false, to get the following 10032 trace details:


---- Sort Statistics ------------------------------
Input records                             49907
Output records                            49907
Total number of comparisons performed     730667
  Comparisons performed by in-memory sort 730667
Total amount of memory used               4562944
Uses version 1 sort
---- End of Sort Statistics -----------------------

The memory used is exactly the number I wanted to see. (The new version 2 sort got exactly the same result using 3.5MB of memory, so I don’t know why it’s not used at this point – but maybe that’s because the implementation isn’t quite what I think.)

So, conclusion (to date, until a reader shows me that my answer is incomplete – or wrong): As the “SORT GROUP BY” operation accepts each row from the “TABLE ACCESS FULL” operation it converts each row into N rows (one base row and one for each column for which there is a count(distinct)). The base row holds just the set of “group by” columns and is tagged with a zero; each subsequent row holds the “group by” columns, a tag value to identify which column it carries, and one of the “count(distinct)” columns. Oracle then operates the normal “group by” aggregation mechanism but is actually aggregating on the “group by” columns plus the “tag” column. So each leaf node in the binary tree ends up holding {owner_value, tag_value, count}, and once all the data has been aggregated into the binary tree Oracle can walk the tree and perform a pivot to turn three rows for each owner value into a single row.
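
As a purely illustrative sketch – this is not what Oracle literally executes internally, and the column aliases are invented – the “aggregate then pivot” idea is equivalent to wrapping a tagged version of the earlier union all query in a conditional count:

select
	owner,
	count(case when tag = 1 then val end)	ct_distinct_type,
	count(case when tag = 2 then val end)	ct_distinct_name
from	(
	select distinct owner, 1 tag, object_type val from t1
	union all
	select distinct owner, 2 tag, object_name val from t1
	)
group by
	owner
order by
	owner
;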

If you start thinking about nasty scenarios you will realise that the upshot of this implementation (if my hypothesis is correct) is that if you have a query with a long “group by” list, and several columns where there are lots of distinct values for each combination of the “group by” list then the volume of the B-tree could actually be much larger than the volume of the original table, and the amount of memory and CPU needed to build that tree (before collapsing it) could be huge.

Footnote:

There is one special case with this count(distinct …) query. If you have only ONE distinct operation in the query Oracle can use the “distinct aggregation” transformation with “view merging” to produce a completely different plan.
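
For example, a query with just a single distinct aggregate (a sketch, not one of the tests above) is a candidate for that transformation:

select owner, count(distinct object_name) from t1 group by owner;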

July 13, 2018

pushing predicates

Filed under: CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:05 pm BST Jul 13,2018

I came across this odd limitation (maybe defect) with pushing predicates (join predicate push down) a few years ago that made a dramatic difference to a client query when fixed but managed to hide itself rather cunningly until you looked closely at what was going on. Searching my library for something completely different I’ve just rediscovered the model I built to demonstrate the issue so I’ve tested it against a couple of newer versions  of Oracle (including 18.1) and found that the anomaly still exists. It’s an interesting little detail about checking execution plans properly so I’ve written up the details. The critical feature of the problem is a union all view:


rem
rem	Script:		push_pred_limitation.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jan 2015
rem
rem	Last tested 
rem		18.1.0.0	via LiveSQL
rem		12.2.0.1
rem		12.1.0.2
rem		11.2.0.4
rem

create table t1
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t2
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t3
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1 for columns owner size 254'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T3',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t2_id on t2(object_id);
-- create index t2_id_ot on t2(object_id, object_type);

create index t3_name_type on t3(object_name, object_type);

create or replace view v1
as
select 
	/*+ qb_name(part1) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.created	date_2,
	t3.created	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
union all
select
	/*+ qb_name(part2) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.last_ddl_time	date_2,
	t3.last_ddl_time	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
;

Two points to note so far: first, the view is basically joining the same two tables in the same way twice but selecting different columns. It’s a close model of what the client was doing but so much simpler that it wouldn’t be hard to find a different way of getting the same result: the client’s version would have been far harder to rewrite. Secondly, I’ve listed two possible indexes for table t2 but commented one of them out. The indexing will make a difference that I’ll describe later.

So here’s the query with execution plan (from explain plan – but pulling the plan from memory gives the same result):


select
	/*+ qb_name(main) */
	t1.object_name, t1.object_type,
	v1.object_id, v1.date_2, v1.date_3
from
	t1,
	v1
where
	v1.object_id = t1.object_id
and	v1.object_type_2 = t1.object_type
and	v1.object_type_3 = t1.object_type
and	t1.owner = 'OUTLN'
;

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   588 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   588 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|*  3 |   VIEW                                   | V1           |     1 |    44 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   3 - filter("V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

The execution plan appears to be fine – we can see at operation 4 that the union all view has been accessed with the pushed predicate option and that the subsequent sub-plan has used index driven nested loop joins in both branches – until we look a little more closely and examine the Predicate section of the plan. What, exactly, has been pushed ?

Look at the predicate for operation 3: "V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE". It’s a join predicate that hasn’t been pushed into the view. On the other hand the original, and similar, join predicate v1.object_type_3 = t1.object_type has been pushed into the view, appearing at operations 9 and 15. There is a difference, of course: the object_type_3 column appears as the second column of the index on table t3.

Two questions then: (a) will the object_type_2 predicate be pushed if we add it to the relevant index on table t2, (b) is there a way to get the predicate pushed without adding it to the index. The answer to both questions is yes. First the index – re-run the test but create the alternative index on t2 and the plan changes to:

Plan hash value: 497545587

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

Notice how the predicate at operation 3 has disappeared, and the access predicate at operation 8 now includes the predicate "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE".

Alternatively, don’t mess about with the indexes – just tell Oracle to push the predicate. Normally I would just try /*+ push_pred(v1) */ as the hint to do this, but the Outline section of the original execution plan already included a push_pred() hint that looked like this: PUSH_PRED(@"MAIN" "V1"@"MAIN" 3 1), so I first copied exactly that into the SQL to see if it would make any difference. It did – I got the following plan (and the hint in the outline changed to PUSH_PRED(@"MAIN" "V1"@"MAIN" 3 2 1), so this may be a case where the plan produced by a baseline will perform better than the plan that produced the baseline!):

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|* 13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='TEST_USER')
   7 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  13 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

In this case we see that the critical late-joining predicate has disappeared from operation 3 and re-appeared as a filter predicate at operation 7. In many cases you may find that the change in predicate use makes little difference to the performance – in my example the variation in run time over several executions of each query was larger than the average run time of the query; nevertheless it’s worth noting that the delayed use of the predicate could have increased the number of probes into table t3 for both branches of the union all and resulted in redundant data passing up through several layers of the call stack before being eliminated … and “eliminate early” is one of the major commandments of optimisation.

You might notice that the Plan Hash Value for the hinted execution plan is the same as for the original execution plan: the hashing algorithm doesn’t take the predicates into account (just one of many points that Randolf Geist raised in a blog post several years ago). This is one of the little details that makes it easy to miss the little changes in a plan that can make a big difference in performance.

Summary

If you have SQL that joins simple tables to set based (union all, etc.) views and you see the pushed predicate option appearing take a little time to examine the predicate section of the execution plan to see if the optimizer is pushing all the join predicates that it should and, if it isn’t, test the effects of pushing more predicates.

In many cases adding the hint /*+ push_pred(your_view_name) */ at the top of the query may be sufficient to get the predicate pushing you need, but you may need to look at the outline section of the execution plan and add a series of more complicated push_pred() and no_push_pred() hints because the push_pred hint has evolved over time to deal with increasingly complicated transformations.
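
As a sketch, using the view name and query block name from the example above, the simple form of the hinted query would be:

select
	/*+ qb_name(main) push_pred(v1) */
	t1.object_name, t1.object_type,
	v1.object_id, v1.date_2, v1.date_3
from
	t1,
	v1
where
	v1.object_id = t1.object_id
and	v1.object_type_2 = t1.object_type
and	v1.object_type_3 = t1.object_type
and	t1.owner = 'OUTLN'
;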

 

July 12, 2018

Cardinality Puzzle

Filed under: Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 12:57 pm BST Jul 12,2018

One of the difficulties of being a DBA and being required to solve performance problems is that you probably never have enough time to think about how you got to a solution and why the solution works; and if you don’t learn about the process itself, you just don’t get better at it. That’s why I try (at least some of the time) to write articles and books (as I did with CBO Fundamentals) that

  1. explain simple details that can be used as background facts
  2. walk through the steps of solving a problem

So here’s an example from a question on the ODC database forum asking about the cause and workaround for a bad cardinality estimate that is producing a poorly performing execution plan. It’s actually a type of problem that comes up quite frequently on large data sets and explains why a simple “gather stats” is almost guaranteed to leave you with a few headaches (regardless of whether or not you choose to include histograms as part of the process). I’m not going to offer “the answer” – I’m just going to talk about the inferences we can make from the facts supplied and where we have to go from there.

The DBA has a table holding 80,000,000,000 rows. It is list/hash partitioned with 2 partitions and 1,024 sub-partitions (per partition) but neither of the partitioning key columns appears in the query. The query runs parallel and the optimizer (presumably thanks to the specific settings of various parameters related to parallel execution) uses dynamic sampling at level 3.

There is an inline view defined in the query and the DBA has isolated this as a key component of the problem and supplied a query and plan (from “explain plan”) against that view.


select * from TAB2 T
WHERE T.DT = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');
 
------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                          |   479M|    76G|  1756K (14)| 05:51:14 |       |       |        |      |            |
|   1 |  PX COORDINATOR              |                          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000                 |   479M|    76G|  1756K (14)| 05:51:14 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL     |                          |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  1024 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS STORAGE FULL| TAB1                     |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  2048 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - storage(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))
       filter(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))

Note
-----
   - dynamic sampling used for this statement (level=3)

The DBA’s problem is that if the estimated cardinality of this extract goes over roughly 500M the optimizer chooses a bad plan for the overall query – and on occasion this extract has given an estimate of 5 billion rows. Moreover, the actual number of rows returned by this extract is typically in the order of 40M, so the estimate is a long way off even when it’s “good enough”.

So where do we start looking to work out what’s going wrong? You’ll note, of course, that after text expansion the user’s single predicate has changed, and an extra predicate (previously hidden inside the view) has appeared; instead of just T.DT = to_date(:b1,’MM/DD/YYYY HH24:MI:SS’) we now have (cosmetically adjusted):

        COALESCE(DFG,'N')='N' 
AND     TRUNC(DT_TM)=TO_DATE(:B1,'MM/DD/YYYY HH24:MI:SS')

There are two immediately obvious threats here – first that the combination of predicates means Oracle is likely to make a mistake because it will check the individual selectivities and multiply them together to get the combined selectivity, second that the appearance of predicates of the form “function(column) = constant” means that Oracle will guess 1% as the individual selectivities.

Without checking more details we might assume that a possible quick fix (that would require no changes to existing code) would be to create a couple of virtual columns (or extended stats) to represent the two expressions and gather stats on the resulting columns – though it is a restriction of extended stats that you can’t “double up” and create a column group on the two column expressions, so there’s still some scope for a cardinality estimate that is still sufficiently bad even with this approach. We also note that if we can change the coalesce(DFG,’N’) that must have been hidden in the view to nvl(DFG,’N’) then Oracle would be able to “or expand” the nvl() and use a more appropriate selectivity for that part of the predicate.
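
A minimal sketch of the virtual column idea – the table name and the new column names here are assumptions invented for illustration, chosen to match the expressions in the predicate:

alter table tab1 add (
	dfg_n	generated always as (coalesce(dfg, 'N')) virtual,
	dt_day	generated always as (trunc(dt_tm)) virtual
);

begin
	-- on a table of this size you would obviously think hard about sample sizes
	dbms_stats.gather_table_stats(
		ownname     => user,
		tabname     => 'TAB1',
		method_opt  => 'for columns dfg_n size 1 for columns dt_day size 254'
	);
end;
/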

However, the points I’ve covered so far tend to produce estimates that are too small and often much too small. So maybe the key to the problem is in the Note section that tells us that Oracle has (successfully) used dynamic sampling for this statement. In other words, all the theory of how the optimizer calculates selectivity may be irrelevant – the estimate will be based on the luck of the sample.

So let’s take a look at the (slightly edited) table stats we’ve been given:

column_name data_type num_distinct low_value      high_value     density   num_null  histogram
DT_TM       DATE           6179571 78740B1E0A383C 7876020B01262B 1.6182E-7 0         NONE
DFG         VARCHAR2             1 4E             4E             1         0         NONE

Notice that the DFG (apparently) has the value ‘N’ for every row in the table (low_value = high_value = 0x4E, num_nulls = 0). The date range is 30-Nov-2016 to 11-Feb-2018, with no histogram but 6.18M distinct values for 80 Billion rows. Neither column has a histogram.

A little arithmetic tells us that (on average) there ought to be about 182M (= 80B / 438 days) rows for any one day – and that’s worth thinking about for three separate reasons.

First, an estimate of 479M against an average of 182M isn’t too surprising if it’s based on a fairly small sample: it’s only out by a factor of 2.6. On the other hand, getting an estimate of 5 billion – which can happen on bad days – is extremely unlikely if the data is uniformly distributed across dates.

Secondly, the DBA supplied us with some data from the recent past with an aggregate query for “trunc(dt_tm)”, with the following results:

TRUNC(DT_TM)   COUNT(*)
------------ ----------
01-FEB-18    44,254,425
02-FEB-18    46,585,349
03-FEB-18    43,383,099
04-FEB-18    32,748,364
05-FEB-18    37,993,126
06-FEB-18    39,708,994
07-FEB-18    38,696,777
08-FEB-18    41,871,780
09-FEB-18    46,702,852
10-FEB-18    42,744,870
11-FEB-18    34,971,845
12-FEB-18    37,165,983

Recent data seems to follow an average of around 40M rows per day, so the estimate of 182M that we can derive from the stored statistics is a long way off: the present is behaving very differently from the past and that’s a relatively common problem with very large data sets – though it’s more usual for rolling averages to increase from the past to the present because the data is often representing the growth of a business over time. Can we create a hypothesis to explain the discrepancy, and could that hypothesis also account for the sample producing some very strange estimates ?

Finally, slightly more subtle and only included for completeness, if this column is supposed to hold date and time to the nearest second – which is what you might expect from an Oracle date type – there are 38 million possible values (438 x 86,400) it could be holding, and that’s more than the actual number of distinct values by a factor of 6. We can also work out that 80 billion rows over 438 days is 2,000 rows per second (on average). Averages are often misleading, of course, many systems have a pattern where a working day shows most of the data created in a 12 – 16 hour window with a couple of hours of more intense activity. For reference, though: average rows per second for the recent data is roughly 40M/86400 = 460; while the average we derive from the stored statistics is 80B / 6M = 13000 rows per second; this unlikely pattern needs a “non-uniform” explanation.

How do these three thoughts help us to understand or, to be more accurate, to make a sensible guess about why the optimizer can use dynamic sampling and get a wildly variable estimate which can be 1 or 2 orders of magnitude wrong (varying between 479M and 5,000M compared to the recent actual 40M)?

Here’s one simple idea: extrapolate the 40M rows per day over 80B rows: that’s 2,000 days (possibly rather more since businesses tend to grow). What if the dt_tm is the timestamp for the moment the row was loaded into the database, and a couple of years ago (maybe around “30th Nov 2016”) the data was restructured and the existing five years of data was loaded over a very short period of time – let’s say one week. This would leave you with 17B rows of “new” data with a dt_tm spread at 40M rows per day for most of 438 days, and 63B rows of “historic” data packed into 7 days (at 9B rows per day).

I don’t know how Oracle would have randomly selected its sample from an extremely large table with 2,048 physical data segments but it’s totally believable that a small, widely scattered sample could end up with an extremely unrepresentative subset of the data. A completely random sample of the data would produce an estimate of around 500M rows for the predicate; but it would only take a fairly small variation in the sample (taking a few too many “historic” rows) to produce a large enough change in the estimate to change the execution plan, and a rare, but not extreme variation could easily take the estimate up to 5B.

Next Steps

It would be at this point in a performance assignment that I’d be asking around to find out if my guess about a massive data load operation was correct – if I couldn’t get the answer by talking to people I’d run a query against the whole data set to check the hypothesis, because there’s clearly some sort of skew in the data that’s causing a problem. I’d also run the critical part of the query a couple of times with events 10046/level 4 and 10053 set (but only fetching the first few rows) to find out from the trace file how large a sample Oracle was using, and then run the sampling query a few times to see what the sampled results looked like. Depending on the results I’d either find a way to stop Oracle from sampling for this query or I might create a virtual column (or just extended stats since it’s 11g) on just the trunc(dt_tm), possibly with a histogram in place (maybe coded by hand) if that could isolate the special dates and leave Oracle with a better estimate of the typical date. I might find I had to change the coalesce() to an nvl() as well – or create a virtual  column – to stop the sampling.
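
For reference, setting those events at the session level follows the same pattern used elsewhere in these notes:

alter session set events '10053 trace name context forever';
alter session set events '10046 trace name context forever, level 4';

-- run (and partially fetch) the critical part of the query here

alter session set events '10046 trace name context off';
alter session set events '10053 trace name context off';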

Finally, it’s worth noting that in 11g it’s possible to create pending (table preference “PUBLISH” = FALSE) stats for testing purposes; it’s also worth noting that the default histogram on trunc(dt_tm) would be a height-balanced histogram while we could create a frequency histogram in 12c since 12c allows us to specify up to 2,048 buckets.
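
A sketch of the pending stats mechanism (the table name is, again, an assumption):

execute dbms_stats.set_table_prefs(user, 'TAB1', 'PUBLISH', 'FALSE')
execute dbms_stats.gather_table_stats(user, 'TAB1')

alter session set optimizer_use_pending_statistics = true;

-- run the test queries and check the plans, then either publish or discard:

execute dbms_stats.publish_pending_stats(user, 'TAB1')
-- or: execute dbms_stats.delete_pending_stats(user, 'TAB1')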

Footnote

If you check the ODC thread you’ll see that the OP has marked as correct a suggestion to change:

    TRUNC (TB1.DT_TM)  = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');  

to

    dt_tm >= trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))
and dt_tm <  trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))+1

Note that that’s “greater than or equal to” at one end and “strictly less than” at the other when using “date + 1”.

This has the effect of giving the optimizer a chance of using the low/high values of the column to produce a better (though perhaps still overlarge) and consistent estimate of the rows in the date range; and it may also stop the optimizer from doing dynamic sampling at level 3 (the “I’m guessing, let’s check” level) though it’s possible that the sampling would be disabled only if the coalesce() were changed to an nvl() as well.

Of course, from the information supplied, this looks like the OP would have to change a view definition and the run-time code to achieve the result. But in an ideal world doing things that avoid confusing the optimizer is usually the sensible strategy provided it doesn’t take an extreme amount of coding and testing.

 

July 10, 2018

Validate FK

Filed under: constraints,Infrastructure,Oracle — Jonathan Lewis @ 10:42 am BST Jul 10,2018

A comment arrived yesterday on an earlier posting about an enhancement to the truncate command in 12c that raised the topic of what Oracle might do to validate a foreign key constraint. Despite being sure I had the answer written down somewhere (maybe on a client site or in a report to a client) I couldn’t find anything I’d published about it, so I ran up a quick demo script to show that all Oracle does is construct a simple SQL statement that will check the data – and then do whatever the optimizer does to produce the fastest possible plan.

Here’s the script – with a few variations to show what happens if you start tweaking features to change the plan.

rem
rem     Script:         validate_fk.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table parent 
as 
select  * 
from    all_Objects 
where   rownum <= 10000 -- > comment to avoid wordpress format issue
;

alter table parent add constraint par_pk primary key(object_id);

execute dbms_stats.gather_table_stats(null, 'parent', cascade=>true)


create table child 
as 
select par.* 
from    (select rownum from dual connect by level <= 10) v1, --> comment to avoid wordpress format issue
        parent par
; 

alter table child add constraint chi_fk_par foreign key(object_id) references parent enable novalidate; 
create index chi_fk_par on child(object_id); 
execute dbms_stats.gather_table_stats(null, 'child', cascade=>true)


-- alter table child modify object_id null;
-- alter table child parallel(degree 8);
-- alter session set "_fast_full_scan_enabled" = FALSE;
-- alter session set "_optimizer_outer_to_anti_enabled" = false;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 12';
alter table child modify constraint chi_fk_par validate;
alter session set events '10046 trace name context off';

All I’ve done is create a parent table with a primary key, and a child table with 10 rows per parent. I’ve created a foreign key constraint on the child table, enabled it (so future data will be checked) but not validated it (so there’s no enforced guarantee that the existing data is correct). Then I’ve issued a command to validate the foreign key.

The flush of the buffer cache is to allow me to see the I/O that takes place and will also (usually) let me see if there are any strange issues due to any recursive SQL that Oracle runs. As you can see I’ve also got a couple of commented commands that might cause a couple of variations in behaviour.

Here’s the critical content from the output of the trace file summary from tkprof (in versions from 11.2.0.4 to 12.2.0.1):


select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3
from
 "TEST_USER"."CHILD" A , "TEST_USER"."PARENT" B where( "A"."OBJECT_ID" is not
  null) and( "B"."OBJECT_ID" (+)= "A"."OBJECT_ID") and( "B"."OBJECT_ID" is
  null)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.02        241        373          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.02       0.02        241        373          0           0

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  NESTED LOOPS ANTI (cr=373 pr=241 pw=0 time=21779 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=17753 us starts=1 cost=32 size=1700000 card=100000)(object id 104840)
     10000      10000      10000   INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=4494 us starts=10000 cost=0 size=49995 card=9999)(object id 104838)

As you can see, Oracle writes SQL for an outer join with an “is null” predicate on the outer table – which the optimizer converts to an anti-join, running a nested loop in this case. It’s an interesting little oddity that the code includes the predicate “A”.”OBJECT_ID” is not null given that the column is declared as not null – but this is presumably a developer deciding to re-use code even if it then includes a redundant predicate (which is effectively zero cost – since the optimizer can use transitive closure to eliminate it).
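
If you wanted to run the same check by hand, one equivalent form (a sketch, using a “not exists” subquery rather than the outer join) would be:

select
	count(*)
from
	child	chi
where
	chi.object_id is not null
and	not exists (
		select	null
		from	parent	par
		where	par.object_id = chi.object_id
	)
;

A result of zero means there are no orphaned child rows, which is exactly the condition the validation has to establish.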

Given that Oracle has converted an outer join to an anti join I obviously had to check what would happen if I disabled this conversion by altering the “_optimizer_outer_to_anti_enabled” parameter to false. The optimizer obeyed the session setting with the following plan in the trace:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  FILTER  (cr=373 pr=241 pw=0 time=226926 us starts=1)
    100000     100000     100000   NESTED LOOPS OUTER (cr=373 pr=241 pw=0 time=177182 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000    INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=40811 us starts=1 cost=32 size=1700000 card=100000)(object id 104848)
    100000     100000     100000    INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=119363 us starts=100000 cost=0 size=49995 card=9999)(object id 104846)

The significant difference is in the CPU usage, of course, and to a degree the magnitude of the change is dictated by the pattern and distribution of the data. The number of CR gets hasn’t changed as the number of index probes jumps from 10,000 to 100,000 because Oracle will have pinned index blocks (There’s a very old article on my old website if you want to read more about buffer pins).

The original question was about the effect of a local session setting that disabled index fast full scans, and followed up with a question on parallelism. After seeing the effect of changing one optimizer parameter at the session level you probably won’t be surprised by the following two results. First, when the only change I make is the setting of the “_fast_full_scan_enabled” parameter, and then when the only change is the declared parallelism of the child table.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  MERGE JOIN ANTI (cr=240 pr=240 pw=0 time=120163 us starts=1 cost=247 size=22000 card=1000)
    100000     100000     100000   INDEX FULL SCAN CHI_FK_PAR (cr=218 pr=218 pw=0 time=20314 us starts=1 cost=222 size=1700000 card=100000)(object id 104852)
    100000     100000     100000   SORT UNIQUE (cr=22 pr=22 pw=0 time=81402 us starts=100000 cost=25 size=50000 card=10000)
     10000      10000      10000    INDEX FULL SCAN PAR_PK (cr=22 pr=22 pw=0 time=1185 us starts=1 cost=22 size=50000 card=10000)(object id 104850)


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=15 pr=2 pw=0 time=722483 us starts=1)
         0          0          0   PX SEND QC (RANDOM) :TQ10001 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0    HASH JOIN ANTI BUFFERED (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0     PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0      TABLE ACCESS FULL CHILD (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0      PX SEND BROADCAST :TQ10000 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0       PX SELECTOR  (cr=0 pr=0 pw=0 time=0 us starts=0)
         0          0          0        INDEX FAST FULL SCAN PAR_PK (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)(object id 104854)

In the first case my version of Oracle has switched to a merge anti-join with an index full scan (not FAST full scan). It’s interesting to note that the merge join anti hasn’t been as clever as the nested loop anti in avoiding probes of the second data source as it walks the foreign key index (note how starts=100000 in the SORT UNIQUE line).

In the second case all the work was done by the parallel query slaves – and the PX SELECTOR line tells you that this plan must have come from 12c. As you can see we’re still doing an anti-join but this time we do a parallel tablescan of the child table (as we haven’t enabled the index for parallel execution – if we had altered the index to parallel(degree 8) as well we would have seen a parallel index fast full scan instead of the parallel tablescan.)

Bottom line: the SQL executed to validate a foreign key constraint is essentially a join between the parent and child tables, and Oracle will simply optimize that statement to the best of its abilities based on the current session settings. If you want to test on a clone (or accurate model) of the tables you may find that you can create an sql_patch that works – even though the necessary SQL will be optimised as SYS – though so far I’ve only tried this with a couple of variants of the parallel() hint on 12.2.0.1.
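As a hedged sketch of the sql_patch approach – this assumes the 12.2 dbms_sqldiag.create_sql_patch() function, that you’ve already captured the sql_id of the recursive constraint-validation statement into the substitution variable m_sql_id, and that the patch name and hint text (both purely illustrative) suit your system:

declare
        m_sql_text      clob;
        m_patch_name    varchar2(128);
begin
        select  sql_fulltext
        into    m_sql_text
        from    v$sql
        where   sql_id = '&m_sql_id'    -- sql_id of the recursive validation statement
        and     rownum = 1;

        m_patch_name := dbms_sqldiag.create_sql_patch(
                sql_text        => m_sql_text,
                hint_text       => 'parallel(8)',
                name            => 'fk_validate_patch'
        );

        dbms_output.put_line('Patch created: ' || m_patch_name);
end;
/

(In 12.1 you would have to fall back to the undocumented dbms_sqldiag_internal.i_create_patch() call instead.)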

Footnote

If you were wondering what the three bind variables in the query were, this is the relevant extract from the 10046 trace file with bind variable tracing enabled:

 Bind#0
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c38  bln=32  avl=09  flg=05
  value="TEST_USER"
 Bind#1
  oacdty=01 mxl=32(05) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c00  bln=32  avl=05  flg=05
  value="CHILD"
 Bind#2
  oacdty=01 mxl=32(10) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359bc8  bln=32  avl=10  flg=05
  value="CHI_FK_PAR"

The values are the owner, table, and constraint names. (Though you have to modify the code a little to show that the last one is the constraint name and not the index name).
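If you want to reproduce that bind section, the trace needs the binds level of event 10046 enabled – a sketch of the two usual options (the dbms_monitor call applies to the current session):

alter session set events '10046 trace name context forever, level 4';

execute dbms_monitor.session_trace_enable(waits => false, binds => true)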

 

 

July 9, 2018

Historic Stats

Filed under: Oracle,Statistics — Jonathan Lewis @ 12:45 pm BST Jul 9,2018

If you want to examine historic object stats Oracle gives you a few procedures in the dbms_stats package to compare sets of stats captured at two different time periods, but there’s no view that you can query to get an idea of how a table’s stats have changed over time. This is a problem that can be addressed when you discover two things:

  • There are views to report pending table, index, column and histogram stats.
  • Pending stats are stored as “historic” stats with a future date.

Once you’ve spotted the second detail, you can acquire the SQL to generate the pending stats views:


SQL> select view_name from dba_views where view_name like 'DBA%PENDING%STAT%';

VIEW_NAME
--------------------------------------------------------------------------------------------------------------------------------
DBA_TAB_PENDING_STATS
DBA_IND_PENDING_STATS
DBA_COL_PENDING_STATS
DBA_TAB_HISTGRM_PENDING_STATS


SQL> set long 20000
SQL> set pagesize 0
SQL> select dbms_metadata.get_ddl('VIEW','DBA_TAB_PENDING_STATS','SYS') from dual;

  CREATE OR REPLACE FORCE NONEDITIONABLE VIEW "SYS"."DBA_TAB_PENDING_STATS" ("OW
NER", "TABLE_NAME", "PARTITION_NAME", "SUBPARTITION_NAME", "NUM_ROWS", "BLOCKS",
 "AVG_ROW_LEN", "IM_IMCU_COUNT", "IM_BLOCK_COUNT", "SCAN_RATE", "SAMPLE_SIZE", "
LAST_ANALYZED") AS
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
	 h.im_imcu_count, h.im_block_count, h.scanrate, h.samplesize, h.analyzet
ime
  from	 sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime > systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
	 h.avgrln, h.im_imcu_count, h.im_block_count, h.scanrate, h.samplesize,
	 h.analyzetime
  from	 sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime > systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
	 h.blkcnt, h.avgrln, h.im_imcu_count, h.im_block_count, h.scanrate,
	 h.samplesize, h.analyzetime
  from	sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
	sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
	tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime > systimestamp

Notice the critical predicate repeated across the UNION ALL: “and h.savtime > systimestamp”. All we have to do is change that to “less than or equal to” (or just delete it if we’re not fussy about reporting pending stats along with the historic ones), then add a few columns reporting the available statistics, and we can create a view that we can query for the historic stats.


rem
rem     Script:         optstat_table_history.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Notes
rem     Have to be connected as SYS, or have directly
rem     granted privileges on the sys tables to do this
rem

create or replace force view sys.jpl_tab_history_stats (
        owner, table_name, partition_name, subpartition_name,
        num_rows, blocks, avg_row_len, sample_size, last_analyzed
)
as
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
         h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
         h.avgrln, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
         h.blkcnt, h.avgrln, h.samplesize, h.analyzetime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime <= systimestamp
;

drop public synonym jpl_tab_history_stats;
create  public synonym jpl_tab_history_stats for sys.jpl_tab_history_stats;
grant select on jpl_tab_history_stats to public;

Now you can query historic stats for any schema and table.

If you don’t want to create a view in the sys schema (and don’t want to create a permanent object at all) you can always use subquery factoring as an easy way of editing the metadata into a suitable query. You have to be connected as a user with privileges (that could be through a role) to view the relevant sys tables, though.


with jpl_tab_history (
        owner, table_name, partition_name, subpartition_name,
        num_rows, blocks, avg_row_len, sample_size, last_analyzed
) as (
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
         h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
         h.avgrln, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
         h.blkcnt, h.avgrln, h.samplesize, h.analyzetime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime <= systimestamp
)
select
        table_name, blocks, num_rows, avg_row_len, sample_size, last_analyzed
from
        jpl_tab_history
where
        owner = 'TEST_USER'
and     table_name = 'T1'
order by
        last_analyzed
;

TABLE_NAME               BLOCKS   NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE LAST_ANAL
-------------------- ---------- ---------- ----------- ----------- ---------
T1                          188      10000         120	     10000 04-JUL-18
T1                          322      18192         118	     18192 06-JUL-18

Matching code for indexes, columns and histograms is left as an exercise to the interested reader.
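As a starting point for that exercise, here’s a hedged sketch of the index version – just the top-level (non-partitioned) branch – assuming the underlying column names match the ones used by dba_ind_pending_stats (extract that view’s DDL with dbms_metadata.get_ddl() to confirm before relying on it), and again you need to be able to see the sys tables:

with jpl_ind_history (
        owner, index_name, num_rows, leaf_blocks, distinct_keys,
        clustering_factor, blevel, sample_size, last_analyzed
) as (
  select u.name, o.name, h.rowcnt, h.leafcnt, h.distkey,
         h.clufac, h.blevel, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 1 and o.owner# = u.user#
    and  h.savtime <= systimestamp
)
select
        index_name, blevel, leaf_blocks, distinct_keys,
        clustering_factor, num_rows, sample_size, last_analyzed
from
        jpl_ind_history
where
        owner = 'TEST_USER'
order by
        index_name, last_analyzed
;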

July 6, 2018

Question Time

Filed under: Oracle — Jonathan Lewis @ 10:44 am BST Jul 6,2018

It’s that time of year again – the UKOUG Tech conference is approaching and I’m organising a panel session on the Cost Based Optimizer.

As usual I’ll be asking Christian Antognini, Nigel Bayliss, Maria Colgan and a special guest panellist to join me (maybe David Dimbleby will be able to find the time now that he’s retired from chairing his own panel); and Neil Chandler and Martin Widlake will be volunteering to take on their inimitable roles as MCs.

If you’ve got any questions you’d like to put to the panel you will be able to write them down on the day, but it would be nice to have a few supplied in advance in the comments below.  Tactical, strategic, technical, or just plain curious – this is a panel that can tell you what can be done, what shouldn’t be done, and how to do the things you shouldn’t do but sometimes have to do.

Even if it’s a question where you already know the answer, if you think it’s interesting put it to the panel to hear what they have to say.

If you prefer to email your questions then click this link.

Since a tweet is a nicely sized question, you could also submit one by replying to my tweeted call for questions.

July 2, 2018

Clustering_Factor

Filed under: CBO,Indexing,Oracle,Statistics — Jonathan Lewis @ 1:24 pm BST Jul 2,2018

Here’s another little note on the clustering_factor for an index and the table preference table_cached_blocks that can be set with a call to dbms_stats.set_table_prefs(). I might be repeating a point that someone made in a comment on an older posting but if that’s the case I can’t find the comment at present, and it’s worth its own posting anyway.

A call to dbms_stats.set_table_prefs(null,'{tablename}','table_cached_blocks',N) – where N can be any integer between 1 and 255 – modifies Oracle’s algorithm for calculating the clustering_factor of an index. The default is 1, which often means the clustering_factor is much higher than it ought to be from a humanly visible perspective, and that can lead to Oracle ignoring an index that would actually be very effective.

The big point is this: the preference has no effect when you execute a “create index” statement, or an “alter index rebuild” statement. Here’s a simple script to demonstrate the point.


rem
rem     Script:         table_cached_blocks_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

drop table t1 purge;
create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,100)               n1,
        mod(rownum-1,100)               n2,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator
;

column blocks new_value m_blocks

select  blocks 
from    user_tables
where   table_name = 'T1'
;

column preference_value format a40

select  preference_name, preference_value
from    user_tab_stat_prefs
where
        table_name = 'T1'
;

I’ve created a very simple table of 10,000 rows with two identical columns and captured the number of blocks (which I know will be less than 256) in a substitution variable which I’m going to use in a call to set_table_prefs(). I’ve also run a quick check to show there are no table preferences set for the table. I’ll be running the same check again after setting the table_cached_blocks preference. Step 1 – create two indexes, but set the preference after building the first one; I’ve shown the result of the query against user_indexes immediately after the query:


create index t1_i1 on t1(n1);

execute dbms_stats.set_table_prefs(null,'t1','table_cached_blocks',&m_blocks)

create index t1_i2 on t1(n2);

select
        index_name, clustering_factor, to_char(last_analyzed, 'dd-mon-yyyy hh24:mi:ss') analyzed
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME	     CLUSTERING_FACTOR ANALYZED
-------------------- ----------------- -----------------------------
T1_I1				 10000 26-jun-2018 14:13:51
T1_I2				 10000 26-jun-2018 14:13:51


Now we check the effect of rebuilding the t1_i2 index – the one second sleep is so that we can use the last_analyzed time to see that new stats have been created for the index:


execute dbms_lock.sleep(1)
alter index t1_i2 rebuild /* online */ ;

select
        index_name, clustering_factor, to_char(last_analyzed, 'dd-mon-yyyy hh24:mi:ss') analyzed
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME	     CLUSTERING_FACTOR ANALYZED
-------------------- ----------------- -----------------------------
T1_I1				 10000 26-jun-2018 14:13:51
T1_I2				 10000 26-jun-2018 14:13:52


Finally we do an explicit gather_index_stats():


execute dbms_lock.sleep(1)
execute dbms_stats.gather_index_stats(null,'t1_i2')

select
        index_name, clustering_factor, to_char(last_analyzed, 'dd-mon-yyyy hh24:mi:ss') analyzed
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME	     CLUSTERING_FACTOR ANALYZED
-------------------- ----------------- -----------------------------
T1_I1				 10000 26-jun-2018 14:13:51
T1_I2				   179 26-jun-2018 14:13:53

At last – on the explicit call to gather stats – the table_cached_blocks preference is used.

Dire Threat

Think about what this means: you’ve carefully worked out that a couple of indexes really need a special setting of table_cached_blocks and you gathered stats on those indexes so you have a suitable value for the clustering_factor. Then, one night, someone decides that they’re going to rebuild some of those indexes. The following morning the clustering_factor is much higher and a number of critical execution plans change as a consequence, and don’t revert until the index statistics (which are perfectly up to date!) are re-gathered.

Footnote

The same phenomenon appears even when you’ve set the global preference for stats collection with dbms_stats.set_global_prefs().
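For example (the value here is purely illustrative):

execute dbms_stats.set_global_prefs('table_cached_blocks', '16')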

June 29, 2018

Truncate upgrade

Filed under: 12c,Infrastructure,Oracle,Upgrades — Jonathan Lewis @ 8:22 am BST Jun 29,2018

Connor McDonald produced a tweet yesterday linking to a short video he’d created about an enhancement to the truncate command in 12c. If you have referential integrity declared between a parent and child table then in 12c you can truncate the parent table and Oracle will truncate the child table for you – rather than raising an error. The feature requires the foreign key constraint to be declared “on delete cascade” – which is an option that I don’t see used very often. Unfortunately if you try to change an existing foreign key constraint to meet this requirement you’ll find that you can’t (yet) use the “alter table modify constraint” to make the necessary change. As Connor pointed out, you’ll have to drop and recreate the constraint – which leaves you open to bad data getting into the system or an outage while you do the drop and recreate.
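For reference, once the foreign key has been declared “on delete cascade” the 12c command that cascades the truncate to the child table is simply:

truncate table parent cascade;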

If you don’t want to stop the system but don’t want to leave even a tiny window for bad data to arrive here’s a way to do it. In summary:

  1. Add a virtual column to the child table “cloning” the original foreign key column
  2. Create an index on the  virtual column (if you have concerns about “foreign key locking”)
  3. Add a foreign key constraint based on the virtual column
  4. Drop the old foreign key constraint
  5. Recreate the old foreign key constraint “on delete cascade”
  6. Drop the virtual column

Here’s some sample SQL:


rem
rem	Script:		122_truncate_workaround.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jun 2018
rem	Purpose:	
rem
rem	Last tested 
rem		18.1.0.0	via LiveSQL
rem		12.2.0.1
rem		12.1.0.2

drop table child;
drop table parent;

create table parent (
	p number,
	constraint p_pk primary key(p)
);

create table child (
	c	number,
	p	number,
	constraint c_pk primary key(c),
	constraint c_fk_p foreign key (p) references parent
);

create index c_fk_p on child(p);

insert into parent values(1);
insert into child values(1,1);

commit;

prompt	==========================================================================
prompt	Truncate  should fail with
prompt	ORA-02266: unique/primary keys in table referenced by enabled foreign keys
prompt	==========================================================================

truncate table parent;

alter table child add (
	pv generated always as (p+0) virtual
)
;

create index c_ipv on child(pv) online;

alter table child add constraint c_fk_pv
	foreign key (pv)
	references parent(p)
	on delete cascade
	enable novalidate
;
alter table child modify constraint c_fk_pv validate;

alter table child drop constraint c_fk_p;

prompt	===================================================================================
prompt	Illegal insert (first 1) should fail with
prompt	ORA-02291: integrity constraint (TEST_USER.C_FK_PV) violated - parent key not found
prompt	===================================================================================

insert into child (c,p) values(2,2);
insert into child (c,p) values(2,1);
commit;

alter table child add constraint c_fk_p
	foreign key (p)
	references parent(p)
	on delete cascade
	enable novalidate
;

alter table child modify constraint c_fk_p validate;

prompt	===================================================
prompt	Dropping the virtual column results in Oracle
prompt	dropping the index and constraint at the same time
prompt	===================================================

alter table child drop column pv;

The overhead of this strategy is significant – I’ve created an index (which you may not need, or want, to do) in anticipation of a possible “foreign key locking” issue – and I’ve used the online option to avoid locking the table while the index is created which means Oracle has to use a tablescan to acquire the data. I’ve enabled a new constraint without validation (which takes a brief lock on the table) then validated it (which doesn’t lock the table but could do a lot of work). Then I’ve dropped the old constraint and recreated it using the same novalidate/validate method to minimise locking time. If I were prepared simply to drop and recreate the original foreign key I wouldn’t need to create that index and I’d only do one validation pass rather than two.
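For comparison, the simpler (but more exposed) drop-and-recreate route would reduce to something like the following sketch, re-using the statements from the script above:

alter table child drop constraint c_fk_p;

alter table child add constraint c_fk_p
        foreign key (p)
        references parent(p)
        on delete cascade
        enable novalidate
;

alter table child modify constraint c_fk_p validate;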

 

June 26, 2018

Hacking Profiles

Filed under: Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 8:25 am BST Jun 26,2018

Saturday’s posting about setting cursor_sharing to force reminded me about one of the critical limitations of SQL Profiles (which is one of those little reasons why you shouldn’t be hacking SQL Profiles as a substitute for SQL Plan Baselines). Here’s a demo (taking advantage of some code that I think Kerry Osborne published several years ago) of creating an SQL Profile from the current execution plan of a simple statement – first we create some data and find the sql_id and child_number for a simple query:

rem
rem     Script:         sql_profile_restriction.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4 -- > comment to avoid WordPress format issue
;

alter system flush shared_pool;

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;

column sql_id new_value m_sql_id
column child_number new_value m_child_number

select  sql_id , child_number
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

Now I can create the SQL Profile for this query using the Kerry Osborne code:


declare
        ar_profile_hints        sys.sqlprof_attr;
        cl_sql_text clob;
begin
        select
                extractvalue(value(d), '/hint') as outline_hints
        bulk collect into 
                ar_profile_hints
        from
                xmltable(
                        '/*/outline_data/hint'
                        passing (
                                select
                                        xmltype(other_xml) as xmlval
                                from
                                        v$sql_plan
                                where
                                        sql_id = '&m_sql_id'
                                and     child_number =  &m_child_number 
                                and     other_xml is not null
                )
        ) d;

        select
                sql_fulltext
        into
                cl_sql_text
        from
                v$sql
        where
                sql_id = '&m_sql_id'
        and     child_number =  &m_child_number
        ;

        dbms_sqltune.import_sql_profile(
                sql_text        => cl_sql_text, 
                profile         => ar_profile_hints, 
                category        => 'DEFAULT',
                name            => 'PROFILE_LITERAL',
                force_match     =>  true
        );
end;
/

Note particularly that I have given the profile a simple name, put it in the DEFAULT category, and set force_match to true (which means that the profile ought to be used even if I change the literal values in the query). So now let’s check that the profile will be used as expected. First I’ll create an index that is a really good index for this query, then I’ll run the query to see if Oracle uses the index or obeys the profile; then I’ll change the query (literals) slightly and check again. I’ll also run a query that won’t be recognised as legally matching (thanks to the changed “hint”) to demonstrate that the index could have been used if the profile hadn’t been there:
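The create index statement isn’t shown in this part of the script; presumably it’s the same two-column index that appears explicitly in the bind-variable version of the test further down:

create index t1_i1 on t1(n1, n2);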


alter system flush shared_pool;
set serveroutput off

prompt  =============================
prompt  Is the SQL Profile used ? Yes
prompt  =============================

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor);

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor);

select /*+ Non-match */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor);

Here (with a little cosmetic adjustment) are the three outputs from dbms_xplan.display_cursor():

SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=15 AND "N2"=15))

Note
-----
   - SQL profile PROFILE_LITERAL used for this statement


SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=16 AND "N2"=16))

Note
-----
   - SQL profile PROFILE_LITERAL used for this statement


SQL_ID  3gvaxypny9ry1, child number 0
-------------------------------------
select /*+ Non-match */ count(*) from t1 where n1 = 16 and n2 = 16
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=16 AND "N2"=16)

As you can see the SQL Profile is reported as used in the first two queries, and (visibly) seems to have been used. Then in the third query where we wouldn’t expect a match the SQL Profile is not used and we get a plan that shows the index would have been used for the other queries had the SQL Profile not been there. So far, so good – the profile behaves as everyone might expect.

Bind Variable Breaking

Now let’s repeat the entire experiment but first do a global find and replace to change every occurrence of “n2 = 16” to “n2 = :b1”. We’ll also change the name of the SQL Profile when we create it to PROFILE_MIXED, and we’ll put in a couple of lines at the top of the script to declare the variable b1 and set its value, then the final test in the script will look like this:


alter system flush shared_pool;
create index t1_i1 on t1(n1, n2);

exec :b1 := 15

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = :b1;
select * from table(dbms_xplan.display_cursor);

exec :b1 := 16

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = :b1;
select * from table(dbms_xplan.display_cursor);

And here are the execution plans from the two queries:


SQL_ID  236f82vmsvjab, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = :b1
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=15 AND "N2"=:B1))

Note
-----
   - SQL profile PROFILE_MIXED used for this statement


SQL_ID  7nakm3tw27z3c, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = :b1
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=16 AND "N2"=:B1)


As you can see the execution plan for the original query is still doing a full tablescan and reporting the SQL Profile as used; but we’re not using (or reporting) the SQL Profile when we change the literal values – even though a query against dba_sql_profiles will tell us that the profile has force_matching = ‘YES’.
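The check is a simple query against dba_sql_profiles, for example:

select  name, category, force_matching
from    dba_sql_profiles
where   name like 'PROFILE%'
order by
        name
;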

tl;dr

(Clarified in response to Mohammed Houri’s comment below)
If you use an SQL Profile with force_match => true to “hide” the literals in a statement that includes bind variables (even if they appear only in the select list, in fact) the mechanism will not be used, and the SQL Profile will apply only to the original statement.

Update

Christian Antognini has an elegant little script that uses the dbms_sqltune.sqltext_to_signature() function to highlight this point (among others).  Bear in mind, before you run the script, that you need to be licensed to use the dbms_sqltune package to do so.

 

June 23, 2018

Cursor_sharing force

Filed under: CBO,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 12:05 pm BST Jun 23,2018

Prompted by a recent ODC (OTN) question I’ve just written up an example of one case where setting the cursor_sharing parameter to force doesn’t work as you might expect. It’s a specific example of what I believe is a theme that can appear in several different circumstances: if your SQL mixes “genuine” bind variables with literals then the literals may not be substituted.

Here’s a simple data set to start with:


rem
rem     Script:         cursor_sharing_limit.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem
rem     Last tested
rem             18.1.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4 -- > comment to avoid WordPress format issue
;

I haven’t bothered to gather stats – it’s not needed in trivial cases of CTAS.

Here’s one way to execute SQL that doesn’t get literal substitution when (apparently) it should:


alter system flush shared_pool;
alter session set cursor_sharing=force;

declare
        m_ct number;
        m_n1 number := 20;
begin
        execute immediate
                'select /*+ trace this */ count(*) from t1 where n2 = 15 and n1 = :b1'
                into m_ct using m_n1;
        dbms_output.put_line(m_ct);

        execute immediate
                'select /*+ trace this too */ count(*) from t1 where n1 = 15 and n2 = 15'
                into m_ct;
        dbms_output.put_line(m_ct);
end;
/

alter session set cursor_sharing=exact;

I’ve included a “hint” that allows me to find the SQL statements in v$sql very easily – and here they are, preceded by the query I used to find them:


select  sql_id, parse_calls, executions, rows_processed, sql_text 
from    v$sql
where   sql_text like 'select%trace this%'
and     sql_text not like '%v$sql%'
;


SQL_ID        PARSE_CALLS EXECUTIONS ROWS_PROCESSED
------------- ----------- ---------- --------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
cbu4s78h5pfj5           1          1              1
select /*+ trace this too */ count(*) from t1 where n1 = :"SYS_B_0" and n2 = :"SYS_B_1"

cru67sufptx8x           1          1              1
select /*+ trace this */ count(*) from t1 where n2 = 15 and n1 = :b1

Notice how one statement – which used only literals – underwent bind variable transformation but the other – with its mix of literals and bind variables – didn’t. On the other hand, if I execute the mixed format statement from a basic SQL*Plus session then pull the plan from memory, this is what I get:


SQL> select /*+ SQL*Plus session */ count(*) from t1 where n2 = 15 and n1 = :b1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select  sql_id, parse_calls, executions, rows_processed, sql_text
  2  from    v$sql
  3  where   sql_text like 'select%Plus session%'
  4  and     sql_text not like '%v$sql%'
  5  ;

SQL_ID        PARSE_CALLS EXECUTIONS ROWS_PROCESSED
------------- ----------- ---------- --------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
gq2qy2a9yuta7           1          1              1
select /*+ SQL*Plus session */ count(*) from t1 where n2 = :"SYS_B_0" and n1 = :b1

When trying to take advantage of cursor_sharing = force, you have to think about how the SQL gets to the database engine – is it delivered directly, or is it sent through some layer of code that means the raw SQL statement is protected in some way from the substitution code.

Footnote:

I happen to have used an example that puts the bind variable into the where clause; you will see the same effect even if the bind variables are in the select list – for example if you’ve selected something like to_char(date_col, :formatting_variable).

June 11, 2018

dbms_random

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 8:31 am BST Jun 11,2018

In a recent ODC thread someone had a piece of SQL that was calling dbms_random.string(‘U’,20) to generate random values for a table of 100,000,000 rows. The thread was about how to handle the ORA-30009 error (not enough memory for operation) that is almost inevitable when you use the “select from dual connect by level <= n” strategy for generating very large numbers of rows, but this example of calling dbms_random.string() so frequently prompted me to point out an important CPU saving, and then to publicise through this blog a little-known fact (or deduction) about the dbms_random.string() function.

If you generate a random string of length 6 using only upper-case letters there are 308,915,776 different combinations (26^6); so if you’re after “nearly unique” values for 100 million rows then a six character string is probably good enough – it might give you a small percentage of values which appear in a handful of rows but most of the values are likely to be unique or appear in just two rows. If you want to get closer to uniqueness then 7 characters will do it, and 8 will make it almost certain that you will get a unique value in every row.

So if you want “nearly unique” and “random 20 character strings” it’s probably sufficient to generate random strings of 6 to 8 characters and then rpad() them up to 20 characters with spaces – the saving in CPU will be significant, roughly a factor of 3 (which is going to matter when you’re trying to generate 100 million rows). As a little demo I supplied the OP with a script to create a table of just one million random strings – first of 20 random characters, then of 6 random characters with 14 spaces appended. The run time (mostly CPU) dropped from 1 minute 55 seconds to 41 seconds.
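A minimal sketch of that comparison, scaled to one million rows (table names are illustrative, and the 14 trailing spaces come from the default pad character of rpad()):

create table t_20 nologging as
select
        dbms_random.string('U',20)      v1
from
        (select rownum from dual connect by level <= 1e3),
        (select rownum from dual connect by level <= 1e3)
;

create table t_6 nologging as
select
        rpad(dbms_random.string('U',6),20)      v1
from
        (select rownum from dual connect by level <= 1e3),
        (select rownum from dual connect by level <= 1e3)
;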

Why is there such a difference? Because to generate a random string of 6 characters Oracle generates a random string of one character six times in a row and concatenates them. The difference between 6 calls and 20 calls per row gives you that factor of around 3. For a quick demo, try running the following anonymous PL/SQL block:

rem
rem     Script:         random_speed.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2010
rem

begin
        dbms_random.seed(0);
        dbms_output.put_line(dbms_random.string('U',6));
        dbms_output.new_line;

        dbms_random.seed(0);
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
end;
/

Here are the results I got from instances of 12.1.0.2, 12.2.0.1, and 18.1.0.0 (from LiveSQL):


BVGFJB
B
V
G
F
J
B

I haven’t shown the tests for all the possible dbms_random.string() options but, unsurprisingly, changing the test to use the ‘L’ (lower case alpha) option produces the same effect (and the same 6 letters changed to lower case). The same effect, with different characters, also appeared using the ‘A’ (mixed case alpha), ‘X’ (uppercase alphanumeric) and ‘P’ (all printable characters) options.

I haven’t considered the effect of using a multi-byte character set – maybe Oracle calls its random number generator once per byte rather than once per character. The investigation is left as an exercise to the interested reader.

tl;dr

When generating a very large number of random strings – keep the “operational” part of the string as short as you can and leave the rest to be rpad()‘ed.

June 8, 2018

Massive Delete

Filed under: Oracle,Performance — Jonathan Lewis @ 9:14 am BST Jun 8,2018

The question of how to delete 25 million rows from a table of one billion came up on the ODC database forum recently. With changes in the numbers of rows involved it’s a question that keeps coming back, and I wrote a short series for AllthingsOracle a couple of years ago that discusses the issue. This note is just a catalogue of links to the articles:

There is an error in part 2 in the closing paragraphs – it says that the number of index entries deleted varies “from just one to 266”; in fact it varies from 181 to 266.

 

June 1, 2018

Index Bouncy Scan 4

Filed under: 12c,Execution plans,Indexing,Oracle,Partitioning,Performance — Jonathan Lewis @ 9:19 am BST Jun 1,2018

There’s always another hurdle to overcome. After I’d finished writing up the “index bouncy scan” as an efficient probing mechanism to find the combinations of the first two columns (both declared not null) of a very large index, a follow-up question appeared almost immediately: “what if it’s a partitioned index”.

The problem with “typical” partitioned indexes is that the smallest value of the leading column might appear in any of the partitions, and the combination of that value and the smallest value for the second column might not appear in all the partitions where the smallest value appears. Consider a table of 10 partitions and a locally partitioned index on (val1, val2) where neither column is the partition key. The smallest value of val1 – call it k1 may appear only in partitions 4, 7, 8, 9, 10; the lowest combination of (val1, val2) – call it (k1, k2) may appear only in partitions 8 and 10. In a global (or globally partitioned) index the pair (k1, k2) would be at the low (leftmost) end of the index, but to find the pair in a locally partitioned index we have to probe the leftmost end of 10 separate index partitions – and once we’ve done that each “bounce” requires us to probe 10 index partitions for the first (val1, val2) pair where val1 = k1 and val2 is just just greater than k2, or val1 is just greater than k1 and val2 is the minimum for that value of val1. The more partitions we have the greater the number of index partitions we have to probe at each step and the more likely it is that we ought to switch to a brute force index fast full scan with aggregate.

Here’s the starting point for solving the problem (maybe) – I’ll create a simple partitioned table, and use the “bouncy scan” code from the earlier posting with the table and column names adjusted accordingly:


rem
rem     Script:         bouncy_index_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem

create table pt1 (
        object_id,
        owner,
        object_type,
        object_name,
        status,
        namespace
)
nologging
partition by hash (object_id) partitions 4
as
select
        object_id,
        owner,
        object_type,
        object_name,
        status,
        namespace
from
        (select * from all_objects),
        (select rownum n1 from dual connect by level <= 10)
;

alter table pt1 modify(status not null);

execute dbms_stats.gather_table_stats(null,'pt1',granularity=>'ALL',method_opt=>'for all columns size 1')

create index pt1_i1 on pt1(status, namespace) nologging local;

prompt  ==================================================
prompt  Make some rows in the last partition have a status
prompt  that won't be found in the first partition.
prompt  ==================================================

column namespace format 99999999
column partition_name new_value m_part

select  partition_name
from    user_tab_partitions
where   table_name = 'PT1'
order by
        partition_position
;

update pt1 partition (&m_part) set status = 'MISSING' where rownum <= 10;

select
        dbms_mview.pmarker(rowid), status, namespace
from    pt1
where   status = 'MISSING'
;

I’ve created a hash partitioned copy of view all_objects, duplicating it 10 times and created a local index on the columns (status, namespace). My data has two values for status, ‘VALID’ and ‘INVALID’, and there are about 10 values for the namespace. I’ve then updated a few rows in the last partition, giving them a status value that is between the two current values – this is just one little test case to help me check that my code is going to catch all values even if they don’t appear in the first table partition.

Here’s the query from the earlier posting – which, as we’ll see, doesn’t quite get the right results – followed by the execution plan:


alter session set statistics_level = all;

set serveroutput off
set linesize 180
set pagesize 60

prompt  =============================================================
prompt  Original Query, showing expensive access for driving minimums
prompt  =============================================================

with bounce1(status, namespace) as (
        select status, namespace
        from    (
                select
                        /*+ index(pt1) no_index_ffs(pt1) */
                        status, namespace,
                        row_number() over(order by status, namespace) rn
                from    pt1
        )
        where
                rn = 1
        union all
        select
                v1.status, v1.namespace
        from    bounce1,
                lateral (
                              select  /*+ index(pt1) no_index_ffs(pt1) no_decorrelate */
                                      pt1.status, pt1.namespace
                              from    pt1
                              where   pt1.status > bounce1.status
                              and     rownum = 1
                ) v1
        where   bounce1.status is not null
        and     bounce1.namespace is not null
),
bounce2 (status, namespace)
as (
        select  status, namespace
        from    bounce1
        where   bounce1.status is not null
        union all
        select  bounce2.status, (select min(pt1.namespace) namespace from pt1 where pt1.status = bounce2.status and pt1.namespace > bounce2.namespace) namespace
        from    bounce2
        where   bounce2.namespace is not null
        and     bounce2.status is not null
)
select * from bounce2
where
        bounce2.namespace is not null
and     bounce2.status is not null      -- > redundant predicate
order by
        status, namespace
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline'));


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |      1 |        | 16378 (100)|       |       |     10 |00:00:00.58 |    1869 |       |       |          |
|   1 |  SORT ORDER BY                               |                 |      1 |      4 | 16378   (4)|       |       |     10 |00:00:00.58 |    1869 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                                       |                 |      1 |      4 | 16377   (4)|       |       |     10 |00:00:00.58 |    1869 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                 |      1 |        |            |       |       |     12 |00:00:00.58 |    1869 |  1024 |  1024 |          |
|*  4 |     VIEW                                     |                 |      1 |      2 |  8157   (4)|       |       |      2 |00:00:00.58 |    1747 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |       |       |      2 |00:00:00.58 |    1747 |  1024 |  1024 | 2048  (0)|
|*  6 |       VIEW                                   |                 |      1 |      1 |  4047   (4)|       |       |      1 |00:00:00.58 |    1732 |       |       |          |
|*  7 |        WINDOW SORT PUSHED RANK               |                 |      1 |    617K|  4047   (4)|       |       |      1 |00:00:00.58 |    1732 |  2048 |  2048 | 2048  (0)|
|   8 |         PARTITION HASH ALL                   |                 |      1 |    617K|  1759   (2)|     1 |     4 |    617K|00:00:00.34 |    1732 |       |       |          |
|   9 |          INDEX FULL SCAN                     | PT1_I1          |      4 |    617K|  1759   (2)|     1 |     4 |    617K|00:00:00.15 |    1732 |       |       |          |
|  10 |       NESTED LOOPS                           |                 |      2 |      1 |  4110   (4)|       |       |      1 |00:00:00.01 |      15 |       |       |          |
|  11 |        RECURSIVE WITH PUMP                   |                 |      2 |        |            |       |       |      2 |00:00:00.01 |       0 |       |       |          |
|  12 |        VIEW                                  | VW_LAT_1BBF5C63 |      2 |      1 |     9   (0)|       |       |      1 |00:00:00.01 |      15 |       |       |          |
|* 13 |         COUNT STOPKEY                        |                 |      2 |        |            |       |       |      1 |00:00:00.01 |      15 |       |       |          |
|  14 |          PARTITION HASH ALL                  |                 |      2 |      1 |     9   (0)|     1 |     4 |      1 |00:00:00.01 |      15 |       |       |          |
|* 15 |           INDEX RANGE SCAN                   | PT1_I1          |      5 |      1 |     9   (0)|     1 |     4 |      1 |00:00:00.01 |      15 |       |       |          |
|  16 |     SORT AGGREGATE                           |                 |     10 |      1 |            |       |       |     10 |00:00:00.01 |     122 |       |       |          |
|  17 |      PARTITION HASH ALL                      |                 |     10 |      1 |     9   (0)|     1 |     4 |     27 |00:00:00.01 |     122 |       |       |          |
|  18 |       FIRST ROW                              |                 |     40 |      1 |     9   (0)|       |       |     27 |00:00:00.01 |     122 |       |       |          |
|* 19 |        INDEX RANGE SCAN (MIN/MAX)            | PT1_I1          |     40 |      1 |     9   (0)|     1 |     4 |     27 |00:00:00.01 |     122 |       |       |          |
|  20 |     RECURSIVE WITH PUMP                      |                 |     10 |        |            |       |       |     10 |00:00:00.01 |       0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("BOUNCE2"."NAMESPACE" IS NOT NULL AND "BOUNCE2"."STATUS" IS NOT NULL))
   4 - filter("BOUNCE1"."STATUS" IS NOT NULL)
   6 - filter("RN"=1)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "STATUS","NAMESPACE")<=1)
  13 - filter(ROWNUM=1)
  15 - access("PT1"."STATUS">"BOUNCE1"."STATUS")
  19 - access("PT1"."STATUS"=:B1 AND "PT1"."NAMESPACE">:B2)

In terms of time the query doesn’t seem to have done too badly – but I’m only using a small data set and we can see from the numbers that we haven’t produced an efficient plan. Operations 8 and 9 tell us that we’ve done an index full scan on every single partition before passing the data up for a window sort operation. That’s clearly a bad thing, but we did have an index() hint at that point in the code – one that worked very well for the simple (global) index – so maybe we should have taken it out before testing (except removing it doesn’t help much: Oracle still scans all 617K rows, simply switching to an index fast full scan).

Apart from that massive load the rest of the query looks quite efficient. We keep seeing “partition hash all”, of course – whatever we do we tend to do it to 4 separate partitions one after the other – but everything else is reasonably cheap. There is another problem, though, and this is where the importance of inserting the rows with status = ‘MISSING’ shows up: this query didn’t find them! We have a predicate “rownum = 1” in the second half of the bounce1 recursive subquery, and because we’re using a partitioned index we’ve managed to find a row that looks appropriate in an early partition when the row we really needed doesn’t appear until the last partition.

Let’s return to this problem later – first we want to check if the rest of the query will run efficiently and give us the right answer if we can find some way of getting the starting values; so let’s use a strategy we’ve used before – replace the bounce1 subquery with a union all select from dual:


with bounce1(status, namespace) as (
        select status, namespace
        from    (
                select 'INVALID' status, 1 namespace from dual
                union all
                select 'MISSING', 4 from dual
                union all
                select 'VALID', 1 from dual
        )
),
bounce2 (status, namespace)
as (
        select  status, namespace
        from    bounce1
        where   bounce1.status is not null
        union all
        select  bounce2.status, (select min(pt1.namespace) namespace from pt1 where pt1.status = bounce2.status and pt1.namespace > bounce2.namespace) namespace
        from    bounce2
        where   bounce2.namespace is not null
        and     bounce2.status is not null
)
select * from bounce2
where
        bounce2.namespace is not null
and     bounce2.status is not null      -- > redundant predicate
order by
        status, namespace
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last partition outline'));

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name   | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |        |      1 |        |    76 (100)|       |       |     11 |00:00:00.01 |     132 |       |       |          |
|   1 |  SORT ORDER BY                             |        |      1 |      6 |    76   (2)|       |       |     11 |00:00:00.01 |     132 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                                     |        |      1 |      6 |    75   (0)|       |       |     11 |00:00:00.01 |     132 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|        |      1 |        |            |       |       |     14 |00:00:00.01 |     132 |  1024 |  1024 |          |
|   4 |     VIEW                                   |        |      1 |      3 |     6   (0)|       |       |      3 |00:00:00.01 |       0 |       |       |          |
|   5 |      UNION-ALL                             |        |      1 |        |            |       |       |      3 |00:00:00.01 |       0 |       |       |          |
|   6 |       FAST DUAL                            |        |      1 |      1 |     2   (0)|       |       |      1 |00:00:00.01 |       0 |       |       |          |
|   7 |       FAST DUAL                            |        |      1 |      1 |     2   (0)|       |       |      1 |00:00:00.01 |       0 |       |       |          |
|   8 |       FAST DUAL                            |        |      1 |      1 |     2   (0)|       |       |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |     SORT AGGREGATE                         |        |     11 |      1 |            |       |       |     11 |00:00:00.01 |     132 |       |       |          |
|  10 |      PARTITION HASH ALL                    |        |     11 |      1 |     9   (0)|     1 |     4 |     27 |00:00:00.01 |     132 |       |       |          |
|  11 |       FIRST ROW                            |        |     44 |      1 |     9   (0)|       |       |     27 |00:00:00.01 |     132 |       |       |          |
|* 12 |        INDEX RANGE SCAN (MIN/MAX)          | PT1_I1 |     44 |      1 |     9   (0)|     1 |     4 |     27 |00:00:00.01 |     132 |       |       |          |
|  13 |     RECURSIVE WITH PUMP                    |        |     10 |        |            |       |       |     11 |00:00:00.01 |       0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("BOUNCE2"."NAMESPACE" IS NOT NULL AND "BOUNCE2"."STATUS" IS NOT NULL))
  12 - access("PT1"."STATUS"=:B1 AND "PT1"."NAMESPACE">:B2)

This gets us the right answer, very efficiently. There are only 11 rows in the result set and we have an average of 12 buffer visits per row – which is reasonable given that we (probably) have to probe 4 index partitions for every row. So that’s 11 rows * 4 partitions * 3 buffer visits per probe – which seems just about optimal.

The next step is to figure out a way of getting the (three in our case) starting points while using a partitioned index. Here’s a query we can use for bounce1:


with bounce1(status, namespace) as (
        select
                (select min(status) from pt1) status,
                (select /*+ index(pt1) */ min(namespace) from pt1 where status = (select min(status) from pt1)) namespace
        from
                dual
        union all
        select
                v1.status, v2.namespace
        from    bounce1,
                lateral(
                        (select /*+ index(pt1) */ min(pt1.status) status from pt1 where pt1.status > bounce1.status)
                )       v1,
                lateral(
                        select /*+ index(pt1) */ min(pt1.namespace) namespace
                        from pt1
                        where pt1.status =  (select min(pt2.status) from pt1 pt2 where pt2.status > bounce1.status)
                )       v2
        where
                bounce1.status is not null
        and     bounce1.namespace is not null
)
select * from bounce1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost partition outline'));

It looks a little convoluted with all the inline select statements, but they all do very small amounts of work and they’re only reading the index leaf blocks that you have to read. We know from yesterday’s post that Oracle can execute the scalar subqueries at lines 3 and 4 very efficiently; we can hope (and check) that the lateral() subqueries driven by the single values from the recursive row in bounce1 will operate just as efficiently – and here’s the plan:


----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |      1 |        |   166 (100)|       |       |      4 |00:00:00.01 |     132 |
|   1 |  VIEW                                     |                 |      1 |      2 |   166   (0)|       |       |      4 |00:00:00.01 |     132 |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |       |       |      4 |00:00:00.01 |     132 |
|   3 |    SORT AGGREGATE                         |                 |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|   4 |     PARTITION HASH ALL                    |                 |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|   5 |      INDEX FULL SCAN (MIN/MAX)            | PT1_I1          |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|   6 |    SORT AGGREGATE                         |                 |      1 |      1 |            |       |       |      1 |00:00:00.01 |      24 |
|   7 |     PARTITION HASH ALL                    |                 |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      24 |
|   8 |      FIRST ROW                            |                 |      4 |      1 |     9   (0)|       |       |      4 |00:00:00.01 |      24 |
|*  9 |       INDEX RANGE SCAN (MIN/MAX)          | PT1_I1          |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      24 |
|  10 |        SORT AGGREGATE                     |                 |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|  11 |         PARTITION HASH ALL                |                 |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|  12 |          INDEX FULL SCAN (MIN/MAX)        | PT1_I1          |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|  13 |    FAST DUAL                              |                 |      1 |      1 |     2   (0)|       |       |      1 |00:00:00.01 |       0 |
|  14 |    NESTED LOOPS                           |                 |      4 |      1 |   146   (0)|       |       |      3 |00:00:00.01 |      96 |
|  15 |     NESTED LOOPS                          |                 |      4 |      1 |   137   (0)|       |       |      3 |00:00:00.01 |      36 |
|  16 |      RECURSIVE WITH PUMP                  |                 |      4 |        |            |       |       |      3 |00:00:00.01 |       0 |
|  17 |      VIEW                                 | VW_LAT_C2D92EFA |      3 |      1 |     9   (0)|       |       |      3 |00:00:00.01 |      36 |
|  18 |       SORT AGGREGATE                      |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      36 |
|  19 |        PARTITION HASH ALL                 |                 |      3 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  20 |         FIRST ROW                         |                 |     12 |      1 |     9   (0)|       |       |      8 |00:00:00.01 |      36 |
|* 21 |          INDEX RANGE SCAN (MIN/MAX)       | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  22 |     VIEW                                  | VW_LAT_C2D92EFA |      3 |      1 |     9   (0)|       |       |      3 |00:00:00.01 |      60 |
|  23 |      SORT AGGREGATE                       |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      60 |
|  24 |       PARTITION HASH ALL                  |                 |      3 |      1 |     9   (0)|     1 |     4 |      5 |00:00:00.01 |      60 |
|  25 |        FIRST ROW                          |                 |     12 |      1 |     9   (0)|       |       |      5 |00:00:00.01 |      60 |
|* 26 |         INDEX RANGE SCAN (MIN/MAX)        | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      5 |00:00:00.01 |      60 |
|  27 |          SORT AGGREGATE                   |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      36 |
|  28 |           PARTITION HASH ALL              |                 |      3 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  29 |            FIRST ROW                      |                 |     12 |      1 |     9   (0)|       |       |      8 |00:00:00.01 |      36 |
|* 30 |             INDEX RANGE SCAN (MIN/MAX)    | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - access("STATUS"=)
  21 - access("PT1"."STATUS">"BOUNCE1"."STATUS")
  26 - access("PT1"."STATUS"=)
  30 - access("PT2"."STATUS">:B1)

Although we have done lots of individual probes into the index they have all been very efficient, using a min/max access and averaging about 3 buffer visits per probe. So we can now insert this new bounce1 subquery into the previous query in place of the union all of dual, and check that the two pieces of the query cooperate.


with bounce1(status, namespace) as (
        select
                (select min(status) from pt1) status,
                (select /*+ index(pt1) */ min(namespace) from pt1 where status = (select min(status) from pt1)) namespace
        from
                dual
        union all
        select
                v1.status, v2.namespace
        from    bounce1,
                lateral(
                        (select /*+ index(pt1) */ min(pt1.status) status from pt1 where pt1.status > bounce1.status)
                )       v1,
                lateral(
                        select /*+ index(pt1) */ min(pt1.namespace) namespace
                        from pt1
                        where pt1.status =  (select min(pt2.status) from pt1 pt2 where pt2.status > bounce1.status)
                )       v2
        where
                bounce1.status is not null
        and     bounce1.namespace is not null
),
bounce2 (status, namespace)
as (
        select  status, namespace from bounce1
        union all
        select  bounce2.status, (select min(t.namespace) namespace from pt1 t where t.namespace > bounce2.namespace and status=bounce2.status) namespace
        from    bounce2
        where   bounce2.status is not null
        and     bounce2.namespace is not null
)
select  *
from    bounce2
where   namespace is not null
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost partition outline'));

------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name            | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                 |      1 |        |   396 (100)|       |       |     11 |00:00:00.01 |     266 |
|*  1 |  VIEW                                       |                 |      1 |      4 |   396   (1)|       |       |     11 |00:00:00.01 |     266 |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                 |      1 |        |            |       |       |     15 |00:00:00.01 |     266 |
|   3 |    VIEW                                     |                 |      1 |      2 |   166   (0)|       |       |      4 |00:00:00.01 |     132 |
|   4 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |       |       |      4 |00:00:00.01 |     132 |
|   5 |      SORT AGGREGATE                         |                 |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|   6 |       PARTITION HASH ALL                    |                 |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|   7 |        INDEX FULL SCAN (MIN/MAX)            | PT1_I1          |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|   8 |      SORT AGGREGATE                         |                 |      1 |      1 |            |       |       |      1 |00:00:00.01 |      24 |
|   9 |       PARTITION HASH ALL                    |                 |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      24 |
|  10 |        FIRST ROW                            |                 |      4 |      1 |     9   (0)|       |       |      4 |00:00:00.01 |      24 |
|* 11 |         INDEX RANGE SCAN (MIN/MAX)          | PT1_I1          |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      24 |
|  12 |          SORT AGGREGATE                     |                 |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|  13 |           PARTITION HASH ALL                |                 |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|  14 |            INDEX FULL SCAN (MIN/MAX)        | PT1_I1          |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|  15 |      FAST DUAL                              |                 |      1 |      1 |     2   (0)|       |       |      1 |00:00:00.01 |       0 |
|  16 |      NESTED LOOPS                           |                 |      4 |      1 |   146   (0)|       |       |      3 |00:00:00.01 |      96 |
|  17 |       NESTED LOOPS                          |                 |      4 |      1 |   137   (0)|       |       |      3 |00:00:00.01 |      36 |
|  18 |        RECURSIVE WITH PUMP                  |                 |      4 |        |            |       |       |      3 |00:00:00.01 |       0 |
|  19 |        VIEW                                 | VW_LAT_C2D92EFA |      3 |      1 |     9   (0)|       |       |      3 |00:00:00.01 |      36 |
|  20 |         SORT AGGREGATE                      |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      36 |
|  21 |          PARTITION HASH ALL                 |                 |      3 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  22 |           FIRST ROW                         |                 |     12 |      1 |     9   (0)|       |       |      8 |00:00:00.01 |      36 |
|* 23 |            INDEX RANGE SCAN (MIN/MAX)       | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  24 |       VIEW                                  | VW_LAT_C2D92EFA |      3 |      1 |     9   (0)|       |       |      3 |00:00:00.01 |      60 |
|  25 |        SORT AGGREGATE                       |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      60 |
|  26 |         PARTITION HASH ALL                  |                 |      3 |      1 |     9   (0)|     1 |     4 |      5 |00:00:00.01 |      60 |
|  27 |          FIRST ROW                          |                 |     12 |      1 |     9   (0)|       |       |      5 |00:00:00.01 |      60 |
|* 28 |           INDEX RANGE SCAN (MIN/MAX)        | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      5 |00:00:00.01 |      60 |
|  29 |            SORT AGGREGATE                   |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      36 |
|  30 |             PARTITION HASH ALL              |                 |      3 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  31 |              FIRST ROW                      |                 |     12 |      1 |     9   (0)|       |       |      8 |00:00:00.01 |      36 |
|* 32 |               INDEX RANGE SCAN (MIN/MAX)    | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  33 |    SORT AGGREGATE                           |                 |     11 |      1 |            |       |       |     11 |00:00:00.01 |     134 |
|  34 |     PARTITION HASH ALL                      |                 |     11 |      1 |     9   (0)|     1 |     4 |     27 |00:00:00.01 |     134 |
|  35 |      FIRST ROW                              |                 |     44 |      1 |     9   (0)|       |       |     27 |00:00:00.01 |     134 |
|* 36 |       INDEX RANGE SCAN (MIN/MAX)            | PT1_I1          |     44 |      1 |     9   (0)|     1 |     4 |     27 |00:00:00.01 |     134 |
|  37 |    RECURSIVE WITH PUMP                      |                 |     10 |        |            |       |       |     11 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NAMESPACE" IS NOT NULL)
  11 - access("STATUS"=)
  23 - access("PT1"."STATUS">"BOUNCE1"."STATUS")
  28 - access("PT1"."STATUS"=)
  32 - access("PT2"."STATUS">:B1)
  36 - access("STATUS"=:B1 AND "T"."NAMESPACE">:B2)

Job done. We’ve found the distinct set of pairs without having to scan the entire index: 11 pairs at a total cost of 266 buffer gets. For comparative purposes the query totalled 56 buffer visits when I recreated the table as a non-partitioned table (again updating a few rows to status = ‘MISSING’).
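
A minimal sketch of that comparison (the object names here are purely illustrative, and this is not the full script) would be to copy the data into a non-partitioned heap table, build the same two-column index, gather stats, and re-run the bounce2 query against the copy:


create table pt1_np nologging as
select * from pt1;              -- the copy carries across the few rows previously set to status = 'MISSING'

create index pt1_np_i1 on pt1_np(status, namespace) nologging;

execute dbms_stats.gather_table_stats(null,'pt1_np',method_opt=>'for all columns size 1')
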

It’s important to note that this query can only work this efficiently in 12.2 (and possibly in a suitably patched 11.2.0.4) because of the optimizer’s ability to use the min/max operation for queries like: “select max(col2) where col1 = (select max() …)”. When I ran the final query on 12.1.0.2 the execution plan changed around operations 11 and 28: where 12.2.0.1 could use the aggregate subquery to drive the min/max scan, 12.1.0.2 did a real range scan with aggregate (which was extremely expensive at one point).

------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name            | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------------
|  23 |       VIEW                                  | VW_LAT_C2D92EFA |      3 |      1 |  1206   (2)|       |       |      3 |00:00:05.43 |    2414 |
|  24 |        SORT AGGREGATE                       |                 |      3 |      1 |            |       |       |      3 |00:00:05.43 |    2414 |
|  25 |         PARTITION HASH ALL                  |                 |      3 |    422K|  1206   (2)|     1 |     4 |    845K|00:00:05.84 |    2414 |
|* 26 |          INDEX RANGE SCAN                   | PT1_I1          |     12 |    422K|  1206   (2)|     1 |     4 |    845K|00:00:02.03 |    2414 |
|  27 |           SORT AGGREGATE                    |                 |      3 |      1 |            |       |       |      3 |00:00:00.01 |      36 |
|  28 |            PARTITION HASH ALL               |                 |      3 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
|  29 |             FIRST ROW                       |                 |     12 |      1 |     9   (0)|       |       |      8 |00:00:00.01 |      36 |
|* 30 |              INDEX RANGE SCAN (MIN/MAX)     | PT1_I1          |     12 |      1 |     9   (0)|     1 |     4 |      8 |00:00:00.01 |      36 |
------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, this makes a dramatic difference to the work Oracle has to do – in this case 2,414 buffer gets and 845K rows examined. As I said in yesterday’s post – there’s a patch for 11.2.0.4, so there could be a patch for 12.1.0.2 if you ask for it, but it looks like no-one has done so yet.

Footnote:

I could have used a lateral() view in the first half of bounce1 to reduce the reported number of probes of pt1_i1 in the plan – but it made the code extremely messy, I had to include a /*+ no_decorrelate */ hint in it, and it increased the number of buffer visits slightly because the optimizer seemed to lose the option for a min/max scan in this particular lateral join.
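
For the curious, a sketch of the shape that messier version might take (illustrative only – this is not the code behind any of the plans above) is to compute min(status) once in a lateral view and then correlate a second lateral view to it:


select  v1.status, v2.namespace
from
        dual,
        lateral(
                (select min(status) status from pt1)
        )       v1,
        lateral(
                (select /*+ no_decorrelate index(pt1) */
                        min(namespace)  namespace
                 from   pt1
                 where  pt1.status = v1.status
                )
        )       v2
;
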

 

May 31, 2018

Min/Max upgrade

Filed under: 12c,Indexing,Oracle,Partitioning,Performance — Jonathan Lewis @ 2:13 pm BST May 31,2018

Here’s a nice little optimizer enhancement that appeared in 12.2 to make min/max range scans (and full scans) available in more circumstances. Rather than talk through it, here’s a little demonstration:

rem
rem     Script:         122_minmax.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1        Good path
rem             12.1.0.2        Bad path

create table pt1 (
        object_id,
        owner,
        object_type,
        object_name,
        status,
        namespace
)
nologging
partition by hash (object_id) partitions 4
as
select
        object_id,
        owner,
        object_type,
        object_name,
        status,
        namespace
from
        (select * from all_objects),
        (select rownum n1 from dual connect by level <= 10)  -- > comment to avoid format wordpress issue
;

alter table pt1 modify(status not null);

execute dbms_stats.gather_table_stats(null,'pt1',granularity=>'ALL',method_opt=>'for all columns size 1')

create index pt1_i1 on pt1(status, namespace) nologging local;

alter session set statistics_level = all;
set serveroutput off
set linesize 156
set pagesize 60
set trimspool on

select  min(status) from pt1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost partition'));

select  min(namespace) from pt1 where status = 'INVALID';
select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost partition'));

select  min(namespace) from pt1 where status = (select min(status) from pt1);
select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost partition'));

The basic “min/max” optimisation allows Oracle to avoid a massive sort aggregate – Oracle doesn’t need to acquire a lot of data and sort it when it knows that the “left hand” end of an index holds the low values and the “right hand” end holds the high values. So, for example, in the first query above the optimizer could simply walk down the index branches to the left hand leaf and look at the single lowest entry in that leaf block to determine the lowest value for status … if the index had been a global index.
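
Just to be explicit about that hypothetical: the “global index” here would simply be the same two-column index created without the local keyword, e.g.


create index pt1_i1 on pt1(status, namespace) nologging;


With that definition the first query would only have to walk down one set of branch blocks to a single leftmost leaf block.
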

Things get a little messy, though, when the index is locally partitioned, your query isn’t about the partition key, and there’s no suitable global index. Once upon a time (IIRC) Oracle would simply have to do an index fast full scan across all index partitions to handle such a query, but some time ago it got a lot cleverer and was enhanced to do a min/max scan on each partition in turn, getting one value per partition very efficiently, then aggregating across those values to find the global minimum.

Here are the three execution plans (with rowsource execution stats pulled from memory) taken from 12.1.0.2 for the queries above:


-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |      1 |        |     9 (100)|       |       |      1 |00:00:00.01 |      12 |
|   1 |  SORT AGGREGATE             |        |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|   2 |   PARTITION HASH ALL        |        |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|   3 |    INDEX FULL SCAN (MIN/MAX)| PT1_I1 |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
-----------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |      1 |        |     9 (100)|       |       |      1 |00:00:00.01 |      12 |
|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|   2 |   PARTITION HASH ALL          |        |      1 |      1 |     9   (0)|     1 |     4 |      1 |00:00:00.01 |      12 |
|   3 |    FIRST ROW                  |        |      4 |      1 |     9   (0)|       |       |      1 |00:00:00.01 |      12 |
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| PT1_I1 |      4 |      1 |     9   (0)|     1 |     4 |      1 |00:00:00.01 |      12 |
-------------------------------------------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   337 (100)|       |       |      1 |00:00:00.07 |    2402 |   2242 |
|   1 |  SORT AGGREGATE                |        |      1 |      1 |            |       |       |      1 |00:00:00.07 |    2402 |   2242 |
|   2 |   PARTITION HASH ALL           |        |      1 |    422K|   328  (10)|     1 |     4 |     10 |00:00:00.07 |    2402 |   2242 |
|*  3 |    INDEX FAST FULL SCAN        | PT1_I1 |      4 |    422K|   328  (10)|     1 |     4 |     10 |00:00:00.07 |    2402 |   2242 |
|   4 |     SORT AGGREGATE             |        |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |      0 |
|   5 |      PARTITION HASH ALL        |        |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |      0 |
|   6 |       INDEX FULL SCAN (MIN/MAX)| PT1_I1 |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |      0 |
-----------------------------------------------------------------------------------------------------------------------------------------

In the first plan Oracle has done an “index full scan (min/max)” across each of the four partitions in turn to return one row very cheaply from each, then aggregated to find the overall minimum.

In the second plan Oracle has done an “index range scan (min/max)” in exactly the same way, since it was able to find the start point in the index for the status ‘INVALID’ very efficiently.

In the third plan Oracle has been able to find the minimum value for the status (‘INVALID’) very efficiently in the subquery, and has passed that single value up to the main query, which has then used a brute force approach to search the whole of every partition of the index for every occurrence (all 10 of them) of the value ‘INVALID’ and then aggregated them to find the minimum namespace. Despite “knowing”, by the time the main query runs, that there will be a single value to probe for the status, the optimizer has not anticipated the fact that the final query will effectively become the same as the preceding one. As a result we’ve read 2,242 data blocks into the cache.

Turn, then, to the execution plan from 12.2.0.1 for this last query:


---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |        |      1 |        |     9 (100)|       |       |      1 |00:00:00.01 |      24 |
|   1 |  SORT AGGREGATE                 |        |      1 |      1 |            |       |       |      1 |00:00:00.01 |      24 |
|   2 |   PARTITION HASH ALL            |        |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      24 |
|   3 |    FIRST ROW                    |        |      4 |      1 |     9   (0)|       |       |      4 |00:00:00.01 |      24 |
|*  4 |     INDEX RANGE SCAN (MIN/MAX)  | PT1_I1 |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      24 |
|   5 |      SORT AGGREGATE             |        |      1 |      1 |            |       |       |      1 |00:00:00.01 |      12 |
|   6 |       PARTITION HASH ALL        |        |      1 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
|   7 |        INDEX FULL SCAN (MIN/MAX)| PT1_I1 |      4 |      1 |     9   (0)|     1 |     4 |      4 |00:00:00.01 |      12 |
---------------------------------------------------------------------------------------------------------------------------------

In 12.2 you can see that the main query is now doing an “index range scan (min/max)” on each index partition in turn, based on the incoming (though unknown at parse time) single value from the subquery. As a result the total work done is a mere 24 buffer visits.

There have been a couple of occasions in the past where I’ve had to write some PL/SQL to work around little details like this. It’s nice to know that simple tables and partitioned tables with local indexes can now behave the same way. I also wonder whether there may be sites that could drop (or drop columns from, or make local) some indexes that they’ve previously created to handle queries of the “most recent occurrence” type.

If, for any reason, you need to disable this enhancement, it’s controlled by fix_control (v$system_fix_control) “18915345 Allow MIN/MAX optimization for pred having single row subquery” which can be set in the startup file, at the system level, or in the session.
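
As a sketch of the syntax (the value 0 disables the fix, setting it back to 1 re-enables it):


-- session level
alter session set "_fix_control" = '18915345:0';

-- system level
alter system set "_fix_control" = '18915345:0';
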

Update

Checking MoS for the bug number I found that the limitation had been reported for 11.2.0.3, with “Fixed in product version” reported as 12.2; there are patches for various releases of 11.2.0.4, though none yet for 12.1.0.2 – but if you think you need it you can always try raising an SR.

 
