Oracle Scratchpad

May 14, 2014

Feature Bypass

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 1:23 pm GMT May 14,2014

Here’s a little tip that might be helpful occasionally when you’re trying to work out why the optimizer transformation you were expecting isn’t appearing.

If you’ve ever checked the 10053 trace (and who wants to do that for a complex query?) you may have noticed lines like:

SU: SU bypassed: Remote table referenced.

So now you know that SU – Subquery Unnesting – has limitations in distributed queries.
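
For example – a purely hypothetical sketch, assuming a database link called orcl_remote – a query of this shape could produce that message in its 10053 trace:

select	*
from	t1
where	t1.id in (
		select	t2.id
		from	t2@orcl_remote	-- remote table: unnesting bypassed
	)
;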

When I first saw one of these bypass messages it crossed my mind that it would be useful to keep a reference list of the features that could be reported as bypassed – a list I generate with a simple unix command:

strings -a oracle | grep -i bypassed > bypassed.txt

If you need a reference for the various short codes for transformations you can find it near the top of the 10053 trace, looking like this:

CBQT – cost-based query transformation
JPPD – join predicate push-down
OJPPD – old-style (non-cost-based) JPPD
FPD – filter push-down
PM – predicate move-around
CVM – complex view merging
SPJ – select-project-join
SJC – set join conversion
SU – subquery unnesting
OBYE – order by elimination
OST – old style star transformation
ST – new (cbqt) star transformation
CNT – count(col) to count(*) transformation
JE – Join Elimination
JF – join factorization
SLP – select list pruning
DP – distinct placement
… big gap here
AP – adaptive plans
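
And if you need a reminder of how to generate the 10053 trace in the first place, the traditional event syntax is the following – it has to be in place before the statement is hard-parsed:

alter session set events '10053 trace name context forever, level 1';

-- run (and hard-parse) the query of interest, then:

alter session set events '10053 trace name context off';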

May 2, 2014

Costing Bug

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 8:53 am GMT May 2,2014

It’s amazing how you can find little bugs (or anomalies) as soon as you start to look closely at how things work in Oracle. I started to write an article for All Things Oracle last night about execution plans with subqueries, so I wrote a little script to generate some sample data, set up the first sample query, checked the execution plan – and stopped, because the final cost didn’t make sense. Before going on I should point out that this probably doesn’t matter, and probably wouldn’t cause a change in the execution plan if the calculation were corrected – but it is an interesting indication of the odd things that can happen when sections of modular code are combined in an open-ended way. Here’s the query (running on 11.2.0.4) with its execution plan:


update t1 set 
	n1 = (
		select	max(mod100)
		from	t2
		where	t2.id = t1.id
	),
	n2 = (
		select	max(trunc100)
		from	t3
		where	t3.id = t1.id
	)
where
	id between 101 and 200
;

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |       |   101 |  1212 |   812  (25)| 00:00:05 |
|   1 |  UPDATE                       | T1    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN            | T1_I1 |   101 |  1212 |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   4 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN (MIN/MAX)| T2_I1 |     1 |     7 |     3   (0)| 00:00:01 |
|   6 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   7 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN (MIN/MAX)| T3_I1 |     1 |     7 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=101 AND "ID"<=200)
   5 - access("T2"."ID"=:B1)
   8 - access("T3"."ID"=:B1)

So the cost of running each subquery is 3 – there are two of them, and we expect to run each of them 101 times: for a total cost of 606. So how do we get to 812 as the total cost of the query ?

Further testing:

  • checking the cost of the plan for updating the two columns with constants – it’s just 4 (see the sketch below)
  • rebuilding the indexes with different values for pctfree to see how the cost changed
  • varying the number of columns updated by subquery
  • checking the 10053 trace file – particularly for issues of presentation vs. rounding
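
For reference, the constant-only update in the first test looked something like this:

update t1 set
	n1 = 0,		-- arbitrary constants: the point is the cost, not the values
	n2 = 0
where
	id between 101 and 200
;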

Ultimately I decided that for each column updated by subquery the optimizer added 1 to the cost of accessing the table for each row; or, to view it another way, the optimizer used “sum(subquery costs + 1) * number of rows to be updated”, so (4 + 4) * 101 + a little bit for the driving table access = 812. This doesn’t seem entirely reasonable – costing effectively assumes that every block visit is a disk read, but when we update multiple columns of the same row we need only read the block into memory (at most) once. As I said at the start, though, this anomaly in costing probably doesn’t matter – there are no further steps to be taken after the update, so there’s nothing the optimizer might do differently if the cost of the update had been calculated as 612 rather than 812.

Footnote:

If you want to play about with this query, here’s the code to create the tables – with one proviso: the plan above happens to be one I produced after rebuilding the indexes on t2 and t3 with pctfree 99 (the rebuild commands appear after the script).


create table t1
as
with generator as (
	select  --+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum				id,	
	mod(rownum-1,100)		mod100,
	trunc((rownum - 1)/100)		trunc100,
	rownum				n1,
	rownum				n2,
	lpad(rownum,6,'0')		vc1,
	rpad('x',100)			padding
from
	generator
where
	rownum <= 10000
;

create table t2 as select * from t1;
create table t3 as select * from t1;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id,mod100);
create index t3_i1 on t3(id,trunc100);

begin
	dbms_stats.gather_table_stats(user,'t1');
	dbms_stats.gather_table_stats(user,'t2');
	dbms_stats.gather_table_stats(user,'t3');
end;
/
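
For completeness, the rebuild described in the proviso is simply:

alter index t2_i1 rebuild pctfree 99;
alter index t3_i1 rebuild pctfree 99;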

April 4, 2014

NVL() change

Filed under: CBO,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 6:10 pm GMT Apr 4,2014

One of the problems with functions is that the optimizer generally doesn’t have any idea of how a predicate based on function(col) might affect the cardinality. However, the optimizer group are constantly refining the algorithms to cover an increasing number of special cases more accurately. This is a good thing, of course – but it does mean that you might be unlucky on an upgrade, where a better cardinality estimate leads to a less efficient execution plan. Consider, for example, this simple query (where d1 is a column of type date):

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate

Now, there are many cases in many versions of Oracle, where the optimizer will appear to calculate the cardinality of

nvl(columnX,{constant}) operator {constant}

as if it were:

columnX is null or columnX operator {constant}

Unfortunately this doesn’t seem to be one of them – until you get to 11.2.something. Here’s a little demonstration code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	case
		when rownum > 100 then null else sysdate - rownum
	end	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	query with NVL

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
;

prompt	query with OR clause

select	*
from	t1
where	d1 is null or d1 < sysdate
;

If you run this code in 11.1.0.7 you get the following – with numeric variations for cost (which I’m not interested in at the moment):


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2500 |  2500 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

Take note particularly of the difference in the estimated cardinality for the tablescans.

When you upgrade to 11.2.0.4 (possibly earlier – though there are some nvl() related patches that appeared only in 11.2.0.4), you get this:


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49900 | 49900 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

As you can see, the estimate for the “NVL()” example is now correct – vastly different from the estimate in 11.1.0.7, which seemed to be using the standard “5% for a range-based predicate on function(col)” (and 5% of 50,000 is the 2,500 we saw).

It’s interesting to note that a (relatively) small error has crept into the “OR” example – the size of the error is exactly the number of rows where d1 is not null (which looks like enough of a coincidence to be a bug – but maybe there’s a good rationale for it).
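
If you want to see the figures the optimizer is working from, the column stats make the arithmetic visible – with this data set num_nulls should be 49,900 and num_distinct 100:

select
	num_nulls, num_distinct
from	user_tab_cols
where	table_name = 'T1'
and	column_name = 'D1'
;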

Conclusion

Yet again, a simple upgrade has the capacity to make a dramatic change to a cardinality estimate – which could mean a significant change to an execution plan and a major change in performance. If you’ve read this note, though, you may recognise the pattern that is the driving feature behind the problem.

Footnote:

If you have access to versions 11.2.0.1 through 11.2.0.3 and find that the test data produces different cardinalities please publish the results in the comments – it would be nice to know exactly when this change appears. (You need only show the body of the execution plans with labels, not the entire output of the script.)

 

March 7, 2014

Subquery Anomaly

Filed under: Bugs,CBO,Execution plans,Oracle,Upgrades — Jonathan Lewis @ 8:57 am GMT Mar 7,2014

Here’s an oddity that appeared on the OTN database forum last night:

We have this query in our application which works fine in 9i but fails in 11gR2 (on Exadata) giving an “ORA-00937: not a single-group group function” error….

… The subquery is selecting a column and it doesn’t have a group by clause at all. I am not sure how is this even working in 9i. I always thought that on a simple query using an aggregate function (without any analytic functions / clause), we cannot select a column without having that column in the group by clause. So, how 11g behaves was not a surprise but surprised to see how 9i behaves. Can someone explain this behaviour?

The poster supplied the suspect query, and it certainly looked as if it should never have worked – but I took a guess that the optimizer was doing some sort of transformation that concealed the problem before it managed to see the error. The subquery was a little odd because it was doing something it didn’t need to do, and my guess was that the optimizer had recognised the option to simplify the query and the simplification had “accidentally” removed the error. This turned out to be correct, but my guess about exactly what had happened to hide the error was wrong.

Having created a hypothesis I couldn’t resist checking it this morning, so here’s the test case (don’t pay any attention to the actual data I’ve generated, it was a cut-n-paste from a script that I had previously used for something completely different):

create table t1
as
select
	trunc((rownum-1)/15)	n1,
	trunc((rownum-1)/15)	n2,
	rpad(rownum,180)	v1
from all_objects
where rownum <= 3000
;

create table t2
as
select
	mod(rownum,200)		n1,
	mod(rownum,200)		n2,
	rpad(rownum,180)	v1
from all_objects
where rownum <= 3000
;

begin
 	dbms_stats.gather_table_stats(
 		user,
 		't1',
 		method_opt => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		user,
		't2',
		method_opt => 'for all columns size 1'
	);
end;
/

explain plan for
select
	/*+ qb_name(main) */
	*
from t1
where (n2,n1) in (
	select /*+
			qb_name(subq)
			unnest
		*/
		max(t2.n2), t2.n1
	from t2
	where t2.n1 = t1.n1
)
;

You’ll notice, of course, that I don’t have a group by clause at all, so the presence of t2.n1 in the select list should lead to the Oracle error “ORA-00937: not a single-group group function”.

In versions from 8i to 11.1.0.7 this query could run, and its execution plan looked like this:


----------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |   200 | 45200 |    46 |
|*  1 |  HASH JOIN           |         |   200 | 45200 |    46 |
|   2 |   VIEW               | VW_SQ_1 |   200 |  7800 |    31 |
|   3 |    HASH GROUP BY     |         |   200 |  2400 |    31 |
|   4 |     TABLE ACCESS FULL| T2      |  3000 | 36000 |    14 |
|   5 |   TABLE ACCESS FULL  | T1      |  3000 |   547K|    14 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("N2"="MAX(T2.N2)" AND "N1"="N1" AND "ITEM_1"="T1"."N1")

Notice how the optimizer has produced an inline view (VW_SQ_1) from the subquery, using it to drive a hash join; notice how that inline view has an aggregation operation (HASH GROUP BY) in it. In effect the optimizer has rewritten my query like this:

select
	t1.*
from	(
		select
			distinct max(t2.n2) max_n2, t2.n1 item_1, t2.n1
		from	t2
		group by
			t2.n1
	)	vw_sq_1,
	t1
where
	t1.n2 = vw_sq_1.max_n2
and	t1.n1 = vw_sq_1.n1
and	t1.n1 = vw_sq_1.item_1
;

There’s a clue about why this succeeded in the 10053 trace file, which includes the lines:

"Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#2)
SU:   Passed validity checks.

Compared to the 11.2 lines:

Subquery Unnesting on query block MAIN (#1)SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block MAIN (#1).
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

Whatever check it was that Oracle introduced in 11.2 (maybe a check that the query block was inherently legal), unnesting failed – and if I add an /*+ no_unnest */ hint to the original subquery in the earlier versions of Oracle I get the expected ORA-00937.
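
In other words, blocking the transformation exposes the underlying illegality; the hinted version of the query is just:

select
	/*+ qb_name(main) */
	*
from t1
where (n2,n1) in (
	select /*+
			qb_name(subq)
			no_unnest
		*/
		max(t2.n2), t2.n1
	from t2
	where t2.n1 = t1.n1
)
;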

The philosophical argument is left to the reader: was the original behaviour a bug, or is the new behaviour the bug ?

 

February 28, 2014

Empty Hash

Filed under: Bugs,CBO,Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 6:45 pm GMT Feb 28,2014

A little while ago I highlighted a special case with the MINUS operator (that one of the commentators extended to include the INTERSECT operator) relating to the way the second subquery would run even if the first subquery produced no rows. I’ve since had an email from an Oracle employee letting me know that the developers looked at this case and decided that it wasn’t feasible to address it because – taking a wider viewpoint – if the query were to run in parallel they would need a mechanism that allowed some synchronisation between slaves, so that every slave could find out that none of the slaves had received any rows from the first subquery, and this was going to lead to hanging problems.

The email reminded me that there’s another issue of the same kind that I discovered several years ago (I thought I’d written it up, but maybe it was on a newsgroup or forum somewhere – I can’t find it on my blog or old website). The problem can be demonstrated by this example:

(more…)

February 26, 2014

Predicate Order

Filed under: Bugs,CBO,Execution plans,Oracle — Jonathan Lewis @ 8:14 am GMT Feb 26,2014

Common internet question: does the order of predicates in the where clause make a difference?
General answer: it shouldn’t, but sometimes it will, thanks to defects in the optimizer.

There’s a nicely presented example on the OTN database forum where predicate order does matter (between 10.1.x.x and 11.1.0.7). Note particularly – there’s a script to recreate the issue; note, also, the significance of the predicate section of the execution plan.
It’s bug 6782665, fixed in 11.2.0.1.

January 3, 2014

Index Hash

Filed under: Bugs,CBO,Hints,Ignoring Hints,Index Joins,Indexing,Oracle — Jonathan Lewis @ 6:56 pm GMT Jan 3,2014

I’m afraid this is one of my bad puns again – an example of the optimizer making a real hash of the index hash join. I’m going to create a table with several indexes (some of them rather similar to each other) and execute a query that should do an index join between the obvious two indexes. To show how obvious the join should be I’m going to start with a couple of queries that show the cost of simple index fast full scans.

Here’s the data generating code:

(more…)

December 8, 2013

12c Subqueries

Filed under: 12c,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:32 am GMT Dec 8,2013

When you upgrade you often find that some little detail (of the optimizer) that didn’t receive a lot of attention in the “New Features” manuals introduces a few dramatic changes in execution plans. Here’s one example of a detail that is likely to catch a few unlucky people. We start with a very simple table which is just an id column with some padding, and then show the effect of a change in the handling of “constant subqueries”. Here’s my data set:

(more…)

November 4, 2013

Outline hassle

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 6:17 pm GMT Nov 4,2013

Here’s the output I got from a 10.2.0.5 system after generating a stored outline on a query – then dropping the index that was referenced by the stored outline and creating an alternative index. Spot the problem:

(more…)

October 2, 2013

FBI decode

Filed under: Bugs,CBO,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 8:26 am GMT Oct 2,2013

It probably won’t surprise many people to hear me say that the decode() function can be a bit of a nuisance; and I’ll bet that quite a lot of people have had trouble occasionally trying to get function-based indexes that use this function to behave properly. So (to put it all together and support the general directives that case is probably a better choice than decode() and that the cast() operator is an important thing to learn) here’s an example of how function-based indexes don’t always allow you to work around bad design/code. (Note: this is a model of a problem I picked up at a client site, stripped to a minimum – you have to pretend that I’m not allowed to fix the problem by changing code).

(more…)

September 27, 2013

Virtual Stats

Filed under: CBO,Execution plans,Oracle,Statistics — Jonathan Lewis @ 6:49 am GMT Sep 27,2013

Or – to be more accurate – real statistics on a virtual column.

This is one of the “10 top tips” that I came up with for my session with Maria Colgan at OOW13 – a method of giving the optimizer more information that might improve execution plans when you can’t change the code. I’ll start with a small data set including a virtual column (running 11.1.0.7), and a couple of problem queries:
(more…)

September 25, 2013

Extended Stats

Filed under: CBO,Oracle,Statistics — Jonathan Lewis @ 5:06 pm GMT Sep 25,2013

Here’s a little demo cut-n-pasted from a session running Oracle 12.1.0.1 (it works on 11g, too). All it does is create a table by copying from a well-known table, gather extended stats on a column group, then show you the resulting column names by querying the view user_tab_cols.
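
The general pattern – not the exact script from the demo, and with a hypothetical choice of column group – looks like this:

create table t1 as select * from all_objects where rownum <= 10000;

select
	dbms_stats.create_extended_stats(user, 'T1', '(owner, object_type)')
from	dual;

begin
	dbms_stats.gather_table_stats(user, 'T1');
end;
/

select column_name from user_tab_cols where table_name = 'T1';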

(more…)

September 7, 2013

Hash Joins

Filed under: CBO,Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 12:53 pm GMT Sep 7,2013

I’ve written notes about the different join mechanisms in the past – but such things are always worth revisiting, so here’s an accumulated bundle of comments about hash joins.

A hash join takes two inputs that (in most of the Oracle literature) are referred to as the “build table” and the “probe table”. These rowsources may be extracts from real tables or indexes, or might be result sets from previous joins. Oracle uses the “build table” to build a hash table in memory, consuming and using the rowsource in a single call; it then consumes the “probe table” one row at a time, probing the in-memory hash table to find a match.

Access to the hash table is made efficient by use of a hashing function applied to the join columns – rows with the same value on the join column end up hashing to the same place in the hash table. It is possible for different input values to produce the same hash value (a hash collision) so Oracle still has to check the actual values once it has identified “probable” joins in the hash table. Because the comparison is based on a hashing mechanism, hash joins can only be used for join predicates that are equality predicates.
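
As a trivial (and purely hypothetical) illustration of dictating the choice: with the hints below t1 becomes the build table and t2 the probe table – and note that the join predicate is an equality:

select
	/*+ leading(t1) use_hash(t2) */
	t1.v1, t2.v1
from
	t1, t2
where
	t2.id = t1.id
;
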
(more…)

August 15, 2013

MV Refresh

Filed under: Bugs,CBO,Infrastructure,Oracle,Statistics — Jonathan Lewis @ 6:12 pm GMT Aug 15,2013

Here’s a funny little problem I came across some time ago when setting up some materialized views. I have two tables, orders and order_lines, and I’ve set up materialized view logs for them that allow a join materialized view (called orders_join) to be fast refreshable. Watch what happens if I refresh this view just before gathering stats on the order_lines table.
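
For anyone who wants to picture the setup, the skeleton would be something like this (hypothetical table structures – for a join MV to be fast refreshable the logs have to be created “with rowid” and the view has to select the rowid of every table involved):

create materialized view log on orders with rowid;
create materialized view log on order_lines with rowid;

create materialized view orders_join
refresh fast on demand
as
select
	ord.rowid	ord_rowid,
	orl.rowid	orl_rowid,
	ord.id		ord_id,
	orl.line_no	line_no
from
	orders		ord,
	order_lines	orl
where
	orl.ord_id = ord.id
;

The experiment is then a call to dbms_mview.refresh('ORDERS_JOIN', 'F') immediately followed by dbms_stats.gather_table_stats against order_lines.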

(more…)

August 13, 2013

NVL()

Filed under: CBO,Conditional SQL,Execution plans,NULL,Oracle — Jonathan Lewis @ 7:14 am GMT Aug 13,2013

Here’s a little detail about how the optimizer can handle the nvl() function that I hadn’t noticed before (and it goes back to at least 8i). This is running on 11.2.0.3, and table t1 is just all_objects where rownum <= 20000:

(more…)
