Oracle Scratchpad

December 9, 2014

Parse Time

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 11:53 pm GMT Dec 9,2014

Here’s a little query I wrote some time ago to see where my time went while running a query. It’s nothing sophisticated, just one of those simple things you can do with v$active_session_history (or dba_hist_active_sess_history, if you don’t get to the crime scene in time).


set null CPU

select
        sql_exec_id, in_parse, in_hard_parse, event, count(*)
from
        v$active_session_history
where
        sql_id = '{your choice here}'
group by
        sql_exec_id, in_parse, in_hard_parse, event
order by
        sql_exec_id, in_parse, in_hard_parse, count(*)
;

SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
   40341649 N N db file sequential read                                68
   40341649 N N CPU                                                    21
   40341649 N N db file scattered read                                  9
            Y Y CPU                                                     7

I had run the query that I was tracking exactly once, but my ASH query allows for, and separates, multiple executions of the same query by summing on sql_exec_id (the thing that the SQL Monitor also uses). The last row looks a little odd, though: it does’t have a value for sql_exec_id; that’s because those are ASH samples when the query is being optimized, not being executed – note that I’ve reported the columns in_parse and in_hard_parse – and both are set to “Y” for that row.

So  (statistically speaking) it’s probably taken about 7 CPU seconds for Oracle to optimise the statement, and from the rest of the results you can see that it’s taken about 21 CPU seconds to run, with 68 seconds spent on random I/Os and 9 seconds spent on multiblock reads for a total of 103 seconds elapsed.

Seven seconds sounds like quite a lot of time for parsing – but it was a fairly complex statement. However, the reason I’d been running the statement on a test system (a fairly good clone of production) was that I’d been seeing something stranger on production and I needed to get a baseline on the test system before I starting trying to fix the problem. Here’s the equivalent ASH reports for the same statement when it had run on production at a time that allowed me to capture its ASH samples.


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  514257929 N N CPU                                                    21
  514257929 Y Y latch: row cache objects                                1
            Y Y CPU                                                   119

Note the 119 CPU seconds spent parsing to run a 22 second query ! But that wasn’t the worst of it – sometimes the results looked more like this:


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  523748347 N N db file sequential read                                 1
  523748347 N N CPU                                                    32
            Y Y resmgr:cpu quantum                                     11
            Y Y latch: row cache objects                               38
            Y Y CPU                                                   415

That’s a pretty catastrophic optimsation time – especially since the statement can be optimised in seven seconds in another environment. You might note the resource manager kicking in there, the session is exceeding the CPU limit set for its resource group – though not very often given how infrequently it seems to be waiting on “resmgr:cpu quantum”. But there’s another important wrinkle to this report – which you can see when compare v$active_session_history with v$sql.


  1  select
  2     sql_id,
  3     round(cpu_time/1000000,2) cpu,
  4     round(elapsed_time/1000000,2) ela from v$sql
  5  where
  6     sql_text like '{some identifying text}'
  7* and        sql_text not like '%v$sql%'
SQL> /

SQL_ID               CPU        ELA
------------- ---------- ----------
2atyuc3vtpswy     285.81     516.13

The figures from v$sql don’t match very well with the summed results from ASH which has a total sample of 497 seconds and a CPU sample of 447 seconds. I think I can live with a statistical error of 4% ((516-497)/516) in a random sample for total time, but how do you explain the 36% error in the CPU time ?

The samples reporting “resmgr:cpu quantum” are a clue: the machine is overloaded; it’s trying to use far more CPU time than is available. As a result a process that gets pushed off the CPU by the operating system scheduler while it’s running can spend a long time in the run queue waiting to start running again. And if it’s an Oracle process that got pre-empted it doesn’t “know” that it’s not running, it didn’t put itself into a wait state so all it “knows” is that it’s not in a wait state.

So how do ASH and v$sql differ ? The code that derives the cpu_time for v$sql issues a call to the O/S asking “how much CPU have I used”. The code that takes an ASH sample says: “is this session active, if so is it in a wait state and if it’s not in a wait state then it’s either on the CPU or in the run queue waiting for the CPU”. So when we compare v$sql with ASH the difference in CPU is (statistically speaking) time spent in the run queue. So of our 447 seconds of CPU recorded by ASH, we spent 161 seconds in the CPU run queue waiting for CPU.

We still have to account for the difference between the 7 CPU seconds on a test system and the variation between 119 CPU seconds and 415 CPU seconds in optimisation on production. In a word – concurrency. Apart from everything else going on at the time there were, in the worst case, 16 slightly different versions of the same statement being called at the same time (on a machine with 24 cores) – all 16 statement were competing violently for the same resources at the same time, and the escalating conflict as more session joined in produced an exponential growth in time spent competing for resources rather than doing the job. (I managed to demonstrate the effect quite nicely during the evening by limiting the batch to 4 concurrent executions – and got a typical parse time of 40 CPU seconds).

I’ve often warned people about the problems of concurrency and encouraged them to think about how much time is being spent in competition rather then doing the job; I think this is the most extreme case I’ve seen in a production system. Given how dramatic the variation is, I can’t help wondering if the problem has been exaggerated by some corner case of sessions spinning for mutexes or latches; perhaps even an error in the code that allows resource management to put a session into “resmgr:cpu quantum” while it’s holding a latch or mutex. (I wasn’t able to emulate such an extreme display of the problem on a slightly newer version of Oracle, but I was able to construct a test that demonstrated the effect with a much smaller wastage of CPU.)

The solution (almost certainly): the statements are extremely similar, varying in just one predicate that is using a literal constant. It ought to be a relatively safe and simple change to make the query use a bind variable in that predicate. If the solution is adopted I’d expect to see the (once only) parse time on production drop back to about 7 seconds. Of course, if any other session tries to call the same statement at the same time it ought to end up reporting 7 seconds waiting on “cursor: pin S wait on X” before it starts executing – but that 7 seconds wait is a lot better than an extra 493 CPU seconds trying to optimise the “same” statement at the same time.

Footnote:

Running a minor variation on my ASH query to report the sql_plan_hash_value along with the sql_exec_id, I found that the “fix-up” code that updates older ASH rows with information that only becomes available later (e.g. plan_hash_value when optimising, or long time_waited values for a single wait event) only goes back 255 rows – so when I queried ASH for the statements that took 500 seconds to optimizer only 255 of the in_parse rows showed the final sql_plan_hash_value.

 

November 28, 2014

Line Numbers

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:49 pm GMT Nov 28,2014

One of the presentations I went to at the DOAG conference earlier on this month was called “PL/SQL Tuning, finding the perf. bottleneck with hierarchical profiler” by Radu Parvu from Finland. If you do a lot of PL/SQL programming and haven’t noticed the dbms_hprof package yet make sure you take a good look at it.

A peripheral question that came up at the end of the session asked about problems with line numbers in pl/sql procedures; why, when you get a run-time error, does the reported line number sometimes look wrong, and how do you find the right line. I can answer (or give at least one reason for) the first part, but not the second part; Julian Dontcheff had an answer for the second bit, but unfortunately I failed to take a note of it.

Here’s the SQL to create, run and list a very simple (and silly) procedure.


define m_value = 3
set timing on

create or replace procedure silly
as
        m_n number;
begin
        for i in 1..1000000 loop
                m_n := exp(sqrt(ln(&m_value)));
        end loop;
end;
/

execute silly

list

Here’s the output I got from running this, and the thing I want you to note is the time it takes to run, and the line number reported for the assignment:


Procedure created.

Elapsed: 00:00:00.01

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
  1  create or replace procedure silly
  2  as
  3          m_n number;
  4  begin
  5          for i in 1..1000000 loop
  6                  m_n := exp(sqrt(ln(&m_value)));
  7          end loop;
  8* end;

It seems to take my instance o.oo seconds to perform 1,000,000 evaluations of the exponent of the square root of the natural logarithm of 3 at line 6 of the code. But let’s make two changes; first, let’s try that with the value -1 (which is a little careless if you know your logarithms).


Procedure created.

Elapsed: 00:00:00.02
BEGIN silly; END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "TEST_USER.SILLY", line 5
ORA-06512: at line 1

Notice that the 6502 error is reported at line 5, not line 6.

Now let’s go back to the value 3, but start the script with the command: alter session set plsql_optimize_level = 0; (the default level is 2, the range is 0 to 3):


Session altered.

Elapsed: 00:00:00.00

Procedure created.

Elapsed: 00:00:00.01
BEGIN silly; END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "TEST_USER.SILLY", line 6
ORA-06512: at line 1

Reducing the pl/sql optimisation level to zero (or one) results in the the error message reporting the problem at line 6 – which matches our original code. So let’s check the effect of running the code at level zero with a valid number for the input.


Session altered.

Elapsed: 00:00:00.00

Procedure created.

Elapsed: 00:00:00.02

PL/SQL procedure successfully completed.

Elapsed: 00:00:24.56


The run time for our pl/sql call has gone up from 0.00 seconds to 24.56 seconds.

Like all good compilers the pl/sql compiler had recognised (at level 2) that we were assigning a constant inside a loop, so it had (effectively) rewritten our code to move the assignment outside the loop, effectively swapping lines 6 and 5, storing the change in the “object” level code, but not in the database copy of the source. The benefit we get (from the default settings) is a saving of 25 seconds of CPU, the (small) penalty we pay is that the lines reported for run-time errors aren’t always going to identify the text that caused the problem.

November 23, 2014

Baselines

Filed under: CBO,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 8:58 pm GMT Nov 23,2014

I’m not very keen on bending the rules on production systems, I’d prefer to do things that look as if they could have happened in a completely legal fashion, but sometimes it’s necessary to abuse the system and here’s an example to demonstrate the point. I’ve got a simple SQL statement consisting of nothing more than an eight table join where the optimizer (on the various versions I’ve tested, including 12c) examines 5,040 join orders (even though _optimizer_max_permutations is set to the default of 2,000 – and that might come as a little surprise if you thought you knew what that parameter was supposed to do):

select
	/*+ star */
	d1.p1, d2.p2, d3.p3,
	d4.p4, d5.p5, d6.p6,
	d7.p7,
	f.small_vc
from
	dim1		d1,
	dim2		d2,
	dim3		d3,
	dim4		d4,
	dim5		d5,
	dim6		d6,
	dim7		d7,
	fact_tab	f
where
	d1.v1 = 'abc'
and	d2.v2 = 'def'
and	d3.v3 = 'ghi'
and	d4.v4 = 'ghi'
and	d5.v5 = 'ghi'
and	d6.v6 = 'ghi'
and	d7.v7 = 'ghi'
and	f.id1 = d1.id
and	f.id2 = d2.id
and	f.id3 = d3.id
and	f.id4 = d4.id
and	f.id5 = d5.id
and	f.id6 = d6.id
and	f.id7 = d7.id
;

It’s useful to have such extreme examples because they make it easy to notice certain features of Oracle’s behaviour – in this case the difference between SQL Plan Baselines and SQL Profiles. After I had created a baseline for this statement the optimizer still examined 5,040 join orders because that’s the way that baselines work – the optimizer first tries to optimize the statement without the baseline in case it can produce a better plan (for future evaluation and evolution) than the plan dictated by the baseline.

In my example this wasn’t really a problem (apart from the memory requirement in the SGA) but one of my clients has a production query that takes 7 seconds to optimize then runs in a few seconds more, so I’d like to get rid of that 7 seconds … without touching the code. Adding a baseline won’t reduce the time to optimize. (Note: 7 seconds may not seem like a lot, but when several copies of this query are executed concurrently using a different literal value as an input, that 7 seconds can go up to anything between 40 and 500 seconds of CPU parse time per query).

If I take a different tack and write some code to acquire the content of the SQL Plan Baseline (viz. the outline section from the in-memory execution plan) but store it as an SQL Profile the optimizer simply follows the embedded hints and examines just one join order (because the set of hints includes a leading() hint specifying exactly the join order required). This is why I will, occasionally, take advantage of the code that Kerry Osborne and Randolf Geist produced some time ago to raid the library cache for a child cursor and store its plan outline as an SQL profile.

Footnote:

This dirty trick doesn’t always solve the problem – the first example I tried to use for my demonstration did a complex concatenation that took a couple of minutes to optimise, and storing the baseline as a profile didn’t help.

 

 

November 17, 2014

Plan puzzle

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 1:43 pm GMT Nov 17,2014

I was in Munich a few weeks ago running a course on Designing Optimal SQL and Troubleshooting and Tuning, but just before I flew in to Munich one of the attendees emailed me with an example of a statement that behaved a little strangely and asked me if we could look at it during the course.  It displays an odd little feature, and I thought it might be interesting to write up what I did to find out what was going on. We’ll start with the problem query and execution plan:


select     s section_size,
           max(program_id) ,avg(program_id)
from       fullclones_test
cross join  (select distinct section_size s from fullclones_test)
where      section_size = (select min(section_size) from fullclones_test)
and        clone_size >= s
group by   s
order by   1; 

Since I found this a little difficult to follow (with repetitions of the same table name, and column aliases switching back and forth) I did a little cosmetic modification; all I’ve done is add table aliases and query block names, then arranged the text for my visual benefit. The aliases and query block names can help when dissecting the anomaly.


select
	/*+ qb_name(main) */
	max(ft1.program_id),
	avg(ft1.program_id),
	ftv.s	section_size
from
	fullclones_test	ft1
cross join
	(
	select	/*+ qb_name(inline) */
		distinct ft2.section_size s
	from	fullclones_test	ft2
	)	ftv
where	ft1.section_size = (
		select 	/*+ qb_name(subq) */
			min(section_size)
		from	fullclones_test	ft3
	)
and	ft1.clone_size >= ftv.s
group by
	ftv.s
order by
	ftv.s
; 

This query ran reasonably quickly (about half a second), producing the following execution plan:


--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                 |      1 |        |   404 (100)|      4 |00:00:00.05 |    1116 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |                 |      1 |      5 |   404   (6)|      4 |00:00:00.05 |    1116 |       |       |          |
|   2 |   MERGE JOIN           |                 |      1 |  48637 |   299   (7)|  50361 |00:00:00.58 |    1116 |       |       |          |
|   3 |    SORT JOIN           |                 |      1 |      5 |   114  (11)|      5 |00:00:00.02 |     372 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW               |                 |      1 |      5 |   114  (11)|      5 |00:00:00.02 |     372 |       |       |          |
|   5 |      HASH UNIQUE       |                 |      1 |      5 |   114  (11)|      5 |00:00:00.02 |     372 |  5894K|  3254K|  884K (0)|
|   6 |       TABLE ACCESS FULL| FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.31 |     372 |       |       |          |
|*  7 |    SORT JOIN           |                 |      5 |  20000 |   185   (4)|  50361 |00:00:00.16 |     744 |   619K|   472K|  550K (0)|
|*  8 |     TABLE ACCESS FULL  | FULLCLONES_TEST |      1 |  20000 |   106   (4)|  20076 |00:00:00.09 |     744 |       |       |          |
|   9 |      SORT AGGREGATE    |                 |      1 |      1 |            |      1 |00:00:00.01 |     372 |       |       |          |
|  10 |       TABLE ACCESS FULL| FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.29 |     372 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$071BB01A
   4 - INLINE       / FTV@SEL$1
   5 - INLINE   
   6 - INLINE       / FT2@INLINE   
   8 - SEL$071BB01A / FT1@SEL$1
   9 - SUBQ 
  10 - SUBQ         / FT3@SUBQ

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("FT1"."CLONE_SIZE">="FTV"."S")
       filter("FT1"."CLONE_SIZE">="FTV"."S")
   8 - filter("FT1"."SECTION_SIZE"=)

As you can see by comparing the block name / object alias information, we can identify a single full tablescan being executed at line 9 to produce the min(section_size) in the subquery.

We can also see that the “select distinct” executes at lines 5/6 to produce 5 rows which are then joined with a merge join to the first full tablescan of t1.

If you’re wondering about the appearance of a sel$1 despite my efforts to name every query block, that’s the (almost) inevitable side effect of using ANSI syntax – virtually every join after the first two tables will introduce a new (unnameable) query block to introduce the next table.

Now here’s the anomaly: if we eliminate the avg() from the select list we’re going to produce a result that ought to require less work – but look what happens:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                 |      1 |        | 10802 (100)|      4 |00:02:48.83 |    1116 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |                 |      1 |      5 | 10802  (94)|      4 |00:02:48.83 |    1116 |       |       |          |
|   2 |   MERGE JOIN           |                 |      1 |    972M| 10697  (95)|   1007M|03:21:28.41 |    1116 |       |       |          |
|   3 |    SORT JOIN           |                 |      1 |  99999 |   380   (4)|  80042 |00:00:00.39 |     372 |  2037K|   674K| 1810K (0)|
|   4 |     TABLE ACCESS FULL  | FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.26 |     372 |       |       |          |
|*  5 |    SORT JOIN           |                 |  80042 |  20000 |   185   (4)|   1007M|00:57:11.13 |     744 |   619K|   472K|  550K (0)|
|*  6 |     TABLE ACCESS FULL  | FULLCLONES_TEST |      1 |  20000 |   106   (4)|  20076 |00:00:00.11 |     744 |       |       |          |
|   7 |      SORT AGGREGATE    |                 |      1 |      1 |            |      1 |00:00:00.01 |     372 |       |       |          |
|   8 |       TABLE ACCESS FULL| FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.28 |     372 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$6B65F52B
   4 - SEL$6B65F52B / FT2@INLINE
   6 - SEL$6B65F52B / FT1@SEL$1
   7 - SUBQ
   8 - SUBQ         / FT3@SUBQ

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("FT1"."CLONE_SIZE">="FT2"."SECTION_SIZE")
       filter("FT1"."CLONE_SIZE">="FT2"."SECTION_SIZE")
   6 - filter("FT1"."SECTION_SIZE"=)

Ignore the timings from lines 2 and 5 – I was using the hint gather_plan_statistics to collect the rowsource execution stats, and those lines are showing a massive sampling error. The query took about 7 minutes 30 seconds to run. The key difference is that line 4 shows that the “select distinct” is NOT aggregated early – the optimizer has used complex view merging to “join then aggregate” rather than “aggregate then join”. As you can see, this was a bad choice and the join has produced over a billion (US) rows at line 2 which then have to aggregated down to just 4 rows in line 1.

The question then is why ? If I put a /*+ no_merge */ hint in query block named “inline” the optimizer accepts the hint and goes back to the plan that aggregates early and runs very quickly – so it’s not a question of the optimizer bypassing some mechanism to avoid getting the wrong answer. I think the only option available to use for further investigation at this point is to examine the 10053 (optimizer) trace file to see what’s going on.

From the (12c)  trace file where we select the avg() we see the following lines:

OJE: Begin: find best directive for query block INLINE (#0)
OJE: End: finding best directive for query block INLINE (#0)
CVM: CBQT Marking query block INLINE (#0) as valid for CVM.
CVM:   Not Merging INLINE (#0) into SEL$1 (#0) due to CBQT directive.

From the equivalent position in the trace file where we select only the max() we see the lines:

OJE: Begin: find best directive for query block INLINE (#0)
OJE: End: finding best directive for query block INLINE (#0)
CVM:   Merging SPJ view INLINE (#0) into SEL$1 (#0)

It’s always hard to decide exactly WHY things happen – but it looks as if the optimizer merges the view heuristically in the max() case “because it can”, but has a heuristic (internal directive) that stops it from merging in the avg() case. What a good job we have hints !

Footnote:

In cases like this it’s always possible that there’s a generic case that might produce wrong results even though the specific case won’t – so it’s worth spending a little time thinking about how the wrong results might appear. It’s also worth keep hold of the SQL as a test case because if there’s a generic block in place to handle specific cases you may find that future enhancements allow the block to be lifted for some cases, and it’s nice to be able to check for such cases as you upgrade.

On the other hand, you can get back to the efficient plan if you change the inline view to be:


        (
        select  /*+
                        qb_name(inline)
                */
                ft2.section_size s , count(*) ct
        from    fullclones_test ft2
        group by
                ft2.section_size
        )       ftv


That’s just a little bit surprising – but I’m not going to pursue this one any further, beyond noting that there are some interesting anomalies available with inline aggregate views, even in 12.1.0.2.

Footnote 2:

If you want to experiment further, here’s the SQL to generate the data set:

create table fullclones_test (
        program_id      number(5,0),
        start_line      number,
        clone_id        number(5,0),
        clone_line      number,
        clone_size      number,
        range_start     number,
        section_size    number(4,0)
)
;

insert into fullclones_test (
        program_id, start_line, clone_id, clone_line,
        clone_size, range_start, section_size
)
Select
        1,
        trunc(dbms_random.value(1,1000)),
        trunc(dbms_random.value(1,10)),
        trunc(dbms_random.value(1,1000)),
        trunc(dbms_random.value(20,100)),
        0,
        20*trunc(dbms_random.value(1,6))
from
        dual
connect by
        level <100000
;

commit;

exec dbms_stats.gather_table_stats(user,'fullclones_test')

Finally, for consistently reproducible results I had engineered my system stats as follows:


        begin
                dbms_stats.set_system_stats('MBRC',16);
                dbms_stats.set_system_stats('MREADTIM',10);
                dbms_stats.set_system_stats('SREADTIM',5);
                dbms_stats.set_system_stats('CPUSPEED',1000);
        end;

November 5, 2014

Cardinality Feedback

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 6:43 pm GMT Nov 5,2014

A fairly important question, and a little surprise, appeared on Oracle-L a couple of days ago. Running 11.2.0.3 a query completed quickly on the first execution then ran very slowly on the second execution because Oracle had used cardinality feedback to change the plan. This shouldn’t really be entirely surprising – if you read all the notes that Oracle has published about cardinality feedback – but it’s certainly a little counter-intuitive.

Of course there are several known bugs related to cardinality feedback that could cause this anomaly to appear – a common complaint seems to relate to views on the right-hand (inner table) side of nested loop joins, and cardinality feedback being used on a table inside the view; but there’s an inherent limitation to cardinality feedback that makes it fairly easy to produce an example of a query doing more work on the second execution.

The limitation is that cardinality feedback generally can’t be used (sensibly) on all the tables where better information is needed. This blog describes the simplest example I can come up with to demonstrate the point. Inevitably it’s a little contrived, but it captures the type of guesswork and mis-estimation that can make the problem appear in real data sets. Here’s the query I’m going to use:


select
	t1.n1, t1.n2, t2.n1, t2.n2
from
	t1, t2
where
	t1.n1 = 0
and	t1.n2 = 1000
and	t2.id = t1.id
and	t2.n1 = 0
and	t2.n2 = 400
;

You’ll notice that I’ve got two predicates on both tables so, in the absence of “column-group” extended stats the optimizer will enable cardinality feedback as the query runs to check whether or not its “independent columns” treatment of the predicates gives a suitably accurate estimate of cardinality and a reasonable execution plan. If the estimates are bad enough the optimizer will use information it has gathered as the query ran as an input to re-optimising the query on the next execution.

So here’s the trick.  I’m going to set up the data so that there seem to be only two sensible plans:  (a) full scan of t1, with nested loop unique index access to t2; (b) full scan of t2, with nested loop unique index access to t1. But I’m going to make sure that the optimizer thinks that (a) is more efficient than (b) by making making the stats look as if (on average) the predicates on t1 should return 100 rows while the predicates on t2 return 200 rows.

On the other hand I’ve set the data up so that (for this specific set of values) t1 returns 1,000 rows which means Oracle will decide that its estimate was so far out that it will re-optimize with 1,000 as the estimated single table access cardinality for t1 – and that means it will decide to do the nested loop from t2 to t1. But what the optimizer doesn’t know (and hasn’t been able to find out by running the first plan) is that with this set of predicates t2 will return 20,000 rows to drive the nested loop into t1 – and the new execution plan will do more buffer gets and use more CPU (and time) than the old plan. Since cardinality feedback is applied only once, the optimizer won’t be able to take advantage of the second execution to change the plan again, or even to switch back to the first plan.

Here’s the setup so you can test the behaviour for yourselves:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,2)		n1,
	mod(rownum,2000)	n2,	-- 200 rows for each value on average
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 4e5
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,2)		n1,
	2 * mod(rownum,1000)	n2,	-- 400 rows for each value on average, same range as t1
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 4e5
;

alter table t2 add constraint t2_pk primary key(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
end;
/

--
-- Now update both tables to put the data out of sync with the statistics
-- We need a skewed value in t1 that is out by a factor of at least 8 (triggers use of CF)
-- We need a skewed value in t2 that is so bad that the second plan is more resource intensive than the first
--

update t1 set n2 = 1000 where n2 between 1001 and 1019;
update t2 set n2 =  400 where n2 between 402 and 598;
commit;

Here are the execution plans for the first and second executions (with rowsource execution statistics enabled, and the “allstats last” option used in a call to dbms_xplan.display_cursor()).


----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |  1049 (100)|      0 |00:00:00.36 |   11000 |   6588 |
|   1 |  NESTED LOOPS                |       |      1 |    100 |  1049   (3)|      0 |00:00:00.36 |   11000 |   6588 |
|   2 |   NESTED LOOPS               |       |      1 |    100 |  1049   (3)|   2000 |00:00:00.35 |    9000 |   6552 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |    100 |   849   (4)|   2000 |00:00:00.30 |    6554 |   6551 |
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |   2000 |      1 |     1   (0)|   2000 |00:00:00.02 |    2446 |      1 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2    |   2000 |      1 |     2   (0)|      0 |00:00:00.01 |    2000 |     36 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   4 - access("T2"."ID"="T1"."ID")
   5 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |  1249 (100)|      0 |00:00:00.66 |   32268 |   1246 |
|   1 |  NESTED LOOPS                |       |      1 |    200 |  1249   (3)|      0 |00:00:00.66 |   32268 |   1246 |
|   2 |   NESTED LOOPS               |       |      1 |    200 |  1249   (3)|  20000 |00:00:00.56 |   12268 |    687 |
|*  3 |    TABLE ACCESS FULL         | T2    |      1 |    200 |   849   (4)|  20000 |00:00:00.12 |    6559 |    686 |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |  20000 |      1 |     1   (0)|  20000 |00:00:00.19 |    5709 |      1 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1    |  20000 |      1 |     2   (0)|      0 |00:00:00.15 |   20000 |    559 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("T2"."N2"=400 AND "T2"."N1"=0))
   4 - access("T2"."ID"="T1"."ID")
   5 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))

Note
-----
   - cardinality feedback used for this statement

The second plan does fewer reads because of the buffering side effects from the first plan – but that’s not what the optimizer is looking at. The key feature is that the first plan predicts 100 rows for t1, with 100 starts for the index probe, but discovers 2,000 rows and does 2,000 probes. Applying cardinality feedback the optimizer decides that fetching 200 rows from t2 and probing t1 200 times will be lower cost than running the join the other way round with the 2,000 rows it now knows it will get – but at runtime Oracle actually gets 20,000 rows, does three times as many buffer gets, and spends twice as much time as it did on the first plan.

Hinting

Oracle hasn’t been able to learn (in time) that t2 will supply 20,000 rows – but if you knew this would happen you could use the cardinality() hint to tell the optimizer the truth about both tables /*+ cardinality(t1 2000) cardinality(t2 20000) */ this is the plan you would get:

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |  1698 (100)|      0 |00:00:00.06 |   13109 |  13105 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   2000 |  1698   (4)|      0 |00:00:00.06 |   13109 |  13105 |  1696K|  1696K| 1647K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   2000 |   849   (4)|   2000 |00:00:00.05 |    6554 |   6552 |       |       |          |
|*  3 |   TABLE ACCESS FULL| T2   |      1 |  20000 |   849   (4)|  20000 |00:00:00.09 |    6555 |   6553 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   2 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   3 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

Unfortunately, unless you have used hints, it doesn’t matter how many times you re-run the query after cardinality feedback has pushed you into the bad plan – it’s not going to change again (unless you mess around flushing the shared pool or using dbms_shared_pool.purge() to kick out the specific statement).

Upgrade

If you upgrade to 12c the optimizer does a much better job of handling this query – it produces an adaptive execution plan (starting with the nested loop join but dynamically switching to the hash join as the query runs). Here’s the full adaptive plan pulled from memory after the first execution – as you can see both the t1/t2 nested loop and hash joins were considered, then the nested loop was discarded in mid-execution. Checking the 10053 trace file I found that Oracle has set the inflexion point (cross-over from NLJ to HJ) at 431 rows.


----------------------------------------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT              |       |      1 |        |  1063 (100)|      0 |00:00:00.06 |   13113 |  13107 |       |       |          |
|  *  1 |  HASH JOIN                    |       |      1 |    100 |  1063   (3)|      0 |00:00:00.06 |   13113 |  13107 |  1519K|  1519K| 1349K (0)|
|-    2 |   NESTED LOOPS                |       |      1 |    100 |  1063   (3)|   2000 |00:00:00.11 |    6556 |   6553 |       |       |          |
|-    3 |    NESTED LOOPS               |       |      1 |    100 |  1063   (3)|   2000 |00:00:00.10 |    6556 |   6553 |       |       |          |
|-    4 |     STATISTICS COLLECTOR      |       |      1 |        |            |   2000 |00:00:00.09 |    6556 |   6553 |       |       |          |
|  *  5 |      TABLE ACCESS FULL        | T1    |      1 |    100 |   863   (4)|   2000 |00:00:00.08 |    6556 |   6553 |       |       |          |
|- *  6 |     INDEX UNIQUE SCAN         | T2_PK |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|- *  7 |    TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |     2   (0)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  *  8 |   TABLE ACCESS FULL           | T2    |      1 |      1 |     2   (0)|  20000 |00:00:00.07 |    6557 |   6554 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   5 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   6 - access("T2"."ID"="T1"."ID")
   7 - filter(("T2"."N2"=400 AND "T2"."N1"=0))
   8 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

Footnote:

For reference, here are a couple of the bug (or patch)  numbers associated with cardinality feedback:

  • Patch 13454409: BAD CARDINALITY FROM FEEDBACK (CFB) ON VIEW RHS OF NLJ
  • Bug 16837274 (fixed in 12.2): Bad cost estimate for object on RHS of NLJ
  • Bug 12557401: The table that is being incorrectly adjusted is in the right hand side of a nested loops.
  • Bug 8521689: Bad cardinality feedback estimate for view on right-hand side of NLJ

 

October 17, 2014

sreadtim

Filed under: Oracle,Statistics,System Stats,Troubleshooting — Jonathan Lewis @ 1:22 pm GMT Oct 17,2014

Here’s a question that appeared in my email a few days ago:

 

Based on the formula: “sreadtim = ioseektim + db_block_size/iotrfrspeed” sreadtim should always bigger than ioseektim.

But I just did a query on my system, find it otherwise, get confused,

SQL> SELECT * FROM SYS.AUX_STATS$;<

SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- --------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    10-08-2014 10:45
SYSSTATS_INFO                  DSTOP                                     10-10-2014 10:42
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                     680.062427
SYSSTATS_MAIN                  IOSEEKTIM                              10
SYSSTATS_MAIN                  IOTFRSPEED                           4096
SYSSTATS_MAIN                  SREADTIM                            4.716
SYSSTATS_MAIN                  MREADTIM                            2.055
SYSSTATS_MAIN                  CPUSPEED                             1077
SYSSTATS_MAIN                  MBRC                                    4
SYSSTATS_MAIN                  MAXTHR                          956634112
SYSSTATS_MAIN                  SLAVETHR                           252928

How do we explain this ?

 

This question highlights two points – one important, the other only slightly less so.

The really important point is one of interpretation.  Broadly speaking we could reasonably say that the (typical) time required to perform a single block read is made up of the (typical) seek time plus the transfer time which, using the names of the statistics above, would indeed give us the relationship: sreadtim = ioseektim + db_block_size/iotfrspeed; but we have to remember that we are thinking of a simplified model of the world. The values that we capture for sreadtim include the time it takes for a request to get from Oracle to the O/S, through the various network software and hardware layers and back again, the formula ignores those components completely and, moreover, doesn’t allow for the fact that some “reads” could actually come from one of several caches without any physical disc access taking place; similarly we should be aware that the time for an actual I/O seek would vary dramatically with the current position  of the read head, the radial position of the target block, the speed and current direction of movement of the read head, and the rotational distance to the target block. The formula is not attempting to express a physical law, it is simply expressing an approximation that we might use in a first line estimate of performance.

In fact we can see in the figures above that multi-block reads (typically of 4 blocks)  were faster than single block reads on this hardware for the duration of the sampling period – and that clearly doesn’t fit with the simple view embedded in our formula of how disc drives work.  (It’s a fairly typical effect of SANs, of course, that large read requests make the SAN software start doing predictive read-ahead, so that the next read request from Oracle may find that the SAN has already loaded the required data into its cache.)

There is, however, the second point that these figures highlight – but you have to be in the know to spot the detail: whatever the complexities introduced by SAN caching, we’re notlooking at the right figures. The ioseektim and iotfrspeed shown here are the default values used by Oracle. It looks as if the user has called dbms_stats.gather_system_stats() with a 48 hour workload (8th Oct to 10th Oct), but hasn’t yet executed the procedure using the ‘noworkload’ option. Perhaps the ioseektim and iotfrspeed figures from a noworkload call would look a little more reasonable when compared with the 4.716 milliseconds of the workload single block read. There may still be a large gap between the model and the reality, but until the two sets of figures we’re using come from the same place we shouldn’t even think about comparing them.

September 12, 2014

Unusual Deadlock

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 1:08 pm GMT Sep 12,2014

Prompted by a question on OTN I came up with a strategy for producing an ORA-00060 deadlock that DIDN’T produce a deadlock graph (because there isn’t one) and didn’t get reported in the alert log (at least, not when tested on 11.2.0.4). It’s a situation that shouldn’t arise in a production system because it’s doing the sorts of things that you shouldn’t do in a production system: but possibly if you’re trying to do some maintenance or upgrades while keeping the system live it could happen. Here’s the starting code:


drop procedure p2;
drop procedure p1;

drop table t1 purge;
create table t1 (n1 number);

insert into t1 values(1);
insert into t1 values(3);

create or replace procedure p1
as
begin
        update t1 set n1 = 2 where n1 = 1;
        dbms_lock.sleep(10);
        update t1 set n1 = 4 where n1 = 3;
end;
/

create or replace procedure p2
as

        procedure q
        is
                pragma autonomous_transaction;
        begin
                execute immediate 'drop procedure p1';
        end;

begin

        update t1 set n1 = 4 where n1 = 3;
        q;

end;
/

Of course you’re asking for trouble if you start doing DDL as part of your production code; and you’re asking for trouble if you start playing around with autonomous transaction; and if you use one to do the other you’re almost guaranteed to hit a problem eventually. All it takes in this case is a simple sequence of actions followed by a short wait (ca. 10 seconds):

Session 1: execute p1
Session 2: wait a couple of seconds, then execute p2

I got the following result from session 2:


BEGIN p2; END;

*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "TEST_USER.P2", line 8
ORA-06512: at "TEST_USER.P2", line 14
ORA-06512: at line 1

While this doesn’t show up in the alert log, I do get a trace file dumped for the session; in fact I got a trace file from both processes. The trace from the process that reported the deadlock started like this:


DEADLOCK DETECTED

  Performing diagnostic dump and signaling ORA-00060

  Complete deadlock information is located in the trace file of process (pid: 8, osid: 17861, DIA0)
    trace file: /u01/app/oracle/diag/rdbms/test/test/trace/test_dia0_17861.trc

-------------------------------------------------------------------------------

*** 2014-09-09 12:44:13.427
-------------------------------------------------------------------------------
HUNG PROCESS DIAGNOSTIC DUMP BEGIN:

    dump requested by process (pid: 8, osid: 17861, DIA0)
    trace file: /u01/app/oracle/diag/rdbms/test/test/trace/test_dia0_17861.trc
-------------------------------------------------------------------------------

The other process didn’t mention a deadlock, but started at the “HUNG PROCESS DIAGNOSTIC” line – the trace file had been triggered by the DIAG process. Of course, the DIAG process also dumped a trace file, summarising the situation; and this started with the following important note:


*** 2014-09-09 12:44:13.426
-------------------------------------------------------------------------------

DEADLOCK DETECTED (id=0xf678cfe3)

Chain Signature: 'enq: TX - row lock contention'<='library cache pin' (cycle)
Chain Signature Hash: 0x39f9868d

The rest of the trace file told me what the other two processes had been doing when the ORA-00060 was initiated, but the point I want to pick up here is that we have a very brief summary in the “Chain Signature” that tells us we’ve had a collision between some DML (enq: TX) and some sort of cursor-like problem (library cache pin) and not a simple data cross-over.

If you’ve looked at the original OTN posting you’ll see that the Chain Signature in that case is “Chain Signature: ‘library cache lock'<=’library cache pin’ (cycle)”, which indicates a collision restricted entirely to the library cache (the lock suggests someone is using a package/cursor while the pin suggests that another session is trying to destroy/(re)compile it – and they’re each trying to do it to each other’s package ! (In the past when I’ve a deadlock of this type it’s been reported as ORA-04020 rather than ORA-00060.) I still have to work out exactly how the OP got into their deadlock (especially in view of their latest comment) -but since I don’t have the right version of Oracle to hand, and it might be a version-specific bug, I don’t think I’m going to try.

 

September 8, 2014

ASSM Truncate.

Filed under: ASSM,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 11:34 am GMT Sep 8,2014

Here’s one that started off with a tweet from Kevin Closson, heading towards a finish that shows some interesting effects when you truncate large objects that are using ASSM. To demonstrate the problem I’ve set up a tablespace using system allocation of extents and automatic segment space management (ASSM).  It’s the ASSM that causes the problem, but it requires a mixture of circumstances to create a little surprise.


create
	tablespace test_8k_auto_assm
	datafile	-- OMF
	SIZE 1030M
	autoextend off
	blocksize 8k
	extent management local
	autoallocate
	segment space management auto
;

create table t1 (v1 varchar2(100)) pctfree 99 tablespace test_8k_auto_assm storage(initial 1G);

insert into t1 select user from dual;
commit;

alter system flush buffer_cache;

truncate table t1;

I’ve created a table with an initial definition of 1GB, which means that (in a clean tablespace) the autoallocate option will jump straight to extents of 64MB, with 256 table blocks mapped per bitmap block for a total of 32 bitmap blocks in each 64MB extent. Since I’m running this on 11.2.0.4 and haven’t included “segment creation immediate” in the definition I won’t actually see any extents until I insert the first row.

So here’s the big question – when I truncate this table (using the given command) how much work will Oracle have to do ?

Exchanging notes over twitter (140 char at a time) and working from a model of the initial state, it took a little time to get to understand what was (probably) happening and then produce this silly example – but here’s the output from a snapshot of v$session_event for the session across the truncate:


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
local write wait                                    490           0          83.26        .170          13
enq: RO - fast object reuse                           2           0         104.90      52.451         105
db file sequential read                              47           0           0.05        .001           0
db file parallel read                                 8           0           0.90        .112           0
SQL*Net message to client                            10           0           0.00        .000           0
SQL*Net message from client                          10           0           0.67        .067         153
events in waitclass Other                             2           0           0.04        .018         109

The statistic I want to highlight is the number recorded against “local write wait”: truncating a table of one row we wait for 490 blocks to be written! We also have 8 “db file parallel read”  waits which, according to a 10046 trace file, were reading hundreds of blocks. (I think the most significant time in this test – the RO enqueue wait – may have been waiting for the database writer to complete the work needed for an object checkpoint, but I’m not sure of that.)

The blocks written were the space management bitmap blocks for the extent(s) that remained after the truncate – even the ones that referenced extents above the high water mark for the table. Since we had set the tables initial storage to 1GB, we had a lot of bitmap blocks. At 32 per extent and 16 extents (64MB * 16 = 1GB) we might actually expect something closer to 512 blocks, but actually Oracle had formatted the last extent with only 8 space management blocks. and the first extent had an extra 2 to cater for the level 2 bitmap lock and segment header block giving: 32 * 15 + 8 + 2 = 490.

As you may have seen above, the impact on the test that Kevin was doing was quite dramatic – he had set the initial storage to 128GB (lots of bitmap blocks), partitioned the table (more bitmap blocks) and was running RAC (so the reads were running into waits for global cache grants).

I had assumed that this type of behaviour happened only with the “reuse storage” option of the truncate command: and I hadn’t noticed before that it also appeared even if you didn’t reuse storage – but that’s probably because the effect applies only to the bit you keep, which may typically mean a relatively small first extent. It’s possible, then, that in most cases this is an effect that isn’t going to be particularly visible in production systems – but if it is, can you work around it ? Fortunately another tweeter asked the question “What happens if you ‘drop all storage?'” Here’s the result from adding that clause to my test case:


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
enq: RO - fast object reuse                           1           0           0.08        .079           0
log file sync                                         1           0           0.03        .031           0
db file sequential read                              51           0           0.06        .001           0
SQL*Net message to client                            10           0           0.00        .000           0
SQL*Net message from client                          10           0           0.56        .056         123
events in waitclass Other                             3           0           0.87        .289         186


Looking good – if you don’t keep any extents you don’t need to make sure that their bitmaps are clean. (The “db file sequential read” waits are almost all about the data dictionary, following on from my “flush buffer cache”).

Footnote 1: the same effect appears in 12.1.0.2
Footnote 2: it’s interesting to note that the RO enqueue wait time seems to parallel the local write wait time: perhaps a hint that there’s some double counting going on. (To be investigated, one day).

August 21, 2014

Quiz night

Filed under: CBO,Indexing,NULL,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 6:05 pm GMT Aug 21,2014

Here’s a script to create a table, with index, and collect stats on it. Once I’ve collected stats I’ve checked the execution plan to discover that a hint has been ignored (for a well-known reason):

create table t2
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        all_objects
where
        rownum <= 3000
;

create index t2_i1 on t2(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt => 'for all columns size 1'
        );
end;
/

explain plan for
select  /*+ index(t2) */
        n1
from    t2
where   n2 = 45
;

select * from table(dbms_xplan.display);

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    15 |   120 |    15 |
|*  1 |  TABLE ACCESS FULL| T2   |    15 |   120 |    15 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=45)

Of course we don’t expect the optimizer to use the index because we didn’t declare n1 to be not null, so there may be rows in the table which do not appear in the index. The only option the optimizer has for getting the right answer is to use a full tablescan. So the question is this – how come Oracle will obey the hint in the following SQL statement:


explain plan for
select
        /*+
                leading (t2 t1)
                index(t2) index(t1)
                use_nl(t1)
        */
        t2.n1, t1.n2
from
        t2      t2,
        t2      t1
where
        t2.n2 = 45
and     t2.n1 = t1.n1
;

select * from table(dbms_xplan.display);

-------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |   225 |  3600 |  3248 |
|   1 |  NESTED LOOPS                         |       |   225 |  3600 |  3248 |
|   2 |   NESTED LOOPS                        |       |   225 |  3600 |  3248 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |   120 |  3008 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |  3000 |       |     8 |
|*  5 |    INDEX RANGE SCAN                   | T2_I1 |    15 |       |     1 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | T2    |    15 |   120 |    16 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

I ran this on 11.2.0.4, but it does the same on earlier versions.

Update:

This was clearly too easy – posted at 18:04, answered correctly at 18:21. At some point in it’s evolution the optimizer acquired a rule that allowed it to infer unwritten “is not null” predicates from the join predicate.

 

 

 

July 13, 2014

Deferrable RI – 2

Filed under: Infrastructure,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 7:46 pm GMT Jul 13,2014

A question came up on Oracle-L recently about possible locking anomalies with deferrable referential integrity constraints.

An update by primary key is taking a long time; the update sets several columns, one of which is the child end of a referential integrity constraint. A check on v$active_session_history shows lots of waits for “enq: TX – row lock contention” in mode 4 (share), and many of these waits also identify the current object as the index that has been created to avoid the “foreign key locking” problem on this constraint (though many of the waits show the current_obj# as -1). A possible key feature of the issue is that foreign key constraint is defined as “deferrable initially deferred”. The question is, could such a constraint result in TX/4 waits.

My initial thought was that if the constraint was deferrable it was unlikely, there would have to be other features coming into play.

Of course, when the foreign key is NOT deferrable it’s easy to set up cases where a TX/4 appears: for example you insert a new parent value without issuing a commit then I insert a new matching child, at that point my session will wait for your session to commit or rollback. If you commit my insert succeeds if you rollback my session raises an error (ORA-02291: integrity constraint (schema_name.constraint_name) violated – parent key not found). But if the foreign key is deferred the non-existence (or potential existence, or not) of the parent should matter.  If the constraint is deferrable, though, the first guess would be that you could get away with things like this so long as you fixed up the data in time for the commit.

I was wrong. Here’s a little example:


create table parent (
	id	number(4),
	name	varchar2(10),
	constraint par_pk primary key (id)
)
;

create table child(
	id_p	number(4)
		constraint chi_fk_par
		references parent
		deferrable initially deferred,
	id	number(4),
	name	varchar2(10),
	constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

insert into child values(2,1,'Jack');
insert into child values(2,2,'Jill');

commit;

begin
	dbms_stats.gather_table_stats(user,'parent');
	dbms_stats.gather_table_stats(user,'child');
end;
/

pause Press return

update child set id_p = 3 where id_p = 2 and id = 2;

If you don’t do anything after the pause and before the insert then the update will succeed – but fail on a subsequent commit unless you insert parent 3 before committing. But if you take advantage of the pause to use another session to insert parent 3 first, the update will then hang waiting for the parent insert to commit or rollback – and what happens next may surprise you. Basically the deferrability doesn’t protect you from the side effects of conflicting transactions.

The variations on what can happen next (insert the parent elsewhere, commit or rollback) are interesting and left as an exercise.

I was slightly surprised to find that I had had a conversation about this sort of thing some time ago, triggered by a comment to an earlier post. If you want to read a more thorough investigation of the things that can happen and how deferrable RI works then there’s a good article at this URL.

 

May 23, 2014

10053 trace

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 1:37 pm GMT May 23,2014

I published a note yesterday about enabling SQL trace system-wide for a single statement – and got a response on twitter from Bertrand Drouvot referencing a blog post he’d done a few months ago about using a similar method to dump the optimizer trace (10053) for a statement whenever it was optimized. He also highlighted the dbms_sqldiag package with its dump_trace() procedure – which is something I’d wanted to use a couple of weeks ago but couldn’t remember the package name – and supplied a reference to MoS note 225598.1 which then led on to Greg Rahn’s blog note about dbms_sqldiag.

The package is so important that I decided I’d create a reference note on my blog about it (I finally found my own reference notes on my laptop after getting a clue from Bertrand’s blog about the filename); but, at the same time, I think I have to report that it might not have helped me when I was looking for it two weeks ago. Here’s a critical paragraph from the MoS note:

How to Obtain Tracing of Optimizer Computations (EVENT 10053) (Doc ID 225598.1)

NOTE: The parse environment uses information captured in V$SQL_OPTIMIZER_ENV which does not record all information about the environment that parsed the query such as NLS settings. The result of this is that trace extracted from this may not always generate a trace that is truly representative of what happens when parsed from an application client. For example, if you parse from a client with NLS_SORT set differently to the Database then the application plan may be different to the database and so a trace generated from V$SQL_OPTIMIZER_ENV may cause confusion when the plan for a given cursor in the application is different to the one extracted.

The particular task I wanted to do was to re-optimize a query that had been run by another user (I was logged on as SYS at the time) because the critical user would have added a security predicate based on their current context to the table definitions in the query – and I didn’t know what that predicate would look like, but I knew it would have affected the execution path.  Judging from the note, though, it seems likely that the call to the package wouldn’t have re-acquired the user’s context and might, therefore, not be able to regenerate the necessary predicates. On the other hand, the actual security predicates used during the original optimisation could still be in-memory (in v$vpd_policy) so maybe dump_trace() would be able to do something about them.

Status: to be tested when I next have a few minutes free.

 

May 22, 2014

sql_trace

Filed under: Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:24 pm GMT May 22,2014

Here’s a convenient aid to trouble-shooting that appeared in 11g with its enhancements to setting events. It’s a feature that may help you to work out (among other things) why a given statement seems to have a highly variable performance profile. If you can find the SQL_ID for a parent cursor you can enable tracing for just that cursor whenever it executes, whoever executes it.


--	Last tested:  11.2.0.4

define m_sql_id = '&1'
define m_sql_id = '9tz4qu4rj9rdp'

alter system set events
	'
	sql_trace[SQL: &m_sql_id ]
/*
	plan_stat=all_executions,
	wait=false,
	bind=false
*/
	'
;

pause Press return to stop tracing

alter system set events
	'
	sql_trace[SQL: &m_sql_id ]
	off
	'
;
;

This is the whole of a little script I’ve got – the generic &1 is how I normally use it, I’ve just included a specific value (which is the sql_id for “select count(*) from all_objects;” as an example – that starts tracing across the entire system, but only for a given SQL_ID. On a production system, if I think I really need to do this, I would check the expected frequency of execution for the statement and wait enough time to capture a few occurrences before disabling the trace. Each session generates its own trace file, but if you’ve been sufficiently patient you get a reasonable sampling of the different workloads – and if you’ve captured the bind variables as well, this may give you some clues about why different work loads can appear.

A nice detail about this feature is that if the SQL_ID is for a pl/sql block, all the SQL executing inside the block is traced as part of the trace on the block (and that was particularly helpful the last time I had to make use of the feature); in the sample I’ve given I also found that some recursive SQL – relating to the XMLSCHEMA object types – executed within the view was also traced as the main statement was traced … so that makes it easy to see the effects of SQL statements calling PL/SQL functions that contain SQL statements.

 

May 14, 2014

Feature Bypass

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 1:23 pm GMT May 14,2014

Here’s a little tip that might be helpful occasionally when you’re trying to work out why the optimizer transformation you were expecting isn’t appearing

If you’ve ever checked the 10053 trace (and who wants to do that for a complex query) you may have noticed lines like:

SU: SU bypassed: Remote table referenced.

So now you know that SU – Subquery Unnesting – has limitations in distributed queries.

When I first saw a line like this, it crossed my mind that it would be useful to keep a reference list of features that could be reported as bypassed, which I do through a simple unix line:

strings -a oracle | grep -i bypassed > bypassed.txt

If you need a reference for the various short codes for transformations you can find it near the top of the 10053 trace, looking like this:

[sourecode gutter=”false”]
CBQT – cost-based query transformation
JPPD – join predicate push-down
OJPPD – old-style (non-cost-based) JPPD
FPD – filter push-down
PM – predicate move-around
CVM – complex view merging
SPJ – select-project-join
SJC – set join conversion
SU – subquery unnesting
OBYE – order by elimination
OST – old style star transformation
ST – new (cbqt) star transformation
CNT – count(col) to count(*) transformation
JE – Join Elimination
JF – join factorization
SLP – select list pruning
DP – distinct placement
… big gap here
AP – adaptive plans
[/sourcecode]

April 4, 2014

NVL() change

Filed under: CBO,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 6:10 pm GMT Apr 4,2014

One of the problems of functions is that the optimizer generally doesn’t have any idea on how a predicate based on function(col) might affect the cardinality. However,  the optimizer group are constantly refining the algorithms to cover an increasing number of special cases more accurately. This is a good thing, of course – but it does mean that you might be unlucky on an upgrade where a better cardinality estimate leads to a less efficient execution plan. Consider for example the simple query (where d1 is column of type date):

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate

Now, there are many cases in many versions of Oracle, where the optimizer will appear to calculate the cardinality of

nvl(columnX,{constant}) operator {constant}

as if it were:

columnX is null or columnX operator {constant}

Unfortunately this doesn’t seem to be one of them – until you get to 11.2.something. Here’s a little demonstration code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	case
		when rownum > 100 then null else sysdate - rownum
	end	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	query with NVL

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
;

prompt	query with OR clause

select	*
from	t1
where	d1 is null or d1 < sysdate
;

If you run this code in 11.1.0.7 you get the following – with numeric variations for cost (which I’m interested not in at the moment):


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2500 |  2500 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

Take note particularly of the difference in the estimated cardinality for the tablescans.

When you upgrade to 11.2.0.4 (possibly earlier – though there are some nvl() related patches that appeared only in 11.2.0.4), you get this:


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49900 | 49900 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

As you can see the estimate for the “NVL()” example is now correct – which means vastly different from the estimate in 11.1.0.7 which seemed to be using the standard “5% for range-based predicate on function(col)”.

It’s interesting to note that a (relatively) small error has crept in to the “OR” example – interestingly the size of the error is exactly the number of rows where d1 is not null (which looks like enough of a coincidence to be a bug – but maybe there’s a good rationale for it)

Conclusion

Yet again, a simple upgrade has the capacity to make a dramatic change to a cardinality estimate – which could mean a significant change to an execution plan and major change in performance. If you’ve read this note, though, you may recognise the pattern that is the driving feature behind the problem.

Footnote:

If you have access to versions 11.2.0.1 through 11.2.0.3 and find that the test data produces different cardinalities please publish the results in the comments – it would be nice to know exactly when this change appears.  (You need only show the body of the execution plans  with labels, not the entire output of the script).

 

March 28, 2014

Juggernaut

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 8:12 am GMT Mar 28,2014

One of the problems of “knowing” so much about Oracle is that the more you know the more you have to check on each new release of the software. An incoming ping on my posting “Lock Horror” reminded me that I was writing about 11.2.0.1, and the terminal release is 11.2.0.4, and the whole thing may have changed in 12.1.0.1 – so I ought to re-run some tests to make sure that the articel is up to date if it’s likely to be read a few times in the next few days.

Unfortunately, although I often add a URL to scripts I’ve used to confirm results published in the blog, I don’t usually include a script name in my blog postings  to remind me where to go if I want to re-run the tests. So how do I find the right script(s) ? Typically I list all the likely scripts and compare dates with the date on the blog; so here’s what I got for “lock”.


SQL> host ls -ltr *lock*.sql | grep -v block
-rwxr-xr-x 1 jonathan dba 1569 Jun 28  2002 c_bitlock.sql
-rwxr-xr-x 1 jonathan dba 1303 Oct  5  2002 ddl_deadlock.sql
-rwxr-xr-x 1 jonathan dba 1875 Oct  7  2002 ddl_deadlock_2.sql
-rwxr-xr-x 1 jonathan dba 1654 Aug  6  2003 hw_lock.sql
-rwxr-xr-x 1 jonathan dba 2626 Sep 17  2004 lock_oddity.sql
-rwxr-xr-x 1 jonathan dba 1804 Sep 17  2004 lock_speed.sql
-rwxr-xr-x 1 jonathan dba 3194 May  8  2006 space_locks.sql
-rwxr-xr-x 1 jonathan dba 4337 Jan  3  2008 tm_deadlock.sql
-rwxr-xr-x 1 jonathan dba 1149 Jan  3  2008 show_lock.sql
-rwxr-xr-x 1 jonathan dba 2068 Apr 21  2008 hw_lock_2.sql
-rwxr-xr-x 1 jonathan dba 1482 Feb  5  2010 tt_lock.sql
-rwxr-xr-x 1 jonathan dba 1692 Feb 16  2010 to_lock.sql
-rwxr-xr-x 1 jonathan dba 3308 Jun  1  2010 skip_locked.sql
-rwxr-xr-x 1 jonathan dba 2203 Nov  2  2010 deadlock_statement.sql
-rwxr-xr-x 1 jonathan dba 2883 Nov  3  2010 merge_locking.sql
-rwxr-xr-x 1 jonathan dba 1785 Dec 14  2010 sync_lock.sql
-rwxr-xr-x 1 jonathan dba  984 Apr 23  2011 para_dml_deadlock.sql
-rwxr-xr-x 1 jonathan dba 4305 Jun  4  2011 locking_fifo.sql
-rwxr-xr-x 1 jonathan dba 5970 Jun  5  2011 locking_fifo_2.sql
-rwxr-xr-x 1 jonathan dba  917 Jun 30  2011 ul_deadlock.sql
-rwxr-xr-x 1 jonathan dba  936 Jul  8  2011 funny_deadlock.sql
-rwxr-xr-x 1 jonathan dba  741 Sep  8  2011 row_lock_wait_index.sql
-rwxr-xr-x 1 jonathan dba 2590 Nov 30  2012 fk_lock_stress.sql
-rwxr-xr-x 1 jonathan dba 4561 Feb  6  2013 dbms_lock.sql
-rwxr-xr-x 1 jonathan dba 1198 Apr  6  2013 libcache_locks.sql
-rwxr-xr-x 1 jonathan dba 5636 Nov 27 19:40 ash_deadlocks.sql
-rwxr-xr-x 1 jonathan dba  379 Mar 27 19:17 fk_constraint_locks.sql

Nothing leaps out as an obvious candidate, though “funny_deadlock.sql” catches my eye for future reference; maybe I should look for “foreign key”.

SQL> host ls -ltr *fk*.sql | grep -v fkr
-rwxr-xr-x 1 jonathan dba  2140 Jun 16  2005 fk_check.sql
-rwxr-xr-x 1 jonathan dba  2897 Jun 16  2005 fk_order.sql
-rwxr-xr-x 1 jonathan dba   650 Oct 26  2007 pk_fk_null.sql
-rwxr-xr-x 1 jonathan dba  5444 Nov  4  2007 c_fk.sql
-rwxr-xr-x 1 jonathan dba  1568 Dec  5  2008 null_fk.sql
-rwxr-xr-x 1 jonathan dba  2171 Mar  2  2009 fk_anomaly_2.sql
-rwxr-xr-x 1 jonathan dba  3922 Mar  2  2009 fk_anomaly.sql
-rwxr-xr-x 1 jonathan dba  5512 Oct 15  2009 fk_check_2.sql
-rwxr-xr-x 1 jonathan dba  1249 Feb 15  2010 c_pk_fk_2.sql
-rwxr-xr-x 1 jonathan dba  1638 Feb 16  2010 c_pk_fk_3.sql
-rwxr-xr-x 1 jonathan dba  5121 Jun  1  2012 c_pt_fk_2.sql
-rwxr-xr-x 1 jonathan dba  4030 Jun  5  2012 c_pt_fk_3.sql
-rwxr-xr-x 1 jonathan dba  2062 Jun  5  2012 c_pt_fk_3a.sql
-rwxr-xr-x 1 jonathan dba  2618 Sep 23  2012 c_pk_fk_02.sql
-rwxr-xr-x 1 jonathan dba  1196 Oct 19  2012 deferrable_fk.sql
-rwxr-xr-x 1 jonathan dba  2590 Nov 30  2012 fk_lock_stress.sql
-rwxr-xr-x 1 jonathan dba  4759 Sep  1  2013 fk_bitmap.sql
-rwxr-xr-x 1 jonathan dba  1730 Sep 30 07:51 virtual_fk.sql
-rwxr-xr-x 1 jonathan dba  3261 Dec 22 09:41 pk_fk_gets.sql
-rwxr-xr-x 1 jonathan dba  8896 Dec 31 13:19 fk_delete_gets.sql
-rwxr-xr-x 1 jonathan dba 10071 Dec 31 14:52 fk_delete_gets_2.sql
-rwxr-xr-x 1 jonathan dba  4225 Jan 14 11:15 c_pk_fk.sql
-rwxr-xr-x 1 jonathan dba  2674 Jan 14 13:42 append_fk.sql
-rwxr-xr-x 1 jonathan dba  1707 Feb 10 12:34 write_cons_fk.sql
-rwxr-xr-x 1 jonathan dba  9677 Feb 24 17:23 c_pt_fk.sql
-rwxr-xr-x 1 jonathan dba   379 Mar 27 19:17 fk_constraint_locks.sql

(The “-fkr” is to eliminate scripts about “first K rows optimisation”). With a little luck, the dates are about right, c_pk_fk_2.sql and c_pk_fk_3.sql will be relevant. So keep an eye on “Lock Horror” for an update in the next few days.

You’ll notice that some of the scripts have a very old datestamp on them – that’s an indication of how hard it is to keep up; when I re-run a script on a new version of Oracle I invariably add a “Last Tested:” version to the header, and a couple of notes about changes.  A couple of my scripts date back to June 2001 – but that is, at least, the right century, and some people are still using Oracle 7.

Footnote

It should be obvious that I can’t test everything on every new release – but it’s amazing how often on a client site I can recognize a symptom and pick at script that I’ve used in the past to construct the problem – and that’s when a quick bit of re-testing helps me find a solution or workaround (or Oracle bug note).

 

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,521 other followers