Oracle Scratchpad

November 12, 2020

rowsets

Filed under: Execution plans,Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:35 pm GMT Nov 12,2020

Here’s a little demonstration of the benefit of rowsets. It started with a very simple question that arrived in my in-box from Kaley Crum.

  • “Why does the query with the UNION ALL take so much longer than the query without the UNION ALL?”

Here are the two queries – surely they should take virtually the same amount of time.

select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me);


select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me
      --
      union all
      --
      select 1 from dual);

Look closely – the difference between the two queries is just the addition through a UNION ALL of “select 1 from dual”. The first query took about 3.5 seconds to complete (there were 70M rows in the table), and the second took nearly 69 seconds.

Here are the execution plans showing the rowsource execution statistics (and Query Block // Object Alias information) – first the baseline query:

SQL_ID  0ph1vfuuxkbqb, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me)
 
Plan hash value: 3609429292
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |        | 26465 (100)|          |      1 |00:00:03.34 |   95896 |  95891 |
|   1 |  SORT AGGREGATE     |         |      1 |      1 |            |          |      1 |00:00:03.34 |   95896 |  95891 |
|   2 |   VIEW              |         |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.32 |   95896 |  95891 |
|   3 |    TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.27 |   95896 |  95891 |
--------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2 / from$_subquery$_001@SEL$1
   3 - SEL$2 / DROP_ME@SEL$2

And then the UNION ALL query:

SQL_ID  0chdajr28y0ub, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me       --       union all       --       select 1 from
dual)
 
Plan hash value: 3408528233
 
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |      1 |        | 26467 (100)|          |      1 |00:01:18.58 |   95896 |  95891 |
|   1 |  SORT AGGREGATE      |         |      1 |      1 |            |          |      1 |00:01:18.58 |   95896 |  95891 |
|   2 |   VIEW               |         |      1 |     70M| 26467   (2)| 00:00:02 |     70M|00:01:10.84 |   95896 |  95891 |
|   3 |    UNION-ALL         |         |      1 |        |            |          |     70M|00:00:53.13 |   95896 |  95891 |
|   4 |     TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:19.28 |   95896 |  95891 |
|   5 |     FAST DUAL        |         |      1 |      1 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |
---------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SET$1 / from$_subquery$_001@SEL$1
   3 - SET$1
   4 - SEL$2 / DROP_ME@SEL$2
   5 - SEL$3 / DUAL@SEL$3

This is not a trick question – I’ve got the SQL to create the data set and run the test; and I’ve run the test through 19.3.0.0 and 12.2.0.1 with results very similar to the above. Here’s a slightly modified version of the script – you’ll notice the column name has changed because I’ve used my usual “large data set” generator rather than scaling up a clone of all_objects:

rem
rem     Script:         rowset_kaley.sql
rem     Author:         Kaley Crum / JP Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem

create table drop_me
pctfree 0
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum  n1
from
        generator
cross join
        generator
where rownum <= 7e7
;

alter session set statistics_level = all;

set timing on
set serveroutput off

prompt  ===========================================
prompt  Baseline: Simple query without a union all.
prompt  Runs in 3-4 seconds
prompt  ===========================================

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

prompt  ==========================================
prompt  Add a UNION ALL for one record from dual.
prompt  Runs in over a minute
prompt  ==========================================

pause Press return

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me
      --
      union all
      --
      select 1 from dual);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

There are two factors involved in the massive variation in performance. The first is one that the test case itself will report (though I didn’t show it in the output above); the second is that the test case has enabled rowsource execution statistics.

Here’s the big clue – from the Column Projection Information, which is one of the things that appears with the “advanced” format option (or when you add the “projection” format option) in the call to dbms_xplan.display_xxx().
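If you don’t want the full “advanced” report you can ask for just the extra projection section; a sketch (run in the same session immediately after the query, with serveroutput off):

select *
from table(dbms_xplan.display_cursor(format => 'allstats last +projection'));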

First for the fast query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]
   2 - (rowset=1019)
   3 - (rowset=1019)

The rowsource passes 1,019 rows at a time from the tablescan operation to the view operation and from the view operation to the sort operation, for a total of roughly 70,000 calls for each of those two steps of the plan.

Compare this to the projection information for the slow UNION ALL query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]

I haven’t missed a bit in copying – this really is it. The rows move up the stack one at a time, not in a rowset array. That’s 70,000,000 subroutine calls for each of the two steps of the plan.

I’m sure most of us have heard the mantra “row by row is slow by slow” (or some variant on the theme). This is true all the way down to the internal levels of an execution plan.

Of course 70M calls vs. 70K calls shouldn’t really explain a difference of more than 60 seconds – but when the calls to the O/S for timing information for rowsource execution stats are repeated for each call (or even if it’s only a 1% sample of the calls) down the stack that’s where a lot of time can disappear.

On my laptop, running Oracle 19.3 in a VM, this is how my timing went:

  • Fast query: 1.94 seconds, dropping to 1.79 seconds when I disabled rowsource execution stats.
  • Slow query: 65.91 seconds, dropping to a far less astounding 3.26 seconds when I disabled rowsource execution stats.

So there really is a noticeable difference between row-by-row and array-processing but most of the difference in the original test came from using rowsource execution statistics to measure how much of a difference there would be.
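For clarity, “disabling rowsource execution stats” in those tests meant something like the following sketch – the second option uses a hidden parameter, so treat it as an assumption to verify on a test system rather than a recommendation:

alter session set statistics_level = typical;

-- or, leaving statistics_level alone:
alter session set "_rowsource_execution_statistics" = false;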

It’s also worth mentioning that this is probably the most extreme case you could produce to show the difference – using the largest possible rowset size with the smallest possible rows when you want to maximise the gap between (internal) array processing and single row processing – and then processing a very large number of rows. [But isn’t a ridiculously large number of rows exactly what you expect to process on Exadata?]

If you want further evidence that the difference is due to the rowset size you can always alter session set “_rowsets_enabled”=false; and watch the fast query slow down. It will take about half the time of the slow query as it only has to pass 70M rows up one step of the plan rather than the two steps that are in the UNION ALL plan.
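In outline, that experiment is just a couple of extra lines in the test script (a sketch – this is a hidden parameter, so only on a test system):

alter session set "_rowsets_enabled" = false;

select count(*)
from (select /*+ no_merge */ n1 from drop_me);

alter session set "_rowsets_enabled" = true;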

In theory it looks as if you could also restrict the size of the rowset by setting _rowsets_max_rows or _rowsets_target_maxsize, but since the default value for the former is 256 while the plan above reported a rowset size of 1,019 the settings might simply be ignored – and when I tried adjusting them at the session level nothing changed.

Another test you could run is to adjust (alter session) the parameter “_rowsource_statistics_sampfreq”. When I set this to 100 the time for the slow query dropped to about 21 seconds (and the fast query – with rowsets disabled – dropped to about 11 seconds).
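For reference, the session-level experiments described in the last two paragraphs took this form (a sketch; apart from the sampling frequency of 100 mentioned above, the values are purely illustrative):

alter session set "_rowsets_max_rows" = 100;
alter session set "_rowsets_target_maxsize" = 65536;

alter session set "_rowsource_statistics_sampfreq" = 100;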

Footnote

It’s always easy to hit an accidental special case without realising it, so when you’re testing something it’s important to think about what’s special in the example. I came up with two possibilities in this case – dual is always going to be a very special case in any circumstances, and I’ve got a table in my query that is guaranteed to return exactly one row. So I repeated the test with:

  • a real table with one row and primary key in case the reference to dual was disabling rowsets
  • a real table with 100 rows in case this enabled a rowset size of 100

In neither case did Oracle start using array processing.

Footnote 2

One of the patterns I see occasionally follows the form of the slow query above. Some front-end tool executes a select statement that starts with a “select {list of desired column headings} from dual”, joined by union all to the real query, so that the front-end code doesn’t have to be programmed to create headings as it receives the data.

Although the impact this has on rowsets accounts for a large percentage of the work done in the database in this case, in more realistic cases it would probably be irrelevant – until someone starts testing with statistics_level set to all, or _rowsource_execution_statistics explicitly enabled.

There is a related effect, though, from the SQL Monitor feature (which activates automatically for a statement if it’s executing in parallel, or if it’s predicted to take more than 5 seconds to complete). When I added the /*+ monitor */ hint to the two queries (and didn’t fiddle with the rowsets parameter) the fast query averaged 2.26 seconds instead of 1.79 seconds, and the slow query averaged 3.51 seconds instead of 3.26 seconds. I have to say, though, that the variation between executions of the same statement was quite significant compared with the variation produced by adding or eliminating the hint.
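If you want to check what SQL Monitor captured for one of the hinted queries you can pull the report from memory; a sketch, reusing the sql_id of the slow query reported earlier:

set long 250000
set longchunksize 250000
set linesize 254

select  dbms_sqltune.report_sql_monitor(
                sql_id => '0chdajr28y0ub',
                type   => 'TEXT'
        )
from    dual;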

October 8, 2020

Direct Path

Filed under: Infrastructure,Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 12:29 pm BST Oct 8,2020

This is a little addendum to a note I wrote a couple of days ago about serial direct path reads and KO (fast object checkpoint) enqueue waits.

The original note was prompted by a problem where someone had set the hidden parameter “_serial_direct_read” to ‘always’ because they were running 11g and wanted some “insert as select” statements to use direct path reads on the select portion, and 11g wasn’t co-operating.

Serial direct path reads were introduced as a possibility in (at least) the 8.1.7.4 timeline, but the parameter was set to false until 11gR2 where it changed to auto. (Legal values are: false, true, never, auto, always.)

In 11.2, though, even though a simple select statement could use serial direct path reads for segment scans, Oracle would not use the mechanism for “insert as select”.

This note is just a little piece of code to demonstrate the point.  Run it on 11g and (unless your buffer cache is large enough to make the test table “small”) Oracle will use direct path reads on the select, but scattered reads to cache for the insert. Upgrade to 12.1 and Oracle will use direct path reads on both.

rem
rem     Script:         serial_fail.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem

create table t1
as
select
        ao.*
from
        all_objects     ao,
        (select rownum from dual connect by level <= 16) mult
/

create table t2
as
select  *
from    t1
where   rownum = 0
/

alter system flush buffer_cache;

prompt  =============
prompt  Simple Select
prompt  =============

execute snap_events.start_snap
select * from t1 where object_id = 98765;
execute snap_events.end_snap

prompt  ================
prompt  Insert as select
prompt  ================

execute snap_events.start_snap
insert into t2
select * from t1 where object_id = 98765;
execute snap_events.end_snap

prompt  =====================
prompt  Insert as select with
prompt  _serial_direct=always
prompt  =====================

alter session set "_serial_direct_read"=always;

execute snap_events.start_snap
insert /* serial direct */ into t2
select * from t1 where object_id = 98765;
execute snap_events.end_snap

alter session set "_serial_direct_read"=auto;

The calls to the snap_events package are there to report the change in v$session_event for my session during each SQL statement.
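If you don’t have the package, a rough substitute (a sketch) is to query v$session_event for your own session before and after each statement and subtract the figures:

select  event, total_waits, time_waited
from    v$session_event
where   sid = sys_context('userenv','sid')
and     event in (
                'db file sequential read',
                'db file scattered read',
                'direct path read'
        )
order by
        event
;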

You’ll notice I’ve included three main SQL statements rather than two – the third statement (2nd execution of the insert) is to demonstrate that it is possible to get direct path reads on the insert by setting the hidden parameter to ‘always’.

One detail to remember when testing this particular feature (and the same guideline applies to some other features): the “direct / not direct” choice becomes an attribute of the cursor, not an attribute of the execution plan. This is why I’ve added a comment to the 2nd insert; if I hadn’t done so Oracle would have reused the (identical text) cursor from the first insert, which would have resulted in scattered reads being used instead of direct path reads. This distinction between cursor and plan explains why there is no hint that will allow you to force direct path reads for a specific query (not even the infamous opt_param() hint).

Here are the three sets of results from a system running 11.2.0.4:

=============
Simple Select
=============

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               1           0           0.10        .100           4
direct path read                                    114           0          20.86        .183           6
SQL*Net message to client                             4           0           0.00        .000           0
SQL*Net message from client                           4           0           0.11        .028     174,435

================
Insert as select
================

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                              22           0           0.60        .027           4
db file scattered read                              130           0          35.97        .277           5
SQL*Net message to client                             4           0           0.01        .002           0
SQL*Net message from client                           4           0           0.10        .025     174,435

=====================
Insert as select with
_serial_direct=always
=====================

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
direct path read                                    108           0          17.44        .161           6
SQL*Net message to client                             4           0           0.00        .000           0
SQL*Net message from client                           4           0           0.09        .022     174,435


Note the db file scattered read waits in the middle test. If you re-run the test on 12.1.0.x (or later) you’ll find that the middle set of results will change to direct path read waits.

For reference, this limitation is covered by MOS note 13250070.8: Bug 13250070 – Enh: Serial direct reads not working in DML. The actual bug note is not public.

Footnote (a couple of hours later):

A response from Roger MacNicol to my publication tweet has told us that the bug note says only that direct path reads had been restricted unnecessarily and the restriction has been removed.

October 5, 2020

Direct Path

Filed under: Oracle,Performance,Problem Solving,RAC,Troubleshooting,Tuning — Jonathan Lewis @ 11:29 am BST Oct 5,2020

Here’s a note that I might have written once already – but I can’t find it and I’ve just been reminded about what it (might have) said by a posting that came up on the Oracle database forum in the last few days.

The posting in question is asking why, after setting the hidden parameter _serial_direct_read to ‘always’ a particular query is now taking hours to complete when it used to complete in a minute or so.

The answer is partly “because you’ve forced direct path serial reads”, partly “because you’re running on RAC” and (most directly) because the optimizer is using a really bad execution plan for that query and the direct path reads have had a massive impact as a consequence. (It turns out, after modelling, that the answer might also include “because you’re running 11.2.0.4”)

I’m going to demonstrate the issue by forcing a very simple query to take a very bad execution plan.

rem
rem     Script:         ko.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem 

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

create table t2
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

select  table_name, blocks 
from    user_tables
where   table_name in ('T1','T2')
;

alter system flush buffer_cache;

execute snap_events.start_snap
execute snap_my_stats.start_snap
alter session set "_serial_direct_read"=always;


select
        /*+ 
                leading(t1 t2)
                use_nl(t2)
                full(t2)
        */
        t1.object_type,
        t2.object_type
from
        t1, t2
where
        t2.object_id = t1.object_id + 0.5
;

execute snap_my_stats.end_snap
execute snap_events.end_snap

alter session set "_serial_direct_read"=auto;


My query very carefully ensures that it’s not going to return any rows; but it’s going to do a lot of work finding no data because I’ve forced Oracle into doing a tablescan of t2 for every row in t1 – so 10,000 scans of a table of 140 – 190 (depending on version) data blocks.

The snap_my_stats and snap_events packages are a couple of my simple diagnostic packages that allow me to find the change in some v$ content between the start and end snapshots. In this case it’s v$mystat and v$session_event for the session.
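Again, if you don’t have the packages, a rough substitute (a sketch) is to run a query like the following before and after the test statement and subtract the figures:

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name in (
                'physical reads direct',
                'physical reads cache',
                'CPU used by this session'
        )
;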

On a test using 11.2.0.4 the query ran for about 41 seconds with the following wait events reported:

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.11        .054           1
direct path read                                  6,527           0         368.30        .056           6
SQL*Net message to client                            13           0           0.00        .000           0
SQL*Net message from client                          13           0      10,689.26     822.251      10,689

The 6,500 “direct path read” waits corresponded to 1.33M “physical reads direct” reported in the session activity stats. Although the t2 table was fairly small Oracle was forced to use direct path reads for every single cycle through the nested loop. As a quick comparison, here are the figures if I don’t force direct path serial scans.

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.09        .047           2
db file scattered read                               34           0           2.45        .072           0
SQL*Net message to client                            11           0           0.01        .001           0
SQL*Net message from client                          11           0         174.36      15.851      82,849



We’ve waited for only 34 “db file scattered reads” and 2.45 centiseconds as we read the t2 (and the t1) tables into the cache for the first time, but then we’ve been able to revisit the blocks in the cache. We also saw a reduction in CPU usage and the total run time dropped from 41 seconds to about 22 seconds.

In this tiny example it hasn’t made a staggering difference to the overall run time, but the OP wasn’t that lucky with his “couple of minutes” to “hours”.

If you look at the fragment of the SQL Monitor report supplied by the OP you’ll see that they have an operation which shows:

=====================================================================================================================================================================================================================================
| Id    |                  Operation                    |            Name              | Rows    | Cost |  Time     |  Start | Execs |    Rows  | Read | Read  | Cell   | Mem | Activity |            Activity Detail               |
|       |                                               |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload |    |    (%)   |              (# samples)                 |
=====================================================================================================================================================================================================================================
| -> 25 |       INDEX STORAGE FAST FULL SCAN            | TMRC_IX1                     |      1  |      |     10745 |     +0 |  268K |     112K | 267K |   2GB |  96.53% | 1M |    99.74 | enq: KO - fast object checkpoint (4783)  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | Cpu (753)                                |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | gcs drm freeze in enter server mode (25) |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | latch free (1)                           |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | reliable message (3505)                  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | cell smart index scan (1635)             |
=====================================================================================================================================================================================================================================

The “index fast full scan” is an example of a “segment scan” and will be subject to direct path reads, just like a tablescan. We don’t really know how big this index is – but we can see that we have read it 268K times (Execs column) reading approximately 2GB after 267K read requests. This suggests the index is at most 1MB in size since it can be read in a single read request, and may consist of only one populated block (2,000,000,000/267,000 = 7,490 bytes per request). Despite this small size the total time sampled for all these scans is more than 10,600 seconds – roughly 39 milliseconds per scan. That’s not very good.

Looking more closely at the sampled time we notice 3 key features:

============================================
|            Activity Detail               |
|              (# samples)                 |
============================================
| enq: KO - fast object checkpoint (4783)  |
| Cpu (753)                                |
| gcs drm freeze in enter server mode (25) |
| latch free (1)                           |
| reliable message (3505)                  |
| cell smart index scan (1635)             |
============================================
  • A massive fraction of the time was spent on “enq: KO – fast object checkpoint”
  • A large fraction of the time was spent on “reliable message”
  • There was some time (relatively small, but large in absolute terms) for “gcs drm freeze …”

The last of these three is about “global cache services / distributed resource manager” and is a clear indication that the system is running RAC and we have some hefty competition for “object mastering” between instances. But the actual time lost there is relatively small – though associated chatter between instances could be significant.

But what’s the “KO enqueue”? Every time an instance starts a direct path segment scan it has to get a message to the database writer (possibly via the checkpoint process) – hence the “reliable message” waits – to copy every dirty block for that segment from the buffer cache down to disc and it has to wait for the write to complete. This is necessary to ensure that the tablescan doesn’t miss any changes that have been made in memory without yet being written to disc.

The KO enqueue synchronises this activity – I haven’t worked out the complete chain of events, but the enqueue is negotiated between the session and the checkpoint process – and if you’re running RAC every instance has to write any dirty blocks it is holding for the segment, so you have to have a degree of cross-instance chatter to make this happen.

Thanks to the enforced serial direct reads the OP’s plan – which, surely, shouldn’t expect to do 267K index fast full scans – has a massive overhead thanks to the need for the repeated object checkpoints.

You may ask, at this point, why I didn’t see any KO enqueue waits in my test results – the answer is simple: I’d flushed the buffer cache before I started the test, so there were no dirty blocks for the session to worry about. Let’s see what happens if I introduce a little activity to keep dirtying a few blocks in the t2 table. Here’s a little loop that will update a few rows once per second:

begin
        for i in 1..1 loop
                update t2 set data_object_id = 0 where mod(object_id,1000) = i;
                dbms_output.put_line(sql%rowcount);
                commit;
                dbms_lock.sleep(1);
        end loop;
end;
/

You’ll notice the code runs through the loop just once – I started with a loop count of 30, and discovered it wasn’t necessary, but the option remains for further testing.

If I execute this anonymous block (which updates about 10 rows each time through the loop) from another session just after SQL*Plus reports my “alter session”, so that it runs just after the query starts, this is what the session event report looks like if I run the test against 11.2.0.4 (the version reported by the OP):

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
enq: KO - fast object checkpoint                  7,645           0         290.34        .038           6
db file sequential read                               2           0           0.13        .066           2
direct path read                                 10,714           0         675.50        .063           6
SQL*Net message to client                            14           0           0.00        .000           0
SQL*Net message from client                          14           0       1,556.23     111.160     101,653
events in waitclass Other                         5,607           0         218.04        .039           8


Suddenly we see a lot of time spent on the KO enqueue waits and the “events in waitclass Other” (which turn out to be “reliable message” waits). Apparently the session keeps finding dirty t2 blocks in the cache and telling the database writer they need to be written to disc before the next tablescan of t2 can take place.

There’s something odd here, though, and I’ll introduce it with this comment: when I repeated the test on 19.3 (even with the constant trickle of updates once per second), we saw only a tiny number of KO enqueues and reliable message waits – the fact that we see a huge number of them in 11g is a defect in the design of the code.

Think about what’s happening with the KO enqueue: when you start the first tablescan of t2 you force every dirty block for that segment to be copied from the cache to the disc.

As the tablescan proceeds you may have to apply some undo change vectors to the blocks you’re reading to take them back to the SCN as at the start of query execution, but you know that any data that had been committed before the query started is on disc (even if it has been over-written by committed changes made after the query started, or by uncommitted changes made before the query started). What’s on the disc right now will be usable to get the correct read-consistent version of the data for the duration of the query run, no matter how many newer changes are made, whether or not they over-write the disc blocks before the query ends. There is no need to force write any dirty blocks as the tablescan is repeated and, it seems, by 19.3 the code has been adjusted to accommodate that fact.

Footnote

After I had posted this comment on the forum, the OP raised the question of whether or not the fix might apply to 12c as well – so I ran up a VM of 12.1.0.2 and 12.2.0.1 and re-ran the tests. The results were initially promising – neither version reported an extreme number of KO enqueue waits or reliable message waits.

However when I changed the loop counter from 1 back to 30 I found that the waits re-appeared – though the numbers were significantly less than those from 11g – so perhaps there’s a timing element involved that might need further investigation and stress testing even for 19.3.

Footnote 2

Another snapshot I took in testing was from v$enqueue_stat – which showed that (approximately) for every KO enqueue wait my session reported, the instance reported about 10 – 12 KO enqueue requests.

When checking v$enqueue_stat it’s important to remember that the session activity stats (v$sesstat) report both “enqueue requests” and “enqueue conversions”. In v$enqueue_stat the conversions aren’t reported separately; the view simply adds the two figures together under “requests”.
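Queries of roughly this shape will give you the relevant figures (a sketch – snapshot before and after the test and subtract):

select  eq_type, total_req#, total_wait#, succ_req#, cum_wait_time
from    v$enqueue_stat
where   eq_type = 'KO'
;

select  sn.name, ss.value
from    v$sesstat ss, v$statname sn
where   ss.sid = sys_context('userenv','sid')
and     sn.statistic# = ss.statistic#
and     sn.name in ('enqueue requests', 'enqueue conversions')
;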

In the 11g test the session reported 7,645 KO enqueue waits, but the session activity stats reported 19,399 enqueue requests and 38,796 enqueue conversions; while v$enqueue_stat reported 96,990 KO enqueue requests. The remaining 38,796 KO enqueue requests were made by the checkpoint process (CKPT) – and it was only my session that reported any waits for KO enqueue requests.

Without further low-level investigation this is what leads me to believe that the session sends CKPT a message that it wants an object-level checkpoint performed and waits for the message to be acknowledged (reliable message) before trying to convert a low-level KO enqueue to an exclusive one. But CKPT has acquired and converted the same KO enqueue before acknowledging the message from the session and will only release the enqueue when the checkpoint is complete. So this mechanism ensures that the session will have to wait until the checkpoint is complete and CKPT releases the enqueue before it can continue processing.

(P.S. If you enable event 10704 you will find that there seem to be two KO enqueues involved – one with id2 = 1, one with id2 = 2.)
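If you want to try that, enabling and disabling the event looks like this (a sketch – 10704 is the long-standing enqueue trace event; the level shown is illustrative and the output appears in the session trace file):

alter session set events '10704 trace name context forever, level 10';

-- run the test, then:

alter session set events '10704 trace name context off';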

September 30, 2020

Serial Bloom

Filed under: Execution plans,Oracle,Parallel Execution,Performance — Jonathan Lewis @ 1:35 pm BST Sep 30,2020

Following the recent note I wrote about an enhancement to the optimizer’s use of Bloom filters, I received a question by email asking about the use of Bloom filters in serial execution plans:

I’m having difficulty understanding the point of a Bloom filter when used in conjunction with a hash join where everything happens within the same process.

I believe you mentioned in your book (Cost Based Oracle) that hash joins have a mechanism similar to a Bloom filter where a row from the probe table is checked against a bitmap, where each hash table bucket is indicated by a single bit. (You have a picture on page 327 of the hash join and bitmap, etc).

The way that bitmap looks and operates appears to be similar to a Bloom filter to me…. So it looks (to me) like hash joins have a sort of “Bloom filter” already built into them.

My question is… What is the advantage of adding a Bloom filter to a hash join if you already have a type of Bloom filter mechanism thingy built in to hash joins?

I can understand where it would make sense with parallel queries having to pass data from one process to another, but if everything happens within the same process I’m just curious where the benefit is.

 

The picture on page 327 of CBO-F is a variation on the following, which is the penultimate snapshot of the sequence of events in a multi-pass hash join. The key feature is the in-memory bitmap at the top of the image describing which buckets in the (partitioned and spilled) hash table hold rows from the build table. I believe that it is exactly this bitmap that is used as the Bloom filter.

The question of why it might be worth creating and using a Bloom filter in a simple serial hash join is really a question of scale. What is the marginal benefit of the Bloom filter when the basic hash join mechanism is doing all the hash arithmetic and comparing with a bitmap anyway?

If the hash join is running on an Exadata machine then the bitmap can be passed as a predicate to the cell servers and the hash function can be used at the cell server to minimise the volume of data that has to be passed back to the database server – with various optimisations dependent on the version of the Exadata software. Clearly minimising traffic through the interconnect is going to have some benefit.

Similarly, as the email suggests, for a parallel query where (typically) one set of parallel processes will read the probe table and distribute the data to the second set of parallel processes which then do the hash join, it’s clearly sensible to allow the first set of processes to apply the hash function and discard as many rows as possible before distributing the survivors – minimising inter-process communication.

In both these cases, of course, there’s a break point to consider of how effective the Bloom filter needs to be before it’s worth taking advantage of the technology. If the Bloom filter allows 99 rows out of every hundred to be passed to the database server / second set of parallel processes then Oracle has executed the hash function and checked the bitmap 100 times to avoid sending one row (and it will (may) have to do the same hash function and bitmap check again to perform the hash join); on the other hand if the Bloom filter discards 99 rows and leaves only one row surviving then that’s a lot of traffic eliminated – and that’s likely to be a good thing. This is why there are a few hidden parameters defining the boundaries of when Bloom filters should be used – in particular there’s a parameter “_bloom_filter_ratio” which defaults to 35 and is, I suspect, a figure which says something like “use Bloom filtering only if it’s expected to reduce the probe data to 35% of the unfiltered size”.
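For reference, the traditional (unsupported, SYS-only) way of checking the names, current values and descriptions of hidden parameters like this is a query against the x$ structures; a sketch:

select
        pi.ksppinm      parameter_name,
        cv.ksppstvl     session_value,
        pi.ksppdesc     description
from
        x$ksppi         pi,
        x$ksppcv        cv
where
        cv.indx = pi.indx
and     pi.ksppinm like '\_bloom%' escape '\'
order by
        pi.ksppinm
;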

So the question then becomes: “how could you benefit from a serial Bloom filter when it’s the same process doing everything and there’s no “long distance” traffic going on between processes?” The answer is simply that we’re operating at a much smaller scale. I’ve written blog notes in the past where the performance of a query depends largely on the number of rows that are passed up a query plan before being eliminated (for example here, where the cost of moving the data accounts for a significant fraction of the total time).

If you consider a very simple hash join its plan is going to be shaped something like this:


-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    45 |   720 |    31 |
|*  1 |  HASH JOIN         |      |    45 |   720 |    31 |
|*  2 |   TABLE ACCESS FULL| T2   |    15 |   120 |    15 |
|   3 |   TABLE ACCESS FULL| T1   |  3000 | 24000 |    15 |
-----------------------------------------------------------

If you read Tanel Poder’s article on execution plans as a tree of Oracle function calls you’ll appreciate that you could translate this into informal English along the lines of:

  • Operation 1 calls a function (at operation 2) to do a tablescan of t2 and return all the relevant rows, building an in-memory hash table by applying a hashing function to the join column(s) of each row returned by the call to the tablescan. As the hash table is populated the operation also constructs a bitmap to flag buckets in the hash table that have been populated.
  • Operation 1 then calls a function (at operation 3) to start a tablescan and then makes repeated calls for it to return one row (or, in newer versions, a small rowset) at a time from table t1. For each row returned operation 1 applies the same hash function to the join column(s) and checks the bitmap to see if there’s a potential matching row in the relevant bucket of the hash table, and if there’s a potential match Oracle examines the actual contents of the bucket (which will be stored as a linked list) to see if there’s an actual match.

Taking the figures above, let’s imagine that Oracle is using a rowset size of 30 rows. Operation 1 will have to make 100 calls to Operation 3 to get all the data, and call the hashing function 3,000 times.  A key CPU component of the work done is that the function represented by operation 3 is called 100 times and (somehow) allocates and fills an array of 30 entries each time it is called.

Now assume operation 1 passes the bitmap to operation 3 as an input and it happens to be a perfect bitmap. Operation 3 starts its tablescan and will call the hash function 3,000 times, but at most 45 rows will get past the bitmap. So operation 1 will only have to call operation 3 twice.  Admittedly operation 1 will (possibly) call the hash function again for each row – but maybe operation 3 will supply the hash value in the return array. Clearly there’s scope here for a trade-off between the reduction in work due to the smaller number of calls and the extra work needed to take advantage of the bitmap technology.

Here’s an example that shows the potential for savings – if you want to recreate this test you’ll need about 800MB of free space in the database; the first table takes about 300MB and the second about 450MB.


rem
rem     Script:         bloom_filter_serial_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,30,'0')             v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7 -- > comment to avoid WordPress format issue
;

create table t2
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        round(rownum + 0.5,2)           id,
        mod(rownum,1e5)                 n1,
        lpad(rownum,10)                 v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7 -- > comment to avoid WordPress format issue
;


prompt  =================
prompt  With Bloom filter
prompt  =================

select 
        /*+ 
                px_join_filter(t1) 
                monitor
        */
        t1.v1, t2.v1
from 
        t2, t1
where 
        t2.n1 = 0
and 
        t1.id = t2.id
/

prompt  ===============
prompt  No Bloom filter
prompt  ===============

select 
        /*+
                monitor
        */
        t1.v1, t2.v1
from 
        t2, t1
where 
        t2.n1 = 0
and 
        t1.id = t2.id
/

I’ve created tables t1 and t2 with an id column that never quite matches, but the range of values is set so that the optimizer thinks the two tables might have a near-perfect 1 to 1 match. I’ve given t2 an extra column with 1e5 distinct values in its 1e7 rows, so it’s going to have 100 rows per distinct value. Then I’ve presented the optimizer with a query that looks as if it’s going to find 100 rows in t2 and needs to find a probable 100 rows of matches in t1. For my convenience, and to highlight a couple of details of Bloom filters, it’s not going to find any matches.

In both runs I’ve enabled the SQL Monitor feature with the /*+ monitor */ hint, and in the first run I’ve also hinted the use of a Bloom filter. Here are the resulting SQL Monitor outputs. Bear in mind we’re looking at a reasonably large scale query – volume of input data – with a small result set.

First without the Bloom filter:


Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    3.00 |    2.24 |     0.77 |     1 |  96484 |  773 | 754MB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=2959412835)
==================================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT     |      |         |       |         2 |     +2 |     1 |        0 |      |       |     . |          |                 |
|  1 |   HASH JOIN          |      |     100 | 14373 |         2 |     +2 |     1 |        0 |      |       |   2MB |          |                 |
|  2 |    TABLE ACCESS FULL | T2   |      99 |  5832 |         2 |     +1 |     1 |      100 |  310 | 301MB |     . |          |                 |
|  3 |    TABLE ACCESS FULL | T1   |     10M |  8140 |         2 |     +2 |     1 |      10M |  463 | 453MB |     . |          |                 |
==================================================================================================================================================

According to the Global Stats the query has taken 3 seconds to complete, of which 2.24 seconds is CPU. (The 750MB read in 0.77 seconds would be due to the fact that I’m running off SSD, with a 1MB read size that helps.) A very large fraction of the CPU time appears because of the number of calls from operation 1 to operation 3 (the projection information pulled from memory reports a rowset size of 256 rows, so that’s roughly 40,000 calls to the function).

When we force the use of a Bloom filter the plan doesn’t change much (though the creation and use of the Bloom filter has to be reported) – but the numbers do change quite significantly.

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    1.97 |    0.99 |     0.98 |     1 |  96484 |  773 | 754MB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=4148581417)
======================================================================================================================================================
| Id |       Operation       |  Name   |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                       |         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
======================================================================================================================================================
|  0 | SELECT STATEMENT      |         |         |       |         1 |     +1 |     1 |        0 |      |       |     . |          |                 |
|  1 |   HASH JOIN           |         |     100 | 14373 |         1 |     +1 |     1 |        0 |      |       |   1MB |          |                 |
|  2 |    JOIN FILTER CREATE | :BF0000 |      99 |  5832 |         1 |     +1 |     1 |      100 |      |       |     . |          |                 |
|  3 |     TABLE ACCESS FULL | T2      |      99 |  5832 |         1 |     +1 |     1 |      100 |  310 | 301MB |     . |          |                 |
|  4 |    JOIN FILTER USE    | :BF0000 |     10M |  8140 |         1 |     +1 |     1 |    15102 |      |       |     . |          |                 |
|  5 |     TABLE ACCESS FULL | T1      |     10M |  8140 |         1 |     +1 |     1 |    15102 |  463 | 453MB |     . |          |                 |
======================================================================================================================================================


In this case the elapsed time dropped to 1.97 seconds (depending on your viewpoint that’s either a drop of “only 1.03 seconds” or a drop of “an amazing 34.3%”), with the CPU time dropping from 2.24 seconds to 0.99 seconds (a 55.8% drop!).

In this case you’ll notice that the tablescan of t1 produced only 15,102 rows to pass up to the hash join at operation 1 thanks to the application of the predicate (not reported here): filter(SYS_OP_BLOOM_FILTER(:BF0000,”T1″.”ID”)). Instead of 40,000 calls for the next rowset the hash function has been applied during the tablescan and operation 5 has exhausted the tablescan after only about 60 calls. This is what has given us the (relatively) significant saving in CPU.
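The predicate section doesn’t appear in the SQL Monitor report, but you can see it by pulling the plan from memory immediately after running the query; a sketch:

select *
from table(dbms_xplan.display_cursor(format => 'allstats last'));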

This example of the use of a Bloom filter highlights the two points I referred to earlier.

  • First, although we see operations 4 and 5 as Join (Bloom) filter use and Table access full respectively, I don’t think the data from the tablescan is being “passed up” from operation 5 to 4; I believe operation 4 can be viewed as a “placeholder” in the plan that allows us to see the Bloom filter in action: the hashing and filtering actually happen during the tablescan.
  • Secondly, we know that there are ultimately no rows in the result set, yet the application of the Bloom filter has not eliminated all the data. Remember that the bitmap that Oracle constructs of the hash table identifies used buckets, not actual values. Those 15,102 rows are rows that “might” find a match in the hash table because they belong in buckets that are flagged. A Bloom filter won’t discard any data that is needed, but it might fail to eliminate data that subsequently turns out to be unwanted.

How parallel is parallel anyway?

I’ll leave you with one other thought. Here’s an execution plan from 12c (12.2.0.1) which joins three dimension tables to a fact table. There are 343,000 rows in the fact table and the three joins individually identify about 4 percent of the data in the table. In a proper data warehouse we might have been looking at a bitmap star transformation solution for this query, but in a mixed system we might want to run warehouse queries against normalised data – this plan shows what Bloom filters can do to minimise the workload. The plan was acquired from memory after enabling rowsource execution statistics:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |        |        |      |            |      1 |00:00:00.05 |      22 |      3 |       |       |          |
|   1 |  SORT AGGREGATE              |          |      1 |      1 |        |      |            |      1 |00:00:00.05 |      22 |      3 |       |       |          |
|   2 |   PX COORDINATOR             |          |      1 |        |        |      |            |      2 |00:00:00.05 |      22 |      3 | 73728 | 73728 |          |
|   3 |    PX SEND QC (RANDOM)       | :TQ10000 |      0 |      1 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   4 |     SORT AGGREGATE           |          |      2 |      1 |  Q1,00 | PCWP |            |      2 |00:00:00.09 |    6681 |   6036 |       |       |          |
|*  5 |      HASH JOIN               |          |      2 |     26 |  Q1,00 | PCWP |            |     27 |00:00:00.09 |    6681 |   6036 |  2171K|  2171K|     2/0/0|
|   6 |       JOIN FILTER CREATE     | :BF0000  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|*  7 |        TABLE ACCESS FULL     | T3       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|*  8 |       HASH JOIN              |          |      2 |    612 |  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6634 |   6026 |  2171K|  2171K|     2/0/0|
|   9 |        JOIN FILTER CREATE    | :BF0001  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 10 |         TABLE ACCESS FULL    | T2       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 11 |        HASH JOIN             |          |      2 |  14491 |  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6614 |   6022 |  2171K|  2171K|     2/0/0|
|  12 |         JOIN FILTER CREATE   | :BF0002  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 13 |          TABLE ACCESS FULL   | T1       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|  14 |         JOIN FILTER USE      | :BF0000  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  15 |          JOIN FILTER USE     | :BF0001  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  16 |           JOIN FILTER USE    | :BF0002  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  17 |            PX BLOCK ITERATOR |          |      2 |    343K|  Q1,00 | PCWC |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|* 18 |             TABLE ACCESS FULL| T4       |     48 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.05 |    6594 |   6018 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

It’s a parallel plan, but it’s used the 12c “PQ_REPLICATE” strategy. The optimizer has decided that all the dimension tables are so small that it’s going to allow every PX process to read every (dimension) table through the buffer cache and build its own hash tables from them. (In earlier versions you might have seen the query coordinator scanning and broadcasting the three small tables, or one set of PX processes scanning and broadcasting to the other set).

So every PX process has an in-memory hash table of all three dimension tables and then (operation 17) they start a tablescan of the fact table, picking non-overlapping rowid ranges to scan. But since they’ve each created three in-memory hash tables they’ve also been able to create three Bloom filters each, which can all be applied simultaneously as the tablescan takes place; so instead of 343,000 rows being passed up the plan and through the first hash join (where we see from operation 11 that the number of surviving rows would have been about 14,500) we see all but 27 rows discarded very early on in the processing. Like bitmap indexes, part of the power of Bloom filters lies in the fact that with the right plan the optimizer can combine them and identify a very small data set very precisely, very early.

The other thing I want you to realise about this plan, though, is that it’s not really an “extreme” parallel plan. It’s effectively running as a set of concurrent, non-interfering, serial plans. Since I was running (parallel 2) Oracle started just 2 PX processes: they both built three hash tables from the three dimension tables, then split the fact table in half and took half each to do all the joins, and passed the nearly complete result to the query co-ordinator at the last moment. That’s as close as you can get to two serial, non-interfering, queries and still call it a parallel query. So, if you wonder why there might be any benefit in serial Bloom filters – Oracle has actually been benefiting from them under the covers for several years.

Summary

Bloom filters trade a decrease in messaging against an increase in preparation and hashing operations. For Exadata systems with predicate offloading it’s very easy to see the potential benefit; it’s also fairly easy to see the potential benefit for parallel query execution, where inter-process messaging between two sets of PX processes can be resource-intensive; but even for serial queries there can be some benefit – though, in absolute terms, it is likely to be only a small saving in CPU.

 

August 19, 2020

Subquery with OR

Filed under: 12c,Execution plans,Hints,Oracle,Performance — Jonathan Lewis @ 2:03 pm BST Aug 19,2020

I’ve written a couple of notes in the past about the problems of optimising queries with predicates of the form “or exists {subquery}”. A recent question on the Oracle Developer Community forum brought to my attention an improvement in this area in (very precisely) 12.2, as well as giving me a cute example of how the first cut of a new feature doesn’t always cover every detail, and creating a nice example of how the new technology enhances the old technology.

We start with some data and a simple query running under 12.2.0.1:

rem
rem     Script:         exists_with_or_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2  -- feature not implemented
rem

create table cat_contact(
        contact_method_id       varchar2(1) not null,
        contact_id              number(8,0) not null,
        small_vc                varchar2(10),
        padding                 varchar2(100)
);

alter table cat_contact add constraint cc_pk primary key(contact_id);
create index cc_i1 on cat_contact(contact_method_id);

insert into cat_contact
select
        chr(64 + case when rownum <= 10 then rownum else 26 end),
        rownum,
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_contact where contact_method_id in ('A','B','C');

create table cat_item(
        contact_id      number(8,0) not null,
        item_category   varchar2(1) not null,
        small_vc        varchar2(10),
        padding         varchar2(100),
        constraint ci_ref_cc foreign key(contact_id) references cat_contact
)
;

alter table cat_item add constraint ci_pk primary key(contact_id, item_category);
create index ci_i1 on cat_item(item_category);

insert into cat_item 
select
        rownum,
        chr(88 + case when rownum <= 10 then mod(rownum,2) else 2 end),
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_item where item_category in ('X','Y');

execute dbms_stats.gather_table_stats(user,'cat_contact')
execute dbms_stats.gather_table_stats(user,'cat_item')

I’ve created and populated two tables (the table and column names come from the ODC thread). There’s a foreign key relationship defined between cat_item and cat_contact, both tables have declared primary keys, and a couple of extra columns are declared not null.

I’ve populated the two tables with a small amount of data and each table has one column rigged with very skewed data:

  • cat_contact.contact_method_id is mostly ‘Z’ with one row each of ‘A’ to ‘J’,
  • cat_item.item_category (the second column in the primary key) is mostly ‘Z’ with 5 rows each of ‘X’ and ‘Y’

After populating each table I’ve queried it in a way which means the subsequent stats gathering will create frequency histograms on these two columns and the optimizer will be able to take advantage of the skew in its arithmetic, which means it may choose to use the indexes I’ve created on those skewed columns if the right values appear in the queries.
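
If you want to check that the histograms really did appear, a quick query against user_tab_cols (my addition, not part of the original script) should report FREQUENCY for both columns after the stats gathering:

select  table_name, column_name, histogram, num_buckets
from    user_tab_cols
where   (table_name = 'CAT_CONTACT' and column_name = 'CONTACT_METHOD_ID')
or      (table_name = 'CAT_ITEM'    and column_name = 'ITEM_CATEGORY')
order by
        table_name
;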

So here’s the query we’re interested in:

SELECT  /*+ 
                qb_name(main) 
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

Here’s the default execution plan (in 12.2.0.1 with my settings for system stats and various other optimizer-related figures that MIGHT make a difference) pulled from memory after executing the query to return 10 rows.


-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       |    34 (100)|          |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL | CAT_CONTACT | 10000 |  1152K|    34   (6)| 00:00:01 |
|   3 |   INLIST ITERATOR   |             |       |       |            |          |
|*  4 |    INDEX UNIQUE SCAN| CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("C"."CONTACT_METHOD_ID") OR  IS NOT NULL))
   4 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR
              "I"."ITEM_CATEGORY"='Y')))

For every row in the cat_contact table Oracle has checked whether or not the contact_method is an ‘A’, ‘B’, or ‘C’ and passed any such rows up to its parent; for all other rows it’s then executed the subquery to see if the row with the matching contact_id in cat_item has an ‘X’ or ‘Y’ as the item_category. It’s had to run the subquery 9,997 times (there were only three rows matching ‘A’,’B’,’C’) and the INLIST ITERATOR at operation 3 means that it’s probed the index nearly 20,000 times. This does not look efficient.

I’ve said in previous articles that when you need to optimize queries of this shape you need to rewrite them as UNION ALL queries to separate the two parts of the complex OR predicate and then make sure that you don’t report any items twice – which you do by making use of the lnnvl() function. So let’s do this – but let’s do it the lazy “new technology” way by upgrading to 19c and executing the query there; here’s the plan I got in 19.3.0.0:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

The optimizer has used the new “cost-based OR-expansion” transformation to rewrite the query as a UNION ALL query. We can see an efficient access into cat_contact to identify the ‘A’,’B’,’C’ rows, and then we can see that the second branch of the union all handles the existence subquery but the optimizer has unnested the subquery to select the 10 rows from cat_item where the item_category is ‘X’ or ‘Y’ and used those rows in a nested loop to drive into the cat_contact table using the primary key. We can also see the use of the lnnvl() function in operation 13 that ensures we don’t accidentally report the ‘A’,’B’,’C’ rows again.
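
For comparison, the manual rewrite I’ve recommended in the past would look something like the following. This is my sketch rather than anything taken from the ODC thread, but it matches the shape of the transformed query quite closely:

select  c.*
from    cat_contact c
where   c.contact_method_id in ('A', 'B', 'C')
union all
select  c.*
from    cat_contact c
where   exists (
                select  null
                from    cat_item i
                where   i.contact_id = c.contact_id
                and     i.item_category in ('X', 'Y')
        )
and     lnnvl(c.contact_method_id = 'A')
and     lnnvl(c.contact_method_id = 'B')
and     lnnvl(c.contact_method_id = 'C')
;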

So let’s go back to 12.2.0.1 and see what happens if we just add the /*+ or_expand(@main) */ hint to the query. Here’s the resulting execution plan:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

We get exactly the plan we want – with the same cost as the 19c cost, which happens to be less than half the cost of the default plan that we got from 12.2.0.1. So it looks like there may be cases where you will need to hint OR-expansion because it might not appear by default.

Other Observations 1 – ordering

You may have noticed that my query has, unusually for me, put the existence subquery first and the simple filter predicate second in the where clause. I don’t like this pattern as (over time, and with different developers modifying queries) it’s too easy in more complex cases to “lose” the simple predicate; a one-liner can easily drift, change indents, get bracketed with another predicate that it shouldn’t be connected with, and so on. I’ve actually seen production systems producing wrong results because little editing accidents like this (counting brackets is the classic error) have occurred – so I’m going to rerun the test on 12.2.0.1 with the predicates in the order I would normally write them.

Here’s the “corrected” query with its execution plan:


SELECT  /*+ 
                qb_name(main) 
                or_expand(@main)
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')  
        OR
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        )
;  


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    16 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    16   (7)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   4 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   5 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   6 |       INLIST ITERATOR                     |                 |       |       |            |          |
|   7 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|*  8 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|*  9 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|  10 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                 |                 |       |       |            |          |
|  12 |     INLIST ITERATOR                       |                 |       |       |            |          |
|  13 |      TABLE ACCESS BY INDEX ROWID BATCHED  | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN                    | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|  15 |     INLIST ITERATOR                       |                 |       |       |            |          |
|* 16 |      INDEX UNIQUE SCAN                    | CI_PK           |     1 |     6 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
   9 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  11 - filter(LNNVL( IS NOT NULL))
  14 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  16 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y')))

The execution plan has jumped from 14 lines to 17 lines, the cost has gone up from 14 to 16, and both branches of the plan now report access to cat_contact and cat_item (though only through its primary key index in the second branch). What’s happened?

Oracle 12.2.0.1 has rewritten the query as a UNION ALL working from the bottom up – so in this case the first branch of the rewrite handles the original filter subquery, unnesting it to drive efficiently from cat_item to cat_contact. This means the second branch of the rewrite has to find the ‘A’,’B’,’C’ rows in cat_contact and then check that the filter subquery hadn’t previously reported them – so the optimizer has applied the lnnvl() function to the filter subquery – which you can nearly see in the Predicate Information for operation 11.

To make it clearer, here’s what you get as the predicate information for that operation after calling explain plan and dbms_xplan.display():

  11 - filter(LNNVL( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1)))

In 12.2 the order of predicates in your query seems to be important – unless told otherwise the optimizer is working from the bottom up (then rewriting top-down). But there is hope (though not documented hope). I added the /*+ or_expand(@main) */ hint to the query to force OR-expansion. Checking the Outline Information of the plan I could see that this had been expanded to /*+ or_expand(@main (1) (2)) */. Taking a wild guess at the significance of the numbers and changing the hint to /*+ or_expand(@main (2) (1)) */ I re-ran the test and got back to the more efficient plan – with the filter subquery branch appearing second in the UNION ALL view and the lnnvl() applied to the simpler predicate.
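
For the record, here’s the hinted query that got me back to the efficient plan; the only change from the previous test is the branch numbering inside the or_expand() hint:

SELECT  /*+
                qb_name(main)
                or_expand(@main (2) (1))
        */
        *
FROM    cat_contact c
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')
        OR
                exists  (
                        SELECT  /*+ qb_name(subq) */
                                *
                        FROM    cat_item i
                        WHERE   i.contact_id = c.contact_id
                        AND     i.item_category in ('X', 'Y')
                )
        )
;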

So the OR-expansion code is not fully cost-based in 12.2.0.1, but you can modify the behaviour through hinting. First to force it to appear (which may not happen even if it seems to be the lower cost option), and secondly to control the ordering of the components of the UNION ALL. As with all things relating to hints, though, act with extreme caution: we do not have sufficient documentation explaining exactly how they work, and with some of them we don’t even know whether the code path is even complete yet.

Other Observations 2 – 12cR1

The or_expand() hint and cost-based OR-expansion appeared specifically in 12.2.0.1; prior to that we had a similar option in the use_concat() hint and concatenation – which also attempts to rewrite your query to produce a union all of disjoint data sets. But there are restrictions on what concatenation can do. I rarely remember what all the restrictions are, but there are two critical restrictions:

  • first, it will only appear by default if there is an indexed access path available to drive every branch of the rewrite
  • secondly, it will not apply further transformations to the separate branches that it produces

If we try adding the or_expand() hint to our query in 12.1.0.2 it will have no effect, so let’s add a suitable use_concat() hint and see what happens:

explain plan for
SELECT  /*+ 
                qb_name(main) 
                use_concat(@main 8 or_predicates(1))
--              use_concat(@main   or_predicates(1))
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR
                c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             | 10000 |  1152K|    40   (3)| 00:00:01 |
|   1 |  CONCATENATION                        |             |       |       |            |          |
|   2 |   INLIST ITERATOR                     |             |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| CAT_CONTACT |     3 |   354 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | CC_I1       |     3 |       |     3   (0)| 00:00:01 |
|*  5 |   FILTER                              |             |       |       |            |          |
|*  6 |    TABLE ACCESS FULL                  | CAT_CONTACT |  9997 |  1151K|    35   (6)| 00:00:01 |
|   7 |    INLIST ITERATOR                    |             |       |       |            |          |
|*  8 |     INDEX UNIQUE SCAN                 | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C')
   5 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1))
   6 - filter(LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C'))
   8 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

26 rows selected.

As you can see, by forcing concatenation I’ve got my “union all” view with lnnvl() applied in the second branch. But the second branch was the “select where exists()” branch and the optimizer has not been able (allowed?) to do the unnesting that would let it drive efficiently from the cat_item table to the cat_contact table. The effect of this is that the plan still ends up with a full tablescan of cat_contact running a filter subquery on virtually every row – so concatenation doesn’t save us anything.

The significance of the “8” in the hint, by the way, is (I believe) that it tells the optimizer to use inlist iterators when possible. If I had omitted the “8” the plan would have had 4 branches – one each for ‘A’, ‘B’, and ‘C’ and the fourth for the filter subquery. I could also have added the hint /*+ use_concat(@subq or_predicates(1)) */ to replace operations 7 and 8 with a single index range scan with a filter predicate for the ‘X’/’Y’ check (and that might, in any case, be slightly more efficient than the iteration approach).

Footnote(s)

The “legacy” OR-expansion (“concatenation”, a.k.a. LORE in the optimizer trace file) can be controlled through the hints use_concat() and no_expand().

The new cost-based OR-expansion (now ORE in the optimizer trace file) can be controlled through the hints or_expand() and no_or_expand().

The new cost-based OR-expansion has some restrictions, for example it is explicitly blocked in a MERGE statement, even in 19c, as reported in this blog note by Nenad Noveljic. As the blog note shows, concatenation is still possible but you (may) have to disable cost-based OR-expansion.

I scanned the executable for the phrase “ORE: bypassed” to see if there were any messages that would suggest other reasons why cost-based OR-expansion would not be used; unfortunately the only relevant string was “ORE: bypassed – %s” [update (see comment 5 below): after ignoring case there was a second option: “ORE: Bypassed for disjunct chain: %s.”] – in other words all the possible bypass messages would be filled in on demand. I found a list of messages that might be relevant; I’d be a little cautious about trusting it but if you don’t see the feature appearing when you’re expecting it then it might be worth checking whether one of these could apply.

  • Old OR expansion hints present
  • Semi join hint present
  • QB has FALSE predicate
  • QB marked for NO Execution
  • Full Outer join QB
  • Rownum found in disjunction
  • Anti/semi/outer join in disjunction
  • Opposite Range in disjunction
  • No Index or Partition driver found
  • Predicate chain has all constant predicates
  • Negated predicate found
  • Long bitmap inlist in OR predicate
  • PRIOR expression in OR predicate
  • All 1-row tables found
  • No valid predicate for OR expansion
  • Disjunctive subquery unnesting possible
  • Subquery unnesting possible
  • Subquery coalesced query block
  • Merge view query block

Finally – here’s another reference blog note comparing LORE with ORE from Mohamed Houri.

 

July 15, 2020

Fetch First vs. Rownum

Filed under: Execution plans,Oracle,Partitioning,Performance,Problem Solving,Tuning — Jonathan Lewis @ 10:11 am BST Jul 15,2020

I’ve pointed out fairly frequently that if you’re running Standard Edition but would like to take advantage of a few features of the Partitioning option then you might be able to do something appropriate with Partition Views (but I’ve just discovered while searching my blog for a suitable item to link to that I haven’t published any of my PV notes on the blog).

I’ve also pointed out that while 12c allows you to use “fetch first N rows” instead of “where rownum <= N” there’s a hidden threat to using the feature because “fetch first N” turns into a hidden row_number() over() analytic function.

Today’s note is a combination of these two topics, prompted by a request to solve a particular coding issue that has appeared a few times on the Oracle Developer Forum and is likely to be recognisable to a number of people.

I’ll start with a very simple model demonstrating the simplest use of “fetch first N”:


rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);

With this data set I want to write a query that selects rows for class A where id > 9500, ordered by id – but I only want the first two rows. Here’s a very simple query that gets the result I want, followed by the execution plan from 12.2.0.1 (the A-Rows and E-Rows from 19.3 are slightly different):

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)

The plan – with its stats – shows us that we’ve done an index range scan of an index which will return the data in exactly the order we want, and the “fetch first 2 rows” has been translated into the row_number() over() that we expected; but to our great joy the “window nosort stopkey” makes the processing stop very early because Oracle recognises that the base data is arriving in the right order so it isn’t necessary to fetch all of it and sort it. The A-Rows column confirms this interpretation of what has happened.
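
In effect the optimizer has treated my statement as if I had written something like the following. It’s only a sketch of the transformation, of course – the generated names in the real thing look like rowlimit_$$_rownumber, as the Predicate Information shows:

select  *
from    (
        select  /*+ index(t_odd (class, id)) */
                t_odd.*,
                row_number() over (order by class, id) rn
        from    t_odd
        where   class = 'A'
        and     id > 9500
        )
where   rn <= 2
;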

You might notice, by the way, that the optimizer has costed the query as if it were fetching all the rows even though it “knows” that it’s going to fetch only the first two rows. That’s why I had to include the index hint to make the optimizer use the obvious index – a popular alternative is to use the /*+ first_rows(N) */ hint where N matches, or is similar to, the number of rows you want to fetch. If I had omitted the hint the optimizer would have done a full tablescan and then applied a “window sort pushed rank” operation to sort and limit the result to 2 rows.
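
So, as an alternative to the index() hint (a sketch I haven’t re-run for this note), I could probably have written:

select  /*+ first_rows(2) */
        *
from    t_odd
where   class = 'A'
and     id > 9500
order by
        class, id
fetch
        first 2 rows only
;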

So now we come to the real problem: the user has a “current” table and an identical “history” table, and would like to replace the table reference with a reference to a union all view for their clients on Standard Edition, or to a partitioned table for clients running Enterprise Edition – and they don’t really want to do any other code changes. So let’s see what happens when we model the union all. I started with a table called t_odd that held only odd values for id, so I’m going to add a table called t_even that holds only even values for id.

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

As you can see t_even is an identically structured table with similar data, and I’ve created a union all view on top of the two tables, changing the query to reference the view rather than referencing a table. Thanks to the costing anomaly (combined with the small size of the tables) I’ve had to supply a couple of “global” hints to tell the optimizer to use the indexes to access the two tables. So how well does the optimizer do its job when we have a union all view?


----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)

Answer: Bad luck, the optimizer isn’t smart enough to find a cheap way through this query. It’s fetched all the relevant data from the two tables before applying the window sort (which it does with some efficiency – the pushed rank) to produce the right answer. As you can see from the A-Rows column, though, it’s had to acquire a couple of hundred rows from each table before getting down to the 2 rows we wanted.

Partitioned Tables

So let’s try to solve the problem by buying into the partitioning option and creating a list-partitioned table with two partitions, one flagged for current data and one flagged for history data – or ‘O’dd and ‘E’ven data as I’ve created them in my model.



create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')

Note particularly that I have created a local index on this partitioned table – so there’s a very close correspondence between the two tables in the previous example and the two partitions in this example. Here’s the plan when I query the partitioned table for the first two rows:


select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)

The optimizer has let us down again. The plan shows us that we have to acquire all the relevant data from the two partitions before applying the row_number() analytic function and discarding all but the two rows we want. (Again we can check the A-Rows column to see that we have started by fetching a total of 404 rows from the table.)

But what happens if we fall back to the good old-fashioned (non-standard) rownum method:


select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)

Oracle really knows how to use rownum well – notice how there is a count stopkey operation as a child to the partition list all operation, and that’s where our rownum <= 2 predicate is first applied. For each partition Oracle finds “the first two rows” and after it has collected two rows from every partition it sorts them (again with a stopkey) to find the top two in that subset. Check the A-Rows column – we selected a total of 4 rows from the table (2 per partition) and reduced that to 2 rows at operation 3.

Conclusion

There are some patterns of processing where partitioned tables can be a lot friendlier to coders than partition views; if you do have to stick with Standard Edition you can usually get what you want but the coding investment may be significantly higher. Even with partitioned tables, though, there are some “old-fashioned” Oracle methods that do a much nicer job than some of the new-fangled “ANSI” mechanisms.

Footnote

Part of the problem presented here revolves around the desire to keep a pattern of SQL generation that already exists, doing nothing more than replacing a table name with a view (or partitioned table) name.

As we’ve seen, if you start with a simple heap table and try to replace it with a partitioned table you have to use the rownum mechanism rather than the fetch first N rows mechanism.

If you’re running with Standard Edition you can’t do anything simple to replace a table name with the name of a union all view; you’d have to change your code generator to apply all the predicates twice (once for each table) and then apply the rownum predicate or fetch first directive again outside the union all. In other words you have to emulate exactly what Oracle EE manages to do with partitioned tables and rownum.

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)


As you can see, the E-Rows still predicts a lot of work, but the A-Rows tells us the work was kept to the minimum we want.

 

July 13, 2020

Min/Max costing

Filed under: Oracle,Performance,subqueries,Tuning — Jonathan Lewis @ 1:07 pm BST Jul 13,2020

A question about the min/max index scan appeared on the Oracle Developer Community forum recently. The query supplied in the thread was a little odd – you might ask why anyone would run it as it stands – and I’ve modified it to make it even stranger to demonstrate a range of details.

I’ll start with a simple data set, not bothering to collect stats because that will be done automatically on create for my versions:

rem
rem     Script:         min_max_cost_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 
as 
select  * 
from    all_objects 
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(object_name);

Now a few simple queries – for which I’ll capture and display the in-memory execution plans a little further on:


set linesize 156
set pagesize 60
set trimspool on
set serveroutput off
alter session set statistics_level = all;

prompt  =====================
prompt  Baseline select max()
prompt  =====================

select max(object_name) from t1;
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  ============================
prompt  select max() with dummy join
prompt  ============================

select max(object_name) from t1, dual where dummy is not null;
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  =============================================
prompt  select max() with dummy join and index() hint
prompt  =============================================

select /*+ index(t1) */  max(object_name) from t1, dual where dummy is not null;
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  ============================================
prompt  select max() with dummy join and inline view
prompt  ============================================

select  obj
from    (
        select  max(object_name)  obj
        from    t1
        ),
        dual 
where   dummy is not null
/

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));

prompt  ====================================
prompt  select max() with existence subquery
prompt  ====================================

select max(object_name) from t1 where exists (select null from dual where dummy is not null);
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));

prompt  ============================================
prompt  select max() with failing existence subquery
prompt  ============================================

select max(object_name) from t1 where exists (select null from dual where dummy is null);
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));

With 50,000 rows and the appropriate index to allow Oracle to find the maximum value very quickly we expect the optimizer to invoke the “index full scan (min/max)” operation, visiting only the extreme leaf block of the index – and, indeed, we are not disappointed: that’s exactly what the baseline query shows us:

=====================
Baseline select max()
=====================
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |      1 |        |     3 (100)|      1 |00:00:00.01 |       3 |
|   1 |  SORT AGGREGATE            |       |      1 |      1 |            |      1 |00:00:00.01 |       3 |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------

However, when we introduce the (as yet unjustified) join to dual something very different happens – the optimizer forgets all about the min/max optimisation and does an index fast full scan of the t1_i1 index, passing all 50,000 rows up to the parent operation.


============================
select max() with dummy join
============================
-------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |      1 |        |    50 (100)|      1 |00:00:00.02 |     360 |
|   1 |  SORT AGGREGATE        |       |      1 |      1 |            |      1 |00:00:00.02 |     360 |
|   2 |   NESTED LOOPS         |       |      1 |  50000 |    50   (6)|  50000 |00:00:00.01 |     360 |
|*  3 |    TABLE ACCESS FULL   | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|   4 |    INDEX FAST FULL SCAN| T1_I1 |      1 |  50000 |    48   (7)|  50000 |00:00:00.01 |     357 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("DUMMY" IS NOT NULL)

We could, of course, try hinting an index range (full) scan to see what happens – and the result is even more surprising: Oracle takes the hint, uses the min/max optimisation, and shows us that it didn’t take that path by default because it had “forgotten” how to cost it correctly.

Note the cost of 354 at operation 5 when the original min/max cost was 3, note also that the optimizer thinks we have to visit all 50,000 index entries even though, at run-time, Oracle correctly uses a path that visits only one index entry:


=============================================
select max() with dummy join and index() hint
=============================================
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |   356 (100)|      1 |00:00:00.01 |       6 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |            |      1 |00:00:00.01 |       6 |
|   2 |   NESTED LOOPS               |       |      1 |  50000 |   356   (2)|      1 |00:00:00.01 |       6 |
|*  3 |    TABLE ACCESS FULL         | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|   4 |    FIRST ROW                 |       |      1 |  50000 |   354   (2)|      1 |00:00:00.01 |       3 |
|   5 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |  50000 |   354   (2)|      1 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("DUMMY" IS NOT NULL)

Of course we could recognise that the t1 access and the access to dual could be de-coupled – and hope that the optimizer doesn’t try to use complex view merging (maybe we should have included a /*+ no_merge */ hint) to fall back to a simple join. Fortunately the optimizer doesn’t try merging the two query blocks, so it optimises the max(object_name) query block correctly, giving us the benefit of the min/max optimisation. I’ve included the ‘alias’ format option in this call to dbms_xplan() so that we can see the two query blocks that are optimised separately.


============================================
select max() with dummy join and inline view
============================================

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     5 (100)|      1 |00:00:00.01 |       6 |
|   1 |  NESTED LOOPS                |       |      1 |      1 |     5   (0)|      1 |00:00:00.01 |       6 |
|*  2 |   TABLE ACCESS FULL          | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|   3 |   VIEW                       |       |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
|   4 |    SORT AGGREGATE            |       |      1 |      1 |            |      1 |00:00:00.01 |       3 |
|   5 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / DUAL@SEL$1
   3 - SEL$2 / from$_subquery$_001@SEL$1
   4 - SEL$2
   5 - SEL$2 / T1@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DUMMY" IS NOT NULL)

There is a maxim (or guideline, or rule of thumb) that if the from clause of a query includes tables that don’t get referenced in the select list then those tables should (probably) appear in subqueries. Of course this guideline sometimes turns out to be a very bad idea, and sometimes it just means the optimizer unnests the subqueries and recreates the joins we started with, but let’s try the approach with this query. I’ve included the ‘alias’ option again so that you can see that this plan is optimised as two query blocks, allowing the max(object_name) query block to find the min/max strategy.


====================================
select max() with existence subquery
====================================
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     5 (100)|      1 |00:00:00.01 |       6 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |            |      1 |00:00:00.01 |       6 |
|*  2 |   FILTER                     |       |      1 |        |            |      1 |00:00:00.01 |       6 |
|   3 |    FIRST ROW                 |       |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
|   4 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
|*  5 |    TABLE ACCESS FULL         | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 | 
-------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   4 - SEL$1 / T1@SEL$1
   5 - SEL$2 / DUAL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   5 - filter("DUMMY" IS NOT NULL)

There’s a very important detail in the execution plan above. At first sight it looks like the optimizer has a plan using a simple filter subquery operation – which means you might be fooled into reading it as “for each row returned by operation 3 call operation 5”. This is not the case.

Because the subquery is not a correlated subquery – it’s an example that I sometimes call a “fixed” or (slightly ambiguously) “constant” subquery – Oracle can execute it once and use the resulting rowsource to decide whether or not to call the main query. It’s a case where (if you didn’t realise the plan consisted of two separate query blocks) you would say that Oracle was calling the second child first.

To prove this point I’ve set up one last variation of the query – the “failed subquery” version – where my select from dual returns no rows. Check the numbers of Starts shown for each line of the plan:
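
Presumably that just means inverting the predicate against dual – the following is a hypothetical reconstruction rather than the original script:

select  max(object_name)
from    t1
where   exists (
                select null from dual where dummy is null
        )
;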


============================================
select max() with failing existence subquery
============================================
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     5 (100)|      1 |00:00:00.01 |       3 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |            |      1 |00:00:00.01 |       3 |
|*  2 |   FILTER                     |       |      1 |        |            |      0 |00:00:00.01 |       3 |
|   3 |    FIRST ROW                 |       |      0 |      1 |     3   (0)|      0 |00:00:00.01 |       0 |
|   4 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      0 |      1 |     3   (0)|      0 |00:00:00.01 |       0 |
|*  5 |    TABLE ACCESS FULL         | DUAL  |      1 |      1 |     2   (0)|      0 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   4 - SEL$1 / T1@SEL$1
   5 - SEL$2 / DUAL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   5 - filter("DUMMY" IS NULL)

The filter at operation 2 calls operation 5 – the query against dual – which runs once, returning no rows. The min/max scan of t1_i1 at operation 4 (and the FIRST ROW at operation 3 above it) doesn't run at all – zero Starts. Operation 5 was called before operation 4 was considered.

Finally

This brings us back to the question – why would anyone run a strange query like this?

Perhaps the answer is that it’s just a demonstration of one part of a more complex query and what we’re trying to do is say: “if a certain record exists in a control table then include some information from table X”.

This note tells us that if there's a possibility of a min/max optimisation to find the data then we should avoid using a join; instead we should use a "fixed subquery" to check the control table, and maybe we'll also have to write the part of our query that collects (or isn't required to collect) the interesting bit of data as an inline view.
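
As a concrete sketch of the pattern – every object and column name below is invented for illustration:

select  max(x.interesting_column)
from    table_x x
where   exists (
                select  null
                from    control_table ct
                where   ct.control_code = 'XYZ'
        )
;

Because the subquery is "fixed" Oracle can run it once; if the control row doesn't exist the (potentially min/max optimised) access to table_x never starts.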

 

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minutes that I spent confirming the OP's hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan was telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols have been changed to .le. / .ge. to avoid a WordPress formatting issue.

The first thing to note is that the "Time (active)" shown at the top line is about 12,000 seconds – so it's a long-running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including roughly 1,800 seconds of CPU time (adding up the Cpu sample counts of 1,196, 319 and 292). This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn't matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare "Rows (Estim)" and "Rows (Actual)" for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the "Execs" column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So let's look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with "Rows (Actual)" = 1.
  • Operation 11 (Buffer Sort of data from STAGE) is predicted to return 84 rows – with "Rows (Actual)" = 1,000.

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs this tells us that the optimizer's estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Even if we can't fix this mis-estimate we may be able to do something that improves the estimate for STAGE to something sufficiently large that it will trigger the switch to a hash join. So let's look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there's a nearly perfect three-column index on REF1, but the optimizer can't use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It's possible, of course, that these might be the first three columns of a four – or more – column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.
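
If you wanted to try this, a minimal sketch – assuming the USER1 schema shown in the Predicate Information, and using the traditional 254 as the bucket count – would be:

begin
        dbms_stats.gather_table_stats(
                ownname    => 'USER1',
                tabname    => 'REF1',
                method_opt => 'for columns size 254 (puser, name)'
        );
end;
/

Naming the column group in method_opt like this creates the extension (in recent versions) and gathers a histogram on it in a single call; dbms_stats.create_extended_stats() is the more explicit alternative if you'd rather create the extension first and gather separately.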

STAGE Predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT='I' (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a significant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.
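
The same pattern would apply for the stage table – its owner isn't visible in this extract of the plan, so the schema name below is just a placeholder:

begin
        dbms_stats.gather_table_stats(
                ownname    => 'USER1',  -- placeholder, stage's owner isn't shown
                tabname    => 'STAGE',
                method_opt => 'for columns size 254 (name, stat)'
        );
end;
/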

What Happened Next?

I made the suggestions about column groups and histograms to the OP – without the level of detail that I've given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

May 5, 2020

Execution Plans

Filed under: Execution plans,Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 12:36 pm BST May 5,2020

Table Of Contents

1.0 Introduction
2.0 Overview
3.0 The Main Course
4.0 Simplify
5.0 Filling the Gaps
6.0 Looking at the numbers
7.0 Predicate Information
8.0 Resolution
9.0 Summary
Footnote


 

1.0 Introduction

1.1 In a comment to a recent post on reading a non-trivial execution plan someone asked me to repeat the exercise using a plan I had published a few days previously in a post about tweaking the hints in an outline. The query in question involved a number of subqueries and transformations of different types, which means it's going to take a little work explaining the details, and it's probably going to be a fairly long read.

1.2 Here’s the query that produced the plan we’re going to examine. I’ve done some cosmetic alteration to make it a little easier to read (though it’s still not perfect according to my standards). I’ve also made one very important addition to the query to make it easier to follow my walkthrough of the execution plan; the original text didn’t specify any query block names (/*+ qb_name() */ hints) even though it starts off with 9 separate query blocks, so I’ve walked through the text very carefully adding in the query block names that Oracle would have used (sel$NN) for each query block. In this case I got lucky because there were no views or other recursive problems involved, so all I had to do was find each occurrence of the word “select” in literal text order and increment the NN in sel$NN for each one.


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id,
                academic_year,
                applicant_gender,
                medium_of_study,
                education_type,
                college_id,
                course_id,
                medium_id,
                hostel_required,
                preference_order,
                status_flag,
                attribute7,  -- Added on 7-mar-20
                college_status_flag,
                percentage,
                caste_category,
                alloted_category,
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        adt.applicant_id,
                        lmt_gender.lov_code applicant_gender,
                        adt.medium_of_study,
                        act.college_id,
                        lmt_education_type.lov_code education_type,
                        act.course_id,
                        act.medium_id,
                        act.hostel_required,
                        act.preference_order,
                        act.status_flag,
                        act.attribute7, -- Added on 7-mar-20
                        adt.college_status_flag,
                        adt.academic_year,
                        adt.percentage,
                        adt.applicant_dob,
                        adt.legacy_appln_date,
                        adt.caste_category,
                        act.attribute1 alloted_category,
                        DECODE (lmt_pass.lov_code,  'ATTFIRST', 1,  'COMPARTL', 2,  3) order_of_pass,
                        DECODE (late_entry_flag,  'N', 1,  'Y', 2,  3)      order_of_entry,
                        DECODE (lmt_appearance.lov_code,  'REGULAR', 1,  'PRIVATE', 2,  3) order_of_appearance,
                        DECODE (adt.is_ttd_employ_ward,  'Y', 1,  'N', 2,  3) order_of_ttd_emp,
                        DECODE (adt.is_balbhavan_studnt,  'Y', 1,  'N', 2,  3) order_of_schooling,
                        act.attribute3 course_qe_priority,
                        adt.is_local_canditature_valid,
                        adt.is_ttd_emp_ward_info_valid,
                        adt.is_sv_bm_student_info_valid,
                        adt.is_social_ctgry_info_valid,
                        DECODE(adt.college_status_flag,'B',1,'O',2,'N',3) order_of_status
                FROM 
                        xxadm.xxadm_applicant_details_tbl    adt,
                        xxadm.xxadm_applicant_coursprefs_tbl act,
                        xxadm.xxadm_college_master_tbl       cmt,
                        xxadm.xxadm_course_master_tbl        crmt,
                        xxadm.xxadm_medium_master_tbl        mmt,
                        xxadm.xxadm_lov_master_tbl           lmt_gender,
                        xxadm.xxadm_lov_master_tbl           lmt_pass,
                        xxadm.xxadm_lov_master_tbl           lmt_appearance,
                        xxadm.xxadm_lov_master_tbl           lmt_religion,
                        xxadm.xxadm_lov_master_tbl           lmt_education_type
                WHERE
                        adt.applicant_id = act.applicant_id
                AND     act.college_id = cmt.college_id
                AND     act.course_id = crmt.course_id
                AND     act.medium_id = mmt.medium_id
                AND     adt.applicant_gender = lmt_gender.lov_id
                AND     adt.pass_type = lmt_pass.lov_id
                AND     adt.appearance_type = lmt_appearance.lov_id
                AND     adt.religion = lmt_religion.lov_id
                AND     cmt.education_type = lmt_education_type.lov_id
                AND     adt.status = 'Active'
                AND     1 = (CASE 
                                WHEN act.hostel_required = 'Y'
                                        THEN (CASE
                                                     WHEN    adt.distance_in_kms >20
                                                     AND     lmt_religion.lov_code = 'HINDU'
                                                     AND     adt.caste_category NOT IN (
                                                                     SELECT  /*+ QB_NAME(SEL$4) */
                                                                             category_id
                                                                     FROM    xxadm.xxadm_category_master_tbl
                                                                     WHERE   category_code IN ('BACKWRDC', 'BACKWRDE')
                                                             )
                                                             THEN 1
                                                             ELSE 2 
                                              END
                                             )
                                        ELSE 1 
                               END
                              )
                AND     1 =  (CASE 
                                WHEN act.hostel_required  = 'Y'
                                        THEN    (CASE 
                                                        WHEN    (    lmt_education_type.lov_code = 'COEDUCOL' 
                                                                 AND lmt_gender.lov_code = 'FEMALE'
                                                                )
                                                                THEN 2
                                                                ELSE 1 
                                                 END
                                                )
                                        ELSE 1 
                               END
                              )
                AND     adt.course_applied_for = 'DEG' 
                AND     (adt.college_status_flag IS NULL OR adt.college_status_flag IN ('N','T','C','B','O')) 
                AND     act.preference_order <= NVL( -- > comment to avoid WordPress format issue
                                (SELECT  /*+ QB_NAME(SEL$5) */ 
                                         preference_order 
                                 FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                 WHERE   act1.applicant_id = adt.applicant_id 
                                 AND     status_flag IN('B','T','C','O') 
                                 ), act.preference_order 
                        )
                AND     act.preference_order >=  NVL(
                                (SELECT /*+ QB_NAME(SEL$6) */
                                        preference_order
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act2 
                                WHERE   act2.applicant_id = adt.applicant_id
                                AND     status_flag  = 'C'
                                ), act.preference_order
                        )
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$7) */
                                        act3.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act3
                                WHERE   act3.applicant_id = adt.applicant_id 
                                AND     act3.status_flag  = 'O'
                        ) 
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$8) */
                                        act1.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                WHERE   act1.applicant_id = adt.applicant_id 
                                AND     act1.status_flag IN ('C','B')
                                AND     act1.attribute1 IN (
                                                SELECT  /*+ QB_NAME(SEL$9) */
                                                        category_id 
                                                FROM    xxadm.xxadm_category_master_tbl 
                                                WHERE   category_code IN ('OPENMERT')
                                        ) 
                                AND     NVL(act1.attribute7,'N') = 'N'
                        ) 
                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 1-1

1.3 This query first came to light in a thread on the Oracle Developer forums with an extract from a tkprof file showing that it had executed 842,615 times. That number should be ringing alarms and flashing warning lights, but if we assume that there really is no way of doing some sort of batch processing to get through the data we need to do a little bit of arithmetic to see how much of a threat this query is and how much it matters.

1.4 For every extra 0.01 seconds it takes to execute this query the total run-time goes up by 8,426 seconds, which is 2 hours and 20 minutes. If the average execution time is a mere 0.06 seconds you’ll be at it all night long – and it will be a long, long night.
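
Spelling out the arithmetic:

842,615 executions * 0.01 seconds =  8,426 seconds (about 2 hours 20 minutes)
842,615 executions * 0.06 seconds = 50,557 seconds (about 14 hours)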


 

2.0 Overview

2.1 Before we look at the execution plan let’s take a moment to pick out a few points from the query. You may want to re-open this post in a separate window so that you can switch easily between the SQL and my comments.

2.2 We start off with a simple select from an inline view – and if we replace the inline view with the simple “object name” V_THING we get the following query:


select  count(applicant_id)
from    V_THING
where   applicant_id = :p_applicant_id
;

Figure 2-1

2.3 This should prompt two questions:

  • First, how far into the view V_THING will the optimizer be able to push that predicate? Possibly the entire content of the view will have to be constructed before the predicate can apply; possibly the nature of the view is such that the optimizer can do a simple filter pushdown to apply the predicate very early. That still leaves (or leads on to) the question of whether the optimizer might then be able to generate further uses of the predicate through transitive closure.
  • Secondly, if the view V_THING is a multi-table view will we be able to work out which table applicant_id comes from by the time it becomes visible in the view? It’s possible that changing the table from which applicant_id comes will change the execution plan.

2.4 Digging down one layer we see that our V_THING is also a simple select from an inline view – let’s call it V_OTHER – so if we again forget about the complexity of the inner view we’re looking at a query that goes:


select  /*+ QB_NAME(SEL$1) */
        count(applicant_id)
from    (
        select  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                null    allotment_type
        from
                V_OTHER
        where
                rownum <=  :p_seats
        )       V_THING
where 
        applicant_id = :p_applicant_id
;

Figure 2-2

2.5 A couple of details hit the eye when you look at this: why are we selecting 17 columns from a complex view, then counting only one of them and discarding the rest? Let’s hope the optimizer is smart enough to discard the excess columns at the earliest possible moment (which might allow it to do some index-only accesses instead of visiting tables for columns we don’t really need).

2.6 Stranger still, one of those columns is a deliberately generated NULL! This hints at the possibility that the client code is doing something like “count how many rows query X will give me, then run query X” – giving us the pattern “select count(*) from (inline query X); execute query X”. Maybe this whole query is a waste of time, but if it can’t be avoided maybe it should be edited down to the smallest query that will get the correct count.

2.7 Another thought about this layer of the query: the predicate “rownum <= :bind_variable” may be pushing the optimizer into first_rows(n) optimization, and this might be enough to make it choose a bad execution plan. I’d have to check, and check for specific versions, but off the top of my head I think that when comparing rownum with a bind variable the optimizer will optimize for first_rows(10) unless there’s some other reason for choosing anything else.

2.8 I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

2.9 Let’s dig one layer deeper before we get into the complex stuff. Here’s a version of the code that expands V_OTHER in an extremely stripped down form:


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        {lots of columns}
                FROM 
                        {lots of tables}
                WHERE
                        {lots of predicates}
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                )  
        WHERE
                ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 2-3

2.10 At this point we can start to see reasons for the layering of inline views – we need to select data in the right order before we apply the rownum predicate. As for the excess columns in the select list: even if we selected only the applicant_id in the outer layers the optimizer would still have to acquire the five columns in the order by clause.

2.11 But this emphasises the oddity of the query. If we’re only counting applicant_id to see whether we got :p_seats or fewer rows for a specific applicant_id, why does the order matter? Surely the order will only matter when we “repeat” the query to get the actual rows (if that’s what we do). As it is, to count a small number of rows this query might have to fetch and sort a large number, then discard most of them. (Some statistics from other posts by the OP indicated that the underlying query might fetch anything between a few hundred and a couple of thousand rows. This particular run showed the query finding 171 rows to sort and then restricting the rowsource to the first two sorted rows.)


 

3.0 The Main Course

3.1 To make it a little easier to discuss the detail of the execution plan I’ve laid it out in a small number of sections corresponding to the final (outline_leaf) query blocks the optimizer generated. To do this I applied two sets of information – the Query Block / Object Alias information (which follows the body of the plan) and any appearances of the VIEW operation in the plan.


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                                |      1 |        |   574 (100)|      1 |00:00:00.02 |    3822 |       |       |         |
|   1 |  SORT AGGREGATE                             |                                |      1 |      1 |            |      1 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
select count(applicant_id) - above
select where rownum less than - below
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  2 |   VIEW                                      |                                |      1 |      1 |   574   (2)|      0 |00:00:00.02 |    3822 |       |       |         |
|*  3 |    COUNT STOPKEY                            |                                |      1 |        |            |      2 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   4 |     VIEW                                    |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |       |       |         |
|*  5 |      SORT ORDER BY STOPKEY                  |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |  2048 |  2048 | 2048  (0)|
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |       |       |         |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |       |       |         |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |       |       |         |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
join index transformation query block SEL$082F290F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Continuation of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |  1245K|  1245K| 1277K (0)|
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |       |       |         |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |       |       |         |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |       |       |         |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |       |       |         |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |       |       |         |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unnested subquery SEL$A75BE177 (from sel$8, sel$9)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |       |       |         |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery SEL$F665FE1B (from sel$4 with tranform for index join)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2        / from$_subquery$_001@SEL$1
   3 - SEL$2
   4 - SEL$7E0D484F / from$_subquery$_002@SEL$2
   5 - SEL$7E0D484F
  14 - SEL$082F290F / LMT_GENDER@SEL$3
  15 - SEL$082F290F
  16 - SEL$082F290F / indexjoin$_alias$_001@SEL$082F290F
  17 - SEL$082F290F / indexjoin$_alias$_002@SEL$082F290F
  21 - SEL$7E0D484F / CMT@SEL$3
  22 - SEL$7E0D484F / CMT@SEL$3
  23 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  24 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  25 - SEL$7E0D484F / ACT@SEL$3
  26 - SEL$7E0D484F / ADT@SEL$3
  27 - SEL$7E0D484F / ACT3@SEL$7
  28 - SEL$7E0D484F / ACT3@SEL$7
  29 - SEL$A75BE177 / VW_SQ_1@SEL$67DC521B
  30 - SEL$A75BE177
  31 - SEL$A75BE177 / ACT1@SEL$8
  32 - SEL$A75BE177 / ACT1@SEL$8
  33 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  34 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  35 - SEL$7E0D484F / LMT_PASS@SEL$3
  36 - SEL$7E0D484F / LMT_PASS@SEL$3
  37 - SEL$7E0D484F / LMT_APPEARANCE@SEL$3
  38 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  39 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  40 - SEL$5        / ACT1@SEL$5
  41 - SEL$5        / ACT1@SEL$5
  42 - SEL$6        / ACT2@SEL$6
  43 - SEL$6        / ACT2@SEL$6
  44 - SEL$F665FE1B / XXADM_CATEGORY_MASTER_TBL@SEL$4
  45 - SEL$F665FE1B
  46 - SEL$F665FE1B / indexjoin$_alias$_001@SEL$F665FE1B
  48 - SEL$F665FE1B / indexjoin$_alias$_002@SEL$F665FE1B

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_optimizer_dsdir_usage_control' 0)
      OPT_PARAM('_optimizer_adaptive_plans' 'false')
      OPT_PARAM('_optimizer_gather_feedback' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F665FE1B")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$A75BE177")
      PUSH_PRED(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B" 16 15)
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$9D10C90A")
      UNNEST(@"SEL$9")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$C04829E0")
      ELIMINATE_JOIN(@"SEL$3" "CRMT"@"SEL$3")
      ELIMINATE_JOIN(@"SEL$3" "MMT"@"SEL$3")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")
      INDEX_RS_ASC(@"SEL$7E0D484F" "CMT"@"SEL$3" ("XXADM_COLLEGE_MASTER_TBL"."COLLEGE_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      FULL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      FULL(@"SEL$7E0D484F" "ADT"@"SEL$3")
      INDEX_JOIN(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_CODE") ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "ACT3"@"SEL$7" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      NO_ACCESS(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      LEADING(@"SEL$7E0D484F" "CMT"@"SEL$3" "LMT_EDUCATION_TYPE"@"SEL$3" "ACT"@"SEL$3" "ADT"@"SEL$3" "LMT_GENDER"@"SEL$3" "ACT3"@"SEL$7" "VW_SQ_1"@"SEL$67DC521B"
              "LMT_PASS"@"SEL$3" "LMT_APPEARANCE"@"SEL$3" "LMT_RELIGION"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "ADT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT3"@"SEL$7")
      USE_NL(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      USE_NL(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      NLJ_BATCHING(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      PQ_FILTER(@"SEL$7E0D484F" SERIAL)
      INDEX_RS_ASC(@"SEL$A75BE177" "ACT1"@"SEL$8" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      INDEX_RS_ASC(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID"))
      LEADING(@"SEL$A75BE177" "ACT1"@"SEL$8" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      USE_NL(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      INDEX_RS_ASC(@"SEL$6" "ACT2"@"SEL$6" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."STATUS_FLAG"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$6" "ACT2"@"SEL$6")
      INDEX_RS_ASC(@"SEL$5" "ACT1"@"SEL$5" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."COLLEGE_ID"
              "XXADM_APPLICANT_COURSPREFS_TBL"."COURSE_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."MEDIUM_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."HOSTEL_REQUIRED"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$5" "ACT1"@"SEL$5")
      INDEX_JOIN(@"SEL$4" "XXADM_CATEGORY_MASTER_TBL"@"SEL$4" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID") ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_CODE"))
      END_OUTLINE_DATA
  */

Figure 3-1

3.2 There’s no rigid rule I can give you about an approach for looking for query blocks and transformations, but it’s worth checking to see which of your original query blocks still exist in the final execution plan and which have disappeared thanks to some transformation.

3.3 If we look down the Query Block Name list above we can see that sel$1, sel$2, sel$5 and sel$6 have “survived” the machinations of the optimizer. We’ve already noted that sel$1 and sel$2 are simply “select from {inline view}” as far as the optimizer is concerned; sel$5 and sel$6 are simple subqueries that appeared as filter subqueries in the original query text and have kept that status by the end of the optimizer’s transformation stage.

3.4 Tracking down the other query blocks that we named we can see the following:

  • sel$3 – most of its tables appear in a new query block called SEL$7E0D484F but one of them appears in a query block called SEL$082F290F; a closer look at SEL$082F290F shows us that it ranges from operations 14 to 17 and holds a “single table” transformation where the optimizer has chosen to use an index join of two indexes on xxadm_lov_master_tbl rather than doing a tablescan. The index join is represented as a VIEW of a hash join, hence the separate query block. Another little detail we note – xxadm_lov_master_tbl appears five times in the query, so we need to know which occurrence this is: fortunately the Object Alias information at operation 14 tells us it’s the LMT_GENDER alias.
  • sel$4 is the scalar subquery inside a CASE expression, involving table xxadm_category_master_tbl. We can find the table name (which hasn’t been given an alias) and the query block name in the Object Alias information at operation 44 in a query block called SEL$F665FE1B. There are two points of interest about this query block – it has come into existence because it’s another example where the optimizer has used an index join to avoid a full tablescan; and it has been used in a filter subquery (the parent of operation 44 is the FILTER at operation 6).
  • sel$7 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl with an alias of act3. The Query Block / Object Alias information tells us that ACT3@SEL$7 appears at operations 27 and 28 – and when we track up the plan from operation 27 we see that it is the second child of operation 12, which is a nested loop anti. The optimizer has unnested the subquery and turned it into an anti join as one of the steps that produced query block SEL$7E0D484F (there’s a sketch of the general NOT IN transformation just after this list).
  • sel$8 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl aliased as act1 (OUCH – that’s the second time the alias act1 has appeared in this query!). But the subquery had its own subquery, named sel$9, against xxadm_category_master_tbl which didn’t have an alias (more OUCH!). When we search for ACT1@SEL$8 and XXADM_CATEGORY_MASTER_TBL@SEL$9 in the Query Block / Object Alias information we find that they both appear in a query block called SEL$A75BE177 which ranges from operations 29 to 34, and a check of the plan shows that operation 29 is a view pushed predicate of a view called VW_SQ_1 – a name that identifies the view as an internally generated non-mergeable view created as the result of unnesting a subquery. The view contains a join between xxadm_applicant_coursprefs_tbl and xxadm_category_master_tbl, so we can say that the optimizer has unnested our sel$9 into sel$8 so that the NOT IN subquery became a join subquery, then it has unnested again to produce an inline non-mergeable view, and it has then allowed “join predicate pushdown (JPPD)” so that the non-mergeable view can be the second table of a nested loop. To confirm the last comment we track up the plan to discover that operation 29 is the second child of operation 11 which is, indeed, a nested loop and (since the subquery was a NOT IN subquery) a nested loop anti.
  • sel$9 – see sel$8.
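
Here’s a quick sketch of the general principle – invented tables, and ignoring the complication of NULLs (which Oracle has to handle with a “null-aware” anti join). The unnesting transformation means that a query like:

select  t1.*
from    t1
where   t1.id not in (
                select  t2.id
                from    t2
                where   t2.flag = 'X'
        )
;

can be executed as a join – logically equivalent (when t2.id is declared NOT NULL) to:

select  t1.*
from    t1
where   not exists (
                select  null
                from    t2
                where   t2.id   = t1.id
                and     t2.flag = 'X'
        )
;

with the plan showing a NESTED LOOPS ANTI or HASH JOIN ANTI.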

3.5 As you can see, when you’re using the execution plan output to identify what’s happened to the individual query blocks from your original query you’re likely to jump around from the query to the plan body, to the Query Block / Object Alias information in a fairly arbitrary way.

3.6 I’ll close this chapter of the analysis with a quick look at the Outline Data – in particular two of the hint types that appear: OUTLINE() and OUTLINE_LEAF() – which I’ve extracted and sorted for ease of reading:

      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")

      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$7E0D484F")
      OUTLINE(@"SEL$9D10C90A")
      OUTLINE(@"SEL$C04829E0")

      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      OUTLINE_LEAF(@"SEL$A75BE177")
      OUTLINE_LEAF(@"SEL$F665FE1B")

Figure 3-2

3.7 In this context an OUTLINE() is a query block that existed at some point in the optimization sequence that got us to the final execution plan but did not appear as a query block in the final plan. In the previous paragraphs we described how the original query blocks sel$3, sel$4, sel$7, sel$8 and sel$9 disappeared through transformation so (apart from sel$4, which is a bit of an anomaly that I’ll pick up in a moment) they appear in an OUTLINE() hint. I described how sel$9 would have been unnested into sel$8 to create a join which would still have been a filter subquery until that too was unnested – that join subquery would have been one of the five OUTLINE() query blocks above with the 8-character hexadecimal names.

3.8 An OUTLINE_LEAF() is a “final” query block – one that is present in the final execution plan. If you ignore sel$4, you’ll see that the other 8 query blocks correspond to the 8 query block names that appear in the Query Block Name / Object Alias information. The appearance of sel$4 as an OUTLINE_LEAF() looks like an anomaly to me; I can’t think of a good reason why it should be in the list.
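
For reference, the sections I’ve shown in this note (rowsource execution statistics, Query Block Name / Object Alias, Outline Data) can all be requested explicitly through the format parameter of dbms_xplan.display_cursor(). Here’s a sketch of the call – the SQL_ID is a placeholder, and the “allstats last” option only reports actual rowsource statistics if the query ran with the gather_plan_statistics hint (or with statistics_level set to all):

select  *
from    table(dbms_xplan.display_cursor(
                sql_id          => '&m_sql_id',
                cursor_child_no => 0,
                format          => 'allstats last outline alias'
        ))
;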


 

4.0 Simplify

4.1 We’re just about ready to do a full read-through of the execution plan. We’ve taken the two outer layers off the query/plan because they represent such simple in-line views, and we’ve discussed the disappearance of some of our initial query blocks and identified and explained all the different query blocks that have appeared in the final plan. So with the extra bits of information in hand let’s take a couple more steps in simplifying the execution plan.

4.2 First I’ll replace each of the three VIEW operations and their descendants with a single line that says “this is a rowsource”. I’ll distinguish between the two variants of the operation (VIEW and VIEW PUSHED PREDICATE) by calling them BULK ROWSOURCE and PRECISION ROWSOURCE respectively: it’s not a perfect description but broadly speaking we expect a VIEW to be called once by its parent to produce a “large” data set and we expect a VIEW PUSHED PREDICATE to be called many times by its parent to produce (each time) a “small” data set using extremely efficient methods.

4.3 Then I’ll remind you that a multichild FILTER operation calls the first child once to supply a rowsource then, for each row returned, calls the other child operations in turn to determine whether or not to keep the row from the first child. This means we can examine just the first child in isolation to see how the optimizer wants to get the driving bulk of the data (and we can examine the other children later, bearing in mind how often they might need to be called and checking how efficient each call is likely to be).
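
If you want to see a multichild FILTER in a small test case, here’s a sketch (with invented tables) – the no_unnest hints stop the optimizer transforming the subqueries into joins, so the plan shows a FILTER with the scan of t1 as its first child and the two subqueries as its second and third children:

select  t1.*
from    t1
where   exists (
                select  /*+ no_unnest */ null
                from    t2
                where   t2.id = t1.id
        )
and     not exists (
                select  /*+ no_unnest */ null
                from    t3
                where   t3.id = t1.id
        )
;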

4.4 Finally I’ll note that query block SEL$7E0D484F (the “real main query” as I labelled it in the plan above) starts: “VIEW -> SORT ORDER BY STOPKEY -> FILTER” – after we’ve filtered our data we simply sort it with the intention of keeping only the “top few” rows. That part of the plan is so simple we’ll ignore those lines of the plan and focus on just the first child of the FILTER – leaving the core plan looking like this:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Figure 4-1

4.5 We need to examine a plan of only 25 lines with no complicated bits (because we’ve hidden any bits that looked complicated and will get back to them later). The thing now looks like a single query block which means we can think “First Child First”, so:

  • operation 7 calls operation 8 which calls operation 9 which calls operation 10 which calls operation 11 which calls operation 12 which calls operation 13 which calls operation 14 which is the first operation to produce a rowsource (though we don’t care how at present).
  • Operation 13 is a hash join, so the rowsource from operation 14 becomes its “build” table, and we call operation 18 to supply the “probe” table.
  • Operation 18 calls operation 19 which calls operation 20 which calls operation 21 which is a table access by rowid that has to call operation 22 to get rowids. So operation 22 supplies the second rowsource (in our collapsed plan). It’s an INDEX UNIQUE SCAN of an index that appears (judging from its name) to be the primary key index of a table, so operation 22 will produce at most one rowid that is passed up to operation 21, which will use that rowid to get the one row from the table. (Operation 21 supplies the 3rd rowsource.)
  • Operation 21 passes a row up to operation 20 which calls operation 23 which calls operation 24 (4th rowsource) to do another unique scan of a unique index to get a rowid to pass up to operation 23 to find (and test) a row from the table (5th rowsource) which it passes up to operation 20 to do the join and pass the result up (6th rowsource) to operation 19.
  • Operation 19 calls its second child (operation 25) for each row it receives – but because of the primary key/unique scans the optimizer knows that the first child will return at most one row and sees no problem with using a full tablescan as the second child of the nested loop. So the tablescan of XXADM_APPLICANT_COURSPREFS_TBL is the 7th rowsource. Any rows surviving the join are passed up to operation 18 (making operation 19 the 8th rowsource).
  • Operation 18 uses the incoming rowsource to build its in-memory hash table, and calls operation 26 to supply its second (probe table) rowsource. Operation 26 is the 9th rowsource, executing a full tablescan of XXADM_APPLICANT_DETAILS_TBL and passing the results up to operation 18, which performs the join and passes the results up to its parent, making it the 10th rowsource.
  • Operation 18 was the second child of the hash join at operation 13, which now uses the incoming data as the probe table to generate the 11th rowsource and pass the results up to operation 12.
  • Operation 12 is a nested loop anti-join and operation 13 has just supplied it with its first child rowsource, so operation 12 will now call its second child once for each row in the first child. Its second child is operation 27 (table access by rowid) which calls its first child (operation 28, an index unique scan) which fetches rowids from the index and passes them up to operation 27, which fetches table rows and passes them up to operation 12. So operation 28 supplies the 12th rowsource, operation 27 the 13th. Since operation 12 is an ANTI join a row from the first child survives if operation 27 doesn’t find a row to return. Operation 12 passes any survivors (14th rowsource) up to operation 11.
  • Operation 11 is another ANTI-join nested loop so for each row from operation 12 it will call its second child, passing in values from its first child to drive an efficient access path and forwarding any rows from the first child where the second child returns no rows. Its second child is operation 29 – our “precision rowsource” – and we can postpone worrying about the exact details of how that works. Operation 29 produces the 15th rowsource in our reduced plan, which it passes up to operation 11.
  • Operation 11 is the first child of the nested loop at operation 10 – and from this point onwards we have 4 nested loop joins and we can simply list through the order of operation. Operation 11 produces the 16th rowsource, then Operation 10 calls its second child (operation 35) which calls operation 36 which passes rowids (17th rowsource) up to operation 35 which passes rows (18th rowsource) up to operation 10.
  • Operation 10 passes its data (19th rowsource) up to operation 9 which calls operation 37 as its second child. Operation 37 (20th rowsource) passes index entries up to operation 9 which performs the join and passes the result (21st rowsource) up to operation 8.
  • Operation 8 calls operation 38 as its second child. Operation 38 (22nd rowsource) passes index entries up to operation 8 which performs the join and passes the result (23rd rowsource) up to operation 7.
  • Operation 7 calls operation 39 as its second child. Operation 39 (24th rowsource) passes index entries up to operation 7 which performs the join and that’s the final (25th) rowsource as far as our reduced execution plan is concerned.

4.6 Here’s the reduced plan, cut back to minimum width, with the order of rowsource generation included:


-----------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           |  Order |
-----------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |     25 |
|   8 |         NESTED LOOPS                        |                                |     23 |
|   9 |          NESTED LOOPS                       |                                |     21 |
|  10 |           NESTED LOOPS                      |                                |     19 |
|  11 |            NESTED LOOPS ANTI                |                                |     16 |
|  12 |             NESTED LOOPS ANTI               |                                |     14 |
|* 13 |              HASH JOIN                      |                                |     11 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |
|* 18 |               HASH JOIN                     |                                |     10 |
|  19 |                NESTED LOOPS                 |                                |      8 |
|  20 |                 NESTED LOOPS                |                                |      6 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      3 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      2 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      5 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      4 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      7 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      9 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |     13 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |     12 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |     15 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |     18 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |     17 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |     20 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |     22 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |     24 |
-----------------------------------------------------------------------------------------------

Figure 4-2

4.7 Once we’ve got this image sorted out we still have a few details to fill in before we’ve got the full picture of the execution plan.

  • What does Oracle do to generate the “bulk rowsource” at operation 14?
  • What does Oracle do on every call to the “precision rowsource” at operation 29?
  • We know that the reduced plan above is the first child of a FILTER operation and, if we refer back to the previous “real main query”, we know that there are three further children to the FILTER that might have to execute once for each row produced by the first child. So that’s another 3 (small) query blocks we need to examine in detail.
  • We need to bring in the predicates to see how the optimizer has used them.
  • We need to look at the Starts and A-Rows to compare what happened with the optimizer’s expectation.
  • We need to look at disk reads and buffer gets to see how much excess work we did to acquire the data.


 

5.0 Filling the Gaps

5.1 After getting the overall shape of the query’s execution we can go back and examine the bits we have so far postponed. There are three pieces to consider:

  • the “bulk rowsource” at operation 14 that was the first child of a hash join.
  • the “precision rowsource” at operation 29 that was the second child of a nested loop anti-join
  • the filter subqueries that were the 2nd, 3rd and 4th children of the explicit FILTER at operation 6

5.2 We start with the “bulk rowsource” that was a VIEW with a highly suggestive name of index$_join$_008. This shows Oracle finding a way of selecting data from a table without visiting the table, using a couple of indexes as if they were skinny tables that could be scanned and joined. In effect Oracle has transformed “select key1, key2 from tableX” into something like:


select  ix1.key1, ix2.key2
from
        (select key1, rowid r1 from tableX) ix1,
        (select key2, rowid r2 from tableX) ix2
where
        ix1.r1 = ix2.r2
;

5.3 This strategy can only be used when Oracle knows that every relevant row will appear in the two indexes – which basically means you have to be careful about NULLs. In the simplest case you might need a NOT NULL constraint on at least one column in each of the target indexes, or a predicate in each inline view that ensures Oracle can use just the index without losing some of the rowids that it needs. After acquiring key values and rowids from each index in turn, Oracle then joins the two sets of data using a hash join. Technically there is no limit to the number of indexes that Oracle can join in this fashion; the choice of strategy depends largely on how big the table is compared to the sum of the sizes of the indexes that could be used instead. Practically (as in our main query) it’s rare to see more than two indexes used for this “index join” mechanism.
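
If you want to experiment, here’s a sketch (all names invented) that encourages the transformation with the index_join() hint – note that both columns are declared NOT NULL, so Oracle can see that the two indexes between them hold every relevant row:

create table tableX (
        key1    number          not null,
        key2    number          not null,
        padding varchar2(100)
);

create index tx_i1 on tableX(key1);
create index tx_i2 on tableX(key2);

select  /*+ index_join(x tx_i1 tx_i2) */
        key1, key2
from    tableX  x
where   key1 between 1 and 10
;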


Index join transformation query block SEL$082F290F, with parent
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.4 Moving on to the “precision rowsource” that appears in the original plan as a VIEW PUSHED PREDICATE. This means that Oracle has optimised a non-mergeable view allowing for an input value from its parent. If you take operations 30 to 34 in complete isolation it’s just a simple nested loop join and you might wonder why the view is non-mergeable. But when you look back at the parent you discover that it’s an ANTI-join, so Oracle has to say (for each driving row) “join these two tables and see if you get any rows as a result” – it doesn’t have a generic mechanism for doing two separate but consecutive (anti-)join operations at this point.
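
For reference, here’s a sketch (invented tables again) of the simplest construct that tends to produce a VIEW PUSHED PREDICATE – an outer join to a non-mergeable aggregate view, with the push_pred() hint available if the optimizer needs a nudge:

select  /*+ push_pred(v) */
        t1.id, v.ct
from    t1,
        (
        select   id, count(*) ct
        from     t2
        group by id
        )       v
where   v.id(+) = t1.id
;

With the pushed join predicate the view can be optimised once per driving row, using any index on t2(id), instead of being instantiated in full and then joined.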


Unnested subquery SEL$A75BE177 (from sel$8, sel$9) with parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             Driving Rowsource               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.5 Finally we have the three filter subqueries, which I’ve shown with their parent FILTER and a one-liner for the driving rowsource. For each row in operation 7 we call operations 40, 42 and 44 in turn although the parent filter may decide after calling operation 40 that it doesn’t need to call the other two and can simply move on to the next row from operation 7. Similarly the filter might call operations 40 and 42 and not need to call operation 44. It’s also possible that, thanks to scalar subquery caching, Oracle can say “I’ve already called operation 40 for this value, so I know the result and don’t need to call it again”. When we look at the Starts and A-Rows columns for the three operations we will get some idea of how the “notional” execution sequence turned into an actual workload.


Filter subqueries SEL$5, SEL$6 and SEL$F665FE1B, with their parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        "Simplify" Plan                      |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.6 The operations for the first two subqueries (40,41) and (42,43) shouldn’t need any explanation. The third set of operations (44 to 48) is a little more complex. In fact it’s similar to the mechanism that appeared in our first “bulk rowsource” – we have Oracle turning a table access into an “index (only) join”, collecting key values and rowids from two different indexes and doing a hash join on the rowids. The plan is just a little more subtle, though – instead of getting their data from “index fast full scans”, one index is accessed by a range scan (possibly using a predicate value passed in by the filter operation) and the other is using an INLIST ITERATOR with unique scan. In the second case we must be handling a predicate of the form: “key_column in {list of values}”. (Taking a first look at the Starts column we can see that the INLIST ITERATOR runs 6 times calling the INDEX UNIQUE SCAN a total of 12 times so it’s fairly obvious that there are exactly two elements in the list in this case.)


 

6.0 Looking at the Numbers

6.1 Rather than walking through the entire plan again putting the pieces together I’m going to assume that I can carry on to the next stage of analysis, and assume that the pieces will fall into place as we talk about some of the critical numbers.

6.2 It’s probably best to open a copy of the note in a separate window so that you can examine the plan and read my comments at the same time. Starting at the top we apply “first child first”, noting in passing that the examples of the VIEW operations we have in this plan don’t have a significant impact on the order of operation as we work down the plan – they simply remind us that there are “non-mergeable” parts of the plan. Eventually we get to operation 13 which is a hash join and operations 14 through 17 give us the build (first) table – a quick check shows 127 rows for both estimated (E-Rows) and actual (A-Rows); then we see the probe (second) table is itself a hash join returning 182 rows (estimate 478, so in the right ballpark) and the hash join at operation 13 produces a result set of 182 rows.

6.3 At this point a quick check back UP the plan tells us that the 182 rows survive all the way up to operation 6, where a FILTER eliminates just a few of them; then the result set drops to just 2 rows at operation 5. Then a quick check of operation 5 (with a cross reference to the query) reminds us that we have an inline view that does an “order by” followed by a “rownum <= :bind” predicate – so the SORT ORDER BY STOPKEY at operation 5 is sorting all the data but only passing on the first two rows: so there’s no way we could have modified the join order to eliminate the redundant rows sooner.

6.4 So we see that we get the right volume of data at about the right moment in the plan, and probably can’t do much to avoid the volume of data access – but let’s check how we got the 182 rows at operation 18. Using “first child first” – we see a nested loop joining two “single row by unique index” rowsources, then a nested loop to a full tablescan of XXADM_APPLICANT_COURSPREFS_TBL. The knee-jerk reaction to a “full tablescan as the second table in a nested loop” is that it must be bad – but in this case we know that it will happen no more than once, so we don’t need to panic immediately. Applying a little more thought (and arithmetic): we note that the tablescan returns 209 rows (estimated 478) using 879 buffer gets; that’s not an extreme number of buffer gets per row (especially if the 478 is a reasonably accurate average for the operation). We’ll postpone worrying about the tablescan for the moment but take note that it might be worth revisiting.

6.5 The second child to the hash join at operation 18 is another full tablescan (of XXADM_APPLICANT_DETAILS_TBL) which requires 1,131 buffer gets. Again, though, this might not be a bad move, since the alternative would be 209 actual index probes (or an estimated average of 478 index probes). The workload is, again, in the right ballpark but, again, something we might come back to. In fact in both tablescans it might be more important to worry about the work done for each row by the row predicates rather than worrying about the fact that the operation is a tablescan; a predicate involving a CPU-intensive PL/SQL function might be the thing that makes 478 index probes to 478 rows a better option than a tablescan of (say) 100,000 rows.

6.6 From this point onwards (operation 13) we have 6 nested loop joins (though the first two are anti-joins) so it’s “call the second child for each row in the first child” all the way down, and we’ve seen that we don’t eliminate any data as we go. If we want to make the execution plan any faster by “local” tweaking we’ll just have to make sure each “second child” operation is as efficient as possible, which tends to mean looking for cases where we supply a lot of rows (rowids) from an index range scan but find that we then discard the table rows after visiting the table. So …

6.6.1 Operation 12 – nested loop anti join – calls 28/27 (table access by unique index). We find an index entry on each call, but the table row doesn’t qualify – which is what we want for a “successful” anti-join. We could make this a little more efficient by adding a column to the (already unique) index and avoid visiting the table.
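
As a sketch of the mechanics (the constraint name is invented, and I’m assuming the existing index isn’t already supporting a constraint that would have to be dropped first): since we still need uniqueness on (applicant_id, preference_order) but want status_flag available in the index, we can let a wider non-unique index enforce the unique constraint:

drop index xxadm_applcnt_pref_order_uk;

create index xxadm_applcnt_pref_order_ix on
        xxadm_applicant_coursprefs_tbl(
                applicant_id, preference_order, status_flag
        );

alter table xxadm_applicant_coursprefs_tbl
        add constraint xxadm_applcnt_pref_order_uk
        unique (applicant_id, preference_order)
        using index xxadm_applcnt_pref_order_ix
;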

6.6.2 Operation 11 – nested loop anti join – calls operation 29 (view with pushed predicate) which operates a high-precision nested loop join at operation 30. The first child of operation 30 is a table access by index, but the table never returns a row (which is nice) so we never call the second child of operation 30. Again we could make this view access a little more efficient by adding extra columns to (already unique) indexes to avoid any need to visit tables.

6.6.3 Operations 10, 9, 8, 7 – nested loop joins – operating very efficiently, some not even visiting tables to acquire data. The order of operation at this point is: 11, 36, 35, 10, 37, 9, 38, 8, 39, 7. And then we get to the FILTER operation, which has 3 subqueries to operate in turn.

6.7 Operation 6 executes in turn the two subqueries we named sel$5 and sel$6, the first one 182 times, the second one 171 times. Since operation 6 produces 171 rows it seems likely that the initial 182 rows dropped to 171 rows as a consequence of the sel$5 subquery, resulting in the smaller number of calls to the sel$6 subquery. It’s worth noting here that operation 41, the index range scan, returned 1,450 rowids, but the subsequent table accesses returned a total of only 29 rows after an additional 316 buffer gets (507 – 191). There may be an opportunity here (yet again) for adding an extra column to the index so that we visit the table only 29 times rather than 1,450 times. In fact, though it’s not obvious in this SQL Monitor report, the indications from other examples from the same query suggested that this subquery was the single most resource-intensive part of the plan.

6.8 The last subquery executed by Operation 6 is the one identified by query block sel$4. The sub-plan starts with a VIEW operation because the table (identified as originating in sel$4 in the query block / object alias information) is “accessed” by way of an index hash join. This subquery is executed only 6 times. Given that there are (at least) 171 rows for which this subquery could be called this means one of two things. First, we can see from the query text that this subquery is part of a complex CASE expression – so maybe the simple conditions in the expression mean we rarely need to call the subquery; secondly, the run-time engine may have managed to take advantage of scalar subquery caching because the query doesn’t have many distinct inputs for this subquery – and when we check the predicate section we can see that the relevant predicate for the query against xxadm_category_master_tbl was “category_id = {correlation variable}”, which has the look of a table with a few rows and distinct ids.

6.9 In summary, then, there may be a few “localised” tweaks that we could make to improve the performance of this plan – largely by adding columns to existing indexes and using them effectively. There are indications that one of the filter subqueries might be a particularly good target for this type of tweak; after which we might want to look at what could be done with the two tablescans, which are in that grey area where it’s not easy to decide whether an indexed access or a tablescan is the better option. We have to remember, though, that this query was originally reported as executing 842,000 times – so maybe we need to do much better than just a little tweaking.


 

7.0 Predicate Information

7.1 Why are we running a query 842,000 times in a batch? The right way to find an answer to that question is to ask the right person – if you can find them. A slightly more long-winded way is to find out what is driving the 842,000 executions – and you might be able to do that if you have the full tkprof output from the trace file. (Hint: statement X runs 842,000 times, and if statement Y executes 13 times and produces 842,000 rows maybe Y is driving X.) Sometimes, though, you don’t have the people, or the full data set, or the access you need, so you might take a look at the query and the predicates and start making some reasonable guesses.

7.2 Here’s the tail end of the query, conveniently capturing all the input bind variables:

                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id

7.3 There are two things we can note about these predicates – first, they don’t follow the pattern of “:Bnnn” so they’re not from a statement statically embedded in PL/SQL; secondly, the names are intelligible and meaningful, so we might draw some tentative conclusions from them, in particular how many distinct values there might be and how this led to 842,000 executions of the query.

7.4 The variable name that stands out is :p_applicant_id. We seem to be looking at a query about applicants for courses at colleges – and the latter pair probably gives us a relatively small number of combinations. The variable :p_hostel_required is surely just going to be a “yes/no/maybe/null” option. The :p_medium_id is a bit of a puzzle but scanning through the query it looks like it’s the id for the “medium of study” so probably another variable with a small number of values. So where in the plan do these variables get used? Here’s the full list of predicates from the plan, followed by an extra few lines showing just the predicates that reference the bind variables:

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   6 - filter((    "ACT"."PREFERENCE_ORDER"<=NVL(,"ACT"."PREFERENCE_ORDER") -- > comment added to avoid wordpress format issue
               AND "ACT"."PREFERENCE_ORDER">=NVL(,"ACT"."PREFERENCE_ORDER") AND CASE "ACT"."HOSTEL_REQUIRED"
              WHEN 'Y' THEN CASE  WHEN ("ADT"."DISTANCE_IN_KMS">20 AND "LMT_RELIGION"."LOV_CODE"='HINDU' AND  IS NULL) THEN 1 ELSE 2 END  ELSE 1 END =1))
  13 - access("ADT"."APPLICANT_GENDER"="LMT_GENDER"."LOV_ID")
       filter(CASE "ACT"."HOSTEL_REQUIRED" WHEN 'Y' THEN CASE  WHEN ("LMT_EDUCATION_TYPE"."LOV_CODE"='COEDUCOL' AND "LMT_GENDER"."LOV_CODE"='FEMALE') THEN 2 ELSE 1 END
               ELSE 1 END =1)
  15 - access(ROWID=ROWID)
  18 - access("ADT"."APPLICANT_ID"="ACT"."APPLICANT_ID")
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  24 - access("CMT"."EDUCATION_TYPE"="LMT_EDUCATION_TYPE"."LOV_ID")
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))
  26 - filter(("ADT"."STATUS"='Active' AND "ADT"."COURSE_APPLIED_FOR"='DEG' AND (INTERNAL_FUNCTION("ADT"."COLLEGE_STATUS_FLAG") OR "ADT"."COLLEGE_STATUS_FLAG" IS
              NULL)))
  27 - filter("ACT3"."STATUS_FLAG"='O')
  28 - access("ACT3"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT"."PREFERENCE_ORDER"="ACT3"."PREFERENCE_ORDER")
  31 - filter((INTERNAL_FUNCTION("ACT1"."STATUS_FLAG") AND NVL("ACT1"."ATTRIBUTE7",'N')='N'))
  32 - access("ACT1"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT1"."PREFERENCE_ORDER"="ACT"."PREFERENCE_ORDER")
  33 - filter("CATEGORY_CODE"='OPENMERT')
  34 - access("CATEGORY_ID"=TO_NUMBER("ACT1"."ATTRIBUTE1"))
  36 - access("ADT"."PASS_TYPE"="LMT_PASS"."LOV_ID")
  37 - access("ADT"."APPEARANCE_TYPE"="LMT_APPEARANCE"."LOV_ID")
  38 - access("ADT"."RELIGION"="LMT_RELIGION"."LOV_ID")
  40 - filter(("STATUS_FLAG"='B' OR "STATUS_FLAG"='C' OR "STATUS_FLAG"='O' OR "STATUS_FLAG"='T'))
  41 - access("ACT1"."APPLICANT_ID"=:B1)
  43 - access("ACT2"."APPLICANT_ID"=:B1 AND "STATUS_FLAG"='C')
  44 - filter(("CATEGORY_ID"=:B1 AND INTERNAL_FUNCTION("CATEGORY_CODE")))
  45 - access(ROWID=ROWID)
  46 - access("CATEGORY_ID"=:B1)
  48 - access(("CATEGORY_CODE"='BACKWRDC' OR "CATEGORY_CODE"='BACKWRDE'))


   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))

Figure 7-1

7.5 Apart from the predicates in the final shortlist you probably noticed further bind variables at operations 41, 43, 44, and 46 – but these are all named :B1, which is Oracle flagging up the need to pass correlating values into filter subqueries.

7.6 Operation 25 (where we test almost all the predicates) is one of the first operations to drive the query, while operation 2 (where we test the :p_applicant_id) is close to the very last thing we do in the execution plan. So we generate a load of data for a college, course, and a couple of other predicate values, sort it, then – at the last moment – decide that we only want a few (:p_seats) rows and count how many rows we’ve found for the specific applicant – and we do that a very large number of times. This takes me back to section 2 where I asked a couple of critical questions:

7.6.1 (2.3) First, how far into the view V_THING will the optimizer be able to push that predicate; possibly the entire content of the view will have to be constructed before the predicate can apply.

7.6.2 (2.8) I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

7.7 We now know the answer to the first question – that predicate isn’t going anywhere, and we recognise why not, of course: it’s a consequence of the “rownum” pseudo-column, which has to be evaluated for all the generated data before the rownum restriction can be applied: “select for the applicant then apply the rownum” is very different from “apply the rownum (column and predicate) then restrict to the applicant”. That brings us to the second question – why would you generate all the data, then order it, then restrict it to the first N, and then count how many times a specific applicant appeared? And there’s one “valid” answer to the last bit of the question – what if you’re not really trying to count how many times the applicant appeared, but only trying to find out whether the count is zero or non-zero?

7.8 The intent of the query is to answer the question: “does this applicant appear in the first N candidates?” Once you’ve realised this the underlying performance problem with the query becomes clear. In the monitored example shown here the query found 171 applicants that matched the initial predicates – and at some point in the batch it’s going to do the same work all over again for each of the other 170 applicants that we’ve discarded. For each combination of the initial predicates (excluding applicant id and seat count) we run the query N times if there are N candidates that match that combination. It’s bad enough that this query took 0.02 seconds to run and would have run 171 times (for a total of 3.4 seconds), but another sample run took 0.05 seconds and identified 1,835 applicants (which means another 1,834 executions for a total of 91 seconds run time).


 

8.0 Resolution

8.1 There is a serious flaw in the design of this application. We are seeing a piece of code running once per applicant_id when (with some minor variations) it looks as if it should be running no more than once per set of distinct combinations of (course_id, college_id, medium_id, hostel_required). In fact, if the set of distinct combinations could be generated by a simple query, you could imagine the entire required result set as a join between two non-mergeable views, with a little row-by-row post-processing – but that might be too ambitious a change to implement in the short term.

8.2 Realistically (as a low-risk strategy) it might be possible to keep a very large percentage of the existing code structure for whatever this task does but precede it with a PL/SQL loop that steps through each of the distinct combinations required, populating a table (perhaps an IOT) with {applicant_id, (combination columns), “rownum”}; we could then replace our problem query with a simple primary key lookup that finds the saved “rownum” for each applicant and combination and checks whether it fell within the required seat count.
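
Here’s a sketch of the strategy – every object name is invented, and the view v_ranked_applicants stands in for the ordered body of the existing query with the combination columns exposed as predicates:

create table xxadm_applicant_rank (
        college_id      number          not null,
        course_id       number          not null,
        medium_id       number          not null,
        hostel_required varchar2(1)     not null,
        applicant_id    number          not null,
        seat_position   number          not null,
        constraint xar_pk primary key (
                college_id, course_id, medium_id,
                hostel_required, applicant_id
        )
)
organization index
;

begin
        for r in (
                select  distinct college_id, course_id, medium_id, hostel_required
                from    xxadm_applicant_coursprefs_tbl
        ) loop
                insert into xxadm_applicant_rank
                select  r.college_id, r.course_id, r.medium_id, r.hostel_required,
                        v.applicant_id,
                        rownum                  -- the ordered position we want to save
                from    (
                        select  applicant_id
                        from    v_ranked_applicants     -- hypothetical view wrapping
                                                        -- the existing ordered query
                        where   college_id      = r.college_id
                        and     course_id       = r.course_id
                        and     medium_id       = r.medium_id
                        and     hostel_required = r.hostel_required
                        ) v
                ;
        end loop;
        commit;
end;
/

The 842,000 executions of the problem query would then collapse into 842,000 primary key probes of the form:

select  count(*)        -- 0 or 1
from    xxadm_applicant_rank
where   college_id      = :p_college_id
and     course_id       = :p_course_id
and     medium_id       = :p_medium_id
and     hostel_required = :p_hostel_required
and     applicant_id    = :p_applicant_id
and     seat_position  <= :p_seats
;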


 

9.0 Summary

9.1 For a DBA working on-site, or a consultant on a short-term visit, the analysis shown in this post is probably not how things would have progressed. I could imagine the sequence of events being more like:

9.1.1 This “start of year / start of term” batch job takes too long

9.1.2 What is it trying to achieve (business overview) – sketch an outline of the process (technical overview)

9.1.3 Trace the job and discover most of the time went on this query

9.1.4 Investigate the logic of this query and why it is run for every applicant_id

9.1.5 Recognise the fundamental design flaw, then choose between three possible strategies:

9.1.5.1 make the query much faster

9.1.5.2 re-engineer this part of the batch completely

9.1.5.3 subvert this section of the batch to pre-build a “materialized result” and use a much simpler query to query it

9.2 Effectively, however, we’ve come in at 9.1.5.1 and run the consultation backwards. As we did so we raised an early question about the applicant_id and pushing predicates and the oddity of counting a limited list, and we finally came back to those points towards the end of the post with an educated guess about what the query was trying to achieve and how it should be reduced from “once per applicant” to “once per combination”.

9.3 Despite the post starting at the wrong place it’s quite possible that we would have reached 9.1.5.1 by following a sensible order of problem analysis, and still want to think about how the query might run more quickly – so this investigation wasn’t a total waste of time and it’s allowed us to work through a real-world query and plan in a realistic way which we can sum up in the following stages:

9.3.1 Simplify: cross-referencing between the overall plan shape, the Query Block / Object Alias information, and the original query we can take out sections of the plan (sub-plans) and analyse them separately. In particular we can identify and reduce to a minimum the core of the plan that generates the final result set, calling on the various sub-plans as it goes.

9.3.2 Follow the workload: in this case we didn’t get much help from the timing information, but buffer gets, disk reads and A-rows also supply clues to where most work is done. Critically we noted that the volume of data we picked up early on in the query was needed all the way through the query – until the last moment – and we didn’t waste resources carrying and processing unnecessary rows. Along the way, of course, we compare Oracle’s predictions of data volume with the actual data volume (A-Rows = Starts * E-Rows as a guideline). We also noted a couple of opportunities where modifying indexes might eliminate table visits, potentially reducing I/O and buffer gets.

9.3.3 Check the predicates: which goes hand in hand with following the workload – how and where are our predicates used. What predicates have been generated (or eliminated) by transitive closure; which predicates are (or could be) pushed further down the plan tree to eliminate data earlier; will multiple predicates result in bad optimizer estimates followed by bad choices for access paths.

9.4 It would be nice to think that there was a simple progression, a fixed sequence of steps that one could follow to interpret an execution plan quickly and accurately. Unfortunately (like the optimisation process itself) interpretation requires a measure of looping and recursion. It’s probably always best to start with simplifying – but how much you simplify and how you pick which subplan to simplify (or start analysing in detail) depends on being able to spot where the biggest workload appears; and before you get stuck too deeply into a sub-plan you might glance down at the use of predicates because a change in one predicate might make the optimizer completely re-engineer its choice of plan. And maybe, before anything else, you’ll see a single operation which you know should (for example) generate about 10 rows when the optimizer is predicting 25,000 rows (or vice versa) and you’ll want to check why there’s such a bad estimate at one point in the plan before you tackle any of the harder questions.

9.5 The bottom line with execution plans is simply this: the more you practice the faster you get at spotting the clues that are worth pursuing; the faster you spot the clues the less time you waste unpicking every little detail; and the less time you spend on the peripheral pieces of the plan the easier it becomes to keep the big picture in your mind and see how the optimizer got to where it is, and how you might want to redirect it. So pick a couple of random queries each week that produce plans of about 20 lines and use them to exercise your interpretation skills; and increase the complexity of the queries every couple of weeks.

The End


Footnote

I think I’ve spent more than 20 hours writing a detailed description of something that would normally take me a few minutes to do [and some people wonder why I’ve not yet written another book on the optimizer]. In part the difference in time is because with practice the “intuitive” skill grows and the pattern of reading is more like –

  • How does it start (ignoring the “trivial” bits around the edges)
  • How does it end (ditto)
  • Where do we do most work reading and discarding data

What I’ve done in this note is talk about every single query block and every single line, whereas in real life I might have scanned the plan, examined about 10 lines, and done a quick check on the corresponding predicates as the first step to deciding whether or not the plan was reasonably efficient.

April 23, 2020

date_to_date

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:00 pm BST Apr 23,2020

Every now and again someone posts a piece of SQL on the Oracle Developer Forum that includes a predicate with an expression like to_date(date_column). This is a problem for several reasons – not the least being the type of performance problem that showed up in a post from a couple of years back that has just been resurrected.

Before I examine the performance detail, here’s a simple demo of the “wrong data” problem that can go unnoticed, cut-n-paste from a 12.2.0.1 session of SQL*Plus:


SQL> create table t1 (d1 date);

Table created.

SQL> insert into t1 values(sysdate);

1 row created.

SQL> select * from t1 where d1 = to_date(d1);

no rows selected

SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

Session altered.

SQL> select * from t1 where d1 = to_date(d1);

D1
--------------------
22-apr-2020 15:12:36

1 row selected.


Note particularly how changing the nls_date_format can change the result of a query! (There’s another simple example on the referenced blog note.)

What’s going on? to_date(date_col) is equivalent to to_date(to_char(date_col)), using the nls_date_format for both conversions, and the most common default format is one that truncates the date column to date-only. So in many systems to_date(date_col) is nearly the same as trunc(date_col) – though it gets there in a far more expensive way.
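As a quick sanity check – a sketch re-using the t1 table from the cut-n-paste above, and assuming the session is set to a date-only nls_date_format – you can demonstrate the near-equivalence directly; both sides of the predicate should then strip the time component, so the count should report the full row count:

alter session set nls_date_format = 'dd-mon-yyyy';

select count(*) from t1 where to_date(d1) = trunc(d1);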

But let’s go further and see how we wreck the benefit of an index, even if we’ve made sure that we still get the correct results. The following is a minimalist model of a common billing requirement: conversion between currencies:


rem
rem     Script:         date_to_date.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        'GBP'                           from_currency,
        'USD'                           to_currency,
        trunc(sysdate - 1000) + rownum  conversion_date,
        'Corporate'                     conversion_type,
        round(
                1.25 + dbms_random.value/4,
                6
        )                               conversion_rate,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 1000 -- > comment to avoid WordPress format issue
;

create unique index t1_i1 on t1(from_currency, to_currency, conversion_date, conversion_type)
/
alter table t1 add constraint t1_pk primary key(
        from_currency, to_currency, conversion_date, conversion_type
)
/

create table driver(
        invoice_currency        varchar2(3),
        billing_currency        varchar2(3),
        client_type             varchar2(10),
        invoice_date            date
);

insert into driver values(
        'GBP', 'USD','Corporate',trunc(sysdate)
);

commit;

execute dbms_stats.gather_table_stats(null,'driver');

I’ve created table t1 to model the exchange rates between US dollars and UK pounds over a range of about three years ending “today”. There’s also a conversion_type column in the unique key to this table that allows us to have multiple reasons for exchanges, allowing multiple exchange rates on the same day. I really ought to have a check constraint on this table that says something like: check (conversion_date = trunc(conversion_date)).
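For example, something like this (a sketch – the constraint name is my own invention):

alter table t1 add constraint t1_ck_date_only
        check (conversion_date = trunc(conversion_date))
;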

I’ve also created a “driver” table that holds exactly the data we might need when extracting an exchange rate for a single invoice. So let’s run the SQL that gets the appropriate exchange rate for this one invoice:


set serveroutput off
alter session set statistics_level = all;

prompt  =====================================
prompt  First run with simple date comparison
prompt  =====================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     t1.conversion_date = driver.invoice_date
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  =================================
prompt  Now run with to_date(date_column)
prompt  =================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     to_date(t1.conversion_date) = to_date(driver.invoice_date)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

In the first case I’ve used the correct join predicate between these tables; in the second I’ve put in a redundant to_date() function call at both ends of the predicate. (If you think this is unrealistic – it’s an exact match for the production code I reported in the blog note I cited above).

Here are the two execution plans – with their rowsource execution stats:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      10 |      8 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      10 |      8 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |       9 |      8 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX UNIQUE SCAN         | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       2 |      8 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_DATE"="DRIVER"."INVOICE_DATE"
              AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")


----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      15 |      4 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      15 |      4 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |      14 |      4 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX RANGE SCAN          | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       7 |      4 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")
       filter((TO_DATE(INTERNAL_FUNCTION("T1"."CONVERSION_DATE"))=TO_DATE(INTERNAL_FUNCTION("DRIVE
              R"."INVOICE_DATE")) AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE"))

Three things to take note of:

  1. The index unique scan at operation 4 has changed to an index range scan.
  2. The predicate information for operation 4 has changed from a pure access predicate to an access predicate plus a very messy filter predicate
  3. The range scan that now appears at operation 4 gets 7 buffers (that’s one root block and 6 leaf blocks) to find the one rowid we need, and will have applied the messy filter predicate to all 1,000 index entries where the two currency codes were GBP/USD.

In the case of the older article the problem query was spending almost all of its time in a join like this, doing 59 buffer gets to find a single rowid for every invoice presented – possibly doing thousands of comparisons as it went.
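As a quick sanity check of the third point above you could count the index entries that share the two currency codes – with the data created by the script this should report all 1,000 rows:

select  count(*)
from    t1
where   from_currency = 'GBP'
and     to_currency   = 'USD'
;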


March 12, 2020

Sequence Costs

Filed under: Bugs,Oracle,Performance,trace files,Troubleshooting — Jonathan Lewis @ 8:18 pm GMT Mar 12,2020

[Update: make sure you read the comments – the behaviour described here is a bug introduced in 19c]

You’re probably aware of the “identity” construct that appeared in 12.1 and uses Oracle’s sequence mechanism to model the Autonumber or Identity columns that other databases have. A posting from Clay Jackson on the Oracle-L list server suggests that something about their code path has introduced a surprising overhead in 19c … when you’re not using them.
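For reference, a minimal sketch of the identity syntax (the table is hypothetical and is not used anywhere in the test below):

create table t_identity (
        id      number generated always as identity,
        v1      varchar2(10)
);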

The following code is a slightly modified version of a test case that Clay Jackson posted to demonstrate a strange difference in performance between 12.2 and 19.3

rem
rem     Script:         19c_sequence_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2020
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem

drop table test_tab2;
drop table test_tab1;

drop sequence test_seq1;
drop sequence test_seq2;

create sequence test_seq1;
create sequence test_seq2;

create table test_tab1 (
   id number primary key,
   c1 varchar2(1000)
);

create table test_tab2 (
   id number,
   t1_id number,
   c2 varchar2(1000),
   constraint test_tab2_fk foreign key (t1_id)
        references test_tab1 (id) on delete cascade
);

spool 19c_sequence_cost.lst

set feedback off
set pagesize 0
set serveroutput off

execute snap_my_stats.start_snap
alter session set events '10046 trace name context forever';

prompt  ==============================================================================
prompt  start_1000.sql cascades to start_1.sql
prompt  start_1.sql holds:
prompt
prompt  insert into test_tab1 values (test_seq1.nextval,'hello');
prompt  insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
prompt  commit;
prompt
prompt  ==============================================================================

start start_1000

set serveroutput on
set pagesize 60
alter session set events '10046 trace name context off';
execute snap_my_stats.end_snap

spool off

I’ve got a couple of tables with columns that I plan to generate from sequences, and I’ve got a referential integrity constraint between those tables. I’ll be using nextval from one sequence to populate the first table, then use currval from the same sequence for the foreign key and nextval from the other sequence as a unique identifier for the second table.

I’ve used my “repeater” code to run a short script from the SQL*Plus prompt 1,000 times (as suggested by Clay Jackson in his posting). I’ve also added lines to enable SQL_trace at the basic level, and taken a snapshot of the session activity stats.
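I haven’t published the repeater scripts here, but a sketch of the cascade (the intermediate script names are my assumption, following the prompt text in the main script) looks like this:

rem     start_1000.sql -- sketch: each level calls the next level down ten times
rem     (start_100.sql holds ten "start start_10" calls, and so on down to start_1.sql)

start start_100
start start_100
start start_100
start start_100
start start_100
start start_100
start start_100
start start_100
start start_100
start start_100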

Just as Clay Jackson had reported – 19.3.0.0 took significantly longer than 12.2.0.1 to complete the 1,000 calls to the script. (Though in absolute terms we’re only talking fractions of a second for a small single user test.) Examination of the stats – which prompted me to add the 10046 trace and repeat – made it very clear where the difference lay. Here are a few figures picked from the session activity stats:

Version    Statistic                        Value
========   ========================         =====
12.2.0.1   CPU used by this session            53
19.3.0.0   CPU used by this session           114

12.2.0.1   recursive cpu usage                 18
19.3.0.0   recursive cpu usage                 44

12.2.0.1   recursive calls                  1,182
19.3.0.0   recursive calls                  7,215

12.2.0.1   execute count                    2,137
19.3.0.0   execute count                    8,151

Clearly it’s worth finding out what was happening in those 6,000 extra recursive calls. And this is what the 19.3 trace file (after passing through tkprof … sort=execnt) showed:


select count(1)
from
 sys.idnseq$ where seqobj# = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   6000      0.07       0.09          0          0          0           0
Fetch     6000      0.03       0.04          0       6000          0        6000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    12001      0.10       0.14          0       6000          0        6000

If you can’t guess what the table idnseq$ is about, the following definition appears in $ORACLE_HOME/rdbms/admin/dcore.bsq:

create table idnseq$     /* stores table identity column to sequence mapping */
( obj#         number not null,                       /* table object number */
  intcol#      number not null,                    /* identity column number */
  seqobj#      number not null,                    /* sequence object number */
  startwith    number not null,                   /* sequence starting value */
  supplemental log group idnseq$_log_grp (obj#, intcol#, seqobj#) always
)
/

We’re not using identity columns in this test, but every time we run the script we do six checks against this data dictionary table to see (presumably) if we’re using a sequence that is internally associated with an identity column. The requirement doesn’t seem completely unreasonable – but it’s a bit surprising that we do every check twice. (Note – we have to check test_seq1 for both statements but test_seq2 only for the second statement, which is how we get 6,000 executions: 2 * (2 checks for test_seq1 + 1 check for test_seq2) * 1,000 executions of the script.)
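If you want to convince yourself that the test sequences really have no identity mapping, here’s a sketch of a check you could run from a suitably privileged account (translating sequence names to object numbers through dba_objects) – it should return no rows for these stand-alone sequences:

select  i.obj#, i.intcol#, i.seqobj#, i.startwith
from    sys.idnseq$ i
where   i.seqobj# in (
                select  object_id
                from    dba_objects
                where   object_name in ('TEST_SEQ1', 'TEST_SEQ2')
        )
;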

The doubling looks like a bug, and you have to wonder why a suitable piece of information isn’t kept in the dictionary cache anyway to allow Oracle to avoid executing the recursive statement. I should point out that the surprisingly large impact is visible because I’ve executed 2,000 separate statements from the client side; mechanisms like array inserts and pl/sql loops might only have to do this check once for each array or pl/sql call. (Tests of this hypothesis are left as an exercise for the interested reader.)
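As a starting point for that exercise, here’s a sketch of a PL/SQL variant of the same workload (re-using the snap_my_stats package that appears in the main script) – the hypothesis being that a single anonymous block may need far fewer idnseq$ checks than 2,000 separate statements from the client:

execute snap_my_stats.start_snap

begin
        for i in 1..1000 loop
                insert into test_tab1 values (test_seq1.nextval, 'hello');
                insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
        end loop;
        commit;
end;
/

execute snap_my_stats.end_snap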

dense_rank

Filed under: Execution plans,Oracle,Performance,sorting,subqueries — Jonathan Lewis @ 6:42 pm GMT Mar 12,2020

I’ve just been prompted to complete and publish a draft I started a few years ago. It’s (ultimately) about a feature that appeared in 9i but doesn’t seem to show up very often at client sites or as a common solution to performance problems on the various Oracle forums – but maybe that’s not surprising given how slowly analytic functions have been taken up.

I want to work towards the feature by starting with a requirement, then examine several solutions. To supply a touch of realism I’ll create an orders table, which holds a customer id and an order date (including time), and then ask for a report of the most recent order for each customer. Here’s some starting data:

rem
rem     Script:         order_sample.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.0        Costs are consistent
rem             11.2.0.4        Costs become consistent by 11.2.0.3
rem             11.1.0.7
rem             10.2.0.3
rem              9.2.0.8
rem

create table orders
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                                                                  id,
        mod(rownum-1,200)                                                       customer,
        sysdate - 20 + dbms_random.value(0,20)                                  date_ordered,
        rpad('x' || to_char(trunc(dbms_random.value(0,1000)),'FM009'),100)      padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid wordpress format issue
;

alter table orders modify customer not null;
alter table orders modify date_ordered not null;
alter table orders add constraint ord_pk primary key(id);

create index ord_cus on orders(customer);
-- create unique index ord_cus_date on orders(customer, date_ordered);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'orders',
                method_opt       => 'for all columns size 1',
                cascade          => true
        );
end;
/

I’ve got 200 customers, at 50 orders per customer dating back over the last 20 days. There’s a primary key on the table and (as it stands) an obvious “foreign key index” on the customer column, though I’ve allowed for changing this to a more useful (customer, date_ordered) combination which I’ve decided could be declared as unique.

With this data, how do I report “the most recent order for each customer”? The first question to ask in response to this request is: “do you literally mean ‘THE’ most recent; what if the customer has placed two or more orders on the same day or, in my initial case, at the same time?” There’s a special case to think about the moment you start to turn the natural language request into a formal language specification.

In this case I’m going to run with the “customer-only” index and allow for the fact that two or more orders could be placed at the same time by the same customer, and report both (all) of them if any such simultaneously placed orders appear.

Strategy number 1:

Start with a list showing the most recent order date for each customer and then report all orders that we can identify using that list of (customer, date_ordered). To do that I’ll start with a simple aggregate query and use the result it produced in an “IN” subquery:


prompt  =========================
prompt  Use IN subquery for max()
prompt  =========================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        orders  ord1
where
        (ord1.customer, ord1.date_ordered) in (
                select  /*+ qb_name(subq) */
                        ord2.customer, max(ord2.date_ordered)
                from
                        orders  ord2
                group by 
                        ord2.customer
        )
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

Plan hash value: 1500776991

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |          |      1 |      1 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN RIGHT SEMI|          |      1 |      1 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1568K (0)|
|   3 |    VIEW               | VW_NSO_1 |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1421K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS   |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS   |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."CUSTOMER"="CUSTOMER" AND "ORD1"."DATE_ORDERED"="MAX(ORD2.DATE_ORDERED)")

I’ve included the qb_name() hint in both query blocks here – it’s always a good idea as it gives you a little extra edge in interpreting the execution plan when the queries get more complicated.

The first thing you’ll notice about the resulting execution plan is that the optimizer has “unnested” the subquery to create an inline view (which it has named VW_NSO_1) and then used a simple join to get the final result. That’s an interesting observation, and it’s something that will often happen with an “IN” subquery – and that brings us to strategy 2.

Strategy number 2:

Some people will take as gospel the claim that the optimizer “cannot handle subqueries efficiently” and will prefer to write their own inline views (possibly using the “WITH subquery” a.k.a. “Common Table Expression (CTE)” mechanism). There will be occasions, even in the latest versions of Oracle, where you may need to do this but there will also be occasions where the optimizer hasn’t done it because it would produce the wrong results – and I have seen a couple of accidents go into production code where this variant has been written incorrectly.


prompt  ==============================
prompt  Introduce max() as inline view
prompt  ==============================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        (
                select  /*+ qb_name(in_line) */
                        ord2.customer, max(ord2.date_ordered) date_ordered
                from
                        orders  ord2
                group by 
                        ord2.customer
        )       ordv,
        orders  ord1
where
        ord1.customer     = ordv.customer
and     ord1.date_ordered = ordv.date_ordered
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

Plan hash value: 2750501889

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |        |      1 |    200 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN           |        |      1 |    200 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1531K (0)|
|   3 |    VIEW               |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1413K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."CUSTOMER"="ORDV"."CUSTOMER" AND "ORD1"."DATE_ORDERED"="ORDV"."DATE_ORDERED")

You’ll notice, of course, the remarkable similarity between the previous plan and this one – the only significant difference being that the optimizer used a “plain” hash join here rather than the “hash join right semi” that appeared in the previous plan. The “right semi” is an indication that the optimizer has first transformed your “IN” subquery to an equivalent “EXISTS” (“=ANY”) subquery. Don’t be misled by the “right”, by the way: this isn’t indicating any sort of outer join, it’s just trying to let you know which table is the one where Oracle should stop its probe after finding the first row. It is, however, unfortunate that it gets a little awkward trying to sort out left from right when Oracle can do a “swap join inputs” on you.

It would have been nice if the VIEW operation had reported the name of my inline view (to correspond to the generated VW_NSO_1 view name from the previous plan) – but if you had included the ‘alias’ formatting option in the call to display_cursor() it would have reported the alias ordv@main at operation 3.
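For example, a sketch of the same call with the extra formatting option added:

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));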

Strategy Number 3:

We might have decided to check every row in the table to see if the date in that row was the most recent date for the customer in that row, which we could do by running a correlated subquery to do the check for every row in the table.

prompt  ========================================
prompt  Orders with correlated EQUALITY subquery
prompt  ========================================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        orders  ord1
where
        ord1.date_ordered = (
                select  /*+ qb_name(subq) */
                        max(ord2.date_ordered)
                from
                        orders  ord2
                where
                        ord2.customer = ord1.customer
        )
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));


Plan hash value: 1152467146

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |         |      1 |    200 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN           |         |      1 |    200 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1622K (0)|
|   3 |    VIEW               | VW_SQ_1 |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |         |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1435K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS  |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS  |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."DATE_ORDERED"="MAX(ORD2.DATE_ORDERED)" AND "ITEM_1"="ORD1"."CUSTOMER")

Yet again we end up with the same execution plan (barring the “right semi” issue) but with a different generated name for the unnested subquery. This is an interesting facet of Oracle (and SQL in general) – completely different ways of stating a requirement can end up doing the same work in the same way.

An important corollary to this observation is that the first thing you should do when you start writing an SQL statement is to write it in a way that clearly expresses the requirement and is easy for others to comprehend. Don’t (at the first stage) try to do anything clever because (a) you may do it wrong and (b) the optimizer might have taken your clear, simple, code and done the clever bit behind the scenes for you.

However, we may have to move on to doing something new (-ish) and exciting.

Strategy number 4:

An “obvious” defect in the three plans so far is that we have to visit the orders table twice. Is there a way we can avoid doing this? The answer is yes. Oracle 8.1.6 gave us the analytic functions:


prompt  =======================
prompt  Analytic max() function
prompt  =======================

column padding noprint
column date_ordered noprint

select
        /*+ qb_name(main) */
        ordv.* 
from    (
        select  /*+ qb_name(inline) */
                customer, id, date_ordered, padding,
                max(date_ordered) over (
                        partition by customer
                ) max_date
        from    orders  ord2
        )       ordv
where
        ordv.date_ordered = ordv.max_date
order by
        ordv.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats last'));

Plan hash value: 813391662

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   262 (100)|    200 |00:00:00.01 |     172 |       |       |          |
|*  1 |  VIEW               |        |      1 |  10000 |   262   (3)|    200 |00:00:00.01 |     172 |       |       |          |
|   2 |   WINDOW SORT       |        |      1 |  10000 |   262   (3)|  10000 |00:00:00.01 |     172 |  1612K|   624K| 1432K (0)|
|   3 |    TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ORDV"."DATE_ORDERED"="ORDV"."MAX_DATE")

By adding the analytic max() function I can acquire the necessary “raw” data once and post-process it to find the max(date_ordered) for each customer before discarding all the rows where the row’s date doesn’t match the maximum date. The expression “max(date_ordered) over (partition by customer)” is like a virtual column that tells Oracle to partition the data by customer and find the maximum date within customer. Imagine copying the original data into a spreadsheet, sorting it by customer, then using one of the spreadsheet functions to add an extra column that derives its value by looking at the rows that have the same customer as the current row, and you’ve got an exact replica of what Oracle is doing here.

So we’ve managed to produce the same result with a single tablescan of orders instead of the two tablescans we saw in every other plan. But there’s a drawback – to be able to partition by customer Oracle has had to fetch every row and column we’re interested in and sort the data before deriving values for the new column: the cost of this plan (262) is much higher than the cost of the plan (54) we got from the previous three queries.

In this case the variation in actual run-time for the two different plans was undetectable – and insignificant compared to the time spent getting the result set to the terminal and displaying it. In general, though, you need to consider the trade-off between the sorting that goes into the use of analytic functions and the “double visit” work of using subqueries.

Strategy number 5:

There is (at least) one more possibility that I’ve used in the past when the data structure has allowed it to produce the right answers; and it’s the one that is the ultimate target of this blog. Consider the following SQL:


select
        customer, 
        max(id)                 keep (dense_rank last order by date_ordered) as max_id,
        max(date_ordered)       keep (dense_rank last order by date_ordered) as max_date,
--      max(padding)            keep (dense_rank last order by date_ordered) as max_padding
        trim(
                max(padding)    keep (dense_rank last order by date_ordered)
        )       as max_padding
from
        orders
group by
        customer
;

(The trim() function on the padding column doesn’t change the fundamental behaviour of this query, it’s just there to avoid line-wrapping on my output.)

I’ve written a query that does an aggregate on customer, so “customer, max() group by customer”, but it’s a variant of the analytic max() function based on “keep(dense_rank last order by …)” rather than the more familiar “over(partition by … order by …)” form.

Because of the group by customer, the max() function is applied per customer (i.e. behaving like over(partition by customer)), and we’re not actually looking for the maximum value of the referenced column: we’re first ordering by the date_ordered (within customer), applying the dense_rank mechanism, keeping only the rows that have the highest (last) dense_rank, and then taking the maximum of that subset of the data.

Here’s an example applying the combination of mechanisms to a tiny data set:

Raw data
=========
   N1           V1
-----           ---
   93           'C'
   54           'Q'
   43           'A'
   13           'X'
   93           'G'
   54           'W'

Ordered by N1 and dense_rank() appended
========================================
   N1           V1              dr()
-----           ---             ----
   13           'X'             1
   43           'A'             2
   54           'W'             3
   54           'Q'             3
   93           'G'             4
   93           'C'             4

Keep(dense rank last)
=====================
   N1           V1              dr()
-----           ---             ----
   93           'G'             4
   93           'C'             4


max(v1) keep(dense rank last order by n1)
V1
---
'G'

In this tiny example we had cases where there were multiple rows for some of the rankings, but if we go back to our orders table and guarantee (by adding a unique constraint) that a customer will never have more than one row for any one value of date_ordered, then the expression max(id) keep (dense_rank last order by date_ordered) for any one customer will be the id of the row that has the maximum order date for that customer and, similarly, max(date_ordered) keep(…) and max(padding) keep(…) will also be the values from that same row.

Given the (absolutely critical) uniqueness constraint, we can get the data for the most recent order for each customer using this dense_rank() strategy.
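In terms of the original script that means creating the unique index I left commented out in the setup code:

create unique index ord_cus_date on orders(customer, date_ordered);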

The question, of course, is why would we do something that may not be entirely intuitive and looks as if it could make Oracle do a lot of extra work to get the result. Here’s the answer – which is just the execution plan for the query on my orders table – with the unique constraint added:


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |    28 (100)|    200 |00:00:00.01 |     172 |       |       |          |
|   1 |  SORT GROUP BY     |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |   142K|   142K|  126K (0)|
|   2 |   TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

The path uses a basic SORT GROUP BY, that “sorts” only 200 rows (A-Rows), using only 126KB of memory. Compare that with the plan for the analytic max() over() in strategy 4 that takes 1.6MB of memory and sorts 10,000 rows and you’ll appreciate that the keep(dense_rank last) mechanism is doing something much more efficient. For cases where the drop from “num_rows” to “num_distinct” for the aggregating column(s) is large, the benefit of using the somewhat non-intuitive dense_rank() approach may make a significant difference to memory, CPU, and even (if it avoids a spill to disk) I/O.

Footnotes

There are two major variations on how you can use the dense_rank() function, as well as this form in which dense_rank appears in the KEEP LAST (and FIRST) mechanism.

Remember the absolutely critical point that the “keep dense_rank last” strategy is only correct if there is a suitable unique constraint on the data set viz: unique on ({group by column(s)},{order by column(s)}).

There is another major option for getting the same “most recent” rows, which is to use the match_recognize() functionality, but I think I probably wrote this draft before the mechanism even existed – so it’s left as an exercise to the reader to work it out.  A key reason why I’m not going to do it myself is that (like the analytic over() in strategy 4) it will require all 10,000 rows to be sorted, and is therefore likely to be less efficient than strategy 5.

Finally – I thought I’d written a note explaining why a “sort group by” can use far less memory and CPU than a basic “sort order by”, but if I have it wasn’t on this blog. I do have a note on how the mechanism to handle “rownum <= N” with a preceding “order by” minimises its use of memory, and that note should give you some idea of what the “sort group by” is doing to minimise memory usage. I’ll try to write a little note on the aggregate mechanism some time over the next few days.


February 25, 2020

count(*) – again

Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 1:24 pm GMT Feb 25,2020

I’ve just received an email asking (yet again) a question about counting the number of rows in a table.

We have a large table with a CLOB column, where the CLOB occupies 85% storage space.
when we use select count(*) from , the DBA says that you should not use count(*) as it uses all columns and as this table contains CLOB it results in high CPU usage, where as if we use count(rowid), this brings us faster and same result.

Well I’ve pointed out in the past, in fact more than once, that count(*), count(1), count(declared-non-null-column) will all do the same thing … execute as count(*); I’ve also listed a few quirky anomalies, also more than once. However, count(rowid) is a little different, it doesn’t get transformed to count(*) as we can see from two pieces of evidence:

Exhibit A: fragments from a 10053 (CBO) trace file

----- Current SQL Statement for this session (sql_id=d381q70418ugs) -----
select count(rowid) from emp1

... 

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT("EMP1".ROWID) "COUNT(ROWID)" FROM "TEST_USER"."EMP1" "EMP1"

Unlike the various count() calls that are converted to count(*), counting rowids doesn’t seem to go through the CNT transformation, and the final state of the query still shows count(rowid) as the critical mechanism.
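If you want to reproduce Exhibit A, a sketch for generating the CBO trace (emp1 being any convenient table in your own schema):

alter session set events '10053 trace name context forever, level 1';

explain plan for select count(rowid) from emp1;

alter session set events '10053 trace name context off';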

Exhibit B: variations in Column Projection Information

SQL> explain plan for select count(1) from emp1;

SQL> select * from table(dbms_xplan.display(null,null,'projection'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 2110459082

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |     7  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| E1_PK | 20000 |     7  (15)| 00:00:01 |
-----------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]

SQL> explain plan for select count(rowid) from emp1;

SQL> select * from table(dbms_xplan.display(null,null,'projection'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 2110459082

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    12 |     7  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    12 |            |          |
|   2 |   INDEX FAST FULL SCAN| E1_PK | 20000 |   234K|     7  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(ROWID)[22]
   2 - ROWID[ROWID,10]

The count(1) query transforms to count(*) and we don’t see any column information being generated at operation 2 and passed up to the parent operation. However with count(rowid) we see Oracle at operation 2 constructing actual rowids from block addresses and row directory entries and passing them up to the parent operation.

The CPU it takes to construct and pass the rowids is probably insignificant compared to the CPU usage of accessing data blocks in the first place so I doubt if there would be much visible difference in clock-time between count(1) (or count(*)) and count(rowid), but technically it looks as if count(rowid) would actually be slower and more resource-intensive than anything that went through the count(*) transformation.

In passing – the execution plan that appears in our count(1) example also tells us that count(*) doesn’t “use all columns” – after all, there aren’t many tables where every column is in the primary key and emp1 is no exception to the general rule, and the plan is doing an index fast full scan of the primary key index.


December 16, 2019

IOT Bug

Filed under: Bugs,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 3:58 pm GMT Dec 16,2019

Here’s a worrying bug that showed up a couple of days ago on the Oracle-L mailing list. It’s a problem that I’ve tested against 12.2.0.1 and 19.3.0.0 – it may be present on earlier versions of Oracle. One of the nastiest things about it is that you might not notice it until you get an “out of space” error from the operating system. You won’t get any wrong results from it, but it may well be adding an undesirable performance overhead.

Basically it seems that (under some circumstances, at least) Oracle is setting the “block guess” component of the secondary index on Index Organized Tables (IOTs) to point to blocks in the overflow segment instead of blocks in the primary key segment. As a result, when you execute a query that accesses the IOT through the secondary index and has to do reads from disc to satisfy the query – your session goes through the following steps:

  • Identify index entry from secondary index – acquire “block guess”
  • Read indicated block and discover the object number on the block is wrong, and the block type is wrong
  • Write a (silent) ORA-01410 error and do a block dump into the trace file
  • Use the “logical rowid” from the secondary index (i.e. the stored primary key value) to access the primary key index by key value

So your query runs to completion and you get the right result because Oracle eventually gets there using the primary key component stored in the secondary index, but it always starts with the guess [see sidebar] and for every block you read into the cache because of the guess you get a dump to the trace file.

Here’s a little code to demonstrate. The problem with this code is that everything appears to work perfectly; you have to be able to find the trace file for your session to see what’s gone wrong. First we create some data – this code is largely copied from the original posting on Oracle-L, with a few minor changes:


rem
rem     Script:         iot_bug_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Notes
rem     The OP had tested on 19.5.0.0 to get the same effect, see:
rem     //www.freelists.org/post/oracle-l/IOT-cannot-get-valid-consensus-bug-or-unexplained-behavio
rem

drop table randomload purge;

create table randomload(
        roll number,
        name varchar2(40),
        mark1 number,
        mark2 number,
        mark3 number,
        mark4 number,
        mark5 number,
        mark6 number,
        primary key (roll)
) 
organization index 
including mark3 overflow
;

create index randomload_idx on randomload(mark6);

insert into randomload 
select 
        rownum, 
        dbms_random.string(0,40) name, 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,10000)) 
from 
        dual 
connect by 
        level < 1e5 -- > comment to avoid wordpress format issue
;

commit;

exec dbms_stats.gather_table_stats(null,'randomload', cascade=>true);

prompt  ==================================================
prompt  pct_direct_access should be 100 for randomload_idx
prompt  ==================================================

select 
        table_name, index_name, num_rows, pct_direct_access, iot_redundant_pkey_elim  
from 
        user_indexes
where
        table_name = 'RANDOMLOAD'
;

It should take just a few seconds to build the data set and you should check that the pct_direct_access is 100 for the index called randomload_idx.

The next step is to run a query that will do an index range scan on the secondary index.

 
column mark6 new_value m_6

select 
        mark6, count(*) 
from
        randomload 
group by 
        mark6
order by 
        count(*)
fetch first 5 rows only
;

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
set serveroutput off

select avg(mark3) 
from 
        randomload 
where 
        mark6 = &m_6
;

select * from table(dbms_xplan.display_cursor);

alter session set events '10046 trace name context off';
set serveroutput on

I’ve started by selecting one of the least frequently occurring values of mark6 (a column I know to be in the overflow); then I’ve flushed the buffer cache so that any access I make to the data will have to start with disk reads (the original poster suggested restarting the database at this point, but that’s not necessary).

Then I’ve enabled sql_trace to show wait states (to capture details of which blocks were read and which object they belong to), and I’ve run a query for mark3 (a column that is in the primary key (TOP) segment of the IOT) and pulled its execution plan from memory to check that the query did use a range scan of the secondary index. Here’s the plan:

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |       |       |    11 (100)|          |
|   1 |  SORT AGGREGATE    |                   |     1 |     7 |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_IOT_TOP_77298 |    10 |    70 |    11   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| RANDOMLOAD_IDX    |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("MARK6"=1316)
   3 - access("MARK6"=1316)

As you can see the plan shows what we are hoping to see – an index range scan of the secondary index that lets it follow up with a unique scan of the primary key segment. It’s just a little odd that the access predicate reported for operation 2 (the unique scan of TOP) suggests that the access is on a column that isn’t in the primary key and isn’t even in the TOP section.

So the query works and gives the right answer. But what do we find in the trace directory? If you’re running 12c (possibly only 12.2), each time the error occurs the following pattern of information will be written to the alert log (it didn’t appear in 19.3):


ORCL(3):Hex dump of (file 22, block 16747) in trace file /u01/app/oracle/diag/rdbms/orcl12c/orcl12c/trace/orcl12c_ora_7888.trc
ORCL(3):
ORCL(3):Corrupt block relative dba: 0x0580416b (file 22, block 16747)
ORCL(3):Bad header found during multiblock buffer read (logical check)
ORCL(3):Data in bad block:
ORCL(3): type: 6 format: 2 rdba: 0x0580416b
ORCL(3): last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
ORCL(3): spare3: 0x0
ORCL(3): consistency value in tail: 0x484c0601
ORCL(3): check value in block header: 0x4408
ORCL(3): computed block checksum: 0x0
ORCL(3):

And the following pattern of information is written to the trace file [Update: a follow-up test on 11.2.0.4 suggests that the basic “wrong block address” error also happens in that version of Oracle, but doesn’t result in a dump to the trace file]:


kcbzibmlt:: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 makecr 0 line 15461 ds_blk (22, 16747) bh_blk (22, 16747)
kcbds 0x7ff1ca8c0b30: pdb 3, tsn 8, rdba 0x0580416b, afn 22, objd 135348, cls 1, tidflg 0x8 0x80 0x0
    dsflg 0x108000, dsflg2 0x0, lobid 0x0:0, cnt 0, addr 0x0, exf 0x10a60af0, dx 0x0, ctx 0
    whr: 'qeilwh03: qeilbk'
env [0x7ff1ca8e3e54]: (scn: 0x00000b860e364893   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000000000000000  hi-scn: 0x0000000000000000  ma-scn: 0x00000b860e364879  flg: 0x00000660)
BH (0xb1fd6278) file#: 22 rdba: 0x0580416b (22/16747) class: 1 ba: 0xb1c34000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 763,14
  dbwrid: 0 obj: 135348 objn: 135348 tsn: [3/8] afn: 22 hint: f
  hash: [0x9eff0528,0x77cff808] lru: [0xb1fd2578,0x9ff84658]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xb6f654c0,0x9ff84680] objaq: [0xb6f654d0,0x9ff84690]
  use: [0x77b78128,0x77b78128] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  Printing buffer operation history (latest change first):
  cnt: 5
  01. sid:10 L122:zgb:set:st          02. sid:10 L830:olq1:clr:WRT+CKT
  03. sid:10 L951:zgb:lnk:objq        04. sid:10 L372:zgb:set:MEXCL
  05. sid:10 L123:zgb:no:FEN          06. sid:10 L083:zgb:ent:fn
  07. sid:08 L192:kcbbic2:bic:FBD     08. sid:08 L191:kcbbic2:bic:FBW
  09. sid:08 L604:bic2:bis:REU        10. sid:08 L190:kcbbic2:bic:FAW
  11. sid:08 L602:bic1_int:bis:FWC    12. sid:08 L822:bic1_int:ent:rtn
  13. sid:08 L832:oswmqbg1:clr:WRT    14. sid:08 L930:kubc:sw:mq
  15. sid:08 L913:bxsv:sw:objq        16. sid:08 L608:bxsv:bis:FBW
Hex dump of (file 22, block 16747)

   ... etc.

Corrupt block relative dba: 0x0580416b (file 22, block 16747)
Bad header found during multiblock buffer read (logical check)
Data in bad block:
 type: 6 format: 2 rdba: 0x0580416b
 last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x484c0601
 check value in block header: 0x4408
 computed block checksum: 0x0
TRCMIR:kcf_reread     :start:  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
TRCMIR:kcf_reread     :done :  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf

The nasty bit, of course, is the bit I’ve removed and replaced with just “etc.”: it’s a complete block dump (raw and symbolic) which in my example was something like 500 lines and 35KB in size.

It’s not immediately obvious exactly what’s going on and why, but the 10046 trace helps a little. From another run of the test (on 19.3.0.0) I got the following combination of details – which is an extract showing the bit of the wait state trace leading into the start of the first block dump:

WAIT #140478118667016: nam='db file scattered read' ela= 108 file#=13 block#=256 blocks=32 obj#=77313 tim=103574529210
WAIT #140478118667016: nam='db file scattered read' ela= 2236 file#=13 block#=640 blocks=32 obj#=77313 tim=103574531549
WAIT #140478118667016: nam='db file scattered read' ela= 534 file#=13 block#=212 blocks=32 obj#=77312 tim=103574532257
kcbzibmlt: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 warm_up abort 0 makecr 0 line 16082 ds_blk (13, 212) bh_blk (13, 212)

Object 77313 is the secondary index, object 77312 is the primary key index (IOT_TOP). It may seem a little odd that Oracle is using db file scattered reads of 32 blocks to read the indexes but this is a side effect of flushing the buffer cache – Oracle may decide to prefetch many extra blocks of an object to “warm up” the cache just after instance startup or a flush of the buffer cache. The thing I want to check, though, is what’s wrong with the blocks that Oracle read from object 77312:


alter system dump datafile 13 block min 212 block max 243;

BH (0xc8f68e68) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xc8266000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa7fd6c38) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xa7c2a000
  set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa5f75780) file#: 13 rdba: 0x034000d5 (13/213) class: 0 ba: 0xa5384000
  set: 11 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xdafe9220) file#: 13 rdba: 0x034000d5 (13/213) class: 1 ba: 0xdadcc000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

I’ve reported the first few lines of the symbolic dump for each of the first few blocks in the resulting trace file. Look at the third line of each BH entry: it’s reporting object 77311 (the overflow segment), not 77312 (the TOP segment). Every single block reported in the db file scattered read of 32 blocks for object 77312 reports itself, when dumped, as being part of object 77311. And that’s possibly the immediate cause of the ORA-01410.

We can take the investigation a little further by dumping a leaf block or two from the secondary index.


alter session set events 'immediate trace name treedump level 77313';

----- begin tree dump
branch: 0x3400104 54526212 (0: nrow: 542, level: 1)
   leaf: 0x340010d 54526221 (-1: row:278.278 avs:2479)
   leaf: 0x340075e 54527838 (0: row:132.132 avs:5372)
   leaf: 0x34005fb 54527483 (1: row:41.41 avs:7185)

alter system dump datafile 13 block 1886   -- leaf: 0x340075e

BH (0xd5f5d090) file#: 13 rdba: 0x0340075e (13/1886) class: 1 ba: 0xd5158000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77313 objn: 77313 tsn: [3/6] afn: 13 hint: f
...
row#6[5796] flag: K------, lock: 0, len=18
col 0; len 2; (2):  c1 1d
col 1; len 4; (4):  c3 07 41 5c
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  03 40 05 75

I’ve done a treedump of the secondary index, picked a leaf block address from the treedump, and dumped that leaf block; from that leaf block I’ve extracted one index entry to show you the three components: the key value (c1 1d), the primary key for the row (c3 07 41 5c), and the block guess (03 40 05 75). Read the block guess as a 4-byte hex number and it translates to file 13, block 1397 – which should belong to the TOP segment. So the exciting question is – what object does block (13, 1397) think it belongs to?


alter system dump datafile 13 block 1397;

Block header dump:  0x03400575
 Object id on Block? Y
 seg/obj: 0x12dff  csc:  0x00000b860e308c46  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400501 ver: 0x01 opc: 0
     inc: 0  exflg: 0

Converting from hex to decimal: obj: 0x12dff = 77311, which is the overflow segment. The secondary index block guess is pointing at a block in the overflow segment.
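
Neither of those conversions needs to be done by hand, by the way. Here’s a little convenience query – not part of the original test – that uses dbms_utility to decode the block guess and to_number() to decode the object id:

select
        dbms_utility.data_block_address_file(  to_number('03400575','xxxxxxxx')) guess_file,
        dbms_utility.data_block_address_block( to_number('03400575','xxxxxxxx')) guess_block,
        to_number('12dff','xxxxx')                                               obj_id
from
        dual
/

GUESS_FILE GUESS_BLOCK     OBJ_ID
---------- ----------- ----------
        13        1397      77311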

There are two ways to handle this problem – you could simply rebuild the index (alter index rebuild) or, as the original poster did, use the “update block references” command to correct all the block guesses: “alter index randomload_idx update block references;”. Neither is desirable, but if you’re seeing a lot of large trace files following the pattern above then one or the other may be necessary.
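
For reference, this is roughly what the two options look like as commands. I’ve used the original poster’s index name; a production rebuild might also need online, parallel, or storage clauses to suit your system:

alter index randomload_idx rebuild online;

alter index randomload_idx update block references;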

There was one particular inconsistency in the tests – which I ran many times – occasionally the pct_direct_access for the secondary index would be reported as zero (which, technically, is what should always have happened given the error). When it did, of course, Oracle wouldn’t follow the guess but would go straight to the step where it used the primary key “logical rowid” – thus bypassing the error and the block dump.

tl;dr

In some circumstances the block guesses in the secondary indexes of IOTs may be pointing to the overflow segment instead of the primary key (TOP) segment. If this happens then queries will still run and give the right answers, but whenever they read a “guessed” block from disc they will report an ORA-01410 error and dump a block trace. This will affect performance and may cause space problems at the O/S level.

Sidebar

An entry in the secondary index of an Index Organized Table (IOT) consists of three parts, which initially we can think of in the form:

({key-value}, {logical rowid}, {block guess})

Since IOTs don’t have real rowids the “logical rowid” is actually the primary key of the row where the {key value} will be found. As a short cut for efficient execution Oracle includes the block address (4 bytes) where that primary key value was stored when the row was inserted. Because an IOT is an index, “rows” in the IOT can move as new data is inserted and leaf blocks split, so eventually any primary key may move to a different block – this is why we refer to the block address as a guess: a few days, hours, or minutes after you’ve inserted the row the block address may no longer be correct.

To help the runtime engine do the right thing Oracle collects a statistic called pct_direct_access for secondary indexes of IOTs. This is a measure of what percentage of the block guesses are still correct at the time that the statistics are gathered. If this value is high enough the run-time engine will choose to try using the block guesses while executing a query (falling back to using the logical rowid if it turns out that the guess is invalid), but if the value drops too low the optimizer will ignore the block guesses and only use the logical rowid.
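
If you want to check how good the guesses for an index currently are, the figure appears in the xxx_indexes views after stats have been gathered. A minimal sketch – the table name is a placeholder for your own IOT:

select
        index_name, index_type, pct_direct_access
from
        user_indexes
where
        table_name = 'MY_IOT'           -- placeholder name
/

Bear in mind the value is only as fresh as the last stats collection, so an index whose guesses have degraded since then may still report a misleadingly high figure.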

Not relevant to this note – but a final point about secondary indexes and logical rowids – if the definition of the index includes some of the columns from the primary key Oracle won’t store those columns twice (in more recent versions, that is); the code is clever enough to use the values stored in the {key value} component when it needs to use the {logical rowid} component.

Update (Jan 2020)

I passed this example on to Oracle, and there are now two bugs recorded for it:

  • Bug 30733525 – ALERT LOG ENTRIES RE BLOCK GUESSES IN THE SECONDARY INDEXES OF IOTS POINTING TO OVERFLOW SEGMENT INSTEAD OF INDEX SEGMENT
  • Bug 30733563 – WRONG GUESS DBA IN INDEX

December 6, 2019

Temp space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:18 pm GMT Dec 6,2019

A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.

Of course you then run into the further problem of tracking down the culprit. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced and cynical user) that there’s something fundamentally wrong with the way Oracle has been accounting for space usage.

If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the active session history may be able to help. One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL statement that had been holding the temporary space when the error was raised, for example:


column pga_allocated        format 999,999,999,999
column temp_space_allocated format 999,999,999,999

break on session_id skip 1 on session_serial#

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) != 0
order by
        session_id, sample_id
/

All I’ve done for this example is query v$active_session_history for the last 5 minutes, reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range, and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage – see the sketch below.
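
For example, here’s a minimal variation that looks back 15 minutes and ignores anything below 1MB:

select
        session_id, session_serial#,
        sample_id,
        sql_id,
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 15/1440 and sysdate
and     nvl(temp_space_allocated,0) > 1048576
order by
        session_id, sample_id
/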

You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of memory so it’s nice to be able to see the two figures together.

There are plenty of limitations and flaws in the usefulness of this report and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:


rem
rem     Script:         allocate_tempspace.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 as 
select * from all_objects
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

execute dbms_stats.gather_table_stats(null,'t1')

execute dbms_lock.sleep(20)

set pagesize  60
set linesize 255
set trimspool on
set serveroutput off
alter session set statistics_level = all;

with ttemp as (
        select /*+ materialize */ * from t1
)
select 
        /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ 
        t1a.object_type,
        max(t1a.object_name)
from
        ttemp t1a, ttemp t1b
where
        t1a.object_id = t1b.object_id
group by
        t1a.object_type
order by
        t1a.object_type
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload, so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).


SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 |      1 |        |      0 |00:00:01.51 |   28915 |      0 |   9217 |  2068K|  2068K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    989K|    989K|00:00:00.24 |   19551 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     29 |     29 |00:00:08.51 |   18493 |  21345 |   2910 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     15M|     15M|00:00:03.93 |   18493 |  21345 |   2910 |    48M|  6400K|   65M (1)|   25600 |
|   6 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.36 |    9233 |   9218 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.35 |    9233 |   9218 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.40 |    9257 |   9217 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.39 |    9257 |   9217 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")

Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s roughly 74,000 KB of temp space that doesn’t get reported as Used-Tmp.

If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:

SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 |      1 |        |      0 |00:00:01.26 |   23706 |      0 |   5593 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    907K|    907K|00:00:00.21 |   18024 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     25 |     25 |00:00:06.89 |   11193 |  13843 |   2655 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     14M|     14M|00:00:03.55 |   11193 |  13843 |   2655 |    44M|  6400K|   64M (1)|      23M|
|   6 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.26 |    5598 |   5594 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.25 |    5598 |   5594 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.34 |    5595 |   5594 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.33 |    5595 |   5594 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that Used-Tmp was reported in KB in 12.2 but is reported in MB in 19.3.0.0.

Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism writes a GTT with pctfree = 10 while the newer one avoids leaving any free space or transactional details) so, as ever, I now have another draft for a blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.

With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.

First the results from 12.2.0.1

SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
        14           22234   15306218 1cwabt12zq6zb       95,962,112            1,048,576
                             15306219 1cwabt12zq6zb       97,731,584           37,748,736
                             15306220 1cwabt12zq6zb      148,194,304           77,594,624
                             15306221 1cwabt12zq6zb      168,117,248           85,983,232
                             15306222 1cwabt12zq6zb      168,117,248           90,177,536
                             15306223 1cwabt12zq6zb      168,117,248           95,420,416
                             15306224 1cwabt12zq6zb      168,117,248           98,566,144
                             15306225 1cwabt12zq6zb      168,117,248          102,760,448
                             15306226 1cwabt12zq6zb      116,933,632          103,809,024
                             15306227 1cwabt12zq6zb      116,933,632          103,809,024
                             15306228 b66ycurnwpgud        8,602,624            1,048,576

I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB that is reasonably close to the 103,809,024 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth of temp_space_allocated to 77,594,624 by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its results up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active. (Note, however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)

We see the same pattern of behaviour in 19.3.0.0:


SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
       136           42767    2217500 1cwabt12zq6zb      143,982,592           46,137,344
                              2217501 1cwabt12zq6zb      193,527,808           54,525,952
                              2217502 1cwabt12zq6zb      193,527,808           57,671,680
                              2217503 1cwabt12zq6zb      193,527,808           61,865,984
                              2217504 1cwabt12zq6zb      197,722,112           67,108,864
                              2217505 1cwabt12zq6zb      150,601,728           70,254,592
                              2217506 1cwabt12zq6zb      150,601,728           70,254,592

We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).

Preliminary Observations

Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to extend temp segment).
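
As a sketch (my choice of presentation, not a prescription) you could summarise recent history to pick out the greediest consumers:

select
        session_id, session_serial#, sql_id,
        max(temp_space_allocated)/1048576       max_temp_mb
from
        v$active_session_history
where
        sample_time > sysdate - 10/1440
group by
        session_id, session_serial#, sql_id
order by
        max_temp_mb desc
/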

We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.
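
You can see the contrast for yourself if you query v$sql_workarea_active from a second session while the test query is running – something like the following sketch should show a workarea (and temp segment) for the hash join but nothing for the CTE being materialized:

select
        sid, sql_id, operation_type, operation_id,
        round(actual_mem_used/1048576,1)        mem_mb,
        round(tempseg_size/1048576,1)           temp_mb
from
        v$sql_workarea_active
order by
        sid, operation_id
/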

At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may be over-optimistic; however, in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.
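
The follow-up is then mechanical. Using the sql_id from my example (substitute the one your ASH report turns up – and remember the cursor may already have been flushed from the library cache):

select  sql_fulltext
from    v$sql
where   sql_id = '1cwabt12zq6zb'
and     rownum = 1
/

select * from table(dbms_xplan.display_cursor('1cwabt12zq6zb',null,'allstats last'));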

Further questions

  • When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
  • Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
  • Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
  • Are there any other reasons why ASH might report temp_space_allocated when an execution plan doesn’t?
  • Is temp_space_allocated only about the temporary tablespace, or could it include information about other (“permanent”) tablespaces?

Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the source of a space management ORA-01652 issue.