Oracle Scratchpad

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minute that I spent confirming the OP’s hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan wass telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols changed to .le. / .ge. to avoid WordPress format issue.

The first thing to note is that the “Time (active)” shown at the top line is about 12,000 seconds – so it’s a long running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including 1,200 seconds of CPU time. This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the  Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn’t matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare “Rows (Estim)” and “Rows (Actual)” for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the “Execs” column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So lets look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with “Rows (actual)” = 1.
  • Operation 11 (Buffer Sort of data from STAGE1) is predicted to return 84 rows – with “Rows (actual)” = 1,000

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs this tells us that the optimizer’s estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Wven if we can’t fix this mis-estimate we may be able to do something that improves the estimate for STAGE1 to something sufficienlty large that it will trigger the switch to a hash join. So let’s look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there’s a nearly perfect three-column index on REF1, but the optimizer can’t use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It’s possible, of course, that this might be the first three columns of a 4, or more, column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.

STAGE1 Predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT=’I’ (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a signficant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.

What Happened Next?

I made the suggestions about column groups and histogram to the OP – without the level of detail that I’ve given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group, (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

May 5, 2020

Execution Plans

Filed under: Execution plans,Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 12:36 pm BST May 5,2020

Table Of Contents

1.0 Introduction
2.0 Overview
3.0 The Main Course
4.0 Simplify
5.0 Filling the Gaps
6.0 Looking at the numbers
7.0 Predicate Information
8.0 Resolution
9.0 Summary
Footnote


 

1.0 Introduction

1.1 In a comment to a recent post on reading a non-trivial execution someone asked me to repeat the exercise using a plan I had published a few days previously in a post about tweaking the hints in an outline. The query in question involved a number of subqueries and transformations of different types, which means it’s going to take a little work explaining the details, and it’s probably going to be a fairly long read.

1.2 Here’s the query that produced the plan we’re going to examine. I’ve done some cosmetic alteration  to make it a little easier to read (though it’s still not perfect according to my standards). I’ve also made one very important addition to the query to make it easier to follow my walkthrough of the execution plan; the original text didn’t specify any query block names (/*+ qb_name() */ hints) even though it starts off with 9 separate query blocks, so I’ve walked through the text very carefully adding in the query block names that Oracle would have used (sel$NN) for each query block. In this case I got lucky because there were no views of other recursive problems involved so all I had to do was find each occurence of the word “select” in literal text order and increment the NN in sel$NN for each one.


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id,
                academic_year,
                applicant_gender,
                medium_of_study,
                education_type,
                college_id,
                course_id,
                medium_id,
                hostel_required,
                preference_order,
                status_flag,
                attribute7,  -- Added on 7-mar-20
                college_status_flag,
                percentage,
                caste_category,
                alloted_category,
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        adt.applicant_id,
                        lmt_gender.lov_code applicant_gender,
                        adt.medium_of_study,
                        act.college_id,
                        lmt_education_type.lov_code education_type,
                        act.course_id,
                        act.medium_id,
                        act.hostel_required,
                        act.preference_order,
                        act.status_flag,
                        act.attribute7, -- Added on 7-mar-20
                        adt.college_status_flag,
                        adt.academic_year,
                        adt.percentage,
                        adt.applicant_dob,
                        adt.legacy_appln_date,
                        adt.caste_category,
                        act.attribute1 alloted_category,
                        DECODE (lmt_pass.lov_code,  'ATTFIRST', 1,  'COMPARTL', 2,  3) order_of_pass,
                        DECODE (late_entry_flag,  'N', 1,  'Y', 2,  3)      order_of_entry,
                        DECODE (lmt_appearance.lov_code,  'REGULAR', 1,  'PRIVATE', 2,  3) order_of_appearance,
                        DECODE (adt.is_ttd_employ_ward,  'Y', 1,  'N', 2,  3) order_of_ttd_emp,
                        DECODE (adt.is_balbhavan_studnt,  'Y', 1,  'N', 2,  3) order_of_schooling,
                        act.attribute3 course_qe_priority,
                        adt.is_local_canditature_valid,
                        adt.is_ttd_emp_ward_info_valid,
                        adt.is_sv_bm_student_info_valid,
                        adt.is_social_ctgry_info_valid,
                        DECODE(adt.college_status_flag,'B',1,'O',2,'N',3) order_of_status
                FROM 
                        xxadm.xxadm_applicant_details_tbl    adt,
                        xxadm.xxadm_applicant_coursprefs_tbl act,
                        xxadm.xxadm_college_master_tbl       cmt,
                        xxadm.xxadm_course_master_tbl        crmt,
                        xxadm.xxadm_medium_master_tbl        mmt,
                        xxadm.xxadm_lov_master_tbl           lmt_gender,
                        xxadm.xxadm_lov_master_tbl           lmt_pass,
                        xxadm.xxadm_lov_master_tbl           lmt_appearance,
                        xxadm.xxadm_lov_master_tbl           lmt_religion,
                        xxadm.xxadm_lov_master_tbl           lmt_education_type
                WHERE
                        adt.applicant_id = act.applicant_id
                AND     act.college_id = cmt.college_id
                AND     act.course_id = crmt.course_id
                AND     act.medium_id = mmt.medium_id
                AND     adt.applicant_gender = lmt_gender.lov_id
                AND     adt.pass_type = lmt_pass.lov_id
                AND     adt.appearance_type = lmt_appearance.lov_id
                AND     adt.religion = lmt_religion.lov_id
                AND     cmt.education_type = lmt_education_type.lov_id
                AND     adt.status = 'Active'
                AND     1 = (CASE 
                                WHEN act.hostel_required = 'Y'
                                        THEN (CASE
                                                     WHEN    adt.distance_in_kms >20
                                                     AND     lmt_religion.lov_code = 'HINDU'
                                                     AND     adt.caste_category NOT IN (
                                                                     SELECT  /*+ QB_NAME(SEL$4) */
                                                                             category_id
                                                                     FROM    xxadm.xxadm_category_master_tbl
                                                                     WHERE   category_code IN ('BACKWRDC', 'BACKWRDE')
                                                             )
                                                             THEN 1
                                                             ELSE 2 
                                              END
                                             )
                                        ELSE 1 
                               END
                              )
                AND     1 =  (CASE 
                                WHEN act.hostel_required  = 'Y'
                                        THEN    (CASE 
                                                        WHEN    (    lmt_education_type.lov_code = 'COEDUCOL' 
                                                                 AND mt_gender.lov_code = 'FEMALE'
                                                                )
                                                                THEN 2
                                                                ELSE 1 
                                                 END
                                                )
                                        ELSE 1 
                               END
                              )
                AND     adt.course_applied_for = 'DEG' 
                AND     (adt.college_status_flag IS NULL OR adt.college_status_flag IN ('N','T','C','B','O')) 
                AND     act.preference_order <= NVL( -- > comment to avoid WordPress format issue
                                (SELECT  /*+ QB_NAME(SEL$5) */ 
                                         preference_order 
                                 FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                 WHERE   act1.applicant_id = adt.applicant_id 
                                 AND     status_flag IN('B','T','C','O') 
                                 ), act.preference_order 
                        )
                AND     act.preference_order >=  NVL(
                                (SELECT /*+ QB_NAME(SEL$6) */
                                        preference_order
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act2 
                                WHERE   act2.applicant_id = adt.applicant_id
                                AND     status_flag  = 'C'
                                ), act.preference_order
                        )
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$7) */
                                        act3.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act3
                                WHERE   act3.applicant_id = adt.applicant_id 
                                AND     act3.status_flag  = 'O'
                        ) 
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$8) */
                                        act1.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                WHERE   act1.applicant_id = adt.applicant_id 
                                AND     act1.status_flag IN ('C','B')
                                AND     act1.attribute1 IN (
                                                SELECT  /*+ QB_NAME(SEL9) */
                                                        category_id 
                                                FROM    xxadm.xxadm_category_master_tbl 
                                                WHERE   category_code IN ('OPENMERT')
                                        ) 
                                AND     NVL(act1.attribute7,'N') = 'N'
                        ) 
                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 1-1

1.3 This query first came to light in a thread on the Oracle Developer forums with an extract from a tkprof file showing that it had executed 842,615  times. That number should be ringing alarms and flashing warning lights, but if we assume that there really is no way of doing some sort of batch processing to get through the data we need to do a little bit of arithmetic to see how much of a threat this query is and how much is matters.

1.4 For every extra 0.01 seconds it takes to execute this query the total run-time goes up by8,426 seconds, which is 2 hours and 20 minutes. If the average execution time is a mere 0.06 seconds you’ll be at it all night long – and it will be a long, long night.


 

2.0 Overview

2.1 Before we look at the execution plan let’s take a moment to pick out a few points from the query. You may want to re-open this post in a separate window so that you can switch easily between the SQL and my comments.

2.2 We start off with a simple select from an inline view – and if we replace the inline view the simple “object name” V_THING we get the following query:


select  count(applicant_id)
from    V_THING
where   applicant_id = :p_applicant_id
;

Figure 2-1

2.3 This should prompt two questions

  • First, how far into the view V_THING will the optimizer be able to push that predicate, possibly the entire content of the view will have to be constructed before the predicate can apply, possibly the nature of the view is such that the optimizer could do a simple filter pushdown to apply the predicate very early. That still leaves (or leads on to) the question of whether the optmizer might then be able to generate further uses of the predicate through transitive closure.
  • Secondly, if the view V_THING is a multiable view will we be able to work out which table applicant_id comes from by the time it becomes visible in the view.  It’s possible that changing the table from which applicant_id comes will change the execution plan.

2.4 Digging down one layer we see that our V_THING is also a simple select from an inline view – let’s call it V_ANOTHER – so if we again forget about the complexity of the inner view we’re looking at a query that goes:


select  /*+ QB_NAME(SEL$1) */
        count(applicant_id)
from    (
        select  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                null    allotment_type
        from
                V_OTHER
        where
                rownum <=  :p_seats
        )       V_THING
where 
        applicant_id = :p_applicant_id
;

Figure 2-2

2.5 A couple of details hit the eye when you look at this: Why are we selecting 17 columns from a complex view, and then counting only one of them and discarding the rest. Let’s hope the optimizer is smart enough to discard the excess columns at the earliest possible moment (which might allow it to do some index-only accesses instead of visiting tables for columns we don’t really need).

2.6 Stranger still, one of those columns is a delberately generated NULL! This hints at the possibility that the client code is doing something like “count how many query X will give me, then run query X”– giving us the pattern “select count(*) from (inlne query X); execute query X” Maybe this whole query is a waste of time, but if it can’t be avoided maybe it should be edited down to the smallest  query that will get the correct count.

2.7 Another thought about this layer of the query, the predicate “rownum <= :bind_variable” may be pushing the optimizer into first_rows(n) optimization and this might be enough to make it choose a bad execution plan. I’d have to check, and check for specific versions, but off the top of my head I think that when comparing rownum with a bind variable the optimizer will optimizer for first_rows(10) unless there’s some other reason for choosing anything else.)

2.8 I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

2.9 Let’s dig one layer deeper before we get into the complex stuff. Here’s a version of the code that expands V_OTHER in an extremely stripped down form:


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        {lots of columns}
                FROM 
                        {lots of tables}
                WHERE
                        {lots of predicates}
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                )  
        WHERE
                ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 2-3

2.10 At this point we can start to see reasons for the layering of inline views – we need to select data in the right order before we apply the rownum predicate; as for the excess columns in the select list – even if we selected only the applicant_id in the outer layers the optimizer would still have to acquire the five columns in the order by clause.

2.11 But this emphasises the oddity of the query. If we’re only counting applicant_id to see whether we got :p_seats or fewer rows for a specific applicant_id why does the order matter – surely the order will only matter when we “repeat” the query to get the actual rows (if that’s what we do). As it is, to count a small number of rows this query might have to fetch and sort a large number, then discard most of them. (Some statistics from other posts by the OP indicated that the underlying query might fetch anything between a few hundred and a couple of thousand rows. This particular run showed the query finding 171 rows to sort and then restricting the rowsource to the first two sorted rows)


 

3.0 The Main Course

3.1 To make it a little easier to discuss the detail of the execution plan I’ve laid it out in a small number of sections corresponding to the final (outline_leaf) query blocks the optimizer generated. To do this I applied two sets of information – the Query Block / Object Alias information (which follows the body of the plan) and any appearances of the VIEW operation in the plan.


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                                |      1 |        |   574 (100)|      1 |00:00:00.02 |    3822 |       |       |         |
|   1 |  SORT AGGREGATE                             |                                |      1 |      1 |            |      1 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
select count(applicant_id) - above
select where rownum less than - below
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  2 |   VIEW                                      |                                |      1 |      1 |   574   (2)|      0 |00:00:00.02 |    3822 |       |       |         |
|*  3 |    COUNT STOPKEY                            |                                |      1 |        |            |      2 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   4 |     VIEW                                    |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |       |       |         |
|*  5 |      SORT ORDER BY STOPKEY                  |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |  2048 |  2048 | 2048  (0)|
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |       |       |         |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |       |       |         |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |       |       |         |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
join index transformation query block SEL$082F290F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Continuation of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |  1245K|  1245K| 1277K (0)|
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |       |       |         |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |       |       |         |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |       |       |         |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |       |       |         |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |       |       |         |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unnested subquery SEL$A75BE177 (from sel$8, sel$9)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |       |       |         |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery SEL$F665FE1B (from sel$4 with tranform for index join)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2        / from$_subquery$_001@SEL$1
   3 - SEL$2
   4 - SEL$7E0D484F / from$_subquery$_002@SEL$2
   5 - SEL$7E0D484F
  14 - SEL$082F290F / LMT_GENDER@SEL$3
  15 - SEL$082F290F
  16 - SEL$082F290F / indexjoin$_alias$_001@SEL$082F290F
  17 - SEL$082F290F / indexjoin$_alias$_002@SEL$082F290F
  21 - SEL$7E0D484F / CMT@SEL$3
  22 - SEL$7E0D484F / CMT@SEL$3
  23 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  24 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  25 - SEL$7E0D484F / ACT@SEL$3
  26 - SEL$7E0D484F / ADT@SEL$3
  27 - SEL$7E0D484F / ACT3@SEL$7
  28 - SEL$7E0D484F / ACT3@SEL$7
  29 - SEL$A75BE177 / VW_SQ_1@SEL$67DC521B
  30 - SEL$A75BE177
  31 - SEL$A75BE177 / ACT1@SEL$8
  32 - SEL$A75BE177 / ACT1@SEL$8
  33 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  34 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  35 - SEL$7E0D484F / LMT_PASS@SEL$3
  36 - SEL$7E0D484F / LMT_PASS@SEL$3
  37 - SEL$7E0D484F / LMT_APPEARANCE@SEL$3
  38 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  39 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  40 - SEL$5        / ACT1@SEL$5
  41 - SEL$5        / ACT1@SEL$5
  42 - SEL$6        / ACT2@SEL$6
  43 - SEL$6        / ACT2@SEL$6
  44 - SEL$F665FE1B / XXADM_CATEGORY_MASTER_TBL@SEL$4
  45 - SEL$F665FE1B
  46 - SEL$F665FE1B / indexjoin$_alias$_001@SEL$F665FE1B
  48 - SEL$F665FE1B / indexjoin$_alias$_002@SEL$F665FE1B

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_optimizer_dsdir_usage_control' 0)
      OPT_PARAM('_optimizer_adaptive_plans' 'false')
      OPT_PARAM('_optimizer_gather_feedback' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F665FE1B")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$A75BE177")
      PUSH_PRED(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B" 16 15)
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$9D10C90A")
      UNNEST(@"SEL$9")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$C04829E0")
      ELIMINATE_JOIN(@"SEL$3" "CRMT"@"SEL$3")
      ELIMINATE_JOIN(@"SEL$3" "MMT"@"SEL$3")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")
      INDEX_RS_ASC(@"SEL$7E0D484F" "CMT"@"SEL$3" ("XXADM_COLLEGE_MASTER_TBL"."COLLEGE_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      FULL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      FULL(@"SEL$7E0D484F" "ADT"@"SEL$3")
      INDEX_JOIN(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_CODE") ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "ACT3"@"SEL$7" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      NO_ACCESS(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      LEADING(@"SEL$7E0D484F" "CMT"@"SEL$3" "LMT_EDUCATION_TYPE"@"SEL$3" "ACT"@"SEL$3" "ADT"@"SEL$3" "LMT_GENDER"@"SEL$3" "ACT3"@"SEL$7" "VW_SQ_1"@"SEL$67DC521B"
              "LMT_PASS"@"SEL$3" "LMT_APPEARANCE"@"SEL$3" "LMT_RELIGION"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "ADT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT3"@"SEL$7")
      USE_NL(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      USE_NL(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      NLJ_BATCHING(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      PQ_FILTER(@"SEL$7E0D484F" SERIAL)
      INDEX_RS_ASC(@"SEL$A75BE177" "ACT1"@"SEL$8" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      INDEX_RS_ASC(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID"))
      LEADING(@"SEL$A75BE177" "ACT1"@"SEL$8" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      USE_NL(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      INDEX_RS_ASC(@"SEL$6" "ACT2"@"SEL$6" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."STATUS_FLAG"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$6" "ACT2"@"SEL$6")
      INDEX_RS_ASC(@"SEL$5" "ACT1"@"SEL$5" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."COLLEGE_ID"
              "XXADM_APPLICANT_COURSPREFS_TBL"."COURSE_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."MEDIUM_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."HOSTEL_REQUIRED"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$5" "ACT1"@"SEL$5")
      INDEX_JOIN(@"SEL$4" "XXADM_CATEGORY_MASTER_TBL"@"SEL$4" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID") ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_CODE"))
      END_OUTLINE_DATA
  */

Figure 3-1

3.2 There’s no rigid rule I can give you about an approach for looking for query blocks and transformations, but it’s worth checking to see which of your original query blocks still exist in the final execution plan and which have disappeared thanks to some transformation.

3.3 If we look down the Query Block Name list above we can see that sel$1, sel$2, sel$5 and sel$6 have “survived” the machinations of the optimizer. We’ve already noted that sel$1 and sel$2 are simply “select from {inline view}” as far as the optimizer is concerned; sel$5 and sel$6 are simple subqueries that appeared as filter subqueries in the original query text and have kept that status by the end of the optimizer’s transformation stage.

3.4 Tracking down the other query blocks that we named we can see the following:

  • sel$3 – most of its tables appear in a new query block called SEL$7E0D484F but one of them appears in a query block called SEL$082F290F; a closer look at SEL$082F290F shows us that it ranges from operations 14 to 17 and holds a “single table” transformation where the optimizer has chosen to use an index join of two indexes on the xxadm_lov_master_tbl rather than doing a tablescan. The index join is represented as a VIEW of a hash join, hence the separate query block. Another little detail we note – the xxadm_lov_master_tbl appears five times in the query, so we need to know which occurrence this is: fortunately the Object Alias information tells us at operation it’s the LMT_GENDER alias.
  • sel$4 is the scalar subquery inside a CASE expression, involving table xxadm_category_master_tbl. We can find the table name (which hasn’t been given an alias) and the query block name in the Object Alias information at operation 44 in a query block called SEL$F665FE1B. There are two points of interest about this query block – it has come into existence because it’s another example where the optimizer has used an index join to avoid a full tablescan; and it has been used in a filter subquery (the parent of operation 44 is the FILTER at operation 6).
  • sel$7 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl with an alias of act3. The Query Block / Object Alias information tells us that ACT3@SEL$7 appears at operations 27 and 28 – and when we track up the plan from operation 27 we see that it is the second child of operation 12 which is a nested loop anti. The optimizer has unnested the subquery and turned it into an anti join as one of the steps that produced query block SEL$7E0D484F
  • sel$8 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl aliased as act1 (OUCH – that’s the second time the alias act1 has appeared in this query!). But the subquery had it’s own subquery, named sel$9, against xxadm_category_master_tbl which didn’t have an alias (more OUCH!). When we search for ACT1@SEL$8 and XXADM_CATEGORY_MASTER_TBL@SEL$9 in the Query Block / Object Alias information we find that they both appear in a query block called SEL$A75BE177 which ranges from operations 29 to 34, and a check of the plan shows that operation 29 is a view pushed predicate of a view called VW_SQ_1 – a name that identifies the view as an internally generated non-mergeable view that was created as the result of unnestingn a subquery. The view contains a join between xxadm_applicant_coursprefs_tbl and xxadm_category_master_tbl, so we can say that the optimizer has unnested our sel$9 to create a NOT IN subquery that is a join subquery, then it has unnested again to produce an inline non-mergeable view, and it has then allowed “join predicate pushdown (JPPD)” so that the non-mergeable view can be the second table of a nested loop. To confirm the last comment we track up the plan to discover that operation 29 is the second child of operation 11 which is, indeed, a nested loop and (since the subquery was a NOT IN subquery) a nested loop anti.
  • sel$9 – see sel$8.

3.5 As you can see, when you’re using the execution plan output to identify what’s happened to the individual query blocks from your original query you’re likely to jump around from the query to the plan body, to the Query Block / Object Alias information in a fairly arbitrary way.

3.6 I’ll close this chapter of the analysis with a quick look at the Outline Data – in particular two of the hint types that appear: OUTLINE() and OUTLINE_LEAF() – which I’ve extracted and sorted for ease of reading:

      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")

      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$7E0D484F")
      OUTLINE(@"SEL$9D10C90A")
      OUTLINE(@"SEL$C04829E0")

      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      OUTLINE_LEAF(@"SEL$A75BE177")
      OUTLINE_LEAF(@"SEL$F665FE1B")

Figure 3-2

3.7 In this context an OUTLINE() is a query block that existed at some point in the optimization sequence that got us to the final execution plan but did not appear as a query block in the final plan. In the previous paragraphs we described how the original query blocks sel$3, sel$4, sel$7, sel$8 and sel$9 disappeared through transformation so (apart from sel$4 which is a bit of an anomaly that I’ll pick up in a moment) they appear in an   OUTLINE() hint. I described how sel$9 would have been unnested into sel$8 to create a join which would still have been a filter subquery until that too was unnested – that join subquery would have been one of the five OUTLINE() query blocks above with the 8 character hexadecimal names.

3.8 An OUTLINE_LEAF() is a “final” query block – one that is present in the final execution plan. If you ignore sel$4, you’ll see that the other 8 query blocks correspond to the 8 query block names that appear in the Query Block Name / Object Alias information. The appearance of sel$4 as an OUTLINE_LEAF() looks like an anomaly to me; I can’t think of a good reason why it should be in the list.


 

4.0 Simplify

4.1 We’re just about ready to do a full read-through of the execution plan. We’ve taken the two outer layers off the query/plan because they represent such simple in-line views, and we’ve discussed the disappearance of some of our initial query blocks and identified and explained all the different query blocks that have appeared in the final plan. So with the extra bits of information in hand let’s take a couple more steps in simplifying the execution plan.

4.2 First I’ll replace each of the three VIEW operations and their descendants with a single line that says “this is a rowsource”. I’ll distinguish between the two variants of the operation (VIEW and VIEW PUSHED PREDICATE) by calling them BULK ROWSOURCE and PRECISION ROWSOURCE respectively: it’s not a perfect description but broadly speaking we expect a VIEW to be called once by its parent to produce a “large” data set and we expect a VIEW PUSHED PREDICATE to be called many times by its parent to produce (each time) a “small” data set using extremely efficient methods.

4.3 Then I’ll remind you that a multichild FILTER operation calls the first child once to supply a rowsource then, for each row returned, calls the other child operations in turn to determine whether or not to keep the row from the first child. This means we can examine just the first child in isolation to see how the optimizer wants to get the driving bulk of the data (and we can examine the other children later, bearing in mind how often they might need to be called and checking how efficient each call is likely to be).

4.4 Finally I’ll note that query block SEL$7E0D484F (the “real main query” as I labelled it in the plan above) starts: “VIEW -> SORT ORDER BY STOPKEY -> FILTER” – after we’ve filtered our data we simply sort it with the intention of keeping only the “top few” rows. That part of the plan is so simple we’ll ignore those lines of the plan and focus on just the first child of the FILTER- leaving the core plan looking like this:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Figure 4-1

4.5 We need to examine a plan of only 25 lines with no complicated bits (because we’ve hidden any bits that looked complicated and will get back to them later). The thing now looks like a single query block which means we can think “First Child First”, so:

  • operation 7 calls operation 8 which calls operation 9 which calls operation 10 which calls operation 11 which calls operation 12 which calls operation 13 which calls operation 14 which is the first operation to produce a rowsource (though we don’t care how at present).
  • Operation 13 is a hash join, so the rowsource from operation 14 becomes its “build” table, and we call operation 18 to supply the “probe” table.
  • Operaion 18 calls operation 19 which calls operation 20 which calls operation 21 which is a table access by rowid that has to call operation 22 to get rowids. So operation 22 supplies the second rowsource (in our collapsed plan). It’s an INDEX UNIQUE SCAN of the index that appears (judging from its name)to be the primary key index of a table, so operation 22 will produce at most one rowid that is passed up to operation 21 that will use that rowid to get the one row from the table. (Operation 21 supplies the 3rd rowsource).
  • Operation 21 passes a row up to operation 20 which calls operation 23 which calls operation 24 (4th rowsource) to do another unique scan of a unique index to get a rowid to pass up to operation 23 to find (and test) a row from the table (5th rowsource) which it passes up to operation 20 to do the join and pass the result up (6th rowsource) to operation 19.
  • Operation 19 calls its second child (operatiomn 25) for each row it receives – but because of the primary key/unique scans the optimizer knows that the first child will return at most one row and sees no problem with using a full tablescan as the second child of the nested loop. So the tablescan of XXADM_APPLICANT_COURSPREFS_TBL is the 7th rowsource. Any rows survinging the join are passed up to operation 18 (making operation 19 the 8th rowsource).
  • Operation 18 uses the incoming rowsource to build its in-memory hash table, and calls operation 26 to supply its second (probe table) rowsource. Operation 26 is the the 9th rowsource, executing a full tablescan of XXADM_APPLICANT_DETAILS_TBL and passing the results up to operation 18, which performs the join and passes the results up to its parent, making it the 10th rowsource.
  • Operation 18 was the second child of the hash join at operation 13, which now uses the incoming data as the probe table to generate the 11th rowsource and pass the results up to operation 12.
  • Operation 12 is a nested loop anti-join and operation 13 has just supplied it with its first child rowsource, so operation 12 will now call its second child once for each row in the first child. Its second child is operation 27 (table access by rowid) which calls its first child (operation 28 index range scan) which fetches rowids from the index passes them up to operation 27 which fetches table rows and passes them up to operation 12. So operation 28 supplies the 12th rowsource, operation 27 the 13th. Since operation 12 is an ANTI join a row from the first child survives if operation 27 doesn’t find a row to return. Operation 12 passes any survivors (14th rowsource) up to operation 11.
  • Operation 11 is another ANTI-join nested loop so for each row from operation 12 it will call its second child, passing in values from its first child to drive an efficient access path and forwarding any rows from the first child where the second child returns no rows. Its second child is operation 29 – our “precision rowsource” – and we can postpone worrying about the exact details of how that works. Operation 29 produces the 15th rowsource in our reduced plan, which it passes up to operation 11.
  • Operation 11 is the first child of the nested loop at operation 10 – and from this point onwards we have 4 nested loop joins and we can simply list through the order of operation. Operation 11 produces the 16th rowsource, then Operation 10 calls its second child (operation 35) which calls operation 36 which passes rowids (17th rowsource) up to operation 35 which passes rows (18th rowsource) up to operation 10.
  • Operation 10 passes its data (19th rowsource) up to operation 9 which calls operation 37 as its second child. Operation 37 (20th rowsource) passes index entries up to operation 9 which performs the join and passes the result (21st rowsource) up to operation 8.
  • Operation 8 calls operation 38 as its second child. Operation 38 (22nd rowsource) passes index entries up to operation 8 which performs the join and passes the result (23rd rowsource) up to operation 7.
  • Operation 7 calls operation 39 as its second child. Operation 39 (24th rowsource) passes index entries up to operation 7 which performs the join and that’s the final (25th) rowsource as far as our reduced execution plan is concerned.

4.6 Here’s the reduced plan, cut back to minimum width, with the order of rowsource generation included:


-----------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           |  Order |
-----------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |     25 |
|   8 |         NESTED LOOPS                        |                                |     23 |
|   9 |          NESTED LOOPS                       |                                |     21 |
|  10 |           NESTED LOOPS                      |                                |     19 |
|  11 |            NESTED LOOPS ANTI                |                                |     16 |
|  12 |             NESTED LOOPS ANTI               |                                |     14 |
|* 13 |              HASH JOIN                      |                                |     11 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |
|* 18 |               HASH JOIN                     |                                |     10 |
|  19 |                NESTED LOOPS                 |                                |      8 |
|  20 |                 NESTED LOOPS                |                                |      6 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      3 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      2 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      5 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      4 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      7 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      9 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |     13 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |     12 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |     15 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |     18 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |     17 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |     20 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |     22 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |     24 |
-----------------------------------------------------------------------------------------------

Figure 4-2

4.7 Once we’ve got this image sorted out we still have a few details to fill in before we’ve gpt the full picture of the execution plan.

  • What does Oracle do to generate the “bulk rowsource” at operation 14
  • What does Oracle do on every call to the “precision rowsource” at operation 29
  • We know that the reduced plan above is the first child of a FILTER operation and if we refer back to previous “real main query” we know that there are three further children to the FILTER that might have to execute once for each row produced by the first child. So that’s another 3 (small) query blocks we need to examine in detail.
  • We need to bring in the predicates to see how the optimizer has used them
  • We need to look at the Starts and A-Rows to compare what happened with the optimizer’s expectation
  • We need to look at disk reads and buffer gets to see how much excess work we did to acquire the data


 

5.0 Filling the Gaps

5.1 After getting the overall shape of the query’s execution we can go back and examine the bits we have so far postponed view. There are three pieces to consider

  • the “bulk rowsource” at operation 14 that was the first child of a hash join.
  • the “precision rowsource” at operation 29 that was the second child of a nested loop anti-join
  • the filter subqueries that were the 2nd, 3rd and 4th children of the explicit FILTER at operation 6

5.2 We start with the “bulk rowsource” that was a VIEW with a highly suggestive name of index$_join$_008. This shows Oracle finding a way of selecting data from a table without visiting the table, using a couple of indexes as if they were skinny tables that could be scanned and joined. In effect Oracle has transformed “select key1, key2 from tableX” into something like:


select  ix1.key1, ix2.key2
from
        (select key1, rowid r1 from tableX) ix1,
        (select key2, rowid r2 from tableX) ix2
where
        ix1.r1 = ix2.r2
;

5.3 This strategy can only be used when Oracle knows that every relevant row will appear in the two indexes – which basically means you have to be careful about NULLs. In the simplest case you might have to have a NOT NULL constraint on at least one column in each of the target indexes; or a predicate in each inline view that ensures that Oracle can use just the index without losing some of the rowids that it needs. After acquiring key values and rowids from each index in turn, Oracle then joins the two sets of data using a hash join. Technically there is no limit to the number of indexes that Oracle can join in this fashion, the choice of strategy depends largely on how big the table is compared to the sum of the sizes of the indexes that could be used instead; practically (as in our main query) it’s rare to see more than two indexes used for this “index join” mechanism.


join index transformation query block SEL$082F290F, with parent
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.4 Moving on to the “precision rowsource” that appears in the original plan as a VIEW PUSHED PREDICATE. This means that Oracle has optimised a non-mergeable view allowing for an input value from its parent. If you take operations 30 to 34 in complete isolation it’s just a simple nested loop join and you might wonder why the view is non-mergable. But when you look back at the parent you discover that it’s an ANTI-join, so Oracle has to say (for each driving row) “join these two tables and see if you get any rows as a result”, it doesn’t have a generic mechanism for doing two separate but consecutive (anti-)join operations at this point.


Unnested subquery SEL$A75BE177 (from sel$8, sel$9) with parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             Driving Rowsource               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.5 Finally we have the three filter subqueries, which I’ve shown with their parent FILTER and a one-liner for the driving rowsource. For each row in operation 7 we call operations 40, 42 and 44 in turn although the parent filter may decide after calling operation 40 that it doesn’t need to call the other two and can simply move on to the next row from operation 7. Similarly the filter might call operations 40 and 42 and not need to call operation 44. It’s also possible that, thanks to scalar subquery caching, Oracle can say “I’ve already called operation 40 for this value, so I know the result and don’t need to call it again”. When we look at the Starts and A-Rows columns for the three operations we will get some idea of how the “notional” execution sequence turned into an actual workload.


Filter subqueries SEL$5, SEL$6 and SEL$F665FE1B, with their parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        "Simplify" Plan                      |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.6 The operations for the first two subqueries (40,41) and (42,43) shouldn’t need any explanation. The third set of operations (44 to 48) is a little more complex. In fact it’s similar to the mechanism that appeared in our first “bulk rowsource” – we have Oracle turning a table access into an “index (only) join”, collecting key values and rowids from two different indexes and doing a hash join on the rowids. The plan is just a little more subtle, though – instead of getting their data from “index fast full scans”, one index is accessed by a range scan (possibly using a predicate value passed in by the filter operation) and the other is using an INLIST ITERATOR with unique scan. In the second case we must be handling a predicate of the form: “key_column in {list of values}”. (Taking a first look at the Starts column we can see that the INLIST ITERATOR runs 6 times calling the INDEX UNIQUE SCAN a total of 12 times so it’s fairly obvious that there are exactly two elements in the list in this case.)


 

6.0 Looking at the Numbers

6.1 Rather than walking through the entire plan again putting the pieces together I’m going to assume that I can carry on to the next stage of analysis, and assume that the pieces will fall into place as we talk about some of the critical numbers.

6.2 It’s probably best to open a copy of the note in a separate window so that you can examine the plan and read my comments at the same time. Starting at the top we apply “first child first”, noting in passing that the examples of the VIEW operations we have in this plan don’t have a significicant impact on the order of operation as we work down the plan – they simply remind us that there are “non-mergeable” parts of the plan. Eventually we get to operation 13 which is a hash join and operations 14 through 17 give us the build (first) table – a quick check shows 127 rows for estimated (E-rows) and actual (A-rows); then we see the probe (second) table is itself a hash join returning 182 rows (estimate 487, so in the right ballpark) and the hash join at operation 13 produces a result set of 182 rows.

6.3 At this point a quick check back UP the plan tells us that the 182 rows survive all the way up to operation 6, where a FILTER eliminates just a few of them; then the result set drops to just 2 rows at operation 5. Then a quick check of operation 5 (with a cross reference to the query) reminds us that we have an inline view that does an “order by” followed by a “rownum < :bind” predicate – so the sort order by stop key at operation 5 is sorting all the data but only passing on the first two rows: so there’s no way we could have modified the join order to eliminate the redundant rows sooner.

6.4 So we see that we get the right volume of data at about the right moment in the plan, and probably can’t do much to avoid the volume of data access – but let’s check how we got the 182 rows at operation 18. Using “first child first” – we see a nested loop joining two “single row by unique index” rowsources, then a nested loop to a full tablescan of XXADM_APPLICANT_COURSPREFS_TBL. The knee-jerk reaction to a “full tablescan as the second table in a nested loop” is that it must be bad – but in this case we know that it will happen no more than once, so we don’t need to panic immediately. Applying a little more thought (and arithmetic): we note that the tablescan returns 209 rows (estimated 478) using 879 buffer gets; that’s not an extreme number of buffer gets per row (especially if the 478 is a reasonably accurate average for the operation). We’ll postpone worrying about the tablescan for the moment but take note that it might be worth revisiting.

6.5 The second child to the hash join at operation 18 is another full tablescan (of XXADM_APPLICANT_DETAILS_TBL) which requires 1,141 buffer gets. Again, though this might not be a bad move, since the alternative would be an actual 209 index probes (or an estimated average 478 index probes). The workload is, again, in the right ballpark but, again, something we might come back to. In fact in both tablescans it might be more important to worry about the work done at each row by the row predicates rather than worrying about the fact that the operation is a tablescan; a predicate involving a CPU-intensive PL/SQL function might be the thing that makes 478 index probes to 478 rows a better option than a tablescan of (say) 100,000 rows.

6.6 From this point onwards (operation 13) we have 6 nested loop joins (though the first two are anti-joins) so it’s “call the second child for each row in the first child” all the way down, and we’ve seen that we don’t eliminate any data as we go. If we want to make the execution plan any faster by “local” tweaking we’ll just have to make sure each “second child” operation is as efficient as possible, which tends to mean looking for cases where we supply a lots of rows (rowids) from an index range scan but find that we then discard the table rows after visiting the table. So …

6.6.1 Operation 12 – nested loop anti join – calls 28/27 (table access by unique index). We find an index entry on each call, but the table row doesn’t qualify – which is what we want for a “successful” anti-join. We could make this a little more efficient by adding a column to the (already unique) index and avoid visiting the table.

6.6.2 Operation 11 – nested loop anti join – calls operation 29 (view with pushed predicate) which operates a high-precision nested loop join at operation 30. The first child of operation 30 is a table access by index, but the table never returns a row (which is nice) so we never call the second child of operation 30. Again we could make this view access a little more efficient by adding extra columns to (already unique) indexes to avoid any need to visit tables.

6.6.3 Operations 10, 9, 8, 7 – nested loop joins – operating very efficiently, some not even visiting tables to acquire data. The order of operation at this point is: 11, 36, 35, 10, 37, 9. 38, 8, 19, 7. And then we get to the FILTER operation, which has 3 subqueries to operate in turn,

6.7 Operation 6 executes in turn the two subqueries we named sel$5 and sel$6 respectively, the first one 182 times, the second one 171 times. Since operation 6 produces 171 rows it seems likely that the initial 182 rows dropped to 171 rows as a consequence of the sel$5 subquery resulting in the smaller number of calls to the sel$6 subquery. It’s worth noting here that operation 41, the index range scan, returned 1,450 rowids, but the subsequent table accesses returned a total of only 29 rows after an additional 316 buffer gets (507 – 191). There may be an opportunity here (yet again) for adding an extra column to the index so that we visit the table only 29 times rather than visiting it 1,450 times. In fact, though it’s not obvious in this SQL Monitor report, the indications from other examples from the same query suggested that this subquery was the single more resource intensive part of the plan.

6.8 The last subquery executed by Operation 6 is the one identified by query block sel$4. The sub-plan starts with a VIEW operation because the table (identified as originating in sel$4 in the query block / object alias information) is “accessed” by way of an index hash join. This subquery is executed only 6 times. Given that there are (at least) 171 rows for which this subquery could be called this means one of two things.  First we can from the query text that this subquery is part of a complex CASE expression – so maybe the simple conditions in the expression mean we rarely need to call the subquery;. secondly it could mean the run-time engine has managed to take advantage of scalar subquery caching and the query doesn’t have many distinct inputs for this subquery – and when we check the predicate section we can see the relevant predicate for a query against the XXADM_CATEGORY table was “category_id = {correlation variable}” which has the look of a table with a few rows and distinct ids..

6.9 In summary, then, there may be a few “localised” tweaks that we can to do improve performance of this plan – largely by adding columns to existing indexes and using them effectively. There are indications that one of the filter subqueries might be a particularly good target for this type of tweak; after which we might want to look at what could be done with the two tablescans which are in that grey area where it’s not easy to decide whether an indexed access or a tablescan is the better option. We have to remember, though, that this query was originaly reported as executing 842,000 times – so maybe we need to do much better than just a little tweaking.


 

7.0 Predicate Information

7.1 Why are we running a query 842,000 times in a batch? The right way to find an answer to that question is to ask the right person – if you can find them. A slightly more long winded way is to find out what is driving the 842,000 executions – and you might be able to do that if you have the full tkprof output from the trace file. (Hint: statement X runs 842,000 times, and if statement Y executes 13 times and produces 842,000 rows maybe Y is driving X.) Sometimes, though, you don’t have the people, or the full data set, or the access you need, so you might take a look at the query and the predicates and start making some reasonable guesses.

7.2 Here’s the tail end of the query, conveniently capturing all the input bind variables:

                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id

7.3 There are two things we can note about these predicates – first they don’t follow the pattern of “:Bnnn” so they’re not from a statement statically embedded in PL/SQL, secondly the names are intelligible and meaningful, so we might draw some tentative conclusions from them, in particular how many distinct values there might be and how this lead to 842,000 executions of the query.

7.4 The variable name that stands out is :p_applicant_id. We seem to be looking at a query about applicants for courses at colleges – and the latter pair probably gives us a relatively small number of combinations. The variable :p_hostel_required is surely just going to be a “yes/no/maybe/null” option. The :p_medium_id is a bit of a puzzle but scanning through the query it looks like it’s the id for the “medium of study” so probably another variable with a small number of values. So where in the plan do these variables get used? Here’s the full list of predicates from the plan, followed by an extra few lines showing just the predicates that reference the bind variables:

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   6 - filter((    "ACT"."PREFERENCE_ORDER"<=NVL(,"ACT"."PREFERENCE_ORDER") -- > comment added to avoid wordpress format issue
               AND "ACT"."PREFERENCE_ORDER">=NVL(,"ACT"."PREFERENCE_ORDER") AND CASE "ACT"."HOSTEL_REQUIRED"
              WHEN 'Y' THEN CASE  WHEN ("ADT"."DISTANCE_IN_KMS">20 AND "LMT_RELIGION"."LOV_CODE"='HINDU' AND  IS NULL) THEN 1 ELSE 2 END  ELSE 1 END =1))
  13 - access("ADT"."APPLICANT_GENDER"="LMT_GENDER"."LOV_ID")
       filter(CASE "ACT"."HOSTEL_REQUIRED" WHEN 'Y' THEN CASE  WHEN ("LMT_EDUCATION_TYPE"."LOV_CODE"='COEDUCOL' AND "LMT_GENDER"."LOV_CODE"='FEMALE') THEN 2 ELSE 1 END
               ELSE 1 END =1)
  15 - access(ROWID=ROWID)
  18 - access("ADT"."APPLICANT_ID"="ACT"."APPLICANT_ID")
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  24 - access("CMT"."EDUCATION_TYPE"="LMT_EDUCATION_TYPE"."LOV_ID")
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))
  26 - filter(("ADT"."STATUS"='Active' AND "ADT"."COURSE_APPLIED_FOR"='DEG' AND (INTERNAL_FUNCTION("ADT"."COLLEGE_STATUS_FLAG") OR "ADT"."COLLEGE_STATUS_FLAG" IS
              NULL)))
  27 - filter("ACT3"."STATUS_FLAG"='O')
  28 - access("ACT3"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT"."PREFERENCE_ORDER"="ACT3"."PREFERENCE_ORDER")
  31 - filter((INTERNAL_FUNCTION("ACT1"."STATUS_FLAG") AND NVL("ACT1"."ATTRIBUTE7",'N')='N'))
  32 - access("ACT1"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT1"."PREFERENCE_ORDER"="ACT"."PREFERENCE_ORDER")
  33 - filter("CATEGORY_CODE"='OPENMERT')
  34 - access("CATEGORY_ID"=TO_NUMBER("ACT1"."ATTRIBUTE1"))
  36 - access("ADT"."PASS_TYPE"="LMT_PASS"."LOV_ID")
  37 - access("ADT"."APPEARANCE_TYPE"="LMT_APPEARANCE"."LOV_ID")
  38 - access("ADT"."RELIGION"="LMT_RELIGION"."LOV_ID")
  40 - filter(("STATUS_FLAG"='B' OR "STATUS_FLAG"='C' OR "STATUS_FLAG"='O' OR "STATUS_FLAG"='T'))
  41 - access("ACT1"."APPLICANT_ID"=:B1)
  43 - access("ACT2"."APPLICANT_ID"=:B1 AND "STATUS_FLAG"='C')
  44 - filter(("CATEGORY_ID"=:B1 AND INTERNAL_FUNCTION("CATEGORY_CODE")))
  45 - access(ROWID=ROWID)
  46 - access("CATEGORY_ID"=:B1)
  48 - access(("CATEGORY_CODE"='BACKWRDC' OR "CATEGORY_CODE"='BACKWRDE'))


   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))

Figure 7-1

7.5 Apart from the predicates in the final shortlist you probably noticed further bind variables at operation 41, 43, 44, and 46 – but these are all named :B1, which is Oracle flagging up the need to pass correlating values into filter subqueries.

7.6 Operation 25 (where we test almost all the predicates) is one of the first operations to drive the query while operation 2 (where we test the :p_applicant_id) is close to the very last thing we do in the execution plan. So we generate a load of data for a college, course, and couple of other predicates, sort it then – at the last moment – decide that we only want a few (:p_seats) rows and count how many rows we’ve found for the specific applicant – and we do that a very large number of times. This takes me back to section 2 where I asked a couple of critical questions:

7.6.1 (2.3) First, how far into the view V_THING will the optimizer be able to push that predicate, possibly the entire content of the view will have to be constructed before the predicate can apply,

7.6.2 (2.8) I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

7.7 We now know the answer to the first query – that predicate isn’t going anywhere, and we recognise why not, of course: it’s a consequence of the “rownum” pseudo-column which has to be evaluated for all the generated data before the rownum restriction can be applied: “select for the applicant then apply the rownum” is very different from “apply the rownum (column and predicate) then restrict to the applicant”. That brings us to the second question – why would you generate all the data, then order it, then restrict it to the first N, and then count how many times a specific applicant appeared? And there’s one “valid” answer to the last bit of the question – what if you’re not really trying to count how many times the applicant appeared, you’re only trying to find out whether the count is zero or non-zero.

7.8 The intent of the query is to answer the question: “does this applicant appear in the first N candidates”. Once you’ve realised this the underlying performance problem with the query becomes clear. In the monitored example show here the query found 171 applicants that matched the initial predicates – and at some point in the batch it’s going to do the same work all over again for each of the other 170 applicants that we’ve discarded. For each combination of the initial predicates (excluding applicant id and seat count) we run the query N times if there are N candidates that match that combination. It’s bad enough that this query took 0.02 seconds to run and would have run 172 times (for a total of 3,4 seconds) but another sample run took 0.05 seconds to run identifying 1,835 applicants (which means another 1,834 executions for a total of 91 seconds run time).


 

8.0 Resolution

8.1 There is a serious flaw in the design of this application. We are seeing a piece of code running once per applicant_id when (with some minor variations) it looks as if it should be running no more than once per set of distinct combinations of (course_id, college_id, medium_id, hostel_required). In fact, if the set of distinct combinations could be generated by a simple query, you could imagine the entire required result set as a join between two non-mergable views, with a little row-by-row post processing – but that might be too ambitious a change to implement in the short term.

8.2 Realistically (as a low-risk strategy) it might be possible to keep a very large percentage of the existing code structure for whatever this task does but precede it with a PL/SQL loop that steps through each of the distinct combinations required, populating a table (perhaps an IOT) with {applicant_id, (combination columns), “rownum”); and then replace our problem query by a simple primary key look up to find the saved “rownum” for each applicant and combination, to check whether the stored “rownum” was fell within the required seat count.


 

9.0 Summaryi

9.1 For a DBA working on-site, or a consultant on a short-term visit, the analysis shown in this post is probably not how things would have progressed. I could imagine the sequence of events being more like:

9.1.1 This “start of year / start of term” batch job takes too long

9.1.2 What is it trying to achieve (business overview) – sketch an outline of the process (technical overview)

9.1.3 Trace the job and discover most of the time went on this query

9.1.4 Investigate the logic of this query and why it is run for every applicant_id

9.1.5 Recognise the fundamental design threat then choose between three possible strategies:

9.1.5.1 make the query much faster

9.1.5.2 re-engineer this part of the batch completely

9.1.5.3 subvert this section of the batch to pre-build a “materialized result” and use a much simpler query to query it

9.2 Effectively, however, we’ve come in at 9.1.5.1 and run the consultation backwards. As we did so we raised an early question about the applicant_id and pushing predicates and the oddity of counting a limited list, and we finally came back to those points towards the end of the post with an educated guess about what the query was trying to achieve and how it should be reduced from “once per applicant” to “once per combination”.

9.3 Despite the post starting at the wrong place it’s quite possible that we would have reached 9.1.5.1 by following a sensible order of problem analysis, and still want to think about how the query might run more quickly – so this investigation wasn’t a total waste of time and it’s allowed us to work through a real-world query and plan in a realistic way which we can sum up in the following stages:

9.3.1 Simplify: cross-referencing between the overall plan shape, the Query Block / Object Alias information, and original query we can take out sections of the plan (sub-plans) and analyse them separately. In particular we can identify and reduce to a minimum the core of the plan that generates the final result set, calling on the various sub-plans as it goes.

9.3.2 Follow the workload: in this case we didn’t get much help from the timing information, but buffer gets, disk reads and A-rows also supply clues to where most work is done. Critically we noted that the volume of data we picked up early on in the query was needed all the way through the query – until the last moment – and we didn’t waste resources carrying and processing unnecessary rows. Along the way, of course, we compare Oracle’s predictions of data volume with the actual data volume (A-Rows = Starts * E-Rows as a guideline). We also noted a couple of opportunities where modifying indexes might eliminate table visits, potentially reducing I/O and buffer gets.

9.3.3 Check the predicates: which goes hand in hand with following the workload – how and where are our predicates used. What predicates have been generated (or eliminated) by transitive closure; which predicates are (or could be) pushed further down the plan tree to eliminate data earlier; will multiple predicates result in bad optimizer estimates followed by bad choices for access paths.

9.4 It would be nice to think that there was a simple progression, a fixed sequence of steps that one could follow to interpret an execution plan quickly and accurately. Unfortunately (like the optimisation process itself) interpretation requires a measure of looping and recursion. It’s probably always best to start with simplifying – but how much you simplify and how you pick which subplan to simplify (or start analysing in detail) depends on being able to spot where the biggest workload appears; and before you get stuck too deeply into a sub-plan you might glance down at the use of predicates because a change in one predicate might make the optimizer completely re-engineer its choice of plan. And maybe, before anything else, you’ll see a single operation which you know should (for exanple) generate about 10 rows when the optimizer is predicting 25,000 rows (or vice versa) and you’ll want to check why there’s such a bad estimate at one point in the plan before you tackle any of the harder questions.

9.5 The bottom line with execution plans is simply this: the more you practice the faster you get at spotting the clues that are worth pursuing; and the faster you spot the clues the less time you waste unpicking every little detail, and the less time you spend on the preripheral pieces of the plan the easier it becomes to keep the big picture in your mind and see how the optimizer got to where it is, and how you might want to redirect it. So pick a couple of random queries each week that produce plans of about 20 lines and use them to exercise your interpretation skills; and increase the complexity of the queries every couple of weeks.

The End

 
 

 

Footnote

I think I’ve spent more than 20 hours writing a detailed description of something that would normally take me a few minutes to do [and some poeple wonder why I’ve not yet written another book on the optimizer]. In part the difference in time is because with practice the “intuitive” skill grows and the pattern of reading is more like –

  • How does it start (ignoring the “trivial” bits around the edges)
  • How does it end ( ditto )
  • Where do we do most work reading and discarding data

What I’ve done in this note is talk about every single query block and every single line whereas in real-life I might have scanned the plan, examined about 10 lines, and done a quick check on the corresponding predicates as the first step to deciding whether or not the plan was reasonably efficient.

April 23, 2020

date_to_date

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:00 pm BST Apr 23,2020

Every now and again someone posts a piece of SQL on the Oracle Developer Forum that includes a predicate with an expression like to_date(date_column). This is a problem for several reasons – not the least being the type of performance problem that showed up in a post from a couple of years back that has just been resurrected.

Before I examine the performance detail, here’s a simple demo of the “wrong data” problem that can go unnoticed, cut-n-paste from a 12.2.0.1 session of SQL*Plus:


SQL> create table t1 (d1 date);

Table created.

SQL> insert into t1 values(sysdate);

1 row created.

SQL> select * from t1 where d1 = to_date(d1);

no rows selected

SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

Session altered.

SQL> select * from t1 where d1 = to_date(d1);

D1
--------------------
22-apr-2020 15:12:36

1 row selected.


Note particularly how changing the nls_date_format can change the result of a query! (There’s another simple example on the referenced blog note.)

What’s going on? to_date(date_col) is equivalent to to_date(to_char(date_col)) using the nls_date_format to do the two conversions, and the most common default format is one that truncates the date column to date-only. So in may systems  to_date(date_col) is nearly (though doing it an expensive way) the same as trunc(date_col).

But let’s go further and see how we wreck the benefit of an index, even if we’ve made sure that we still get the correct results. The following is a minimalist model of a common billing requirement: conversion between currencies:


rem
rem     Script:         date_to_date.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        'GBP'                           from_currency,
        'USD'                           to_currency,
        trunc(sysdate - 1000) + rownum  conversion_date,
        'Corporate'                     conversion_type,
        round(
                1.25 + dbms_random.value/4,
                6
        )                               conversion_rate,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 1000 -- > comment to avoid WordPress format issue
;

create unique index t1_i1 on t1(from_currency, to_currency, conversion_date, conversion_type)
/
alter table t1 add constraint t1_pk primary key(
        from_currency, to_currency, conversion_date, conversion_type
)
/

create table driver(
        invoice_currency        varchar2(3),
        billing_currency        varchar2(3),
        client_type             varchar2(10),
        invoice_date            date
);

insert into driver values(
        'GBP', 'USD','Corporate',trunc(sysdate)
);

commit;

execute dbms_stats.gather_table_stats(null,'driver');

I’ve created table t1 to model the exchange rates between US dollars and UK pounds over a range of about three years ending “today”. There’s also a conversion_type column in the unique key to this table that allows us to have multiple reasons for exchanges, allowing multiple exchange rates on the same day. I really ought to have a check constraint on this table that says something like: check (conversion_date = trunc(conversion_date)).

I’ve also created a “driver” table that holds the data that might be exactly the data we need to extract an exchange rate for a single invoice. So let’s run the SQL that gets the appropriate exchange rate for this one invoice:


set serveroutput off
alter session set statistics_level = all;

prompt  =====================================
prompt  First run with simple date comparison
prompt  =====================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     t1.conversion_date = driver.invoice_date
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  =================================
prompt  Now run with to_date(date_column)
prompt  =================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     to_date(t1.conversion_date) = to_date(driver.invoice_date)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

In the first case I’ve used the correct join predicate between these tables; in the second I’ve put in a redundant to_date() function call at both ends of the predicate. (If you think this is unrealistic – it’s an exact match for the production code I reported in the blog note I cited above).

Here are the two execution plans – with their rowsource execution stats:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      10 |      8 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      10 |      8 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |       9 |      8 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX UNIQUE SCAN         | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       2 |      8 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_DATE"="DRIVER"."INVOICE_DATE"
              AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")


----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      15 |      4 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      15 |      4 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |      14 |      4 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX RANGE SCAN          | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       7 |      4 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")
       filter((TO_DATE(INTERNAL_FUNCTION("T1"."CONVERSION_DATE"))=TO_DATE(INTERNAL_FUNCTION("DRIVE
              R"."INVOICE_DATE")) AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE"))

Three things to take note of:

  1. The index unique scan at operation 4 has changed to an index range scan.
  2. The predicate information for operation 4 has changed from a pure access predicate to an access predicate plus a very messy filter predicate
  3. The range scan that now appears at operation 4 gets 7 buffers (that’s one root block and 6 leaf blocks) to find the one rowid we need, and will have applied the messy filter predicate to all 1,000 index entries where the two currency codes were GBP/USD.

In the case of the older article the problem query was spending almost all of its time in a join like this, doing 59 buffer gets to find a single rowid for every invoice presented – possibly doing thousands of comparisons as it went.

 

March 12, 2020

Sequence Costs

Filed under: Bugs,Oracle,Performance,trace files,Troubleshooting — Jonathan Lewis @ 8:18 pm GMT Mar 12,2020

You’re probably aware of the “identity” construct that appeared in 12.1 and uses Oracle’s sequence mechanism to model the Autonumber or Identity columns that other databases have. A posting from Clay Jackson on the Oracle-L list server suggests that something about their code path has introduced a surprising overhead in 19c … when you’re not using them.

The following code is a slightly modified version of a test case that Clay Jackson posted to demonstrate a strange difference in performance between 12.2 and 19.3

rem
rem     Script:         19c_sequence_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

drop table test_tab2;
drop table test_tab1;

drop sequence test_seq1;
drop sequence test_seq2;

create sequence test_seq1;
create sequence test_seq2;

create table test_tab1 (
   id number primary key,
   c1 varchar2(1000)
);

create table test_tab2 (
   id number,
   t1_id number,
   c2 varchar2(1000),
   constraint test_tab2_fk foreign key (t1_id) 
        references test_tab1 (id) on delete cascade
);

spool 19c_sequence_cost.lst

set feedback off
set pagesize 0
set serveroutput off

execute snap_my_stats.start_snap
alter session set events '10046 trace name context forever';

prompt  ==============================================================================
prompt  start_1000.sql cascades to start_1.sql
prompt  start_1.sql holds:
prompt
prompt  insert into test_tab1 values (test_seq1.nextval,'hello');
prompt  insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
prompt  commit;
prompt
prompt  ==============================================================================

start start_1000

set serveroutput on
set pagesize 60
alter session set events '10046 trace name context off';
execute snap_my_stats.end_snap

spool off

I’ve got a couple of tables with columns that I plan to generate from sequences, and I’ve got a referential integrity constraint between those tables. I’ll be using nextval from one sequence to populate the first table, then use currval from the same sequence for the foreign key and nextval from the other sequence as a unique identifier for the second table.

I’ve used my “repeater” code to run a short script from the SQL*Plus prompt 1,000 times (as suggested by Clay Jackson in his posting). I’ve also added lines to enable SQL_trace at the basic level, and taken a snapshot of the session activity stats.

Just as Clay Jackson had reported – 19.3.0.0 took significantly longer than 12.2.0.1 to complete the 1,000 calls to the script. (Though in absolute terms we’re only talking fractions of a second for a small single user test.) Examination of the stats – which prompted me to add the 10046 trace and repeat – made it very clear where the difference lay. Here are a few figures picked from the session activity stats:

Version    Statistic                        Value
========   ========================         ===== 
12.2.0.1   CPU used by this session            53  
19.3.0.0   CPU used by this session           114

12.2.0.1   recursive cpu usage                 18
19.3.0.0   recursive cpu usage                 44

12.2.0.1   recursive calls                  1,182
19.3.0.0   recursive calls                  7,215

12.2.0.1   execute count                    2,137                                     
19.3.0.0   execute count                    8,151

Clearly it’s worth finding out what was happening in those 6,000 extra recursive calls. And this is what the 19.3 trace file (after passing through tkprof … sort=execnt) showed:


select count(1)
from
 sys.idnseq$ where seqobj# = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   6000      0.07       0.09          0          0          0           0
Fetch     6000      0.03       0.04          0       6000          0        6000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    12001      0.10       0.14          0       6000          0        6000

If you can’t guess what the table idnseq$ is about, the following definition appears in $ORACLE_HOME/rdbms/admin/dcore.bsq:

create table idnseq$     /* stores table identity column to sequence mapping */
( obj#         number not null,                       /* table object number */
  intcol#      number not null,                    /* identity column number */
  seqobj#      number not null,                    /* sequence object number */
  startwith    number not null,                   /* sequence starting value */
  supplemental log group idnseq$_log_grp (obj#, intcol#, seqobj#) always
)
/

We’re not using identity columns in this test, but every time we run the script we do six checks against this data dictionary table to see (presumably) if we’re using a sequence that is internally associated with an identity column. The requirement doesn’t seem completely unreasonable – but it’s a bit surprising that we do every check twice. (Note – we have to check test_seq1 for both statements but test_seq2 only for the second statement, which is how we get 6,000 executions: 2 * (2 checks for test_seq1 + 1 check for test_seq2) * 1,000 executions of the script.

The doubling looks like a bug, and you have to wonder why a suitable piece of information isn’t kept in the dictionary cache anyway to allow Oracle to avoid executing the recursive statement. I should point out that the surprisingly large impact is visible because I’ve executed 2,000 separate statements from the client side; mechanisms like array inserts and pl/sql loops might only have to do this check once for each array or pl/sql call. (Tests of this hypothesis are left as exercise for the interested reader.)

 

dense_rank

Filed under: Execution plans,Oracle,Performance,sorting,subqueries — Jonathan Lewis @ 6:42 pm GMT Mar 12,2020

I’ve just been prompted to complete and publish a draft I started a few years ago. It’s (ultimately) about a feature that appeared in 9i but doesn’t seem to show up very often at client sites or as a common solution to performance problems on the various Oracle forums – but maybe that’s not surprising given how slowly analytic functions have been taken up.

I want to work towards the feature by starting with a requirement, then examine several solutions. To supply a touch of realism I’ll create an orders table, which holds a customer id and an order date (including time), ,and then ask for a report of the most recent order for each customer. Here’s some starting data:

rem
rem     Script:         order_sample.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.0        Costs are consistent
rem             11.2.0.4        Costs become consistent by 11.2.0.3
rem             11.1.0.7
rem             10.2.0.3
rem              9.2.0.8
rem

create table orders
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                                                                  id,
        mod(rownum-1,200)                                                       customer,
        sysdate - 20 + dbms_random.value(0,20)                                  date_ordered,
        rpad('x' || to_char(trunc(dbms_random.value(0,1000)),'FM009'),100)      padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid wordpress format issue
;

alter table orders modify customer not null;
alter table orders modify date_ordered not null;
alter table orders add constraint ord_pk primary key(id);

create index ord_cus on orders(customer);
-- create unique index ord_cus_date on orders(customer, date_ordered);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'orders',
                method_opt       => 'for all columns size 1',
                cascade          => true
        );
end;
/

I’ve got 200 customers, at 50 orders per customer dating back over the last 20 days. There’s a primary key on the table and (as it stands) an obvious “foreign key index” on the customer column, though I’ve allowed for changing this to a more useful (customer, date_ordered) combination which I’ve decided could be declared as unique.

With this data, how do I report “the most recent order for each customer”? The first question to ask in response to this request is: “do you literally mean ‘THE’ most recent; what if the customer has placed two or more orders on the same day or, in my initial case, at the same time?” There’s a special case to think about the moment you start to turn the natural language request into a formal language specification.

In this case I’m going to run with the “customer-only” index and allow for the fact that two or more orders could be placed at the same time by the same customer, and report both (all) of them if any such simultaneously placed orders appear.

Strategy number 1:

Start with a list showing the most recent order date for each customer and then report all orders that we can identify using that list of (customer, date_ordered). To do that I’ll start with a simple aggregate query and use the result it produced in an “IN” subquery:


prompt  =========================
prompt  Use IN subquery for max()
prompt  =========================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        orders  ord1
where
        (ord1.customer, ord1.date_ordered) in (
                select  /*+ qb_name(subq) */
                        ord2.customer, max(ord2.date_ordered)
                from
                        orders  ord2
                group by 
                        ord2.customer
        )
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

Plan hash value: 1500776991

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |          |      1 |      1 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN RIGHT SEMI|          |      1 |      1 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1568K (0)|
|   3 |    VIEW               | VW_NSO_1 |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1421K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS   |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS   |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."CUSTOMER"="CUSTOMER" AND "ORD1"."DATE_ORDERED"="MAX(ORD2.DATE_ORDERED)")

I’ve included the qb_name() hint in both query blocks here – it’s always a good idea as it gives you a little extra edge in interpreting the execution plan when the queries get more complicated.

The first thing you’ll notice about the resulting execution plan is that the optimizer has “unnested” the subquery to create an inline view (which it has named VW_NSO_1) and then used a simple join to get the final result. That’s an interesting observation, and it’s something that will often happen with an “IN” subquery – and that brings us to strategy 2.

Strategy number 2:

Some people will take as gospel the claim that the optimizer “cannot handle subqueries efficiently” and will prefer to write their own inline views (possibly using the “WITH subquery” a.k.a. “Common Table Expression (CTE)” mechanism). There will be occasions, even in the latest versions of Oracle, where you may need to do this but there will also be occasions where the optimizer hasn’t done it because it would produce the wrong results – and I have seen a couple of accidents go into production code where this variant has been written incorrectly.


prompt  ==============================
prompt  Introduce max() as inline view
prompt  ==============================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        (
                select  /*+ qb_name(in_line) */
                        ord2.customer, max(ord2.date_ordered) date_ordered
                from
                        orders  ord2
                group by 
                        ord2.customer
        )       ordv,
        orders  ord1
where
        ord1.customer     = ordv.customer
and     ord1.date_ordered = ordv.date_ordered
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

Plan hash value: 2750501889

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |        |      1 |    200 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN           |        |      1 |    200 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1531K (0)|
|   3 |    VIEW               |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1413K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."CUSTOMER"="ORDV"."CUSTOMER" AND "ORD1"."DATE_ORDERED"="ORDV"."DATE_ORDERED")

You’ll notice, of course, the remarkable similarity between the previous plan and this one – the only significant difference being that the optimimzer used a “plain” hash join here rather than the “hash join right semi” that appeared in the previous plan. The “right semi” is an indication that the optimizer has first transformed your “IN” subquery to an equivalent “EXISTS” (“=ANY”) subquery. Don’t be misled by the “right”, by the way, this isn’t indicating any sort of outer join it’s just trying to let you know which table is the one where Oracle should stop its probe after finding the first row. It is, however, unfortunate that it gets a little awkward trying to sort out left from right when Oracle can do a “swap join inputs” on you.

It would have been nice if the VIEW operatio1n had reported the name of my inline view (to correspond to the generated VW_NSO_1 viewname from the previous plan) – but you if you included the ‘alias’ formatting option in the call to display_cursor() it would have reported the alias ordv@main at operation 3.

Strategy Number 3:

We might have decided to check every row in the table to see if the date in that row was the most recent date for the customer in that row, which we could do by running a correlated subquery to do the check for every row in the table.

prompt  ========================================
prompt  Orders with correlated EQUALITY subquery
prompt  ========================================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        orders  ord1
where
        ord1.date_ordered = (
                select  /*+ qb_name(subq) */
                        max(ord2.date_ordered)
                from
                        orders  ord2
                where
                        ord2.customer = ord1.customer
        )
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));


Plan hash value: 1152467146

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |         |      1 |    200 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN           |         |      1 |    200 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1622K (0)|
|   3 |    VIEW               | VW_SQ_1 |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |         |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1435K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS  |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS  |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."DATE_ORDERED"="MAX(ORD2.DATE_ORDERED)" AND "ITEM_1"="ORD1"."CUSTOMER")

Yet again we end up with the same execution plan (barring the “right semi” issue) but with a different generated name for the unnested subquery. This is an interesting facet of Oracle (and SQL in general) – completely different ways of stating a requirement can end up doing the same work in the same way.

An important corrollary to this observation is that the first thing you should do when you start writing an SQL statement is to write it in a way that clearly expresses the requirement and is easy for others to comprehend. Don’t (at the first stage) try to do anything clever because (a) you may do it wrong and (b) the optimizer might have taken your clear, simple, code and done the clever bit behind the scenes for you.

However, we may have to move on to doing something new (-ish) and exciting.

Strategy number 4:

An “obvious” defect in the three plans so far is that we have to visit the orders table twice. Is there a way we can avoid doing this? The answer is yes. Oracle 8.1.6 gave us the analytic functions:


prompt  =======================
prompt  Analytic max() function
prompt  =======================

column padding noprint
column date_ordered noprint

select
        /*+ qb_name(main) */
        ordv.* 
from    (
        select  /*+ qb_name(inline) */
                customer, id, date_ordered, padding,
                max(date_ordered) over (
                        partition by customer
                ) max_date
        from    orders  ord2
        )       ordv
where
        ordv.date_ordered = ordv.max_date
order by
        ordv.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats last'));

Plan hash value: 813391662

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   262 (100)|    200 |00:00:00.01 |     172 |       |       |          |
|*  1 |  VIEW               |        |      1 |  10000 |   262   (3)|    200 |00:00:00.01 |     172 |       |       |          |
|   2 |   WINDOW SORT       |        |      1 |  10000 |   262   (3)|  10000 |00:00:00.01 |     172 |  1612K|   624K| 1432K (0)|
|   3 |    TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ORDV"."DATE_ORDERED"="ORDV"."MAX_DATE")

By adding the analytic max() function I can acquire the necessary “raw” data once and post-process it to find the max(date_ordered) for each customer before discarding all the rows where the row’s date doesn’t match the maximum date. The expression “max(date_ordered) over (partition by customer)” is like a virtual column that tells Oracle to partition the data by customer and find the maximum date within customer. Imagine copying the original data into a spreadsheet, sorting it by customer, then using one of the spreadsheet functions to add an extra column that derives it’s value by looking at the rows that have the same customer as the current row and you’ve got an exact replica of what Oracle is doing here.

So we’ve managed to produce the same result with a single tablescan of orders instead of the two tablescans we saw in every other plan. But there’s a drawback – to be able to partition by customer Oracle has had to fetch every row and column we’re interested in and sort the data before deriving values for the new column: the cost of this plan (262) is much higher than the cost of the plan (54) we got from the previous three queries.

In this case the variation in actual run-time for the two different plans was undetectable – and insignificant compared to the time spent getting the result set to the terminal and displaying. In general, though, you need to consider the trade off between the sorting that goes into the use of analytic functions and the “double visit” work of using subqueries.

Strategy number 5:

There is (at least) one more possibility that I’ve used in the past when the data structure has allowed it to produce the right answers; and it’s the one that is the ultimate target of this blog. Consider the following SQL:


select
        customer, 
        max(id)                 keep (dense_rank last order by date_ordered) as max_id,
        max(date_ordered)       keep (dense_rank last order by date_ordered) as max_date,
--      max(padding)            keep (dense_rank last order by date_ordered) as max_padding
        trim(
                max(padding)    keep (dense_rank last order by date_ordered)
        )       as max_padding
from
        orders
group by
        customer
;

(The trim() function on the padding column doesn’t change the fundamental behaviour of this query, it’s just there to avoid line-wrapping on my output.)

I’ve written a query that does an aggregate on customer, so “customer, max() group by customer”, but it’s a variant of the analytic max() function based on “keep(dense_rank last order by …)” rather then the more familiar “over(partition by … order by …)” form.

Because of the group by customer, the max() function is applied per customer (i.e. behaving like over(partition by customer)), and we’re not actually looking for the maximum value of the referenced column, we’re first ordering by the date_ordered (within customer) applying the dense_rank mechanism, keeping only the rows that have the highest (last) dense_rank, and then taking the maximum of that subset of the data.

Here’s an example applying the combination of mechanisms to a tiny data set:

Raw data
=========
   N1           V1
-----           ---
   93           'C'
   54           'Q',
   43           'A'
   13           'X'
   93           'G'
   54           'W',

Ordered by N1 and dense_rank() appended
========================================
   N1           V1              dr()
-----           ---             ----
   13           'X'             1
   43           'A'             2
   54           'W',            3
   54           'Q',            3
   93           'G'             4
   93           'C'             4

Keep(dense rank last)
=====================
   N1           V1              dr()
-----           ---             ----
   93           'G'             4
   93           'C'             4


max(v1) keep(dense rank last order by n1)
V1
---
'G'

In this tiny example we had cases where there were multiple rows for some of the rankings, but if we go back to our orders table and guarantee (by adding a unique constraint) that a customer will never have more than one row for any one value of date_ordered, then the expression max(id) keep (dense_rank last order by date_ordered) for any one customer will be the id of the row that has the maximum order date for that customer and, similarly, max(date_ordered) keep(…), and max(padding) keep (,,,) will also be the values from that same row.

Given the (absolutely critical) uniqueness constraint, we can get the data for the most recent for the customer using this dense_rank() strategy.

The question, of course, is why would we do something that may not be entirely intuitive and looks as if it could make Oracle do a lot of extra work to get the result. Here’s the answer – which is just the execution plan for the query on my orders table – with the unique constraint added:


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |    28 (100)|    200 |00:00:00.01 |     172 |       |       |          |
|   1 |  SORT GROUP BY     |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |   142K|   142K|  126K (0)|
|   2 |   TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

The path uses a basic SORT GROUP BY, that “sorts” only 200 rows (A-rows) using only 126KB of memory. Compare that with the plan for the analytic max() over() in strategy 4 that takes 1.6MB of memory and sorts 10,000 rows and you’ll appreciate that the keep(dense_rank last) mechanism is doing something much more efficient. For cases where the drop from “num_rows” to “num_distinct” for the aggregating column(s) the benefit of using the somewhat non-intuitive dense_rank() approach may make a significant difference to memory, CPU, and even (if it avoids a spill to disk) I/O.

Footnotes

There are two major variations on how you can use the dense_rank() function, as well as this form in which dense_rank appears in the KEEP LAST (and FIRST) mechanism.

Remember the absolutely critical point that the “keep dense_rank last” strategy is only correct if there is a suitable unique constraint on the data set viz: unique on ({group by column(s)},{order by column(s)}).

There is another major option for getting the same “most recent” rows, which is to use the match_recognize() functionality, but I think I probably wrote this draft before the mechanism even existed – so it’s left as an exercise to the reader to work it out.  A key reason why I’m not going to do it myself is that (like the analytic over() in strategy 4) it will require all 10,000 rows to be sorted, and is therefore likely to be less efficient than strategy 5.

Finally – I thought I’d written a note explaining why a “sort group by” can use far less memory and CPU then a basic “sort order by”, but if I have it wasn’t on this blog.  I do have a note on how the mechanism to handle “rownum <= N” with a preceding “order by” minimises its use of memory, and that note should give you some idea of what the “sort group by” is doing to minimise memory usage. I’ll try to write a little note on the aggregate mechanism some time over the next few days.

 

 

February 25, 2020

count(*) – again

Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 1:24 pm GMT Feb 25,2020

I’ve just received an email asking (yet again) a question about counting the number of rows in a table.

We have a large table with a CLOB column, where the CLOB occupies 85% storage space.
when we use select count(*) from , the DBA says that you should not use count(*) as it uses all columns and as this table contains CLOB it results in high CPU usage, where as if we use count(rowid), this brings us faster and same result.

Well I’ve pointed out in the past, in fact more than once, that count(*), count(1), count(declared-non-null-column) will all do the same thing … execute as count(*); I’ve also listed a few quirky anomalies, also more than once. However, count(rowid) is a little different, it doesn’t get transformed to count(*) as we can see from two pieces of evidence:

Exhibit A: fragments from a 10053 (CBO) trace file

----- Current SQL Statement for this session (sql_id=d381q70418ugs) -----
select count(rowid) from emp1

... 

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT("EMP1".ROWID) "COUNT(ROWID)" FROM "TEST_USER"."EMP1" "EMP1"

Unlike the various count() calls that are converted to count(*), counting rowids doesn’t seem to go througn the CNT transformation and the final state of the query still shows count(rowid) as the critical mechanism.

Exhibit B: variations in Column Projection Information

SQL> explain plan for select count(1) from emp1;

SQL> select * from table(dbms_xplan.display(null,null,'projection'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 2110459082

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |     7  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| E1_PK | 20000 |     7  (15)| 00:00:01 |
-----------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]

SQL> explain plan for select count(rowid) from emp1;

SQL> select * from table(dbms_xplan.display(null,null,'projection'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 2110459082

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    12 |     7  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    12 |            |          |
|   2 |   INDEX FAST FULL SCAN| E1_PK | 20000 |   234K|     7  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(ROWID)[22]
   2 - ROWID[ROWID,10]

The count(1) query transforms to count(*) and we don’t see any column information begin generated at operation 2 and passing up to the parent operation. However with count(rowid) we see Oracle at operation 2 constructing actual rowids from block addresses and row directory entries and passing them up to the the parent operation.

The CPU it takes to construct and pass the rowids is probably insignificant compared to the CPU usage of accessing data blocks in the first place so I doubt if there would be much visible difference in clock-time between count(1) (or count(*)) and count(rowid), but technically it looks as if count(rowid) would actually be slower and more resource-intensive than anything that went through the count(*) transformation.

In passing – the execution plan that appears in our count(1) example also tells us that count(*) doesn’t “use all columns” – after all, there aren’t many tables where every column is in the primary key and emp1 is no exception to the general rule, and the plan is doing an index fast full scan of the primary key index.

 

December 16, 2019

IOT Bug

Filed under: Bugs,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 3:58 pm GMT Dec 16,2019

Here’s a worrying bug that showed up a couple of days ago on the Oracle-L mailing list. It’s a problem that I’ve tested against 12.2.0.1 and 19.3.0.0 – it may be present on earlier versions of Oracle. One of the nastiest things about it is that you might not notice it until you get an “out of space” error from the operating system. You won’t get any wrong results from it, but it may well be adding an undesirable performance overhead.

Basically it seems that (under some circumstances, at least) Oracle is setting the “block guess” component of the secondary index on Index Organized Tables (IOTs) to point to blocks in the overflow segment instead of blocks in the primary key segment. As a result, when you execute a query that accesses the IOT through the secondary index and has to do reads from disc to satisfy the query – your session goes through the following steps:

  • Identify index entry from secondary index – acquire “block guess”
  • Read indicated block and discover the object number on the block is wrong, and the block type is wrong
  • Write a (silent) ORA-01410 error and do a block dump into the trace file
  • Use the “logical rowid” from the secondary index (i.e. the stored primary key value) to access the primary key index by key value

So your query runs to completion and you get the right result because Oracle eventually gets there using the primary key component stored in the secondary index, but it always starts with the guess[see sidebar] and for every block you read into the cache because of the guess you get a dump to the trace file.

Here’s a little code to demonstrate. The problem with this code is that everything appears to works perfectly, you have to be able to find the trace file for your session to see what’s gone wrong. First we create some data – this code is largely copied from the original posting on Oracle-L, with a few minor changes:


rem
rem     Script:         iot_bug_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Notes
rem     THe OP had tested on 19.5.0.0 to get the same effect, see:
rem     //www.freelists.org/post/oracle-l/IOT-cannot-get-valid-consensus-bug-or-unexplained-behavio
rem

drop table randomload purge;

create table randomload(
        roll number,
        name varchar2(40),
        mark1 number,
        mark2 number,
        mark3 number,
        mark4 number,
        mark5 number,
        mark6 number,
        primary key (roll)
) 
organization index 
including mark3 overflow
;

create index randomload_idx on randomload(mark6);

insert into randomload 
select 
        rownum, 
        dbms_random.string(0,40) name, 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,10000)) 
from 
        dual 
connect by 
        level < 1e5 -- > comment to avoid wordpress format issue
;

commit;

exec dbms_stats.gather_table_stats(null,'randomload', cascade=>true);

prompt  ==================================================
prompt  pct_direct_access should be 100 for randomload_idx
prompt  ==================================================

select 
        table_name, index_name, num_rows, pct_direct_access, iot_redundant_pkey_elim  
from 
        user_indexes
where
        table_name = 'RANDOMLOAD'
;

It should take just a few seconds to build the data set and you should check that the pct_direct_access is 100 for the index called randomload_idx.

The next step is to run a query that will do an index range scan on the secondary index.

 
column mark6 new_value m_6

select 
        mark6, count(*) 
from
        randomload 
group by 
        mark6
order by 
        count(*)
fetch first 5 rows only
;

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
set serveroutput off

select avg(mark3) 
from 
        randomload 
where 
        mark6 = &m_6
;

select * from table(dbms_xplan.display_cursor);

alter session set events '10046 trace name context off';
set serveroutput on

I’ve started by selecting one of the least frequencly occuring values of m_6 (a column I know to be in the overflow); then I’ve flushed the buffer cache so that any access I make to the data will have to start with disk reads (the original poster suggested restarting the database at this point, but that’s not necessary).

Then I’ve enabled sql_trace to show wait states (to capture details of what blocks were read and which object they belong to),, and I’ve run a query for m_3 (a column that is in the primary key (TOP) segment of the IOT) and pulled its execution plan from memory to check that the query did use a range scan of the secondary index. Here’s the plan:

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |       |       |    11 (100)|          |
|   1 |  SORT AGGREGATE    |                   |     1 |     7 |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_IOT_TOP_77298 |    10 |    70 |    11   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| RANDOMLOAD_IDX    |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("MARK6"=1316)
   3 - access("MARK6"=1316)

As you can see the plan shows what we are hoping to see – an index range scan of the secondary index that let’s it follow up with a unique scan of the primary key segment. It’s just a little odd that the access predicate reported for operation 2 (unique scan of TOP) suggests that the access is on a column that isn’t in the primary key and isn’t even in the TOP section.

So the query works and gives the right answer. But what do we find in the trace directory ? If you’re running 12c (possibly only 12.2), each time the error occurs the following pattern of information will be written to the alert log (it didn’t appear in 19.3)


ORCL(3):Hex dump of (file 22, block 16747) in trace file /u01/app/oracle/diag/rdbms/orcl12c/orcl12c/trace/orcl12c_ora_7888.trc
ORCL(3):
ORCL(3):Corrupt block relative dba: 0x0580416b (file 22, block 16747)
ORCL(3):Bad header found during multiblock buffer read (logical check)
ORCL(3):Data in bad block:
ORCL(3): type: 6 format: 2 rdba: 0x0580416b
ORCL(3): last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
ORCL(3): spare3: 0x0
ORCL(3): consistency value in tail: 0x484c0601
ORCL(3): check value in block header: 0x4408
ORCL(3): computed block checksum: 0x0
ORCL(3):

And the following pattern of information is written to the trace file [Update: a follow-up test on 11.2.0.4 suggests that the basic “wrong block address” error also happens in that version of Oracle, but doesn’t result in a dump to the trace file]:


kcbzibmlt:: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 makecr 0 line 15461 ds_blk (22, 16747) bh_blk (22, 16747)
kcbds 0x7ff1ca8c0b30: pdb 3, tsn 8, rdba 0x0580416b, afn 22, objd 135348, cls 1, tidflg 0x8 0x80 0x0
    dsflg 0x108000, dsflg2 0x0, lobid 0x0:0, cnt 0, addr 0x0, exf 0x10a60af0, dx 0x0, ctx 0
    whr: 'qeilwh03: qeilbk'
env [0x7ff1ca8e3e54]: (scn: 0x00000b860e364893   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000000000000000  hi-scn: 0x0000000000000000  ma-scn: 0x00000b860e364879  flg: 0x00000660)
BH (0xb1fd6278) file#: 22 rdba: 0x0580416b (22/16747) class: 1 ba: 0xb1c34000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 763,14
  dbwrid: 0 obj: 135348 objn: 135348 tsn: [3/8] afn: 22 hint: f
  hash: [0x9eff0528,0x77cff808] lru: [0xb1fd2578,0x9ff84658]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xb6f654c0,0x9ff84680] objaq: [0xb6f654d0,0x9ff84690]
  use: [0x77b78128,0x77b78128] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  Printing buffer operation history (latest change first):
  cnt: 5
  01. sid:10 L122:zgb:set:st          02. sid:10 L830:olq1:clr:WRT+CKT
  03. sid:10 L951:zgb:lnk:objq        04. sid:10 L372:zgb:set:MEXCL
  05. sid:10 L123:zgb:no:FEN          06. sid:10 L083:zgb:ent:fn
  07. sid:08 L192:kcbbic2:bic:FBD     08. sid:08 L191:kcbbic2:bic:FBW
  09. sid:08 L604:bic2:bis:REU        10. sid:08 L190:kcbbic2:bic:FAW
  11. sid:08 L602:bic1_int:bis:FWC    12. sid:08 L822:bic1_int:ent:rtn
  13. sid:08 L832:oswmqbg1:clr:WRT    14. sid:08 L930:kubc:sw:mq
  15. sid:08 L913:bxsv:sw:objq        16. sid:08 L608:bxsv:bis:FBW
Hex dump of (file 22, block 16747)

   ... etc.

Corrupt block relative dba: 0x0580416b (file 22, block 16747)
Bad header found during multiblock buffer read (logical check)
Data in bad block:
 type: 6 format: 2 rdba: 0x0580416b
 last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x484c0601
 check value in block header: 0x4408
 computed block checksum: 0x0
TRCMIR:kcf_reread     :start:  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
TRCMIR:kcf_reread     :done :  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf

The nasty bit, of course, is the bit I’ve removed and replaced with just “etc.”: it’s a complete block dump (raw and symbolic) which in my example was somthing like 500 lines and 35KB in size.

It’s not immediately obvious exactly what’s going on and why, but the 10046 trace helps a little. From another run of the test (on 19.3.0.0) I got the following combination of details – which is an extract showing the bit of the wait state trace leading into the start of the first block dump:

WAIT #140478118667016: nam='db file scattered read' ela= 108 file#=13 block#=256 blocks=32 obj#=77313 tim=103574529210
WAIT #140478118667016: nam='db file scattered read' ela= 2236 file#=13 block#=640 blocks=32 obj#=77313 tim=103574531549
WAIT #140478118667016: nam='db file scattered read' ela= 534 file#=13 block#=212 blocks=32 obj#=77312 tim=103574532257
kcbzibmlt: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 warm_up abort 0 makecr 0 line 16082 ds_blk (13, 212) bh_blk (13, 212)

Object 77313 is the secondary index, object 77312 is the primary key index (IOT_TOP). It may seem a little odd that Oracle is using db file scattered reads of 32 blocks to read the indexes but this is a side effect of flushing the buffer – Oracle may decide to prefeetch many extra blocks of an object to “warmup” the cache just after instance startup or a flush of the buffer cache. The thing I want to check, though, is what’s wrong with the blocks that Oracle read from object 77312:


alter system dump datafile 13 block min 212 block max 243;

BH (0xc8f68e68) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xc8266000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa7fd6c38) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xa7c2a000
  set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa5f75780) file#: 13 rdba: 0x034000d5 (13/213) class: 0 ba: 0xa5384000
  set: 11 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xdafe9220) file#: 13 rdba: 0x034000d5 (13/213) class: 1 ba: 0xdadcc000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

I’ve reported the first few lines of the symbolic dump for the first few blocks of the resulting trace file. Look at the third line of each group of three BH lines: it’s reporting object 77311 (the overflow segment), not 77312 (the TOP segment). And every single block reported in the db file scattered read of 32 blocks for object 77312 reports itself, when dumped, as being part of object 77311. And that’s possibly the immediate cause of the ORA-01410.

We can take the investigation a little further by dumping a leaf block or two from the secondary index.


alter session set events 'immediate trace name treedump level 77313';

----- begin tree dump
branch: 0x3400104 54526212 (0: nrow: 542, level: 1)
   leaf: 0x340010d 54526221 (-1: row:278.278 avs:2479)
   leaf: 0x340075e 54527838 (0: row:132.132 avs:5372)
   leaf: 0x34005fb 54527483 (1: row:41.41 avs:7185)

alter system dump datafile 13 block 1886   -- leaf: 0x340075e

BH (0xd5f5d090) file#: 13 rdba: 0x0340075e (13/1886) class: 1 ba: 0xd5158000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77313 objn: 77313 tsn: [3/6] afn: 13 hint: f
...
row#6[5796] flag: K------, lock: 0, len=18
col 0; len 2; (2):  c1 1d
col 1; len 4; (4):  c3 07 41 5c
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  03 40 05 7c

I’ve done a treedump of the secondary index and picked a leaf block address from the treedump and dumped that leaf block, and from that leaf block I’ve extracted one index entry to show you the three components: the key value (c1 1d), the primary key for the row (c3 07 41 5c), and the block guess (03 40 05 75). Read the block guess as a 4 byte hex number, and it translates to file 13, block 1397 – which should belong to the TOP segment. So the exciting question is – what object does block (13, 1397) think it belongs to ?


alter system dump datafile 13 block 1397;

Block header dump:  0x03400575
 Object id on Block? Y
 seg/obj: 0x12dff  csc:  0x00000b860e308c46  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400501 ver: 0x01 opc: 0
     inc: 0  exflg: 0

Converting from Hex to Decimal: obj: 0x12dff = 77311 which is the overflow segment. The secondary index block guess is pointing at a block in the overflow segment.

There are two ways to handle this problem – you could simply rebuild the index (alter index rebuild) or, as the original poster did, use the “update block references” command to correct all the block guesses: “alter index randomload_idx update block references;”. Neither is desirable, but if you’re seeing a lot of large trace files following the pattern above then it may be necessary.

There was one particular inconsistency in the tests – which I ran many times – occasionally the pct_direct_access for the secondary index would be reported as zero (which, technically, should always happen given the error).  If it did, of course, Oracle wouldn’t follow the guess but would go straight to the step where it used the primary key “logical rowid” – thus bypassing the error and block dump.

tl;dr

In some circumstances the block guesses in the secondary indexes of IOTs may be pointing to the overflow segment instead of the primary key (TOP) segment. If this happens then queries will still run and give the right answers, but whenever they read a “guessed” block from disc they will report an ORA-01410 error and dump a block trace. This will affect performance and may cause space problems at the O/S level.

Sidebar

An entry in the secondary index of an Index Organized Table (IOT) consists of three parts, which intially we can think in the form:

({key-value}, {logical rowid}, {block guess})

Since IOTs don’t have real rowids the “logical rowid” is actually the primary key of the row where the {key value} will be found. As a short cut for efficient execution Oracle includes the block address (4 bytes) where that primary key value was stored when the row was inserted. Because an IOT is an index “rows” in the IOT can move as new data is inserted and leaf blocks split, so eventually any primary key may move to a different block – this is why we refer to the block address as a guess – a few days, hours, or minutes after you’ve inserted the row the block address may no longer be correct.)

To help the runtime engine do the right thing Oracle collects a statistic called pct_direct_access for secondary indexes of IOTs. This is a measure of what percentage of the block guesses are still correct at the time that the statistics are gathered. If this value is high enough the run-time engine will choose to try using the block guesses while executing a query (falling back to using the logical rowid if it turns out that the guess is invalid), but if the value drops too low the optimizer will ignore the block guesses and only use the logical rowid.

Not relevant to this note – but a final point about secondary indexes and logical rowids – if the definition of the index includes  some of the columns from the primary keys Oracle won’t store those columns twice (in more recent version, that is) – the code is clever enough to use the values stored in the (key value) component when it needs to use the (logical rowid) component.

 

Update (Jan 2020)

I passed this example on to Oracle, and there are now two bugs recorded for it:

  • Bug 30733525 – ALERT LOG ENTRIES RE BLOCK GUESSES IN THE SECONDARY INDEXES OF IOTS POINTING TO OVERFLOW SEGMENT INSTEAD OF INDEX SEGMENT
  • Bug 30733563 – WRONG GUESS DBA IN INDEX

 

 

 

December 6, 2019

Temp space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:18 pm GMT Dec 6,2019

A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.

Of course you then run into a further problem tracking down the source of the problem. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced and cynical user) that there’s something fundamentally wrong with the way Oracle has been accounting for space usage.

If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the active session history may be able to help.  One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL  statement that had been holding the temporary space when the error was raised, for example:


column pga_allocated        format 999,999,999,999
column temp_space_allocated format 999,999,999,999

break on session_id skip 1 on session_serial#

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) != 0
order by
        session_id, sample_id
/

All I’ve done for this example is query v$active_session_history for the last 5 minutes reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range, and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage.

You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of memory so it’s nice to be able to see the two figures together.

There are plenty of limitations and flaws in the usefulness of this report and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:


rem
rem     Script:         allocate_tempspace.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 as 
select * from all_objects
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

execute dbms_stats.gather_table_stats(null,'t1')

execute dbms_lock.sleep(20)

set pagesize  60
set linesize 255
set trimspool on
set serveroutput off
alter session set statistics_level = all;

with ttemp as (
        select /*+ materialize */ * from t1
)
select 
        /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ 
        t1a.object_type,
        max(t1a.object_name)
from
        ttemp t1a, ttemp t1b
where
        t1a.object_id = t1b.object_id
group by
        t1a.object_type
order by
        t1a.object_type
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).


SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 |      1 |        |      0 |00:00:01.51 |   28915 |      0 |   9217 |  2068K|  2068K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    989K|    989K|00:00:00.24 |   19551 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     29 |     29 |00:00:08.51 |   18493 |  21345 |   2910 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     15M|     15M|00:00:03.93 |   18493 |  21345 |   2910 |    48M|  6400K|   65M (1)|   25600 |
|   6 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.36 |    9233 |   9218 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.35 |    9233 |   9218 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.40 |    9257 |   9217 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.39 |    9257 |   9217 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")

Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s  74,000 KB of temp space that doesn’t get reported Used-Tmp.

If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:

SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 |      1 |        |      0 |00:00:01.26 |   23706 |      0 |   5593 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    907K|    907K|00:00:00.21 |   18024 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     25 |     25 |00:00:06.89 |   11193 |  13843 |   2655 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     14M|     14M|00:00:03.55 |   11193 |  13843 |   2655 |    44M|  6400K|   64M (1)|      23M|
|   6 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.26 |    5598 |   5594 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.25 |    5598 |   5594 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.34 |    5595 |   5594 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.33 |    5595 |   5594 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE  (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that the Used-Tmp in 12.2 was reported in KB when it’s reported in MB in 19.3.0.0.

Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism is to write a GTT with pctfree=10 while the new avoid any free space and transactional details) so, as ever, I now have another draft for a blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.

With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.

First the results from 12.2.0.1

SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
        14           22234   15306218 1cwabt12zq6zb       95,962,112            1,048,576
                             15306219 1cwabt12zq6zb       97,731,584           37,748,736
                             15306220 1cwabt12zq6zb      148,194,304           77,594,624
                             15306221 1cwabt12zq6zb      168,117,248           85,983,232
                             15306222 1cwabt12zq6zb      168,117,248           90,177,536
                             15306223 1cwabt12zq6zb      168,117,248           95,420,416
                             15306224 1cwabt12zq6zb      168,117,248           98,566,144
                             15306225 1cwabt12zq6zb      168,117,248          102,760,448
                             15306226 1cwabt12zq6zb      116,933,632          103,809,024
                             15306227 1cwabt12zq6zb      116,933,632          103,809,024
                             15306228 b66ycurnwpgud        8,602,624            1,048,576

I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB that is reasonably close to the 103,800,000 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth to 77,590,000 of temp_space_allocated by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its resulte up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active.  (Note: however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)

We see the same pattern of behaviour in 19.3.0.0:


SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
       136           42767    2217500 1cwabt12zq6zb      143,982,592           46,137,344
                              2217501 1cwabt12zq6zb      193,527,808           54,525,952
                              2217502 1cwabt12zq6zb      193,527,808           57,671,680
                              2217503 1cwabt12zq6zb      193,527,808           61,865,984
                              2217504 1cwabt12zq6zb      197,722,112           67,108,864
                              2217505 1cwabt12zq6zb      150,601,728           70,254,592
                              2217506 1cwabt12zq6zb      150,601,728           70,254,592

We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).

Preliminary Observations

Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to allocate temp segment).

We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.

At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was the one being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may over-optimistic, however in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing that is responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.

Further questions

  • When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
  • Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
  • Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
  • Are there any other reasons why ASH might report temp_space_allocated when an execution plan doesn’t?
  • Is temp_space_allocated only about the temporary tablespace, or could it include informatiom about other (“permanent”) tablespaces ?

Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the srouce of a space management ORA-01652 issue.

 

 

October 11, 2019

v$session

Filed under: Execution plans,Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:29 pm BST Oct 11,2019

Here’s an odd, and unpleasant, detail about querying v$session in the “most obvious” way. (And if you were wondering what made me resurrect and complete a draft on “my session id” a couple of days ago, this posting is the reason). Specifically if you want to select some information for your own session from v$session the query you’re likely to use in any recent version of Oracle will probably be of the form:


select {list for columns} from v$session where sid = to_number(sys_context('userenv','sid'));

Unfortunately that one little statement hides two anomalies – which you can see in the execution plan. Here’s a demonstration cut from an SQL*Plus session running under 19.3.0.0:


SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  gcfrzq9knynj3, child number 0
-------------------------------------
select program from V$session where sid = sys_context('userenv','sid')

Plan hash value: 2422122865

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    33 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FULL       | X$KSLWT         |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    21 |     0   (0)|
|*  6 |    FIXED TABLE FULL       | X$KSUSE         |     1 |    21 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------ -------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid')))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter((BITAND("S"."KSUSEFLG",1)<>0 AND BITAND("S"."KSSPAFLG",1)<>0 AND 
              "S"."INDX"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))
              AND INTERNAL_FUNCTION("S"."CON_ID") AND "S"."INST_ID"=USERENV('INSTANCE')))

As you can see, v$session is a join of 3 separate structures – x$kslwt (v$session_wait), x$ksled (v$event_name), and x$ksuse (the original v$session as it was some time around 8i), and the plan shows two “full tablescans” and a Cartesian merge join. Tablescans and Cartesian merge joins are not necessarily bad – especially where small tables and tiny numbers of rows are concerned – but they do merit at least a brief glance.

x$ksuse is a C structure in the fixed SGA and that structure is a segmented array (which seems to be chunks of 126 entries in 19.3, and chunks of 209 entries in 12.2 – but that’s fairly irrelevant). The SID is simply the index into the array counting from 1, so if you have a query with a predicate like ‘SID = 99’ Oracle can work out the address of the 99th entry in the array and access it very quickly – which is why the SID column is reported as a “fixed index” column in the view v$indexed_fixed_column.

But we have two problems immediately visible:

  1. the optimizer is not using the “index” to access x$ksuse despite the fact that we’re giving it exactly the value we want to use (and we can see a suitable predicate at operation 6 in the plan)
  2. the optimizer has decided to start executing the query at the x$kslwt table

Before looking at why thing’s have gone wrong, let’s check the execution plan to see what would have happened if we’d copied the value from the sys_context() call into a bind variable and queried using the bind variable – which we’ll keep as a character type to make it a fair comparison:

SQL_ID  cm3ub1tctpdyt, child number 0
-------------------------------------
select program from v$session where sid = to_number(:v1)

Plan hash value: 1627146547

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    20 |     0   (0)|
|*  6 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(:V1))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter(("S"."INDX"=TO_NUMBER(:V1) AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))


When we have a (character) bind variable instead of a sys_context() value the optimizer manages to use the “fixed indexes”., but it’s still started executing at x$kslwt, and still doing a Cartesian merge join. The plan would be the same if the bind variable were a numeric type, and we’d still get the same plan if we replaced the bind variable with a literal number.

So problem number 1 is that Oracle only seems able to use the fixed index path for literal values and simple bind variables (plus a few “simple” functions). It doesn’t seem to use the fixed indexes for most functions (even deterministic ones) returning a value and the sys_context() function is a particular example of this.

Transitivity

Problem number 2 comes from a side-effect of something that I first described about 13 years ago – transitive closure. Take a look at the predicates in both the execution plans above. Where’s the join condition between x$ksuse and x$kslwt ? There should be one, because the underlying SQL defining [g]v$session  has the following joins:

from
      x$ksuse s,
      x$ksled e,
      x$kslwt w
where
      bitand(s.ksspaflg,1)!=0
and   bitand(s.ksuseflg,1)!=0
and   s.indx=w.kslwtsid       -- this is the SID column for v$session and v$session_wait
and   w.kslwtevt=e.indx
 

What’s happened here is that the optimizer has used transitive closure: “if a = b and b = c then a = c” to clone the predicate “s.indx = to_number(sys_context(…))” to “w.kslwtsid = to_number(sys_context(…))”. But at the same time the optmizer has eliminated the predicate “s.indx = w.kslwtsid”, which it shouldn’t do because this is 12.2.0.1 and ever since 10g we’ve had the parameter _optimizer_transitivity_retain = true — but SYS is ignoring the parameter setting.

So we no longer have a join condition between x$ksuse and x$kslwt – which means there has to be a cartesian merge join between them and the only question is whether this should take place before or after the join between x$kslwt and x$ksled. In fact, the order doesn’t really matter because there will be only one row identified in x$kslwt and one row in x$ksuse, and the join to x$ksled is simply a lookup (by undeclarable unique key) to translate an id into a name and it will take place only once whatever we do about the other two structures.

But there is a catch – especially if your sessions parameter is 25,000 (which it shouldn’t be) and the number of connected sessions is currently 20,000 (which it shouldn’t be) – the predicate against x$ksuse does a huge amount of work as it walks the entire array testing every row (and it doesn’t even do the indx test first – it does a couple of bitand() operations). Even then this wouldn’t be a disaster – we’re only talking a couple of hundredths of a second of CPU – until you find the applications that run this query a huge number of times.

We would prefer to avoid two full tablescans since the arrays could be quite large, and of the two it’s the tablescan of x$ksuse that is going to be the greater threat; so is there a way to bypass the threat?  Once we’ve identified the optimizer anomaly we’ve got a pointer to a solution. Transitivity is going wrong, so let’s attack the transitivity. Checking the hidden parameters we can find a parameter: _optimizer_generate_transitive_pred which defaults to true, so let’s set it to false for the query and check the plan:

select  /*+   opt_param('_optimizer_generate_transitive_pred','FALSE')
*/  program from  v$session where  sid = sys_context('userenv','sid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    28 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INDX"=TO_NUMBER(SYS_CONTEXT('user
              env','sid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


Although it’s not nice to insert hidden parameters into the optimizer activity we do have a result. We don’t have any filtering on x$kslwt – fortunately this seems to be limited in size (but see footnote) to the number of current sessions (unlike x$ksuse which has an array size defined by the sessions parameter or derived from the processes parameter). For each row in x$kslwt we do an access into x$ksuse using the “index” (note that we don’t see access predicates for the fixed tables, we just have to note the operation says FIXED INDEX and spot the “index-related” predicate in the filter predicate list), so this strategy has reduced the number of times we check the complex predicate on x$ksuse rows.

It’s still far from ideal, though. What we’d really like to do is access x$kslwt by index using the known value from sys_context(‘userenv’,’sid’). As it stands the path we get from using a hidden parameter which isn’t listed as legal for the opt_param() hint is a plan that we would get if we used an unhinted query that searched for audsid = sys_context(‘userenv’,’sessionid’).

SQL_ID  7f3f9b9f32u7z, child number 0
-------------------------------------
select  program from  v$session where  audsid =
sys_context('userenv','sessionid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     2 |    70 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     2 |    62 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    23 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."KSUUDSES"=TO_NUMBER(SYS_CONTEXT('
              userenv','sessionid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


The bottom line, then, seems to be that if you need a query by SID against v$session to be as efficient as possible then your best bet is to load a numeric variable with the sys_context(‘userenv’,’sid’) and then select where “sid = :bindvariable”.  Otherwise query by audsid, or use a hidden parameter to affect the optimizer.

Until the the SYS schema follows the _optimizer_transitivity_retain parameter or treats sys_context() the same way it treats a bind variable there is always going to be some unnecessary work when querying v$session and that excess will grow with either the number of connected sessions (if you optimize the query) or with the value of the sessions parameter.

Footnote

In (much) older versions of Oracle v$session_wait sat on top of x$ksusecst, which was part of the same C structure as x$ksuse. In newer versions of Oracle x$kslwt is a structure that is created on demand in the PGA/UGA – I hope that there’s a short cut that allows Oracle to find the waiting elements in x$ksuse[cst] efficiently, rather than requiring a walk through the whole thing, otherwise a tablescan of the (nominally smaller) x$kslwt structure will be at least as expensive as a tablescan of the x$ksuse structure.

Update (just a few minutes after posting)

Bob Bryla has pointed out in a tweet that there are many “bugs” not fixed until 19.1 for which the workaround is to set “_optimizer_transitivity_retain” to false. So maybe this isn’t an example of SYS doing something particularly strange – it may be part of a general reworking of the mechanism that still has a couple of undesirable side effects.

Bob’s comment prompted me to clone the x$ tables into real tables in a non-SYS schema and model the fixed indexes with primary keys, and I found that the resulting plan (though very efficient) still discarded the join predicate. So we may be seeing the side effects of a code enhancement relating to generating predicates that produce unique key access paths. (A “contrary” test, the one in the 2013 article I linked to, still retains the join predicate for the query that has non-unique indexes.)

 

October 2, 2019

_cursor_obsolete_threshold

Filed under: 12c,Infrastructure,Oracle,Performance — Jonathan Lewis @ 2:39 pm BST Oct 2,2019

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea ?

The answer is PDBs – specifically the much larger number of PDBs a single CBD can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical companyy running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation onhis work. It addresses the following question:

If you have an application that repeatedly issues (explicitly or implicitly) parse calls but doesn’t take advantage of the session cursor cache it has to search the library cache by hash_value / sql_id for the parent cursor, then has to walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0.

Here’s the complete code for the test:


create table t1
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple. A query hinted to return that one row by primary key index range scan.

I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping them all in memory.
  • The second part of the test simply repeats the loop, but only for the first 1,024 child cursors. At this point the child cursors exist, so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again, but only for the last 1,024 child cursors. Again they should exist and be usable, so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8191 you will have to following 512 segment pointers, and 16 pointers per segment (totalling 8708 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One preipheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array. (i.e. will searching for child cursor 0 be the cheapest one or the most expensive one).

And the results, limited to just the second and third parts, with just a couple of small edits are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is zero, this is all about searching for the

You might question are the 2,048-ish parse count(total) – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why we double the parse count, although the “alter session” doesn’t then report as an “execute count”.

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication”, so I ran a similar test executing the last two loops a second time – there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test with the loop for the last 1,024 child cursors ran before the loop for the first child cursors. Again this made no significant difference to the results – the low number child cursors take less CPU to find than the high number child cursors.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the ca. 5 seconds difference in CPU time appearing in this test corresponds purely to an extra 5 milliseconds walking an extra 7,000 steps down the chain.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.

Footnote 1

The odd thing about this “cursor obselete” feature is that I have a distinct memory that when  PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explainations of the different reasons captured in this view MOS Doc Id  296377.1“Troubleshooting: High Version Count Issues” is a useful reference.

September 16, 2019

Little sleeps

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 11:50 am BST Sep 16,2019

A peripheral question in a recent comment (made in response to me asking whether a loop had been written with a sleep time of 1/100th or 1/1000th of a second) asked “How do you sleep for 1/1000th of a second in pure PL/SQL?”

The answer starts with “How pure is pure ?” Here’s a “pure” PL/SQL solution that “cheats” by calling one of the routines in Oracle’s built-in Java library:


create or replace procedure milli_sleep( i_milliseconds in number) 
as 
        language java
        name 'java.lang.Thread.sleep(long)';
/


create or replace procedure nano_sleep( i_milliseconds in number, i_nanoseconds in number)
as
        language java
        name 'java.lang.Thread.sleep(long, int)';
/

prompt  Milli sleep
prompt  ===========
execute milli_sleep(18)

prompt  Nano sleep
prompt  ==========
execute  nano_sleep(0,999999)


The “nano-second” component of the nano_sleep() procedure is restricted to the ranage 0 – 999999. In both cases the “milli-second” component has to be positive.

Whether your machine is good at handling sleeps of less than 1/100th of a second is another question, of course.

Update – due to popular demand

If you want to find out what else is available in the database you can query view all_java_methods searching by partial name (which is very slow) for something you think might exist, for example:

SQL> select owner, name , method_name from all_java_methods where upper(method_name) like '%MILLI%'

OWNER           NAME                                     METHOD_NAME
--------------- ---------------------------------------- ----------------------------------------
SYS             java/util/concurrent/TimeUnit$4          toMillis
SYS             java/util/concurrent/TimeUnit$5          toMillis
SYS             java/util/concurrent/TimeUnit$6          toMillis
SYS             java/util/concurrent/TimeUnit$7          toMillis
SYS             java/util/concurrent/TimeUnit            toMillis
SYS             java/lang/System                         currentTimeMillis
SYS             javax/swing/ProgressMonitor              setMillisToDecideToPopup
SYS             javax/swing/ProgressMonitor              getMillisToDecideToPopup

There’s a lot more than the few listed above – but I just wanted to pick up currentTimeMillis. If you spot something that looks interesting the easiest next step is probably to do a google search with (for example): Oracle java.lang.system currenttimemillis (alternatively you could just keep a permanent link to Oracle’s manual pages for the Java and serarch them. In my case this link was high on the list of google hits, giving me the following method description:


static long 	currentTimeMillis​() 	Returns the current time in milliseconds.

Conveniently this is easy to embed in pl/sql (be careful with case sensitivity):


create or replace function milli_time return number
as
        language java
        name 'java.lang.System.currentTimeMillis() return long';
/

execute dbms_output.put_line(milli_time)
execute dbms_lock.sleep(1)
execute dbms_output.put_line(milli_time)

----------------

SQL> @ java_procs

Function created.

1568719117725

PL/SQL procedure successfully completed.


PL/SQL procedure successfully completed.

1568719118734

PL/SQL procedure successfully completed.

You now have a PL/SQL function that will return the number of millisecond since 1st January 1970.

August 11, 2019

Troubleshooting

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 9:28 pm BST Aug 11,2019

Here’s a question to provoke a little thought if you’ve got nothing more entertaining to do on a Sunday evening.  What threats do you think of when you see a statement like the following in (say) an AWR report, or in a query against v$sql ?

update tableX set
        col001 = :1, col002 = :2, col003 = :3, ...
        -- etc. 
        -- the names are supposed to indicate that the statement updates 302 columns
        -- etc.
        col301 = :301, col302 = :302
where
        pk_col = :303
;

I’ll be writing up some suggestions tomorrow (Monday, UK BST), possible linking to a few other articles for background reading.

Update

The first three comments have already hit the high points, but I’m going to jot down a few notes anyway.

The first two things that really (should) make an impact are:

  • There’s a table in the application with (at least) 303 columns – anything over 255 is a potential disaster area
  • An update statement that updates 302 columns is probably machine generated by a non-scalable application

A secondary detail that might be useful is recognising the pattern of the text – lower case for column names, simple “:nnn” for bind variables.  As it stands I don’t recognise the possible source for this style, but I know it’s not SQL embedded in PL/SQL (which would be all capitals with “:Bnnn” as bind variable names) and it’s not part of a distributed query from a remote database (which would be in capitals with quoted names, and table aliases like “A1, A2, …”), and it’s not “raw” Hiberbate code which produces meaningless table and column aliases based on numbers with a “t” for table and “c” for column.

So let’s think about possible problems and symptoms relating to the two obvious issues:

Wide tables

Once you have more than 255 (real) columns in a table – even if that count includes columns that have been marked unused – Oracle will have to split rows into “rowpieces” that do not exceed 255 columns and chain those pieces together. Oracle will try to be as efficient as possible – with various damage-limiting code changes appearing across versions – attempting store these row pieces together and keeping the number to a minimum, but there are a number of anomalies that can appear that have a significant impact on performance.

Simply having to visit two row pieces to pick up a column in the 2nd row piece (even if it is in the same block) adds to the cost of processing; but when you have to visit a second block to acquire a 2nd (or 3rd, or 4th) rowpiece the costs can be significant. As a quirky little extra, Oracle’s initial generation of row-pieces creates them from the end backwards – so a row with 256 columns starts with a row-piece of one column following by a rowpiece of 255 columns: so you may find that you have to fetch multiple row pieces for virtually every row you access.

It’s worth noting that a row splitting is based only on columns that have been used in the row. If your data is restricted to the first 255 column of a row then the entire row can be stored as a single row piece (following the basic rule that “trailing nulls take no space”); but as soon as you start to update such a row by populating columns past the 255 boundary Oracle will start splitting from the end – and it may create a new trailing row-piece each time you populate a column past the current “highest” column.  In an extreme case I’ve managed to show an example of a single row consisting of 746 row pieces, each in a different block (though that’s a bug/feature that’s been addressed in very recent versions of Oracle).

If rows have been split across multiple blocks then one of the nastier performance problems appears with direct path read tablescans. As Oracle follows the pointer to a secondary row piece it will do a physical read of the target block then immediately forget the target block so, for example, if you have inserted 20 (short) rows into a block then updated all of them in a way that makes them split and all their 2nd row pieces go to the same block further down the table you can find that Oracle does a single direct path read that picks up the head pieces, then 20 “db file sequential read” calls to the same block to pick up the follow-on pieces. (The same effect appears with simple migrated rows.) Contrarily, if you did the same tablescan using “db file scattered read” requests then Oracle might record a single, highly deceptive “table fetch continued row” because it managed to pin the target block and revisit it 20 times.

Often a very wide row (large number of columns) means the table is trying to hold data for multiple types of object. So a table of 750 columns may use the first 20 columns for generic data, columns 21 to 180 for data for object type A, 181 to 395 for data for object type B, and so on.  This can lead to rows with a couple of hundred used columns and several hundred null columns in the middle of each row – taking one byte per null column and burning up lots of CPU as Oracle walks a row to find a relevant column. A particularly nasty impact can appear from this type of arrangement when you upgrade an applications:  imagine you have millions of rows of the “type A” above which use only the first 180 columns.  For some reason the application adds one new “generic” column that (eventually) has to be populated for each row – as the column is populated for a type A row the row grows by 520 (null counting) bytes and splits into at least 3 pieces. The effect could be catastrophic for anyone who had been happy with their queries reporting type A data.

One of the difficulties of handling rows that are chained due to very high column counts is that the statistics can be quite confusing (and subject to change across versions). The most important clue comes from “table fetch continued row”; but this can’t tell you whether your “continued” rows are migrated or chained (or both), which table they come from, and whether you’ve been fetching the same small number multiple times or many different rows. Unfortunately the segment statistics (v$segstat / v$segment_statistics) don’t capture the number of continued fetches by segment – it would be nice if they did since it ought to be a rare (and therefore low-cost) event. The best you can do, probably, is to look at the v$sql_monitor report for queries that report tablescans against large tables but report large numbers of single block reads in the tablescan – and for repeatable cases enable SQL trace with wait tracing enabled against suspect queries to see if they show the characteristic mix of direct path reads and repeated db file sequential reads.

Update every column

The key performance threat in statements that update every column – including the ones that didn’t change – is that Oracle doesn’t compare before and after values when doing the update. Oracle’s code path assumes you know what you’re doing so it saves every “old” value to an undo record (which gets copied to the redo) and writes every “new” value to a redo change vector.  (Fortunately Oracle does check index definitions to see which index entries really have suffered changes, so it doesn’t visit index leaf blocks unnecessarily). It’s possible that some front-end tool that adopts this approach has a configuration option that switches from “one SQL statement for all update” to “construct minimal statement based on screen changes”.

The simple trade-off between these two options is the undo/redo overhead vs. parsing and optimisation overheads as the tool creates custom statements on demand. In the case of the table with more than 255 columns, of course, there’s the added benefit that an update of only the changed columns might limit the update to columns that are in the first rowpiece, eliminating the need (some of the time) to chase pointers to follow-up pieces.

Limiting the update can help with undo and redo, of course, but if the tool always pulls the entire row to the client anyway you still have network costs to consider. With the full row pulled and then updated you may find it takes several SQL*Net roundtrips to transfer the whole row between client and server.  In a quick test on a simple 1,000 column table with an update that set every column in a single row to null (using a bind variables) I found that the a default setup couldn’t even get 1,000 NULLs (let alone “real values”) across the network without resorting to one wait on “SQL*Net more data from client”


variable b1 number
exec :b1 := null;

update t1 set
        col0001 = :b1,
        col0002 = :b1,
        ...
        col1000 = :b1
;

Although “SQL*Net message to/from client” is typically (though not always correctly) seen as an ignorable wait, you need to be a little more suspicious of losing time to “SQL*Net more data to/from client”. The latter two waits mean you’re sending larger volumes of information across the network and maybe you can eliminate some of the data or make the transfer more efficient – perhaps a simple increase in the SDU (session data unit) in the tnsnames.ora, listener.ora, or sqlnet.ora (for default_sdu_size) might be helpful.

Warning

One of the features of trouble-shooting from cold is that you don’t know very much about the system you’re looking at – so it’s nice to be able to take a quick overview of a system looking for any potentially interesting anomalies and take a quick note of what they are and what thoughts they suggest before you start asking questions and digging into a system. This article is just a couple of brief notes along the lines of: “that doesn’t look nice- what questions does it prompt”.

August 9, 2019

Split Partition

Filed under: Oracle,Partitioning,Performance,Tuning — Jonathan Lewis @ 1:02 pm BST Aug 9,2019

This is a little case study on “pre-emptive trouble-shooting”, based on a recent question on the ODC database forum asking about splitting a range-based partition into two at a value above the current highest value recorded in a max_value partition.

The general principle for splitting (range-based) partitions is that if the split point is above the current high value Oracle will recognise that it can simply rename the existing partition and create a new, empty partition, leaving all the indexes (including the global and globally partitioned indexes) in a valid state. There are, however, three little wrinkles to this particular request:

  • first is that the question relates to a system running 10g
  • second is that there is a LOB column in the table
  • third is that the target is to have the new (higher value) partition(s) in a different tablespace

It’s quite possible that 10g won’t have all the capabilities of partition maintenance of newer versions, and if anything is going to go wrong LOBs are always a highly dependable point of failure, and since all the examples in the manuals tend to be very simple examples maybe any attempt to introduce complications like tablespace specification will cause problems.

So, before you risk doing the job in production, what are you going to test?

In Oracle terms we want to check the following

  • Will Oracle have silently copied/rebuilt some segments rather than simply renaming old segments and creating new, empty segments.
  • Will the segments end up where we want them
  • Will all the indexes stay valid

To get things going, the OP had supplied a framework for the table and told us about two indexes, and had then given us two possible SQL statements to do the split, stating they he (or she) had tested them and they both worked. Here’s the SQL (with a few tweaks) that creates the table and indexes. I’ve also added some data – inserting one row into each partition.

rem
rem     Script:         split_pt_lob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem
rem     Last tested 
rem             12.2.0.1
rem             10.2.0.5
rem

define m_old_ts = 'test_8k'
define m_new_ts = 'assm_2'

drop table part_tab purge;

create table part_tab(
  pt_id            NUMBER,
  pt_name          VARCHAR2(30),
  pt_date          DATE default SYSDATE,
  pt_lob           CLOB,
  pt_status        VARCHAR2(2)
)
tablespace &m_old_ts
lob(pt_lob) store as (tablespace &m_old_ts)
partition by range (pt_date)
(
  partition PRT1 values less than (TO_DATE('2012-01-01', 'YYYY-MM-DD')),
  partition PRT2 values less than (TO_DATE('2014-09-01', 'YYYY-MM-DD')),
  partition PRT_MAX values less than (MAXVALUE)
)
/

alter table part_tab
add constraint pt_pk primary key(pt_id)
/

create index pt_i1 on part_tab(pt_date, pt_name) local
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    1,'one',to_date('01-Jan-2011'),rpad('x',4000),'X'
)
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    2,'two',to_date('01-Jan-2013'),rpad('x',4000),'X'
)cascade=>trueee
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    3,'three',to_date('01-Jan-2015'),rpad('x',4000),'X'
)
/

commit;

execute dbms_stats.gather_table_stats(null,'part_tab',cascade=>true,granularity=>'ALL')

We were told that

The table has
– Primary Key on pt_id column with unique index (1 Different table has FK constraint that refers to this PK)
– Composite index on pt_date and pt_name columns

This is why I’ve added a primary key constraint (which will generate a global index) and created an index on (pt_date,pt_name) – which I’ve created as a local index since it contains the partitioning column.

The description of the requirement was:

  • The Task is to split partition(PRT_MAX) to a different tablespace
  • New partition(s) won’t have data at the moment of creation

And the two “tested” strategies were:

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts,
        PARTITION PRT_MAX tablespace &m_new_ts
);

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts LOB (pt_lob) store as (TABLESPACE &m_old_ts), 
        PARTITION PRT_MAX tablespace &m_new_ts LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
;
 

If we’re going to test these strategies properly we will need queries similar to the following:


break on object_name skip 1
select object_name, subobject_name, object_id, data_object_id  from user_objects order by object_name, subobject_name;

break on index_name skip 1
select index_name, status from user_indexes;
select index_name, partition_name, status from user_ind_partitions order by index_name, partition_name;

break on segment_name skip 1
select segment_name, partition_name, tablespace_name from user_segments order by segment_name, partition_name;

First – what are the object_id and data_object_id for each object before and after the split. Have we created new “data objects” while splitting, or has an existing data (physical) object simply changed its name.

Secondly – are there any indexes or index partitions that are no longer valid

Finally – which tablespaces do physical objects reside in.

On a test run of the first, simpler, split statement here are the before and after results for the object_id and data_object_id, followed by the post-split results for index and segment details:


Before Split
============

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT_MAX                     23679          23679
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT_MAX                     23692          23692
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P254                 23687          23687

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P251                23683          23683
                                                             23680          23680

After split
===========

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT3                        23693          23679
                                 PRT_MAX                     23679          23694
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT3                        23700          23692
                                 PRT_MAX                     23699          23699
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P257                 23697          23687
                                 SYS_IL_P258                 23698          23698

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P255                23695          23683
                                 SYS_LOB_P256                23696          23696
                                                             23680          23680


INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023676C00004$$         N/A


INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023676C00004$$         SYS_IL_P252            USABLE
                                 SYS_IL_P253            USABLE
                                 SYS_IL_P257            USABLE
                                 SYS_IL_P258            USABLE


SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023676C00004$$  SYS_IL_P252            TEST_8K
                          SYS_IL_P253            TEST_8K
                          SYS_IL_P257            TEST_8K
                          SYS_IL_P258            TEST_8K

SYS_LOB0000023676C00004$$ SYS_LOB_P249           TEST_8K
                          SYS_LOB_P250           TEST_8K
                          SYS_LOB_P255           TEST_8K
                          SYS_LOB_P256           TEST_8K

Before the split partition PRT_MAX – with 4 segments: table, index, LOB, LOBINDEX – has object_id = data_object_id, with the values: 23679 (table), 23692 (index), 23683 (LOB), 23687 (LOBINDEX); and after the split these reappear as the data_object_id values for partition PRT3 (though the object_id values are larger than the data_object_id values) – so we infer that Oracle has simply renamed the various PRT_MAX objects to PRT3 and created new, empty PRT_MAX objects.

We can also see that all the indexes (including the global primary key index) have remained valid. We also note that the data_object_id of the primary key index has not changed, so Oracle didn’t have to rebuild it to ensure that it stayed valid.

There is a problem, though, the LOB segment and LOBINDEX segments for the new PRT_MAX partition are not in the desired target tablespace. So we need to check the effects of the second version of the split command where we add the specification of the LOB tablespaces. This is what we get – after rerunning the entire test script from scratch:


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT_MAX                     23729          23729
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT_MAX                     23742          23742
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P274                 23737          23737

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P271                23733          23733
                                                             23730          23730


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT3                        23743          23743
                                 PRT_MAX                     23729          23744
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT3                        23750          23750
                                 PRT_MAX                     23749          23749
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P277                 23747          23747
                                 SYS_IL_P278                 23748          23748

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P275                23745          23745
                                 SYS_LOB_P276                23746          23746
                                                             23730          23730

INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            UNUSABLE
SYS_IL0000023726C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   UNUSABLE
                                 PRT_MAX                USABLE

SYS_IL0000023726C00004$$         SYS_IL_P272            USABLE
                                 SYS_IL_P273            USABLE
                                 SYS_IL_P277            USABLE
                                 SYS_IL_P278            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023726C00004$$  SYS_IL_P272            TEST_8K
                          SYS_IL_P273            TEST_8K
                          SYS_IL_P277            TEST_8K
                          SYS_IL_P278            ASSM_2

SYS_LOB0000023726C00004$$ SYS_LOB_P269           TEST_8K
                          SYS_LOB_P270           TEST_8K
                          SYS_LOB_P275           TEST_8K
                          SYS_LOB_P276           ASSM_2


Before looking at the more complex details the first thing that leaps out to hit the eye is the word UNUSABLE – which appears for the status of the (global) primary key index and the PRT3 subpartition. The (empty) PRT_MAX LOB and LOBINDEX partitions are where we wanted them, but by specifying the location we seem to have broken two index segments that will need to be rebuilt.

It gets worse, because if we check the data_object_id of the original PRT_MAX partition (23729) and its matching index partition (23742) we see that they don’t correspond to the (new) PRT3 data_object_id values which are 23743 and 23750 respectively – the data has been physically copied from one data object to another completely unnecessarily; moreover the same applies to the LOB and LOBINDEX segments – the data object ids for the PRT_MAX LOB and LOBINDEX partitions were 23733 and 23737, the new PRT3 data object ids are 23746 and 23747.

If you did a test with only a tiny data set you might not notice the implicit threat that these changes in data_object_id tell you about – you’re going to be copying the whole LOB segment when you don’t need to.

Happy Ending (maybe)

A quick check with 12.2 suggested that Oracle had got much better at detecting that it didn’t need to copy LOB data and invalidate indexes with the second form of the code; but the OP was on 10g – so that’s not much help. However it was the thought that Oracle might misbehave when you specifyied tablespaces that made me run up this test – in particular I had wondered if specifying a tablespace for the partition that would end up holding the existing data might trigger an accident, so here’s a third variant of the split statement I tested, with the results on the indexes, segments, and data objects. Note that I specify the tablespace only for the new (empty) segments:


alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
    PARTITION PRT3,
    PARTITION PRT_MAX tablespace &m_new_ts  LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
/

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT_MAX                     23754          23754
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT_MAX                     23767          23767
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P284                 23762          23762

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P281                23758          23758
                                                             23755          23755

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT3                        23768          23754
                                 PRT_MAX                     23754          23769
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT3                        23775          23767
                                 PRT_MAX                     23774          23774
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P287                 23772          23762
                                 SYS_IL_P288                 23773          23773

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P285                23770          23758
                                 SYS_LOB_P286                23771          23771
                                                             23755          23755
INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023751C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023751C00004$$         SYS_IL_P282            USABLE
                                 SYS_IL_P283            USABLE
                                 SYS_IL_P287            USABLE
                                 SYS_IL_P288            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023751C00004$$  SYS_IL_P282            TEST_8K
                          SYS_IL_P283            TEST_8K
                          SYS_IL_P287            TEST_8K
                          SYS_IL_P288            ASSM_2

SYS_LOB0000023751C00004$$ SYS_LOB_P279           TEST_8K
                          SYS_LOB_P280           TEST_8K
                          SYS_LOB_P285           TEST_8K
                          SYS_LOB_P286           ASSM_2

All the index and index partitions stay valid; the new empty segments all end up in the target tablespace, and all the data object ids for the old PRT_MAX partitions becaome the data object ids for the new PRT3 partitions. Everything we want, and no physical rebuilds of any data sets.

Moral:

When you’re testing, especially when you’re doing a small test while anticipating a big data set, don’t rely on the clock; check the data dictionary (and trace files, if necessary) carefully to find out what activity actually took place.

Footnote:

It’s possible that there are ways to fiddle around with the various default attributes of the partitioned table to get the same effect – but since 12.2 is much better behaved anyway there’s no point in me spending more time looking for alternative solutions to a 10g problem.

 

August 8, 2019

Free Space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 1:27 pm BST Aug 8,2019

Several years ago I wrote a note about reporting dba_free_space and dba_extents to produce a map of the space usage in a tablespace in anticipation of messing about with moving or rebuilding objects to try and reduce the size of the files in the tablespace.  In the related page where I published the script I pointed out that a query against dba_extents would be expensive because it makes use of structure x$ktfbue which generates the information dynamically by reading segment header blocks. I also pointed out in a footnote to the original article that if you’ve enabled the recyclebin and have “dropped” some objects then there will be some space that is reported as free but is not quite free since the extents will still be allocated. This brings me to the topic for today’s blog.

While visiting a client site recently I came across an instance that was running a regular report to monitor available space in the database. Basically this was a query against view dba_free_space. Surprisingly it was taking a rather long time to complete – and the reason for this came in two parts. First, the recyclebin was enabled and had some objects in it and secondly there were no stats on the fixed object x$ktfbue.

In the case of the client the particular query produced a plan that included the following lines:


Id  Operation             Name              Rows    Bytes  Cost (%CPU)  Time
--  --------------------- ----------------  ----   ------  -----------  --------
63  HASH JOIN                               2785     212K     46  (85)  00:00:01
64    TABLE ACCESS FULL   RECYCLEBIN$       1589    20657      7   (0)  00:00:01
65    FIXED TABLE FULL    X$KTFBUE          100K    6347K     38 (100)  00:00:01 

This is part of the view where Oracle calculates the size of all the extents of objects in the recyclebin so that they can be reported as free space. Notice that in this plan (which is dependent on version, system stats, object_stats and various optimizer parameters) the optimizer has chosen to do a hash join between the recyclebin (recyclebin$) and the x$ structure – and that has resulted in a “full tablescan” of x$ktfbue, which means Oracle reads the segment header block of every single segment in the entire database. (I don’t know where the row stats came from as there were no stats on x$ktfbue, and this plan was pulled from the AWR history tables so the query had been optimised and captured some time in the past.)

If there had been nothing in the recyclebin the hash join and two tablescans wouldn’t have mattered, unfortunately the recyclebin had been enabled and there were a few rows in recyclebin$, so the “tablescan” happened. Here’s a cut-n-paste from a much simpler query run against a fairly new (no 3rd party app) database running 12.1.0.2 to give you some idea of the impact:


SQL> execute snap_events.start_snap

PL/SQL procedure successfully completed.

SQL> select count(*) from x$ktfbue;

  COUNT(*)
----------
      8774

1 row selected.

SQL> execute snap_events.end_snap
---------------------------------------------------------
Session Events - 01-Aug 21:28:13
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
Disk file operations I/O                              7           0           0        .018           1
db file sequential read                           5,239           0          14        .003           6
SQL*Net message to client                             7           0           0        .000           0
SQL*Net message from client                           7           0       1,243     177.562         572
events in waitclass Other                             3           1           0        .002           0

PL/SQL procedure successfully completed.

On my little laptop, with nothing else going on, I’ve managed to get away with “only” 5,239 single block reads, and squeezed them all into just 14 centiseconds (local SSD helps). The clients wasn’t so lucky – they were seeing tens of thousands of real physical reads.

The ideal solution, of course, was to purge the recyclebin and disable the feature – it shouldn’t be necessary to enable it on a production system – but that’s something that ought to require at least some paperwork. In the short term gathering stats on the fixed table helped because the plan changed from a hash join with “tablescan” of x$ktfbue to a nested loop with an “indexed” access path, looking more like the following (from a query against just recyclebin$ and x$ktfbue)

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |       |       |     4 (100)|          |
|   1 |  NESTED LOOPS            |                  |     7 |   182 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL      | RECYCLEBIN$      |     6 |    66 |     4   (0)| 00:00:01 |
|*  3 |   FIXED TABLE FIXED INDEX| X$KTFBUE (ind:1) |     1 |    15 |     0   (0)|          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("BUE"."KTFBUESEGBNO"="BLOCK#" AND "BUE"."KTFBUESEGTSN"="TS#" AND
              "BUE"."KTFBUESEGFNO"="FILE#"))

This was still fairly resource-intensive for the client, but was something of an improvement – they had a lot more than 6 items in their recyclebin.

Part of the problem, of course, is that x$ktfbue is one of the objects that Oracle skips when you gather “fixed object” stats – it can be a bit expensive for exactly the reason that querying it can be expensive, all those single block segment header reads.

If you want to check the stats and gather them (as a one-off, probably) here’s some suitable SQL:


select
        table_name, num_rows, avg_row_len, sample_size, last_analyzed
from
        dba_tab_statistics
where
        owner = 'SYS'
and     table_name = 'X$KTFBUE'
;

begin
        dbms_stats.gather_table_stats('SYS','X$KTFBUE');
end;
/

Summary

You probably shouldn’t have the recyclebin enabled in a production system; but if you do, and if you also run a regular report on free space (as many sites seem to do) make sure (a) you have a regular routine to minimise the number of objects that it accumulates and (b) gather statistics (occasionally) on x$ktfbue to minimise the overhead of the necessary join between recyclebin$ and x$ktfbue.

July 3, 2019

DB links

Filed under: distributed,Oracle,Performance — Jonathan Lewis @ 1:16 pm BST Jul 3,2019

A recent thread on the ODC SQL & PL/SQL forum raised the question of how data moves across a database link – is it row by row or array based (or other)? It’s a good question as it’s fairly common knowledge that distributed queries can be very much slower than the same query when executed on an equivalent set of local tables, so it’s worth having a little poke around to see if there’s anything in the way the data moves across the link that could be tweaked to improve performance.

It’s fairly easy to get some ideas by running a couple of queries and checking session activity stats and wait events – so here’s a little test conducted between a database running 12.2.0.1 and a database running 11.2.0.4. For this type of testing it’s probably sensible to use two database rather than faking things with a loopback database link in case the loopback introduces some special features (a.k.a. dirty tricks).


rem
rem     Script:         db_link_usage.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem
rem     Last tested 
rem             12.2.0.1 -> 11.2.0.4
rem

prompt  ==================================
prompt  SQL*Plus 20,000 rows, arraysize 20
prompt  ==================================

set feedback only
set arraysize 20

execute snap_my_stats.start_snap
execute snap_events.start_snap
select * from all_objects@d11g where rownum <= 20000;

set feedback on
execute snap_events.end_snap
execute snap_my_stats.end_snap

prompt  ====================================
prompt  SQL*Plus 20,000 rows, arraysize 2000
prompt  ====================================

set feedback only
set arraysize 2000

execute snap_my_stats.start_snap
execute snap_events.start_snap
select * from all_objects@d11g where rownum <= 20000;

set feedback on
execute snap_events.end_snap
execute snap_my_stats.end_snap

All I’ve done is select 20,000 rows from view all_objects in a remote database with two different settings for the array fetch size. The “feedback only” option is a 12c option that means the session doesn’t output the data, it shows only the final “N rows selected” report. The two “snap” packages are some code I wrote a couple of decades ago to report changes in my session’s activity stats and wait events.

It turned out that there was very little of interest in the session activity stats although the impact of SQL*Net compression is always worth a quick reminder (here and here) on the other hand the wait events (specifically the wait counts) are most revealing:

================================== 
SQL*Plus 20,000 rows, arraysize 20
================================== 
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                       1,004           0          15.69        .016     414,828
SQL*Net message to dblink                         1,004           0           0.04        .000           0
SQL*Net message from dblink                       1,004           0          61.02        .061          11
SQL*Net message to client                         1,004           0           0.11        .000           0

====================================
SQL*Plus 20,000 rows, arraysize 2000
====================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                          14           0           4.88        .349     414,828
SQL*Net message to dblink                            13           0           0.00        .000           0
SQL*Net message from dblink                          13           0           7.72        .594          11
SQL*Net message to client                            14           0           0.00        .000           0

SQL*Net more data from dblink                        96           0           6.26        .065           6
SQL*Net more data to client                          96           0           0.16        .002           4

Doing a little arithmetic (and allowing for a few “metadata” messages that introduce small variations in the numbers), we can see that when we fetched the 20,000 rows with an arraysize of 20 this turned into 1,000 (fetch) calls from SQL*Plus to the server, with a matching 1,000 calls from the local server to the remote server. When the arraysize goes up to 2,000 rows, though, the SDU (session data unit) setting for SQL*Net is too small to hold the whole of a single fetch and we see a single fetch from SQL*Plus turning into one “SQL*Net message to client” accompanied by 7 or 8 “SQ:(Net more data to client” with exactly the same pattern of conversation between the local server and the remote server. You could imagine the conversation as something like:

  • Client to local server: “give me 2,000 rows”
  • local server to remote server: “give me 2,000 rows”
  • remote server to local server: “Here, I can manage 120 rows as the first installment”
  • local server to client: “Here’s a first installment”
  • Client to local server: “Got it, where’s the rest?”     *** but no message recorded in the session activity stats
  • Local server to remote server: “Where’s the rest?”
  • Remote server to local server: “You want more – here’s another 120 rows”
  • Local server to client: “Here’s a second installment”
  • Client to local server: “Got it, where’s the rest?”     *** but no message recorded in the session activity stats
  • etc.

In this example the volume of data sent back to the client in each message was limited by the SDU size negotiated between the local server and the remote server as the link opens. In my case I happened to have the default (apparently 8KB) as the SDU_SIZE in the 12c and a non-default 32KB setting in the 11g sqlnet.ora, so the negotiated SDU between servers was 8KB (the lower of the two).

Here’s what the figures looked like after I had restarted with the SDU set to 16KB in the 12c tnsnames.ora, and 16KB in the 11g listener.ora:


==================================
SQL*Plus 20,000 rows, arraysize 20
==================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                       1,004           0          24.23        .024         588
SQL*Net message to dblink                         1,009           0           0.06        .000           0
SQL*Net message from dblink                       1,010           0          77.76        .077           4
SQL*Net message to client                         1,004           0           0.15        .000           0

====================================
SQL*Plus 20,000 rows, arraysize 2000
====================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                          14           0           1.61        .115         588
SQL*Net message to dblink                            13           0           0.00        .000           0
SQL*Net message from dblink                          13           0           4.21        .324           4
SQL*Net message to client                            14           0           0.00        .000           0

SQL*Net more data from dblink                        45           0          13.53        .301           6
SQL*Net more data to client                          45           0           0.13        .003           0

The first set of figures (arraysize 20) don’t change. If an 8KB SDU is large enough to hold an array of 20 rows then it’s not going to make a difference when the SDU is increased. In the second set of figures we see that for each “SQL*Net message from dblink” we now have roughly 3 “SQL*Net more data from dblink” (with matching counts for “SQL*Net more data to client”). With an SDU of double the size it now takes a total of roughly 4 packets to transmit the array fetch rather than the 8 or 9 we saw with the smaller SDU size.

As a final test with SQL*Plus, what happens if we set the SDU size at 32K for the 12c database (and that means for the client calling in and the server calling out) and 8KB for the 11g database? The client negotiates a 32KB SDU with the 12c server, but the 12c server negotiates an 8KB SDU with the 11g server. Here are the stats for just the test with the large fetch size in this case:


====================================
SQL*Plus 20,000 rows, arraysize 2000
====================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                          14           0           5.30        .379     214,570
SQL*Net message to dblink                            13           0           0.00        .000           0
SQL*Net message from dblink                          13           0           4.09        .314          13
SQL*Net message to client                            14           0           0.00        .000           0

SQL*Net more data from dblink                        96           0          14.46        .151           6
SQL*Net more data to client                          20           0           0.08        .004           0

We get the same 10 (plus a few) “message to/from client/dblink”, but now the “more data” waits are dramatically different. When the client calls for the “next fetch” the local server has to call the remote server 4 or 5 times to get enough 8KB data packets to fill a single 32KB packet to return to the client. You can confirm this (and some of my descriptions of the earlier behaviour) by enabling extended tracing for wait states. Here’s an extract of 5 consecutive lines from a suitable trace file (you have to ignore the “#bytes” parameter on the “SQL*Net more data from dblink” waits, they’ve always been garbage:


WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 2793 driver id=1413697536 #bytes=7 p3=0 obj#=-1 tim=703316387674
WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 34 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=703316388447
WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 8 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=703316389134
WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 16 driver id=1413697536 #bytes=12 p3=0 obj#=-1 tim=703316389818
WAIT #140028098478296: nam='SQL*Net more data to client' ela= 73 driver id=1413697536 #bytes=32671 p3=0 obj#=-1 tim=703316389960

Summary Note

Oracle will use the negotiated SDU to do array fetches across a database link. Since a larger arraysize can (for large volumes of data) reduce the total work load on the remote database and on the network you may want the local server to have a specifically crafted tnsnames.ora entry and the remote server to expose a specific service with matching SDU size to help improve the efficiency of transferring data between two databases.

Lagniappe

If you want to look at other cases of how array fetching and SDU sizes interact, here are a couple of examples of using PL/SQL to execute SQL that fetches data across database links. The first is a simple, embedded “cursor for loop” that (apparently) does “row by row” procssing – although an enhancement appeared many versions ago to make this type of code use array fetching of 100 rows under the covers. The second demonstrates the effects of an explicit cursor with “fetch, bulk collect, limit”:


execute snap_events.start_snap

declare
        m_total number := 0;
        m_ct    number := 0;
begin
        for c1 in (select * from all_objects@d11g where rownum < 20000) loop
                m_total := m_total + c1.object_id;
                m_ct    := m_ct + 1;
        end loop;
end;
/

execute snap_events.end_snap

execute snap_events.start_snap

declare

        cursor c1 is  select * from all_objects@d11g where rownum <= 20000;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

        m_ct number := 0;

begin
        open c1;
        loop
                fetch   c1
                bulk collect
                into    m_tab
                limit   1000
                ;

                exit when c1%notfound;
                m_ct := m_ct + 1;

        end loop;
        close c1;
end;
/

execute snap_events.end_snap

Checking the session events for these two test we see the following when the SDU_SIZE has been set at 16KB at both ends of the database link (which means the negotiated SDU will be 16KB):


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                           4           0           0.66        .165         588 
SQL*Net message to dblink                           203           0           0.05        .000           0
SQL*Net message from dblink                         203           0          38.51        .190           5
SQL*Net message to client                             4           0           0.00        .000           0

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                           4           0           0.08        .021         588
SQL*Net message to dblink                            24           0           0.01        .000           0
SQL*Net message from dblink                          24           0           9.19        .383           5
SQL*Net more data from dblink                        40           0           8.47        .212           6
SQL*Net message to client                             4           0           0.00        .000           0

The critical information we can see in the first example is that getting 20,000 rows from the remote database “row by row” takes 200 (plus noise) round-trips – showing that Oracle really is converting our single-row processing loop in array fetches of 100 rows.

The second example shows us that (allowing for a little noise, again) it takes 20 messages to the remote database to fetch 20,000 rows – so 1,000 rows per message – but the SDU size is too small and we have to send each array in three consecutive packets, the “message from dblink” wait and two “more data from dblink” waits.

Footnote

I did point out that the session activity stats for theses tests weren’t interesting. There is, however, one detail worth mentioning since you could otherwise be fooled into thinking the number of packet travelling between the databases was smaller than it really was. There are two statistics about network roundtrips:


Name                                         Value
----                                         -----
SQL*Net roundtrips to/from client               18         
SQL*Net roundtrips to/from dblink               13

A “roundtrip” is a “message to / message from” pair. It doesn’t account for traffic due to “more data” being requested or sent.

 

Next Page »

Powered by WordPress.com.