Oracle Scratchpad

June 25, 2019

opt_estimate 2

Filed under: CBO,Hints,Oracle — Jonathan Lewis @ 8:22 pm BST Jun 25,2019

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t affect the arithmetic but it does mean the nested loop appears in its original pre-9i form, which happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15 * 15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.
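
Pulling that description together, here’s a generic sketch of the two numeric forms (the query block name, aliases and index name below are just placeholders, and I haven’t tested these exact variants):


opt_estimate(@qb_name nlj_index_scan, alias@qb_name (driving_alias), index_name, scale_rows=0.1)
opt_estimate(@qb_name nlj_index_scan, alias@qb_name (driving_alias), index_name, rows=40)
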

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide to follow a path other than the one we expected, and that factor of 0.2 will still be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

I’m fairly sure I’ve seen a note somewhere on MoS saying that the opt_estimate(join() …) hint is only supposed to handle a two-table join while the cardinality() hint can be used for a many-table join cardinality fix. This doesn’t seem to be the case: both mechanisms seem to work for more than two tables – though you may find the way they work a little counter-intuitive.
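
For example, a three-table variant of the join() form might look like the line below – this is just a sketch extrapolated from the two-table hint above, with t3 as a hypothetical extra alias, so treat it as untested:


opt_estimate(@main join(t1 t2 t3), scale_rows=0.1)
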

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

June 22, 2019

ANSI bug

Filed under: Oracle,ANSI Standard,Bugs — Jonathan Lewis @ 1:01 pm BST Jun 22,2019

The following note is about a script that I found on my laptop while I was searching for some details about a bug that appears when you write SQL using the ANSI style format rather than traditional Oracle style. The script is clearly one that I must have cut and pasted from somewhere (possibly the OTN/ODC database forum) many years ago without making any notes about its source or resolution. All I can say about it is that the file has a creation date of July 2012 and I can’t find any reference to a problem through Google searches – though the tables and even a set of specific insert statements appear in a number of pages that look like coursework for computer studies, and MoS has a similar looking bug “fixed in 11.2”.

Here’s the entire script:

rem
rem     Script:         ansi_bug.sql
rem     Author:         ???
rem     Dated:          July 2012
rem

CREATE TABLE Student (
  sid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  address VARCHAR(20) NOT NULL,
  major CHAR(2)
);

CREATE TABLE Professor (
  pid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  department VARCHAR(10) NOT NULL
);

CREATE TABLE Course (
  cid INT PRIMARY KEY,
  title VARCHAR(20) NOT NULL UNIQUE,
  credits INT NOT NULL,
  area VARCHAR(5) NOT NULL
);

CREATE TABLE Transcript (
  sid INT,
  cid INT,
  pid INT,
  semester VARCHAR(9),
  year CHAR(4),
  grade CHAR(1) NOT NULL,
  PRIMARY KEY (sid, cid, pid, semester, year),
  FOREIGN KEY (sid) REFERENCES Student (sid),
  FOREIGN KEY (cid) REFERENCES Course (cid),
  FOREIGN KEY (pid) REFERENCES Professor (pid)
);

INSERT INTO Student (sid, name, address, major) VALUES (101, 'Nathan', 'Edinburg', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (105, 'Hussein', 'Edinburg', 'IT');
INSERT INTO Student (sid, name, address, major) VALUES (103, 'Jose', 'McAllen', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (102, 'Wendy', 'Mission', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (104, 'Maria', 'Pharr', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (106, 'Mike', 'Edinburg', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (107, 'Lily', 'McAllen', NULL);

INSERT INTO Professor (pid, name, department) VALUES (201, 'Artem', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (203, 'John', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (202, 'Virgil', 'MATH');
INSERT INTO Professor (pid, name, department) VALUES (204, 'Pearl', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (205, 'Christine', 'CS');

INSERT INTO Course (cid, title, credits, area) VALUES (4333, 'Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (1201, 'Comp literacy', 2, 'INTRO');
INSERT INTO Course (cid, title, credits, area) VALUES (6333, 'Advanced Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (6315, 'Applied Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (3326, 'Java', 3, 'PL');
INSERT INTO Course (cid, title, credits, area) VALUES (1370, 'CS I', 4, 'INTRO');

INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 4333, 201, 'Spring', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6315, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 4333, 203, 'Summer I', '2010', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (102, 4333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 3326, 204, 'Spring', '2008', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1201, 205, 'Fall', '2009', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1370, 203, 'Summer II', '2010', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1201, 205, 'Fall', '2009', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1370, 203, 'Summer II', '2010', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 3326, 204, 'Spring', '2001', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 6315, 203, 'Fall', '2008', 'A');

SELECT 
        pid, 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
order by pid
;

I’ve run three minor variations of the same query – the one in the middle selects two columns from a three table join using natural joins. The first query does the same but includes an extra column in the select list while the third query selects only the original columns but orders the result set by the extra column.

The middle query returns 60 rows – the first and third, with the “extra” column projected somewhere in the execution plan, return 13 rows.
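
For comparison, here’s a sketch of the middle query rewritten with the join columns spelled out explicitly (on the assumption that pid and cid are the only column names shared at each join level) – this is what the natural-join version ought to be equivalent to, though I haven’t re-run it on every version mentioned below:


select
        p.name, tc.title
from
        Professor p
left outer join
        (
        select  t.pid, c.title
        from    Transcript t
        join    Course c
        on      c.cid = t.cid
        ) tc
on
        tc.pid = p.pid
;
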

I didn’t even have a note of the then-current version of Oracle when I copied this script, but I’ve just run it on 12.2.0.1, 18.3.0.0, and 19.2.0.0 (using LiveSQL), and the error reproduces on all three versions.

June 17, 2019

Can’t Unnest

Filed under: Conditional SQL,Execution plans,Oracle,subqueries — Jonathan Lewis @ 3:35 pm BST Jun 17,2019

In an echo of a very old “conditional SQL” posting, a recent posting on the ODC general database discussion forum ran into a few classic errors of trouble-shooting. By a lucky coincidence this allowed me to rediscover and publish an old example of parallel execution gone wild before moving on to talk about the fundamental problem exhibited in the latest query.

The ODC thread started with a question along the lines of “why isn’t Oracle using the index I hinted”, with the minor variation that it said “When I hint my SQL with an index hint it runs quickly so I’ve created a profile that applies the hint, but the hint doesn’t get used in production.”

The query was a bit messy and, as is often the case with ODC, the formatting wasn’t particularly readable, so I’ve extracted the where clause from the SQL that was used to generate the profile and reformatted it below. See if you can spot the clue that tells you why there might be a big problem using this SQL to generate a profile to use in the production environment:


WHERE   
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE' 
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS' 
AND     MSG.SRCH_4_FLD_VAL = '123456'   
AND     (
            (    'INVOICENUMBER' = 'INVOICENUMBER' 
             AND MSG.MSG_ID IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('123456') AS TABLE_OF_VARCHAR)))
            ) 
         OR (    'INVOICENUMBER' = 'SIEBELORDERID' 
             AND MSG.SRCH_3_FLD_VAL IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('') AS TABLE_OF_VARCHAR)))
            )
        ) 
AND     MSG.MSG_ID = TRK.INV_NUM(+) 
AND     (   TRK.RESEND_DT IS NULL 
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)   
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1   
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

If the SQL by itself doesn’t give you an important clue, compare it with the Predicate Information from the “good” execution plan that it produced:


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("TRK"."RESEND_DT" IS NULL OR "TRK"."RESEND_DT"=))  
   8 - filter(("MSG"."SRCH_4_FLD_VAL"='123456' AND "MSG"."MSG_CAPTR_STG_CD"='PRE_BCS'))  
   9 - access("MSG"."MSG_ID"="COLUMN_VALUE" AND "MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
       filter("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
  10 - access("MSG"."MSG_ID"="TRK"."INV_NUM")  
  13 - access("TRK1"."INV_NUM"=:B1)  

Have you spotted the thing that isn’t there in the predicate information ?

What happened to the ‘INVOICENUMBER’ = ‘INVOICENUMBER’ predicate and the ‘INVOICENUMBER’ = ‘SIEBELORDERID’ predicate? They’ve disappeared because the optimizer knows that the first predicate is always true and doesn’t need to be tested at run-time and the second one is always false and doesn’t need to be tested at run-time. Moreover both predicates are part of a conjunct (AND) – so in the second case the entire two-part predicate can be eliminated; so the original where clause can immediately be reduced to:


WHERE   
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE' 
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS' 
AND     MSG.SRCH_4_FLD_VAL = '123456'   
AND     (
                 MSG.MSG_ID IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('123456') AS TABLE_OF_VARCHAR)))
        ) 
AND     MSG.MSG_ID = TRK.INV_NUM(+) 
AND     (   TRK.RESEND_DT IS NULL 
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)   
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1   
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

Looking at this reduced predicate you may note that the IN subquery referencing the fnm_gn_in_string_list() collection could now be unnested and used to drive the final execution plan, and the optimizer will even recognize that it’s a rowsource with at most one row. So here’s the “good” execution plan:


---------------------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                               | Name                  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |  
---------------------------------------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                        |                       |      1 |        |      2 |00:00:00.08 |      12 |      7 |       |       |          |  
|   1 |  SORT ORDER BY                          |                       |      1 |      1 |      2 |00:00:00.08 |      12 |      7 |  2048 |  2048 | 2048  (0)|  
|*  2 |   FILTER                                |                       |      1 |        |      2 |00:00:00.08 |      12 |      7 |       |       |          |  
|   3 |    NESTED LOOPS OUTER                   |                       |      1 |      1 |      2 |00:00:00.08 |      10 |      7 |       |       |          |  
|   4 |     NESTED LOOPS                        |                       |      1 |      1 |      2 |00:00:00.06 |       6 |      5 |       |       |          |  
|   5 |      VIEW                               | VW_NSO_1              |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |  
|   6 |       HASH UNIQUE                       |                       |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |  1697K|  1697K|  487K (0)|  
|   7 |        COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |  
|*  8 |      TABLE ACCESS BY INDEX ROWID        | FNM_VSBL_MSG          |      1 |      1 |      2 |00:00:00.06 |       6 |      5 |       |       |          |  
|*  9 |       INDEX RANGE SCAN                  | XIE2FNM_VSBL_MSG      |      1 |      4 |      4 |00:00:00.04 |       4 |      3 |       |       |          |  
|* 10 |     INDEX RANGE SCAN                    | XPKBCS_INV_RESEND_TRK |      2 |      1 |      2 |00:00:00.01 |       4 |      2 |       |       |          |  
|  11 |    SORT AGGREGATE                       |                       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
|  12 |     FIRST ROW                           |                       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
|* 13 |      INDEX RANGE SCAN (MIN/MAX)         | XPKBCS_INV_RESEND_TRK |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
---------------------------------------------------------------------------------------------------------------------------------------------------------------  

The plan looks great – Oracle predicts a single row driver (operation 5) which can use a very good index (XIE2FNM_VSBL_MSG) in a nested loop, followed by a second nested loop, followed by a filter subquery and a sort of a tiny amount of data. Predictions match actuals all the way down the plan, and the workload is tiny. So what goes wrong in production?

You’ve probably guessed the flaw in this test. Why would anyone include a predicate like ‘INVOICENUMBER’ = ‘INVOICENUMBER’ in production code, or even worse ‘INVOICENUMBER’ = ‘SIEBELORDERID’? The OP has taken a query using bind variables, picked up the actual values that were peeked when the query was executed, and substituted them into the test as literals. This has allowed the optimizer to discard two simple predicates and one subquery, whereas the production query would need a plan that catered for the possibility that the second subquery would be the one that had to be executed and the first one bypassed. Here’s the corrected where clause using SQL*Plus variables (not the substitution type, the proper type) for the original bind variables:


WHERE
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE'
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS'
AND     MSG.SRCH_4_FLD_VAL = :BindInvoiceTo
AND     (
            (    :BindSearchBy = 'INVOICENUMBER' 
             AND MSG.MSG_ID IN (
                        SELECT  *
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST(:BindInvoiceList) AS TABLE_OF_VARCHAR)))
            )
         OR (    :BindSearchBy = 'SIEBELORDERID' 
             AND MSG.SRCH_3_FLD_VAL IN (
                        SELECT  *
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST(:BindSeibelIDList) AS TABLE_OF_VARCHAR)))
            )
        )
AND     MSG.MSG_ID = TRK.INV_NUM(+)
AND     (   TRK.RESEND_DT IS NULL
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

And this, with the “once good” hint in place to force the use of the XIE2FNM_VSBL_MSG index, is the resulting execution plan


---------------------------------------------------------------------------------------------------------  
| Id  | Operation                           | Name                  | E-Rows |  OMem |  1Mem | Used-Mem |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                    |                       |        |       |       |          |  
|   1 |  SORT ORDER BY                      |                       |      1 | 73728 | 73728 |          |  
|*  2 |   FILTER                            |                       |        |       |       |          |  
|   3 |    NESTED LOOPS OUTER               |                       |      1 |       |       |          |  
|*  4 |     TABLE ACCESS BY INDEX ROWID     | FNM_VSBL_MSG          |      1 |       |       |          |  
|*  5 |      INDEX FULL SCAN                | XIE2FNM_VSBL_MSG      |   4975K|       |       |          |  
|*  6 |     INDEX RANGE SCAN                | XPKBCS_INV_RESEND_TRK |      1 |       |       |          |  
|*  7 |    COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |       |       |          |  
|*  8 |    COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |       |       |          |  
|   9 |    SORT AGGREGATE                   |                       |      1 |       |       |          |  
|  10 |     FIRST ROW                       |                       |      1 |       |       |          |  
|* 11 |      INDEX RANGE SCAN (MIN/MAX)     | XPKBCS_INV_RESEND_TRK |      1 |       |       |          |  
---------------------------------------------------------------------------------------------------------  
 
Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter((((:BINDSEARCHBY='INVOICENUMBER' AND  IS NOT NULL) OR  
              (:BINDSEARCHBY='SIEBELORDERID' AND  IS NOT NULL)) AND ("TRK"."RESEND_DT" IS NULL OR  
              "TRK"."RESEND_DT"=)))  
   4 - filter(("MSG"."SRCH_4_FLD_VAL"=:BINDINVOICETO AND "MSG"."MSG_CAPTR_STG_CD"='PRE_BCS'))  
   5 - access("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
       filter("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
   6 - access("MSG"."MSG_ID"="TRK"."INV_NUM")  
   7 - filter(VALUE(KOKBF$)=:B1)  
   8 - filter(VALUE(KOKBF$)=:B1)  
  11 - access("TRK1"."INV_NUM"=:B1)  

The “unnested driving subquery” approach can no longer be used – we now start with the fnm_vsbl_msg table (accessing it by a most inefficient execution path because that’s what the hint dictates, and the optimizer can obey the hint), and for each row check which of the two subqueries we need to execute. There is, in fact, no way we can hint this query to operate efficiently [at least, that’s my opinion – I may be wrong].

The story so far

If you’re going to try to use SQL*Plus (or similar) to test a production query with bind variables you can’t just use a sample of literal values in place of the bind variables (though you may get lucky sometimes, of course); you should set up some SQL*Plus variables and assign values to them.
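
A minimal sketch of that setup for two of the binds in the query above (the datatypes and lengths are guesses, so adjust them to match the real application):


variable BindSearchBy   varchar2(30)
variable BindInvoiceTo  varchar2(30)

exec :BindSearchBy  := 'INVOICENUMBER'
exec :BindInvoiceTo := '123456'
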

Though I haven’t said it previously in this article, this is an example where a decision that really should have been made by the front-end code has been embedded in the SQL and passed to the database as SQL which cannot be run efficiently. The front-end code should have been written to recognise the choice between invoice numbers and Siebel order ids and send the appropriate query to the database.

Next Steps

Without making a significant change to the front-end mechanism, is it possible to change the SQL into something the optimizer can handle efficiently? Sometimes the answer is yes; so I’ve created a simpler model to demonstrate the basic problem and supply a solution for cases like this one. The key issue is finding a way of working around the OR clauses that are trying to allow the optimizer to choose between two subqueries but make it impossible for either to be unnested into a small driving data set.

First, some tables:


rem
rem     Script:         or_in_twice.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,371)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,372)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table t3
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,373)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


Now a query – first setting up a variable in SQL*Plus to allow us to emulate a production query with bind variables. Since I’m only going to use Explain Plan the variable won’t be peekable, so there would still be some scope for this plan not matching a production plan, but it’s adequate to demonstrate the structural problem:


variable v1 varchar2(10)
exec :v1 := 'INVOICE'

explain plan for
select
        t1.v1 
from
        t1
where
        (
            :v1 = 'INVOICE' 
        and t1.id in (select id from t2 where n1 = 0)
        )
or      (
            :v1 = 'ORDERID' 
        and t1.id in (select id from t3 where n1 = 0)
        )
;

select * from table(dbms_xplan.display);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |   150 |    26   (4)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    26   (4)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL| T3   |     1 |     8 |    26   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:V1='INVOICE' AND  EXISTS (SELECT 0 FROM "T2" "T2" WHERE
              "ID"=:B1 AND "N1"=0) OR :V1='ORDERID' AND  EXISTS (SELECT 0 FROM "T3"
              "T3" WHERE "ID"=:B2 AND "N1"=0))
   3 - filter("ID"=:B1 AND "N1"=0)
   4 - filter("ID"=:B1 AND "N1"=0)

As you can see, thanks to the OR that effectively gives Oracle the choice between running the subquery against t3 or the one against t2, Oracle is unable to do any unnesting. (In fact different versions of Oracle allow different levels of sophistication with disjuncts (OR) of subqueries, so this is the kind of example that’s always useful to keep for tests against future versions.)

Since we know that we are going to use one of the data sets supplied in one of the subqueries and have no risk of double-counting or eliminating required duplicates, one strategy we could adopt for this query is to rewrite the two subqueries as a single subquery with a union all – because we know the optimizer can usually handle a single IN subquery very nicely. So let’s try the following:


explain plan for
select
        t1.v1
from
        t1
where
        t1.id in (
                select  id 
                from    t2 
                where   n1 = 0
                and     :v1 = 'INVOICE'
                union all
                select  id 
                from    t3 
                where   n1 = 0
                and     :v1 = 'ORDERID'
        )
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |    54 |  1512 |    77   (3)| 00:00:01 |
|*  1 |  HASH JOIN             |          |    54 |  1512 |    77   (3)| 00:00:01 |
|   2 |   VIEW                 | VW_NSO_1 |    54 |   702 |    51   (2)| 00:00:01 |
|   3 |    HASH UNIQUE         |          |    54 |   432 |    51   (2)| 00:00:01 |
|   4 |     UNION-ALL          |          |       |       |            |          |
|*  5 |      FILTER            |          |       |       |            |          |
|*  6 |       TABLE ACCESS FULL| T2       |    27 |   216 |    26   (4)| 00:00:01 |
|*  7 |      FILTER            |          |       |       |            |          |
|*  8 |       TABLE ACCESS FULL| T3       |    27 |   216 |    26   (4)| 00:00:01 |
|   9 |   TABLE ACCESS FULL    | T1       | 10000 |   146K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="ID")
   5 - filter(:V1='INVOICE')
   6 - filter("N1"=0)
   7 - filter(:V1='ORDERID')
   8 - filter("N1"=0)


Thanks to the FILTERs at operations 5 and 7 this plan will pick the data from just one of the two subqueries, reduce it to a unique list and then use that as the build table to a hash join. Of course, with different data (or suitable hints) that hash join could become a nested loop using a high precision index.

But there’s an alternative. We manually rewrote the two subqueries as a single union all subquery and as we did so we moved the bind variable comparisons inside their respective subqueries; maybe we don’t need to introduce the union all. What would happen if we simply take the original query and move the “constant” predicates inside their subqueries?


explain plan for
select
        t1.v1
from
        t1
where
        t1.id in (select id from t2 where n1 = 0 and :v1 = 'INVOICE')
or      t1.id in (select id from t3 where n1 = 0 and :v1 = 'ORDERID')
;

select * from table(dbms_xplan.display);


-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |    54 |  1512 |    77   (3)| 00:00:01 |
|*  1 |  HASH JOIN             |          |    54 |  1512 |    77   (3)| 00:00:01 |
|   2 |   VIEW                 | VW_NSO_1 |    54 |   702 |    51   (2)| 00:00:01 |
|   3 |    HASH UNIQUE         |          |    54 |   432 |    51   (2)| 00:00:01 |
|   4 |     UNION-ALL          |          |       |       |            |          |
|*  5 |      FILTER            |          |       |       |            |          |
|*  6 |       TABLE ACCESS FULL| T3       |    27 |   216 |    26   (4)| 00:00:01 |
|*  7 |      FILTER            |          |       |       |            |          |
|*  8 |       TABLE ACCESS FULL| T2       |    27 |   216 |    26   (4)| 00:00:01 |
|   9 |   TABLE ACCESS FULL    | T1       | 10000 |   146K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="ID")
   5 - filter(:V1='ORDERID')
   6 - filter("N1"=0)
   7 - filter(:V1='INVOICE')
   8 - filter("N1"=0)

In 12.2.0.1 and 18.3.0.0 this gets the same plan as we did with our “single subquery” rewrite – the optimizer is able to construct the union all single subquery (although the ordering of the subqueries has been reversed) and unnest without any other manual intervention. (You may find that earlier versions of Oracle don’t manage to do this, though you might have to go all the way back to 10g to find one that can’t.)

Conclusion

Oracle doesn’t like disjuncts (OR) and finds conjuncts (AND) much easier to cope with. Mixing OR and subqueries is a good way to create inefficient execution plans, especially when you try to force the optimizer to handle a decision that should have been taken in the front-end code. The optimizer, however, gets increasingly skilled at handling the mixture as you move through the newer versions; but you may have to find ways to give it a little help if you see it running subqueries as filter subqueries when you’re expecting it to unnest a subquery to produce a small driving data set.

 

June 14, 2019

Trouble-shooting

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 9:19 am BST Jun 14,2019

Here’s an answer I’ve just offered on the ODC database forum to a fairly generic type of problem.

The question was about finding out why a “program” that used to take only 10 minutes to complete is currently taking significantly longer. The system is running Standard Edition, and the program runs once per day. There’s some emphasis on the desirability of taking action while the program is still running with the following as the most recent statement of the requirements:

We have a program which run daily 10minutes and suddenly one day,it is running for more than 10minutes…in this case,we are asked to look into the backend session to check what exactly the session is doing.I understand we have to check the events,last sql etc..but we need to get the work done by that session in terms of buffergets or physical reads(in case of standard edition)

1)In this case,we cannot enable trace to the session as it already started running and 10 minutes passed way.If we enable trace at this point,will it gives the required info?

2)To check the statistics of this specific session,what is the best way to proceed and this is my initial question.One of my friend pointed out to check v$sess_io and he is not sure whether it will return cumulative values because this view has only sid and not having serial#..and sid is not unique

So, answering the questions as stated, with a little bit of padding:

1) In this case, we cannot enable trace to the session as it already started running and 10 minutes passed way.If we enable trace at this point,will it gives the required info?

If the session has started and has been running for 10 minutes it’s still possible to force tracing into the session and, depending what the program does, you may be lucky enough to get enough information in the trace/tkprof file to help you. The “most-approved” way of doing this for a session is probably through a call to dbms_monitor.session_trace_enable(), but if that’s a package that Standard Edition is not licensed to use then there’s dbms_system.set_sql_trace_in_session().
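
As a sketch of the two calls (the SID 123 and serial# 456 are placeholders – pick up the real values from v$session first):


begin
        dbms_monitor.session_trace_enable(
                session_id => 123,
                serial_num => 456,
                waits      => true,
                binds      => false
        );
end;
/

rem     or, if dbms_monitor isn't available to you:

execute dbms_system.set_sql_trace_in_session(123, 456, true)
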

If this doesn’t help, and if the session is still live and running, you could also look at v$open_cursor for that SID to see what SQL statements (sql_id, child_address, last_active_time and first 60 characters of the text) are still open for that session, then query v$sql for more details about those statements (using sql_id and child_address). The stats you find in those statements are accumulated across all executions by all sessions from the moment the cursor went into the library cache, but if this is a program that runs once per day then it’s possible that the critical statement will only be run by that one session, and the cursor from the previous day will have aged out of the library cache so that what you see is just today’s run.
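
A sketch of those queries (I’m quoting column names from memory, so check the view definitions on your version; 123 is again a placeholder SID):


select  sql_id, sql_text
from    v$open_cursor
where   sid = 123
;

select  sql_id, child_number, executions, buffer_gets, disk_reads, elapsed_time
from    v$sql
where   sql_id = '&m_sql_id'
;
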

Since you’re on Standard Edition and don’t have access to the AWR you should have installed Statspack – which gives you nearly everything that you can get from the AWR reports (the most significant difference being the absence of the v$active_session_history – but there are open-source emulators that you can install as a fairly good substitute for that functionality). If there is one statement in your program that does a lot of work then it’s possible that it might be one of the top N statements in a Statspack snapshot.
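
The bare minimum of Statspack usage looks something like the following sketch – this assumes it has already been installed with @?/rdbms/admin/spcreate.sql and that you’re connected as the perfstat user:


execute statspack.snap

rem     ... let the program run for a while, then ...

execute statspack.snap

rem     then report on the interval between the two snapshots

@?/rdbms/admin/spreport.sql
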

If this program is a known modular piece of code you could alter the mechanism that calls it to include calls to enable session tracing at the start of the program (and disable it, perhaps, at the end of the program). This might be by modifying the code directly, or by modifying the wrapper that drives the program, or by adding a logon trigger if there’s a mechanism that would allow Oracle to recognise the specific session that runs this particular program; or, if something sets an identifiable (and unambiguous) module and/or action as part of calling the program, you could use the dbms_monitor package to enable tracing for (e.g.) a particular module and action on a permanent basis.
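
For that last option, a sketch of the call might look like this – the service, module and action names are pure invention to show the shape of the call, not something from the original question:


begin
        dbms_monitor.serv_mod_act_trace_enable(
                service_name => 'MY_SERVICE',
                module_name  => 'MY_PROGRAM',
                action_name  => 'DAILY_RUN',
                waits        => true,
                binds        => false
        );
end;
/
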

2) To check the statistics of this specific session,what is the best way to proceed and this is my initial question.One of my friend pointed out to check v$sess_io and he is not sure whether it will return cumulative values because this view has only sid and not having serial#..and sid is not unique

An answer: the stats are for one SID and SERIAL#, whether you’re looking at v$sess_io, v$sesstat, v$session_event, v$sess_time_model or any other v$sesXXX views that I can’t recall off-hand.  In passing, if you can add a wrapper to the calling code, capturing session activity stats (v$sesstat), wait times (v$session_event) and the time model summary (v$sess_time_model) is a very useful debugging aid.
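
As a sketch of that sort of capture – picking up the serial# at the same time, for the reason explained below (123 is a placeholder SID):


select  ss.sid, se.serial#, sn.name, ss.value
from    v$sesstat ss, v$statname sn, v$session se
where   ss.sid        = 123
and     se.sid        = ss.sid
and     sn.statistic# = ss.statistic#
and     ss.value     != 0
order by
        ss.value desc
;
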

And an explanation: the “session” array is a fixed size array, and the SID is the array subscript of the entry your session acquired at logon. Since the array is fixed size Oracle has to keep re-using the array entries so each time it re-uses an array entry it increments the serial# so that (sid, serial#) becomes the unique identifier across history[1]. As it acquires the array entry it also resets/re-initializes all the other v$sesXXX arrays/linked list/structures.

The one thing to watch out for when you try to capture any of the session numbers is that you don’t query these v$sesXXX things twice to find a difference unless you also capture the serial# at the same time so that you can be sure that the first session didn’t log off and a second session log on and reuse the same SID between your two snapshots.  (It’s quite possible that this will be obvious even if you haven’t captured the serial#, because you may spot that some of the values that should only increase with time have decreased)

 

Footnote

[1] I think there was a time when restarting an instance would reset the serial# to 1. The serial# has to wrap eventually, and the wrap used to be at 65536 because it was stored as a two-byte number – which doesn’t seem to have changed.  Nowadays the serial# seems to be maintained across instance restarts (I wonder how that works with carefully timed instance crashes), and given the amount of history that Oracle could maintain in the AWR I suppose there could be some scope for connecting two pieces of history that were really from two different sessions.

 

June 13, 2019

Parallel Fun – 2

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 6:24 pm BST Jun 13,2019

I started writing this note in March 2015 with the following introductory comment:

A little while ago I wrote a few notes about a very resource-intensive parallel query. One of the points I made about it was that it was easy to model, and then interesting to run on later versions of Oracle. So today I’m going to treat you to a few of the observations and notes I made after modelling the problem; and here’s the SQL to create the underlying objects:

Unfortunately I failed to do anything more with the model I had created until a few days ago (June 2019 – in case I stall again) when a related question came up on the ODC database forum. This time I’m ready to go a little further – so I’ll start with a bait-and-switch approach. Here are the first few lines (omitting the SQL) of an SQL Monitor report from an instance of 18.3 – is this a power-crazed machine or what ?


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TEST_USER (169:11324)
 SQL ID              :  73y5quma4jnw4
 SQL Execution ID    :  16777216
 Execution Started   :  06/13/2019 22:06:32
 First Refresh Time  :  06/13/2019 22:06:32
 Last Refresh Time   :  06/13/2019 22:07:03
 Duration            :  31s
 Module/Action       :  MyModule/MyAction
 Service             :  SYS$USERS
 Program             :  sqlplus@linux183.localdomain (TNS V1-V3)
 Fetch Calls         :  591

Global Stats
=========================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency |  Other   | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes |
=========================================================================================
|      14 |    3.18 |     0.00 |        0.05 |       11 |   591 |  25978 |   62 |  13MB |
=========================================================================================

Parallel Execution Details (DOP=3 , Servers Allocated=6730)
==========================================================================================

It didn’t take long to run the query, only about 31 seconds. But the thing to notice in the report is that while the DOP is reported as 3, the number of “Servers Allocated” is a massive 6,730. So the big question – before I show you more of the report, explain what’s happening, and supply the code to build the model – is: how many PX processes did I actually start?
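
If you want to check something similar on a system of your own, one possible approach (a sketch – the statistic names are from memory, so check them against your version) is to compare the instance-level PX process statistics before and after the run:


select  statistic, value
from    v$px_process_sysstat
where   statistic like 'Servers%'
;
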

Here’s a little more of the output:


Parallel Execution Details (DOP=3 , Servers Allocated=6730)
==========================================================================================================================================================
|      Name      | Type  | Group# | Server# | Elapsed |   Cpu   |    IO    | Concurrency |  Other   | Buffer | Read | Read  |        Wait Events         |
|                |       |        |         | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes |         (sample #)         |
==========================================================================================================================================================
| PX Coordinator | QC    |        |         |      14 |    3.13 |          |        0.05 |       11 |  23727 |      |     . | PX Deq: Join ACK (5)       |
|                |       |        |         |         |         |          |             |          |        |      |       | PX Deq: Signal ACK EXT (2) |
|                |       |        |         |         |         |          |             |          |        |      |       | sql_id: 6405a2hc50bt4 (1)  |
| p004           | Set 1 |      1 |       1 |    0.00 |    0.00 |          |             |          |    180 |      |     . | library cache: mutex X (1) |
|                |       |        |         |         |         |          |             |          |        |      |       |                            |
| p005           | Set 1 |      1 |       2 |    0.00 |    0.00 |          |             |          |    100 |      |     . |                            |
| p006           | Set 1 |      1 |       3 |    0.00 |    0.00 |          |             |          |     90 |      |     . |                            |
| p000           | Set 1 |      2 |       1 |    0.01 |    0.01 |          |             |          |        |      |     . |                            |
| p001           | Set 1 |      2 |       2 |    0.02 |    0.02 |          |             |          |        |      |     . |                            |
| p002           | Set 2 |      2 |       1 |    0.01 |    0.01 |     0.00 |             |          |    944 |   32 |   7MB |                            |
| p003           | Set 2 |      2 |       2 |    0.01 |    0.01 |     0.00 |             |          |    937 |   30 |   7MB |                            |
==========================================================================================================================================================

Despite “allocating” 6,730 servers Oracle is only admitting to having used 7 of them – so let’s take a closer look at how they’re used. There are two groups, and we have one set of 3 slaves in group 1, and two sets of two slaves in group 2. (It looks to me as if the Group# and Type columns should be the other way around given the hierarchy of group / type / server#). We can understand a little more of what these numbers mean if we look at the execution plan – particularly the special columns relating to Data Flow Operations (DFOs) and “DFO trees”.


SQL Plan Monitoring Details (Plan Hash Value=3398913290)
========================================================================================================================================================================
| Id |          Operation           |   Name   |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity |      Activity Detail       |
|    |                              |          | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |        (# samples)         |
========================================================================================================================================================================
|  0 | SELECT STATEMENT             |          |         |      |        32 |     +0 |     1 |     8846 |      |       |     . |     2.70 | Cpu (1)                    |
|  1 |   FILTER                     |          |         |      |        32 |     +0 |     1 |     8846 |      |       |     . |     5.41 | PX Deq: Signal ACK EXT (2) |
|  2 |    PX COORDINATOR            |          |         |      |        32 |     +0 |     5 |     8846 |      |       |     . |          |                            |
|  3 |     PX SEND QC (RANDOM)      | :TQ20002 |    9146 |  128 |        29 |     +2 |     2 |     8846 |      |       |     . |          |                            |
|  4 |      HASH JOIN BUFFERED      |          |    9146 |  128 |        29 |     +2 |     2 |     8846 |      |       |   9MB |          |                            |
|  5 |       PX RECEIVE             |          |    8846 |   11 |        14 |     +2 |     2 |     8846 |      |       |     . |          |                            |
|  6 |        PX SEND HYBRID HASH   | :TQ20000 |    8846 |   11 |         1 |     +0 |     2 |     8846 |      |       |     . |          |                            |
|  7 |         STATISTICS COLLECTOR |          |         |      |         1 |     +0 |     2 |     8846 |      |       |     . |          |                            |
|  8 |          PX BLOCK ITERATOR   |          |    8846 |   11 |         1 |     +0 |     2 |     8846 |      |       |     . |          |                            |
|  9 |           TABLE ACCESS FULL  | T2       |    8846 |   11 |         1 |     +0 |    23 |     8846 |   24 |   1MB |     . |          |                            |
| 10 |       PX RECEIVE             |          |   50000 |  116 |        14 |     +2 |     2 |     2509 |      |       |     . |          |                            |
| 11 |        PX SEND HYBRID HASH   | :TQ20001 |   50000 |  116 |         1 |     +0 |     2 |     2509 |      |       |     . |          |                            |
| 12 |         PX BLOCK ITERATOR    |          |   50000 |  116 |         1 |     +0 |     2 |     2509 |      |       |     . |          |                            |
| 13 |          TABLE ACCESS FULL   | T1       |   50000 |  116 |         1 |     +0 |    26 |     2509 |   38 |  12MB |     . |          |                            |
| 14 |    PX COORDINATOR            |          |         |      |        31 |     +1 |  8978 |     2252 |      |       |     . |    13.51 | PX Deq: Join ACK (5)       |
| 15 |     PX SEND QC (RANDOM)      | :TQ10000 |       1 |   77 |        32 |     +0 |  6667 |     3692 |      |       |     . |          |                            |
| 16 |      PX BLOCK ITERATOR       |          |       1 |   77 |        32 |     +0 |  6667 |    92478 |      |       |     . |     2.70 | Cpu (1)                    |
| 17 |       TABLE ACCESS FULL      | T3       |       1 |   77 |        32 |     +0 | 53118 |    92478 |   32 |   8MB |     . |    67.57 | Cpu (25)                   |
========================================================================================================================================================================

The “Name” column shows us that we have two DFO trees (:TQ2nnnn, and :TQ1nnnn) – this is why we see two “groups” in the PX server detail, and why those groups can have different degrees of parallelism.

Looking at the general shape of the plan you can see that operation 1 is a FILTER operation with two child operations, one at operation 2, the other at operation 14. So we probably have a filter subquery in place, operated as DFO tree #1, while the main query is operated as DFO tree #2. This means the main query is running at DOP = 2 (it’s a hash join with hash distribution so it needs two sets of slave processes – all the details agree with what we’ve learned about Group# 2 above); and the subquery is operating at DOP = 3 – and it’s using only one set of slave processes.

There is a little anomaly in the number of Execs of operation 14 – at some point I will examine this more closely, but it might simply be a reporting error that has added the number of Execs of its child operations to its own Execs, it might be something to do with counting in Exec calls by its parent, or it might be a side effect of scalar subquery caching. I’ll worry about it when I have a good reason to do so. What I want to look at is the Execs of operations 15/16, the PX Send QC / PX Block Iterator. There are 6,667 reports of a PX slave executing, and that matches up quite nicely with the 6,730 reported “Servers Allocated” – so it would appear that Oracle says it’s allocating a server whenever it uses a server. But does it really “allocate” (and, presumably, de-allocate)?

Here’s how you find out – you run the query again, taking various snapshots and looking for cross-references. I’ve got some results from v$pq_tqstat and v$pq_slave for the run that produced the SQL Monitor report above, and some of the QC session stats and enqueue stats for a subsequent run. This is what we see:


select  process, count(*) 
from    v$pq_tqstat 
group by 
        process 
order by 
        process
;


PROCESS                    COUNT(*)
------------------------ ----------
P000                              3
P001                              3
P002                              2
P003                              2
P004                           2225
P005                           2214
P006                           2218
QC                             2243


SQL> select slave_name, sessions from V$pq_slave order by slave_name;

SLAV   SESSIONS
---- ----------
P000          1
P001          1
P002          1
P003          1
P004       2242
P005       2242
P006       2242

Key Session Stats
=================
Name                                                                         Value                                                                          
----                                                                         -----                                                                          
opened cursors cumulative                                                    6,955                                                                          
user calls                                                                  20,631                                                                          
recursive calls                                                             20,895                                                                          
enqueue requests                                                            22,699                                                                          
enqueue conversions                                                         13,610                                                                          
enqueue releases                                                            15,894                                                                          
parse count (total)                                                          6,857                                                                          
execute count                                                                6,966                                                                          
DFO trees parallelized                                                           2
Parallel operations not downgraded                                           2,268

Key Enqueue Stats
=================
Type Short name                   Requests       Waits     Success      Failed    Wait m/s                                                                  
---- ----------                   --------       -----     -------      ------    --------                                                                  
DA   Slave Process Array             2,272          13       2,272           0          43                                                                  
PS   contention                     27,160       1,586      27,080           7         415                                                                  
SE   contention                      6,784           0       6,785           0           0                                                                  

TYPE                 DESCRIPTION
-------------------- ------------------------------------------------------------------------
PS                   Parallel Execution Server Process reservation and synchronization
DA                   Slave Process Spawn reservation and synchronization
SE                   Lock used by transparent session migration
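
For reference, the “Key Session Stats” figures above came from comparing before/after snapshots of the query co-ordinator’s session statistics. The following is only a sketch (it isn’t the script I used, and it reports just the current session) of the sort of query you could run before and after the test, differencing the two sets of values:


select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name in (
                'opened cursors cumulative', 'user calls', 'recursive calls',
                'enqueue requests', 'enqueue conversions', 'enqueue releases',
                'parse count (total)', 'execute count',
                'DFO trees parallelized', 'Parallel operations not downgraded'
        )
order by
        sn.name
;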

Oracle really did start and stop something like 6,700 PX sessions (constantly re-using the same small set of PX slave processes), acquiring and releasing them for each execution of the filter subquery. This is definitely a performance threat – we keep acquiring and releasing PX slaves, we keep creating new sessions (yes, really), and we keep searching for cursors in the library cache. All these activities are highly contentious. If you start running multiple queries that do this sort of thing you find increasing amounts of time being spent on latch contention, PX slave allocation, mutex waits, and all the other problems you get with sessions that log on, do virtually nothing, then log off in rapid succession.

So how do you write SQL that does this type of thing? Here’s my data model (you may want to limit the number of rows in the tables):


create table t1 as
select * from all_source;

create table t2 as
select * from all_source where mod(line,20) = 1;

create table t3 as
select * from all_source;

And here’s all you have to do to start creating problems. I’ve added explicit hints to force parallelism (particularly for the subquery), but it’s more likely that in real life the parallelism would have been introduced accidentally by table or index definitions, or by an “alter session” to “force parallel”:


set feedback only

select
        /*+ 
                parallel(t1 2) 
                parallel(t2 2)
                leading(t1 t2)
                use_hash(t2)
                swap_join_inputs(t2)
                pq_distribute(t2 hash hash)
                cardinality(t1,50000)
        */
        t1.owner,
        t1.name,
        t1.type
from
        t1
join
        t2
on      t2.owner = t1.owner
and     t2.name = t1.name
and     t2.type = t1.type
where
        t1.line = 1
and     (
           mod(length(t1.text), 10) = 0
        or exists (
                select --+ parallel(t3 3) 
                        null
                from    t3
                where   t3.owner = t1.owner
                and     t3.name = t1.name
                and     t3.type = t1.type
                and     t3.line >= t1.line
                )
        )
;

set feedback on

I’ve written notes in the past about SQL that forces the optimizer to run subqueries as filter subqueries instead of unnesting them – this is just an example of that type of query, pushed into parallelism. It’s not the only way (see comment #1 from Dmitry Remizov below) to end up with scalar subqueries being executed many times as separate DFO trees even though Oracle has enhanced the optimizer several times over the years in ways that bypass the threat – but the problem can still appear and it’s important to notice in development that you’ve got a query that Oracle can’t work around.

 

June 11, 2019

Redo Dumps

Filed under: ASSM,Infrastructure,Oracle,redo,Troubleshooting — Jonathan Lewis @ 12:53 pm BST Jun 11,2019

A thread started on the Oracle-L list-server a few days ago asking for help analysing a problem where a simple “insert values()” (that handled millions of rows per day) was running very slowly. There are many reasons why this might happen, ranging from the trivial (someone has locked the table in exclusive mode), through the slightly subtle (we’re trying to insert a row that collides on a uniqueness constraint with an uncommitted insert from another session) to the subtle (Oracle has to read through the undo to check current versions of blocks against read-consistent versions) ending up at the esoteric (the ASSM space management blocks are completely messed up again).

A 10046 trace of a session doing an insert showed only that there was a lot of time spent on single block reads. Unfortunately, since this was on an Exadata system the waits were reported as “cell single block physical read”, and the parameters to this wait event are “cellhash#”, “diskhash#”, and “bytes”, so we don’t see the file_id and block_id which can be very helpful for a case like this. The only information we got from the trace file was that the object_id was for the table where rows were being inserted.

Before digging into exotic debugging methods, the OP supplied us with a 1-second session report from Tanel Poder’s snapper script:


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   1070  @2, SYSADMIN, STAT, session logical reads                                     ,         13865,      7.73k,         ,             ,          ,           ,      14.1k total buffer visits
>   1070  @2, SYSADMIN, STAT, user I/O wait time                                        ,           141,      78.65,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, non-idle wait time                                        ,           141,      78.65,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, non-idle wait count                                       ,         12230,      6.82k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total IO requests                           ,          6112,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read requests optimized                          ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total bytes optimized                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total bytes                                 ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, cell physical IO interconnect bytes                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, gcs messages sent                                         ,             3,       1.67,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets                                             ,         13860,      7.73k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets from cache                                  ,         13860,      7.73k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets from cache (fastpath)                       ,          7737,      4.32k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets                                           ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets from cache                                ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets pin                                       ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets pin (fastpath)                            ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, logical read bytes from cache                             ,     113541120,     63.34M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical reads                                            ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical reads cache                                      ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read IO requests                                 ,          6112,      3.41k,         ,             ,          ,           ,      8.19k bytes per request
>   1070  @2, SYSADMIN, STAT, physical read bytes                                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block changes                                          ,            11,       6.14,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, free buffer requested                                     ,          6112,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, hot buffers moved to head of LRU                          ,           958,     534.39,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, free buffer inspected                                     ,          6144,      3.43k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, shared hash latch upgrades - no wait                      ,             7,        3.9,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, blocks decrypted                                          ,          6110,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo entries                                              ,          6120,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo size                                                 ,        465504,    259.67k,         ,             ,          ,           ,          ~ bytes per user commit
>   1070  @2, SYSADMIN, STAT, redo entries for lost write detection                     ,          6110,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo size for lost write detection                        ,        464756,    259.25k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo subscn max counts                                    ,             7,        3.9,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, file io wait time                                         ,       1408659,    785.78k,         ,             ,          ,           ,   230.47us bad guess of IO wait time per IO request
>   1070  @2, SYSADMIN, STAT, gc current blocks received                                ,             3,       1.67,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, gc local grants                                           ,          6116,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM cbk:blocks examined                                  ,         12366,       6.9k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:L1 bitmaps examined                              ,          2478,      1.38k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:L2 bitmaps examined                              ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:reject db                                        ,         12388,      6.91k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, buffer is pinned count                                    ,           230,      128.3,         ,             ,          ,           ,       1.63 % buffer gets avoided thanks to buffer pin caching
>   1070  @2, SYSADMIN, STAT, cell flash cache read hits                                ,          6723,      3.75k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, TIME, background cpu time                                       ,        365192,   203.71ms,    20.4%, [##        ],          ,           ,
>   1070  @2, SYSADMIN, TIME, background elapsed time                                   ,       1273623,   710.45ms,    71.0%, [########  ],          ,           ,      28.95 % unaccounted time
>   1070  @2, SYSADMIN, WAIT, gc current block busy                                     ,           629,   350.87us,      .0%, [          ],         3,       1.67,   209.67us average wait
>   1070  @2, SYSADMIN, WAIT, cell single block physical read                           ,       1557638,   868.88ms,    86.9%, [WWWWWWWWW ],      6746,      3.76k,    230.9us average wait
>  
> --  End of Stats snap 1, end=2019-05-18 12:58:58, seconds=1.8

My first step was simply to read down the list (using a very small font to get the entire width on screen without wrapping) to see if anything stood out as unusual. The report showed two things I rarely see in the session stats:


blocks decrypted                                          ,          6110
redo entries for lost write detection                     ,          6110

These stats tell me that there are two “uncommon” features enabled: db_lost_write_protect, and block level encryption. (So whatever else is going on, it’s just possible that mixing in two rarely used – and therefore less frequently tested – features may be confusing the issue.)
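
The OP’s configuration wasn’t published, so this is just a sketch of how you might confirm the two features on a system of your own – check the lost write parameter and (if the encryption is TDE tablespace encryption) the tablespace flags:


-- a sketch: confirm the two "uncommon" features
-- db_lost_write_protect = typical / full makes Oracle log BRRs on physical reads
select  value
from    v$parameter
where   name = 'db_lost_write_protect';

-- encrypted = YES identifies tablespaces using TDE tablespace encryption
select  tablespace_name, encrypted
from    dba_tablespaces;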

Lost write protection means Oracle writes a “block read record” (BRR) to the redo log every time it reads a block from disc, so I decided to follow up the 6,110 figure to see what other stats reported similar values.


physical read total IO requests                           ,          6112
physical read requests optimized                          ,          6111
physical reads                                            ,          6111
physical reads cache                                      ,          6111
physical read IO requests                                 ,          6112
free buffer requested                                     ,          6112
redo entries                                              ,          6120
redo entries for lost write detection                     ,          6110
gc local grants                                           ,          6116
cell flash cache read hits                                ,          6723

There’s nothing particularly surprising here – basically we see all the blocks being read as single block reads, into cache. All the necessary global cache (gc) grants are local so it’s possible the table of interest has been remastered to this node. The value for “cell flash cache read hits” looks a little odd as the cache is hit more frequently than blocks are read – but dynamic performance views are not read-consistent and this session is hammering away like crazy so this might just be a side effect of the time to gather the data for the report.

We can chase the redo a little further – the number of redo entries is slightly larger than the number of blocks read, so (even though small inconsistencies are not necessarily meaningful) this might tell us something:


redo entries                                              ,          6120
redo size                                                 ,        465504
redo entries for lost write detection                     ,          6110
redo size for lost write detection                        ,        464756 
db block changes                                          ,            11

The number of “redo entries” that were NOT for lost write detection is 10, totalling 748 bytes (not a lot – so indicative of “non-user” activity). The number of “db block changes” is 11 (close enough to 10), and generally it’s changes to db blocks that require redo to be generated. The final significant number is the one that isn’t there – there’s no undo generated, so no user-change to data. This system is working like crazy achieving absolutely nothing at this point.

The next point to ponder is what sort of work it is doing – so let’s check how the physical reads turn into buffer gets.


session logical reads                                     ,         13865
db block gets                                             ,         13860
db block gets from cache                                  ,         13860
db block gets from cache (fastpath)                       ,          7737
consistent gets                                           ,             1
consistent gets from cache                                ,             1
consistent gets pin                                       ,             1
consistent gets pin (fastpath)                            ,             1
hot buffers moved to head of LRU                          ,           958

buffer is pinned count                                    ,           230 

The unusual thing you notice with these figures is that virtually every buffer get is a current get. We’ve also got a number of blocks pinned – this might just be the segment header block, or the segment header and level 2 bitmap block that we keep revisiting. Finally we can see a lot of hot buffers being moved to the head of the LRU; since our session has been doing a lot of work for a long time it seems likely that those buffers are ones that our session is keeping hot – and for a big insert that shouldn’t really be happening unless, perhaps, we were managing to do a lot of maintenance of (well-clustered) indexes.

I’ve isolated the (new in 12.2) “ASSM gsp (get space)” statistics from this output – they’re all about handling blocks, but I wanted to look at them without being distracted by other stats.


ASSM cbk:blocks examined                                  ,         12366 
ASSM gsp:L1 bitmaps examined                              ,          2478 
ASSM gsp:L2 bitmaps examined                              ,             1 
ASSM gsp:reject db                                        ,         12388

We can see that we’ve examined 2,478 “level 1” bitmap blocks. A level 1 block holds the basic “bitmap” that records the state of a number of data blocks (typically 128 blocks once the object gets very large) so our session has worked its way through 2,478 maps trying to find a data block that it could use to insert a row. The “reject db” statistic tells us about data blocks that have been examined and rejected (presumably because the row we want to insert is too large to fit, or maybe because there are no free ITL (interested transaction list) entries available in the block). So we seem to be spending all our time searching for somewhere to insert rows. This shouldn’t really be happening – it’s a type of problem that Oracle has been worrying away at for quite some time: how do you avoid “losing” space by updating bitmap blocks too soon on inserts without going to the opposite extreme and leaving bitmap blocks that claim the space is free when it’s in use by uncommitted transactions?

Note to self: I don’t know how we managed to reject more blocks (12,388) than we’ve examined (12,366) but possibly it’s just one of those timing glitches (the error is less than one fifth of one percent), possibly it’s something to do with the reject count including some of the L1 bitmap blocks.

Clearly there’s something funny going on with space management – and we need to look at a few blocks that are exhibiting the problems. But how do we find a few suitable blocks? And that’s where, finally, we get to the title of the piece.

We are in the lucky position of having “lost write protection” enabled – so the redo log file will hold lots of “block read records”. We can’t get the file and block addresses we need from the “cell physical read” wait events so let’s ask the redo log to supply them. We just have to pick a log file (online or archived) and tell Oracle to dump some of it – and we can probably get away with a fairly small dump since we want just a single type of redo record over a short period of time. Here’s an example showing the format of two slightly different commands you could execute:


alter system dump logfile '/u01/app/oracle/oradata/orcl12c/redo03.log'
        rba min 2781    1
        rba max 2781    1000
        layer 23 opcode 2
;

alter system dump redo 
        scn min 19859991 scn max 19964263 
        layer 23 opcode 2
;

The first command is to dump a log file by name – but you may have to fiddle around a bit to find the name of an archived log file, and if you choose this option you need to know the sequence number (sequence# in v$log_history) of the file if you want to restrict the size of the dump. The second command simply dumps redo for (in this example) an SCN range – and it’s easy to query v$log_history to find dates, times, and SCN ranges – Oracle will work out for itself which file it has to access. In both cases I’ve restricted the dump to just those redo records that contain change vectors of type BRR (block read records), which is what the layer 23 opcode 2 line is about.
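
For the second form, a query along these lines (just a sketch – adjust the time window to suit) gives you candidate SCN ranges and sequence numbers from recent log switches:


select  sequence#, first_change#, next_change#,
        to_char(first_time,'dd-mon-yyyy hh24:mi:ss') first_time
from    v$log_history
where   first_time > sysdate - 1/24
order by
        sequence#
;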

Here’s an example of a redo record that contains nothing but a single BRR. (It’s from a single-block read, a multi-block read would produce a redo record with multiple change vectors, one vector for each block read.)


REDO RECORD - Thread:1 RBA: 0x000add.00000019.01b0 LEN: 0x004c VLD: 0x10 CON_UID: 2846920952
SCN: 0x00000000025a7c13 SUBSCN:  1 05/23/2019 10:42:51
CHANGE #1 CON_ID:3 TYP:2 CLS:6 AFN:9 DBA:0x00407930 OBJ:40 SCN:0x00000000001a1e2a SEQ:2 OP:23.2 ENC:0 RBL:0 FLG:0x0000
 Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)
              scn: 0x00000000001a1e2a seq: 0x02
              flags: 0x00000006 ( dlog ckval )
              where: qeilwh05: qeilbk

If you’re wondering about the two interpretations of the rdba (relative datablock address), one is for BFTs (big file tablespaces) and one for non-BFTs. The other thing you’ll notice about the interpretations is that neither file number (1024 or 1) matches the afn (absolute file number). In smaller, non-CDB databases you will probably find that the afn matches the file number in the non-BFT interpretation, but I happen to be testing on a PDB and the first file in my SYSTEM tablespace happens to be the 9th file created in the CDB – connecting as SYS in my PDB I can compare the absolute and “relative” file number very easily:


SQL> select file#, rfile#, name from v$datafile;

     FILE#     RFILE# NAME
---------- ---------- ----------------------------------------------------------------
         9          1 /u01/app/oracle/oradata/orcl12c/orcl/system01.dbf
        10          4 /u01/app/oracle/oradata/orcl12c/orcl/sysaux01.dbf
        11          9 /u01/app/oracle/oradata/orcl12c/orcl/undotbs01.dbf
        12         12 /u01/app/oracle/oradata/orcl12c/orcl/users01.dbf
        13         13 /u01/app/oracle/oradata/orcl12c/orcl/APEX_1991375173370654.dbf
        14         14 /u01/app/oracle/oradata/orcl12c/orcl/APEX_1993195660370985.dbf
        22         22 /u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
        23         23 /u01/app/oracle/oradata/orcl12c/orcl/test_8k.dbf

8 rows selected.

For bigfile tablespaces the “relative” file number is a complete fake and simply reports 1024 – you’re only allowed one file in a bigfile tablespace, so there is no “relativity” involved. (Unless you’re working at CERN and storing data about particle collisions in the LHC.)

The key point to remember when reading BRRs then, is that you should take the file number from the afn and the block number from the (appropriate) interpretation of the rdba. For the example above I would issue: “alter database dump datafile 9 block 31024;”
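
If you don’t want to do the base conversion by hand, dbms_utility can decode the rdba for you. This is just a sketch using the value from the dump above – note that it returns the relative file number, so you still take the file number for the dump command from the afn:


-- decode rdba 0x00407930 (the non-BFT interpretation): returns (1, 31024)
select
        dbms_utility.data_block_address_file(to_number('00407930','xxxxxxxx'))  rel_file#,
        dbms_utility.data_block_address_block(to_number('00407930','xxxxxxxx')) block#
from    dual
;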

Finally

The originator of the thread hasn’t yet made any public response to the suggestion of dumping and reviewing blocks – possibly they’ve started a private conversation with Stefan Koehler who had suggested a strategy that examined function calls rather than block contents – so we’re unable to do any further analysis on what’s going on behind the scenes.

What we would be looking for is any indication that Oracle is repeatedly re-reading the same bitmap blocks and the same data blocks (by a simple check of block addresses); and if that is the case we would want to get some clue about why that might be happening by examining the contents of the data blocks that are subject to repeated reads without changing their status in the bitmap from “space available” to “full”.  As it is we just have to wait for the OP to tell us if they’ve made any further progress.

 

June 9, 2019

CPU percent

Filed under: 12c,AWR,Oracle — Jonathan Lewis @ 2:31 pm BST Jun 9,2019

A recent post on the ODC General Database forum asked for an explanation of the AWR report values “%Total CPU” and “%Busy CPU” under the “Instance CPU” label, and how the “%Busy CPU” could be greater than 100%.  Here’s a text reproduction of the relevant sample supplied:

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
2 2 1 0.30 1.23 10.7 5.6 5.3 77.7

Instance CPU

%Total CPU %Busy CPU %DB Time waiting for CPU (Resource Manager)
29.8 133.8 0.0

The answer is probably “It’s 12.1 and it’s a programmer error”.

  • Note that the Host CPU %Idle is not consistent with the three usage figures:  10.7 + 5.6 + 5.3 = 21.6 whereas 100 – 77.7 = 22.3.
  • So let’s run with 22.3% and see what else we can notice: 29.8 / 22.3 = 1.3363 – that’s pretty close (when expressed as a percentage) to 133.8%

Hypothesis:

Someone did the division the wrong way round when trying to work out the percentage of the host’s non-idle CPU that could be attributed to the instance. In this example the “%Busy CPU” should actually report 100 * 22.3 / 29.8 = 74.8%
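
Putting the two possible divisions side by side (this is just my arithmetic, nothing Oracle-specific):


-- the division both ways round, using the figures from the report
select
        round(100 * 29.8 / 22.3, 1)  as_reported,          -- 133.6, close to the 133.8 shown
        round(100 * 22.3 / 29.8, 1)  hypothesised_value    -- 74.8
from    dual
;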

Note – the difference between 133.8 and 133.63 can be attributed to the fact that the various figures reported in this bit of the AWR are rounded to the nearest 1 decimal place.

Note 2 – I don’t think this error is present in 11.2.0.4 or 12.2.0.1

 
June 6, 2019

Scalar Subquery Costing

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 7:54 pm BST Jun 6,2019

A question came up on Oracle-l list-server a few days ago about how Oracle calculates costs for a scalar subquery in the select list. The question included an example to explain the point of the question. I’ve reproduced the test below, with the output from an 18.3 test system. The numbers don’t match the numbers produced in the original posting but they are consistent with the general appearance.

rem
rem     Script:         ssq_costing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t_1k ( n1 integer ) ;
create table t_100k ( n1 integer ) ;

insert into t_1k
  select
         level
    from dual
    connect by level <= 1e3;

insert into t_100k
  select level
    from dual
    connect by level <= 1e5;

commit ;

begin
  dbms_stats.gather_table_stats ( null, 'T_1K') ;
  dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/

explain plan for
select 
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ count(*)
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100K|   488K|  1533K  (2)| 00:01:00 |
|   1 |  SORT AGGREGATE    |        |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T_1K   |     1 |     4 |    17   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL | T_100K |   100K|   488K|    36   (9)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T_1K"."N1"=:B1)

The key point to note is this – the scalar subquery has to execute 100,000 times because that’s the number of rows in the driving table. The cost for executing the scalar subquery once is 17 – so the total cost of the query should be 1,700,036 – not 1,533K (and for execution plans the K means x1000, not x1024). There’s always room for rounding errors, of course, but a check of the 10053 (CBO trace) file shows the numbers to be 17.216612 for the t_1k tablescan, 36.356072 for the t_100K tablescan, and 1533646.216412 for the whole query. So how is Oracle managing to get a cost that looks lower than it ought to be?
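
If you want to see the same detailed figures yourself, the CBO trace can be switched on around the explain plan call – a sketch (there are other ways of setting the event):


alter session set tracefile_identifier = 'ssq_costing';
alter session set events '10053 trace name context forever, level 1';

explain plan for
select
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ count(*)
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

alter session set events '10053 trace name context off';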

There’s plenty of scope for experimenting to see how the numbers change – and my first thought was simply to see what happens as you change the number of distinct values in the t_100K.n1 column. It would be rather tedious to go through the process of modifying the data a few hundred times to see what happens, so I took advantage of the get_column_stats() and set_column_stats() procedures in the dbms_stats package to create a PL/SQL loop that faked a number of different scenarios by lying about the actual table data.


delete from plan_table;
commit;

declare

        srec                    dbms_stats.statrec;
        n_array                 dbms_stats.numarray;

        m_distcnt               number;
        m_density               number;
        m_nullcnt               number;
        m_avgclen               number;


begin

        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

        for i in 1 .. 20 loop

                m_distcnt := 1000 * i;
                m_density := 1/m_distcnt;

                dbms_stats.set_column_stats(
                        ownname         => user,
                        tabname         => 't_100k',
                        colname         => 'n1', 
                        distcnt         => m_distcnt,
                        density         => m_density,
                        nullcnt         => m_nullcnt,
                        srec            => srec,
                        avgclen         => m_avgclen
                ); 


        execute immediate
        '
                explain plan set statement_id = ''' || m_distcnt || 
        '''
                for
                select
                        /*+ qb_name(QB_MAIN) */
                        (
                        select /*+ qb_name(QB_SUBQ) */ count(*)
                        from t_1k
                        where t_1k.n1 = t_100k.n1
                        )
                from t_100k
        ';
        
        end loop;       

end;
/

The code is straightforward. I’ve declared a few variables to hold the column stats from the t_100k.n1 column, called get_column_stats(), then looped 20 times through a process that changes the number of distinct values (and corresponding density) recorded in the column stats, then used execute immediate to call “explain plan” for the original query.

You’ll notice I’ve given each plan a separate statement_id that corresponds to the num_distinct that generated the plan. In the code above I’ve changed the num_distinct from 1,000 to 20,000 in steps of 1,000.

Once the PL/SQL block ends I’ll have a plan table with 20 execution plans stored in it and, rather than reporting those plans with calls to dbms_xplan.display(), I’m going to be selective about which rows and columns I report.

select
        statement_id, 
        io_cost,
        io_cost - lag(io_cost,1) over (order by to_number(statement_id)) io_diff,
        cpu_cost,
        cpu_cost - lag(cpu_cost,1) over (order by to_number(statement_id)) cpu_diff,
        cost
from 
        plan_table
where 
        id = 0
order by 
        to_number(statement_id)
;

I’ve picked id = 0 (the top line of the plan) for each statement_id and I’ve reported the cost column, which is made up of the io_cost column plus a scaled down value of the cpu_cost column. I’ve also used the analytic lag() function to calculate how much the io_cost and cpu_cost changed from the previous statement_id. Here are my results from 18c:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
1000                                17033            1099838920                 17253
2000                                34033      17000 2182897480 1083058560      34470
3000                                51033      17000 3265956040 1083058560      51686
4000                                68033      17000 4349014600 1083058560      68903
5000                                85033      17000 5432073160 1083058560      86119
6000                               102033      17000 6515131720 1083058560     103336
7000                               119033      17000 7598190280 1083058560     120553
8000                               136033      17000 8681248840 1083058560     137769
9000                               153033      17000 9764307400 1083058560     154986
10000                              170033      17000 1.0847E+10 1083058560     172202
11000                              197670      27637 1.2608E+10 1760725019     200191
12000                              338341     140671 2.1570E+10 8962036084     342655
13000                              457370     119029 2.9153E+10 7583261303     463200
14000                              559395     102025 3.5653E+10 6499938259     566525
15000                              647816      88421 4.1287E+10 5633279824     656073
16000                              725185      77369 4.6216E+10 4929119846     734428
17000                              793452      68267 5.0565E+10 4349223394     803565
18000                              854133      60681 5.4431E+10 3865976350     865019
19000                              908427      54294 5.7890E+10 3459031472     920005
20000                              957292      48865 6.1003E+10 3113128324     969492

The first pattern that hits the eye is the constant change of 17,000 in the io_cost in the first few lines of the output. For “small” numbers of distinct values the (IO) cost of the query is (33 + 17 * num_distinct) – in other words, the arithmetic seems to assume that it will execute the subquery once for each distinct value and then cache the results so that repeated executions for any given value will not be needed. This looks as if the optimizer is trying to match its arithmetic to the “scalar subquery caching” mechanism.
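
As a quick sanity check of that pattern (my arithmetic, not extra output from the test):


-- 33 + 17 * num_distinct for num_distinct = 10,000
select 33 + 17 * 10000 predicted_io_cost from dual;
-- 170033, matching the io_cost reported for statement_id 10000 above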

But things change somewhere between 10,000 and 11,000 distinct values. The point comes where adding one more distinct value causes a much bigger jump in cost than 17, and that’s because Oracle assumes it’s reached a point where there’s a value that it won’t have room for in the cache and will have to re-run the subquery multiple times for that value as it scans the rest of the table. Let’s find the exact break point where that happens.

Changing my PL/SQL loop so that we calculate m_distcnt as “10910 + i”, this is the output from the final query:


-- m_distcnt := 10910 + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911                              185520            1.1834E+10                187887
10912                              185537         17 1.1835E+10    1083059     187904
10913                              185554         17 1.1836E+10    1083058     187921
10914                              185571         17 1.1837E+10    1083059     187938
10915                              185588         17 1.1838E+10    1083058     187956
10916                              185605         17 1.1839E+10    1083059     187973
10917                              185622         17 1.1841E+10    1083059     187990
10918                              185639         17 1.1842E+10    1083058     188007
10919                              185656         17 1.1843E+10    1083059     188025
10920                              185673         17 1.1844E+10    1083058     188042
10921                              185690         17 1.1845E+10    1083059     188059
10922                              185707         17 1.1846E+10    1083058     188076
10923                              185770         63 1.1850E+10    4027171     188140
10924                              185926        156 1.1860E+10    9914184     188298
10925                              186081        155 1.1870E+10    9912370     188455
10926                              186237        156 1.1880E+10    9910555     188613
10927                              186393        156 1.1890E+10    9908741     188770
10928                              186548        155 1.1900E+10    9906928     188928
10929                              186703        155 1.1909E+10    9905114     189085
10930                              186859        156 1.1919E+10    9903302     189243

If we have 10,922 distinct values in the column the optimizer calculates as if it will be able to cache them all; but if we have 10,923 distinct values the optimizer thinks that there’s going to be one value where it can’t cache the result and will have to run the subquery more than once.

Before looking at this in more detail let’s go to the other interesting point – when does the cost stop changing? We can see the cost increasing as the number of distinct values grows, but we saw at the start that the cost didn’t seem to get as large as we expected, so there must be a point where it stops increasing before it “ought” to.

I’ll jump straight to the answer: here’s the output from the test when I start num_distinct off at slightly less than half the number of rows in the table:


 -- m_distcnt := (50000 - 10) + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991                             1514281            9.6488E+10               1533579
49992                             1514288          7 9.6489E+10     473357    1533586
49993                             1514296          8 9.6489E+10     473337    1533594
49994                             1514303          7 9.6490E+10     473319    1533601
49995                             1514311          8 9.6490E+10     473299    1533609
49996                             1514318          7 9.6491E+10     473281    1533616
49997                             1514325          7 9.6491E+10     473262    1533624
49998                             1514333          8 9.6492E+10     473243    1533631
49999                             1514340          7 9.6492E+10     473224    1533639
50000                             1514348          8 9.6493E+10     473205    1533646
50001                             1514348          0 9.6493E+10          0    1533646
50002                             1514348          0 9.6493E+10          0    1533646
50003                             1514348          0 9.6493E+10          0    1533646
50004                             1514348          0 9.6493E+10          0    1533646
50005                             1514348          0 9.6493E+10          0    1533646
50006                             1514348          0 9.6493E+10          0    1533646
50007                             1514348          0 9.6493E+10          0    1533646
50008                             1514348          0 9.6493E+10          0    1533646
50009                             1514348          0 9.6493E+10          0    1533646
50010                             1514348          0 9.6493E+10          0    1533646

The cost just stops changing when num_distinct = half the rows in the table.

Formulae

During the course of these experiments I had been exchanging email messages with Nenad Noveljic via the Oracle-L list-server (full monthly archive here) and he came up with the suggestion of a three-part formula that assumed a cache size and gave a cost of:

  • “tablescan cost + num_distinct * subquery unit cost” for values of num_distinct up to the cache size;
  • then, for values of num_distinct greater than the cache_size and up to half the size of the table added a marginal cost representing the probability that some values would not be cached;
  • then for values of num_distinct greater than half the number of rows in the table reported the cost associated with num_distinct = half the number of rows in the table.

Hence:

  • for 1 <= num_distinct <= 10,922, cost = (33 + num_distinct * 17)
  • for 10,923 <= num_distinct <= 50,000, cost = (33 + 10,922 * 17) + (1 – 10,922/num_distinct) * 100,000 * 17
  • for 50,000 <= num_distinct <= 100,000, cost = cost(50,000).

The middle line needs a little explanation: (1 – 10,922 / num_distinct) is the probability that a value will not be in the cache; this has to be multiplied by 100,000 to give the expected number of rows that will not be cached, and then multiplied by 17 as the cost of running the subquery for those rows.

The middle line can be re-arranged as 33 + 17 * (10,922 + (1 – 10,922/num_distinct) * 100,000)
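
A quick check of that version of the formula against the earlier output (my arithmetic) shows that it is close but not quite exact – which is what the “Tweaking” section below addresses. For num_distinct = 15,000:


-- 33 + 17 * (10922 + (1 - 10922/15000) * 100000)
select round(33 + 17 * (10922 + (1 - 10922/15000) * 100000)) predicted_io_cost from dual;
-- 647880, whereas the reported io_cost for statement_id 15000 was 647816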

Tweaking

At this point I could modify my code loop to report the calculated value for the cost and compare it with the actual cost to show you that the two values didn’t quite match. Instead I’ll jump forward a little bit to a correction that needs to be made to the formula above. It revolves around how Oracle determines the cache size. There’s a hidden parameter (which I mentioned in CBO Fundamentals) that controls scalar subquery caching. In the book I think I only referenced it in the context of subqueries in the “where” clause. The parameter is “_query_execution_cache_max_size” and has a default value of 131072 (power(2,17)) – so when I found that the initial formula didn’t quite work I made the following observation:

  • 131072 / 10922 = 12.00073
  • 131072 / 12 = 10922.666…

So I put 10922.66667 into the formula to see if that would improve things.

For the code change I added a variable m_cost to the PL/SQL block, and set it inside the loop as follows:

m_cost := round(33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt))));

Then in the “execute immediate” I changed the “explain plan” line to read:

explain plan set statement_id = ''' || lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8) ||

This allowed me to show the formula’s prediction of (IO) cost in the final output, and here’s what I got for values of num_distinct in the region of 10,922:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  10911 -   183901                 185520            1.1834E+10                187887
  10912 -   184057                 185537         17 1.1835E+10    1083059     187904
  10913 -   184212                 185554         17 1.1836E+10    1083058     187921
  10914 -   184368                 185571         17 1.1837E+10    1083059     187938
  10915 -   184524                 185588         17 1.1838E+10    1083058     187956
  10916 -   184680                 185605         17 1.1839E+10    1083059     187973
  10917 -   184836                 185622         17 1.1841E+10    1083059     187990
  10918 -   184992                 185639         17 1.1842E+10    1083058     188007
  10919 -   185147                 185656         17 1.1843E+10    1083059     188025
  10920 -   185303                 185673         17 1.1844E+10    1083058     188042
  10921 -   185459                 185690         17 1.1845E+10    1083059     188059
  10922 -   185615                 185707         17 1.1846E+10    1083058     188076
  10923 -   185770                 185770         63 1.1850E+10    4027171     188140
  10924 -   185926                 185926        156 1.1860E+10    9914184     188298
  10925 -   186081                 186081        155 1.1870E+10    9912370     188455
  10926 -   186237                 186237        156 1.1880E+10    9910555     188613
  10927 -   186393                 186393        156 1.1890E+10    9908741     188770
  10928 -   186548                 186548        155 1.1900E+10    9906928     188928
  10929 -   186703                 186703        155 1.1909E+10    9905114     189085
  10930 -   186859                 186859        156 1.1919E+10    9903302     189243

The formula is only supposed to work in the range 10923 – 50,000, so the first few results don’t match; but in the range 10,923 to 10,930 the match is exact. Then, in the region of 50,000 we get:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  49991 -  1514281                1514281            9.6488E+10               1533579
  49992 -  1514288                1514288          7 9.6489E+10     473357    1533586
  49993 -  1514296                1514296          8 9.6489E+10     473337    1533594
  49994 -  1514303                1514303          7 9.6490E+10     473319    1533601
  49995 -  1514311                1514311          8 9.6490E+10     473299    1533609
  49996 -  1514318                1514318          7 9.6491E+10     473281    1533616
  49997 -  1514325                1514325          7 9.6491E+10     473262    1533624
  49998 -  1514333                1514333          8 9.6492E+10     473243    1533631
  49999 -  1514340                1514340          7 9.6492E+10     473224    1533639
  50000 -  1514348                1514348          8 9.6493E+10     473205    1533646
  50001 -  1514355                1514348          0 9.6493E+10          0    1533646
  50002 -  1514363                1514348          0 9.6493E+10          0    1533646
  50003 -  1514370                1514348          0 9.6493E+10          0    1533646
  50004 -  1514377                1514348          0 9.6493E+10          0    1533646
  50005 -  1514385                1514348          0 9.6493E+10          0    1533646
  50006 -  1514392                1514348          0 9.6493E+10          0    1533646
  50007 -  1514400                1514348          0 9.6493E+10          0    1533646
  50008 -  1514407                1514348          0 9.6493E+10          0    1533646
  50009 -  1514415                1514348          0 9.6493E+10          0    1533646
  50010 -  1514422                1514348          0 9.6493E+10          0    1533646

Again, the formula applies only in the range up to 50,000 (half the rows in the table) – and the match is perfect in that range.
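
If you just want to check the arithmetic of the formula without re-running the whole test harness, here’s a minimal standalone sketch – the constants 33, 17, 10922.66667 and 100,000 are simply copied from the formula above – that prints the prediction for a few values of num_distinct so that you can compare them with the IO_COST figures in the first listing:


set serveroutput on

declare
        m_cost          number;
begin
        for m_distcnt in 10920 .. 10930 loop
                -- the formula under test, with the cache size correction applied
                m_cost := round(
                        33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt)))
                );
                dbms_output.put_line(lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8));
        end loop;
end;
/

Remember that the formula is only supposed to apply from 10,923 upwards, so the first three lines of output won’t match the optimizer’s figures.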

Next steps

The work so far gives us some idea of the algorithm that the optimizer is using to derive a cost, but this is just one scenario and there are plenty of extra questions we might ask. What, as the most pressing one, is the significance of the number 12 in the calculation 131,072/12? From previous experience I guessed that it was related to the length of the input and output values of the scalar subquery – as in “value X for n1 returns value Y for count(*)”.

To pursue this idea I recreated the data sets using varchar2(10) as the definition of n1 and lpad(rownum,10) as the value – the “breakpoint” dropped from 10,922 down to 5,461. Checking the arithmetic: 131,072 / 5,461 = 24.001456, then 131,072 / 24 = 5,461.333… And that’s the number that made the formula work perfectly for the modified data set.

Then I used set_column_stats() to hack the avg_col_len of t_100K.n1 to 15 and the break point dropped to 4,096.  Again we do the two arithmetic steps: 131072/4096 = 32 (but then we don’t need to do the reverse step since the first result is integral).

Checking the original data set, where n1 was numeric, the avg_col_len was 5, so we have three reference points:

  • Avg_col_len = 5. “Cache unit size” = 12
  • Avg_col_len = 11. Cache unit size = 24 (don’t forget the avg_col_len includes the length byte, so our padded varchar2(10) has a length of 11).
  • Avg_col_len = 15, Cache unit size = 32

There’s an obvious pattern here: “Cache unit size” = (2 x avg_col_len + 2).  Since I hadn’t been changing the t_1k.n1 column at the same time, that really does look like a deliberate factor of 2 (I’d thought initially that maybe the 12 was affected by the lengths of both columns in the predicate – but that doesn’t seem to be the case).

The scientific method says I should now make a prediction based on my hypothesis – so I set the avg_col_len for t_100K.n1 to 23 and guessed that the break point would be at 2730 – and it was.  (131072 / (2 * 23 + 2) = 2730.6666…) .
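
To pull the three observations and the successful prediction together in one place, here’s a trivial query – pure arithmetic, no dependency on the test tables – that reports the predicted break point for each avg_col_len using the hypothesised cache unit size of (2 * avg_col_len + 2):


select
        column_value                            avg_col_len,
        floor(131072 / (2 * column_value + 2))  predicted_break_point
from
        table(sys.odcinumberlist(5, 11, 15, 23))
;

The four predictions come out as 10922, 5461, 4096 and 2730 – matching the break points observed above.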

The next question, of course, is “where does the “spare 2″ come from?” Trying to minimize the change in the code I modified my subquery to select sum(to_number(n1)) rather than count(*), then to avg(to_number(n1)) – remember I had changed n1 to a varchar2(10) that looked like a number left-padded with spaces. In every variant of the tests I’d done so far all I had to do to get an exact match between the basic formula and the optimizer’s cost calculation was to use “2 * avg_col_len + 22” as the cache unit size – and 22 is the nominal maximum length of an internally stored numeric column.

Bottom line: the cache unit size seems to be related to the input and output values, but I don’t know why there’s a factor of 2 applied to the input column length, and I don’t know why the length of count(*) is deemed to be 2 when other derived numeric outputs have the more intuitive 22 for their length.

tl;dr

The total cost calculation for a scalar subquery in the select list is largely affected by:

  • a fixed cache size (131,072 bytes) possibly set by hidden parameter _query_execution_cache_max_size
  • the avg_col_len of the input (correlating) column(s) from the driving table
  • the nominal length of the output (select list) of the subquery

There is an unexplained factor of 2 used with the avg_col_len of the input, and a slightly surprising value of 2 if the output is simply count(*).

If the number N of distinct values for the driving column(s) is less than the number of possible cache entries the effect of the scalar subquery is to add N * (the estimated cost of executing the subquery once).  As the number of distinct values for the driving column(s) goes above the limit the incremental effect of the subquery is based on the expected number of times an input value will not be cached. When the number of distinct values in the driving column(s) exceeds half the number of rows in the driving table the cost stops increasing – there is no obvious reason why the algorithm does this.
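
If you want to experiment with the model, here’s a hedged sketch of it as a PL/SQL function. The structure is simply my interpretation of the single scenario investigated above – treating 33 as the cost of the driving query alone and 17 as the cost of one execution of the subquery – so don’t treat it as Oracle’s code:


create or replace function scalar_subq_extra_cost (
        p_distcnt       number,        -- num_distinct of the driving (correlating) column
        p_num_rows      number,        -- number of rows driving the scalar subquery
        p_avg_col_len   number,        -- avg_col_len of the driving column
        p_output_len    number,        -- 2 for count(*), 22 for other numeric outputs
        p_subq_cost     number         -- cost of a single execution of the subquery
) return number
is
        m_cache_entries number := 131072 / (2 * p_avg_col_len + p_output_len);
        m_distcnt       number := least(p_distcnt, p_num_rows / 2);    -- cost stops rising here
        m_executions    number;
begin
        if m_distcnt <= m_cache_entries then
                -- every distinct value fits in the cache: one execution per value
                m_executions := m_distcnt;
        else
                -- cache full: one execution per cache entry plus the expected misses
                m_executions := m_cache_entries
                              + p_num_rows * (1 - m_cache_entries / m_distcnt);
        end if;
        return round(m_executions * p_subq_cost);
end;
/

For the test case above, “select 33 + scalar_subq_extra_cost(10930, 100000, 5, 2, 17) from dual” returns 186,859 – the figure reported for 10,930 distinct values in the first listing.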

There are many more cases that I could investigate at this point – but I think this model is enough as an indication of general method. If you come across a variation where you actually need to work out how the optimizer derived a cost then this framework will probably be enough to get you started in the right direction.

 

June 3, 2019

So long …

Filed under: Non-technical — Jonathan Lewis @ 2:09 pm BST Jun 3,2019

… and thanks for any Douglas Adams references.

I decided on Friday that after more than 31 years using the Oracle software and more than 37 years of being self-employed in the computer industry – not to mention a further 8 years being an enthusiastic amateur – the time had finally come for a change of pace.  As from 1st June 2019 I am retired.

I have a couple of long term commitments that I’ll still be doing a little work for over the next few months, and I wouldn’t be surprised if someone occasionally tempts me into doing a few days solving interesting problems.  I’ll still be answering questions on the Oracle Developer Community forum and (if the email gets through) on the Oracle-L listserver, and I’ll still be writing blog notes about Oracle and submitting abstracts to conferences for a few years. Who knows, I may even get round to writing another book – after all, I’ve already downloaded Oracle 19 so I’ll still be experimenting, investigating, and trying to break things. Some people retire to do the Times crossword or Sudoku; my retirement pastime is going to be playing with Oracle.

But now I’ve got 3 musical instruments to re-learn, a garden to re-engineer, apple trees to train, and a lot of rowing [U.S. “crew”] time to catch up on so that I can get fit enough to get back to playing squash (which might be an interesting experience now that my eyesight has been fixed). And for relaxation I’ll be sitting in the garden watching the birds, drinking fine wine and reading a good book.

Update (6th June 2019)

Thanks for all the appreciative comments. It’s nice to have a record, both here and on the tweet, of the fact that I have made a difference for so many people.

Ignoring Hints

Filed under: 18c,Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 8:36 am BST Jun 3,2019

One of the small changes (and, potentially big but temporary, threats) in 18.3 is the status of the “ignore hints” parameter. It ceases to be a hidden (underscore) parameter so you can now officially set parameter optimizer_ignore_hints to true in the parameter file, or at the system level, or at the session level. The threat, of course, is that some of your code may use the hidden version of the parameter (perhaps in an SQL_Patch as an opt_param() option rather than in its hint form) which no longer works after the upgrade.

But there’s more. The parameter (whether the old hidden version or the new revealed version) doesn’t make the optimizer ignore parallel() hints. But 18.3 now has a related parameter optimizer_ignore_parallel_hints to address this limitation. Here’s a quick demo – we start by creating a table and then running a query where the full tablescan is clearly the default strategy that the optimizer would take if we didn’t hint an indexed access path:

rem
rem     Script:         ignore_parallel_hints.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,10)                  n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(id);

set serveroutput off

prompt  =============
prompt  Baseline test
prompt  =============

select
        /*+ index(t1) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);


SQL_ID  gudnnk7j7q5bz, child number 0
-------------------------------------
select  /*+ index(t1) */  n1, sum(id) from  t1 where  id > 0 group by
n1 order by  n1

Plan hash value: 356059923

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |   198 (100)|          |
|   1 |  SORT GROUP BY                       |       |    10 |    70 |   198   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    | 10000 | 70000 |   196   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 | 10000 |       |    22   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">0)


Now we repeat the exercise with the version-specific “alter session” command below – and you should try each option with each version of Oracle if you want to do the complete test cycle – to see that the session will ignore hints and the plan will change (side note – using the underscore version  with 18.3 doesn’t raise an error, the statement is silently ignored):


alter session set "_optimizer_ignore_hints" = true;
alter session set "optimizer_ignore_hints" = true;

SQL_ID  gudnnk7j7q5bz, child number 1
-------------------------------------
select  /*+ index(t1) */  n1, sum(id) from  t1 where  id > 0 group by
n1 order by  n1

Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    27 (100)|          |
|   1 |  SORT GROUP BY     |      |    10 |    70 |    27  (12)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 10000 | 70000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID">0)

Then, of course, we have to test a query with a parallel() hint – or shared() hint, which is the internal equivalent you will notice occasionally in outlines or the “remote” statement for distributed execution plans – to show that the parallel hint is not yet being ignored – the plans following the code are from 18.3:


select
        /*+ parallel(t1 3) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);

alter session set "optimizer_ignore_parallel_hints" = true;

select
        /*+ parallel(t1 3) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);




SQL_ID  7jynurdtc48kv, child number 0
-------------------------------------
select  /*+ parallel(t1 3) */  n1, sum(id) from  t1 where  id > 0 group
by  n1 order by  n1

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |    10 (100)|          |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT GROUP BY         |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE       | :TQ10000 |    10 |    70 |    10  (10)| 00:00:01 |  Q1,00 | P->P | RANGE      |
|   6 |       HASH GROUP BY      |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          | 10000 | 70000 |     9   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL| T1       | 10000 | 70000 |     9   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access(:Z>=:Z AND :Z<=:Z) -- > comment added to avoid wordpress format issue
       filter("ID">0)

Note
-----
   - Degree of Parallelism is 3 because of table property


Session altered.


SQL_ID  7jynurdtc48kv, child number 1
-------------------------------------
select  /*+ parallel(t1 3) */  n1, sum(id) from  t1 where  id > 0 group
by  n1 order by  n1

Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    27 (100)|          |
|   1 |  SORT GROUP BY     |      |    10 |    70 |    27  (12)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 10000 | 70000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID">0)


tl;dr

In 18.3 Oracle exposes the parameter optimizer_ignore_hints – any code depending on the hidden version of this parameter will no longer behave as expected. 18.3 also introduces optimizer_ignore_parallel_hints to allow you to ignore parallel hints as well.
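
As a quick post-upgrade sanity check, the two visible parameters are easy to inspect:


select  name, value, isdefault
from    v$parameter
where   name in ('optimizer_ignore_hints', 'optimizer_ignore_parallel_hints')
;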

 

May 29, 2019

Timestamp Oddity

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:17 pm BST May 29,2019

[Editorial note: this is something I started writing in 2013, managed to complete in 2017, and still failed to publish. It should have been a follow-on to another posting on the oddities of timestamp manipulation.]

Just as national language support used to be, timestamps and time-related columns are still a bit of a puzzle to the Oracle world – so much so that OEM could cripple a system if it was allowed to do the check for “failed logins over the last 30 minutes”. And, just like NLS, it’s one of those things that you use so rarely that you keep forgetting what went wrong the last time you used it. Here’s one little oddity that I reminded myself about recently:

rem
rem     Script:         timestamp_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          April 2013
rem
create table t1 (
        ts_tz   timestamp(9) with time zone,
        ts_ltz  timestamp(9) with local time zone
);

insert into t1 values(systimestamp, systimestamp);
commit;

alter table t1 add constraint ts_ltz_uk unique (ts_ltz);
alter table t1 add constraint ts_tz_uk  unique (ts_tz);


Nothing terribly difficult – just a table with two variants on the timestamp data type and a unique constraint on both: except for one problem. Watch what happens as I create the unique constraints:

SQL> alter table t1 add constraint ts_ltz_uk unique (ts_ltz);

Table altered.

SQL> alter table t1 add constraint ts_tz_uk  unique (ts_tz);
alter table t1 add constraint ts_tz_uk  unique (ts_tz)
                                        *
ERROR at line 1:
ORA-02329: column of datatype TIME/TIMESTAMP WITH TIME ZONE cannot be unique or a primary key

Slightly unexpected – unless you’ve memorized the manuals, of course, which I hadn’t. I wonder if you can create a unique index on timestamp with time zone:


SQL> create unique index ts_tz_uk on t1(ts_tz);

Index created.

You can’t have a unique constraint, but you CAN create a unique index! How curious – did that really happen ?

SQL> select index_name, column_name from user_ind_columns where table_name = 'T1';

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
TS_LTZ_UK            TS_LTZ
TS_TZ_UK             SYS_NC00003$

The index is on a column called SYS_NC00003$ – which looks suspiciously like one of those “function-based-index” things:


SQL> select * from user_ind_expressions where table_name = 'T1';

INDEX_NAME           TABLE_NAME           COLUMN_EXPRESSION                        COLUMN_POSITION
-------------------- -------------------- ---------------------------------------- ---------------
TS_TZ_UK             T1                   SYS_EXTRACT_UTC("TS_TZ")                               1

Oracle has silently invoked the sys_extract_utc() function on our (free-floating) timestamp column to normalize it to UTC. This is really not very friendly but it does make sense, of course – it would be rather expensive to enforce uniqueness if there were (at least) 24 different ways of storing the same absolute value – and 24 is a conservative estimate.
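
If you want to see the normalization in action without creating any objects, a couple of arbitrary literals make the point (the times here are just examples I’ve picked):


select
        sys_extract_utc(timestamp '2019-05-29 18:17:00.000000000 +01:00')      utc_from_bst,
        sys_extract_utc(timestamp '2019-05-29 19:17:00.000000000 +02:00')      utc_from_cest
from
        dual
;

Both expressions evaluate to the same UTC timestamp, which is exactly why the index has to be built on sys_extract_utc(ts_tz) if it is going to enforce uniqueness sensibly.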

 

 

May 27, 2019

Re-partitioning 2

Filed under: 12c,Infrastructure,Oracle,Partitioning,Uncategorized — Jonathan Lewis @ 8:20 pm BST May 27,2019

Last week I wrote a note about turning a range-partitioned table into a range/list composite partitioned table using features included in 12.2 of Oracle. But my example was really just an outline of the method and bypassed a number of the little extra problems you’re likely to see in a real-world system, so in this note I’m going to bring in an issue that you might run into – and which I’ve seen appearing a number of times: ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION.

It’s often the case that a system has a partitioned table that’s been around for a long time, and over its lifetime it may have had (real or virtual) columns added, made invisible, dropped, or marked unused. As a result you may find that the apparent definition of the table is not the same as the real definition of the table – and that’s why Oracle has given us (in 12c) the option to “create table for exchange”.

You might like to read a MoS note giving you one example of a problem with creating an exchange table prior to this new feature: ORA-14097 At Exchange Partition After Adding Column With Default Value (Doc ID 1334763.1). I’ve created a little model by cloning the code from that note.


rem
rem     Script:         pt_exchange_problem.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table mtab (pcol number)
partition by list (pcol) (
        partition p1 values (1),
        partition p2 values (2)
);

alter table mtab add col2 number default 0 not null;

prompt  ========================================
prompt  Traditional creation method => ORA-14097
prompt  ========================================

create table mtab_p2 as select * from mtab where 1=0;
alter table mtab exchange partition P2 with table mtab_p2;

prompt  ===================
prompt  Create for exchange
prompt  ===================

drop table mtab_p2 purge;
create table mtab_p2 for exchange with table mtab;
alter table mtab exchange partition P2 with table mtab_p2;



Here's the output from running this on an instance of 18.3


Table created.

Table altered.

========================================
Traditional creation method => ORA-14097
========================================

Table created.

alter table mtab exchange partition P2 with table mtab_p2
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

===================
Create for exchange
===================

Table dropped.


Table created.


Table altered.

So we don’t have to worry about problems creating an exchange table in Oracle 12c or later. But we do still have a problem if we’re trying to convert our range-partitioned table into a range/list composite partitioned table using the “double-exchange” method. In my simple example I used a “create table” statement to create an empty table that we could exchange into; but without another special version of a “create table” command I won’t be able to create a composite partitioned table that is compatible with the simple table that I want to use as my intermediate table.

Here’s the solution to that problem – first in a thumbnail sketch:

  • create a table for exchange (call it table C)
  • alter table C modify to change it to a composite partitioned table with one subpartition per partition
  • create a table for exchange (call it table E)
  • Use table E to exchange partitions from the original table to the (now-partitioned) table C
  • Split each partition of table C into the specific subpartitions required

And now some code to work through the details – first the code to create and populate the partitioned table.


rem
rem     Script:         pt_comp_from_pt_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

drop table t purge;
drop table pt_range purge;
drop table pt_range_list purge;

-- @@setup

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
	rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

Then some code to create the beginnings of the target composite partitioned table. We create a simple heap table “for exchange”, then modify it to be a composite partitioned table with a named starting partition and high_value, a template defining a single subpartition and – as a variant on the example from last week – interval partitioning.


prompt	==========================================
prompt	First nice feature - "create for exchange"
prompt	==========================================

create table pt_range_list for exchange with table pt_range;

prompt	============================================
prompt	Now alter the table to composite partitioned
prompt	============================================

alter table pt_range_list modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_def      values(default)
)
(
	partition p200 values less than (200)
)
;

If you want to do the conversion from range partitioning to interval partitioning you will have to check very carefully that your original table will be able to convert safely – which means you’ll need to check that the “high_value” values for the partitions are properly spaced to match the interval you’ve defined and (as a special requirement for the conversion) that there are no omissions from the current list of high values. If your original table doesn’t match these requirements exactly you may end up trying to exchange data into a partition where it doesn’t belong; for example, if my original table had partitions with high values of 200, 600, 800 then there may be values in the 200-399 range currently stored in the original “600” range partition which shouldn’t go into the new “600” interval partition. You may find you have to split (and/or merge) a few partitions in your range-partitioned table before you can do the main conversion.
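
As a starting point for that check, a query like the following lists the current high values in partition order (high_value is a LONG, so anything more sophisticated than eyeballing the list will need a little PL/SQL or XML trickery):


select  partition_position, partition_name, high_value
from    user_tab_partitions
where   table_name = 'PT_RANGE'
order by
        partition_position
;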

Now we create the table that we’ll actually use for the exchange and go through each exchange in turn. Because I’ve got an explicitly named starting partition the first exchange takes only two steps – exchange out, exchange in. But because I’m using interval partitioning in the composite partitioned table I’m doing a “lock partition” before the second exchange on all the other partitions as this will bring the required target partition into existence. I’m also using the “[sub]partition for()” syntax to identify the pairs of [sub]partitions – this isn’t necessary for the original range-partitioned table, of course, but it’s the only way I can identify the generated subpartitions that will appear in the composite partitioned table.


create table t for exchange with table pt_range;

prompt	=======================================================================
prompt	Double exchange to move a partition to become a composite subpartition
prompt	Could drive this programatically by picking one row from each partition
prompt	=======================================================================

alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition for (399) with table t;
lock  table pt_range_list partition for (399) in exclusive mode;
alter table pt_range_list exchange subpartition for (399,'0') with table t;

alter table pt_range exchange partition for (599) with table t;
lock  table pt_range_list partition for (599) in exclusive mode;
alter table pt_range_list exchange subpartition for (599,'0') with table t;

prompt	=====================================
prompt	Show that we've got the data in place
prompt	=====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

Now that the data is in the target table we can split each default subpartition into the four subpartitions that we want for each partition. To cater for the future, though, I’ve first modified the subpartition template so that each new partition will have four subpartitions (though the naming convention won’t be applied, of course – Oracle will generate system names for all new partitions and subpartitions).


prompt  ================================================
prompt  Change the subpartition template to what we want
prompt  ================================================

alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  ====================================================
prompt  Second nice feature - multiple splits in one command
prompt  Again, first split is fixed name.
prompt  We could do this online after allowing the users in
prompt  ====================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition for (399,'0')
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition for (599,'0')
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Finally a little demonstration that we can’t add an explicitly named partition to the interval partitioned table; then we insert a row to generate the partition and show that it has 4 subpartitions.

Finishing off we rename everything (though that’s a fairly pointless exercise).


prompt  ==============================================================
prompt  Could try adding a partition to show it uses the new template
prompt  But that's not allowed for interval partitions: "ORA-14760:"
prompt  ADD PARTITION is not permitted on Interval partitioned objects
prompt  So insert a value that would go into the next (800) partition
prompt  ==============================================================

alter table pt_range_list add partition p800 values less than (800);

insert into pt_range_list (
        id, grp, small_vc, padding
)
values ( 
        799, '0', lpad(799,10,'0'), rpad('x',100,'x')
)
;

commit;

prompt  ===================================================
prompt  Template naming is not used for the subpartitions,
prompt  so we have to use the "subpartition for()" strategy 
prompt  ===================================================

alter table pt_range_list rename subpartition for (799,'0') to p800_p_0;
alter table pt_range_list rename subpartition for (799,'1') to p800_p_1;
alter table pt_range_list rename subpartition for (799,'2') to p800_p_2;
alter table pt_range_list rename subpartition for (799,'3') to p800_p_def;

prompt  ==============================================
prompt  Might as well clean up the partition names too
prompt  ==============================================

alter table pt_range_list rename partition for (399) to p400;
alter table pt_range_list rename partition for (599) to p600;
alter table pt_range_list rename partition for (799) to p800;

prompt  =======================================
prompt  Finish off by listing the subpartitions 
prompt  =======================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

It’s worth pointing out that you could do the exchanges (and the splitting and renaming at the same time) through some sort of simple PL/SQL loop – looping through the named partitions in the original table and using a row from the first exchange to drive the lock and second exchange (and splitting and renaming). For example, something like the following, which doesn’t have any of the error-trapping and defensive mechanisms you’d want to use on a production system:



declare
        m_pt_val number;
begin
        for r in (select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position) 
        loop
                execute immediate
                        'alter table pt_range exchange partition ' || r.partition_name ||
                        ' with table t';
        
                select id into m_pt_val from t where rownum = 1;
        
                execute immediate 
                        'lock table pt_range_list partition for (' || m_pt_val || ') in exclusive mode';
        
                execute immediate
                        'alter table pt_range_list exchange subpartition  for (' || m_pt_val || ',0)' ||
                        ' with table t';
        
        end loop;
end;
/

If you do go for a programmed loop you have to be really careful to consider what could go wrong at each step of the loop and how your program is going to report (and possibly attempt to recover) the situation. This is definitely a case where you don’t want code with “when others then null” appearing anywhere, and don’t be tempted to include code to truncate the exchange table.

 

May 24, 2019

Re-partitioning – 18

Filed under: 18c,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 12:50 pm BST May 24,2019

In yesterday’s note on the options for converting a range-partitioned table into a composite range/list partitioned table I mentioned that you could do this online with a single command in 18c, so here’s some code to demonstrate that claim:


rem
rem     Script:         pt_comp_from_pt_18.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum - 1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

alter table pt_range modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
(
        partition p200 values less than (200)
)
-- online
;

execute dbms_stats.gather_table_stats(null, 'pt_range', granularity=>'all')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE'
order by
        partition_name, subpartition_name
;

Run this (with or without the online option) and you’ll (probably) see the Oracle error “ORA-00604: error occurred at recursive SQL level 1” with one of two underlying errors:

    ORA-01950: no privileges on tablespace 'SYSTEM'
    ORA-01536: space quota exceeded for tablespace 'SYSTEM'

So what’s gone wrong – it ought to work.

After enabling a 10046 trace I repeated the “alter table” command then scanned the trace file for the text “err=1950” (that being the error I’d received on my first attempt) and scanned backwards for the “PARSING IN CURSOR” line with a matching cursor id:


ERROR #139721552722200:err=1950 tim=26541227462

PARSING IN CURSOR #139721552722200 len=182 dep=1 uid=104 oct=1 lid=0 tim=26541224560 hv=2451601965 ad='7f1377267890' sqlid='0wsjfgk920yjd'
create table  "TEST_USER"."SYS_RMTAB$$_H124028"  ( src_rowid rowid not null , tgt_rowid rowid not null) 
    segment creation immediate nologging 
    tablespace  "SYSTEM"  
    rowid_mapping_table
END OF STMT

The code is trying to create a “rowid_mapping_table” in the system tablespace and I have no quota for the tablespace. (The 124028 in the table name relates to the object_id of the table I was trying to modify, by the way.)

The source of the error offered a big clue about a possible workaround, so I gave myself a quota (unlimited) on the system tablespace (alter user test_user quota unlimited on system) and that made it possible for the restructuring to take place. It’s not really an appropriate workaround for a production system though – especially if you’re using the online option and the table is subject to a lot of change.  (Note – this “rowid_mapping_table” and a “journal” table are created even if you haven’t selected the online option.)
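
For reference, the workaround (and the closest thing to reversing it – see the footnotes below) amounts to nothing more than the following, run as a suitably privileged user:


alter user test_user quota unlimited on system;

--      re-run the "alter table pt_range modify partition by ..." command here

alter user test_user quota 0 on system;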

Footnotes

  • The problem has been fixed in 19c (tested on LiveSQL)  and is listed on MoS as Bug 27580976 : INTERNAL RECURSIVE MAPPING TABLE MISTAKENLY PLACE IN SYSTEM FOR ONLINE OPS. There are no patches for 18c at present.
  • After I’d finished testing the quota workaround I tried to deprive myself of the quota on the system tablespace. I may have missed something in the manuals but it looks like the only way to do this is to give myself a zero quota (or, as I have done occasionally in the past, drop user cascade) because there’s no option for “quota denied” or “revoke quota”. This is why you may get one of two different messages after the ORA-00604. If you’ve never had a quota on the system tablespace you’ll get the “ORA-1950: no privileges” message; if you’ve had a quota at some time in the past and then had it set to zero you’ll get the “ORA-01536: space quota exceeded” message.

 

May 23, 2019

Re-partitioning

Filed under: 12c,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 11:45 am BST May 23,2019

I wrote a short note a little while ago demonstrating how flexible Oracle 12.2 can be about physically rebuilding a table online to introduce or change the partitioning while discarding data, and so on.  But what do you do (as a recent question on ODC asked) if you want to upgrade a customer’s database to meet the requirements of a new release of your application by changing a partitioned table into a composite partitioned table and don’t have enough room to do an online rebuild (which could require two copies of the data to exist at the same time)?

If you’ve got the down time (and not necessarily a lot is needed) you can fall back on “traditional methods” with some 12c enhancements. Let’s start with a range partitioned table:


rem
rem     Script:         pt_comp_from_pt.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600
;

commit;

So we’ve got a range-partitioned table with three partitions and some data in each partition. Let’s pretend we want to change this to range/list with the grp column as the subpartition key, allowing explicit use of values 0,1,2 and a bucket subpartition for anything else. First we create an empty version of the table with a suitable subpartition template, and a simple heap table to be used as an exchange medium:


create table pt_range_list (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id)
subpartition by list (grp)
subpartition template (
        subpartition p_def      values(default)
)
(
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

prompt  ===============================================
prompt  First nice 12.2 feature - "create for exchange"
prompt  ===============================================

create table t for exchange with table pt_range;

You’ll notice that our subpartition template identifies just a single subpartition that takes default values – i.e. anything for which no explicit subpartition has been identified. This means we have a one-to-one correspondence between the data segments of the original table and the copy table. So now we go through a tedious loop (which we could code up with a PL/SQL “execute immediate” approach) to do a double-exchange for each partition in turn. (Any PL/SQL code is left as an exercise to the interested reader – though there’s a minimal sketch below if you don’t want to do the exercise.)
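
Here’s that minimal sketch of the loop – no error handling, and it assumes the subpartition names follow the partition_name plus “_P_DEF” pattern that the single-subpartition template produces:


begin
        -- no error handling: not suitable for production as it stands
        for r in (
                select  partition_name
                from    user_tab_partitions
                where   table_name = 'PT_RANGE'
                order by
                        partition_position
        ) loop
                execute immediate
                        'alter table pt_range exchange partition ' || r.partition_name ||
                        ' with table t';

                execute immediate
                        'alter table pt_range_list exchange subpartition ' ||
                        r.partition_name || '_p_def with table t';
        end loop;
end;
/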


alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition p400 with table t;
alter table pt_range_list exchange subpartition p400_p_def with table t;

alter table pt_range exchange partition p600 with table t;
alter table pt_range_list exchange subpartition p600_p_def with table t;

prompt  =====================================
prompt  Show that we've got the data in place
prompt  =====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;


PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_DEF                    200

P400                   P400_P_DEF                    200

P600                   P600_P_DEF                    200


3 rows selected.

We now have to split the newly arrived subpartitions into the 4 pieces we want – but before we do that let’s make sure that any new partitions automatically have the correct subpartitions by changing the subpartition template:


alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  =========================================================
prompt  Second nice 12.2 feature - multiple splits in one command
prompt  We could do this online after allowing the users back on.
prompt  =========================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition p400_p_def
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition p600_p_def
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Now, just to check that everything is behaving, let’s add a new partition, and check to see what partitions and subpartitions we end up with:


alter table pt_range_list add partition p800 values less than (800);

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_0                      100
                       P200_P_1                      100
                       P200_P_2                        0
                       P200_P_DEF                      0

P400                   P400_P_0                      100
                       P400_P_1                      100
                       P400_P_2                        0
                       P400_P_DEF                      0

P600                   P600_P_0                      100
                       P600_P_1                      100
                       P600_P_2                        0
                       P600_P_DEF                      0

P800                   P800_P_0                        0
                       P800_P_1                        0
                       P800_P_2                        0
                       P800_P_DEF                      0


16 rows selected.

And as a final note – if we decide we want to put it all back we could merge four subpartitions down to one subpartition with a single command – then loop through every partition in turn:


alter table pt_range_list
        merge subpartitions  p200_p_0, p200_p_1, p200_p_2, p200_p_def
        into  subpartition  p200_p_def
;

And now I feel like I’m turning into Tim Hall – writing potentially useful demonstrations instead of trying to baffle people with rocket science. But I hope to get over that quite soon. Of course I have left out some important “real-world” details – particularly how you choose to handle indexes while doing the double-exchange. My view would be to take the local indexes with you on the exchange, bypass the global indexes on the exchange out, and be choosy about which global indexes to maintain on the exchange back in; but it all depends on how much downtime you have, how many indexes there are, and the state they’re likely to start or end in.

As ever it’s possible to start with a simple idea like this, then discover there are real-world complications that have to be dealt with. So there’s another article in the pipeline to handle a slightly more complex case. I’ll also be publishing a short note about the easy way of getting the job done from 18c onwards – if you’ve got the spare resources.

 

May 22, 2019

Danger – Hints

Filed under: 12c,Bugs,Hints,Oracle — Jonathan Lewis @ 2:56 pm BST May 22,2019

It shouldn’t be possible to get the wrong results by using a hint – but hints are dangerous and the threat may be there if you don’t know exactly what a hint is supposed to do (and don’t check very carefully what has happened when you’ve used one that you’re not familiar with).

This post was inspired by a blog note from Connor McDonald titled “Being Generous to the Optimizer”. In his note Connor gives an example where the use of “flexible” SQL results in an execution plan that is always expensive to run when a more complex version of the query could produce a “conditional” plan which could be efficient some of the time and would be expensive only when there was no alternative. In his example he rewrote the first query below to produce the second query:


select data
from   address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select  data
from    address
where   ( :choice = 1 and street = :val )
union all
select  data
from    address
where   ( :choice = 2 and suburb = :val );

(We are assuming that bind variable :choice is constrained to be 1 or 2 and no other value.)

In its initial form the optimizer had to choose a tablescan for the query, in its final form the query can select which half of a UNION ALL plan to execute because the optimizer inserts a pair of FILTER operations that check the actual value of :choice at run-time.

When I started reading the example my first thought was to wonder why the optimizer hadn’t simply used “OR-expansion” (or concatenation if you’re running an older version), then I remembered that by the time the optimizer really gets going it has forgotten that “:choice” is the same bind variable in both cases, so doesn’t realise that it would use only one of two possible predicates. However, that doesn’t mean you can’t tell the optimizer to use concatenation. Here’s a model – modified slightly from Connor’s original:


drop table address purge;
create table address ( street int, suburb int, post_code int,  data char(100));

insert into address
select mod(rownum,1e4), mod(rownum,10), mod(rownum,1e2), rownum
from dual connect by level  <= 1e5 -- > comment to avoid WordPress format issue
;

commit;

exec dbms_stats.gather_table_stats('','ADDRESS')

create index ix1 on address ( street );
create index ix2 on address ( suburb );
create index ix3 on address ( post_code );

variable val number = 6
variable choice number = 1

alter session set statistics_level = all;
set serveroutput off
set linesize 180
set pagesize 60

select
        /*+ or_expand(@sel$1) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

I’ve added one more column to the table and indexed it – I’ll explain why later. I’ve also modified the query to show the output but restricted the result set to a count of the data column rather than a (long) list of rows.

Here’s the execution plan output when hinted:


SQL_ID  6zsh2w6d9mddy, child number 0
-------------------------------------
select  /*+ or_expand(@sel$1) */  count(data) from    address where  (
:choice = 1 and street = :val ) or     ( :choice = 2 and suburb = :val )

Plan hash value: 3986461375

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |      12 |     27 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |      12 |     27 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |     10 |00:00:00.01 |      12 |     27 |
|   3 |    UNION-ALL                            |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|*  4 |     FILTER                              |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      1 |     10 |     10 |00:00:00.01 |      12 |     27 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      1 |     10 |     10 |00:00:00.01 |       2 |     27 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |      0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

As you can see we have a UNION ALL plan with two FILTER operations, and the filter operations allow one or other of the two branches of the UNION ALL to execute depending on the value for :choice. Since I’ve reported the rowsource execution statistics you can also see that the table access through index range scan (operations 5 and 6) has executed once (Starts = 1) but the tablescan (operation 8) has not been executed at all.

If you check the Predicate Information you will see that operation 8 has introduced two lnnvl() predicates. Since the optimizer has lost sight of the fact that :choice is the same variable in both cases it has to assume that sometimes both branches will be relevant for a single execution, so it has to add predicates to the second branch to eliminate data that might have been found in the first branch. This is the (small) penalty we pay for avoiding a “fully-informed” manual rewrite.
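
As a reminder of what lnnvl() does – it returns TRUE when its condition evaluates to FALSE or UNKNOWN, and the documentation restricts it to the WHERE clause – here are three one-liners you could use to check its behaviour:


select 'returned' from dual where lnnvl(1 = 2);                 -- condition FALSE:   row returned
select 'returned' from dual where lnnvl(to_number(null) = 1);   -- condition UNKNOWN: row returned
select 'returned' from dual where lnnvl(1 = 1);                 -- condition TRUE:    no row

So the filter at operation 8 keeps a row only if the first branch of the UNION ALL would not (or could not) have returned it.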

Take a look at the Outline Data – we can see our or_expand() hint repeated there, and we can discover that it’s been enhanced. The hint should have been or_expand(@sel$1 (1) (2)). This might prompt you to modify the original SQL to use the fully qualified hint rather than the bare-bones form we’ve got so far. So let’s assume we do that before shipping the code to production.

Now imagine that a couple of months later an enhancement request appears to allow queries on post_code and the front-end has been set up so that we can specify a post_code query by selecting choice number 3. The developer who happens to pick up the change request duly modifies the SQL as follows:


select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
or     ( :choice = 3 and post_code = :val)
;

Note that we’ve got the “complete” hint in place, but there’s now a 3rd predicate. Do you think the hint is still complete ? What do you think will happen when we run the query ? Here’s the execution plan when I set :choice to 3.


select  /*+ or_expand(@sel$1 (1) (2)) */  count(data) from    address
where  ( :choice = 1 and street = :val ) or     ( :choice = 2 and
suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 3986461375

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |      0 |00:00:00.01 |
|   3 |    UNION-ALL                            |                 |      1 |        |      0 |00:00:00.01 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

We get a UNION ALL with just two branches, one for :choice = 1 and one for :choice = 2, and the table accesses in both branches report zero Starts – and there’s no part of the plan at all to handle :choice = 3. The query reports a count of zero – but if you check the table creation code you’ll see the count should have been 1,000. An incorrect (though historically adequate) hint has given us wrong results.
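
If you want to reassure yourself that this is a hint problem rather than a data problem, a quick sanity check (a trivial sketch, assuming the same address table and bind value) is to count the post_code matches directly:

-- For the data set built for this note the count should come back as 1000,
-- which is what the hinted query ought to have reported when :choice = 3.
select  count(data)
from    address
where   post_code = :val
;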

If we want the full hint for this new query we need to specify the 3rd predicate, by adding (3) to the existing hint to get the following plan (and correct results):


select  /*+ or_expand(@sel$1 (1) (2) (3)) */  count(data) from
address where  ( :choice = 1 and street = :val ) or     ( :choice = 2
and suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 2153173029

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |    1639 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |    1639 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  11009 |   1000 |00:00:00.01 |    1639 |
|   3 |    UNION-ALL                            |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |
|*  9 |     FILTER                              |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|* 10 |      TABLE ACCESS FULL                  | ADDRESS         |      1 |    999 |   1000 |00:00:00.01 |    1639 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      OUTLINE_LEAF(@"SET$49E1C21B_2")
      OUTLINE_LEAF(@"SET$49E1C21B_1")
      OUTLINE_LEAF(@"SET$49E1C21B")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1")
      FULL(@"SET$49E1C21B_2" "ADDRESS"@"SET$49E1C21B_2")
      FULL(@"SET$49E1C21B_3" "ADDRESS"@"SET$49E1C21B_3")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))
   9 - filter(:CHOICE=3)
  10 - filter(("POST_CODE"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL)) AND (LNNVL(:CHOICE=2) OR
              LNNVL("SUBURB"=:VAL))))


We now have three branches to the UNION ALL, and the final branch (:choice = 3) is the one that actually ran, reporting A-Rows = 1000 from its tablescan.

Conclusion

You shouldn’t mess about with hints unless you’re very confident that you know how they work, and even then you should test extremely carefully – especially if you’re modifying old code that already contains some hints.
