Oracle Scratchpad

June 22, 2019

ANSI bug

Filed under: Oracle,ANSI Standard,Bugs — Jonathan Lewis @ 1:01 pm BST Jun 22,2019

The following note is about a script I found on my laptop while searching for some details about a bug that appears when you write SQL in the ANSI style rather than the traditional Oracle style. The script is clearly one that I must have cut and pasted from somewhere (possibly the OTN/ODC database forum) many years ago without making any notes about its source or resolution. All I can say about it is that the file has a creation date of July 2012 and I can’t find any reference to the problem through Google searches – though the tables, and even a set of specific insert statements, appear in a number of pages that look like coursework for computer studies, and MoS has a similar-looking bug “fixed in 11.2”.

Here’s the entire script:

rem
rem     Script:         ansi_bug.sql
rem     Author:         ???
rem     Dated:          July 2012
rem

CREATE TABLE Student (
  sid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  address VARCHAR(20) NOT NULL,
  major CHAR(2)
);

CREATE TABLE Professor (
  pid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  department VARCHAR(10) NOT NULL
);

CREATE TABLE Course (
  cid INT PRIMARY KEY,
  title VARCHAR(20) NOT NULL UNIQUE,
  credits INT NOT NULL,
  area VARCHAR(5) NOT NULL
);

CREATE TABLE Transcript (
  sid INT,
  cid INT,
  pid INT,
  semester VARCHAR(9),
  year CHAR(4),
  grade CHAR(1) NOT NULL,
  PRIMARY KEY (sid, cid, pid, semester, year),
  FOREIGN KEY (sid) REFERENCES Student (sid),
  FOREIGN KEY (cid) REFERENCES Course (cid),
  FOREIGN KEY (pid) REFERENCES Professor (pid)
);

INSERT INTO Student (sid, name, address, major) VALUES (101, 'Nathan', 'Edinburg', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (105, 'Hussein', 'Edinburg', 'IT');
INSERT INTO Student (sid, name, address, major) VALUES (103, 'Jose', 'McAllen', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (102, 'Wendy', 'Mission', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (104, 'Maria', 'Pharr', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (106, 'Mike', 'Edinburg', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (107, 'Lily', 'McAllen', NULL);

INSERT INTO Professor (pid, name, department) VALUES (201, 'Artem', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (203, 'John', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (202, 'Virgil', 'MATH');
INSERT INTO Professor (pid, name, department) VALUES (204, 'Pearl', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (205, 'Christine', 'CS');

INSERT INTO Course (cid, title, credits, area) VALUES (4333, 'Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (1201, 'Comp literacy', 2, 'INTRO');
INSERT INTO Course (cid, title, credits, area) VALUES (6333, 'Advanced Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (6315, 'Applied Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (3326, 'Java', 3, 'PL');
INSERT INTO Course (cid, title, credits, area) VALUES (1370, 'CS I', 4, 'INTRO');

INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 4333, 201, 'Spring', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6315, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 4333, 203, 'Summer I', '2010', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (102, 4333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 3326, 204, 'Spring', '2008', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1201, 205, 'Fall', '2009', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1370, 203, 'Summer II', '2010', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1201, 205, 'Fall', '2009', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1370, 203, 'Summer II', '2010', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 3326, 204, 'Spring', '2001', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 6315, 203, 'Fall', '2008', 'A');

SELECT 
        pid, 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
order by pid
;

I’ve run three minor variations of the same query – the one in the middle selects two columns from a three-table join using natural joins. The first query does the same but includes an extra column (pid) in the select list, while the third selects only the original two columns but orders the result set by the extra column.

The middle query returns 60 rows; the first and third, with the “extra” column projected somewhere in the execution plan, return 13 rows – which is the correct result.
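For reference, here’s a sketch in traditional Oracle syntax of what the natural left outer join ought to be equivalent to – the only column common to Professor and the (Transcript, Course) join is pid, and the only column common to Transcript and Course is cid. Since professor 202 (Virgil) has no rows in Transcript this should return 12 + 1 = 13 rows:

select
        p.pid, p.name, v.title
from
        Professor p,
        (
        select
                t.pid, c.title
        from
                Transcript t, Course c
        where
                c.cid = t.cid
        ) v
where
        v.pid(+) = p.pid
;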

I didn’t even have a note of the then-current version of Oracle when I copied this script, but I’ve just run it on 12.2.0.1, 18.3.0.0, and 19.2.0.0 (using LiveSQL), and the error reproduces on all three versions.

May 22, 2019

Danger – Hints

Filed under: 12c,Bugs,Hints,Oracle — Jonathan Lewis @ 2:56 pm BST May 22,2019

It shouldn’t be possible to get the wrong results by using a hint – but hints are dangerous and the threat may be there if you don’t know exactly what a hint is supposed to do (and don’t check very carefully what has happened when you’ve used one that you’re not familiar with).

This post was inspired by a blog note from Connor McDonald titled “Being Generous to the Optimizer”. In his note Connor gives an example where the use of “flexible” SQL results in an execution plan that is always expensive to run when a more complex version of the query could produce a “conditional” plan which could be efficient some of the time and would be expensive only when there was no alternative. In his example he rewrote the first query below to produce the second query:


select data
from   address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select  data
from    address
where   ( :choice = 1 and street = :val )
union all
select  data
from    address
where   ( :choice = 2 and suburb = :val );

(We are assuming that bind variable :choice is constrained to be 1 or 2 and no other value.)

In its initial form the optimizer had to choose a tablescan for the query; in its final form the query can select which half of a UNION ALL plan to execute because the optimizer inserts a pair of FILTER operations that check the actual value of :choice at run-time.

When I started reading the example my first thought was to wonder why the optimizer hadn’t simply used “OR-expansion” (or concatenation if you’re running an older version), then I remembered that by the time the optimizer really gets going it has forgotten that “:choice” is the same bind variable in both cases, so doesn’t realise that it would use only one of two possible predicates. However, that doesn’t mean you can’t tell the optimizer to use concatenation. Here’s a model – modified slightly from Connor’s original:


drop table address purge;
create table address ( street int, suburb int, post_code int,  data char(100));

insert into address
select mod(rownum,1e4), mod(rownum,10), mod(rownum,1e2), rownum
from dual connect by level  <= 1e5 -- > comment to avoid WordPress format issue
;

commit;

exec dbms_stats.gather_table_stats('','ADDRESS')

create index ix1 on address ( street );
create index ix2 on address ( suburb );
create index ix3 on address ( post_code );

variable val number = 6
variable choice number = 1

alter session set statistics_level = all;
set serveroutput off
set linesize 180
set pagesize 60

select
        /*+ or_expand(@sel$1) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

I’ve added one more column to the table and indexed it – I’ll explain why later. I’ve also modified the query so that its output is easy to show, returning a count of the data column rather than a (long) list of rows.

Here’s the execution plan output when hinted:


SQL_ID  6zsh2w6d9mddy, child number 0
-------------------------------------
select  /*+ or_expand(@sel$1) */  count(data) from    address where  (
:choice = 1 and street = :val ) or     ( :choice = 2 and suburb = :val )

Plan hash value: 3986461375

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |      12 |     27 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |      12 |     27 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |     10 |00:00:00.01 |      12 |     27 |
|   3 |    UNION-ALL                            |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|*  4 |     FILTER                              |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      1 |     10 |     10 |00:00:00.01 |      12 |     27 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      1 |     10 |     10 |00:00:00.01 |       2 |     27 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |      0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

As you can see we have a UNION ALL plan with two FILTER operations, and the filter operations allow one or other of the two branches of the UNION ALL to execute depending on the value for :choice. Since I’ve reported the rowsource execution statistics you can also see that the table access through index range scan (operations 5 and 6) has executed once (Starts = 1) but the tablescan (operation 8) has not been executed at all.

If you check the Predicate Information you will see that operation 8 has introduced two lnnvl() predicates. Since the optimizer has lost sight of the fact that :choice is the same variable in both cases it has to assume that sometimes both branches will be relevant for a single execution, so it has to add predicates to the second branch to eliminate data that might have been found in the first branch. This is the (small) penalty we pay for avoiding a “fully-informed” manual rewrite.
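If you haven’t met lnnvl() before: it takes a single predicate as its argument and returns TRUE when that predicate evaluates to FALSE or to NULL (unknown) – which is exactly what’s needed to say “this row wasn’t (or couldn’t have been) picked up by the earlier branch”. A minimal demonstration against dual:

select 'false   -> true'  from dual where lnnvl(1 = 2);                 -- returns a row
select 'unknown -> true'  from dual where lnnvl(1 = to_number(null));   -- returns a row
select 'true    -> false' from dual where lnnvl(1 = 1);                 -- returns no rows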

Take a look at the Outline Data – we can see our or_expand() hint repeated there, and we can discover that it’s been enhanced: the fully qualified form of the hint is or_expand(@sel$1 (1) (2)). This might prompt you to modify the original SQL to use the fully qualified hint rather than the bare-bones form we’ve got so far. So let’s assume we do that before shipping the code to production.
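In other words, the query we ship would look like this:

select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;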

Now imagine that a couple of months later an enhancement request appears to allow queries on post_code and the front-end has been set up so that we can specify a post_code query by selecting choice number 3. The developer who happens to pick up the change request duly modifies the SQL as follows:


select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
or     ( :choice = 3 and post_code = :val)
;

Note that we’ve got the “complete” hint in place, but there’s now a 3rd predicate. Do you think the hint is still complete? What do you think will happen when we run the query? Here’s the execution plan when I set :choice to 3.


select  /*+ or_expand(@sel$1 (1) (2)) */  count(data) from    address
where  ( :choice = 1 and street = :val ) or     ( :choice = 2 and
suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 3986461375

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |      0 |00:00:00.01 |
|   3 |    UNION-ALL                            |                 |      1 |        |      0 |00:00:00.01 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

We get a UNION ALL with two branches – one for :choice = 1, one for :choice = 2 – and both of them show zero starts, while no part of the plan handles :choice = 3. The query returns no rows, and if you check the table creation code you’ll see it should have returned 1000 rows. An incorrect (though historically adequate) hint has given us wrong results.
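A quick, unhinted sanity check confirms the expected count – the load script generated post_code as mod(rownum,1e2) across 1e5 rows, so each value should appear 1,000 times:

select count(data) from address where post_code = 6;    -- should report 1000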

If we want the full hint for this new query we need to specify the 3rd predicate, adding (3) to the existing hint to get the following plan (and correct results):


select  /*+ or_expand(@sel$1 (1) (2) (3)) */  count(data) from
address where  ( :choice = 1 and street = :val ) or     ( :choice = 2
and suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 2153173029

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |    1639 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |    1639 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  11009 |   1000 |00:00:00.01 |    1639 |
|   3 |    UNION-ALL                            |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |
|*  9 |     FILTER                              |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|* 10 |      TABLE ACCESS FULL                  | ADDRESS         |      1 |    999 |   1000 |00:00:00.01 |    1639 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      OUTLINE_LEAF(@"SET$49E1C21B_2")
      OUTLINE_LEAF(@"SET$49E1C21B_1")
      OUTLINE_LEAF(@"SET$49E1C21B")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1")
      FULL(@"SET$49E1C21B_2" "ADDRESS"@"SET$49E1C21B_2")
      FULL(@"SET$49E1C21B_3" "ADDRESS"@"SET$49E1C21B_3")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))
   9 - filter(:CHOICE=3)
  10 - filter(("POST_CODE"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL)) AND (LNNVL(:CHOICE=2) OR
              LNNVL("SUBURB"=:VAL))))


We now have three branches to the UNION ALL, and the final branch (:choice = 3) actually ran, showing A-Rows = 1000 for its tablescan.

Conclusion

You shouldn’t mess about with hints unless you’re very confident that you know how they work, and even then you should test extremely carefully – especially if you’re modifying old code that already contains some hints.

April 8, 2019

Describe Upgrade

Filed under: 18c,Bugs,Infrastructure,Oracle,Performance — Jonathan Lewis @ 11:02 am BST Apr 8,2019

Here’s an odd little change between Oracle versions that could have a stunning impact on application performance if whatever generates your client code happens to use an unlucky selection of constructs. It’s possible to demonstrate the effect remarkably easily – you just have to describe a table, doing it lots of times to make it easy to see what’s happening.

rem
rem     Script:         describe_18c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem 

create table t1 as
select  *
from    all_objects
where   rownum = 1
;

set heading off
set feedback off
set pagesize 0
set linesize 156
set trimspool on
set termout off
set serveroutput on

execute snap_rowcache.start_snap
execute snap_libcache.start_snap

start start_10000
-- start_1 contains "describe t1"

set termout on
set serveroutput on

spool temp
execute snap_rowcache.end_snap
execute snap_libcache.end_snap

spool off

The start_10000 script is my mechanism for running a simple piece of code many times and, as the comment following it says, all I’m doing is repeating “describe t1”. The calls to “snap” something are examples of procedures I use to find the changes recorded in various dynamic performance views over short periods of time (there’s an example of the code for v$mystat here). In this case, as the names suggest, the snapshots record the changes in v$rowcache (the dictionary cache) and v$librarycache (the library cache). I’ve put a simple variant of the code at the end of the blog note so you don’t have to do all the preparation if you want to run a quick test for yourself.

Here are the results I get when running the test in Oracle 18.3.0.0

---------------------------------
Dictionary Cache - 05-Apr 19:00:00
Interval:-      27 seconds
---------------------------------
Parameter                 Sub# Usage Fixed    Gets  Misses   Scans  Misses    Comp    Mods Flushes
---------                 ----  ----- -----    ----  ------   -----  ------    ----    ---- -------
dc_objects                         0     0 260,051       0       0       0       0       0       0
dc_users                           0     0 260,000       0       0       0       0       0       0
---------------------------------
Library Cache - 05-Apr 19:00:00
Interval:-      27 seconds
---------------------------------
Type      Description              Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
-----     -----                    ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE TABLE/PROCEDURE        10,003      10,003   1.0     280,028     280,028   1.0         0         0

Before showing you the corresponding figures from 12.2.0.1 I’ll just point out that in version 18.3.0.0 of Oracle the structure of the view all_objects gives me a table of 26 columns. Think about that, and the 10,000 describes, while looking at the numbers above; then look at the corresponding 12.2.0.1 results:

---------------------------------
Dictionary Cache - 05-Apr 19:00:00 
Interval:-      28 seconds
---------------------------------
Parameter                 Usage Fixed    Gets  Misses   Scans  Misses    Comp    Mods Flushes
---------                 ----- -----    ----  ------   -----  ------    ----    ---- -------
dc_users                      0     0       2       0       0       0       0       0       0
dc_objects                    0     0       9       0       0       0       0       0       0

---------------------------------
Library Cache - 05-Apr 19:04:17
Interval:-      28 seconds
---------------------------------
Type      Description              Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
-----     -----                    ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE TABLE/PROCEDURE        10,005      10,005   1.0      20,018      20,018   1.0         0         0

The internal mechanism of the “describe” call has changed between 12.2.0.1 and 18.3.0.0.

In 18.3, each describe records one “get” on dc_users and one on dc_objects in v$rowcache for every column in the table, plus one “get” on the TABLE/PROCEDURE namespace in v$librarycache and (2 + number of columns) “pins”. In 12.2.0.1 there are no gets on the dictionary cache, and only one get and two pins in the library cache, for each describe.
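The arithmetic ties up with the 26 columns of t1 (created from all_objects) and the 10,000 describes – the small excesses over the round numbers are presumably background activity:

18.3:   10,000 describes x 26 columns  = 260,000 gets on each of dc_users and dc_objects
        10,000 describes x (26 + 2)    = 280,000 pins on the TABLE/PROCEDURE namespace
12.2:   10,000 describes x 2           =  20,000 pins (one get per describe, no row cache gets)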

As a couple of extra checks I modified the test to query from a 12c client to an 18c server, then from an 18c client to a 12c server. The big numbers appeared in the former test (i.e. when the server is 18c) and the small numbers in the latter (when the server is 12c). I also tried a couple of other variations on the theme:

  • If the table t1 doesn’t exist when I run the test then there are no gets on the row cache, and I see 2 gets and pins (with hits) on the library cache per describe.
  • If I run the test using “describe other_schema.t1” the pins (and hits) on the library cache go up by 1 per describe
  • If I execute “alter session set current_schema = other_schema” so that “describe t1” is actually describing a table in another schema the pins (and hits) on the library cache go up by 1 per describe
  • If I run the test in the SYS schema, 18c behaves like 12c! But SYS is often a little weird compared to other schemas

Enabling the 10051 trace, I can see that both versions report an OPI call type = 119: “V8 Describe Any” for each call to “describe” (which, presumably, corresponds to the OCIDescribeAny() low-level function call). And that’s really where this blog started, and why lots of people might need to be aware (at least in the short term) of this change in behaviour across versions.

Welcome to the Real World.

My demonstration is clearly silly – no-one does hundreds of describes per second in a real application, surely. Well, not deliberately, and not necessarily with the intent to do a full describe, but sometimes n-tier development environments end up generating code that does things you might not expect. One such example is the way that JDBC can handle a requirement of the form:

insert into tableX( id, ......) values (my_sequence.nextval, ...) returning id into ?;

In the course of handling this requirement one of the coding strategies available to JDBC ends up executing the type 119 “V8 Describe Any” call. Imagine the effect this has when you have a couple of dozen concurrent sessions totalling a couple of million single-row inserts per hour. The competition for library cache pins and row cache gets is massive – and the memory structures involved are all protected by mutexes. So when a client of mine recently upgraded their system from 11.2.0.4 to 18.3.0.0 they saw “library cache: mutex X” waits change from a few hundred seconds per hour to tens of thousands of seconds, and “row cache mutex” leapt from nowhere into the “Top timed events”, reporting yet more thousands of seconds of wait time per hour.

The actual impact of this issue will depend very strongly on how much use you (or your ORM) makes of this construct. The problem may be particularly bad for my client because of the very large number of concurrent executions of a very small number of distinct statements that all address the same table. For low concurrency, or for a wide range of different tables and statements, you may not see so much contention.

If you are seeing contention for “row cache mutex” and “library cache: mutex X”, then a quick corroborative test (if you are licensed for the performance and diagnostic packs) is to check the top_level_call# and top_level_call_name from v$active_session_history:

select
        top_level_call#, top_level_call_name, count(*)
from    v$active_session_history
group by
        top_level_call#, top_level_call_name
order by
        count(*)
;

If (119, ‘V8 Describe Any’) shows up as a significant fraction of the total then you’re probably looking at this issue.

Java is not my strong point – but here’s a trivial piece of standalone Java that you can use to demonstrate the issue if you’re familiar with running Java on the server. There are a few notes inline to explain necessary preparatory steps and code changes:


/*
        To create a class file, you need to execute
        javac temptest2.java

        This generates file temptest2.class
        If this is successful then execute
        java temptest2 {number of iterations} {commit frequency}

        e.g.
        java temptest2 10000 10

        To be able to compile, you need a CLASSPATH environment variable
        e.g. export CLASSPATH=$CLASSPATH:$ORACLE_HOME/jdbc/lib/ojdbc8.jar

        For java to see the created class the CLASSPATH must also include 
        the holding directory
        e.g. export CLASSPATH=$CLASSPATH:/mnt/working

        Example combined setting:
        export CLASSPATH=$CLASSPATH:$ORACLE_HOME/jdbc/lib/ojdbc8.jar:/mnt/working

        A schema will need to be created to hold two objects,
        And the connection string in the code below will need to be modified -
        and the listener has to be started and the database registered to it.

        Database objects:
        -----------------
        create sequence s1;

        create table test(
                id number, 
                code varchar2(32), 
                descr varchar2(32), 
                insert_user varchar2(32),
                insert_date date
        );

*/


import java.sql.*;
import oracle.jdbc.OracleDriver;
import java.util.Date;

public class temptest2
{
  public static void main (String arr[]) throws Exception
  {
    DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
    Connection con = DriverManager.getConnection
          ("jdbc:oracle:thin:@localhost:1521:or18","test_user","test");

    Integer iters = new Integer(arr[0]);
    Integer commitCnt = new Integer(arr[1]);

    con.setAutoCommit(false);
    doInserts( con, iters.intValue(), commitCnt.intValue() );

    con.commit();
    con.close();
  }

  static void doInserts(Connection con, int count, int commitCount )
  throws Exception
  {

    int  rowcnt = 0;
    int  committed = 0;
    long start = new Date().getTime();

    for (int i = 0; i < count; i++ ) {
      PreparedStatement ps = con.prepareStatement(
           "insert into test (id, code, descr, insert_user, insert_date) " +
                     "values (s1.nextval,?,?, user, sysdate)",
           new String[]{"id"}
      );
      ps.setString(1,"PS - code" + i);
      ps.setString(2,"PS - desc" + i);
      ps.executeUpdate();

      ResultSet rs = ps.getGeneratedKeys();
      int x = rs.next() ? rs.getInt(1) : 0;
      System.out.println(x);
                
      rowcnt++;
      ps.close();

      if ( rowcnt == commitCount )
      {
        con.commit();
        rowcnt = 0;
        committed++;
      }
    }
    long end = new Date().getTime();
    con.commit();
    System.out.println
    ("pstatement " + count + " times in " + (end - start) + " milli seconds committed = "+committed);
  }
}

/*
 *
 * Sample from trace file after setting events 10046 and 10051:
 *
 * OPI CALL: type=119 argc= 7 cursor=  0 name=V8 Describe Any
 * OPI CALL: type=94 argc=38 cursor=  0 name=V8 Bundled Exec
 * PARSE #140693461998224:c=0,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3884345238,tim=1368184246
 * EXEC #140693461998224:c=0,e=135,p=0,cr=0,cu=4,mis=0,r=1,dep=0,og=1,plh=3884345238,tim=1368184411
 * OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
 * CLOSE #140693461998224:c=0,e=15,dep=0,type=3,tim=1368185231
 * OPI CALL: type=14 argc= 0 cursor=  0 name=COMMIT
 * XCTEND rlbk=0, rd_only=0, tim=1368185291
 * OPI CALL: type=119 argc= 7 cursor=  0 name=V8 Describe Any
 * OPI CALL: type=94 argc=38 cursor=  0 name=V8 Bundled Exec
 * PARSE #140693461998224:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3884345238,tim=1368187929
 * EXEC #140693461998224:c=0,e=162,p=0,cr=0,cu=4,mis=0,r=1,dep=0,og=1,plh=3884345238,tim=1368188141
 * OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
 * CLOSE #140693461998224:c=0,e=6,dep=0,type=3,tim=1368189336
 * OPI CALL: type=14 argc= 0 cursor=  0 name=COMMIT
 * XCTEND rlbk=0, rd_only=0, tim=1368189373
 *
*/


You’ll notice that I’ve prepared, executed and closed a statement inside a loop. The problem wouldn’t happen if I prepared the statement before the loop and closed it after the loop, doing nothing but the execute inside the loop; but the code is simply modelling the “single row processing” effect that typically appears through ORMs.

You’ll have to decide for yourself how to take snapshots of the dynamic performance views while this code is running, and how to enable tracing – but anyone who wants to fiddle with the code is probably better at coding Java than I am, so it’s left as an exercise for the reader. (I used a logon trigger for the traces, and snap_rowcache and snap_libcache from another session.)

There is a certain cruel irony to this issue.  For years I have been telling people that

    insert into my_table(id, ...) values(my_sequence.nextval,...) returning id into :bind1;

is more efficient than:

    select my_sequence.nextval into :bind1 from dual;
    insert into my_table(id,.....) values(:bind1, ...);

If, at present, you’re using Hibernate as your ORM, it generates code that does the latter (the inefficient, bad-practice form) and you won’t see the “describe” problem.

Footnote

If you want a very simple SQL*Plus script to see the effect – and have privileges to query v$rowcache and v$librarycache – here’s a hundred describes with a little wrapper to show the effect:

rem
rem     The schema running this script must not be SYS
rem     but must be granted select on v_$rowcache and
rem     v_$librarycache. For the results to be clearly
rem     visible the test needs to be run while virtually
rem     nothing else is running on the instance.
rem
rem     In 18.3.0.0 every describe seems to access 
rem     dc_users, dc_objects, and pin the library cache
rem     once for every column in the table described
rem     (plus a fixed "overhead" of 2 pins and one get)
rem
rem     When run by SYS the counts fall back to the
rem     12.2 numbers – i.e. only two pins and one get
rem     on the library cache with no accesses to the
rem     dictionary cache
rem
rem     The excess gets and pins disappear in 19.2,
rem     though the pin count on the library cache
rem     goes up to 4 per describe.
rem

drop table t1 purge;
create table t1(
        n1 number,
        n2 number,
        n3 number,
        n4 number,
        n5 number,
        n6 number,
        n7 number
)
;


-- variant create table t1 as select * from all_objects where rownum = 1;


set serveroutput off
set linesize 167
set trimspool on


spool temp_desc
select namespace, gets, pins from v$librarycache where namespace = 'TABLE/PROCEDURE';
select parameter, gets from v$rowcache where parameter in ('dc_users','dc_objects') and subordinate# is null;
spool off

set termout off

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

set termout on
set serveroutput on

spool temp_desc append

select namespace, gets, pins from v$librarycache where namespace = 'TABLE/PROCEDURE';
select parameter, gets from v$rowcache where parameter in ('dc_users','dc_objects') and subordinate# is null;

spool off



set doc off
doc

Sample output from 18.3
=======================
NAMESPACE                                                              GETS       PINS
---------------------------------------------------------------- ---------- ----------
TABLE/PROCEDURE                                                       27449      71108


PARAMETER                              GETS
-------------------------------- ----------
dc_users                              17341
dc_objects                           115830


NAMESPACE                                                              GETS       PINS
---------------------------------------------------------------- ---------- ----------
TABLE/PROCEDURE                                                       27555      72017


PARAMETER                              GETS
-------------------------------- ----------
dc_users                              18041
dc_objects                           116533


Note change in rowcache gets - one per column per describe on each parameter.
Note change in library cache pins - (one per column + 2) per describe.

Sample output from 12.2
=======================
NAMESPACE                  GETS       PINS
-------------------- ---------- ----------
TABLE/PROCEDURE           13393      20318


PARAMETER                              GETS
-------------------------------- ----------
dc_users                               4889
dc_objects                            31413


NAMESPACE                  GETS       PINS
-------------------- ---------- ----------
TABLE/PROCEDURE           13504      20539


PARAMETER                              GETS
-------------------------------- ----------
dc_users                               4889
dc_objects                            31416


No change in v$rowcache
Only the same single get and the "+2" pins per describe in v$librarycache

#


The good news is that I sent this text to a colleague who has access to Oracle 19, and the problem goes away (almost completely) – there are just two extra pins on the library cache in Oracle 19 compared to Oracle 12, and no gets on the rowcache dc_users and dc_objects. This suggests that it’s a known issue (even though there’s no visible bug report, and the problem is still present in 18.5), so it may be possible to get a backport of the version 19 code for version 18 fairly quickly. If not, the best temporary workaround is probably to bypass the ORM and manually code for a function call that processes an anonymous PL/SQL block – but I haven’t tested that idea yet.
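For what it’s worth, here’s a minimal sketch of that (untested) workaround, using the table and sequence from the Java example above – the client would execute an anonymous block and collect the generated key through an OUT bind instead of through the JDBC “returning … into ?” mechanism:

begin
        insert into test (id, code, descr, insert_user, insert_date)
        values (s1.nextval, :code, :descr, user, sysdate)
        returning id into :out_id;      -- :out_id registered as an OUT parameter by the client
end;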

There is a JDBC cursor cache available – and if this were enabled then the prepared statement that was closed by the code would be kept open by the JDBC cache (and, of course, still be present in Oracle’s v$open_cursor) and Oracle wouldn’t receive any further parse or “describe” calls. Unfortunately it seems that there’s (still) a cursor leak in the JDBC caching algorithm that will lead to sessions hitting Oracle error “ORA-01000: maximum open cursors exceeded”.

Acknowledgements.

I’d particularly like to thank Lasse Jenssen, who spent a couple of hours with me (when he could have been attending some interesting sessions) at the OUG Ireland conference a few days ago, working through various tests and strategies to pin down the problem and attempt to circumvent it. (Any criticism of the Java code above should, nevertheless, be aimed at me.)

Update

This problem is now visible on MoS as: Bug 29628952 : INCREASED ROW CACHE GETS ON DC_OBJECTS AND DC_USERS FOR DESCRIBE IN 18C AND LATER.

It’s also visible as Bug 29628647 : INCREASED GETS FOR DC_OBJECTS AND DC_USERS FOR DESCRIBE; unfortunately this latter bug has been associated with version 19.2 – where the problem doesn’t exist – so the analyst has reported back (quite correctly) with “I see no problem.”

November 30, 2018

Index rebuild bug

Filed under: Bugs,Indexing,Oracle — Jonathan Lewis @ 1:02 pm BST Nov 30,2018

I tweeted a reference yesterday to a 9 year old article about index rebuilds, and this led me on to look for an item that I thought I’d written on a related topic. I hadn’t written it (so there’s another item on my todo list) but I did discover a draft I’d written a few years ago about an unpleasant side effect of rebuilding subpartitions of local indexes on composite partitioned tables. It’s probably the case that no-one will notice they’re suffering from it because it’s a bit of an edge case – but you might want to review the things your system does.

Here’s the scenario: you have a large table that is composite partitioned with roughly 180 daily partitions and 512 subpartitions (per partition). For some strange reason you have a couple of local indexes on the table that have been declared unusable – hoping, perhaps, that no-one ever does anything that makes Oracle decide to rebuild all the unusable bits.

One day you decide to rebuild just one subpartition of one of the indexes that isn’t marked as unusable. You might be planning to rebuild every single subpartition of that index overnight, but you’re going to start with just one to see how long it takes. Something very strange happens – and here’s a simple model to demonstrate:

rem
rem     Script:         index_rebuild_pt_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2015
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.1
rem             11.2.0.4
rem

create table interval_hash (
        n1 number,
        n2 number,
        n3 number
)
segment creation immediate
partition by range(n1) interval (1)
subpartition by hash (n2) subpartitions 16 (
        partition p1 values less than (2) 
)
;


begin
        for i in 1 .. 16 loop
                for j in 1..64 loop
                        insert into interval_hash (n1, n2, n3) values (i, j, j + 64*(i-1));
                end loop;
        end loop;

        commit;
end;
/

create index ih_i1 on interval_hash(n1) local;
create index ih_i2 on interval_hash(n2) local;
create index ih_i3 on interval_hash(n3) local;

alter index ih_i1 unusable;
alter index ih_i2 unusable;

The code creates a table which, as the data arrives, extends to 16 partitions of 16 subpartitions each – a total of 256 data segments. After loading the data I’ve created three local indexes on the table and made two of them unusable.

After setting up the table and indexes I’ve identified one subpartition of the table by name, enabled tracing, and rebuilt the corresponding subpartition of the index which is currently usable. (The same effect appears if I rebuild a subpartition of one of the unusable indexes, but the phenomenon is slightly more surprising when you rebuild a usable subpartition.) Here’s the code for the rebuild:


column max_subp new_value m_subp

select
        max(partition_name) max_subp
from
        user_segments
where
        segment_name = 'INTERVAL_HASH'
;

alter session set events '10046 trace name context forever, level 4';

alter index ih_i3 rebuild subpartition &m_subp;

alter session set events '10046 trace name context off';

Would you expect to find anything interesting in the trace file after running it through tkprof? Here’s the most frequently executed SQL statement I found when running this test on an instance of Oracle 18.3:


tkprof or18_ora_24939.trc temp sort=execnt

SQL ID: 0yn07bvqs30qj Plan Hash: 866645418

select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg,
  maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg,
  maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg,
   cmpflag_stg, cmplvl_stg,imcflag_stg, ccflag_stg, flags2_stg
from
 deferred_stg$  where obj# =:1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    512      0.02       0.03          0          0          0           0
Fetch      512      0.00       0.00          0       1536          0         512
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1025      0.03       0.03          0       1536          0         512

This query runs once for every single subpartition of the two unusable indexes. (There’s another statement that runs once for every partition of the two unusable indexes to provide the object numbers of the subpartitions – and that shouldn’t be forgotten.) In my example the impact and time to run don’t look too bad, but when the numbers climb to a couple of hundred thousand executions before you start to rebuild the first subpartition you might start to worry. Depending on the state of your data dictionary, and how you got to the point where you had so many unusable segments, the time to execute could become large – and you might do most of it all over again for the next subpartition!
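The execution count of 512 is exactly what the model predicts:

2 unusable indexes x 16 partitions x 16 subpartitions = 512 executions of the deferred_stg$ query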

You might wonder why anyone would have a couple of unusable indexes. First, many years ago (in Practical Oracle 8i) I pointed out that if you wanted to create a new locally partitioned index you might want to create it unusable and then rebuild each partition in turn – that might not be a good idea any more. (The book also pointed out the requirement to think about sizing the dictionary cache (rowcache).)

Secondly, before the introduction of partial indexing it was possible to emulate the feature manually for local indexes by setting partitions and subpartitions unusable and allowing the optimizer to use table expansion to pick the best plan for each partition depending on whether its index partitions were still usable.
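With this model’s objects that manual approach might have looked like the following sketch (the subpartition name is system-generated, so the one here is purely illustrative):

alter index ih_i2 modify partition p1 unusable;
alter index ih_i2 modify subpartition sys_subp1234 unusable;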

Finally, if you are using the partial indexing feature of 12c – where you can set the default characteristic of a partitioned table to “indexing off” and the default characteristic of an index to “indexing partial” – the partitions of any local index that are not created are deemed to be deferred. But you won’t see the effect in my example unless you modify it to include partial indexes and include an “alter system flush shared_pool” just before the rebuild.
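For reference, a sketch of the partial indexing syntax – the object names here are hypothetical:

create table pt (
        id      number,
        val     number
)
indexing off
partition by range (id) (
        partition p1 values less than (1000) indexing on,
        partition p2 values less than (2000)
);

create index pt_i1 on pt(val) local indexing partial;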

Footnote

I’ve found this bug (or something very similar) on MoS: Bug 17335646 : ALTER INDEX IDX REBUILD SUBPARTITION SP VISITS EVERY INDEX AND SUBPARTION. However the bug was recorded against 11.2.0.3 and its status is: “31 – Could Not Reproduce. To Filer”. It does seem to be terribly easy to reproduce, though, provided you have a large number of unusable subpartitions in your indexes – so it’s possible the original bug appeared even when there were no unusable subpartitions (the customer comments about the bug don’t give any suggestion that there might be unusable indexes in place – and it seems unlikely that the 22 indexes mentioned were all unusable).

November 15, 2018

num_index_keys

Filed under: 12c,Bugs,CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:13 pm BST Nov 15,2018

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block; the primary key index is (c1, c2, c3, c4); there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t know whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems; it’s not entirely helpful to have the generated plan from explain plan in one version and a run-time plan with stats in the other – given the choice I’d like to see the run-time plan from both. Despite this, I felt fairly confident that the prediction would match the run-time behaviour for 12c, and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for six values of c3, while 10g is going to probe the index 18 times (3 values of c2 x 6 values of c3) to find “nearly unique” rows – which probably means one leaf block per probe.

The 12c plan was taking minutes to run; the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.
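This is where the hint in the title comes in: num_index_keys() tells the optimizer how many leading index columns to use when constructing the in-list iterator. As a sketch only – I’m reconstructing the syntax from the usual hint conventions, so check the Outline Data of the resulting plan before trusting it – asking 12c to iterate on the first three keys of the primary key index might look like this:

select
        /*+ num_index_keys(@sel$1 t1@sel$1 pk_t1 3) */
        t1.*
from    dw1.t1 t1
where   t1.c1 = '0001'
and     t1.c2 in ('P', 'F', 'C')
and     t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;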

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > comment to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > comment to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

alter session set statistics_level = typical;
set serveroutput on

The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, if the plan is possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen? (Technically I’ve included a hint to use the index, and specified the query block name to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for the costs, which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost, while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – do they switch at the same point? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values.
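
The loop itself needs nothing clever – here’s a sketch of the idea (not the code I actually ran: checking v$sql/v$sql_plan for a c3 filter predicate on the index line is just one convenient way to spot the switch, and it assumes the t1 from the model above):

set serveroutput on

declare
        m_list    varchar2(4000);
        m_filters number;
begin
        for n in 2 .. 100 loop

                -- build an in-list of n consecutive c3 values
                m_list := '';
                for i in 1 .. n loop
                        m_list :=
                                m_list ||
                                case when i > 1 then ',' end ||
                                '''18110034' || lpad(1 + 100 * i, 7, '0') || '''';
                end loop;

                -- embed n in a comment so each statement gets its own cursor
                execute immediate
                        'select /* inlist test ' || n || ' */ count(v1) from t1 ' ||
                        'where c1 = ''0001'' and c2 in (''H'',''F'',''C'') ' ||
                        'and c3 in (' || m_list || ')';

                -- if c3 shows up as a filter predicate on the index line
                -- the optimizer has switched to the access/filter path
                select  count(*)
                into    m_filters
                from    v$sql s, v$sql_plan p
                where   s.sql_text like 'select /* inlist test ' || n || ' */%'
                and     p.sql_id = s.sql_id
                and     p.child_number = s.child_number
                and     p.operation = 'INDEX'
                and     p.filter_predicates like '%C3%';

                dbms_output.put_line('List size ' || n || ': filter lines = ' || m_filters);

        end loop;
end;
/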

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.
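
For reference, the sort of call that would refresh the table stats and add a frequency histogram on the low-frequency c2 column looks like this (a sketch – the method_opt is my suggestion, not necessarily what the OP ran):

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1 for columns c2 size 254'
        );
end;
/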


October 22, 2018

Column Groups

Filed under: Bugs,CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:36 pm BST Oct 22,2018

Sometimes a good thing becomes a bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:


rem
rem     Script:         distinct_key_prob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2016
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        cast(mod(rownum-1,10) as number(8,0))   non_null,
        cast(null as number(8,0))               null_col,
        cast(lpad(rownum,10) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(null_col, non_null);

begin

/*
        dbms_output.put_line(
                dbms_stats.create_extended_stats(user,'t1','(non_null, null_col)')
        );
*/

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

So I have a table with 1,000,000 rows; one of its columns is always null and another has a very small number of distinct values and is never null (though it hasn’t been declared as not null). I’ve created an index that starts with the “always null” column (in a production system we’d really be looking at a column that was “almost always” null, with a few special rows where the column was not null, so an index like this can make sense).

I’ve also got a few lines, commented out, to create extended stats on the column group (non_null, null_col) because any anomaly relating to the handling of the number of distinct keys in a multi-column index may also be relevant to column groups. I can run two variations of this code, one with the index, one without the index but with the column group, and see the same cardinality issue appearing in both cases.

So let’s execute a couple of queries – after setting up a couple of bind variables – and pull their execution plans from memory:


variable b_null    number
variable b_nonnull number

exec :b_null    := 5
exec :b_nonnull := 5

set serveroutput off

prompt  ===================
prompt  Query null_col only
prompt  ===================

select  count(small_vc)
from    t1
where
        null_col = :b_null
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

prompt  =========================
prompt  Query (null_col,non_null)
prompt  =========================

select  count(small_vc)
from    t1
where
        null_col = :b_null
and     non_null = :b_nonnull
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

The optimizer has statistics that tell it that null_col is always null so its estimate of rows where null_col = 5 should be zero (which will be rounded up to 1); and we have an index starting with null_col so we might expect the optimizer to use an index range scan on that index for these queries. Here are the plans that actually appeared:


SQL_ID  danj9r6rq3c7g, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |    24 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    24 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("NULL_COL"=:B_NULL)



SQL_ID  d8kbtq594bsp0, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null and non_null =
:b_nonnull

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  2189 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    27 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  2636K|  2189   (4)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("NULL_COL"=:B_NULL AND "NON_NULL"=:B_NONNULL))

Take a careful look at what we’ve got: the second query has to access exactly the same table rows as those identified by the first query and then apply a second predicate which may discard some of those rows – but the optimizer has changed the access path from a low-cost index-driven access to a high cost tablescan. This is clearly idiotic – there has to be a flaw in the optimizer logic in this situation.

The defect revolves around a slight inconsistency in the handling of column groups – whether they are explicitly created, or simply inferred by reference to user_indexes.distinct_keys. The anomaly is most easily seen by explicitly creating the column group, gathering stats, and reporting from user_tab_cols.
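
Explicit creation is just a matter of uncommenting the lines in the setup script – equivalent to something like:

select
        dbms_stats.create_extended_stats(user, 't1', '(non_null, null_col)')
from
        dual
;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1'
        );
end;
/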


select
        column_name, sample_size, num_distinct, num_nulls, density, histogram, data_default
from
        user_tab_cols
where
        table_name = upper('T1')
order by
        column_id
;

COLUMN_NAME                            Sample     Distinct  NUM_NULLS    DENSITY HISTOGRAM       DATA_DEFAULT
-------------------------------- ------------ ------------ ---------- ---------- --------------- --------------------------------------------
NON_NULL                            1,000,000           10          0         .1 NONE
NULL_COL                                                 0    1000000          0 NONE
SMALL_VC                            1,000,000      995,008          0 .000001005 NONE
PADDING                             1,000,000            1          0          1 NONE
SYS_STULC#01EE$DE1QB7UY1K4$PBI      1,000,000           10          0         .1 NONE            SYS_OP_COMBINED_HASH("NON_NULL","NULL_COL")

As you can see, the optimizer can note that “null_col” is always null so the arithmetic for “null_col = :bind1” is going to produce a very small cardinality estimate; on the other hand when the optimizer sees “null_col = :bind1 and non_null = :bind2” it’s going to transform this into the single predicate “SYS_STULC#01EE$DE1QB7UY1K4$PBI = sys_op_combined_hash(:bind2, :bind1)”, and the statistics say there are 10 distinct values for this (virtual) column with no nulls – hence the huge cardinality estimate (1,000,000 rows / 10 distinct values = 100,000) and the full tablescan.

The “slight inconsistency” in handling that I mentioned above is that if you used a predicate like “null_col is null and non_null = :bind2” the optimizer would not use the column group because of the “is null” condition – even though it’s exactly the case where the column group statistics would be appropriate. (In the example I’ve constructed the optimizer’s estimate from ignoring the column group would actually be correct – and identical to the estimate it would get from using the column group – because the column is null for every single row.)
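
If you want to see that for yourself, a third query following the pattern of the two above would do it (a sketch – I haven’t reproduced the output here):

select  count(small_vc)
from    t1
where
        null_col is null
and     non_null = :b_nonnull
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));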

tl;dr

Column groups can give you some very bad estimates, and counter-intuitive behaviour, if any of the columns in the group has a significant percentage of nulls; this happens because the column group makes the optimizer lose sight of the number of nulls in the underlying data set.


August 22, 2018

Descending bug

Filed under: Bugs,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:20 pm BST Aug 22,2018

Following on from Monday’s posting about reading execution plans and related information, I noticed a question on the ODC database forum asking about the difference between “in ({list of values})” and a list of “column = {constant}” predicates connected by OR. The answer to the question is that there’s essentially no difference as you would be able to see from the predicate section of an execution plan:


SELECT  c1, c2, c3, c4, c5, c6, c7, c8..  
FROM    TAB1  
WHERE   STS IN ( 'A', 'B')  
AND     cnt < '4'  
AND     dt < sysdate  
and     rownum <=1;  
  
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 33399   (1)| 00:03:14 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   114K|   201M| 33399   (1)| 00:03:14 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   114K|       |   723   (1)| 00:00:05 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE@!)  
   4 - access("STS"='A' OR "STS"='B')  

Note how the predicate section tells you that the original “sts in ( ‘A’, ‘B’ )” has been transformed into “sts = ‘A’ or sts = ‘B'”.

A further point I made about IN-lists in Monday’s post was that as one step in the transformation Oracle would sort the list and eliminate duplicates, and it suddenly occurred to me to wonder whether Oracle would sort the list in descending order if the only relevant index were defined to start with a descending column. Naturally I had to try it so here’s a suitable script to prepare some data:

rem
rem     Script:         descending_bug_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0        Crashes
rem             12.2.0.1        Crashes
rem             12.1.0.2        Crashes
rem             11.2.0.4        Bad Plan
rem

create table t1
nologging
pctfree 95 pctused 5
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        case mod(rownum,1000)
                when 0 then 'A'
                when 3 then 'B'
                when 6 then 'C'
                       else 'D'
        end                             sts,
        case mod(rownum,1000)
                when 0 then '1'
                when 3 then '2'
                when 6 then '3'
                       else '4'
        end                             cnt,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

create index t1_i1a on t1(sts) nologging;
create index t1_i1d on t1(sts desc) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size skewonly'
        );
end;
/

There is one oddity in this script – if you’ve got every column in an index declared as DESC you’ve made a mistake and none of the columns should be declared as DESC. The feature is relevant only if you want a mixture of ascending and descending columns in a single index.
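
(For example – a hypothetical index, not one used in this test – something like:

create index t1_i2 on t1(sts desc, cnt) nologging;

would be a legitimate use of the feature.)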

An important detail of the script is that I’ve gathered stats AFTER creating the objects – it’s important to do this, even in 18.3, because (a) creating the “descending” index will result in a hidden virtual column being created to represent the descending column and I want to make sure I have stats on that column and (b) the “stats on creation” code doesn’t generate histograms and I want a (frequency) histogram on column sts and on the hidden, virtual, descending version of the column.
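
The check itself needs nothing more sophisticated than a query like this (user_tab_cols rather than user_tab_columns, so that the hidden column appears):

select
        column_name, num_distinct, num_nulls, histogram
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_id
;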

After generating the data and checking that I have the correct histograms for sts and sys_nc00006$ (the relevant hidden column) I can then run the following test:

set serveroutput off
alter session set statistics_level = all;

alter index t1_i1d invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter index t1_i1d   visible;
alter index t1_i1a invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
set serveroutput on


The code makes one index invisible then runs a query that should use an inlist iterator; then it switches indexes making the invisible one visible and vice versa and repeats the query. I’ve enabled rowsource execution statistics and pulled the execution plans from memory to make sure I don’t get fooled by any odd glitches that might exist within “explain plan”. Here are the results from 11.2.0.4 – normal index, then descending index – with a little cosmetic cleaning:


S   COUNT(*)
- ----------
B        100
C        100

SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.01 |       5 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.01 |       5 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    200 |00:00:00.01 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |      2 |    178 |    200 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))


Index altered.
Index altered.


S   COUNT(*)
- ----------
C        100
B        100


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.02 |     198 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.02 |     198 |
|*  2 |   INDEX FULL SCAN    | T1_I1D |      1 |   1000 |    200 |00:00:00.02 |     198 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


As expected we see counts of 100 for ‘B’s and ‘C’s, and we also see that the “sort group by nosort” operation with the descending index has produced the results in reverse order. The problem though is that the optimizer has decided to use an “index full scan” on the descending index, and the estimate of the rows returned is terribly wrong (and seems to be the common “5% guess”, used once for each target value), and the number of buffer visits is huge compared to the result from the normal index – Oracle really did walk every leaf block in the index to get this result. The predicate section also looks rather silly – why hasn’t the optimizer produced predicates more like: “sys_nc00006$ = sys_op_descend(‘B’)”?

In passing you’ll notice that the estimated row count in the plan using the normal index is a little low. This is the result of Oracle using a small sample (ca. 5,500 rows) in 11g to gather histogram stats. 12c will do better for a frequency histogram with the fast algorithm it uses for a 100% (auto) sample size.

So 11g doesn’t do very well but we’ve got 12.1.0.2, 12.2.0.1, and (in the last couple of weeks) 18.3 to play with. Here’s the result from 12.1.0.2 and 12.2.0.1 for the query that should use the descending index:


select  sts, count(*)
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qernsRowP], [1], [], [], [], [], [], [], [], [], [], []


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      0 |00:00:00.01 |       0 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      0 |00:00:00.01 |       0 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    101 |00:00:00.03 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1D |      2 |    200 |    101 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR
              "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))

The query crashed! The plan, however, did look appropriate – the optimizer picked an inlist iterator, picked an index range scan, got the correct estimate of rows (index entries), and did better with the predicate section (though having used a sensible form for the access predicate it then used the bizarre version as the filter predicate). Judging from the A-rows column the query seems to have crashed at roughly the point where the optimizer was switching from the range scan for the first iteration into the range scan for the second iteration.

And then there’s Oracle 18.3 – which does the same as the 12c versions :(

To check whether my silly “single column so it shouldn’t be declared descending” index was the sole cause of the problem I repeated the tests using a two-column index on (sts, cnt).

Conclusion:

Descending indexes or (to be more accurate) indexes with descending columns can still produce problems even in the very latest version of Oracle.

Footnote

Oracle MoS has the wonderful “ORA-600/ORA-7445/ORA-700 Error Look-up Tool (Doc ID 153788.1)” (which doesn’t yet allow you to choose 18.3 as a version) so I used this to do a look-up for ORA-00600 errors with first parameter qernsRowP in 12.2.0.1 and got the following suggestion from doc ID 285913.1: “set event:10119 to disable no-sort fetch and then reparse the failing SQL.” The example suggested setting the event to level 12, and this solved the problem for all three failing versions (I’ve sketched the session-level syntax at the end of this note) – but the suggestion came with a warning: “Setting this event at system level may impact the performance of database.” The execution plan (taken, in this case, from 18.2) may explain the warning:


S   COUNT(*)
- ----------
B        100
C        100

SQL_ID	f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      2 |00:00:00.01 |       4 |       |       |          |
|   1 |  HASH GROUP BY     |        |      1 |      2 |      2 |00:00:00.01 |       4 |  1558K|  1558K|  659K (0)|
|   2 |   INLIST ITERATOR  |        |      1 |        |    200 |00:00:00.01 |       4 |       |       |          |
|*  3 |    INDEX RANGE SCAN| T1_I1D |      2 |    200 |    200 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


The plan has changed from using a “sort group by nosort” – which effectively means just keeping a running count as you go – to a real “hash group by” which means you have to do the hashing arithmetic for every value (though maybe there’s a deterministic trick that means Oracle won’t do the arithmetic if the next value to be hashed is the same as the previous value) and the actual memory used (659K) does seem a little extreme for counting two distinct values.
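
For reference, if you do need the workaround, the usual syntax for setting a numeric event like this at session level is sketched below (check the exact level, and remember the warning above, against doc ID 285913.1 before using it):

alter session set events '10119 trace name context forever, level 12';

-- ... reparse and run the problem SQL, then switch the event off:

alter session set events '10119 trace name context off';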


May 30, 2018

Index Bouncy Scan 3

Filed under: 12c,Bugs,CBO,Oracle — Jonathan Lewis @ 1:15 pm BST May 30,2018

This is a follow-up to a problem I had with yesterday’s example of using recursive CTEs to “bounce” along a multi-column index to pick out the unique set of combinations of the first two columns. Part of the resulting query used a pair of aggregate scalar subqueries in a select list – and Andrew Sayer improved on my query by introducing a “cross apply” (which I simply hadn’t thought of) which the optimizer transformed into a lateral view (which I had thought of, but couldn’t get to work).

After seeing what Andrew and the optimizer had done I looked a little more closely at my lateral view experiment and modified it so that it worked. Here are the three critical versions of the relevant code fragment; first is my original code, then Andrew’s cross apply, then my working lateral view version:

select
        (select min(t1.val1) val1 from t1 where t1.val1 > bounce1.val1) val1,
        (select min(t1.val2) val2 from t1 where t1.val1 > bounce1.val1 and rownum = 1) val2
from    bounce1
where   bounce1.val1 is not null
 
 
select
        ca.val1 ,ca.val2
from    bounce1
cross  apply (select val1, val2
              from  (select /*+ index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
                    )
             ) ca
where  bounce1.val1 is not null
 
----

select
        ca.val1 ,ca.val2
from    bounce1, 
        lateral(select val1, val2
              from  (select /*+ index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
                    )
             ) ca
where  bounce1.val1 is not null

All I’ve done to modify Andrew’s code is put a comma after the table (actually CTE) bounce1, then change “cross apply” to “lateral”. Compare the resulting text with the following lateral version that doesn’t work:


select
        ca.val1 ,ca.val2
from    bounce1, 
        lateral (
                   select /*+ index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
             ) ca
where  bounce1.val1 is not null

To get from not working to working all I’ve done is wrap the text in my lateral() subquery inside one more (apparently redundant) layer of “select * from ()”!

In fact my assumption that my code wasn’t working was incorrect – what was really going on was that the code I had written was producing the wrong results, but I assumed I had made a mistake in the way I was writing it and couldn’t figure out what I had done wrong.

Problem Solving:

To get a better idea of what’s going on, I took a closer look at the execution plans. Here are the plans (main body only) for the two variants of using the lateral() view – the first from the SQL with the “redundant” select, the second as I originally wrote it. Notice that the number of rows (A-Rows) returned in the first case is the 30 expected while in the second case it’s only 10.


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |      1 |        |   125 (100)|     30 |00:00:00.01 |      40 |     28 |       |       |          |
|   1 |  SORT ORDER BY                               |                 |      1 |      4 |   125   (3)|     30 |00:00:00.01 |      40 |     28 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                                       |                 |      1 |      4 |   124   (2)|     30 |00:00:00.01 |      40 |     28 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                 |      1 |        |            |     33 |00:00:00.01 |      40 |     28 |  1024 |  1024 |          |
|*  4 |     VIEW                                     |                 |      1 |      2 |    61   (2)|      3 |00:00:00.01 |       8 |      4 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |      3 |00:00:00.01 |       8 |      4 |  1024 |  1024 |          |
|*  6 |       VIEW                                   |                 |      1 |      1 |    29   (0)|      1 |00:00:00.01 |       2 |      1 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY                 |                 |      1 |  10000 |    29   (0)|      1 |00:00:00.01 |       2 |      1 | 73728 | 73728 |          |
|   8 |         INDEX FULL SCAN                      | T1_PK           |      1 |  10000 |    29   (0)|      2 |00:00:00.01 |       2 |      1 |       |       |          |
|   9 |       NESTED LOOPS                           |                 |      3 |      1 |    31   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|  10 |        RECURSIVE WITH PUMP                   |                 |      3 |        |            |      3 |00:00:00.01 |       0 |      0 |       |       |          |
|  11 |        VIEW                                  | VW_LAT_1BBF5C63 |      3 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|  12 |         VIEW                                 |                 |      3 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|* 13 |          COUNT STOPKEY                       |                 |      3 |        |            |      2 |00:00:00.01 |       6 |      3 |       |       |          |
|* 14 |           INDEX RANGE SCAN                   | T1_PK           |      3 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|  15 |     SORT AGGREGATE                           |                 |     30 |      1 |            |     30 |00:00:00.01 |      32 |     24 |       |       |          |
|  16 |      FIRST ROW                               |                 |     30 |      1 |     2   (0)|     27 |00:00:00.01 |      32 |     24 |       |       |          |
|* 17 |       INDEX RANGE SCAN (MIN/MAX)             | T1_PK           |     30 |      1 |     2   (0)|     27 |00:00:00.01 |      32 |     24 |       |       |          |
|  18 |     RECURSIVE WITH PUMP                      |                 |     11 |        |            |     30 |00:00:00.01 |       0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |      1 |        |   125 (100)|     10 |00:00:00.01 |      16 |       |       |          |
|   1 |  SORT ORDER BY                               |                 |      1 |      4 |   125   (3)|     10 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                                       |                 |      1 |      4 |   124   (2)|     10 |00:00:00.01 |      16 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                 |      1 |        |            |     11 |00:00:00.01 |      16 |  1024 |  1024 |          |
|*  4 |     VIEW                                     |                 |      1 |      2 |    61   (2)|      1 |00:00:00.01 |       4 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |      1 |00:00:00.01 |       4 |  1024 |  1024 |          |
|*  6 |       VIEW                                   |                 |      1 |      1 |    29   (0)|      1 |00:00:00.01 |       2 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY                 |                 |      1 |  10000 |    29   (0)|      1 |00:00:00.01 |       2 | 73728 | 73728 |          |
|   8 |         INDEX FULL SCAN                      | T1_PK           |      1 |  10000 |    29   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|   9 |       NESTED LOOPS                           |                 |      1 |      1 |    31   (0)|      0 |00:00:00.01 |       2 |       |       |          |
|  10 |        RECURSIVE WITH PUMP                   |                 |      1 |        |            |      1 |00:00:00.01 |       0 |       |       |          |
|* 11 |        VIEW                                  | VW_DCL_1BBF5C63 |      1 |      1 |     2   (0)|      0 |00:00:00.01 |       2 |       |       |          |
|* 12 |         COUNT STOPKEY                        |                 |      1 |        |            |      1 |00:00:00.01 |       2 |       |       |          |
|  13 |          INDEX FULL SCAN                     | T1_PK           |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       2 |       |       |          |
|  14 |     SORT AGGREGATE                           |                 |     10 |      1 |            |     10 |00:00:00.01 |      12 |       |       |          |
|  15 |      FIRST ROW                               |                 |     10 |      1 |     2   (0)|      9 |00:00:00.01 |      12 |       |       |          |
|* 16 |       INDEX RANGE SCAN (MIN/MAX)             | T1_PK           |     10 |      1 |     2   (0)|      9 |00:00:00.01 |      12 |       |       |          |
|  17 |     RECURSIVE WITH PUMP                      |                 |     11 |        |            |     10 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Most importantly we can see that the optimizer has used two different transformations. For the working query we see the view name VW_LAT_xxxxxxxx at operation 11 – this is Oracle implementing a lateral view; for the problem query we see the view name VW_DCL_xxxxxxxx at operation 11, which is Oracle implementing a transformation to a “decorrelated lateral view”.

My first test after noting this difference was to see what would happen if I added the hint /*+ no_query_transformation */ to the query: it resulted in the VW_DCL_xxxxxxxx view name changing to VW_LAT_xxxxxxxx and the query producing the right result. Andrew Sayer, on the ODC thread, then pointed out that he’d done a couple more experiments and used the /*+ no_decorrelate() */ hint, so I tried that with my query, adding it (with no parameters) to the subquery inside the lateral() clause – again the plan changed from using VW_DCL to VW_LAT and the results were correct.
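
In other words the fixed query looked something like this (a sketch – the only change from the failing version is the extra hint):

select
        ca.val1 ,ca.val2
from    bounce1,
        lateral (
                   select /*+ no_decorrelate index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
             ) ca
where  bounce1.val1 is not null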

Test Case

Bottom line on this – it looks like the optimizer is decorrelating a subquery when it shouldn’t, leading to wrong results. To make it easier to see this anomaly I stripped the original sample down to a basic test case starting with the table that I used in the previous posting:

rem
rem     Script:         decorralate.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem
rem     Last tested 
rem             18.1.0.0  -- via liveSQL
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,3)                 val1,
        mod(rownum-1,10)                val2,
        lpad('x',100,'x')               padding
from
        generator       v1
order by
        dbms_random.value
;

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter table t1 add constraint t1_pk primary key(val1, val2, id);

Now two versions of a simplified piece of code that should select the distinct values of val1 greater than the lowest value (each row in the UNION ALL of dual is emulating the way in which yesterday’s recursive CTE was effectively saying “this is a current known value, find the next higher”):


prompt  =============
prompt  Right results
prompt  =============

select
        v1.val1, v1.val2
from    (
        select  0 val1, 0 val2 from dual
        union all
        select 1,0 from dual
        union all
        select 2,0 from dual
        ) bounce1,
        lateral (
            select val1, val2 from (
              select  /*+ index(t1) no_index_ffs(t1) */
                      t1.val1, t1.val2
              from    t1
              where   t1.val1 > bounce1.val1
              and     rownum = 1
            )
        ) v1
;

prompt  ===========================================
prompt  Wrong results -- "redundant" select removed
prompt  ===========================================

select
        v1.val1, v1.val2
from    (
        select  0 val1, 0 val2 from dual
        union all
        select 1,0 from dual
        union all
        select 2,0 from dual
        ) bounce1,
        lateral (
            -- select val1, val2 from (
              select  /*+ index(t1) no_index_ffs(t1) */
                      t1.val1, t1.val2
              from    t1
              where   t1.val1 > bounce1.val1
              and     rownum = 1
            -- )
        ) v1
;

Here’s a cut-n-paste from running the two queries:


=============
Right results
=============

      VAL1       VAL2
---------- ----------
         1          0
         2          0

2 rows selected.

============================================
Wrong results  -- "redundant" select removed
============================================

no rows selected

Finally, to get an idea of what’s gone wrong – and to show that the optimizer has done something wrong when attempting to decorrelate – we can take a look at the optimizer trace file to see the final transformed SQL that the optimizer has produced a plan for. (I enabled the trace with the command “alter session set events ‘trace [rdbms.SQL_Transform.*]’;” to limit the trace to just the information about optimizer transformations.) This – cosmetically altered – is the final “unparsed” query:

select 
        vw_dcl_a18161ff.val1 val1,
        vw_dcl_a18161ff.val2 val2 
from    ( 
                (select 0 val1 from sys.dual dual) 
                union all  
                (select 1 1 from sys.dual dual) 
                union all  
                (select 2 2 from sys.dual dual)
        ) bounce1, 
        (
        select
                 /*+ no_index_ffs (t1) index (t1) */ 
                t1.val1 val1_0,
                t1.val2 val2_1 
        from
                test_user.t1 t1
        where 
                rownum = 1
        ) vw_dcl_a18161ff 
where 
        vw_dcl_a18161ff.val1 > bounce1.val1

As you can see, the lateral view has turned into a non-mergeable inline view which selects the first row available from t1 by following the supplied hints, and joins that single row result set to bounce1. I have a suspicion that lateral views which include rownum predicates should not be decorrelated. I have looked on MoS to see if I can find any bugs related to decorrelating lateral views, but either there are none or my search terms weren’t good enough.


Upgrades

Filed under: 12c,Bugs,Function based indexes,Indexing,Oracle,Upgrades — Jonathan Lewis @ 10:08 am BST May 30,2018

One of my maxims for Oracle performance is: “Don’t try to be too clever”. Apart from the obvious reason that no-one else may be able to understand how to modify your code if the requirements change at a future date, there’s always the possibility that an Oracle upgrade will mean some clever trick you implemented will simply stop working.

While searching for information about a possible Oracle bug recently I noticed the following fix control (v$system_fix_control) in 12.2.0.1:


     BUGNO OPTIMIZE SQL_FEATURE                        DESCRIPTION                                                             VALUE
---------- -------- ---------------------------------- ---------------------------------------------------------------- ------------
  18385778          QKSFM_CARDINALITY_18385778         avoid virtual col usage if FI is unusable or invisible 
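
(For reference, the row above can be picked out with a query like:

select
        bugno, value, sql_feature, description, optimizer_feature_enable
from
        v$system_fix_control
where
        bugno = 18385778
;
)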

Maybe that’s just invalidated an idea I published 12 years ago.

I haven’t researched the bug or any underlying SR, but I can think of valid arguments both for and against the fix as described.


February 21, 2018

Interval Partition Problem

Filed under: Bugs,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 8:40 am BST Feb 21,2018

Assume you’ve got a huge temporary tablespace, there’s plenty of space in your favourite tablespace, you’ve got a very boring, simple table you want to copy and partition, and no-one and nothing is using the system. Would you really expect a (fairly) ordinary “create table t2 as select * from t1” to end with an Oracle error “ORA-1652: unable to extend temp segment by 128 in tablespace TEMP”? That’s the temporary tablespace that’s out of space, not the target tablespace for the copy.

Here’s a sample data set (tested on 11.2.0.4 and 12.1.0.2) to demonstrate the surprise – you’ll need about 900MB of space by the time the entire model has run to completion:

rem
rem     Script:         pt_interval_threat_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

column today new_value m_today
select to_char(sysdate,'dd-mon-yyyy') today from dual;

create table t1
as
with g as (
        select rownum id
        from dual
        connect by level <= 2e3
)
select
        rownum id,
        trunc(sysdate) + g2.id  created,
        rpad('x',50)            padding
from
        g g1,
        g g2
where
        rownum <= 4e6 --> comment to avoid WordPress format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

I’ve created a table of 4 million rows, covering 2,000 dates out into the future starting from sysdate+1 (tomorrow). As you can see there’s nothing in the slightest bit interesting, unusual, or exciting about the data types and content of the table.

I said my “create table as select” was fairly ordinary – but it’s actually a little bit out of the way because it’s going to create a partitioned copy of this table.


execute snap_my_stats.start_snap

create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start       values less than (to_date('&m_today','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
as
select
        *
from
        t1
;

set serveroutput on
execute snap_my_stats.end_snap
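
(snap_my_stats is a little utility package I use to report how my session’s statistics – v$mystat – change across an interval; if you don’t have a copy, a rough stand-in is to capture the figures before the statement and take the difference afterwards, along these lines:

create table session_stats_before as
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
;

-- run the create table as select, then:

select  sn.name, ms.value - sb.value delta
from    v$mystat ms, v$statname sn, session_stats_before sb
where   sn.statistic# = ms.statistic#
and     sb.name = sn.name
and     ms.value - sb.value != 0
order by
        delta desc
;
)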

I’ve created the table as a range-partitioned table with an interval() declared. Conveniently I need only mention the partitioning column by name in the declaration, rather than listing all the columns with their types, and I’ve only specified a single starting partition. Since the interval is 7 days and the data spans 2,000 days I’m going to end up with nearly 290 partitions added.

There’s no guarantee that you will see the ORA-01652 error when you run this test – the data size is rather small and your machine may have sufficient other resources to hide the problem even when you’re looking for it – but the person who reported the problem on the OTN/ODC database forum was copying a table of 2.5 Billion rows using about 200 GB of storage, so size is probably important, hence the 4 million rows as a starting point on my small system.

Of course, hitting an ORA-01652 on TEMP when doing a simple “create as select” is such an unlikely sounding error that you don’t necessarily have to see it actually happen; all you need to see (at least as a starting point in a small model) is TEMP being used unexpectedly so, for my first test (on 11.2.0.4), I’ve included some code to calculate and report changes in the session stats – that’s the calls to the package snap_my_stats. Here are some of the more interesting results:


---------------------------------
Session stats - 20-Feb 16:58:24
Interval:-  14 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
table scan rows gotten                                               4,000,004
table scan blocks gotten                                                38,741

session pga memory max                                             181,338,112

sorts (rows)                                                         2,238,833

physical reads direct temporary tablespace                              23,313
physical writes direct temporary tablespace                             23,313

The first couple of numbers show the 4,000,000 rows being scanned from 38,741 table blocks – and that’s not a surprise. But for a simple copy the 181MB of PGA memory we’ve acquired is a little surprising, though less so when we see that we’ve sorted 2.2M rows, and then ended up spilling 23,313 blocks to the temporary tablespace. But why are we sorting anything – what are those rows?

My first thought was that there was a bug in some recursive SQL that was trying to define or identify dynamically created partitions, or maybe something in the space management code trying to find free space, so the obvious step was to enable extended tracing and look for any recursive statements that were running a large number of times or doing a lot of work. There weren’t any – and the trace file (particularly the detailed wait events) suggested the problem really was purely to do with the CTAS itself; so I ran the code again enabling events 10032 and 10033 (the sort traces) and found the following:


---- Sort Statistics ------------------------------
Initial runs                              1
Input records                             2140000
Output records                            2140000
Disk blocks 1st pass                      22292
Total disk blocks used                    22294
Total number of comparisons performed     0
Temp segments allocated                   1
Extents allocated                         175
Uses version 1 sort
Uses asynchronous IO

One single operation had resulted in Oracle sorting 2.14 million rows (but not making any comparisons!) – and the only table in the entire system with enough rows to do that was my source table! Oracle seems to be sorting a large fraction of the data for no obvious reason before inserting it.

  • Why, and why only 2.14M out of 4M?
  • Does it do the same on 12.1.0.2 (yes); what about 12.2.0.1 (no – but I still saw a large PGA allocation, possibly stats gathering)?
  • Is there any clue about this on MoS? (Yes: Bug 17655392 – though that one is erroneously, I think, flagged as “closed, not a bug”.)
  • Is there a workaround? (Yes – I think so.)

Playing around and trying to work out what’s happening, the obvious pointers are the large memory allocation and the “incomplete” spill to disc – so what would happen if I fiddled with workarea sizing: switching it to manual, say, or setting the pga_aggregate_target to a low value? At one point I got results showing 19M rows (that’s not a typo, it really was close to 5 times the number of rows in the table) sorted with a couple of hundred thousand blocks of TEMP used – the 10033 trace showed 9 consecutive passes (that I can’t explain) as the code executed, from which I’ve extracted the row counts, temp blocks used, and number of comparisons made:


Input records                             3988000
Total disk blocks used                    41544
Total number of comparisons performed     0

Input records                             3554000
Total disk blocks used                    37023
Total number of comparisons performed     0

Input records                             3120000
Total disk blocks used                    32502
Total number of comparisons performed     0

Input records                             2672000
Total disk blocks used                    27836
Total number of comparisons performed     0

Input records                             2224000
Total disk blocks used                    23169
Total number of comparisons performed     0

Input records                             1762000
Total disk blocks used                    18357
Total number of comparisons performed     0

Input records                             1300000
Total disk blocks used                    13544
Total number of comparisons performed     0

Input records                             838000
Total disk blocks used                    8732
Total number of comparisons performed     0

Input records                             376000
Total disk blocks used                    3919
Total number of comparisons performed     0

There really doesn’t seem to be any good reason why Oracle should do any sorting of the data (and maybe it wasn’t, given the total number of comparisons reported) – except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds the first row that needs to go into a new partition. Thinking along these lines I decided to pre-create all the necessary partitions just in case this made any difference – the code is at the end of the blog note. Another idea was to create the table empty (with, and without, pre-created partitions), then do an “insert /*+ append */” of the data.

Nothing changed (much – though the number of rows sorted kept varying).
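
For the record, the “fiddling” was along these lines – just a sketch, and the values varied from test to test:


alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1048576;

-- or, leaving the policy on auto:

alter system set pga_aggregate_target = 200M scope = memory;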

And then — it all started working perfectly with virtually no rows reported sorted and no I/O to the temporary tablespace !

Fortunately I thought of looking at v$memory_resize_ops and found that the automatic memory management had switched a lot of memory to the PGA, allowing Oracle to do whatever it needed to do completely in memory without reporting any sorting (and a couple of checks on v$process_memory showed that the amount of memory assigned to category “Other” – rather than “SQL”, interestingly – had exceeded the current value of the pga_aggregate_target). A quick re-start of the instance fixed that “internal workaround”.
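
If you want to watch for the same effect, queries along these lines (a sketch – pick out your own session’s pid for the second one) will show the resize operations and the per-process memory categories:


select  component, oper_type, initial_size, target_size, start_time
from    v$memory_resize_ops
order by
        start_time
;

select  category, allocated, used, max_allocated
from    v$process_memory
where   pid = &m_pid
;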

Still struggling with finding a reasonable workaround I decided to see if the same anomaly would appear if the table were range partitioned but didn’t have an interval clause. This meant I had to precreate all the necessary partitions, of course – which I did by starting with an interval partitioned table, letting Oracle figure out which partitions to create, then disabling the interval feature – again, see the code at the end of this note.

The results: no rows sorted on the insert, no writes to temp. Unless it’s just a question of needing even more data to reproduce the problem with simple range partitioned tables, it looks as if there’s a problem somewhere in the code for interval partitioned tables and all you have to do to work around it is precreate loads of partitions, disable intervals, load, then re-enable the intervals.
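
In outline, then, the workaround is simply the following – a sketch, using the t2 table from the footnote below:


alter table t2 set interval();

insert  /*+ append */ into t2
select  *
from    t1
;

commit;

alter table t2 set interval(numtodsinterval(7, 'day'));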

Footnote:

Here’s the “quick and dirty” code I used to generate the t2 table with precreated partitions:


create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start values less than (to_date('&m_today','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
monitoring
as
select
        *
from
        t1
where
        rownum <= 0
;


<<expand>>
declare
        m_max_date      date;
begin
        select  max(created)
        into    expand.m_max_date
        from    t1
        ;

        <<date_loop>>
        for i in 1..expand.m_max_date - trunc(sysdate) loop
                dbms_output.put(
                        to_char(trunc(sysdate) + date_loop.i,'dd-mon-yyyy') || chr(9)
                );
                execute immediate
                        'lock table t2 partition for ('''  ||
                        to_char(trunc(sysdate) + date_loop.i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
        dbms_output.new_line();
end;
/

prompt  ========================
prompt  How to disable intervals
prompt  ========================

alter table t2 set interval();

The code causes partitions to be created by locking the relevant partition for each date between the minimum and maximum dates in the t1 table – locking the partition is enough to create it if it doesn’t already exist. The code is a little wasteful since it locks each partition 7 times as we walk through the dates, but it’s only a quick demo for a model, and when copying a very large table the wastage would be very small compared to the work of doing the actual data copy. Obviously one could be more sophisticated and limit the code to locking and creating only the partitions needed, and to locking them once each.
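
Something along these lines would do it – a sketch that simply locks each required partition once:


begin
        for r in (
                select  distinct trunc(created) dt
                from    t1
                where   created >= to_date('&m_today','dd-mon-yyyy')
        ) loop
                execute immediate
                        'lock table t2 partition for (''' ||
                        to_char(r.dt,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
end;
/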

Update 28th Feb 2018

In comment #2 below Alexander Chervinskiy supplies another workaround. If your load is going to go to one partition and you know which one it is, then SQL that specifies the target partition does not result in a massive sort – e.g. if you know your data is for 10th March 2018 the following SQL would behave nicely:


insert
        /*+ append */
into
        t2 partition for ('10-Mar-2018')
select
        *
from
        t1
/

Note, particularly, the use of the “for ({value})” clause that makes it possible to identify the correct partition without knowing its name.

January 19, 2018

Nested MVs

Filed under: 12c,Bugs,Infrastructure,Materialized view,Oracle — Jonathan Lewis @ 2:43 pm BST Jan 19,2018

A recent client was seeing a very large redo penalty from refreshing materialized views. Unfortunately they had to be refreshed very frequently, and were being handled with a complete refresh in atomic mode – which means delete every row from every MV then re-insert every row.  The total redo was running at about 5GB per hour, which wasn’t a problem for throughput, but the space for handling backup and recovery was getting a bit extreme.

The requirement consisted of two MVs which extracted and aggregated row and column subsets in two different ways from a single table; then two MVs that aggregated one of the first MVs in two different ways; then two MVs which each joined one of the first level MVs to one of the second level MVs.

No problem – join MVs are legal, aggregate MVs are legal, “nested” MVs are legal: all you have to do is create the right MV logs and pick the right refresh command.  Since the client was also running Standard Edition (SE2) there was no need to worry about how to ensure that query rewrite would work (the feature is not implemented in SE).

So here, simplified and camouflaged, is a minimum subset of just the first few stages of the construction: a base table with MV log, one first-level aggregate MV with its own MV log, and two aggregate MVs based on the first MV.

drop materialized view log on req_line;
drop materialized view log on jpl_req_group_numlines;

drop materialized view jpl_req_group_numlines;
drop materialized view jpl_req_numsel;
drop materialized view jpl_req_basis;

drop table req_line;

-- ----------
-- Base Table
-- ----------

create table req_line(
        eventid         number(10,0),
        selected        number(10,0),
        req             number(10,0),
        basis           number(10,0),
        lnid            number(10,0),
        area            varchar2(10),
        excess          number(10,0),
        available       number(10,0),
        kk_id           number(10,0),
        eventdate       number(10,0),
        rs_id           number(10,0)
)
;

-- --------------------
-- MV log on base table
-- --------------------

create materialized view log 
on
req_line
with rowid(
        req, basis, lnid, eventid, selected, area,
        excess, available, kk_id, eventdate, rs_id
)
including new values
;

-- --------------------
-- Level 1 aggregate MV
-- --------------------

create materialized view jpl_req_group_numlines(
        eventid, selected, 
        row_ct, req_ct, basis_ct, req, basis, 
        maxlnid, excess, numsel, area, available, kk_id, 
        rs_id, eventdate
)
segment creation immediate
build immediate
refresh fast on demand 
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(lnid)       maxlnid,
        excess,
        count(selected) numsel,
        area,
        available,
        kk_id,
        rs_id,
        eventdate
from 
        req_line
group by 
        eventid, selected, area, excess,
        available, kk_id, eventdate, rs_id
;

-- ------------------------
-- MV log on first level MV
-- ------------------------

create materialized view log 
on
jpl_req_group_numlines
with rowid 
(
        eventid, area, selected, available,
        basis, req, maxlnid, numsel
)
including new values
;


-- ----------------------------
-- First "level 2" aggregate MV
-- ----------------------------

create materialized view jpl_req_numsel(
        eventid, selected, 
        row_ct, totalreq_ct, totalbasis_ct, totalreq, totalbasis, 
        maxlnid, numsel_ct, numsel, area
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel,
        area
from 
        jpl_req_group_numlines
group by 
        eventid, selected, area
;


-- -----------------------------
-- Second "level 2" aggregate MV
-- -----------------------------

create materialized view jpl_req_basis(
        eventid, 
        row_ct, totalbasis_ct, totalreq_ct, totalbasis, totalreq, 
        area, selected, available, maxlnid ,
        numsel_ct, numsel
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        count(*)        row_ct,
        count(basis)    totalbasis_ct,
        count(req)      totalreq_ct,
        sum(basis)      totalbasis,
        sum(req)        totalreq,
        area,
        selected,
        available,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel
from
        jpl_req_group_numlines
group by 
        eventid, area, available, selected
;

Once the table, MV logs and MVs exist we can insert some data into the base table, then try refreshing the views. I have tried three different calls from the dbms_mview package – refresh_all_mviews(), refresh_dependent(), and refresh() – specifying the ‘F’ (fast) refresh method, atomic refresh, and nested refresh. All three fail in the same way on 12.2.0.1. The code below shows only the refresh_dependent() call.

I’ve included a query to report the current state of the materialized views before and after the calls, and set a two second sleep before the refresh so that changes in “last refresh” time will appear. The final queries are just to check that the expected volume of data has been transferred to the materialized views.


-- ------------------------------------
-- Insert some data into the base table
-- ------------------------------------

begin
        for i in 1..100 loop
                execute immediate 'insert into req_line values( :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx)' 
                using i,i,i,i,i,i,i,i,i,i,i;
                commit;
        end loop;
end;
/

set linesize 144
column mview_name format a40

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

prompt  Waiting for 2 seconds to allow refresh time to change

execute dbms_lock.sleep(2)

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'F',
                nested                  => true,
                atomic_refresh          => true
        );

        dbms_output.put_line('Failures: ' || m_fail_ct);
end;
/

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

-- --------------------------------
-- Should be 100 rows in each table
-- --------------------------------

select count(*) from jpl_req_basis;
select count(*) from jpl_req_group_numlines;
select count(*) from jpl_req_numsel;

Both of the earlier versions of Oracle are happy with this code and refresh all three materialized views without fail. Oracle 12.2.0.1 crashes the procedure call with a deadlock error which, when traced, shows itself to be a self-deadlock while attempting to select a data dictionary row for update:


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_BASIS                            FRESH               VALID               COMPLETE 19-jan 14:03:01
JPL_REQ_GROUP_NUMLINES                   NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_NUMSEL                           FRESH               VALID               COMPLETE 19-jan 14:03:01

3 rows selected.

Waiting for 2 seconds to allow refresh time to change

PL/SQL procedure successfully completed.

declare
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2952
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 85
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 245
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 1243
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2414
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2908
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3699
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3723
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 75
ORA-06512: at line 4


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_NUMSEL                           NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_BASIS                            FRESH               VALID               FAST     19-jan 14:03:04
JPL_REQ_GROUP_NUMLINES                   FRESH               VALID               FAST     19-jan 14:03:04

The deadlock graph from the trace file, with a little extra surrounding information, looks like this:


Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00020009-00000C78-A9B090F8-00000000         26      14     X        40306      26      14           X  40306


*** 2018-01-19T14:18:03.925859+00:00 (ORCL(3))
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2vnzfjzg6px33) -----
select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400,  flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq  from sys.mlog$ where mowner = :1 and master = :2 for update
----- PL/SQL Stack -----

So far I haven’t been able to spot whether or not I’m doing something wrong, or prohibited, and I haven’t been able to find a matching problem on MoS. Since the code works on 11gR2 and 12cR1 I’m inclined to believe it’s a bug introduced in the 12cR2 timeline – which is a nuisance for my client, but if it is a bug then perhaps a fix will appear fairly promptly.

January 9, 2018

ASSM argh!

Filed under: 12c,ASSM,Bugs,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 5:53 pm BST Jan 9,2018

Here’s a problem with ASSM that used to exist in older versions of Oracle, had disappeared by 11.2.0.4, then re-appeared in 12.1.0.2 – and had disappeared again by 12.2.0.1. It showed up on MoS a few days ago under the heading: “Insert is running long with more waits on db file sequential read”.

The obvious response to this heading is to question the number of indexes on the table – because big tables with lots of indexes tend to give you lots of random I/O as Oracle maintains the indexes – but this table had no indexes. The owner of the problem supplied several bits of information in the initial post, with further material in response to follow-up questions, including the tkprof summary of the 10046/level 12 trace of the insert and two extracts from the trace file showing some of the “db file sequential read” waits – the first extract made me wonder if there might be some issue involving 16KB blocks but the second one dispelled that illusion.

There are several buggy things that can appear with ASSM and large-scale DML operations, and sometimes the problems can appear long after the original operation did the dirty deed, so I thought I’d create a simple model based on the information supplied to date – and discovered what the problem (probably) was. Here’s how it starts – I’ve created a tablespace using ASSM, and in this tablespace I’ve created a table which has 48 columns with a row length of 290 bytes (roughly matching the OP’s table), and I’ve hacked out a simple PL/SQL block that loops around inserting arrays of 100 rows at a time into the table for a total of 1M rows before committing.


rem
rem     Script:         assm_cleanout.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2        Lots of blocks left "not full"
rem             11.2.0.4
rem

rem
rem     using OMF, so no file-name needed
rem     Ran this bit as SYS, and altered test user to have unlimited quota
rem

/*
create
        tablespace test_8k_assm
        datafile size 1G
        extent management local
        autoallocate
        segment space management auto
;
*/

rem
rem     Create the table, pre-allocate some space.
rem     This means we should get consistent 8M extents and not initial little ones
rem

create table t1 (
        v001 varchar2(5), v002 varchar2(5), v003 varchar2(5), v004 varchar2(5), v005 varchar2(5),
        v006 varchar2(5), v007 varchar2(5), v008 varchar2(5), v009 varchar2(5), v010 varchar2(5),
        v011 varchar2(5), v012 varchar2(5), v013 varchar2(5), v014 varchar2(5), v015 varchar2(5),
        v016 varchar2(5), v017 varchar2(5), v018 varchar2(5), v019 varchar2(5), v020 varchar2(5),
        v021 varchar2(5), v022 varchar2(5), v023 varchar2(5), v024 varchar2(5), v025 varchar2(5),
        v026 varchar2(5), v027 varchar2(5), v028 varchar2(5), v029 varchar2(5), v030 varchar2(5),
        v031 varchar2(5), v032 varchar2(5), v033 varchar2(5), v034 varchar2(5), v035 varchar2(5),
        v036 varchar2(5), v037 varchar2(5), v038 varchar2(5), v039 varchar2(5), v040 varchar2(5),
        v041 varchar2(5), v042 varchar2(5), v043 varchar2(5), v044 varchar2(5), v045 varchar2(5),
        v046 varchar2(5), v047 varchar2(5), v048 varchar2(5)
)
segment creation immediate
tablespace test_8k_assm
storage(initial 8M)
;

alter table t1 allocate extent (size 8M);
alter table t1 allocate extent (size 8M);

rem
rem     Simple anonymous pl/sql block
rem     Large insert, handled with array inserts
rem     Can modify loop count and array size very easily
rem

declare
        type tab_array is table of t1%rowtype;
        junk_array tab_array;
begin

        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        bulk collect into
                junk_array
        from
                all_objects
        where
                rownum  <= 100 -- > comment to avoid WordPress format issue
        ;

        for i in 1..10000 loop
                forall j in 1..junk_array.count
                        insert into t1 values junk_array(j) ;
        end loop;

end;
/

commit;
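
Before looking at the segment space management information, a quick (if brutal, at 1M rows) way to confirm the rows per block is a dbms_rowid hack – a sketch:


select  rows_per_block, count(*) blocks
from    (
        select
                dbms_rowid.rowid_relative_fno(rowid)    fno,
                dbms_rowid.rowid_block_number(rowid)    blk,
                count(*)                                rows_per_block
        from    t1
        group by
                dbms_rowid.rowid_relative_fno(rowid),
                dbms_rowid.rowid_block_number(rowid)
        )
group by
        rows_per_block
order by
        rows_per_block
;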

The number of rows per block after this insert is 24, with 1038 bytes free space left (808 due to the pctfree = 10, then the bit that was too small to take a 25th row before breaching the pctfree barrier). This means we should report 1M/24 = 41,666 full blocks and one block with some free space. So we query the table using the dbms_space package:


declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.new_line;
        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));

end;
/

The results aren’t what we expect:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       35,001 /      286,728,192
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :        6,665 /       54,599,680

We have one block marked as 25 – 50% free (that’s the one block with 16 rows in it, which means about 40% space currently free) but our 41,666 full blocks are actually reported as 6,665 full blocks and 35,001 blocks with some space available. That’s going to hurt eventually if some process wants to insert more rows and finds that it has to fail its way through 35,001 blocks before finding a block which has enough free space.

So what happens when I repeat the PL/SQL block (and commit)? Here are the results from calls to dbms_space after the next two cycles:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       70,002 /      573,456,384
Freespace 2 ( 25 -  50% free) :            2 /           16,384
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          220 /        1,802,240
Full                          :       13,330 /      109,199,360

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :      105,003 /      860,184,576
Freespace 2 ( 25 -  50% free) :            3 /           24,576
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          117 /          958,464
Full                          :       19,995 /      163,799,040

Every time we execute the PL/SQL block we leave a trail of 35,001 more blocks which are flagged as “not quite full”.

Looking at the session stats while running the insert loop I can tell that Oracle isn’t checking to see whether or not it should be using those blocks. (A quick way of proving this is to flush the buffer cache before each execution of the PL/SQL and note that Oracle doesn’t read back the 105,000 blocks before inserting any data). So somehow, sometime, someone might get a nasty surprise – and here’s one way that it might happen:

Since I know my data fits 24 rows per block I’m going to modify my PL/SQL block to select one row into the array then loop round the insert 25 times – so I know I’m inserting a little bit more than one block’s worth of data. Starting from the state with 105,003 blocks marked as “Freespace 1” this is what I saw – first, the free space report after inserting 25 rows:


Unformatted                   :          240 /        1,966,080
Freespace 1 (  0 -  25% free) :        1,074 /        8,798,208
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :      123,927 /    1,015,209,984

Then a few wait events and session statistics for the insert:


---------------------------------------------------------
SID:    39:TEST_USER - jonathan
Session Events - 09-Jan 16:57:18
Interval:-      6 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
db file sequential read                          15,308           0         128        .008           3
db file scattered read                           20,086           0         271        .014           4

---------------------------------
Session stats - 09-Jan 16:57:18
Interval:-  6 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
session logical reads                                                  269,537
physical read total IO requests                                         35,401
db block gets                                                          229,522
consistent gets                                                         40,015
physical reads                                                         124,687
physical reads cache                                                   124,687
db block changes                                                       208,489
physical reads cache prefetch                                           89,293
redo entries                                                           207,892
redo size                                                           16,262,724
undo change vector size                                                  1,720
deferred (CURRENT) block cleanout applications                         103,932
table scan blocks gotten                                                20,797
HSC Heap Segment Block Changes                                              25

The session has read and updated almost all of the level 1 bitmap blocks. I don’t know exactly what triggered this re-read, but it seems to be related to the number of rows inserted (or, perhaps, the amount of space used rather than the row count) as an insert crosses the pctfree boundary and fails over to the next block. I’ve only done a couple of little tests to try and get a better idea of why an insert sometimes sweeps through the bitmap blocks – so I know that inserting 2 or 3 rows at a time will also trigger the cleanout – but there are probably several little details involved that need to be identified.

You might note a couple of details in the stats:

  • Because I had flushed the buffer cache before the insert Oracle did its “cache warmup” tablescanning trick – if this had not happened I would probably have done a single block read for every single bitmap block I touched.
  • There are 103,932 block cleanout applications – but 208,000 db block changes and redo entries. Roughly half the latter are for data block cleanouts (OP code 4.1) and half are the state changes on the level 1 bitmap blocks (OP code 13.22). You’ll notice that neither change produces any undo.
  • I’ve also included the HSC Heap Segment Block Changes statistics to show you that not all changes to Heap Segment Blocks show up where you might expect them.

And finally:

If you re-run the tests on 11.2.0.4 and 12.2.0.1 you get the following results after the initial script run – the problem doesn’t exist:


11.2.0.4
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

12.2.0.1
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

So how does this help the OP ?

  • First, there may be a huge mess still waiting to be cleaned in the table – though at 34M blocks I’m not too keen to suggest running the dbms_space routine to find out what it looks like; maybe, though, that’s necessary.
  • Secondly – an upgrade to 12.2 will probably avoid the problem in future.
  • Thirdly – if the number of rows per block is very close to uniform, write a little code to do a loop that inserts (say) 2 * the expected number of rows per block as single row inserts and then rolls back; the inserts will probably trigger a massive, though perhaps not complete, cleanout, so rinse and repeat until the cleanout is complete – see the sketch after this list. Try to find a time when you don’t mind the extra load to get this job done.
  • Finally – on the big job that does the bulk insert – repeat the dummy insert/rollback at the end of the job to clean up the mess made by the job.
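
For the third and fourth options, the sort of loop I have in mind is this – a sketch against the t1 of my model, where the 48 is simply twice the 24 rows per block:


begin
        for i in 1..48 loop
                insert into t1(v001) values ('xxxxx');
        end loop;
        rollback;
end;
/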

Addenda

Prompted by comment #2 below, I should add that if the problem has been fixed in 12.2 then possibly there’s a bug report and patch for it already. If there isn’t then the OP could raise an SR (referencing this blog note), and request a bug fix or back-port from 12.2.

And within 24 hours of publication, comment #4 (from Yury Pudovchenko) tells us that the bug is fixed by the Oct 2017 Bundle Patch.

August 14, 2017

Join Elimination Bug

Filed under: Bugs,CBO,Oracle — Jonathan Lewis @ 11:59 am BST Aug 14,2017

A few years ago a bug relating to join elimination showed up in a comment to a post I’d done about the need to keep on testing and learning. The bug was visible in version 11.2.0.2 and, with a script to replay it, I’d found that it had disappeared by 11.2.0.4.

Today I had a reason to rediscover the script, and decided to test it against 12.2.0.1 – and found that the bug was still present (though it seemed to have been fixed – temporarily – in 12.1.0.2).

Here’s the model:


rem     Script:         join_eliminate_bug_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2012

drop table child purge;
drop table parent purge;

create table parent (
        id      number(4),
        name    varchar2(10),
        constraint par_pk primary key (id)
        deferrable initially immediate
)
;

create table child(
        id_p    number(4)       
                constraint chi_fk_par
                references parent,
        id      number(4),
        name    varchar2(10),
        constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

insert into child values(2,1,'Jack');
insert into child values(2,2,'Jill');

commit;

begin
        dbms_stats.gather_table_stats(user,'child');
        dbms_stats.gather_table_stats(user,'parent');
end;
/

set serveroutput off

select
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);

The setup is just to show you the correct results with join elimination taking place. Here’s the output from the query and the actual execution plan:

      ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          2 Sally
         2          1 Jack
         2          2 Jill

4 rows selected.


PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  1whubydgj8w0s, child number 0
-------------------------------------
select  chi.* from  child chi,  parent par where  par.id = chi.id_p

Plan hash value: 2406669797

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |    11 |
|   1 |  TABLE ACCESS FULL| CHILD |     4 |    48 |    11 |
-----------------------------------------------------------

On a single column join, with referential integrity in place, and no columns other than the primary key involved, the optimizer eliminates table parent from the query. But if I now defer the primary key constraint on parent and duplicate every row (which ought to duplicate the query result), watch what happens with the query:


set constraint par_pk deferred;

insert into parent (id,name) values (1,'Smith');
insert into parent (id,name) values (2,'Jones');

alter system flush shared_pool;

select
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);


      ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          2 Sally
         2          1 Jack
         2          2 Jill

4 rows selected.


PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  1whubydgj8w0s, child number 0
-------------------------------------
select  chi.* from  child chi,  parent par where  par.id = chi.id_p

Plan hash value: 2406669797

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |    11 |
|   1 |  TABLE ACCESS FULL| CHILD |     4 |    48 |    11 |
-----------------------------------------------------------

I get the same plan, so I get the same results – and notice that I flushed the shared pool before repeating the query so I haven’t fooled Oracle into reusing the wrong plan by accident – it’s a whole new freshly optimized plan.

Just to show what ought to happen here’s the last bit of the test case:


select  /*+ no_eliminate_join(@sel$1 par@sel$1) */
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);


      ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          2 Sally
         1          1 Simon
         1          2 Sally
         2          1 Jack
         2          2 Jill
         2          1 Jack
         2          2 Jill

8 rows selected.


PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  5p8sp7k8b0fgq, child number 0
-------------------------------------
select /*+ no_eliminate_join(@sel$1 par@sel$1) */  chi.* from  child
chi,  parent par where  par.id = chi.id_p

Plan hash value: 65982890

-----------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |       |       |     5 |
|   1 |  NESTED LOOPS                |        |     4 |    60 |     5 |
|   2 |   NESTED LOOPS               |        |     4 |    60 |     5 |
|   3 |    INDEX FULL SCAN           | PAR_PK |     2 |     6 |     1 |
|*  4 |    INDEX RANGE SCAN          | CHI_PK |     2 |       |     1 |
|   5 |   TABLE ACCESS BY INDEX ROWID| CHILD  |     2 |    24 |     2 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PAR"."ID"="CHI"."ID_P")


I ran this test on 11.2.0.4 – and then repeated it on 12.2.0.1: the bug is still present (although I thought I’d seen a MoS note saying it had been fixed in 12.1).

It’s always a little dangerous playing around with deferrable constraints – my view is that you should keep the interval of deferment as short as possible and don’t try to use it for doing anything other than correcting known data errors. At present if you have code that defers constraints and runs non-trivial queries afterwards you might want that code to start with an “alter session” to set the hidden parameter _optimizer_join_elimination_enabled to false (after checking with Oracle support, of course).
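
That is – with the usual warning that hidden parameters shouldn’t be set without confirmation from Oracle:


alter session set "_optimizer_join_elimination_enabled" = false;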

Update (May 2019)

I’ve just rediscovered this note and find myself surprised by all the “it’s working for me” comments – especially since I’m still getting inconsistent results (i.e. the results change) when I disable join-elimination all the way through to 19.2.  Maybe some of the comments are really trying to say: “I see the same thing”. If you want to see the effects there is a version of the script on LiveSQL which can only be reached by private link.


March 23, 2017

min/max Upgrade

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle,Troubleshooting — Jonathan Lewis @ 8:53 am BST Mar 23,2017

A question came up on the OTN database forum a little while ago about a very simple query that was taking different execution paths on two databases with the same table and index definitions and similar data. In one database the plan used the “index full scan (min/max)” operation while the other database used a brute force “index fast full scan” operation.

In most circumstances the starting point to address a question like this is to check whether some configuration details, or some statistics, or the values used in the query are sufficiently different to result in a significant change in costs; and the first simple procedure you can follow is to hint each database to use the plan from the opposite database to see if this produces any clues about the difference – it’s a good idea when doing this test to use one of the more verbose formatting options for the call to dbms_xplan.
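
As a sketch of what I mean – the table and index names here are made up, and “advanced” is just one of the verbose format options:


select  /*+ index_ffs(t1 t1_i1) */
        min(col1)
from    t1
where   col2 = :b1
;

select * from table(dbms_xplan.display_cursor(null, null, 'advanced'));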

In this case, though, the OP discovered a note on MoS reporting exactly the problem he was seeing:

Doc ID 2144428.1: Optimizer Picking Wrong ‘INDEX FAST FULL SCAN’ Plan vs Correct ‘INDEX FULL SCAN (MIN/MAX)’

which referred to

Bug 22662807: OPTIMIZER PICKING INDEX FFS CAN INSTEAD OF MIN/MAX

Conveniently the document suggested a few workarounds:

  • alter session set optimizer_features_enable = ‘11.2.0.3’;
  • alter session set “_fix_control” = ‘13430622:off’;
  • delete object stats [Ed: so that dynamic sampling takes place … maybe a /*+ dynamic_sampling(alias level) */ hint would suffice].

Of the three options my preference would (at least in the short term) be the _fix_control one. Specifically, from the v$system_fix_control view, we can see that it addresses the problem very precisely with the description: “index min/max cardinality estimate fix for filter predicates”.
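
Checking the description and current value, then disabling the fix for the session, looks like this:


select  bugno, value, description
from    v$system_fix_control
where   bugno = 13430622
;

alter session set "_fix_control" = '13430622:off';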

The example in the bug note showed a very simple statement (even more simple than the OP’s query which was only a single table query anyway), so I thought I’d build a model and run a few tests to see what was going on. Luckily, before I’d started work, one of the other members of the Oak Table network sent an email to the list asking if anyone knew how the optimizer was costing an example he’d constructed – and I’ve finally got around to looking at his example, and here’s the model and answer(s), starting with the data set:


rem
rem     Script:         test_min_max.sql
rem     Dated:          March 2017
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.2.0.3
rem

create table min_max_test nologging
as
with ids as (
        select /*+ Materialize */ rownum  id from dual connect by rownum <= 50000 -- > comment to protect formatting
),
line_nrs as (
        select /*+ Materialize */  rownum line_nr from dual connect by rownum <= 20 -- > comment to protect formatting
)
select
        id, line_nr ,rpad(' ', 800, '*') data
from
        line_nrs, ids
order by
        line_nr, id
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'min_max_test',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index mmt_ln_id on min_max_test (line_nr, id) nologging;
create index mmt_id    on min_max_test (id)          nologging;

The table has two critical columns: each id has 20 line_nr values associated with it, but the way the data was generated means that the line numbers for a given id are scattered across 20 separate table blocks.

There are two indexes – one on the id which will allow us to find all the rows for a given id as efficiently as possible, and one (slightly odd-looking in this context) that would allow us to find a specific row for a given line_nr and id very efficiently. Two things about these indexes – in a live application they should both be compressed on the first (only, in the case of index mmt_id) column, and secondly the necessity of the mmt_id index is questionable and it might be an index you could drop if you reversed the order of the columns in mmt_ln_id. The thing about these indexes, though, is that they allow us to demonstrate a problem. So let’s query the data – twice, hinting each index in turn:


variable b1 number;
exec :b1 := 50000;

set serveroutput off

select
        /*+ index(t(id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

select
        /*+ index(t(line_nr, id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

It’s fairly safe to make a prediction about the execution plan and cost of the first query – it’s likely to be a range scan that accesses a couple of branch blocks, a leaf block and 20 separate table blocks followed by a “sort aggregate” – with a cost of about 23.

It’s a little harder to make a prediction about the second query. The optimizer could infer that the min(line_nr) has to be close to the left hand section of the index, and could note that the number of rows in the table is the same as the product of the number of distinct values of the two separate columns, and it might note that the id column is evenly distributed (no histogram) across the data, so it might “guess” that it need only range scan all the entries for the first line_nr to find the appropriate id. So perhaps the optimizer will use the index min/max range scan with a cost that is roughly 2 branch blocks plus total leaf blocks / 20 (since there are 20 distinct values for line_nr); maybe it would divide the leaf block estimate by two because “on average” – i.e. for repeated random selections of value for id – it would have to scan half the leaf blocks. There were 2,618 leaf blocks in my index, so the cost should be close to either 133 or 68.

Here are the two plans – range scan first, min/max second:


select  /*+ index(t(id)) */  min(line_nr) from  min_max_test t where id = :b1
-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       |    23 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MIN_MAX_TEST |    20 |   160 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | MMT_ID       |    20 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"=:B1)

select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |    22 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |    22   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Spot on with the estimate for the simple range scan – but what did we do wrong with the estimate for the min/max scan ? You might notice in the first example the “table access by rowid batched” and realise that this is running on 12c. Here’s the plan I get if I set the optimizer_features_enable back to 11.2.0.3 before running the second query again:


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   136 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |   136   (1)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |   136   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Using the 11.2.0.3 optimizer model the plan has a cost that’s very close to our prediction – we’ll see why there’s a slight difference in a moment. If we set the optimizer_features_enable to 11.2.0.4 the cost drops back to 22. So for our example 11.2.0.3 will use the simple “index range scan” and an upgrade to 11.2.0.4 (or higher) will switch to the “index full scan (min/max)”. If you look at the OTN posting the impact of the change in costing is exactly the other way around – 11.2.0.3 uses the min/max path, 11.2.0.4 uses the simple index range scan.

The techy bit

You really don’t need to know this – experimenting with the optimizer_features_enable (or _fix_control) will give you plans that show you all the numbers you need to see to check whether or not you’ve run into this particular problem – but if you’re interested here’s a little bit from the two 10053 trace files. We need only look at a few critical lines. From the 11.2.0.3 costing for the min/max scan:


Index Stats::
  Index: MMT_ID  Col#: 1
  LVLS: 2  #LB: 2202  #DK: 50000  LB/K: 1.00  DB/K: 20.00  CLUF: 1000000.00  NRW: 1000000.00
  Index: MMT_LN_ID  Col#: 2 1
  LVLS: 2  #LB: 2618  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 125000.00  NRW: 1000000.00

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 135.000000  resc_cpu: 961594
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 135.697679  Resp: 135.697679  Degree: 1

I was running 12.1.0.2 so there were a few extra bits and pieces that I’ve deleted (mostly about SQL Plan Directives and in-memory). Critically we can see that the stats collection has a small error for the ID column – 50,536 distinct values (NDV) instead of exactly 50,000. This seems to have given us a cost for the expected index range of: 2 (blevel) + ceiling(2618 (leaf blocks) * 50536 / 1000000) = 2 + ceil(132.3) = 135, to which we add a bit for the CPU and get to 136. (Q.E.D.)

Then we switch to costing for 11.2.0.4:


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 21.787874  resc_cpu: 156872
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 22.324608  Resp: 22.324608  Degree: 1

We still have the small error in the number of distinct values for id, so the estimated number of rows that we need to access from the table for a given id (before “aggregating” to find its minimum line_nr) is 19.787874 (Computed: / Non Adjusted:) rather than exactly 20. Notice, then, that the cost of using the index is 19.787874 + 2 which looks suspiciously like adding the blevel to the number of table blocks to get a cost and forgetting that we might have to kiss a lot of frogs before we find the prince. Basically, in this example at least, it looks like the costing algorithm has NOTHING to do with the mechanics of what actually has to happen at run-time.

Footnote

This is only an initial probe into what’s going on with the min/max scan; there are plenty more patterns of data that would need to be tested before we could have any confidence that we had produced a generic model of how the optimizer does its calculations – the only thing to note so far is that there IS a big change as you move from 11.2.0.3 to later versions: the case on OTN showed the min/max scan disappearing on the upgrade, the example above shows the min/max disappearing on the downgrade – either change could be bad news for parts of a production system.

There are a couple of related bugs that might also be worth reviewing.

  • Bug 11834402 : CBO CHOOSES A SLOW INDEX FULL SCAN OVER A MUCH FASTER INDEX RANGE SCAN
  • Bug 13430622 : INDEX SCAN IN VERY SLOW FOR ONE PREDICATE AND FAST FOR OTHERS

There is a note, though, that this last bug was fixed in 12.1.

Footnote 2

When experimenting, one idea to pursue as the models get more complex and you’re using indexes with more than two columns is to test whether the presence of carefully chosen column group statistics might make a difference to the optimizer’s estimates of cardinality (hence cost) of the min/max scan.
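
For reference, creating and gathering a column group on the two columns of my example would look something like this – a sketch; whether it changes the min/max costing is exactly the thing you’d be testing:


select
        dbms_stats.create_extended_stats(user, 'min_max_test', '(line_nr, id)')
from    dual
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'min_max_test',
                method_opt       => 'for all columns size 1 for columns (line_nr, id) size 1'
        );
end;
/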

January 10, 2017

Join Elimination 12.2

Filed under: 12c,Bugs,Execution plans,Oracle — Jonathan Lewis @ 1:03 pm BST Jan 10,2017

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases where a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. used only when all other factors are equal) is dictated by the order the tables appear in the from clause. So, if you have enough tables in the from clause, it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

As you will see later on the three tables grandparent, parent, child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so the new “multi-column primary key” join elimination has clearly worked – but grandparent, with its single-column primary key, has survived: having eliminated parent before considering grandparent the optimizer no longer has a declared foreign key from child to grandparent to justify eliminating grandparent as well. On the other hand, I’ve not followed my usual rules for writing SQL so let’s try again. If I follow the pattern I usually follow, my from clause would have been in the order child -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this visual change to the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.
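
If you meet this problem in a production system and can’t rewrite the query, one thing you could experiment with is the eliminate_join() hint – it’s the hint that appears in the outline data when join elimination takes place. The following is just a sketch of the idea (using the aliases from the query above), not something I’m claiming as a tested workaround:

select
	/*+ eliminate_join(g) eliminate_join(p) */
	count(c.small_vc_c)
from
	grandparent	g,
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;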

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:


rem
rem	Script:		join_elimination_12c2.sql
rem	Author:		Jonathan Lewis
rem	

-- Environment details eliminated

define m_pad=100

/*
	IDs will be 1 to 1000
*/

create table grandparent 
as
select 
	rownum			id,
	trunc((rownum-1)/5)	small_num_g,
	rpad(rownum,10)		small_vc_g,
	rpad(rownum,&m_pad)	padding_g
from 
	all_objects 
where 
	rownum <= 1000
;

/*
	Each GP has two (scattered) children here
	Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
	1+mod(rownum,1000)	id_g,
	rownum			id,
	trunc((rownum-1)/5)	small_num_p,
	rpad(rownum,10)		small_vc_p,
	rpad(rownum,&m_pad)	padding_p
from 
	all_objects 
where 
	rownum <= 2000
;

/*
	Simple trick to get 5 (clustered) children per parent
	Child IDs are 1 to 10,000
*/

create table child 
as
select 
	id_g,
	id			id_p,
	rownum			id,
	trunc((rownum-1)/5)	small_num_c,
	rpad(rownum,10)		small_vc_c,
	rpad(rownum,&m_pad)	padding_c
from 
	parent	p,
	(
		select /*+ no_merge */ 
			rownum 
		from	parent p 
		where	rownum <= 5
	)	d
;

create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;
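
rem
rem	The primary key constraints below will adopt these pre-created
rem	(compressed) unique indexes rather than building new ones.
rem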

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem	Don't need to collect stats because it's 12c
rem

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered gp -> p -> c
prompt	The final plan is GP->C. The optimizer eliminated P before
prompt	considering GP
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered c -> p -> gp
prompt	The final plan is a tablescan of C only. The optimizer managed 
prompt	to eliminate GP first and P second
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	child		c,
	parent		p,
	grandparent	g 
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	==================================================
prompt	Convert to ANSI standard in the order gp -> p -> c
prompt	and both gp and p eliminated.
prompt	==================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g
join
	parent		p
on	p.id_g = g.id
join
	child		c
on	c.id_p = p.id
and	c.id_g = p.id_g
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===================================================
prompt	Convert to ANSI standard in the order c -> p -> gp
prompt	and only p is eliminated. 
prompt	===================================================

select 
	count(c.small_vc_c)
from 
	child		c
join
	parent		p
on      p.id   = c.id_p 
and	p.id_g = c.id_g 
join
	grandparent	g
on	g.id = p.id_g 
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

It’s possible, of course, that with different system stats, I/O calibration, extent sizes, segment space management, block sizes, or sundry other parameter details you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.

Footnote:

If you’re wondering why the “traditional” and “ANSI” syntax should exhibit this flaw in opposite directions, remember that ANSI SQL is first transformed into an equivalent Oracle form: in the simple cases the first two tables form the first query block and each table after that introduces a new query block. So the optimizer does (approximately) the following translation:


select ... from grandparent join parent join child

==>

select ... from (select ... from grandparent join parent) join child

The optimizer optimizes the inline view first and eliminates grandparent, leaving a join between parent and child – which then allows parent to be eliminated as well.

Conversely we get:

select ... from child join parent join grandparent

==>

select ... from (select ... from child join parent) join grandparent

In this form the optimizer eliminates parent from the inline view and is left with a join between child and grandparent – so no further elimination.
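
If you want to check this transformation for yourself rather than take my word for it, 12c lets you see the expanded text through dbms_utility.expand_sql_text. A minimal sketch – the output comes back as one long line full of system-generated query block names, so be prepared to reformat it by hand:

set serveroutput on

declare
	m_sql	clob;
begin
	dbms_utility.expand_sql_text(
		input_sql_text	=> q'[
			select count(c.small_vc_c)
			from
				grandparent g
			join
				parent p
			on	p.id_g = g.id
			join
				child c
			on	c.id_p = p.id
			and	c.id_g = p.id_g
			where
				c.small_num_c between 200 and 215
		]',
		output_sql_text	=> m_sql
	);
	dbms_output.put_line(m_sql);
end;
/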
