Oracle Scratchpad

January 30, 2023

Lost Or-Expand

Filed under: 12c,19c,CBO,Oracle,Transformations,Upgrades — Jonathan Lewis @ 1:39 pm GMT Jan 30,2023

I’ve commented previously on the “new” cost-based Or-Expansion introduced in 12c to replace the “legacy” Concatenation transformation, and I’ve been re-running some of my concatenation scripts to see whether the most recent versions of the optimizer will use Or-expansion unhinted in places where I’ve previously had to use hints to force concatenation to appear.

The latest test has produced a surprising result – I’ve got an example where 19c and 21c will use concatenation when hinted with use_concat(), but will not obey the or_expand() hint on the grounds that there’s “No valid predicate for OR expansion”

It’s worth knowing this could happen if you’re upgrading from 11g to 19c (as many people seem to be doing at present) as you may find that you have some statements that used to use concatenation unhinted, but now need to be hinted to do so as they can’t switch to or-expansion and won’t use concatenation unless hinted to do so.

tl;dr (the rest of the note is just a demonstration.) When you upgrade from 11g to 19c (or later) you may find that some queries perform badly because they stop using the legacy “concatenation” operator but can’t be transformed by the new “cost-based Or Expand” operator, and need to be hinted with a use_concat() hint.

Here’s a statement I can use to demonstrate the effect – I’ll post the code to create the tables at the end of the note:

select  /*+ gather_plan_statistics */
        n1, n2, small_vc
from
        t1
where
        (n1 = 1 and n2 = 10000)
or      (n1 = 10000 and n2 = 1)
;

I’ve rigged the data so that there are 9,999 distinct values of n1 each with one row, and 10,001 rows with the value 10,000; and I’ve done the same with n2 – 9,999 distinct values with one row each and 10,001 rows with the value 10,000.

I’ve gathered stats that include histograms on n1 and n2 (separately) and I’ve created indexes on n1 and n2 (separately). As a result the ideal path for this query is to use the index on n1 to find rows for the first of the two compound predicates and use the index on n2 to find rows for the second of the predicates, which should be possible if the optimizer first transforms the query using OR-expansion.
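In effect I’m hoping the optimizer will transform the query into a UNION ALL along the following lines – this is only a hand-written sketch of the idea, with lnnvl() predicates added to avoid double-counting (the same pattern appears in the predicate section of the concatenation plan further down):

select  n1, n2, small_vc
from    t1
where   n1 = 10000
and     n2 = 1
union all
select  n1, n2, small_vc
from    t1
where   n1 = 1
and     n2 = 10000
and     (lnnvl(n1 = 10000) or lnnvl(n2 = 1))
;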

You’ll notice I’ve included the hint to capture rowsource execution statistics, so I’ll be executing this query with various hints and reporting the actual execution plans and workload. Using 19.11.0.0 and 21.3.0.0 with no special parameter settings the execution plan that appeared used B-tree/bitmap conversion:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |    45 (100)|      2 |00:00:00.01 |      50 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |    45   (3)|      2 |00:00:00.01 |      50 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |      1 |        |            |      2 |00:00:00.01 |      48 |
|   3 |    BITMAP OR                        |       |      1 |        |            |      1 |00:00:00.01 |      48 |
|   4 |     BITMAP AND                      |       |      1 |        |            |      1 |00:00:00.01 |      24 |
|   5 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|*  6 |       INDEX RANGE SCAN              | T1_N1 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
|   7 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |      22 |
|*  8 |       INDEX RANGE SCAN              | T1_N2 |      1 |        |    21   (0)|  10001 |00:00:00.01 |      22 |
|   9 |     BITMAP AND                      |       |      1 |        |            |      1 |00:00:00.01 |      24 |
|  10 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|* 11 |       INDEX RANGE SCAN              | T1_N2 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
|  12 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |      22 |
|* 13 |       INDEX RANGE SCAN              | T1_N1 |      1 |        |    21   (0)|  10001 |00:00:00.01 |      22 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("N1"=1)
   8 - access("N2"=10000)
  11 - access("N2"=1)
  13 - access("N1"=10000)

This is a fairly clever plan but not what I wanted to test so I set the hidden parameter ‘_b_tree_bitmap_plans’ to false for all subsequent tests.
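One way of doing that is a session-level setting along the following lines (it’s a hidden parameter, so this is strictly a setting for test systems):

alter session set "_b_tree_bitmap_plans" = false;

With this block in place the plan changed to a full tablescan: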

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    99 (100)|      2 |00:00:00.01 |     349 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |      1 |    99   (2)|      2 |00:00:00.01 |     349 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((("N1"=1 AND "N2"=10000) OR ("N1"=10000 AND "N2"=1)))


Definitely not what I wanted – so I added a hint telling the optimizer I wanted to see OR-expansion. The optimizer produced the same full tablescan! Since I had included the format option ‘hint_report’ in my call to dbms_xplan.display_cursor() I can show you the extra lines of output that explained why the optimizer “ignored” my hint:

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         U -  or_expand(@sel$1 (1) (2)) / No valid predicate for OR expansion

As you can see the hint was not “N – unresolved” or “E – Syntax error”. It was recognised, syntactically correct, notionally applicable but unused because the optimizer couldn’t see a way to use it (even though we can see an obvious way to use it).
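For reference, plans (with their rowsource statistics) and the hint report can be pulled from memory with a call along these lines – the exact format string isn’t critical, but it does need to include the hint_report option:

select * from table(dbms_xplan.display_cursor(format => 'allstats last cost hint_report'));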

Idle curiosity then prompted me to try the use_concat() hint, in the form: “use_concat(@sel$1 1)” – here’s the resulting execution plan:

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     4 (100)|      2 |00:00:00.01 |       7 |
|   1 |  CONCATENATION                       |       |      1 |        |            |      2 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN                  | T1_N2 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |
|*  4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|*  5 |    INDEX RANGE SCAN                  | T1_N1 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"=10000)
   3 - access("N2"=1)
   4 - filter(("N2"=10000 AND (LNNVL("N2"=1) OR LNNVL("N1"=10000))))
   5 - access("N1"=1)

Exactly the plan I wanted to see from or_expand(), although the two branches appear in the reverse of the order I would expect from or_expand(). So the new cost-based or-expansion says there’s no valid predicate available for expansion, but the old, deprecated, heuristic concatenation transformation manages to find a disjunct (OR) that can be expanded.

Of course the next thing to do is look at the predicted cost and actual work (mostly buffer gets) that Oracle reported for each plan:

  • bitmap conversion: (cost 45, buffers 50)
  • full tablescan: (cost 99, buffers 349)
  • concatenation: (cost 4, buffers 7)

The predicted costs are actually fairly consistent with buffer gets (which, if I flushed the cache, would also be mostly disk reads). I had been fairly impressed that the optimizer picked bitmap conversion, but it would have been so much better if the optimizer could see that this (slightly complex) set of predicates included an opportunity for or-expansion.

Footnote 1

This query shows an example of disjunctive normal form (DNF), i.e. the where clause is a disjunct (OR) of conjuncts (ANDs). I understand that optimizers (in general) quite like this form, but there is another “nice” form which is CNF (conjunctive normal form), i.e. the where clause is a conjunct (AND) of disjuncts (ORs). So, for entertainment, I rewrote the where clause in conjunctive normal form. You have to be a little careful when you play the “normal form” game, it’s quite easy to get it wrong, so here are the steps I took (using A, B, C, D instead of my 4 atomic predicates):

(A and B) or (C and D) ==
        (A or (C and D)) and (B or (C and D)) ==               -- distributing the (A and B)
        (A or C) and (A or D) and (B or C) and (B or D)        -- distributing the two occurrences of (C and D)

Here’s the resulting query and unhinted execution plan after substituting “n = 1” etc. back into the symbolic presentation (and it probably gives you some idea why I played safe by starting with A, B, C, D):

select  /*+ gather_plan_statistics */
        n1, n2, small_vc
from
        t1
where
        (n1 = 1 or n2 = 1) 
and     (n1 = 1 or n1 = 10000) 
and     (n2 = 10000 or n2 = 1)
and     (n2 = 10000 or n1 = 10000)
;

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                 |      1 |        |     4 (100)|      2 |00:00:00.01 |       7 |
|   1 |  VIEW                                 | VW_ORE_BA8ECEFB |      1 |      2 |     4   (0)|      2 |00:00:00.01 |       7 |
|   2 |   UNION-ALL                           |                 |      1 |        |            |      2 |00:00:00.01 |       7 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1              |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN                  | T1_N1           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |
|*  5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1              |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|*  6 |     INDEX RANGE SCAN                  | T1_N2           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("N2"=10000)
   4 - access("N1"=1)
   5 - filter(("N1"=10000 AND LNNVL("N1"=1)))
   6 - access("N2"=1)

It’s the OR-expansion I wanted to see.

If I can do an algorithmic rewrite that produces the desired plan, the optimizer can be coded to do the rewrite – so I think you can expect to see this limitation removed at some future point. This plan, however, did still depend on my disabling B-tree/bitmap conversion; when I enabled B-tree/bitmap conversion the optimizer used it to produce the following plan:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |     2 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |      1 |        |            |      2 |00:00:00.01 |       4 |
|   3 |    BITMAP OR                        |       |      1 |        |            |      1 |00:00:00.01 |       4 |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|*  5 |      INDEX RANGE SCAN               | T1_N1 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
|   6 |     BITMAP CONVERSION FROM ROWIDS   |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|*  7 |      INDEX RANGE SCAN               | T1_N2 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((INTERNAL_FUNCTION("N1") AND INTERNAL_FUNCTION("N2") AND ("N2"=10000 OR "N1"=10000)))
   5 - access("N1"=1)
   7 - access("N2"=1)

The thing to note in this case, though, is that the B-tree/bitmap conversion is logically the correct thing to choose when you compare the estimated cost and actual workload:

  • or-expansion: (cost 4, buffers 7)
  • bitmap conversion: (cost 2, buffers 6)

Footnote 2

Mohamed Houri wrote an article on Or-expansion a year ago explaining the possible settings for the hidden parameter “_optimizer_cbqt_or_expansion”, which can be set to off, on, linear, greedy or two_pass. I tried all the options to see if that would make any difference (apart from the obvious impact of “off”), but it didn’t.
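For anyone who wants to repeat the experiment, the easiest way to try the values one statement at a time is probably an opt_param() hint – a sketch using the original query (a session-level alter of the hidden parameter would do the same job):

select  /*+
                gather_plan_statistics
                or_expand(@sel$1 (1) (2))
                opt_param('_optimizer_cbqt_or_expansion','greedy')
        */
        n1, n2, small_vc
from
        t1
where
        (n1 = 1 and n2 = 10000)
or      (n1 = 10000 and n2 = 1)
;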

Source code

If you want to do further experiments, here’s the script I used to generate the data:

rem
rem     Script:         concat_3b.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2008 / Jan 2023
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             21.3.0.0
rem

create table t1
as
with generator as (
        select
                rownum  id
        from    dual
        connect by level <= 10000
)
select
        rownum                  n1,
        10000                   n2,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1
;

insert /*+ append */ into t1
select
        n2, n1, small_vc, padding
from
        t1
;

commit;

create index t1_n1 on t1(n1);
create index t1_n2 on t1(n2);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns size 100 n1, n2'
        );
end;
/


February 21, 2022

CTE Enhancement

Filed under: 12c,Oracle,Performance,Transformations — Jonathan Lewis @ 9:24 am GMT Feb 21,2022

For many years I’ve told people that when you materialize a CTE (common table expression / “with” subquery) the result set will be written to the temporary tablespace using direct path writes and will be read back using cached reads. This stopped being an accurate description in 12c.

There is a clue about this in the way the corresponding execution plans are reported, and I’ll be pointing that out later. The key difference between earlier versions of Oracle and newer versions is that the GTT (global temporary table) that holds the materialized result set is not necessarily written to disc, and may even avoid allocating a temporary segment.

I started looking at this feature a couple of days ago after remembering that someone, somewhere, had mentioned some details about a temporary object being kept in the PGA rather than being written to disc if the size wasn’t too big. I couldn’t remember if this was GTTs or temporary LOBs (or something completely different) and I only had a vague memory that there was a critical size that might have been 256KB; so I started experimenting with materializing CTEs.

Here’s the script I used to establish a baseline in 11g. I took a fairly arbitrary starting guess that if there was a PGA/Temp split it might be around 64KB.

rem
rem     Script:         cte_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

alter session set events '10046 trace name context forever, level 8';

prompt  ==============================
prompt  First sample - just under 64KB
prompt  ==============================

set arraysize  35
set pagesize  35

set pause Waiting...
set pause on

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select
        substr(v1,-10)
from
        g1
;

alter session set events '10046 trace name context off';

execute snap_my_stats.end_snap
execute snap_ts.end_snap

prompt  ==============================
prompt  Second sample - just over 64KB
prompt  ==============================

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 64
)
select
        substr(v1,-10)
from
        g1
;

execute snap_my_stats.end_snap
execute snap_ts.end_snap

set pause off

The calls to the package snap_my_stats are the usual ones I use (very old source at this Wayback URL) to report the session’s activity (v$mystat) between start and end snapshot, and the calls in the snap_ts package do the same for the I/O per tablespace, summing by tablespace across v$filestat and v$tempstat.
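If you don’t have the snapshot packages installed, a rough and ready substitute is to run something like the following before and after each query and subtract the two sets of figures by hand:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'physical reads', 'physical reads cache',
                'physical writes', 'physical writes direct',
                'physical writes direct temporary tablespace'
        )
;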

This script basically materializes and reports a GTT with a single column of 1,024 bytes, and in this case either 63 or 64 rows. You’ll note that I’ve built another assumption into the code that the CTE (if kept in memory) won’t follow all the details of a “real” table block, but will be a simple array with a tiny overhead per row – I’ve also assumed that the optimizer isn’t smart enough (or foolhardy enough) to push the substr() call inside the CTE.

I’ve set pause on and set both the arraysize and pagesize to a value less than the total number of rows I’ll be fetching so that I can check a couple of dynamic performance views (in particular v$sort_usage) from another session while the queries are running.
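The check from the second session needs nothing more sophisticated than a query of roughly this shape (v$tempseg_usage is the more modern name for the same information):

select
        username, sql_id, tablespace, contents, segtype, extents, blocks
from
        v$sort_usage
;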

As a starting point, here are some critical values I’ve selected from the various outputs for just the 63 row case when running 11.2.04:

-------------
Session stats
-------------
Name                                                 Value
----                                                 -----
physical reads                                           9
physical reads cache                                     9
physical writes                                          9
physical writes direct                                   9
physical writes direct temporary tablespace              9

---------
T/S Stats 
---------
TS#        Reads      Blocks   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max Tablespace
----       -----      ------   ---------    ---      ------      ------   ---------    --- -------------------
   3           1           9        .000      1           1           9        .000      0 TEMP


Since I've enabled extended tracing at level 8 (waits) I can also show you the I/O waits reported for the reads and writes:
WAIT #140185232180256: nam='direct path write temp' ela= 180 file number=201 first dba=35073 block cnt=9 obj#=-1 tim=1645178184267768
...
WAIT #140185232180256: nam='db file scattered read' ela= 183 file#=201 block#=35073 blocks=9 obj#=-40016369 tim=1645178184268342

A dump of the 9 blocks of the temporary file (the “file number=201” translates to tempfile 1 since I have db_files = 200) starting at block 35073 looks like an ordinary table with pctfree 10 (which is one of the annoying details of GTTs – you can’t adjust it), 3 ITL slots (which is normal for CTAS) and 7 rows per block.

So, for 11g, we conclude that the effect of materializing a CTE is simply to create a GTT in the temporary tablespace, write it out using direct path writes, then read it back into the buffer cache using db file scattered reads. (You might want to confirm that this always happens, even if the CTE holds only one row.)

If you take advantage of the pause to issue “alter system flush buffer_cache” from another session you can also dump the segment header block (35072 in my case) to see that it’s a normal table segment header block – using freelist management, not ASSM because that’s the way temporary tablespaces have to be declared. The segment header block didn’t get written to disc in the normal course of the test.

12c Enhancement

This is the moment where the second query, and the pause that allows me to query v$sort_usage, becomes significant. When I started 12.2.0.1 with the 63 row query I saw:

  • No I/O on the temporary tablespace
  • No entry in v$sort_usage

To my great satisfaction the 64 row query did report I/O to the temporary tablespace (10 blocks this time – needing one extra block to cater for the 64th row) with v$sort_usage reporting a segment being created on my behalf. Obviously I re-ran the test a couple of times, flushing the buffer cache and shared pool, and connecting to a new session each time. The results were totally consistent: 63 rows => no GTT, 64 rows => GTT.

If you’re feeling a little suspicious at this point, bear with me.

This is the point where I switched to 19.11.0.0 – and both queries ran in memory with no sign of a GTT being created. Luckily I had cloned the query several times in the script generating different pairs of numbers of rows: 127/128, 255/256, 511/512, 1023/1024, and when I hit 1024 (and 1023) my session produced a GTT.

Somewhere between 512 and 1023 rows I was hitting a critical breakpoint – so I nearly started working through a binary chop to find the actual breakpoint; luckily, though, I had a little inspiration: if the overhead per row was 3 bytes (as it would be for a normal table column of more than 254 bytes) then 1023 rows would have an overhead of about 3KB – so I should test 1021 rows if I wanted to test a memory requirement of just under 1MB.

Sure enough, at 1021 rows the GTT didn’t appear, at 1022 rows it did – time after time after time.
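The arithmetic behind the guess is easy to check – assuming a limit of exactly 1MB and the 3 bytes per row of overhead I guessed at:

select
        1021 * (1024 + 3)       bytes_1021,     -- 1,048,567: just under 1MB
        1022 * (1024 + 3)       bytes_1022,     -- 1,049,594: just over 1MB
        1024 * 1024             one_mb          -- 1,048,576
from
        dual
;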

But …

My tests seemed to be totally repeatable. Again, I connected to a new session a few times, I flushed the buffer cache, I flushed the shared pool, I checked v$sort_usage. Consistently the results seemed to say:

  • 12.2 uses the PGA up to 64KB then dumps to a GTT
  • 19.11.0.0 uses the PGA up to 1MB then dumps to a GTT

Except – that night I had to shut down the two virtual machines because sometimes, for no obvious reason, I can’t hibernate my laptop while the VMs are running; and when I started everything up again the following morning and started re-running the tests as I wrote up the results something had changed. In fact my 19.11 instance didn’t dump to a GTT until I had reached nearly 10MB of data and my 12.2 wasn’t even dumping at 1MB; and I have no idea why a complete restart made such a difference.

After spending a little time trying to figure out what had changed – and I think it may have been that I’d been running the previous day’s tests after doing a lot of heavy work with temporary LOBs trying to pin down an anomaly with the handling of the temporary tablespace – I finally tried a google search using keywords that might be relevant and found this article that Keith Laker wrote about 5 years ago.

The feature is known as the In-memory “cursor-duration” temporary table. I mentioned a clue in the execution plans at the start of this note: materialization shows up with a “temp table transformation” operation followed, in 11g, by a child operation of “load as select”; but in 12.2 the child operation is “load as select (cursor duration memory)”. I really should have started my investigation by putting the entire text of that operation into a search engine.

Summary

(Basically the highlights from Keith’s article):

  • The “in-memory cursor-duration temporary table” change appeared in 12.2
  • It can be used in a number of transformations that the optimizer does
  • It’s not possible to force the use of the feature for a given query, it’s down to an internal algorithm
  • The mechanism uses memory that is “essentially” PGA
  • Despite the name this feature does not require you to licence the In-Memory option
  • If you’re still using an older version of Oracle this could be a good reason for upgrading as it can reduce the I/O load particularly for “analytic” types of query at a cost of using extra memory.

All the work I had done trying to find a break-point where Oracle switched from using PGA to using a GTT had been a waste of time – and the apparently consistent results on the first day had been an “accident” dictated (possibly) by some PGA-related activity that had taken place before I started running my tests.

Footnotes and geeky things

Five years on from the publication date of Keith’s article we shouldn’t be surprised to see some changes. Keith notes that the mechanism will apply only to serial queries that do more than one pass over the table – but there are two points to raise there:

  • possibly the two-pass thing is because it usually takes two passes over a CTE before Oracle will materialize a CTE automatically; my example shows the in-memory effect on a single pass – but that was a forced materialization.
  • the restrictions on parallelism may have been relaxed by now – if you check for hidden parameters you will find: _in_memory_cdt_maxpx, default value 4, described as “Max Parallelizers allowed for IMCDT”.

Interestingly there are more “cdt” parameters in 12.2 than there are in 19.11, and there are clear indications of some changes in algorithm and mechanism:

12c parameters like '%cdt%'
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_cdt_shared_memory_limit                   0                    Shared Memory limit for CDT
_cdt_shared_memory_query_percent           20                   Per Query Percentage from CDT pool
_imcdt_use_mga                             ON                   MGA for In Memory CDT
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

19c parameters like '%cdt%'
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_hcs_enable_in_mem_cdt_hint                FALSE                add hint opt_param('_in_memory_cdt', 'off')
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT
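Listings like these can be produced by a query against the x$ structures from a suitably privileged (SYS) session – something like:

select
        i.ksppinm       parameter,
        s.ksppstvl      system_value,
        i.ksppdesc      description
from
        x$ksppi         i,
        x$ksppsv        s
where
        s.indx = i.indx
and     i.ksppinm like '%cdt%'
order by
        i.ksppinm
;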

The parameter “_in_memory_cdt” can take the values ON, LIMITED, or OFF – which tells you that even if you can’t force a query to use in-memory CDTs you can (if you really want to) stop a query from using the feature. There are a few notes about this parameter and its significance to RAC and parallel execution (for 12.2) on MOS – if you have an account – Doc ID 2388236.1 What is _in_memory_cdt Parameter?
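As the description of the 19c parameter _hcs_enable_in_mem_cdt_hint suggests, it looks as if blocking the feature for a single statement is just an opt_param() hint – a sketch applied to my original test query:

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select  /*+ opt_param('_in_memory_cdt', 'off') */
        substr(v1,-10)
from
        g1
;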

The reference to MGA (the “managed global area”) in 12.2 is also quite interesting. This is documented as a Solaris feature using OSM to share memory between processes. For more general details you can review MOS Doc ID 2638904.1 MGA (Managed Global Area) Reference Note (again, only if you have an account).

The “new” oradebug mechanism shows (from 18c) a couple of relevant components under SQL compilation and execution that you could trace if you want to investigate further.

Components in library RDBMS:
--------------------------
  SQL_Compiler                 SQL Compiler ((null))
    ICDT_Compile               In Memory CDT Compilation (qks3t)
  SQL_Execution                SQL Execution (qer, qes, kx, qee)
    ICDT_Exec                  In Memory CDT Execution (qes3t, kxtt)

January 1, 2022

Happy New Year

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,Transformations — Jonathan Lewis @ 12:01 am GMT Jan 1,2022

Here’s an entertaining little bug that appeared on the oracle-l list server just in time to end one year and start another in a suitable way. The thread starts with an offering from Matthias Rogel (shown below with some cosmetic changes) to be run on Oracle 12.2.0.1:

rem
rem     Script:         group_by_bug.sql
rem     Author:         Matthias Rogel  / Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             19.11.0.0       Fixed
rem             12.2.0.1        Fail
rem

create table t as (
        select date'2021-12-30' d from dual 
        union all 
        select date'2021-12-31'   from dual
);

select extract(year from d), count(*) from t group by extract(year from d);

alter table t add primary key(d);
select extract(year from d), count(*) from t group by extract(year from d);

This doesn’t look particularly exciting – I’ve created a table with two rows holding two dates in the same year, then counted the number of rows for “each” year before and after adding a primary key on the date column. Pause briefly to think about what the results might look like …

Table created.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          2

1 row selected.


Table altered.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          1
              2021          1

2 rows selected.

After adding the primary key (with its unique index) the result changes to something that is clearly (for this very simple data set) wrong.

At this point I offered a hypothetical reason why Oracle might be producing the wrong result, but Tim Gorman was one step ahead of me and supplied a bug reference from MOS: Wrong Result Using GROUP BY with EXTRACT Function Against DATE (Doc ID 2629968.1)

The MOS document describes this as a bug introduced in the upgrade from 12.1.0.2 to 12.2.0.1, demonstrates the error with the extract() function applied to a date, and supplies three possible workarounds (but not the workaround or explanation I supplied in my response on oracle-l).

The document also pointed to a further bug note that described how the problem also appeared with the combination of the to_char() function applied to a date column with a unique index: 12.2 Wrong Results from Query with GROUP BY Clause Based on To_char Function of Unique Index Columns (Doc ID 2294763.1) with a further suggestion for applying a patch (see MOS Doc ID: 26588069.8) or upgrading to 18.1 (where the bug has been fixed).

Matthias Rogel supplied a follow-up demonstrating the problem with to_char(), which prompted me to create an example showing that it wasn’t just about dates – which I’ve tidied up below (reminder, all results on this page are from 12.2.0.1):

create  table t1 as 
select  round(rownum/10,1) n1 
from    all_objects 
where   rownum <= 10
;


select n1 from t1 order by n1;
select n1, count(*) from t1 group by n1 order by n1;

column nch format A3

select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

alter table t1 add constraint t1_pk primary key(n1);
select to_char(n1,'99') nch , count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

As before I’ve created a simple table, and populated it with a few rows of data. The first two queries are there to show you the data (0.1 to 1.0 by steps of 0.1), and show that aggregating the raw data produces one row per value.

I’ve then repeated the aggregation query, but converted each value to a character string that effectively rounds the value to an integer. Here are the two sets of results, before and after adding the primary key.

NCH   COUNT(*)
--- ----------
  0          4
  1          6

2 rows selected.

Table altered.

NCH   COUNT(*)
--- ----------
  0          1
  0          1
  0          1
  0          1
  1          1
  1          1
  1          1
  1          1
  1          1
  1          1

10 rows selected.

Again, the introduction of the primary key constraint on the column produces wrong results. In this example, though, I’ve pulled the execution plans from memory along with their outlines, and this is what the two plans look like.

SQL_ID  gt5a14jb0g4n0, child number 0
-------------------------------------
select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99')
order by 1,2

Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    10 |    30 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    10 |    30 |     4  (50)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |    10 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


SQL_ID  4fxxtmrh8cpzp, child number 0
-------------------------------------
select to_char(n1,'99') nch , count(*) from t1 group by
to_char(n1,'99') order by 1,2

Plan hash value: 1252675504

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     2 (100)|          |
|   1 |  SORT ORDER BY   |       |    10 |    30 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9BB7A81A")
      ELIM_GROUPBY(@"SEL$47952E7A")
      OUTLINE(@"SEL$47952E7A")
      ELIM_GROUPBY(@"SEL$1")
      OUTLINE(@"SEL$1")
      INDEX(@"SEL$9BB7A81A" "T1"@"SEL$1" ("T1"."N1"))
      END_OUTLINE_DATA
  */

In the absence of the primary key (index) Oracle does a full tablescan, then hash group by, then sort order by. When the primary key is put in place Oracle does an index full scan (which is legal because the index must contain all the data thanks to the not null declaration inherent in a primary key) and a sort order by without any group by.

You might wonder if the problem arises because Oracle assumes the indexed path somehow means the aggregation doesn’t apply – but with a /*+ full(t1) */ hint in place and a full tablescan in the plan the aggregation step is still missing — and if you look at the Outline Data section of the plan you can see that this is explicitly demanded by the hint(s): /*+ elim_groupby() */

My hypothesis (before I read the bug note) was that the optimizer had picked up the primary key declaration and seen that n1 was unique and therefore allowed the aggregating group by to be eliminated, but failed to “notice” that the to_char() – or extract() in the date example – meant that the assumption of uniqueness was no longer valid. To work around this problem very locally I simply added the hint /*+ no_elim_groupby */ (with no query block specified) to the query – and got the correct results.
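In other words, something like this:

select /*+ no_elim_groupby */ to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;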

Footnote

There is an interesting side note to this example (though not one that I would want to see used in a production system – this comment is for interest only). If you look at the Outline Data for the plan when there was no primary key you’ll notice that the only outline_leaf() is named sel$1, whereas in the plan with the primary key sel$1 appears as an outline() and the only outline_leaf() is named sel$9bb7a81a. An “outline leaf” is a query block that was used by the optimizer in constructing the final plan, while an “outline” is an intermediate query block that was examined before being transformed into another query block. So this difference in the Outline Data tells us that the problem appears thanks to a transformation that did not happen when there was no index – so what would our execution plan look like if the only hint we used in the query was /*+ outline_leaf(@sel$1) */?

SQL_ID  apgu34hc3ap7f, child number 0
-------------------------------------
select /*+ outline_leaf(@sel$1) */ to_char(n1,'99') nch , count(*) from
t1 group by to_char(n1,'99') order by 1,2

Plan hash value: 3280011052

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY    |       |    10 |    30 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY   |       |    10 |    30 |     3  (67)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


This posting was scheduled to launch at 00:01 GMT on 1st January 2022. Happy new year – just be careful that you don’t try to extract() or to_char() it if you’re running 12.2.0.1 unless you’ve applied patch 26588069.

October 7, 2021

Hints and Costs

Filed under: 12c,CBO,Conditional SQL,Execution plans,Oracle — Jonathan Lewis @ 12:06 pm BST Oct 7,2021

This note is one I drafted three years ago, based on a question from the Oracle-L list. It doesn’t directly address that question because at the time I was unable to create a data set that reproduced the problem, but it did highlight a detail that’s worth mentioning, so I’ve finally got around to completing it (and testing on a couple of newer versions of Oracle).

I’ll start with a model that was supposed to demonstrate the problem behind the question:


rem
rem     Script:         122_or_expand.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:        
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem             12.2.0.1
rem

create table t1
segment creation immediate
pctfree 80 pctused 20
nologging
as
select
        *
from
        all_objects
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk
        primary key(object_id)
        using index pctfree 80
;

variable b1 number
variable b2 number
variable b3 number
variable b4 number
variable b5 number
variable b6 number

exec :b1 := 100
exec :b2 := 120
exec :b3 := 1100
exec :b4 := 1220
exec :b5 := 3100
exec :b6 := 3320

set serveroutput off

select
        object_name
from 
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));


The critical feature of the query is the list of disjuncts (ORs) which all specify a range for object_id. The problem was that the query used a plan with an index full scan when there were no statistics on the table (or its indexes), but switched to a plan that used index range scans when statistics were gathered – and the performance of the plan with the full scan was unacceptable. (Clearly the “proper” solution is to have some suitable statistics in place – but sometimes such things are out of the control of the people who have to solve the problems.)

The /*+ index() */ and (undocumented) /*+ index_rs_asc() */ hints had no effect on the plan. The /*+ index() */ hint made no difference because an index full scan is one of the ways in which the /*+ index() */ hint can be obeyed – the hint doesn’t instruct the optimizer to pick an index range scan. The /*+ index_rs_asc() */ hint, on the other hand, specifically tells the optimizer to pick an index Range Scan ASCending if the hint has been specified correctly and the choice is available and legal. So why was the optimizer not doing as it was told? Without seeing the execution plan or CBO trace file from a live example I can’t guarantee that the following hypothesis is correct, but I think it’s in the right ball park.
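Translated to my model, the more specific of the two hints would have looked something like this (hinting the primary key index by name):

select  /*+ index_rs_asc(t1 t1_pk) */
        object_name
from
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;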

I think the optimizer was probably using the (new to 12c) cost-based “OR expansion” transformation, which basically transformed the query into a UNION ALL of several index range scans – and that’s why its outline would show /*+ index_rs_asc() */ hints. The hint would only become valid after the transformation had taken place, so if Oracle didn’t consider (or considered and discarded) the transformation when there were no stats in place then the hint would have to be “Unused” (as the new 19c hint report would say).

When I tried to model the problem the optimizer kept doing nice things with my data, so I wasn’t able to demonstrate the OP’s problem. However in one of my attempts to get a silly plan I did something silly – that can happen by accident if your client code isn’t careful! I’ll tell you what that was in a moment – first, a couple of plans.

As it stands, with the data and bind variables as shown, the optimizer used “b-tree / bitmap conversion” to produce an execution plan that did three separate index range scans, converted the rowids to bitmaps, OR-ed the bitmaps, then converted back to rowids before accessing the table:

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |    84 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |    84   (5)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|   3 |    BITMAP OR                        |       |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   5 |      SORT ORDER BY                  |       |       |       |            |          |
|*  6 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   8 |      SORT ORDER BY                  |       |       |       |            |          |
|*  9 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|  10 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|  11 |      SORT ORDER BY                  |       |       |       |            |          |
|* 12 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

So the first thing I had to do was disable this feature, which I did by adding the hint /*+ opt_param('_b_tree_bitmap_plans','false') */ to the query. This adjustment left Oracle doing the OR-expansion that I didn’t want to see:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   297 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   288 | 19008 |   297   (1)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|  10 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |   173 |  7612 |   177   (1)| 00:00:01 |
|* 11 |      INDEX RANGE SCAN                  | T1_PK           |   173 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

You’ll notice that the three range scans have different row estimates and costs – that’s the effect of bind variable peeking and my careful choice of bind variables to define different sized ranges. Take note, by the way, of the three filter predicates flagged at operations 3, 6, and 9. These are the “conditional plan” filters that say things like: “don’t run the sub-plan if the runtime value of :b5 is greater than :b6”.

Since I didn’t want to see OR-expansion just yet I then added the hint /*+ no_or_expand(@sel$1) */ to the query and that gave me a plan with tablescan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   617 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   291 | 12804 |   617   (4)| 00:00:01 |
--------------------------------------------------------------------------

This was a shame because I really wanted to see the optimizer produce an index full scan at this point – so I decided to add an “unnamed index” hint to the growing list of hints – specifically: /*+ index_(@sel$1 t1@sel$1) */

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |   405 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |   405   (2)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | T1_PK |   291 |       |   112   (7)| 00:00:01 |
---------------------------------------------------------------------------------------------

This, of course, is where things started to get a little interesting – the index full scan costs less than the tablescan but didn’t appear until hinted. But after a moment’s thought you can dismiss this one (possibly correctly) as an example of the optimizer being cautious about the cost of access paths that are dictated by bind variables or unpeekable inputs. (But these bind variables were peekable – so maybe there’s more to it than that – I was still trying to get to a point where my model would behave more like the OP’s, so I didn’t follow up on this detail: maybe in a couple of years time … ).

One last tweak – and that will bring me to the main point of this note. In my original code I was using three ranges dictated by 3 pairs of bind variables, for example [:b5, :b6]. What would happen if I made :b5 greater than :b6, say I swapped their values?
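In other words, something like this before re-running the query:

exec :b5 := 3320
exec :b6 := 3100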

The original btree/bitmap plan didn’t change, but where I had simply blocked btree/bitmap plans and seen OR-expansion as a result the plan changed to a full tablescan (with the cost of 617 you saw above). So I tried again, adding the hint /*+ or_expand(@sel$1) */ to see why; and this is the plan I got:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   735 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   116 |  7656 |   735   (3)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|* 10 |     TABLE ACCESS FULL                  | T1              |     1 |    44 |   615   (4)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

I still get the same three branches in the expansion, but look what’s happened to the sub-plan for the third pair of bind variables. The optimizer still has the FILTER at operation 9 – and that will evaluate to FALSE for the currently peeked values; but the optimizer has decided that it should use a tablescan for this part of the query if it ever gets a pair of bind variables in the right order; and the cost of the tablescan has echoed up the plan to make the total cost of the plan 735, which is (for obvious reasons) higher than the cost of running the whole query as a single tablescan.

The same anomaly appears in 19.11.0.0 and 21.3.0.0. On the plus side, it’s possible that if you have code like this the optimizer will be using the btree/bitmap conversion anyway.

tl;dr

As a generic point it’s worth remembering that if you’re using bind variables in client code to define ranges then you’ve got to get the values in the right order, otherwise one day the answer to the question “nothing changed, why is the query running so slowly?” will be “someone got in first with the bound values the wrong way round”.

August 19, 2020

Subquery with OR

Filed under: 12c,Execution plans,Hints,Oracle,Performance — Jonathan Lewis @ 2:03 pm BST Aug 19,2020

I’ve written a couple of notes in the past about the problems of optimising queries with predicates of the form “or exists {subquery}”. A recent question on the Oracle Developer Community forum brought to my attention an improvement in this area in (very precisely) 12.2, as well as giving me a cute example of how the first cut of a new feature doesn’t always cover every detail, and creating a nice example of how the new technology enhances the old technology.

We start with some data and a simple query running under 12.2.0.1:

rem
rem     Script:         exists_with_or_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2  -- feature not implemented
rem

create table cat_contact(
        contact_method_id       varchar2(1) not null,
        contact_id              number(8,0) not null,
        small_vc                varchar2(10),
        padding                 varchar2(100)
);

alter table cat_contact add constraint cc_pk primary key(contact_id);
create index cc_i1 on cat_contact(contact_method_id);

insert into cat_contact
select
        chr(64 + case when rownum <= 10 then rownum else 26 end),
        rownum,
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_contact where contact_method_id in ('A','B','C');

create table cat_item(
        contact_id      number(8,0) not null,
        item_category   varchar2(1) not null,
        small_vc        varchar2(10),
        padding         varchar2(100),
        constraint ci_ref_cc foreign key(contact_id) references cat_contact
)
;

alter table cat_item add constraint ci_pk primary key(contact_id, item_category);
create index ci_i1 on cat_item(item_category);

insert into cat_item 
select
        rownum,
        chr(88 + case when rownum <= 10 then mod(rownum,2) else 2 end),
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_item where item_category in ('X','Y');

execute dbms_stats.gather_table_stats(user,'cat_contact')
execute dbms_stats.gather_table_stats(user,'cat_item')

I’ve created and populated two tables (the table and column names come from the ODC thread). There’s a foreign key relationship defined between cat_item and cat_contact, both tables have primary keys declared, with a couple of extra columns declared not null.

I’ve populated the two tables with a small amount of data and each table has one column rigged with very skewed data:

  • cat_contact.contact_method_id is mostly ‘Z’ with one row each of ‘A’ to ‘J’ ,
  • cat_item.item_category (the second column in the primary key) is mostly ‘Z’ with 5 rows each of ‘X’ and ‘Y’

After populating each table I’ve queried it in a way which means the subsequent stats gathering will create frequency histograms on these two columns and the optimizer will be able to take advantage of the skew in its arithmetic, which means it may choose to use the indexes I’ve created on those skewed columns if the right values appear in the queries.
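If you’d rather not depend on the column usage mechanism you could, of course, ask for the histograms explicitly – a sketch, with an arbitrary choice of bucket counts:

execute dbms_stats.gather_table_stats(user,'cat_contact', method_opt=>'for all columns size 1 for columns contact_method_id size 254')
execute dbms_stats.gather_table_stats(user,'cat_item',    method_opt=>'for all columns size 1 for columns item_category size 254')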

So here’s the query we’re interested in:

SELECT  /*+ 
                qb_name(main) 
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

Here’s the default execution plan (in 12.2.0.1 with my settings for system stats and various other optimizer-related figures that MIGHT make a difference) pulled from memory after executing the query to return 10 rows.


-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       |    34 (100)|          |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL | CAT_CONTACT | 10000 |  1152K|    34   (6)| 00:00:01 |
|   3 |   INLIST ITERATOR   |             |       |       |            |          |
|*  4 |    INDEX UNIQUE SCAN| CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("C"."CONTACT_METHOD_ID") OR  IS NOT NULL))
   4 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR
              "I"."ITEM_CATEGORY"='Y')))

For every row in the cat_contact table Oracle has checked whether or not the contact_method is an ‘A’, ‘B’, or ‘C’ and passed any such rows up to its parent; for all other rows it has then executed the subquery to see if the row with the matching contact_id in cat_item has an ‘X’ or ‘Y’ as the item_category. It has had to run the subquery 9,997 times (there were only three rows matching ‘A’,’B’,’C’) and the INLIST ITERATOR at operation 3 means it has probed the index nearly 20,000 times. This does not look efficient.

I’ve said in previous articles that when you need to optimize queries of this shape you need to rewrite them as UNION ALL queries to separate the two parts of the complex OR predicate and then make sure that you don’t report any items twice – which you do by making use of the lnnvl() function. So let’s do this – but let’s do it the lazy “new technology” way by upgrading to 19c and executing the query there; here’s the plan I got in 19.3.0.0:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

The optimizer has used the new “cost-based OR-expansion” transformation to rewrite the query as a UNION ALL query. We can see an efficient access into cat_contact to identify the ‘A’,’B’,’C’ rows, and then we can see that the second branch of the union all handles the existence subquery but the optimizer has unnested the subquery to select the 10 rows from cat_item where the item_category is ‘X’ or ‘Y’ and used those rows in a nested loop to drive into the cat_contact table using the primary key. We can also see the use of the lnnvl() function in operation 13 that ensures we don’t accidentally report the ‘A’,’B’,’C’ rows again.
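For comparison, the manual rewrite I described above would look something like the following sketch (which I haven’t run against this data set) – note how the three lnnvl() calls echo the predicates reported at operation 13:

select  *
from    cat_contact c
where   c.contact_method_id in ('A', 'B', 'C')
union all
select  *
from    cat_contact c
where   exists (
                select  null
                from    cat_item i
                where   i.contact_id = c.contact_id
                and     i.item_category in ('X', 'Y')
        )
and     lnnvl(c.contact_method_id = 'A')
and     lnnvl(c.contact_method_id = 'B')
and     lnnvl(c.contact_method_id = 'C')
;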

So let’s go back to 12.2.0.1 and see what happens if we just add the /*+ or_expand(@main) */ hint to the query. Here’s the resulting execution plan:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

We get exactly the plan we want – with the same cost as the 19c plan, which happens to be less than half the cost of the default plan that we got from 12.2.0.1. So it looks like there may be cases where you will need to hint OR-expansion because it might not appear by default.

Other Observations 1 – ordering

You may have noticed that my query has, unusually for me, put the existence subquery first and the simple filter predicate second in the where clause. I don’t like this pattern as (over time, and with different developers modifying queries) it’s too easy in more complex cases to “lose” the simple predicate; a one-liner can easily drift, change indents, get bracketed with another predicate that it shouldn’t be connected with, and so on. I’ve actually seen production systems producing wrong results because little editing accidents like this (counting brackets is the classic error) have occurred – so I’m going to rerun the test on 12.2.0.1 with the predicates in the order I would normally write them.

Here’s the “corrected” query with its execution plan:


SELECT  /*+ 
                qb_name(main) 
                or_expand(@main)
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')  
        OR
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        )
;  


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    16 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    16   (7)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   4 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   5 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   6 |       INLIST ITERATOR                     |                 |       |       |            |          |
|   7 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|*  8 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|*  9 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|  10 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                 |                 |       |       |            |          |
|  12 |     INLIST ITERATOR                       |                 |       |       |            |          |
|  13 |      TABLE ACCESS BY INDEX ROWID BATCHED  | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN                    | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|  15 |     INLIST ITERATOR                       |                 |       |       |            |          |
|* 16 |      INDEX UNIQUE SCAN                    | CI_PK           |     1 |     6 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
   9 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  11 - filter(LNNVL( IS NOT NULL))
  14 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  16 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y')))

The execution plan has jumped from 14 lines to 17 lines, the cost has gone up from 14 to 16, and both branches of the plan now report access to cat_contact and cat_item (though only through its primary key index in the second branch). What’s happened?

Oracle 12.2.0.1 has rewritten the query as a UNION ALL working from the bottom up – so in this case the first branch of the rewrite handles the original filter subquery, unnesting it to drive efficiently from cat_item to cat_contact. This means the second branch of the rewrite has to find the ‘A’,’B’,’C’ rows in cat_contact and then check that the filter subquery hasn’t previously reported them – so the optimizer has applied the lnnvl() function to the filter subquery, which you can nearly see in the Predicate Information for operation 11.

To make it clearer, here’s what you get as the predicate information for that operation after calling explain plan and dbms_xplan.display()

  11 - filter(LNNVL( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1)))

In 12.2 the order of predicates in your query seems to be important – unless told otherwise the optimizer is working from the bottom up (then rewriting top down). But there is hope (though not documented hope). I added the /*+ or_expand(@main) */ hint to the query to force OR-expansion. Checking the Outline Information of the plan I could see that this had been expanded to /*+ or_expand(@main (1) (2)) */. Taking a wild guess at the significance of the numbers and changing the hint to /*+ or_expand(@main (2) (1)) */ I re-ran the test and got back to the more efficient plan – with the filter subquery branch appearing second in the UNION ALL view and the lnnvl() applied to the simpler predicate.
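For clarity, here’s the full statement with the modified hint in place – remember that the branch numbers are my guess at the meaning of the expanded hint, not documented syntax:

SELECT  /*+
                qb_name(main)
                or_expand(@main (2) (1))
        */
        *
FROM    cat_contact c
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')
        OR
                exists  (
                        SELECT  /*+ qb_name(subq) */
                                *
                        FROM    cat_item i
                        WHERE   i.contact_id = c.contact_id
                        AND     i.item_category in ('X', 'Y')
                )
        )
;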

So the OR-expansion code is not fully cost-based in 12.2.0.1, but you can modify the behaviour through hinting: first to force it to appear (which may not happen even if it seems to be the lower cost option), and secondly to control the ordering of the components of the UNION ALL. As with all things relating to hints, though, act with extreme caution: we do not have sufficient documentation explaining exactly how they work, and with some of them we don’t even know whether the code path is complete yet.

Other Observations 2 – 12cR1

The or_expand() hint and cost-based OR-expansion appeared specifically in 12.2.0.1; prior to that we had a similar option in the use_concat() hint and concatenation – which also attempts to rewrite your query to produce a union all of disjoint data sets. But there are restrictions on what concatenation can do. I rarely remember all of the restrictions, but there are two critical ones:

  • first, it will only appear by default if there is an indexed access path available to drive every branch of the rewrite
  • secondly, it will not apply further transformations to the separate branches that it produces

If we try adding the or_expand() hint to our query in 12.1.0.2 it will have no effect, so let’s add a suitable use_concat() hint and see what happens:

explain plan for
SELECT  /*+ 
                qb_name(main) 
                use_concat(@main 8 or_predicates(1))
--              use_concat(@main   or_predicates(1))
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR
                c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             | 10000 |  1152K|    40   (3)| 00:00:01 |
|   1 |  CONCATENATION                        |             |       |       |            |          |
|   2 |   INLIST ITERATOR                     |             |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| CAT_CONTACT |     3 |   354 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | CC_I1       |     3 |       |     3   (0)| 00:00:01 |
|*  5 |   FILTER                              |             |       |       |            |          |
|*  6 |    TABLE ACCESS FULL                  | CAT_CONTACT |  9997 |  1151K|    35   (6)| 00:00:01 |
|   7 |    INLIST ITERATOR                    |             |       |       |            |          |
|*  8 |     INDEX UNIQUE SCAN                 | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C')
   5 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1))
   6 - filter(LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C'))
   8 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

26 rows selected.

As you can see, by forcing concatenation I’ve got my “union all” view with lnnvl() applied in the second branch. But the second branch was the “select where exists()” branch and the optimizer has not been able (allowed?) to do the unnesting that would let it drive efficiently from the cat_item table to the cat_contact table. The effect of this is that the plan still ends up with a full tablescan of cat_contact running a filter subquery on virtually every row – so concatenation doesn’t save us anything.

The significance of the “8” in the hint, by the way, is (I believe) that it tells the optimizer to use inlist iterators when possible. If I had omitted the “8” the plan would have had 4 branches – one each for ‘A’, ‘B’, and ‘C’ and a fourth for the filter subquery. I could also have added the hint /*+ use_concat(@subq or_predicates(1)) */ to replace operations 7 and 8 with a single index range scan with a filter predicate for the ‘X’/’Y’ check (and that might, in any case, be slightly more efficient than the iteration approach) – see the sketch below for where that hint would sit.
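To show where that extra hint would go, the full hint set would have looked something like this (just a sketch – I haven’t reported the plan it produces):

SELECT  /*+
                qb_name(main)
                use_concat(@main 8 or_predicates(1))
                use_concat(@subq   or_predicates(1))
        */
        *
FROM    cat_contact c
WHERE   (
                exists  (
                        SELECT  /*+ qb_name(subq) */
                                *
                        FROM    cat_item i
                        WHERE   i.contact_id = c.contact_id
                        AND     i.item_category in ('X', 'Y')
                )
        OR
                c.contact_method_id IN ('A', 'B', 'C')
        )
;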

Footnote(s)

The “legacy” OR-expansion (“concatenation”, a.k.a. LORE in the optimizer trace file) can be controlled through the hints use_concat() and no_expand().

The new cost-based OR-expansion (now ORE in the optimizer trace file) can be controlled through the hints or_expand() and no_or_expand().

The new cost-based OR-expansion has some restrictions, for example it is explicitly blocked in a MERGE statement, even in 19c, as reported in this blog note by Nenad Noveljic. As the blog note shows, concatenation is still possible but you (may) have to disable cost based OR-expansion.

I scanned the executable for the phrase “ORE: bypassed” to see if there were any messages that would suggest other reasons why cost-based OR-expansion would not be used; unfortunately the only relevant string was “ORE: bypassed – %s” [update (see comment 5 below): after ignoring case there was a second option: “ORE: Bypassed for disjunct chain: %s.”] – in other words all the possible bypass messages would be filled in on demand. I found a list of messages that might be relevant; I’d be a little cautious about trusting it but if you don’t see the feature appearing when you’re expecting it then it might be worth checking whether one of these could apply.

  • Old OR expansion hints present
  • Semi join hint present
  • QB has FALSE predicate
  • QB marked for NO Execution
  • Full Outer join QB
  • Rownum found in disjunction
  • Anti/semi/outer join in disjunction
  • Opposite Range in disjunction
  • No Index or Partition driver found
  • Predicate chain has all constant predicates
  • Negated predicate found
  • Long bitmap inlist in OR predicate
  • PRIOR expression in OR predicate
  • All 1-row tables found
  • No valid predicate for OR expansion
  • Disjunctive subquery unnesting possible
  • Subquery unnesting possible
  • Subquery coalesced query block
  • Merge view query block

Finally – here’s another reference blog note comparing LORE with ORE from Mohamed Houri.

 

August 6, 2020

Case and Aggregate bug

Filed under: 12c,Bugs,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 12:43 pm BST Aug 6,2020

[Fixed – but currently needs a patch for 19c]

The following description of a bug appeared on the Oracle Developer Community forum a little while ago – on an upgrade from 12c to 19c a query started producing the wrong results on a simple call to the avg() function. In fact it turned out to be a bug introduced in 12.2.0.1.

The owner of the thread posted a couple of zip files to build a test case – but I had to do a couple of edits, and change the nls_numeric_characters to ‘,.’ in order to get past a formatting error on a call to the to_timestamp() function. I’ve stripped the example to a minimum, and translated the column names from German (which was presumably the source of the nls_numeric_characters issue) to make it easier to demonstrate and play with the bug.

First the basic data – you’ll notice that I’ve tested this on 12.1.0.2, 12.2.0.1 and 19.3.0.0 to find out when the bug appeared:

rem
rem     Script:         case_aggregate_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create table test(
        case_col        varchar2(11),
        duration        number(*,0),
        quarter         varchar2(6),
        q2h_knum_b      varchar2(10)
   )
/

insert into test values('OK',22,'1.2020','AB1234');
insert into test values('OK',39,'1.2020','AB1234');
insert into test values('OK',30,'1.2020','AB1234');
insert into test values('OK',48,'1.2020','AB1234');
commit;

execute dbms_stats.gather_table_stats(user,'test')

create or replace force view v_test
as
select
        q2h_knum_b,
        case
                when b.case_col not like 'err%'
                        then b.duration
        end     duration,
        case
                when b.case_col not like 'err%'
                        then 1
                        else 0
        end     status_ok
from
        test b
where
        substr(b.quarter, -4) = 2020
;

break on report
compute avg of duration on report
select * from v_test;

---------------------------------------------

Q2H_KNUM_B   DURATION  STATUS_OK
---------- ---------- ----------
AB1234             22          1
AB1234             39          1
AB1234             30          1
AB1234             48          1
           ----------
avg             34.75

I’ve created a table, loaded some data, gathered stats, then created a view over the table. The view includes a couple of columns that use a simple case expression, and both expressions are based in the same way on the same base column (this may, or may not, be significant in what’s coming). I’ve then run off a simple query with a couple of SQL*Plus commands to report the actual content of the view with the average of the duration column – which is 34.75.

So now we run a couple of queries against the view which aggregate the data down to a single row – including the avg() of the duration – using the coalesce() function – rather than the older nvl() function – to convert any nulls to zero.


select
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg,
        coalesce(sum(status_ok), 0)     ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

---------------------------------

DURATION_COUNT DURATION_MED DURATION_AVG   OK_COUNT
-------------- ------------ ------------ ----------
             4         34.5            0          4

You’ll notice that the duration_avg is reported as zero (it would be the same if I used nvl(), and would be null if I omitted the coalesce()). This is clearly incorrect. This was the output from 19.3; 12.2 gives the same result, while 12.1.0.2 reports the average correctly as 34.75.

There are several ways in which you can modify this query to get the right average – here’s one: just put the ok_count column first in the select list:


select
        coalesce(sum(status_ok), 0)     ok_count,
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

---------------------------------

  OK_COUNT DURATION_COUNT DURATION_MED DURATION_AVG
---------- -------------- ------------ ------------
         4              4         34.5        34.75

There’s no obvious reason why the error should occur, but there’s a little hint about what may be happening in the Column Projection Information from the execution plan. The basic plan is the same in both cases, so I’m only showing it once; it’s followed by two versions of the projection information (restricted to operation 1) which I’ve formatted to improve readability:

Plan hash value: 2603667166

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT GROUP BY     |      |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     1 |    20 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((INSTR('AB1234',"Q2H_KNUM_B")>0 AND
              TO_NUMBER(SUBSTR("B"."QUARTER",(-4)))=2020))

Column Projection Information (Operation 1 only):  (Wrong result)
-----------------------------------------------------------------
PERCENTILE_CONT(.5) WITHIN GROUP ( ORDER BY CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
COUNT(CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN 1 ELSE 0 END)[22],
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22]

Column Projection Information (Operation 1 only):  (Right result)
-----------------------------------------------------------------
PERCENTILE_CONT(.5) WITHIN GROUP ( ORDER BY CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
COUNT(CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN 1 ELSE 0 END)[22]

As you can see, to report avg() Oracle has projected sum() and count().

When we get the right result the sum() for duration appears immediately after the count().

When we get the wrong result the sum() for ok_count comes between the count() and sum() for duration.

This makes me wonder whether Oracle is somehow just losing track of the sum() for duration and therefore dividing null by the count().

This is purely conjecture, of course, and may simply be a coincidence – particularly since 12.1.0.2 gets the right result and shows exactly the same projection information.

Readers are left to experiment with other variations to see if they can spot other suggestive patterns.
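If you want to report the projection information for your own variations, one option is to pull the plan of the most recent statement from memory using the (documented) projection format option of dbms_xplan – e.g. something like:

select * from table(dbms_xplan.display_cursor(null, null, 'basic +projection'));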

Update (Aug 2020)

This is now logged as Bug 31732779 – WRONG RESULT WITH CASE STATEMENT AGGREGATION, though it’s not yet publicly visible.

Update (May 2021)

The script produces the correct result in 19.11.0.0. 

Update (Aug 2022)

I’ve just had a follow-up on a continuation forum thread telling me that their original example still produces the wrong result even on 19.14.0.0 on LiveSQL, and I’ve just re-run my script on 19.11.0.0 and got the wrong results. I don’t know what I did to fool myself into thinking it had produced the right results in May 2021 (except that I ran it again the same afternoon after restarting the instance and started getting the right results!)

The bug, however, is part of a more general issue and has been flagged as a duplicate of the (unpublished) bug 31567719 “WRONG RESULT ISSUE WITH AVG AND COUNT AGGREGATES ON EXPRESSIONS”, which is marked as fixed in 21.1 with some patches already available for some versions of 19c (up to 19.10 at present).

A MOS search for 31567719 will report the list of currently available patches under the “Recommended links” heading.

July 12, 2020

Massive Deletes

Filed under: 12c,Infrastructure,Oracle,Upgrades — Jonathan Lewis @ 7:36 pm BST Jul 12,2020

One of the recurrent questions on the Oracle Developer Community forum is:

What’s the best way to delete millions of rows from a table?

There are an enormous number of relevant details that you need to know before you can give the “right” answer to this question, e.g.

  • Which version of Oracle
  • Are you running Standard Edition or Enterprise Edition
  • Is “millions” a tiny percentage of the table or a large percentage
  • Are there any referential integrity constraints in place
  • Is this a heap table, an IOT, a partitioned table, a clustered table.
  • Are there any LOB (or LONG!) columns in the table
  • Are there any user-defined types in the table
  • Does the system have to keep running while the deletion takes place
  • Is the table compressed
  • How many indexes are there – and can you drop, or mark unusable, some of them
  • Are there any spatial or context indexes on the table
  • Is this for performance reasons, or for space reclamation
  • Have any alternative strategies been considered
  • Will this be a regular occurrence, or is it a one-off clean-up that may be followed by regular smaller deletes
  • How much space do you have to do this job
  • How much time do you have to do this job

One of the most important ones, of course, is “Which version of Oracle?” because it can make an enormous difference to the range of possible strategies.

I’m writing this particular note because the question came up a little while ago where the user wanted to delete all the data from 2008 through to the end of 2018, keeping only the last 18 months of data. That sounds like the volume of data to be deleted (11 years) is very much larger than the volume of data to be kept (1.5 years) – but we can’t be sure of that since businesses tend to grow over time so that last 18 months of data might actually be just as big as the previous 11 years.

As usually happens in response to this question there were suggestions to “create a new table selecting the data you want to keep”, “use dbms_parallel_execute to delete by rowid ranges in parallel”, and a relatively new one “convert to a partitioned table so that the data you want to keep is in its own partition and drop the other partition(s)”. 

I wrote a note a few years ago giving an example of converting a simple heap table to a partitioned table online while maintaining indexes (choosing between local and global) and eliminating the data you don’t want to keep, so there’s no need to waste resources copying redundant data. So, after learning that the OP was running 12.2 Enterprise Edition with the Partitioning option, I suggested that (s)he convert the table into a hash partitioned table with a single partition (which should, for purposes of optimisation, behave just like a simple heap table), using the “including rows” clause to copy only the last 18 months of data.

I pointed out that their version of Oracle (EE + PO) gave them the second-best option – because I knew that in 19c you could simply do something like:

rem
rem     Script:         122_move.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem

create table t1
as
select  *
from    all_objects
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

alter table t1 add constraint t1_pk primary key(object_id);

alter table t1 move
        including rows where owner != 'SYS'
        online
;

It wasn’t until a little later that a tiny nagging doubt crept into my mind that maybe this enhancement to the basic move command may have appeared at the same time as the modify partition enhancement – in other words in 12.2.0.1; so I ran the test above and found that it did actually seem to work. (And I haven’t yet found any bugs on MOS suggesting that it shouldn’t be used.)

Having discovered that the command was available I thought that I’d better check whether it was also documented, and found that it was in the 12.2 SQL Reference Manual (though not the 12.1 reference manual – for the obvious reason) under Alter Table. Page down to the “tram-tracks” for the Alter Table command and follow the link for the “move_table_clause”, and from there follow the link for “filter_condition”.

Note:

This option is not available on 12.1 and, if you run the test using that version, Oracle will raise error “ORA-25184: column name expected” at the point where the word “rows” appears. This may look somewhat counter-intuitive but, for a very long time, a command like “alter table TabX move including ColY online” is how you would rebuild an index organized table (IOT) with all columns up to ColY in the “IOT_TOP” segment.

Update [The following morning]

Once you’ve got the framework of a test in place it really doesn’t take very long to start running through “what if” cases or potential boundary conditions.  So this morning I added one very obvious test – what happens if you have referential integrity declared between two tables and try to move both of them including a subset of rows from each that ensures that the referential integrity is still in place:


rem
rem     Script:         122_move_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0        Parent can't move
rem             12.2.0.1        Parent can't move
rem

create table parent
as
select  *
from    all_objects
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

alter table parent add constraint par_pk primary key(object_id);

create table child
as
select  *
from    parent
order by
        dbms_random.value
;

alter table child add constraint chi_pk primary key(object_id);
alter table child add constraint chi_fk_par foreign key(object_id) references parent;

I’ve created the child table from the parent data, with random ordering. Now I’m going to delete all the child rows where owner = ‘PUBLIC’ using an online move, then I’ll try and do the same for the parent.


alter table child move
        including rows where owner != 'PUBLIC'
        online
;

-- Child move succeeds (of course)

alter table parent move
        including rows where owner != 'PUBLIC'
        online
;

--
-- Trying to do the matching move on the parent results in:
-- ORA-02266: unique/primary keys in table referenced by enabled foreign keys
--

So there’s a gap in the functionality that makes it less desirable than the simplest case suggests. The referential integrity constraint has to be disabled before the parent table can be restructured.

But something that merits a little further investigation is the option to set the foreign key to “disable validate” (which is sufficient to allow the parent move to take place) and then to set the constraint back to “enable”. When I tried this I had expected Oracle to do a lot of work to revalidate the constraint before enabling it, but I couldn’t find any indication that any such work had taken place.
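In outline, the sequence I used for that test was something like the following sketch (and the open question remains about how much work the final enable really does):

alter table child modify constraint chi_fk_par disable validate;

alter table parent move
        including rows where owner != 'PUBLIC'
        online
;

alter table child modify constraint chi_fk_par enable;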

Update (Nov 2020 – following comment #3 below about IOTs)

It’s important to check the manuals before getting stuck into too many experiments. The “problem” with foreign key constraints is one of the specified restrictions, as given in the 19c Language Reference manual under Alter Table:

Restrictions on Filter Conditions

The following restrictions apply to the filter_condition clause:

    • Filter conditions are supported only for heap-organized tables.
    • Filter conditions can refer only to columns in the table being altered.
    • Filter conditions cannot contain operations, such as joins or subqueries, that reference other database objects.
    • Filter conditions are unsupported for tables with primary or unique keys that are referenced by enabled foreign keys.

Update (Feb 2021)

Another example of experimentation and reading the manuals – there are restrictions on the online move command.  Among other things the online option is not available if the table has domain indexes (e.g. context indexes). The annoying thing about such restrictions, of course, is that it might not even occur to you that it’s something you would need to check the manuals for before you build the model. From the 19c SQL Language Reference manual for Alter Table again:

Restrictions on the ONLINE Clause

The ONLINE clause is subject to the following restrictions when moving table partitions:

    • You cannot specify the ONLINE clause for tables owned by SYS
    • You cannot specify the ONLINE clause for index-organized tables.
    • You cannot specify the ONLINE clause for heap-organized tables that contain object types or on which bitmap join indexes or domain indexes are defined.
    • Parallel DML and direct path INSERT operations require an exclusive lock on the table. Therefore, these operations are not supported concurrently with an ongoing online partition MOVE, due to conflicting locks.

There are a number of little features scattered through this section of the manual (e.g. drop column) that have their own restrictions on the use of the ONLINE option, so you may have to search for “online” if you want to get complete coverage.

 

June 3, 2020

Fetch First Update

Filed under: 12c,Hints,Oracle,Tuning — Jonathan Lewis @ 1:48 pm BST Jun 3,2020

A question about mixing the (relatively new) “fetch first” syntax with “select for update” appeared a few days ago on the Oracle Developer Forum. The requirement was for a query something like:


select
        *
from
        t1
order by
        n1
fetch
        first 10 rows only
for     update
;

The problem with this query is that it results in Oracle raising error ORA-02014: cannot select FOR UPDATE from view with DISTINCT, GROUP BY, etc. The error doesn’t seem to be particularly relevant, of course, until you remember that “fetch first” creates an inline view using the analytic row_number() under the covers.
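In effect (and this is only a sketch based on the predicate names you’ll see in the plan further down, not the exact internal transformation) the optimizer treats the original statement as something like the following, and it’s the “for update” against that inline view with its analytic function that raises the error:

select  *
from    (
        select  t1.*, row_number() over (order by n1) rn
        from    t1
        )
where   rn <= 10
for update
;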

One suggested solution was to use PL/SQL to open a cursor with a pure select then use a loop to lock each row in turn. This would need a little defensive programming, of course, since each individual “select for update” would be running at a different SCN from the driving loop, and there would be some risk of concurrency problems (locking, or competing data change) occurring.
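A sketch of that approach (purely hypothetical code – it ignores the defensive programming and error handling you would need in production) might look like this:

declare
        cursor c1 is
                select  t1.rowid rid
                from    t1
                order by
                        n1
                fetch first 10 rows only;
begin
        for r in c1 loop
                -- each select for update runs at its own SCN, so the row
                -- may have changed (or disappeared) since the cursor was opened
                for r2 in (
                        select  *
                        from    t1
                        where   t1.rowid = r.rid
                        for update
                ) loop
                        null;   -- process the locked row here
                end loop;
        end loop;
end;
/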

There is a pure – though contorted – SQL solution, where we take the driving SQL and put it into a subquery that generates the rowids of the rows we want to lock, as follows:


select
        /*+
                qb_name(main)
        */
        *
from
        t1
where
        t1.rowid in (
                select
                        /*+ qb_name(inline) unnest no_merge */
                        t1a.rowid
                from
                        t1 t1a
                order by
                        t1a.n1
                fetch 
                        first 10 rows only
        )
for update
;

The execution plan for this query is critical – so once you can get it working it would be a good idea to create a baseline (or SQL Patch) and attach it to the query. It is most important that the execution plan should be the equivalent of the following:


select  /*+   qb_name(main)  */  * from  t1 where  t1.rowid in (
select    /*+ qb_name(inline) unnest no_merge */    t1a.rowid   from
t1 t1a   order by    t1a.n1   fetch    first 10 rows only  ) for update

Plan hash value: 1286935441

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |     10 |00:00:00.01 |     190 |       |       |          |
|   1 |  FOR UPDATE                   |      |      1 |        |     10 |00:00:00.01 |     190 |       |       |          |
|   2 |   BUFFER SORT                 |      |      2 |        |     20 |00:00:00.01 |     178 |  2048 |  2048 | 2048  (0)|
|   3 |    NESTED LOOPS               |      |      1 |     10 |     10 |00:00:00.01 |     178 |       |       |          |
|*  4 |     VIEW                      |      |      1 |     10 |     10 |00:00:00.01 |     177 |       |       |          |
|*  5 |      WINDOW SORT PUSHED RANK  |      |      1 |  10000 |     10 |00:00:00.01 |     177 |  2048 |  2048 | 2048  (0)|
|   6 |       TABLE ACCESS FULL       | T1   |      1 |  10000 |  10000 |00:00:00.01 |     177 |       |       |          |
|   7 |     TABLE ACCESS BY USER ROWID| T1   |     10 |      1 |     10 |00:00:00.01 |       1 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=10)
   5 - filter(ROW_NUMBER() OVER ( ORDER BY "T1A"."N1")<=10)

Critically you need the VIEW operation to be the driving query of a nested loop join that does the “table access by user rowid” joinback. In my case the query has used a full tablescan to identify the small number of rowids needed – in a production system that would be the part of the statement that should first be optimised.

It’s an unfortunate feature of this query structure (made messier by the internal rewrite for the analytic function) that it’s not easy to generate a correct set of hints to force the plan until after you’ve already managed to get the plan. Here’s the outline information that shows the messiness of the hints I would have needed:


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"INLINE")
      OUTLINE_LEAF(@"SEL$A3F38ADC")
      UNNEST(@"SEL$1")
      OUTLINE(@"INLINE")
      OUTLINE(@"MAIN")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$A3F38ADC" "from$_subquery$_003"@"SEL$1")
      ROWID(@"SEL$A3F38ADC" "T1"@"MAIN")
      LEADING(@"SEL$A3F38ADC" "from$_subquery$_003"@"SEL$1" "T1"@"MAIN")
      USE_NL(@"SEL$A3F38ADC" "T1"@"MAIN")
      FULL(@"INLINE" "T1A"@"INLINE")
      END_OUTLINE_DATA
  */

You’ll notice that my /*+ unnest */ hint is now modified – for inclusion at the start of the query – to /*+ unnest(@sel$1) */ rather than the /*+ unnest(@inline) */ that you might have expected. That’s the side effect of the optimizer doing the “fetch first” rewrite before applying “missing” query block names. If I wanted to write a full hint set into the query itself (leaving the qb_name() hints in place but removing the unnest and no_merge I had originally) I would need the following:


/*+
        unnest(@sel$1)
        leading(@sel$a3f38adc from$_subquery$_003@sel$1 t1@main)
        use_nl( @sel$a3f38adc t1@main)
        rowid(  @sel$a3f38adc t1@main)
*/

I did make a bit of a fuss about the execution plan. I think it’s probably very important that everyone who runs this query gets exactly the same plan and that the plan should be this nested loop. Although there’s a BUFFER SORT at operation 2 that is probably ensuring that everyone would get the same data in the same order (regardless of the execution plan) before locking any of it, I would be a little worried that different plans might somehow be allowed to lock the data in a different order, thus allowing for deadlocks.

January 23, 2020

WITH Subquery

Filed under: 12c,Infrastructure,Oracle — Jonathan Lewis @ 8:37 pm GMT Jan 23,2020

Here’s another anomaly that appears when you mix and match Oracle features. In this case it’s “With” subqueries (common table expressions / CTEs) and Active Dataguard (ADG) Standby databases. The problem appeared on the Oracle-l listserver and luckily for the OP another member of the list had seen it before and could point to a relevant MOS document id which explained the issue and supplied a workaround.

The OP had their standby database opened read-only for reporting and found the following oddity in the extended SQL trace file for one of their reports:


WAIT #140196872952648: nam='db file scattered read' ela= 1588 file#=4097 block#=579715946 blocks=128 obj#=-39778567 tim=17263910670242
WAIT #140196872952648: nam='db file scattered read' ela= 1495 file#=4097 block#=579715947 blocks=128 obj#=-39778567 tim=17263910672065
WAIT #140196872952648: nam='db file scattered read' ela= 1671 file#=4097 block#=579715948 blocks=128 obj#=-39778567 tim=17263910674042
WAIT #140196872952648: nam='db file scattered read' ela= 1094 file#=4097 block#=579715949 blocks=128 obj#=-39778567 tim=17263910675443

Before pointing out the oddity (if you haven’t spotted it already) I’ll just explain a few of the numbers that are a little unusual.

  • File# = 4097: the user has parameter db_files = 4096, so this is the first Temp file.
  • Block# = 579,715,946: the database is 120TB, and the temporary tablespace is a “bigfile” tablespace so it’s okay for the file to hold more than 579M blocks.
  • Obj# < 0: a negative object number is a characteristic of materialized CTEs: if you look at the execution plan a materialized CTE will be reported as a table with a name like SYS_TEMP_FDA106F9_E259E68. If you take the first hexadecimal number and treat it as a 32-bit signed integer you get the value that would be reported as the obj# in the trace file. (Converting to decimal and subtracting power(2,32) is one way of doing the arithmetic – there’s a SQL one-liner showing this just after the list.)
  • tim= nnnnnnnn:  this is the timestamp (usually in microseconds), and we can see intervals of roughly 1,400 to 2,000 microseconds between these lines.
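
Taking the SYS_TEMP_FDA106F9_E259E68 example above, the arithmetic can be done with a simple one-liner – the result is -39778567, which matches the obj# in the four wait lines at the top of this note:

select  to_number('FDA106F9','XXXXXXXX') - power(2,32)  materialized_obj#
from    dual
;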

So here’s the oddity: in this set of 4 consecutive waits we’re waiting for multiblock reads of 128 blocks – but each read starts one block after the previous read. It’s as if Oracle is reading 128 blocks and forgetting everything after the first one. And the timestamps are significant because they tell us that this isn’t a case of Oracle spending so much time between reads that the other blocks fall off the end of  the buffer cache before the query reaches them.

I think I’ve seen a pattern like this once before but it would have been quite a long time ago and I can’t find any notes I might have made about it (and it turns out that my previous experience was not relevant to this case). Fortunately another member of Oracle-l had also seen the pattern and supplied the solution through a reference to a MOS document that led to: Doc ID 2251339.1 With Subquery Factorization Temp Table Does Not Cache in Standby in 12.1.0.2.

It’s not a bug – Oracle is supposed to do this if you manage to materialize a CTE in a Read-only Standby database. I don’t understand exactly why there’s a problem, but thanks to some feature of how consistent reads operate, and how block SCNs are generated when you populate the blocks of the global temporary table (GTT) that is your materialized CTE, it’s possible for Oracle to produce wrong results if it re-visits blocks that have been read into the cache from the GTT. So when you do a multiblock read during a tablescan of the GTT Oracle can use the first block it has read (presumably because it’s immediately pinned), but can’t use the remaining 127 – and so you get the odd pattern of consecutive blocks appearing at the start of consecutive multiblock reads.

This raises a couple of interesting (and nasty) questions.

  • First – does every 128 block read get read to the middle of the buffer cache, pushing another 128 blocks out of the buffer cache, or does Oracle automatically read the blocks to the “cold” end of the LRU, minimising the impact on the rest of the cache? We hope it’s the latter.
  • Second – if I use a small fetch size while running my query might I find that I have to re-read the same block (with its 127 neighbours) many times, because Oracle releases any pinned blocks at the end of each fetch and has to re-acquire the blocks on the next fetch?

If anyone wants to test the second question by running a query from SQL*Plus with extended trace enabled the following simple query should answer the question:

alter session set events '10046 trace name context forever, level 8';
set arraysize 2

with v1 as (select /*+ materialize */ * from all_objects)
select object_name from v1;

Workarounds

There is a workaround to the issue – you can add the hint /*+ inline */ to the query to ensure that the CTE is not materialized. There is a bit of a catch to this, though (on top of the fact that you might then need to have two slightly different versions of the code if you want to run the query on production and standby) – if Oracle places the subquery text inline the optimizer may manage to merge it into the rest of the query and come up with a bad execution plan. Again you can probably work around this threat by extending the hint to read: /*+ inline no_merge */. Even then the optimizer could decide it has better statistics about the “real” table columns that it might have lost when it materialized the subquery, so it could still produce a different execution plan from the materialized plan.

As an alternative (and somewhat more brutal) workaround you could set the hidden parameter “_with_subquery” to inline either at the session or system level, or in the startup parameter file.
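For example, at the session level that would be something like the following (it’s a hidden parameter, so treat this as a sketch and check with Oracle support before using it anywhere important):

alter session set "_with_subquery" = inline;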

 

January 17, 2020

Group by Elimination

Filed under: 12c,18c,Bugs,Oracle — Jonathan Lewis @ 12:57 pm GMT Jan 17,2020

Update: The specific cases of incorrect results reported in this note have been fixed by 19.11.0.0 and 21.3.0.0

Here’s a bug that was highlighted a couple of days ago on the Oracle Developer Community forum; it may be particularly worth thinking about if you haven’t yet got as far as Oracle 12.2, as it appeared in an optimizer feature that was introduced in 12.2 and hasn’t been completely fixed even in the latest release of 19c (currently 19.6 as I write this).

Oracle introduced a feature known as “aggregate group by elimination” in 12.2, controlled by the hidden parameter “_optimizer_aggr_groupby_elim”. The notes on MOS about the feature tell us that Oracle can eliminate a group by operation from a query block if a unique key from every table in the query block appears in the group by clause (MOS 23210039.8). Unfortunately there are a couple of gaps in the implementation in 12.2 that can produce wrong results. Here’s some code to model the problem.

rem
rem     Script:         group_by_elim_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

create table ref_clearing_calendar(
        calendar_name   char(17),
        business_date   date,
        update_ts       timestamp (6) default systimestamp,
        constraint pk_ref_clearing_calendar
                primary key (business_date)
)
/

insert into ref_clearing_calendar (business_date)
select
        sysdate + 10 * rownum
from
        all_objects
where
        rownum <= 40 -- > comment to avoid wordpress format issue
/

commit;

execute dbms_stats.gather_table_stats(null,'ref_clearing_calendar',cascade=>true)

set autotrace on explain

select
        to_char(business_date,'YYYY'), count(*)
from
        ref_clearing_calendar
group by
        to_char(business_date,'YYYY')
order by
        to_char(business_date,'YYYY')
/

set autotrace off

I’ve created a table with a primary key on a date column, and then inserted 40 rows which are spaced every ten days from the current date; this ensures that I will have a few dates in each of two consecutive years (future proofing the example, I hope!). Then I’ve aggregated to count the rows per year using the to_char({date column},’YYYY’) conversion option to extract the year from the date. (Side note: the table definition doesn’t follow my normal pattern as the example started life in the forum thread.)

If you run this query on Oracle 12.2 you will find that it returns 40 (non-unique) rows and displays the following execution plan:


---------------------------------------------------------------------------------------------
| Id  | Operation        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   1 |  SORT ORDER BY   |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| PK_REF_CLEARING_CALENDAR |    40 |   320 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

The optimizer has applied “aggregate group by elimination” because it hasn’t detected that the primary key column that appears in the group by clause has been massaged in a way that means the resulting value is no longer unique.

Fortunately this problem with to_char() is fixed in Oracle 18.1 where the query returns two rows using the following execution plan (which I’ve reported from an instance of 19.5):

---------------------------------------------------------------------------------------------
| Id  | Operation        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   1 |  SORT GROUP BY   |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| PK_REF_CLEARING_CALENDAR |    40 |   320 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Unfortunately there is still at least one gap in the implementation. Change the to_char(business_date) to extract(year from business_date) at all three points in the query, and even in 19.6 you’re back to the wrong results – inappropriate aggregate group by elimination and 40 rows returned.

There are a couple of workarounds. One is to set the hidden parameter _optimizer_aggr_groupby_elim to false at the system or session level, or through an opt_param() hint at the statement level (possibly injected through an SQL Patch). The other option is to set a fix_control, again at the system, session, or statement level – but there seems to be little point in using the fix_control approach (which might be a little obscure for the next developer to see the code) when it seems to do the same as the explicitly named hidden parameter.

select
        /*+ opt_param('_optimizer_aggr_groupby_elim','false') */
        extract(year from business_date), count(*)
from    ...

select
        /*+ opt_param('_fix_control','23210039:0') */
        extract(year from business_date), count(*)
from    ...

One final thought about this “not quite fixed” bug. It’s the type of “oversight” error that gives you the feeling that there may be other special cases that have been overlooked. The key question would be: are there any other functions (and not necessarily datetime functions) that might be applied (perhaps implicitly) to a primary or unique key and produce duplicate results from distinct inputs – and, if so, has the code that checks the validity of eliminating the aggregate operation been written to notice the threat?

Footnote

The problem with extract() has been raised as a bug on MOS, but it was not public at the time of writing this note.

Update (about 60 seconds after publication)

Re-reading my comment about “other functions” it occurred to me that to_nchar() might, or might not, behave the same way as to_char() in 19c – so I tested it … and got the wrong results in 19c.

Update (Dec 2022)

A recent test of the model gave the correct results in 19.11.0.0 and 21.3.0.0

October 3, 2019

Trace Files

Filed under: 12c,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:38 pm BST Oct 3,2019

A recent blog note by Martin Berger about reading trace files in 12.2 popped up in my twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the tracefile generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user to allow the user to create the view in their own schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem
rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem
rem     Last tested
rem             12.2.0.1

create or replace view my_trace_file as
select 
        *
from 
        v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from 
                        v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;


create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, here are the privileges you could grant to a user from SYS so that they can create their own view:


grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;
and optionally one of:
        grant select on v_$mystat to some_user;
        grant execute on dbms_support to some_user;
                but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from 12.2.0.1 as a demo:


alter system flush shared_pool;
alter session set sql_trace = true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

select
        line_number,
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
/

alter session set sql_trace = false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (execution plans seems like a likely guess) which happens to pick out all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text ” id=1 “ (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:


LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'


         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'


         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'


         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'


        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'


        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.
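For example, here’s a simple sketch of a query I might use to get an overview of what a newly generated trace file contains (the column names come straight from v$diag_trace_file_contents, so no extra assumptions needed):

select
        section_name, component_name, operation_name, file_name, function_name,
        count(*)        line_count
from
        my_trace_file
group by
        section_name, component_name, operation_name, file_name, function_name
order by
        count(*) desc
/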

 

October 2, 2019

_cursor_obsolete_threshold

Filed under: 12c,Infrastructure,Oracle,Performance,Upgrades — Jonathan Lewis @ 2:39 pm BST Oct 2,2019

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8,192 was a good idea ?

The answer is PDBs – specifically the much larger number of PDBs a single CDB can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical company running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1,024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation on his work. It addresses the following question:

If you have an application that repeatedly issues (explicitly or implicitly) parse calls but doesn’t take advantage of the session cursor cache, it has to search the library cache by hash_value / sql_id for the parent cursor, then has to walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0?

Here’s the complete code for the test:

rem
rem     Script:         cursor_obsolete.sql
rem     Author:         Jonathan Lewis / Chris Antognini
rem     Dated:          Sep 2019
rem
rem     Last tested 
rem		12.2.0.1
rem

create table t1
as
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple: a query hinted to return that one row by primary key index range scan.

I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping all those child cursors in memory.
  • The second part of the test simply repeats the loop but only for the first 1,024 child cursors. At this point the child cursors exist so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again but only for the last 1,024 child cursors. Again they should exist and be usable so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment of the array. So if you have to find child cursor 8191 you will have to follow 512 segment pointers and check 16 pointers per segment (a total of 8,704 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One peripheral question to ask, of course, is whether Oracle keeps appending to the segmented array or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array. (i.e. will searching for child cursor 0 be the cheapest one or the most expensive one).

The results, limited to just the second and third parts of the test, and with a couple of small edits are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is zero, this is all about searching for the correct pre-existing cursor

You might question the 2,048-ish parse count(total) – but don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why we double the parse count, although the “alter session” doesn’t then report as an “execute count”.

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication”, so I ran a similar test executing the last two loops a second time – there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test with the loop for the last 1,024 child cursors running before the loop for the first 1,024 child cursors. Again this made no significant difference to the results – the low-numbered child cursors take less CPU to find than the high-numbered child cursors.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the ca. 5 seconds difference in CPU time appearing in this test corresponds purely to an extra 5 milliseconds per execution spent walking an extra 7,000 steps down the chain.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.

Footnote 1

The odd thing about this “cursor obsolete” feature is that I have a distinct memory that when PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explanations of the different reasons captured in this view MOS Doc Id 296377.1 “Troubleshooting: High Version Count Issues” is a useful reference.
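As a starting point – and this is just a sketch, since most of the columns in the view are simple Y/N flags – you could report the flags for all the children of the problem statement and scan the output for the columns showing ‘Y’:

select  *
from    v$sql_shared_cursor
where   sql_id = '&m_sql_id'
order by
        child_number
/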

September 8, 2019

Quiz Night

Filed under: 12c,Infrastructure,Oracle,redo,Upgrades — Jonathan Lewis @ 10:15 pm BST Sep 8,2019

Upgrades cause surprises – here’s a pair of results from a model that I constructed more than 15 years ago, and ran today on 12.2, then modified and ran again, then ran on 11.2.0.4, then on 12.1.0.2. It’s very simple, I just create a table, gather stats, then update every row.

rem
rem     Script:         update_nochange.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem
rem     Last tested
rem             19.11.0.0 
rem             12.2.0.1 
rem

create table t1
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                rownum <= 1e4  -- > comment to avoid wordpress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'x')             small_vc,
--      lpad(rownum,10,'0')             small_vc,
        'Y'                             flag
from
        generator       v1
where   rownum <= 1e3   -- > comment to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

execute snap_my_stats.start_snap

update t1 set small_vc = upper(small_vc);

execute snap_my_stats.end_snap

The calls to package snap_my_stats are my little routines to calculate the change in the session activity stats (v$sesstat, joined to v$statname) due to the update. Here are a few of the results for the test using the code as it stands:


Name                                    Value
----                                    -----
redo entries                               36
redo size                             111,756
undo change vector size                53,220

You’ll notice, however, that the CTAS has an option commented out to create the small_vc column using lpad(rownum,10,'0') rather than lpad(rownum,10,'x'). This is what the redo stats look like if I use '0' instead of 'x':


Name                                    Value
----                                    -----
redo entries                              909
redo size                             223,476
undo change vector size                68,256

What – they’re different ?!  (and it’s reproducible).

Running the test on 12.1.0.2 or 11.2.0.4, both variants of the code produce the same (lower) number of redo entries (and bytes) and undo – it’s only 12.2.0.1 that shows a difference. [Update Jan 2020: The same behaviour, with slight variation in numbers, also appears in 19.3.0.0]

Tonight’s quiz:

Figure out what’s happening in 12.2.0.1 to give two different sets of undo and redo figures.

If that problem is too easy – extrapolate the test to more complex cases to see when the difference stops appearing, and see if you can find any cases where this new feature might cause existing applications to break.

I’ll supply the basic answer in 48 hours.

Update (a few hours early)

The question has been answered in the comments – it’s an optimisation introduced in 12.2 that attempts to reduce the amount of undo and redo by minimising the work done for “no change” updates to data.  In principle – but we don’t yet know the rules and limitations – if an update does not change the column values Oracle 12.2 will not “save the old values in an undo record and log the new values in a redo change vector”, it will simply lock the row, to produce a minimal redo change vector.

Unfortunately Oracle goes into “single row” mode to lock rows, while it can do “block-level” – i.e. multi-row/array processing – when it is using the normal “change” mechanism.  Inevitably there are likely to be cases where the 12.2 optimisation actually produces a worse result in terms of volume of redo, or contention for redo latches.

If we modify the code to dump the redo generated by the two different updates we can see more clearly what Oracle is doing:

alter session set tracefile_identifier = 'UPD';

column start_scn new_value m_start_scn
select to_char(current_scn,'999999999999999999999999') start_scn from v$database;

update t1 set small_vc = upper(small_vc);
commit;

column end_scn new_value m_end_scn
select to_char(current_scn,'999999999999999999999999') end_scn from v$database;

alter system dump redo scn min &m_start_scn scn max &m_end_scn;

Then, after running the test we can dump the list of redo op codes from the trace file:

First when we do the “no change” update (with lots of repetitions deleted):

grep -n OP orcl12c_ora_21999_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

129:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
138:CHANGE #1  OP:11.4
147:CHANGE #2  OP:5.2
150:CHANGE #3  OP:11.4
159:CHANGE #4  OP:11.4
168:CHANGE #5  OP:11.4
177:CHANGE #6  OP:11.4
...
2458:CHANGE #189  OP:5.1
2474:CHANGE #190  OP:5.1
2490:CHANGE #191  OP:5.1
2506:CHANGE #192  OP:5.1
2525:CHANGE #1  OP:5.1
2541:CHANGE #2  OP:11.4
2553:CHANGE #1  OP:5.1
2569:CHANGE #2  OP:11.4
...
27833:CHANGE #1  OP:5.1
27849:CHANGE #2  OP:11.4
27861:CHANGE #1  OP:5.1
27877:CHANGE #2  OP:11.4
27889:CHANGE #1  OP:5.4

The dump starts with a large redo record (192 change vectors) that started life in a private redo buffer, then switches to the standard “paired change vectors” in the public redo buffer. The 11.4 vectors are “lock row piece” while the 5.1 vectors are the “generate undo”. Counting the 11.4 and 5.1 lines there are exactly 1,000 of each – every row has been individually locked.

Now for the “real change” update:

grep -n OP orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

126:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
135:CHANGE #1  OP:11.19
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
430:CHANGE #4  OP:11.19
576:CHANGE #5  OP:11.19
...
5469:CHANGE #41  OP:5.1
5573:CHANGE #42  OP:5.1
5726:CHANGE #43  OP:5.1
5879:CHANGE #44  OP:5.1
6035:CHANGE #1  OP:5.1
6188:CHANGE #2  OP:11.19
6337:CHANGE #1  OP:5.1
6490:CHANGE #2  OP:11.19
...
15029:CHANGE #2  OP:11.19
15101:CHANGE #1  OP:5.1
15177:CHANGE #2  OP:11.19
15249:CHANGE #1  OP:5.4

It’s a much smaller trace file (ca. 15,249 lines compared to ca. 27889 lines), and the table change vectors are 11.19 (Table array update) rather than 11.4 (table lock row piece). Counting the op codes we get 52 of each of the 11.19 and 5.1. If we want a little more information about those vectors we can do the following:


egrep -n -e "OP:" -e "Array Update" orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//' 

126:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
135:CHANGE #1  OP:11.19
140:Array Update of 20 rows:
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
289:Array Update of 20 rows:
430:CHANGE #4  OP:11.19
435:Array Update of 20 rows:
576:CHANGE #5  OP:11.19
581:Array Update of 20 rows:
...
5469:CHANGE #41  OP:5.1
5481:Array Update of 13 rows:
5573:CHANGE #42  OP:5.1
5585:Array Update of 20 rows:
5726:CHANGE #43  OP:5.1
5738:Array Update of 20 rows:
5879:CHANGE #44  OP:5.1
5891:Array Update of 20 rows:
6035:CHANGE #1  OP:5.1
6047:Array Update of 20 rows:
6188:CHANGE #2  OP:11.19
6193:Array Update of 20 rows:
6337:CHANGE #1  OP:5.1
6349:Array Update of 20 rows:
...
14953:CHANGE #1  OP:5.1
14965:Array Update of 9 rows:
15029:CHANGE #2  OP:11.19
15034:Array Update of 9 rows:
15101:CHANGE #1  OP:5.1
15113:Array Update of 9 rows:
15177:CHANGE #2  OP:11.19
15182:Array Update of 9 rows:
15249:CHANGE #1  OP:5.4

As you can see, the 11.19 (table change) and 5.1 (undo) change vectors both report that they are structured as array updates. In most cases the array size is 20 rows, but there are a few cases where the array size is smaller. In this test I found one update with an array size of 13 rows and three updates with an array size of 9 rows.

Summary

Oracle has introduced an optimisation for “no change” updates in 12.2 that tries to avoid generating quite so much undo and redo; however this may result in some cases where an “array update” change vector turns into many “single row lock” change vectors, so when you upgrade to 12.2 (or beyond) you may want to check any large update mechanism you run to see if your system has benefited or been penalised to any significant extent by this change. The key indicator will be an increase in the value of the session/system stats “redo entries” and “redo size”.
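If you don’t have a snapshot package like snap_my_stats to hand, a quick check of those statistics for your own session before and after a representative update would be along the following lines (just a sketch – remember to take the difference between the two sets of figures):

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in ('redo entries', 'redo size', 'undo change vector size')
order by
        sn.name
/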

Update (June 2021)

Prompted by a thread on the Oracle-L mailing list I’ve just re-run the tests on an instance of 19.11.0.0.

Some time between 19.3 and 19.11 the code for generating redo has changed again, and it looks as if Oracle is now using the 11.19 array update redo vector to lock unchanged rows using the simple (on the face of it) fix to the standard code so that each entry in the array says: “no new columns and no change in row size”. The results were as follows:

19.11.0.0 - when the values changed
=============================================
Name                                    Value
----                                    -----
redo entries                               37
redo size                             111,860

19.11.0.0 - when the values didn't change
=============================================
Name                                    Value
----                                    -----
redo entries                               34
redo size                              89,588

So, a big improvement over the earlier “no change” implementation. Obviously, though, there are many more tests that could be done to add to this note. What happens, for example, if there is a mix of changed and unchanged rows? (I can guess, but I won’t say anything until after I’ve checked.)

Another Update (June 2021)

I added one more variation to the test I was running on 19.11.0.0, which was to mix a few “real” changes with a lot of “no-change” updates. Since I had noted that Oracle’s Array Update vector (11.19) was processing 20 rows at a time I set up a data pattern that would basically repeat 15 no-change rows followed by 2 changed rows, using the following expression to generate my small_vc column:

        case
                when mod(rownum,17) in (15,16)
                        then  lpad(rownum,10,'x')
                        else  lpad(rownum,10,'0')
        end     small_vc,

The resulting redo change vectors suggested fairly strongly that Oracle was walking through each block constructing a “no-change” redo array, until it hit a changed row at which point it started a new redo array for changed rows and carried on populating that array until it reached the next no-change row and started the next redo array.

Here’s an extract from a point in the redo log dump shortly after the point where the session has switched from using the private redo buffer and has started writing directly to the public redo buffer:


egrep -n -e "OP:" -e "Array Update" or19_ora_1744_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

4553:CHANGE #1 OP:5.1
4567:Array Update of 15 rows:
4658:CHANGE #2 OP:11.19
4666:Array Update of 15 rows:

4760:CHANGE #1 OP:5.1
4774:Array Update of 2 rows:
4789:CHANGE #2 OP:11.19
4797:Array Update of 2 rows:

4815:CHANGE #1 OP:5.1
4829:Array Update of 15 rows:
4920:CHANGE #2 OP:11.19
4928:Array Update of 15 rows:

5022:CHANGE #1 OP:5.1
5036:Array Update of 2 rows:
5051:CHANGE #2 OP:11.19
5059:Array Update of 2 rows:

I’ve edited in a blank line between the redo records to make it a little easier to see that we have a 15 row undo/redo pair of change vectors followed by a 2 row undo/redo pair. There are various boundary cases where the numbers aren’t this clean – the end of a data block is one case, the “glitch” partway through the data block due to the initial insert using an internal array size of 255 rows is another.

June 26, 2019

Glitches

Filed under: 12c,Bugs,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 5:11 pm BST Jun 26,2019

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a risk that the whole data set for ‘clerk’ will be accessed – and that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column which holds the single value you were probing the index for, but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “fetch first” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

June 9, 2019

CPU percent

Filed under: 12c,AWR,Oracle — Jonathan Lewis @ 2:31 pm BST Jun 9,2019

A recent post on the ODC General Database forum asked for an explanation of the AWR report values “%Total CPU” and “%Busy CPU” under the “Instance CPU” label, and how the “%Busy CPU” could be greater than 100%. Here’s a text reproduction of the relevant sample supplied:

Host CPU

CPUs  Cores  Sockets  Load Average Begin  Load Average End  %User  %System  %WIO  %Idle
   2      2        1                0.30              1.23   10.7      5.6   5.3   77.7

Instance CPU

%Total CPU  %Busy CPU  %DB Time waiting for CPU (Resource Manager)
      29.8      133.8                                           0.0

The answer is probably “It’s 12.1 and it’s a programmer error”.

  • Note that the Host CPU %Idle is not consistent with the three usage figures:  10.7 + 5.6 + 5.3 = 21.6 whereas 100 – 77.7 = 22.3.
  • So let’s run with 22.3% and see what else we can notice: 29.8 / 22.3 = 1.3363 – that’s pretty close (when expressed as a percentage) to 133.8%

Hypothesis:

Someone did the division the wrong way round when trying to work out the percentage of the host’s non-idle CPU that could be attributed to the instance. In this example the “%Busy CPU” should actually report 100 * 22.3 / 29.8 = 74.8%
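A quick check of the two divisions from SQL*Plus shows how well the hypothesis fits:

select
        round(100 * 29.8 / 22.3, 2)     reported_busy,
        round(100 * 22.3 / 29.8, 2)     expected_busy
from
        dual
/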

Note – the difference between 133.8 and 133.63 can be attributed to the fact that the various figures reported in this bit of the AWR are rounded to the nearest 1 decimal place.

Note 2 – I don’t think this error is present in 11.2.0.4 or 12.2.0.1

