Oracle Scratchpad

October 3, 2019

Trace Files

Filed under: 12c,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:38 pm BST Oct 3,2019

A recent blog note by Martin Berger about reading trace files in 12.2 poped up in my twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the tracefile generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user to allow the user to create the view in the user’s schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem
rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem
rem     Last tested
rem             12.2.0.1

create or replace view my_trace_file as
select 
        *
from 
        v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from 
                        v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;


create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, the privileges you could grant to a user from SYS so that they could create their own view:


grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;
and optionally one of:
        grant select on v_$mystat to some_user;
        grant execute on dbms_support to some_user;
                but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from 12.2.0.1 as a demo:


alter system flush shared_pool;
alter session set sql_trace true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

select
        line_number,
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
/

alter session set sql_trace false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (execution plans seems like a likely guess) which happens to pick out all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text ” id=1 “ (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:


LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'


         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'


         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'


         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'


        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'


        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.

 

October 2, 2019

_cursor_obsolete_threshold

Filed under: 12c,Infrastructure,Oracle,Performance — Jonathan Lewis @ 2:39 pm BST Oct 2,2019

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea ?

The answer is PDBs – specifically the much larger number of PDBs a single CBD can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical companyy running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation onhis work. It addresses the following question:

If you have an application that repeatedly issues (explicitly or implicitly) parse calls but doesn’t take advantage of the session cursor cache it has to search the library cache by hash_value / sql_id for the parent cursor, then has to walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0.

Here’s the complete code for the test:


create table t1
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple. A query hinted to return that one row by primary key index range scan.

I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping them all in memory.
  • The second part of the test simply repeats the loop, but only for the first 1,024 child cursors. At this point the child cursors exist, so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again, but only for the last 1,024 child cursors. Again they should exist and be usable, so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8191 you will have to following 512 segment pointers, and 16 pointers per segment (totalling 8708 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One preipheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array. (i.e. will searching for child cursor 0 be the cheapest one or the most expensive one).

And the results, limited to just the second and third parts, with just a couple of small edits are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is zero, this is all about searching for the

You might question are the 2,048-ish parse count(total) – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why we double the parse count, although the “alter session” doesn’t then report as an “execute count”.

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication”, so I ran a similar test executing the last two loops a second time – there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test with the loop for the last 1,024 child cursors ran before the loop for the first child cursors. Again this made no significant difference to the results – the low number child cursors take less CPU to find than the high number child cursors.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the ca. 5 seconds difference in CPU time appearing in this test corresponds purely to an extra 5 milliseconds walking an extra 7,000 steps down the chain.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.

Footnote 1

The odd thing about this “cursor obselete” feature is that I have a distinct memory that when  PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explainations of the different reasons captured in this view MOS Doc Id  296377.1“Troubleshooting: High Version Count Issues” is a useful reference.

September 8, 2019

Quiz Night

Filed under: 12c,Infrastructure,Oracle,redo — Jonathan Lewis @ 10:15 pm BST Sep 8,2019

Upgrades cause surprises – here’s a pair of results from a model that I constructed more than 15 years ago, and ran today on 12.2, then modified and ran again, then ran on 11.2.0.4, then on 12.1.0.2. It’s very simple, I just create a table, gather stats, then update every row.

rem
rem     Script:         update_nochange.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem
rem     Last tested 
rem             12.2.0.1 

create table t1
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                rownum <= 1e4  -- > comment to avoid wordpress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'x')             small_vc,
--      lpad(rownum,10,'0')             small_vc,
        'Y'                             flag
from
        generator       v1
where   rownum <= 1e3   -- > comment to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

execute snap_my_stats.start_snap

update t1 set small_vc = upper(small_vc);

execute snap_my_stats.end_snap

The calls to package snap_my_stats are my little routines to calculate the change in the session activity stats (v$sesstat, joined to v$statname) due to the update. Here are a a few of the results for the test using the code as it stands:


Name                                    Value
----                                    -----
redo entries                               36
redo size                             111,756
undo change vector size                53,220

You’ll notice, however, that the CTAS has an option commented out to create the small_vc column using lpad(rownum,‘0’) rather than lpad(rownum,‘x’). This is what the redo stats look like if I use ‘0’ instead of ‘x’:


Name                                    Value
----                                    -----
redo entries                              909
redo size                             223,476
undo change vector size                68,256

What – they’re different ?!  (and it’s reproducible).

Running the test on 12.1.0.2 or 11.2.0.4, both variants of the code produce the smaller number of redo entries (and bytes) and undo – it’s only 12.2.0.1 that shows a change. (I haven’t got around to testing 18.3 and 19.3 yet.)

Tonight’s quiz:

Figure out what’s happening in 12.2.0.1 to give two different sets of undo and redo figures.

If that problem is too easy – extrapolate the test to more complex cases to see when the difference stops appearing, and see if you can find any cases where this new feature might cause existing applications to break.

I’ll supply the basic answer in 48 hours.

Update (a few hours early)

The question has been answered in the comments – it’s an optimisation introduced in 12.2 that attempts to reduce the amount of undo and redo by minimising the work done for “no change” updates to data.  In principle – but we don’t yet know the rules and limitations – if an update does not change the column values Oracle 12.2 will not “save the old values in an undo record and log the new values in a redo change vector”, it will simply lock the row, to produce a minimal redo change vector.

Unfortunately Oracle goes into “single row” mode to lock rows, while it can do “block-level – i.e. multi-row/array” processing if it uses the “change” mechanism.  Inevitably there are likely to be cases where the 12.2 optimisation actually produces a worse result in terms of volume of redo, or contention for redo latches.

If we modify the code to dump the redo generated by the two different updates we can see more clearly what Oracle is doing:

alter session set tracefile_identifier = 'UPD';

column start_scn new_value m_start_scn
select to_char(current_scn,'999999999999999999999999') start_scn from v$database;

update t1 set small_vc = upper(small_vc);
commit;

column end_scn new_value m_end_scn
select to_char(current_scn,'999999999999999999999999') end_scn from v$database;

alter system dump redo scn min &m_start_scn scn max &m_end_scn;

Then, after running the test we can dump the list of redo op codes from the trace file:

First when we do the “no-change” update (with lots of repetitions deleted):

grep -n OP orcl12c_ora_21999_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

129:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
138:CHANGE #1  OP:11.4
147:CHANGE #2  OP:5.2
150:CHANGE #3  OP:11.4
159:CHANGE #4  OP:11.4
168:CHANGE #5  OP:11.4
177:CHANGE #6  OP:11.4
...
2458:CHANGE #189  OP:5.1
2474:CHANGE #190  OP:5.1
2490:CHANGE #191  OP:5.1
2506:CHANGE #192  OP:5.1
2525:CHANGE #1  OP:5.1
2541:CHANGE #2  OP:11.4
2553:CHANGE #1  OP:5.1
2569:CHANGE #2  OP:11.4
...
27833:CHANGE #1  OP:5.1
27849:CHANGE #2  OP:11.4
27861:CHANGE #1  OP:5.1
27877:CHANGE #2  OP:11.4
27889:CHANGE #1  OP:5.4

The dump starts with a large redo record (192 change vectors) that started life in the private redo buffer, and then switch to the standard “paired change vectors” in the public redo buffer. The 11.4 vectors are “lock row piece” while the 5.1 vectors are the “generate undo”. Counting the 11.4 and 5.1 lines there are exactly 1,000 of each – every row has been individually locked.

Now for the “real change” update:

grep -n OP orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

126:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
135:CHANGE #1  OP:11.19
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
430:CHANGE #4  OP:11.19
576:CHANGE #5  OP:11.19
...
5469:CHANGE #41  OP:5.1
5573:CHANGE #42  OP:5.1
5726:CHANGE #43  OP:5.1
5879:CHANGE #44  OP:5.1
6035:CHANGE #1  OP:5.1
6188:CHANGE #2  OP:11.19
6337:CHANGE #1  OP:5.1
6490:CHANGE #2  OP:11.19
...
15029:CHANGE #2  OP:11.19
15101:CHANGE #1  OP:5.1
15177:CHANGE #2  OP:11.19
15249:CHANGE #1  OP:5.4

It’s a much smaller trace file (ca. 15,249 lines compared to ca. 27889 lines), and the table change vectors are 11.19 (Table array update) rather than 11.4 (table lock row piece). Counting the op codes we get 52 of each of the 11.19 and 5.1. If we want a little more information about those vectors we can do the following:


egrep -n -e "OP:" -e "Array Update" orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//' 

126:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
135:CHANGE #1  OP:11.19
140:Array Update of 20 rows:
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
289:Array Update of 20 rows:
430:CHANGE #4  OP:11.19
435:Array Update of 20 rows:
576:CHANGE #5  OP:11.19
581:Array Update of 20 rows:
...
5469:CHANGE #41  OP:5.1
5481:Array Update of 13 rows:
5573:CHANGE #42  OP:5.1
5585:Array Update of 20 rows:
5726:CHANGE #43  OP:5.1
5738:Array Update of 20 rows:
5879:CHANGE #44  OP:5.1
5891:Array Update of 20 rows:
6035:CHANGE #1  OP:5.1
6047:Array Update of 20 rows:
6188:CHANGE #2  OP:11.19
6193:Array Update of 20 rows:
6337:CHANGE #1  OP:5.1
6349:Array Update of 20 rows:
...
14953:CHANGE #1  OP:5.1
14965:Array Update of 9 rows:
15029:CHANGE #2  OP:11.19
15034:Array Update of 9 rows:
15101:CHANGE #1  OP:5.1
15113:Array Update of 9 rows:
15177:CHANGE #2  OP:11.19
15182:Array Update of 9 rows:
15249:CHANGE #1  OP:5.4

As you can see, the 11.19 (table change) and 5.1 (undo) change vectors both report that they are are structured as array updates. In most cases the array size is 20 rows, but there are a few cases where the array size is smaller. In this test I found one update with an array size of 13 rows and three updates with an array size of 9 rows.

Summary

Oracle has introduced an optimisation for “no change” updates in 12.2 that tries to avoid generating quite so much undo and redo; however this may result in some cases where an “array update” change vector turns into several “single row lock” change vectors, so when you upgrade to 12.2 (or beyone) you may want to check any large update mechanism you run to see if your system has benefited or been penalised to any significant extent by this change. The key indicator will be an increase in the value of the session/system stats “redo entries” and “redo size”.

June 26, 2019

Glitches

Filed under: 12c,Bugs,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 5:11 pm BST Jun 26,2019

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a threat that the whole data set for ‘clerk’ will be accessed – and that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column which holds the single value you were probing the index for but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “first fetch” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

June 9, 2019

CPU percent

Filed under: 12c,AWR,Oracle — Jonathan Lewis @ 2:31 pm BST Jun 9,2019

A recent post on the ODC General Database forum asked for an explanation of the AWR report values “%Total CPU” and “%Busy CPU” under the “Instance CPU” label, and how the “%Busy CPU “ could be greater than 100%.  Here’s a text reproduction of the relevant sample supplied:

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
2 2 1 0.30 1.23 10.7 5.6 5.3 77.7

Instance CPU

%Total CPU %Busy CPU %DB Time waiting for CPU (Resource Manager)
29.8 133.8 0.0

The answer is probably “It’s 12.1 and it’s a programmer error”.

  • Note that the Host CPU %Idle is not consistent with the three usage figures:  10.7 + 5.6 + 5.3 = 21.6 whereas 100 – 77.7 = 22.3.
  • So let’s run with 22.3% and see what else we can notice: 29.8 / 22.3 = 1.3363 – that’s pretty close (when expressed as a percentage) to 133.8%

Hypothesis:

Someone did the division the wrong way round when trying to work out the percentage of the host’s non-idle CPU that could be attributed to the instance. In this example the “%Busy CPU” should actually report 100 * 22.3 / 29.8 = 74.8%

Note – the difference between 133.8 and 133.63 can be attributed to the fact that the various figures reported in this bit of the AWR are rounded to the nearest 1 decimal place.

Note 2 – I don’t think this error is present in 11.2.0.4 or 12.2.0.1

 

 

 

 

 

May 27, 2019

Re-partitioning 2

Filed under: 12c,Infrastructure,Oracle,Partitioning,Uncategorized — Jonathan Lewis @ 8:20 pm BST May 27,2019

Last week I wrote a note about turning a range-partitioned table into a range/list composite partitioned table using features included in 12.2 of Oracle. But my example was really just an outline of the method and bypassed a number of the little extra problems you’re likely to see in a real-world system, so in this note I’m going to bring in an issue that you might run into – and which I’ve seen appearing a number of times: ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION.

It’s often the case that a system has a partitioned table that’s been around for a long time, and over its lifetime it may have had (real or virtual) columns added, made inivisble, dropped, or mark unused. As a result you may find that the apparent definition of the table is not the same as the real definition of the table – and that’s why Oracle has given us (in 12c) the option to “create table for exchange”.

You might like to read a MoS note giving you one example of a problem with creating an exchange table prior to this new feature. ORA-14097 At Exchange Partition After Adding Column With Default Value (Doc ID 1334763.1) I’ve created a little model by cloning the code from that note.


rem
rem     Script:         pt_exchange_problem.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table mtab (pcol number)
partition by list (pcol) (
        partition p1 values (1),
        partition p2 values (2)
);

alter table mtab add col2 number default 0 not null;

prompt  ========================================
prompt  Traditional creation method => ORA-14097
prompt  ========================================

create table mtab_p2 as select * from mtab where 1=0;
alter table mtab exchange partition P2 with table mtab_p2;

prompt  ===================
prompt  Create for exchange
prompt  ===================

drop table mtab_p2 purge;
create table mtab_p2 for exchange with table mtab;
alter table mtab exchange partition P2 with table mtab_p2;

[/sourcecode}


Here's the output from running this on an instance of 18.3


Table created.

Table altered.

========================================
Traditional creation method => ORA-14097
========================================

Table created.

alter table mtab exchange partition P2 with table mtab_p2
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

===================
Create for exchange
===================

Table dropped.


Table created.


Table altered.

So we don’t have to worry about problems creating an exchange table in Oracle 12c or later. But we do still have a problem if we’re trying to convert our range-partitioned table into a range/list composite partitioned table by doing using the “double-exchange” method. In my simple example I used a “create table” statement to create an empty table that we could exchange into; but without another special version of a “create table” command I won’t be able to create a composite partitioned table that is compatible with the simple table that I want to use as my intermediate table.

Here’s the solution to that problem – first in a thumbnail sketch:

  • create a table for exchange (call it table C)
  • alter table C modify to change it to a composite partitioned table with one subpartition per partition
  • create a table for exchange (call it table E)
  • Use table E to exchange partitions from the original table to the (now-partitioned) table C
  • Split each partition of table C into the specific subpartitions required

And now some code to work through the details – first the code to create and populate the partitioned table.


rem
rem     Script:         pt_comp_from_pt_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

drop table t purge;
drop table pt_range purge;
drop table pt_range_list purge;

-- @@setup

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
	rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

Then some code to create the beginnings of the target composite partitioned table. We create a simple heap table “for exchange”, then modify it to be a composite partitioned table with a named starting partition and high_value and a template defining a single subpartition then, as a variant on the example from last week, specifying interval partitioning.


prompt	==========================================
prompt	First nice feature - "create for exchange"
prompt	==========================================

create table pt_range_list for exchange with table pt_range;

prompt	============================================
prompt	Now alter the table to composite partitioned
prompt	============================================

alter table pt_range_list modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_def      values(default)
)
(
	partition p200 values less than (200)
)
;

If you want to do the conversion from range partitioning to interval partitioning you will have to check very carefully that your original table will be able to convert safely – which means you’ll need to check that the “high_value” values for the partitions are properly spaced to match the interval you’ve defined and (as a special requirement for the conversion) there are no omissions from the current list of high values. If your original table doesn’t match these requirement exactly you may end up trying to exchange data into a partition where it doesn’t belong; for example, if my original table had partitions with high value of 200, 600, 800 then there may be values in the 200-399 range currently stored in the original “600” range partition which shouldn’t go into the new “600” interval partition. You may find you have to split (and/or merge) a few partitions in your range-partitioned table before you can do the main conversion.

Now we create create the table that we’ll actually use for the exchange and go through each exchange in turn. Because I’ve got an explicitly named starting partition the first exchange takes only two steps – exchange out, exchange in. But because I’m using interval partitioning in the composite partitioned table I’m doing a “lock partition” before the second exchange on all the other partitions as this will bring the required target partition into existence. I’m also using the “[sub]partition for()” syntax to identify the pairs of [sub]partitions – this isn’t necessary for the original range-partitioned table, of course, but it’s the only way I can identify the generated subpartitions that will appear in the composite partitioned table.


create table t for exchange with table pt_range;

prompt	=======================================================================
prompt	Double exchange to move a partition to become a composite subpartition
prompt	Could drive this programatically by picking one row from each partition
prompt	=======================================================================

alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition for (399) with table t;
lock  table pt_range_list partition for (399) in exclusive mode;
alter table pt_range_list exchange subpartition for (399,'0') with table t;

alter table pt_range exchange partition for (599) with table t;
lock  table pt_range_list partition for (599) in exclusive mode;
alter table pt_range_list exchange subpartition for (599,'0') with table t;

prompt	=====================================
prompt	Show that we've got the data in place
prompt	=====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

Now that the data is in the target table we can split each default subpartition into the four subpartitions that we want for each partition. To cater for the future, though, I’ve first modified the subpartition template so that each new partition will have four subpartitions (though the naming convention won’t be applied, of course, Oracle will generate system name for all new partitions and subpartitions).


prompt  ================================================
prompt  Change the subpartition template to what we want
prompt  ================================================

alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  ====================================================
prompt  Second nice feature - multiple splits in one command
prompt  Again, first split is fixed name.
prompt  We could do this online after allowing the users in
prompt  ====================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition for (399,'0')
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition for (599,'0')
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Finally a little demonstration that we can’t add an explicitly named partition to the interval partitioned table; then we insert a row to generate the partition and show that it has 4 subpartitions.

Finishing off we rename everything (though that’s a fairly pointless exercise).


prompt  ==============================================================
prompt  Could try adding a partition to show it uses the new template
prompt  But that's not allowed for interval partitions: "ORA-14760:"
prompt  ADD PARTITION is not permitted on Interval partitioned objects
prompt  So insert a value that would go into the next (800) partition
prompt  ==============================================================

alter table pt_range_list add partition p800 values less than (800);

insert into pt_range_list (
        id, grp, small_vc, padding
)
values ( 
        799, '0', lpad(799,10,'0'), rpad('x',100,'x')
)
;

commit;

prompt  ===================================================
prompt  Template naming is not used for the subpartitions,
prompt  so we have to use the "subpartition for()" strategy 
prompt  ===================================================

alter table pt_range_list rename subpartition for (799,'0') to p800_p_0;
alter table pt_range_list rename subpartition for (799,'1') to p800_p_1;
alter table pt_range_list rename subpartition for (799,'2') to p800_p_2;
alter table pt_range_list rename subpartition for (799,'3') to p800_p_def;

prompt  ==============================================
prompt  Might as well clean up the partition names too
prompt  ==============================================

alter table pt_range_list rename partition for (399) to p400;
alter table pt_range_list rename partition for (599) to p600;
alter table pt_range_list rename partition for (799) to p800;

prompt  =======================================
prompt  Finish off by listing the subpartitions 
prompt  =======================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

It’s worth pointing out that you could do the exchanges (and the splitting and renaming at the same time) through some sort of simple PL/SQL loop – looping through the named partitions in the original table and using a row from the first exchange to drive the lock and second exchange (and splitting and renaming). For exanple something like the following which doesn’t have any of the error-trapping and defensive mechanisms you’d want to use on a production system:



declare
        m_pt_val number;
begin
        for r in (select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position) 
        loop
                execute immediate
                        'alter table pt_range exchange partition ' || r.partition_name ||
                        ' with table t';
        
                select id into m_pt_val from t where rownum = 1;
        
                execute immediate 
                        'lock table pt_range_list partition for (' || m_pt_val || ') in exclusive mode';
        
                execute immediate
                        'alter table pt_range_list exchange subpartition  for (' || m_pt_val || ',0)' ||
                        ' with table t';
        
        end loop;
end;
/

If you do go for a programmed loop you have to be really careful to consider what could go wrong at each step of the loop and how your program is going to report (and possibly attempt to recover) the situation. This is definitely a case where you don’t want code with “when others then null” appearing anywhere, and don’t be tempted to include code to truncate the exchange table.

 

May 23, 2019

Re-partitioning

Filed under: 12c,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 11:45 am BST May 23,2019

I wrote a short note a little while ago demonstrating how flexible Oracle 12.2 can be about physically rebuilding a table online to introduce or change the partitioning while discarding data, and so on.  But what do you do (as a recent question on ODC asked) if you want to upgrade a customer’s database to meet the requirements of a new release of your application by changing a partitioned table into a composite partitioned table and don’t have enough room to do an online rebuild. Which could require two copies of the data to exist at the same time.)

If you’ve got the down time (and not necessarily a lot is needed) you can fall back on “traditional methods” with some 12c enhancements. Let’s start with a range partitioned table:


rem
rem     Script:         pt_comp_from_pt.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600
;

commit;

So we’ve got a range-partitioned table with three partitions and some data in each partition. Let’s pretend we want to change this to range/list with the grp column as the subpartition key, allowing explicit use of values 0,1,2 and a bucket subpartition for anything else. First we create an empty version of the table with a suitable subpartition template, and a simple heap table to be used as an exchange medium:


create table pt_range_list (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id)
subpartition by list (grp)
subpartition template (
        subpartition p_def      values(default)
)
(
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

prompt  ===============================================
prompt  First nice 12.2 feature - "create for exchange"
prompt  ===============================================

create table t for exchange with table pt_range;

You’ll notice that our subpartition template identifies just a single subpartition that takes default values – i.e. anything for which no explicit subpartition has been identified. This means we have a one to one correspondance between the data segments of the original table and the copy table. So now we go through a tedious loop (which we could code up with a PL/SQL “execute immediate” approach) to do a double-exchange for each partition in turn. (Any PL/SQL code is left as an exercise to the interested reader.)


alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition p400 with table t;
alter table pt_range_list exchange subpartition p400_p_def with table t;

alter table pt_range exchange partition p600 with table t;
alter table pt_range_list exchange subpartition p600_p_def with table t;

prompt  =====================================
prompt  Show that we've got the data in place
prompt  =====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;


PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_DEF                    200

P400                   P400_P_DEF                    200

P600                   P600_P_DEF                    200


3 rows selected.

We now have to split the newly arrived subpartitions into the 4 pieces we want – but before we do that let’s make sure that any new partitions automatically have the correct subpartitions by changing the subpartition template:


alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  =========================================================
prompt  Second nice 12.2 feature - multiple splits in one command
prompt  We could do this online after allowing the users back on.
prompt  =========================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition p400_p_def
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition p600_p_def
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Now, just to check that everything is behaving, let’s add a new partition, and check to see what partitions and subpartitions we end up with:


alter table pt_range_list add partition p800 values less than (800);

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_0                      100
                       P200_P_1                      100
                       P200_P_2                        0
                       P200_P_DEF                      0

P400                   P400_P_0                      100
                       P400_P_1                      100
                       P400_P_2                        0
                       P400_P_DEF                      0

P600                   P600_P_0                      100
                       P600_P_1                      100
                       P600_P_2                        0
                       P600_P_DEF                      0

P800                   P800_P_0                        0
                       P800_P_1                        0
                       P800_P_2                        0
                       P800_P_DEF                      0


16 rows selected.

And as a final note – if we decide we want to put it all back we could merge four subpartitions down to one subpartition with a single command – then loop through every partition in turn:


alter table pt_range_list
        merge subpartitions  p200_p_0, p200_p_1, p200_p_2, p200_p_def
        into  subpartition  p200_p_def
;

And now I feel like I’m turning into Tim Hall – writing potentially useful demonstrations instead of trying to baffle people with rocket science. But I hope to get over that quite soon. Of course I have left out some important “real-world” details – particularly how you choose to handle indexes while doing the double-exchange. My view would be to take the local indexes with you on the exchange, bypass the global indexes on the exchange out, and be choosy about which global indexes to maintain on the exchange back in; but it all depends on how much downtime you have, how many indexes there are, and the state they’re likely to start or end in.

As ever it’s possible to start with a simple idea like this, then discover there are real-world complications that have to be dealt with. So there’s another article in the pipeline to handle a slightly more complex case. I’ll also be publishing a short note about the easy way of getting the job done from 18c onwards – if you’ve got the spare resources.

 

May 22, 2019

Danger – Hints

Filed under: 12c,Bugs,Hints,Oracle — Jonathan Lewis @ 2:56 pm BST May 22,2019

It shouldn’t be possible to get the wrong results by using a hint – but hints are dangerous and the threat may be there if you don’t know exactly what a hint is supposed to do (and don’t check very carefully what has happened when you’ve used one that you’re not familiar with).

This post was inspired by a blog note from Connor McDonald titled “Being Generous to the Optimizer”. In his note Connor gives an example where the use of “flexible” SQL results in an execution plan that is always expensive to run when a more complex version of the query could produce a “conditional” plan which could be efficient some of the time and would be expensive only when there was no alternative. In his example he rewrote the first query below to produce the second query:


select data
from   address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select  data
from    address
where   ( :choice = 1 and street = :val )
union all
select  data
from    address
where   ( :choice = 2 and suburb = :val );

(We are assuming that bind variable :choice is constrained to be 1 or 2 and no other value.)

In its initial form the optimizer had to choose a tablescan for the query, in its final form the query can select which half of a UNION ALL plan to execute because the optimizer inserts a pair of FILTER operations that check the actual value of :choice at run-time.

When I started reading the example my first thought was to wonder why the optimizer hadn’t simply used “OR-expansion” (or concatenation if you’re running an older version), then I remembered that by the time the optimizer really gets going it has forgotten that “:choice” is the same bind variable in both cases, so doesn’t realise that it would use only one of two possible predicates. However, that doesn’t mean you can’t tell the optimizer to use concatenation. Here’s a model – modified slightly from Connor’s original:


rem
rem     Script:         122_or_expand_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem
rem     Last tested 
rem             19.3.0.0
rem

drop table address purge;
create table address ( street int, suburb int, post_code int,  data char(100));

insert into address
select mod(rownum,1e4), mod(rownum,10), mod(rownum,1e2), rownum
from dual connect by level  <= 1e5 -- > comment to avoid WordPress format issue
;

commit;

exec dbms_stats.gather_table_stats('','ADDRESS')

create index ix1 on address ( street );
create index ix2 on address ( suburb );
create index ix3 on address ( post_code );

variable val number = 6
variable choice number = 1

alter session set statistics_level = all;
set serveroutput off
set linesize 180
set pagesize 60

select
        /*+ or_expand(@sel$1) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

I’ve added one more column to the table and indexed it – I’ll explain why later. I’ve also modified the query to show the output but restricted the result set to a count of the data column rather than a (long) list of rows.

Here’s the execution plan output when hinted:


SQL_ID  6zsh2w6d9mddy, child number 0
-------------------------------------
select  /*+ or_expand(@sel$1) */  count(data) from    address where  (
:choice = 1 and street = :val ) or     ( :choice = 2 and suburb = :val )

Plan hash value: 3986461375

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |      12 |     27 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |      12 |     27 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |     10 |00:00:00.01 |      12 |     27 |
|   3 |    UNION-ALL                            |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|*  4 |     FILTER                              |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      1 |     10 |     10 |00:00:00.01 |      12 |     27 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      1 |     10 |     10 |00:00:00.01 |       2 |     27 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |      0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

As you can see we have a UNION ALL plan with two FILTER operations, and the filter operations allow one or other of the two branches of the UNION ALL to execute depending on the value for :choice. Since I’ve reported the rowsource execution statistics you can also see that the table access through index range scan (operations 5 and 6) has executed once (Starts = 1) but the tablescan (operation 8) has not been executed at all.

If you check the Predicate Information you will see that operation 8 has introduced two lnnvl() predicates. Since the optimizer has lost sight of the fact that :choice is the same variable in both cases it has to assume that sometimes both branches will be relevant for a single execution, so it has to add predicates to the second branch to eliminate data that might have been found in the first branch. This is the (small) penalty we pay for avoiding a “fully-informed” manual rewrite.

Take a look at the Outline Data – we can see our or_expand() hint repeated there, and we can discover that it’s been enhanced. The hint should have been or_expand(@sel$1 (1) (2)). This might prompt you to modify the original SQL to use the fully qualified hint rather than the bare-bones form we’ve got so far. So let’s assume we do that before shipping the code to production.

Now imagine that a couple of months later an enhancement request appears to allow queries on post_code and the front-end has been set up so that we can specify a post_code query by selecting choice number 3. The developer who happens to pick up the change request duly modifies the SQL as follows:


select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
or     ( :choice = 3 and post_code = :val)
;

Note that we’ve got the “complete” hint in place, but there’s now a 3rd predicate. Do you think the hint is still complete ? What do you think will happen when we run the query ? Here’s the execution plan when I set :choice to 3.


select  /*+ or_expand(@sel$1 (1) (2)) */  count(data) from    address
where  ( :choice = 1 and street = :val ) or     ( :choice = 2 and
suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 3986461375

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |      0 |00:00:00.01 |
|   3 |    UNION-ALL                            |                 |      1 |        |      0 |00:00:00.01 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

We get a UNION ALL with two branches, one for :choice = 1, one for :choice = 2 and both of them show zero starts – and we don’t have any part of the plan to handle :choice = 3. The query returns no rows – and if you check the table creation code you’ll see it should have returned 1000 rows. An incorrect (historically adequate) hint has given us wrong results.

If we want the full hint for this new queryy we need to specify the 3rd predicate, by adding (3) to the existing hint to get the following plan (and correct results):


select  /*+ or_expand(@sel$1 (1) (2) (3)) */  count(data) from
address where  ( :choice = 1 and street = :val ) or     ( :choice = 2
and suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 2153173029

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |    1639 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |    1639 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  11009 |   1000 |00:00:00.01 |    1639 |
|   3 |    UNION-ALL                            |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |
|*  9 |     FILTER                              |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|* 10 |      TABLE ACCESS FULL                  | ADDRESS         |      1 |    999 |   1000 |00:00:00.01 |    1639 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      OUTLINE_LEAF(@"SET$49E1C21B_2")
      OUTLINE_LEAF(@"SET$49E1C21B_1")
      OUTLINE_LEAF(@"SET$49E1C21B")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1")
      FULL(@"SET$49E1C21B_2" "ADDRESS"@"SET$49E1C21B_2")
      FULL(@"SET$49E1C21B_3" "ADDRESS"@"SET$49E1C21B_3")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))
   9 - filter(:CHOICE=3)
  10 - filter(("POST_CODE"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL)) AND (LNNVL(:CHOICE=2) OR
              LNNVL("SUBURB"=:VAL))))


We now have three branches to the UNION ALL, and the final branch (:choice =3) ran to show A-rows = 1000 selected in the tablescan.

Conclusion

You shouldn’t mess about with hints unless you’re very confident that you know how they work and then test extremely carefully – especially if you’re modifying old code that already contains some hints.

Note that in comment #2 below Nigel Bayliss has supplued a bug number for this – so if you think OR-expansion will help you, keep an eye open on MOS for patches relating to this bug.

 

March 22, 2019

Stats advisor

Filed under: 12c,Oracle,Statistics — Jonathan Lewis @ 1:10 pm GMT Mar 22,2019

This is just a little shout-out about the Stats Advisor – if you decide to give it a go, what sort of things is it likely to tell you. The answer is in a dynamic performance view called v$stats_advisor_rules – which I’ve list below from an instance running 18.3.0.0.


SQL> set linesize 180
SQL> set trimspool on
SQL> set pagesize 40
SQL> column description format a75
SQL> column name format a32
SQL> break on rule_type duplicate skip 1
SQL> select * from v$stats_advisor_rules;

  RULE_ID NAME                             RULE_TYPE DESCRIPTION                                                                     CON_ID
---------- -------------------------------- --------- --------------------------------------------------------------------------- ----------
         0                                  SYSTEM                                                                                         0
         1 UseAutoJob                       SYSTEM    Use Auto Job for Statistics Collection                                               0
         2 CompleteAutoJob                  SYSTEM    Auto Statistics Gather Job should complete successfully                              0
         3 MaintainStatsHistory             SYSTEM    Maintain Statistics History                                                          0
         4 UseConcurrent                    SYSTEM    Use Concurrent preference for Statistics Collection                                  0
         5 UseDefaultPreference             SYSTEM    Use Default Preference for Stats Collection                                          0
         6 TurnOnSQLPlanDirective           SYSTEM    SQL Plan Directives should not be disabled                                           0

         7 AvoidSetProcedures               OPERATION Avoid Set Statistics Procedures                                                      0
         8 UseDefaultParams                 OPERATION Use Default Parameters in Statistics Collection Procedures                           0
         9 UseGatherSchemaStats             OPERATION Use gather_schema_stats procedure                                                    0
        10 AvoidInefficientStatsOprSeq      OPERATION Avoid inefficient statistics operation sequences                                     0

        11 AvoidUnnecessaryStatsCollection  OBJECT    Avoid unnecessary statistics collection                                              0
        12 AvoidStaleStats                  OBJECT    Avoid objects with stale or no statistics                                            0
        13 GatherStatsAfterBulkDML          OBJECT    Do not gather statistics right before bulk DML                                       0
        14 LockVolatileTable                OBJECT    Statistics for objects with volatile data should be locked                           0
        15 UnlockNonVolatileTable           OBJECT    Statistics for objects with non-volatile should not be locked                        0
        16 MaintainStatsConsistency         OBJECT    Statistics of dependent objects should be consistent                                 0
        17 AvoidDropRecreate                OBJECT    Avoid drop and recreate object seqauences                                            0
        18 UseIncremental                   OBJECT    Statistics should be maintained incrementally when it is beneficial                  0
        19 NotUseIncremental                OBJECT    Statistics should not be maintained incrementally when it is not beneficial          0
        20 AvoidOutOfRange                  OBJECT    Avoid Out of Range Histogram endpoints                                               0
        21 UseAutoDegree                    OBJECT    Use Auto Degree for statistics collection                                            0
        22 UseDefaultObjectPreference       OBJECT    Use Default Object Preference for statistics collection                              0
        23 AvoidAnalyzeTable                OBJECT    Avoid using analyze table commands for statistics collection                         0

24 rows selected.

As you can see the rules fall into three groups: system, operation, and object – and you can’t help noticing at all three levels how commonly the theme is: “just stick with the defaults!”.

As so often happens when I start writing a catch-up or “remind myself” not I found that Tim Hall has already written all about it.

March 6, 2019

12c Snapshots

Filed under: 12c,Oracle,Partitioning,Performance — Jonathan Lewis @ 10:35 am GMT Mar 6,2019

I published a note a few years ago about using the 12c “with function” mechanism for writing simple SQL statements to takes deltas of dynamic performance views. The example I supplied was for v$event_histogram but I’ve just been prompted by a question on ODC to supply a couple more – v$session_event and v$sesstat (joined to v$statname) so that you can use one session to get an idea of the work done and time spent by another session – the first script reports wait time:


rem
rem     Program:        12c_with_function_2.sql
rem     Dated:          July 2013
rem
rem     See also
rem     12c_with_function.sql
rem     https://jonathanlewis.wordpress.com/2013/06/30/12c-fun/
rem
rem     Notes:
rem             Reports session WAIT time
rem             Modify the list of SIDs of interest
rem             Set the time in seconds
rem

define m_snap_time = 60
define m_sid_list  = '3, 4, 121, 127'

set timing on
set sqlterminator off

set linesize 180

break on sid skip 1

with
        function wait_row (
                i_secs  number, 
                i_return        number
        ) return number
        is
        begin
                dbms_lock.sleep(i_secs);
                return i_return;
        end;
select
        sid, 
        sum(total_waits),
        sum(total_timeouts), 
        sum(time_waited), 
        event
from    (
        select
                sid, event_id, 
                -total_waits total_waits, 
                -total_timeouts total_timeouts, 
                -time_waited time_waited, 
                -time_waited_micro time_waited_micro, 
                event
        from    v$session_event
        where   sid in ( &m_sid_list )
        union all
        select
                null, null, null, null, null, wait_row(&m_snap_time, 0), null
        from    dual
        union all
        select
                sid, event_id, total_waits, total_timeouts, time_waited, time_waited_micro, event
        from    v$session_event
        where   sid in ( &m_sid_list )
        )
where
        time_waited_micro != 0
group by
        sid, event_id, event
having
        sum(time_waited) != 0
order by
        sid, sum(time_waited) desc
/


And this one reports session activity:

rem
rem     Program:        12c_with_function_3.sql
rem     Dated:          July 2013
rem
rem     See also
rem     12c_with_function.sql
rem     https://jonathanlewis.wordpress.com/2013/06/30/12c-fun/
rem
rem     Notes:
rem             Reports session stats
rem             Modify the list of SIDs of interest
rem             Set the time in seconds
rem

define m_snap_time = 60
define m_sid_list  = '3, 4, 13, 357'


set timing on
set sqlterminator off

set linesize 180

break on sid skip 1
column name format a64

with
        function wait_row (
                i_secs  number, 
                i_return        number
        ) return number
        is
        begin
                dbms_lock.sleep(i_secs);
                return i_return;
        end;
select
        sid, 
        name,
        sum(value)
from    (
        select
                ss.sid, 
                ss.statistic#,
                sn.name,
                -ss.value value
        from
                v$sesstat       ss,
                v$statname      sn
        where   ss.sid in ( &m_sid_list )
        and     sn.statistic# = ss.statistic#
        union all
        select
                null, null, null, wait_row(&m_snap_time, 0)
        from    dual
        union all
        select
                ss.sid, ss.statistic#, sn.name, ss.value value
        from
                v$sesstat       ss,
                v$statname      sn
        where   ss.sid in ( &m_sid_list )
        and     sn.statistic# = ss.statistic#
        )
where
        value != 0
group by
        sid, statistic#, name
having
        sum(value) != 0
order by
        sid, statistic#
/


You’ll notice that I’ve used dbms_lock.sleep() in my wait function – and the session running the SQL can be granted the execute privilege on the package through a role to make this work – but if you’re running Oracle 18 then you’ve probably noticed that the sleep() function and procedure have been copied to the dbms_session package.

 

November 15, 2018

num_index_keys

Filed under: 12c,Bugs,CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:13 pm GMT Nov 15,2018

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block, the primary key index is (c1, c2, c3, c4), there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t known whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems, it’s not entirely helpful to have the generated plan from explain plan in one version and a run-time plan with stats in the other – given the choice I’d like to see the run-time from both. Despite this, I felt fairly confident that the prediction would match the run-time for 12c and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values  ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for 6 values of c3, while 10g is going to probe the index 18 times (3 combinations of c2 x six combinations of c3) to find “nearly unique” rows which means probably one leaf block per probe.

The 12c plan was taking minutes to run, the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > hint to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > hint to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

alter session set statistics_level = all;
set serveroutput off

The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, if the plan is possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen ? (Technically I’ve included a hint to use the index, and specified the query block name to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for costs which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – do they switch at the same point ? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values –

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.

 

October 28, 2018

Upgrades – again

Filed under: 12c,Histograms,Oracle,Statistics,Upgrades — Jonathan Lewis @ 12:39 pm GMT Oct 28,2018

I’ve got a data set which I’ve recreated in 11.2.0.4 and 12.2.0.1.

I’ve generated stats on the data set, and the stats are identical.

I don’t have any indexes or extended stats, or SQL Plan directives or SQL Plan Profiles, or SQL Plan Baselines, or SQL Patches to worry about.

I’m joining two tables, and the join column on one table has a frequency histogram while the join column on the other table has a height-balanced histogram.  The histograms were created with estimate_percent => 100%. (which explains why I’ve got a height-balanced histogram in 12c rather than a hybrid histogram.)

Here are the two execution plans, 11.2.0.4 first, pulled from memory by dbms_xplan.display_cursor():


SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      12 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      12 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1855 |   1327 |00:00:00.01 |      12 |  2440K|  2440K| 1357K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       6 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")



SQL_ID	f8wj7karu0hhs, child number 0
-------------------------------------
select	       count(*) from	     t1, t2 where	  t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation	    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |	OMem |	1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |	   |	  1 |	     |	    1 |00:00:00.01 |	  41 |	     |	     |		|
|   1 |  SORT AGGREGATE     |	   |	  1 |	   1 |	    1 |00:00:00.01 |	  41 |	     |	     |		|
|*  2 |   HASH JOIN	    |	   |	  1 |	1893 |	 1327 |00:00:00.01 |	  41 |	2545K|	2545K| 1367K (0)|
|   3 |    TABLE ACCESS FULL| T1   |	  1 |	 100 |	  100 |00:00:00.01 |	   7 |	     |	     |		|
|   4 |    TABLE ACCESS FULL| T2   |	  1 |	 800 |	  800 |00:00:00.01 |	   7 |	     |	     |		|
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

The key point is the the difference between the two cardinality estimates. Why has that appeared, and what might the optimizer do in a more complex plan when a cardinality estimates changes?

The difference is only 2% but that was on a couple of data sets I just happened to run up to check something completely different, I wasn’t trying to break something, so who know how big the variation can get. Of course if you’re switching from 11g to 12c then Oracle (Corp.) expects you to be using auto_sample_size anyway so you shouldn’t be producing height-balanced histograms.

So does this difference really matter? Maybe not, but if you (like many sites I’ve seen) are still using fixed percentage sample sizes and are generating histograms it’s another reason (on top of the usual instability effects of height-balanced and hybrid histograms) why you might see plans change as you upgrade from 11g to 12c.

Footnote

It looks as if the difference comes mostly from a coding error in 11g that has been fixed in 12c – I couldn’t find an official bug or fix_control that matched, though. More on that later in the week.

Update

Chinar Aliyev has pointed out that there are three fix-controls that may be associated with this (and other ) changes. From v$system_fix_control these are:

14033181 1 QKSFM_CARDINALITY_14033181   correct ndv for non-popular values in join cardinality comp.         (12.1.0.1)
19230097 1 QKSFM_CARDINALITY_19230097   correct join card when popular value compared to non popular         (12.2.0.1)
22159570 1 QKSFM_CARDINALITY_22159570   correct non-popular region cardinality for hybrid histogram          (12.2.0.1)

I haven’t tested them yet, but with the code easily available in the article it won’t take long to see what the effects are when I have a few minutes. The first fix may also be why I had a final small discrepancy between 11g and 12c on the join on two columns with frequency histograms.

October 23, 2018

Upgrade threat

Filed under: 12c,18c,Histograms,Oracle,Statistics,Upgrades — Jonathan Lewis @ 7:50 pm BST Oct 23,2018

Here’s one I’ve just discovered while trying to build a reproducible test case – that didn’t reproduce because an internal algorithm has changed.

If you upgrade from 12c to 18c and have a number of hybrid histograms in place you may find that some execution plans change because of a change in the algorithm for producing hybrid histograms (and that’s not just if you happen to get the patch that fixes the top-frequency/hybrid bug relating to high values).

Here’s a little test to demonstrate how I wasted a couple of hours trying to solve the wrong problem – first a simple data set:


rem
rem     Script:         18c_histogram_upgrade.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem 

drop table t2 purge;

execute dbms_random.seed(0)

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

I’ve created a skewed data set which (we will see) has 22 distinct values and created a histogram of 13 buckets on it. This will be a hybrid histogram – but different versions of Oracle will produce different histograms (even though the data set is the same for both versions):


select
        j2, count(*)
from
        t2
group by
        j2
order by
        j2
;

select
        endpoint_value                                                            value,
        endpoint_number,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_size,
        endpoint_repeat_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
;

Here’s the dataset from 12.2.0.1 and 18.3.0.0


        J2   COUNT(*)
---------- ----------
         1          1
         8          3
         9          1
        10          5
        11          4
        12          8
        13         14
        14          9
        15         11
        16         22
        17         34
        18         31
        19         36
        20         57
        21         44
        22         45
        23         72
        24         70
        25         87
        26        109
        27         96
        28         41

22 rows selected.



And here are the histograms - 12.2.0.1 then 18.3.0.0:



     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        18             143          31                    31
        19             179          36                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        28             800         137                    41

13 rows selected.

     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        19             179          67                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        27             759          96                    96
        28             800          41                    41

13 rows selected.

Both histograms have 13 buckets as requested; both are hybrid histograms as expected.

But why does 12c have the value 18 when 18c doesn’t, and why does 18c have the value 27 when 12c doesn’t ?

That’s the second time in two weeks I’ve had reproducible test cases not reproducing – thanks to an 18c upgrade.

Update (See comments)

I had completely forgotten that a previous defect in the construction of hybrid (and Top-N) histograms had been addressed in 18.3 but needed a fix in 12.2 and a backport patch in 12.1.0.2.

Since the defect could “lose” a popular value in order to ensure that both the low and high values were captured in the histogram it’s not surprising that a fix could result in one of the popular values in a histogram dissappearing (after the upgrade) even when the gather had used a 100% sample. Quite possibly the algorithm used to ensure the presence of the high value has had a cascading effect down the histogram that can affect which popular values get into the histogram with repeat counts.

I think I’m going to have to grit my teeth and patch a 12.1.0.2, or update a 12.2.0.1 with exactly the right patch-set to find out.

[It has now been confirmed by Nigel Bayliss that this is a side effect of the fix to the bug 25994960]

October 10, 2018

Hybrid Fake

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 3:12 pm BST Oct 10,2018

Oracle 12c introduced the “Hybrid” histogram – a nice addition to the available options and one that (ignoring the bug for which a patch has been created) supplies the optimizer with better information about the data than the equivalent height-balanced histogram. There is still a problem, though, in the trade-off between accuracy and speed: just as it does with height-balanced histograms when using auto_sample_size Oracle samples (typically) about 5,500 rows to create a hybrid histogram, and the SQL it uses to generate the necessary summary is essentially an aggregation of the sample, so either you have a small sample with the risk of lower accuracy or a large sample with an increase in workload. This being the case it’s worth knowing how to create a hybrid histogram using the dbms_stats.set_column_stats() API.

It’s fairly easy to identify the cases where a hybrid histogram could be helpful.  You have a large volume of data spread over a large number (more than 2048) of distinct values, but a few values (typically less than 250) which are responsible for a significant fraction of the data. You would like to tell Oracle about the special “extreme” cases so that the optimizer can take defensive if you query for one of those values, but at the same time you would like to give Oracle a picture of the way the rest of the data is distributed. This is similar in some respects to the Top-N (a.k.a. Top-Frequency) histogram which says to Oracle “We have a small number of popular values, and some odds and ends on the side that are pretty ignorable”, the critical difference is that you need the hybrid histogram when it’s not safe to “ignore” the odds and ends.

Here’s an example of creating some data and then generating a completely artificial hybrid histogram. The code demonstrates 3 points – the principle feature of creating hybrid histograms and a couple of generic details about Oracle’s histograms:

  • The main point is that Oracle 12c introduces a new numeric array in the dbms_stats.statrec structure. This allows each row (bucket) in a histogram to hold a second statistic about the bucket so we can now store a frequency figure for the bucket as a whole, and a “repeat-count” figure for the highest value in the bucket. (Warning – there is a counter-intuitive conflict between the name of the new structure and the way it is used for hybrid histograms).
  • As side-point I’ve included a code variation that shows you the remarkable similarity between generating a Frequency histogram and a Hybrid histogram.
  • As a second side-point I have also highlighted the effect you see in the dba_tab_histograms view when your popular values are “too similar” to each other – i.e. when they match on the first 6 characters.

We start by creating a table as a copy of the view all_objects – then we’re going to create a hybrid histogram on the object_type column that looks nothing like the  data. The histogram will say:

  • for every 15,000 rows (where the column is not null)
    • 5,000 will have values less than or equal to ‘C’, of which 3,000 will have the value ‘C’
    • The next 2,000 (i.e. running total 7,000) will have values greater than ‘C’ and up to ‘PPPPPP1’, but ‘PPPPPP1’ itself is not a popular value
    • The next 2,000 (i.e. running total 9,000) will have values greater than ‘PPPPPP1’ and up to ‘PPPPPP2’, but ‘PPPPPP2’ itself is not a popular value
    • The next 2,000 (i.e. running total 11,000) will have values greater than ‘PPPPPP2’ and up to ‘PPPPPP3’, but ‘PPPPPP3’ itself is not a popular value
    • The last 4,000 (i.e. running total 15,000) will have values greater than ‘PPPPPP3’ and up to ‘X’ of which 3,000 will have the value ‘X’

Note particularly that the “how many rows hold the endpoint value” are stored in the statrec.bkvals array – just as they would be for a frequency histogram – and the cumulative count of rows is stored in the statrec.rpcnts structure. All we have to do to create a frequency histogram instead of a hybrid histogram is to store zeros in the statrec.rpcnts structure, or leave it uninitialized.

You’ll notice that since I’m creating a histogram on a character column I’ve used an array of type dbms_stats.chararray to hold the list of values (in ascending order) that I want the histogram to describe.


rem
rem     Script:         12c_hybrid_histogram_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2018
rem 

create table t1
as
select * from all_objects
;

begin
        dbms_stats.gather_table_stats(
                ownname         => null,
                tabname         => 't1',
                method_opt      => 'for all columns size 1'
        );
end;
/

declare
                c_array         dbms_stats.chararray;
                m_rec           dbms_stats.statrec;
                m_distcnt       number;
                m_density       number;
                m_nullcnt       number;
                m_avgclen       number;

begin
        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 'T1',
                colname         => 'OBJECT_TYPE', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => m_rec,
                avgclen         => m_avgclen
        );

        m_rec.epc    := 5;

        c_array      := dbms_stats.chararray( 'C',  'PPPPPP1',  'PPPPPP2',  'PPPPPP3',   'X');
        m_rec.bkvals := dbms_stats.numarray (3000,          1,          1,          1,  3000);

        m_rec.rpcnts := dbms_stats.numarray (5000,       7000,       9000,      11000, 15000);
--      m_rec.rpcnts := dbms_stats.numarray (0000,       0000,       0000,       0000, 00000);

        dbms_stats.prepare_column_values(m_rec, c_array);

        dbms_stats.set_column_stats(
                ownname         => user,
                tabname         => 'T1',
                colname         => 'OBJECT_TYPE', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => m_rec,
                avgclen         => m_avgclen
        ); 
end;
/

That’s it – it’s remarkably simple. To show the effect of running this code I can report the content of user_tab_histograms for the column. I’ve actually run the code and queried the results twice; first for the case where I created the hybrid histogram and then after modifying the PL/SQL block to set the rpcnts array to zeros to create a frequency histogram.


column endpoint_actual_value format a22
column endpoint_value        format 999,999,999,999,999,999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value, endpoint_actual_value, endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'OBJECT_TYPE'
order by
        endpoint_value
;

With non-zero rpcnts (hybrid histogram)
=======================================
ENDPOINT_NUMBER                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
--------------- ------------------------------------------------ ---------------------- ---------------------
           3000  347,883,889,521,833,000,000,000,000,000,000,000 C                                       3000
           7000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP1                                    1
           9000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP2                                    1
          11000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP3                                    1
          15000  456,922,123,551,065,000,000,000,000,000,000,000 X                                       3000


With rpcnts set to zero (frequency histogram)
=============================================
ENDPOINT_NUMBER                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
--------------- ------------------------------------------------ ---------------------- ---------------------
           3000  347,883,889,521,833,000,000,000,000,000,000,000 C                                          0
           3001  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP1                                    0
           3002  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP2                                    0
           3003  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP3                                    0
           6003  456,922,123,551,065,000,000,000,000,000,000,000 X                                          0

I made a comment earlier on that the naming and use of the rpcnts structure was somewhat counter-intuitive. As you can see in the results above, when I created the hybrid histogram the values I stored in the rpcnts structure are not the values reported as the “repeat count”, the numbers reported as the “repeat count” are from the bkvals (bucket values).  As far as I’m concerned this means I have to go back to my basic examples every time I want to fake a histogram because I’m never too sure which arrays I should populate with what values – and whether I should use absolute or cumulative values.

One last minor point: you’ll see that the endpoint_actual_value has been populated in this example. This is because (with Oracle’s interesting transformation from character to numeric) the three ‘PPPPPPx’ character values turn into the same number – so Oracle stores the first 64 bytes (or 32 for versions of Oracle prior to 12c) of the actual value.

 

September 30, 2018

Case Study

Filed under: 12c,Execution plans,Oracle,subqueries,Troubleshooting — Jonathan Lewis @ 7:59 pm BST Sep 30,2018

A question about reading execution plans and optimising queries arrived on the ODC database forum a little while ago; the owner says the following statement is taking 14 minutes to return 30,000 rows and wants some help understanding why.

If you look at the original posting you’ll see that we’ve been given the text of the query and the execution plan including rowsource execution stats. There’s an inconsistency between the supplied information and the question asked, and I’ll get back to that shortly, but to keep this note fairly short I’ve excluded the 2nd half of the query (which is a UNION ALL) because the plan says the first part of the query took 13 minutes and 20 second and the user is worried about a total of 14 minutes.

SELECT /*+ gather_plan_statistics*/ DISTINCT
                rct.org_id,
                hzp.party_name,
                hca.account_number,
                rct.interface_header_attribute1 order_number,
                rct.customer_trx_id,
                rct.trx_number,
                rct.trx_date,
                rctd.gl_date,
                rct.creation_date,
                rctl.line_number,
                rct.invoice_currency_code inv_currency,
                (
                       SELECT SUM (rct_1.extended_amount)
                       FROM   apps.ra_customer_trx_lines_all rct_1
                       WHERE  rct_1.customer_trx_id = rct.customer_trx_id
                       AND    rct_1.line_type = 'LINE') inv_net_amount,
                (
                       SELECT SUM (rct_2.extended_amount)
                       FROM   apps.ra_customer_trx_lines_all rct_2
                       WHERE  rct_2.customer_trx_id = rct.customer_trx_id
                       AND    rct_2.line_type = 'TAX') inv_tax_amount,
                (
                       SELECT SUM (rct_3.extended_amount)
                       FROM   apps.ra_customer_trx_lines_all rct_3
                       WHERE  rct_3.customer_trx_id = rct.customer_trx_id) inv_gross_amount,
                gll.currency_code                                    func_currency,
                Round((
                        (
                        SELECT SUM (rct_4.extended_amount)
                        FROM   apps.ra_customer_trx_lines_all rct_4
                        WHERE  rct_4.customer_trx_id = rct.customer_trx_id
                        AND    rct_4.line_type = 'LINE')*gdr.conversion_rate),2) func_net_amount,
                Round((
                        (
                        SELECT SUM (rct_5.extended_amount)
                        FROM   apps.ra_customer_trx_lines_all rct_5
                        WHERE  rct_5.customer_trx_id = rct.customer_trx_id
                        AND    rct_5.line_type = 'TAX')*gdr.conversion_rate),2) func_tax_amount,
                Round((
                        (
                        SELECT SUM (rct_6.extended_amount)
                        FROM   apps.ra_customer_trx_lines_all rct_6
                        WHERE  rct_6.customer_trx_id = rct.customer_trx_id)*gdr.conversion_rate),2) func_gross_amount,
                glcc.segment1                                                                 company,
                glcc.segment2                                                                 account,
                hg.geography_name                                                             billing_country,
                gdr.conversion_rate
FROM            apps.hz_parties hzp,
                apps.hz_cust_accounts hca,
                apps.ra_customer_trx_all rct,
                apps.ra_customer_trx_lines_all rctl,
                apps.ra_cust_trx_line_gl_dist_all rctd,
                apps.gl_code_combinations_kfv glcc,
                apps.hz_cust_site_uses_all hcsua,
                apps.hz_cust_acct_sites_all hcasa,
                apps.hz_party_sites hps,
                apps.hz_locations hl,
                apps.hz_geographies hg,
                apps.gl_ledgers gll,
                apps.gl_daily_rates gdr
WHERE           hzp.party_id = hca.party_id
AND             hca.cust_account_id = rct.bill_to_customer_id
AND             hca.cust_account_id = hcasa.cust_account_id
AND             rct.customer_trx_id = rctl.customer_trx_id
AND             rctl.customer_trx_line_id = rctd.customer_trx_line_id
AND             glcc.code_combination_id = rctd.code_combination_id
AND             rct.bill_to_site_use_id = hcsua.site_use_id
AND             hcsua.cust_acct_site_id = hcasa.cust_acct_site_id
AND             hcasa.party_site_id = hps.party_site_id
AND             hps.location_id = hl.location_id
AND             hl.country = hg.country_code
AND             hg.geography_type = 'COUNTRY'
AND             rctl.line_type = 'TAX'
AND             gll.ledger_id = rct.set_of_books_id
AND             gdr.from_currency = rct.invoice_currency_code
AND             gdr.to_currency = gll.currency_code
AND             to_date(gdr.conversion_date) = to_date(rctd.gl_date)
AND             gdr.conversion_type = 'Corporate'
AND             rctd.gl_date BETWEEN To_date ('01-JAN-2018', 'DD-MON-YYYY') AND  To_date ('31-JAN-2018', 'DD-MON-YYYY')
AND             glcc.segment1 = '2600'
AND             glcc.segment2 = '206911'
GROUP BY        hzp.party_name,
                hca.account_number,
                rct.interface_header_attribute1,
                rct.trx_number,
                rct.trx_date,
                rct.creation_date,
                rctl.line_number,
                rctl.unit_selling_price,
                rct.org_id,
                rctd.gl_date,
                rct.customer_trx_id,
                glcc.segment1,
                glcc.segment2,
                hg.geography_name,
                rct.invoice_currency_code,
                gll.currency_code,
                gdr.conversion_rate 

We note that there are six scalar subqueries in the text I’ve reported – and they form two groups of three, and the difference between the two groups is that one group is multiplied by a conversion rate while the other isn’t; moreover in each group the three subqueries are simply querying subsets of the same correlated data set. So it looks as if all 6 scalar subqueries could be eliminated and replaced by the inclusion of an aggregate view in the from clause and the projection of 6 columns from that view.

However, before pursuing that option, take a look at the plan with the rowsource execution stats – where is the time going ?


-----------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                                  | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                                           |                              |      1 |        |    501 |00:13:20.17 |    3579K|  
|   1 |  UNION-ALL                                                 |                              |      1 |        |    501 |00:13:20.17 |    3579K|  
|   2 |   HASH UNIQUE                                              |                              |      1 |      1 |    501 |00:13:20.17 |    3579K|  
|   3 |    HASH GROUP BY                                           |                              |      1 |      1 |  19827 |00:13:20.15 |    3579K|  
|   4 |     NESTED LOOPS                                           |                              |      1 |        |  21808 |00:13:10.26 |    3579K|  
|   5 |      NESTED LOOPS                                          |                              |      1 |      1 |  21808 |00:13:10.11 |    3578K|  
|   6 |       NESTED LOOPS OUTER                                   |                              |      1 |      1 |  21808 |00:13:09.90 |    3576K|  
|   7 |        NESTED LOOPS OUTER                                  |                              |      1 |      1 |  21808 |00:13:09.25 |    3501K|  
|   8 |         NESTED LOOPS OUTER                                 |                              |      1 |      1 |  21808 |00:13:08.48 |    3426K|  
|   9 |          NESTED LOOPS OUTER                                |                              |      1 |      1 |  21808 |00:13:07.66 |    3333K|  
|  10 |           NESTED LOOPS OUTER                               |                              |      1 |      1 |  21808 |00:13:06.92 |    3258K|  
|  11 |            NESTED LOOPS OUTER                              |                              |      1 |      1 |  21808 |00:13:06.08 |    3183K|  
|  12 |             NESTED LOOPS                                   |                              |      1 |      1 |  21808 |00:13:04.69 |    3090K|  
|  13 |              NESTED LOOPS                                  |                              |      1 |      1 |  21808 |00:13:05.75 |    3026K|  
|  14 |               NESTED LOOPS                                 |                              |      1 |      1 |  21808 |00:13:03.30 |    2961K|  
|  15 |                NESTED LOOPS                                |                              |      1 |      1 |  33459 |00:00:04.33 |    1123K|  
|  16 |                 NESTED LOOPS                               |                              |      1 |    351 |  33459 |00:00:03.67 |    1025K|  
|  17 |                  NESTED LOOPS                              |                              |      1 |    351 |  33459 |00:00:03.06 |     926K|  
|  18 |                   NESTED LOOPS                             |                              |      1 |    351 |  33459 |00:00:02.47 |     827K|  
|* 19 |                    HASH JOIN                               |                              |      1 |    351 |  33459 |00:00:01.90 |     730K|  
|  20 |                     TABLE ACCESS FULL                      | GL_LEDGERS                   |      1 |     38 |     39 |00:00:00.01 |      15 |  
|  21 |                     NESTED LOOPS                           |                              |      1 |        |  33459 |00:00:01.75 |     730K|  
|  22 |                      NESTED LOOPS                          |                              |      1 |    351 |  33459 |00:00:01.44 |     696K|  
|  23 |                       NESTED LOOPS                         |                              |      1 |    351 |  33459 |00:00:01.11 |     646K|  
|* 24 |                        HASH JOIN                           |                              |      1 |    385 |  33459 |00:00:00.40 |     526K|  
|* 25 |                         TABLE ACCESS BY INDEX ROWID BATCHED| GL_CODE_COMBINATIONS         |      1 |     35 |      1 |00:00:00.01 |     108 |  
|* 26 |                          INDEX RANGE SCAN                  | GL_CODE_COMBINATIONS_N2      |      1 |    499 |     77 |00:00:00.01 |       3 |  
|* 27 |                         TABLE ACCESS BY INDEX ROWID BATCHED| RA_CUST_TRX_LINE_GL_DIST_ALL |      1 |    651K|   1458K|00:00:02.22 |     526K|  
|* 28 |                          INDEX RANGE SCAN                  | RA_CUST_TRX_LINE_GL_DIST_N2  |      1 |    728K|   1820K|00:00:01.60 |   11147 |  
|* 29 |                        TABLE ACCESS BY INDEX ROWID         | RA_CUSTOMER_TRX_LINES_ALL    |  33459 |      1 |  33459 |00:00:00.53 |     119K|  
|* 30 |                         INDEX UNIQUE SCAN                  | RA_CUSTOMER_TRX_LINES_U1     |  33459 |      1 |  33459 |00:00:00.31 |   86364 |  
|* 31 |                       INDEX UNIQUE SCAN                    | RA_CUSTOMER_TRX_U1           |  33459 |      1 |  33459 |00:00:00.21 |   49850 |  
|  32 |                      TABLE ACCESS BY INDEX ROWID           | RA_CUSTOMER_TRX_ALL          |  33459 |      1 |  33459 |00:00:00.20 |   33459 |  
|  33 |                    TABLE ACCESS BY INDEX ROWID             | HZ_CUST_ACCOUNTS             |  33459 |      1 |  33459 |00:00:00.42 |   97887 |  
|* 34 |                     INDEX UNIQUE SCAN                      | HZ_CUST_ACCOUNTS_U1          |  33459 |      1 |  33459 |00:00:00.24 |   64428 |  
|  35 |                   TABLE ACCESS BY INDEX ROWID              | HZ_PARTIES                   |  33459 |      1 |  33459 |00:00:00.44 |   98783 |  
|* 36 |                    INDEX UNIQUE SCAN                       | HZ_PARTIES_U1                |  33459 |      1 |  33459 |00:00:00.26 |   65175 |  
|  37 |                  TABLE ACCESS BY INDEX ROWID               | HZ_CUST_SITE_USES_ALL        |  33459 |      1 |  33459 |00:00:00.46 |   98374 |  
|* 38 |                   INDEX UNIQUE SCAN                        | HZ_CUST_SITE_USES_U1         |  33459 |      1 |  33459 |00:00:00.28 |   64915 |  
|* 39 |                 TABLE ACCESS BY INDEX ROWID                | HZ_CUST_ACCT_SITES_ALL       |  33459 |      1 |  33459 |00:00:00.45 |   98195 |  
|* 40 |                  INDEX UNIQUE SCAN                         | HZ_CUST_ACCT_SITES_U1        |  33459 |      1 |  33459 |00:00:00.26 |   64736 |  
|  41 |                TABLE ACCESS BY INDEX ROWID BATCHED         | GL_DAILY_RATES               |  33459 |      1 |  21808 |00:12:44.59 |    1838K|  
|* 42 |                 INDEX RANGE SCAN                           | GL_DAILY_RATES_U1            |  33459 |      1 |  21808 |00:13:08.16 |    1837K|  
|  43 |               TABLE ACCESS BY INDEX ROWID                  | HZ_PARTY_SITES               |  21808 |      1 |  21808 |00:00:00.35 |   64339 |  
|* 44 |                INDEX UNIQUE SCAN                           | HZ_PARTY_SITES_U1            |  21808 |      1 |  21808 |00:00:00.23 |   42531 |  
|  45 |              TABLE ACCESS BY INDEX ROWID                   | HZ_LOCATIONS                 |  21808 |      1 |  21808 |00:00:00.33 |   64353 |  
|* 46 |               INDEX UNIQUE SCAN                            | HZ_LOCATIONS_U1              |  21808 |      1 |  21808 |00:00:00.18 |   42545 |  
|  47 |             VIEW PUSHED PREDICATE                          | VW_SSQ_1                     |  21808 |      1 |  21808 |00:00:01.17 |   93476 |  
|  48 |              SORT GROUP BY                                 |                              |  21808 |      1 |  21808 |00:00:01.06 |   93476 |  
|  49 |               TABLE ACCESS BY INDEX ROWID BATCHED          | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |     16 |    145K|00:00:00.84 |   93476 |  
|* 50 |                INDEX RANGE SCAN                            | XXC_CUSTOMER_GETPAID         |  21808 |     16 |    145K|00:00:00.36 |   59938 |  
|  51 |            VIEW PUSHED PREDICATE                           | VW_SSQ_2                     |  21808 |      1 |  21808 |00:00:00.69 |   74433 |  
|  52 |             SORT GROUP BY                                  |                              |  21808 |      1 |  21808 |00:00:00.59 |   74433 |  
|  53 |              TABLE ACCESS BY INDEX ROWID BATCHED           | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  92201 |00:00:00.49 |   74433 |  
|* 54 |               INDEX RANGE SCAN                             | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  92201 |00:00:00.24 |   59903 |  
|  55 |           VIEW PUSHED PREDICATE                            | VW_SSQ_3                     |  21808 |      1 |  21808 |00:00:00.61 |   74852 |  
|  56 |            SORT GROUP BY                                   |                              |  21808 |      1 |  21808 |00:00:00.51 |   74852 |  
|  57 |             TABLE ACCESS BY INDEX ROWID BATCHED            | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  53060 |00:00:00.38 |   74852 |  
|* 58 |              INDEX RANGE SCAN                              | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  53060 |00:00:00.19 |   59148 |  
|  59 |          VIEW PUSHED PREDICATE                             | VW_SSQ_4                     |  21808 |      1 |  21808 |00:00:00.70 |   93490 |  
|  60 |           SORT GROUP BY                                    |                              |  21808 |      1 |  21808 |00:00:00.61 |   93490 |  
|  61 |            TABLE ACCESS BY INDEX ROWID BATCHED             | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |     16 |    145K|00:00:00.63 |   93490 |  
|* 62 |             INDEX RANGE SCAN                               | XXC_CUSTOMER_GETPAID         |  21808 |     16 |    145K|00:00:00.25 |   59950 |  
|  63 |         VIEW PUSHED PREDICATE                              | VW_SSQ_5                     |  21808 |      1 |  21808 |00:00:00.63 |   74427 |  
|  64 |          SORT GROUP BY                                     |                              |  21808 |      1 |  21808 |00:00:00.54 |   74427 |  
|  65 |           TABLE ACCESS BY INDEX ROWID BATCHED              | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  92201 |00:00:00.44 |   74427 |  
|* 66 |            INDEX RANGE SCAN                                | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  92201 |00:00:00.21 |   59900 |  
|  67 |        VIEW PUSHED PREDICATE                               | VW_SSQ_6                     |  21808 |      1 |  21808 |00:00:00.59 |   74846 |  
|  68 |         SORT GROUP BY                                      |                              |  21808 |      1 |  21808 |00:00:00.50 |   74846 |  
|  69 |          TABLE ACCESS BY INDEX ROWID BATCHED               | RA_CUSTOMER_TRX_LINES_ALL    |  21808 |      8 |  53060 |00:00:00.35 |   74846 |  
|* 70 |           INDEX RANGE SCAN                                 | XXC_CUSTOMER_GETPAID         |  21808 |     12 |  53060 |00:00:00.17 |   59144 |  
|* 71 |       INDEX RANGE SCAN                                     | HZ_GEOGRAPHIES_N11           |  21808 |   5812 |  21808 |00:00:00.13 |    2684 |  
|  72 |      TABLE ACCESS BY INDEX ROWID                           | HZ_GEOGRAPHIES               |  21808 |    168 |  21808 |00:00:00.07 |     620 |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  

Let’s start by raising some concerns about the quality of information available.

First, the OP says it takes 14 minutes to return 30,000 rows: but the top line of the plan says it has taken 13 minutes and 20 seconds to return the first 501 rows, and if we look a little further down the plan operation 3 (Hash Group By) reports 00:13:20.15 to aggregate down to 19,827 rows. So this half of the plan cannot return more than 19,827 rows, and the half I have discarded (for the moment) must be returning the other 10,000+ rows. The information we have is incomplete.

Of course you may think that whatever the rest of the plan does is fairly irrelevant – it’s only going to be responsible for at most another 40 seconds of processing – except my previous experience of rowsource execution statistics tells me that when you do a large number of small operations the times reported can be subject to fairly large rounding errors and that enabling the measurement can increase the execution time by a factor of three or four. It’s perfectly feasible that this half of the query is actually the faster half under normal run-time circumstances but runs much more slowly (with a much higher level of CPU utilisation) when rowsource execution stats is in enabled. So let’s not get too confident.

With that warning in mind, what can we see in this half of the plan.

Big picture: the inline scalar subqueries have disappeared. In 12c the optimimzer can unnest scalar subqueries in the select list and turn them into outer joins, and we can see that there are 6 “Nested Loop Outer” operations, corresponding to 6 “View Pushed Predicate” operations against views labelled VW_SSQ1 through to VW_SSQ6 (SSQ = Scalar Sub Query ?). This goes back to my early comment – a person could probably rewrite the 6 scalar subqueries as a single aggregate view in the from clause: the optimizer isn’t quite clever enough to manage that in this case, but in simpler cases it might be able to do exactly that.

Big picture 2: most of the 13 minutes 20 seconds appears at operation 14 as it processes the 33,459 rows supplied to it from the 4.33 seconds of work done by operation 15 and its descendants. Reducing this part of the execution plan to the smallest relevant section we get the following:

-----------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                                  | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  
|  14 |               NESTED LOOPS                                 |                              |      1 |      1 |  21808 |00:13:03.30 |    2961K|  
|  15 |                NESTED LOOPS                                |                              |      1 |      1 |  33459 |00:00:04.33 |    1123K|  
|  41 |                TABLE ACCESS BY INDEX ROWID BATCHED         | GL_DAILY_RATES               |  33459 |      1 |  21808 |00:12:44.59 |    1838K|  
|* 42 |                 INDEX RANGE SCAN                           | GL_DAILY_RATES_U1            |  33459 |      1 |  21808 |00:13:08.16 |    1837K|  
-----------------------------------------------------------------------------------------------------------------------------------------------------  

For each row supplied by operation 15 Oracle calls operation 41, which calls operation 42 to do an index range scan to supply a set of rowids so that operation 41 can access a table and return rows. Apparently the total time spent by operation 41 waiting for operation 42 to return rowids and then doing its own work is 12 minutes 44 seconds, while the range scans alone (all 33,459 of them) take 13 minutes and 8 seconds. Remember, though, that “lots of small operations = scope of rounding errors” when you look at these timings. Despite the inconsistency between the timings for operations 41 and 42 it’s reasonable to conclude that between them that’s where most of the execution time went.

Two questions – (a) can we refine our analysis of how the time is split between the two operations and (b) why do these lines take so much time.

Check the Starts and the A-rows: (reminder: for comparison, we expect A-rows to be approximately E-rows * Starts) for both operations we see 33,459 starts and 21,808 rows. The index range scans return (on average) a single rowid about two-thirds of the time, and every time a range scan returns a rowid the corresponding row is returned from the table (If you check the Id column there’s no asterisk on operation 41 so no extra predicate is applied as Oracle accesses the table row – but even if there were an extra predicate we’d still be happy to infer that if 21,808 rowids returned from operation 42 turned into 21,808 rows returned from the table then there are no wasted accesses to the table).

Now look at the Buffers for the index range scan – 1.837M: that’s roughly 56 buffers per range scan – that’s a lot of index to range through to find one rowid, which is a good clue that perhaps we do a lot of work with each Start and really do use up a lot of CPU on this operation. Let’s see what the Predicate Section of the plan tells us about this range scan:


Predicate Information (identified by operation id):  
---------------------------------------------------  
  42 - access("GDR"."FROM_CURRENCY"="RCT"."INVOICE_CURRENCY_CODE" AND "GDR"."TO_CURRENCY"="GLL"."CURRENCY_CODE" AND   
              "GDR"."CONVERSION_TYPE"='Corporate')  
       filter(("GDR"."CONVERSION_TYPE"='Corporate' AND TO_DATE(INTERNAL_FUNCTION("GDR"."CONVERSION_DATE"))=TO_DATE(INTERNAL_FUNCTION("RCTD"."  
              GL_DATE"))))  

We have access predicates (things which narrow down the number of leaf blocks that we walk through) and filter predicates (things we do to test every key entry we access). Notably the gdr.conversion type is a filter predciate as well as an access predicate – and that suggests that our set of predicates has “skipped over” a column in the index: from_currency and to_currency might be the first two columns in the index, but conversion_type is then NOT the third.

More significantly, though, there’s a column called conversion_date in the index (maybe that’s column 3 in the index – it feels like it ought to be); but for every index entry we’ve selected from the 56 blocks we walk through we do some sort of internal conversion (or un-translated transformation) to the column then convert the result to a date to compare it with another date (similarly processed from an earlier operation). What is that “internal function” – let’s check the query:


AND             gdr.from_currency = rct.invoice_currency_code
AND             gdr.to_currency = gll.currency_code
AND             gdr.conversion_type = 'Corporate'
AND             to_date(gdr.conversion_date) = to_date(rctd.gl_date)
AND             rctd.gl_date BETWEEN To_date ('01-JAN-2018', 'DD-MON-YYYY') AND  To_date ('31-JAN-2018', 'DD-MON-YYYY')

(I’ve swapped the order of a couple of lines to highlight a detail).

The filter predicate is comparing gdr.conversion_date with rctd.gl_date – and we can probably assume that both columns really are dates because (a) the word “date” is in their names and (b) the rctd.gl_date is being compared with genuine date values in the next predicate down (and – though I haven’t shown it – the way the plan reports the next predicate proves that the column really is a date datatype).

So the predicate in the SQL applies the to_date() function to two columns that are dates – which means the optimizer has to convert the date columns to some default character format and then convert them back to dates. The “internal function” is a to_char() call. Conversions between date and character formats are CPU-intensive, and we’re doing a double conversion (to_date(to_char(column_value)) to every data value in roughly 56 blocks of an index each time we call that line of the plan. It’s not surprising we spend a lot of time in that line.

Initial strategy:

Check the column types for those two columns, if they are both date types decide whether or not the predicate could be modified to a simple gdr.conversion_date = rctd.gl_date (though it’s possible that something slightly more sophisticated should be used) but whatever you do avoid the redundant conversion through character format.

Ideally, of course, if we can avoid this conversion we may find that Oracle can be more accurate in its range scan through the index, but we may still find that we do a large range scan even if we do manage to do it a little more efficiently, in which case we may want to see if there is an alternative index which will allow use to pick the one rowid we need from the index without  visiting so many leaf blocks in the index.

Warning

Simply eliminating the to_date() calls may changes the results. Here’s a demonstration of how nasty things happen when you apply to_date() to a date:


SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 D1                                     DATE
 D2                                     DATE

SQL> insert into t1 values(sysdate, sysdate + 10/86400);

1 row created.

SQL> select * from t1 where d1 = d2;

no rows selected

SQL> select * from t1 where to_date(d1) = to_date(d2);

D1        D2
--------- ---------
30-SEP-18 30-SEP-18

1 row selected.

SQL> alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';

Session altered.

SQL> select * from d1 where to_date(d1) = to_date(d2);

no rows selected

Different users could get different results because they have different settings for their nls_date_format.

Reminder

I started my analysis with two comments about the quality of information – first, we don’t really know whether or not this half of the union all would be responsble for most of the time if rowsource execution statistics were not enabled; secondly large number of small operations can lead to a lot of rounding errors in timing. There are six occurrences of unnested scalar subqueries which are all called 21,808 times – and the recorded time for all 6 of them is remarkably small given the number of executions, even when allowing for the precision with which they operate; it’s possible that these subqueries take a larger fraction of the total time than the plan indicates, in which case it might become necessary (rather than just nice) to do a manual unnesting and reduce the number of inline views to 3 (one for each line_type), 2 (one with, one without, conversion_rate) or just one.

Footnote

Once again I’ve spent a couple of hours writing notes to explain the thoughts that went through my mind in roughly 10 minutes of reading the original posting. It’s a measure really of how many bits of information you can pull together, and possibly discard, very quickly once you understand how many things the optimizer is capable of doing and how the execution plan tries to show you how a statement was (or will be) handled.

Update (5th Oct 2018)

Another way of looking for the best strategy for tuning this statement, given the available information, is this:

Where, in the sequence of events, does the data volume we’re processing drop to the right scale for the output. If we don’t drop to the right scale very early in the plan execution then we may need to re-arrange the order in which we visit tables; if we are operating at the right volume almost immediately then there’s a good chance that we’ve started the right way. Take a look at the first few lines of this plan (remembering that the query was interrupted before returning the whole result set):

-----------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                                  | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                                           |                              |      1 |        |    501 |00:13:20.17 |    3579K|  
|   1 |  UNION-ALL                                                 |                              |      1 |        |    501 |00:13:20.17 |    3579K|  
|   2 |   HASH UNIQUE                                              |                              |      1 |      1 |    501 |00:13:20.17 |    3579K|  
|   3 |    HASH GROUP BY                                           |                              |      1 |      1 |  19827 |00:13:20.15 |    3579K|  
|   4 |     NESTED LOOPS                                           |                              |      1 |        |  21808 |00:13:10.26 |    3579K|
-----------------------------------------------------------------------------------------------------------------------------------------------------    

At line 4 we generate 21,808 rows which we aggregate down to 19,827, which we then hash down to distinct values – the original user told us that the query returns 30,000 rows so we shouldn’t assume that the uniqueness requirement has reduced 19,827 rows to the 501 reported so far, there may be more to come. What we can say about these numbers, particularly lines 3 and 4 is that prior to the aggregation we need to find about 22,000 rows and carry them through the rest of the plan.

Now look at lines 24 – 28 where the heavy duty action starts (the first physical operation is actually at lines 19/20 where (thanks to swapping join inputs) Oracle scans the gl_ledger table and hashes it into memory in anticipation of incoming probe data – but that’s a tiny blip on the way to the big join):

-----------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                                  | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------------------------------------------  
|* 24 |                        HASH JOIN                           |                              |      1 |    385 |  33459 |00:00:00.40 |     526K|  
|* 25 |                         TABLE ACCESS BY INDEX ROWID BATCHED| GL_CODE_COMBINATIONS         |      1 |     35 |      1 |00:00:00.01 |     108 |  
|* 26 |                          INDEX RANGE SCAN                  | GL_CODE_COMBINATIONS_N2      |      1 |    499 |     77 |00:00:00.01 |       3 |  
|* 27 |                         TABLE ACCESS BY INDEX ROWID BATCHED| RA_CUST_TRX_LINE_GL_DIST_ALL |      1 |    651K|   1458K|00:00:02.22 |     526K|  
|* 28 |                          INDEX RANGE SCAN                  | RA_CUST_TRX_LINE_GL_DIST_N2  |      1 |    728K|   1820K|00:00:01.60 |   11147 | 
-----------------------------------------------------------------------------------------------------------------------------------------------------    

The important thing we see here is that the very first hash join identifies 33,459 rows: we’re immediately into the right ball-park for the final output. The timings are a bit suspect – I really don’t like seeing the time for hash join (0.4 seconds) being smaller than one of its direct child operations (the 2.22 seconds) – but this bit of the work seems to get to the right scale very quickly: this looks as if it’s likely to be a good way to start the final join order.

We might question whether the optimizer has been wise to use an index range scan to identify 1.45 million rows in a table (and probing it 1.82 million times). Maybe that was quick because all the data had previously been buffered and perhaps thisrange scan will be extremely slow on a busy production system; maybe a tablescan would be better, maybe there’s a way of getting to this big table through a different join order that means we only visit it roughly 33,459 times through an index that identifies exactly the rows we really need. Without good knowledge of what the data looks like (and without understanding what the query is supposed to achieve and how often it runs) we can only look at the supplied execution plan and work out where the time went and whether that suggests the plan is doing roughly the right thing or doing something that is clearly silly. This plan looks like a reasonable starting point with one minor (we hope) glitch around line 42 that we identified earlier on.

 

Next Page »

Powered by WordPress.com.