Oracle Scratchpad

May 24, 2019

Re-partitioning – 18

Filed under: 18c,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 12:50 pm BST May 24,2019

In yesterday’s note on the options for converting a range-partioned table into a composite range/list parititioned table I mentioned that you could do this online with a single command in 18c, so here’s some demonstration code to demonstrate that claim:


rem
rem     Script:         pt_comp_from_pt_18.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum - 1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

alter table pt_range modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
(
        partition p200 values less than (200)
)
-- online
;

execute dbms_stats.gather_table_stats(null, 'pt_range', granularity=>'all')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE'
order by
        partition_name, subpartition_name
;

Run this (with or without the online option) and you’ll (probably) see the Oracle error “ORA-00604: error occurred at recursive SQL level 1” with one of two underlying errors:

    ORA-01950: no privileges on tablespace 'SYSTEM'
    ORA-01536: space quota exceeded for tablespace 'SYSTEM'

So what’s gone wrong – it ought to work.

After enabling a 10046 trace I repeated the “alter table” command then scanned the trace file for the text “err=1950” (that being the error I’d received on my first attempt) and scanned backwards for the “PARSING IN CURSOR” line with a matching cursor id:


ERROR #139721552722200:err=1950 tim=26541227462

PARSING IN CURSOR #139721552722200 len=182 dep=1 uid=104 oct=1 lid=0 tim=26541224560 hv=2451601965 ad='7f1377267890' sqlid='0wsjfgk920yjd'
create table  "TEST_USER"."SYS_RMTAB$$_H124028"  ( src_rowid rowid not null , tgt_rowid rowid not null) 
    segment creation immediate nologging 
    tablespace  "SYSTEM"  
    rowid_mapping_table
END OF STMT

The code is trying to create a “rowid_mapping_table” in the system tablespace and I have no quota for the tablespace. (The 124028 in the table name relates to the object_id of the table I was trying to modify, by the way.)

The source of the error offered a big clue about a possible workaround sp I gave myself a quota (unlimited) on the system tablespace (alter user test_user quota unlimited on system) and that made it possible for the restructuring to take place. It’s not really an appropriate workaround for a production system though – especially if you’re using the online option and the table is subject to a lot of change.  (Note – this “rowid_mapping_table” and a “journal” table are created even if you haven’t selected the online option.)

Footnotes

  • The problem has been fixed in 19c (tested on LiveSQL)  and is listed on MoS as Bug 27580976 : INTERNAL RECURSIVE MAPPING TABLE MISTAKENLY PLACE IN SYSTEM FOR ONLINE OPS. There are no patches for 18c at present.
  • After I’d finished testing the quota workaround I tried to deprive myself of the quota on the system tablespace. I may have missed something in the manuals but it looks like the only way to do this is to give myself a zero quota (or, as I have done occasionally in the past, drop user cascade) because there’s no option for “quota denied” or “revoke quota” . This is why you may get one of two different messages after the ORA-00604. If you’ve never had a quota on the system tablespace you’ll get the “ORA-1950: no privileges” message, if you’ve had a quota at some time in the pasat and then had it set to zero’ you’ll get the “ORA-01536: space quota exceeded” message.

 

April 8, 2019

Describe Upgrade

Filed under: 18c,Bugs,Infrastructure,Oracle,Performance — Jonathan Lewis @ 11:02 am BST Apr 8,2019

Here’s an odd little change between Oracle versions that could have a stunning impact on the application performance if the thing that generates your client code happens to use an unlucky selection of constructs.  It’s possible to demonstrate the effect remarkably easily – you just have to describe a table, doing it lots of times to make it easy to see what’s happening.

rem
rem     Script:         describe_18c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem 

create table t1 as
select  *
from    all_objects
where   rownum = 1
;

set heading off
set feedback off
set pagesize 0
set linesize 156
set trimspool on
set termout off
set serveroutput on

execute snap_rowcache.start_snap
execute snap_libcache.start_snap

start start_10000
-- start_1 contains "describe t1"

set termout on
set serveroutput on

spool temp
execute snap_rowcache.end_snap
execute snap_libcache.end_snap

spool off

The start_10000 script is my mechanism for running a simple piece of code many times, and as the comment following it says, all I’m doing is repeating “describe t1”. The calls to “snap” something are examples of procedures I use to find the changes recorded in various dynamic performance views over short periods of time (there’s an example of the code for v$mystat here) In this case, as the names suggest, the snapshots record the changes in v$rowcache (the dictionary cache) and v$librarycache (the library cache). I’ve put a simple variant of the code at the end of the blog note so you don’t have to do all the preparation if you want to run a quick test for yourself.

Here are the results I get when running the test in Oracle 18.3.0.0

---------------------------------
Dictionary Cache - 05-Apr 19:00:00
Interval:-      27 seconds
---------------------------------
Parameter                 Sub# Usage Fixed    Gets  Misses   Scans  Misses    Comp    Mods Flushes
---------                ----- ----- -----    ----  ------   -----  --------------    ---- -------
dc_objects                         0     0 260,051       0       0       0       0       0       0
dc_users                           0     0 260,000       0       0       0       0       0       0
---------------------------------
Library Cache - 05-Apr 19:00:00
Interval:-      27 seconds
---------------------------------
Type      Description              Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
-----     -----                    ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE TABLE/PROCEDURE        10,003      10,003   1.0     280,028     280,028   1.0         0         0

Before showing you corresponding figures from 12.2.0.1 I’ll just point out that in version 18.3.0.0 of Oracle the structure of view all_objects gives me a table of 26 columns. Think about that and the 10,000 describes while looking at the number above, then look at the corresponding 12.2.0.1 results:

---------------------------------
Dictionary Cache - 05-Apr 19:00:00 
Interval:-      28 seconds
---------------------------------
Parameter                 Usage Fixed    Gets  Misses   Scans  Misses    Comp    Mods Flushes
---------                 ----- -----    ----  ------   -----  --------------    ---- -------
dc_users                      0     0       2       0       0       0       0       0       0
dc_objects                    0     0       9       0       0       0       0       0       0

---------------------------------
Library Cache - 05-Apr 19:04:17
Interval:-      28 seconds
---------------------------------
Type      Description              Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
-----     -----                    ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE TABLE/PROCEDURE        10,005      10,005   1.0      20,018      20,018   1.0         0         0

The internal mechanism of the “describe” call has changed between 12.2.0.1 to 18.3.0.0.

For each describe in 18.3, for each column in the table you see a “get” on dc_users and dc_objects in v$rowcache and you see one “get” on the TABLE/PROCEDURE namespace in v$librarycache, and (2 + number of columns) “pins”. In 12.2.0.1 there are no gets on the dictionary cache and only 1 get and two pins in the library cache for each describe.

As a couple of extra checks I modified the test to query from a 12c client to and 18c server, then from an 18c client to a 12c server. The big numbers appeared in the former test (i.e. when the server is 18c) and the small number for the latter (when the server is 12c). I also tried a couple of other variations on the theme:

  • If the table t1 doesn’t exist when I run the test then there are no gets on the row cache, and I see 2 gets and pins (with hits) on the library cache per describe.
  • If I run the test using “decribe other_schema.t1 the pins (and hits) on the library cache go up by 1 per describe
  • If I execute “alter session set current_schema = other_schema” so that “describe t1” is actually describing a table in another schema the pins (and hits) on the library cache go up by 1 per describe
  • If I run the test in the SYS schema, 18c behaves like 12c !! But SYS is often a little wierd compared to other schemas

Enabling the 10051 trace – I can see that both versions report an OPI call type = 119: “V8 Describe Any” for each call to “describe” (which, presumably, corresponds to the OCIDescribeAny() low-level function call). And that’s really where this blog started, and why lots of people might need to be aware (at least in the short term) of this change in behaviour across versions .

Welcome to the Real World.

My demonstration is clearly silly – no-one does hundreds of describes per second in a real application, surely. Well, not deliberately, and not necessarily with the intent to do a full describe, but sometimes n-tier development environments end up generating code that does things you might not expect. One such example is the way that JDBC can handle a requirement of the form:

insert into tableX( id, ......) values (my_sequence.nextval, ...) returning id into ?;

In the course of handling this requirement one of the coding strategies available to JDBC ends up executing the type 119 “V8 Describe Any” call. Imagine the effect this has when you have a couple of dozen concurrent sessions totalling a couple of million single row inserts per hour. The competition for library cache pins and row cache gets is massive – and the memory structures involved are all protected by mutexes. So when a a client of mine recently upgraded their system from 11.2.0.4 to 18.3.0.0 they saw “library cache: mutex X” waits change from a few hundred seconds per hour to tens of thousands of seconds, and “row cache mutex” leaping up  from nowhere in the “Top timed events” to reporting further even more thousands of seconds of wait time per hour.

The actual impact of this issue will depend very strongly on how much use you (or your ORM) makes of this construct. The problem may be particularly bad for my client because of the very large number of concurrent executions of a very small number of distinct statements that all address the same table. For low concurrency, or for a wide range of different tables and statements, you may not see so much contention.

If you are seeing contention for “row cache mutex” and “library cache: mutex X”, then a quick corroborative test (if you are licensed for the performance and dianostic packs) is to check the top_level_call# and top_level_call_name from v$active_session_history:

select
        top_level_call#, top_level_call_name, count(*)
from    v$active_session_history
group by
        top_level_call#, top_level_call_name
order by
        count(*)

If (119, ‘V8 Describe Any’) shows up as a significant fraction of the total then you’re probably looking at this issue.

Java is not my strong point – but here’s a trivial piece of standalone Java that you can use to demonstrate the issue if you’re familiar with running Java on the server. There are a few notes inline to explain necessary preparatory steps and code changes:


/*
        To create a class file, you need to execute
        javac temptest2.java

        This generates file temptest2.class
        If this is successful then execute
        java temptest {number of iterations} {commit frequency}

        e.g.
        java temptest2 10000 10

        To be able to compile, you need a CLASSPATH environment variable
        e.g. export CLASSPATH=$CLASSPATH:$ORACLE_HOME/jdbc/lib/ojdbc8.jar

        For java to see the created class the CLASSPATH must also include 
        the holding directory
        e.g. export CLASSPATH=$CLASSPATH:/mnt/working

        Example combined setting:
        export CLASSPATH=$CLASSPATH:$ORACLE_HOME/jdbc/lib/ojdbc8.jar:/mnt/working

        A schema will need to be created to hold two objects,
        And the connection string in the code below will need to be modified -
        and the listener has to be started and the database registered to it.

        Database objects:
        -----------------
        create sequence s1;

        create table test(
                id number, 
                code varchar2(32), 
                descr varchar2(32), 
                insert_user varchar2(32),
                insert_date date
        );

*/


import java.sql.*;
import oracle.jdbc.OracleDriver;
import java.util.Date;

public class temptest2
{
  public static void main (String arr[]) throws Exception
  {
    DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
    Connection con = DriverManager.getConnection
          ("jdbc:oracle:thin:@localhost:1521:or18","test_user","test");

    Integer iters = new Integer(arr[0]);
    Integer commitCnt = new Integer(arr[1]);

    con.setAutoCommit(false);
    doInserts( con, iters.intValue(), commitCnt.intValue() );

    con.commit();
    con.close();
  }

  static void doInserts(Connection con, int count, int commitCount )
  throws Exception
  {

    int  rowcnt = 0;
    int  committed = 0;
    long start = new Date().getTime();

    for (int i = 0; i < count; i++ ) {
      PreparedStatement ps = con.prepareStatement(
           "insert into test (id, code, descr, insert_user, insert_date) " +
                     "values (s1.nextval,?,?, user, sysdate)",
           new String[]{"id"}
      );
      ps.setString(1,"PS - code" + i);
      ps.setString(2,"PS - desc" + i);
      ps.executeUpdate();

      ResultSet rs = ps.getGeneratedKeys();
      int x = rs.next() ? rs.getInt(1) : 0;
      System.out.println(x);
                
      rowcnt++;
      ps.close();

      if ( rowcnt == commitCount )
      {
        con.commit();
        rowcnt = 0;
        committed++;
      }
    }
    long end = new Date().getTime();
    con.commit();
    System.out.println
    ("pstatement " + count + " times in " + (end - start) + " milli seconds committed = "+committed);
  }
}

/*
 *
 * Sample from trace file after setting events 10046 and 10051:
 *
 * OPI CALL: type=119 argc= 7 cursor=  0 name=V8 Describe Any
 * OPI CALL: type=94 argc=38 cursor=  0 name=V8 Bundled Exec
 * PARSE #140693461998224:c=0,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3884345238,tim=1368184246
 * EXEC #140693461998224:c=0,e=135,p=0,cr=0,cu=4,mis=0,r=1,dep=0,og=1,plh=3884345238,tim=1368184411
 * OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
 * CLOSE #140693461998224:c=0,e=15,dep=0,type=3,tim=1368185231
 * OPI CALL: type=14 argc= 0 cursor=  0 name=COMMIT
 * XCTEND rlbk=0, rd_only=0, tim=1368185291
 * OPI CALL: type=119 argc= 7 cursor=  0 name=V8 Describe Any
 * OPI CALL: type=94 argc=38 cursor=  0 name=V8 Bundled Exec
 * PARSE #140693461998224:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3884345238,tim=1368187929
 * EXEC #140693461998224:c=0,e=162,p=0,cr=0,cu=4,mis=0,r=1,dep=0,og=1,plh=3884345238,tim=1368188141
 * OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
 * CLOSE #140693461998224:c=0,e=6,dep=0,type=3,tim=1368189336
 * OPI CALL: type=14 argc= 0 cursor=  0 name=COMMIT
 * XCTEND rlbk=0, rd_only=0, tim=1368189373
 *
*/


You’ll notice that I’ve prepared, executed and closed a statement inside a loop. The problem wouldn’t happen if I prepared the statement before the loop and closed it after the loop, doing nothing but the execute inside the loop; but the code is simply modelling the “single row processing” effect that typically appears through ORMs.

You’ll have to decide for yourself how to take snapshots of the dynamic performance views while this code is running, and how to emable tracing – but anyone who want to fiddle with the code is probably better at coding Java than I am – so it’s left as an exercise to the reader (I used a logon trigger for the traces, and snap_rowcache and snap_libcache from another session).

There is a certain cruel irony to this issue.  For years I have been telling people that

    insert into my_table(id, ...) values(my_sequence.nextval,...) returning id into :bind1;

is more efficient than:

    select my_sequence.nextval into :bind1 from dual;
    insert into my_table(id,.....) values(:bind1, ...);

If, at present, you’re using  Hibernate as your ORM it generates code that does the (inefficient, bad practice) latter and you won’t see the “describe” problem.

Footnote

If you want a very simple SQL*Plus script to see the effect – and have privileges to query v$rowcache and v$librarycache – here’s a hundred describes with a little wrapper to show the effect:

em
rem     The schema running this script must not be SYS
rem     but must be granted select on v_$rowcache and
rem     v_$librarycache. For the results to be clearly
rem     visible the test needs to be run while virtually
rem     nothing else is running on the instance.
rem
rem     In 18.3.0.0 every describe seems to access 
rem     dc_users, dc_objects, and pin the library cache
rem     once for every column in the table described
rem     (plus a fixed "overhead" of 2 pins and one get)
rem
rem     When run by SYS the counts fall back to the
rem     12.2  numbers -i.e. only two pins and one get
rem     on the libraray cache with no accesses to the 
rem     dictionary cache
rem
rem     The excess gets and pins disappear in 19.2, 
rem     thought the pin count on the library cache 
rem     goes up to 4 per describe.
rem

drop table t1 purge;
create table t1(
        n1 number,
        n2 number,
        n3 number,
        n4 number,
        n5 number,
        n6 number,
        n7 number
)
;


-- variant create table t1 as select * from all_objects where rownum = 1;


set serveroutput off
set linesize 167
set trimspool on


spool temp_desc
select namespace, gets, pins from v$librarycache where namespace = 'TABLE/PROCEDURE';
select parameter, gets from v$rowcache where parameter in ('dc_users','dc_objects') and subordinate# is null;
spool off

set termout off

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

set termout on
set serveroutput on

spool temp_desc append

select namespace, gets, pins from v$librarycache where namespace = 'TABLE/PROCEDURE';
select parameter, gets from v$rowcache where parameter in ('dc_users','dc_objects') and subordinate# is null;

spool off



set doc off
doc

Sample output from 18.3
=======================
NAMESPACE                                                              GETS       PINS
---------------------------------------------------------------- ---------- ----------
TABLE/PROCEDURE                                                       27449      71108


PARAMETER                              GETS
-------------------------------- ----------
dc_users                              17341
dc_objects                           115830


NAMESPACE                                                              GETS       PINS
---------------------------------------------------------------- ---------- ----------
TABLE/PROCEDURE                                                       27555      72017


PARAMETER                              GETS
-------------------------------- ----------
dc_users                              18041
dc_objects                           116533


Note change in rowcache gets - one per column per describe on each parameter.
Note change in library cache pins - (one per column + 2) per describe.

Sample output from 12.2
=======================
NAMESPACE                  GETS       PINS
-------------------- ---------- ----------
TABLE/PROCEDURE           13393      20318


PARAMETER                              GETS
-------------------------------- ----------
dc_users                               4889
dc_objects                            31413


NAMESPACE                  GETS       PINS
-------------------- ---------- ----------
TABLE/PROCEDURE           13504      20539


PARAMETER                              GETS
-------------------------------- ----------
dc_users                               4889
dc_objects                            31416


No change in v$rowcache
Only the same single get and the "+2" pins per describe in v$librarycache

#


The good news is that I sent this text to a colleague who has access to Oracle 19, and the problem goes away (almost completley) – there are just two extra pins on the library cache in Oracle 19 compared to Oracle 12, and no gets on the rowcache dc_users and dc_objects. This suggests that it’s a known issue (even though there’s no visible bug report, and the problem is still present in 18.5) so it may be possible to get a backport of the version 19 code for vesion 18 fairly quickly. If not the best temporary workaround is probably to bypass the ORM and manually code for a function call that processes an anonymous PL/SQL block – but I haven’t tested that idea yet.

There is a JDBC cursor cache available – and if this were enabled than the prepared statement that was closed by the code would be kept open by the JDBC cache (and, of course, still be present in Oracle’s v$open_cursor) and Oracle wouldn’t receive any further parse or “describe” calls. Unfortunately it seems that there’s a cursor leak (still) in the JDBC caching algorithm that will lead to sessions hitting Oracle error “ORA-01000: maximum open cursors exceeded.”

Acknowledgements.

I’d particularly like to thank Lasse Jenssen who spent a couple of hours with me (when he could have been attending some interesting sessions) at the OUG Ireland conference a few days ago, working through various tests and strategies to pin down the problem and attempt to circumvent it. (Any criticism of the Java code above should, nevertheless be aimed at me).

Update

This problem is now visible on MoS as: Bug 29628952 : INCREASED ROW CACHE GETS ON DC_OBJECTS AND DC_USERS FOR DESCRIBE IN 18C AND LATER.

It’s also visible as Bug 29628647 : INCREASED GETS FOR DC_OBJECTS AND DC_USERS FOR DESCRIBE, unfortunately this latter bug has been associated with version 19.2 – where the problem doesn’t exist so the analyst has reported back (quite corretly) with “I see no problem.”

 

March 19, 2019

IM_DOMAIN$

Filed under: 18c,Oracle,Troubleshooting — Jonathan Lewis @ 12:05 pm BST Mar 19,2019

A few months ago Franck Pachot wrote about a recursive SQL statement that kept appearing in the library cache. I discovered the note today because I had just found a client site where the following statement suddenly appeared near the top of the “SQL ordered by Executions” section of their AWR reports after they had upgraded to 18c.


select domain# from sys.im_domain$ where objn = :1 and col# = :2

I found Franck’s article by the simple expedient of typing the entire query into a Google search – his note was the first hit on the list, and he had a convenient example (based on the SCOTT schema) to demonstrate the effect, so I built the tables from the schema and ran a simple test with extended SQL tracing (event 10046) enabled.

Here’s an extract (with no deletions) from the resulting trace file:

PARSING IN CURSOR #139819795591784 len=110 dep=0 uid=104 oct=3 lid=104 tim=31306461773 hv=3572295767 ad='6bf8b8a0' sqlid='8n2bcc3aftu2r'
select /*+ leading(EMP DEPT) USE_HASH(DEPT) USE_HASH(BONUS) */ * from DEPT natural join EMP natural join BONUS
END OF STMT
PARSE #139819795591784:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306461772

PARSING IN CURSOR #139819795585328 len=64 dep=1 uid=0 oct=3 lid=0 tim=31306461966 hv=1240924087 ad='69a8b760' sqlid='0b639nx4zdzxr'
select domain# from sys.im_domain$ where objn = :1 and col# = :2
END OF STMT
PARSE #139819795585328:c=37,e=37,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306461965
EXEC #139819795585328:c=32,e=31,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306462058
FETCH #139819795585328:c=17,e=17,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306462098
STAT #139819795585328 id=1 cnt=0 pid=0 pos=1 obj=10422 op='TABLE ACCESS BY INDEX ROWID IM_DOMAIN$ (cr=1 pr=0 pw=0 str=1 time=21 us cost=0 size=39 card=1)'
STAT #139819795585328 id=2 cnt=0 pid=1 pos=1 obj=10423 op='INDEX UNIQUE SCAN IM_DOMAIN_UK (cr=1 pr=0 pw=0 str=1 time=18 us cost=0 size=0 card=1)'
CLOSE #139819795585328:c=5,e=5,dep=1,type=1,tim=31306462287

EXEC #139819795591784:c=484,e=484,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306462316
FETCH #139819795591784:c=0,e=804,p=0,cr=44,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306463191
STAT #139819795591784 id=1 cnt=0 pid=0 pos=1 obj=0 op='HASH JOIN  (cr=45 pr=0 pw=0 str=1 time=1222 us cost=72 size=97 card=1)'
STAT #139819795591784 id=2 cnt=4 pid=1 pos=1 obj=0 op='HASH JOIN  (cr=45 pr=0 pw=0 str=1 time=1001 us cost=70 size=232 card=4)'
STAT #139819795591784 id=3 cnt=4 pid=2 pos=1 obj=117764 op='TABLE ACCESS FULL EMP (cr=22 pr=0 pw=0 str=1 time=259 us cost=35 size=152 card=4)'
STAT #139819795591784 id=4 cnt=4 pid=2 pos=2 obj=117765 op='TABLE ACCESS FULL DEPT (cr=22 pr=0 pw=0 str=1 time=81 us cost=35 size=80 card=4)'
STAT #139819795591784 id=5 cnt=0 pid=1 pos=2 obj=117766 op='TABLE ACCESS FULL BONUS (cr=0 pr=0 pw=0 str=1 time=4 us cost=2 size=39 card=1)'
CLOSE #139819795591784:c=24,e=24,dep=0,type=1,tim=31306508552

PARSE #139819795591784:c=41,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306508798
PARSE #139819795585328:c=21,e=22,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306509010
EXEC #139819795585328:c=132,e=132,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306509220
FETCH #139819795585328:c=20,e=19,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306509415
CLOSE #139819795585328:c=8,e=8,dep=1,type=3,tim=31306509494
EXEC #139819795591784:c=682,e=704,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306509558
FETCH #139819795591784:c=588,e=1246,p=0,cr=44,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306511014
CLOSE #139819795591784:c=23,e=22,dep=0,type=3,ti

As you can see, every time I do a parse call for the query against the SCOTT schema (PARSE #139819795591784), Oracle does a parse/exec/fetch/close for the query against im_domain$ (PARSE #139819795585328) – and this happens even when the SCOTT query is in the session cursor cache!

As Franck points out, this looks like something to do with the In Memory option even though the option wasn’t enabled in his database, and wasn’t enabled in my client’s database. Once you’ve got a reproducible example of a problem, though, you can start fiddling to see if you can bypass it. In this case I decided to check all the parameters to do with the in-memory option – which is a bit like hard work because there are 208 parameters that include the word “inmemory”. After checking the descriptions of the first twenty or so I decided there was an easier option – if Oracle is asking about “domains” for columns possibly it’s something to do with the relatively new “join group” feature for in-memory columnar compression, so I ran a query to produce the list of parameter names and description for parameter with the words “join” and “group” in their names – there are two:


_sqlexec_join_group_aware_hj_enabled              enable/disable join group aware hash join
_sqlexec_join_group_aware_hj_unencoded_rowset     minimum number of unencoded rowsets processed before adaptation 

The first one looks rather promising – and it has a default value to TRUE, and it can be changed by “alter session” or “alter system”. So I executed:


alter session set "_sqlexec_join_group_aware_hj_enabled" = false;
alter system flush shared_pool;

Then I ran my test again and voila! there it wasn’t. No more queries against in_domain$.

Problem solved (until the client decides they want to use the in-memory option, perhaps).

There may be other reasons why this recursive query appears which aren’t to do with hash joins, of course, but join groups are specifically to allow efficient hash joins with the in-memory option, (it’s a trick to do with common encoding for compression to allow Bloom filtering to eliminate CUs without decoding) so I’m hoping I won’t have to track down and eliminate another sources for the query.

 

October 23, 2018

Upgrade threat

Filed under: 12c,18c,Histograms,Oracle,Statistics,Upgrades — Jonathan Lewis @ 7:50 pm BST Oct 23,2018

Here’s one I’ve just discovered while trying to build a reproducible test case – that didn’t reproduce because an internal algorithm has changed.

If you upgrade from 12c to 18c and have a number of hybrid histograms in place you may find that some execution plans change because of a change in the algorithm for producing hybrid histograms (and that’s not just if you happen to get the patch that fixes the top-frequency/hybrid bug relating to high values).

Here’s a little test to demonstrate how I wasted a couple of hours trying to solve the wrong problem – first a simple data set:


rem
rem     Script:         18c_histogram_upgrade.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem 

drop table t2 purge;

execute dbms_random.seed(0)

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

I’ve created a skewed data set which (we will see) has 22 distinct values and created a histogram of 13 buckets on it. This will be a hybrid histogram – but different versions of Oracle will produce different histograms (even though the data set is the same for both versions):


select
        j2, count(*)
from
        t2
group by
        j2
order by
        j2
;

select
        endpoint_value                                                            value,
        endpoint_number,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_size,
        endpoint_repeat_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
;

Here’s the dataset from 12.2.0.1 and 18.3.0.0


        J2   COUNT(*)
---------- ----------
         1          1
         8          3
         9          1
        10          5
        11          4
        12          8
        13         14
        14          9
        15         11
        16         22
        17         34
        18         31
        19         36
        20         57
        21         44
        22         45
        23         72
        24         70
        25         87
        26        109
        27         96
        28         41

22 rows selected.



And here are the histograms - 12.2.0.1 then 18.3.0.0:



     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        18             143          31                    31
        19             179          36                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        28             800         137                    41

13 rows selected.

     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        19             179          67                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        27             759          96                    96
        28             800          41                    41

13 rows selected.

Both histograms have 13 buckets as requested; both are hybrid histograms as expected.

But why does 12c have the value 18 when 18c doesn’t, and why does 18c have the value 27 when 12c doesn’t ?

That’s the second time in two weeks I’ve had reproducible test cases not reproducing – thanks to an 18c upgrade.

Update (See comments)

I had completely forgotten that a previous defect in the construction of hybrid (and Top-N) histograms had been addressed in 18.3 but needed a fix in 12.2 and a backport patch in 12.1.0.2.

Since the defect could “lose” a popular value in order to ensure that both the low and high values were captured in the histogram it’s not surprising that a fix could result in one of the popular values in a histogram dissappearing (after the upgrade) even when the gather had used a 100% sample. Quite possibly the algorithm used to ensure the presence of the high value has had a cascading effect down the histogram that can affect which popular values get into the histogram with repeat counts.

I think I’m going to have to grit my teeth and patch a 12.1.0.2, or update a 12.2.0.1 with exactly the right patch-set to find out.

[It has now been confirmed by Nigel Bayliss that this is a side effect of the fix to the bug 25994960]

October 8, 2018

Random Upgrade

Filed under: 18c,Oracle,Upgrades — Jonathan Lewis @ 1:36 pm BST Oct 8,2018

Here’s a problem that (probably) won’t affect the day to day running of most systems – but it could be a pain in the backside for people who write programs to generate repeatable test data. I’m not going to say much about the problem, just leave you with a test script.


rem
rem	Script	random_upgrade.sql
rem	Author:	Jonathan Lewis
rem	Dated:	Oct 2018
rem
rem	Last tested
rem		18.3.0.0
rem		12.2.0.1
rem	Notes
rem	In the upgrade from 12.2.0.1 something
rem	changed that meant
rem		create as select dbms_random
rem	gets different data from
rem		select dbms_random
rem

drop table t4 purge;
drop table t3 purge;
drop table t2 purge;
drop table t1 purge;
drop table t0 purge;

set feedback off

create table t0 as
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
;


execute dbms_random.seed(0);

create table t1
as
select dbms_random.normal
from
	t0
;

execute dbms_random.seed(0);

create table t2
as
with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	dbms_random.normal
from
	g1
;

prompt	=================
prompt	Diff the two CTAS
prompt	=================

select count(*)
from (
select * from t1
minus
select * from t2
union all
select * from t2
minus
select * from t1
)
;


create table t3 
as 
select * from t2 
where rownum < 1 -- > comment to avoid WordPress format issue
;

create table t4 
as 
select * from t2 
where rownum < 1 -- > comment to avoid WordPress format issue
;

execute dbms_random.seed(0)

insert into t3
select dbms_random.normal
from
	t0
;

execute dbms_random.seed(0)

insert into t4
with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	dbms_random.normal
from
	g1
;


prompt	===================
prompt	Diff the two Insert
prompt	===================

select count(*)
from (
select * from t3
minus
select * from t4
union all
select * from t4
minus
select * from t3
)
;


prompt	===========
prompt	Sum of CTAS
prompt	===========

select sum(normal) from t1;

prompt	=============
prompt	Sum of Insert
prompt	=============

select sum(normal) from t3;


execute dbms_random.seed(0)

prompt	=============
prompt	Sum of select
prompt	=============

with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select sum(n) from (
select
	dbms_random.normal n
from
	g1
)
;


I’m repeatedly using dbms_random.seed(0) to reset the random number generator and trying to generate 10,000 normally distributed numbers. (I’ve chosen the normal distribution because that happened to be the function in a script I sent someone with the comment that “this will recreate the data for the demonstration” – and they wrote back to say that it didn’t.)

I’ve got two “create as select”, and two “insert as select”. One of each pair selects from a real existing table to get 10,000 rows, the other uses the “select dual connect by” trick to generate rows. I’ve written SQL that shows whether or not the two pairs of tables end up with the same data (they do, pairwise), then I’ve summed one table from each pair to see if the different mechanisms produce the same data – and that depends on the version of Oracle you’re using. Finally I’ve reset the random number generator and summed across a pure select to see what that produces.

If you run this code on 12.2.0.1 or earlier you’ll see that the “diffs” report zeros and the “sums” report -160.39249. If you upgrade to 18.3 the diffs will still report zeros and some of the sums will still report -160.39249 but the sum of the CTAS will report -91.352172.

Bottom Line

If you’ve got code that you wrote to create reproducible test cases and the code uses: “create table … as select … dbms_random …” then it won’t produce the same data when you upgrade to 18.3. You’ll have to modify the code to do “create table (); insert as select …”.

As of this afternoon I have 1,209 test scripts on my laptop that use the dbms_random package to model data distribution patterns. It is almost certain that I will end up modifying every single one of them eventually.

There are words to express how I feel about this – but not ones that I would consider publishing.

Powered by WordPress.com.