Oracle Scratchpad

October 30, 2018

Index Splits – 2

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 1:29 pm GMT Oct 30,2018

In yesterday’s article I described the mechanism that Oracle uses for an index leaf block split when you try to insert a new entry into a leaf block that is already full, and I demonstrated that the “50-50” split and the “90-10” split work in the same way, namely:

  • save the old block into the undo
  • prepare a new leaf block
  • “share” the data between the old and new leaf blocks
  • sort out pointers

The obvious question to ask about this process is: “Why does Oracle save and rewrite the whole of the old leaf block during a 90-10 split when the data in the block doesn’t appear to change ?” The “sharing” in the 90-10 split is most uneven, and it appears that Oracle simply attaches a new leaf block to the index structure and writes the new index entry into it, leaving the existing index entries unchanged in the current leaf block.

The answer to that question can be found by doing block dumps – except you won’t see the answer if you use my original test data. So here’s a follow-on script to the previous test (written 11 years after the previous script):


rem
rem     Script:         index_splits3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem

drop table t2 purge;

create table t2 as select * from t1 where id <= 148;
alter table t2 add constraint t2_pk primary key(id, idx_pad) using index pctfree 0;

column object_id new_value m_index_id
select object_id from user_objects where object_name = 'T2_PK' and object_type = 'INDEX';

begin
        for r in (select * from t1 where id between 149 and 292 order by dbms_random.value) loop
                insert into t2 values(r.id, r.idx_pad, r.padding);
        end loop;
        commit;
end;
/

alter session set events 'immediate trace name treedump level &m_index_id';
alter system flush buffer_cache;

prompt  check the trace file for the last block of the index
prompt  then do a block dump for it.
prompt  then press return

pause

insert into t2 values(293, rpad('x',40,'x'), rpad('y',50,'y'));
commit;

alter session set events 'immediate trace name treedump level &m_index_id';
alter system flush buffer_cache;


This test depends on the number of rows used for the previous test – and I have four hard-coded values (148, 149, 292, 293) in it that matter. If you’ve had to use a different number of rows in your version of the first test you will need to adjust these values to match.

I’ve created a clone of the t1 table copying only the first 148 rows – this is just enough rows that when I create a unique (PK) index on the table the index will have two leaf blocks, the first holding 147 entries and the second holding one entry. I’ve then inserted the next 144 rows from t1 into t2 in random order, so that I end up with two full leaf blocks.

Once the data is ready the code issues a treedump command (so that we can check the index is as I’ve described it) and flushes the buffer_cache, then prompts you with some instructions and waits for you to press return. At this point you need some manual intervention from another session – you can examine the treedump to work out the file and block addresses of the two leaf blocks and dump the second leaf block (‘alter system dump datafile N block M;’).
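If it helps, here’s a minimal sketch of that manual step from the second session – the block address (20971663, the decimal form that a treedump prints alongside the hex address) and the file and block numbers it decodes to are purely illustrative; substitute the values from your own treedump:


rem     decode the (illustrative) decimal block address reported by the treedump
select
        dbms_utility.data_block_address_file(20971663)         file_no,
        dbms_utility.data_block_address_block(20971663)        block_no
from
        dual
;

rem     then dump the block, substituting the file and block numbers just reported
alter system dump datafile 5 block 143;
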

After you’ve done the block dump press return and my code resumes and inserts a new row that will cause a 90-10 split to happen, then it does another treedump (to let you check the block addresses and see that the split was 90-10), and flushes the buffer cache again. This is where you can check the block address of the second leaf block (in case it has managed to change – which it shouldn’t) and dump the block again.

Here, with a huge chunk removed from the middle, are the results I got from searching for the expression “row#” in the two block dumps that I generated in my test.


Before 90/10 block split:
-------------------------
row#0[7979] flag: -------, lock: 0, len=53, data:(6):  01 40 01 84 00 03
row#1[1885] flag: -------, lock: 2, len=53, data:(6):  01 40 01 fd 00 2b
row#2[1938] flag: -------, lock: 2, len=53, data:(6):  01 40 01 fd 00 2a
row#3[5595] flag: -------, lock: 2, len=53, data:(6):  01 40 01 f9 00 2c
row#4[3581] flag: -------, lock: 2, len=53, data:(6):  01 40 01 fd 00 0b
...
row#142[1408] flag: -------, lock: 2, len=53, data:(6):  01 40 01 fd 00 34
row#143[2150] flag: -------, lock: 2, len=53, data:(6):  01 40 01 fd 00 26
row#144[878] flag: -------, lock: 2, len=53, data:(6):  01 40 01 fd 00 3e


After 90/10 block split
-----------------------
row#0[348] flag: -------, lock: 0, len=53, data:(6):  01 40 01 84 00 03
row#1[401] flag: -------, lock: 0, len=53, data:(6):  01 40 01 fd 00 2b
row#2[454] flag: -------, lock: 0, len=53, data:(6):  01 40 01 fd 00 2a
row#3[507] flag: -------, lock: 0, len=53, data:(6):  01 40 01 f9 00 2c
row#4[560] flag: -------, lock: 0, len=53, data:(6):  01 40 01 fd 00 0b
...
row#142[7873] flag: -------, lock: 0, len=53, data:(6):  01 40 01 fd 00 34
row#143[7926] flag: -------, lock: 0, len=53, data:(6):  01 40 01 fd 00 26
row#144[7979] flag: -------, lock: 0, len=53, data:(6):  01 40 01 fd 00 3e

The “row#” is in ascending order – these lines in an index leaf block dump show Oracle walking through the block’s “row directory”; the number in square brackets following the row number is the offset into the block where the corresponding index entry will be found. When Oracle inserts an index entry into a leaf block it adjusts the row directory to make a gap in the right place so that walking the directory in row# order allows Oracle to jump around the block and find the index entries in key order.

When Oracle rewrites the block it first sorts the index entries into key order, so the actual index entries are written into the block in key order and a range scan that moves a pointer smoothly through the row directory will also be moving a second pointer smoothly down the block rather than jumping all over the place. Presumably this has (or maybe had) a benefit as far as the CPU cache and cache lines are concerned.

So there is a method in the madness of “copy the whole block even when the content doesn’t change”. The content doesn’t change but the order does, and paying the cost of sorting once may return a benefit in efficiency many times in the future.

 

October 29, 2018

Index splits

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 1:48 pm GMT Oct 29,2018

After writing this note I came to the conclusion that it will be of no practical benefit to anyone …  but I’m publishing it anyway because it’s just an interesting little observation about the thought processes of some Oracle designer/developer. (Or maybe it’s an indication of how it’s sensible to re-use existing code rather than coding for a particular boundary case, or maybe it’s an example of how to take advantage of “dead time” to add a little icing to the cake when the extra time required might not get noticed). Anyway, the topic came up in a recent thread on the OTN/ODC database forum and since the description given there wasn’t quite right I thought I’d write up a correction and a few extra notes.

When an index leaf block is full and a new row has to be inserted in the block Oracle will usually allocate a new leaf block, split the contents of the full block fairly evenly between two leaf blocks, then update various pointers to bring the index structure up to date. At various moments in this process the branch block above the leaf block and the leaf blocks either side of the splitting block have to be pinned. The number of times this happens is reported under the statistic “leaf node splits” but there is a special subset of leaf node splits that handles the case when the key in the new row is greater than the current high value in the block and the block is the “rightmost” (i.e. high values) block in the index. In this case Oracle adds a new leaf block to the end of the index and inserts the new value in the new block; it doesn’t share the data at all between the old and new leaf blocks. This special case is reported under the statistic “leaf node 90-10 splits”, even though “100-0” would be a more accurate description than “90-10”.
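If you want to keep an eye on these statistics while you experiment, a simple sketch of a query against v$mystat will do – the statistic names below are the ones mentioned in this note and its footnote:


select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'leaf node splits',
                'leaf node 90-10 splits',
                'branch node splits',
                'root node splits'
        )
order by
        sn.name
;
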

This note is a description of the work done by a leaf node split and compares the work for a “50-50” split (as the general case is often called) and a 90-10 split. You might think that the latter would be less resource-intensive than the former but, in fact, that’s not the case. Here’s a little script to get things going – I’m using an 8KB block size and ASSM (automatic segment space management); if your default tablespace definition is different the number of rows you have to use will need to be changed.


rem
rem     Script:         index_splits3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          September 2007
rem

start setenv
set timing off
set feedback off

define m_limit = 292

drop table t1 purge;

create table t1 (id number, idx_pad varchar2(40), padding varchar2(50));
alter table t1 add constraint t1_pk primary key(id, idx_pad);

column object_id new_value m_index_id

select
        object_id
from
        user_objects
where
        object_name = 'T1_PK'
and     object_type = 'INDEX'
;

begin
        for i in 1..&m_limit loop
                insert into t1 values(
                        i, rpad('x',40,'x'), rpad(' ',50)
                );
                commit;
        end loop;
end;
/


I’ve created a table with a two-column primary key and inserted “m_limit” rows into that table in an order that matches the primary key. The limit of 292 rows (which I’ve declared at the top of the program) means that the index entries for the data set will exactly fill two leaf blocks. I’ve captured the object_id of the index because I want to do a “treedump” of the index before and after inserting the next row.

I now need to run one of two tests inserting a single row. Either insert a row that is above the current highest value to force a 90-10 index leaf node split, or insert a row a little below the current high value to force a 50-50 index node split in the 2nd of the two index leaf blocks.


alter session set events 'immediate trace name treedump level &m_index_id';

alter system switch logfile;
execute snap_my_stats.start_snap;

begin
        for i in &m_limit + 1 .. &m_limit + 1  loop
                insert into t1 values(
                        i, rpad('x',40,'x'), rpad(' ',50)
--                      i - 2 , rpad('y',40,'y'), rpad(' ',50)
                );
                commit;
        end loop;
end;
/

execute snap_my_stats.end_snap

alter session set events 'immediate trace name treedump level &m_index_id';

execute dump_log

The calls to snap_my_stats are using a package I wrote a long time ago to report a delta in my session’s stats. The call to dump_log uses another little package to identify the current log file and issue an “alter system dump logfile …” command. Of the two possible sets of values for the row being inserted, the first will cause a 90-10 split and the second (commented out) will cause a 50-50 split.
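If you don’t have an equivalent of my dump_log package, a minimal sketch of the step it performs might look something like the following (assuming you’re allowed to query v$log and v$logfile and to dump the file):


column member new_value m_logfile

select
        f.member
from
        v$log           l,
        v$logfile       f
where
        f.group# = l.group#
and     l.status = 'CURRENT'
and     rownum   = 1
;

alter system dump logfile '&m_logfile';
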

Here are the results from the calls to treedump – first the dump taken before the insert then the dump after a 90-10 split, finally the dump after re-running the test and forcing a 50-50 split. These results came from a database running 11.2.0.4, but the results are the same for 12.1.0.2 and 18.3.0.0:


----- begin tree dump
branch: 0x140008b 20971659 (0: nrow: 2, level: 1)
   leaf: 0x140008e 20971662 (-1: nrow: 147 rrow: 147)
   leaf: 0x140008f 20971663 (0: nrow: 145 rrow: 145)
----- end tree dump

----- begin tree dump
branch: 0x140008b 20971659 (0: nrow: 3, level: 1)
   leaf: 0x140008e 20971662 (-1: nrow: 147 rrow: 147)
   leaf: 0x140008f 20971663 (0: nrow: 145 rrow: 145)
   leaf: 0x140008c 20971660 (1: nrow: 1 rrow: 1)
----- end tree dump


----- begin tree dump
branch: 0x140008b 20971659 (0: nrow: 3, level: 1)
   leaf: 0x140008e 20971662 (-1: nrow: 147 rrow: 147)
   leaf: 0x140008f 20971663 (0: nrow: 78 rrow: 78)
   leaf: 0x140008c 20971660 (1: nrow: 68 rrow: 68)
----- end tree dump


As you can see the extra row in the first case has been inserted into a new leaf block leaving the 2nd leaf block (apparently) unchanged; in the second case the 145 initial rows plus the one extra row have been shared fairly evenly between two leaf blocks. I can’t explain the imbalance in this case, but it doesn’t affect the length of the branch entry. (If you’re wondering why the first leaf block held 147 entries while the original 2nd leaf block held 145, it’s because the first 100 entries in the first leaf block had a value for the id column that was 2 bytes long, after which the id needed 3 bytes of storage for Oracle’s internal representation.)

Having examined the treedumps to see that the splits are 90-10 and 50-50 respectively we can now look at the undo and redo generated by the different cases. Here are the relevant values extracted from the snapshots of the session stats. Again the first set comes from the 90-10 split, the second from the 50-50 split.


Redo/Undo stats 90/10 split
--------------------------------------------
redo entries                               9
redo size                             18,500
undo change vector size                8,736

Redo/Undo stats 50/50 split
--------------------------------------------
redo entries                               9
redo size                             18,520
undo change vector size                8,736

In both cases the volume of undo and redo is the same (plus or minus a tiny bit – with tiny variations across versions). The undo is equivalent to roughly a whole block plus a few percent (and that will be copied into the redo, of course) and the “pure” redo is also equivalent to a whole block plus a few percent for a total of two data blocks worth plus a couple of thousand bytes. (The extra percentage is mostly about the old and new pointers as we break and make links in the leaf blocks and update and insert links from the branch block above.)

So why does a 90/10 split, which appears simply to add a leaf block and insert one row, do so much work? The answer lies (to start with) in the dump of the redo log file. The session statistics show 9 redo entries (redo change records) generated in both cases, so I’m going to start by picking out a summary of those records from the log file dumps using egrep to identify the lines showing the redo change record length (LEN:) and the redo change vector op codes (OP:). Here’s the output, with a little cosmetic modification, for the 90-10 split.


egrep -e "LEN:" -e"OP:" test_ora_20690.trc

REDO RECORD - Thread:1 RBA: 0x00314b.00000002.0010 LEN: 0x0074 VLD: 0x05
(LWN RBA: 0x00314b.00000002.0010 LEN: 0038 NST: 0001 SCN: 0x0b86.0cfca74e)
CHANGE #1 TYP:2 CLS:1 AFN:5 DBA:0x0140008f OBJ:349950 SCN:0x0b86.0cfca74a SEQ:2 OP:4.1 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000002.0084 LEN: 0x0144 VLD: 0x01
CHANGE #1 TYP:0 CLS:73 AFN:3 DBA:0x00c00160 OBJ:4294967295 SCN:0x0b86.0cfca746 SEQ:2 OP:5.2 ENC:0 RBL:0
CHANGE #2 TYP:1 CLS:74 AFN:3 DBA:0x00c0465f OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:1 OP:5.1 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008f OBJ:349950 SCN:0x0b86.0cfca74e SEQ:1 OP:10.6 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000002.01c8 LEN: 0x20a4 VLD: 0x01                                        ***
CHANGE #1 TYP:0 CLS:73 AFN:3 DBA:0x00c00160 OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:1 OP:5.2 ENC:0 RBL:0
CHANGE #2 TYP:1 CLS:74 AFN:3 DBA:0x00c04660 OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:1 OP:5.1 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008f OBJ:349950 SCN:0x0b86.0cfca74e SEQ:2 OP:10.9 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000013.017c LEN: 0x0044 VLD: 0x01
CHANGE #1 TYP:0 CLS:8 AFN:5 DBA:0x01400088 OBJ:349950 SCN:0x0b86.0cfca638 SEQ:3 OP:13.22 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000013.01c0 LEN: 0x01ac VLD: 0x01
CHANGE #1 TYP:0 CLS:73 AFN:3 DBA:0x00c00160 OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:2 OP:5.2 ENC:0 RBL:0
CHANGE #2 TYP:1 CLS:74 AFN:3 DBA:0x00c04661 OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:1 OP:5.1 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008c OBJ:349950 SCN:0x0b86.0cfca638 SEQ:2 OP:10.8 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000014.017c LEN: 0x0048 VLD: 0x01
CHANGE #1 TYP:2 CLS:1 AFN:5 DBA:0x0140008b OBJ:349950 SCN:0x0b86.0cfca639 SEQ:2 OP:4.1 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000014.01c4 LEN: 0x00e0 VLD: 0x01
CHANGE #1 TYP:0 CLS:74 AFN:3 DBA:0x00c04661 OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:2 OP:5.1 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:1 AFN:5 DBA:0x0140008b OBJ:349950 SCN:0x0b86.0cfca74e SEQ:1 OP:10.15 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000015.00b4 LEN: 0x1fb0 VLD: 0x01                                        ***
CHANGE #1 TYP:0 CLS:73 AFN:3 DBA:0x00c00160 OBJ:4294967295 SCN:0x0b86.0cfca74e SEQ:3 OP:5.4 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:1 AFN:5 DBA:0x0140008f OBJ:349950 SCN:0x0b86.0cfca74e SEQ:3 OP:10.8 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314b.00000025.0164 LEN: 0x0320 VLD: 0x09
CHANGE #1 TYP:2 CLS:1 AFN:5 DBA:0x01400084 OBJ:349949 SCN:0x0b86.0cfca74a SEQ:2 OP:11.2 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:83 AFN:3 DBA:0x00c004a8 OBJ:4294967295 SCN:0x0b86.0cfca738 SEQ:2 OP:5.2 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008c OBJ:349950 SCN:0x0b86.0cfca74e SEQ:1 OP:10.5 ENC:0 RBL:0
CHANGE #4 TYP:0 CLS:83 AFN:3 DBA:0x00c004a8 OBJ:4294967295 SCN:0x0b86.0cfca750 SEQ:1 OP:5.4 ENC:0 RBL:0
CHANGE #5 TYP:0 CLS:84 AFN:3 DBA:0x00c04b0f OBJ:4294967295 SCN:0x0b86.0cfca738 SEQ:2 OP:5.1 ENC:0 RBL:0
CHANGE #6 TYP:0 CLS:84 AFN:3 DBA:0x00c04b0f OBJ:4294967295 SCN:0x0b86.0cfca750 SEQ:1 OP:5.1 ENC:0 RBL:0

I’ve highlighted two redo records with ‘***’ at the end of the line. One of these records has length 0x20a4, the other has length 0x1fb0 i.e. roughly a whole data block each. We’ll look at those in more detail in a moment. Here, for comparison, is the result from the 50-50 split – again with a few highlighted lines:

REDO RECORD - Thread:1 RBA: 0x00314f.00000002.0010 LEN: 0x0074 VLD: 0x05
(LWN RBA: 0x00314f.00000002.0010 LEN: 0038 NST: 0001 SCN: 0x0b86.0cfcbc25)
CHANGE #1 TYP:2 CLS:1 AFN:5 DBA:0x0140008f OBJ:349962 SCN:0x0b86.0cfcbc21 SEQ:2 OP:4.1 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.00000002.0084 LEN: 0x0144 VLD: 0x01
CHANGE #1 TYP:0 CLS:69 AFN:3 DBA:0x00c000e8 OBJ:4294967295 SCN:0x0b86.0cfcbc15 SEQ:2 OP:5.2 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:70 AFN:3 DBA:0x00c10c43 OBJ:4294967295 SCN:0x0b86.0cfcbc15 SEQ:2 OP:5.1 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008f OBJ:349962 SCN:0x0b86.0cfcbc25 SEQ:1 OP:10.6 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.00000002.01c8 LEN: 0x20a4 VLD: 0x01                                        ***
CHANGE #1 TYP:0 CLS:69 AFN:3 DBA:0x00c000e8 OBJ:4294967295 SCN:0x0b86.0cfcbc25 SEQ:1 OP:5.2 ENC:0 RBL:0
CHANGE #2 TYP:1 CLS:70 AFN:3 DBA:0x00c10c44 OBJ:4294967295 SCN:0x0b86.0cfcbc25 SEQ:1 OP:5.1 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008f OBJ:349962 SCN:0x0b86.0cfcbc25 SEQ:2 OP:10.9 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.00000013.017c LEN: 0x0044 VLD: 0x01
CHANGE #1 TYP:0 CLS:8 AFN:5 DBA:0x01400088 OBJ:349962 SCN:0x0b86.0cfcbb24 SEQ:3 OP:13.22 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.00000013.01c0 LEN: 0x1010 VLD: 0x01                                        ***
CHANGE #1 TYP:0 CLS:69 AFN:3 DBA:0x00c000e8 OBJ:4294967295 SCN:0x0b86.0cfcbc25 SEQ:2 OP:5.2 ENC:0 RBL:0
CHANGE #2 TYP:1 CLS:70 AFN:3 DBA:0x00c10c45 OBJ:4294967295 SCN:0x0b86.0cfcbc25 SEQ:1 OP:5.1 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008c OBJ:349962 SCN:0x0b86.0cfcbb24 SEQ:2 OP:10.8 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.0000001c.0060 LEN: 0x0048 VLD: 0x01
CHANGE #1 TYP:2 CLS:1 AFN:5 DBA:0x0140008b OBJ:349962 SCN:0x0b86.0cfcbb25 SEQ:2 OP:4.1 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.0000001c.00a8 LEN: 0x00e0 VLD: 0x01
CHANGE #1 TYP:0 CLS:70 AFN:3 DBA:0x00c10c45 OBJ:4294967295 SCN:0x0b86.0cfcbc25 SEQ:2 OP:5.1 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:1 AFN:5 DBA:0x0140008b OBJ:349962 SCN:0x0b86.0cfcbc25 SEQ:1 OP:10.15 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.0000001c.0188 LEN: 0x1150 VLD: 0x01                                        ***
CHANGE #1 TYP:0 CLS:69 AFN:3 DBA:0x00c000e8 OBJ:4294967295 SCN:0x0b86.0cfcbc25 SEQ:3 OP:5.4 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:1 AFN:5 DBA:0x0140008f OBJ:349962 SCN:0x0b86.0cfcbc25 SEQ:3 OP:10.8 ENC:0 RBL:0

REDO RECORD - Thread:1 RBA: 0x00314f.00000025.0168 LEN: 0x0330 VLD: 0x09
CHANGE #1 TYP:2 CLS:1 AFN:5 DBA:0x01400084 OBJ:349961 SCN:0x0b86.0cfcbc21 SEQ:2 OP:11.2 ENC:0 RBL:0
CHANGE #2 TYP:0 CLS:73 AFN:3 DBA:0x00c00160 OBJ:4294967295 SCN:0x0b86.0cfcbc1a SEQ:2 OP:5.2 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:1 AFN:5 DBA:0x0140008c OBJ:349962 SCN:0x0b86.0cfcbc25 SEQ:1 OP:10.5 ENC:0 RBL:0
CHANGE #4 TYP:0 CLS:73 AFN:3 DBA:0x00c00160 OBJ:4294967295 SCN:0x0b86.0cfcbc27 SEQ:1 OP:5.4 ENC:0 RBL:0
CHANGE #5 TYP:0 CLS:74 AFN:3 DBA:0x00c04c64 OBJ:4294967295 SCN:0x0b86.0cfcbc1a SEQ:2 OP:5.1 ENC:0 RBL:0
CHANGE #6 TYP:0 CLS:74 AFN:3 DBA:0x00c04c64 OBJ:4294967295 SCN:0x0b86.0cfcbc27 SEQ:1 OP:5.1 ENC:0 RBL:0

There are three interesting records in the 50-50 split with lengths 0x20a4 (the same as the 90-10 split), 0x1010, and 0x1150. So we seem to start the same way with a “full block” record, and follow up with two “half block” records. The numbers allow you to make a reasonable guess – Oracle copies the original leaf block into the undo, then writes the two new leaf blocks as “pure” redo; in the 90-10 case the redo records for the two new leaf blocks amount to a whole block plus a tiny fraction of a block, while in the 50-50 case they amount to two half blocks.

I won’t show you the full detail that I checked in the log file dump, but the big 0x20a4 record in the 90-10 split is mostly made up of an “OP:5.1” change vector labelled “restore block before image (8032)”, while the 5th and 8th records in both dumps hold “OP:10.8” change vectors labelled “(kdxlne) … new block has NNN rows”. In the case of the 90-10 split the values for NNN are 1 and 145, in the case of the 50-50 split the values for NNN are 68 and 78 – in that (higher values leaf block first) order.

The 90-10 split and the 50-50 split act in exactly the same way – save the old block, allocate a new block, populate two blocks. It really looks as if code re-use has missed an easy opportunity for some optimisation – why save and rewrite a block when the data content is not going to change ?

Before assuming there’s a bug (or defect, or sub-optimal implementation) though it’s wise to consider whether there might be something else going on – Oracle developers (especially at the deeper levels) tend to have good reasons for what they do so maybe the rewrite is deliberate and serves a useful purpose.

If you do anything with my current test you won’t spot the extra little feature because my test is running a very special edge case – but I had a thought that would justify the cost (and timing) of the rewrite, and I’ll be publishing the idea, the test, and the results tomorrow.

Footnote

It is possible that a leaf node split means Oracle has to insert a pointer into a level 1 branch node that is already full – in which case Oracle will have to allocate a new branch node, share the branch data (including the new leaf pointer) between the two nodes, and insert a new branch pointer into the relevant level 2 branch block … and that may split etc. all the way up to the root. When the root node splits Oracle allocates two new blocks, increasing the branch level by one and keeping the original root block in place (immediately after all the space management blocks) but now pointing to just 2 “branch N-1” level blocks. Oracle will update the statistics “branch node splits” and “root node splits”.

In certain situations (typically relating to very large deletes followed by highly concurrent small inserts) Oracle may run into problems identifying a suitable “free” block while trying to do a split, and this can result in very slow splits that generate a lot of undo and redo while pinning index leaf blocks exclusively (leading to a couple of the rarer “enq: TX” waits). In this case you may see the statistics “failed probes on index block reclamation” and “recursive aborts on index block reclamation” starting to climb.  In theory I think you shouldn’t see more than a handful of “failed probes” per “recursive abort” – but I’ve never been able to model the problem to check that.

 

 

October 28, 2018

Upgrades – again

Filed under: 12c,Histograms,Oracle,Statistics,Upgrades — Jonathan Lewis @ 12:39 pm GMT Oct 28,2018

I’ve got a data set which I’ve recreated in 11.2.0.4 and 12.2.0.1.

I’ve generated stats on the data set, and the stats are identical.

I don’t have any indexes or extended stats, or SQL Plan directives or SQL Plan Profiles, or SQL Plan Baselines, or SQL Patches to worry about.

I’m joining two tables, and the join column on one table has a frequency histogram while the join column on the other table has a height-balanced histogram.  The histograms were created with estimate_percent => 100 (which explains why I’ve got a height-balanced histogram in 12c rather than a hybrid histogram).

Here are the two execution plans, 11.2.0.4 first, pulled from memory by dbms_xplan.display_cursor():


SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      12 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      12 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1855 |   1327 |00:00:00.01 |      12 |  2440K|  2440K| 1357K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       6 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")



SQL_ID	f8wj7karu0hhs, child number 0
-------------------------------------
select	       count(*) from	     t1, t2 where	  t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation	    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |	OMem |	1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |	   |	  1 |	     |	    1 |00:00:00.01 |	  41 |	     |	     |		|
|   1 |  SORT AGGREGATE     |	   |	  1 |	   1 |	    1 |00:00:00.01 |	  41 |	     |	     |		|
|*  2 |   HASH JOIN	    |	   |	  1 |	1893 |	 1327 |00:00:00.01 |	  41 |	2545K|	2545K| 1367K (0)|
|   3 |    TABLE ACCESS FULL| T1   |	  1 |	 100 |	  100 |00:00:00.01 |	   7 |	     |	     |		|
|   4 |    TABLE ACCESS FULL| T2   |	  1 |	 800 |	  800 |00:00:00.01 |	   7 |	     |	     |		|
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

The key point is the difference between the two cardinality estimates. Why has that appeared, and what might the optimizer do in a more complex plan when a cardinality estimate changes?

The difference is only 2%, but that was on a couple of data sets I just happened to run up to check something completely different – I wasn’t trying to break anything – so who knows how big the variation can get. Of course if you’re switching from 11g to 12c then Oracle (Corp.) expects you to be using auto_sample_size anyway, so you shouldn’t be producing height-balanced histograms.

So does this difference really matter? Maybe not, but if you (like many sites I’ve seen) are still using fixed percentage sample sizes and are generating histograms it’s another reason (on top of the usual instability effects of height-balanced and hybrid histograms) why you might see plans change as you upgrade from 11g to 12c.
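For comparison, here’s a sketch of the sort of gather call Oracle expects you to be using from 12c onwards – the table and column names simply echo the test case, and the auto_sample_size is the point of the example:


begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                estimate_percent => dbms_stats.auto_sample_size,
                method_opt       => 'for columns j1 size 254'
        );
end;
/
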

Footnote

It looks as if the difference comes mostly from a coding error in 11g that has been fixed in 12c – I couldn’t find an official bug or fix_control that matched, though. More on that later in the week.

Update

Chinar Aliyev has pointed out that there are three fix-controls that may be associated with this (and other) changes. From v$system_fix_control these are:

14033181 1 QKSFM_CARDINALITY_14033181   correct ndv for non-popular values in join cardinality comp.         (12.1.0.1)
19230097 1 QKSFM_CARDINALITY_19230097   correct join card when popular value compared to non popular         (12.2.0.1)
22159570 1 QKSFM_CARDINALITY_22159570   correct non-popular region cardinality for hybrid histogram          (12.2.0.1)
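If you want to check these, or experiment with them, on your own system, something like the following sketch should do – the bug numbers are just the three listed above, and the “_fix_control” setting is the usual (use-with-care) way of switching one of them off for a quick test:


select
        bugno, value, description, optimizer_feature_enable
from
        v$system_fix_control
where
        bugno in (14033181, 19230097, 22159570)
;

rem     e.g. to disable one of them at the session level for testing
alter session set "_fix_control" = '14033181:0';
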

I haven’t tested them yet, but with the code easily available in the article it won’t take long to see what the effects are when I have a few minutes. The first fix may also be why I had a final small discrepancy between 11g and 12c on the join on two columns with frequency histograms.

October 25, 2018

Join Cardinality – 4

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 9:09 am GMT Oct 25,2018

In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve covered two options for the types of histogram involved: frequency to frequency, and frequency to top-frequency. Today it’s time to examine frequency to hybrid.

My first thought about this combination was that it was likely to be very similar to frequency to top-frequency because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an “average selectivity” of the rest of the data.

I was nearly right but the arithmetic didn’t quite work out the way I expected.  Fortunately Chinar Aliyev’s document highlighted my error – the optimizer doesn’t use all the repeat counts; it uses only those repeat counts that identify popular values, and a popular value is one where the endpoint_repeat_count is not less than the average number of rows in a bucket. Let’s work through an example – first the data (which repeats an earlier article, but is included here for ease of reference):

rem
rem     Script:         freq_hist_join_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

set linesize 156
set pagesize 60
set trimspool on

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

As before I’ve got a table with 100 rows using the sqrt() function to generate column j1, and a table with 800 rows using the dbms_random.normal function to generate column j2. So the two columns have skewed patterns of data distribution, with a small number of low values and larger numbers of higher values – but the two patterns are different.

I’ve generated a histogram with 254 buckets (which dropped to 10) for the t1.j1 column, and generated a histogram with 13 buckets for the t2.j2 column as I knew (after a little trial and error) that this would give me a hybrid histogram.

Here’s a simple query, with its result set, to report the two histograms – using a full outer join to line up matching values and show the gaps where (endpoint) values in one histogram do not appear in the other:


define m_popular = 62

break on report skip 1

compute sum of product on report
compute sum of product_rp on report

compute sum of t1_count on report
compute sum of t2_count on report
compute sum of t2_repeats on report
compute sum of t2_pop_count on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null)
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value t1_value,
        f2.value t2_value,
        f1.row_or_bucket_count t1_count,
        f2.row_or_bucket_count t2_count,
        f1.endpoint_repeat_count t1_repeats,
        f2.endpoint_repeat_count t2_repeats,
        f2.pop_count t2_pop_count
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT T1_REPEATS T2_REPEATS T2_POP_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ------------
                    1                     1                     1
         2                     5                     0
         5                    15                     0
         7                    15                     0
        10                    17                     0
        12                    13                     0
        15         15         13         55          0         11
        17         17         11         56          0         34
                   19                    67                    36
        20         20          7         57          0         57
                   21                    44                    44
        22         22          3         45          0         45
                   23                    72                    72           72
                   24                    70                    70           70
        25         25          1         87          0         87           87
                   26                   109                   109          109
                   27                    96                    96           96
                   28                    41                    41
---------- ---------- ---------- ----------            ---------- ------------
                             100        800                   703          434

You’ll notice that there’s a substitution variable (m_popular) in this script that I use to identify the “popular values” in the hybrid histogram so that I can report them separately. I’ve set this value to 62 for this example because a quick check of user_tables and user_tab_cols tells me I have 800 rows in the table (user_tables.num_rows) and 13 buckets (user_tab_cols.num_buckets) in the histogram: 800/13 = 61.52. A value is popular only if its repeat count is 62 or more.
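The 62 isn’t a magic number, of course – here’s a sketch of the query I could have used to derive the threshold from the data dictionary (num_rows / num_buckets, rounded up to the next whole number):


select
        t.num_rows,
        c.num_buckets,
        ceil(t.num_rows / c.num_buckets)        popularity_threshold
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name  = 'T2'
and     c.table_name  = t.table_name
and     c.column_name = 'J2'
;
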

This is where you may hit a problem – I certainly did when I switched from testing 18c to testing 12c (which I just knew was going to work – but I tested anyway). Although my data has been engineered so that I get the same “random” data in both versions of Oracle, I got different hybrid histograms (hence my complaint in a recent post). The rest of this note covers 18c in detail, but if you’re running 12c there are a couple of defined values that you can change to get the right results.

At this point I need to “top and tail” the output because the arithmetic only applies where the histograms overlap, so I need to pick the range from 2 to 25. Then I need to inject a “representative” or “average” count/frequency in all the gaps, then cross-multiply. The average frequency for the frequency histogram is “half the frequency of the least frequently occurring value” (which seems to be identical to new_density * num_rows), and the representative frequency for the hybrid histogram is (“number of non-popular rows” / “number of non-popular values”). There are 800 rows in the table with 22 distinct values in the column, and the output above shows us that we have 5 popular values totalling 434 rows, so the average frequency is (800 – 434) / (22 – 5) = 21.5294. (Alternatively we could say that the average selectivities (which is what I’ve used in the next query) are 0.5/100 and 21.5294/800.)

[Note for 12c, you’ll get 4 popular values covering 338 rows, so your figures will be: (800 – 338) / (22 – 4) = 25.6666… and 0.0302833]
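Just to make the arithmetic visible, here’s a trivial query that produces my 18c value for m_t2_sel – the literals are simply the counts from the output above (800 rows, 22 distinct values, 5 popular values covering 434 rows):


select
        (800 - 434) / (22 - 5)                          avg_nonpopular_frequency,
        round( ((800 - 434) / (22 - 5)) / 800, 7)       avg_nonpopular_selectivity
from
        dual
;
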

So here’s a query that restricts the output to the rows we want from the histograms, discards a couple of columns, and does the arithmetic:


define m_t2_sel = 0.0302833
define m_t2_sel = 0.0269118
define m_t1_sel = 0.005

break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null) pop_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,100 * &m_t1_sel) t1_count,
        nvl(f2.pop_count,          800 * &m_t2_sel) t2_count,
        case when (   f1.row_or_bucket_count is not null
                   or f2.pop_count is not null
        )    then
                nvl(f1.row_or_bucket_count,100 * &m_t1_sel) *
                nvl(f2.pop_count,          800 * &m_t2_sel)
        end      product_rp
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;


 F1_VALUE   F2_VALUE   T1_COUNT   T2_COUNT PRODUCT_RP
---------- ---------- ---------- ---------- ----------
         2                     5   21.52944   107.6472
         5                    15   21.52944   322.9416
         7                    15   21.52944   322.9416
        10                    17   21.52944  366.00048
        12                    13   21.52944  279.88272
        15         15         13   21.52944  279.88272
        17         17         11   21.52944  236.82384
                   19         .5   21.52944
        20         20          7   21.52944  150.70608
                   21         .5   21.52944
        22         22          3   21.52944   64.58832
                   23         .5         72         36
                   24         .5         70         35
        25         25          1         87         87
                      ---------- ---------- ----------
sum                          102  465.82384 2289.41456

There’s an important detail that I haven’t mentioned so far. In the output above you can see that some rows show “product_rp” as blank. While we cross multiply the frequencies from t1.j1 and t2.j2, filling in average frequencies where necessary, we exclude from the final result any rows where average frequencies have been used for both histograms.

[Note for 12c, you’ll get the result 2698.99736 for the query, and 2699 for the execution plan]

Of course we now have to check that the predicted cardinality for a simple join between these two tables really is 2,289. So let’s run a suitable query and see what the optimizer predicts:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

SQL_ID  cf4r52yj2hyd2, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     108 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     108 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   2289 |   1327 |00:00:00.01 |     108 |  2546K|  2546K| 1194K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |      18 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |      34 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

As you can see, the E-Rows for the join is 2,289, as required.

I can’t claim that the model I’ve produced is definitely what Oracle does, but it looks fairly promising. No doubt, though, there are some variations on the theme that I haven’t considered – even when sticking to a simple (non-partitioned) join on equality on a single column.

October 23, 2018

Upgrade threat

Filed under: 12c,18c,Histograms,Oracle,Statistics,Upgrades — Jonathan Lewis @ 7:50 pm GMT Oct 23,2018

Here’s one I’ve just discovered while trying to build a reproducible test case – that didn’t reproduce because an internal algorithm has changed.

If you upgrade from 12c to 18c and have a number of hybrid histograms in place you may find that some execution plans change because of a change in the algorithm for producing hybrid histograms (and that’s not just if you happen to get the patch that fixes the top-frequency/hybrid bug relating to high values).

Here’s a little test to demonstrate how I wasted a couple of hours trying to solve the wrong problem – first a simple data set:


rem
rem     Script:         18c_histogram_upgrade.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem 

drop table t2 purge;

execute dbms_random.seed(0)

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

I’ve created a skewed data set which (we will see) has 22 distinct values and created a histogram of 13 buckets on it. This will be a hybrid histogram – but different versions of Oracle will produce different histograms (even though the data set is the same for both versions):


select
        j2, count(*)
from
        t2
group by
        j2
order by
        j2
;

select
        endpoint_value                                                            value,
        endpoint_number,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_size,
        endpoint_repeat_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
;

Here’s the data set from 12.2.0.1 and 18.3.0.0:


        J2   COUNT(*)
---------- ----------
         1          1
         8          3
         9          1
        10          5
        11          4
        12          8
        13         14
        14          9
        15         11
        16         22
        17         34
        18         31
        19         36
        20         57
        21         44
        22         45
        23         72
        24         70
        25         87
        26        109
        27         96
        28         41

22 rows selected.



And here are the histograms - 12.2.0.1 then 18.3.0.0:



     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        18             143          31                    31
        19             179          36                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        28             800         137                    41

13 rows selected.

     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        19             179          67                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        27             759          96                    96
        28             800          41                    41

13 rows selected.

Both histograms have 13 buckets as requested; both are hybrid histograms as expected.

But why does 12c have the value 18 when 18c doesn’t, and why does 18c have the value 27 when 12c doesn’t ?

That’s the second time in two weeks I’ve had reproducible test cases not reproducing – thanks to an 18c upgrade.

Update (See comments)

I had completely forgotten that a previous defect in the construction of hybrid (and Top-N) histograms had been addressed in 18.3 but needed a fix in 12.2 and a backport patch in 12.1.0.2.

Since the defect could “lose” a popular value in order to ensure that both the low and high values were captured in the histogram it’s not surprising that a fix could result in one of the popular values in a histogram disappearing (after the upgrade) even when the gather had used a 100% sample. Quite possibly the algorithm used to ensure the presence of the high value has had a cascading effect down the histogram that can affect which popular values get into the histogram with repeat counts.

I think I’m going to have to grit my teeth and patch a 12.1.0.2, or update a 12.2.0.1 with exactly the right patch-set to find out.

[It has now been confirmed by Nigel Bayliss that this is a side effect of the fix to the bug 25994960]

October 22, 2018

Column Groups

Filed under: Bugs,CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:36 pm GMT Oct 22,2018

Sometimes a good thing becomes a bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:


rem
rem     Script:         distinct_key_prob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2016
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        cast(mod(rownum-1,10) as number(8,0))   non_null,
        cast(null as number(8,0))               null_col,
        cast(lpad(rownum,10) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(null_col, non_null);

begin

/*
        dbms_output.put_line(
                dbms_stats.create_extended_stats(user,'t1','(non_null, null_col)')
        );
*/

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

So I have a table with 1,000,000 rows; one of its columns is always null and another has a very small number of distinct values and is never null (though it hasn’t been declared as not null). I’ve created an index that starts with the “always null” column (in a production system we’d really be looking at a column that was “almost always” null, with a few special rows where the column was not null, so an index like this can make sense).

I’ve also got a few lines, commented out, to create extended stats on the column group (non_null, null_col) because any anomaly relating to the handling of the number of distinct keys in a multi-column index may also be relevant to column groups. I can run two variations of this code, one with the index, one without the index but with the column group, and see the same cardinality issue appearing in both cases.

So let’s execute a couple of queries – after setting up a couple of bind variables – and pull their execution plans from memory:


variable b_null    number
variable b_nonnull number

exec :b_null    := 5
exec :b_nonnull := 5

set serveroutput off

prompt  ===================
prompt  Query null_col only
prompt  ===================

select  count(small_vc)
from    t1
where
        null_col = :b_null
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

prompt  =========================
prompt  Query (null_col,non_null)
prompt  =========================

select  count(small_vc)
from    t1
where
        null_col = :b_null
and     non_null = :b_nonnull
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

The optimizer has statistics that tell it that null_col is always null so its estimate of rows where null_col = 5 should be zero (which will be rounded up to 1); and we have an index starting with null_col so we might expect the optimizer to use an index range scan on that index for these queries. Here are the plans that actually appeared:


SQL_ID  danj9r6rq3c7g, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |    24 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    24 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("NULL_COL"=:B_NULL)



SQL_ID  d8kbtq594bsp0, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null and non_null =
:b_nonnull

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  2189 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    27 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  2636K|  2189   (4)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("NULL_COL"=:B_NULL AND "NON_NULL"=:B_NONNULL))

Take a careful look at what we’ve got: the second query has to access exactly the same table rows as those identified by the first query and then apply a second predicate which may discard some of those rows – but the optimizer has changed the access path from a low-cost index-driven access to a high cost tablescan. This is clearly idiotic – there has to be a flaw in the optimizer logic in this situation.

The defect revolves around a slight inconsistency in the handling of column groups – whether they are explicitly created, or simply inferred by reference to user_indexes.distinct_keys. The anomaly is most easily seen by explicitly creating the column group, gathering stats, and reporting from user_tab_cols.


select
        column_name, sample_size, num_distinct, num_nulls, density, histogram, data_default
from
        user_tab_cols
where
        table_name = upper('T1')
order by
        column_id

;

COLUMN_NAME			       Sample	  Distinct  NUM_NULLS	 DENSITY HISTOGRAM	 DATA_DEFAULT
-------------------------------- ------------ ------------ ---------- ---------- --------------- --------------------------------------------
NON_NULL			    1,000,000		10	    0	      .1 NONE
NULL_COL						 0    1000000	       0 NONE
SMALL_VC			    1,000,000	   995,008	    0 .000001005 NONE
PADDING 			    1,000,000		 1	    0	       1 NONE
SYS_STULC#01EE$DE1QB7UY1K4$PBI	    1,000,000		10	    0	      .1 NONE		 SYS_OP_COMBINED_HASH("NON_NULL","NULL_COL")

As you can see, the optimizer can note that “null_col” is always null, so the arithmetic for “null_col = :bind1” is going to produce a very small cardinality estimate; on the other hand, when the optimizer sees “null_col = :bind1 and non_null = :bind2” it’s going to transform this into the single predicate “SYS_STULC#01EE$DE1QB7UY1K4$PBI = sys_op_combined_hash(non_null, null_col)”, and the statistics say there are 10 distinct values for this (virtual) column with no nulls – hence the huge cardinality estimate and full tablescan.

The “slight inconsistency” in handling that I mentioned above is that if you used a predicate like “null_col is null and non_null = :bind2” the optimizer would not use the column group because of the “is null” condition – even though it’s exactly the case where the column group statistics would be appropriate. (In the example I’ve constructed the optimizer’s estimate from ignoring the column group would actually be correct – and identical to the estimate it would get from using the column group – because the column is null for every single row.)

tl;dr

Column groups can give you some very bad estimates, and counter-intuitive behaviour, if any of the columns in the group has a significant percentage of nulls; this happens because the column group makes the optimizer lose sight of the number of nulls in the underlying data set.

 

October 19, 2018

add_colored_sql

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 3:08 pm GMT Oct 19,2018

The following request appeared recently on the Oracle-L mailing list:

I have one scenario related to capturing of sql statement in history table..  Like dba_hist_sqltext capture the queries that ran for 10 sec or more..  How do I get the sql stmt which took less time say in  millisecond..  Any idea please share.

An AWR snapshot captures statements that (a) meet some workload criteria such as “lots of executions” and (b) happen to be in the library cache when the snapshot takes place; but if you have some statements which you think are important or interesting enough to keep an eye on, yet which don’t do enough work to meet the normal workload requirements of the AWR snapshots, it’s still possible to tell Oracle to capture them by “coloring” them.  (Apologies for the American spelling – it’s necessary to avoid error ‘PLS-00302: component %s must be declared’.)

Somewhere in the 11gR1 timeline the package dbms_workload_repository acquired the following two procedures:


PROCEDURE ADD_COLORED_SQL
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_ID                         VARCHAR2                IN
 DBID                           NUMBER                  IN     DEFAULT

PROCEDURE REMOVE_COLORED_SQL
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_ID                         VARCHAR2                IN
 DBID                           NUMBER                  IN     DEFAULT

You have to be licensed to use the workload repository, of course, but if you are you can call the first procedure to mark an SQL statement as “interesting”, after which its execution statistics will be captured whenever it’s still in the library cache at snapshot time. The second procedure lets you stop the capture – and you will probably want to use this procedure from time to time because there’s a limit (currently 100) to the number of statements you’re allowed to color and if you try to exceed the limit your call will raise Oracle error ORA-13534.


ORA-13534: Current SQL count(100) reached maximum allowed (100)
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 751
ORA-06512: at line 3
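
For reference, marking and (later) un-marking a statement is just a pair of simple calls – a minimal sketch, using a placeholder sql_id:


begin
        dbms_workload_repository.add_colored_sql(sql_id => 'aedf339438ww3');
end;
/

rem     ... and when you no longer want the statement captured:

begin
        dbms_workload_repository.remove_colored_sql(sql_id => 'aedf339438ww3');
end;
/
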

If you want to see the list of statements currently marked as colored you can query table wrm$_colored_sql, exposed through the views dba_hist_colored_sql and (in 12c) cdb_hist_colored_sql. (Note: I haven’t yet tested whether the limit of 100 statements is per PDB or summed across the entire CDB [but see comment #2 below] – and the answer may vary with version of Oracle, of course).


SQL> select * from sys.wrm$_colored_sql;

      DBID SQL_ID             OWNER CREATE_TI
---------- ------------- ---------- ---------
3089296639 aedf339438ww3          1 28-SEP-18

1 row selected.

If you’ve had to color a statement to force the AWR snapshot to capture it, the statement probably won’t appear in the standard AWR reports; but it will be available to the “AWR SQL” report (which I usually generate from SQL*Plus with a call to $ORACLE_HOME/rdbms/admin/awrsqrpt.sql).

Footnote

If the statement you’re interested in executes very infrequently and often drops out of the library cache before it can be captured in an AWR snapshot then an alternative strategy is to enable system-wide tracing for that statement so that you can capture every execution in a trace file.
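
A sketch of the sort of call I have in mind, using the diagnostic event syntax that appeared in 11g – the sql_id here is just a placeholder, and you should check the exact syntax and options available in your version before relying on it:


alter system set events 'sql_trace [sql: aedf339438ww3] wait=true, bind=true';

rem     ... and to switch the tracing off again:

alter system set events 'sql_trace [sql: aedf339438ww3] off';
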

 

October 17, 2018

Problem Solving

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 4:11 pm GMT Oct 17,2018

Here’s a little question that popped up on the Oracle-L list server a few days ago:

I am facing this issue running this command in 11.2.0.4.0 (also in 12c R2 I got the same error)

SQL> SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL;
SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL
ORA-01873: a precisão precedente do intervalo é pequena demais

 

How do you go about finding out what’s going on? In my case the first thing is to check the translation of the error message (two options):

SQL> execute dbms_output.put_line(sqlerrm(-1873))
ORA-01873: the leading precision of the interval is too small

SQL> SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL;
SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL
                                                                                                       *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small

That didn’t quite match my guess, but it was similar: I had been guessing that it was saying something about precision. It doesn’t really strike me as an intuitively self-explanatory message, though, so maybe a quick check in $ORACLE_HOME/rdbms/mesg/oraus.msg to find the error number with its cause and action will help:


01873, 00000, "the leading precision of the interval is too small"
// *Cause: The leading precision of the interval is too small to store the
//  specified interval.
// *Action: Increase the leading precision of the interval or specify an
//  interval with a smaller leading precision.

Well, that doesn’t really add value – and I can’t help feeling that if the leading precision of the interval is too small it won’t help to make it smaller. So all I’m left to go on is that there’s a precision problem of some sort and it’s something to do with the interval, and probably NOT with adding the interval to the timestamp. So let’s check that bit alone:


SQL> SELECT NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL;
SELECT NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL
                                    *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small


So the interval bit is the problem. Since the problem is about “precision”, let’s try messing about with the big number. First I’ll do a bit of cosmetic tidying by doing the division to knock off the trailing zeros, then I’ll see what happens when I divide by 10:

SQL> SELECT NUMTODSINTERVAL(285016680, 'SECOND') from dual;

NUMTODSINTERVAL(285016680,'SECOND')
---------------------------------------------------------------------------
+000003298 19:18:00.000000000

So 285 million works, but 2.85 billion doesn’t. The value that works gives an interval of about 3,298 days, which is about 9 years, so maybe there’s an undocumented limit of 100 years on the input value; on the other hand the jump from 285 million to 2.85 billion does take you through a critical computer-oriented limit: 2^31 – 1, the maximum signed 32-bit integer (2147483647), so let’s try using that value, and that value plus 1, in the expression:


SQL> SELECT NUMTODSINTERVAL(power(2,31), 'SECOND') from dual;
SELECT NUMTODSINTERVAL(power(2,31), 'SECOND') from dual
                       *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small


SQL> SELECT NUMTODSINTERVAL(power(2,31)-1, 'SECOND') from dual;

NUMTODSINTERVAL(POWER(2,31)-1,'SECOND')
---------------------------------------------------------------------------
+000024855 03:14:07.000000000

1 row selected.

Problem identified – it’s a numeric limit of the numtodsinterval() function. Interestingly it’s not documented in the Oracle manuals; in fact the SQL Reference manual suggests that this shouldn’t be a limit at all, because it says that “any number value or anything that can be cast as a number is legal” – and in Oracle-speak a number allows for roughly 38 digits of precision.

Whilst we’ve identified the problem we still need a way to turn the input number into the timestamp we need – the OP didn’t need help with that one: divide by sixty and convert using minutes instead of seconds:


SQL> SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000/60, 'MINUTE') FROM DUAL;

TO_TIMESTAMP('1970-01-0100:00:00.0','YYYY-MM-DDHH24:MI:SS.FF')+NUMTODSINTER
---------------------------------------------------------------------------
26-APR-60 01.00.02.000000000 AM

1 row selected

Job done.
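
As an aside, since the input here is a whole number of seconds (the milliseconds are all zero), a DATE may be good enough, in which case a minimal sketch using plain date arithmetic avoids the interval function completely:


SELECT date '1970-01-01' + 2850166802000/1000/86400 FROM DUAL;

This should report the same instant (26-APR-2060 01:00:02) as a DATE rather than a TIMESTAMP.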

October 15, 2018

Faking Histograms

Filed under: Uncategorized — Jonathan Lewis @ 1:37 pm GMT Oct 15,2018

This is a short index of articles I’ve written on how to create the different types of histogram that the optimizer uses:

  • Faking a frequency histogram    How to create frequency histograms (using a numeric column for the example)
  • Histogram Tip  An example of creating a simple character-based frequency histogram (published in the IOUG Tips booklet 2014).
  • Faking a height-balanced histogram  How to create a height-balanced histogram (using a numeric column for the example).
  • Hybrid Fake: How to create a hybrid histogram (using a character column for the example).
  • Extended Histogram:  faking values into a histogram for a column group – only special because we need to derive the value stored.
  • Top frequency:  I haven’t yet worked out how to fake a Top Frequency histogram. Since it’s little more than a frequency histogram where the optimizer knows there’s a further small percentage (less than one bucketful) of other data, this doesn’t worry me; if necessary I’ll just create a “good enough” frequency histogram and set a suitable density for the remainder.

And a couple of miscellaneous things about histograms

  • Big number problem – older versions of Oracle (pre 12c) can go wrong with data values more than 15 digits long
  • Long strings problem – until 12c Oracle stored at most 32 bytes of a string in the endpoint_actual_value column.
  • Hybrid/Top-N problem – a bug, fixed in 12.2 with a patch for 12.1.
  • Upgrade threat – a step you need to take to upgrade from 11.2.0.3 if you have histograms on char() columns
  • Upgrade threat 2 (Oracle-L) – if you’ve got a big history of histograms then the upgrade from 11.2.0.3 (or earlier) could take a long time

 

October 14, 2018

Soup

Filed under: Non-technical — Jonathan Lewis @ 7:02 pm GMT Oct 14,2018

When my mother-in-law comes round to Sunday lunch we often have roast chicken – and a serious error in estimating the requirement for the vegetable bed I was roasting on led to the discovery of home-made soup. (I did warn you that my post-operative posts would be light-weight)

Ingredients

  • Remnants of cooked chicken
    • (or 250ml, 1/3rd pint, one cup of  boiled water with a vegetable or chicken stock cube)
  • One large carrot
  • One small parsnip
  • One medium onion
    • The three vegetables should be similar in weight: roughly 200g, a bit less than 1/2 lb.
    • (Or just about any leftover vegetables from a roast dinner – I’ve even used leeks in cheese sauce)
  • Greek yoghurt (One tablespoon)
  • Choice of herbs and seasoning.

 

Method

  • Discard any large areas of fatty skin from the chicken carcase
  • Break up the carcase and place in saucepan with 500ml (2 cups, 3/4 pint) water
  • Bring to the boil and simmer for about 30 minutes with saucepan lid on.
  • Strain into a fresh saucepan, discard the remnants of chicken

 

  • Top, tail and peel the carrot, cut into discs
  • Top, tail and peel the parsnip, cut into small chunks
  • Peel and dice the onion.
  • Mix the stock and vegetables in a fresh saucepan
  • add herbs and seasoning
    • (I like a teaspoon of chopped tarragon – fresh from the garden)
  • Simmer with lid on for about 20 minutes
    • (until the carrots are softened)

 

  • Tip contents into blender
  • Blend until smooth.
  • Add a rounded tablespoon of plain Greek yoghurt, and blend
  • Serve – 2 large or 3 small portions

Despite looking a bit boring before it goes into the blender the soup tends to come out a surprisingly cheerful sunshine yellow thanks to the carrot. If you’ve managed to get the same results as I do then, because of the yoghurt I think, the texture will be almost like a thick foam or very light mousse.

I typically end up making the stock immediately after lunch is over, then keep it in the fridge for a couple of days before making the soup; that does mean you can skim off any excess fat before using the stock for the soup. And if there’s any gravy left over from lunch that’s a bonus to go in the soup.

 

October 12, 2018

dbms_log

Filed under: Oracle — Jonathan Lewis @ 4:39 pm GMT Oct 12,2018

I’ve been a long-time (though occasional) user of the undocumented dbms_system package, typically using it to write messages or insert break lines in trace files (or the alert log). Thanks to an email from Cary Millsap I’ve recently discovered that the procedures for writing to trace files have been copied to a separate dbms_log package – which is nice because some of the things in dbms_system shouldn’t be made available to general code, for example the procedure kcfrms, which resets a number of the “max time” columns in various dynamic performance views. It can be very useful occasionally – you might even want to call it just before or just after every AWR snapshot – but I’d rather that no-one else was able to call it if I thought that I needed to do so.

The dbms_log package is also (currently – even in 18.3) undocumented but maybe one day soon it will appear in the PL/SQL Packages and Types reference manual. The procedures available in the package are as follows:

SQL> desc dbms_log
PROCEDURE KSDDDT
PROCEDURE KSDFLS
PROCEDURE KSDIND
 Argument Name			Type			In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LVL				BINARY_INTEGER		IN
PROCEDURE KSDWRT
 Argument Name			Type			In/Out Default?
 ------------------------------ ----------------------- ------ --------
 DEST				BINARY_INTEGER		IN
 TST				VARCHAR2		IN
  • ksdddt moves to a new line in the trace file, writes the date, and moves to the next line, but won’t do anything if the trace file has not already been opened; so if you want a datestamp at the top of the output you actually have to start with a ksdwrt or ksdind call.
  • ksdfls flushes any pending writes to the trace file / alert log and closes the file – which isn’t relevant in my example but would make a difference to when you see the text that has been written from inside a pl/sql block.
  • ksdind writes an “indent” of lvl colon (‘:’) symbols to the trace file. This is a one-off effect: it doesn’t set an indent for all future lines, it merely writes the colons so that the next call to ksdwrt appends its text after them.
  • ksdwrt writes tst to the trace file if dest = 1, to the alert log if dest = 2 and to both if dest = 3, adding a new-line at the end of the text.

Here’s a fragment of code calling the procedures to write something to my trace file:


execute dbms_log.ksdddt
execute dbms_log.ksdwrt(1,'Starting')
execute dbms_log.ksdddt
execute dbms_log.ksdwrt(1,'Underlining')
execute dbms_log.ksdind(20)
execute dbms_log.ksdwrt(1,'Indented')
execute dbms_log.ksdwrt(1,'Not Indented')
execute dbms_log.ksdind(30)
execute dbms_log.ksdddt
execute dbms_log.ksdwrt(1,'Finished')
execute dbms_log.ksdind(30)
execute dbms_log.ksdfls


Here’s the text that appears in the trace files:


Starting

*** 2018-10-04T16:31:15.525515+01:00 (ORCL(3))
Underlining
::::::::::::::::::::Indented
Not Indented
::::::::::::::::::::::::::::::
*** 2018-10-04T16:31:15.532881+01:00 (ORCL(3))
Finished
::::::::::::::::::::::::::::::

Note how the call to ksdddt in line 1 of code didn’t write a date into the trace file because it wasn’t yet open. The call to ksdwrt in line 2 writes ‘Starting’ and moves to a new line so we get a blank line when the call to ksdddt in line 3 moves to a new line and writes the date. At line 5 we “indent 20”, so the ksdwrt at line 6 starts after the string of colons, then moves to a new line where the indent is not repeated. We indent again at line 8, which leaves us at the end of a line, so when we call ksdddt it moves to the start of the next line and writes the date there – we don’t get a blank line.

Footnote: when I saw Cary Millsap’s note I assumed that the procedures had been copied across in a recent version of Oracle; in fact dbms_log has been around since at least 11.2.0.4

Footnote 2: While checking my library for references to dbms_system I came across a script I’d used to create a local version of dbms_system that allowed me to execute a call to “dbms_system.set_bool_param_in_session(‘#_SILVER_BULLET’, true)”. I used it at the IOUG conference in 2006 to demonstrate that if you set this “very hidden” parameter to true then some of your queries could run faster. (What the call actually did was enable query rewrite and disable function-based indexes so that a special time-wasting index I’d created couldn’t get used.)

 

October 10, 2018

Hybrid Fake

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 3:12 pm GMT Oct 10,2018

Oracle 12c introduced the “Hybrid” histogram – a nice addition to the available options and one that (ignoring the bug for which a patch has been created) supplies the optimizer with better information about the data than the equivalent height-balanced histogram. There is still a problem, though, in the trade-off between accuracy and speed: just as it does with height-balanced histograms, when using auto_sample_size Oracle samples (typically) about 5,500 rows to create a hybrid histogram, and the SQL it uses to generate the necessary summary is essentially an aggregation of the sample – so either you have a small sample with the risk of lower accuracy or a large sample with an increase in workload. This being the case it’s worth knowing how to create a hybrid histogram using the dbms_stats.set_column_stats() API.

It’s fairly easy to identify the cases where a hybrid histogram could be helpful.  You have a large volume of data spread over a large number (more than 2048) of distinct values, but a few values (typically less than 250) which are responsible for a significant fraction of the data. You would like to tell Oracle about the special “extreme” cases so that the optimizer can take defensive action if you query for one of those values, but at the same time you would like to give Oracle a picture of the way the rest of the data is distributed. This is similar in some respects to the Top-N (a.k.a. Top-Frequency) histogram, which says to Oracle “We have a small number of popular values, and some odds and ends on the side that are pretty ignorable”; the critical difference is that you need the hybrid histogram when it’s not safe to “ignore” the odds and ends.

Here’s an example of creating some data and then generating a completely artificial hybrid histogram. The code demonstrates 3 points – the principal feature of creating hybrid histograms and a couple of generic details about Oracle’s histograms:

  • The main point is that Oracle 12c introduces a new numeric array in the dbms_stats.statrec structure. This allows each row (bucket) in a histogram to hold a second statistic about the bucket so we can now store a frequency figure for the bucket as a whole, and a “repeat-count” figure for the highest value in the bucket. (Warning – there is a counter-intuitive conflict between the name of the new structure and the way it is used for hybrid histograms).
  • As a side-point I’ve included a code variation that shows you the remarkable similarity between generating a Frequency histogram and a Hybrid histogram.
  • As a second side-point I have also highlighted the effect you see in the dba_tab_histograms view when your popular values are “too similar” to each other – i.e. when they match on the first 6 characters.

We start by creating a table as a copy of the view all_objects – then we’re going to create a hybrid histogram on the object_type column that looks nothing like the  data. The histogram will say:

  • for every 15,000 rows (where the column is not null)
    • 5,000 will have values less than or equal to ‘C’, of which 3,000 will have the value ‘C’
    • The next 2,000 (i.e. running total 7,000) will have values greater than ‘C’ and up to ‘PPPPPP1’, but ‘PPPPPP1’ itself is not a popular value
    • The next 2,000 (i.e. running total 9,000) will have values greater than ‘PPPPPP1’ and up to ‘PPPPPP2’, but ‘PPPPPP2’ itself is not a popular value
    • The next 2,000 (i.e. running total 11,000) will have values greater than ‘PPPPPP2’ and up to ‘PPPPPP3’, but ‘PPPPPP3’ itself is not a popular value
    • The last 4,000 (i.e. running total 15,000) will have values greater than ‘PPPPPP3’ and up to ‘X’ of which 3,000 will have the value ‘X’

Note particularly that the “how many rows hold the endpoint value” figures are stored in the statrec.bkvals array – just as they would be for a frequency histogram – and the cumulative count of rows is stored in the statrec.rpcnts structure. All we have to do to create a frequency histogram instead of a hybrid histogram is to store zeros in the statrec.rpcnts structure, or leave it uninitialized.

You’ll notice that since I’m creating a histogram on a character column I’ve used an array of type dbms_stats.chararray to hold the list of values (in ascending order) that I want the histogram to describe.


rem
rem     Script:         12c_hybrid_histogram_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2018
rem 

create table t1
as
select * from all_objects
;

begin
        dbms_stats.gather_table_stats(
                ownname         => null,
                tabname         => 't1',
                method_opt      => 'for all columns size 1'
        );
end;
/

declare
                c_array         dbms_stats.chararray;
                m_rec           dbms_stats.statrec;
                m_distcnt       number;
                m_density       number;
                m_nullcnt       number;
                m_avgclen       number;

begin
        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 'T1',
                colname         => 'OBJECT_TYPE', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => m_rec,
                avgclen         => m_avgclen
        );

        m_rec.epc    := 5;

        c_array      := dbms_stats.chararray( 'C',  'PPPPPP1',  'PPPPPP2',  'PPPPPP3',   'X');
        m_rec.bkvals := dbms_stats.numarray (3000,          1,          1,          1,  3000);

        m_rec.rpcnts := dbms_stats.numarray (5000,       7000,       9000,      11000, 15000);
--      m_rec.rpcnts := dbms_stats.numarray (0000,       0000,       0000,       0000, 00000);

        dbms_stats.prepare_column_values(m_rec, c_array);

        dbms_stats.set_column_stats(
                ownname         => user,
                tabname         => 'T1',
                colname         => 'OBJECT_TYPE', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => m_rec,
                avgclen         => m_avgclen
        ); 
end;
/

That’s it – it’s remarkably simple. To show the effect of running this code I can report the content of user_tab_histograms for the column. I’ve actually run the code and queried the results twice; first for the case where I created the hybrid histogram and then after modifying the PL/SQL block to set the rpcnts array to zeros to create a frequency histogram.


column endpoint_actual_value format a22
column endpoint_value        format 999,999,999,999,999,999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value, endpoint_actual_value, endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'OBJECT_TYPE'
order by
        endpoint_value
;

With non-zero rpcnts (hybrid histogram)
=======================================
ENDPOINT_NUMBER                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
--------------- ------------------------------------------------ ---------------------- ---------------------
           3000  347,883,889,521,833,000,000,000,000,000,000,000 C                                       3000
           7000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP1                                    1
           9000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP2                                    1
          11000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP3                                    1
          15000  456,922,123,551,065,000,000,000,000,000,000,000 X                                       3000


With rpcnts set to zero (frequency histogram)
=============================================
ENDPOINT_NUMBER                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
--------------- ------------------------------------------------ ---------------------- ---------------------
           3000  347,883,889,521,833,000,000,000,000,000,000,000 C                                          0
           3001  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP1                                    0
           3002  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP2                                    0
           3003  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP3                                    0
           6003  456,922,123,551,065,000,000,000,000,000,000,000 X                                          0

I made a comment earlier on that the naming and use of the rpcnts structure was somewhat counter-intuitive. As you can see in the results above, when I created the hybrid histogram the values I stored in the rpcnts structure are not the values reported as the “repeat count”; the numbers reported as the “repeat count” come from the bkvals (bucket values) array.  As far as I’m concerned this means I have to go back to my basic examples every time I want to fake a histogram because I’m never too sure which arrays I should populate with what values – and whether I should use absolute or cumulative values.

One last minor point: you’ll see that the endpoint_actual_value has been populated in this example. This is because (with Oracle’s interesting transformation from character to numeric) the three ‘PPPPPPx’ character values turn into the same number – so Oracle stores the first 64 bytes (or 32 for versions of Oracle prior to 12c) of the actual value.

 

Eye, Eye, Cap’n

Filed under: Non-technical — Jonathan Lewis @ 2:00 pm GMT Oct 10,2018

By the time you read this I will have had the lenses in both my eyes replaced, so I won’t be staring at a computer screen for a while – and that means, in particular, I won’t be doing any further investigation into join cardinality for a while.

For those who might otherwise feel deprived of my exquisite prose I have, however, prepared a couple of lightweight articles for automatic launching over the next few days. But please don’t expect any prompt follow-ups if you add comments or send email over the next couple of weeks.

October 9, 2018

Join Cardinality – 3

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:01 pm GMT Oct 9,2018

In the previous posting I listed the order of precision of histograms as:

  • Frequency
  • Top-Frequency
  • Hybrid
  • Height-balanced
  • None

Having covered the Frequency/Frequency join (for a single column, no nulls, equijoin) in the previous posting I’ve decided to work down the list and address Frequency/Top-Frequency in this posting. It gets a little harder to generate data as we move to the less precise histograms since we need to have skew, we want some gaps, and (for Top-Frequency) we need to have some data that can be “ignored”. On the plus side, though, I want to work with a small number of buckets to keep the output of any queries I run fairly short, which means the “small” volume of “ignorable” data (the “spare” bucket) can be relatively large. Here’s the code I used to generate data for my investigation – 100 rows for the table with a frequency histogram and 800 rows for the table with a top-frequency histogram.


rem
rem     Script:         freq_hist_join_05.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 16'
        );
end;
/

In this example I’ve used the sqrt() function and the dbms_random.normal() function to generate the data. The scaling and truncating I’ve done on the results has given me two sets of data which have a nice skew, some gaps, but different patterns (though both have a small number of small values and a larger number of larger values). The data from dbms_random.normal() will produce 22 distinct values, so I’ve requested a histogram with 16 buckets and checked that this will produce a Top-Frequency histogram. (If I want a Hybrid histogram – for the next thrilling installment in the series – I’ll just reduce the number of buckets slightly).

Here are the resulting stats, preceded by the code that reported them:


select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by table_name
;

select  table_name, num_rows
from    user_tables
where   table_name in ('T1','T2')
order by table_name
;

break on table_name skip 1 on report skip 1

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   TOP-FREQUENCY             22          16    .000625

TABLE_NAME             NUM_ROWS
-------------------- ----------
T1                          100
T2                          800

TABLE_NAME                VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
-------------------- ---------- ------------------- ---------------
T1                            2                   5               5
                              5                  15              20
                              7                  15              35
                             10                  17              52
                             12                  13              65
                             15                  13              78
                             17                  11              89
                             20                   7              96
                             22                   3              99
                             25                   1             100

T2                            1                   1               1
                             13                  14              15
                             15                  11              26
                             16                  22              48
                             17                  34              82
                             18                  31             113
                             19                  36             149
                             20                  57             206
                             21                  44             250
                             22                  45             295
                             23                  72             367
                             24                  70             437
                             25                  87             524
                             26                 109             633
                             27                  96             729
                             28                  41             770

Table t1 reports 100 rows, 10 distinct values and a Frequency histogram with 10 buckets.
Table t2 reports 800 rows, 22 distinct values and a Top-Frequency histogram with 16 buckets.

Things we notice from the histograms are: t1 has a range from 2 to 25, while t2 has a range from 1 to 28. We also notice that the highest endpoint_number for t2 is only 770 out of a possible 800 – we’ve “lost” 30 rows. We don’t really care what they are for the purposes of the arithmetic, but if we did a quick “select j2, count(*)” query we’d see that we had lost the following:


SQL> select j2, count(*) from t2 group by j2 order by count(*), j2;

	J2   COUNT(*)
---------- ----------
	 1	    1
	 9	    1  *
	 8	    3  *
	11	    4  *
	10	    5  *
	12	    8  *
	14	    9  *
	15	   11
...

The reason why the total number of rows accounted for is less than the total number of rows in the table comes in two parts. The Top-Frequency histogram is designed to hold the Top N most popular entries in the table, so there will be some entries that don’t make an appearance in the histogram despite contributing rows to the total table count; the number of “lost” rows can then be increased because the Top N popular values may not include the column low and high values, and these two values must appear in the histogram. Looking at the output above we can see that we could have reported 14 as the 16th most popular value; instead we have to record 1 (the column low value), losing a further 9 rows and regaining 1.

Let’s test the pure join query on the two tables to see what the optimizer is predicting as the join cardinality, and then try to re-create that cardinality from the histogram data:


alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*) 
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1608 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1355K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Our target is to work out how we can query the histogram data in a way that gets the result 1,608. Ideally we’ll also think of a rationale for justifying our method, and then we’ll apply the same method with 15 buckets and 17 buckets, and with a couple of variations to the data (e.g. update all rows where j1 = 25 to set j1 = 28), to see if the method still gets the right result.

All we did with the frequency/frequency join was to join the two histograms on matching values, multiply the frequencies on each resulting row, then sum down the set – and this automatically eliminated rows which were outside the “highest low” and “lowest high” (i.e. we only examined rows where the histograms overlapped). We might hope that things shouldn’t be too different when one of the histograms is a top-frequency histogram.

There is an important difference, though, between frequency and top-frequency histograms – in the latter case there are values in the table which will not be in the histogram, so we ought to make some allowance for these (even though it’s only “one bucket’s worth”). It’s possible that some of these values might match values in the frequency histogram so we need to include a mechanism for adding in a factor to allow for them. So as a first step let’s work out the “average number of rows per value” for the missing values.

We have 22 distinct values and 16 end points so there are 6 missing values. We have 800 rows in the table but only 770 rows reported in the histogram so there are 30 missing rows. So let’s say the missing values have an average cardinality of 30/6 = 5 (and we might extend that to say they have an average selectivity of 5/800 = 0.00625).
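
Those figures can be derived directly from the data dictionary; here’s a minimal sketch (joining user_tables, user_tab_cols and an aggregate of user_tab_histograms) that should report 5 and 0.00625 for this example:


select
        t.num_rows,
        c.num_distinct,
        c.num_buckets,
        h.rows_in_histogram,
        (t.num_rows - h.rows_in_histogram) /
                (c.num_distinct - c.num_buckets)                avg_missing_rows,
        (t.num_rows - h.rows_in_histogram) /
                (c.num_distinct - c.num_buckets) / t.num_rows   avg_missing_sel
from
        user_tables     t,
        user_tab_cols   c,
        (
                select  max(endpoint_number)    rows_in_histogram
                from    user_tab_histograms
                where   table_name  = 'T2'
                and     column_name = 'J2'
        )               h
where
        t.table_name  = 'T2'
and     c.table_name  = 'T2'
and     c.column_name = 'J2'
;
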

Let’s bring that value into the query we wrote for the frequency/frequency case by using an outer join (which I’ll write as an “ANSI” full outer join) with a predicate in place that restricts the result to just the overlapping range, which is [2,25], the “higher low value” and “lower high value” across the two histograms. Here’s some code – with an odd little detail included:


column product format 999,999,999.99
compute sum of product on report

compute sum of t1_count on report
compute sum of t1_value on report
compute sum of t2_count on report
compute sum of t2_value on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,0.00) t1_count, 
        nvl(f2.row_or_bucket_count,800*0.00625) t2_count,
        nvl(f1.row_or_bucket_count,0.00) * 
        nvl(f2.row_or_bucket_count,800*0.006250) product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;

I’ve included an nvl() on the counts from the top-frequency histogram that converts nulls (i.e. the preserved rows derived from the frequency histogram) into the average frequency we’ve just calculated, using the “num_rows * selectivity” representation. The odd little detail that I commented on above does something similar for the preserved rows derived from the top-frequency histogram, because this first guess at the calculation was wrong and needed an adjustment which I’m anticipating. Here are the results I got with this code:

  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13          0         14             .00
        15         15         13         11          143.00
                   16          0         22             .00
        17         17         11         34          374.00
                   18          0         31             .00
                   19          0         36             .00
        20         20          7         57          399.00
                   21          0         44             .00
        22         22          3         45          135.00
                   23          0         72             .00
                   24          0         70             .00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233        100        548        1,463.00

The figure is too low, so there has to be an adjustment. What if we need to allow for the “maybe there are other values” algorithm that the optimizer uses with frequency histograms? If you’ve gathered a frequency histogram on a column but query it with a value that isn’t in the histogram then Oracle applies an algorithm that says, in effect: “if you’re asking for something that isn’t in the histogram I’ll assume that there must be some data there and use a frequency that’s half the lowest frequency I have recorded”**Important footnote. The value 25 appears once in our histogram, so let’s include a fudge-factor of 0.5 (i.e. half a row) in the nvl() expression for the t1 frequencies and see what happens.
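
The only change is to the nvl() defaults on the t1 side of the select list – a sketch of the affected lines, with the rest of the query exactly as before:


        nvl(f1.row_or_bucket_count,0.5) t1_count,
        nvl(f2.row_or_bucket_count,800*0.00625) t2_count,
        nvl(f1.row_or_bucket_count,0.5) *
        nvl(f2.row_or_bucket_count,800*0.006250) product

This is what the new results look like: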


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13         .5         14            7.00
        15         15         13         11          143.00
                   16         .5         22           11.00
        17         17         11         34          374.00
                   18         .5         31           15.50
                   19         .5         36           18.00
        20         20          7         57          399.00
                   21         .5         44           22.00
        22         22          3         45          135.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233      103.5        548        1,607.50

Since we were looking for 1,608 I’m going to call that a success. I can check precision, of course, by looking at the 10053 trace file. Extracting a few critical lines:

egrep -e"Density" -e"Join Card" orcl12c_ora_6520_BASELINE.trc

    AvgLen: 3 NDV: 22 Nulls: 0 Density: 0.006250 Min: 1.000000 Max: 28.000000
    AvgLen: 3 NDV: 10 Nulls: 0 Density: 0.005000 Min: 2.000000 Max: 25.000000

Join Card:  1607.500000 = outer (100.000000) * inner (800.000000) * sel (0.020094)

The “Density” lines come from the column statistics – note the 0.00625 that matches the “average selectivity” I derived from the top-frequency figures. You might also note that the “half the least frequent value” could be derived from the t1.j1 density (0.005) * t1.num_rows (100).

The “Join Card” line is exactly what it says – the join cardinality calculation showing that the plan’s prediction of 1,608 rows was actually a rounded 1607.5

There is one more important thing to check before I start tweaking the data to see if there are any other factors involved. Is the 0.5 I stuck into the query really the value of “half the least common frequency” or is it a fixed value in all cases? A nice easy way of testing this is to update the t1 table to change one row from 22 to 25 (22 will still be present in the table and histogram before and after this test, so it’s a minimal and safe change). Making this change and re-running the calculation query leaving the 0.5 unchanged gives the following:


update t1 set j1 = 25 where j1 = 22 and rownum = 1;

...

                   21         .5         44           22.00
        22         22          2         45           90.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          2         87          174.00
                      ---------- ---------- ---------------
sum                        103.5        548        1,649.50

Without reporting all the details:

  • the estimate in the plan went up from 1,608 to 1,794
  • leaving 0.5 in the query the derived result was 1,649.5 (last few lines of output above)
  • changing the 0.5 to 1.0 the derived result was 1,794.0

Conclusion – the “fudge factor” is consistent with the model the optimizer uses with frequency histogram calculations. The optimizer models “missing” rows in the join calculation as “half the number of rows for the least frequently occurring value”**Important footnote

Filter Predicates:

After a dozen tests varying the number of buckets in the top-frequency histogram (and checking it really was still a top-frequency histogram), and tweaking the t1 (frequency histogram) data to use values on the boundaries of, or outside, the range of the t2 (top-frequency) data, I concluded that my approach was probably correct. Outer join the two histograms, restrict to the overlap, supply the “num_rows * density” figure on the top-frequency side, and “half the lowest frequency”**Important footnote on the frequency side, and the query produces the same result as the optimizer for the pure join cardinality.

So the next step is to check what happens when you add filter predicates on one, or both, sides. I listed a fragment of code earlier on to execute the pure join and count the number of rows it produced, enabling the 10053 trace and pulling the actual plan from memory at the same time. I repeated this code with 3 variations and checked the “Join Card” lines from the resulting trace files:


select count(*) from  t1, t2 where  t1.j1 = t2.j2
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2
select count(*) from  t1, t2 where  t1.j1 = t2.j2                and t2.n30 = 25
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2 and t2.n30 = 25

egrep -e"Join Card" orcl12c_ora_10447*.trc

orcl12c_ora_10447_BASELINE.trc:Join Card:  1607.500000 = outer (800.000000) * inner (100.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ1.trc:Join Card:  401.875000 = outer (800.000000) * inner (25.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ2.trc:Join Card:  53.583333 = outer (100.000000) * inner (26.666667) * sel (0.020094)
orcl12c_ora_10447_FILTJ1J2.trc:Join Card:  13.395833 = outer (26.666667) * inner (25.000000) * sel (0.020094)

As you can see in all 4 cases, Oracle reports an inner and outer cardinality estimate and a join selectivity. The join selectivity remains unchanged throughout; it’s the value we can derive from our pure join test (0.020094 = 1607.5 / (100 * 800)). All that changes is that the individual table predicates are applied to the base tables before the join selectivity is applied to the product of the filtered base table cardinalities:

  • Column n04 has 4 distinct values in 100 rows – filter cardinality = 100/4 = 25
  • Column n30 has 30 distinct values in 800 rows – filter cardinality = 800/30 = 26.66666…
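
As a quick sanity check on those numbers, here’s a sketch using the rounded join selectivity reported in the trace file (so the results are very slightly off the trace figures):


select
        round(100/4 * 800    * 0.020094, 3)     card_filter_n04,
        round(100   * 800/30 * 0.020094, 3)     card_filter_n30,
        round(100/4 * 800/30 * 0.020094, 3)     card_filter_both
from
        dual
;

This should report values very close to the 401.875, 53.583333 and 13.395833 shown in the trace files.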

Conclusion

For a single column equijoin on columns with no nulls where one column has a frequency histogram and the other has a top-frequency histogram the optimizer calculates the “pure” join cardinality using the overlapping range of column values and two approximating frequencies, then derives the filtered cardinality by applying the base table filters, calculates the cardinality of the cartesian join of the filtered data sets, then multiplies by the pure join selectivity.

 

 

**Important Footnote  Until Chinar Aliyev questioned what I had written, I had never noticed that the “half the lowest frequency” that I describe at various points in the arithmetic was anything other than a fixed fudge factor. In fact, in perfect symmetry with the expression used for the average selectivity in the top-frequency part of the calculation, this “fudge factor” is simply “num_rows * column_density” for the column with the frequency histogram. (Whether the “half the lowest frequency” drops out as a side effect of the density calculation, or whether the column density is derived from half the lowest frequency is another matter.)

October 8, 2018

Random Upgrade

Filed under: 18c,Oracle,Upgrades — Jonathan Lewis @ 1:36 pm GMT Oct 8,2018

Here’s a problem that (probably) won’t affect the day to day running of most systems – but it could be a pain in the backside for people who write programs to generate repeatable test data. I’m not going to say much about the problem, just leave you with a test script.


rem
rem	Script	random_upgrade.sql
rem	Author:	Jonathan Lewis
rem	Dated:	Oct 2018
rem
rem	Last tested
rem		18.3.0.0
rem		12.2.0.1
rem	Notes
rem	In the upgrade from 12.2.0.1 something
rem	changed that meant
rem		create as select dbms_random
rem	gets different data from
rem		select dbms_random
rem

drop table t4 purge;
drop table t3 purge;
drop table t2 purge;
drop table t1 purge;
drop table t0 purge;

set feedback off

create table t0 as
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
;


execute dbms_random.seed(0);

create table t1
as
select dbms_random.normal
from
	t0
;

execute dbms_random.seed(0);

create table t2
as
with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	dbms_random.normal
from
	g1
;

prompt	=================
prompt	Diff the two CTAS
prompt	=================

select count(*)
from (
select * from t1
minus
select * from t2
union all
select * from t2
minus
select * from t1
)
;


create table t3 
as 
select * from t2 
where rownum < 1 -- > comment to avoid WordPress format issue
;

create table t4 
as 
select * from t2 
where rownum < 1 -- > comment to avoid WordPress format issue
;

execute dbms_random.seed(0)

insert into t3
select dbms_random.normal
from
	t0
;

execute dbms_random.seed(0)

insert into t4
with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select
	dbms_random.normal
from
	g1
;


prompt	===================
prompt	Diff the two Insert
prompt	===================

select count(*)
from (
select * from t3
minus
select * from t4
union all
select * from t4
minus
select * from t3
)
;


prompt	===========
prompt	Sum of CTAS
prompt	===========

select sum(normal) from t1;

prompt	=============
prompt	Sum of Insert
prompt	=============

select sum(normal) from t3;


execute dbms_random.seed(0)

prompt	=============
prompt	Sum of select
prompt	=============

with g1 as (
	select rownum id
	from dual
	connect by
		level <= 1e4 -- > comment to avoid WordPress format issue
)
select sum(n) from (
select
	dbms_random.normal n
from
	g1
)
;


I’m repeatedly using dbms_random.seed(0) to reset the random number generator and trying to generate 10,000 normally distributed numbers. (I’ve chosen the normal distribution because that happened to be the function in a script I sent someone with the comment that “this will recreate the data for the demonstration” – and they wrote back to say that it didn’t.)

I’ve got two “create as select”, and two “insert as select”. One of each pair selects from a real existing table to get 10,000 rows, the other uses the “select dual connect by” trick to generate rows. I’ve written SQL that shows whether or not the two pairs of tables end up with the same data (they do, pairwise), then I’ve summed one table from each pair to see if the different mechanisms produce the same data – and that depends on the version of Oracle you’re using. Finally I’ve reset the random number generator and summed across a pure select to see what that produces.

If you run this code on 12.2.0.1 or earlier you’ll see that the “diffs” report zeros and the “sums” report -160.39249. If you upgrade to 18.3 the diffs will still report zeros and some of the sums will still report -160.39249 but the sum of the CTAS will report -91.352172.

Bottom Line

If you’ve got code that you wrote to create reproducible test cases and the code uses: “create table … as select … dbms_random …” then it won’t produce the same data when you upgrade to 18.3. You’ll have to modify the code to create the (empty) table first and then populate it with an “insert as select …”.
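
In other words the change looks something like this minimal sketch (based on the t0 table from the script above):


rem     Old form - no longer reproducible across the upgrade:
rem             execute dbms_random.seed(0)
rem             create table t1 as select dbms_random.normal from t0;

create table t1 (normal number);

execute dbms_random.seed(0)

insert into t1
select  dbms_random.normal
from    t0
;

commit;
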

As of this afternoon I have 1,209 test scripts on my laptop that use the dbms_random package to model data distribution patterns. It is almost certain that I will end up modifying every single one of them eventually.

There are words to express how I feel about this – but not ones that I would consider publishing.
