Oracle Scratchpad

December 5, 2022

Row Migration

Filed under: 19c,fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 12:11 pm GMT Dec 5,2022

This is nothing more than a basic update of a note that I wrote 8 years ago. The update was triggered by a brief comment made by Martin Widlake at the recent UKOUG annual conference “Breakthrough 2022” in Birmingham. In his presentation on “wide tables” he mentioned row migration, the possible effects of a row having to migrate many times as it grew, and the possibility (of which he was not certain, since he had only a vague memory of hearing the claim at some unspecified time in the past) that it might leave a “long chain of pointers” from the header to the final location of the migrated row.

It occurred to me that the vague memory might have been due to my blog note from 2014 explaining that this doesn’t happen. If a row migrates (i.e. the whole row gets moved to a new location leaving only a header behind pointing to the new location) then at a future point in time it might migrate to a 3rd (or 4th or more) location and update the header pointer, or it might actually migrate back to the original location if space has since become available.
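
(As an aside, if you want to check whether a table has suffered row migration the traditional approach still works. This isn't part of the test script, and it assumes you've created the CHAINED_ROWS table by running $ORACLE_HOME/rdbms/admin/utlchain.sql; note also that the analyze command doesn't distinguish between migrated rows and rows that are simply too long to fit in a single block.)

analyze table t1 list chained rows into chained_rows;

select  count(*)        migrated_or_chained
from    chained_rows
where   table_name = 'T1'
;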

The following script (originally created on 10gR2, but updated for 19.11 and modified for ease of retesting) builds a table, performs a series of updates on a row, and dumps the header block after each update.

rem
rem     Script:         row_migration.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2014 / Dec 2022
rem
rem     Last tested 
rem             10.2.0.5
rem             19.11.0.0
rem

create table t1 (
        id      number(6,0),
        v1      varchar2(1200)
)
pctfree 0
;


insert  into t1 
select  rownum - 1, rpad('x',100)
from    all_objects
where   rownum <= 75
;

commit;

prompt  =========================================
prompt  Get the relative file and block number of 
prompt  the block that holds 74 of the 75 rows, 
prompt  then dump the block to the trace file.
prompt  =========================================

column rel_file_no new_value m_file_no
column block_no    new_value m_block_no

select 
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no, 
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from 
        t1
group by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid) 
having
        count(*) = 74
;

alter system flush buffer_cache;
alter session set tracefile_identifier= 'D1_start';
alter system dump datafile &m_file_no block &m_block_no;

prompt  ======================================
prompt  Make the first row migrate and show it
prompt  ======================================

update t1 set v1 = rpad('x',400) where id = 0;
commit;

alter system flush buffer_cache;
alter session set tracefile_identifier= 'D2_migrate';
alter system dump datafile &m_file_no block &m_block_no;


prompt  ======================================
prompt  Fill the block the long row is now in,
prompt  then make it migrate again
prompt  ======================================

insert  into t1 
select  rownum + 75, rpad('x',100) 
from    all_objects 
where   rownum <= 75
;
commit;

update t1 set v1 = rpad('x',800) where id = 0;
commit;

alter system flush buffer_cache;
alter session set tracefile_identifier= 'D3_migrate_more';
alter system dump datafile &m_file_no block &m_block_no;

prompt  =======================================================
prompt  Fill the block the long row is in and shrink the row 
prompt  to see if it returns to its original block. (No).
prompt  =======================================================

insert  into t1 
select  rownum + 150, rpad('x',100) 
from    all_objects 
where   rownum <= 75
;
commit;

-- delete from t1 where id between 1 and 20;
-- commit;

update t1 set v1 = rpad('x',50) where id = 0;
commit;

alter system flush buffer_cache;
alter session set tracefile_identifier= 'D4_shrink_row';
alter system dump datafile &m_file_no block &m_block_no;


prompt  ==============================================
prompt  Make a lot of space in the original block then
prompt  GROW the row again to see if it migrates back.
prompt  ==============================================

delete from t1 where id between 1 and 20;
commit;

update t1 set v1 = rpad('x',1200) where id = 0;
commit;

alter system flush buffer_cache;
alter session set tracefile_identifier= 'D5_forcemigrate';
alter system dump datafile &m_file_no block &m_block_no;

When the script has run there will be 5 trace files, and a single “grep” command to find the entry for the first row of the block (row 0) in each dump will give you results like the following:

[oracle@linux19c trace]$ grep -A+3 "row 0" *19012*.trc

or19_ora_19012_D1_start.trc:tab 0, row 0, @0xab
or19_ora_19012_D1_start.trc-tl: 106 fb: --H-FL-- lb: 0x1  cc: 2
or19_ora_19012_D1_start.trc-col  0: [ 1]  80
or19_ora_19012_D1_start.trc-col  1: [100]
--
or19_ora_19012_D2_migrate.trc:tab 0, row 0, @0xab
or19_ora_19012_D2_migrate.trc-tl: 9 fb: --H----- lb: 0x2  cc: 0
or19_ora_19012_D2_migrate.trc-nrid:  0x090000ac.1
or19_ora_19012_D2_migrate.trc-tab 0, row 1, @0x115
--
or19_ora_19012_D3_migrate_more.trc:tab 0, row 0, @0xab
or19_ora_19012_D3_migrate_more.trc-tl: 9 fb: --H----- lb: 0x1  cc: 0
or19_ora_19012_D3_migrate_more.trc-nrid:  0x090000b0.7
or19_ora_19012_D3_migrate_more.trc-tab 0, row 1, @0x115
--
or19_ora_19012_D4_shrink_row.trc:tab 0, row 0, @0xab
or19_ora_19012_D4_shrink_row.trc-tl: 9 fb: --H----- lb: 0x2  cc: 0
or19_ora_19012_D4_shrink_row.trc-nrid:  0x090000b0.7
or19_ora_19012_D4_shrink_row.trc-tab 0, row 1, @0x115
--
or19_ora_19012_D5_forcemigrate.trc:tab 0, row 0, @0x4b9
or19_ora_19012_D5_forcemigrate.trc-tl: 1208 fb: --H-FL-- lb: 0x2  cc: 2
or19_ora_19012_D5_forcemigrate.trc-col  0: [ 1]  80
or19_ora_19012_D5_forcemigrate.trc-col  1: [1200]

  • The D1 trace shows you the row with a column count (cc) of 2, and the two column lengths.
  • The D2 trace shows you a column count of zero, and a nrid (next rowid) pointing to row 1 (2nd row) of block 0x090000ac.
  • The D3 trace shows you a column count of zero, and a nrid pointing to row 7 (eighth row) of block 0x090000b0; the row has moved to a new location and the header is pointing directly to the new location.
  • The D4 trace shows exactly the same output – after shrinking (even to a length that is less than it started at) the row has not moved back to the original location.
  • The D5 trace shows that the row has now moved back to its original location, even though it is now several hundred bytes longer than it used to be.

If you’re wondering why the row didn’t move back after shrinking at D4 (and even when I made a lot of space available in the original block the shrink didn’t cause a move), remember that Oracle tries to be “lazy” – the update can take place in situ, so Oracle doesn’t waste time and effort checking the original block.

Footnote

This note makes no claims about what might happen in a more complex case where a row is so long that it splits into multiple row pieces and the pieces end up scattering across multiple blocks. There are a couple of variations on that problem that might be worth investigating if you suspect that there is some behaviour of very wide tables or very long rows that is the source of a performance problem relating to excessive buffer gets or db file sequential reads.
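
If you want a quick (if crude) indicator that migrated or chained rows might be part of such a problem, the statistic to watch is “table fetch continued row” – e.g. for your own session:

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name = 'table fetch continued row'
;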

September 2, 2022

Shrinking indexes

Filed under: fragmentation,Index Rebuilds,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 7:21 pm BST Sep 2,2022

If you want to do something about “wasted” space in an index what are the differences that you need to consider between the following three options (for the purposes of the article I’m ignoring “rebuild” and “rebuild online”):

alter index xxx coalesce;

alter index xxx shrink space compact;

alter index xxx shrink space;

Looking at the notes in a script I wrote a “few” years ago it seems that I haven’t compared the coalesce option with the shrink space options since 10.2.0.3, and I suspect things may have changed since then, so I’ve discarded the results that I had recorded (in 2012) and started again with 19.11.0.0.

Background

I’ve been looking at the “deferred global index maintenance” in the last couple of weeks which is why I was toying with the idea of writing something about shrinking indexes and how it differs from coalescing them when an Oracle Forum question (needs MOS account) produced the (slightly surprising) suggestion to use coalesce – so I decided it was time to (re-)test, write and publish.

RTFM

First a few bullet points from the 19c SQL reference manual under “alter index”, or following the links from there to the “shrink clause”, or the database administration reference:

  • Specify COALESCE to instruct Oracle Database to merge the contents of index blocks where possible to free blocks for reuse.
  • Use this [shrink] clause to compact the index segments. Specifying ALTER INDEX … SHRINK SPACE COMPACT is equivalent to specifying ALTER INDEX … COALESCE.
    • If you specify COMPACT, then Oracle Database only defragments the segment space … The database does not readjust the high water mark and does not release the space immediately.
  • Can’t shrink space for bitmap join indexes or function-based indexes.
  • Segment shrink is an online, in-place operation. DML operations and queries can be issued during the data movement phase of segment shrink. Concurrent DML operations are blocked for a short time at the end of the shrink operation when the space is deallocated.
  • Shrink operations can be performed only on segments in locally managed tablespaces with automatic segment space management (ASSM).
  • As with other DDL operations, segment shrink causes subsequent SQL statements to be reparsed because of invalidation of cursors unless you specify the COMPACT clause.

As with many little features of Oracle it’s quite hard to pick up a complete and cohesive statement of what something does and what impact it might have. Some of the bullet points above are generic about shrinking segments, and may not be totally accurate for shrinking only an index – will it invalidate cursors, or does that happen only when you shrink a table used by the cursor, or only when you shrink an index that’s used by the cursor?

If you do read through the links you also notice that I’ve omitted several points from the generic shrink details that are not relevant for indexes (for example the requirement to enable row movement), and have only mentioned the restrictions which are explicitly referenced in the “shrink clause” for indexes.
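
One practical consequence of the ASSM restriction – before trying a shrink it’s worth a quick check of the tablespace settings; a simple query (not from the manual) would be:

select
        tablespace_name, extent_management,
        allocation_type, segment_space_management
from
        user_tablespaces
;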

What do we need to know?

Some of the fairly typical bits of information we might need to know about a “house-keeping” task like coalesce/shrink are:

  • How much work does it do, and of what type?
  • What exactly is the benefit we might get for the work done?
  • What side-effects do we have to consider (locking, cursor invalidation etc.)?
  • What side effects might show up if the process fails in mid-stream?

In the case of coalesce/shrink for indexes, a few specific questions would be:

  • Is “shrink space compact” really equivalent to “coalesce”?
  • Are the operations “online” or only “nearly online”?
  • If shrink/coalesce is moving index entries around and moving index blocks around, what happens if a session wants to insert an index entry into a leaf block that’s currently being “transferred” into another leaf block?
  • If it’s a big index that needs several minutes (or more) to shrink/coalesce, could ongoing transactions cause index leaf block splits that produce unexpected effects when Oracle tries to drop the highwater mark?
  • How big an index, how long would the test have to take, what degree of concurrency, and how (un)lucky would you have to be, to hit a moment when something “strange” happened?

Finally – what tools would be helpful? Initially we might just look at:

  • session stats – to see what work we do
  • the dbms_space package – to check segment space usage pre and post.
  • the treedump event – to get a detailed picture of the index

Based on what we see we might feel the need to dig a little deeper with:

  • v$enqueue_stats
  • v$rollstat (rollback (undo) segment usage)
  • SQL tracing with wait states
  • Enqueue (lock) tracing
  • redo dumps
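
The snap_* packages that appear in the test harness below are just convenience wrappers around views like these; if you don’t have anything similar, a minimal stand-in for the session activity snapshot (using a hypothetical scratch table, and assuming you can query the v$ views) could be as simple as:

create table my_stats_before as
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
;

--      do the work, then report the deltas

select  sn.name, ms.value - b.value     delta
from    v$mystat ms, v$statname sn, my_stats_before b
where   sn.statistic# = ms.statistic#
and     b.name = sn.name
and     ms.value > b.value
order by
        delta desc
;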

The basic model

Here’s a little script to create a model that we can use for testing. Because of the stated requirement of the shrink space command I’ll just point out that the default tablespace should be using automatic segment space management (ASSM), my tablespace is also defined to use 1MB uniform extents:

rem
rem     Script:         shrink_coalesce.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2012
rem
rem     Last tested:
rem             19.11.0.0
rem 

execute dbms_random.seed(0)

create table t1 (
        v1      varchar2(7)
);

create index t1_i1 on t1(v1);

begin
        for i in 1..1e6 loop
                insert into t1(v1) values(
                        to_char(1e6 + trunc(dbms_random.value(0,100000)))
                );
        end loop;
end;
/

commit;

column ind_id new_value m_ind_id

select  object_id ind_id
from    user_objects
where   object_name = 'T1_I1'
;

alter session set tracefile_identifier = 'creation';
alter session set events 'immediate trace name treedump level &m_ind_id';
alter system flush buffer_cache;

pause Check the tree dump and pick a leaf block to dump

-- alter system dump datafile &&datafile block &&block_id;
alter system dump datafile 36 block 5429;


prompt  ========================
prompt  Deleting 4 rows out of 5
prompt  ========================

delete  from t1 
where   mod(v1,5) != 0
;

commit;

alter session set tracefile_identifier = 'deletion';
alter session set events 'immediate trace name treedump level &m_ind_id';
alter system flush buffer_cache;

-- pause Check the tree dump and pick a leaf block to dump
-- alter system dump datafile &&datafile block &&block_id;
alter system dump datafile 36 block 5429;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1',
                cascade          => true
        );
end;
/

select
        rows_per_block,
        count(*)        block_count
from    (
        select
                /*+
                        dynamic_sampling(0)
                        index_ffs(t1,t1_i1)
                        noparallel_index(t,t1_i1)
                */
                sys_op_lbid( &m_ind_id ,'L',t1.rowid)   as block_id,
                count(*)                                as rows_per_block
        from
                t1
        group by
                sys_op_lbid( &m_ind_id ,'L',t1.rowid)
        )
group by
        rows_per_block
order by
        rows_per_block
;

@@dbms_space_use_assm_embedded test_user t1_i1 index

Unusually (for me) I’ve created the data by inserting rows one at a time after creating the index. This is to avoid starting from a “perfect” index, i.e. one where the physical ordering of the leaf blocks is closely correlated with the logical ordering of the leaf blocks, and where the leaf blocks are very well packed.

With a single session inserting rows there will be a visible pattern to the choice that Oracle makes for “the next available free block” when it needs to do a leaf block split, but with the random value insertions there won’t be a pattern in “which block just split”, so when you walk the index in key order the steps from one leaf block to the next will jump fairly randomly around the segment.

The table starts at 1,000,000 rows, but ends up with about 200,000 after deletion and an index where roughly 80% of the rows in each leaf block have been deleted. So that we know what state the tests start from I’ve done a treedump of the index before and after the delete (and included a pause in the script to allow you to find an individual block to dump from the treedumps if you want to). Here are the first few lines of the two treedumps:

Before:
----- begin tree dump
branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 401, level: 1)
      leaf: 0x900016d 150995309 (-1: row:222.222 avs:3778)
      leaf: 0x900154e 151000398 (0: row:218.218 avs:3854)
      leaf: 0x9000abd 150997693 (1: row:219.219 avs:3835)
      leaf: 0x900153e 151000382 (2: row:209.209 avs:4025)
      leaf: 0x900058d 150996365 (3: row:230.230 avs:3626)
      leaf: 0x90013a8 150999976 (4: row:229.229 avs:3645)
      leaf: 0x9000ae1 150997729 (5: row:411.411 avs:187)
      leaf: 0x900031c 150995740 (6: row:227.227 avs:3683)
      leaf: 0x90014d3 151000275 (7: row:229.229 avs:3645)
      leaf: 0x9000aec 150997740 (8: row:226.226 avs:3702)
      leaf: 0x90014f3 151000307 (9: row:226.226 avs:3702)
      leaf: 0x9000593 150996371 (10: row:219.219 avs:3835)
      leaf: 0x9001559 151000409 (11: row:223.223 avs:3759)
      leaf: 0x9000a9d 150997661 (12: row:210.210 avs:4006)
      leaf: 0x900152e 151000366 (13: row:215.215 avs:3911)
      leaf: 0x900018a 150995338 (14: row:258.258 avs:3094)
...


After:
----- begin tree dump
branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 401, level: 1)
      leaf: 0x900016d 150995309 (-1: row:222.47 avs:3778)
      leaf: 0x900154e 151000398 (0: row:218.52 avs:3854)
      leaf: 0x9000abd 150997693 (1: row:219.44 avs:3835)
      leaf: 0x900153e 151000382 (2: row:209.43 avs:4025)
      leaf: 0x900058d 150996365 (3: row:230.44 avs:3626)
      leaf: 0x90013a8 150999976 (4: row:229.45 avs:3645)
      leaf: 0x9000ae1 150997729 (5: row:411.88 avs:187)
      leaf: 0x900031c 150995740 (6: row:227.50 avs:3683)
      leaf: 0x90014d3 151000275 (7: row:229.42 avs:3645)
      leaf: 0x9000aec 150997740 (8: row:226.46 avs:3702)
      leaf: 0x90014f3 151000307 (9: row:226.57 avs:3702)
      leaf: 0x9000593 150996371 (10: row:219.46 avs:3835)
      leaf: 0x9001559 151000409 (11: row:223.54 avs:3759)
      leaf: 0x9000a9d 150997661 (12: row:210.33 avs:4006)
      leaf: 0x900152e 151000366 (13: row:215.30 avs:3911)
      leaf: 0x900018a 150995338 (14: row:258.52 avs:3094)
...
      leaf: 0x900077f 150996863 (398: row:356.64 avs:1232)
      leaf: 0x9000d67 150998375 (399: row:327.62 avs:1783)
   branch: 0x9000e45 150998597 (0: nrow: 378, level: 1)
      leaf: 0x900047a 150996090 (-1: row:342.86 avs:1498)
      leaf: 0x9000d46 150998342 (0: row:357.60 avs:1213)
...
...
      leaf: 0x9000607 150996487 (492: row:369.80 avs:985)
      leaf: 0x9000c60 150998112 (493: row:395.70 avs:491)
   branch: 0x9000c68 150998120 (6: nrow: 503, level: 1)
      leaf: 0x90001b2 150995378 (-1: row:235.60 avs:3531)
      leaf: 0x9001323 150999843 (0: row:230.54 avs:3626)

The “before” section is just the first few of its 3,538 lines, and shows us that we have a root block with 8 branch blocks (numbered from -1 to +6); the first branch block holds 401 leaf blocks (numbered from -1 to 399), and the first leaf block starts with 222 index entries (in its row directory) of which, we learn from the “after” section, 47 (i.e. roughly 20%) are still “in use” after the delete. The “after” section adds in a few extra lines from the treedump, around branch block 0 and branch block 6.

In passing, if I were to execute a new transaction that inserted a new index entry into the first leaf block Oracle would tidy its directory and the start of the tree dump would look like the following:

 branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 401, level: 1)
      leaf: 0x900016d 150995309 (-1: row:48.48 avs:7084)
      leaf: 0x900154e 151000398 (0: row:218.52 avs:3854)

After the initial big insert many of the leaf blocks hold around 220 rows but we can see one leaf block of the initial 16 holding 411 rows. Allowing for the 9 blocks that aren’t leaf blocks we can calculate that we should see an average of approximately 1,000,000/3,529 = 283 rows per leaf block; the variation is the result of leaf block “50/50” splits. When a leaf block is full the next attempted insert causes Oracle to attach a new leaf block to the structure and share the existing entries fairly evenly between the two blocks (although there is one special case, the so-called “90/10” split that can happen when you insert a new high value into the highest value leaf block). The shares are not exactly equal because Oracle has to insert a new pointer in the parent branch block at the same time and may be able to reduce the size of this pointer by moving the split point some way from the “fair share” 50/50 point.

Of course, there’s also some variation in the content of the leaf blocks because they tend to start refilling shortly after they’ve split, so it can be quite instructive (once your system reaches “steady state”) to produce a “histogram” of leaf contents – which is what the last SQL statement in my setup script is about, with the following results:

ROWS_PER_BLOCK BLOCK_COUNT
-------------- -----------
            24           1
            26           1
            27           1
            28           5
            29           5
            30           7
            31          11
            32          11
            33          26
            34          23
            35          28
            36          28
            37          49
            38          47
            39          43
            40          49
            41          62
            42          73
            43          81
            44          92
            45          98
            46          91
            47         117
            48         104
            49         124
            50         124
            51         117
            52         114
            53         106
            54         123
            55         109
            56         104
            57          96
            58          84
            59          70
            60          95
            61          57
            62          73
            63          77
            64          74
            65          66
            66          56
            67          52
            68          54
            69          59
            70          44
            71          56
            72          49
            73          47
            74          51
            75          27
            76          34
            77          29
            78          27
            79          25
            80          27
            81          28
            82          26
            83          16
            84          23
            85          16
            86          18
            87          11
            88          19
            89          16
            90          10
            91          11
            92           5
            93           5
            94           2
            95           3
            96           4
            97           3
            99           3
           100           4
           103           1
           107           1
           119           1
78 rows selected.

The result (because it’s randomly arriving values) is fairly close to the bell curve of the Normal distribution centred at around 50 rows. There’s a fairly long tail up to 119 rows, but that’s probably there in this case because the index state hadn’t quite reached steady state before I did the big delete.

Having dumped a leaf block I know that a completely packed leaf block could hold 420 rows, and at pctfree 10 that would mean 378 rows, and at 70% utilisation (which is what I expect with random arrival) an average of 294 rows, generating an index of 3,400 leaf blocks rather than the 3,529 I got. (Again, I think the divergence from expectation is probably related to needing more time to get to steady state.)
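
Just to make that arithmetic visible – a quick sanity check of the estimates:

select
        round(420 * 0.90)               rows_at_pctfree_10,
        round(420 * 0.70)               rows_at_70_percent,
        ceil(1e6 / (420 * 0.70))        estimated_leaf_blocks
from
        dual
;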

The final call in the script is to a stripped down version of some code I published a few years back; the relevance of the numbers when applied to indexes is described in this blog note and the numbers were as follows:

Unformatted                   :           62 /          507,904
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :           45 /          368,640
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :            0 /                0
Full                          :        3,545 /       29,040,640

PL/SQL procedure successfully completed.

Segment Total blocks:        3,712
Object Unused blocks:            0

PL/SQL procedure successfully completed.

Freespace 2 is the label given to the set of blocks that are available for use (empty) whether or not they are in the index structure. Given the pattern of work so far it’s fairly safe to assume that in this case they are “formatted but not yet attached to the index structure”.

A quick arithmetic check highlights an apparent discrepancy: 62 + 45 + 3,545 = 3,652, which is 60 blocks short of the number in the segment; but that’s okay because I have 29 uniform extents of 1MB in the segment, which means 2 space management level 1 bitmap blocks per extent plus one level 2 bitmap block, plus the segment header / level 3 bitmap block – for a total of 60 space management blocks.

The thing I’m not keen on is that the space management blocks are reporting 3,545 Full blocks, when the treedump showed 3,538 blocks – where did the extra 7 come from? But I’m not going to worry about that for this blog note.
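
For reference, the heart of the space usage script is (in essence) a call to dbms_space.space_usage; a stripped-down sketch for this index, ignoring the bytes figures, might look like this:

set serveroutput on

declare
        m_unf   number; m_unfb  number;
        m_fs1   number; m_fs1b  number;
        m_fs2   number; m_fs2b  number;
        m_fs3   number; m_fs3b  number;
        m_fs4   number; m_fs4b  number;
        m_full  number; m_fullb number;
begin
        dbms_space.space_usage(
                segment_owner      => user,
                segment_name       => 'T1_I1',
                segment_type       => 'INDEX',
                unformatted_blocks => m_unf,  unformatted_bytes => m_unfb,
                fs1_blocks         => m_fs1,  fs1_bytes         => m_fs1b,
                fs2_blocks         => m_fs2,  fs2_bytes         => m_fs2b,
                fs3_blocks         => m_fs3,  fs3_bytes         => m_fs3b,
                fs4_blocks         => m_fs4,  fs4_bytes         => m_fs4b,
                full_blocks        => m_full, full_bytes        => m_fullb
        );
        dbms_output.put_line('Unformatted : ' || m_unf);
        dbms_output.put_line('Freespace 2 : ' || m_fs2);
        dbms_output.put_line('Full        : ' || m_full);
end;
/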

Tests and results

The following block of code shows the full set of logging and tracing that I did – though I didn’t use every single diagnostic in every single run – for each of the three options. The code in this case is wrapped around a call to coalesce:

alter session set tracefile_identifier = 'coalesce';
alter session set events 'immediate trace name treedump level &m_ind_id';

execute snap_enqueues.start_snap
execute snap_rollstats.start_snap
execute snap_my_stats.start_snap
execute snap_redo.start_snap

alter session set events 'trace[ksq] disk=medium';

column current_scn new_value start_scn
select to_char(current_scn,'9999999999999999') current_scn from v$database;

alter index t1_i1 coalesce;

column current_scn new_value end_scn
select to_char(current_scn,'9999999999999999') current_scn from v$database;

alter session set events 'trace[ksq] off';

execute snap_redo.end_snap
execute snap_my_stats.end_snap
execute snap_rollstats.end_snap
execute snap_enqueues.end_snap

alter session set events 'immediate trace name treedump level &m_ind_id';

alter session set tracefile_identifier='coalesce_redo';
alter system dump redo scn min &start_scn scn max &end_scn ;
alter session set tracefile_identifier='';

@@dbms_space_use_assm_embedded test_user t1_i1 index
@@index_histogram_embedded t1 t1_i1 &m_ind_id

Working from the top down:

  • Set an identifier to include in the trace file name.
  • take a starting treedump (which will go to that trace file)
  • take starting snapshots of
    • system level enqueue stats
    • system level rollback stats
    • my session activity stats
    • a subset of session stats relating to redo
  • enable tracing of Enqueues (locks)
  • capture the current SCN in a define variable
  • coalesce the index
  • capture the final SCN in a define variable
  • report the change in the 4 sets of stats listed above
  • save the ending treedump to the trace file
  • set a new identifier for the tracefile name
  • dump all the redo generated while the coalesce was going on to the new tracefile
  • Call a script to report the space usage for the index segment
  • Call a script to report the histogram of leaf block usage again

The starting treedump will match the “post-delete” treedump above, of course, but it’s just a convenience for each test to have its before and after treedumps in the same trace file; and the redo dump (which will include redo from every active session) is so large – about 275MB – that it’s a good idea to keep it separate from the treedumps and enqueue trace.

The histogram script is just a wrapper for the two sys_op_lbid() queries shown earlier on. The space usage script is one we’ve already met.

A test run takes only a couple of minutes – and most of the time is spent inserting 1M rows into an indexed table one at a time. (The time it took me to analyze the logs, traces and dumps was much longer, and the time to summarize and write up the results longer still!)

Here, then, are the most interesting details from the three tests. Some of the comments I make are not immediately “proved” by the results I’m showing, but the volume of data required to supply corroborative evidence would become excessive and very boring.

Coalesce

The first “big picture” item to look at after the coalesce is the space usage:

Unformatted                   :           62 /          507,904
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :        3,037 /       24,879,104
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :            0 /                0
Full                          :          553 /        4,530,176

PL/SQL procedure successfully completed.

Segment Total blocks:        3,712
Object Unused blocks:            0

The index segment is 3,712 blocks, of which 553 are “Full”, and 3,037 are in the “Freespace 2” state which, for indexes, means they are empty and available for reuse. The coalesce hasn’t released space back to the tablespace but we can’t tell from these figures whether the 553 full blocks are packed into the “bottom end” of the segment or scattered across the entire length of the segment. Or, to view it another way, the figures don’t tell us whether Oracle has been shuffling rows without completely re-arranging the block linkages or whether it’s also been moving rows so that it can reconnect leaf blocks in a way that leaves all the empty blocks above a notional highwater mark.

We can dig a little deeper by looking at the treedump:

branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 64, level: 1)
      leaf: 0x900016d 150995309 (-1: row:377.377 avs:833)
      leaf: 0x90014d3 151000275 (0: row:377.377 avs:833)
      leaf: 0x900118c 150999436 (1: row:377.377 avs:833)
      leaf: 0x9000370 150995824 (2: row:377.377 avs:833)
...
      leaf: 0x9000d2f 150998319 (61: row:377.377 avs:833)
      leaf: 0x9000d67 150998375 (62: row:114.114 avs:5830)
   branch: 0x9000e45 150998597 (0: nrow: 59, level: 1)
      leaf: 0x900047a 150996090 (-1: row:377.377 avs:833)
      leaf: 0x9000725 150996773 (0: row:377.377 avs:833)

...
...
      leaf: 0x9000a05 150997509 (67: row:377.377 avs:833)
      leaf: 0x900030d 150995725 (68: row:376.376 avs:852)
   branch: 0x9000c68 150998120 (6: nrow: 76, level: 1)
      leaf: 0x90001b2 150995378 (-1: row:60.60 avs:6856)
      leaf: 0x9001323 150999843 (0: row:377.377 avs:833)

The root block is still reporting the same number of level 1 branch blocks, but the branch blocks report far fewer leaf blocks each. Most of the leaf blocks report 377 index entries, but the first and last leaf blocks of each branch tend to show fewer.

I pointed out earlier on that with pctfree 10 we’d get 378 rows per leaf block if we recreated the index, but it looks like there’s a little overhead I didn’t allow for and we’ve actually got 377 from the coalesce. You’ll notice that a coalesce will actually reduce the number of index entries in a leaf block if it exceeds the limit set by pctfree (remember how the original treedump extracts showed one leaf block with 411 entries).

Coalesce does not act “across” branch blocks, which is why (a) the number of branch blocks is unchanged, and (b) the last leaf block of each branch block may have fewer rows than the typical leaf block – coalesce will not move rows from the first leaf block of the next branch.

I’ve included a few lines from around the branches numbered 0 and 6 in this extract. If you compare them with the treedump taken just after the delete you’ll see that the coalesce has copied rows back from the second (0th) leaf of branch 0 into the first (-1th) leaf, but not from the second (0th) leaf into the first (-1th) leaf of branch 6. I don’t know why this is but perhaps it’s something to do with the relative number of rows in the first and second (-1th and 0th) leaf blocks – the same behaviour showed up at the start of branch 3 where the two leaf blocks had 58 and 63 rows respectively.

Getting back to the question of whether the “Freespace 2” blocks reported by the space usage procedure are still in the structure or whether they have been unlinked – the number of leaf blocks reported per branch block is fairly convincing – the empty leaf blocks have been detached from the structure and are simply flagged as free blocks in the space management level 1 bitmap. We could do a quick check of all the branch blocks (grep ” branch” from the trace file):

branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 64, level: 1)
   branch: 0x9000e45 150998597 (0: nrow: 59, level: 1)
   branch: 0x90007d1 150996945 (1: nrow: 61, level: 1)
   branch: 0x9000e8a 150998666 (2: nrow: 66, level: 1)
   branch: 0x900043c 150996028 (3: nrow: 70, level: 1)
   branch: 0x9000e18 150998552 (4: nrow: 70, level: 1)
   branch: 0x900073d 150996797 (5: nrow: 70, level: 1)
   branch: 0x9000c68 150998120 (6: nrow: 76, level: 1)

Add up the nrow for the level 1 branches and you get 536; add 9 for the branch blocks themselves and you get 545 – and the space usage report says 553 (an unexplained error of 8 which I’ll get round to worrying about one day; I wonder if there’s any significance in how close it is to the error of 7 that we had before the coalesce).

We can learn more from the tree dump by walking the leaf blocks in order and checking their block addresses.

  • The first leaf block of the first level 1 branch block is 0x900016d before and after the coalesce.
  • The second leaf block of this branch block is 0x90014d3 after the coalesce, but that was the address of leaf block number 7 before the coalesce.
  • The third leaf block is 0x900118c after the coalesce but was leaf block number 15 before the coalesce.

The coalesce has been walking the index in order, copying rows back to earlier leaf blocks and unlinking the block it’s working on if it becomes empty. The ultimate effect of this is that the final set of index leaf blocks isn’t compacted into the smallest contiguous space possible, it’s scattered just as widely and randomly across the whole segment as it was before the coalesce.

We could go one step further to demonstrate this scattering. Extract all the lines for leaf blocks from the treedump and sort them into order. Since I’m using 1MB extents I’d like to see (nearly) 128 consecutive block addresses in order before a possible jump to a block in the next extent, but here are the first few addresses when I do the experiment:

      leaf: 0x9000105 150995205 (59: row:377.377 avs:833)
      leaf: 0x9000108 150995208 (3: row:377.377 avs:833)
      leaf: 0x900010a 150995210 (52: row:377.377 avs:833)
      leaf: 0x9000115 150995221 (12: row:377.377 avs:833)
      leaf: 0x9000117 150995223 (63: row:377.377 avs:833)
      leaf: 0x900011e 150995230 (3: row:377.377 avs:833)
      leaf: 0x900011f 150995231 (53: row:377.377 avs:833)
      leaf: 0x900012b 150995243 (34: row:377.377 avs:833)
      leaf: 0x9000137 150995255 (34: row:377.377 avs:833)
      leaf: 0x900013a 150995258 (63: row:377.377 avs:833)
      leaf: 0x900013d 150995261 (43: row:377.377 avs:833)

You don’t have to be skilled at reading hex numbers to see all the gaps between the used block addresses.
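
(If you’d rather not read hex, dbms_utility will translate any of those data block addresses into a file and block number – e.g. for the first address in the list, which decodes to file 36, block 261:)

select
        dbms_utility.data_block_address_file(to_number('9000105','xxxxxxx'))   rel_file_no,
        dbms_utility.data_block_address_block(to_number('9000105','xxxxxxx'))  block_no
from
        dual
;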

Coalesce – Transactions

We now know where the index has got to, so the next question is how did it get there. The snapshot showing the change in rollback statistics (v$rollstat) is revealing.

USN   Ex Size K  HWM K  Opt K      Writes     Gets  Waits Shr Grow Shr K  Act K
----  -- ------  -----  -----      ------     ----  ----- --- ---- ----- ------
   0   0      0      0      0           0        1      0   0    0     0      0
   1   5   5120      0      0     5101714     1199      0   0    5     0 -28446
   2   6   6144      0      0     5278032     1245      0   0    6     0    275
   3   7   7168      0      0     5834744     1365      0   0    7     0  -1492
   4   1   8192      0      0     5944580     1378      0   0    1     0 -17281
   5   6   6144      0      0     5126248     1203      0   0    6     0    303
   6   4   4096      0      0     5076808     1189      0   0    4     0    -72
   7   6   6144      0      0     5244984     1239      0   0    6     0    127
   8   7   7168      0      0     5818394     1363      0   0    7     0    263
   9   7   7168      0      0     6017230     1401      0   0    7     0    213
  10   1   8192   8192      0     5060154     1178      0   0    1     0 -54488

My session was the only one active on the system, and it’s only a small play system so the only undo segments it has are the basic 10 that appear when you create the database (plus the one rollback segment in the SYSTEM tablespace).

The critical numbers are the writes (bytes) and gets (blocks), which tell us that our single operation has behaved as a number of individual transactions that have been starting in different undo segments.

Given the fairly even spread of bytes written it’s a good bet that we’re seeing a fairly large number of fairly small transactions. We can corroborate this by looking at the snapshot of enqueue (lock) stats (v$enqueue_stats):

Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
CF             2           0           2           0           0 contention
XR             1           0           1           0           0 database force logging
TM             1           0           1           0           0 contention
TX         3,051           0       3,051           0           0 contention
HW            50           0          50           0           0 contention
TT            50           0          50           0           0 contention
CU            24           0          24           0           0 contention
OD             1           0           1           0           0 Serializing DDLs
JG           126           0         126           0           0 queue lock
JG            12           0          12           0           0 q mem clnup lck
JG           126           0         126           0           0 contention

The enqueue we’re interested in is the TX (transaction) enqueue – and Oracle reports more than 3,000 of them in the interval. (That’s interestingly close to the number of blocks in the index or, to be even fussier, the number of leaf blocks that have been emptied – but that might be a coincidence.)

You’ll notice, though, that there’s only 1 TM (table) lock request. Whatever else we’re doing we’re not locking and unlocking the table on every single transaction – so we need to find out what that lock is and whether it might be a threat to our application (a TM lock in mode 4, 5, or 6, held for the duration would be a disaster). And that’s why I enabled the ksq (enqueue) trace – here’s the extract from the trace file showing the acquisition of the TM lock.

2022-09-02 11:36:57.645*:ksq.c@9175:ksqgtlctx(): *** TM-000230A3-00000000-0039DED3-00000000 mode=2 flags=0x401 why=173 timeout=0 ***
2022-09-02 11:36:57.645*:ksq.c@9183:ksqgtlctx(): xcb=0x9bbeec68, ktcdix=2147483647 topxcb=0x9bbeec68 ktcipt(topxcb)=0x0
2022-09-02 11:36:57.645*:ksq.c@9203:ksqgtlctx(): ksqgtlctx: Initializing lock structure
2022-09-02 11:36:57.645*:ksq.c@9324:ksqgtlctx(): DID DUMP START
2022-09-02 11:36:57.645*:ksq.c@9328:ksqgtlctx():        ksqlkdid: 0001-0029-0000013C
2022-09-02 11:36:57.645*:ksq.c@9333:ksqgtlctx():        tktcmydid: 0001-0029-0000013C
2022-09-02 11:36:57.645*:ksq.c@9337:ksqgtlctx():        tksusesdi: 0000-0000-00000000
2022-09-02 11:36:57.645*:ksq.c@9341:ksqgtlctx():        tksusetxn: 0001-0029-0000013C
2022-09-02 11:36:57.645*:ksq.c@9343:ksqgtlctx(): DID DUMP END
2022-09-02 11:36:57.645*:ksq.c@9517:ksqgtlctx(): ksqgtlctx: did not find link
2022-09-02 11:36:57.645*:ksq.c@9687:ksqgtlctx(): ksqgtlctx: updated ksqlrar1, ksqlrar:0x9e7f7cb8, ksqlral:(nil)
2022-09-02 11:36:57.645*:ksq.c@9841:ksqgtlctx(): ksqgtlctx: updated ksqlral, ksqlral:0x9bac7bc0, res:0x9e7f7cb8
2022-09-02 11:36:57.645*:ksq.c@9851:ksqgtlctx(): ksqgtlctx: updated lock mode, mode:2 req:0
2022-09-02 11:36:57.645*:ksq.c@9960:ksqgtlctx(): SUCCESS

I’ve highlighted the line where the TM lock appears, reporting an “id1” of 000230A3, which is the object_id of the table t1. Take note of the other highlighted line which gives the address of the resource element used (res: 0x9e7f7cb8) because we can use this to find where the lock is released:

2022-09-02 11:36:58.735*:ksq.c@10367:ksqrcli_int(): ksqrcli_int: updated ksqlral, ksqlral:0x9bac7bc0, res:0x9e7f7cb8
2022-09-02 11:36:58.735*:ksq.c@10501:ksqrcli_int(): returns 0

This appears in the last few lines of the ksq trace, after the appearance of several thousand (brief) TX locks that have been acquired and released. So there is a low-impact table lock held for the duration of the coalesce that is not going to stop other sessions from updating the table (and its indexes).
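
(Translating the hex id1 back to an object name is easy, by the way:)

select  object_id, object_name, object_type
from    user_objects
where   object_id = to_number('230A3','xxxxx')
;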

There was one other lock released after the TM lock:

2022-09-02 11:36:58.769*:ksq.c@10367:ksqrcli_int(): ksqrcli_int: updated ksqlral, ksqlral:0x9e6b2370, res:0x9e7f0788
2022-09-02 11:36:58.769*:ksq.c@10501:ksqrcli_int(): returns 0

Working backwards using the resource address we find that this was an OD lock, taken immediately after the TM lock:

2022-09-02 11:36:57.645*:ksq.c@9175:ksqgtlctx(): *** OD-000230A4-00000000-0039DED3-00000000 mode=4 flags=0x10001 why=277 timeout=0 ***
2022-09-02 11:36:57.645*:ksq.c@9183:ksqgtlctx(): xcb=0x9bbeec68, ktcdix=2147483647 topxcb=0x9bbeec68 ktcipt(topxcb)=0x0
2022-09-02 11:36:57.645*:ksq.c@9203:ksqgtlctx(): ksqgtlctx: Initializing lock structure
2022-09-02 11:36:57.645*:ksq.c@9324:ksqgtlctx(): DID DUMP START
2022-09-02 11:36:57.645*:ksq.c@9328:ksqgtlctx():        ksqlkdid: 0001-0029-0000013C
2022-09-02 11:36:57.645*:ksq.c@9333:ksqgtlctx():        tktcmydid: 0001-0029-0000013C
2022-09-02 11:36:57.645*:ksq.c@9337:ksqgtlctx():        tksusesdi: 0000-0000-00000000
2022-09-02 11:36:57.645*:ksq.c@9341:ksqgtlctx():        tksusetxn: 0001-0029-0000013C
2022-09-02 11:36:57.645*:ksq.c@9343:ksqgtlctx(): DID DUMP END
2022-09-02 11:36:57.645*:ksq.c@9517:ksqgtlctx(): ksqgtlctx: did not find link
2022-09-02 11:36:57.645*:ksq.c@9687:ksqgtlctx(): ksqgtlctx: updated ksqlrar1, ksqlrar:0x9e7f0788, ksqlral:(nil)
2022-09-02 11:36:57.645*:ksq.c@9841:ksqgtlctx(): ksqgtlctx: updated ksqlral, ksqlral:0x9e6b2370, res:0x9e7f0788
2022-09-02 11:36:57.645*:ksq.c@9851:ksqgtlctx(): ksqgtlctx: updated lock mode, mode:4 req:0
2022-09-02 11:36:57.645*:ksq.c@9960:ksqgtlctx(): SUCCESS

Checking v$lock_type we see that the OD lock is the “Online DDLs” lock, with the description “Lock to prevent concurrent online DDLs” and its first parameter is the object_id of the object that is the target of the DDL. The value in the trace file (000230A4) identifies the index that we are coalescing; at mode 4 the lock mode is fairly aggressive, but I’m surprised that it isn’t 6 – if we were to interpret the value the way we would for TM locks it would suggest that two sessions could coalesce the index at the same time!
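
For reference, this is the lookup:

select  type, name, id1_tag, description
from    v$lock_type
where   type = 'OD'
;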

Apart from 50 pairs of TT/HW locks (tablespace DDL / Segment Highwater mark) due to undo segments growing and shrinking, the rest of the ksq trace was taken up by 3,051 TX locks, typically reporting their acquisition and release on adjacent lines of the trace, e.g.:

2022-09-02 11:36:57.650*:ksq.c@9100:ksqgtlctx(): ksqtgtlctx: PDB mode
2022-09-02 11:36:57.650*:ksq.c@9175:ksqgtlctx(): *** TX-0004001C-00002F83-0039DED3-00000000 mode=6 flags=0x401 why=176 timeout=0 ***
2022-09-02 11:36:57.650*:ksq.c@9183:ksqgtlctx(): xcb=0x9bd37110, ktcdix=2147483647 topxcb=0x9bbeec68 ktcipt(topxcb)=0x0
2022-09-02 11:36:57.650*:ksq.c@9203:ksqgtlctx(): ksqgtlctx: Initializing lock structure
2022-09-02 11:36:57.650*:ksq.c@9324:ksqgtlctx(): DID DUMP START
2022-09-02 11:36:57.650*:ksq.c@9328:ksqgtlctx():        ksqlkdid: 0001-0029-0000013C
2022-09-02 11:36:57.650*:ksq.c@9333:ksqgtlctx():        tktcmydid: 0001-0029-0000013C
2022-09-02 11:36:57.650*:ksq.c@9337:ksqgtlctx():        tksusesdi: 0000-0000-00000000
2022-09-02 11:36:57.650*:ksq.c@9341:ksqgtlctx():        tksusetxn: 0001-0029-0000013C
2022-09-02 11:36:57.650*:ksq.c@9343:ksqgtlctx(): DID DUMP END
2022-09-02 11:36:57.650*:ksq.c@9517:ksqgtlctx(): ksqgtlctx: did not find link
2022-09-02 11:36:57.650*:ksq.c@9687:ksqgtlctx(): ksqgtlctx: updated ksqlrar1, ksqlrar:0x9e80b098, ksqlral:(nil)
2022-09-02 11:36:57.650*:ksq.c@9841:ksqgtlctx(): ksqgtlctx: updated ksqlral, ksqlral:0x9bd37148, res:0x9e80b098
2022-09-02 11:36:57.650*:ksq.c@9851:ksqgtlctx(): ksqgtlctx: updated lock mode, mode:6 req:0
2022-09-02 11:36:57.650*:ksq.c@9960:ksqgtlctx(): SUCCESS
2022-09-02 11:36:57.650*:ksq.c@10367:ksqrcli_int(): ksqrcli_int: updated ksqlral, ksqlral:0x9bd37148, res:0x9e80b098
2022-09-02 11:36:57.650*:ksq.c@10501:ksqrcli_int(): returns 0

Coalesce – Workload

We’ve examined the end result of a coalesce, and seen something of the mechanism that Oracle adopts to get to that result, but what does it cost (in terms of work done)? In many cases it’s sufficient to limit the analysis to:

  • how much I/O
  • how much CPU
  • how much undo and redo generated

The obvious I/O comes from the requirement to walk the index in leaf block order, and the dbwr will eventually have to write back every block (including the empty ones). But that I/O, and the inevitable CPU usage, is not particularly interesting; what’s more interesting (and more of a threat) is the impact of the undo and redo. This is where the snapshots of session stats and redo stats give us the information we need, and all I’m going to look at are the redo-related stats for the test:

Name                                                                     Value
----                                                                     -----
messages sent                                                               92
calls to kcmgcs                                                             81
calls to kcmgas                                                          6,118
calls to get snapshot scn: kcmgss                                        3,075
redo entries                                                            34,810
redo size                                                           76,475,936
redo buffer allocation retries                                              39
redo subscn max counts                                                   1,049
redo synch time                                                              3
redo synch time (usec)                                                  33,207
redo synch time overhead (usec)                                            159
redo synch time overhead count (  2ms)                                       1
redo synch writes                                                            1
redo write info find                                                         1
undo change vector size                                             55,062,672
rollback changes - undo records applied                                    287

Bearing in mind that this index started at roughly 3,600 blocks / 28MB and coalesced to roughly 560 blocks / 4.5MB I want to draw your attention to just three of the figures (highlighted): the number and total size of redo records generated, and the volume of undo generated. 34,810 redo records, 75MB of redo, of which 55MB was due to undo.

It’s nice to see that the undo figure is consistent with the sum of the writes we saw in the snapshot of v$rollstat. But the numbers warn us that there’s a lot of work going into a coalesce – and it could have a big impact on other users.

My session is generating a lot of undo, and it’s cycling through every undo segment as it does so – that means other sessions that need to create read-consistent images of recently changed data blocks that are completely unrelated to my index may have to work backwards through a large number of undo blocks trying to find upper bound SCNs (check for statistics like: ‘transaction tables consistent read%’)
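
(These are the statistics in question, if you want to keep an eye on them:)

select  name, value
from    v$sysstat
where   name like 'transaction tables consistent read%'
;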

You’ll notice that I’ve also reported “rollback changes – undo records applied”; these are appearing because of “DML restarts” that make a statement roll back and try again the first time it triggers an undo segment extension. Luckily all my transactions are very small so each individual transaction won’t suffer much if it has to restart, but if you have a long running DML statement and I keep filling and extending undo segments (possibly shrinking other undo segments to do so) that’s going to increase your chances of finding your undo segment full and doing a huge rollback and restart of your statement. Be very careful about timing your coalesce commands.

Since I’ve dumped all the redo generated during the test run I’ll finish by showing a little analysis of the results. The trace file for this 28MB index was over 250MB so it’s not something you’d dump on a production size coalesce.

All I’m going to do is use grep to pull out the redo OP codes of every change vector in the file and show you a couple of extracts from the results. First a commonly occurring pattern:

CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x090014d3 OBJ:143528 SCN:0x00000000024f686d SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x0900152e OBJ:143528 SCN:0x00000000024f686d SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:31 AFN:17 DBA:0x04402750 OBJ:4294967295 SCN:0x00000000024f685d SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:1 CLS:32 AFN:17 DBA:0x0440ec96 OBJ:4294967295 SCN:0x00000000024f686e SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900152e OBJ:143528 SCN:0x00000000024f686e SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:32 AFN:17 DBA:0x0440ec96 OBJ:4294967295 SCN:0x00000000024f686e SEQ:2 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x090014d3 OBJ:143528 SCN:0x00000000024f686e SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:32 AFN:17 DBA:0x0440ec96 OBJ:4294967295 SCN:0x00000000024f686e SEQ:3 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x0900018a OBJ:143528 SCN:0x00000000024f67c1 SEQ:1 OP:10.11 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x09000438 OBJ:143528 SCN:0x00000000024f686d SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:32 AFN:17 DBA:0x0440ec96 OBJ:4294967295 SCN:0x00000000024f686e SEQ:4 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000438 OBJ:143528 SCN:0x00000000024f686e SEQ:1 OP:10.39 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:31 AFN:17 DBA:0x04402750 OBJ:4294967295 SCN:0x00000000024f686e SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:1 CLS:32 AFN:17 DBA:0x0440ec97 OBJ:4294967295 SCN:0x00000000024f686e SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x090014d3 OBJ:143528 SCN:0x00000000024f686e SEQ:2 OP:10.8 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:31 AFN:17 DBA:0x04402750 OBJ:4294967295 SCN:0x00000000024f686e SEQ:2 OP:5.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:1 CLS:32 AFN:17 DBA:0x0440ec98 OBJ:4294967295 SCN:0x00000000024f686e SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900152e OBJ:143528 SCN:0x00000000024f686e SEQ:2 OP:10.34 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09001500 OBJ:143528 SCN:0x00000000024f685a SEQ:1 OP:13.22 ENC:0 RBL:0 FLG:0x0000

CHANGE #1 CON_ID:3 TYP:0 CLS:31 AFN:17 DBA:0x04402750 OBJ:4294967295 SCN:0x00000000024f686e SEQ:3 OP:5.4 ENC:0 RBL:0 FLG:0x0000

The last line is Op Code 5.4, a commit (or rollback), and I picked a batch of rows between one commit and the next, so this entire set of 20 change vectors is a single transaction taking place in the coalesce. I’ve placed gaps before every “Change #1” to show the boundaries between redo records. As you can see, my “common pattern” transaction is 11 redo records; that’s another sanity check: we saw roughly 3,000 TX enqueues, and 34,800 redo entries: 11 * 3,000 = 33,000, which is a good enough match.

Op Code 5.2 is “get next undo block”, Op Code 5.1 is “create undo record”, so I’m going to simplify the list by removing those codes. Removing some of the irrelevant material from the start and end of each line the example reduces to:

DBA:0x090014d3 OBJ:143528 SCN:0x00000000024f686d SEQ:1 OP:4.1 
DBA:0x0900152e OBJ:143528 SCN:0x00000000024f686d SEQ:1 OP:4.1 
DBA:0x0900152e OBJ:143528 SCN:0x00000000024f686e SEQ:1 OP:10.6 
DBA:0x090014d3 OBJ:143528 SCN:0x00000000024f686e SEQ:1 OP:10.6 
DBA:0x0900018a OBJ:143528 SCN:0x00000000024f67c1 SEQ:1 OP:10.11 
DBA:0x09000438 OBJ:143528 SCN:0x00000000024f686d SEQ:1 OP:4.1 
DBA:0x09000438 OBJ:143528 SCN:0x00000000024f686e SEQ:1 OP:10.39
DBA:0x090014d3 OBJ:143528 SCN:0x00000000024f686e SEQ:2 OP:10.8 
DBA:0x0900152e OBJ:143528 SCN:0x00000000024f686e SEQ:2 OP:10.34 
DBA:0x09001500 OBJ:143528 SCN:0x00000000024f685a SEQ:1 OP:13.22 

Translating the OP Codes (and adding in a little information I have about which blocks the block addresses (DBA) correspond to) this is what the transaction does:

  • block cleanout of leaf block 0x090014d3 (4.1)
  • block cleanout of leaf block 0x0900152e (4.1)
  • lock leaf block 0x0900152e (10.6)
  • lock leaf block 0x090014d3 (10.6)
  • change the “pointer to previous” of leaf block 0x0900018a (10.11)
  • block cleanout of branch block 0x09000438 (4.1)
  • update branch block 0x09000438, delete one leaf entry (10.39)
  • create new version of leaf block 0x090014d3 (10.8)
  • create empty version of leaf block 0x0900152e (10.34)
  • update space management level 1 bitmap block (13.22)

So where does the huge amount of redo appear? If we look at the 11 Redo Record Headers for the extract we can use the LEN information to point us to the critical bits:

REDO RECORD - Thread:1 RBA: 0x000391.000174dc.01c0 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174dd.0028 LEN: 0x0060 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174dd.0088 LEN: 0x0164 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174de.0010 LEN: 0x00e4 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174de.00f4 LEN: 0x00ec VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174de.01e0 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174df.0048 LEN: 0x0104 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174df.014c LEN: 0x3a88 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.000174fd.01b4 LEN: 0x20bc VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.0001750e.0180 LEN: 0x0064 VLD: 0x01 CON_UID: 3792595
REDO RECORD - Thread:1 RBA: 0x000391.0001750e.01e4 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595

I’ve highlighted the two big ones – records 8 and 9, which are the ones holding the 10.8 (create new leaf) and 10.34 (make block empty). Why are they so big at 14,984 bytes and 8,380 bytes respectively?

Record 8 includes a change vector (5.1) for the undo of the replaced block which is a block image at 8,032 bytes, and a change vector for the new version of the block in a format similar to an array insert which happened to have 344 rows at this point for a size of roughly 6,500 bytes.

Record 9 includes a change vector (5.1) for the undo of the emptied block, again a block image of 8,032 bytes. But the 10.34 itself is only a few tens of bytes.

This test highlights a particularly nasty threat from coalesce and its “pairwise” clean-up. Checking the “post-delete” tree dump I can see I’ve emptied leaf block 0x0900152e by copying 30 rows back into leaf block 0x090014d3, and I can see that this is the fifth leaf block that I’ve emptied into 0x090014d3, and I can see that I’ll be doing one more pass to get that block full; and each time I do this I dump two block images, and an “array-update” redo change vector that gets bigger and bigger on each pass until it’s nearly the full 8KB. The operation generates a lot of undo and a lot of redo.

Coalesce – concurrency

As a quick test of what happens when other work is going on against the table, I ran a little script to insert an extra 100 rows (without committing) just after the big delete but just before the coalesce, generating random values from the same range as the original values.

The coalesce didn’t seem to take any extra time and I didn’t see any enqueue waits or buffer busy waits (though a different test of 3,000 rapid single-row inserts with commits while the coalesce was running managed to get one buffer busy wait on a branch block).

The final result, though, was not very good. With 100 uncommitted inserts getting in the way the index reported 687 “full” blocks rather than the 553 that we saw originally – 134 extra blocks, an increase of more than one block per row inserted.

Basically, when Oracle hits a block with an uncommitted change it looks as if it says: “I can’t copy those rows backwards so I’ll have to leave the current block wherever I’ve got to, skip the modified block and restart the coalesce in the next block along”. So every block with an uncommitted change could result in two extra blocks ultimately not being packed as well as they could be.

Here, for reference, is the full treedump:
----- begin tree dump
branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 78, level: 1)
      leaf: 0x900016d 150995309 (-1: row:377.377 avs:833)
      leaf: 0x90014d3 151000275 (0: row:377.377 avs:833)
      leaf: 0x900118c 150999436 (1: row:377.377 avs:833)
      leaf: 0x9000370 150995824 (2: row:377.377 avs:833)
      leaf: 0x900011e 150995230 (3: row:120.120 avs:5716)
      leaf: 0x9000a0e 150997518 (4: row:49.49 avs:7065)
      leaf: 0x9001258 150999640 (5: row:377.377 avs:833)
      leaf: 0x9000658 150996568 (6: row:377.377 avs:833)
      leaf: 0x9000b8b 150997899 (7: row:377.377 avs:833)
      leaf: 0x9000155 150995285 (8: row:377.377 avs:833)
      leaf: 0x9000ba1 150997921 (9: row:377.377 avs:833)
      leaf: 0x900063c 150996540 (10: row:377.377 avs:833)
      leaf: 0x9000d3b 150998331 (11: row:377.377 avs:833)
      leaf: 0x9000469 150996073 (12: row:114.114 avs:5830)
      leaf: 0x9000bfb 150998011 (13: row:51.51 avs:7027)
      leaf: 0x900155d 151000413 (14: row:377.377 avs:833)
      leaf: 0x9000ba2 150997922 (15: row:377.377 avs:833)
      leaf: 0x9001512 151000338 (16: row:377.377 avs:833)
      leaf: 0x9000d74 150998388 (17: row:377.377 avs:833)
      leaf: 0x90005be 150996414 (18: row:377.377 avs:833)
      leaf: 0x90005b4 150996404 (19: row:377.377 avs:833)
      leaf: 0x9000c24 150998052 (20: row:377.377 avs:833)
      leaf: 0x90001a7 150995367 (21: row:377.377 avs:833)
      leaf: 0x9001563 151000419 (22: row:377.377 avs:833)
      leaf: 0x9000a68 150997608 (23: row:377.377 avs:833)
      leaf: 0x90011a7 150999463 (24: row:377.377 avs:833)
      leaf: 0x90001b9 150995385 (25: row:327.327 avs:1783)
      leaf: 0x9000ab6 150997686 (26: row:52.52 avs:7008)
      leaf: 0x900149f 151000223 (27: row:377.377 avs:833)
      leaf: 0x900123d 150999613 (28: row:328.328 avs:1764)
      leaf: 0x900033c 150995772 (29: row:89.89 avs:6305)
      leaf: 0x9000b9d 150997917 (30: row:45.45 avs:7141)
      leaf: 0x90014ad 151000237 (31: row:377.377 avs:833)
      leaf: 0x9000d40 150998336 (32: row:377.377 avs:833)
      leaf: 0x9000166 150995302 (33: row:377.377 avs:833)
      leaf: 0x9000c70 150998128 (34: row:377.377 avs:833)
      leaf: 0x90014e9 151000297 (35: row:377.377 avs:833)
      leaf: 0x9000d32 150998322 (36: row:377.377 avs:833)
      leaf: 0x9001546 151000390 (37: row:377.377 avs:833)
      leaf: 0x900018b 150995339 (38: row:377.377 avs:833)
      leaf: 0x9000b8a 150997898 (39: row:377.377 avs:833)
      leaf: 0x9000c4c 150998092 (40: row:169.169 avs:4785)
      leaf: 0x9000a89 150997641 (41: row:69.69 avs:6685)
      leaf: 0x9000335 150995765 (42: row:306.306 avs:2182)
      leaf: 0x9000196 150995350 (43: row:51.51 avs:7027)
      leaf: 0x900134d 150999885 (44: row:377.377 avs:833)
      leaf: 0x90005a7 150996391 (45: row:243.243 avs:3379)
      leaf: 0x9000116 150995222 (46: row:54.54 avs:6970)
      leaf: 0x900130b 150999819 (47: row:377.377 avs:833)
      leaf: 0x9000604 150996484 (48: row:377.377 avs:833)
      leaf: 0x9000655 150996565 (49: row:377.377 avs:833)
      leaf: 0x9000d7d 150998397 (50: row:185.185 avs:4481)
      leaf: 0x9000958 150997336 (51: row:53.53 avs:6989)
      leaf: 0x90011aa 150999466 (52: row:377.377 avs:833)
      leaf: 0x900102a 150999082 (53: row:377.377 avs:833)
      leaf: 0x9000a6d 150997613 (54: row:377.377 avs:833)
      leaf: 0x90014f8 151000312 (55: row:377.377 avs:833)
      leaf: 0x900135d 150999901 (56: row:344.344 avs:1460)
      leaf: 0x9000a37 150997559 (57: row:45.45 avs:7141)
      leaf: 0x900122d 150999597 (58: row:341.341 avs:1517)
      leaf: 0x9000afe 150997758 (59: row:51.51 avs:7027)
      leaf: 0x9001536 151000374 (60: row:377.377 avs:833)
      leaf: 0x9000644 150996548 (61: row:113.113 avs:5849)
      leaf: 0x9000c69 150998121 (62: row:75.75 avs:6571)
      leaf: 0x9000141 150995265 (63: row:377.377 avs:833)
      leaf: 0x9000c56 150998102 (64: row:377.377 avs:833)
      leaf: 0x900059d 150996381 (65: row:377.377 avs:833)
      leaf: 0x9000c46 150998086 (66: row:377.377 avs:833)
      leaf: 0x9000af0 150997744 (67: row:377.377 avs:833)
      leaf: 0x90001bd 150995389 (68: row:377.377 avs:833)
      leaf: 0x9000baa 150997930 (69: row:377.377 avs:833)
      leaf: 0x9000c4f 150998095 (70: row:377.377 avs:833)
      leaf: 0x9000639 150996537 (71: row:377.377 avs:833)
      leaf: 0x9000c1f 150998047 (72: row:377.377 avs:833)
      leaf: 0x9000d4a 150998346 (73: row:377.377 avs:833)
      leaf: 0x900153d 151000381 (74: row:377.377 avs:833)
      leaf: 0x9000e74 150998644 (75: row:377.377 avs:833)
      leaf: 0x9000d2a 150998314 (76: row:244.244 avs:3360)
   branch: 0x9000e45 150998597 (0: nrow: 75, level: 1)
      leaf: 0x900047a 150996090 (-1: row:377.377 avs:833)
      leaf: 0x9000725 150996773 (0: row:377.377 avs:833)
      leaf: 0x90001e2 150995426 (1: row:377.377 avs:833)
      leaf: 0x9000e43 150998595 (2: row:156.156 avs:5032)
      leaf: 0x9000e32 150998578 (3: row:61.61 avs:6837)
      leaf: 0x9000258 150995544 (4: row:377.377 avs:833)
      leaf: 0x90007dd 150996957 (5: row:377.377 avs:833)
      leaf: 0x900166e 151000686 (6: row:377.377 avs:833)
      leaf: 0x90001aa 150995370 (7: row:377.377 avs:833)
      leaf: 0x9000c44 150998084 (8: row:377.377 avs:833)
      leaf: 0x90014e8 151000296 (9: row:71.71 avs:6647)
      leaf: 0x900067a 150996602 (10: row:66.66 avs:6742)
      leaf: 0x9000ba9 150997929 (11: row:71.71 avs:6647)
      leaf: 0x900033f 150995775 (12: row:80.80 avs:6476)
      leaf: 0x9000aeb 150997739 (13: row:377.377 avs:833)
      leaf: 0x90013e7 151000039 (14: row:377.377 avs:833)
      leaf: 0x900067b 150996603 (15: row:377.377 avs:833)
      leaf: 0x9000bff 150998015 (16: row:377.377 avs:833)
      leaf: 0x90001b5 150995381 (17: row:377.377 avs:833)
      leaf: 0x9000364 150995812 (18: row:167.167 avs:4823)
      leaf: 0x9001550 151000400 (19: row:42.42 avs:7198)
      leaf: 0x90005e1 150996449 (20: row:377.377 avs:833)
      leaf: 0x9000c37 150998071 (21: row:293.293 avs:2429)
      leaf: 0x900065d 150996573 (22: row:92.92 avs:6248)
      leaf: 0x9000c2d 150998061 (23: row:377.377 avs:833)
      leaf: 0x9000374 150995828 (24: row:377.377 avs:833)
      leaf: 0x900148f 151000207 (25: row:377.377 avs:833)
      leaf: 0x9000e63 150998627 (26: row:377.377 avs:833)
      leaf: 0x9000eb1 150998705 (27: row:377.377 avs:833)
      leaf: 0x9000c30 150998064 (28: row:377.377 avs:833)
      leaf: 0x9000612 150996498 (29: row:256.256 avs:3132)
      leaf: 0x9000c08 150998024 (30: row:68.68 avs:6704)
      leaf: 0x900074e 150996814 (31: row:377.377 avs:833)
      leaf: 0x9000132 150995250 (32: row:305.305 avs:2201)
      leaf: 0x9000473 150996083 (33: row:73.73 avs:6609)
      leaf: 0x9000d0a 150998282 (34: row:68.68 avs:6704)
      leaf: 0x9000755 150996821 (35: row:377.377 avs:833)
      leaf: 0x9000419 150995993 (36: row:377.377 avs:833)
      leaf: 0x9000eeb 150998763 (37: row:377.377 avs:833)
      leaf: 0x9000d04 150998276 (38: row:293.293 avs:2429)
      leaf: 0x9000d62 150998370 (39: row:83.83 avs:6419)
      leaf: 0x9000767 150996839 (40: row:377.377 avs:833)
      leaf: 0x9000323 150995747 (41: row:377.377 avs:833)
      leaf: 0x9000c1e 150998046 (42: row:282.282 avs:2638)
      leaf: 0x9000d5b 150998363 (43: row:58.58 avs:6894)
      leaf: 0x900060f 150996495 (44: row:76.76 avs:6552)
      leaf: 0x9000d5a 150998362 (45: row:65.65 avs:6761)
      leaf: 0x90001f1 150995441 (46: row:377.377 avs:833)
      leaf: 0x9000c7e 150998142 (47: row:377.377 avs:833)
      leaf: 0x900061c 150996508 (48: row:377.377 avs:833)
      leaf: 0x9000159 150995289 (49: row:149.149 avs:5165)
      leaf: 0x9000c5b 150998107 (50: row:73.73 avs:6609)
      leaf: 0x9000605 150996485 (51: row:377.377 avs:833)
      leaf: 0x9000a97 150997655 (52: row:377.377 avs:833)
      leaf: 0x9000b9a 150997914 (53: row:370.370 avs:966)
      leaf: 0x9000c0c 150998028 (54: row:61.61 avs:6837)
      leaf: 0x90001fd 150995453 (55: row:377.377 avs:833)
      leaf: 0x900075d 150996829 (56: row:377.377 avs:833)
      leaf: 0x9001078 150999160 (57: row:377.377 avs:833)
      leaf: 0x9001193 150999443 (58: row:377.377 avs:833)
      leaf: 0x9001334 150999860 (59: row:377.377 avs:833)
      leaf: 0x9000947 150997319 (60: row:377.377 avs:833)
      leaf: 0x9000a98 150997656 (61: row:377.377 avs:833)
      leaf: 0x90008cb 150997195 (62: row:377.377 avs:833)
      leaf: 0x9001020 150999072 (63: row:377.377 avs:833)
      leaf: 0x90001d0 150995408 (64: row:377.377 avs:833)
      leaf: 0x9000221 150995489 (65: row:377.377 avs:833)
      leaf: 0x9000158 150995288 (66: row:377.377 avs:833)
      leaf: 0x900035f 150995807 (67: row:377.377 avs:833)
      leaf: 0x900103a 150999098 (68: row:377.377 avs:833)
      leaf: 0x900156e 151000430 (69: row:377.377 avs:833)
      leaf: 0x9000129 150995241 (70: row:377.377 avs:833)
      leaf: 0x9000349 150995785 (71: row:377.377 avs:833)
      leaf: 0x90014bc 151000252 (72: row:377.377 avs:833)
      leaf: 0x90011ac 150999468 (73: row:329.329 avs:1745)
   branch: 0x90007d1 150996945 (1: nrow: 75, level: 1)
      leaf: 0x900017a 150995322 (-1: row:64.64 avs:6780)
      leaf: 0x9000d6f 150998383 (0: row:202.202 avs:4158)
      leaf: 0x9000472 150996082 (1: row:76.76 avs:6552)
      leaf: 0x9000e50 150998608 (2: row:377.377 avs:833)
      leaf: 0x9000c0d 150998029 (3: row:266.266 avs:2942)
      leaf: 0x9000757 150996823 (4: row:79.79 avs:6495)
      leaf: 0x9000d54 150998356 (5: row:377.377 avs:833)
      leaf: 0x9000c17 150998039 (6: row:377.377 avs:833)
      leaf: 0x9000710 150996752 (7: row:377.377 avs:833)
      leaf: 0x9000107 150995207 (8: row:377.377 avs:833)
      leaf: 0x9000d4d 150998349 (9: row:142.142 avs:5298)
      leaf: 0x9000fdf 150999007 (10: row:48.48 avs:7084)
      leaf: 0x9000250 150995536 (11: row:377.377 avs:833)
      leaf: 0x900125d 150999645 (12: row:304.304 avs:2220)
      leaf: 0x9000431 150996017 (13: row:56.56 avs:6932)
      leaf: 0x9000fda 150999002 (14: row:377.377 avs:833)
      leaf: 0x900047f 150996095 (15: row:377.377 avs:833)
      leaf: 0x90008fb 150997243 (16: row:377.377 avs:833)
      leaf: 0x9000e36 150998582 (17: row:377.377 avs:833)
      leaf: 0x9000e09 150998537 (18: row:377.377 avs:833)
      leaf: 0x90013d9 151000025 (19: row:377.377 avs:833)
      leaf: 0x90008a8 150997160 (20: row:377.377 avs:833)
      leaf: 0x9000fc1 150998977 (21: row:377.377 avs:833)
      leaf: 0x9000435 150996021 (22: row:377.377 avs:833)
      leaf: 0x90008d3 150997203 (23: row:377.377 avs:833)
      leaf: 0x9000d2d 150998317 (24: row:377.377 avs:833)
      leaf: 0x9000d1f 150998303 (25: row:377.377 avs:833)
      leaf: 0x9000ecc 150998732 (26: row:377.377 avs:833)
      leaf: 0x9000eee 150998766 (27: row:377.377 avs:833)
      leaf: 0x90001f9 150995449 (28: row:377.377 avs:833)
      leaf: 0x9000744 150996804 (29: row:377.377 avs:833)
      leaf: 0x900044e 150996046 (30: row:377.377 avs:833)
      leaf: 0x9000136 150995254 (31: row:377.377 avs:833)
      leaf: 0x90007ce 150996942 (32: row:377.377 avs:833)
      leaf: 0x9000476 150996086 (33: row:377.377 avs:833)
      leaf: 0x9000bd0 150997968 (34: row:377.377 avs:833)
      leaf: 0x9000776 150996854 (35: row:377.377 avs:833)
      leaf: 0x9000e76 150998646 (36: row:377.377 avs:833)
      leaf: 0x9000173 150995315 (37: row:377.377 avs:833)
      leaf: 0x9000e15 150998549 (38: row:110.110 avs:5906)
      leaf: 0x9000245 150995525 (39: row:59.59 avs:6875)
      leaf: 0x900102d 150999085 (40: row:377.377 avs:833)
      leaf: 0x90001f3 150995443 (41: row:377.377 avs:833)
      leaf: 0x900034a 150995786 (42: row:377.377 avs:833)
      leaf: 0x9000ede 150998750 (43: row:377.377 avs:833)
      leaf: 0x900024d 150995533 (44: row:377.377 avs:833)
      leaf: 0x90007e2 150996962 (45: row:377.377 avs:833)
      leaf: 0x9000450 150996048 (46: row:377.377 avs:833)
      leaf: 0x900078a 150996874 (47: row:377.377 avs:833)
      leaf: 0x9000e1d 150998557 (48: row:377.377 avs:833)
      leaf: 0x9000e39 150998585 (49: row:377.377 avs:833)
      leaf: 0x9000e19 150998553 (50: row:377.377 avs:833)
      leaf: 0x9000779 150996857 (51: row:377.377 avs:833)
      leaf: 0x9000c21 150998049 (52: row:377.377 avs:833)
      leaf: 0x9000d5d 150998365 (53: row:377.377 avs:833)
      leaf: 0x90004e0 150996192 (54: row:377.377 avs:833)
      leaf: 0x9000498 150996120 (55: row:377.377 avs:833)
      leaf: 0x9000ffe 150999038 (56: row:377.377 avs:833)
      leaf: 0x9000975 150997365 (57: row:377.377 avs:833)
      leaf: 0x90011d5 150999509 (58: row:274.274 avs:2790)
      leaf: 0x9000a52 150997586 (59: row:55.55 avs:6951)
      leaf: 0x9001347 150999879 (60: row:143.143 avs:5279)
      leaf: 0x900097b 150997371 (61: row:66.66 avs:6742)
      leaf: 0x9001129 150999337 (62: row:357.357 avs:1213)
      leaf: 0x90008ad 150997165 (63: row:55.55 avs:6951)
      leaf: 0x9001047 150999111 (64: row:42.42 avs:7198)
      leaf: 0x90004b3 150996147 (65: row:360.360 avs:1156)
      leaf: 0x9001126 150999334 (66: row:50.50 avs:7046)
      leaf: 0x90004d5 150996181 (67: row:59.59 avs:6875)
      leaf: 0x900114f 150999375 (68: row:377.377 avs:833)
      leaf: 0x9000ff6 150999030 (69: row:377.377 avs:833)
      leaf: 0x9000ead 150998701 (70: row:317.317 avs:1973)
      leaf: 0x90007cc 150996940 (71: row:55.55 avs:6951)
      leaf: 0x9000eea 150998762 (72: row:377.377 avs:833)
      leaf: 0x900097f 150997375 (73: row:346.346 avs:1422)
   branch: 0x9000e8a 150998666 (2: nrow: 78, level: 1)
      leaf: 0x9000142 150995266 (-1: row:60.60 avs:6856)
      leaf: 0x90014c1 151000257 (0: row:377.377 avs:833)
      leaf: 0x9001398 150999960 (1: row:377.377 avs:833)
      leaf: 0x900110d 150999309 (2: row:377.377 avs:833)
      leaf: 0x9000372 150995826 (3: row:377.377 avs:833)
      leaf: 0x9001136 150999350 (4: row:377.377 avs:833)
      leaf: 0x9000940 150997312 (5: row:367.367 avs:1023)
      leaf: 0x9000bfe 150998014 (6: row:84.84 avs:6400)
      leaf: 0x90001d1 150995409 (7: row:377.377 avs:833)
      leaf: 0x900112d 150999341 (8: row:377.377 avs:833)
      leaf: 0x9000931 150997297 (9: row:102.102 avs:6058)
      leaf: 0x9001135 150999349 (10: row:56.56 avs:6932)
      leaf: 0x9000170 150995312 (11: row:118.118 avs:5754)
      leaf: 0x90008b5 150997173 (12: row:51.51 avs:7027)
      leaf: 0x90013e2 151000034 (13: row:377.377 avs:833)
      leaf: 0x9001194 150999444 (14: row:377.377 avs:833)
      leaf: 0x9000ee5 150998757 (15: row:377.377 avs:833)
      leaf: 0x9000976 150997366 (16: row:171.171 avs:4747)
      leaf: 0x9001238 150999608 (17: row:57.57 avs:6913)
      leaf: 0x900093d 150997309 (18: row:377.377 avs:833)
      leaf: 0x9000a32 150997554 (19: row:377.377 avs:833)
      leaf: 0x90001ad 150995373 (20: row:377.377 avs:833)
      leaf: 0x9001547 151000391 (21: row:377.377 avs:833)
      leaf: 0x9000ae3 150997731 (22: row:377.377 avs:833)
      leaf: 0x9000656 150996566 (23: row:377.377 avs:833)
      leaf: 0x900138a 150999946 (24: row:377.377 avs:833)
      leaf: 0x90005b8 150996408 (25: row:377.377 avs:833)
      leaf: 0x900126e 150999662 (26: row:377.377 avs:833)
      leaf: 0x9000c74 150998132 (27: row:377.377 avs:833)
      leaf: 0x9000318 150995736 (28: row:377.377 avs:833)
      leaf: 0x9000160 150995296 (29: row:377.377 avs:833)
      leaf: 0x9001278 150999672 (30: row:377.377 avs:833)
      leaf: 0x90008f3 150997235 (31: row:375.375 avs:871)
      leaf: 0x900101c 150999068 (32: row:47.47 avs:7103)
      leaf: 0x9000921 150997281 (33: row:52.52 avs:7008)
      leaf: 0x9001123 150999331 (34: row:377.377 avs:833)
      leaf: 0x9000138 150995256 (35: row:377.377 avs:833)
      leaf: 0x9000345 150995781 (36: row:377.377 avs:833)
      leaf: 0x90001a4 150995364 (37: row:377.377 avs:833)
      leaf: 0x9000304 150995716 (38: row:377.377 avs:833)
      leaf: 0x900125a 150999642 (39: row:377.377 avs:833)
      leaf: 0x90014ff 151000319 (40: row:377.377 avs:833)
      leaf: 0x9000fa5 150998949 (41: row:377.377 avs:833)
      leaf: 0x9000f9a 150998938 (42: row:377.377 avs:833)
      leaf: 0x900091e 150997278 (43: row:377.377 avs:833)
      leaf: 0x9001153 150999379 (44: row:377.377 avs:833)
      leaf: 0x90004c1 150996161 (45: row:377.377 avs:833)
      leaf: 0x9000fc2 150998978 (46: row:335.335 avs:1631)
      leaf: 0x9000220 150995488 (47: row:69.69 avs:6685)
      leaf: 0x9000fd8 150999000 (48: row:377.377 avs:833)
      leaf: 0x9000168 150995304 (49: row:377.377 avs:833)
      leaf: 0x9001105 150999301 (50: row:377.377 avs:833)
      leaf: 0x9001368 150999912 (51: row:377.377 avs:833)
      leaf: 0x9000358 150995800 (52: row:377.377 avs:833)
      leaf: 0x9001652 151000658 (53: row:82.82 avs:6438)
      leaf: 0x9000125 150995237 (54: row:45.45 avs:7141)
      leaf: 0x90014a9 151000233 (55: row:377.377 avs:833)
      leaf: 0x9000bfa 150998010 (56: row:377.377 avs:833)
      leaf: 0x9000bd7 150997975 (57: row:377.377 avs:833)
      leaf: 0x9000ad6 150997718 (58: row:377.377 avs:833)
      leaf: 0x9000884 150997124 (59: row:377.377 avs:833)
      leaf: 0x9000fcc 150998988 (60: row:377.377 avs:833)
      leaf: 0x90011da 150999514 (61: row:377.377 avs:833)
      leaf: 0x90008eb 150997227 (62: row:377.377 avs:833)
      leaf: 0x9001391 150999953 (63: row:377.377 avs:833)
      leaf: 0x9001104 150999300 (64: row:377.377 avs:833)
      leaf: 0x900043a 150996026 (65: row:377.377 avs:833)
      leaf: 0x9000ebf 150998719 (66: row:215.215 avs:3911)
      leaf: 0x90007cf 150996943 (67: row:68.68 avs:6704)
      leaf: 0x9000e83 150998659 (68: row:377.377 avs:833)
      leaf: 0x90007af 150996911 (69: row:377.377 avs:833)
      leaf: 0x900127a 150999674 (70: row:377.377 avs:833)
      leaf: 0x900166d 151000685 (71: row:377.377 avs:833)
      leaf: 0x90014c9 151000265 (72: row:377.377 avs:833)
      leaf: 0x9000bb4 150997940 (73: row:377.377 avs:833)
      leaf: 0x9000616 150996502 (74: row:377.377 avs:833)
      leaf: 0x90004ff 150996223 (75: row:377.377 avs:833)
      leaf: 0x9001337 150999863 (76: row:283.283 avs:2619)
   branch: 0x900043c 150996028 (3: nrow: 85, level: 1)
      leaf: 0x9000171 150995313 (-1: row:377.377 avs:833)
      leaf: 0x900120f 150999567 (0: row:351.351 avs:1327)
      leaf: 0x9000424 150996004 (1: row:47.47 avs:7103)
      leaf: 0x9000ebc 150998716 (2: row:52.52 avs:7008)
      leaf: 0x9000791 150996881 (3: row:326.326 avs:1802)
      leaf: 0x9000439 150996025 (4: row:71.71 avs:6647)
      leaf: 0x9000e0c 150998540 (5: row:55.55 avs:6951)
      leaf: 0x90007c8 150996936 (6: row:377.377 avs:833)
      leaf: 0x90008b1 150997169 (7: row:377.377 avs:833)
      leaf: 0x9000445 150996037 (8: row:377.377 avs:833)
      leaf: 0x9000777 150996855 (9: row:377.377 avs:833)
      leaf: 0x90001d7 150995415 (10: row:377.377 avs:833)
      leaf: 0x90013ee 151000046 (11: row:377.377 avs:833)
      leaf: 0x900134f 150999887 (12: row:377.377 avs:833)
      leaf: 0x9000e68 150998632 (13: row:377.377 avs:833)
      leaf: 0x9000f91 150998929 (14: row:377.377 avs:833)
      leaf: 0x9001112 150999314 (15: row:317.317 avs:1973)
      leaf: 0x900023f 150995519 (16: row:62.62 avs:6818)
      leaf: 0x9000e85 150998661 (17: row:203.203 avs:4139)
      leaf: 0x90004f3 150996211 (18: row:61.61 avs:6837)
      leaf: 0x9000ee9 150998761 (19: row:377.377 avs:833)
      leaf: 0x900041a 150995994 (20: row:377.377 avs:833)
      leaf: 0x9000724 150996772 (21: row:124.124 avs:5640)
      leaf: 0x9000418 150995992 (22: row:59.59 avs:6875)
      leaf: 0x9000ebe 150998718 (23: row:377.377 avs:833)
      leaf: 0x9000d5f 150998367 (24: row:98.98 avs:6134)
      leaf: 0x9000460 150996064 (25: row:75.75 avs:6571)
      leaf: 0x9000c54 150998100 (26: row:377.377 avs:833)
      leaf: 0x9000f94 150998932 (27: row:377.377 avs:833)
      leaf: 0x9000d36 150998326 (28: row:377.377 avs:833)
      leaf: 0x9000e84 150998660 (29: row:377.377 avs:833)
      leaf: 0x9000ee6 150998758 (30: row:377.377 avs:833)
      leaf: 0x900042e 150996014 (31: row:377.377 avs:833)
      leaf: 0x900073a 150996794 (32: row:377.377 avs:833)
      leaf: 0x90011bc 150999484 (33: row:377.377 avs:833)
      leaf: 0x900020d 150995469 (34: row:377.377 avs:833)
      leaf: 0x9001356 150999894 (35: row:377.377 avs:833)
      leaf: 0x9000e8e 150998670 (36: row:377.377 avs:833)
      leaf: 0x900059c 150996380 (37: row:377.377 avs:833)
      leaf: 0x90004de 150996190 (38: row:377.377 avs:833)
      leaf: 0x9000fed 150999021 (39: row:377.377 avs:833)
      leaf: 0x9000ff7 150999031 (40: row:377.377 avs:833)
      leaf: 0x9000237 150995511 (41: row:377.377 avs:833)
      leaf: 0x9000f8b 150998923 (42: row:318.318 avs:1954)
      leaf: 0x9000494 150996116 (43: row:48.48 avs:7084)
      leaf: 0x90011bf 150999487 (44: row:377.377 avs:833)
      leaf: 0x9001249 150999625 (45: row:128.128 avs:5564)
      leaf: 0x900104b 150999115 (46: row:60.60 avs:6856)
      leaf: 0x90001e3 150995427 (47: row:207.207 avs:4063)
      leaf: 0x9000c18 150998040 (48: row:63.63 avs:6799)
      leaf: 0x900043b 150996027 (49: row:377.377 avs:833)
      leaf: 0x90011de 150999518 (50: row:377.377 avs:833)
      leaf: 0x90007f6 150996982 (51: row:377.377 avs:833)
      leaf: 0x900027c 150995580 (52: row:377.377 avs:833)
      leaf: 0x90001db 150995419 (53: row:377.377 avs:833)
      leaf: 0x9000959 150997337 (54: row:346.346 avs:1422)
      leaf: 0x9000417 150995991 (55: row:50.50 avs:7046)
      leaf: 0x9001350 150999888 (56: row:377.377 avs:833)
      leaf: 0x90004b6 150996150 (57: row:377.377 avs:833)
      leaf: 0x900048e 150996110 (58: row:377.377 avs:833)
      leaf: 0x900049f 150996127 (59: row:377.377 avs:833)
      leaf: 0x90004d8 150996184 (60: row:377.377 avs:833)
      leaf: 0x9000d09 150998281 (61: row:377.377 avs:833)
      leaf: 0x9000ee8 150998760 (62: row:377.377 avs:833)
      leaf: 0x900116c 150999404 (63: row:377.377 avs:833)
      leaf: 0x9000ef0 150998768 (64: row:377.377 avs:833)
      leaf: 0x9000e6c 150998636 (65: row:377.377 avs:833)
      leaf: 0x9000e1b 150998555 (66: row:377.377 avs:833)
      leaf: 0x900079f 150996895 (67: row:377.377 avs:833)
      leaf: 0x900042d 150996013 (68: row:377.377 avs:833)
      leaf: 0x90007b7 150996919 (69: row:377.377 avs:833)
      leaf: 0x900103b 150999099 (70: row:377.377 avs:833)
      leaf: 0x900017e 150995326 (71: row:377.377 avs:833)
      leaf: 0x9000978 150997368 (72: row:377.377 avs:833)
      leaf: 0x9001029 150999081 (73: row:302.302 avs:2258)
      leaf: 0x90007f1 150996977 (74: row:68.68 avs:6704)
      leaf: 0x9000e10 150998544 (75: row:377.377 avs:833)
      leaf: 0x90007d7 150996951 (76: row:377.377 avs:833)
      leaf: 0x9000117 150995223 (77: row:377.377 avs:833)
      leaf: 0x900072f 150996783 (78: row:377.377 avs:833)
      leaf: 0x9000415 150995989 (79: row:245.245 avs:3341)
      leaf: 0x90001f6 150995446 (80: row:51.51 avs:7027)
      leaf: 0x9000e25 150998565 (81: row:377.377 avs:833)
      leaf: 0x900026b 150995563 (82: row:377.377 avs:833)
      leaf: 0x900072b 150996779 (83: row:182.182 avs:4538)
   branch: 0x9000e18 150998552 (4: nrow: 90, level: 1)
      leaf: 0x900017b 150995323 (-1: row:361.361 avs:1137)
      leaf: 0x90007fc 150996988 (0: row:57.57 avs:6913)
      leaf: 0x9000fb2 150998962 (1: row:67.67 avs:6723)
      leaf: 0x9000272 150995570 (2: row:377.377 avs:833)
      leaf: 0x900075a 150996826 (3: row:135.135 avs:5431)
      leaf: 0x90001d2 150995410 (4: row:87.87 avs:6343)
      leaf: 0x9000c6c 150998124 (5: row:193.193 avs:4329)
      leaf: 0x9000327 150995751 (6: row:93.93 avs:6229)
      leaf: 0x9000bb5 150997941 (7: row:377.377 avs:833)
      leaf: 0x900063e 150996542 (8: row:226.226 avs:3702)
      leaf: 0x9000a93 150997651 (9: row:44.44 avs:7160)
      leaf: 0x90014be 151000254 (10: row:377.377 avs:833)
      leaf: 0x9000e2e 150998574 (11: row:377.377 avs:833)
      leaf: 0x9001074 150999156 (12: row:224.224 avs:3740)
      leaf: 0x9000eb6 150998710 (13: row:51.51 avs:7027)
      leaf: 0x9000788 150996872 (14: row:377.377 avs:833)
      leaf: 0x9000428 150996008 (15: row:118.118 avs:5754)
      leaf: 0x9000e28 150998568 (16: row:64.64 avs:6780)
      leaf: 0x90007da 150996954 (17: row:143.143 avs:5279)
      leaf: 0x900025f 150995551 (18: row:54.54 avs:6970)
      leaf: 0x9000d03 150998275 (19: row:354.354 avs:1270)
      leaf: 0x9000e2f 150998575 (20: row:64.64 avs:6780)
      leaf: 0x900017d 150995325 (21: row:64.64 avs:6780)
      leaf: 0x9000d45 150998341 (22: row:377.377 avs:833)
      leaf: 0x9000e31 150998577 (23: row:239.239 avs:3455)
      leaf: 0x9000e05 150998533 (24: row:63.63 avs:6799)
      leaf: 0x9000458 150996056 (25: row:377.377 avs:833)
      leaf: 0x90007c5 150996933 (26: row:377.377 avs:833)
      leaf: 0x900025a 150995546 (27: row:377.377 avs:833)
      leaf: 0x9000c29 150998057 (28: row:377.377 avs:833)
      leaf: 0x9001149 150999369 (29: row:377.377 avs:833)
      leaf: 0x9000732 150996786 (30: row:377.377 avs:833)
      leaf: 0x9000e8d 150998669 (31: row:377.377 avs:833)
      leaf: 0x9000264 150995556 (32: row:377.377 avs:833)
      leaf: 0x90007ec 150996972 (33: row:377.377 avs:833)
      leaf: 0x9001042 150999106 (34: row:377.377 avs:833)
      leaf: 0x9001131 150999345 (35: row:377.377 avs:833)
      leaf: 0x9001125 150999333 (36: row:377.377 avs:833)
      leaf: 0x9000ff9 150999033 (37: row:120.120 avs:5716)
      leaf: 0x9000784 150996868 (38: row:57.57 avs:6913)
      leaf: 0x9000ec6 150998726 (39: row:377.377 avs:833)
      leaf: 0x9000924 150997284 (40: row:377.377 avs:833)
      leaf: 0x900095f 150997343 (41: row:377.377 avs:833)
      leaf: 0x90008cc 150997196 (42: row:377.377 avs:833)
      leaf: 0x90008b3 150997171 (43: row:377.377 avs:833)
      leaf: 0x9000172 150995314 (44: row:377.377 avs:833)
      leaf: 0x90011e5 150999525 (45: row:377.377 avs:833)
      leaf: 0x9000ace 150997710 (46: row:377.377 avs:833)
      leaf: 0x90011af 150999471 (47: row:377.377 avs:833)
      leaf: 0x90005ed 150996461 (48: row:377.377 avs:833)
      leaf: 0x90004e4 150996196 (49: row:377.377 avs:833)
      leaf: 0x9000795 150996885 (50: row:377.377 avs:833)
      leaf: 0x900136f 150999919 (51: row:377.377 avs:833)
      leaf: 0x9001338 150999864 (52: row:377.377 avs:833)
      leaf: 0x9000fce 150998990 (53: row:377.377 avs:833)
      leaf: 0x900127e 150999678 (54: row:377.377 avs:833)
      leaf: 0x90013bc 150999996 (55: row:377.377 avs:833)
      leaf: 0x9001053 150999123 (56: row:130.130 avs:5526)
      leaf: 0x900121d 150999581 (57: row:51.51 avs:7027)
      leaf: 0x90004a6 150996134 (58: row:377.377 avs:833)
      leaf: 0x90004d4 150996180 (59: row:377.377 avs:833)
      leaf: 0x90011be 150999486 (60: row:377.377 avs:833)
      leaf: 0x9001160 150999392 (61: row:377.377 avs:833)
      leaf: 0x900072c 150996780 (62: row:377.377 avs:833)
      leaf: 0x9000d6a 150998378 (63: row:377.377 avs:833)
      leaf: 0x900075e 150996830 (64: row:377.377 avs:833)
      leaf: 0x900047d 150996093 (65: row:377.377 avs:833)
      leaf: 0x9000d5c 150998364 (66: row:377.377 avs:833)
      leaf: 0x900074c 150996812 (67: row:377.377 avs:833)
      leaf: 0x9000449 150996041 (68: row:377.377 avs:833)
      leaf: 0x9000e90 150998672 (69: row:377.377 avs:833)
      leaf: 0x900027b 150995579 (70: row:377.377 avs:833)
      leaf: 0x9001012 150999058 (71: row:377.377 avs:833)
      leaf: 0x9000ec7 150998727 (72: row:377.377 avs:833)
      leaf: 0x900104e 150999118 (73: row:377.377 avs:833)
      leaf: 0x9000709 150996745 (74: row:377.377 avs:833)
      leaf: 0x900027a 150995578 (75: row:377.377 avs:833)
      leaf: 0x90007bb 150996923 (76: row:81.81 avs:6457)
      leaf: 0x9000f99 150998937 (77: row:47.47 avs:7103)
      leaf: 0x90001ea 150995434 (78: row:377.377 avs:833)
      leaf: 0x900072a 150996778 (79: row:377.377 avs:833)
      leaf: 0x9000429 150996009 (80: row:284.284 avs:2600)
      leaf: 0x9000e96 150998678 (81: row:52.52 avs:7008)
      leaf: 0x900071a 150996762 (82: row:377.377 avs:833)
      leaf: 0x9000273 150995571 (83: row:377.377 avs:833)
      leaf: 0x9000749 150996809 (84: row:148.148 avs:5184)
      leaf: 0x90001da 150995418 (85: row:65.65 avs:6761)
      leaf: 0x9000ef5 150998773 (86: row:377.377 avs:833)
      leaf: 0x9000257 150995543 (87: row:377.377 avs:833)
      leaf: 0x90007e6 150996966 (88: row:133.133 avs:5469)
   branch: 0x900073d 150996797 (5: nrow: 88, level: 1)
      leaf: 0x9000143 150995267 (-1: row:377.377 avs:833)
      leaf: 0x9000ef7 150998775 (0: row:377.377 avs:833)
      leaf: 0x9000a2c 150997548 (1: row:377.377 avs:833)
      leaf: 0x9000356 150995798 (2: row:377.377 avs:833)
      leaf: 0x9000108 150995208 (3: row:377.377 avs:833)
      leaf: 0x9001035 150999093 (4: row:377.377 avs:833)
      leaf: 0x9001033 150999091 (5: row:199.199 avs:4215)
      leaf: 0x90001c9 150995401 (6: row:51.51 avs:7027)
      leaf: 0x900155a 151000410 (7: row:377.377 avs:833)
      leaf: 0x90008bc 150997180 (8: row:322.322 avs:1878)
      leaf: 0x9000174 150995316 (9: row:68.68 avs:6704)
      leaf: 0x900101d 150999069 (10: row:377.377 avs:833)
      leaf: 0x9001389 150999945 (11: row:377.377 avs:833)
      leaf: 0x90011b0 150999472 (12: row:377.377 avs:833)
      leaf: 0x90013b5 150999989 (13: row:377.377 avs:833)
      leaf: 0x9000139 150995257 (14: row:377.377 avs:833)
      leaf: 0x9000a9c 150997660 (15: row:377.377 avs:833)
      leaf: 0x9000652 150996562 (16: row:377.377 avs:833)
      leaf: 0x9000bdc 150997980 (17: row:377.377 avs:833)
      leaf: 0x90011a6 150999462 (18: row:377.377 avs:833)
      leaf: 0x9001256 150999638 (19: row:377.377 avs:833)
      leaf: 0x9001535 151000373 (20: row:377.377 avs:833)
      leaf: 0x9001358 150999896 (21: row:377.377 avs:833)
      leaf: 0x9000590 150996368 (22: row:329.329 avs:1745)
      leaf: 0x900154d 151000397 (23: row:48.48 avs:7084)
      leaf: 0x90005b9 150996409 (24: row:377.377 avs:833)
      leaf: 0x9001355 150999893 (25: row:377.377 avs:833)
      leaf: 0x900105d 150999133 (26: row:377.377 avs:833)
      leaf: 0x90005c3 150996419 (27: row:377.377 avs:833)
      leaf: 0x9000585 150996357 (28: row:377.377 avs:833)
      leaf: 0x9000485 150996101 (29: row:369.369 avs:985)
      leaf: 0x900091a 150997274 (30: row:48.48 avs:7084)
      leaf: 0x9001113 150999315 (31: row:377.377 avs:833)
      leaf: 0x90005c6 150996422 (32: row:377.377 avs:833)
      leaf: 0x9000486 150996102 (33: row:377.377 avs:833)
      leaf: 0x9000ab0 150997680 (34: row:377.377 avs:833)
      leaf: 0x9000abc 150997692 (35: row:377.377 avs:833)
      leaf: 0x90013ec 151000044 (36: row:338.338 avs:1574)
      leaf: 0x900058a 150996362 (37: row:72.72 avs:6628)
      leaf: 0x9000bca 150997962 (38: row:54.54 avs:6970)
      leaf: 0x90014d0 151000272 (39: row:377.377 avs:833)
      leaf: 0x900123f 150999615 (40: row:60.60 avs:6856)
      leaf: 0x9000377 150995831 (41: row:44.44 avs:7160)
      leaf: 0x9001307 150999815 (42: row:377.377 avs:833)
      leaf: 0x9001062 150999138 (43: row:377.377 avs:833)
      leaf: 0x900034c 150995788 (44: row:377.377 avs:833)
      leaf: 0x900019d 150995357 (45: row:377.377 avs:833)
      leaf: 0x9001392 150999954 (46: row:377.377 avs:833)
      leaf: 0x9000c6a 150998122 (47: row:377.377 avs:833)
      leaf: 0x9000a03 150997507 (48: row:377.377 avs:833)
      leaf: 0x9000ac3 150997699 (49: row:377.377 avs:833)
      leaf: 0x9000c3f 150998079 (50: row:142.142 avs:5298)
      leaf: 0x9001566 151000422 (51: row:61.61 avs:6837)
      leaf: 0x9000a82 150997634 (52: row:377.377 avs:833)
      leaf: 0x9000355 150995797 (53: row:377.377 avs:833)
      leaf: 0x900130c 150999820 (54: row:64.64 avs:6780)
      leaf: 0x9000186 150995334 (55: row:58.58 avs:6894)
      leaf: 0x9000c76 150998134 (56: row:168.168 avs:4804)
      leaf: 0x9000e56 150998614 (57: row:48.48 avs:7084)
      leaf: 0x9000471 150996081 (58: row:377.377 avs:833)
      leaf: 0x9000a4e 150997582 (59: row:321.321 avs:1897)
      leaf: 0x900127c 150999676 (60: row:62.62 avs:6818)
      leaf: 0x9000a19 150997529 (61: row:40.40 avs:7236)
      leaf: 0x90014d8 151000280 (62: row:377.377 avs:833)
      leaf: 0x90005ce 150996430 (63: row:377.377 avs:833)
      leaf: 0x9000582 150996354 (64: row:377.377 avs:833)
      leaf: 0x900119f 150999455 (65: row:377.377 avs:833)
      leaf: 0x900132d 150999853 (66: row:377.377 avs:833)
      leaf: 0x9000a77 150997623 (67: row:183.183 avs:4519)
      leaf: 0x9000ac2 150997698 (68: row:34.34 avs:7350)
      leaf: 0x900137f 150999935 (69: row:331.331 avs:1707)
      leaf: 0x9000fbb 150998971 (70: row:51.51 avs:7027)
      leaf: 0x90008f6 150997238 (71: row:61.61 avs:6837)
      leaf: 0x9001026 150999078 (72: row:377.377 avs:833)
      leaf: 0x9001230 150999600 (73: row:377.377 avs:833)
      leaf: 0x9001224 150999588 (74: row:377.377 avs:833)
      leaf: 0x900103f 150999103 (75: row:377.377 avs:833)
      leaf: 0x90011c1 150999489 (76: row:377.377 avs:833)
      leaf: 0x90005f3 150996467 (77: row:377.377 avs:833)
      leaf: 0x90005fc 150996476 (78: row:377.377 avs:833)
      leaf: 0x90004ac 150996140 (79: row:377.377 avs:833)
      leaf: 0x9000bb0 150997936 (80: row:377.377 avs:833)
      leaf: 0x9000972 150997362 (81: row:377.377 avs:833)
      leaf: 0x9000adf 150997727 (82: row:377.377 avs:833)
      leaf: 0x9000ab3 150997683 (83: row:377.377 avs:833)
      leaf: 0x9000bd6 150997974 (84: row:377.377 avs:833)
      leaf: 0x9000650 150996560 (85: row:377.377 avs:833)
      leaf: 0x9000607 150996487 (86: row:155.155 avs:5051)
   branch: 0x9000c68 150998120 (6: nrow: 101, level: 1)
      leaf: 0x90001b2 150995378 (-1: row:362.362 avs:1118)
      leaf: 0x9000329 150995753 (0: row:77.77 avs:6533)
      leaf: 0x9000bd5 150997973 (1: row:377.377 avs:833)
      leaf: 0x900066d 150996589 (2: row:377.377 avs:833)
      leaf: 0x90005c5 150996421 (3: row:377.377 avs:833)
      leaf: 0x9000497 150996119 (4: row:163.163 avs:4899)
      leaf: 0x9001188 150999432 (5: row:54.54 avs:6970)
      leaf: 0x9000350 150995792 (6: row:377.377 avs:833)
      leaf: 0x900011a 150995226 (7: row:377.377 avs:833)
      leaf: 0x900153c 151000380 (8: row:377.377 avs:833)
      leaf: 0x900062e 150996526 (9: row:377.377 avs:833)
      leaf: 0x900064f 150996559 (10: row:377.377 avs:833)
      leaf: 0x90005e7 150996455 (11: row:377.377 avs:833)
      leaf: 0x9000482 150996098 (12: row:377.377 avs:833)
      leaf: 0x900066f 150996591 (13: row:377.377 avs:833)
      leaf: 0x9000bf1 150998001 (14: row:377.377 avs:833)
      leaf: 0x900131e 150999838 (15: row:377.377 avs:833)
      leaf: 0x90004a3 150996131 (16: row:377.377 avs:833)
      leaf: 0x90004a8 150996136 (17: row:377.377 avs:833)
      leaf: 0x9000a5b 150997595 (18: row:377.377 avs:833)
      leaf: 0x90008ca 150997194 (19: row:377.377 avs:833)
      leaf: 0x9000224 150995492 (20: row:377.377 avs:833)
      leaf: 0x90007bf 150996927 (21: row:281.281 avs:2657)
      leaf: 0x900124e 150999630 (22: row:47.47 avs:7103)
      leaf: 0x90005af 150996399 (23: row:181.181 avs:4557)
      leaf: 0x9001388 150999944 (24: row:54.54 avs:6970)
      leaf: 0x9000a36 150997558 (25: row:377.377 avs:833)
      leaf: 0x9000e65 150998629 (26: row:377.377 avs:833)
      leaf: 0x9000e67 150998631 (27: row:377.377 avs:833)
      leaf: 0x90011a5 150999461 (28: row:377.377 avs:833)
      leaf: 0x9001005 150999045 (29: row:377.377 avs:833)
      leaf: 0x9000455 150996053 (30: row:377.377 avs:833)
      leaf: 0x9001376 150999926 (31: row:377.377 avs:833)
      leaf: 0x90008a6 150997158 (32: row:377.377 avs:833)
      leaf: 0x900095b 150997339 (33: row:377.377 avs:833)
      leaf: 0x9001060 150999136 (34: row:377.377 avs:833)
      leaf: 0x9001138 150999352 (35: row:377.377 avs:833)
      leaf: 0x90007dc 150996956 (36: row:292.292 avs:2448)
      leaf: 0x9000eff 150998783 (37: row:69.69 avs:6685)
      leaf: 0x900040b 150995979 (38: row:377.377 avs:833)
      leaf: 0x90005f1 150996465 (39: row:377.377 avs:833)
      leaf: 0x9001127 150999335 (40: row:320.320 avs:1916)
      leaf: 0x9000af2 150997746 (41: row:74.74 avs:6590)
      leaf: 0x9000588 150996360 (42: row:35.35 avs:7331)
      leaf: 0x90014a4 151000228 (43: row:377.377 avs:833)
      leaf: 0x9000ae8 150997736 (44: row:377.377 avs:833)
      leaf: 0x9001164 150999396 (45: row:377.377 avs:833)
      leaf: 0x9001155 150999381 (46: row:377.377 avs:833)
      leaf: 0x90004ad 150996141 (47: row:179.179 avs:4595)
      leaf: 0x900035e 150995806 (48: row:37.37 avs:7293)
      leaf: 0x900116f 150999407 (49: row:139.139 avs:5355)
      leaf: 0x90005a5 150996389 (50: row:45.45 avs:7141)
      leaf: 0x90014db 151000283 (51: row:377.377 avs:833)
      leaf: 0x9001176 150999414 (52: row:377.377 avs:833)
      leaf: 0x90004da 150996186 (53: row:377.377 avs:833)
      leaf: 0x90004e6 150996198 (54: row:377.377 avs:833)
      leaf: 0x9000ee7 150998759 (55: row:377.377 avs:833)
      leaf: 0x9000e5c 150998620 (56: row:377.377 avs:833)
      leaf: 0x900021a 150995482 (57: row:377.377 avs:833)
      leaf: 0x90008a9 150997161 (58: row:340.340 avs:1536)
      leaf: 0x90004b1 150996145 (59: row:64.64 avs:6780)
      leaf: 0x90011f8 150999544 (60: row:111.111 avs:5887)
      leaf: 0x900157e 151000446 (61: row:42.42 avs:7198)
      leaf: 0x900021f 150995487 (62: row:377.377 avs:833)
      leaf: 0x9001270 150999664 (63: row:230.230 avs:3626)
      leaf: 0x900117e 150999422 (64: row:58.58 avs:6894)
      leaf: 0x900048a 150996106 (65: row:377.377 avs:833)
      leaf: 0x90004be 150996158 (66: row:377.377 avs:833)
      leaf: 0x90005f8 150996472 (67: row:291.291 avs:2467)
      leaf: 0x9000aaf 150997679 (68: row:72.72 avs:6628)
      leaf: 0x9000595 150996373 (69: row:377.377 avs:833)
      leaf: 0x90014ab 151000235 (70: row:163.163 avs:4899)
      leaf: 0x9000344 150995780 (71: row:54.54 avs:6970)
      leaf: 0x90013ef 151000047 (72: row:377.377 avs:833)
      leaf: 0x9000971 150997361 (73: row:377.377 avs:833)
      leaf: 0x9000922 150997282 (74: row:377.377 avs:833)
      leaf: 0x900090f 150997263 (75: row:377.377 avs:833)
      leaf: 0x9000b82 150997890 (76: row:134.134 avs:5450)
      leaf: 0x900062b 150996523 (77: row:78.78 avs:6514)
      leaf: 0x9000ba5 150997925 (78: row:377.377 avs:833)
      leaf: 0x9001488 151000200 (79: row:377.377 avs:833)
      leaf: 0x9001212 150999570 (80: row:377.377 avs:833)
      leaf: 0x9000ba0 150997920 (81: row:324.324 avs:1840)
      leaf: 0x9000c41 150998081 (82: row:81.81 avs:6457)
      leaf: 0x9000676 150996598 (83: row:84.84 avs:6400)
      leaf: 0x9000c75 150998133 (84: row:244.244 avs:3360)
      leaf: 0x9000610 150996496 (85: row:82.82 avs:6438)
      leaf: 0x9000c33 150998067 (86: row:377.377 avs:833)
      leaf: 0x9000bd9 150997977 (87: row:198.198 avs:4234)
      leaf: 0x9000c6b 150998123 (88: row:79.79 avs:6495)
      leaf: 0x900014d 150995277 (89: row:294.294 avs:2410)
      leaf: 0x9000475 150996085 (90: row:80.80 avs:6476)
      leaf: 0x9000d2b 150998315 (91: row:161.161 avs:4937)
      leaf: 0x9000c11 150998033 (92: row:74.74 avs:6590)
      leaf: 0x9000275 150995573 (93: row:377.377 avs:833)
      leaf: 0x9000c5c 150998108 (94: row:377.377 avs:833)
      leaf: 0x900064e 150996558 (95: row:377.377 avs:833)
      leaf: 0x9000c57 150998103 (96: row:310.310 avs:2106)
      leaf: 0x9000c42 150998082 (97: row:61.61 avs:6837)
      leaf: 0x9000651 150996561 (98: row:377.377 avs:833)
      leaf: 0x9000bbe 150997950 (99: row:94.94 avs:6210)
----- end tree dump

Shrink space compact

When we switch to “alter index shrink space compact” (which is the version that doesn’t lower the index highwater mark), the first striking difference appears in the dbms_space report:

Unformatted                   :           62 /          507,904
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :        3,045 /       24,944,640
Full                          :          544 /        4,456,448

PL/SQL procedure successfully completed.

Segment Total blocks:        3,712
Object Unused blocks:            0
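
(For reference, a report like the one above comes from a call to dbms_space.space_usage(); here’s a minimal sketch, assuming the index is called t1_i1 and belongs to the current schema:)

set serveroutput on

declare
        m_unf   number;         m_unf_b         number;
        m_fs1   number;         m_fs1_b         number;
        m_fs2   number;         m_fs2_b         number;
        m_fs3   number;         m_fs3_b         number;
        m_fs4   number;         m_fs4_b         number;
        m_full  number;         m_full_b        number;
begin
        dbms_space.space_usage(
                segment_owner      => user,
                segment_name       => 'T1_I1',          -- hypothetical name
                segment_type       => 'INDEX',
                unformatted_blocks => m_unf,  unformatted_bytes => m_unf_b,
                fs1_blocks         => m_fs1,  fs1_bytes         => m_fs1_b,
                fs2_blocks         => m_fs2,  fs2_bytes         => m_fs2_b,
                fs3_blocks         => m_fs3,  fs3_bytes         => m_fs3_b,
                fs4_blocks         => m_fs4,  fs4_bytes         => m_fs4_b,
                full_blocks        => m_full, full_bytes        => m_full_b
        );
        dbms_output.put_line('Unformatted                   : ' || m_unf);
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || m_fs1);
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || m_fs2);
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || m_fs3);
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || m_fs4);
        dbms_output.put_line('Full                          : ' || m_full);
end;
/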

Basically we had 3,000 blocks reported as Freespace 2 after a coalesce, but now we see those blocks reported as Freespace 4. Are they in the index structure, have they been unlinked, and is the undo/redo going to show anything significantly different because of this change?

Here, in a single stream, are the things we need to cross-reference to get a better view of what Oracle has done: some critical redo stats, the undo segment stats, and the report of enqueue requests:

Name                                                                     Value
----                                                                     -----
redo entries                                                            47,521
redo size                                                           85,285,860
redo buffer allocation retries                                              13
undo change vector size                                             59,344,340
rollback changes - undo records applied                                    447


USN   Ex Size K  HWM K  Opt K      Writes     Gets  Waits Shr Grow Shr K  Act K
----  -- ------  -----  -----      ------     ----  ----- --- ---- ----- ------
   0   0      0      0      0           0        1      0   0    0     0      0
   1   0      0      0      0           0       25      0   0    0     0      0
   2   0      0      0      0           0       25      0   0    0     0      0
   3   0      0      0      0           0       25      0   0    0     0      0
   4 -30 -29760      0      0         328       75      0   3    0    55      0
   5   0      0      0      0           0       25      0   0    0     0      0
   6   0      0      0      0           0       25      0   0    0     0      0
   7   0      0      0      0           0       25      0   0    0     0      0
   8 -41 -37952      0      0         410       96      0   5    0    65      0
   9 104  85056      0      0    58635908    14524      0   0  104     0 148492
  10   0      0      0      0           0       25      0   0    0     0      0


Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
CF             2           0           2           0           0 contention
CR           710          17         710           0           2 block range reuse ckpt
IS            71           0          71           0           0 contention
XR             1           0           1           0           0 database force logging
TM             1           0           1           0           0 contention
TX         3,567           0       3,567           0           0 contention
US           116           0         116           0           0 contention
HW           348           0         348           0           0 contention
SK             1           0           1           0           0 contention
TT           232           0         232           0           0 contention
SJ             2           0           2           0           0 Slave Task Cancel
CU             1           0           1           0           0 contention
JG           357           0         357           0           0 queue lock
JG            34           0          34           0           0 q mem clnup lck
JG           357           0         357           0           0 contention

The redo requirement has increased from 34,800 entries and 76MB to 47,500 entries and 85MB, aided by an increase of 4MB in the undo. Cross-checking to the undo segment stats (v$rollstat) we see an alarming difference – the volume of writes agrees with the session stats, but almost all of it takes place on one undo segment; that could be really nasty if it means the whole shrink is performed as a single transaction!
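
(The undo segment figures come from v$rollstat; the report above is essentially the before/after delta of a query like this sketch:)

rem     Sketch of the query behind the undo segment report

select
        usn, extents, rssize, hwmsize, optsize,
        writes, gets, waits, shrinks, extends,
        aveshrink, aveactive
from
        v$rollstat
order by
        usn
;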

Luckily we can see in the enqueue stats that we still have a large number of transaction (TX) enqueues, though the number has gone up from about 3,000 to 3,500. (That’s an interesting difference given the “shrunk” index consists of about 500 leaf blocks – and while we’re thinking about that, it might be interesting that 4MB of undo seems to be approximately 500 (leaf?) blocks * 8KB!)

Let’s take a look at the before and after versions of the treedump. Because I was using a clean tablespace to re-run the tests, and because I managed to keep restarting on the same v$process.pid for many of the tests, the order in which I used data blocks was unchanged from test to test, so the index for the “compact” test started out exactly the same as it did for the “coalesce” test:

Before
branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000438 150996024 (-1: nrow: 401, level: 1)
      leaf: 0x900016d 150995309 (-1: row:222.47 avs:3778)
      leaf: 0x900154e 151000398 (0: row:218.52 avs:3854)
      leaf: 0x9000abd 150997693 (1: row:219.44 avs:3835)
      leaf: 0x900153e 151000382 (2: row:209.43 avs:4025)
      leaf: 0x900058d 150996365 (3: row:230.44 avs:3626)
      leaf: 0x90013a8 150999976 (4: row:229.45 avs:3645)
      leaf: 0x9000ae1 150997729 (5: row:411.88 avs:187)
      leaf: 0x900031c 150995740 (6: row:227.50 avs:3683)
      leaf: 0x90014d3 151000275 (7: row:229.42 avs:3645)
      leaf: 0x9000aec 150997740 (8: row:226.46 avs:3702)
      leaf: 0x90014f3 151000307 (9: row:226.57 avs:3702)
      leaf: 0x9000593 150996371 (10: row:219.46 avs:3835)
      leaf: 0x9001559 151000409 (11: row:223.54 avs:3759)
      leaf: 0x9000a9d 150997661 (12: row:210.33 avs:4006)
      leaf: 0x900152e 151000366 (13: row:215.30 avs:3911)
      leaf: 0x900018a 150995338 (14: row:258.52 avs:3094)

After
branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x9000427 150996007 (-1: nrow: 64, level: 1)
      leaf: 0x900016d 150995309 (-1: row:377.377 avs:833)
      leaf: 0x900011b 150995227 (0: row:377.377 avs:833)
      leaf: 0x900016e 150995310 (1: row:377.377 avs:833)
      leaf: 0x9000370 150995824 (2: row:377.377 avs:833)
      leaf: 0x900011e 150995230 (3: row:377.377 avs:833)
      leaf: 0x9000309 150995721 (4: row:377.377 avs:833)
      leaf: 0x9000239 150995513 (5: row:377.377 avs:833)
      leaf: 0x90001eb 150995435 (6: row:377.377 avs:833)
      leaf: 0x90001d3 150995411 (7: row:377.377 avs:833)
      leaf: 0x9000197 150995351 (8: row:377.377 avs:833)
      leaf: 0x900031f 150995743 (9: row:377.377 avs:833)
      leaf: 0x9000369 150995817 (10: row:377.377 avs:833)
      leaf: 0x9000216 150995478 (11: row:377.377 avs:833)
      leaf: 0x9000332 150995762 (12: row:377.377 avs:833)
      leaf: 0x900023b 150995515 (13: row:377.377 avs:833)
      leaf: 0x900033a 150995770 (14: row:377.377 avs:833)

A particularly interesting detail of the “after compact” treedump appears in the first level 1 branch block: it has changed its address (even though its first leaf block hasn’t).

A less obvious detail in the “after compact” extract is that none of the leaf block addresses is particularly large. Before compacting, some of the block addresses ended with 4 non-zero digits; after compacting, the highest block address we can see is 0x9000370, with only 3 significant trailing digits. And if we sort all the leaf blocks by block address we see the following:

      leaf: 0x9000105 150995205 (59: row:377.377 avs:833)
      leaf: 0x9000106 150995206 (37: row:377.377 avs:833)
      leaf: 0x9000107 150995207 (5: row:377.377 avs:833)
      leaf: 0x9000108 150995208 (3: row:377.377 avs:833)
      leaf: 0x9000109 150995209 (29: row:377.377 avs:833)
      leaf: 0x900010a 150995210 (52: row:377.377 avs:833)
      leaf: 0x900010b 150995211 (35: row:377.377 avs:833)
      leaf: 0x900010c 150995212 (53: row:377.377 avs:833)
      leaf: 0x900010d 150995213 (28: row:377.377 avs:833)
      leaf: 0x900010e 150995214 (60: row:377.377 avs:833)
      leaf: 0x900010f 150995215 (10: row:377.377 avs:833)
...
      leaf: 0x9000428 150996008 (8: row:377.377 avs:833)
      leaf: 0x9000429 150996009 (20: row:377.377 avs:833)
      leaf: 0x900042a 150996010 (19: row:377.377 avs:833)
      leaf: 0x900042b 150996011 (8: row:377.377 avs:833)
      leaf: 0x900042c 150996012 (55: row:377.377 avs:833)

With a few small gaps for the space management blocks and a couple of big jumps where table extents have been allocated between index extents, we can see a completely contiguous array of blocks. (And this “shuffling” of blocks explains most of the extra undo, redo and transaction count.)

There is the question, of course, of whether Oracle does all the back-filling before rearranging the blocks, or whether it relocates a block as soon as it has filled it. There’s a fairly big hint in the quote from the manuals that said: “Concurrent DML operations are blocked for a short time at the end of the shrink operation when the space is deallocated” but we can answer the question fairly easily by looking at the redo log that we’ve dumped.

If we use grep to pick out just the index-related OP Codes (layer 10) and the 5.4 (commit) OP codes and look at the last few lines of the result we can spot a repeating pattern like the following (which I’ve edited to shorten the lines):

DBA:0x09000c24 OBJ:143721 SCN:0x0000000002542c13 SEQ:1 OP:10.6 
DBA:0x09001674 OBJ:143721 SCN:0x0000000002542c13 SEQ:1 OP:10.6 
DBA:0x09000105 OBJ:143721 SCN:0x0000000002542c13 SEQ:1 OP:10.8 
DBA:0x0900033c OBJ:143721 SCN:0x0000000002542c13 SEQ:1 OP:10.11 
DBA:0x09000c24 OBJ:143721 SCN:0x0000000002542c13 SEQ:2 OP:10.11 
DBA:0x09000e5b OBJ:143721 SCN:0x0000000002542c13 SEQ:1 OP:10.40 
DBA:0x09001674 OBJ:143721 SCN:0x0000000002542c13 SEQ:2 OP:10.34 
DBA:0x04402720 OBJ:4294967295 SCN:0x0000000002542c13 SEQ:2 OP:5.4 

This, plus a few variations, happens 441 times at the end of the dump. The OP Codes translate to:

  • Lock block 0x09000c24 (10.6)
  • Lock block 0x09001674 (10.6)
  • Initialize leaf block 0x09000105 (10.8) — copying block 0x09001674
  • Set previous pointer on leaf block 0x0900033c (10.11) — to point backwards to 0x09000105
  • Set next pointer on leaf block 0x09000c24 (10.11) — to point forwards to 0x09000105
  • Update branch block 0x09000e5b (10.40) — to point to 0x09000105 instead of 0x09001674
  • Make empty leaf block 0x09001674 (10.34)
  • commit (5.4)

This pattern shows Oracle copying leaf blocks to the start of the segment and wiping the original clean (the small number of variants are for branch blocks). And this happens only after the task of filling all the leaf blocks to the correct level has finished.

As before it’s the initialize (10.8) and the undo (5.1) of the “make empty” (10.34) which generate the largest amount of redo – the 10.8 being effectively an array insert of 377 rows, and the 5.1 being an 8KB block image.

There are other (relatively small) differences, though, between the redo log dump for coalesce and compact. We noted that the 3,000 “empty” blocks were marked as FS2 for the coalesce but FS4 for the compact. When we compare the OP Code 13.22 (state change for Level 1 BMB) we find that the coalesce reported roughly 3,000 of them with the specific action “Mark Block free”; but compact reported 6,900 of them, of which 3,900 reported “Mark Block free” and 3,027 reported “State Change”. That’s an interesting difference when the space management report tells us that there were 3,027 blocks at FS4 when I had been expecting them to be FS2.

A little detail on the side – the sort of thing that goes into a note for further investigation – is that the redo for the compact reported 8 cases of Op Code 13.22 redo change vectors as “Update Express Allocation Area” in the first level 1 bitmap of the segment for Allocate Area slots 0 to 7 respectively, and had one 13.28 Op Code (Update segment header block) with the same description, supposedly marking the Allocation Area “Full”.

A little extra work with grep (looking for “OP:13.22” or “state:”), then looking at a few lines of the raw trace very closely, then doing a couple more “grep”s, led me to the following summary report:

egrep -n -i  -e "offset:.*state:" -e"state:" *compact*redo*.trc

...
101824:offset: 10 length:1 xidslot:35 state:3
102992:offset: 34 length:1 xidslot:35 state:3
104239:offset: 38 length:1 xidslot:35 state:3
105544:offset: 46 length:1 xidslot:35 state:3
106876:offset: 15 length:1 xidslot:35 state:3
108485:offset: 55 length:1 xidslot:35 state:3
109678:offset: 49 length:1 xidslot:35 state:3
110960:offset: 16 length:1 xidslot:35 state:3
...
3889253:Len: 1 Offset: 63 newstate: 2
3889261:Len: 1 Offset: 62 newstate: 2
3889269:Len: 1 Offset: 61 newstate: 2
3889283:Len: 1 Offset: 60 newstate: 2
3889297:Len: 1 Offset: 59 newstate: 2
3889305:Len: 1 Offset: 58 newstate: 2
3889313:Len: 1 Offset: 57 newstate: 2
3889327:Len: 1 Offset: 56 newstate: 2
3889341:Len: 1 Offset: 55 newstate: 2
3889349:Len: 1 Offset: 54 newstate: 2
3889357:Len: 1 Offset: 53 newstate: 2
3890375:offset: 52 length:1 xidslot:35 state:3
3890387:Len: 1 Offset: 52 newstate: 2
3890400:Len: 1 Offset: 51 newstate: 2
...

The early part of the trace file shows the bitmap changes as index entries are transferred to the (logically) previous index leaf block and a leaf block becomes empty: its “bit” is set to state 3 (which corresponds to FS2). The offset is the location of the “bit” in the space management block and in some cases a range of bits can be set in one call, hence the presence of the length. I haven’t shown you the DBA (block address) of the OP:13.22 which jumps all over the place as Oracle walks the index in key order, but that explains the randomness of the offset – with my 1MB extent definition each bitmap block maps 64 consecutive data blocks from the 128 available to the extent and (logically) consecutive leaf blocks could be in different extents.

In the later part of the trace file – once the rows have been packed into the minimum number of leaf blocks – Oracle starts walking the index “backwards” in physical order, i.e. from the last block of the last used extent. Again you really need some way of viewing the DBAs, but there’s a hint in the way the offset decreases from 63 to zero.

If Oracle finds an empty (state 3) block it changes it to state 2 (which corresponds to FS4). If it finds a leaf block that is not empty (e.g. the line at 3890375 above) it goes through the steps described above to “create full block as near as possible to the start of the segment” / “create empty block here” and flags the emptied block as state 3 – and then changes the state from 3 to 2. (You don’t see a state change for the block created near the start of the segment as it will be overwriting a pre-existing “full” block. The block changes, the bit doesn’t have to.)

The DBAs you can usefully track are the ones for all the OP:10.8 (initialize new block) records. In the earlier part of the trace you’ll see the DBAs in these records jumping about randomly as Oracle walks the index in key order (each one may appear several times in a row as several consecutive leaf blocks may empty themselves backwards into a single leaf block – which is what happened after my 80% deletion). In the later part of the trace file it will probably be fairly easy to see that the “new” blocks are working forwards from the start of the segment. Here’s a little bit of the dump from the later part of my trace file:

3889458:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000105 OBJ:143721 SCN:0x0000000002542c13 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3890921:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000107 OBJ:143721 SCN:0x0000000002542c60 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3892086:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000108 OBJ:143721 SCN:0x0000000002542c73 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3893167:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000109 OBJ:143721 SCN:0x0000000002542c7a SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3894276:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900010a OBJ:143721 SCN:0x0000000002542c85 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3895861:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900010b OBJ:143721 SCN:0x0000000002542cd4 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3896968:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900010c OBJ:143721 SCN:0x0000000002542cdd SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3898133:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900010d OBJ:143721 SCN:0x0000000002542cf0 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3899214:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x0900010e OBJ:143721 SCN:0x0000000002542cf7 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3900295:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000110 OBJ:143721 SCN:0x0000000002542cfe SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3901348:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000111 OBJ:143721 SCN:0x0000000002542d01 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3902555:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000112 OBJ:143721 SCN:0x0000000002542d1a SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
3903608:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000114 OBJ:143721 SCN:0x0000000002542d1d SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000

So we can see that blocks made empty by the block-shuffling of a “shrink space” on an index are flagged as “state 2”, which then gets reported by dbms_space.space_usage() as FS4, i.e. “75 – 100% free”. It seems a little odd that this should be the case, especially when (a) coalesce and deletes use state 3 / FS2 and (b) shrink space compact marks them as state 3 before immediately changing them to state 2. Possibly, though, this is a little trick to avoid the risk of error when Oracle tries to reduce the highwater mark on a “shrink space”, or to avoid repeating work if this phase of the operation is interrupted and has to be restarted.

Note, however, another little performance threat indicated by this processing. Oracle walks the index in key order to collapse the contents of multiple leaf blocks back into a single existing leaf block; then it re-reads the index in reverse order of extent_id (and block_id within extent – where necessary) as it moves blocks from the high end of the segment to the low end of the segment. For a very large index you may end up physically reading most of it twice, one block at a time.

Shrink space compact – concurrency

The manuals tell us that the “compact” option of shrink space allows the command to complete “online” – i.e. with other activity ongoing. How accurate is this description? Let’s just re-run the test but insert a few randomly scattered values from another session without committing before we start the shrink, and see what happens:
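
In outline, the second session did something like the following before the shrink was started. This is only a sketch – the table, the values, and the index name are illustrative, not the ones from my actual test script:

-- Session 2: insert a few scattered values and do NOT commit
insert into t1 values (100,   rpad('x',10));
insert into t1 values (20000, rpad('x',10));
insert into t1 values (45000, rpad('x',10));

-- Session 1: now start the shrink
alter index t1_i1 shrink space compact;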

My test data is tiny and my laptop is fairly high powered, so the session doing the shrinking seemed to hang almost instantly, going into a wait for a TX enqueue, timing out every three seconds. When I committed from the second session the shrink finished almost instantly – so three possibilities:

  • It had waited for all TM locks to drop before starting work
  • It had hit the first leaf block with an active transaction and waited for that transaction to commit before continuing
  • It had skipped over any leaf block with an active transaction and only gone into a TX wait in “phase 2”, when it was trying to move leaf blocks towards the start of the segment.

Fortunately I had done a treedump of the index before committing, with the following results (first few lines only):

branch: 0x9000104 150995204 (0: nrow: 8, level: 2)
   branch: 0x900042b 150996011 (-1: nrow: 81, level: 1)
      leaf: 0x9000167 150995303 (-1: row:377.377 avs:833)
      leaf: 0x9000126 150995238 (0: row:377.377 avs:833)
      leaf: 0x90001b8 150995384 (1: row:377.377 avs:833)
      leaf: 0x9000354 150995796 (2: row:242.242 avs:3398)
      leaf: 0x9000139 150995257 (3: row:45.45 avs:7141)
      leaf: 0x900034d 150995789 (4: row:377.377 avs:833)
      leaf: 0x9000337 150995767 (5: row:377.377 avs:833)
...
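
(For reference: a treedump like this can be generated with the call below. The level is the object_id of the index – 143917 is the value that appears as the OBJ in the redo records further down this note, so substitute the object_id of your own index.)

alter session set events 'immediate trace name treedump level 143917';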

Most of the index had been through the row-packing process, but there were clear indications that Oracle had handled a few leaf blocks differently (lines 6 & 7). Moreover there were some very strange leaf blocks reporting “row:0.0” – all of which had a “high” block address:

      leaf: 0x9000335 150995765 (64: row:377.377 avs:833)
      leaf: 0x900015e 150995294 (65: row:308.308 avs:2144)
      leaf: 0x9000148 150995272 (66: row:50.50 avs:7046)
      leaf: 0x9000a36 150997558 (67: row:0.0 avs:7996)
      leaf: 0x9000356 150995798 (68: row:37.37 avs:7293)
      leaf: 0x900014e 150995278 (69: row:377.377 avs:833)

After I’d committed the second session and the shrink space compact had completed this portion of the index changed to:

      leaf: 0x9000335 150995765 (64: row:377.377 avs:833)
      leaf: 0x900015e 150995294 (65: row:308.308 avs:2144)
      leaf: 0x9000148 150995272 (66: row:50.50 avs:7046)
      leaf: 0x9000356 150995798 (67: row:37.37 avs:7293)
      leaf: 0x900014e 150995278 (68: row:377.377 avs:833)

So it seems that Oracle works its way through the entire compaction process but leaves an empty block for each leaf block above the “anticipated” highwater mark (to allow for easy read-consistency, perhaps) and then waits for each transaction that is holding one of those blocks to commit before removing them from the index structure.

We can do one last check to see if this hypothesis is roughly what happens by looking at the redo log dump and checking timestamps to see when the big TX wait stops and what happens after it is over – here are some lines from the redo dump showing the last few OP:13.22 codes with their associated timestamps and line numbers:

4364620:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000480 OBJ:143917 SCN:0x00000000025609a6 SEQ:1 OP:13.22 ENC:0 RBL:0 FLG:0x0000
4364629:SCN: 0x00000000025609a7 SUBSCN:  1 09/05/2022 12:15:38
4364632:CHANGE #2 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000480 OBJ:143917 SCN:0x00000000025609a6 SEQ:2 OP:13.22 ENC:0 RBL:0 FLG:0x0000
4364638:SCN: 0x00000000025609a7 SUBSCN:  1 09/05/2022 12:15:38
...
4373099:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09001100 OBJ:143917 SCN:0x000000000255f937 SEQ:1 OP:13.22 ENC:0 RBL:0 FLG:0x0000
4373105:SCN: 0x0000000002560b1f SUBSCN:  1 09/05/2022 12:20:14
...
4373283:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000a00 OBJ:143917 SCN:0x00000000025602b1 SEQ:2 OP:13.22 ENC:0 RBL:0 FLG:0x0000
4373289:SCN: 0x0000000002560b24 SUBSCN:  1 09/05/2022 12:20:14
...
4373466:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000581 OBJ:143917 SCN:0x0000000002560886 SEQ:2 OP:13.22 ENC:0 RBL:0 FLG:0x0000
4373472:SCN: 0x0000000002560b28 SUBSCN:  1 09/05/2022 12:20:14
4373479:SCN: 0x0000000002560b28 SUBSCN:  1 09/05/2022 12:20:14
4373485:SCN: 0x0000000002560b28 SUBSCN:  1 09/05/2022 12:20:14
4373486:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000480 OBJ:143917 SCN:0x00000000025609a7 SEQ:1 OP:13.22 ENC:0 RBL:0 FLG:0x0000
4373492:SCN: 0x0000000002560b2a SUBSCN:  1 09/05/2022 12:20:14
4373499:SCN: 0x0000000002560b2a SUBSCN:  1 09/05/2022 12:20:14
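
(For reference, a dump like the one above is produced by a call of the form below – the file name here is purely illustrative; use whatever v$logfile reports for the log file you want to examine.)

alter system dump logfile '/u01/app/oracle/oradata/or19/redo01.log';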

Note the 5 minute gap between the updates to the level 1 bitmap block 0x09001100 and block 0x09000480, which was one of the 3 “problem” blocks with the unusual “row:0.0” entry in the treedump.

One last little detail to highlight mechanisms – if I sort the leaf blocks from the treedump I took while the shrink was waiting, this is what the last few lines look like:

      leaf: 0x90004ac 150996140 (77: row:70.70 avs:6666)
      leaf: 0x90004ad 150996141 (69: row:377.377 avs:833)
      leaf: 0x90004ae 150996142 (54: row:49.49 avs:7065)
      leaf: 0x90004af 150996143 (6: row:377.377 avs:833)
      leaf: 0x90004b0 150996144 (52: row:377.377 avs:833)
      leaf: 0x90004b1 150996145 (70: row:377.377 avs:833)
      leaf: 0x90005ed 150996461 (41: row:0.0 avs:7996)
      leaf: 0x9000a36 150997558 (67: row:0.0 avs:7996)
      leaf: 0x9001105 150999301 (9: row:0.0 avs:7996)

Check the last three “outliers” in this list and compare them with the bitmap updates recorded after the 5 minute wait. (The very last OP:13.22 – to block 0x9000480 – is the change from state 3 to state 2.) Oracle does as much as it can as soon as it can, and then clears up the last few messy bits while letting everything else go on without interruption. You may find, however, that the strategy of bypassing and returning to leaf blocks that hold active transactions can – depending on the pattern of data – result in a large number of blocks that Oracle has not been able to pack to the greatest possible extent.

If you have a large number of small transactions executing and committing you’ll see the same sort of effect. The impact on the final size of the index (and the number of blocks that haven’t achieved maximum utilisation) will depend very much on the data patterns and the nature of processing done by the competing sessions.

If we need to investigate further, we can always examine the ksq trace. There are a couple of details in this that vary from the coalesce trace. We see the same TM enqueue in mode 2 held for the duration of the process, but instead of an OD enqueue we see an SK enqueue (segment shrink) taken immediately after the TM enqueue and held to the very end of processing.
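
If you want to see the enqueue activity for yourself, the component-based tracing of recent versions is the mechanism I have in mind – a sketch, which you may need to adjust for your version:

alter session set events 'trace[ksq] disk medium';

-- run the coalesce or shrink here, then:

alter session set events 'trace[ksq] off';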

In the second phase of the processing, as leaf blocks are copied into blocks nearer the start of the segment we see TX enqueues being taken in mode 4 as the process reaches a block that is still holding an active transaction; but this is no different from the normal action for “wait for competing transaction to commit or rollback”, and the enqueue is released as soon as the other session commits.

shrink space (without compact)

I find it fairly amusing that you have to extend the shrink space command if you want it to do less work. Perhaps if it were “compact only” that would feel less strange.
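
For reference, the two variants side by side (the index name is illustrative):

alter index t1_i1 shrink space compact;         -- phases 1 and 2 only: pack the leaf blocks, copy them towards the start of the segment
alter index t1_i1 shrink space;                 -- adds phase 3: drop the highwater mark and release extents to the tablespace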

If you omit the “compact” option Oracle moves the highwater mark down the segment to release as many extents as possible back to the tablespace. This was immediately visible in the space management report:

Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            2 /           16,384
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :            0 /                0
Full                          :          544 /        4,456,448

PL/SQL procedure successfully completed.

Segment Total blocks:          640
Object Unused blocks:           82

Critically the Segment Total blocks has dropped to 640: that’s 5 extents of 128 blocks each (remember my tablespace was declared with 1MB uniform extents).

When, to be awkward, I created the index in a tablespace declared with system-allocated extents, and gave it a storage clause of (initial 1M next 8M), it shrank to two extents, one of 128 blocks and one of 440 blocks (rather than the 1,024 blocks implied by the declaration). So shrinking indexes can result in some fairly randomly sized holes in the tablespace – the effect is similar to the trimming that takes place with parallel “create table as select” and “create index”. The effect isn’t a total mess, though, since it is catering for the 1MB, 8MB, 64MB “boundaries” of system-allocated tablespaces and not a purely random trim.

So the next thing we need to look at is the locking that’s going on, and any collateral mechanisms that show us the work Oracle does as it’s adjusting the highwater mark and releasing the extents to the tablespace.

Shrink Space – locking

There was very little difference in volume of undo and redo when comparing shrink space compact with shrink space – the latter averaged a little more than the former, but with the variations due to the occasional restarts and the undo segment stealing the difference wasn’t significant. Critically, of course, there were a number of extra transactions due to the “spare” extents being dropped as the highwater mark was lowered. Each extent dropped required an update to the seg$ table, and each update was executed as a separate transaction – interestingly, although the undo generated by the shrinking was all dumped into a single undo segment, the recursive dropping of the extents rotated through the available undo segments, producing the following type of figures for the rollback statistics:

USN   Ex Size K  HWM K  Opt K      Writes     Gets  Waits Shr Grow Shr K  Act K
----  -- ------  -----  -----      ------     ----  ----- --- ---- ----- ------
   0   0      0      0      0           0        1      0   0    0     0      0
   1   0      0      0      0        2426       92      0   0    0     0      0
   2   0      0      0      0         782       82      0   0    0     0      0
   3   0      0      0      0        3410      102      0   0    0     0      0
   4   0      0      0      0        1612       87      0   0    0     0      0
   5   0      0      0      0        1260       87      0   0    0     0      0
   6 416  91584      0      0    60289862    16521      0   0  416     0 -62613
   7   0      0      0      0        5202      130      0   0    0     0      0
   8-380 -90616      0      0        5828      629      0  38    0   -24      0
   9   0      0      0      0        1582       89      0   0    0     0      0
  10   0      0      0      0        1468       87      0   0    0     0      0
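
(The figures above are deltas between two snapshots of v$rollstat. Without the snapshot logic, a minimal query to base such a report on – with column aliases that only roughly match the headings – would be something like this:)

select
        n.usn, s.extents, s.rssize/1024 size_k, s.hwmsize/1024 hwm_k,
        s.optsize/1024 opt_k, s.writes, s.gets, s.waits,
        s.shrinks, s.extends, s.aveshrink/1024 shr_k, s.aveactive/1024 act_k
from
        v$rollname n,
        v$rollstat s
where
        s.usn = n.usn
order by
        n.usn
;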

You’ll notice that most of the undo segments saw a few writes. A particular side-note in this set of results is the effect on undo segment 8 – while segment 6 grew by 90MB this was at the cost of segment 8 shrinking by 90MB. If you try to shrink several segments one after the other you could seriously disrupt any other long-running activity on the system as each shrink steals (possibly “unexpired”) extents from other undo segments as it grows. (You might even see some time spent waiting on the “US” (undo segment) and “CR” enqueues.)

One of the surprising details of the final phase of the shrink space command was that the TM lock on the underlying table was taken and released twice (in mode 6) after the mode 2 lock held for phases 1 and 2 was released. Given the number and timing of the CR (reuse block range) enqueues that appeared around this time it’s possible that the first lock was held while the redundant extents were forced to disc, and the second was held while the segment header and extent map blocks were updated and forced to disc. The SK enqueue taken at the very start of the shrink was held all the way through this processing.

Shrink Space – concurrency

As before, concurrent transactions will run uninterrupted, with the shrink skipping any blocks that are subject to active transactions in “phase 1”; then in “phase 2”, as blocks are copied from the high end of the segment to the low end of the segment, the shrink will wait for each block that is still subject to an active transaction. The concurrency is good, as the bulk of the work takes place while the shrinking session is holding its TM lock in mode 2.

When we get to the moment when extents are de-allocated and the highwater marks adjusted the session takes and releases two TM locks in mode 6 in rapid succession. If another session manages to update a row (i.e. taking a TM/3 lock on the table) before the shrink session gets its first TM/6 lock the shrinking session will have to wait for the session to commit. It does this in the normal fashion – timing out and restarting every 3 seconds, and checking for a local deadlock every 60 seconds. So any transaction that manages to slip in to update the table between the shrink and the release of space could cause the table to be locked indefinitely. This doesn’t appear to be any worse than the problem introduced by the waits for “locked” leaf blocks as the “compact” tries to copy them towards the start of the segment, though.
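
If you want to watch the locking from a third session while a shrink is running, a simple query against v$lock will show the TM, TX and SK enqueues involved (the shrinking session is the one holding the SK):

select  sid, type, id1, id2, lmode, request, ctime, block
from    v$lock
where   type in ('TM', 'TX', 'SK')
order by
        sid, type
;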

Summary (so far)

alter index xxx coalesce is an online operation which reads through the index in order, copying index entries backwards to fill leaf blocks (according to the pctfree setting). This is not a row-by-row process: the session constructs full leaf blocks in private memory and “initialises” them into the database, re-initialising any blocks that have been made empty at the same time. The leaf block that was the last one to contribute index entries to the full block will be used as the target for the next fill – unless the transfer process left it completely empty, in which case Oracle starts with the next leaf block in the index.

When the process empties a leaf block, Oracle unlinks it from the index structure. This means deleting an entry from the level 1 branch block and modifying the two leaf blocks either side of the empty block so that the “next pointer” of the previous block points to the next leaf block, and the “previous pointer” of the next block points to the previous block. The bitmap entry for the empty block can then be set to report it as “Freespace 2” – ready for re-use anywhere else in the index.

The copy-back process doesn’t cross the boundaries of level 1 branch blocks, so the last leaf block for a branch block may not reach the limit of rows it’s allowed to hold. Also the first leaf block of a branch block is never substituted, and it may (for reasons I don’t know) end up holding less than the expected maximum row count.

It is possible to make a reasonable estimate of the undo and redo generated by a call to coalesce an index.

The coalesce operates as a large number of small transactions (working through the index on pairs of adjacent blocks) and will cycle through all the undo segments in your undo tablespace.

If a coalesce reaches a block that holds an active transaction it will skip the block and move on to the next available leaf block, so a little light activity on the index could significantly reduce the effectiveness of the coalesce, and different indexes on the same table could be affected very differently because of the patterns that exist in the data and its indexes.

alter index xxx shrink space [compact] is a two or three phase process, depending on whether or not you include the compact option (two phases with it, three without). The first phase seems to be very similar to the work done by the coalesce command and, like the coalesce command, operates as a large number of small transactions, skipping any leaf blocks that contain an active transaction. This allows it to be an online process.

After packing the leaf blocks as tightly as possible, shrink space moves into “phase 2”, where it copies leaf blocks from the “end” of the segment to empty blocks near the beginning of the segment, working backwards down the extents and blocks. If it finds a leaf block with an active transaction while doing this it waits for the transaction to commit using the normal TX mode 4 wait.

At the end of phase 2 the index is packed into the smallest set of blocks at the low end of the segment and all the other blocks allocated below the highwater mark are flagged as “FS4” (75 – 100% free), unlike the blocks for a coalesce which are flagged as “FS2” (25 – 50% free). In both cases this actually means “empty and available for reuse”.

Like the coalesce command, “shrink space” holds a TM lock on the table in mode 2 for this activity, but differs in its choice of “secondary” lock, using an SK enqueue in mode 6 rather than an OD enqueue. Unlike the coalesce command, the shrink space command restricts the undo it generates to a single undo segment – which could cause some disruptive side effects if other long running jobs are generating undo at the same time.

Summary of “phase 3” coming soon.

March 7, 2022

SYSAUX Occupants

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 11:32 am GMT Mar 7,2022

This is a draft note that’s been hanging around a very long time for a final edit and print. I was prompted to rediscover it today when I noticed a question on the Oracle Developers’ forum asking about reducing the size of a SYSAUX tablespace that currently included 400GB of free space.

Obviously the usual problem of used extents at the top end of a file can apply even for SYSAUX, and it’s highly likely that it will be necessary to move a few objects around (possibly within the tablespace, possibly out of the tablespace and back) but it’s easy to forget that some of the objects in SYSAUX are sufficiently important that they should only be moved using a procedure that has been supplied by Oracle Corp.

The list of contents of SYSAUX and, where they exist, the relevant procedure to move them are listed in view v$sysaux_occupants, so it’s always worth a quick check of the view before you do anything else – hence this little script:

rem
rem     Script:         sysaux_list.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem 


set linesize 156
set pagesize 60
set trimspool on
set tab off

spool sysaux_list.lst

-- execute print_table('select * from v$sysaux_occupants order by space_usage_kbytes')

column  schema_name             format a32
column  occupant_name           format a32
column  move_procedure          format a45
column  space_usage_kbytes      format 999,999,999

select 
        schema_name, occupant_name, move_procedure, space_usage_kbytes 
from 
        v$sysaux_occupants
order by 
        space_usage_kbytes
;

spool off

I don’t think there’s anything in the script that needs comment, apart from the optional line I’ve put in that uses Tom Kyte’s print_table() procedure to output each row with one column per line – a convenient way of getting a full dump of the view. It’s a standard bit of defensive programming that helps me (sometimes) to avoid missing changes to view contents as versions are upgraded.

Here’s a small extract from the two sections of the output from a little VM sandbox of 19.11:

OCCUPANT_NAME                  : LOGMNR
OCCUPANT_DESC                  : LogMiner
SCHEMA_NAME                    : SYSTEM
MOVE_PROCEDURE                 : SYS.DBMS_LOGMNR_D.SET_TABLESPACE
MOVE_PROCEDURE_DESC            : Move Procedure for LogMiner
SPACE_USAGE_KBYTES             : 0
CON_ID                         : 3

-----------------

...

OCCUPANT_NAME                  : SM/OTHER
OCCUPANT_DESC                  : Server Manageability - Other Components
SCHEMA_NAME                    : SYS
MOVE_PROCEDURE                 :
MOVE_PROCEDURE_DESC            : *** MOVE PROCEDURE NOT APPLICABLE ***
SPACE_USAGE_KBYTES             : 72896
CON_ID                         : 3

-----------------

OCCUPANT_NAME                  : SDO
OCCUPANT_DESC                  : Oracle Spatial
SCHEMA_NAME                    : MDSYS
MOVE_PROCEDURE                 : MDSYS.MOVE_SDO
MOVE_PROCEDURE_DESC            : Move Procedure for Oracle Spatial
SPACE_USAGE_KBYTES             : 199552
CON_ID                         : 3

-----------------

32 row(s) selected



SCHEMA_NAME                      OCCUPANT_NAME                    MOVE_PROCEDURE                                SPACE_USAGE_KBYTES
-------------------------------- -------------------------------- --------------------------------------------- ------------------
SYSTEM                           LOGMNR                           SYS.DBMS_LOGMNR_D.SET_TABLESPACE                               0
...
AUDSYS                           AUDSYS                           DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION                  39,488
SYS                              AO                               DBMS_AW.MOVE_AWMETA                                       45,696
XDB                              XDB                              XDB.DBMS_XDB_ADMIN.MOVEXDB_TABLESPACE                     62,144
SYS                              SM/OPTSTAT                                                                                 68,288
SYS                              SM/OTHER                                                                                   72,896
MDSYS                            SDO                              MDSYS.MOVE_SDO                                           199,552

32 rows selected.

Addendum

It’s a long time since I had to help someone clean a load of garbage out of SYSAUX – typically from excess space usage by the audit table, the histogram history table, or the partition synopsis table – so I thought I’d take a quick look around the internet and MOS for any notes that might be good starting points for doing the job in a recent version of Oracle. The biggest unknown for me was the need to mess around inside a PDB, so this addendum is largely a pointer to get me started if I do need to do something with a PDB in the future.

Searching MOS for “v$sysaux_occupants” I noted that a number of documents could offer helpful comments; in particular I picked on “Doc ID 1499542.1: Reducing the Space Usage of the SQL Management Base in the SYSAUX Tablespace”, which made reference to the dbms_pdb package and the need to use dbms_pdb.exec_as_oracle_script() to move objects in a PDB’s sysaux tablespace, and the slightly more generic “Doc ID 2798505.1: How To Move Sys Table in PDB”, which made a similar point and highlighted Oracle error ORA-65040: operation not allowed from within a pluggable database.
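
I haven’t copied the exact statements from the MOS documents, but the shape of the workaround they describe is along these lines – the statement inside the call is purely illustrative:

begin
        dbms_pdb.exec_as_oracle_script(
                'alter table sys.some_sysaux_table move tablespace sysaux'
        );
end;
/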

Following this detail with a further search on Google I then found a note on Marco Mischke’s blog with the title “Shrink SYSAUX of a Pluggable Database”. The note is 6 years old, so shouldn’t be followed thoughtlessly – things change with time – but it’s a good starting point.

December 3, 2018

Row Migration

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 4:27 pm GMT Dec 3,2018

There’s a little detail of row migration that’s been bugging me for a long time – and I’ve finally found a comment on MoS explaining why it happens. Before saying anything else, though, I’m going to give you a little script (that I’ve run on 12.2.0.1 with an 8KB block size in a tablespace using [corrected from ASSM] manual (freelist) space management and system allocated extents) to demonstrate the anomaly.


rem
rem     Script:         migration_itl.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem     Notes
rem     Under ASSM we can get 733 rows in the block,
rem     using freelist management it goes up to 734
rem

create table t1 (v1 varchar2(4000))
segment creation immediate
tablespace test_8k
pctfree 0
;

insert into t1
select  null from dual connect by level <= 734 -- > comment to avoid wordpress format issue
;

commit;

spool migration_itl.lst

column rel_file_no new_value m_file
column block_no    new_value m_block

select 
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no, 
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from 
        t1
group by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid) 
order by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid)
;

update t1 set v1 = rpad('x',10);
commit;

alter system flush buffer_cache;

alter system dump datafile &m_file block &m_block;

column tracefile new_value m_tracefile

select
        tracefile 
from 
        v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from v$mystat where rownum = 1
                )
        )
;

-- host grep nrid &m_tracefile

spool off

The script creates a single column table with pctfree set to zero, then populates it with 734 rows where every row has a null for its single column. The query using the calls to the dbms_rowid package will show you that all 734 rows are in the same block. In fact the block will be full (leaving a handful of bytes of free space) because even though each row will require only 5 bytes (2 bytes row directory entry, 3 bytes row overhead, no bytes for data) Oracle’s arithmetic will allow for the 11 bytes that is the minimum needed for a row that has migrated – the extra 6 bytes being the “next rowid” (nrid) pointer to where the migrated row now lives. So 734 rows * 11 bytes = 8,074, which – after allowing for the 110 or so bytes of block and transaction layer overhead – leaves just a handful of bytes of free space in the 8KB block.

After populating and reporting the table the script then updates every row to grow it by a few bytes, and since there’s no free space every row will migrate to a new location. By dumping the block (flushing the buffer cache first) I can check where each row has migrated to. (If you’re running a UNIX flavour and have access to the trace directory then the commented grep command will give you what you want to see.) Here’s a small extract from the dump on a recent run:

nrid:  0x05c00082.0
nrid:  0x05c00082.1
nrid:  0x05c00082.2
nrid:  0x05c00082.3
...
nrid:  0x05c00082.a4
nrid:  0x05c00082.a5
nrid:  0x05c00082.a6
nrid:  0x05c00083.0
nrid:  0x05c00083.1
nrid:  0x05c00083.2
nrid:  0x05c00083.3
...
nrid:  0x05c00085.a4
nrid:  0x05c00085.a5
nrid:  0x05c00085.a6
nrid:  0x05c00086.0
nrid:  0x05c00086.1
nrid:  0x05c00086.2
...
nrid:  0x05c00086.3e
nrid:  0x05c00086.3f
nrid:  0x05c00086.40
nrid:  0x05c00086.41

My 734 rows have migrated to fill the next four blocks (23,130) to (23,133) of the table and taken up some of the space in the one after that (23,134). The first four blocks have used up row directory entries 0x00 to 0xa6 (0 to 166), and the last block has used up row directory entries 0x00 to 0x41 (0 to 65) – giving us the expected total: 167 * 4 + 66 = 734 rows. Let’s dump one of the full blocks – and extract the interesting bits:

alter system dump datafile 23 block 130;
Block header dump:  0x05c00082
 Object id on Block? Y
 seg/obj: 0x1ba1e  csc:  0x0000000001e0aff3  itc: 169  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.00f.000042c9  0x0240242d.08f3.14  --U-  167  fsc 0x0000.01e0affb
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x04   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x05   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x06   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
...
0xa6   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa7   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa8   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa9   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

nrow=167
frre=-1
fsbo=0x160
fseo=0x2ec
avsp=0x18c
tosp=0x18c

tab 0, row 0, @0xfe4
tl: 20 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x05c00081.0
col  0: [10]  78 20 20 20 20 20 20 20 20 20
tab 0, row 1, @0xfd0
tl: 20 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x05c00081.1

This block has 169 (0xa9) ITL entries – that’s one for each row migrated into the block (nrow = 167) plus a couple spare. The block still has some free space (avsp = tosp = 0x18c: available space = total space = 396 bytes), but it can’t be used for any more incoming migration because Oracle is unable to create any more ITL entries – it’s reached the ITL limit for 8KB blocks. (Each ITL entry takes 24 bytes, so 169 entries use a little over 4KB, which seems to be as much of an 8KB block as Oracle will allow the ITL list to occupy.)

So finally we come to the question that’s been bugging me for years – why does Oracle want an extra ITL slot for every row that has migrated into a block? The answer appeared in this sentence from MoS Doc ID 2420831.1: “Errors Noted in 12.2 and Above During DML on Compressed Tables”:

“It is a requirement during processing of parallel transactions that each data block row that does not have a header have a block ITL available.”

Rows that have migrated into a block do not have a row header – check the flag byte (fb) for the two rows I’ve listed: it’s “----FL--”, there is no ‘H’ for header. We have the First and Last row pieces of the row in this block and that’s it. So my original “why” question now becomes “What’s the significance of parallel DML?”

Imagine the general case where multiple processes have been updating rows at random from multiple blocks, and many different processes have forced rows to migrate at the same time into the same block. The next parallel DML statement would dispatch multiple parallel execution slaves, which would all be locking rows in their own separate block ranges – but multiple slaves could find that they wanted to lock rows which had all migrated into the same block, so every slave MUST be able to get an ITL entry in that block at the same time. For example, if 8 rows had migrated into a specific block from 8 different places, and 8 parallel execution slaves each followed a pointer from the region they were scanning to update a row that had migrated into this particular block, then all 8 slaves would need an ITL entry in the block (and if there were a ninth slave scanning this region of the table we’d need a 9th ITL entry). If we didn’t have enough ITL entries in the block for every single migrated row to be locked by a different process at the same time then (in principle, at least) parallel execution slaves could deadlock each other because they were visiting blocks in a different order to lock the migrated rows. For example:

  1. PQ00 visits and locks a row that migrated to block (23,131)
  2. PQ01 visits and locks a row that migrated to block (23,132)
  3. PQ00 visits and tries to lock a row that migrated to block (23,132) — but if there were no “extra” ITL slots available, it would wait
  4. PQ01 visits and tries to lock a row that migrated to block (23,131) — but if there were no “extra” ITL slots available it would wait, and we’d be in a deadlock.

Oracle’s solution to this threat: when migrating a row into a block, add a new ITL entry so that the number of ITL slots stays ahead of the number of migrated rows by 2 (the “+2” is a working hypothesis, it might be “+ initrans of the table”).
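
As a side note: if you want to check how many rows of a table have migrated (or chained) the old analyze mechanism still works. Assuming you’ve created the default chained_rows table from $ORACLE_HOME/rdbms/admin/utlchain.sql:

analyze table t1 list chained rows;

select  count(*)
from    chained_rows
where   table_name = 'T1'
;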

Footnote 1

The note was about problems with compression for OLTP, but the underlying message was about 4 Oracle errors of type ORA-00600 and ORA-00700, which report the discovery and potential threat of blocks where the number of ITL entries isn’t large enough compared to the number of inward migrated rows. Specifically:

  • ORA-00600 [PITL1]
  • ORA-00600 [kdt_bseg_srch_cbk PITL1]
  • ORA-00700: soft internal error, arguments: [PITL6]
  • ORA-00700: soft internal error, arguments: [kdt_bseg_srch_cbk PITL5]


Footnote 2

While drafting the SQL script above, I decided to check how many other scripts I had already written about migrated rows and itl slots: there were 12 of the former and 10 of the latter, and reading through the notes I found that one of the scripts (itl_chain.sql, dated December 2002) included the following note:

According to a comment that came from Oracle support via Steve Adams, the reason for the extra ITLs is to avoid a risk of parallel DML causing an internal deadlock.

So it looks like I knew what the “excess” ITLs were for about 16 years ago, but had managed to forget some time since then.


November 7, 2016

Reorg

Filed under: fragmentation,humour,Oracle — Jonathan Lewis @ 5:31 pm GMT Nov 7,2016

A current question on the OTN database forum asks: “What’s the difference between object and tablespace reorganization?” Here’s an analogy to address the question.

I have three crates of Guinness in the boot (trunk) of my car, one crate has 4 bottles left, one has 7 bottles left and one has 2 bottles. I also have two cases of Louis Roederer Brut NV champagne, one case has 2 bottles left and one has only one. (I have two objects in my tablespace – one of type Beer, one of type Champagne – and my boot requires manual free space management.)

I move all the Guinness bottles into a single crate and all the champagne bottles into a single case. That’s a couple of “shrink space compact” calls – I’ve re-organised the objects to get all the bottles in each object close to each other, but the crates are still taking up space in the boot.

I take the two empty crates and the empty case out of the boot. That’s a couple of “resize” (or “shrink space” without “compact”) calls that free up space in the boot.

I now want to load a port barrel into the car, but it won’t fit until I slide the remaining beer crate and champagne case together at one side of the boot. That’s a couple of “move” commands that have reorganized the boot (tablespace) to make the free space usable.


September 13, 2016

Securefile space

Filed under: fragmentation,Infrastructure,LOBs,Oracle,Troubleshooting — Jonathan Lewis @ 7:29 am BST Sep 13,2016

Here’s a little script I hacked together a couple of years ago from a clone of a script I’d been using for checking space usage in the older types of segments. Oracle Corp. eventually put together a routine to peer inside securefile LOBs:

rem
rem     Script:         dbms_space_use_sf.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2013
rem     Purpose:        
rem
rem     Last tested 
rem             12.1.0.2
rem             11.2.0.4
rem     Not tested
rem             11.1.0.7
rem     Not relevant
rem             10.2.0.5
rem              9.2.0.8
rem              8.1.7.4
rem
rem     Notes:
rem     See also dbms_space_use.sql
rem
rem     11g introduced securefiles lobs and two overloads of 
rem     dbms_space_usage to report space used by their segments
rem
rem     Valid values for suoption are:
rem             SPACEUSAGE_EXACT (16): Computes space usage exhaustively
rem             SPACEUSAGE_FAST  (17): Retrieves values from in-memory statistics
rem
rem     Valid segment types:
rem             LOB
rem             LOB PARTITION
rem             LOB SUBPARTITION
rem             
rem     This version allows for partitioned objects. You could delete
rem     lines about parameter &4 and partition names to eliminate the
rem     complaints about substitution variables.
rem

define m_seg_owner      = &1
define m_seg_name       = &2
define m_seg_type       = '&3'
define m_part_name      = &4

define m_segment_owner  = &m_seg_owner
define m_segment_name   = &m_seg_name
define m_segment_type   = '&m_seg_type'
define m_partition_name = &m_part_name

set linesize 180
set pagesize 60

set trimspool on
set tab off

set verify off
set feedback off
set serveroutput on

spool dbms_space_use_sf

prompt
prompt  ============
prompt  Secure files
prompt  ============
prompt

declare
        wrong_ssm       exception;
        pragma exception_init(wrong_ssm, -10614);

        m_segment_size_blocks   number(12,0);
        m_segment_size_bytes    number(12,0);
        m_used_blocks           number(12,0);
        m_used_bytes            number(12,0);
        m_expired_blocks        number(12,0);
        m_expired_bytes         number(12,0);
        m_unexpired_blocks      number(12,0);
        m_unexpired_bytes       number(12,0);

begin
        dbms_space.space_usage(
                upper('&m_segment_owner'),
                upper('&m_segment_name'),
                upper('&m_segment_type'),
                suoption                => dbms_space.spaceusage_exact, 
--              suoption                => dbms_space.spaceusage_fast,
                segment_size_blocks     => m_segment_size_blocks,
                segment_size_bytes      => m_segment_size_bytes,
                used_blocks             => m_used_blocks,
                used_bytes              => m_used_bytes,
                expired_blocks          => m_expired_blocks,
                expired_bytes           => m_expired_bytes,
                unexpired_blocks        => m_unexpired_blocks,
                unexpired_bytes         => m_unexpired_bytes,
                partition_name          => upper('&m_partition_name')
        );

        dbms_output.new_line;
        dbms_output.put_line(' Segment Blocks:   ' || to_char(m_segment_size_blocks,'999,999,990') || ' Bytes: ' || to_char(m_segment_size_bytes,'999,999,999,990')); 
        dbms_output.put_line(' Used Blocks:      ' || to_char(m_used_blocks,'999,999,990')         || ' Bytes: ' || to_char(m_used_bytes,'999,999,999,990')); 
        dbms_output.put_line(' Expired Blocks:   ' || to_char(m_expired_blocks,'999,999,990')      || ' Bytes: ' || to_char(m_expired_bytes,'999,999,999,990')); 
        dbms_output.put_line(' Unexpired Blocks: ' || to_char(m_unexpired_blocks,'999,999,990')    || ' Bytes: ' || to_char(m_unexpired_bytes,'999,999,999,990')); 

exception
        when wrong_ssm then
                dbms_output.put_line('Segment not ASSM');
end;
/

prompt
prompt  ===============
prompt  Generic details
prompt  ===============
prompt

declare
        m_total_blocks                  number;
        m_total_bytes                   number;
        m_unused_blocks                 number;
        m_unused_bytes                  number;
        m_last_used_extent_file_id      number;
        m_last_used_extent_block_id     number;
        m_last_used_block               number;
begin
        dbms_space.unused_space(
                segment_owner                   => upper('&m_segment_owner'),
                segment_name                    => upper('&m_segment_name'),
                segment_type                    => upper('&m_segment_type'),
                total_blocks                    => m_total_blocks,
                total_bytes                     => m_total_bytes, 
                unused_blocks                   => m_unused_blocks,  
                unused_bytes                    => m_unused_bytes,
                last_used_extent_file_id        => m_last_used_extent_file_id, 
                last_used_extent_block_id       => m_last_used_extent_block_id,
                last_used_block                 => m_last_used_block,
                partition_name                  => upper('&m_partition_name')
        );

        dbms_output.put_line('Segment Total blocks: ' || to_char(m_total_blocks,'999,999,990'));
        dbms_output.put_line('Object Unused blocks: ' || to_char(m_unused_blocks,'999,999,990'));

end;
/

spool off

Sample of output (from a slightly older version of the code, with the “partition” prompts deleted):


============
Secure files
============

 Segment Blocks:    168960 Bytes: 1384120320
 Used Blocks:       151165 Bytes: 1238343680
 Expired Blocks     17795 Bytes: 145776640
 Unexpired Blocks   0 Bytes: 0

===============
Generic details
===============

Segment Total blocks: 168960
Object Unused blocks: 0


And a sample where I created a table (n1 number, b1 blob) with the blob storage set to disable storage in row, inserted one row with a blob of 100 bytes, then updated (with commit) in a loop 1,000 times, increasing the length of the blob by one byte each time:


============
Secure files
============

 Segment Blocks:          1,152 Bytes:        9,437,184
 Used Blocks:                82 Bytes:          671,744
 Expired Blocks:          1,070 Bytes:        8,765,440
 Unexpired Blocks:            0 Bytes:                0

===============
Generic details
===============

Segment Total blocks:        1,152
Object Unused blocks:            0

As you can see my “one” blob has exploded the segment to over 1,000 blocks because Oracle accumulated 1,000 old versions of the blob which could not be overwritten. The 82 used blocks are mostly space management metadata.
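
If you want to reproduce the test, a sketch along the following lines should do it. I haven’t copied my original script, so treat the names and details as illustrative – and remember that securefiles need an ASSM tablespace:

create table t_sf (n1 number, b1 blob)
lob (b1) store as securefile (disable storage in row)
;

insert into t_sf values (1, utl_raw.cast_to_raw(rpad('x',100)));
commit;

begin
        -- grow the blob by one byte per iteration, committing each time
        for i in 1 .. 1000 loop
                update  t_sf
                set     b1 = utl_raw.cast_to_raw(rpad('x', 100 + i))
                where   n1 = 1;
                commit;
        end loop;
end;
/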

September 11, 2016

Space Usage

Filed under: fragmentation,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:09 pm BST Sep 11,2016

Here’s a simple script that I’ve used for many years to check space usage inside segments.  The comment about freelist groups may be out of date  – I’ve not had to worry about that for a very long time. There is a separate script for securefile lobs.


rem
rem     Script:         dbms_space_use.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2002
rem     Purpose:        
rem
rem     Last tested 
rem             12.1.0.2
rem             11.2.0.4
rem     Not tested
rem             11.1.0.7
rem             10.2.0.5
rem              9.2.0.8
rem     Not relevant
rem              8.1.7.4
rem
rem     Notes:
rem     For accuracy in free space you (once) needed to set the
rem     scan limit; and for those rare objects cases where you 
rem     had defined multiple freelist groups you still have to
rem     work through each free list group in turn
rem
rem     For the ASSM calls:
rem             FS1     => 0% - 25% free space
rem             FS2     => 25% - 50% free space
rem             FS3     => 50% - 75% free space
rem             FS4     => 75% - 100% free space
rem             Bytes = blocks * block size
rem
rem     Expected errors:
rem             ORA-10614: Operation not allowed on this segment
rem                     (MSSM segment, ASSM call)
rem             ORA-10618: Operation not allowed on this segment
rem                     (ASSM segment, MSSM call)
rem             ORA-03200: the segment type specification is invalid
rem                     (e.g. for LOBINDEX or LOBSEGMENT)
rem                     11g - "LOB" is legal for LOB segments
rem                         - use "INDEX" for the LOBINDEX
rem
rem     For indexes
rem             Blocks are FULL or FS2 (re-usable)
rem
rem     Special case: LOB segments.
rem     The number of blocks reported by FS1 etc. is actually the
rem     number of CHUNKS in use (and they're full or empty). So 
rem     if your CHUNK size is not the same as your block size the
rem     total "blocks" used doesn't match the number of blocks 
rem     below the HWM.
rem
rem     The package dbms_space is created by dbmsspu.sql
rem     and the body is in prvtspcu.plb
rem
rem     11.2 overloads dbms_space.space_usage for securefile lobs
rem     See dbms_space_use_sf.sql
rem
rem     When supplying details about partitions the segment type
rem     can consist of two words (e.g. LOB PARTITION), these 
rem     must be surrounded by quotes to survive the script.
rem
rem     You might want to set up two versions of this code with
rem     all references to partitions removed from one of them
rem     or you have to keep pressing return to bypass the 
rem     requests for substitution variables
rem

define m_seg_owner      = &1
define m_seg_name       = &2
define m_seg_type       = '&3'
define m_part_name      = &4

define m_segment_owner  = &m_seg_owner
define m_segment_name   = &m_seg_name
define m_segment_type   = '&m_seg_type'
define m_partition_name = &m_part_name

set linesize 156
set pagesize 60
set trimspool on
set tab off
spool dbms_space_use

prompt  ===================
prompt  Freelist management
prompt  ===================

declare
        wrong_ssm       exception;
        pragma exception_init(wrong_ssm, -10618);

        m_free  number(10);
begin
        dbms_space.free_blocks(
                segment_owner           => upper('&m_segment_owner'),
                segment_name            => upper('&m_segment_name'),
                segment_type            => upper('&m_segment_type'),
                partition_name          => upper('&m_partition_name'),
--              scan_limit              => 50,
                freelist_group_id       => 0,
                free_blks               => m_free
        );
        dbms_output.put_line('Free blocks below HWM: ' || m_free);
exception
        when wrong_ssm then
                dbms_output.put_line('Segment not freelist managed');
end;
/


prompt  ====
prompt  ASSM
prompt  ====

declare
        wrong_ssm       exception;
        pragma exception_init(wrong_ssm, -10614);

        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;  
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number; 
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => upper('&m_segment_owner'),
                segment_name            => upper('&m_segment_name'),
                segment_type            => upper('&m_segment_type'),
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes, 
                fs1_blocks              => m_fs1_blocks , 
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,  
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,  
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,  
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks, 
                full_bytes              => m_full_bytes,
                partition_name          => upper('&m_partition_name')
        );


        dbms_output.new_line;
        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));

exception
        when wrong_ssm then
                dbms_output.put_line('Segment not ASSM');
end;
/


prompt  =======
prompt  Generic
prompt  =======

declare
        m_total_blocks                  number;
        m_total_bytes                   number;
        m_unused_blocks                 number;
        m_unused_bytes                  number;
        m_last_used_extent_file_id      number;
        m_last_used_extent_block_id     number;
        m_last_used_block               number;
begin
        dbms_space.unused_space(
                segment_owner           => upper('&m_segment_owner'),
                segment_name            => upper('&m_segment_name'),
                segment_type            => upper('&m_segment_type'),
                total_blocks            => m_total_blocks,
                total_bytes             => m_total_bytes, 
                unused_blocks           => m_unused_blocks,  
                unused_bytes            => m_unused_bytes,
                last_used_extent_file_id        => m_last_used_extent_file_id, 
                last_used_extent_block_id       => m_last_used_extent_block_id,
                last_used_block         => m_last_used_block,
                partition_name          => upper('&m_partition_name')
        );

        dbms_output.put_line('Segment Total blocks: '  || to_char(m_total_blocks,'999,999,990'));
        dbms_output.put_line('Object Unused blocks: '  || to_char(m_unused_blocks,'999,999,990'));

end;
/

undefine 1
undefine 2
undefine 3
undefine 4

undefine m_seg_owner
undefine m_seg_name
undefine m_seg_type
undefine m_part_name

undefine m_segment_owner
undefine m_segment_name
undefine m_segment_type
undefine m_partition_name

spool off

Here’s a sample of output (from a segment using ASSM):


===================
Freelist management
===================
Segment not freelist managed

PL/SQL procedure successfully completed.

====
ASSM
====

Unformatted                   :  132,385 / ############
Freespace 1 (  0 -  25% free) :        0 /            0
Freespace 2 ( 25 -  50% free) :        0 /            0
Freespace 3 ( 50 -  75% free) :        0 /            0
Freespace 4 ( 75 - 100% free) :        0 /            0
Full                          :   12,327 /  100,982,784

PL/SQL procedure successfully completed.

=======
Generic
=======
Segment Total blocks: 145920
Object Unused blocks: 0

PL/SQL procedure successfully completed.

(I’ve increased the length of the byte-count output since I produced that report ;)

November 14, 2014

Shrink Tablespace

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:16 pm GMT Nov 14,2014

If you start moving objects around to try and reclaim space in a tablespace there are all sorts of little traps that make it a little difficult to get the maximum benefit for the minimum effort.  I’ve written a couple of notes in the past about how to proceed and, more recently, one of the difficulties involved. This is just a brief note about a couple of ideas to make life a little easier.

  • Check that you’ve emptied the recyclebin before you start
  • If you’re trying to shrink the sysaux tablespace check the latest details on v$sysaux_occupants for threats.
  • Before you try moving or rebuilding an object check that the total free space “below” that object is greater than  the size of the object or you may find that some extents are allocated for the rebuild further “up” the tablespace.
  • Before moving a table mark its indexes as unusable – see the sketch after this list. If you do this then (in recent versions of Oracle) the default behaviour is for the index space to be freed as the segment vanishes (thanks to “deferred segment creation”) and you may find that the extents of the table can move further “down” the tablespace as a consequence. (If you’ve been creating tables and indexes in different tablespaces this may be irrelevant, of course.)
  • When you move an object think a little carefully about whether specifying a minimum initial extent size would help or hinder the move.
  • Don’t assume that moving the “highest” object first is the best strategy – work out where you expect the final tablespace HWM to be and you may find that it makes more sense to move other objects that are above that point first. Also, there’s no point in rebuilding an index if you’re going to move its underlying table a little later.
  • If you’re using system-allocated (rather than uniform) extents, moving objects with several small (64KB) extents first – even if they are below the target HWM – may allow you to free up larger (1MB, 8MB) gaps that can be used by other larger objects.
  • Creating a new tablespace and moving objects to it from the old tablespace working from the top downwards may be the most effective way to proceed.  Work towards dropping the old tablespace HWM regularly as you go through this procedure. If you can do this it’s also worth asking whether all the objects should have been in a single tablespace anyway.
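
To illustrate the point about unusable indexes, the pattern is simply this (names illustrative):

alter index t1_i1 unusable;     -- index segment is dropped, its space becomes free
alter table t1 move;            -- the table move can now re-use the space the index freed
alter index t1_i1 rebuild;      -- recreate the index once the table is in place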

October 1, 2014

Shrink Tablespace

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 2:55 pm BST Oct 1,2014

In a comment on my previous post on shrinking tablespaces Jason Bucata and Karsten Spang both reported problems with small objects that didn’t move to the start of the tablespace. This behaviour is inevitable with dictionary managed tablespaces (regardless of the size of the object), but I don’t think it’s likely to happen with locally managed tablespaces if they’ve been defined with uniform extent sizes. Jason’s comment made me realise, though, that I’d overlooked a feature of system allocated tablespaces that made it much harder to move objects towards the start of file. I’ve created a little demo to illustrate the point.

I created a new tablespace as locally managed, ASSM, and auto-allocate, then created a few tables of various sizes. The following minimal SQL query reports the resulting extents in block_id order, adding in a “boundary_1m” column which subtracts 128 blocks (1MB) from the block_id, then divides by 128 and truncates, to show which “User Megabyte” of the file the extent starts in. (Older versions of Oracle typically have an 8 block space management header; recent versions expanded this from 64KB to 1MB – possibly as a little performance aid to Exadata.)


select
        segment_name, block_id, blocks , trunc((block_id - 128)/128) boundary_1M
from
        dba_extents where owner = 'TEST_USER'
order by
        block_id
;

SEGMENT_NAME               BLOCK_ID     BLOCKS BOUNDARY_1M
------------------------ ---------- ---------- -----------
T1                              128       1024           0
T1                             1152       1024           8
T2                             2176       1024          16
T2                             3200       1024          24
T3                             4224          8          32
T4                             4232          8          32
T5                             4352        128          33

As you can see t3 and t4 are small tables – 1 extent of 64KB each – and t5, which I created after t4, starts on the next 1MB boundary. This is a feature of auto-allocate: not only are extents (nearly) fixed to a small number of possible extent sizes, the larger extents are restricted to starting on 1MB boundaries and the 64KB extents are used preferentially to fill in “odd-sized” holes. To show the impact of this I’m going to drop table t1 (at the start of file) to make some space.
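(For the record, the drop needs a purge – if the recyclebin is enabled a simple drop would leave the extents allocated to the recyclebin copy of t1:)

drop table t1 purge;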


SEGMENT_NAME               BLOCK_ID     BLOCKS BOUNDARY_1M
------------------------ ---------- ---------- -----------
T2                             2176       1024          16
T2                             3200       1024          24
T3                             4224          8          32
T4                             4232          8          32
T5                             4352        128          33

Now I’ll move table t3 – hoping that it will move to the start of file and use up some of the space left by t1. However there’s a 1MB area (at boundary 32) which is partially used, so t3 moves into that space rather than creating a new “partly used” megabyte.


SEGMENT_NAME               BLOCK_ID     BLOCKS BOUNDARY_1M
------------------------ ---------- ---------- -----------
T2                             2176       1024          16
T2                             3200       1024          24
T4                             4232          8          32
T3                             4240          8          32
T5                             4352        128          33

It’s a little messy trying to clear up the tiny fragments and make them do what you want. In this case you could, for example, create a dummy table with storage(initial 64K next 64K minextents 14) to use up all the space in the partly used megabyte, then move t3 – which should go to the start of file – then move table t4 – which should go into the first partly-used MB (i.e. start of file) rather than taking up the hole left by t3.
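If you want to try that, here’s a minimal sketch of the dummy-table strategy (the name dummy_fill is mine, and I’ve included “segment creation immediate” on the assumption that deferred segment creation is enabled – without it the empty table wouldn’t allocate any extents at create time):

create table dummy_fill (n number)
segment creation immediate
storage (initial 64K next 64K minextents 14)
;

alter table t3 move;            -- should now go to the start of file
alter table t4 move;            -- should follow it into the first partly-used MB

drop table dummy_fill purge;    -- release the 14 filler extents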

Even for a trivial example it’s messy – imagine how difficult it can get to cycle through building and dropping suitable dummy tables and move objects in the right order when you’ve got objects with several small extents scattered through the file, and objects with a mixture of small extents and large extents.

September 19, 2014

Shrink Tablespace

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 12:10 pm BST Sep 19,2014

A recent question on the OTN database forum raised the topic of returning free space in a tablespace to the operating system by rebuilding objects to fill the gaps near the start of files and leave the empty space at the ends of files so that the files could be resized downwards. There are a few problems with this strategy – including the fact that the movement towards the start of file is not guaranteed in the documentation and the algorithms for system-allocated (as opposed to uniform) extents introduce irritating side effects.

This isn’t a process that you’re likely to need frequently, but I have written a couple of notes about it, including a sample query to produce a map of the free and used space in a tablespace. While reading the thread, though, it crossed my mind that recent versions of Oracle introduced a feature that can reduce the amount of work needed to get the job done, so I thought I’d demonstrate the point here.
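The sample query isn’t reproduced here, but a minimal sketch of the same idea – listing used and free extents in block_id order, with MY_TS as a placeholder tablespace name – might look like this:

select
        file_id, block_id, blocks, owner, segment_name, segment_type
from
        dba_extents
where
        tablespace_name = 'MY_TS'
union all
select
        file_id, block_id, blocks, 'free', 'free', null
from
        dba_free_space
where
        tablespace_name = 'MY_TS'
order by
        file_id, block_id
;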

When you move a table its indexes become unusable and have to be rebuilt; but when an index becomes unusable, the more recent versions of Oracle will drop the segment. Here’s a key point – if the index becomes unusable because the table has been moved, the segment is dropped only AFTER the move has completed. Pause a minute for thought and you realise that the smart thing to do before you move a table is to make its indexes unusable so that they release their space BEFORE you move the table. (This strategy is only relevant if you’ve mixed tables and indexes in the same tablespace and if you’re planning to do all your rebuilds into the same tablespace rather than moving everything into a new tablespace.)
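In outline, and using the t1 / t1_i1 names from the demonstration below, the sequence is simply this sketch:

alter index t1_i1 unusable;     -- segment dropped, space freed immediately
alter table t1 move;            -- the move can reuse the index's old space
alter index t1_i1 rebuild;      -- recreate the index once the table has moved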

Here are some outputs demonstrating the effect in a 12.1.0.2 database. I have created (and loaded) two tables in a tablespace of 1MB uniform extents, 8KB block size; then I’ve created indexes on the two tables. Running my ts_hwm.sql script I get the following results for that tablespace:


FILE_ID    BLOCK_ID   END_BLOCK OWNER      SEGMENT_NAME    SEGMENT_TYPE   
------- ----------- ----------- ---------- --------------- ------------------            
      5         128         255 TEST_USER  T1              TABLE 
                256         383 TEST_USER  T2              TABLE
                384         511 TEST_USER  T1_I1           INDEX
                512         639 TEST_USER  T2_I1           INDEX
                640      65,535 free       free

Notice that it was a nice new tablespace, so I can see the two tables followed by the two indexes at the start of the tablespace. If I now move table t1 and re-run the script this is what happens:


alter table t1 move;

FILE_ID    BLOCK_ID   END_BLOCK OWNER      SEGMENT_NAME    SEGMENT_TYPE
------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 free       free
                256         383 TEST_USER  T2              TABLE
                384         511 free       free
                512         639 TEST_USER  T2_I1           INDEX
                640         767 TEST_USER  T1              TABLE
                768      65,535 free       free

Table t1 is now situated past the previous tablespace highwater mark and I have two gaps in the tablespace where t1 and the index t1_i1 used to be.

Repeat the experiment from scratch (drop the tables, purge, etc. to empty the tablespace), but this time mark the index unusable before moving the table, and this is what happens:


FILE_ID    BLOCK_ID   END_BLOCK OWNER      SEGMENT_NAME    SEGMENT_TYPE
------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 free       free
                256         383 TEST_USER  T2              TABLE
                384         511 TEST_USER  T1              TABLE
                512         639 TEST_USER  T2_I1           INDEX
                640      65,535 free       free

Table t1 has moved into the space vacated by index t1_i1, so the tablespace highwater mark has not moved up.

If you do feel the need to reclaim space from a tablespace by rebuilding objects, you may find it quite hard to decide the order in which the objects should be moved or rebuilt to minimise the work you (or, rather, Oracle) have to do. If you remember that any table you move will release its index space anyway, and insert a step to mark those indexes unusable before you move the table, you may find it much easier to work out a good order for moving the tables.

Footnote:

I appreciate that some readers may already take advantage of the necessity of rebuilding indexes by dropping indexes before moving tables – but I think it’s a nice feature that we can now make them unusable and get the same benefit without introducing a risk of error when using a script to recreate an index we’ve dropped.

March 13, 2014

Shrink Space

Filed under: Bugs,deadlocks,fragmentation,Index Rebuilds,Indexing,Locks,Oracle — Jonathan Lewis @ 7:08 am GMT Mar 13,2014

Here’s an example of a nasty accident that can be seen in a slightly unusual output from v$lock (on 11.2.0.4).

Rather conveniently it demonstrates a comment I’ve often made about deadlocks not being “resolved” automatically, with the result that incorrectly written applications can end up with two sessions taking it in turns (see-sawing) to block each other.
(more…)

June 19, 2013

Wasted Space

Filed under: compression,fragmentation,Infrastructure,LOBs,Oracle — Jonathan Lewis @ 9:55 am BST Jun 19,2013

Here’s a little quiz: If I take the average row length of the rows in a table, multiply by the number of rows, and convert the result to the equivalent number of blocks, how can the total volume of data in the table be greater than the total number of blocks below the table high water mark ? I’ve got three tables in a schema, and they’re all in the same (8KB block, 1M uniform extent, locally managed) tablespace, but here’s a query, with results, showing their space utilisation – notice that I gather schema stats immediately before running my query:

(more…)

February 25, 2013

Free Space

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:36 pm GMT Feb 25,2013

Question – How can you have a single file in a single tablespace showing multiple free extents when there are no objects using any space in that file ? For example, from an 11.1.0.7 database:

(more…)

October 1, 2012

Row sizes 2

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:34 am BST Oct 1,2012

In an earlier post I showed you how you could generate SQL to analyze the distribution of row sizes in a table. In the introduction to the code I made a comment about how it failed to “allow for nulls at the end of rows”; a feature of Oracle storage that isn’t commonly known is that a set of consecutive null columns at the end of a row takes up no storage space, while any null columns followed by a non-null column take up one byte (holding the value 0xFF) per column so that Oracle can “count its way” through the null columns to the non-null column. Consider this example:
(more…)

August 27, 2012

Fragmentation ?

Filed under: fragmentation,Infrastructure,LOBs,Oracle — Jonathan Lewis @ 5:15 pm BST Aug 27,2012

Here’s a simple piece of SQL that could, in theory, compare the current size of a table with the size it might be after a call to “alter table move” – and it’s followed by the results for a table that’s currently in the database I’m looking at:

select
        blocks, num_rows, avg_row_len, pct_free,
        ceil(num_rows * avg_row_len / (8000 * ((100 - pct_free)/100))) blocks_needed
from
        user_tables
where
        table_name = 'T1'
;

    BLOCKS   NUM_ROWS AVG_ROW_LEN   PCT_FREE BLOCKS_NEEDED
---------- ---------- ----------- ---------- -------------
        25       1000          22         10             4
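To check the arithmetic: 1,000 rows at an average of 22 bytes is 22,000 bytes of data; the query assumes roughly 8,000 usable bytes per 8KB block, scaled down by the 10% pct_free to 7,200 bytes, and ceil(22,000 / 7,200) comes to 4 blocks – compared with the 25 blocks currently below the high water mark.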

(more…)
