Oracle Scratchpad

September 11, 2016

Basicfile LOBs 6

Filed under: Infrastructure,LOBs,Oracle — Jonathan Lewis @ 7:48 pm BST Sep 11,2016

One of the nice things about declaring your (basicfile) LOBs as “enable storage in row” is that the block addresses of the first 12 chunks will be listed in the row and won’t use the LOB index, so if your LOBs are larger than 3,960 bytes but otherwise rather small the LOB index will hold only the timestamp entries for deleted LOBs. This makes it just a little easier to pick out the information you need when things behave strangely, so in this installment of my series I’m going to talk about an example with storage enabled in row.

I’m going to demonstrate an issue that is causing a major problem. First I’m going to build a table with a LOB column with multiple (4) freepools – because that’s what you do to handle concurrency – then I’m going to start 4 sessions (carefully checking that I have one associated with each free pool) and do a few thousand inserts with commits from each session. The size of the LOB value I insert will be 20KB so it will be “indexed” in the row but stored out of the row taking 3 LOB blocks.

Once I’ve got the data in place I’m going to use three of the sessions to delete three quarters of the rows from the table then use a call to the dbms_space package to show you that the segment contains virtually no free space. I’ve engineered the code so that it will take just three more rows in the table to fill the available free space and force Oracle either to allocate a new extent or to start reclaiming some of the deleted (reusable) LOB space – and I’m going to run that insert from the session that DIDN’T delete any rows.

I’ve been running these tests on 11.2.0.4, but get similar behaviour on 12c.

create table t1(
        id      number constraint t1_pk primary key,
        c1      clob
)
lob (c1)
store as 
    basicfile 
    text_lob(
            enable storage in row
            chunk 8k
            retention
            nocache
            logging
            freepools 4
            tablespace test_8k_assm
)
;

declare
        m_v1 varchar2(32767) := rpad('X',20000,'X');
begin
        for i in 0..0 loop
                insert into t1 values (i, m_v1);
                commit;
        end loop;
end;
/

truncate table t1
;
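The insert loops further down use a sequence called s1 that I haven’t shown being created; something as simple as the following (the cache size is an arbitrary choice) is all that’s needed:

create sequence s1 cache 1000;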

You’ll notice I’ve used the retention keyword.  Before I built the LOB I set my undo_retention to 10 seconds so that the space from deleted LOBs should become available for reuse very quickly. The name of the tablespace I’ve used for the LOB is a clue that I’m using an 8KB block size and ASSM (the latter is a requirement of the retention option).
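For the record, the environment setup behind that looks something like the following sketch – the datafile name and size are hypothetical, and the 1MB uniform extent is an inference from the extent counts reported further down, but the 8KB block size and ASSM match the description:

alter system set undo_retention = 10;

create tablespace test_8k_assm
        datafile '/u01/oradata/test/test_8k_assm.dbf' size 2g   -- hypothetical name and size
        blocksize 8k
        extent management local uniform size 1m                  -- uniform 1MB inferred, not quoted
        segment space management auto
;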

Here’s the code to check which freepool (0 to 3) a session will be associated with (this isn’t documented, but seems to be correct):


select mod(pid,4) from v$process where addr = (
        select paddr from v$session where sid = (
                select sid from v$mystat where rownum = 1
        )
)
;

So I can keep starting sessions and running that query until I’ve got a session covering each freepool. (The first time I tried this I had to start 7 sessions before I got all 4 freepools covered). Now I can run the following from all 4 sessions concurrently:

define m_loop_counter = 12027

lock table t1 in row share mode;
commit;

declare
        m_v1 varchar2(32767) := rpad('x',20000,'x');
begin
        for i in 1..&m_loop_counter loop
                insert into t1 values (s1.nextval, m_v1);
                commit;
        end loop;
end;
/

The purpose of the lock table command is to ensure that all 4 processes start running simultaneously. From a fifth session I execute a “lock table t1 in exclusive mode” before starting the other four sessions running, so they all queue on the exclusive lock; then I commit from the fifth session and away we go. The whole thing took about 30 seconds to run. The rather random-looking value 12,027 was a careful choice to ensure that the last extent in the segment had just a few blocks left – and I used my “dbms_space_use.sql” script to check this, getting the following output:

====
ASSM
====

Unformatted                   :        7 /       57,344
Freespace 1 (  0 -  25% free) :        0 /            0
Freespace 2 ( 25 -  50% free) :        0 /            0
Freespace 3 ( 50 -  75% free) :        0 /            0
Freespace 4 ( 75 - 100% free) :        0 /            0
Full                          :  144,324 / ############

PL/SQL procedure successfully completed.

=======
Generic
=======
Segment Total blocks: 145536
Object Unused blocks: 0

I’ve got 7 “unformatted” blocks in the segment – though in fact these might be “formatted but free” from the perspective of the LOB code.
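For reference, the heart of a script like dbms_space_use.sql is nothing more exotic than a couple of calls to the dbms_space package. The following is a stripped-down sketch rather than the script itself, and it assumes the LOB segment TEXT_LOB belongs to the current user:

set serveroutput on

declare
        m_unformatted_blocks    number;         m_unformatted_bytes     number;
        m_fs1_blocks            number;         m_fs1_bytes             number;
        m_fs2_blocks            number;         m_fs2_bytes             number;
        m_fs3_blocks            number;         m_fs3_bytes             number;
        m_fs4_blocks            number;         m_fs4_bytes             number;
        m_full_blocks           number;         m_full_bytes            number;
        m_total_blocks          number;         m_total_bytes           number;
        m_unused_blocks         number;         m_unused_bytes          number;
        m_last_file_id          number;         m_last_block_id         number;
        m_last_block            number;
begin
        --      ASSM breakdown of block states for the LOB segment
        dbms_space.space_usage(
                segment_owner           => user,
                segment_name            => 'TEXT_LOB',
                segment_type            => 'LOB',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks  => m_fs1_blocks,  fs1_bytes  => m_fs1_bytes,
                fs2_blocks  => m_fs2_blocks,  fs2_bytes  => m_fs2_bytes,
                fs3_blocks  => m_fs3_blocks,  fs3_bytes  => m_fs3_bytes,
                fs4_blocks  => m_fs4_blocks,  fs4_bytes  => m_fs4_bytes,
                full_blocks => m_full_blocks, full_bytes => m_full_bytes
        );

        dbms_output.put_line('Unformatted : ' || m_unformatted_blocks || ' / ' || m_unformatted_bytes);
        dbms_output.put_line('Full        : ' || m_full_blocks        || ' / ' || m_full_bytes);

        --      Generic segment totals
        dbms_space.unused_space(
                segment_owner             => user,
                segment_name              => 'TEXT_LOB',
                segment_type              => 'LOB',
                total_blocks              => m_total_blocks,
                total_bytes               => m_total_bytes,
                unused_blocks             => m_unused_blocks,
                unused_bytes              => m_unused_bytes,
                last_used_extent_file_id  => m_last_file_id,
                last_used_extent_block_id => m_last_block_id,
                last_used_block           => m_last_block
        );

        dbms_output.put_line('Segment Total blocks: ' || m_total_blocks);
        dbms_output.put_line('Object Unused blocks: ' || m_unused_blocks);
end;
/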

After going to sessions 0, 1, and 3 and deleting 12,000 rows from each in turn (and committing, leaving a total of 12,108 rows in the table) the report doesn’t change: I haven’t made any space free, I’ve simply flagged it in the LOB index as “reusable”. So now we go to session 2 and run the following code 3 times – with “set timing on”:


SQL> l
  1  declare
  2     m_v1 varchar2(32767) := rpad('x',20000,'x');
  3  begin
  4     for i in 1..1 loop
  5             insert into t1 values (s1.nextval, m_v1);
  6             commit;
  7     end loop;
  8* end;

The first run took 0.02 seconds – and the unformatted count dropped to 4

The second run took 0.01 seconds – and the unformatted count dropped to 1

The third run took 10.74 seconds, of which 9 seconds was CPU. The session generated 500,000 redo entries totalling 100MB of redo from 1 million db block changes after doing 8.4 million logical I/Os, issuing 108,000 enqueue (lock) requests and running 108,000 index range scans. The report of space usage ended up looking like this:


Unformatted                   :  108,125 /  885,760,000
Freespace 1 (  0 -  25% free) :        0 /            0
Freespace 2 ( 25 -  50% free) :        0 /            0
Freespace 3 ( 50 -  75% free) :        0 /            0
Freespace 4 ( 75 - 100% free) :        0 /            0
Full                          :   36,333 /  297,639,936

PL/SQL procedure successfully completed.

=======
Generic
=======
Segment Total blocks: 145664
Object Unused blocks: 0

My session has cleared every single piece of re-usable space from the LOB and made it free (unformatted) before allocating space for its one LOB. (That’s going to hurt when the client has 2 million LOBs on the reusable list and isn’t running everything on SSDs – which is why I’m working on this problem).

If you’re wondering why it takes so much redo and so many buffer visits to free 36,000 LOBs, this (roughly) is what Oracle does to free up one reusable LOB of 3 blocks – which corresponds to a single index entry carrying three block ids:

  • Find the lowest index entry in the freepool, pin the index leaf block
  • Identify the last block in the list of 3
  • Lock the relevant L1 space management block for the segment and set the relevant “bit” to “unformatted”
  • Delete the index entry
  • Re-insert the index entry with one block id removed
  • Commit and unlock the L1 bitmap block
  • Repeat the delete/insert cycle for the 2nd block id
  • Repeat the cycle for the 3rd (or 1st, since we’re going backwards) block id – but this time don’t re-insert the index entry

Oracle reclaims one block (chunk) at a time. And that’s a bit of a clue to a possible workaround, because event 44951 gets mentioned a couple of times in MoS and on the internet as a workaround to a particular problem of HW enqueue waits for LOBs. MoS note 740075.1 tells us:

When using Automatic Segment Space Management (ASSM), and the fix for Bug 6376915 has been applied in your database (Included in 10.2.0.4 +) it is possible to adjust the number of chunks that are cleaned up when the chunk cleanup operation is required.

This can be enabled by setting event 44951 to a value between 1 and 1024 (default is 1). With the value between 1 and 1024 setting the number of chunks to be cleaned up each time a chunk reclamation operation occurs. This can therefore reduce the number of requests for the High Watermark Enqueue.
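For reference, setting the event at a given level in the parameter file uses the standard event syntax – the following is just a sketch (with an spfile you would set the same string through “alter system set event = … scope=spfile” and then bounce the instance):

#  init.ora / pfile entry (level 5 matches the experiment described below)
event = "44951 trace name context forever, level 5"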

Other notes explain that by default only one chunk is cleaned up at a time – which is exactly the behaviour I’m seeing. So what happens when I bounce the database with this event set at level 5 (an arbitrary choice, but larger than the number of chunks in the LOBs I’ve been inserting) in the parameter file and repeat the experiment? On the first attempt it made no difference, but then I changed the experiment slightly and started again. Initially I had done my first re-insert from the one session that hadn’t deleted any rows – which made it an extreme boundary condition; on the second attempt I deleted two rows from the session that had not yet deleted any data (and waited for the retention time to elapse) before doing the inserts from that session.

Deleting two rows would put 6 blocks (in two index entries) onto my re-usable list, so I was starting the inserts with 7 free blocks, 6 reusable blocks and the event set to level 5. Here’s what I saw as I inserted rows one by one.

  • Insert one row: “Unformatted” blocks went up to 9:  I had freed 5 of the reusable blocks then used 3 of them for my lob (7 + 5 – 3 = 9)
  • Insert one row: “Unformatted” blocks went down to 7: I had freed the last reusable block then used 3 blocks for my lob (9 + 1 – 3 = 7)
  • Insert one row: “Unformatted” blocks went down to 4
  • Insert one row: “Unformatted” blocks went down to 1
  • Insert one row: Oracle cleared all the reusable space (11 seconds, 500MB redo), then added an extent (!) to the segment and used 2 of its blocks for part of the new LOB.

So the event isn’t really connected with my problem – though it adds some efficiency to the processing – and my “boundary condition” is one that’s likely to occur fairly frequently if you’ve got a basicfile LOB defined with multiple freepools. Fortunately it’s probably going to require two pre-conditions before it’s a big problem: first, that you’re handling a large number of LOBs; second, that your pattern of inserting and deleting is not symmetric – it’s when you use a large number of concurrent sessions for small batches of inserts but a single session for large bulk deletes that all hell can break loose shortly after a delete.

tl;dr

As with many other features of Oracle, skew plays a part in making things break. If you’re doing lots of inserts and deletes of basicfile lobs make sure the mechanisms you use for inserting and deleting look similar: in particular similar numbers of processes to do similar amounts of work for both operations.

 

P.S. It gets worse.

P.P.S. Don’t even start to think that you can work around this by using securefiles.

P.P.P.S. I got a hint from one test that if a reusable LOB is exactly the same size as the LOB being inserted then Oracle very cleverly takes the existing index entry and rewrites it to be the index entry for the new LOB, rather than freeing (and then potentially re-using) the space it identifies.

 

9 Comments »

  1. Great post. I can’t wait for you to start exploring SecureFiles. Could you elaborate on why you say securefiles still have problems?

    Comment by Christo Kutrovsky — September 12, 2016 @ 7:19 pm BST Sep 12,2016 | Reply

    • Christo,

      Running similar tests (small concurrent inserts, single massive deletes) on securefile LOBs I ended up seeing massive contention for DW and WG locks – the “in memory dispenser” and “write gather” enqueues. This is even with the hidden parameter _securefiles_concurrency_estimate set to its maximum value of 50. I haven’t gone into it in any great detail but I think the problems might be bypassed if I hash partition the object and put each partition in a separate tablespace – the same is true of the basicfile problems, of course, but we’re talking about quite a lot of partitions.

      Comment by Jonathan Lewis — September 12, 2016 @ 7:30 pm BST Sep 12,2016 | Reply

      • I wonder how the performance of Oracle SecureFiles compares to file-system based file allocation/de-allocation. They must have similar scalability challenges. This is specifically about continuous creation/deletion .

        Comment by Christo Kutrovsky — September 12, 2016 @ 7:53 pm BST Sep 12,2016 | Reply

  2. Not that it would fix any securefiles problems, but the _securefile_timers=TRUE setting + v$securefile_timer might help to drill down into some of the securefile internal issues…

    Comment by Tanel Poder — September 12, 2016 @ 8:34 pm BST Sep 12,2016 | Reply

  3. In the old freelist days of heap table management, I remember that an inserting session would sift through up to 5 blocks on the freelist, looking for a block with enough space for the insert (below 100%-PCTFREE?) and after 5 blocks it would give up and add an extent. So the entire job of cleaning up the freelist didn’t fall to one unlucky session.

    It seems Oracle could do with something similar here.

    Similarly to running a FTS SELECT to try to poke your session into doing delayed block cleanout on a table, I wonder if there’s a way to take what you’ve found here to run a script to provoke it to do this cleanup in a batch job where it won’t bite the live production sessions. Or include it in the script that does the deletes.

    Comment by Jason Bucata — September 13, 2016 @ 2:34 pm BST Sep 13,2016 | Reply

    • Jason,

      The rule was simply that the insert wouldn’t take the block over pctused – if it would then the block was taken off the freelist and the next block examined. So a long row was allowed to knock 5 blocks off the free list that might have been able to accept the next inserted row that wasn’t so long – which is partly why Oracle introduced the ASSM mechanism.

      The situation with freepools isn’t a perfect analogue – because Oracle CAN use the blocks it takes off the reusable list, so it shouldn’t need to add space. However the idea of a hard limit for one pass of the mechanism would be helpful; an alternative would be to have a limit on the number of chunks a process adds to its own freepool when deleting, before it starts adding them to other freepools.

      For a forced clean-up there is a “rebuild freepools” command, but this locks the table (for no good reason as far as I can see, because it does exactly the same as the spontaneous clean-up). I had thought of coding a rebuild immediately after the delete – but it would also have taken 45 minutes (minimum) for the big job, and there was no guarantee that the processes that might insert data wouldn’t cause application problems if they timed out on inserts. Unfortunately the delete was allowed to overlap with the insert.
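      (For reference, the command is of the form sketched below, using the table and column from the test case earlier rather than anything from the client system:)

          alter table t1 modify lob (c1) (rebuild freepools);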

      Comment by Jonathan Lewis — September 13, 2016 @ 7:22 pm BST Sep 13,2016 | Reply

  4. Before loading LOBs from concurrent sessions, I had a few doubts over how Oracle deals with space management for securefile LOBs. The FREEPOOLS and FREELISTS storage parameters are ignored for securefile LOBs, so before conducting a concurrency test with securefile LOBs I just wanted to see how many blocks/extents are allocated to insert one row of the same size as the basicfile LOB.

    ============
    Secure files
    ============
    
     Segment Blocks:            152 Bytes:        1,245,184
     Used Blocks:                72 Bytes:          589,824
     Expired Blocks:             80 Bytes:          655,360
     Unexpired Blocks:            0 Bytes:                0
    
    PL/SQL procedure successfully completed.
    
    ===============
    Generic details
    ===============
    Segment Total blocks:          152
    Object Unused blocks:            0
    
    PL/SQL procedure successfully completed.
    
    SQL> select count(1) from t1;
    
      COUNT(1)
    ----------
             1
    
    I see 152 blocks were allocated for inserting one row. Also noticed the truncate still keeps 16 blocks  with 5 blocks as unused. 
    
    SQL> truncate table t1;
    
    Table truncated.
    
    SQL> @dbms_space_sf.sql test text_lob lob
    ============
    Secure files
    ============
    
     Segment Blocks:             16 Bytes:          131,072
     Used Blocks:                11 Bytes:           90,112
     Expired Blocks:              5 Bytes:           40,960
     Unexpired Blocks:            0 Bytes:                0
    
    PL/SQL procedure successfully completed.
    
    ===============
    Generic details
    ===============
    Segment Total blocks:           16
    Object Unused blocks:            5
    
    PL/SQL procedure successfully completed.
    

    Comment by dbabible — September 22, 2016 @ 3:05 pm BST Sep 22,2016 | Reply

  5. dbabible,

    I tend to use uniform extents rather than system-allocated, which, I think, is what you’re using. It’s an interesting oddity that you got to 152 blocks: I’d guess Oracle has filled in some holes in the tablespace and given you three extents at 8 blocks (64K) and one extent at 128 blocks (1MB). If it’s not filling holes but has done that in a clean tablespace I guess we just chalk it up as an oddity. When using LOBs it makes sense to keep the LOB segment in its own tablespace with a large uniform extent – I tend to go for 8MB.

    I haven’t examined Securefile LOBs closely yet – just closely enough to make them run into massive contention when modelling my latest client’s requirements – but I did note that the minimum allocation if you force segment instantiation is 11 blocks which are, in order:

    
          1 type: 0x45=NGLOB: Lob Extent Header
          1 type: 0x3f=NGLOB: Segment Header
          9 type: 0x3d=NGLOB: Hash Bucket
    
    

    Securefiles use a hashing mechanism for locating data rather than an index method – hence the pre-allocated buckets. I would guess that extra hash buckets would be allocated as the segment grows larger. You will eventually also see:

    
      type: 0x3e=NGLOB: Committed Free Space
      type: 0x40=NGLOB: Persistent Undo
      type: 0x28=PAGETABLE MANAGED LOB BLOCK
    
    

    The allocation of 152 blocks isn’t something to be worried about – you can always see odd events at boundary cases, and 1 row inserted is a boundary. ASSM tries to scatter inserts to avoid contention even with ordinary row inserts, and with Securefile LOBs there’s an “assumed approximate concurrency” setting which may explain why Oracle created 152 blocks, and preformatted so many of them. The “expired” blocks are probably just “not yet formatted”, by the way.

    Comment by Jonathan Lewis — September 22, 2016 @ 9:29 pm BST Sep 22,2016 | Reply

