Oracle Scratchpad

January 19, 2011

ASSM Again

Filed under: ASSM,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:21 pm GMT Jan 19,2011

While checking my backlog of drafts (currently 75 articles in note form) I came across this one from August 2009 and was a little upset that I hadn’t finished it sooner – it’s a nice example of geek stuff that has the benefit of being useful.

From the Oracle newsgroup comp.databases.oracle.server, here’s an example of how to recreate a performance problem due to maintenance on ASSM bitmaps in 10.2.0.4.

Create a table in a tablespace with an 8KB block size, locally managed tablespace with uniform 1MB extents, and automatic segment space management (ASSM). Check the newsgroup thread if you want complete details on reproducing the test:

Session 1: Insert 100,000 rows of about 490 bytes into a table using a pl/sql loop and commit at end.
Session 1: Insert 1,000 rows into the table with single SQL inserts and no commits
Session 1: delete all data from the table with a single statement – but do not commit

Session 2: Insert 1,000 rows into the table with single SQL inserts and no commits – it’s very slow.

As one person on the thread pointed out – it looks as if Oracle is doing a full tablescan of the table, one block at a time showing “db file sequential read” waits all the way through the table. (If your db_cache_size is large enough you might not see this symptom).

I simplified the test – inserting just 100,000 of the rows (with the commit), then deleting them all (without the commit), then inserting one row from another session. Taking a snapshot of x$kcbsw and x$kcbwh, I got the following figures for the activity that took place inserting that one extra row (this was on Oracle 10.2.0.3):

---------------------------------
Buffer Cache - 23-Dec 11:50:36
Interval:-  0 seconds
---------------------------------
          Why0          Why1          Why2    Other Wait
          ----          ----          ----    ----------
         1,457             0             0             0 ktspfwh10: ktspscan_bmb
             8             0             0             0 ktspswh12: ktspffc
             1             0             0             0 ktsphwh39: ktspisc
         7,061             0             0             0 ktspbwh1: ktspfsrch
             1             0             0             0 ktuwh01: ktugus
         7,060             0             0             0 ktuwh05: ktugct
             1             0             0             0 ktuwh09: ktugfb
             2             0             0             0 kdswh02: kdsgrp
             2             0             0             0 kdiwh06: kdifbk
             2             0             0             0 kdiwh07: kdifbk
          ----          ----          ----    ----------
        15,595             0             0             0 Total: 10 rows

The figures tell us how much work Oracle has to do to find a table block that could accept a new row. The idea is simple – Oracle checks the first “level 3” bitmap block (which is actually part of the segment header block) to find a pointer to the current “level 2” bitmap block; it checks this level 2 bitmap block to find a pointer to the current “level 1” bitmap block; and finally it checks the “level 1” bitmap block to find a pointer to a data block that shows some free space.

Unfortunately every block in our table is apparently empty – but that’s only going to be true once session 1 commits. In this version of Oracle the blocks are all visible as “x% free” in the level 1 bitmaps – but when Oracle visits each block (“ktspbwh1: ktspfsrch”) it checks the ITL entry, which points it to the transaction table slot in the related undo segment header block to Get the Commit Time for the transaction (“ktuwh05: ktugct”) and finds that the transaction is not committed so the space is not really free. So Oracle has to visit the next block shown as free in the bitmap.

In our “bulk delete / no commit” case, we end up visitng every (or nearly every) block in the entire table before we find a block we can actually use – and, given the nature of the ASSM bitmap implementation, the order of the block visits is the “natural” table order, so we see something that looks like a full tablescan operating through single blocks reads (apart, perhaps, from a few blocks that are still cached).

I can’t explain why we do 1,457 visits to bitmap blocks (“ktspfwh10: ktspscan_bmb”) in this version of Oracle, but perhaps it’s simply an indication that Oracle picks (say) five “free block” entries from the bitmap block each time it visits it and therefore has to visit each bitmap block about 12 times if it doesn’t find a data block with space that really is free in it search. This may be a compromise between two possible extremes in “bad” cases – having to hold it while it checking every single entry (causing long buffer busy waits), or having to get it again after every single failed check (potentially causing Cache Buffers Chains problems).

Note – these results will be hugely dependent on version of Oracle – in an earlier version of Oracle the bitmap blocks were not updated by the delete until some time after the commit – and this variation of delayed block cleanout produced other unpleasant anomalies; and, just to make like difficult in later versions of Oracle, the x$kcbsw / x$kcbwh objects are not populated properly in 11g.

Footnote: In case you hadn’t realised, ASSM is a mechanism aimed at OLTP systems with a reasonable degree of concurrency – so it’s not too surprising that you can find problems and weak spots if you hit it with processing which is biased towards the DW and batch processing end of the spectrum.

Update (April 2015)

The problem has been resolved in 11.2.0.4 – a thread that came up on the OTN database forum recently looked as if it might have been an example of the problem, and the owner of the problem originally described in the newsgroup joined in the discussion, read the note I had made linking back to this blog, and re-ran their test case in 11.2.0.4 and 11.2.0.3, reporting that the problem was still there in 11.2.0.3 but gone in 11.2.0.4.

 

 

11 Comments »

  1. Nice post Jonathan.

    You seem to often encounter ASSM issues. Would a general advice be to use freelists in datawarehouse style workloads?

    With bigfile tablespaces supporting ASSM only, it appears Oracle is trying to push exclusively towards ASSM. Your opinion?

    Comment by Christo Kutrovsky — January 19, 2011 @ 7:17 pm GMT Jan 19,2011 | Reply

    • Would stick with ASSM and AUTOALLOCATE, not UNIFORM, even for DW workloads. The ASSM issues that I have seen are bugs (like bug 6918210, now fixed) or cases that are very uncommon in most production environments (like this one). My guess is if you tried hard enough, you could probably find some edge condition with most anything. However, by doing so, you may run into some other issue that you normally would not have. If you do run into an issue, make adjustments as necessary, but I’d do that on an exception basis, not a broad brush recommendation.

      Comment by Greg Rahn — January 19, 2011 @ 11:10 pm GMT Jan 19,2011 | Reply

      • Greg,

        I wouldn’t call this case “very uncommon”; it’s nothing but a big delete followed by a small insert. However, it’s the sort of thing which might not be noticed very often, and might cause bafflement in the few cases where it is seen – so it’s worth describing the effect and the reason.

        Despite the fact that I keep describing and explaining the edge conditions (that I seem to run into quite frequently), I think you’ll find that my reply to Christo is in agreement with your last sentence. Stick with the Goldlocks option until you have to work around the special case.

        Comment by Jonathan Lewis — January 21, 2011 @ 5:37 pm GMT Jan 21,2011 | Reply

    • Christo,

      As a general principle I like to stick to Oracle defaults as much as possible and only do something non-standard if I know that I have to work in a special environment, or if I have had previous experience with bugs, or suspect that there is an incompatability between a feature and what I want to do.

      For example, I would assume that ASSM for an OLTP is a perfectly sensible approach – I know that I have come across (a couple of) sites that had concurrency trouble with ASSM because it doesn’t support the level of concurrency they need, so I know how to recognise the threat – so I can choose to use freelists if necessary. I would, for example, use my knowledge of the effect I mention in this article to work out how to address the problem it demonstrates in the rare case that it appears.

      Since LOBs have been mentioned – that’s another case where I would be a little careful about ASSM – especially since LOBs are Large Objects, which means painful to recover – because of bugs in the past. I’d check versions, bug fixes and expected activity very carefully before deciding whether to use freelists or ASSM for LOBs.

      Comment by Jonathan Lewis — January 21, 2011 @ 5:31 pm GMT Jan 21,2011 | Reply

  2. Jonathan,

    I think that there were actually two related threads at about the same time on the comp.databases.oracle.server Usenet group. If I recall correctly, I found that Oracle Database 11.1.0.7 exhibited the same behavior as 10.2.0.4. I could not quite make as much sense of the test results as you, but I did try to clean up the test case and include it in a blog article in case anyone was able to more accurately decipher the results:
    http://hoopercharles.wordpress.com/2010/01/28/database-using-assm-tablespace-exhibits-slow-insert-performance-after-an-uncommitted-delete/

    Comment by Charles Hooper — January 19, 2011 @ 7:24 pm GMT Jan 19,2011 | Reply

    • Charles,

      I usually include URLs back to the sources in my draft notes – but I hadn’t this time. I had completely forgotten the volume of information in the CDO conversations you’ve linked to from your blog of the same problem. Thanks for linking back.

      Comment by Jonathan Lewis — January 21, 2011 @ 5:48 pm GMT Jan 21,2011 | Reply

  3. ASSM is also much slower than freelist when it comes to insert/update on CLOB. And to make things worse, CLOB mechanism seems to be different than BLOB (CLOB operations are so much slower than BLOB). I played extensively with the CACHE/NOCACHE, LOGGING/NOLOGGING, PCTVERSION/RETENTION in Freelist and ASSM and found that the winning combination is Freelist + CACHE while NOLOGGING decrease a bit the redo amount generated (but no so much) and RETENTION is a must to avoid with mass update. Inlining CLOB or not does very little in term of DML operation on the CLOB itself (It have of course impact on density for all columns in the block if you leave big CLOB in the block).

    Comment by Bernard Polarski — January 20, 2011 @ 8:44 am GMT Jan 20,2011 | Reply

    • Bernard,

      I think the most important point in that list is that you do have to think and test carefully; but our tastes in LOB configuration are similar – I tend to favour CACHE (usually with an isolated KEEP or RECYCLE cache) or CACHE READ.

      The BLOB/CLOB thing might be related to the conversion that external to internal character sets that can take place with CLOBS – and can double the internal size of the external CLOB, of course.

      Comment by Jonathan Lewis — January 21, 2011 @ 5:22 pm GMT Jan 21,2011 | Reply

  4. […] good that Jonathan Lewis is now blogging more and more and this benefits the Oracle community a lot with new and novel ideas […]

    Pingback by Log Buffer #212, A Carnival of the Vanities for DBAs | The Pakistani Spectator — January 22, 2011 @ 6:20 am GMT Jan 22,2011 | Reply

  5. Hi Jonathan
    we are running a 50 TB core banking database. on some of the transaction collecting tables we faced enq: FB contention while doing mass inserts on these tables ( this was with 30gb extents pre-allocated).
    on researching i found that ASSM serializes access to data blocks by placing multiple sessions on an enqueue. Recieved patch from oracle but of no use.
    have decided to switch to freelist managed tables.

    the tables are truncated daily at end of day.

    Concurrently each tables is inserted upon by 140 sessions and fill upto around 30GB in the day.

    Running 3 instance RAC setup on 11.2.0.1. DB size = 50 TB, Redo volume generation = 1.7 to 2 TB per day, 1500 to 1800 TPS.

    Have decided to convert the tablespaces to
    1. Freelist managed by putting 140 freelists on each IG table & its index ( total there are 7 such tables)
    2. Adding 3 Freelist Groups to each table ( due to 3 instance RAC)
    3. Pre-allocating extents of 30GB.
    4.Setting PCTUSED and PCTFREE both at 90.
    5.Running dbms_repair.repair_freelists on tables and indexes before start of day so that free blocks are distributed equally in them.

    Requesting you to vet the above 5 point approach and suggest changes if any.

    Thank you for the great work you’ve been doing for the Oracle community.

    Best Regards
    Dhairyasheel Tawade
    Oracle – ACS – India.

    Comment by Dhairyasheel Tawade — October 1, 2012 @ 6:37 am BST Oct 1,2012 | Reply

  6. […] ASSM again (bulk delete / no commit) […]

    Pingback by Oracle Database – db file sequential read Wait Event – quoellfrisch — November 22, 2018 @ 3:35 pm GMT Nov 22,2018 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: