Oracle Scratchpad

January 30, 2017

ASSM Help

Filed under: 12c,Oracle,Troubleshooting — Jonathan Lewis @ 12:33 pm GMT Jan 30,2017

I’ve written a couple of articles in the past about the problems of ASSM spending a lot of time trying to find blocks with usable free space. Without doing a bit of rocket science with some x$ objects, or O/S tracing for the relevant calls, or enabling a couple of nasty events, it’s not easy proving that ASSM might be a significant factor in a performance problem – until you get to 12c Release 2 where a staggering number of related statistics appear in v$sysstat.
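
If you want to check the figures for your own session, a query against v$mystat and v$statname is all it takes. Here’s a minimal sketch, assuming you only want the statistics whose names start with “ASSM” and that have moved away from zero:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic#   = sn.statistic#
and     sn.name         like 'ASSM%'
and     ms.value        != 0
order by
        sn.name
;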

I’ve published the full list of statistics (without explanation) at the end of this note, but here’s just a short extract showing the changes in my session’s ASSM stats due to a little PL/SQL loop inserting 10,000 rows, one row at a time into an empty table with a single index:

Name                                  Value
----                                  -----
ASSM gsp:get free block                 185
ASSM cbk:blocks examined                185
ASSM gsp:L1 bitmaps examined            187
ASSM gsp:L2 bitmaps examined              2
ASSM gsp:Search hint                      2
ASSM gsp:good hint                      185

It looks like we’ve checked a couple of “level 2” bitmap blocks (one for the table, one for the index, presumably) to pick a sequence of “level 1” bitmap blocks that have been very good at taking us to a suitable data (table or index) block that can be used.

You might have expected to see numbers more like 10,000 in the output, but remember that PL/SQL has lots of little optimisations built into it and one of those is that it pins a few blocks while the anonymous block is running so it doesn’t have to keep finding blocks for every single row.
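
For reference, the driving loop was nothing more sophisticated than the following sort of thing. This is only a sketch: the table name, column definitions and single final commit are my assumptions rather than a copy of the original script:

create table t1 (id number, padding varchar2(40));
create index t1_i1 on t1(id);

begin
        for i in 1..10000 loop
                insert into t1 values (i, rpad('x',40));
        end loop;
        commit;
end;
/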

In comparison, here’s the effect of the same data load when operated as 10,000 separate insert statements called from SQL*Plus:

Name                                  Value
----                                  -----
ASSM gsp:get free block              10,019
ASSM cbk:blocks examined             10,220
ASSM cbk:blocks marked full             201
ASSM gsp:L1 bitmaps examined         10,029
ASSM gsp:L2 bitmaps examined              6
ASSM gsp:L2 bitmap full                   1
ASSM gsp:Search all                       1
ASSM gsp:Search hint                      2
ASSM gsp:Search steal                     1
ASSM gsp:bump HWM                         1
ASSM gsp:good hint                   10,016
ASSM rsv:fill reserve                     1

It’s interesting to note that in this case we see (I assume) a few cases where we’ve done the check for an L1 bitmap block, gone to a data block that was apparently free, and discovered that our insert would make it too full – hence the 201 “blocks marked full”.

Critically, of course, this is just another of the many little indications of how “client/server” chatter introduces lots of little bits of extra work when compared to the “Thick DB” approach.
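
For completeness, the second test can be driven by something as crude as spooling 10,000 insert statements to a file and then running the file from SQL*Plus. This is just a sketch of the sort of mechanism I have in mind, not the original driver:

set pagesize 0 feedback off trimspool on
spool insert_rows.sql

select 'insert into t1 values (' || level || ', rpad(''x'',40));'
from    dual
connect by level <= 10000;

spool off

@insert_rows.sql
commit;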

One final set of figures. Going back to an example that first alerted me to the type of performance catastrophe that ASSM could contribute to, I re-ran my test case on 12.2 and checked the ASSM figures reported. The problem was that a switch from a 4KB or 8KB block size to a 16KB block size produced a performance disaster. A version of my test case and some timing results are available on Greg Rahn’s site.

In my test case I have 830,000 rows and do an update that sets column2 to column1, changing it from null to an 8-digit value. With a 16KB block size and PCTFREE set to a highly inappropriate value (in this case the default of 10) this is what the new ASSM statistics look like (a sketch of the test itself follows the figures):


Name                                  Value
----                                  -----
ASSM gsp:get free block             668,761
ASSM cbk:blocks examined            671,404
ASSM cbk:blocks marked full           2,643
ASSM gsp:L1 bitmaps examined      1,338,185
ASSM gsp:L2 bitmaps examined        672,413
ASSM gsp:Search all                     332
ASSM gsp:Search hint                668,760
ASSM gsp:Search steal                   332
ASSM gsp:bump HWM                       332
ASSM wasted db state change         669,395
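
For reference, here’s a minimal sketch of the sort of test involved. The table and column names and the method of generating 830,000 rows of 8-digit values are my assumptions rather than the original script, and the tablespace holding the table would need a 16KB block size:

create table t1 (column1 number(8), column2 number(8)) pctfree 10;

insert into t1 (column1)
select  trunc(dbms_random.value(10000000, 100000000))
from    dual
connect by level <= 830000;

commit;

update t1 set column2 = column1;
commit;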

I’d love to know what the figures would have looked like if they had been available in the original Oracle 9.2.0.8 case (my guess is that the “blocks examined” statistic would have been in the order of hundreds of millions); they look fairly harmless in this case even though the database (according to some of the other instance activity stats) did roughly 10 times the work you might expect from a perfect configuration.

Even here, though, where the original catastrophic bug has been addressed, the ASSM stats give you an important clue: we’ve been doing a simple update, so why have we even been looking for free space (get free block)? Even stranger, how come we had to examine 1.3M L1 bitmaps when we’ve only updated 830,000 rows – surely the worst case scenario shouldn’t have been worse than 1 to 1? And then there’s that “wasted db state change” – I don’t understand exactly what that last statistic is telling me, but when I’m worried about performance I tend to worry about anything that’s being wasted.

In passing – if you want to insert a single row into an unindexed table you can expect Oracle to examine the segment header, then an L2 bitmap block, then an L1 bitmap block to find a data block for the insert. (In rare cases that might be segment header, L3, L2, L1.) There are then optimisation strategies for pinning blocks – the session will pin the L1 bitmap block briefly because it may have to check several of the data blocks that the L1 references, since they may turn out to be full even though they are flagged as having space; similarly the session will pin the L2 bitmap block because it may need to mark an L1 bitmap block as full and then check another L1 block. The latter mechanism probably explains why we have examined more L1 bitmaps than L2 bitmaps.
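
As an aside, if you want to see the summary that ASSM keeps of a segment’s blocks you can call dbms_space.space_usage, which reports the number of blocks in each of the “freeness” buckets that ASSM tracks. A minimal sketch, with the table name as a placeholder:

set serveroutput on

declare
        m_unf   number; m_unf_b  number;
        m_fs1   number; m_fs1_b  number;
        m_fs2   number; m_fs2_b  number;
        m_fs3   number; m_fs3_b  number;
        m_fs4   number; m_fs4_b  number;
        m_full  number; m_full_b number;
begin
        dbms_space.space_usage(
                segment_owner      => user,
                segment_name       => 'T1',             -- placeholder table name
                segment_type       => 'TABLE',
                unformatted_blocks => m_unf,  unformatted_bytes => m_unf_b,
                fs1_blocks => m_fs1, fs1_bytes => m_fs1_b,      -- 0  - 25% free
                fs2_blocks => m_fs2, fs2_bytes => m_fs2_b,      -- 25 - 50% free
                fs3_blocks => m_fs3, fs3_bytes => m_fs3_b,      -- 50 - 75% free
                fs4_blocks => m_fs4, fs4_bytes => m_fs4_b,      -- 75 - 100% free
                full_blocks => m_full, full_bytes => m_full_b
        );

        dbms_output.put_line('FS1 (0-25% free):   ' || m_fs1);
        dbms_output.put_line('FS2 (25-50% free):  ' || m_fs2);
        dbms_output.put_line('FS3 (50-75% free):  ' || m_fs3);
        dbms_output.put_line('FS4 (75-100% free): ' || m_fs4);
        dbms_output.put_line('Full blocks:        ' || m_full);
end;
/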

Finally, the full monty

Just a list of all the instance statistics that start with “ASSM”:

ASSM bg: segment fix monitor
ASSM bg:create segment fix task
ASSM bg:mark segment for fix
ASSM bg:slave compress block
ASSM bg:slave fix one segment
ASSM bg:slave fix state
ASSM cbk:blocks accepted
ASSM cbk:blocks examined
ASSM cbk:blocks marked full
ASSM cbk:blocks rejected
ASSM fg: submit segment fix task
ASSM gsp:Alignment unavailable in space res
ASSM gsp:L1 bitmaps examined
ASSM gsp:L2 bitmap full
ASSM gsp:L2 bitmaps examined
ASSM gsp:L3 bitmaps examined
ASSM gsp:Optimized data block rejects
ASSM gsp:Optimized index block rejects
ASSM gsp:Optimized reject DB
ASSM gsp:Optimized reject l1
ASSM gsp:Optimized reject l2
ASSM gsp:Search all
ASSM gsp:Search hint
ASSM gsp:Search steal
ASSM gsp:add extent
ASSM gsp:blocks provided by space reservation
ASSM gsp:blocks rejected by access layer callback
ASSM gsp:blocks requested by space reservation
ASSM gsp:bump HWM
ASSM gsp:get free block
ASSM gsp:get free critical block
ASSM gsp:get free data block
ASSM gsp:get free index block
ASSM gsp:get free lob block
ASSM gsp:good hint
ASSM gsp:reject L1
ASSM gsp:reject L2
ASSM gsp:reject L3
ASSM gsp:reject db
ASSM gsp:space reservation success
ASSM gsp:use space reservation
ASSM rsv:alloc from reserve
ASSM rsv:alloc from reserve fail
ASSM rsv:alloc from reserve succ
ASSM rsv:clear reserve
ASSM rsv:fill reserve
ASSM wasted db state change

6 Comments

  1. Hi Jonathan,

    we are seeing an intermittent issue with somewhat similar symptoms. Single-row inserts into one of the tables occasionally take up to 2 minutes. During that time, they are doing large amounts (up to 100k) of single-block reads from a unique index on that table.

    The tricky part is that this table lives in an MSSM tablespace.

    Do you think the basic mechanism behind it could still be the same? The database version is 11.2.0.4.

    Thanks!

    Best regards,
    Nikolai

    Comment by Nikolai Savvinov — January 30, 2017 @ 12:54 pm GMT Jan 30,2017

    • Nikolai,

      My first thought would be a very old problem where a leaf node split picks the wrong free block from an index, but there could be an ASSM variation of the problem (not that that’s your case).

      Option 1: Do you have any code which does very large deletes, leaving lots of index blocks empty, or any code that could insert lots of consecutive PK values in a batch and then roll back? In either case you could have a lot of empty blocks which Oracle decides to use as the next block for re-use in an index leaf block split, but then finds (after doing a lot of the work) that for some reason it can’t use that block, and tries many more blocks one after the other. This shouldn’t happen any more, but maybe it can – symptoms: do you see (a very small number of) “recursive aborts on index block reclamation” in the instance statistics; do lots of other sessions wait on TX mode 4 while the delay is going on?

      Option 2: A special ASSM case – after a very large array insert of consecutive key values you might have a lot of leaf blocks which are full but marked as being free by the bitmap block. This used to be possible with tables; I’ve not seen it with indexes, but if it happens then, when Oracle needs an empty block, it might end up walking the bitmap and finding that the blocks were full. This, I think, is where 12.2 would show you “blocks marked full” climbing unexpectedly. Pre-12c I would check the blocks being read to see if they were leaf blocks or bitmap blocks. A treedump might also be revealing – it could give a clue if a couple of those blocks were in an area of the index with a lot of blocks showing nrow = 0, rrow = 0.
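
      For reference, producing a treedump goes roughly like this, with the index name as a placeholder:

      -- find the object_id of the index (the name here is a placeholder)
      select  object_id
      from    user_objects
      where   object_name = 'T1_PK'
      ;

      -- dump the index tree structure to the session trace file,
      -- substituting the object_id reported by the previous query
      alter session set events 'immediate trace name treedump level 123456';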

      UPDATE:
      I’ve found a blog note that I wrote about the 1st problem here: https://jonathanlewis.wordpress.com/2008/09/29/index-analysis-2/

      And a variant on the problem here: https://jonathanlewis.wordpress.com/2009/11/10/index-freelists/

      Comment by Jonathan Lewis — January 30, 2017 @ 1:19 pm GMT Jan 30,2017

      • Hi Jonathan,

        thanks a lot for your detailed reply. The second link mentions a blog article by Saibabu that provides a reproducible case that fits our scenario (MSSM, and the waits are on the index and not the table). I’m going to experiment a little bit with it.

        To answer your question about deletes/rollbacks, yes, both are possible, although it’s hard to tell for sure whether either actually took place when the problem was last observed (which is over 2 weeks ago). I plotted “recursive aborts on index block reclamation” to see if we had any spikes around that time, but found none. I also did the same for several different split statistics, with the same result. The only statistics that could be relevant and were appreciably increased within the right time frame were ones having to do with segment chunks allocation and segment preallocation, but I’m not quite sure what to make of that.

        We deployed a script which turns on SQL tracing if the given INSERT statement takes more than 7 seconds; hopefully that will give us some further insight.
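
        (In outline the script does something like this sketch, not the actual code, with the sql_id as a placeholder:)

        begin
                -- intended to be called every few seconds from a polling job
                for r in (
                        select  sid, serial# serial_num
                        from    v$session
                        where   sql_id       = '0placeholder0'  -- sql_id of the problem INSERT (placeholder)
                        and     status       = 'ACTIVE'
                        and     last_call_et > 7
                ) loop
                        dbms_monitor.session_trace_enable(
                                session_id => r.sid,
                                serial_num => r.serial_num,
                                waits      => true,
                                binds      => false
                        );
                end loop;
        end;
        /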

        Thanks again for your help.

        Best regards,
        Nikolai

        Comment by Nikolai Savvinov — January 31, 2017 @ 9:53 am GMT Jan 31,2017

        • Nikolai,

          I haven’t worked with this problem recently so things may have changed, but the “recursive abort” statistic didn’t change by much – a single problem event might result in several hundred blocks being manipulated and then discarded (which, by the way, would also generate a spike in redo – roughly 18KB * number of blocks) but it would only count as ONE recursive abort.

          Comment by Jonathan Lewis — February 1, 2017 @ 10:21 am GMT Feb 1,2017

        • Hi Jonathan,

          thanks — that agrees with my findings that I shared in my blog yesterday (the entire problem being produced by a single event, e.g. by a single 90-10 split).

          I wonder if it’s possible to detect the spike in redo generation. Probably not with AWR data (too much redo in a 30 min snapshot anyway), but by looking at DELTA…IO… columns in ASH (for the LGWR session). Will try it if I find time.

          Best regards,
          Nikolai

          Comment by savvinov — February 1, 2017 @ 10:28 am GMT Feb 1,2017

