Oracle Scratchpad

September 25, 2009

Bugs

Filed under: ASSM,Block Size,Bugs,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:03 pm BST Sep 25,2009

There are times as I browse through Metalink when I see descriptions of bugs that make me wonder how on earth anyone managed to find them. There are bugs which are so bizarre in their combination of prerequisites that you might think they’d never,ever,  show up. Here’s one that got a mention on OTN some time back.

Problem: an update on a simple, unindexed, table takes 90 minutes if the table is in a tablespace using a 16KB block size; but closer to 90 seconds if the table is in a tablespace using a 4KB block size. The effect is totally reproducible.

If you want to read the original description of the problem as it presented at the client’s site, Steve Karam published the details about 15 months ago. (There’s an alternative version of the story available on Google Books – page 537 of the 2nd edition of the “Definitive Reference” on tuning Oracle)

It’s not easy to figure out what’s broken here – especially when the requirement is so simple and you know it can’t possibly go wrong – but based on Steve’s description I built a reproducible test case of the scenario he had described and, after a flash of insight, discovered one of those idiotically impossible bugs.

If

  • you’re using ASSM (automatic segment space management)

and

  • you run a single transaction that causes lots of rows to increase in size

and

  • the rows start, and end up, fairly short so you can gets lots in a block

and

  • your PCTFREE setting is not appropriate, resulting in lots of row migration

and

  • your block size is greater than 8KB

then performance is going to be dire.

If you want to see the test case Greg Rahn notes relating to the original posting I made to the OTN thread where Steve mentioned his blog example. He also had a copy of the test case, but this has gone missing after a site move and is now available here.]

The bug has been around since ASSM first appeared and it was years before anyone happened to run into it – that’s just the way it is with the best bugs.

Update 23/5/2011:  Randolf Geist has written a couple of items about a variation of this problem that shows up when you mix compression and ASSM. Link to part 1Link to part 2

20 Comments »

  1. Jonathan

    I would like to expose you a situation I have been confronted to, this last week. I have an insert

    insert /*+ append */ into local_partitioned_tab1
    select from distant_table1 a
    where not exists (select null
    from local_partitioned_tab1 b
    where b.pk = a.pk)

    The insert was taking 2Hours and 30 minutes to complete

    The 10046 trace level 12 when considered was showing that 94% of the execution time was db file sequential read generated by the insert /*+ append */

    I have then do the following

    alter table local_partitioned_tab1 parralel 4;

    and the execution time dropped to 16 minutes!!!

    Could you please explain me what is the effect of this alter table on the insert

    Thanks a lot

    Comment by Mohamed — September 27, 2009 @ 1:56 pm BST Sep 27,2009 | Reply

    • The alter allow Oracle to use a parallel degree of 4.

      So, probably, it reduced the read time for the “not exists “(the select null…) rather than the insert time.

      Bye,
      Antonio

      Comment by lascoltodelvenerdi — September 28, 2009 @ 8:28 am BST Sep 28,2009 | Reply

      • Or possibly changed the execute path from a nested loop anti-join to a hash anti-join.

        There really isn’t enough information here to offer a solid conclusion. But checking execution paths is the obvious first step of the investigation.

        (This isn’t AskTom, by the way, so that isn’t an invitation to send in more information – just a suggestion on how to proceed).

        Comment by Jonathan Lewis — September 28, 2009 @ 5:45 pm BST Sep 28,2009 | Reply

  2. I don’t see how the pre-requisites are so bizarre “that you might think they’d never,ever, show up”. (Never say “never” !).
    After all, ASSM is common. PCTFREE of 10 is common. What would you mean by “PCTFREE setting is not appropriate” ? How many people review PCTFREE settings for every individual tables when setting up the database/schema ? How many do actually change PCTFREE later during the lifecyle of the database ?
    As for “it was years before anyone happened to run into it” , wouldn’t it be fair to say “it was years before anyone noticed that it was a bug”, that “many performance situations still do, in the real world, get treated as “oh ! that’s just how Oracle behaves”, without investigation and test cases.
    How many DBAs do take the time to create test cases to test a performance scenario which arises from a combination of factors relating to parameters/setup vis-a-vis “unusual usage” ?
    A very small table as in the demonstration case IS unusual. But probably such tables do exist.
    A large update of such a small table could be rare — but we really do not know how many Oracle implementations have such tables and updates.

    Hemant K Chitale

    Comment by Hemant K Chitale — September 27, 2009 @ 4:05 pm BST Sep 27,2009 | Reply

  3. Similar to what Hemant said, “For every complaint a business receives, there are some number more people who had the same issue but didn’t bother to voice it.”

    It would be reasonable to say that the bug was ‘hit’ long before it was reported.

    Comment by Mark Brady — September 28, 2009 @ 4:57 pm BST Sep 28,2009 | Reply

  4. Mark, Hemant,
    I think I’d come quite close to agreeing with you – I think it’s quite possible that many people have suffered from this bug, but not to a sufficient degree that they noticed the effect.

    In the original “live” case the run on the 4KB block size recorded 2.2M gets when it should have recorded about 830,000 – but no-one noticed that this was actually bad (except me, of course – and my initial assumption was the excess would be due to related index updates), because it wasn’t sufficiently extreme to be obvious (unless you happened to know how Oracle works).

    You really have to have quite a lot of migration in a single transaction before it’s obvious that something has gone wrong.

    Comment by Jonathan Lewis — September 28, 2009 @ 5:52 pm BST Sep 28,2009 | Reply

  5. Here’s the most obscure bug that I’ve reported to Oracle…

    I found a scenario in a complex query with many WITH clauses. One branch does a select against a global temporary table. Another branch uses the first one in an IN subquery in a CASE statement condition. And the second branch is materialized (either due to GROUP BY, or a MATERIALIZE hint, or automatically by using the branch twice later). And the whole thing runs in parallel.

    In one query I had that met those conditions, I tripped over a wrong results bug in 10.2.0.4 and 11.1.0.7. (9.2.0.8 worked but only because it refused to parallelize the query.)

    If you change any one of those facets of the test case query, it would suddenly work.

    Comment by Jason Bucata — September 30, 2009 @ 5:02 pm BST Sep 30,2009 | Reply

    • Jason,

      Normally the competition is “my DB is bigger than your DB” – but I hereby open the blog to “my bug’s wierder than your bug”.

      Any takers ?

      Comment by Jonathan Lewis — October 1, 2009 @ 6:06 pm BST Oct 1,2009 | Reply

  6. In the interest of accuracy and fair “competition”, I should clarify something: I realized/remembered after posting that that you don’t need the first WITH clause to trip over the bug. That was an artifact of how I worked around the problem (break out the GTT select into a WITH clause and materialize it).

    You actually just need SELECT CASE WHEN x IN (SELECT y FROM my_gtt) THEN 1 ELSE 0 END AS blah, ……

    Comment by Jason Bucata — October 1, 2009 @ 6:13 pm BST Oct 1,2009 | Reply

  7. Jonathan,

    here’s a link to response on the OTN forum which mentions a Doc ID: 330818.1 Increasing db_file_multiblock_read_count Results In Slower Performance Due To More and Smaller Direct Path Waits. Though the doc says about 9i, Alex Haralampiev’s response asserts that it is applicable to 10g too.
    I think this one is related to both “weird bugs” and your recent posts about sorting issues.

    Comment by Timur Akhmadeev — October 2, 2009 @ 7:56 am BST Oct 2,2009 | Reply

  8. [...] 4: 3rd Oct 2009: In a comment on one of my postings about bugs, Timur Akhmadeev has referenced an OTN thread that references Metalink document id 330818.1 and [...]

    Pingback by Analytic Agony « Oracle Scratchpad — October 3, 2009 @ 3:15 pm BST Oct 3,2009 | Reply

  9. [...] Jonathan Lewis – Bugs [...]

    Pingback by Blogroll Report 18/09/2009 – 25/09/2009 « Coskan’s Approach to Oracle — October 6, 2009 @ 7:36 pm BST Oct 6,2009 | Reply

  10. Here’s one interesting (and very disconcertingly hidden) bug : 6008290 as a duplicate of 4743582.
    There’s just not enough information in the published text.
    But a forums posting has possibly some evidence :

    http://forums.oracle.com/forums/thread.jspa?messageID=3816337

    Hemant K Chitale

    Comment by Hemant K Chitale — October 8, 2009 @ 3:03 pm BST Oct 8,2009 | Reply

  11. [...] three passes against a table, for example to merge data from different source into a single table. There is a bug (fixed the latest releases only) that can turn this error into a performance disaster if you happen [...]

    Pingback by Glossary « Oracle Scratchpad — October 11, 2009 @ 10:56 am BST Oct 11,2009 | Reply

  12. [...] Troubleshooting — Jonathan Lewis @ 12:00 pm UTC Oct 10,2009 After asking people about their strangest bugs, it’s a little ironic that I should immediately be subject to one. Here’s a transcript [...]

    Pingback by Bugs – 2 « Oracle Scratchpad — December 15, 2009 @ 9:31 pm BST Dec 15,2009 | Reply

  13. [...] gets, 18.03 seconds spent performing single block reads.  This seems to be behaving similar to the bug that Jonathan Lewis found with ASSM 16KB block size tablespaces in 2008 when column values in existing rows were [...]

    Pingback by Database using ASSM Tablespace Exhibits Slow Insert Performance After an Uncommitted Delete « Charles Hooper's Oracle Notes — January 28, 2010 @ 6:02 am BST Jan 28,2010 | Reply

  14. I think I have one (a weird bug, that is).

    We have a staging table with two partitions: a little shrimpy one that isn’t used and a large one that can be dropped and recreated as new data is staged into the table. If we try to do a full scan against the table, we see really slow performance; Oracle reports the scan, but no scattered reads. All reads against the table are sequential. Extent size is fixed 1 MB (too small I think), ASSM, MBRC is 512. Reads against this table are abysmal.

    It does load sort records, then do an update that lengthens the rows. pctfree is 10, but there are no “continue row events” during the scan … so row chaining is playing no visible role.

    We’ll send it to Oracle Support on Monday … thought this was odd, though.

    Comment by Gary S — February 6, 2010 @ 3:29 am BST Feb 6,2010 | Reply

    • Gary,

      Sorry about the late response – this one dropped out of sight before I’d got around to it. I’d be interested to hear what comments you get from Oracle.

      When you say MBRC = 512, do you mean the parameter db_file_multiblock_read_count, or do you mean the system statistics named ‘MBRC’ ? And what block size are you using – the 512 suggests it might be 2KB.

      Comment by Jonathan Lewis — February 12, 2010 @ 8:57 am BST Feb 12,2010 | Reply

  15. [...] technical editor of this book) reported fixing the problem, and not the book author (reference  reference2  reference3  [...]

    Pingback by Book Review: Oracle Tuning: The Definitive Reference Second Edition « Charles Hooper's Oracle Notes — November 28, 2010 @ 5:18 pm BST Nov 28,2010 | Reply

  16. [...] problem is similar to the problem of row migration and ASSM – there are some hard-coded limits in what you can put into a single Oracle block, and those [...]

    Pingback by Block size – again « Oracle Scratchpad — May 20, 2011 @ 9:22 pm BST May 20,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,161 other followers