Oracle Scratchpad

November 10, 2009

Index Freelists

Filed under: Indexing, Infrastructure, Performance, Troubleshooting — Jonathan Lewis @ 7:09 pm UTC Nov 10,2009

A few months ago Saibabu Devabhaktuni let me know of an interesting issue he had had with indexes misbehaving, and pointed me to a blog note he had written to describe it. In the note he supplies a test case where Oracle walks through hundreds of blocks on an index freelist to find a single block that could be used as the target for a leaf node split.

The most interesting thing to note is that Sai has demonstrated a case where issuing a commit between the bulk delete and the subsequent insert results in a performance problem. On the other hand if you read my description in this blog posting, you’ll see that I have described a scenario where the commit between the delete and the insert is a very good idea – especially if you can fit a coalesce in between to clear up the mess.

If you’re going to repeat the test shown in Sai’s blog, it’s best if you create a new tablespace with a single datafile. When I first ran it, I decided that one of existing tablespaces was a “close enough” match to his example – and found that I couldn’t reproduce his results. His demonstration, being based on a non-unique index,  is very sensitive to the order in which extents are allocated to the table – s0 you need a clean tablespace and one datafile, or a tablespace which guarantees the table will fit in a single extent.

The bug number that Sai raised (8446989) is still waiting for a base bug fix; and the base bug it refers to (8570223) is hidden. There is some further information available, though, because his bug number has become the base bug for bug 7712825 which gives you a few clues about what is going on.

In fact, when I ran Sai’s test case with some diagnostic code interspersed, I noticed that the insert he did to “refill” the empty blocks didn’t take them off the freelist when (to my mind) they should have come off the freelist. For various reasons, and I think one of them may be a fairly expensive structural issue, I don’t think this is going to be easy to fix.

Free space management is not a simple part of the database – especially given Oracle’s dedication to making the commit as fast as possible by minimising the “excess” work that has to take place at the same time; so we may continute to see strange bevhaviour appearing in both freelist management and bitmap management of free space for quite some time to come.

 

 

November 6, 2009

Did you know …

Filed under: Indexing, Infrastructure — Jonathan Lewis @ 7:02 pm UTC Nov 6,2009

… a few things about the use of index space to suprise your friends and amaze your colleagues:

  • If you use “bigfile” tablespaces for your tables this can result in some indexes becoming more space-efficient than they would be otherwise.
  • Creating tables in tablespaces built from multiple datafiles may cause some of their indexes to be less space-efficient than they would otherwise be.
  • Unless you store tables in different tablespaces from indexes, rebuilding indexes can allow the indexes that you haven’t rebuilt to become less space-efficient.
  • Dropping (or truncating) a table may allow unrelated indexes to become less space-efficient
  • Using the partition exchange mechanism for data loading may make some unrelated indexes become less space-efficient.
  • Some classes of index will be less space-efficient in a RAC system than they would be in a single-instance system.
  • Using the “shrink” option on a table may cause some of its indexes to get bigger.

On the plus side: although there are many activities that contribute to indexes  being bigger than they would be if they were newly created, there aren’t many cases where the benefit of recreating them (or restructuring your database) is worth the effort. Note that many “space anomalies” in indexes are transient anyway and are self-correcting over time – given enough time and sufficient use.

Footnote: I am not going to supply an explanation for any of the above phenomena because I don’t think they’re sufficiently important to warrant any action – in general. This note has been published only for the purpose of answering trivia questions and similar entertainments.

 

November 4, 2009

Anonymous

Filed under: Uncategorized — Jonathan Lewis @ 9:00 pm UTC Nov 4,2009

From time to time I browse the comments that Akismet (the spam detector used by WordPress) has intercepted just to check that a useful comment hasn’t been unjustly labelled as spam; and a couple of days ago I was surprised to find that four comments (from the same person) had been marked as spam even though there was no obvious reason why they had been so marked.

They were clearly rubbish, of course – one expressed suprise that I hadn’t published my resume [sic] (it could do with an acute accent over the final e, but apart from that it’s American for C.V. ), another expressed the opinion that my blog showed I had no experience with ACTUAL [sic] working databases … and so on.

But how had Akismet worked out that these comments were garbage ?

The author had supplied a gmail account name – and I think that comments from the gmail domain have triggered spam exclusion in the past but only when the comment included a hyperlink. Then I thought of tracing the IP address that WordPress has captured – and it turned out to lead back to an “anonymizing proxy server”.

Clever trick that – if it’s coming from someone who’s trying to hide, a comment probably deserves to be binned.

Back to the USA – etc.

Filed under: Uncategorized — Jonathan Lewis @ 7:30 pm UTC Nov 4,2009

California, Bulgaria (yes, I know it’s not in the USA), Virginia and Texas

It’s nice after a long trip to get a few email messages from people who have heard you speak and enjoyed the experience – so thanks to the people who let me know what they thought of the seminars and presentations I’ve done in the last few weeks.

For an idea of how things went – here’s a review from Bob Watkins of one of the full day seminars.

UKOUG Conference Series

The agenda for the “Technical and E-Business Suite”  conference (30th Nov – 2nd Dec) is now available.

I’m doing a 45 minute presentation on  “Writing Optimal SQL” at 14:50 on Monday 30th, and another 45 minutes on “Introducing Partitions” at 13:15 on Tuesday 1st.

There’s also going to be a “round table” – which means everyone gets their chance to talk – on Cost Based Optimisation at 12:35 on Wednesday 3rd that I’ll be chairing.

Future options

And while we’re looking ahead – Oracle Education has started talking about trips to Switzerland in Feb, Germany in March, and Greece in Apr. I’m also talking to Oracle in the APAC region about  a trip to China (Beijing and Shanghai) in March.  Watch this space.

November 3, 2009

Foreign Keys

Filed under: Indexing, Infrastructure — Jonathan Lewis @ 11:02 pm UTC Nov 3,2009

People often create far more “foreign key” indexes than are needed – and any redundant index is a waste of several types of resources. I’ve just made some comments on an OTN thread about this topic, so rather than repeat the comments here I’ll just give you the URL.

Update 5th Nov 2009: and here’s another common issue about foreign keys that I also addressed by a link to OTN some time ago.

 

October 31, 2009

Top Ten

Filed under: Uncategorized — Jonathan Lewis @ 10:41 pm UTC Oct 31,2009

Sorry, this isn’t a posting about efficient ways of getting the first 10 rows from a result set – it’s a little note about Oracle Open World. I don’t brag very often, but sometimes it’s hard to resist.

I’ve just received an email about Openworld with the following content:

Go to Oracle OpenWorld On Demand
Get in on sessions you missed, or hear keynotes and executive solution sessions again.
FREE for all full conference, Oracle Develop, and Primavera program attendees.
Only US$245 for all others.

    • Find all keynotes, general sessions, executive solution sessions, and breakout sessions—more than 1,900 presentations—in an online streaming rich media portal
    • Get keynotes with synchronized video, slides, scrolling transcripts, and downloadable MP3s and MP4s
    • Comment and rate presentations online and post your own content
  • For 1,900 presentations, $245 doesn’t seem to be an extreme price – although it might be nicer to be able to get a set of DVDs rather than downloading all those pdfs, mp3s and adobe flash files. But I was registered anyway, so I just clicked on the link, signed in, and got to the opening portal page to find a list of the “Top Ten Presentations”, in this order:

    • Keynote: Extreme Innovation - Scott McNealy, Sun Microsystems, Inc.
    • Keynote: Larry Ellison, Oracle
    • Keynote: Innovation Across the Stack -  Thomas Kurian, Oracle
    • Keynote: Oracle Develop – What Are We Still Doing Wrong? – Thomas Kyte
    • General Session: Oracle Fusion Middleware 11g – Foundation for Innovation -  Hasan Rizvi
    • General Session: Lower Costs with Oracle Database 11g Release 2 - Robbins Jeffrey, Masayuki Karahashi, Andy Mendelsohn
    • Keynote: The Art of the Possible - Charles Phillips and Safra Catz, Oracle
    • How to Hint -  Jonathan Lewis
    • Accelerating Java for Better Enterprise Performance - Charlie Hunt, John Pampuch
    • General Session: Complete, Open, and Integrated Applications for Your Business - Andy Platt, Paco Aubrejuan, Janet Foutty, Anthony Lye,  Steve Miranda, Syed Nadeem, Jewell Richard
    Looking at the list of keynotes and well-known Oracle names above mine, I think I’m currently the top “civilian” (at the moment) – so I’m going to feel just a little smug for the rest of the weekend.

    October 30, 2009

    logging

    Filed under: Performance — Jonathan Lewis @ 1:02 pm UTC Oct 30,2009

    I’ve just jotted down a few notes about “log file sync” waits, “log file parallel write” waits, and the nologging option in response to a question on OTN about redo activity when creating a large index. The thread is probably worth reading.

    October 27, 2009

    XStreams

    Filed under: Infrastructure — Jonathan Lewis @ 7:24 pm UTC Oct 27,2009

    If you’re into using Streams, you may want to take a look at the latest enhancement: XStreams.

    There’s a little overview, plus a couple of Java examples at the following links:

    October 26, 2009

    Philosophy – 8

    Filed under: Uncategorized — Jonathan Lewis @ 8:33 pm UTC Oct 26,2009

    Btree indexes vs. Bitmap indexes – the critical difference

    • A single B-tree index allows you to access a small amount of data very precisely.
    • It is the combination of a subset of the available bitmap indexes that offers the same degree of precision.

    You should not be comparing the effectiveness of a bitmap index with the effectiveness of a b-tree index.

    (Inevitably it’s a little more subtle than this – you may create some low-precision b-tree indexes to avoid foreign key locking issues,the optimizer can combine b-tree indexes, and so on - but if you start from this basis you will have a rational view about how to use bitmap indexes).

    Footnote: remember, also, that bitmap indexes introduce massive concurrency issues and other maintenance overheads.

    October 23, 2009

    Quiz Night

    Filed under: Uncategorized — Jonathan Lewis @ 6:55 pm UTC Oct 23,2009

    Okay, so it’s not night-time, or even early evening, in some parts of the world – but somewhere it’s Friday night, and Friday night is quiz night.

    Now, most people are aware that the clustering_factor of an index measures the ordering (or, if you want to be slightly more accurate, the clustering) of the rows in a table relative to the order of entries in the index. Consequently  if you check the clustering_factor before and after rebuilding an index it cannot change.

    So how do you explan this output – which comes from a clean cut-n-paste, with no selective editing, from an SQL*Plus session running Oracle 10.2.03 with echo on - with no other sessions active on the system:

    
     SQL> execute dbms_stats.gather_index_stats(user,'i1', estimate_percent=>100)
    
    PL/SQL procedure successfully completed.
    
    SQL>
    SQL> select
      2          clustering_factor
      3  from       user_indexes
      4  where      index_name = 'I1'
      5  ;
    
    CLUSTERING_FACTOR
    -----------------
                  396
    
    1 row selected.
    
    SQL>
    SQL> alter index i1 rebuild pctfree 10;
    
    Index altered.
    
    SQL>
    SQL> execute dbms_stats.gather_index_stats(user,'i1', estimate_percent=>100)
    
    PL/SQL procedure successfully completed.
    
    SQL>
    SQL> select
      2          clustering_factor
      3  from       user_indexes
      4  where      index_name = 'I1'
      5  ;
    
    CLUSTERING_FACTOR
    -----------------
                  198
    
    1 row selected.
    

    You have two minutes, starting from NOW.

    October 21, 2009

    Bitmap Updates

    Filed under: Indexing, Infrastructure, Performance — Jonathan Lewis @ 5:33 pm UTC Oct 21,2009

    It is fairly well-known that bitmap indexes are very dense structures that can behave badly if their underlying tables are subject to even fairly low levels of insert, update or delete activity. Problems include contention, space management and performance, and these problens have spawned a couple of well-known guidelines relating to bitmap indexes:

    • Avoid concurrent modification of data by multiple processes – otherwise you end up with processes dead-locking
    • Drop/disable bitmap indexes before data loads and rebuild them afterwards.

    Of course, with a little care and experimentation, you may find that you don’t need to apply the second guideline in all cases – especially for bulk inserts.

    But what if you absolutely do have to cope with a continuous (serialised) stream of small modifications to the data and can’t drop the indexes while you’re doing it. Are there any damage-limitation steps you can take. The answer, of course, is yes – there’s always a “least worst” strategy, no matter how badly you attempt to misue an Oracle feature. Here’s a demonstration of one interesting aspect of bitmap index behaviour that helps to reduce the problems of “lots of small updates”.

    All I’m going to do is create a table with a bitmap index in a tablespace using 8KB blocks and freelist management (rather than ASSMautomatic segment space management – but that’s not a specific requirement of the demonstration) then run a pl/sql loop that updates 5,000 randomly selected rows one at a time.  When the update loop is complete I’m going to check the session stats for any interesting figures.

    execute dbms_random.seed(0)
    
    create table t1
    as
    with generator as (
    	select	--+ materialize
    		rownum 	id
    	from	all_objects
    	where	rownum <= 3000
    )
    select
    	rownum				id,
    	trunc(dbms_random.value(1,100))	bit1,
    	trunc(dbms_random.value(1,100))	bit2,
    	lpad(rownum,10,'0')		small_vc
    from
    	generator	v1,
    	generator	v2
    where
    	rownum <= 400000
    ;
    
    create index t1_i1 on t1(id);
    create bitmap index t1_b2 on t1(bit2);
    
    execute snap_my_stats.start_snap
    
    declare
    	m_key	number(6);
    begin
    	for i in 1..5000 loop
    		m_key := trunc(dbms_random.value(1,250000));
    
    		update t1 set
    			bit2 = trunc(dbms_random.value(1,100))
    		where id = m_key;
    
    	end loop;
    end;
    /
    
    execute snap_my_stats.end_snap
    

    The snap_my_stats procedure calls come from one of my simple snapshot packages – querying v$mystat joined to v$statname in this case. So what do you think will be the interesting numbers generated by this little piece of code when running on 10g ?

    Take a look at the figures for the session’s undo and redo:

    
    redo entries                      31,786
    redo size                     41,933,324
    undo change vector size       38,038,180
    

    The number of redo entries is a little high – in a perfect case we might expect about 15,000 redo entries because (in principle) we update one row, delete one index entry and insert another 5,000 times; clearly we’ve had quite a lot of extra block cleanout activity happening at the same time. In fact, the number of redo entries isn’t particularly interesting, but I’ve included it so that you can see that the average size of a redo entry is, at first approximation, nearly 1.3KB – which is quite big.

    Since half the redo entries are fairly small entries relating to block cleanout we could ignore their contribution to the count, and since 5,000 of the entries are fairly small entries describing changes to the table blocks we could ignore their contribution as well – the most significant entries are the 10,000 entries for modifying index blocks and (after allowing for small redo sizes for the other entries) our second approximation is that the average size of the redo entries for the index changes is about 4KB (which you could confirm by doing some log file dumps). Think about that for a moment – every time we update a row in the table, the two index changes between them seem to generate the equivalent of a data block of redo.

    Given that undo also has to be described by redo ( see Nutshell – 1 ) you can see that most of the redo is actually from the undo – 38MB out of 42MB. So what information is being created where, and why is there so much in total, and what can we do to reduce it.

    Each of my update statements updated one row of data, usually changing the indexed column from one value to another. To effect this change Oracle has  to change a bit in one bitmap index entry (the one for the current value of the column) from a one to a zero and change a bit in another bitmap index entry (the one for the new value of the column) from a zero to a one. But you don’t update index entries – you “delete” (mark as deleted) the old index entry and insert a new index entry … at least, that’s what you do with B-tree indexes in Oracle, and it’s what you used to do with bitmap indexes until 10g of Oracle).

    In 10g when you change a bit in a bitmap index entry you copy the old version of the index entry into an undo record and modify the index entry “in situ” to its new value. (This may not be possible in every case, there are a few variations on the theme.) The redo change vectors you generate are:

    • a very large vector describing what you’ve done to the undo block,
    • a small change vector describing how you’ve modified the index entry. 

    Now remember that you do this for two index entries. So how do we manage to generate so much undo and redo ? Because a single bitmap index entry can be as large as half a data block – which is why the total redo generated for a single row change can be roughly the same size as a whole data block.

    So the underlying problem is a consequence of the size of the block or rather, to be precise, the size of the freespace available in the block based on the index definition. So how can you reduce the redo generation if you really need to cope with this type of drip-feed into a table with bitmap indexes ? There are two options really – create the index in a tablespace with the smallest block size possible, or define the index with a large value for pctfree (e.g. 75%). Both strategies have their costs but they can make a big difference to the undo and redo generated.

    Here, for example, are the figures from re-running my test first using a 2KB blocksize for the index, and then using an 8KB blocksize with the index pctfree set to 75. (You can expect the figures to be similar but not identical because the settings don’t result in exactly the same limiting size on an index entry .) In both cases the typical index entry has been reduced by a factor of approximately four, so the undo change vector size is roughly a quarter of the size – and that fact is reflected in the reduction in the redo size:

    Blocksize 2KB – pctfree 10 (default)

    redo entries                       29,718
    redo size                      14,784,852
    undo change vector size        10,293,344
    

    Blocksize 8KB – pctfree 75

    redo entries                       29,944
    redo size                      15,913,584
    undo change vector size        11,413,304
    

    If diskspace and buffer memory are not a problem you may prefer to “waste” space in the 8KB blocks so that Oracle can make best use of the buffer cache through its LRU as the pattern of activity changes throughout the  day. If space (and memory particularly) are at a premium you may choose to go for the smaller block size and deal with the hassle of manual cache management rather than memory to “empty” space.

    If you think this is bad, by the way, take a look at the equivalent figures for the test when run against 9i using an 8KB block size and the default pctfree 10 (there is no “undo change vector size” statistic on 9i).

    redo entries                      136,209
    redo size                     289,892,104
    

    Apart from the excess redo, the index also exploded from 99 blocks to about 7,300 blocks and the test run took 81 seconds instead of just nine seconds.

    The problem is that in 9i (and earlier), updates to bitmap index entries are treated the same way as updates to B-tree entries – which means a long “history” of very similar bitmap index entries can build up in the index, and these entries result in leaf block splits and large entries propagating upwards into branch blocks; and both the old and new entries are copied (one into the undo vector, the other into the forward vector) into the redo log.

    Bonus Performance Threat:

    If you enable flashback database, then Oracle will read old undo blocks from the undo tablespace and copy them into the flashback log before “new”ing them and using them (watch out for statistic “physical reads for flashback new”). So when you’re hammering the redo logs because of your bitmap updates, you’re also hammering your undo tablespace and flashback log – so it really is worth thinking carefully about the best way to do damage limitation.

    October 20, 2009

    Understanding

    Filed under: Uncategorized — Jonathan Lewis @ 7:27 pm UTC Oct 20,2009

    Karen Morton has a few wise words to say about understanding vs. memorization. Definitely worth reading and understanding – and maybe even memorizing.

    October 18, 2009

    Experts

    Filed under: Uncategorized — Jonathan Lewis @ 7:30 pm UTC Oct 18,2009

    Chen Shapira wrote a very nice note about the unconference (and Oracle Closed World) presentation that I did at Oracle Open World this year, loosely titled “The Beginner’s Guide to being an Oracle Expert”. In her comments she has captured a point more clearly than I have ever expressed it: “DBAs are under a lot of pressure not to be experts.”

    I am not a working DBA – yet I spend a lot of my time solving problems for DBAs and helping DBAs do their jobs better. How do I manage this ? By knowing more than they do about the technology they are using***; and I’m in that position because I have the time to investigate and test and experiment – time that a working DBA is usually not given.

    It’s easy for me to make that time, of course, because I am self-employed and create my own timetable – and on my timetable I aim to average one day per week working at home investigating mechanisms and solving problems: call it my R&D budget. I’ll charge you more for the time I spend on your site because I’ve spent time in my “lab” making sure I know the answers to your questions before you even ask them. 

    I have the time to be an expert because I can choose how I use my time; the typical DBA is not so lucky. If you want to be an Oracle expert, time is the most important commodity.

    *** I should point out that there are areas of the technology where I am not an expert – so there have been occasions when people have asked me to do a job and I have suggested they contact a colleague who is an expert in that particular field.

    Footnote: as I glanced through this note just before publishing it, it cross my mind that DBAs are often advised to ignore the wait event “SQL*Net Message from Client” as it’s only “think time” and doesn’t affect performance. When you’re aim is to become an expert “think time” becomes very important, and ends up making a huge impact on performance.

    October 16, 2009

    Correlation

    Filed under: Uncategorized — Jonathan Lewis @ 7:14 pm UTC Oct 16,2009

    One of the “inspirational thoughts” on my opening page is the observation by the late Stephen J. Gould that

    The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.”

     It’s very easy to equate correlation with causation and take inappprioate action as a result – it’s an example of faulty thinking that I see fairly frequently on forums such as OTN or the Oracle newsgroups.

    If you want to get an insight into the difference between correlation and causation, you ought to read Robyn Sands’ note on “Nonsense Correlation”.

    October 14, 2009

    Nutshell – 1

    Filed under: Infrastructure — Jonathan Lewis @ 8:22 pm UTC Oct 14,2009

    Undo, redo, and recovery in a nutshell. (The “1″ in the title may turn out to be optimistic, I offer no guarantee about further nutshells.)

    When you update a block – change one row in an table, say, or mark an index entry as deleted – Oracle does the following:

    • creates some redo (called a change vector) to describe the change to the table block
    • creates some undo to describe the older version of the changed data
      • this really means creating some redo (another change vector) to describe how to create that undo
    • writes the change descriptions into the redo log buffer
      • the pair of change vectors go together (undo first) into a single redo record
    • changes the undo block
    • changes the table block

    The redo must be written to disc before the table block and the undo block are written to disc. Eventually, though, the undo block and the changed table block will be written to disc – even if the change has not been commited.

    A Common Question: If the database crashes before the transaction commits, how can Oracle recover the old version of the data if the new, uncommited version has overwritten it ?

    Answer: After the database crash, the recovery process knows the last checkpoint time for each file, and applies redo to bring each datafile up to date. This mechanism applies to the files in the undo tablespace, just like any other permanent tablespace.

    Once the database has been brought up to date the recovery process can see the latest version of both the table block and the undo block. Since the undo tablespace (including the undo segment header block, hence the transaction table) is up to date  the recovery process can detect that that transaction didn’t commit, so it can do a rollback to recover the old table data from the undo block.

    Footnotes:

    This description of data change is the most general example – there are special cases that increase the complexity of the mechanism. In particular, there is some special processing if your change is the start of a transaction, and in 10g the first few changes of a transaction now use a special optimisation on single-instance Oracle that takes advantage of “private redo threads”, and “in-memory undo”.

    See the Oracle Glossary for notes on the “undo segment header” and “transaction table”.

    Next Page »

    Blog at WordPress.com.