Oracle Scratchpad

November 23, 2007

Table Rebuilds

Filed under: Infrastructure, Oracle, Tuning — Jonathan Lewis @ 7:25 pm GMT Nov 23,2007

Here’s a question that came up on the OTN database forum recently:

I want to reorganize some tables to release some space. How can I find the tables that need to be reorganized? I have found that whenever I reorganize a table it won’t release space; instead the size of the table increases. Can we find the tables which will gain space by reorganization?

I’m not going to address the ‘how do we find the tables’ question here – but I thought I’d just describe how difficult the problem of re-organising tables can be by quoting something I wrote in Practical Oracle 8i nearly eight years ago.

Imagine you have some sort of order processing system where each order goes through three main stages – ‘new’, ‘in progress’, ‘completed’.  A ‘new’ row requires 50 bytes of storage, an ‘in progress’ row takes up 100 bytes, and a ‘completed’ row reaches its full maturity at 150 bytes.  The canny DBA quickly works out that PCTFREE should be set to 65% to allow new rows to grow to full size without migrating (which also means setting PCTUSED to 35% or less, as the total of PCTUSED and PCTFREE has to be no more than 100).
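
To make the example concrete, the DBA’s settings might look something like this. The orders table and its columns are invented for the sketch; only the pctfree/pctused values come from the discussion above:

    create table orders (
            order_id        number(10)      not null,
            status          varchar2(12)    not null,   -- 'NEW' / 'IN PROGRESS' / 'COMPLETED'
            order_details   varchar2(100)
    )
    pctfree 65      -- keep 65% of each block free so new rows can grow in place
    pctused 35      -- a block takes inserts again only when usage drops below 35%
    ;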

Every month, most orders that were completed at least 12 months earlier are deleted from the system, but some are retained for various reasons.  Over a period of time the table becomes a little ragged because of this incomplete clear down, so the DBA decides the time has come to rebuild the table and claw back some space and efficiency.

Ask yourselves this question: what should the value of PCTFREE be (at least temporarily) as the table is being rebuilt?

Assume 95% of the rows in the table are completed, 4% are in progress, and 1% are new.  For the benefit of the completed rows the best value for PCTFREE would be zero, since the bulk of the data will never be updated again, but this would be disastrous for the more recent 5% of the data, which needs plenty of room to grow.  Conversely, if you set PCTFREE to 65% to cater for the new data, or even 35% to compromise between the new data and the ‘in progress’ data, you waste a huge amount of space in the blocks which contain only completed data.

Of course you could get your spreadsheet going, decide on 2% as a reasonable weighted average, i.e. (95 * 0 + 4 * 35 + 1 * 65)/100 = 2.05, and hope for the best.  You could start playing complicated games with partial unloads and reloads, changing PCTFREE as you go.  There are several (tedious) variations on making the best of an awkward situation.
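
If you did settle on the weighted average, the rebuild itself is a one-liner. This sketch uses the hypothetical orders table from above; note that a move leaves any indexes on the table unusable:

    -- (95 * 0 + 4 * 35 + 1 * 65) / 100 = 2.05, so round to pctfree 2
    alter table orders move pctfree 2;

    -- the move marks indexes unusable, so rebuild them afterwards, e.g.
    -- alter index orders_pk rebuild;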

The book then goes on to describe how you might do some testing with the minimize records_per_block option of the alter table command to prepare an empty table which would only allow you to insert the “correct” number of rows per block.
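
The mechanism deserves a brief sketch (again using the hypothetical orders table). The command simply records the largest number of rows currently found in any block of the table as a limit for all future inserts, so you seed an empty table with exactly the number of rows you want to allow per block:

    delete from orders;                             -- start with an empty table
    commit;

    insert into orders values (1, 'NEW', 'seed');   -- insert exactly as many rows
    insert into orders values (2, 'NEW', 'seed');   -- as you want to allow in a
    insert into orders values (3, 'NEW', 'seed');   -- block; in an empty table
    insert into orders values (4, 'NEW', 'seed');   -- they all land in the same
    insert into orders values (5, 'NEW', 'seed');   -- block
    commit;

    alter table orders minimize records_per_block;  -- record 5 as the block limit

    delete from orders;                             -- clear the seed rows, then
    commit;                                         -- reload the real data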

In passing, one of the answers to the original question pointed to an item on OraFaq that suggested you identify tables where rebuilding might be of some benefit by looking at dba_tables: multiply avg_row_len by num_rows, divide by the block size, then compare the result with blocks. If blocks is much larger – say 50% larger – than the space needed for the data then you may have a table worth thinking about.

Although it made no allowances for row and block overheads, as a pointer to trouble-spots it’s probably quite reasonable in most cases – provided your statistics are reasonably accurate and you don’t have long, LOB, varray, nested table or UDT (user-defined type) columns – all of which can make the arithmetic irrelevant.
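
For what it’s worth, here is that check as a query against dba_tables. It’s only a sketch: the 8192 is an assumed block size (substitute your own db_block_size), and the factor 1.5 is the “50% larger” rule of thumb from above:

    select  owner, table_name, blocks,
            ceil(avg_row_len * num_rows / 8192)     estimated_blocks
    from    dba_tables
    where   num_rows > 0        -- statistics must exist, and the table must have rows
    and     blocks > 1.5 * ceil(avg_row_len * num_rows / 8192)
    order by
            blocks - ceil(avg_row_len * num_rows / 8192) desc
    ;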

13 Comments »

  1. “Imagine you have some sort of order processing system where each order goes through three main stages – ‘new’, ‘in progress’, ‘completed’. A ‘new’ row requires 50 bytes of storage, an ‘in progress’ row takes up 100 bytes, and a ‘completed’ row reaches its full maturity at 150 bytes.”

    Here is what fires in my mind, reading this.
    Instead of the one table with all these “NULL-able” columns, one really should have designed this as three tables: one holding all new-order column values, a second holding the additional in-progress column values, and the last one holding the completed-order column values.
    Nicely modeled in a relational way.

    All tables with pctfree 0, and no reorgs needed. (A sketch of this design follows the comment.)

    Comment by Toon Koppelaars — November 26, 2007 @ 6:33 am GMT Nov 26,2007 | Reply
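
    (For illustration, a minimal sketch of the three-table design Toon describes; all names and columns are invented. Each table takes only inserts and deletes, never updates, so pctfree 0 wastes no space:)

        create table new_orders (
                order_id    number(10)   primary key,
                order_date  date         not null
        ) pctfree 0;

        create table in_progress_orders (
                order_id    number(10)   primary key
                            references new_orders,
                picked_by   varchar2(30) not null
        ) pctfree 0;

        create table completed_orders (
                order_id    number(10)   primary key
                            references in_progress_orders,
                completed   date         not null
        ) pctfree 0;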

  2. @Koppelaars

    Yep!
    I was thinking the same thing.

    Clearly, Jonathan’s situation was just an example.

    I cannot imagine why a “reorganized” table would gain space…

    Comment by Antonio — November 26, 2007 @ 7:37 am GMT Nov 26,2007 | Reply

  3. Yes, I’d always be suspicious of planned NULLs (as opposed to unplanned unknowns) — usually a surefire signal of poor design.

    Comment by SeánMacGC — November 26, 2007 @ 2:53 pm GMT Nov 26,2007 | Reply

  4. But aren’t you tripling your I/O? To retrieve one record from a single table you’d do one index scan and then a table access by rowid. In the case of vertically partitioned tables you’d do that three times (OK, on completed rows only). We avoid chained rows within a single table, so why would it be better to *intentionally* chain every row? You’ve also increased your storage costs… although the “disk space is cheap” mantra is the inevitable reply.

    Do the queries you intend to run play some role? If your most important query is to bring back every record in the ‘new’ state, maybe the vertically partitioned table is better because the rows-per-block figure is larger. But if your most important query retrieves one record at a time regardless of state then you’re in the state I mentioned before.

    On the upside, you never do an update. You’ve turned them all into inserts.

    I’m not saying that anyone is wrong, only that I’m confused. I’ve encountered this choice myself and never could decide, definitively, between the two.

    Comment by Mark Brady — November 26, 2007 @ 5:24 pm GMT Nov 26,2007 | Reply

  5. Or, you could put the three tables into a cluster… but isn’t that just back to where we started, one table?

    Comment by Mark Brady — November 28, 2007 @ 5:13 pm GMT Nov 28,2007 | Reply

  6. @Mark, it definitely depends on the usage. If there are batch processes that migrate entries from one state to another then using three tables does not seem too bad. If, OTOH, you need to pull records from all stages in one SELECT then you might suffer some disadvantages.

    Here is another idea: partition the table by the column that identifies the stage (see the sketch after this comment). The downside, of course, as with the three-table approach, is that records have to move when they change their state.

    But in the end this whole reorganizing business seems useful only if the usage pattern is such that many blocks holding just a few rows stay in the system. Otherwise, i.e. if blocks eventually end up completely empty of rows, the blocks will be reused and you don’t waste space in the long run.

    Comment by Robert — November 28, 2007 @ 5:36 pm GMT Nov 28,2007 | Reply
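
    (Again a sketch only, using invented names: Robert’s idea is list partitioning on the status column, with row movement enabled so that an update to status can physically move the row. A bonus not mentioned above is that each partition can carry its own pctfree:)

        create table orders (
                order_id        number(10)      not null,
                status          varchar2(12)    not null,
                order_details   varchar2(100)
        )
        partition by list (status) (
                partition p_new       values ('NEW')          pctfree 65,
                partition p_progress  values ('IN PROGRESS')  pctfree 35,
                partition p_completed values ('COMPLETED')    pctfree 0
        )
        enable row movement;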

  7. @Mark/Robert,
    Massive area here guys: mixing the logical and the physical, all the potential variables involved, etc. I’d personally abide by logical fundamentals and principles in the first instance — adhering to the relational paradigm, and not entertaining violations of normalisation (BCNF preferably) simply to facilitate ‘alleged’ higher performance.

    The physical should be predicated on the logical, not the reverse; and if the three-stage populated record contravenes no (relational) modelling rules then there is no reason why it shouldn’t be so implemented, though that is typically a big ‘if’.

    My previous comment was a bit sweeping, apologies.

    Comment by SeánMacGC — November 30, 2007 @ 11:34 am GMT Nov 30,2007 | Reply

  8. @Sean, completely agree – first business, then physics. However, I understood that we were talking about a made-up example with a lot of variables left undefined. I believe bringing in the physical layer is OK for the sake of discussing different options – after all, this was all about “physics” in the first place.

    Comment by Robert — December 3, 2007 @ 3:15 pm GMT Dec 3,2007 | Reply

  9. […] a couple of years ago where a DBA found that moving tables made them take up more space (I wrote a note in response, quoting the reason, and solution, that I had originally published in Practical Oracle […]

    Pingback by Fragmentation 2 « Oracle Scratchpad — July 16, 2010 @ 6:52 pm BST Jul 16,2010 | Reply

  10. […] ago I wrote about a poster on the OTN db forum who was puzzled by the fact that when he started rebuilding tables they got bigger.  (Bad luck, sometimes that’s what happens […]

    Pingback by Index rebuilds « Oracle Scratchpad — August 24, 2010 @ 6:57 pm BST Aug 24,2010 | Reply

  11. Hi Jonathan,

    Could you please let me know how to find the tables that are candidates for a reorg if the table has CLOB columns apart from the regular NUMBER, VARCHAR etc.? The CLOB’s definition is “ENABLE STORAGE IN ROW” “CHUNK 8192” “RETENTION”, and the db version is 11.1. I tried to use dbms_space.space_usage but couldn’t get much detail from it.

    Also, if we specify “ENABLE STORAGE IN ROW”, would Oracle include the LOB data when calculating avg_row_len, since anything less than 4000 bytes is stored in the row itself (and not in the LOB segment)?

    Thanks

    Comment by venkat — February 20, 2012 @ 8:01 am GMT Feb 20,2012 | Reply

  12. […] lost by the fact that you’ve had to do a rebuild, so be careful – make sure you read this elderly post (which echoes a comment from Practical Oracle 8i) before doing ANY table rebuild. Leave a […]

    Pingback by Row sizes 2 « Oracle Scratchpad — October 1, 2012 @ 6:34 am BST Oct 1,2012 | Reply

  13. […] API over a database link but moved on to the topic of how to handle a scenario that I’ve described in the past involving a table where rows are intially short and eventually become much longer and a requirement […]

    Pingback by Hakan Factor | Oracle Scratchpad — November 28, 2022 @ 3:14 pm GMT Nov 28,2022 | Reply

