This note is part three of a four-part series and covers table fragmentation. The whole series is as follows:
- Introduction – with links to parts 2 – 4
- Disk and Tablespace Fragmentation
- Table Fragmentation – this bit
- Index Fragmentation
3. Table “fragmentation”.
In the introduction we discussed one type of table fragmentation that doesn’t (usually) matter – the fragmentation of a table into multiple extents. Here’s a funny thought – ASSM (automatic segment space management) introduces another form of table fragmentation that usually doesn’t matter.
When a process checks the freespace bitmaps for an object and can’t find a block with enough free space for inserting new data, it will make some space available by formatting 16 blocks somewhere in the current extent (allocating a new extent if necessary). The location of those 16 blocks may be almost anywhere in the extent, limited only by the fact that the starting point within the extent has to be a multiple of 16 blocks. The location of the chunk, like the initial choice of block to use, is dictated by the process id (pid). This means you could create a table in a tablespace using 1MB uniform extents and find that the first row you inserted formatted the last 16 blocks of the first extent and the row went into the last block of that extent. In general this doesn’t matter very much because (a) most I/O access is by block rather than tablescan, so the location within the extent doesn’t really matter; (b) the “messiness” is generally restricted to just the last extent in the table; and (c) if you are doing tablescans Oracle has a mechanism, dependent on the LHWM (low high water mark) and HHWM (high high water mark), for minimising the overhead of checking which 16-block chunks need special treatment.
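As a rough illustration of the chunk-placement point above, here is a toy model in Python. The real function Oracle uses to map a pid to a chunk is internal and undocumented, so the hash below is purely an assumption for illustration; the block and extent sizes (8KB blocks, 1MB uniform extents) are just the example figures from the text.

```python
# Toy model only: how a process id might select one of the 16-block chunks
# in a 1MB uniform extent (128 x 8KB blocks). The modulo "hash" is a
# placeholder, NOT Oracle's actual algorithm - the point is only that the
# chunk chosen depends on the pid, not on "first free space from the start".

EXTENT_BLOCKS = 128          # 1MB extent / 8KB block size
CHUNK_BLOCKS = 16            # ASSM formats 16 blocks at a time

def chunk_for_pid(pid: int) -> range:
    """Pick a 16-block chunk in the extent based on the process id."""
    n_chunks = EXTENT_BLOCKS // CHUNK_BLOCKS       # 8 candidate chunks
    start = (pid % n_chunks) * CHUNK_BLOCKS        # always a multiple of 16
    return range(start, start + CHUNK_BLOCKS)

# A pid that maps to the last chunk formats blocks 112..127 - so the very
# first row inserted into the table can land in the last block of the
# first extent, exactly as described above.
chunk = chunk_for_pid(7)
print(chunk.start, chunk.stop - 1)   # 112 127
```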
The more significant, and common, type of table fragmentation comes about because data can be deleted, at which point we can start to worry about “sparsely populated blocks”. When you create a table you specify how much space should be left in each block for rows to be updated and (explicitly under freelist management or implicitly under ASSM) how much space should become free in a block before it is considered as a target for new data to be inserted.
When you consider the possible scenarios that have to be addressed with data inserts, updates and deletes, you realise how difficult it can be for Oracle to write code that handles free space management at the block level in a timely and efficient manner; you also realise how difficult it can be (as a developer or DBA) to specify sensible space management limits to minimise performance problems in extreme cases. Ultimately it is possible for a table to end up with a significant amount of free space in each block – and then you have to start asking questions like: how did that free space get there, is its presence causing a performance problem, are you going to be able to re-use it, will you be able to re-use it in a timely fashion, and if you do manage to re-use it will that introduce a different type of performance problem?
Consider the scenario where you delete one year’s worth of data after running your system for five years – this probably results in the first 20% of the blocks in a table becoming completely empty. They will all go on the freelist or (bugs excepted) be flagged in the ASSM bitmaps as having free space, so you will reuse the space eventually; moreover if your application performance depends on your data being clustered by time of arrival the way the empty space refills will, broadly speaking, keep that clustering intact. By a quirk of implementation you will probably use the blocks in reverse order under freelist management and in forward order under ASSM (and a thought crosses my mind at that point to wonder about the impact the former will have on “low-cardinality” non-unique indexes). But, for quite some time, any tablescans you do will take about 20% longer than they need to, and you could be backing up larger data files than you really need to: these considerations might make you decide to rebuild the table (and its indexes) in a new tablespace – they might also make you consider the benefits of partitioning.
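The arithmetic behind that scenario is worth making explicit. This is a back-of-envelope sketch assuming rows arrive at a constant rate and stay clustered by arrival time; the table size is an arbitrary illustrative figure, not from the text.

```python
# Back-of-envelope arithmetic for the "delete one year out of five" scenario,
# assuming a constant arrival rate and time-clustered data.
# total_blocks is a hypothetical figure chosen purely for illustration.

total_blocks = 100_000
years_kept, years_deleted = 4, 1
empty_fraction = years_deleted / (years_kept + years_deleted)

empty_blocks = int(total_blocks * empty_fraction)
print(empty_blocks)          # 20000 - one block in five is now empty

# A tablescan still has to visit every block below the high water mark,
# so until that space is reused roughly 20% of the I/O in every scan is
# spent reading blocks that hold no data - and your backups carry the
# same dead weight.
```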
Imagine a scenario where you have a web-based sales system where users can fill baskets which ultimately become orders. Some users inevitably will fill baskets and then not place orders, so you could have some background code running that looks for old baskets that have not turned into orders and deletes them from the database. (There are two design flaws in this scenario – but accidents like this often happen in production systems.) In this case you could end up with a table that suffers a constant stream of deletions in the recent past, so that when you check the table you find that on average every block in the table has (say) 20% free space, except for the relatively small number of blocks that hold data about recent baskets.
If you are running ASSM, the 20% of free space cannot be reused (unless you recreate the table) because the granularity of ASSM requires a minimum of 25% free space in a block before its status changes down from “full” to “0% – 25% free”; but if you are using freelist space management then you may have predicted the effect and set the pctused table storage parameter to a value a little above 80. With pctused above 80, blocks would go back on the freelist as their free space climbed to 20% and the space would be reused. This sounds like a good thing – except there’s a new problem to consider.
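The contrast between the two book-keeping schemes can be sketched as two simple predicates. This is a deliberate simplification: ASSM’s coarse free-space buckets and the freelist pctused comparison are reduced to one threshold test each, with the 25% boundary taken from the discussion above.

```python
# Simplified model of the two space-management schemes described above.
# ASSM records free space per block only in coarse (roughly 25%) bands;
# freelist management compares used space against the pctused value you set.
# The thresholds here are taken from the text and are a simplification of
# the real bitmap states.

def assm_accepts_inserts(free_fraction: float) -> bool:
    """Under ASSM a block only leaves the 'full' state once at least
    25% of it is free - so 20% free space is invisible to new inserts."""
    return free_fraction >= 0.25

def freelist_accepts_inserts(used_fraction: float, pctused: int) -> bool:
    """Under freelist management a block goes back on the freelist as
    soon as its used space drops below pctused percent."""
    return used_fraction * 100 < pctused

# A block where 20% of the rows have been deleted (80% still used):
print(assm_accepts_inserts(0.20))          # False - the space is stranded
print(freelist_accepts_inserts(0.80, 81))  # True  - pctused = 81 reclaims it
```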
If your application used to run efficiently because the baskets for the last few days were stored in the last N blocks of the table, you’ve now got a table where the pattern of deletes and re-use means that data is now spread over 5 * N blocks (because you keep freeing up 20% of the blocks from the past) – which means you’re going to have to find some extra memory for buffering the data if you want to keep the same level of efficiency.
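The factor of five falls straight out of the arithmetic, assuming the 20% slivers freed in old blocks are the only space new baskets can use. The value of N below is a hypothetical figure for illustration.

```python
# Why recent data ends up smeared over 5 * N blocks: the only free space
# available to new rows is the 20% sliver reclaimed in each old block.
# N is a hypothetical block count chosen purely for illustration.

N = 1_000                 # blocks that used to hold the last few days' baskets
free_per_block = 0.20     # fraction of each old block available for reuse

# The same volume of recent data now has to fit into those 20% slivers,
# so it occupies N / 0.20 = 5 * N blocks:
blocks_now = int(N / free_per_block)
print(blocks_now)         # 5000 - five times the working set to buffer
```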
The general conclusion is this: once you’ve got past the (one-off) issue of setting extent sizes for a table, any thoughts of “fragmentation” should really be thoughts about “sparsely populated blocks” – and there are patterns of data deletion that can result in table blocks being less than perfectly utilised. In some cases it is not possible to reclaim the empty space without rebuilding the table (and, once again, partitioning may make that a more viable option), but there are cases where the pattern of deletion allows space to be reused – and then you may find that you don’t always want to re-use that space, because using it can introduce a different kind of problem.
Update: I see that Nigel Noble has written an article about a bug in the way that ASSM checks free space maps that can result in significant space wastage in particular special cases.