Oracle Scratchpad

June 16, 2021

Split Partition

Filed under: Uncategorized — Jonathan Lewis @ 12:33 pm BST Jun 16,2021

This is a note I drafted a little over 5 years ago in response to a question on the Oracle developer forum about turning a simple heap table into a list partitioned table. I never got around to publishing it, and now it’s not going to be of much benefit to most people because 12c makes it very easy to re-engineer partitioned tables online with a single “alter table” command.

But people do still run 11g, and the topic of changing, or repairing, a partitioning strategy has come up again very recently (more than once) in the context of 11g, so I thought I’d edit the draft and publish it.

There are, of course, many ways to start with a simple heap table and end up with a partitioned table, or start with a partitioned table in one state and end up with the table in a different state, and the method you choose may depend on various restrictions imposed from outside the database. One of the simpler strategies would be to “create table as select” to create a new partitioned table from the existing table and then rename the two tables – with possible variations such as exporting, creating an empty partitioned table, then re-importing while ignoring creation errors.

Here’s a thought, though, that you might be able to take advantage of if you need to enhance the partitioning of an existing partitioned table but don’t have to re-partition all your data immediately. It’s an idea I came up with many years ago (long before the introduction of interval partitions) when a client had a daily partitioned table but had forgotten for several months that they were supposed to do a regular “split partition” of the maxvalue partition in anticipation of future data arriving. They had a table with about 30 daily partitions, and one partition of about 500 days (18 months).

The first step, of course, was to split the big partition “above” the highest current value so that Oracle didn’t have to copy data into the resulting two partitions, and we could re-impose the regular splitting mechanism on the new, small, maxvalue partition. The follow-up step was to worry about splitting the one huge partition into daily chunks over the next few days or weeks.

The client’s initial thought was to write a simple pl/sql loop that generated and executed statements like:

alter table orders 
        split partition p20050907 
        at (to_date('02-Mar-2004','dd-Mon-yyyy')) 
        into (
                partition p20040302, 
                partition p20050907
        )
/

This would split the first day off the very large partition, then the next pass through the loop would move on one day and split off the next daily partition, and so on. While the code was simple it was not an efficient strategy – just think about the arithmetic:
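As a sketch (in Python rather than pl/sql, and with the date arithmetic and starting values invented for illustration), the loop was effectively generating a statement sequence like this:

```python
from datetime import date, timedelta

def naive_split_statements(first_day, days, table="orders", maxvalue_part="p20050907"):
    """Generate one 'split partition' statement per day, each one peeling
    a single day off the front of the big partition."""
    stmts = []
    for i in range(days):
        d = first_day + timedelta(days=i)
        stmts.append(
            "alter table {t} split partition {p} "
            "at (to_date('{dt}','dd-Mon-yyyy')) "
            "into (partition p{tag}, partition {p})".format(
                t=table,
                p=maxvalue_part,
                dt=d.strftime("%d-%b-%Y"),
                tag=d.strftime("%Y%m%d"),
            )
        )
    return stmts

# The first generated statement matches the example above.
stmts = naive_split_statements(date(2004, 3, 2), 511)
```

Each statement here re-uses the big partition’s name for the “upper” half of the split, just as in the example above, so the next pass of the loop can split it again.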

To keep things simple, imagine we have 512 days in the initial partition and each day’s data is exactly 1GB. On the first split we read and write 512GB; on the second split we read and write 511GB; on the third split we read and write 510GB … on the 511th split we read and write 2GB and the job is done. In total the loop reads and writes 511 * (512 + 2) / 2 = 131,327 GB (the average of the first and last terms, multiplied by the number of terms).
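You can check that arithmetic with a couple of lines of Python (assuming, as above, 512 days at exactly 1GB per day):

```python
# Total I/O for the one-day-at-a-time loop: split number i still has to
# read and write everything left in the big partition, i.e. (513 - i) GB.
total_days = 512
naive_cost_gb = sum(total_days - i + 1 for i in range(1, total_days))  # 512 + 511 + ... + 2
print(naive_cost_gb)  # 131327
```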

Now think about coding a more complex algorithm:

  • First split the big partition into 2 partitions of 256 days each, reading and writing 512GB;
  • Then split the two resulting partitions into a total of 4 partitions of 128 days each, reading and writing 512GB (in total);
  • Then split the 4 resulting partitions into a total of 8 partitions of 64 days each, reading and writing 512GB (in total);
  • … and so on, halving the size of every partition on each cycle …
  • On the 9th cycle you will have to split 256 partitions of 2GB each to produce 512 partitions of 1GB each, reading and writing a total of 512GB.

In total the algorithm reads and writes 9 * 512 = 4,608 GB, which is roughly 1/28th of the cost of the simpler algorithm.
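A small simulation (again a Python sketch, not Oracle code) confirms the halving arithmetic:

```python
def halving_schedule(total_days, gb_per_day=1):
    """Simulate the halving strategy: on each pass, split every partition
    that still covers more than one day into two halves, rewriting its
    data.  Returns the I/O (in GB) done on each pass."""
    sizes = [total_days]          # start with one big partition
    io_per_pass = []
    while max(sizes) > 1:
        io_per_pass.append(sum(s for s in sizes if s > 1) * gb_per_day)
        sizes = [half
                 for s in sizes
                 for half in ((s // 2, s - s // 2) if s > 1 else (s,))]
    return io_per_pass

passes = halving_schedule(512)
print(len(passes), sum(passes))  # 9 4608
```

With 512 one-GB days every pass rewrites the full 512GB, and log2(512) = 9 passes are needed, hence the 4,608 GB total.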

Conclusion

This example of splitting partitions is just one specific case of a general principle.

It’s easy to code up an “obvious” strategy without considering whether there might be a slightly more complex but vastly more efficient strategy. In many cases, of course, the difference in resource costs between the simplest and the most efficient strategies may not be worth the extra development and testing time, but when you start dealing with large data volumes it is always worth asking three questions:

  1. Do I have a reasonable idea of the resources that will be taken up by the obvious strategy?
  2. Is the obvious strategy the only possible strategy?
  3. Is it worth spending more human resources designing and testing a better strategy to save machine resources at run-time?


2 Comments »

  1. Hi Jonathan,

    I really like your splitting-in-half method. It seems the best one if we cannot stop the system, as it gives us logarithmic complexity. And, given that pl/sql supports recursion, it should be easy to program.

    Though, I suppose, if a customer could afford a small downtime, one could try to exchange the huge partition with a “temporary” table, pre-create all the necessary partitions as empty segments in the target table, and reload the data back from the temporary storage.
    The exchange should do virtually no IO, and we are left with only 512GB to read and write. Of course, there might be difficulties due to the MAXVALUE partition already present in the partition list, and FKs referencing the target table. But again, if we agreed to downtime, who cares: no new data comes in for this brief moment, no need to go through revalidation :) We can also fire up a parallel statement that could reload 512GB in a matter of seconds on a relatively modern system. My point is that there are customers who desperately need to keep the system available all the time, and ones who would prefer a fast, painful shot of downtime :)

    Thank you,
    Viacheslav

    Comment by Viacheslav Andzhich — June 18, 2021 @ 11:29 am BST Jun 18,2021 | Reply

    • Viacheslav,

      Thanks for the comment.

      As I said in the article, there are plenty of different strategies that might be appropriate.

      Another possibility is to consider the option of not partitioning the entire data set in exactly the same way. The client had managed to survive for months without doing the daily partitioning; maybe it would be sufficient to say “let’s do it from now onwards”, or maybe “let’s split off the start of this month from the big partition, make that daily, and carry on from there”. Maybe I should have put the 2nd question first: “Is the obvious strategy the only possible strategy?”

      I didn’t mention it in the note, of course, but even in 12c one might choose to take an approach like the one you describe rather than doing a single multi-partition online split, because of the possibility of failure due to space, time, or read-consistency issues (etc.) if the data is arriving too fast for the online journalling mechanism to cope.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — June 19, 2021 @ 10:21 am BST Jun 19,2021 | Reply

