Oracle Scratchpad

March 19, 2018

Reference Costs

Filed under: Oracle,Partitioning,Performance — Jonathan Lewis @ 7:44 am BST Mar 19,2018

The partitioning option “partition by reference” is a very convenient option which keeps acquiring more cute little features, such as cascading truncates and cascading splits, as time passes – but what does it cost and would you use it if you don’t really need to.

When reference partitioning came into existence many years ago, I had already seen several performance disasters created by people’s enthusiasm for surrogate keys and the difficulties this introduced for partition elimination; so my first thought was that this was a mechanism that would have a hugely beneficial effect on systems which (in 20:20 – or 6:6 if you’re European – hindsight) had been badly designed and would otherwise need a lot of re-engineering to use partitioning effectively.

(Side note: Imagine you have partitioned an orders table on colX which is a column in the real (business-oriented) candidate key, but you’ve created a surrogate key which is used as the target for a foreign key from the order_lines tables – how do you get partition-wise joins between orders and order_lines if you haven’t got the partitioning column in the order_lines table ?)

So ref partitioning was a good way to workaround a big existing problem and, whatever overheads it introduced, the benefit was potentially so  huge that you wouldn’t care (or, probably, notice) that your system was less efficient than it ought to be. But what if you’re working on a new project and still have control of the physical design – how does that change the cost/benefit analysis.

It’s actually taken me several years to get round to producing a little demonstration to highlight one of the obvious costs of reference partitioning – even though it’s a very simple demo raising the obvious question: ‘how much work does Oracle have to do to find the right partition when inserting a “child” row ?’ If you chose to implement reference partitioning without asking that simple question you may be using a lot more machine resources than you really need to, although you may not actually be heading for a disastrous performance problem.

As a demonstration of the issue I’m going to set up something that approximates an order/order_lines model in two ways, one using reference partitioning and one using a copied column, to see what differences show up when you start loading data.

rem
rem     Script:         pt_ref.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table orders (
        id              number(10,0) not null,
        id_cust         number(10,0) not null,
        date_ordered    date not null,
        padding         varchar2(150)
)
partition by range (date_ordered)
(
        partition p201801       values less than (to_date('01-Feb-2018')),
        partition p201802       values less than (to_date('01-Mar-2018')),
        partition p201803       values less than (to_date('01-Apr-2018')),
        partition p201804       values less than (to_date('01-May-2018')),
        partition p201805       values less than (to_date('01-Jun-2018')),
        partition p201806       values less than (to_date('01-Jul-2018')),
        partition p201807       values less than (to_date('01-Aug-2018'))
);

create unique index ord_pk on orders (id);
alter table orders add constraint ord_pk primary key(id);

create table order_lines (
        id_ord          number(10,0) not null,
        line_number     number(4,0)  not null,
        id_product      number(6,0)  not null,
        qty             number(6,0)  not null,
        value           number(10,2) not null,
        padding         varchar2(150),
        constraint orl_fk_ord foreign key (id_ord) references orders
                on delete cascade
)
partition by reference (orl_fk_ord)
;

create unique index orl_pk on order_lines (id_ord, line_number);
alter table order_lines add constraint orl_pk primary key (id_ord, line_number);

create table order_lines_2 (
        date_ordered    date,
        id_ord          number(10,0) not null,
        line_number     number(4,0)  not null,
        id_product      number(6,0)  not null,
        qty             number(6,0)  not null,
        value           number(10,2) not null,
        padding         varchar2(150),
        constraint orl2_fk_ord foreign key (id_ord) references orders
                on delete cascade
)
partition by range (date_ordered)
(
        partition p201801       values less than (to_date('01-Feb-2018')),
        partition p201802       values less than (to_date('01-Mar-2018')),
        partition p201803       values less than (to_date('01-Apr-2018')),
        partition p201804       values less than (to_date('01-May-2018')),
        partition p201805       values less than (to_date('01-Jun-2018')),
        partition p201806       values less than (to_date('01-Jul-2018')),
        partition p201807       values less than (to_date('01-Aug-2018'))
)
;

create unique index orl2_pk on order_lines_2 (id_ord, line_number);
alter table order_lines_2 add constraint orl2_pk primary key (id_ord, line_number);

It’s a bit of a bodge job as far as modelling goes, but that’s to keep workload comparisons easy and make a point without writing too much code. All I’ve got is an orders table partitioned by date and an order_lines table that I want partitioned the same way. I’ve handled the requirement for partitioning order_lines in two ways, one is partition by reference and the other is to copy down the partitioning column from the orders table. (In my view the “real” key for an orders table should be (customer identifier, order date, counter) and if I thought efficient partitioning was going to be a necessary feature for scalability I would copy down all three columns. Depending on the nature of the business I would compress the primary key index on orders on one or two of the columns, and the foreign key index on order_lines on one, two, or three of its columns)

Now all I have to do is load some data into the tables. First the orders table:

insert into orders(
        id, id_cust, date_ordered, padding
)
with g as (
        select rownum id from dual
        connect by level <= 1e4
)
select
        rownum                                  id,
        trunc(dbms_random.value(10000,20000))   id_cust,
        to_date('01-Jan-2018') +
                trunc((rownum-1)/100)           date_ordered,
        rpad('x',40)                            padding
from
        g,g
where
        rownum <= 2e4
;

commit;
execute dbms_stats.gather_table_stats(user,'orders')

This produces 100 orders per day, for 200 days which fits within the seven months of pre-declared partitions. I’ve gathered table stats on the table because that’s probably the best way to deal with any requirements for block cleanout after the insert. (Note: I’m avoiding interval partitioning in this example because that’s just another complication to add to the comparison and, as I reported a few days ago, introduces another massive inefficiency on data loading.)

Now I’ll insert some order_lines rows at 5 lines per order into the two versions of the order_lines tables. One of them, of course, has to have a date generated using the same algorithm that I used for the orders table. Note that I’ve made a call to dbms_random.seed(0) before each insert to guarantee that the same “random” values will be inserted in both table.

execute dbms_random.seed(0)

insert into order_lines_2(
        date_ordered, id_ord, line_number, id_product, qty, value, padding
)
with g as (
        select rownum id from dual
        connect by level <= 1e4
)
select
        to_date('01-Jan-2018') +
                trunc((rownum-1)/500)           date_ordered,
        1 + trunc((rownum-1)/5)                 id_ord,
        1 + mod(rownum,5)                       line_number,
        trunc(dbms_random.value(10000,20000))   id_product,
        1 qty,
        1 value,
        rpad('x',80)                            padding
from
        g,g
where
        rownum <= 10e4
;

commit;


execute dbms_random.seed(0)

insert into order_lines(
        id_ord, line_number, id_product, qty, value, padding
)
with g as (
        select rownum id from dual
        connect by level <= 1e4
)
select
        1 + trunc((rownum-1)/5)                 id_ord,
        1 + mod(rownum,5)                       line_number,
        trunc(dbms_random.value(10000,20000))   id_product,
        1 qty,
        1 value,
        rpad('x',80)                            padding
from
        g,g
where
        rownum <= 10e4
;

commit;

What I haven’t shown in the code is the snapshot wrapping I used to check the session stats, system latch activity and system rowcache activity – which I thought would give me the best indication of any variation in workload. In fact, of course, the first and simplest variation was the elapsed time: 4.5 seconds for the ref partitioned table, 2.5 seconds for the explicitly created table (regardless of which insert I did first), and it was nearly all pure CPU time.

It turned out that the rowcache stats showed virtually no variation, and the latch stats only showed significant variation in the latches that I could have predicted from the session stats, and here are the most significant session stats that highlight and explain the difference in times:

Explicitly Created
------------------
CPU used by this session                                                   231
DB time                                                                    242
db block gets                                                          219,471
db block changes                                                        27,190
redo entries                                                            15,483
redo size                                                           24,790,224
HSC Heap Segment Block Changes                                           2,944

Ref partitioned
---------------
CPU used by this session                                                   515
DB time                                                                    532
db block gets                                                          615,979
db block changes                                                       418,025
redo entries                                                           209,918
redo size                                                           70,043,676
HSC Heap Segment Block Changes                                         100,048

These results were from 12.1.0.2, but the figures from 12.2.0.1 and 11.2.0.4 were similar though the CPU time dropped as the version number went up: what you’re seeing is the effect of turning an array insert (for the precreated table) into single row processing for the ref partitioned table. Basically it seems that for every row inserted Oracle has to do something to work out which partition the row should go into, and while it does that work it release any pins of buffers it would have been holding from the previous row’s insert; in other words, various optimisations relating to array inserts are not taking place.

  • Looking in more detail at the figures for the ref partition insert:
  • The 100,000 “HSC heap Segment Block Changes” equate to the 100,000 rows inserted into the table
  • Add the single row index updates to the primary key and you get 200,000 redo entries.
  • For every individual row inserted Oracle has to do a current mode (db block gets) check against the primary key of the orders table – but when array processing the root block can be pinned.

We can get a closer look at the differences by taking snapshots of v$segstat (or v$segment_statistics), to see the following (pre-created table on the left):


ORD_PK                                |   ORD_PK
  logical reads          199,440      |     logical reads          300,432
                                      |
ORDER_LINES_2 - P201801               |   ORDER_LINES - P201801
  logical reads            2,112      |     logical reads           16,960
  db block changes         1,280      |     db block changes        16,944
                                      |
ORDER_LINES_2 - P201802               |   ORDER_LINES - P201802
  logical reads            2,256      |     logical reads           16,144
  db block changes         1,248      |     db block changes        15,088
                                      |
ORDER_LINES_2 - P201803               |   ORDER_LINES - P201803
  logical reads            2,288      |     logical reads           17,264
  db block changes         1,376      |     db block changes        16,560
                                      |
ORDER_LINES_2 - P201804               |   ORDER_LINES - P201804
  logical reads            2,672      |     logical reads           16,768
  db block changes         1,280      |     db block changes        16,144
                                      |
ORDER_LINES_2 - P201805               |   ORDER_LINES - P201805
  logical reads            2,224      |     logical reads           17,472
  db block changes         1,264      |     db block changes        16,528
                                      |
ORDER_LINES_2 - P201806               |   ORDER_LINES - P201806
  logical reads            2,624      |     logical reads           16,800
  db block changes         1,328      |     db block changes        16,160
                                      |
ORDER_LINES_2 - P201807               |   ORDER_LINES - P201807
  logical reads            1,376      |     logical reads           10,368
  db block changes           864      |     db block changes        10,752
                                      |
ORL2_PK                               |   ORL_PK
  logical reads           10,640      |     logical reads          206,352
  db block changes         7,024      |     db block changes       104,656

The right hand data set does an extra 100,000 logical reads on the ORD_PK index (top set of lines) which I think are the 100,000 gets on the root block that was pinned for the table on the left – the numbers don’t quite add up, so there’s some extra complexity that I haven’t guessed correctly.

The insert into the ORL[2]_PK index (lines) is single row processed for the right hand table – with, I think, the logical reads recording two current gets per insert.

Every partition of the table, except the last, shows 15,000 db block changes, totalling a difference of about 100,000 db block changes corresponding to the single rows being inserted. Then ORL[2]_PK shows another 100,000 db block changes, giving us the 200,000 we saw as redo entries and 400,000 (when doubled up to allow for the undo) db block changes that we saw in total.

Finally we need to explain the difference of 400,000 db block gets between the two sets of session stats – and I think this is the extra 100,000 for ORD_PK, the 100,000 for the table inserts, and 200,000 for the ORL[2]_PK index, which I think might be explained as 100,000 as a current get that checks for “duplicate key” and 100,000 gets to do the actual insert.

Bottom Line, though – if you use reference partitioning every array insert seems to turn into single row processing with the attendant increase in buffer gets, undo and redo generated,  latch activity, and CPU used as Oracle checks for every single row which partition it should go into: and there doesn’t seem to be any optimisation that caters for “this row belongs in the same partition as the previous row”. You may decide that this extra cost due to reference partitioning is worth it for the benefits that reference partitioning supplies – it’s all down to what your application does, especially in terms of aging data perhaps – but it’s nice to know that this cost is there so that you can do a better cost/benefit analysis.

Footnote:

Interested readers might like to extend this test to a multi-layered set of ref-partitioned tables to see if the increase in overheads is linear or geometric.

 

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.