Oracle Scratchpad

March 9, 2015

Flashback logging

Filed under: Infrastructure, Oracle, redo, undo — Jonathan Lewis @ 2:44 pm GMT

When database flashback first appeared many years ago I commented (somewhere, but don’t ask me where) that it seemed like a very nice idea for full-scale test databases if you wanted to test the impact of changes to batch code, but I couldn’t really see it being a good idea for live production systems because of the overheads.

Features and circumstances change, of course, and someone recently pointed out that if your production system is multi-terabyte and you’re running with a Data Guard standby, and some minor catastrophe forces you to switch to the standby, then you don’t really want to run without a standby for the time it would take to restore and recover an old backup to create a new one – and there may be cases where you could flash back the original primary to a point before the catastrophe and turn it into the standby from then onward. Sounds like a reasonable argument to me – but you might still need to think very carefully about how to minimise the impact of enabling database flashback, especially if your database is a data warehouse, DSS, or mixed system.
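For reference, switching database flashback on is a short operation – this is a minimal sketch, assuming the database is already in archivelog mode with a recovery area (db_recovery_file_dest) configured, and noting that from 11.2.0.2 onwards it can be done without bouncing the database:

alter system set db_flashback_retention_target = 1440 scope=both;      -- target expressed in minutes (one day)
alter database flashback on;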

Imagine you have batch processes that revolve around loading data into an empty table with a couple of indexes – it’s a production system so you’re running with archivelog mode enabled, and then you’re told to switch on database flashback. How much impact will that have on your current loading strategies? Here’s a little bit of code to help you on your way – I create an empty table as a clone of the view all_objects, and create one index, then I insert 1.6M rows into it. I’ve generated four different sets of results: flashback on or off, combined with either maintaining the index during loading or marking it unusable and rebuilding it after the load. Here’s the minimum code:


create table t1 segment creation immediate tablespace test_8k
as
select * from all_objects
where   rownum < 1
;

create index t1_i1 on t1(object_name, object_id) tablespace test_8k_assm_auto;
-- alter index t1_i1 unusable;

insert /*+ append */ into t1
with object_data as (
        select --+ materialize
                *
        from
                all_objects
        where
                rownum <= 50000
),
counter as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 32
)
select
        /*+ leading (ctr obj) use_nl(obj) */
        obj.*
from
        counter         ctr,
        object_data     obj
;

-- alter index t1_i1 rebuild;

Here’s a quick summary of the timings I got (running 11.2.0.4) before I talk about the effects:

Flashback off:
Maintain index in real time: 138 seconds
Rebuild index at end: 66 seconds

Flashback on:
Maintain index in real time: 214 seconds
Rebuild index at end: 112 seconds

It is very important to note that these timings do not allow you to draw any generic conclusions about optimum strategies for your systems. The only interpretation you can put on them is that different circumstances may lead to very different timings, so it’s worth looking at what you could do with your own systems to find good strategies for different cases.

Most significant, probably, is the big difference between the two options where flashback is enabled – if you’ve got to use it, how do you do damage limitation? Here are some key figures, namely the file I/O stats and some instance activity stats, I/O stats first:


"Real-time" maintenance
---------------------------------
Tempfile Stats - 09-Mar 11:41:57
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,088      22,454      20.638        .063         296        .000     1,011      22,455        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:41:58
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3      24,802      24,802       1.000        .315      24,802        .315           0        .000          2,386      20,379        .239    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         718      22,805      31.762        .001           5        .000         713        .002            725      22,814        .002    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       8,485       8,485       1.000        .317       8,485        .317           0        .000            785       6,938        .348    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

Mark Unusable and Rebuild
---------------------------------
Tempfile Stats - 09-Mar 11:53:04
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,461      10,508       7.192        .100           1        .017       407      10,508        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:53:05
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3          17          17       1.000       5.830          17       5.830           0        .000             28          49       1.636    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         894      45,602      51.009        .001           2        .002         892        .001            721      22,811        .026    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       2,586       9,356       3.618        .313         264       3.064       2,322        .001          2,443       9,214        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

There are all sorts of interesting differences in these results due to the different way in which Oracle handles the index. For the “real-time” maintenance the session accumulates the key values and rowids as it writes the table, then sorts them, then does a cache-based bulk update to the index. For the “rebuild” strategy Oracle simply scans the table after it has been loaded, sorts the key values and rowids, then writes the index to disc using direct path writes; you might expect the total work done to be roughly the same in both cases – but it’s not.

I’ve shown 4 files: the temporary tablespace, the undo tablespace, the tablespace holding the table and the tablespace holding the index; and the first obvious difference is the number of blocks written and read and the change in average read size on the temporary tablespace. Both sessions had to spill to disc for the sort, and both did a “one-pass” sort; the difference in the number of blocks written and read appears because the “real-time” session wrote the sorted data set back to the temporary tablespace one more time than it really needed to – it merged the sorted data in a single pass but wrote the data back to the temporary tablespace before reading it again and applying it to the index (for a couple of points on tracing sorts, see this posting). I don’t know why Oracle chose to use a much smaller read slot size in the second case, though.
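For anyone repeating the test, here’s a quick way to check the sort activity from the session’s own statistics (assuming select access on v$mystat and v$statname – snapshot before and after the insert and take the difference):

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'sorts (disk)',
                'workarea executions - optimal',
                'workarea executions - onepass',
                'workarea executions - multipass'
        )
order by
        sn.name
;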

The next most dramatic thing we see is that real-time maintenance introduced 24,800 single block reads with 20,000 blocks written to the undo tablespace (with a few thousand more that would eventually be written by dbwr – I should have included a “flush buffer_cache” in my tests), compared to virtually no activity in the “rebuild” case. The rebuild generates no undo; real-time maintenance (even starting with an empty index) generates undo because (in theory) someone might look at the index and need to see a read-consistent image of it. So it’s not surprising that we see a lot of writes to the undo tablespace – but where did the reads come from? I’ll answer that question later.
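For reference, that missing step is a single command to run between the load and the stats snapshot, forcing dbwr to write the remaining dirty blocks so that they appear in the datafile statistics:

alter system flush buffer_cache;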

It’s probably not a surprise to see the difference in the number of blocks read from the table’s tablespace. When we rebuild the index we have to do a tablescan to acquire the data; but, again, we can ask why we saw 22,800 blocks read from the table’s tablespace when we were doing the insert with real-time maintenance. On a positive note those reads were multiblock reads, but what caused them? Again, I’ll postpone the answer.

Finally we see that the numbers of blocks read (reason again postponed) and written to the index’s tablespace are roughly similar. The writes differ because the rebuild is doing direct path writes, while the real-time maintenance is done in the buffer cache, so there are some outstanding index blocks still to be written. The reads are similar, though one test is exclusively single block reads and the other is doing (small) multiblock reads – which is just a little bit more efficient. The difference in the number of blocks read is because the rebuild was at the default pctfree=10 while the index maintenance was a massive “insert in order” which would have packed the index leaf blocks at 100%.
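If you want to check the difference in packing for yourself, one option (bearing in mind that validation locks the index while it runs) is to validate the index after each test and look at index_stats – a single row, visible only to the current session:

analyze index t1_i1 validate structure;

select
        lf_blks, pct_used
from
        index_stats
;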

To start the explanation – here are the most significant activity stats – some for the session, a couple for the instance:


"Real-time" maintenance
-----------------------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                        33,263
redo entries                                                           118,290
redo size                                                          466,628,852
redo size for direct writes                                        187,616,044
undo change vector size                                            134,282,356
flashback log write bytes                                          441,032,704

Rebuild
-------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                           156
redo entries                                                            35,055
redo size                                                          263,801,792
redo size for direct writes                                        263,407,628
undo change vector size                                                122,156
flashback log write bytes                                          278,036,480
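For anyone repeating the test, a sketch of how these figures can be captured – the first five from the session’s statistics, the flashback figure from the instance level; snapshot before and after the insert and take the difference:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'physical reads for flashback new',
                'redo entries',
                'redo size',
                'redo size for direct writes',
                'undo change vector size'
        )
union all
select
        name, value
from
        v$sysstat
where
        name = 'flashback log write bytes'
;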

The big clue is the “physical reads for flashback new”. When you modify a block, if it hasn’t been dumped into the flashback log recently (as defined by the hidden _flashback_barrier_interval parameter) then the original version of the block has to be written to the flashback log before the change can be applied; moreover, if a block is being “newed” (Oracle-speak for being reformatted for a new use) it will also be written to the flashback log. Given the way that the undo tablespace works it’s not surprising if virtually every block you modify in the undo tablespace has to be written to the flashback log before you use it. The 33,263 blocks read for “flashback new” consist of the 24,800 blocks read from the undo tablespace when we were maintaining the index in real-time plus a further 8,460 or so from “somewhere” – which, probably not coincidentally, matches the number of blocks read from the index tablespace as we create the index. The odd thing is that we don’t see the 22,800 reads on the table’s tablespace (which don’t occur when flashback is off) reported as “physical reads for flashback new”; this looks like a reporting error to me.

So the volume of undo requires us to generate a lot of flashback log as well as the usual increase in the amount of redo. As a little side note, we get confirmation from these stats that the index was rebuilt using direct path writes – there’s an extra 75MB or so of redo for direct writes (263,407,628 bytes in the rebuild test against 187,616,044 for the table load alone).

Summary

If you are running with flashback enabled in a system that’s doing high volume data loading, remember that the “physical reads for flashback new” could be a major expense. This is particularly expensive on index maintenance, which can result in a large number of single block reads of the undo tablespace. The undo costs you three times – once for the basic cost of undo (and associated redo), once for the extra reads, and once for writing the flashback log. Although you have to do tablescans to rebuild indexes, the cost of an (efficient, possibly direct path) tablescan may be much less than the penalty of the work relating to flashback.

Footnote: since you can’t (officially) load data into a table with an unusable unique index or constraint, you may want to experiment with using non-unique indexes to support unique/PK constraints and disabling the constraints while loading.
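As a sketch of that idea (untested here, and the names are illustrative – you’d also want to think about how duplicates would be reported and what the validation step costs):

create index t1_pk_i on t1(object_id);          -- deliberately non-unique

alter table t1 add constraint t1_pk
        primary key (object_id)
        using index t1_pk_i
;

-- before the load:
alter table t1 disable constraint t1_pk keep index;
alter index t1_pk_i unusable;

-- ... direct path load here (skip_unusable_indexes = true, the default) ...

-- after the load:
alter index t1_pk_i rebuild;
alter table t1 enable constraint t1_pk;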

11 Comments

  1. Hi Jonathan, apologies if I’ve just missed it but – What version is this? As per “Flashback Database Best Practices & Performance (Doc ID 565535.1)”, there have been quite a few tweaks over the years. I personally saw quite a dramatic drop in the overhead in 11gR2.

    Comment by Tim Hopkins — March 9, 2015 @ 6:45 pm GMT

    • Tim,

      Thanks for highlighting that omission – now corrected – I was using 11.2.0.4.

      Comment by Jonathan Lewis — March 9, 2015 @ 7:03 pm GMT

    • Tim,

      I’ve been trying to track down any notes on how to make best use of the “block-new” optimization – but there doesn’t seem to be anything available on MoS. Do you have any notes about it?

      My thoughts were: minimize the flashback_retention_target so that by the time you re-use blocks they are outside the target limit and therefore don’t need to be logged, and keep undo tablespaces as small as possible so that if you have multiple operations like my example you have a chance of re-using undo blocks a few times before they’ve reached the flashback barrier time. I haven’t yet tested either idea, and I’m not sure that they would be sufficiently robust that they could be used in a production system anyway.
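      For the first idea, the relevant setting and check would be something along these lines – the target is expressed in minutes, and this is a sketch only, untested:

      alter system set db_flashback_retention_target = 15 scope=both;

      select
              retention_target, flashback_size, oldest_flashback_time
      from
              v$flashback_database_log
      ;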

      Comment by Jonathan Lewis — March 9, 2015 @ 8:05 pm GMT

  2. Thanks for the post, Jonathan. I’ve been turning off flashback on all my dbs since it came out. It is obvious that it MUST introduce a serious overhead on I/O operations of a highly volatile db (case in point: DW and ETL).
    Hence it may be self-defeating in a way: it’s supposed to help reduce restores from backup (high I/O), but it exacts a high price on I/O to do so.
    The same can be said about the current crop of “thin-provisioning” db copy software/hardware. Most of them are based on variations of “copy-on-write” duplication technology.
    Which basically ensures any first write will be two writes at the very least, if not more – depending on how many “copies” were made. I’ve had a very bad experience with this using the EMC variation and another one that shall remain nameless (we didn’t buy the hardware).
    In a nutshell, all writes became twice as slow – best case. And what’s worse, AWR and other tuning tools (including the OS in the case of hardware-driven ones) are blissfully unaware of what’s going on.
    End result? Greatly increased overall runtimes, with no direct explanation of why. No biggie in small or medium sized DSS and OLTP test dbs with small amounts of “O”, absolutely lethal on TB-sized data warehouse dbs and large ETLs.

    Comment by Noons — March 9, 2015 @ 9:42 pm GMT

  3. I am curious: did you enable flashback database just prior to running the test and measuring the results? When flashback database is first enabled it must create flashback logs in which to store the before images of the blocks that are being changed. In 11.2 this log creation happens as changes are occurring; during this time flashback database is trying to figure out the number and size of logs needed to satisfy the retention target and is creating them on the fly, resulting in a lot of extra I/O. Once flashback has what it thinks is enough flashback logs to satisfy the retention target it will start reusing the logs, thus reducing the amount of I/O. So if you enable flashback database and then start your tests you will see a lot more I/O – because the logs have to be created – than if you re-test once the logs exist and can be reused.

    In 11.2.0.4 you can pre-create the flashback logs by using the _db_flashback_log_min_size and _db_flashback_log_min_total_space parameters. For example, if I wanted 100GB of flashback logs then I would set _size to the same as the online redo log size and _total_space=100G. Once you enable flashback database (don’t start any tests yet) we go out and begin creating the logs. You can monitor V$FLASHBACK_DATABASE_LOGFILE to see the files being created (in the above example you would see 24 logs).
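    A sketch of that pre-creation, following the 100GB example above (hidden parameters – treat the exact syntax and units as an assumption, and check with Oracle Support before using them):

    alter system set "_db_flashback_log_min_size" = 4G scope=both;          -- assumed to match a 4GB online redo log
    alter system set "_db_flashback_log_min_total_space" = 100G scope=both;

    select
            count(*) logs, round(sum(bytes)/1024/1024/1024,1) total_gb
    from
            v$flashback_database_logfile
    ;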

    Block new optimization prevents the reading into the buffer cache of empty blocks for direct load operations. With block new you should not see any overhead for direct loads.

    Comment by Michael Smith (@Smith_MichaelT) — March 9, 2015 @ 10:12 pm GMT

    • Michael,

      I’ve run the test about 20 times with different traces and the results were consistent – except for the one run where I had just changed the code to put the index in a different tablespace from the table. In that one case Oracle reported a count matching the blocks used for maintaining the index as “flashback cache read optimizations for block new”.

      Given your comments about pre-allocation, I repeated the test after setting the parameters to pre-allocate 600MB of flashback log (more than enough for the first run), and checked that no new files had been allocated and that some space was still untouched after the run. The number of “flashback log write bytes” did not change as a consequence – I don’t believe that pre-allocation bytes would appear in this statistic, although the concurrent activity might have an impact on the general I/O rates. (It’s worth noting, also, that the flashback log writes are large, sequential writes and operate very efficiently and in the background – the time lost by the session due to these writes was less than 5 seconds of the total 214 in the first test (event “flashback log file sync”).)

      The volume of writes is consistent with Oracle doing direct path reads for flashback new without reporting them.

      Test 1: (24,800 (undo) + 8,500 (index) + 22,800 (unreported table) ) * 8192 = 459M (cp. 441M flashback log write bytes)
      Test 2: (22,800 (unreported table) + 9,360 (unreported index)) * 8192 = 263M (cp. 278M flashback log write bytes).

      What I’d like to know about the “flashback new” optimization is what condition makes it possible to use the mechanism. Note 565535.1 suggests that you can choose to make it happen, but there are no clues about how you can engineer the choice. (The note also suggests that it can’t happen after a truncate on an object, so I’m not entirely surprised that it isn’t happening after I’ve done a drop/purge/recreate.)

      Comment by Jonathan Lewis — March 10, 2015 @ 8:00 am GMT

      • When we perform a direct load into a table with flashback database enabled we read new empty blocks into the buffer cache, change the block, and then write it out to the flashback log. With the block new optimization we skip that read of a new empty block altogether. There are some conditions where block new will not kick in until we know that the object was cleared (dropped/truncated) and we are well past the retention period (internally at least 90 minutes regardless of how low it was set via the parameter). Also, I don’t think it works with indexes.

        In general the requirements for block new are: drop/truncate/purge recycle bin, and wait until we are well outside the retention period. For testing, the easiest way to see block new is to create a new table that has never existed (and with no indexes to start) and perform a direct load into that table. If block new kicked in you should see almost zero impact due to flashback database, and “flashback direct read optimizations for block new” should be roughly the same as “flashback log writes”.
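        A minimal sketch of that test (the table name is illustrative, the tablespace is the one from the original script, and the two statistics should be read from v$sysstat before and after the load):

        create table t_brand_new tablespace test_8k as
        select * from all_objects where rownum < 1;

        insert /*+ append */ into t_brand_new
        select * from all_objects;

        commit;

        select
                name, value
        from    v$sysstat
        where   name in (
                        'flashback direct read optimizations for block new',
                        'flashback log writes'
                )
        ;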

        Comment by Michael Smith (@Smith_MichaelT) — March 10, 2015 @ 10:48 am GMT

        • Michael,

          “(internally at least 90 minutes regardless of how low it was set via the parameter)”

          Thanks, I’ll play around with that – I had set the retention to 15 minutes and only waited 20 minutes when trying to trigger it.
          Despite anything else, though, the real killer is the single block reads on the undo when maintaining the index rather than rebuilding it.

          Comment by Jonathan Lewis — March 10, 2015 @ 10:57 am GMT

        • Michael, FYI:

          I just ran the version of the test with the index rebuild, creating a pair of new objects (with new names) instead of dropping and purging the old. As a result the objects were created in space that had not been used at all recently in the data files.

          The resulting stats are consistent with the “newed” blocks being read into session memory using direct path reads but only the table blocks being written to the flashback log. (Essentially the flashback log write bytes dropped by about 64MB – which is close to the space needed for the index.)

          Comment by Jonathan Lewis — March 10, 2015 @ 11:59 am GMT

  4. True. We’ve disabled flashback on our data warehouse database. We keep it on the Data Guarded main OLTP database, but flashback I/O has to be watched – not as critical as redo, but close. Some years ago I blogged about an issue with LOBs which brought the database almost to a standstill after a particular load was switched on: https://laimisnd.wordpress.com/2011/03/25/lobs-and-flashback-database-performance/

    Comment by Laimis N — March 10, 2015 @ 8:10 am GMT

    • p.s. Data Guarded databases are a special case: automated or manual standby reinstate after failover relies on flashback. As a bonus you can flashback a standby: good for incident solving.

      Comment by Laimis N — March 10, 2015 @ 8:14 am GMT

