Oracle Scratchpad

March 9, 2015

Flashback logging

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 2:44 pm BST Mar 9,2015

When database flashback first appeared many years ago I commented (somewhere, but don’t ask me where) that it seemed like a very nice idea for full-scale test databases if you wanted to test the impact of changes to batch code, but I couldn’t really see it being a good idea for live production systems because of the overheads.

Features and circumstances change, of course, and someone recently pointed out that if your production system is multi-terabyte and you’re running with a Dataguard standby, then when some minor catastrophe forces you to switch to the standby you don’t really want to be left without a standby for the time it would take to restore and recover an old backup to create a new one; in some cases you could simply flashback the original primary to a point before the catastrophe and turn it into the standby from that moment onward. Sounds like a reasonable argument to me – but you might still need to think very carefully about how to minimise the impact of enabling database flashback, especially if your database is a data warehouse, DSS, or mixed system.
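To make that scenario concrete, here’s a rough sketch of the kind of reinstatement sequence involved (paraphrasing the documented “convert a failed primary into a standby” steps from memory – not a tested runbook; the SCN value is hypothetical and comes from standby_became_primary_scn on the new primary):

-- on the new primary: find the SCN at which it took over the primary role
select standby_became_primary_scn from v$database;

-- on the old primary, once it can be mounted again
shutdown immediate
startup mount
flashback database to scn 1234567;		-- the SCN reported above (hypothetical value)
alter database convert to physical standby;
shutdown immediate
startup mount
-- re-point redo transport from the new primary, then:
alter database recover managed standby database using current logfile disconnect;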

Imagine you have batch processes that revolve around loading data into an empty table with a couple of indexes – it’s a production system so you’re running with archivelog mode enabled, and then you’re told to switch on database flashback. How much impact will that have on your current loading strategies? Here’s a little bit of code to help you on your way – I create an empty table as a clone of the view all_objects, create one index, then insert 1.6M rows into it. I’ve generated 4 different sets of results: flashback on or off, combined with either maintaining the index during the load or marking it unusable and rebuilding it after the load. Here’s the minimum code:


create table t1 segment creation immediate tablespace test_8k
as
select * from all_objects
where   rownum < 1
;

create index t1_i1 on t1(object_name, object_id) tablespace test_8k_assm_auto;
-- alter index t1_i1 unusable;		-- uncomment (with the rebuild below) for the "mark unusable and rebuild" tests

insert /*+ append */ into t1
with object_data as (
        select --+ materialize
                *
        from
                all_objects
        where
                rownum <= 50000
),
counter as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 32
)
select
        /*+ leading (ctr obj) use_nl(obj) */
        obj.*
from
        counter         ctr,
        object_data     obj
;

-- alter index t1_i1 rebuild;		-- uncomment (with the "unusable" above) for the "mark unusable and rebuild" tests

Here’s a quick summary of the timings I got (running 11.2.0.4) before I talk about the effects:

Flashback off:
Maintain index in real time: 138 seconds
Rebuild index at end: 66 seconds

Flashback on:
Maintain index in real time: 214 seconds
Rebuild index at end: 112 seconds

It is very important to note that these timings do not allow you to draw any generic conclusions about optimum strategies for your systems. The only interpretation you can put on them is that different circumstances may lead to very different timings, so it’s worth looking at what you could do with your own systems to find good strategies for different cases.

Most significant, probably, is the big difference between the two options where flashback is enabled – if you’ve got to use it, how do you do damage limitation? Here are some key figures, namely the file I/O stats and some instance activity stats, I/O stats first:


"Real-time" maintenance
---------------------------------
Tempfile Stats - 09-Mar 11:41:57
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,088      22,454      20.638        .063         296        .000     1,011      22,455        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:41:58
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3      24,802      24,802       1.000        .315      24,802        .315           0        .000          2,386      20,379        .239    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         718      22,805      31.762        .001           5        .000         713        .002            725      22,814        .002    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       8,485       8,485       1.000        .317       8,485        .317           0        .000            785       6,938        .348    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

Mark Unusable and Rebuild
---------------------------------
Tempfile Stats - 09-Mar 11:53:04
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,461      10,508       7.192        .100           1        .017       407      10,508        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:53:05
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3          17          17       1.000       5.830          17       5.830           0        .000             28          49       1.636    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         894      45,602      51.009        .001           2        .002         892        .001            721      22,811        .026    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       2,586       9,356       3.618        .313         264       3.064       2,322        .001          2,443       9,214        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf
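The listings above come from a couple of simple snapshot scripts; if you want something comparable without the scripts, a rough approximation can be pulled straight from the dynamic performance views – a sketch only, remembering that these figures are cumulative since instance startup, so you’d want to diff them around the load:

select
        df.file#, fs.phyrds, fs.phyblkrd, fs.singleblkrds,
        fs.phywrts, fs.phyblkwrt, df.name
from
        v$filestat      fs,
        v$datafile      df
where
        df.file# = fs.file#
order by
        df.file#
;

select
        tf.file#, ts.phyrds, ts.phyblkrd,
        ts.phywrts, ts.phyblkwrt, tf.name
from
        v$tempstat      ts,
        v$tempfile      tf
where
        tf.file# = ts.file#
order by
        tf.file#
;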

There are all sorts of interesting differences in these results due to the different ways in which Oracle handles the index. For the “real-time” maintenance the session accumulates the key values and rowids as it writes the table, sorts them, then does a cache-based bulk update to the index. For the “rebuild” strategy Oracle simply scans the table after it has been loaded, sorts the key values and rowids, then writes the index to disc using direct path writes; you might expect the total work done to be roughly the same in both cases – but it’s not.

I’ve shown 4 files: the temporary tablespace, the undo tablespace, the tablespace holding the table and the tablespace holding the index; and the first obvious difference is the number of blocks written and read and the change in average read size on the temporary tablespace. Both sessions had to spill to disc for the sort, and both did a “one-pass” sort; the difference in the number of blocks written and read appears because the “real-time” session wrote the sorted data set back to the temporary tablespace one more time than it really needed to – it merged the sorted data in a single pass but wrote the data back to the temporary tablespace before reading it again and applying it to the index (for a couple of points on tracing sorts, see this posting). I don’t know why Oracle chose to use a much smaller read slot size in the second case, though.
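As an aside, if you want to trace the sort activity yourself, the long-standing approach is the pair of debug events below – only on a test system, of course, and treat the event numbers and levels as the commonly quoted values rather than something I’ve re-verified for every version:

alter session set events '10032 trace name context forever, level 10';		-- sort statistics at end of sort
alter session set events '10033 trace name context forever, level 10';		-- sort run (spill to disc / merge) details

-- ... run the insert ...

alter session set events '10032 trace name context off';
alter session set events '10033 trace name context off';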

The next most dramatic thing we see is that real-time maintenance introduced 24,800 single block reads with 20,000 blocks written to the undo tablespace (with a few thousand more that would eventually be written by dbwr – I should have included a “flush buffer_cache” in my tests), compared to virtually no activity in the “rebuild” case. The rebuild generates no undo; real-time maintenance (even starting with an empty index) generates undo because (in theory) someone might look at the index and need to see a read-consistent image of it. So it’s not surprising that we see a lot of writes to the undo tablespace – but where did the reads come from? I’ll answer that question later.
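(A sketch of the correction I should have made – forcing dbwr to write the outstanding dirty blocks before capturing the final file stats; something to do only on a private test instance:)

alter system flush buffer_cache;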

It’s probably not a surprise to see the difference in the number of blocks read from the table’s tablespace. When we rebuild the index we have to do a tablescan to acquire the data; but, again, we can ask why we saw 22,800 blocks read from the table’s tablespace when we were doing the insert with real-time maintenance. On a positive note those reads were multiblock reads, but what caused them? Again, I’ll postpone the answer.

Finally we see that the numbers of blocks read (reason again postponed) and written to the index’s tablespace are roughly similar. The writes differ because the rebuild is doing direct path writes, while the real-time maintenance is done in the buffer cache, so there are some outstanding index blocks still to be written. The reads are similar, though one test is exclusively single block reads and the other is doing (small) multiblock reads – which is just a little bit more efficient. The difference in the number of reads is because the rebuild was at the default pctfree 10 while the index maintenance was a massive “insert in order” which would have packed the index leaf blocks at 100%.
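If you wanted the rebuilt index to pack its leaf blocks the way the real-time maintenance did – purely to make the two tests more directly comparable, not as a general recommendation – you could, for example, rebuild it with no free space:

alter index t1_i1 rebuild pctfree 0;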

To start the explanation – here are the most significant activity stats – some for the session, a couple for the instance:


"Real-time" maintenance
-----------------------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                        33,263
redo entries                                                           118,290
redo size                                                          466,628,852
redo size for direct writes                                        187,616,044
undo change vector size                                            134,282,356
flashback log write bytes                                          441,032,704

Rebuild
-------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                           156
redo entries                                                            35,055
redo size                                                          263,801,792
redo size for direct writes                                        263,407,628
undo change vector size                                                122,156
flashback log write bytes                                          278,036,480

The big clue is the “physical reads for flashback new”. When you modify a block, if it hasn’t been dumped into the flashback log recently (as defined by the hidden _flashback_barrier_interval parameter) then the original version of the block has to be written to the flashback log before the change can be applied; moreover, if a block is being “newed” (Oracle-speak for being reformatted for a new use) it will also be written to the flashback log. Given the way that the undo tablespace works it’s not surprising if virtually every block you modify in the undo tablespace has to be written to the flashback log before you use it. The 33,263 blocks read for “flashback new” consist of the 24,800 blocks read from the undo tablespace while we were maintaining the index in real time plus a further 8,460 or so from “somewhere” – which, probably not coincidentally, closely matches the number of blocks read from the index tablespace as we create the index. The odd thing is that we don’t see the 22,800 reads on the table’s tablespace (which don’t occur when flashback is off) reported as “physical reads for flashback new”; this looks like a reporting error to me.

So the volume of undo requires us to generate a lot of flashback log as well as the usual increase in the amount of redo. As a little side note, we get confirmation from these stats that the index was rebuilt using direct path writes – there’s an extra 75MB of redo for direct writes.
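If you want to track the same figures on your own system, the statistics quoted above can be picked out of v$sysstat (or v$mystat / v$sesstat for a single session) – remembering that they are cumulative, so you need before/after snapshots; a minimal sketch:

select
        name, value
from
        v$sysstat
where
        name in (
                'physical reads for flashback new',
                'redo entries',
                'redo size',
                'redo size for direct writes',
                'undo change vector size',
                'flashback log write bytes'
        )
order by
        name
;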

Summary

If you are running with flashback enabled in a system that’s doing high volume data loading, remember that the “physical reads for flashback new” could be a major expense. This is particularly expensive on index maintenance, which can result in a large number of single block reads of the undo tablespace. The undo costs you three times – once for the basic cost of undo (and associated redo), once for the extra reads, and once for writing the flashback log. Although you have to do tablescans to rebuild indexes, the cost of an (efficient, possibly direct path) tablescan may be much less than the penalty of the work relating to flashback.

Footnote: since you can’t (officially) load data into a table with an unusable unique index or constraint, you may want to experiment with using non-unique indexes to support unique/PK constraints and disabling the constraints while loading.
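Here’s a minimal sketch of that idea (hypothetical object names, and you’d want to test the side effects – such as how duplicates are reported on re-enabling – before adopting it):

create index t1_pk_i on t1(object_id);		-- deliberately non-unique

alter table t1 add constraint t1_pk
        primary key(object_id)
        using index t1_pk_i
;

alter table t1 disable constraint t1_pk;	-- the separately created non-unique index should survive the disable

-- ... load the data (possibly marking the index unusable and rebuilding it) ...

alter table t1 enable constraint t1_pk;		-- or "enable novalidate" if you're prepared to trust the data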

December 23, 2014

Just in case

Filed under: Infrastructure,Oracle,Performance,redo — Jonathan Lewis @ 10:37 am BST Dec 23,2014

For those who don’t read Oracle-l and haven’t found Nikolay Savvinov’s blog, here’s a little note pulling together a recent question on Oracle-L and a relevant (and probably unexpected) observation from the blog. The question (paraphrased) was:

The developers/data modelers are creating all the tables with varchar2(4000) as standard by default “Just in case we need it”. What do you think of this idea?

The general answer was that it’s a bad idea (and unnecessary, anyway) and one specific threat that got mentioned was the problem of creating indexes and Oracle error ORA-01450; but coincidentally Nikolay Savvinov had written about a performance-related “bug” in August this year, which turned out, only last week, to be expected behaviour. You can read his articles for the details, but since he used a 4KB block size to demonstrate it I thought I’d repeat the exercise using an 8KB block size.

 



drop table t1 purge;
create table t1 (id number(6), v1 varchar2(40), v2 varchar2(40), v3 varchar2(40));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap


drop table t1 purge;
create table t1 (id number(6), v1 varchar2(4000), v2 varchar2(4000), v3 varchar2(4000));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap

I’ve dropped and created the same table twice, once with varchar2(40) columns and once with varchar2(4000) columns.

I’ve created an index on a (non-character) column – the specific results vary depending on whether the index is unique or non-unique, whether or not you have an index at all, whether or not the table already holds data, the effective clustering of the index columns, etc. – but the key difference between the two sets of results doesn’t go away.

I’ve inserted object_name values (maximum usage 32 bytes) into the varchar2() columns, inserting 10,000 rows.

The snap_redo package is one of my simple pre/post packages that calculates changes in values in some dynamic performance view – in this case it’s looking at v$sysstat (system statistics) for statistics relating to redo generation, which means you need to run this test on an otherwise idle instance. Here are the two sets of results from an instance of 11.2.0.4:


Name                                                                     Value
----                                                                     -----
messages sent                                                               11
messages received                                                           11
calls to kcmgcs                                                            313
calls to kcmgas                                                             37
calls to get snapshot scn: kcmgss                                           74
redo entries                                                               769
redo size                                                            1,317,008
redo wastage                                                             3,888
redo writes                                                                 11
redo blocks written                                                      2,664
redo write time                                                             10
redo blocks checksummed by FG (exclusive)                                2,242
redo ordering marks                                                          1
redo subscn max counts                                                       1
redo synch time                                                              7
redo synch time (usec)                                                  88,875
redo synch time overhead (usec)                                          1,810
redo synch time overhead count (<2 msec)                                    11
redo synch writes                                                           11
redo write info find                                                        11
undo change vector size                                                261,136
rollback changes - undo records applied                                      2
IMU undo allocation size                                                17,184


Name                                                                     Value
----                                                                     -----
messages sent                                                                8
messages received                                                            8
calls to kcmgcs                                                            222
calls to kcmgas                                                             56
calls to get snapshot scn: kcmgss                                           52
redo entries                                                            20,409
redo size                                                            5,606,872
redo buffer allocation retries                                               1
redo wastage                                                             1,248
redo writes                                                                  6
redo blocks written                                                     11,324
redo write time                                                             26
redo blocks checksummed by FG (exclusive)                                  571
redo ordering marks                                                         32
redo subscn max counts                                                       1
redo synch time                                                              6
redo synch time (usec)                                                  60,230
redo synch time overhead (usec)                                            159
redo synch time overhead count (<2 msec)                                     1
redo synch writes                                                            1
redo write info find                                                         1
undo change vector size                                              1,590,520
IMU undo allocation size                                                   144
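(For reference, each of those snapshots is essentially the difference between two queries like the following, taken immediately before and after the insert – a rough substitute for the snap_redo package, assuming an otherwise idle instance:)

select
        name, value
from
        v$sysstat
where
        name like 'redo%'
or      name like 'undo change vector%'
or      name like 'IMU%'
or      name like 'messages %'
or      name like 'calls to %'
;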

Notice, particularly, the great change in the number of redo entries and the total redo size when the character columns are defined as varchar2(4000). Note particularly that the number of redo entries is roughly “2 * number of rows inserted” – for each row that’s one for the row and one for the index entry. You can check the redo log content by dumping the log file, of course (and Nikolay did), or you can take my word for it that Oracle is doing the equivalent of single row processing in the varchar2(4000) case and array processing in the varchar2(40) case.

When Oracle calculates that the (maximum possible) row length is larger than the block size it falls back to single row processing; this can increase your redo generation significantly, and since the rate at which you can pump out redo is the ultimate limit on the rate at which you can load data this could have a significant impact on your data loading times. Declaring character columns as varchar2(4000) “just in case” is a bad idea.
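As a rough worked example of that calculation (my arithmetic, based on declared maximum column sizes rather than anything Oracle reports): with three varchar2(4000) columns the maximum possible row length is about 3 × 4,000 bytes plus a few bytes for the number column and row overhead – roughly 12,000 bytes, which is larger than the 8,192 byte block, so the insert drops back to row-by-row processing; with three varchar2(40) columns the maximum is around 130 bytes, comfortably inside the block, so Oracle can use array processing and generate far fewer (but larger) redo entries.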

 

June 25, 2013

12c

Filed under: 12c,Infrastructure,Oracle,Partitioning,redo — Jonathan Lewis @ 11:43 pm BST Jun 25,2013

The news is out that 12c is now available for download (Code, Docs and Tutorials). There are plenty of nice little bits in it, and several important additions or enhancements to the optimizer, but there’s one feature that might prove to be very popular:

SQL> alter table p1 move partition solo ONLINE;

Table altered.

(more…)

October 5, 2012

SSD

Filed under: Exadata,Infrastructure,Oracle,Performance,redo — Jonathan Lewis @ 1:04 pm BST Oct 5,2012

There’s never enough time to read everything that’s worth reading, so even though Guy Harrison’s blog is one of the ones worth reading I find that it’s often months since I last read it. Visiting it late last night, I found an interesting batch of articles spread over the last year about the performance of SSD – the conclusions may not be what you expect, but make sure you read all the articles or you might end up with a completely misleading impression:

Don’t forget to read the comments as well. For other notes Guy has written about SSD, here’s a URL for his SSD tag.

September 17, 2012

Private Redo

Filed under: Infrastructure,Oracle,redo — Jonathan Lewis @ 8:28 pm BST Sep 17,2012

Following a question on the Oracle Core Addenda pages, here’s a little script to tell you about the sizes of the public and private redo threads currently active in the instance. It’s a minor variation of a script I published in Appendix D (Dumping and Debugging), page 237 to show the addresses of current activity in the various log buffers:
(more…)

August 19, 2011

Redo

Filed under: Infrastructure,Oracle,redo — Jonathan Lewis @ 2:58 am BST Aug 19,2011

In the comments to a recent blog posting about log file syncs, Tony Hasler has produced a stunning observation about Oracle and ACID, in particular the D (durability) bit of transactions.

The observation can best be described with a demonstration (which I have run on versions from 8.1 to 11.2) involving three sessions, one of which has to be connected with sysdba privileges.

(more…)

May 27, 2011

Audit Ouch!

Filed under: audit,Bugs,Infrastructure,Oracle,redo — Jonathan Lewis @ 5:37 pm BST May 27,2011

A few days ago I was rehearsing a presentation about how to investigate how Oracle works, and came across something surprising. Here’s a simple bit of code:
(more…)

January 3, 2011

Redo

Filed under: Infrastructure,redo — Jonathan Lewis @ 7:40 pm BST Jan 3,2011

A couple of days ago I published a link to some comments I had made on OTN about differences in redo generation between 10g and earlier versions of Oracle. This raised a few questions that suggested a follow-up (or perhaps “prequel”) note might be a little helpful. So I’ve created a simple SQL script to help demonstrate the differences and show how some of the pieces hang together.
(more…)

December 23, 2010

Private Redo

Filed under: redo,undo — Jonathan Lewis @ 10:17 am BST Dec 23,2010

About this time last year I wrote a short answer on OTN about Private Redo Threads and In-Memory Undo. Thanks to a follow-up question a year later I’ve been prompted to supply a link to my original answer because it was actually pretty good: OTN Thread “In Memory Undo”.

Update: If you’re looking at this note and haven’t expanded the view to see the comments, make sure that you do look at them since they include a couple of technical details I described in response to follow-up questions.

July 2, 2010

Unrecoverable

Filed under: Infrastructure,redo — Jonathan Lewis @ 6:38 pm BST Jul 2,2010

A recent question on the OTN database forum asked: “What’s the difference between index rebuild unrecoverable and nologging?”

The most important difference, of course, is that unrecoverable is a deprecated option so you shouldn’t be using it even though it still works.
(more…)

June 27, 2010

Coalesce

Filed under: Index Rebuilds,Indexing,Infrastructure,Oracle,Performance,redo — Jonathan Lewis @ 6:36 pm BST Jun 27,2010

The following question came up in an email conversation a little while ago:

Are you aware of any problems a large oltp site might have with running index coalesce during production hours, as opposed to doing index rebuilds in a maintenance window?

(more…)

February 9, 2010

Why Undo ?

Filed under: Infrastructure,redo,undo — Jonathan Lewis @ 5:32 pm BST Feb 9,2010

A recent thread on the OTN database forum asks the question:

“… since redo also has past and current information why can’t redo logs be used to retrieve that information? Why is undo required when redo already has all that information.”

The thread has generated some interesting replies – but to a large extent they describe how the undo and redo work rather than explaining why the designers at Oracle Corp. chose to implement undo and redo the way they did. Since I’m sitting in an airport (Zurich – where the coffee bar in the business lounge has a very large bowl of Lindt chocolate squares available) waiting for a plane I thought I’d use the time to share my thoughts on the why.

(more…)

October 30, 2009

logging

Filed under: Performance,redo — Jonathan Lewis @ 1:02 pm BST Oct 30,2009

I’ve just jotted down a few notes about “log file sync” waits, “log file parallel write” waits, and the nologging option in response to a question on OTN about redo activity when creating a large index. The ensuing conversation also picks up various topics relating to backup, recovery and dataguard.

October 14, 2009

Nutshell – 1

Filed under: Infrastructure,redo,undo — Jonathan Lewis @ 8:22 pm BST Oct 14,2009

Undo, redo, and recovery in a nutshell. (The “1” in the title may turn out to be optimistic, I offer no guarantee about further nutshells.)

In the most general case of updating a block – when you change one row in a table, say, or mark an index entry as deleted – Oracle does the following:
(more…)

November 28, 2008

Log file write

Filed under: Infrastructure,Performance,redo,Troubleshooting — Jonathan Lewis @ 9:04 am BST Nov 28,2008

A few years ago, I wrote a note about a bug in 9i that made it difficult to decide whether or not you had a problem with the log writer (lgwr). The problem was that the time spent in wait event “log file parallel write” was not captured properly – so you were reduced to guessing whether the time spent in “log file sync” might actually be an indication of a log file write problem.

Time changes everything – including the available tools and the levels of skill of the people doing the troubleshooting – and I see that Riyaj Shamsudeen recently published a note about the same problem.

Riyaj approaches the problem from a different direction, using exactly the right tool for the task, to demonstrate the error directly (rather than relying on modelling, inference and a follow-up check with Metalink as I did in 2005).

[Update 30th Nov 2008:] Here’s another interesting article on a closely related topic from Christian Bilien – in this case from a 10g system.
