Oracle Scratchpad

March 9, 2015

Flashback logging

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 2:44 pm BST Mar 9,2015

When database flashback first appeared many years ago I commented (somewhere, but don’t ask me where) that it seemed like a very nice idea for full-scale test databases if you wanted to test the impact of changes to batch code, but I couldn’t really see it being a good idea for live production systems because of the overheads.

Features and circumstances change, of course, and someone recently pointed out that if your production system is multi-terabyte and you’re running with a Data Guard standby, then when some minor catastrophe forces you to switch to the standby you don’t really want to run without a standby for the time it would take to restore and recover an old backup to create a new one; and there may be cases where you could flash back the original primary to a point before the catastrophe and turn it into the standby from that point onward. Sounds like a reasonable argument to me – but you might still need to think very carefully about how to minimise the impact of enabling database flashback, especially if your database is a data warehouse, DSS, or mixed system.

Imagine you have batch processes that revolve around loading data into an empty table with a couple of indexes – it’s a production system so you’re running in archivelog mode, and then you’re told to switch on database flashback. How much impact will that have on your current loading strategies? Here’s a little bit of code to help you on your way – I create an empty table as a clone of the view all_objects, create one index, then insert 1.6M rows into it. I’ve generated four different sets of results: flashback on or off, then either maintaining the index during loading or marking it unusable and rebuilding it after the load. Here’s the minimum code:


create table t1 segment creation immediate tablespace test_8k
as
select * from all_objects
where   rownum < 1
;

create index t1_i1 on t1(object_name, object_id) tablespace test_8k_assm_auto;
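-- For the "mark unusable and rebuild" runs, uncomment the next statement
-- (and the matching rebuild at the end of the script):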
-- alter index t1_i1 unusable;

insert /*+ append */ into t1
with object_data as (
        select --+ materialize
                *
        from
                all_objects
        where
                rownum <= 50000
),
counter as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 32
)
select
        /*+ leading (ctr obj) use_nl(obj) */
        obj.*
from
        counter         ctr,
        object_data     obj
;

-- alter index t1_i1 rebuild;
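The flashback half of the test matrix just needs database flashback toggled between runs. As a reminder (my own sketch, not part of the original test script – it assumes archivelog mode and a recovery area (db_recovery_file_dest) are already configured; recent versions let you switch flashback on with the database open, older versions need it mounted):

select flashback_on from v$database;

alter database flashback on;
-- run the test, then:
alter database flashback off;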

Here’s a quick summary of the timings I got, before I talk about the effects (running 11.2.0.4):

Flashback off:
Maintain index in real time: 138 seconds
Rebuild index at end: 66 seconds

Flashback on:
Maintain index in real time: 214 seconds
Rebuild index at end: 112 seconds

It is very important to note that these timings do not allow you to draw any generic conclusions about optimum strategies for your systems. The only interpretation you can put on them is that different circumstances may lead to very different timings, so it’s worth looking at what you could do with your own systems to find good strategies for different cases.

Most significant, probably, is the big difference between the two options where flashback is enabled – if you’ve got to use it, how do you limit the damage? Here are some key figures – the file I/O stats and some of the instance activity stats – I/O stats first:


"Real-time" maintenance
---------------------------------
Tempfile Stats - 09-Mar 11:41:57
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,088      22,454      20.638        .063         296        .000     1,011      22,455        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:41:58
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3      24,802      24,802       1.000        .315      24,802        .315           0        .000          2,386      20,379        .239    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         718      22,805      31.762        .001           5        .000         713        .002            725      22,814        .002    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       8,485       8,485       1.000        .317       8,485        .317           0        .000            785       6,938        .348    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

Mark Unusable and Rebuild
---------------------------------
Tempfile Stats - 09-Mar 11:53:04
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,461      10,508       7.192        .100           1        .017       407      10,508        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

---------------------------------
Datafile Stats - 09-Mar 11:53:05
---------------------------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3          17          17       1.000       5.830          17       5.830           0        .000             28          49       1.636    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         894      45,602      51.009        .001           2        .002         892        .001            721      22,811        .026    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       2,586       9,356       3.618        .313         264       3.064       2,322        .001          2,443       9,214        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

There are all sorts of interesting differences in these results due to the different ways in which Oracle handles the index. For the “real-time” maintenance the session accumulates the key values and rowids as it writes the table, sorts them, then does a cache-based bulk update to the index. For the “rebuild” strategy Oracle simply scans the table after it has been loaded, sorts the key values and rowids, then writes the index to disc using direct path writes; you might expect the total work done to be roughly the same in both cases – but it’s not.

I’ve shown 4 files: the temporary tablespace, the undo tablespace, the tablespace holding the table and the tablespace holding the index; and the first obvious difference is the number of blocks written and read and the change in average read size on the temporary tablespace. Both sessions had to spill to disc for the sort, and both did a “one-pass” sort; the difference in the number of blocks written and read appears because the “real-time” session wrote the sorted data set back to the temporary tablespace one more time than it really needed to – it merged the sorted data in a single pass but wrote the data back to the temporary tablespace before reading it again and applying it to the index (for a couple of points on tracing sorts, see this posting). I don’t know why Oracle chose to use a much smaller read slot size in the second case, though.
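(If you want to classify your own sorts the same way, a quick check – my addition, not from the original post – is to look at the session’s sort and workarea statistics before and after the load:)

select  sn.name, ms.value
from    v$statname      sn,
        v$mystat        ms
where   ms.statistic# = sn.statistic#
and     sn.name in (
                'sorts (memory)',
                'sorts (disk)',
                'workarea executions - optimal',
                'workarea executions - onepass',
                'workarea executions - multipass'
        )
;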

The next most dramatic thing we see is that real-time maintenance introduced 24,800 single block reads and 20,000 block writes against the undo tablespace (with a few thousand more blocks that would eventually be written by dbwr – I should have included a “flush buffer_cache” in my tests), compared to virtually no activity in the “rebuild” case. The rebuild generates no undo; real-time maintenance (even starting with an empty index) generates undo because (in theory) someone might look at the index and need to see a read-consistent image of it. So it’s not surprising that we see a lot of writes to the undo tablespace – but where did the reads come from? I’ll answer that question later.

It’s probably not a surprise to see the difference in the number of blocks read from the table’s tablespace: when we rebuild the index we have to do a tablescan to acquire the data. But, again, we can ask why we saw 22,800 blocks read from the table’s tablespace when we were doing the insert with real-time maintenance. On a positive note those reads were multiblock reads – but what caused them? Again, I’ll postpone the answer.

Finally we see that the numbers of blocks read (reason again postponed) and written in the index’s tablespace are roughly similar. The writes differ because the rebuild is doing direct path writes, while the real-time maintenance is done in the buffer cache, so there are some index blocks still waiting to be written. The reads are similar, though one test is doing exclusively single block reads and the other is doing (small) multiblock reads – which is just a little bit more efficient. The difference in the number of reads is because the rebuild was at the default pctfree = 10 while the index maintenance was a massive “insert in order” which would have packed the index leaf blocks at 100%.

To start the explanation – here are the most significant activity stats – some for the session, a couple for the instance:


"Real-time" maintenance
-----------------------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                        33,263
redo entries                                                           118,290
redo size                                                          466,628,852
redo size for direct writes                                        187,616,044
undo change vector size                                            134,282,356
flashback log write bytes                                          441,032,704

Rebuild
-------
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                           156
redo entries                                                            35,055
redo size                                                          263,801,792
redo size for direct writes                                        263,407,628
undo change vector size                                                122,156
flashback log write bytes                                          278,036,480

The big clue is the “physical reads for flashback new”. When you modify a block, if it hasn’t been dumped into the flashback log recently (as defined by the hidden _flashback_barrier_interval parameter) then the original version of the block has to be written to the flashback log before the change can be applied; moreover, if a block is being “newed” (Oracle-speak for being reformatted for a new use) it will also be written to the flashback log. Given the way that the undo tablespace works it’s not surprising if virtually every block you modify in the undo tablespace has to be written to the flashback log before you use it. The 33,263 blocks read for “flashback new” consist of the 24,800 blocks read from the undo tablespace while we were maintaining the index in real time, plus a further 8,460 from “somewhere” – which, probably not coincidentally, matches the number of blocks read from the index tablespace as we create the index. The odd thing is that we don’t see the 22,800 reads on the table’s tablespace (which don’t occur when flashback is off) reported as “physical reads for flashback new”; this looks like a reporting error to me.

So the volume of undo requires us to generate a lot of flashback log as well as the usual increase in the amount of redo. As a little side note, we get confirmation from these stats that the index was rebuilt using direct path writes – there’s an extra 75MB of redo for direct writes.
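(For reference, the figures above came from the session and instance activity stats; a sketch of the sort of query involved, using the statistic names reported above:)

select  name, value
from    v$sysstat
where   name in (
                'physical reads for flashback new',
                'flashback log write bytes',
                'redo size',
                'redo size for direct writes',
                'undo change vector size'
        )
order by
        name
;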

Summary

If you are running with flashback enabled in a system that’s doing high volume data loading, remember that the “physical reads for flashback new” could be a major expense. This is particularly true of index maintenance, which can result in a large number of single block reads of the undo tablespace. The undo costs you three times – once for the basic cost of undo (and associated redo), once for the extra reads, and once for writing the flashback log. Although you have to do tablescans to rebuild indexes, the cost of an (efficient, possibly direct path) tablescan may be much less than the penalty of the work relating to flashback.

Footnote: since you can’t (officially) load data into a table with an unusable unique index or constraint, you may want to experiment with using non-unique indexes to support unique/PK constraints and disabling the constraints while loading.
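A minimal sketch of that idea (my own example – the constraint and index names are invented, and you’d want to check the cost of re-enabling the constraint on your own data volumes):

create index t1_pk_i on t1(object_id);          -- deliberately non-unique

alter table t1 add constraint t1_pk
        primary key (object_id)
        using index t1_pk_i
;

-- before the load
alter table t1 disable constraint t1_pk keep index;
alter index t1_pk_i unusable;

-- ... direct path load ...

alter index t1_pk_i rebuild;
alter table t1 enable constraint t1_pk;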

March 14, 2014

12c Temporary

Filed under: 12c,Infrastructure,Oracle,undo — Jonathan Lewis @ 6:41 pm BST Mar 14,2014

Just one of those little snippets to cover something new and remind of something old. A single session can now have three (or more) temporary tablespaces in use at the same time for different reasons.

  • In 12c you can set parameter temp_undo_enabled to true, at which point the undo relating to global temporary tables (GTTs) will be written into the database default temporary tablespace, which means it won’t generate redo. As an interesting side effect this also means that you can do DML on temporary objects in a physical standby database. Currently the undo segment shows as type “UNDEFINED” in v$sort_usage. A detail to watch out for, though – it looks as if each session may get its own temporary undo segment – so be careful about specifying the extent size and tempfile size for the database default temporary tablespace.
  • In 11g you can specify a tablespace (though not a tablespace group) in the ‘create global temporary table’ statement, which means you keep activity about GTTs separated from the I/O resulting from sorts and hash joins etc. spilling to disc; in fact you could specify a different temporary tablespace for every GTT if you wanted to – and I could imagine a case for having a couple of different temporary tablespaces to accommodate GTTs with very different usage characteristics. (Unfortunately you still can’t specify a tablespace in the dbms_lob.create_temporary() function).  If you don’t specify a tablespace for a GTT it will go into the default temporary tablespace of the user who is using it (not the database default, and not the default for the definer of the GTT). If you create indexes on a GTT they will automatically go into the same tablespace as the table.
  • Finally, of course, there’s the default temporary tablespace for the user, and this is where GTTs will go if they don’t have a tablespace specified, and where all the scratch data (sorts, hash tables, factored subqueries et al.) will go.

This combination means, of course, that you could manage to do a single “insert as select” that writes a GTT in one temporary tablespace, sends its undo to a second temporary tablespace, and spills the sort or hash join from the select to a third. The flexibility probably won’t make much difference to performance (for most people), but it may make it easier to see where the work is coming from if you’re ever in the position where your single temporary tablespace is subject to a lot of I/O.
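Here’s a minimal sketch of the moving parts (my own example – the GTT, column, and tablespace names are invented, and temp_gtt is assumed to be an existing temporary tablespace that is neither the database default nor the user’s default):

alter session set temp_undo_enabled = true;

create global temporary table gtt1 (
        id              number,
        padding         varchar2(100)
)
on commit preserve rows
tablespace temp_gtt
;

insert into gtt1
select  rownum, rpad('x',100)
from    dual
connect by
        level <= 10000
;

With that in place the GTT blocks go to temp_gtt, the temporary undo goes to the database default temporary tablespace, and any sort or hash spill from a more complicated select would go to the user’s default temporary tablespace.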

Footnote:

In the past I’ve advised DBAs to set up a small number of tablespaces (or tablespace groups) so that they can allocate different classes of users – typically grouped by business function – to different temporary tablespaces. The ability to allocate GTTs to temporary tablespaces allows a further degree of refinement to this strategy.
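For example (user and tablespace names invented), the assignment itself is just:

alter user batch_user  temporary tablespace temp_batch;
alter user report_user temporary tablespace temp_report_grp;   -- a tablespace group name works here too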

September 2, 2011

used_urec

Filed under: Infrastructure,Oracle,undo — Jonathan Lewis @ 5:15 pm BST Sep 2,2011

A question that comes up from time to time on the OTN forums is how much undo you generate when you insert or update a row. It’s not a question that has an immediate answer – it depends on what you’re actually doing, the amount of change you introduce, and the number of indexes affected. However, there are a few guidelines that will point you in the right direction – and the key is to keep an eye on the used_urec and used_ublk columns in v$transaction.

Consider the example of inserting data: when you insert a row you typically generate one undo record (used_urec) for the row; but when you insert many rows using an array insert, Oracle has an optimisation on undo record creation that allows it to create one undo record covering all the changes made simultaneously to an individual block – so used_urec could be much smaller than the number of rows processed.

However, if you have indexes in place and are doing normal index maintenance on import, then each table row would require each index to be updated, so you would go back to one used_urec per table row plus one used_urec per index maintained per table row.

So, when you look at the “big picture” there’s no obvious correlation between rows inserted and undo generated – until you look at the fine detail of exactly what you’re doing and whether any optimisations apply. (The details of the optimisation strategies available vary with the chosen insert mechanism and with the version of Oracle.)
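If you want to watch the effect for yourself, a simple check (a sketch of mine – run it from the inserting session before you commit) is:

select  tr.used_ublk, tr.used_urec
from    v$transaction   tr,
        v$session       se
where   se.sid      = sys_context('userenv','sid')
and     tr.ses_addr = se.saddr
;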

Just for reference, here’s a link to a little note I wrote some months ago about monitoring undo.

May 8, 2011

Consistent Gets

Filed under: Infrastructure,Oracle,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 10:09 am BST May 8,2011

There’s an interesting question on the OTN database forum at present – why does an update of 300,000 rows take a billion buffer visits? (There are 25 indexes on the table – so you might point a finger at that initially, but only one of the indexes is going to be changed by the update, so that should only account for around an extra 10 gets per row in a clean environment.)
(more…)

April 19, 2011

More CR

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 6:32 pm BST Apr 19,2011

Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in 10.2.0.3):
(more…)

April 18, 2011

Consistent Reads

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 11:08 am BST Apr 18,2011

Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):
(more…)

December 23, 2010

Private Redo

Filed under: redo,undo — Jonathan Lewis @ 10:17 am BST Dec 23,2010

About this time last year I wrote a short answer on OTN about Private Redo Threads and In-Memory Undo. Thanks to a follow-up question a year later I’ve been prompted to supply a link to my original answer because it was actually pretty good: OTN Thread “In Memory Undo”.

Update: If you’re looking at this note and haven’t expanded the view to see the comments, make sure that you do look at them since they include a couple of technical details I described in response to follow-up questions.

September 22, 2010

Session Undo

Filed under: Infrastructure,Oracle,Troubleshooting,undo — Jonathan Lewis @ 8:40 pm BST Sep 22,2010

One of the simple scripts I use from time to time monitors the amount of undo generated by a long running session by taking a session id (SID) as an input and joining v$session to v$transaction:


column start_scn format 999,999,999,999

select
        tr.start_scn, tr.log_io, tr.phy_io, tr.used_ublk, tr.used_urec, recursive
from
        v$session       se,
        v$transaction   tr
where
        se.sid = &1
and     tr.ses_addr = se.saddr
;

       START_SCN     LOG_IO     PHY_IO  USED_UBLK  USED_UREC REC
---------------- ---------- ---------- ---------- ---------- ---
 136,509,466,121   10730956     358074          1          1 NO
 136,515,115,543         12          0          1          3 YES

You’ll note that this is a session that has been doing quite a lot of work (log_io = db block gets + db block changes, and phy_io = physical reads). I was watching it because I wanted to check that it was only generating undo in a series of very short recursive transactions. (It’s running a coalesce on a very large, very smashed, index).

Interestingly the work done by the recursive transactions for the coalesce was accounted for against the parent transaction, even though (fortunately) the undo block and record usage was not. The used_urec and used_ublk columns can best be described as showing you “currently held” space, rather than “cumulative space used”.

Addendum: with reference to Gary’s comment below – here’s a lovely little script (originally written for 9i, but still working) from Steve Adams for estimating how long a rollback will take to complete. Note that it references sys.x_$ktuxe; Steve has a convention of creating “x_” views on the x$ objects when there is no v$ view supplied by Oracle, so x_$ktuxe is just a sys-owned view defined as ‘select * from x$ktuxe’.
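As a rough sketch of the underlying idea (this is not Steve’s code): x$ktuxe has one row per transaction table slot, and ktuxesiz reports the number of undo blocks the entry still holds, so sampling it a few times for the transaction that is rolling back lets you estimate the rate of progress. Run it as sys (or through an x_$ style view):

select  ktuxeusn, ktuxeslt, ktuxesqn, ktuxesta, ktuxesiz
from    x$ktuxe
where   ktuxecfl like '%DEAD%'          -- transactions being rolled back after a kill or crash
;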

February 9, 2010

Why Undo ?

Filed under: Infrastructure,redo,undo — Jonathan Lewis @ 5:32 pm BST Feb 9,2010

A recent thread on the OTN database forum asks the question:

“… since redo also has past and current information why can’t redo logs be used to retrieve that information? Why is undo required when redo already has all that information.”

The thread has generated some interesting replies – but to a large extent they describe how the undo and redo work rather than explaining why the designers at Oracle Corp. chose to implement undo and redo the way they did. Since I’m sitting in an airport (Zurich – where the coffee bar in the business lounge has a very large bowl of Lindt chocolate squares available) waiting for a plane I thought I’d use the time to share my thoughts on the why.

(more…)

October 14, 2009

Nutshell – 1

Filed under: Infrastructure,redo,undo — Jonathan Lewis @ 8:22 pm BST Oct 14,2009

Undo, redo, and recovery in a nutshell. (The “1” in the title may turn out to be optimistic, I offer no guarantee about further nutshells.)

In the most general case of updating a block – when you change one row in an table, say, or mark an index entry as deleted – Oracle does the following:
(more…)

October 7, 2009

Undone

Filed under: Infrastructure,Oracle,Troubleshooting,undo — Jonathan Lewis @ 7:17 pm BST Oct 7,2009

Here’s a little problem that was thrown at me a little while ago during a visit to a client of mine. It only took about five minutes to fix, but it seemed like a nice demonstration of trouble-shooting so I thought I’d write it up for the blog.

We start with the moment when one of the developers told me that they were seeing lots of session failing with the error “unable to extend segment in undo tablespace”.
(more…)

March 22, 2009

Block size – again

Filed under: Block Size,Infrastructure,Performance,trace files,Troubleshooting,undo — Jonathan Lewis @ 7:09 pm BST Mar 22,2009

Here’s a little oddity that I came across at a client site recently.

The client called me in because they were having problems with Oracle error “ORA-01555: snapshot too old” appearing in a particular task after a few thousand seconds (typically 5,000 to 6,000) even though they had set the undo_retention to 14,400 seconds and had a huge undo tablespace running with autoextend enabled on the data files.

(more…)

November 22, 2006

Tuning Updates

Filed under: Indexing,Infrastructure,Oracle,redo,Tuning,undo — Jonathan Lewis @ 8:45 pm BST Nov 22,2006

Consider the following update statement:

update
        /*+ index(t1 t1_n1) */
        t1
set
        n1 = 0,
        n2 = n2 + 1,
        status = 'X'
where
        n1 > 100
and     status = 'A'
;

(more…)
