A few days ago I was rehearsing a presentation about how to investigate how Oracle works, and came across something surprising. Here’s a simple bit of code:
create table t1 as with generator as ( select --+ materialize rownum id from dual connect by rownum <= 10000 ) select rownum id, lpad(rownum,10,'0') small_vc, case mod(rownum,1000) when 0 then 'Y' else 'N' end flag from generator v1, generator v2 where rownum <= 1000000 ; -- gather table stats here execute snap_redo.start_snap execute snap_my_stats.start_snap update /*+ full(t1) */ t1 set flag = 'N' where flag = 'Y' ; execute snap_my_stats.end_snap execute snap_redo.end_snap
The two “snap” packages take a starting snapshot of v$mystat/v$statname and v$sysstat (the latter restricted to redo statistics only) and an ending snapshot, and report the change between snapshots. Here are the results for just the redo statistics from the update when I ran the code on 10.2.0.3:
Name Value ---- ----- redo entries 1,907 redo size 439,260 redo ordering marks 17 redo subscn max counts 18
You may at this point be wondering why the first two figures are roughly twice the size they ought to be; you might, of course, be wondering what you’re supposed to be seeing here (in which case you need to see my presentation “The Beginners’ Guide to becoming an Expert.”
My code updates one row in every thousand and each row is in a separate block. That means I should see roughly one redo entry per row at roughly 250 bytes per redo entry. Since I’m using Oracle 10g (single instance) the total number of entries ought to be slightly less than 1,000 because the first 40 or 50 should be accumulated in the private redo and in-memory undo areas. So we have about 950 redo entries too many; where did they come from ?
The test takes about 60 seconds to run, so I repeated it taking snapshots of a couple more dynamic performance views – which didn’t actually help very much – and then ran it one more time after switching log files so that I could dump the log file and find the relevant redo entries easily after the update. I won’t bore you with an extract from the log file dump, I’ll just point out that it showed me two redo records per row – op code 11.4 followed by op code 11.5, which is: “lock row, update row” – when it should simply have been showing nothing but “update row” records.
It took me a little while to figure out what was going on, but the title of this piece gives it away. A few days earlier I had enabled audit (audit_trail = db_extended) on my laptop to demonstrate a point to a client – the redundant “lock row” records appeared as a side effect (which I hadn’t noticed at the client site because we were focussing on a completely different point – which I might write about next week).
Here’s the really important point – I had NOT enabled any audit options on this table, the only thing I had done was make it possible for Oracle to record audit records. The spare records were NOT redo records relating to the aud$ table; instead, the simple act of enabling audit introduced a completely spurious “lock row” record on every update I did.
So I changed audit_trail to ‘none’, bounced the database and repeated the test, and got back to the result that I had been expecting:
Name Value ---- ----- redo entries 961 redo size 295,996 redo ordering marks 13 redo subscn max counts 14
As you can see, we’re back to 1,000 redo records (minus the odd 40 or 50) at about 250 bytes per entry (allowing for the size of the first “private” redo record that combined lots of changes).
Note: Having observed this problem on 10.2.0.3 with audit_trail = db_extended I repeated the tests with audit_trail=db, and then tested a couple of older and newer versions of Oracle. The phenomenon appeared only on 10.2.0.3, and it didn’t matter whether I chose db or db_extended.
Footnote: the day after I made this discovery I gave what must have been my worst ever presentation: not only had I lost three hours sleep working out the details of this problem, I’d made a couple of “quick changes” to some of my demo scripts to do some extra checks and I had failed to clean up properly afterwards so I kept getting the “wrong” results and confusing myself in my live demonstrations.
I’ve just fallen into the same trap at the NoCOUG conference. Since I’ve highlighted this note I thought I’d add a copy of one pair of redo records to show how I recognised what was going on.
CHANGE #4 TYP:0 CLS: 1 AFN:5 DBA:0x01800010 OBJ:88040 SCN:0x0000.04a7f3bd SEQ: 2 OP:11.4 KTB Redo op: 0x01 ver: 0x01 op: F xid: 0x0006.001.000052b0 uba: 0x00800844.0cad.1f KDO Op code: LKR row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x01800010 hdba: 0x01800009 itli: 2 ispac: 0 maxfr: 4863 tabn: 0 slot: 21 to: 2 CHANGE #5 TYP:0 CLS: 1 AFN:5 DBA:0x01800010 OBJ:88040 SCN:0x0000.04a7f476 SEQ: 1 OP:11.5 KTB Redo op: 0x02 ver: 0x01 op: C uba: 0x00800844.0cad.20 KDO Op code: URP row dependencies Disabled xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x01800010 hdba: 0x01800009 itli: 2 ispac: 0 maxfr: 4863 tabn: 0 slot: 21(0x15) flag: 0x2c lock: 2 ckix: 0 ncol: 3 nnew: 1 size: 0 Vector content: col 2: [ 1] 4e
Note particularly OP:11.4 is an LKR record (OP Code on 5th line), and OP:11.5 is a URP record … Lock Row, Update Row Piece.