Oracle Scratchpad

May 27, 2011

Audit Ouch!

Filed under: audit,Bugs,Infrastructure,Oracle,redo — Jonathan Lewis @ 5:37 pm BST May 27,2011

A few days ago I was rehearsing a presentation about how to investigate how Oracle works, and came across something surprising. Here’s a simple bit of code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		rownum <= 10000
)
select
	rownum				id,
	lpad(rownum,10,'0')		small_vc,
	case mod(rownum,1000) 
		when 0 then 'Y' else 'N' 
	end 				flag
from
	generator	v1,
	generator	v2
where	rownum <= 1000000
;

-- gather table stats here

execute snap_redo.start_snap
execute snap_my_stats.start_snap

update 
	/*+ full(t1) */ 
	t1 
set 
	flag = 'N' 
where 
	flag = 'Y'
;

execute snap_my_stats.end_snap
execute snap_redo.end_snap

The two “snap” packages take a starting snapshot of v$mystat/v$statname and v$sysstat (the latter restricted to redo statistics only) and an ending snapshot, and report the change between snapshots. Here are the results for just the redo statistics from the update when I ran the code on 10.2.0.3:

Name                                                                     Value
----                                                                     -----
redo entries                                                             1,907
redo size                                                              439,260
redo ordering marks                                                         17
redo subscn max counts                                                      18

You may at this point be wondering why the first two figures are roughly twice the size they ought to be; you might, of course, be wondering what you’re supposed to be seeing here (in which case you need to see my presentation “The Beginners’ Guide to becoming an Expert.”

My code updates one row in every thousand and each row is in a separate block. That means I should see roughly one redo entry per row at roughly 250 bytes per redo entry. Since I’m using Oracle 10g (single instance) the total number of entries ought to be slightly less than 1,000 because the first 40 or 50 should be accumulated in the private redo and in-memory undo areas. So we have about 950 redo entries too many; where did they come from ?

The test takes about 60 seconds to run, so I repeated it taking snapshots of a couple more dynamic performance views – which didn’t actually help very much – and then ran it one more time after switching log files so that I could dump the log file and find the relevant redo entries easily after the update. I won’t bore you with an extract from the log file dump, I’ll just point out that it showed me two redo records per row – op code 11.4 followed by op code 11.5, which is: “lock row, update row” – when it should simply have been showing nothing but “update row” records.

It took me a little while to figure out what was going on, but the title of this piece gives it away. A few days earlier I had enabled audit (audit_trail = db_extended) on my laptop to demonstrate a point to a client – the redundant “lock row” records appeared as a side effect (which I hadn’t noticed at the client site because we were focussing on a completely different point – which I might write about next week).

Here’s the really important point – I had NOT enabled any audit options on this table, the only thing I had done was make it possible for Oracle to record audit records. The spare records were NOT redo records relating to the aud$ table; instead, the simple act of enabling audit introduced a completely spurious “lock row” record on every update I did.

So I changed audit_trail to ‘none’, bounced the database and repeated the test, and got back to the result that I had been expecting:

Name                                                                     Value
----                                                                     -----
redo entries                                                               961
redo size                                                              295,996
redo ordering marks                                                         13
redo subscn max counts                                                      14

As you can see, we’re back to 1,000 redo records (minus the odd 40 or 50) at about 250 bytes per entry (allowing for the size of the first “private” redo record that combined lots of changes).

Note: Having observed this problem on 10.2.0.3 with audit_trail = db_extended I repeated the tests with audit_trail=db, and then tested a couple of older and newer versions of Oracle. The phenomenon appeared only on 10.2.0.3, and it didn’t matter whether I chose db or db_extended.

Footnote: the day after I made this discovery I gave what must have been my worst ever presentation: not only had I lost three hours sleep working out the details of this problem, I’d made a couple of “quick changes” to some of my demo scripts to do some extra checks and I had failed to clean up properly afterwards so I kept getting the “wrong” results and confusing myself in my live demonstrations.

Update 23/8/2011.
I’ve just fallen into the same trap at the NoCOUG conference. Since I’ve highlighted this note I thought I’d add a copy of one pair of redo records to show how I recognised what was going on.

CHANGE #4 TYP:0 CLS: 1 AFN:5 DBA:0x01800010 OBJ:88040 SCN:0x0000.04a7f3bd SEQ:  2 OP:11.4
KTB Redo 
op: 0x01  ver: 0x01  
op: F  xid:  0x0006.001.000052b0    uba: 0x00800844.0cad.1f
KDO Op code: LKR row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01800010  hdba: 0x01800009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 21 to: 2

CHANGE #5 TYP:0 CLS: 1 AFN:5 DBA:0x01800010 OBJ:88040 SCN:0x0000.04a7f476 SEQ:  1 OP:11.5
KTB Redo 
op: 0x02  ver: 0x01  
op: C  uba: 0x00800844.0cad.20
KDO Op code: URP row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x01800010  hdba: 0x01800009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 21(0x15) flag: 0x2c lock: 2 ckix: 0
ncol: 3 nnew: 1 size: 0
Vector content: 
col  2: [ 1]  4e

Note particularly OP:11.4 is an LKR record (OP Code on 5th line), and OP:11.5 is a URP record … Lock Row, Update Row Piece.

6 Comments »

  1. Hi Jonathan,
    >> Footnote [...] I gave what must have been my worst ever presentation
    No, don’t worry, it was a good presentation.
    You proved that it is not so easy to match expected results with experimental observations, because there are so many parameters that can change the behaviour of such a complex system as Oracle is. But you have shown as well that an expert will not give up and look at the system as a black box. That he still can try to understand what happens, even if it takes a few hours…
    Regards,
    Franck.

    Comment by Franck Pachot — May 29, 2011 @ 9:08 am BST May 29,2011 | Reply

    • Franck,

      Thanks for the comment.
      I hope the rest of the audience viewed the presentation the same way. I have to say, when things “go wrong” I prefer it when I’ve planned for them to go wrong so that I don’t have to backtrack and contradict myself. Still, with hindsight I can say the audience did see the most important bit of the “science of Oracle”, the bit where someone says: “wait a minute – that can’t be right.”

      Comment by Jonathan Lewis — June 2, 2011 @ 8:41 pm BST Jun 2,2011 | Reply

  2. Bug 5166745 Excessive redo generation with auditing enabled
    Versions confirmed as being affected
    * 10.2.0.1
    * 10.2.0.2
    * 10.2.0.3
    This issue is fixed in
    * 10.2.0.4 (Server Patch Set)
    * 11.1.0.6 (Base Release)
    This problem is introduced in 10g R2 (10.2.0.1).
    Excessive redo generation occurs in 10.2 with standard auditing enabled
    (around 15%-20% increase compared to 10.1/9.2)

    Comment by Anonymous — June 3, 2011 @ 11:12 am BST Jun 3,2011 | Reply

    • Thanks for the information. It’s nice to know that it’s so localised.

      I noticed the document had a summary of the redo OPs – do you know if there’s a quick way to generate this summary. At present I just dump the log file and then grep my way through it – it would be convenient if there were a dump option that dumped just the one necessary line. (I really don’t want to have to write a C program to unravel the file, given the infrequency of the occasions when I need to do that type of analysis).

      Comment by Jonathan Lewis — June 4, 2011 @ 10:29 am BST Jun 4,2011 | Reply

  3. [...] by session was a light-weight operation, think again. Interestingly, the auditing anomaly that I reported recently re-appears in the “audit by session” option. You will recall that when auditing is [...]

    Pingback by Audit Excess « Oracle Scratchpad — June 7, 2011 @ 6:20 pm BST Jun 7,2011 | Reply

  4. [...] anomaly may look familiar – but this time it’s nothing to do with the audit_trail, it just seems to be the way [...]

    Pingback by Correlation oddity « Oracle Scratchpad — December 19, 2011 @ 6:26 pm BST Dec 19,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,267 other followers