Part 12 is about a 2-node RAC system. Someone recently posted links to a matched pair of AWR reports in this thread on the OTN Database General forum and, after I had made a couple of quick comments on them, gave me permission to use them on my blog.
Because OTN threads get dropped occasionally and people’s web addresses can change, I’ve copied the original files up onto my blog area so that the comments I make below can still be read in context. There’s a Word format (2MB) and a pdf format file (600KB) for each report so you can choose your preferred download and search tool.
I’ve set the files to open in new windows to make it easier to compare them and look at my notes at the same time – the report for node 1 is here in Word format and here as a pdf, and the report for node 2 is here in Word format and here as a pdf.
As ever, we start with a quick context check – size of machine, type of job etc. and then take a look at work done (Load Profile) and time spent (Top 5 Timed events).
The snapshots are for two hours (this is the sort of thing that confuses me from time to time – I always expect to see a one-hour duration and sometimes forget to adjust for different intervals as I’m working through a report), but we see that the DB time is only slightly larger than the elapsed time, so the system is not too busy. (DB time is a feature introduced in 10g but only reported in the AWR output, not in Statspack.)
We note that the NUM_CPUS in the Operating System Statistics section is eight in one report – and 16 in the other (an imbalance that I’m not keen on in RAC systems). In one case the BUSY_TIME is around 10% of the IDLE_TIME, and in the other it’s closer to 6% – so again we see that “the system” is not heavily loaded. If there’s a problem it seems likely that it’s a particular task that is taking too much time, and we should be using the extended trace (10046 at level 8 or level 12) to investigate it. But let’s carry on to see if we can pick up any oddments anyway.
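For completeness, here’s a sketch of the two usual ways of switching on the extended trace in 10g – the SID and serial# below are placeholders, of course, to be taken from v$session for the task in question:

```sql
-- In your own session:
ALTER SESSION SET events '10046 trace name context forever, level 8';

-- For another session, using dbms_monitor (10g onwards);
-- waits => true corresponds to level 8, add binds => true for level 12:
BEGIN
    dbms_monitor.session_trace_enable(
        session_id => 123,          -- placeholder SID
        serial_num => 456,          -- placeholder serial#
        waits      => true,
        binds      => false
    );
END;
/
```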
Still in the OS statistics we note that the memory is 16GB for the smaller machine and 32GB for the larger – and comparing that with the cache sizes at the top of the report we note that the buffer cache is 5.5GB and the shared pool is 600MB – so we might wonder if there’s another database, or an application server, also using this machine. (That might also help to explain why the CPU Time recorded by the instance is significantly less than the BUSY_TIME recorded by the operating system.)
Looking at the Load Profiles then – there’s nothing particularly exciting to see. The volume of work is not large for machines of this size. There is an interesting little oddity that one machine has done twice the number of transactions of the other – so perhaps it’s worth considering what’s going on with the load balancing strategy (maybe different types of work have been isolated to different nodes).
Looking at the Top 5, though, there are a couple of items to pursue. Both instances have spent a significant fraction of their time in “gc buffer busy” waits – an indication that processes on different nodes are competing for the same data at the same time, and that’s something we might be able to address. We’ll take a look at “SQL ordered by Cluster Wait Time” in a moment, and at “Segments by Global Cache Buffer Busy” (as well as the other two “Segments by …” sections relating to cluster wait time).
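As a live cross-check between AWR snapshots, a query like the following against v$segment_statistics should identify the same hot segments – bear in mind that the values are cumulative since instance startup, and that the statistic name here is the 10g one (it was split into “acquire” and “release” variants in 11g):

```sql
-- Segments suffering the most gc buffer busy waits (cumulative since startup)
SELECT owner, object_name, object_type, value
FROM   v$segment_statistics
WHERE  statistic_name = 'gc buffer busy'
AND    value > 0
ORDER  BY value DESC;
```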
One instance is showing quite a lot of time on “enq: TX – row lock contention” – transactions competing with each other to modify (or create) the same data at the same time. We’ll have a look at the “SQL ordered by elapsed time” and the “enqueue activity” in a moment.
We also note that both instances show quite a lot of time for “flashback buf free by RVWR”. This system is running with “flashback database” enabled, and these waits are due to the activity of the “recovery writer”. When you are about to change a block for the first time after it’s been read into the cache, the recovery writer has to copy the block into the “recovery log buffer” or “flashback buffer”. If the flashback buffer is too small then a foreground process may have to wait for the recovery writer to clear some space before it can proceed – this may help to explain some of the competition for buffers (gc buffer busy, and ordinary buffer busy waits) that we see elsewhere.
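For reference, the current sizes of the flashback buffer and the log buffer can be checked in v$sgastat – the memory area names below are as I recall them from 10.2, so treat this as a sketch:

```sql
-- Flashback buffer (shared pool) and redo log buffer allocations
SELECT pool, name, bytes
FROM   v$sgastat
WHERE  name IN ('flashback generation buff', 'log_buffer');
```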
I believe the flashback buffer defaults to twice the redo log buffer (although it’s 15MB in this case). The way in which Oracle 10g sets the redo log buffer size is a little complex, depending on granule sizes and the settings for processes and transactions – and things can go a little odd in RAC, which then doesn’t use the “private redo thread” technology – which is why we see the log buffer reported as a slightly surprising 4MB (which turned out to be two shared threads of 2MB each).
The normal advice for setting the log_buffer parameter in 10.2 is “don’t” – but in this case I think we might want to set it to 16MB or 32MB to see if that eliminates the waits for the recovery log buffer and, consequentially, some of the buffer busy waits and gc buffer busy. (Unfortunately it means restarting the instances, of course – but apart from that it’s a quick and easy fix.)
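Since log_buffer is a static parameter the change has to go into the spfile and only takes effect at the next restart – a sketch for a RAC system running from an spfile:

```sql
-- 16MB; log_buffer is not modifiable online, hence scope = spfile
ALTER SYSTEM SET log_buffer = 16777216 SCOPE = SPFILE SID = '*';
-- then restart each instance in turn
```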
Both instances show lots of fairly quick log file syncs – that says something about the nature of the application. One instance shows quite a lot of fairly fast db file sequential reads (and the other instance isn’t far behind when we check the main Wait Events section). It looks like I/O isn’t a problem on this system – but we do have a lot of O/S memory, which we are probably using as a secondary (file system) cache behind the db cache.
So where have we got to: there is a configuration detail that we may be able to fix quickly; we have a lot of little transactions, a lot of competition for the same buffers between nodes, and even (possibly) for the same rows.
Checking the “SQL ordered by Cluster Wait Time” we see lots of:
DELETE FROM JMS_TRANSACTIONS W...
INSERT INTO MESSAGECONTENTS (I...
INSERT INTO JMS_MESSAGES (MESS...
DELETE FROM JMS_MESSAGES WHERE...
UPDATE JMS_MESSAGES SET TXID
INSERT INTO MAILBOXITEM (MAILB...
UPDATE MAILBOXITEM SET LASTMOD
and in the “Segments by global cache buffer busy” we see associated segments like:
MAILBOXITEM TABLE
JMS_TRANSACTIONS TABLE
MAILBOXITEM_MBID_OWN_DATES INDEX
COMPLIANCE_DETAILS_PK INDEX
MAILBOXITEM_MAILBOX_SUBMSG_PK INDEX
JMS_MESSAGES_DESTINATION_IB INDEX
JMS_MESSAGES_TXID_TXOP_I INDEX
Initially I suggested that the owner of this system might want to look at making all the processes that use this message system connect to a single instance. However, the object names and SQL led another reader of the forum (Timur Akhmadeev) to observe that the system was using the JAVA_XA package (which I know nothing about) and to suggest that the system should be using the newer JDBC drivers instead.
Clearly there’s an opportunity for a strategic step to be taken in this particular area before pursuing too much more detail – but however it’s done, once the easy fix to the flashback log is in place, the important thing is to minimise the competition across the interconnect for popular blocks.
Footnote: Any advice about reading AWR reports is almost always relevant when reading Statspack reports (and vice versa).