Oracle Scratchpad

February 18, 2013

Transactions 2

Filed under: Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 7:54 am UTC Feb 18,2013

Here’s a little follow-on from Friday’s posting. I’ll start it off as a quiz, and follow up tomorrow with an explanation of the results (though someone will probably have given the correct solution by then anyway).

I have a simple heap table t1(id number(6,0), n1 number, v1 varchar2(10), padding varchar2(100)). The primary key is the id column, and the table holds 3,000 rows where id takes the values from 1 to 3,000. There are no other indexes. (I’d show you the code, but I don’t want to make it too easy to run the code, I want you to try to work it out in your heads).

I run the following pl/sql block.

begin
	--
	--	Going to fail on primary key constraint
	--
	for i in 1..200 loop
		begin
			insert into t1 values(50,i,'x','x');
			commit;
		exception
			when others then null;
		end;
	end loop;

	--
	--	Explicit rollback
	--
	for i in 1..40 loop
		begin
			insert into t1 values(4000 + i,i,'x','x');
			rollback;
		end;
	end loop;

	--
	--	Explicit commit
	--
	for i in 1..50 loop
		begin
			insert into t1 values(4500 + i,i,'x','x');
			commit;
		end;
	end loop;

	--
	--	Rolling back to savepoint
	--
	for i in 1..70 loop
		begin
			savepoint A;
			insert into t1 values(4600 + i,i,'x','x');
			rollback to savepoint A;
		end;
	end loop;

end;
/

If (from another session) you take an AWR (or Statspack) snapshot before and after running the block, what will you see for the statistics

user commits
user rollbacks
transaction rollbacks
rollback changed – undo records applied

and, for a bonus point, what will the value of “Rollback per transaction %” be ?

Note – since the snapshots are global, your results will be affected by other work on the system; in particular I noticed that simply taking two AWR snapshots in a row on the instance I used resulted in 8 user commits.

 

Update

And the answers – cut from an AWR report – are:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
rollback changes - undo records                 420           25.2           4.3
transaction rollbacks                           240            4.7           2.5
user commits                                     58            1.1           0.6
user rollbacks                                   40            0.8           0.4

 Rollback per transaction %:   40.82       Rows per Sort:   122.70

The 40 “user rollbacks” come from the loop with the explicit rollback. At the same time the 40 rollbacks introduced 80 “rollback changes – undo records applied” – one for the table index and one for the table for each call to rollback. Our “user rollbacks” have been “real” rollbacks, so they’ve introduced 40 transaction rollbacks at the same time.

The 50 “user commits” come from the loop with the explicit commit – except that the picture is slightly blurred by the fact that simply running the AWR snapshot introduced a few extra commits.

The 40.82% is 40 / (40 + 58) expressed as a percentage.

The “rollback to savepoint” calls haven’t contributed to the user rollbacks – even though, in this case, they have rolled back 40 transaction starts, and a check of various other statistics would show a TX enqueue being requested and released, and transactions rotating through the undo segments. Despite this clearing of the TX locks, these rollbacks to savepoint don’t count towards transaction rollbacks; however, the 70 “rollback to savepoint” calls have introduced a further 140 “rollback changes – undo records applied” (again, one for the table and one for the index for each call).

Finally the failed attempts to insert duplicate keys have, in these circumstances, introduced 200 transaction rollbacks (though not user rollbacks – we didn’t ask for them explicitly). They are also responsible for the final 200 “rollback changes – undo records applied” that we need to account for. The changes, very specifically, are the changes to the table – Oracle does actually have to insert the row into the table before trying to insert the index entry (after all, the index entry needs to know the rowid), and so it has to reverse that insert when it discovers that the relevant entry already exists in the primary key index. The requirement to insert and then rollback is one of the reasons why it is often better for “batch merge by pl/sql loop” to try an update then insert is sql%rowcount is zero rather than trying to insert, capture the exception, and then update; it’s also a good reason for creating the unique indexes on a table before creating the non-unique indexes – Oracle maintains indexes in the order they were created so you want it to find the duplicate key error as soon as possible to minimise the amount of change and rollback.

If you want to experiment further, there are a couple of slightly different variations of the code that exhibit interesting variations in results. There are also other events that can cause transaction rollbacks to appear without user rollbacks being invoked.

February 15, 2013

Transactions

Filed under: Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 8:07 am UTC Feb 15,2013

It’s very easy to get a lot of information from an AWR (or Statspack) report – provided you remember what all the numbers represent. From time to time I find that someone asks me a question about some statistic and my mind goes completely blank about the exact interpretation; but fortunately it’s always possible to cross check because so many of the statistics are cross-linked. Here’s an example of a brief mental block I ran into a few days ago – I thought I knew the answer, but realised that I wasn’t 100% sure that my memory was correct:

In this Load Profile (for an AWR report of 60.25 minutes), what does that Transactions figure actually represent ?

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:             57,567.09             12,028.39
              Logical reads:             48,043.83             10,038.54
              Block changes:                314.07                 65.62
             Physical reads:                667.70                139.51
            Physical writes:                 46.25                  9.66
                 User calls:                619.33                129.41
                     Parses:                505.67                105.66
                Hard parses:                 36.94                  7.72
                      Sorts:                313.05                 65.41
                     Logons:                  0.56                  0.12
                   Executes:              1,165.42                243.51
               Transactions:                  4.79

  % Blocks changed per Read:    0.65    Recursive Call %:    95.18
 Rollback per transaction %:   24.95       Rows per Sort:    25.09

And, while we’re at it, what does the “Rollback per transaction %” actually mean – and what, if anything, can we infer from the value ?

Since we’re looking at transactions, it’s a fairly safe bet that the figure is something to do with commits – but is it counting only committed transactions, or does it included rolled back transactions [philosophical question - is a change that's never committed really a transaction, since it "never happened" as far as the rest of the world is concerned?]. Fortunately we can look at the Intance Activity Statistics to check.

First, though, lets convert the “Per Second” figures into an absolute value by multiplying by the duration in seconds (3,615) of the report. Allowing for rounding errors we’re looking for a value between ceiling(4.785 *3615) and floor(4.795 * 3615), i.e. between 17,298 and 17,338.  Let’s also keep in mind that the “Rollback per transaction %” is very close to 25%.

Here, then, are a couple of useful figures from the Instance Activity:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
user commits                                 12,986            3.6           0.8
user rollbacks                                4,316            1.2           0.3

Almost immediately you can see that user rollbacks are roughly one third of user commits (I promise I didn’t massage these number), which means the rollback percentage is just (user rollbacks / (user commits + user rollbacks)). As a further sanity check, 4,316 + 12,986 = 17,302 which falls nicely into our required range. Ta-da, job done: “Transactions” is the sum of user commits and user rollbacks (as a first approximation – but is it the whole answer).

Of course, I picked a fairly extreme example from my AWR library to make a point – which means you might now be asking whether all those rollbacks pose some sort of threat. How serious are they ? Luckily there are a couple more statistics that tell us:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
rollback changes - undo records                 222            0.1           0.0
transaction rollbacks                            59            0.0           0.0

The name of the first statistic has been trimmed a little in this text report, the full name is: “rolllback changes – undo records applied”. As you can see, we’ve only applied 222 undo records in our 4,316 rollback calls, so we’re not really rolling anything back (most of the time). Moreoever, the “transaction rollbacks” corroborates this observation – we have only attempted to rollback 59 “real” (data-changing) transactions. Most of those rollbacks are probably the default “rollback after every call” that some web application servers make.

As a closing thought – if you’re responsible for several different systems, it’s convenient to keep a couple of ”reference” AWR or Statspack reports from busy, normal, and quiet periods for each system; that way, if you use the figures from one system to work out the meaning of some derived value you can use the figures from another system to check if your hypothesis is correct.

Footnote: there are two other statistics that include the text “undo records applied” (not that you’ll see those words in the textual AWR report), but they both relate to creating read-consistent copies of blocks.

Footnote 2: Is anyone getting worried by my “first approximation” comment – and has anyone started wondering if there may be more to transaction rollbacks that user rollbacks, and if so where they fit into the arithmetic ?  To be continued.

 

January 7, 2013

Analysing Statspack 13

Filed under: Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 6:44 pm UTC Jan 7,2013

A recent (Jan 2013) post on the OTN database forum reported a performance problem on Oracle 9.2.0.6 (so no AWR), and posted a complete statspack report to one of the public file-sharing sites. It’s been some time since I did a quick run through the highlights of trouble-shooting with statspack, so I’ve picked out a few points from this one to comment on.

As usual, although this specific report is Statspack, the same analysis would have gone into looking at a more modern AWR report, although I will make a couple of comments at the end about the extra material that would have been available by default with the AWR report that would have helped us help the OP.

First, here are the Load Profile, and “Top 5″ parts of the report. To make sense of them, you also need to know that report covered a two-hour interval, and the machine had 12 CPUs; there’s also a little clue in the fact that this is 32-bit Oracle with a 600MB SGA (with the sga_max_size set to 1.6GB) running (alone) on a machine with 12GB of memory.

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:             54,437.02             65,579.53
              Logical reads:          2,053,336.52          2,473,626.15
              Block changes:                383.39                461.86
             Physical reads:             43,270.67             52,127.57
            Physical writes:                 46.63                 56.18
                 User calls:                 35.83                 43.16
                     Parses:                 25.53                 30.76
                Hard parses:                  0.18                  0.21
                      Sorts:                186.25                224.38
                     Logons:                  0.08                  0.10
                   Executes:                108.76                131.02
               Transactions:                  0.83

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                       74,717    84.80
latch free                                        181,788       8,101     9.19
db file scattered read                         47,546,136       2,893     3.28
db file sequential read                        23,092,208       2,006     2.28
buffer busy waits                               1,812,920         193      .22
          -------------------------------------------------------------

Working from the top down, the first thing we notice is that 2M logical I/Os per second is quite busy, and that 43, 000 blocks read per second is pretty amazing – especially since the number of user calls and executes isn’t particularly large.

A cross check to the Top 5 then shows that a huge fraction of the work done by the database is CPU. At 2 hours and 12 cores we have 12 * 2 * 3600 = 86,400 CPU seconds available, and we’ve used 74,700 of them – and that’s into Cary Millsap’s “knee of the curve” for queueing problems. Since this isn’t the AWR, of course, we don’t have the OS Stats available, so we can’t see if the rest of the machines CPU is being used up outside the database – but looking at the db file read figures I’ll bet it is. If you have extreme CPU usage in an Oracle system, then latch contention problems are very likely to show up – and that’s our next highest figure – AWR would probably tells us which latch but, again based on the db file read figures, I’ll guess it’s the cache buffers chains latches.

So we look at the db file read figures and do the arithmetic that the AWR would show us to work out that we’re seeing multiblock read times of 0.06 milliseconds, and single block read times of 0.08 milliseconds. These have to be coming from the file system cache to be this fast. (Even then I think we may also seeing a timing problem (perhaps a measurement bug) that’s managing to lose wait time.) First thoughts, then – are that there is some code that is so inefficient that it’s doing far more physical I/O than it should; on the plus side the excess memory (12GB) means we’re being protected from the worst effects of disk I/O, but the caching effect means we’re able to burn up CPU at an astonishing rate to do the work.

So let’s look at the SQL to see where the CPU and disk activity is likely to be going. Here’s extract from the SQL ordered by Gets, and SQL Ordered by Reads. (We don’t have SQL ordered by Time in Statspack, of course):

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
  5,636,117,159           28  201,289,898.5   38.1 ######## #########  835260576
Module: RAXTRX
 INSERT INTO RA_INTERFACE_ERRORS  (INTERFACE_LINE_ID,   MESSAGE_
TEXT,   INVALID_VALUE) SELECT  INTERFACE_LINE_ID, :b_err_msg6, '
trx_number='||T.TRX_NUMBER||','||'customer_trx_id='||TL.CUSTOMER
_TRX_ID FROM RA_INTERFACE_LINES_GT IL, RA_CUSTOMER_TRX_LINES TL,
 RA_CUSTOMER_TRX T WHERE  IL.REQUEST_ID = :b1 AND    IL.INTERFAC

    511,626,129           20   25,581,306.5    3.5  8155.90  15045.56  205460129
Module: JDBC Thin Client
select  TRANSACTION_BRANCH , BRANCH_ADDRESS_1 , BRANCH_ADDRESS_2
 , BRANCH_PHONES , BRANCH_FAX , BRANCH_TIN , TRX_NUMBER , TRX_NU
MBER_DISP , TRX_DATE , TRX_DATE_DISP , BILL_TO_CUSTOMER_ID , BIL
L_TO_CUSTOMER_NAME , SHIP_DATE_ACTUAL_DISP , SHIP_TO_NAME , SHIP
_TO_ADDRESS1 , SHIP_TO_ADDRESS2 , SHIP_TO_ADDRESS3 , SHIP_TO_ADD

                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    281,953,754           20   14,097,687.7   90.5  8155.90  15045.56  205460129
Module: JDBC Thin Client
select  TRANSACTION_BRANCH , BRANCH_ADDRESS_1 , BRANCH_ADDRESS_2
 , BRANCH_PHONES , BRANCH_FAX , BRANCH_TIN , TRX_NUMBER , TRX_NU
MBER_DISP , TRX_DATE , TRX_DATE_DISP , BILL_TO_CUSTOMER_ID , BIL
L_TO_CUSTOMER_NAME , SHIP_DATE_ACTUAL_DISP , SHIP_TO_NAME , SHIP
_TO_ADDRESS1 , SHIP_TO_ADDRESS2 , SHIP_TO_ADDRESS3 , SHIP_TO_ADD

     94,281,089        8,550       11,027.0   30.3  2924.20   4550.22 3761220362
Module: JDBC Thin Client
SELECT DELV_ADDR.ADDRESS1 FROM APPS.OE_ORDER_HEADERS_ALL OE_HEAD
ERS , APPS.RA_SITE_USES_ALL DELV_SITE , APPS.RA_ADDRESSES_ALL DE
LV_ADDR WHERE TO_CHAR(ORDER_NUMBER) = :B1 AND DELV_SITE.SITE_USE
_ID(+) = OE_HEADERS.DELIVER_TO_ORG_ID AND DELV_ADDR.ADDRESS_ID(+
) = DELV_SITE.ADDRESS_ID

     94,011,901        8,544       11,003.3   30.2  2919.47   4527.54  803969757
Module: JDBC Thin Client
SELECT COALESCE(DELV_ADDR.CITY, DELV_ADDR.PROVINCE) FROM APPS.OE
_ORDER_HEADERS_ALL OE_HEADERS , APPS.RA_SITE_USES_ALL DELV_SITE
, APPS.RA_ADDRESSES_ALL DELV_ADDR WHERE TO_CHAR(ORDER_NUMBER) =
:B1 AND DELV_SITE.SITE_USE_ID(+) = OE_HEADERS.DELIVER_TO_ORG_ID
AND DELV_ADDR.ADDRESS_ID(+) = DELV_SITE.ADDRESS_ID

     93,981,254        8,551       10,990.7   30.2  2919.13   4533.69 4273350236
Module: JDBC Thin Client
SELECT DELV_ADDR.ADDRESS2 FROM APPS.OE_ORDER_HEADERS_ALL OE_HEAD
ERS , APPS.RA_SITE_USES_ALL DELV_SITE , APPS.RA_ADDRESSES_ALL DE
LV_ADDR WHERE TO_CHAR(ORDER_NUMBER) = :B1 AND DELV_SITE.SITE_USE
_ID(+) = OE_HEADERS.DELIVER_TO_ORG_ID AND DELV_ADDR.ADDRESS_ID(+
) = DELV_SITE.ADDRESS_ID

The first statement reported by Gets is responsible for 500 millions buffer gets in the interval (and even then that’s 40% of the total – so there may be a couple more big ones that have fallen out of the library cache in the interim, perhap an hourly report on the previous hour would catch them). Look at the table that the SQL is inserting into – RA_INTERFACT_ERRORS. We might hope that an error table wasn’t going to receive many errors – so perhaps this is just an example of a statement with an insanely bad execution plan; clearly one to look at. Trouble is, though, that unless you set statspack to run at level 6 you don’t capture the execution plans – unlike the AWR which capture them by default. The batch took a couple of days to complete, though, so it would have been possible either to change the snapshot level, or to pull the execution plans from memory.

Looking at the SQL ordered by Reads, we see another example of the top statement being much more expensive than anything following it – but there’s an odd little pattern that caught my eye as I scanned the list. The next three statements look very similar to each other, and have similar statistics, and if you add up their stats the answer is very close to the figures for the first statement. Moreover, the report says that the top statement is responsible for 90% of the work – I think the top statement is actually calling the next three, so we need to look at those statements to see how to make them efficient in order to fix the first statement.

You might note, by the way, the capitals and the bind variables (:B1) in the next three statements – I suspect that there is a PL/SQL function in the first statement that is being called too many times. (20 executions of the first statement, 8,500 of the other three). Perhaps a change in the execution plan of the main query has produced a change in table order which has resulted in some predicate – one with a pl/sql function in it – being tested much more frequently than usual.

A quick cross-reference to the instance activity is in order:

physical reads                           311,462,252       43,270.7     52,127.6
prefetched blocks                        240,396,452       33,397.7     40,233.7

table fetch by rowid                  12,348,226,523    1,715,508.0  2,066,648.8
table fetch continued row             12,047,845,308    1,673,776.8  2,016,375.8

table scan blocks gotten                 361,692,971       50,249.1     60,534.4
table scan rows gotten                 7,787,418,493    1,081,886.4  1,303,333.6
table scans (long tables)                     32,041            4.5          5.4

A huge fraction of the physical reads seem to be for prefetched blocks – that a good indication that they are for tablescans (or index fast full scans), and when we compare rowid access with tablescan access we see that there are a lot of “long” tablescans (maybe the expensive three queries are all doing tablescans of a long table, that would account for 25,500 of those reported), with with 360M blocks scanned. Apart from anything else, all those block scans will be using a big chunk of CPU. The AWR would show us which segments were being scanned, of course, as would statspack if we were running it at level 7.

I’ve included the table fetch by rowid and table fetch continued row. Normally we should be concerned by “continued fetches” – but in this case I’m prepared to overlook them temporarily: it’s possible that after the critical SQL is corrected we may still see that most of the fetches by rowid turn into continued fetches, and at that point it will be worth worrying; but at present, even though they’re probably responsible for a serious fraction of the CPU usage, those continued fetches may be for rows that we’re not going to be fetching once the SQL is doing the right thing.

The last little detail, just for confirmation, comes from the latch activity part of the report; I’ve just selected the busiest 4 latches:


                                           Pct    Avg   Wait                 Pct
                              Get          Get   Slps   Time       NoWait NoWait
Latch                       Requests      Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
cache buffers chains     ##############    0.0    0.0      0  577,860,644    0.1
cache buffers lru chain         157,333    0.3    0.0      0  687,799,033    0.8
multiblock read objects     114,602,621    0.6    0.0     12            0
simulator hash latch        811,998,194    0.0    0.0      2            0

                                      Get                            Spin &
Latch Name                       Requests      Misses      Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
cache buffers chains       ##############  12,262,452     180,671 0/0/0/0/0
multiblock read objects       114,602,621     719,477         115 719369/101/7/0/0
simulator hash latch          811,998,194       3,889          19 3872/15/2/0/

As you can see, and as we had guessed, they’re all about the buffer cache – doing multiblock reads and loading/dumping blocks from the cache; and almost all of the sleeps are on the cache buffers chains.

Next Steps

As a quick and dirty which might help a bit – put more of the sga_max_size into the buffer cache – it doesn’t happen automatically in 9.2 – and that may reduce the interaction between the O/S and Oracle and save a bit of latching and CPU.

Address the three statements doing all the physical I/O, and check the path for the statement that cause them to be executed. Fix the statement that has done 40% of the buffer gets. In all cases the problem may simply be an execution plan change – but perhaps the machine has been hiding an inherent performance problem and on this occasion a small change in data (a few more errors, perhaps) resulted in the excess work generating a massive increase in workload and contention.

We do know, however, that there is a lot of work done on buffer gets that didn’t show up in this report – so run off the report for the previous hour and see if we can find a couple more big hitters. Once we’ve sorted out the worst SQL, we might then want to see if the volume of “continued rows” is high – and find out why they’re appearing. It’s possible that it’s due to a table with more than 254 columns, of course, or a long column, but we’ll worry about that only when we need to.

One final thought – it’s easy to fix the wrong thing when you’re looking at statspack reports. Although we can see some SQL that is “obviously” overloading the machine, maybe the business critical task that has to complete promptly is simply hidden because it does (relatively) little work. Unless we can fix the big problems extremely quickly, we ought to take a little time to check with the business and the developers that what we’re looking at really is the code that’s running the critical business task.

Longer term, of course, upgrade to 9.2.0.8 at least and then start running Statspack at level 7 for better information. Then upgrade some more.

November 13, 2011

Irrational Ratios

Filed under: Oracle,Ratios,Statspack — Jonathan Lewis @ 11:23 am UTC Nov 13,2011

I’ve pointed out in the past how bad the Instance Efficiency ratios are in highlighting a performance problem. Here’s a recent example from OTN repeating the point. The question, paraphrased, was:

After going through AWR reports (Instance Efficiency Percentages) I observed they have low Execute to Parse % but high Soft Parse %.
Please share if you had faced such issue and any suggestions to solve this

(more…)

September 26, 2011

Upgrade Argh

Filed under: Infrastructure,Oracle,Statspack,Troubleshooting,Upgrades — Jonathan Lewis @ 4:30 pm UTC Sep 26,2011

Time for another of those little surprises that catch you out after the upgrade.
Take a look at this “Top N” from a standard AWR report, from an instance running 11.2.0.2

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
direct path read                  3,464,056       6,593      2   33.5 User I/O
DB CPU                                            3,503          17.8
db flash cache single block ph    2,293,604       3,008      1   15.3 User I/O
db file sequential read             200,779       2,294     11   11.6 User I/O
enq: TC - contention                     82       1,571  19158    8.0 Other

(more…)

March 9, 2011

Statspack Reports

Filed under: Statspack,Troubleshooting — Jonathan Lewis @ 6:29 pm UTC Mar 9,2011

A couple of weeks ago I listed a number of scripts from 11.2.0.2 relating to AWR reports – it seems only sensible to publish a corresponding list for Statspack. In fact, there are two such lists – one for “traditional” Statspack, and one for “standby statspack” – a version of statspack you can pre-install so that you can run statspack reports against a standby database.

Statspack – the basic source of information is $ORACLE_HOME/rdbms/admin/spdoc.txt

spreport.sql    ->  Generates a Statspack Instance report
sprepins.sql    ->  Generates a Statspack Instance report for the database and instance specified
sprepsql.sql    ->  Generates a Statspack SQL report for the SQL Hash Value specified
sprsqins.sql    ->  Generates a Statspack SQL report for the SQL Hash Value specified, for the database and instance specified

sppurge.sql     ->  Purges a limited range of Snapshot Id's for a given database instance
sptrunc.sql     ->  Truncates all Performance data in Statspack tables
spuexp.par      ->  An export parameter file supplied for exporting the whole PERFSTAT user

Standby Statspack – the basic source of information is $ORACLE_HOME/rdbms/admin/sbdoc.txt

sbreport.sql      - Create a report

sbaddins.sql      - Add a standby database instance to the configuration
sblisins.sql      - List instances in the standby configuration
sbdelins.sql      - Delete an instance from the standby configuration

sbpurge.sql       - Purge a set of snapshots

A warning note about standby statspack – there is a significant structural change from 11.2.0.1 to 11.2.0.2 allowing support of multiple standby databases. I haven’t looked closely at it yet, but it did cross my mind that it might be possible to install the 11.2.0.2 version in an 11.2.0.1 database to avoid late upgrade issues – I may get around to testing the idea one day.

February 25, 2011

AWR Snapshots

Filed under: Statspack,Troubleshooting — Jonathan Lewis @ 6:32 pm UTC Feb 25,2011

A couple of days ago I mentioned as a passing comment that you could take AWR snapshots at the start and end of the overnight batch processing so that if you ever had to answer the question: “Why did last night’s batch overrun by two hours?” you had the option of creating and comparing the AWR report from the latest batch run with the report from a previous batch run (perhaps the corresponding night the previous week) and check for any significant differences. Moreover, Oracle supplies you with the code to compare and report such differences from 10.2 (at least) using the script $ORACLE_HOME/rdbms/admin/awrddrpt.sql

The thing I didn’t mention at the time was how to take a snapshot on demand. It’s very easy if you have the appropriate execute privilege on package dbms_workload_repository.

execute dbms_workload_repository.create_snapshot('TYPICAL');

The single input parameter can be ‘TYPICAL’ (the default) or ‘ALL’.

I keep this one liner in a little script called awr_launch.sql – because I can never remember the exact name of the package without looking it up. (And sometimes I run it with sql_trace enabled so that I can see how much work goes into a snapshot as we change versions, features, configuration, workload and platforms.)

February 23, 2011

AWR Reports

Filed under: AWR,Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 6:51 pm UTC Feb 23,2011

A little thought for those who use OEM to generate AWR reports – if you generate the standard HTML form and then try posting it into (say) the OTN database forum when you want some help interpreting it, it won’t look very readable (even if you do remember to put the right tags before and after to reproduce it in fixed font).
(more…)

January 14, 2011

Statspack on RAC

Filed under: Statspack — Jonathan Lewis @ 6:46 pm UTC Jan 14,2011

Some time ago I was on a client site which was busy setting up a RAC cluster using Oracle 10g. Although I wasn’t involved with that particular system there were a couple of coffee-breaks where I ended up chatting with the consultant that the client had hired to install the cluster. In one of our breaks he surprised me by making the casual comment: “Of course, you can’t run Statspack on RAC because it locks up the system.”
(more…)

April 14, 2010

Analysing Statspack 12

Filed under: AWR,Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 6:54 pm UTC Apr 14,2010

[Further Reading on Statspack]

Part 12 is about a 2-node RAC system.  Someone recently posted links to a matched pair of AWR reports in this thread on the OTN Database General forum and, after I had made a couple of quick comments on them, gave me permission to use them on my blog.
(more…)

March 21, 2010

Nutshell – 2

Filed under: Infrastructure,Statspack — Jonathan Lewis @ 8:16 pm UTC Mar 21,2010

There is a lot of confusion around about the significance of the statistic “parse calls”. The important thing to remember is that it is simply counting a certain type of call from the OCI library – the amount of work done by a parse call may vary enormously depending upon circumstances, and sometimes the amount of work is so tiny that it’s not worth worrying about.

“parse call” may:

a) Have to optimise the statement because it failed to find it after searching the library cache

b) Find the statement after searching the library cache, and still have to optimise it for various reasons, e.g. the previous plan has been flushed from memory or the same text applies to different objects depending on who is executing it.

c) Find the statement after searching the library cache and not have to optimise it because the plan is still available and the user has the appropriate privileges.

d) Operate through the session cursor cache or pl/sql cursor cache allowing it to use a short cut to the statement’s location in the library cache without having to search the cache.

When the Oracle increments the counter for “parse calls”  you still have to work out whether that call turned into (a), (b), (c) or (d).

Just to confuse the issue, Oracle may also record a “parse count (hard)” without recording a “parse call”.

[Back to Nutshell -1 (Redo and Undo)]

Update Jan 2011

Randolf Geist has been looking at adaptive cursor sharing, and has noted that the parse call – including parse calls that go through the session cursor cache – seems to be the point in the code where adaptive cursor sharing can take place: in other words, it’s not an event that gets triggered or flagged by executions.

September 10, 2009

Philosophy – 5

Filed under: Philosophy,Statspack,Troubleshooting — Jonathan Lewis @ 6:31 pm UTC Sep 10,2009

Trouble-shooting with Statspack / AWR:

Something has to be top of the “Top 5 Timed Waits” … even when there are no performance problems.

[The Philosophy Series]
[Further Reading on Statspack]

July 10, 2009

Statspack Skills

Filed under: Performance,Statspack,Troubleshooting — Jonathan Lewis @ 7:28 pm UTC Jul 10,2009

I had a great time a couple of weeks ago at the  UKOUG meeting of the DBMS SIG (reported here by Coskan Gundogar). The range of presentations was good and I had a number of interesting conversations.

Of course, the exciting part for me was sitting down with a batch of Statspack and AWR reports that I had been supplied with in the previous couple of days and doing a “real-time” analysis of them.

(more…)

May 26, 2009

CPU used

Filed under: Statspack,Troubleshooting — Jonathan Lewis @ 11:31 am UTC May 26,2009

[Further Reading on Statspack]

From time to time users of Statspack on the newer versions of Oracle are surprised to see the “CPU Time” in the “Top N Timed Events” section of the report looking very different from the “BUSY_TIME” that appears in the “OS Statistics” part of the report.

There are various reasons why the numbers can differ, but one of the reasons is simple and highly beneficial – prior to 10g Oracle usually updated time figures at the end of each database call; but from 10g there are some views where time is updated more regularly.

(more…)

January 18, 2009

Analysing Statspack (11)

Filed under: Statspack — Jonathan Lewis @ 11:38 am UTC Jan 18,2009

[Further Reading on Statspack]

Here’s an interesting example of a Statspack report that appeared recently on the OTN Database General Forum. It’s from a user who wants to know why an update to seg$ appears in second place in the “SQL ordered by …” sections of the report. I’ve set the link to open in a new window so that you can read the report and my comments at the same time.

(more…)

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,437 other followers