Oracle Scratchpad

April 4, 2013

Delphix Overview

Filed under: Uncategorized — Jonathan Lewis @ 9:04 pm UTC Apr 4,2013

Update: Here’s the link to the recording of the webinar

I’ll b online tomorrow morning (Friday 5th, 9:00 Pacific time, 5:00 pm UK) in a webinar with Kyle Hailey to talk about my first impressions of Delphix, so I thought I’d write up a few notes beforehand.

I’ve actually installed a complete working environment on my laptop to model a production setup. This means I’ve got three virtual machines running under VMWare: my “production” machine (running Oracle 11.2.0.2 on OEL 5, 64-bit), a “development” machine (which has the 11.2.0.2 software installed, again on OEL 5, 64-bit), and a machine which I specified as Open Solaris 10, 64-bit for the Delphix server VM (pre-release bloggers’ version). The two Linux servers are running with 2.5GB of RAM, the Delphix server is running with 8GB RAM, and all three machines are running 2 virtual CPUs. (My laptop has an Intel quad core i7, running two threads per CPU, 16GB RAM, and 2 drives of 500GB each.) The Linux machines were simply clones of another virtual machine I previously prepared and the purpose of the exercise was simply to see how easy it would be to “wheel in” a Delphix server and stick it in the middle. The answer is: “pretty simple”. (At some stage I’ll be writing up a few notes about some of the experiments I’ve done on that setup.)

To get things working I had to create a couple of UNIX accounts for a “delphix” user on the Linux machines, install some software, give a few O/S privileges to the user (mainly to allow it to read and write a couple of Oracle directories), and a few Oracle privileges. The required Oracle privileges vary slightly with the version of Oracle and your prefered method of operation, but basically the delphix user needs to be able to run rman, execute a couple of Oracle packages, and query some of the dynamic performance views. I didn’t have any difficulty with the setup, and didn’t see any threats in the privileges that I had to give to the delphix user. The last step was simply to configure the Delphix server to give it some information about the Linux machines and accounts that it was going to have access to.

The key features about the Delphix server are that it uses a custom file system (DxFS, which is based on ZFS with a number of extensions and enhancements) and it exposes files to client machines through NFS; and there are two major components to the software that make the whole Delphix package very clever.

Oracle-related mechanisms

At the Oracle level, the Delphix server sends calls to the production database server to take rman backups (initially a full backup, then incremental backups “from SCN”); between backup requests it also pulls the archived redo logs from the production server – or can even be configured to copy the latest entries from the online redo logs a few seconds after they’ve been written (which is one of the reasons for requiring privileges to query some of the dynamic performance views, but the feature does depend on the Oracle version).

If you want to make a copy of the database available, you can use the GUI interface on the Delphix server to pick a target machine, invent a SID, and Service name, and pick an SCN (or approximate timetamp) that you want to database to start from, and within a few minutes the Delphix server will have combined all the necessary backup pieces, applied any relevant redo, and configured your target machine to start up an instance that can use the (NFS-mounted) database that now exists on the Delphix server. I’ll explain in a little while why this is a lot cleverer than a simple rman “restore and recover”.

DxFS

Supporting the Oracle-related features, the other key component of the Delphix server is the Delphix file-system (DxFS). I wrote a little note a few days ago to describe how Oracle can handle “partial” updates to LOB values – the LOB exists in chunks with an index on (lob_id, chunk_number) that allows you to pick the right chunks in order. When you update a chunk in the LOB Oracle doesn’t really update the chunk, it creates a new chunk and modifies the index to point at it. If another session has a query running that should see the old chunk, though, Oracle can read the index “as at SCN” (i.e. it creates a read consistent copy of the required index blocks) and the read-consistent index will automatically be pointing at the correct version of the LOB chunk. DxFS does the same sort of thing – when a user “modifies” a file system block DxFS doesn’t overwrite the original copy, it writes a new copy to wherever there’s some free space and maintains some “indexing” metadata that tells it where all the pieces are. But if you never tell the file system to release the old block you can ask to see the file as at a previous point in time at no extra cost!

But DxFs is even cleverer than that because (in a strange imitation of the “many worlds” interpretation of quantum theory) a single file can have many different futures. Different users can be identified as working in different “contexts” and the context is part of the metadata describing the location of blocks that belong to the file. Imagine we have a file with 10 blocks sitting on DxFs - in your context you modify blocks 1,2 and 3 but at the same time I modify blocks 1,2 and 3 in my context. Under DxFS there are now 16 blocks associated with that file – the original 10, your three modified blocks and my three modified blocks and, depending on timestamp and context, someone else could ask to see any one of three different versions of that file – the original version, your version, or my version.

Now think of that in an Oracle context. If we copy an entire set of database files onto DxFS, then NFS-mount the files on a machine with Oracle installed, we can configure and start up an instance to use those files. At the same time we could NFS-mount the files on another machine, configuring and starting another instance to use the same data files at the same time! Any blocks changed by the first instance would be written to disc as private copies, any blocks changed by the second instance would be written to discs as private copies – if both instances managed to change 1% of the data in the course of the day then DxFs would end up holding 102% of the starting volume of data: the original datafiles plus the two sets changed blocks – but each instance would think it was the sole user of its version of the files.

There’s another nice (database-oriented) feature to Delphix, though. The file system has built-in compression that operates at the “block” level. You can specify what you mean by the block size (and for many Oracle sites that would be 8KB) and the file system would transparently apply a data compression algorithm on that block boundary. So when the database writer writes an 8KB block to disc, the actual disc space used might be significantly less than 8KB, perhaps by a factor of 2 to 3. So in my previous example, not only could you get two test databases for the space of 1 and a bit – you might get two test databases for the space of 40% or less of the original database.

Delphix vs. rman

I suggested earlier on that Delphix can be a lot clever than an rman restore and recover. If you take a full backup to Delphix on Sunday, and a daily incremental backup (let’s preted that’s 1% of the database per day) for the week, then Delphix can superimpose each incremental onto the full backup as it arrives. So on Monday we construct the equivalent of a full Monday backup, on Tuesday we construct the equivalent of a full Tuesday backup, and so on. But since DxFS keeps all the old copies of blocks this means two things that we can point an instance at a full backup for ANY day of the week simply by passing a suitable “timestamp” to DxFs – and we’ve 7 full backups for the space of 107% of a single full backup.

There are lots more to say, but I think they will have to wait for tomorrow’s conversation with Kyle, and for a couple more articles.

Register of Interests / Disclosure

Delphix Corp. paid my consultancy rates and expenses for a visit to the office in Menlo Park to review their product.

March 31, 2013

Index Selectivity

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 6:42 pm UTC Mar 31,2013

Here’s a summary of a recent posting on OTN:

I have two indexes (REFNO, REFTYPESEQNO) and (REFNO,TMSTAMP,REFTYPESEQNO). When I run the following query the optimizer uses the second index rather than the first index – which is an exact match for the predicates, unless I hint it otherwise:

select *
from   RefTable
where  RefTypeSeqNo = :1
and    RefNo = :2;

Default plan:
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     3 |   126 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REFTABLE      |     3 |   126 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | REFTABLE_CX03 |     3 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))
       filter("REFTYPESEQNO"=TO_NUMBER(:1))

Hinted plan:
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     3 |   126 |    15   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REFTABLE      |     3 |   126 |    15   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | REFTABLE_CX02 |    14 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))

This is actually an example of a feature of the optimizer that I described a few years ago. The original note described a change as you moved from 10.1 to 10.2 and on to 11.1 – but once you’ve seen the basic issue there are a number of variations on how it might appear. In this case the OP seems to be using 10gR2, where the distinct_keys value from an index is used to calculate the cost and row estimate for the index and for the table access cost when that specific indexed access path is being considered.

So with the hint to use the accurate index we see an index cardinality estimate of 14 rows with a table cardinality of 3 despite the fact that the plan shows no extra predicates applied at the table; the cost of accessing the table is also clearly related to the cardinality estimate on the index line.

In the default plan when the wrong index is used, the optimizer doesn’t pay any attention to the distinct_keys from the other index, and simply uses the standard “product of column selectivities”.

11g introduces two changes – when calculating the table cardinality the distinct_keys value for the index is carried forward (so the plan with the high index cardinality but low table cardinality would report the same cardinality for both operations), and the distinct_keys from the first index would be used when doing the calculations for the second index – which would increase the cost of using the wrong index.

There’s really very little you can do to find a strategic fix for this type of problem in 10g – obviously you could add hints whenever Oracle used the wrong index, but that’s not reallya desirable approach, and it is possible to adjust column statistics in such a way that the calculations the optimizer uses give better approximations, but that’s not always very easy to do well. Ultimately you just have to be very careful about your choice of indexes – and when you think that two indexes show a significant overlap in columns consider the possibility that one carefully defined index may be able to do the job of both.

 

 

March 29, 2013

Missing SQL

Filed under: Uncategorized — Jonathan Lewis @ 9:35 am UTC Mar 29,2013

From time to time I’ve looked at an AWR report and pointed out to the owner the difference in work load visible in the “SQL ordered by” sections of the report when they compare the summary figure with the sum of the work done by the individual statements. Often the summary will state that the captured SQL in the interval represents some percentage of the total workload  in the high 80s to mid 90s – sometimes you might see a statement that the capture represents a really low percentage, perhaps in the 30s or 40s.

You have to be a little sensible about interpreting these figures, of course – at one extreme it’s easy to double-count the cost of SQL inside PL/SQL, at the other you may notice that every single statement reported does about the same amount of work so you can’t extrapolate from a pattern to decide how significant a low percentage might be. Nevertheless I have seen examples of AWR reports where I’ve felt justified in suggesting that at some point in the interval some SQL has appeared, worked very hard, and disappeared from the library cache before the AWR managed to capture it.

Now, from Nigel Noble, comes another explanation for why the AWR report might be hiding expensive SQL – a bug, which doesn’t get fixed until 12.2 (although there are backports in hand).

 

 

March 27, 2013

Open Cursors

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:13 pm UTC Mar 27,2013

Here’s a little detail that appeared in 11gR2 that may help you answer questions about open cursors. Oracle has added a “cursor type” column to the view v$open_cursor, so you can now see which cursors have been held open because of the pl/sql cursor cache, which have been held by the session cursor cache, and various other reasons why Oracle may take a short-cut when you fire a piece of SQL at it.

The following is the output showing the state of a particular session just after it has started up in SQL*Plus and called a PL/SQL procedure to run a simple count:

select
        cursor_type, sql_text
from
        V$open_cursor
where
        sid = 17
order by
        cursor_type,
        sql_text
;

CURSOR_TYPE                      SQL_TEXT
-------------------------------- ------------------------------------------------------------
DICTIONARY LOOKUP CURSOR CACHED  BEGIN DBMS_APPLICATION_INFO.SET_MODULE(:1,NULL); END;
                                 BEGIN DBMS_OUTPUT.DISABLE; END;
                                 BEGIN DBMS_OUTPUT.ENABLE(1000000); END;
                                 BEGIN dbms_random.seed(0); END;
                                 SELECT ATTRIBUTE,SCOPE,NUMERIC_VALUE,CHAR_VALUE,DATE_VALUE F
                                 SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS WHERE   (UPPER('
                                 SELECT USER FROM DUAL
                                 select /*+ connect_by_filtering */ privilege#,level from sys
                                 select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('U
                                 select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECON
                                 select metadata from kopm$  where name='DB_FDO'
                                 select privilege# from sysauth$ where (grantee#=:1 or grante
                                 select to_char(sysdate,'hh24miss') time_now from dual
                                 select value$ from props$ where name = 'GLOBAL_DB_NAME'

OPEN                             BEGIN spin_1; END;
                                 table_1_ff_208_0_0_0

OPEN-RECURSIVE                   insert into sys.aud$( sessionid,entryid,statement,ntimestamp

PL/SQL CURSOR CACHED             SELECT COUNT(*) X FROM KILL_CPU CONNECT BY N > PRIOR N START

SESSION CURSOR CACHED            BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
                                 SELECT DECODE('A','A','1','2') FROM DUAL

Variations are left to the user.
There are a few other cursor types – here’s the list given in the 11.2 Server Reference manual under the definition of v$open_cursor:

  • BUNDLE DICTIONARY LOOKUP CACHED
  • CONSTRAINTS CURSOR CACHED
  • DICTIONARY LOOKUP CURSOR CACHED
  • OPEN
  • OPEN-PL/SQL
  • OPEN-RECURSIVE
  • PL/SQL CURSOR CACHED
  • REPLICATION TRIGGER CURSOR CACHED
  • SESSION CURSOR CACHED

It’s an interesting exercise to consider why there are so many types, and then create some tests to confirm or refute your hypothesis. I haven’t checked, but here are a few ideas:

  • replication trigger cursor cached: I don’t remember which version introduced the change, but once upon a time the triggers updating the materialized view logs were real after insert/update/delete triggers, but now they’re “pre-compiled” – so it’s not surprising they form a special case.
  • dictionary lookup cursor cached: are these, perhaps, the statements that are currently cached in the “_row_cache_cursors” cache for data dictionary access; the parameter was once set to 10, but currently defaults to 20.
  • bundle dictionary lookup cached: why would there be a special case of dictionary lookup ? perhaps this is the set of cursors needed to read the first few tables in the data dictionary that allow the optimizer to do its work (how do you optimize a query against tab$ if you need to query syn$, obj$ and tab$ to discover that tab$ is a table ?)
  • constraints cursor cached: probably something to do with the SQL (internal, or externalised) that Oracle has to run to check or implement details of referential integrity constraints.

Footnote (28th March):

By a strange coincidence a note came up on OTN today that pointed to a different version of the Oracle manual where the possible cursor types are listed under their “internal” names – but I’m not sure if there’s a version of Oracle where you’d see them looking like this:

  • CACHED
  • KNT CACHED
  • KQD BUNDLE CACHED
  • KQD CACHED
  • KXCC CACHED
  • PL/SQL
  • PL/SQL CACHED
  • SYSTEM

March 22, 2013

LOB Update

Filed under: Infrastructure,LOBs,Oracle — Jonathan Lewis @ 10:36 pm UTC Mar 22,2013

This note is about a feature of LOBs that I first desribed in “Practial Oracle 8i” but have yet to see used in real life. It’s a description of how efficient Oracle can be, which I’ll start with a description of, and selection from, a table:

create table test_lobs (
	id              number(5),
	bytes           number(38),
	text_content    clob
)
lob (text_content) store as text_lob(
	disable storage in row
	cache
)
;

-- insert a row

SQL> desc test_lobs
 Name                    Null?    Type
 ----------------------- -------- ----------------
 ID                               NUMBER(5)
 BYTES                            NUMBER(38)
 TEXT_CONTENT                     CLOB

SQL> select id, bytes, dbms_lob.getlength(text_content) from test_lobs;

        ID      BYTES DBMS_LOB.GETLENGTH(TEXT_CONTENT)
---------- ---------- --------------------------------
         1     365025                           365025

1 row selected.

I’ve got a table with a single CLOB column holding a single row. The size of the single CLOB is roughly 365KB (or about 45 blocks of 8KB). Old hands who have had to suffer LONG columns will recognise the trick of recording the size of a LONG as a separate column in the table; it’s a strategy that isn’t really necessary with LOBs but old coding habits die hard. It’s quite hard to find details of how much space has been used in a LOB segment (the space_usage procedure in the dbms_space package doesn’t allow you to examine LOBSEGMENTs), but I did a coupld of block dumps to check on this LOBSEGMENT and it had allocated 46 blocks on the first insert.

So here’s the clever bit – how big will the LOBSEGMENT grow when I update that one CLOB ?

It’s common knowledge (to users of LOBs) that the undo mechanism Oracle has for LOBs is simply to leave the old LOB in place and create a new one – so the intial response to the question might be to guess that the LOBSEGMENT will grow to roughly double the size. But it doesn’t have to be like that, at least, not if you update the LOB the way I happen to have done, which is like this:

declare

	m_length	integer;
	m_lob		clob;

begin

	select
		text_content,
		dbms_lob.getlength(text_content)
	into	m_lob, m_length
	from
		test_lobs
	where
		id = 1
	for update
	;

	dbms_output.put_line('Lob size: ' || m_length);

	dbms_lob.write(
		lob_loc	=> m_lob,
		amount	=> 17,
		offset	=> 1,
		buffer	=> 'This is an update'
	);

	commit;

end;
/

My code very specifically changes only the first 17 bytes of the LOB. So how much does Oracle have to do to effect this change ? The LOB-handling mechanisms are smart enough to work out that only the first (of 45) blocks in the LOB need to be changed, so Oracle need only add one block to the segment and write the new version of the first LOB block to that one block. (In fact the segment – which was in a tablespace using freelist management – grew by the “standard” 5 blocks from which Oracle selected just one block to add to the LOB.)

So how does Oracle keep track of the whole LOB if it can change it one piece at a time ? This is where the (notionally invisible and you don’t need to know about it) LOBINDEX comes into play. Oracle maintains an index keyed by (LOB_ID, chunk_number) *** pointing to all the chunks of a LOB in order, so when you update a single chunk Oracle simply creates an updated copy of the chunk and changes the appropriate index entry to point to the new chunk. So here’s an image representing our one LOB value just after we’ve created it and before we’ve updated:

lob_1

And then we “modify” the first chunk – which means we have to add a chunk (which in this case is a single block) to the segment, create a new version of the first chunk, modify the index to point to the new block, and add an index entr – keyed by time-stamp – to the end of the index to point to the old chunk; something like this:

lob_2

Now, when we run a query to select the LOB, Oracle will follow the index entries in order and pick up the new chunk from the end of the LOBSEGMENT. But the LOBINDEX is protected by undo in the standard fashion, so if another long-running query that started before our update needs to see the old version of the LOB it will create a read-consistent copy of the relevant index leaf block- which means that from its perspective the index will automatically be pointing to the correct LOB chunk.

The index is actually quite an odd one because it serves two functions; apart from pointing to current lobs by chunk number, it also points to “previous” chunks by timestamp (specifically the number of seconds between Midnight of 1st Jan 1970 and the time at which the chunk was “overwritten”). This makes it easy for Oracle to deal with the retention interval for LOBs – any time it needs space in the LOBSEGMENT it need only find the minimum timestamp value in the index and compare it with “sysdate – retention” to see if there are any chunks available for re-use.

To sum up – when you update LOBs, and it’s most beneficial if you have an application which doees piece-wise updates, you leave a trail of old chunks in  in the LOBSEGMENT. The version of the LOB you see is dictated by the version of the index that you generate when you request a copy of the LOB at a given SCN.

 

*** Footnote: My description of the LOBINDEX was an approximation. Each index entry carries a fixed size “payload” listing up to eight lob chunks; so the (LOB_ID, chunk_number) index entries in a LOBINDEX may point to every 8th chunk in the LOB. The significance of the “fixed size” payload is that the payload can be modified in place if the pointer to a LOB chunk has to be changed – and this minimises disruption of the index (at a cost of some wasted space).

 

Delphix Debrief

Filed under: Uncategorized — Jonathan Lewis @ 7:24 am UTC Mar 22,2013

I’ve had my week in Palo Alto with the Delphix people. I really don’t know where the time went to – but I had a lot of interesting conversations with a number of very able people; plenty of time to experiment; and I’ve even been able to install Delphix on my laptop (as a 64-bit Open Solaris 10 VM under VMWare). I liked the product, and I was impressed with the team they’ve got working on it.

We’ve pencilled in an intial online webinar for Friday 5th April which will probably feature an informal chat where Kyle Hailey and I talk about my impressions of the product and what I’ve done with it so far. Then, in about a month’s time, we’ll have a more technical discussion.

In the meantime, it occurred to me that my next blog post ought to be about LOBs as this would be a nice way to introduce you to one of the key ideas behind DxFS (the Delphix-extended file system that’s based on ZFS).

Update 22nd March

The link to register for the online conversation is now up. We’re timetabled for 5th April 9:00 am PDT (which is currently 4:00 pm GMT, but it will be 5:00 pm BST by then.)

March 20, 2013

Lock Bug

Filed under: Bugs,deadlocks,Locks,Oracle — Jonathan Lewis @ 12:08 pm UTC Mar 20,2013

Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:


-- created a table t1 to work with, no data needed, column definitions don't matter.

Session 1:
lock table t1 in row exclusive mode;        -- mode 3, lock acquired

Session 2:
lock table t1 in row exclusive mode;        -- mode 3, lock acquired

Session 3:
lock table t1 in exclusive mode;            -- mode 6, goes into wait for session 1

Session 1 (again):
lock table t1 in share row exclusive mode;  -- mode 5, should move to converter queue, wait for session 2

The mode 3 to mode 5 conversion is what happens if you have a foreign key referential integrity constraint without a covering index and run code that does something like “delete child rows for parent X, delete parent X” (which also happens under the covers if your foreign key is declared as “on delete cascade”).

Notice the “should” on the lock conversion line – this is exactly what happens in 10.2.0.3 and 11.2.0.3; but on the system I was using session 1 got an immediate deadlock (ORA-00060) error – in the absence of a deadlock scenario !

My problem is this – the instance that gives the deadlock error is 64-bit 11.2.0.2 running RAC on Linux (OEL 5); the instances that don’t give the error are 32-bit, non-RAC, running on Windows XP Pro. SO is the anomaly due to some difference in:

  • Windows vs. Linux
  • 32 bit vs. 64 bit
  • RAC vs. non-RAC
  • Specific version of Oracle

I’m inclined to think it’s a version dependent bug, but it’s possible that it’s a necessary side effect of RAC. So if you’ve got 11.2.0.3 RAC, or 11.2.0.2 non-RAC, or the terminal releases of 10g and 11.1 whether RAC or non-RAC, I’d like to hear from you which versions – if any – produce the same deadlock and which don’t. (And the scientific method being what it is, anyone with 11.2.0.2 on RAC on Linux might like to confirm – or contradict – my result.)

Update 8:30 p.m. GMT

The results in so far tend to support the idea that this may be a RAC-related issue. Justin Mitchell’s result is particularly interesting, and I confirmed on my 11.2.0.2 RAC system that if the two starting sessions are on different nodes then I don’t get the deadlock behaviour.

Users of RAC will probably be aware of the fact that v$lock.block commonly sets itself to 2 (potential blocker) on RAC the moment you acquire the lock; and this is what happens to both the “row exclusive” locks. When we request the exclusive lock, both TM locks change from block=2 to block=1 if they are on the same node; if they are on different nodes then only one of them changes.

Unfortunately Oleksandr didn’t see a deadlock when he tried RAC 11.2.0.3 – and since he didn’t make any comment about using multiple nodes we will have to wait to see if he can repeat the test and tell us whether his results are consistent with Justin’s, or whether they suggest that there are further considerations in effect.

March 9, 2013

Virtual DB

Filed under: Uncategorized — Jonathan Lewis @ 9:52 am UTC Mar 9,2013

I’m heading off to Heathrow airport later on today to fly out to San Francisco for my week of  experimenting with Delphix. I’ve done a little preparation work, of course, including browsing around the Internet to read about related technologies. Some of the material I found was very interesting, so I thought I’d go publish a few of the links that might be useful to other people.

It’s quite surprising to see how long the necessary core technology has been around; and yet there seems to have been minimal follow-up on the possibilities the technology  makes available – perhaps because of the specific  hardware, or special skills needed to put put together a working solution.

Here’s a short series from “Oracle Storage Guy” comparing Netapp and EMC (with a bit of a bias towards EMC)  - it’s a few years old, and the companies mentioned have probably moved on, reviewed strategies, and refined what they do, but I suspect it’s still good background information for the non-specialist.

And an introduction to some OpenWorld presentations on Oracle’s dbClone from the same person.

A couple of posts by Kevin Closson on NFS and CloneDB

And a posting about Oracle’s clonedb from Tim Hall

A couple of (fairly long) Oracle white papers on cloning

And, of course, a couple of items from Kyle Hailey (Delphix) that talk about why Delphix is the right solution at the right time. Both items  come with a useful list of links  to articles on other technologoes

Finally, a couple of posts from end-users:

Bottom line – it looks as if Delphix has created a product that is more flexible, more powerful and easier to use than the competition that has been around for several years, and they’ve done it on generic hardware. I’ll be trying to find out how  reality lines up with expectations over the next week.

March 4, 2013

Duplicate indexes ?

Filed under: Indexing,Oracle,Troubleshooting — Jonathan Lewis @ 5:39 pm UTC Mar 4,2013

I don’t think this is likely to happen on a production system (until 12c) – but look what you can do if you try hard enough:

  1  select
  2     index_name, column_name from user_ind_columns
  3  where
  4     table_name = 'T1'
  5  order by
  6*    index_name , column_position
SQL> /

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
T1_I1                N1
                     V1

T1_I2                N1
                     V1


4 rows selected.

That’s a straight cut-n-paste from an Oracle 11.1.0.7 SQL*Plus session. (You can tell I typed it in real time because I missed the return before the FROM, and couldn’t be bothered to go back and do it again ;) )

March 1, 2013

Source Control

Filed under: Advertisements,Oracle — Jonathan Lewis @ 6:37 pm UTC Mar 1,2013

You may recall that I spent some time with the developers at the redgate offices in Cambridge (UK) a little while ago, looking at their Source Control for Oracle package. The product is about to go live, with a launch date of 12th March.

Because of the help I’ve given them they’ve offered my readers the chance of winning one of two 5-user licences for the product – provided I devise a strategy for picking the recipients.

So, to make it easy, all you have to do is persuade me that you really need or really deserve a copy. Write up a short description in the comments of the worst problem you’ve had to face because you didn’t have a decent source control system, or the  best use you think you could make of a source control system for Oracle.

I’ll get the redgate developers to read what you wrote, and they will be the final judges of the two most interesting, or possibly most horrifying, or maybe even the most entertaining, cases.

Entries to be in by 23:59 GMT on 11th March.

Update

The product has launched early !

You can find out more and see screenshots on this web page.

And there’s an online demo of the tool on March 14 at 16:00 GMT (17:00 CET / 11:00 EST / 10:00 CST / 08:00 PST), where you will be able to ask any questions. If you’re interesting in seeing it you’ll have to register at this “go to meeting” URL.

The closing date for entries is still 11th March, 23:59 GMT.

February 25, 2013

Free Space

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:36 pm UTC Feb 25,2013

Question – How can you have a single file in a single tablespace showing multiple free extents when there are no objects using any space in that file ? For example, from an 11.1.0.7 database:

SQL> select
  2          *
  3  from    user_free_space
  4  where
  5          tablespace_name = 'TEST_8K'
  6  order by
  7          file_id, block_id
  8  ;

TABLESPACE_N    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------ ---------- ---------- ---------- ---------- ------------
TEST_8K               3        128    1048576        128            3
TEST_8K               3        256    1048576        128            3
TEST_8K               3        384    1048576        128            3
TEST_8K               3        512  130023424      15872            3

4 rows selected.

The answer in this case is simple – here’s what I did just before running my query:

SQL> create table t1(n1 number);

Table created.

SQL> create table t2(n1 number);

Table created.

SQL> create table t3(n1 number);

Table created.

SQL> drop table t1;

Table dropped.

SQL> drop table t2;

Table dropped.

SQL> drop table t3;

Table dropped.

SQL>

This test is on 11.1.0.7; test_8k is my default tablespace, uses an 8KB block size (did you spot the clue) with 1MB uniform extents. The segments created immediately and I haven’t purged my recyclebin. Because I’ve dropped the tables Oracle includes their space in the “free space” views, but because I need to be able to flash them back into existence the segments can’t be coalesced into the adjacent free space, and they will also be reported in dba_segments.

Here’s a harder one – there are NO objects in this tablespace, and nothing hiding in the recyclebin:

SQL> select
  2          *
  3  from    user_free_space
  4  where
  5          tablespace_name = 'TEST_2K'
  6  order by
  7          file_id, block_id
  8  ;

TABLESPACE_N    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------ ---------- ---------- ---------- ---------- ------------
TEST_2K               7        512   58720256      28672            7
TEST_2K               7      29184   58720256      28672            7
TEST_2K               7      57856   58720256      28672            7
TEST_2K               7      86528   32505856      15872            7

And here’s the statement I executed (as SYS) just before I ran the query – so no chance that there’s anything hidden in the file:


create
        tablespace test_2k
        datafile 'C:\ORACLE\ORADATA\d11g\d11g\test_2k.dbf'
        SIZE 200M reuse
        blocksize 2k
        extent management local
        uniform size 4k
        segment space management manual
;

Update: 26th Feb

I’m sitting in Munich airport and boarding starts in 10 minutes, so just enough time to give an answer. I’ll start with a block dump of block 3 of the data file.

Block dump from disk:
buffer tsn: 22 rdba: 0x01c00003 (7/3)
scn: 0x0b86.06d63ae4 seq: 0x01 flg: 0x04 tail: 0x3ae41e01
frmt: 0x02 chkval: 0x4e8a type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0C112200 to 0x0C112A00
C112200 0000621E 01C00003 06D63AE4 04010B86  [.b.......:......]
C112210 00004E8A 00000007 00000200 00000000  [.N..............]
C112220 00000000 00003800 00000000 00000000  [.....8..........]
C112230 00000000 00000000 00000000 00000000  [................]
        Repeat 123 times
C1129F0 00000000 00000000 00000000 3AE41E01  [...............:]
File Space Bitmap Block:
BitMap Control:
RelFno: 7, BeginBlock: 512, Flag: 0, First: 0, Free: 14336
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
...
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000

This is a locally managed tablespace, so the third block of the file is the first bitmap space management block for the tablespace. The tablespace uses a 2KB block size (so the amount of space in the block for the bit map is slight under 2KB) and a 4KB uniform extent size (so each bit in the map represents 2 blocks/4KB in the tablespace).

As you can see from line 15, the first bit represents the extent starting at block 512, and there are 14,336 bits available (1,792 bytes), so the bitmap in the next block would start at block 512 + 2 * 14,336 = 512 * 28,672 = 29,184. This lines up exactly with the first chunk of free space reported in dba_free_space above.

As one of the comments indicated – the code that populates x$ktfbfe is probably called once for each bitmap space management block, and it doesn’t seem to bother trying to “coalesce” two free space fragments identified by adjacent bitmap blocks.

February 22, 2013

Deadlock Detection

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm UTC Feb 22,2013

By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:

“Oracle detects and resolves deadlocks automatically.”

Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.

Consider the following example (which, I have to admit, I wrote without access to a live instance):

Session 1
delete from t1 where id between 1 and 1000000;
1000000 rows deleted

                                                   Session 2
                                                   update jobs set status = 'STARTING' where id = 99;
                                                   1 row updated

Session 1
update jobs set status = 'FINISHED' where id = 99;
-- session 1 is now waiting on session 2

                                                   Session 2
                                                   delete from t1 where id between 1 and 1000000
                                                   -- session 2 is now waiting on session 1

Session 1 (some time within the next 3 seconds)
ORA-00060: deadlock detected

At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update (worst case I’ve seen: session 2 ran for 10 seconds and resulted in session 1 crashing and rolling back for 3.5 hours); some applications would log “Unexpected Oracle error” to the front-end and retry the most recent action (at which point Session 2 would receive an ORA-00060 error and the two sessions would see-saw back and fore every 3 seconds until someone noticed what was going on).

In what way has the deadlock been “resolved” ?

In this specific case I think I’d probably want a supervisor (person or program) to log and kill session 2 and allow session 1 to retry its second update – and then I’d want to find out why this sequence of events had happened at all.

Deadlocks are NOT resolved automatically by Oracle, they are merely reported so that the client code can decide how to resolve them.

February 20, 2013

Travelogue 3

Filed under: Non-technical — Jonathan Lewis @ 6:45 pm UTC Feb 20,2013

I’m sure I wrote something a little while ago about cutting down on long-haul flights – but I’ve just been subjected to a positive barrage of invitations that I’ve found hard to resist. So my international travel itinerary for the future is starting to look like it might be a little busy. Probable highlights:

I’m aiming to take a short city-break in Moscow in May, and I’ll be stopping off with the people at Innova to spend a day talking about Oracle.

I’ve volunteered to take part in the southern leg of the “LA OTN” tour in August – aiming to visit Chile 01/08, Peru 03/08, Uruguay 06/08, Argentina 08/08, and Brazil 10/08 – flying home just in time for my son’s 21st birthday.

Then, in late October, I have an invitation to spend a couple of days in Beijing presenting at the All China Oracle User Group conference.

And if I manage to complete all that lot I think that will take my list of countries visited up to 60. Who knows, I may even find a couple more photos to publish in my “Travelogue” series.

February 18, 2013

Transactions 2

Filed under: Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 7:54 am UTC Feb 18,2013

Here’s a little follow-on from Friday’s posting. I’ll start it off as a quiz, and follow up tomorrow with an explanation of the results (though someone will probably have given the correct solution by then anyway).

I have a simple heap table t1(id number(6,0), n1 number, v1 varchar2(10), padding varchar2(100)). The primary key is the id column, and the table holds 3,000 rows where id takes the values from 1 to 3,000. There are no other indexes. (I’d show you the code, but I don’t want to make it too easy to run the code, I want you to try to work it out in your heads).

I run the following pl/sql block.

begin
	--
	--	Going to fail on primary key constraint
	--
	for i in 1..200 loop
		begin
			insert into t1 values(50,i,'x','x');
			commit;
		exception
			when others then null;
		end;
	end loop;

	--
	--	Explicit rollback
	--
	for i in 1..40 loop
		begin
			insert into t1 values(4000 + i,i,'x','x');
			rollback;
		end;
	end loop;

	--
	--	Explicit commit
	--
	for i in 1..50 loop
		begin
			insert into t1 values(4500 + i,i,'x','x');
			commit;
		end;
	end loop;

	--
	--	Rolling back to savepoint
	--
	for i in 1..70 loop
		begin
			savepoint A;
			insert into t1 values(4600 + i,i,'x','x');
			rollback to savepoint A;
		end;
	end loop;

end;
/

If (from another session) you take an AWR (or Statspack) snapshot before and after running the block, what will you see for the statistics

user commits
user rollbacks
transaction rollbacks
rollback changed – undo records applied

and, for a bonus point, what will the value of “Rollback per transaction %” be ?

Note – since the snapshots are global, your results will be affected by other work on the system; in particular I noticed that simply taking two AWR snapshots in a row on the instance I used resulted in 8 user commits.

 

Update

And the answers – cut from an AWR report – are:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
rollback changes - undo records                 420           25.2           4.3
transaction rollbacks                           240            4.7           2.5
user commits                                     58            1.1           0.6
user rollbacks                                   40            0.8           0.4

 Rollback per transaction %:   40.82       Rows per Sort:   122.70

The 40 “user rollbacks” come from the loop with the explicit rollback. At the same time the 40 rollbacks introduced 80 “rollback changes – undo records applied” – one for the table index and one for the table for each call to rollback. Our “user rollbacks” have been “real” rollbacks, so they’ve introduced 40 transaction rollbacks at the same time.

The 50 “user commits” come from the loop with the explicit commit – except that the picture is slightly blurred by the fact that simply running the AWR snapshot introduced a few extra commits.

The 40.82% is 40 / (40 + 58) expressed as a percentage.

The “rollback to savepoint” calls haven’t contributed to the user rollbacks – even though, in this case, they have rolled back 40 transaction starts, and a check of various other statistics would show a TX enqueue being requested and released, and transactions rotating through the undo segments. Despite this clearing of the TX locks, these rollbacks to savepoint don’t count towards transaction rollbacks; however, the 70 “rollback to savepoint” calls have introduced a further 140 “rollback changes – undo records applied” (again, one for the table and one for the index for each call).

Finally the failed attempts to insert duplicate keys have, in these circumstances, introduced 200 transaction rollbacks (though not user rollbacks – we didn’t ask for them explicitly). They are also responsible for the final 200 “rollback changes – undo records applied” that we need to account for. The changes, very specifically, are the changes to the table – Oracle does actually have to insert the row into the table before trying to insert the index entry (after all, the index entry needs to know the rowid), and so it has to reverse that insert when it discovers that the relevant entry already exists in the primary key index. The requirement to insert and then rollback is one of the reasons why it is often better for “batch merge by pl/sql loop” to try an update then insert is sql%rowcount is zero rather than trying to insert, capture the exception, and then update; it’s also a good reason for creating the unique indexes on a table before creating the non-unique indexes – Oracle maintains indexes in the order they were created so you want it to find the duplicate key error as soon as possible to minimise the amount of change and rollback.

If you want to experiment further, there are a couple of slightly different variations of the code that exhibit interesting variations in results. There are also other events that can cause transaction rollbacks to appear without user rollbacks being invoked.

February 15, 2013

Transactions

Filed under: Oracle,Statspack,Troubleshooting — Jonathan Lewis @ 8:07 am UTC Feb 15,2013

It’s very easy to get a lot of information from an AWR (or Statspack) report – provided you remember what all the numbers represent. From time to time I find that someone asks me a question about some statistic and my mind goes completely blank about the exact interpretation; but fortunately it’s always possible to cross check because so many of the statistics are cross-linked. Here’s an example of a brief mental block I ran into a few days ago – I thought I knew the answer, but realised that I wasn’t 100% sure that my memory was correct:

In this Load Profile (for an AWR report of 60.25 minutes), what does that Transactions figure actually represent ?

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:             57,567.09             12,028.39
              Logical reads:             48,043.83             10,038.54
              Block changes:                314.07                 65.62
             Physical reads:                667.70                139.51
            Physical writes:                 46.25                  9.66
                 User calls:                619.33                129.41
                     Parses:                505.67                105.66
                Hard parses:                 36.94                  7.72
                      Sorts:                313.05                 65.41
                     Logons:                  0.56                  0.12
                   Executes:              1,165.42                243.51
               Transactions:                  4.79

  % Blocks changed per Read:    0.65    Recursive Call %:    95.18
 Rollback per transaction %:   24.95       Rows per Sort:    25.09

And, while we’re at it, what does the “Rollback per transaction %” actually mean – and what, if anything, can we infer from the value ?

Since we’re looking at transactions, it’s a fairly safe bet that the figure is something to do with commits – but is it counting only committed transactions, or does it included rolled back transactions [philosophical question - is a change that's never committed really a transaction, since it "never happened" as far as the rest of the world is concerned?]. Fortunately we can look at the Intance Activity Statistics to check.

First, though, lets convert the “Per Second” figures into an absolute value by multiplying by the duration in seconds (3,615) of the report. Allowing for rounding errors we’re looking for a value between ceiling(4.785 *3615) and floor(4.795 * 3615), i.e. between 17,298 and 17,338.  Let’s also keep in mind that the “Rollback per transaction %” is very close to 25%.

Here, then, are a couple of useful figures from the Instance Activity:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
user commits                                 12,986            3.6           0.8
user rollbacks                                4,316            1.2           0.3

Almost immediately you can see that user rollbacks are roughly one third of user commits (I promise I didn’t massage these number), which means the rollback percentage is just (user rollbacks / (user commits + user rollbacks)). As a further sanity check, 4,316 + 12,986 = 17,302 which falls nicely into our required range. Ta-da, job done: “Transactions” is the sum of user commits and user rollbacks (as a first approximation – but is it the whole answer).

Of course, I picked a fairly extreme example from my AWR library to make a point – which means you might now be asking whether all those rollbacks pose some sort of threat. How serious are they ? Luckily there are a couple more statistics that tell us:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
rollback changes - undo records                 222            0.1           0.0
transaction rollbacks                            59            0.0           0.0

The name of the first statistic has been trimmed a little in this text report, the full name is: “rolllback changes – undo records applied”. As you can see, we’ve only applied 222 undo records in our 4,316 rollback calls, so we’re not really rolling anything back (most of the time). Moreoever, the “transaction rollbacks” corroborates this observation – we have only attempted to rollback 59 “real” (data-changing) transactions. Most of those rollbacks are probably the default “rollback after every call” that some web application servers make.

As a closing thought – if you’re responsible for several different systems, it’s convenient to keep a couple of ”reference” AWR or Statspack reports from busy, normal, and quiet periods for each system; that way, if you use the figures from one system to work out the meaning of some derived value you can use the figures from another system to check if your hypothesis is correct.

Footnote: there are two other statistics that include the text “undo records applied” (not that you’ll see those words in the textual AWR report), but they both relate to creating read-consistent copies of blocks.

Footnote 2: Is anyone getting worried by my “first approximation” comment – and has anyone started wondering if there may be more to transaction rollbacks that user rollbacks, and if so where they fit into the arithmetic ?  To be continued.

 

« Previous PageNext Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,398 other followers