Oracle Scratchpad

May 17, 2013

UKOUG

Filed under: Uncategorized — Jonathan Lewis @ 2:25 pm UTC May 17,2013

The call for papers is open for Tech 13 – the “server-side” conference of the UKOUG.

The conference was getting so big that we’ve split Apps from Server Tech and will be running the two conferences separately this year. The Server Tech conference will be in Manchester from 2nd to 4th Dec.

The closing date for submissions is Friday 31st May (only 2 weeks !) and confirmation of acceptance given by August.

There is a slightly shorter route to submission (if you don’t want to watch the video on “Why to speak”).

 

 

 

May 16, 2013

Performance Monitoring

Filed under: Uncategorized — Jonathan Lewis @ 4:51 am UTC May 16,2013

Updated – just a quick reminder for next week; I’ll be doing a short webinar next Wednesday comparing the performance monitoring tools Oracle and SQL server provide.

I think I may have broken my record with 6 countries in 6 weeks – so I haven’t been very thorough at updating my blog recently. Just time, before I head off to Heathrow once again, to do a quick advert for the next redgate webinar that I’m doing with Grant Fritchey. This time comparing built-in performance monitoring tools. Details and Registrations at this URL.

I’ll see if I can catch up with a couple of answers while I’m in the airport lounge – but no promises, since the simple act of walking into an airport makes me  feel like falling asleep.

Webinars

Filed under: Advertisements — Jonathan Lewis @ 4:47 am UTC May 16,2013

Reminder:

The webinars on “Smarter Statistics in 11g” are on tomorrow (Friday) at 2:00 pm and 6:00 pm. There’s a waiting list for the 6:00 pm event, so if you’ve signed up but can’t make it please delete your registration. (The event will be repeated on 10th June).  If you want to vote a better time for me to do short webinars there’s a poll at the end of the article.

 

 

I’m about to make a serious move into online webinars, and as a warm-up exercise I’ll be doing a couple of one-hour free events on Friday 17th May.

I’ll be talking through a Powerpoint presentation called “Smarter Statistics in 11g” twice, once at 2:00 pm – 3:00 pm BST, and again at 6:00 pm BST  (12:00 pm - 1:00 pm CDT) . Broadly speaking the first one is for the benefit people from the UK and eastwards, and the second is for the benefit of people from the US and westwards. This is just a trial run, of course, and if it works well I will be doing more of the same, perhaps three times per day to spread across more time zones.

John Goodhue (my O1 sponsor for the USA) is arranging all the mechanical details, and I’ll post links for registration when they become available – we’ll be using GoToWebinar as the supply mechanism, and we’ll be limiting access to 100 people (so if you do register and can’t attend, please remove yourself from the list; if you don’t manage to register for either event, you’ll get another chance later as I plan to repeat each event a few times.)

I’ll also be doing a full day paid event on 23rd May which will be my “Indexing Strategies” tutorial. This first full day event will be timed to suit the American audience – although anyone can register, of course – but we plan to have further events suited to other time zones. The URL for registration is now available – with an option to purchase a 30-day window to the recording of my “Oracle Mechanisms” presentation in Minneapolis.

May 13, 2013

Parse Time

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:59 pm UTC May 13,2013

Dominic Brooks published a note recently about some very nasty SQL – originally thinking that it was displaying a run-time problem due to the extreme number of copies of the lnnvl() function the optimizer had produced. In fact it turned out to be a parse-time problem rather than a run-time problem, but when I first read Dominic’s note I was sufficiently surprised that I decided to try modelling the query.

Unfortunately the query had more than 1,000 predicates, (OR’ed together) and some of them included in-lists.  Clearly, writing this up by hand wasn’t going to be a good idea, so I wrote a script to generate both the data, and the query, as follows – first a table to query:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum		id1,
	rownum		id2,
	rownum		id,
	lpad(rownum,10)	v1,
	rpad('x',100)	padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

create index t1_i1 on t1(id1, id2);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

Then a piece of code to write a nasty query:

set pagesize 0
set feedback off
set termout off

spool temp1.sql

prompt select * from t1 where 1 = 2

select
	'or (id1 = ' || rownum || ' and id2 = ' || (rownum + 1) || ')'
from
	t1
where
	rownum <= 750
union all
select
	'or ( id1 =  ' || (rownum + 1000) || ' and id2 in (' || rownum || ',' || (rownum+1) || '))'
from
	t1
where
	rownum <= 250
;

prompt /

spool off

Here’s an example of the text generated by the code – with the parameters set to 5 and 3 respectively (and notice how I’ve rigged the query so that it doesn’t return any data, whatever the optimizer thinks):

select * from t1 where 1 = 2
or (id1 = 1 and id2 = 2)
or (id1 = 2 and id2 = 3)
or (id1 = 3 and id2 = 4)
or (id1 = 4 and id2 = 5)
or (id1 = 5 and id2 = 6)
or ( id1 =  1001 and id2 in (1,2))
or ( id1 =  1002 and id2 in (2,3))
or ( id1 =  1003 and id2 in (3,4))
/

So here’s the plan from the above query:


---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     8 |  1008 |    16   (0)| 00:00:01 |
|   1 |  CONCATENATION                |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |     1 |   126 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID | T1    |     1 |   126 |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T1    |     1 |   126 |     3   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN           | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
|   8 |   INLIST ITERATOR             |       |       |       |            |          |
|   9 |    TABLE ACCESS BY INDEX ROWID| T1    |     5 |   630 |     7   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN          | T1_I1 |     5 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID1"=1003)
       filter("ID2"=3 OR "ID2"=4)
   5 - access("ID1"=1002)
       filter((LNNVL("ID1"=1003) OR LNNVL("ID2"=3) AND LNNVL("ID2"=4)) AND
              ("ID2"=2 OR "ID2"=3))
   7 - access("ID1"=1001)
       filter((LNNVL("ID1"=1002) OR LNNVL("ID2"=2) AND LNNVL("ID2"=3)) AND
              (LNNVL("ID1"=1003) OR LNNVL("ID2"=3) AND LNNVL("ID2"=4)) AND ("ID2"=1 OR
              "ID2"=2))
  10 - access(("ID1"=1 AND "ID2"=2 OR "ID1"=2 AND "ID2"=3 OR "ID1"=3 AND
              "ID2"=4 OR "ID1"=4 AND "ID2"=5 OR "ID1"=5 AND "ID2"=6))
       filter((LNNVL("ID1"=1001) OR LNNVL("ID2"=1) AND LNNVL("ID2"=2)) AND
              (LNNVL("ID1"=1002) OR LNNVL("ID2"=2) AND LNNVL("ID2"=3)) AND
              (LNNVL("ID1"=1003) OR LNNVL("ID2"=3) AND LNNVL("ID2"=4)))

As you can see, the first five predicates end up in line 10 of the plan with 10 repetitions (5 * 2) of the lnnvl() function. The last three predicates show up in lines 3, 5, and 7 – and the on each line we see two more lnnvl() calls than on the previous – just imagine, then, how many lnnvl() calls the optimizer will have added to the query plan by the time we have 750 occurrences in the inlist iterator (line 8) and 250 occurrences of the slightly complex predicate. Here are the relevant CPU stats (from v$session_stats) from running the generated script on 11.1.0.7, on Windows 32-bit, 2.8GHz CPU:

Name                                           Value
----                                           -----
recursive cpu usage                            1,848
CPU used when call started                     1,854
CPU used by this session                       1,854
DB time                                        1,870
parse time cpu                                 1,847
parse time elapsed                             1,862

Clearly the parse time is extreme – though not as dramatic as in Dominic’s example; but having set up the first draft of the sample code it’s easy enough to change the number of occurrences of each type of predicate, and it’s pretty easy to make longer in-lists in the more complex of the two types of predicate. It’s not too difficult to get an execution plan that mimics Dominic’s in length and time to parse.

It’s not just the parse times that are interesting when you start doing this, by the way – it’s worth playing around to see what happens. It’s probably best to run the query to pull the plans from memory if you want to see the plans, though – if you try using “explain plan” then you start using memory in the SGA for some of the work: in one of my examples I had to abort the instance after a few minutes.

ISS

Filed under: Non-technical — Jonathan Lewis @ 6:45 pm UTC May 13,2013

I’d like to dedicate this posting to fellow Oak Table member Richard Foote, for reasons that the readers we have in common will immediately recognise: http://www.youtube.com/watch?v=KaOC9danxNo

The singer is Canadian astronaut Commander Chris Hadfield who has been tweeting and posting pictures from space – be careful, you may get hooked: https://twitter.com/Cmdr_Hadfield/status/332819772989378560/photo/1

Update:

When I posted the link to the video it had received 1.5M views; less than 24 hours later it’s up to roughly 7M. (And they weren’t all Richard Foote). Clearly the images have caught the imagination of a lot of people. If you have looked at the twitter stream it’s equally inspiring – and not just for the pictures.

 

May 10, 2013

Hakan Factor

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 3:52 pm UTC May 10,2013

Here’s a quick and dirty script to create a procedure (in the SYS schema – so be careful) to check the Hakan Factor for an object. If you’re not familiar with the Hakan Factor, it’s the value that gets set when you use the command “alter table minimize records_per_block;”.

I was prompted to publish this note by an item on the OTN SQL forum describing a problem with partition exchange with a table when there were bitmap indexes in place and the table had been changed to have some extra columns added. (Problem as yet unresolved as I publish).

If you start playing with the Hakan Factor, you’ll find that there are some odd little bugs in what gets stored and how it gets used. (SQL updated to use bitand() to reflect comments below and Karsten Spang’s blog note; also edited following a comment on OTN to show the rest of the spare1 flag bits)


create or replace procedure show_hakan(
	i_table		in	varchar2,
	i_owner		in	varchar2 default user
) as
	m_obj		number(8,0);
	m_flags		varchar2(12);
	m_hakan		number(8,0);
begin

	/* created by show_hakan.sql	*/
	select
		obj#, 
/*
		case 
			when	(spare1 > 5 * power(2,15)) 
				then	(spare1 - 5 * power(2,15))
			when	(spare1 > power(2,17)) 
				then	(spare1 - power(2,17))
			when	(spare1 > power(2,15)) 
				then	(spare1 - power(2,15))
				else	spare1
		end 				hakan
*/
	to_char(
		bitand(
			spare1, to_number('ffff8000','xxxxxxxx')
		),
		'xxxxxxxx'
	)					flags,
	bitand(spare1, 32767)			hakan	-- 0x7fff	
	into
		m_obj,
		m_flags,
		m_hakan
	from
		tab$
	where	obj# in (
			select	object_id 
			from	dba_objects
			where	object_name = upper(i_table)
			and	object_type = 'TABLE'
			and	owner = upper(i_owner)
		)
	;

	dbms_output.put_line(
		'Hakan factor for object ' || 
		m_obj   || ' (' ||
		i_owner || '.' || 
		i_table || ') is ' || 
		m_hakan || ' with flags ' ||
		m_flags
	);
end;
/

drop public synonym show_hakan;
create public synonym show_hakan for show_hakan;
grant execute on show_hakan to public;

 

You’ll notice that I’ve done an “upper()” on the table and owner – that means you’re in trouble if you have created an schemas or tables with mixed-case names (but you wouldn’t do that in a production system, would you?)

Update – A little bug history

One of the odd details of the Hakan factor is that the value it shows is one less than the number of rows that will be stored in a block; and since it looks as if the factor is not allowed to drop to zero, you can’t hack the Hakan factor to force one row per block.

So here’s a (trivial and sub-optimal) piece of code to check current number of rows per block in a simple heap table (assuming the tablespace consists of a single file):

select 
	ct, count(*) 
from	(
		select	dbms_rowid.rowid_block_number(rowid), count(*) ct  
		from	t1
		group by
			dbms_rowid.rowid_block_number(rowid)
	)
group by ct
order by ct
;

Here’s the output of a session, running under 9.2.0.8, cut and pasted from the screen:

SQL> @afiedt.buf

        CT   COUNT(*)
---------- ----------
         9          1
        16          1
SQL> alter table t1 nominimize records_per_block;
SQL> alter table t1 minimize records_per_block;
SQL> execute show_hakan('t1')
Hakan factor for object 48865 (TEST_USER.t1) is: 15
SQL> alter table t1 move;
SQL> @afiedt.buf

        CT   COUNT(*)
---------- ----------
        10          1
        15          1
SQL> alter table t1 nominimize records_per_block;
SQL> alter table t1 minimize records_per_block;
SQL> execute show_hakan('t1')
Hakan factor for object 48865 (TEST_USER.t1) is: 14

Every time you moved the table, 9.2.0.8 (and earlier) used the actual stored value of the Hakan Factor to rebuild the table; but if you regenerated the Hakan Factor the stored value was one less than the actual row count. So if you kept repeating the process the number of rows per block would decrease by one each time and the table would get bigger and bigger.

It’s a silly example – but the real-world relevance was that a direct path insert behaved differently from a normal insert and this could result in a significant amount of wasted space if you were doing bulk loads in your overnight batch; so the code changed in 10g to make the normal and direct path inserts consistent with each other, but the change went the wrong way and, as a side effect, you get one more row per block than suggested by the Hakan Factor – and you can’t trick the Hakan factor into enforcing one row per block any more.

May 9, 2013

Clustering_factor

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 8:14 am UTC May 9,2013

Cost Based Oracle – Fundamentals (November 2005)

But the most interesting function for our purposes is sys_op_countchg(). Judging from its name, this function is probably counting changes, and the first input parameter is the block ID portion (object_id, relative file number, and block number) of the table’s rowid, so the function is clearly matching our notional description of how the clustering_factor is calculated. But what is that 1 we see as the second parameter?

When I first understood how the clustering_factor was defined, I soon realized that its biggest flaw was that Oracle wasn’t remembering recent history as it walked the index; it only remembered the previous table block so that it could check whether the latest row was in the same table block as last time or in a new table block. So when I saw this function, my first guess (or hope) was that the second parameter was a method of telling Oracle to remember a list of previous block visits as it walked the index.

And finally, Oracle Corp. had implemented an official interface to the second parameter of sys_op_countchg() – provided you install the right patch – through a new table (or schema, or database) preference type available to the dbms_stats.set_table_prefs() procedure.

I’ve been meaning to write this post for two or three months, ever since Sean Molloy sent me an email about short blog note from Martin Decker describing Bug 13262857  Enh: provide some control over DBMS_STATS index clustering factor computation. Unfortunately I’ve not yet had time to investigate the patch, but I don’t think I need to any more because Richard Foote has written it up in his latest blog post.

Read Richard’s post – it’s important.

Update 10th May

Richard’s post has, unsurprisingly, produced a buzz of excitement in his reader – and started up the discussion of how best to use this capability; so here’s another quote from the book (p.111 – available in the download of chapter 5):

So using Oracle’s own function for calculating the clustering_factor, but substituting the freelists value for the table, may be a valid method for correcting some errors in the clustering_factor for indexes on strongly sequenced data. (The same strategy applies if you use multiple freelist groups—but multiply freelists by freelist groups to set the second parameter.)

Can a similar strategy be used to find a modified clustering_factor in other circumstances? I think the answer is a cautious “yes” for tables that are in ASSM tablespaces. Remember that Oracle currently allocates and formats 16 new blocks at a time when using automatic segment space management (even when the extent sizes are very large, apparently). This means that new data will be roughly scattered across groups of 16 blocks, rather than being tightly packed.

Calling Oracle’s sys_op_countchg() function with a parameter of 16 could be enough to produce a reasonable clustering_factor where Oracle currently produces a meaningless one. The value 16 should, however, be used as an upper bound. If your real degree of concurrency is typically less than 16, then your actual degree of concurrency would probably be more appropriate.

Whatever you do when experimenting with this function—don’t simply apply it across the board to all indexes, or even all indexes on a particular table. There will probably be just a handful of critical indexes where it is a good way of telling Oracle a little more of the truth about your system—in other cases you will simply be confusing the issue.

Note particularly the comments about how the best value depends on the data in the indexed columns, the table configuration, and the degree of concurrency - you don’t necessarily want to use the same value for every index on a given table. That’s a shame, since Oracle has defined the interface as a TABLE preference, so if you set it then you get the same for every index. Despite this, if you’re prepared to put in a little control work, it does mean that you can use an official Oracle mechanism to play the game I was suggesting in the book – for each “special” index, set the preference, collect the stats, then clear the preference.

May 5, 2013

SLOB

Filed under: Oracle,Performance — Jonathan Lewis @ 6:21 pm UTC May 5,2013

Anyone who has used Kevin Closson’s “Silly Little Oracle Benchmark” will want to know about his significantly updated SLOB2.

May 2, 2013

v$lock

Filed under: Locks,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 7:03 pm UTC May 2,2013

The problem of slow queries on v$lock just came up again on the OTN database forum, so I thought I’d better push out a post that’s been hanging around on my blog for the last few months. This is actually mentioned in MOS in note 1328789.1: “Query Against v$lock Run from OEM Performs Slowly” which points out that it is basically a problem of bad statistics and all you have to do is collect the stats.

However, because the view is a messy union and join of several dynamic performance views sitting on top of a load of x$ structures, the advisory from MOS is to call the procedure  dbms_stats.gather_fixed_objects_stats. This is not a nice thing to do on a busy production system, especially if it has a large number of users and a large shared pool – and doing it at a representative run-time is important if you’re going to do it at all.

There is an alternative. Here’s the basic execution plan (on my 11.2.0.2) for “select * from v$lock”:

-----------------------------------------------------
| Id  | Operation              | Name       | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT       |            |     1 |
|   1 |  HASH JOIN             |            |     1 |
|   2 |   VIEW                 | GV$_LOCK   |    10 |
|   3 |    UNION-ALL           |            |       |
|   4 |     FILTER             |            |       |
|   5 |      VIEW              | GV$_LOCK1  |     2 |
|   6 |       UNION-ALL        |            |       |
|   7 |        FIXED TABLE FULL| X$KDNSSF   |     1 |
|   8 |        FIXED TABLE FULL| X$KSQEQ    |     1 |
|   9 |     FIXED TABLE FULL   | X$KTADM    |     1 |
|  10 |     FIXED TABLE FULL   | X$KTATRFIL |     1 |
|  11 |     FIXED TABLE FULL   | X$KTATRFSL |     1 |
|  12 |     FIXED TABLE FULL   | X$KTATL    |     1 |
|  13 |     FIXED TABLE FULL   | X$KTSTUSC  |     1 |
|  14 |     FIXED TABLE FULL   | X$KTSTUSS  |     1 |
|  15 |     FIXED TABLE FULL   | X$KTSTUSG  |     1 |
|  16 |     FIXED TABLE FULL   | X$KTCXB    |     1 |
|  17 |   MERGE JOIN CARTESIAN |            |   100 |
|  18 |    FIXED TABLE FULL    | X$KSUSE    |     1 |
|  19 |    BUFFER SORT         |            |   100 |
|  20 |     FIXED TABLE FULL   | X$KSQRS    |   100 |
-----------------------------------------------------

Note, particularly, the Cartesian merge join at line 17, which assumes there will be one row from v$session (x$ksuse) joined to 100 rows from v$resource (x$ksqrs – the structure used to represent any resources that you want to lock). The big UNION ALL is then all the different types of locks (enqueues) that you might attach to a resource. The estimates relating to these two structures are the most significant problem – v$session always has FAR more than one row in it, and v$resource isn’t small: in my little system the Cartesian join produced about 325,000 rows, and that was just after starting up the database.

Having identified a couple of critical tables, I decided to see what would happen if I just collected stats on these two objects rather than doing the whole system, and the following little piece of pl/sql did what I wanted:

begin
        dbms_stats.gather_table_stats('SYS','x$ksuse',method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('SYS','x$ksqrs',method_opt=>'for all columns size 1');
end;
/

The effect of the stats was to change the plan to the following which, while it might not be the absolute best in all cases, is certainly better than the previous one (it’s possible that you may also find that it helps to collect stats on x$ksqeq (the “generic enqueues” structure) which is likely to be the next most significant in terms of number of rows):

------------------------------------------------------
| Id  | Operation               | Name       | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT        |            |   133 |
|   1 |  HASH JOIN              |            |   133 |
|   2 |   HASH JOIN             |            |    10 |
|   3 |    VIEW                 | GV$_LOCK   |    10 |
|   4 |     UNION-ALL           |            |       |
|   5 |      FILTER             |            |       |
|   6 |       VIEW              | GV$_LOCK1  |     2 |
|   7 |        UNION-ALL        |            |       |
|   8 |         FIXED TABLE FULL| X$KDNSSF   |     1 |
|   9 |         FIXED TABLE FULL| X$KSQEQ    |     1 |
|  10 |      FIXED TABLE FULL   | X$KTADM    |     1 |
|  11 |      FIXED TABLE FULL   | X$KTATRFIL |     1 |
|  12 |      FIXED TABLE FULL   | X$KTATRFSL |     1 |
|  13 |      FIXED TABLE FULL   | X$KTATL    |     1 |
|  14 |      FIXED TABLE FULL   | X$KTSTUSC  |     1 |
|  15 |      FIXED TABLE FULL   | X$KTSTUSS  |     1 |
|  16 |      FIXED TABLE FULL   | X$KTSTUSG  |     1 |
|  17 |      FIXED TABLE FULL   | X$KTCXB    |     1 |
|  18 |    FIXED TABLE FULL     | X$KSUSE    |   252 |
|  19 |   FIXED TABLE FULL      | X$KSQRS    |  1328 |
------------------------------------------------------

Thanks to Timur Akhmadeev who recently published a note pointing out that you could collect stats on individual X$ tables. Do make sure you test this on your specific version of Oracle, though, and don’t use the production system as your first test case.

April 29, 2013

MV Refresh

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 2:13 pm UTC Apr 29,2013

Materialized views open up all sorts of possibilities for making reporting more efficient – but at the same time they can introduce some “interesting” side effects when you start seeing refreshes taking place. (Possibly one of the most dramatic surprises appeared in the upgrade that switched many refreshes into “atomic” mode, changing a “truncate / append” cycle into a massively expensive “delete / insert” cycle).

If you want to have some ideas of the type of work that is involved in the materialized view “fast refresh”, you could look at a recent pair of articles by Alberto Dell’Era on (very specifically) outer join materialized views (which a link back to a much older article on inner join materialized view refresh):

 

 

April 16, 2013

systimestamp

Filed under: Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 12:46 am UTC Apr 16,2013

For your entertainment – there’s nothing up my sleeves, this was a simple cut-n-paste after real-time typing with no tricks:

20:39:51 SQL> create table t1 (t1 timestamp);

Table created.

20:39:55 SQL> insert into t1 values(systimestamp);

1 row created.

20:39:59 SQL> select t1 - systimestamp  from t1;

T1-SYSTIMESTAMP
---------------------------------------------------------------------------
+000000000 04:59:50.680620

1 row selected.

20:40:08 SQL>

My laptop runs Oracle so quickly that it took only 4 seconds for 5 hours to elapse !
11.2.0.3 on 64-bit Linux – the client is running with TZ=EST5EDT, while the server is running UK Time (currently BST (GMT+1))

Comments available on MOS: 340512.1 Timestamps & time zones – Frequently Asked Questions
Another MOS note, thanks to Jure Bratina in the comments: 227334.1 – “Dates & Calendars – Frequently Asked Questions” in the question

Update:

As Niall quotes in the comments: “times are difficult”.
Oracle supplies three timestamps: systimestamp, localtimestamp, and current_timestamp. (For reasons of consistency, only one of uses an underscore ;) )
Oracle also supplies three timestamp types: timestamp, timestamp with time zone, and timestamp with local time zone.
Oracle also supplies two timezone calls: dbtimezone, and sessiontimezone

If you need to figure out all the details of how these things hang together, I think you need to set your machine timezone to something that isn’t UTC (or GMT as I still tend to call it), then use two separate machines as clients, with their timezones set to two other timezones (again avoiding UTC).

I’ve done a few experiments but without being so rigorous in my settings – my machine was running on GMT, but I opened a (UNIX) session and set the session time zone to EST5EDT to start the database, while running other (UNIX) session with different TZ settings. The reason I should have restarted the machine in a different timezone is that Oracle “normalises” some timestamps to UTC – which means there are cases when I can’t be certain whether the stored value is in UTC because it has been normalised or because it simply was the actual machine time.

So here’s a little experiment (11.2.0.2, instance started in EST5EDT, unix session running in UTC, connecting across the network to the server).

select
        current_timestamp,
        localtimestamp,
        systimestamp
from
        dual
;

CURRENT_TIMESTAMP
---------------------------------------------------------------------------
LOCALTIMESTAMP
---------------------------------------------------------------------------
SYSTIMESTAMP
---------------------------------------------------------------------------
17-APR-13 11.37.10.870658 AM +01:00
17-APR-13 11.37.10.870658 AM
17-APR-13 06.37.10.870554 AM -04:00

Notes:
systimestamp reflects the instance timestamp – which is 5 hours earlier than the session timestamp.
systimestamp returns a timestamp with time zone, not just a timestamp
localtimestamp and current_timestamp show the client time, but localtimestamp doesn’t show the timezone, current_timestamp does (the +1:00 appears because Daylight Saving Time (British Summer Time) is active so my session is one hour ahead of UTC, while the database is 4 hours behind.)

Another quick test:

create table t1 (
        t0              timestamp,
        tz              timestamp with time zone,
        tl              timestamp with local time zone,
        ts_type         varchar2(20)
)
;

insert into t1 values(
        systimestamp, systimestamp, systimestamp,
        'sys Timestamp'
);

commit;

select * from t1;

T0
---------------------------------------------------------------------------
TZ
---------------------------------------------------------------------------
TL                                                                          TS_TYPE
--------------------------------------------------------------------------- --------------------
17-APR-13 06.44.04.353489 AM
17-APR-13 06.44.04.353489 AM -04:00
17-APR-13 11.44.04.353489 AM                                                sys Timestamp

select
        dump(t0,16),
        dump(tz,16),
        dump(tl,16),
        ts_type
from
        t1
;
DUMP(T0,16)
------------------------------------------------------------------------------------------------------------------------
DUMP(TZ,16)
------------------------------------------------------------------------------------------------------------------------
DUMP(TL,16)
------------------------------------------------------------------------------------------------------------------------
TS_TYPE
--------------------
Typ=180 Len=11: 78,71,4,11,7,2d,5,15,11,d0,68
Typ=181 Len=13: 78,71,4,11,b,2d,5,15,11,d0,68,10,3c
Typ=231 Len=11: 78,71,4,11,b,2d,5,15,11,d0,68
sys Timestamp

Notes:
T0 – the timestamp column, has the instance timestamp in it – but doesn’t have any timezone information stored; the raw dump show the value 6:44:04 (7, 2d, 5 – convert from hex and substract one). Anyone on ANY timezone will see their output showing 6:44:04 if they select this column.

TZ – the timestamp with time zone column, has the instance timestamp, but has stored it as (b, 2d,5 – 11:44:04) with time zone information (10,3c) that allows the session to know what “global” moment the information really represents and the location (or, rather, time zone) where is was entered.

TL – the timestamp with local time zone, has the instance timestamp, but has stored it as (b, 2d, 5 – 11:44:04) with NO timezone information. So the output when you query this column is adjusted to suit the local timestamp. It’s the right “global” moment, and it displays as the relevant local time. But, as a penalty, it’s lost the information about where (in which time zone) it was entered.

I think that examination of the content of the raw dumps of the three different types may help you understand why you need to store timestamps in a column type that includes a time zone – if you don’t then you lose some information, and time-based arithmetic will give you some surprises if your application crosses timezones.

Next Issue:

Indexing time (though the link in the comments below to Tony Hasler’s blog probably gives you all the answers you need), and an Oracle design error that I’ve visited before.

April 13, 2013

Deadlocks

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 11:36 am UTC Apr 13,2013

Here’s a deadlock graph that might cause a little confusion:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
 Resource Name          process session holds waits  process session holds waits
 TX-001a0015-00014787        34      90     X             32       3           S
 TX-00190008-0000601b        32       3     X             34      90           S

 session 90: DID 0001-0022-00000327      session 3: DID 0001-0020-000009E9
 session 3: DID 0001-0020-000009E9       session 90: DID 0001-0022-00000327

 Rows waited on:
  Session 90: obj - rowid = 00030CE5 - AAAwzlAAGAABDiuAAA
  (dictionary objn - 199909, file - 6, block - 276654, slot - 0)
  Session 3: obj - rowid = 00030CE5 - AAAwzlAAGAABp8gAAA
  (dictionary objn - 199909, file - 6, block - 433952, slot - 0)

Both sessions are holding X and waiting S on a TX lock. There are several well-known reasons why you might see a share (mode 4) lock on a transaction table slot: collisions on bitmap indexes, unique indexes, index organized tables, or referential integrity checks are the commonest “external” examples, and problems with interested transaction lists (ITLs) or freelists are the common “internal” ones – so the presence of the waits in share mode shouldn’t, of themselves, a source of confusion.

The confusion is in the reported rowids. If you try to interpret them as real rowids you may be unlucky and discover that they seem to be related to the SQL reported for the deadlocked sessions when really the information they hold is garbage. (Just to avoid confusion, let me make it clear that there are cases where the rowids reported definitely WILL be garbage; on the other hand, there may be some scenarios where the rowids are relevant – although I haven’t done any exhaustive check to see if there really are such scenarios.)

When I see a deadlock graph on transaction locks and the waits are for S mode I tend to assume that the information about the rows waited on is probably misleading; when the slot number for the rowid is zero this increases my confidence that the rowid is rubbish. (Zero is a legal value for a rowid slot, of course, so a zero doesn’t prove that the rowid is rubbish, it’s just a coincidence that allows me to continue following a hypothesis.)

The problem is that Oracle doesn’t waste resources tidying up after itself, and in the case of deadlock graphs this laziness shows up in the rowids reported. The trace is simply reporting whatever happens to be in the row_wait_obj#, row_wait_file#, row_wait_block# and row_wait_row# columns of v$session; and if the waiting process hasn’t updated these columns with current row information you could be looking at the details of the last row (or block) that the session waited for. Here’s the description of a test to demonstrate the behaviour:

create table t1 (n1 number, n2 number);
insert into t1 values(1,1);
create unique index t1_i1 on t1(n1);
create unique index t1_i2 on t1(n2);

session 1:	insert into t1 values(2,11);
session 2:	insert into t1 values(3,21);
session 1:	insert into t1 values(4,21);
session 2:	insert into t1 values(5,11);

With this table, and sequence of events following it, session 1 raised a deadlock error, and dumped the following trace:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-000a0021-0000c4aa        16      93     X             13      90           S
TX-00020020-0000fdb9        13      90     X             16      93           S
session 93: DID 0001-0010-00000057	session 90: DID 0001-000D-000000F2
session 90: DID 0001-000D-000000F2	session 93: DID 0001-0010-00000057
Rows waited on:
Session 90: obj - rowid = 00000009 - AAAAAJAABAAAQJcAAA
  (dictionary objn - 9, file - 1, block - 66140, slot - 0)
Session 93: obj - rowid = 0002E7DC - AAAufaAAFAAAAAJAAA
  (dictionary objn - 190428, file - 5, block - 9, slot - 0)

Object 9 is the I_FILE#_BLOCK# index in the data dictionary – and session 90 is definitely not doing anything with that object in this transaction.
Object 190428 is another table in the test schema, but session 93 didn’t access it in this transaction, and the block referenced is the segment header block, not a block that could hold a row.

In fact, just before I started the sequence of inserts I ran this query from a third session (connected as sys) with the following results:

select
	sid,
	row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row#
from
	v$session
where
	sid in (90,93)
;

       SID ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW#
---------- ------------- -------------- --------------- -------------
        90             9              1           66140             0
        93        190428              5               9             0

The “rows” reported for the deadlock simply echoed the values that were already in the row_wait columns before the test started. (The sessions were waiting on “SQL*Net message from client” at the time.)

Footnote:

Session 90 created the tables and indexes, that’s probably why it happened to have an outstanding reference to the i_file#_block# index.
Session 93 had just run a script to drop all the objects in the schema, which may explain why it happened to have an outstanding reference to a segment header block
If the row_wait_obj# had been set to -1 for either session then the deadlock graph would have reported “No row” for that session.

April 7, 2013

DML Tracking

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 10:04 pm UTC Apr 7,2013

You’ve probably seen questions on the internet occasionally about finding out how frequently an object has been modified. The question is a little ambiguous – does it mean how much change has occurred, or how many DML statements have been executed; either may be an interesting measure. Of course, Oracle gave us a method of answering the first question a long time ago: v$segstat (or v$segment_statistics if you don’t mind doing the join) and the resulting content in the AWR or Statspack reports:

Segments by DB Blocks Changes   DB/Inst: XXXXXXXX/XXXXXXXX  Snaps: 85563-85564
-> % of Capture shows % of DB Block Changes for each top segment compared
-> with total DB Block Changes for all segments captured by the Snapshot

           Tablespace                      Subobject  Obj.      DB Block    % of
Owner         Name    Object Name            Name     Type       Changes Capture
---------- ---------- -------------------- ---------- ----- ------------ -------
XXXXXXXXX  DATA       MLOG$_xxxxxxxxxxxxxx            TABLE    1,144,112   22.16
XXXXXXXXX  DATA       PK_xxxxxxxxxxxxxxxxx            INDEX      614,256   11.90
XXXXXXXXX  DATA       xxxxxxxxxxxxxxx                 TABLE      490,080    9.49
XXXXXXXXX  IDX        IDX_xxxxxxxxxxxxxxxx            INDEX      353,392    6.84
XXXXXXXXX  IDX        PK_yyyyyyyyyyyyyyy              INDEX      273,664    5.30
          -------------------------------------------------------------

There’s still a little ambiguity in this report (in the right circumstances you can change multiple rows in a block while only reporting a single block change), but it’s a very good indicator of the volume of change to an object; but is the million or so block changes due to one SQL statement or 1,000,000 SQL statements ?

With 11.2.0.2 there is an (undocumented) option for finding the answer to that question. Since it’s undocumented I wouldn’t use it on production unless I was really desperate, and even then I’d do the standard “check with Oracle support” first; however I might use it on a test system if I were do something like running tests of a new version of an overnight suite of batch jobs.

You can enable “DML frequency tracking” which seems to count the number of DML statements that modify an object or, to be more accurate, a table. It doesn’t do anything to identify indexes, although it will report index organized tables (using the table name); it won’t identify partition-level DML individually for partitioned tables, it will simply record the DML as being relevant to the table.

To enable the feature you modify hidden parameter _dml_frequency_tracking and then query dynamic performance view v$object_dml_frequencies.


alter system set "_dml_frequency_tracking"=true;

select 
	object, working, slot0, slot1, slot2, slot3, slot4 
from	
	v$object_dml_frequencies
order by
	working + slot0 + slot1 + slot2 + slot3
;

The view has 25 columns for numbered “slots”, and a “working” column. Each slot represents a 15 minute interval (set by parameter _dml_frequency_tracking_slot_time) and only 4 slots are maintained by default (set by parameter _dml_frequency_tracking_slots). The values roll from slot to slot every 15 minutes, and the “working” column keeps track of objects that have suffered DML since the last roll-over. Unfortunately there doesn’t seem to be any obvious way of tracking when a roll-over takes place, so you can’t tell how long the “working” column has been active for.

If you have the default 4 slots set, then “working” and slots 0 to 2 are populated with 15 minutes worth of information each, and slot3 seems to be the accumulated history of all accesses since you enabled the feature.

The view sits on top of x$ksxm_dft, but there are no interesting extra columns in the x$ that aren’t displayed in the v$ and gv$. (It’s an interesting little quirk that the v$ doesn’t following the usual pattern of being a view of the gv$ restricted to the current instance.) You can see the information in the shared pool (by querying v$sgastat) in an area named “dml frequency” – although the figure reported the last time I checked was wrong by a factor of roughly 3 – show about 8KB when I was expecting roughly 24KB.

I haven’t spent a lot of time investigating the feature yet, so there are probably a few questions and boundary conditions to test before using it in anger; but there don’t appear to be any special latches associated with the counters (so possibly the results would be under-count in a highly concurrent system, but maybe the results are collected under a mutex).

Related parameters

_dml_frequency_tracking (false) Control DML frequency tracking — can be set at system level on demand
_dml_frequency_tracking_advance (true) Control automatic advance and broadcast of DML frequencies — ?
_dml_frequency_tracking_slot_time (15) Time length of each slot for DML frequency tracking — needs restart
_dml_frequency_tracking_slots ( 4) Number of slots to use for DML frequency tracking — needs restart

April 4, 2013

Delphix Overview

Filed under: Uncategorized — Jonathan Lewis @ 9:04 pm UTC Apr 4,2013

Update: Here’s the link to the recording of the webinar

I’ll b online tomorrow morning (Friday 5th, 9:00 Pacific time, 5:00 pm UK) in a webinar with Kyle Hailey to talk about my first impressions of Delphix, so I thought I’d write up a few notes beforehand.

I’ve actually installed a complete working environment on my laptop to model a production setup. This means I’ve got three virtual machines running under VMWare: my “production” machine (running Oracle 11.2.0.2 on OEL 5, 64-bit), a “development” machine (which has the 11.2.0.2 software installed, again on OEL 5, 64-bit), and a machine which I specified as Open Solaris 10, 64-bit for the Delphix server VM (pre-release bloggers’ version). The two Linux servers are running with 2.5GB of RAM, the Delphix server is running with 8GB RAM, and all three machines are running 2 virtual CPUs. (My laptop has an Intel quad core i7, running two threads per CPU, 16GB RAM, and 2 drives of 500GB each.) The Linux machines were simply clones of another virtual machine I previously prepared and the purpose of the exercise was simply to see how easy it would be to “wheel in” a Delphix server and stick it in the middle. The answer is: “pretty simple”. (At some stage I’ll be writing up a few notes about some of the experiments I’ve done on that setup.)

To get things working I had to create a couple of UNIX accounts for a “delphix” user on the Linux machines, install some software, give a few O/S privileges to the user (mainly to allow it to read and write a couple of Oracle directories), and a few Oracle privileges. The required Oracle privileges vary slightly with the version of Oracle and your prefered method of operation, but basically the delphix user needs to be able to run rman, execute a couple of Oracle packages, and query some of the dynamic performance views. I didn’t have any difficulty with the setup, and didn’t see any threats in the privileges that I had to give to the delphix user. The last step was simply to configure the Delphix server to give it some information about the Linux machines and accounts that it was going to have access to.

The key features about the Delphix server are that it uses a custom file system (DxFS, which is based on ZFS with a number of extensions and enhancements) and it exposes files to client machines through NFS; and there are two major components to the software that make the whole Delphix package very clever.

Oracle-related mechanisms

At the Oracle level, the Delphix server sends calls to the production database server to take rman backups (initially a full backup, then incremental backups “from SCN”); between backup requests it also pulls the archived redo logs from the production server – or can even be configured to copy the latest entries from the online redo logs a few seconds after they’ve been written (which is one of the reasons for requiring privileges to query some of the dynamic performance views, but the feature does depend on the Oracle version).

If you want to make a copy of the database available, you can use the GUI interface on the Delphix server to pick a target machine, invent a SID, and Service name, and pick an SCN (or approximate timetamp) that you want to database to start from, and within a few minutes the Delphix server will have combined all the necessary backup pieces, applied any relevant redo, and configured your target machine to start up an instance that can use the (NFS-mounted) database that now exists on the Delphix server. I’ll explain in a little while why this is a lot cleverer than a simple rman “restore and recover”.

DxFS

Supporting the Oracle-related features, the other key component of the Delphix server is the Delphix file-system (DxFS). I wrote a little note a few days ago to describe how Oracle can handle “partial” updates to LOB values – the LOB exists in chunks with an index on (lob_id, chunk_number) that allows you to pick the right chunks in order. When you update a chunk in the LOB Oracle doesn’t really update the chunk, it creates a new chunk and modifies the index to point at it. If another session has a query running that should see the old chunk, though, Oracle can read the index “as at SCN” (i.e. it creates a read consistent copy of the required index blocks) and the read-consistent index will automatically be pointing at the correct version of the LOB chunk. DxFS does the same sort of thing – when a user “modifies” a file system block DxFS doesn’t overwrite the original copy, it writes a new copy to wherever there’s some free space and maintains some “indexing” metadata that tells it where all the pieces are. But if you never tell the file system to release the old block you can ask to see the file as at a previous point in time at no extra cost!

But DxFs is even cleverer than that because (in a strange imitation of the “many worlds” interpretation of quantum theory) a single file can have many different futures. Different users can be identified as working in different “contexts” and the context is part of the metadata describing the location of blocks that belong to the file. Imagine we have a file with 10 blocks sitting on DxFs - in your context you modify blocks 1,2 and 3 but at the same time I modify blocks 1,2 and 3 in my context. Under DxFS there are now 16 blocks associated with that file – the original 10, your three modified blocks and my three modified blocks and, depending on timestamp and context, someone else could ask to see any one of three different versions of that file – the original version, your version, or my version.

Now think of that in an Oracle context. If we copy an entire set of database files onto DxFS, then NFS-mount the files on a machine with Oracle installed, we can configure and start up an instance to use those files. At the same time we could NFS-mount the files on another machine, configuring and starting another instance to use the same data files at the same time! Any blocks changed by the first instance would be written to disc as private copies, any blocks changed by the second instance would be written to discs as private copies – if both instances managed to change 1% of the data in the course of the day then DxFs would end up holding 102% of the starting volume of data: the original datafiles plus the two sets changed blocks – but each instance would think it was the sole user of its version of the files.

There’s another nice (database-oriented) feature to Delphix, though. The file system has built-in compression that operates at the “block” level. You can specify what you mean by the block size (and for many Oracle sites that would be 8KB) and the file system would transparently apply a data compression algorithm on that block boundary. So when the database writer writes an 8KB block to disc, the actual disc space used might be significantly less than 8KB, perhaps by a factor of 2 to 3. So in my previous example, not only could you get two test databases for the space of 1 and a bit – you might get two test databases for the space of 40% or less of the original database.

Delphix vs. rman

I suggested earlier on that Delphix can be a lot clever than an rman restore and recover. If you take a full backup to Delphix on Sunday, and a daily incremental backup (let’s preted that’s 1% of the database per day) for the week, then Delphix can superimpose each incremental onto the full backup as it arrives. So on Monday we construct the equivalent of a full Monday backup, on Tuesday we construct the equivalent of a full Tuesday backup, and so on. But since DxFS keeps all the old copies of blocks this means two things that we can point an instance at a full backup for ANY day of the week simply by passing a suitable “timestamp” to DxFs – and we’ve 7 full backups for the space of 107% of a single full backup.

There are lots more to say, but I think they will have to wait for tomorrow’s conversation with Kyle, and for a couple more articles.

Register of Interests / Disclosure

Delphix Corp. paid my consultancy rates and expenses for a visit to the office in Menlo Park to review their product.

March 31, 2013

Index Selectivity

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 6:42 pm UTC Mar 31,2013

Here’s a summary of a recent posting on OTN:

I have two indexes (REFNO, REFTYPESEQNO) and (REFNO,TMSTAMP,REFTYPESEQNO). When I run the following query the optimizer uses the second index rather than the first index – which is an exact match for the predicates, unless I hint it otherwise:

select *
from   RefTable
where  RefTypeSeqNo = :1
and    RefNo = :2;

Default plan:
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     3 |   126 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REFTABLE      |     3 |   126 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | REFTABLE_CX03 |     3 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))
       filter("REFTYPESEQNO"=TO_NUMBER(:1))

Hinted plan:
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     3 |   126 |    15   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REFTABLE      |     3 |   126 |    15   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | REFTABLE_CX02 |    14 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))

This is actually an example of a feature of the optimizer that I described a few years ago. The original note described a change as you moved from 10.1 to 10.2 and on to 11.1 – but once you’ve seen the basic issue there are a number of variations on how it might appear. In this case the OP seems to be using 10gR2, where the distinct_keys value from an index is used to calculate the cost and row estimate for the index and for the table access cost when that specific indexed access path is being considered.

So with the hint to use the accurate index we see an index cardinality estimate of 14 rows with a table cardinality of 3 despite the fact that the plan shows no extra predicates applied at the table; the cost of accessing the table is also clearly related to the cardinality estimate on the index line.

In the default plan when the wrong index is used, the optimizer doesn’t pay any attention to the distinct_keys from the other index, and simply uses the standard “product of column selectivities”.

11g introduces two changes – when calculating the table cardinality the distinct_keys value for the index is carried forward (so the plan with the high index cardinality but low table cardinality would report the same cardinality for both operations), and the distinct_keys from the first index would be used when doing the calculations for the second index – which would increase the cost of using the wrong index.

There’s really very little you can do to find a strategic fix for this type of problem in 10g – obviously you could add hints whenever Oracle used the wrong index, but that’s not reallya desirable approach, and it is possible to adjust column statistics in such a way that the calculations the optimizer uses give better approximations, but that’s not always very easy to do well. Ultimately you just have to be very careful about your choice of indexes – and when you think that two indexes show a significant overlap in columns consider the possibility that one carefully defined index may be able to do the job of both.

 

 

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,392 other followers