Oracle Scratchpad

May 10, 2013

Hakan Factor

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 3:52 pm UTC May 10,2013

Here’s a quick and dirty script to create a procedure (in the SYS schema – so be careful) to check the Hakan Factor for an object. If you’re not familiar with the Hakan Factor, it’s the value that gets set when you use the command “alter table minimize records_per_block;”.

I was prompted to publish this note by an item on the OTN SQL forum describing a problem with partition exchange with a table when there were bitmap indexes in place and the table had been changed to have some extra columns added. (Problem as yet unresolved as I publish).

If you start playing with the Hakan Factor, you’ll find that there are some odd little bugs in what gets stored and how it gets used. (SQL updated to use bitand() to reflect comments below and Karsten Spang’s blog note; also edited following a comment on OTN to show the rest of the spare1 flag bits)


create or replace procedure show_hakan(
	i_table		in	varchar2,
	i_owner		in	varchar2 default user
) as
	m_obj		number(8,0);
	m_flags		varchar2(12);
	m_hakan		number(8,0);
begin

	/* created by show_hakan.sql	*/
	select
		obj#, 
/*
		case 
			when	(spare1 > 5 * power(2,15)) 
				then	(spare1 - 5 * power(2,15))
			when	(spare1 > power(2,17)) 
				then	(spare1 - power(2,17))
			when	(spare1 > power(2,15)) 
				then	(spare1 - power(2,15))
				else	spare1
		end 				hakan
*/
	to_char(
		bitand(
			spare1, to_number('ffff8000','xxxxxxxx')
		),
		'xxxxxxxx'
	)					flags,
	bitand(spare1, 32767)			hakan	-- 0x7fff	
	into
		m_obj,
		m_flags,
		m_hakan
	from
		tab$
	where	obj# in (
			select	object_id 
			from	dba_objects
			where	object_name = upper(i_table)
			and	object_type = 'TABLE'
			and	owner = upper(i_owner)
		)
	;

	dbms_output.put_line(
		'Hakan factor for object ' || 
		m_obj   || ' (' ||
		i_owner || '.' || 
		i_table || ') is ' || 
		m_hakan || ' with flags ' ||
		m_flags
	);
end;
/

drop public synonym show_hakan;
create public synonym show_hakan for show_hakan;
grant execute on show_hakan to public;

 

You’ll notice that I’ve done an “upper()” on the table and owner – that means you’re in trouble if you have created an schemas or tables with mixed-case names (but you wouldn’t do that in a production system, would you?)

Update – A little bug history

One of the odd details of the Hakan factor is that the value it shows is one less than the number of rows that will be stored in a block; and since it looks as if the factor is not allowed to drop to zero, you can’t hack the Hakan factor to force one row per block.

So here’s a (trivial and sub-optimal) piece of code to check current number of rows per block in a simple heap table (assuming the tablespace consists of a single file):

select 
	ct, count(*) 
from	(
		select	dbms_rowid.rowid_block_number(rowid), count(*) ct  
		from	t1
		group by
			dbms_rowid.rowid_block_number(rowid)
	)
group by ct
order by ct
;

Here’s the output of a session, running under 9.2.0.8, cut and pasted from the screen:

SQL> @afiedt.buf

        CT   COUNT(*)
---------- ----------
         9          1
        16          1
SQL> alter table t1 nominimize records_per_block;
SQL> alter table t1 minimize records_per_block;
SQL> execute show_hakan('t1')
Hakan factor for object 48865 (TEST_USER.t1) is: 15
SQL> alter table t1 move;
SQL> @afiedt.buf

        CT   COUNT(*)
---------- ----------
        10          1
        15          1
SQL> alter table t1 nominimize records_per_block;
SQL> alter table t1 minimize records_per_block;
SQL> execute show_hakan('t1')
Hakan factor for object 48865 (TEST_USER.t1) is: 14

Every time you moved the table, 9.2.0.8 (and earlier) used the actual stored value of the Hakan Factor to rebuild the table; but if you regenerated the Hakan Factor the stored value was one less than the actual row count. So if you kept repeating the process the number of rows per block would decrease by one each time and the table would get bigger and bigger.

It’s a silly example – but the real-world relevance was that a direct path insert behaved differently from a normal insert and this could result in a significant amount of wasted space if you were doing bulk loads in your overnight batch; so the code changed in 10g to make the normal and direct path inserts consistent with each other, but the change went the wrong way and, as a side effect, you get one more row per block than suggested by the Hakan Factor – and you can’t trick the Hakan factor into enforcing one row per block any more.

April 29, 2013

MV Refresh

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 2:13 pm UTC Apr 29,2013

Materialized views open up all sorts of possibilities for making reporting more efficient – but at the same time they can introduce some “interesting” side effects when you start seeing refreshes taking place. (Possibly one of the most dramatic surprises appeared in the upgrade that switched many refreshes into “atomic” mode, changing a “truncate / append” cycle into a massively expensive “delete / insert” cycle).

If you want to have some ideas of the type of work that is involved in the materialized view “fast refresh”, you could look at a recent pair of articles by Alberto Dell’Era on (very specifically) outer join materialized views (which a link back to a much older article on inner join materialized view refresh):

 

 

March 27, 2013

Open Cursors

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:13 pm UTC Mar 27,2013

Here’s a little detail that appeared in 11gR2 that may help you answer questions about open cursors. Oracle has added a “cursor type” column to the view v$open_cursor, so you can now see which cursors have been held open because of the pl/sql cursor cache, which have been held by the session cursor cache, and various other reasons why Oracle may take a short-cut when you fire a piece of SQL at it.

The following is the output showing the state of a particular session just after it has started up in SQL*Plus and called a PL/SQL procedure to run a simple count:

select
        cursor_type, sql_text
from
        V$open_cursor
where
        sid = 17
order by
        cursor_type,
        sql_text
;

CURSOR_TYPE                      SQL_TEXT
-------------------------------- ------------------------------------------------------------
DICTIONARY LOOKUP CURSOR CACHED  BEGIN DBMS_APPLICATION_INFO.SET_MODULE(:1,NULL); END;
                                 BEGIN DBMS_OUTPUT.DISABLE; END;
                                 BEGIN DBMS_OUTPUT.ENABLE(1000000); END;
                                 BEGIN dbms_random.seed(0); END;
                                 SELECT ATTRIBUTE,SCOPE,NUMERIC_VALUE,CHAR_VALUE,DATE_VALUE F
                                 SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS WHERE   (UPPER('
                                 SELECT USER FROM DUAL
                                 select /*+ connect_by_filtering */ privilege#,level from sys
                                 select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('U
                                 select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECON
                                 select metadata from kopm$  where name='DB_FDO'
                                 select privilege# from sysauth$ where (grantee#=:1 or grante
                                 select to_char(sysdate,'hh24miss') time_now from dual
                                 select value$ from props$ where name = 'GLOBAL_DB_NAME'

OPEN                             BEGIN spin_1; END;
                                 table_1_ff_208_0_0_0

OPEN-RECURSIVE                   insert into sys.aud$( sessionid,entryid,statement,ntimestamp

PL/SQL CURSOR CACHED             SELECT COUNT(*) X FROM KILL_CPU CONNECT BY N > PRIOR N START

SESSION CURSOR CACHED            BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
                                 SELECT DECODE('A','A','1','2') FROM DUAL

Variations are left to the user.
There are a few other cursor types – here’s the list given in the 11.2 Server Reference manual under the definition of v$open_cursor:

  • BUNDLE DICTIONARY LOOKUP CACHED
  • CONSTRAINTS CURSOR CACHED
  • DICTIONARY LOOKUP CURSOR CACHED
  • OPEN
  • OPEN-PL/SQL
  • OPEN-RECURSIVE
  • PL/SQL CURSOR CACHED
  • REPLICATION TRIGGER CURSOR CACHED
  • SESSION CURSOR CACHED

It’s an interesting exercise to consider why there are so many types, and then create some tests to confirm or refute your hypothesis. I haven’t checked, but here are a few ideas:

  • replication trigger cursor cached: I don’t remember which version introduced the change, but once upon a time the triggers updating the materialized view logs were real after insert/update/delete triggers, but now they’re “pre-compiled” – so it’s not surprising they form a special case.
  • dictionary lookup cursor cached: are these, perhaps, the statements that are currently cached in the “_row_cache_cursors” cache for data dictionary access; the parameter was once set to 10, but currently defaults to 20.
  • bundle dictionary lookup cached: why would there be a special case of dictionary lookup ? perhaps this is the set of cursors needed to read the first few tables in the data dictionary that allow the optimizer to do its work (how do you optimize a query against tab$ if you need to query syn$, obj$ and tab$ to discover that tab$ is a table ?)
  • constraints cursor cached: probably something to do with the SQL (internal, or externalised) that Oracle has to run to check or implement details of referential integrity constraints.

Footnote (28th March):

By a strange coincidence a note came up on OTN today that pointed to a different version of the Oracle manual where the possible cursor types are listed under their “internal” names – but I’m not sure if there’s a version of Oracle where you’d see them looking like this:

  • CACHED
  • KNT CACHED
  • KQD BUNDLE CACHED
  • KQD CACHED
  • KXCC CACHED
  • PL/SQL
  • PL/SQL CACHED
  • SYSTEM

March 22, 2013

LOB Update

Filed under: Infrastructure,LOBs,Oracle — Jonathan Lewis @ 10:36 pm UTC Mar 22,2013

This note is about a feature of LOBs that I first desribed in “Practial Oracle 8i” but have yet to see used in real life. It’s a description of how efficient Oracle can be, which I’ll start with a description of, and selection from, a table:

create table test_lobs (
	id              number(5),
	bytes           number(38),
	text_content    clob
)
lob (text_content) store as text_lob(
	disable storage in row
	cache
)
;

-- insert a row

SQL> desc test_lobs
 Name                    Null?    Type
 ----------------------- -------- ----------------
 ID                               NUMBER(5)
 BYTES                            NUMBER(38)
 TEXT_CONTENT                     CLOB

SQL> select id, bytes, dbms_lob.getlength(text_content) from test_lobs;

        ID      BYTES DBMS_LOB.GETLENGTH(TEXT_CONTENT)
---------- ---------- --------------------------------
         1     365025                           365025

1 row selected.

I’ve got a table with a single CLOB column holding a single row. The size of the single CLOB is roughly 365KB (or about 45 blocks of 8KB). Old hands who have had to suffer LONG columns will recognise the trick of recording the size of a LONG as a separate column in the table; it’s a strategy that isn’t really necessary with LOBs but old coding habits die hard. It’s quite hard to find details of how much space has been used in a LOB segment (the space_usage procedure in the dbms_space package doesn’t allow you to examine LOBSEGMENTs), but I did a coupld of block dumps to check on this LOBSEGMENT and it had allocated 46 blocks on the first insert.

So here’s the clever bit – how big will the LOBSEGMENT grow when I update that one CLOB ?

It’s common knowledge (to users of LOBs) that the undo mechanism Oracle has for LOBs is simply to leave the old LOB in place and create a new one – so the intial response to the question might be to guess that the LOBSEGMENT will grow to roughly double the size. But it doesn’t have to be like that, at least, not if you update the LOB the way I happen to have done, which is like this:

declare

	m_length	integer;
	m_lob		clob;

begin

	select
		text_content,
		dbms_lob.getlength(text_content)
	into	m_lob, m_length
	from
		test_lobs
	where
		id = 1
	for update
	;

	dbms_output.put_line('Lob size: ' || m_length);

	dbms_lob.write(
		lob_loc	=> m_lob,
		amount	=> 17,
		offset	=> 1,
		buffer	=> 'This is an update'
	);

	commit;

end;
/

My code very specifically changes only the first 17 bytes of the LOB. So how much does Oracle have to do to effect this change ? The LOB-handling mechanisms are smart enough to work out that only the first (of 45) blocks in the LOB need to be changed, so Oracle need only add one block to the segment and write the new version of the first LOB block to that one block. (In fact the segment – which was in a tablespace using freelist management – grew by the “standard” 5 blocks from which Oracle selected just one block to add to the LOB.)

So how does Oracle keep track of the whole LOB if it can change it one piece at a time ? This is where the (notionally invisible and you don’t need to know about it) LOBINDEX comes into play. Oracle maintains an index keyed by (LOB_ID, chunk_number) *** pointing to all the chunks of a LOB in order, so when you update a single chunk Oracle simply creates an updated copy of the chunk and changes the appropriate index entry to point to the new chunk. So here’s an image representing our one LOB value just after we’ve created it and before we’ve updated:

lob_1

And then we “modify” the first chunk – which means we have to add a chunk (which in this case is a single block) to the segment, create a new version of the first chunk, modify the index to point to the new block, and add an index entr – keyed by time-stamp – to the end of the index to point to the old chunk; something like this:

lob_2

Now, when we run a query to select the LOB, Oracle will follow the index entries in order and pick up the new chunk from the end of the LOBSEGMENT. But the LOBINDEX is protected by undo in the standard fashion, so if another long-running query that started before our update needs to see the old version of the LOB it will create a read-consistent copy of the relevant index leaf block- which means that from its perspective the index will automatically be pointing to the correct LOB chunk.

The index is actually quite an odd one because it serves two functions; apart from pointing to current lobs by chunk number, it also points to “previous” chunks by timestamp (specifically the number of seconds between Midnight of 1st Jan 1970 and the time at which the chunk was “overwritten”). This makes it easy for Oracle to deal with the retention interval for LOBs – any time it needs space in the LOBSEGMENT it need only find the minimum timestamp value in the index and compare it with “sysdate – retention” to see if there are any chunks available for re-use.

To sum up – when you update LOBs, and it’s most beneficial if you have an application which doees piece-wise updates, you leave a trail of old chunks in  in the LOBSEGMENT. The version of the LOB you see is dictated by the version of the index that you generate when you request a copy of the LOB at a given SCN.

 

*** Footnote: My description of the LOBINDEX was an approximation. Each index entry carries a fixed size “payload” listing up to eight lob chunks; so the (LOB_ID, chunk_number) index entries in a LOBINDEX may point to every 8th chunk in the LOB. The significance of the “fixed size” payload is that the payload can be modified in place if the pointer to a LOB chunk has to be changed – and this minimises disruption of the index (at a cost of some wasted space).

 

February 25, 2013

Free Space

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:36 pm UTC Feb 25,2013

Question – How can you have a single file in a single tablespace showing multiple free extents when there are no objects using any space in that file ? For example, from an 11.1.0.7 database:

SQL> select
  2          *
  3  from    user_free_space
  4  where
  5          tablespace_name = 'TEST_8K'
  6  order by
  7          file_id, block_id
  8  ;

TABLESPACE_N    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------ ---------- ---------- ---------- ---------- ------------
TEST_8K               3        128    1048576        128            3
TEST_8K               3        256    1048576        128            3
TEST_8K               3        384    1048576        128            3
TEST_8K               3        512  130023424      15872            3

4 rows selected.

The answer in this case is simple – here’s what I did just before running my query:

SQL> create table t1(n1 number);

Table created.

SQL> create table t2(n1 number);

Table created.

SQL> create table t3(n1 number);

Table created.

SQL> drop table t1;

Table dropped.

SQL> drop table t2;

Table dropped.

SQL> drop table t3;

Table dropped.

SQL>

This test is on 11.1.0.7; test_8k is my default tablespace, uses an 8KB block size (did you spot the clue) with 1MB uniform extents. The segments created immediately and I haven’t purged my recyclebin. Because I’ve dropped the tables Oracle includes their space in the “free space” views, but because I need to be able to flash them back into existence the segments can’t be coalesced into the adjacent free space, and they will also be reported in dba_segments.

Here’s a harder one – there are NO objects in this tablespace, and nothing hiding in the recyclebin:

SQL> select
  2          *
  3  from    user_free_space
  4  where
  5          tablespace_name = 'TEST_2K'
  6  order by
  7          file_id, block_id
  8  ;

TABLESPACE_N    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------ ---------- ---------- ---------- ---------- ------------
TEST_2K               7        512   58720256      28672            7
TEST_2K               7      29184   58720256      28672            7
TEST_2K               7      57856   58720256      28672            7
TEST_2K               7      86528   32505856      15872            7

And here’s the statement I executed (as SYS) just before I ran the query – so no chance that there’s anything hidden in the file:


create
        tablespace test_2k
        datafile 'C:\ORACLE\ORADATA\d11g\d11g\test_2k.dbf'
        SIZE 200M reuse
        blocksize 2k
        extent management local
        uniform size 4k
        segment space management manual
;

Update: 26th Feb

I’m sitting in Munich airport and boarding starts in 10 minutes, so just enough time to give an answer. I’ll start with a block dump of block 3 of the data file.

Block dump from disk:
buffer tsn: 22 rdba: 0x01c00003 (7/3)
scn: 0x0b86.06d63ae4 seq: 0x01 flg: 0x04 tail: 0x3ae41e01
frmt: 0x02 chkval: 0x4e8a type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0C112200 to 0x0C112A00
C112200 0000621E 01C00003 06D63AE4 04010B86  [.b.......:......]
C112210 00004E8A 00000007 00000200 00000000  [.N..............]
C112220 00000000 00003800 00000000 00000000  [.....8..........]
C112230 00000000 00000000 00000000 00000000  [................]
        Repeat 123 times
C1129F0 00000000 00000000 00000000 3AE41E01  [...............:]
File Space Bitmap Block:
BitMap Control:
RelFno: 7, BeginBlock: 512, Flag: 0, First: 0, Free: 14336
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
...
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000

This is a locally managed tablespace, so the third block of the file is the first bitmap space management block for the tablespace. The tablespace uses a 2KB block size (so the amount of space in the block for the bit map is slight under 2KB) and a 4KB uniform extent size (so each bit in the map represents 2 blocks/4KB in the tablespace).

As you can see from line 15, the first bit represents the extent starting at block 512, and there are 14,336 bits available (1,792 bytes), so the bitmap in the next block would start at block 512 + 2 * 14,336 = 512 * 28,672 = 29,184. This lines up exactly with the first chunk of free space reported in dba_free_space above.

As one of the comments indicated – the code that populates x$ktfbfe is probably called once for each bitmap space management block, and it doesn’t seem to bother trying to “coalesce” two free space fragments identified by adjacent bitmap blocks.

February 11, 2013

Optimisation ?

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 2:05 pm UTC Feb 11,2013

I was at a client site recently where one of the end-users seemed to have discovered a cunning strategy for optimising a critical SQL statement. His problem was that his query screen times out after 2 minutes, so any query he runs has to complete in less than two minutes or he doesn’t see the results. Unfortunately he had a particular query which took nearly 32 minutes from cold to complete – partly because it’s a seven-table join using ANSI OUTER joins, against tables ranging through the 10s of millions of rows and gigabytes of data – the (necessary) tablescan of the table that had to be first in the join order took 70 seconds alone.

But our intrepid user seems to have made an important discovery and engineered a solution to his performance problem. I think he’s noticed that when you run a query twice in a row the second execution is often faster than the first. I can’t think of any other reason why the same person would run the same query roughly every four minutes between 8:00 and 9:00 am every morning (and then do the same again around 5:00 in the afternoon).

Looking at the SQL Monitoring screen around 10:00 the first day I was on-site I noticed this query with a very pretty graphic effect of gradually shrinking blue bars as 32 minutes of I/O turned into 2 minutes of CPU over the course of 8 consecutive executions which reported run times something like:  32 minutes, 25 minutes, 18 minutes, 12 minutes, 6 minutes, 4 minutes, 2.1 minutes, 2 minutes.

It’s lucky (for that user) that the db_cache_size is 60GB. On the other hand this machine is one of those Solaris boxes that likes to pretend that it’s got 128 CPUs when really it’s only 16 cores with 8 lightweight threads per core – you don’t want anyone running a query that uses 2 solid CPU minute on one of those boxes because it’s taking out 1/16th of your CPU availability, while reporting a load of 1/128 of your CPUs.

Footnote: the query can be optimised (properly) – it accessed roughly 100M rows of data to return roughly 300 rows (with no aggregation), so we just need to do a little bit of work on precise access paths.

February 6, 2013

Delphix

Filed under: Infrastructure,Oracle,Wishlist — Jonathan Lewis @ 7:19 pm UTC Feb 6,2013

If you’re a regular follower or my blog you may recall Kyle Hailey and the joint webinar we did nearly two years ago on “Visual SQL Tuning” covering an approach I’ve written about in the past and a product that he developed at Embarcadero to automate the work that I’d been doing by hand and eye.

Kyle has now moved on to Delphix, and has become involved with another really interesting piece of technology – database virtualization. How do you supply a terabyte sized database to five different development teams without using up 5TB of disc space ? Create an operating environment that keeps one master copy of the database while maintaining a set of (small) private files for each team that hold private copies of the blocks that have been changed by that team – and that’s just one feature of the product.

The product is sufficiently interesting (plus I have a healthy regard for Kyle’s opinions) that I’ve accepted an invitation to go over to California for a few days next month to experiment with it, see what it can do, try to stress it a bit and so on. The people at Delphix are so confident that I’ll be impressed that they’re going to let me do this and then write up a blog telling you how things went.

Have a browse around their documentation and if you’re interested add a suggestion to the comment telling me what you’d like me to test, and how, and I’ll see if I can fit it into my timetable (no promises – but if you come up with interesting ideas I’ll see what I can do).

January 24, 2013

Compression

Filed under: compression,Infrastructure,Oracle — Jonathan Lewis @ 1:06 pm UTC Jan 24,2013

Red Gate have asked me to write a few articles for their Oracle site, so I’ve sent them a short series on “traditional” compression in Oracle – which means I won’t be mentioning Exadata hybrid columnar compression (HCC a.k.a. EHCC). There will be five articles, published at the rate of one per week starting Tuesday (15th Jan). I’ll be supplying links for them as they are published.

December 28, 2012

Quiz Night

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 11:30 am UTC Dec 28,2012

Here’s a little puzzle if you have nothing better to do between now and the new year. The following extract came from 11.2.0.3, but could have come from 10.2.0.5 or 9.2.0.8 (and many others). I’ve got a simple heap table where the last thing I (or anyone) did was “alter table t1 move” before dumping the first (data) block of the table. Looking at the resulting trace file, I see the following:

fsbo=0x56e
fseo=0xf4d
avsp=0x5f
tosp=0x5f

If you need to have the acronyms decoded they are (according to my best guess):
fsbo – free space, beginning offset
fseo – free space, end offset
avsp – available space
tosp – total space

Doing the arithmetic, the free space starts at offset 0x56e and ends at 0xf4d, which means the free space gap is 2,527 bytes; but the total space available for use is only 0x5f bytes, i.e. 95 bytes. So what has happened to the other 2,432 ?

Remember – I dumped the block immediately after issuing “alter table t1 move”, so there are no issues of delayed block cleanout, uncommitted transactions etc. to worry about.

Footnote: the reason why you have “available space” and “total space” is to keep track of the space made available by deleted rows. The “avsp” (usually) reports the size of the gap between the row directory and the row heap; the “tosp” includes the space in the holes left in the row heap after rows have been deleted (or updated in a way that moves them up to the top of the heap, leaving a gap behind them, or updated in situ in a way that reduces the row length leaving a little hole).

November 27, 2012

IOT Load

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 5:15 pm UTC Nov 27,2012

When I introduced Connor McDonald’s blog a few days ago, it was because we had exchanged a couple of email messages (through the Oak Table Network) about how to minimise the resource consumption when copying a load of data from one IOT to another of the same structure. His problem was the way in which the obvious way of copying the data resulted in a massive sort even though, in principle, it should not have been necessary to sort anything since the data could have been extracted in order by walking the existing IOT.

As a suggestion I referenced a comment I had made in the Addenda to Practical Oracle 8i about 12 years ago when I had first solved the problem of loading an IOT with minimal logging and no sorting. At the time I had been loading data from a sorted file into an empty table that was then going to be exchnaged into a partitioned IOT – but it crossed my mind that load from a flat file and loading from a UNIX pipe were pretty much the same thing, so perhaps Connor could workaround his problem by making one session spool to a pipe while another session was reading it. In the end, he simply created a massive temporary tablespace, but I thought I’d modify a test script I wrote a few years ago to demonstrate my idea – and here it is:

rem
rem	create an empty IOT
rem	Note the columns used for the PK
rem

create table iot_load(
	id	number,
	name	varchar2(32),
	padding	varchar2(100),
	constraint iot_pk primary key(name, id)
)
organization index
;

rem
rem	Create a load file
rem	But since we are running UNIX use the mknod command to
rem	create a pipe before spooling to that pipe.
rem

set pagesize 0
set feedback off
set newpage 0
set trimspool on
set linesize 180
set timing off
set termout off

column object_name format a38

host mknod /tmp/iot_load.dat p

spool /tmp/iot_load.dat

select r, object_name, padding
from
(
	select
		rownum	r,
		object_name,
		rpad('x',100,'x') padding
	from
		all_Objects
	where
		rownum <= 10000
)
order by
	object_name, r
;

spool off

This script creates a target IOT, and uses a simple select statement (you get a clue about how old this code is by the fact that I’ve referenced all_objects, and not a “connect by” with dual) to generate a set of date that is sorted in an order that is a suitable match for the definition of the IOT. In Connor’s case he could have used a simple ‘select from source IOT order by primary key’. I’ve set up a few SQL*Plus environment detals to make the output completely flat and undecorated. A key point, though, is that I’m spooling to a file called /tmp/iot_load.dat and that file has been pre-created as a UNIX pipe. When you run this script the select statement will hang almost immediately because the pipe will become full, at which point you need another process to start emptying it. So from another UNIX session run the following (changing the userid and password as necessary):

sqlldr userid=test_user/test control=iot_load data=/tmp/iot_load.dat

Here’s the content of the control file iot_load.ctl

OPTIONS (direct = true)
UNRECOVERABLE
LOAD DATA
TRUNCATE
INTO table iot_load
sorted indexes (iot_pk)
(
	id		position(01:10)	char,
	name		position(12:49) char,
	padding		position(51:150) char
)

We use direct path load (which means virtually no undo) in unrecoverable mode (which means virtually no redo). But the unrecoverable bit only works because we start by truncating the IOT, and promising that the incoming data is sorted in order of the IOT’s primary key index. And since the data is appearing in order, Oracle doesn’t have to sort it before inserting it. Net result: no undo, no redo (apart from metadata) and no sorting – i.e. minimum overhead for loading the IOT – we simply walk the source in order and copy it into the target.

As this session reads from the pipe, the first session can resume writing into the pipe, so the data flows from source to target through a small piece of shared memory (techically a bit of the file-system buffer, I suppose) until the source session closes the pipe with its “spool off”.

October 5, 2012

SSD

Filed under: Exadata,Infrastructure,Oracle,Performance,redo — Jonathan Lewis @ 1:04 pm UTC Oct 5,2012

There’s never enough time to read everything that’s worth reading, so even though Guy Harrison’s blog is one of the ones worth reading I find that it’s often months since I last read it. Visiting it late last night, I found an interesting batch of articles spread over the last year about the performance of SSD – the conclusions may not be what you expect, but make sure you read all the articles or you might end up with a completely misleading impression:

Don’t forget to read the comments as well. For other notes Guy has written about SSD, here’s a URL for his SSD tag.

October 1, 2012

Row sizes 2

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:34 am UTC Oct 1,2012

In an earlier post I showed you how you could generate SQL to analyze the distribution of row sizes in a table. In the introduction to the code I made a comment about how it failed to “allow for nulls at the end of rows”; a feature of Oracle storage that isn’t commonly known is that a set of consecutive null columns at the end of a row take up no storage space, while any null columns followed by a non-null column take up one byte (holding the value 0xFF) per column so that Oracle can “count its way” through the null columns to the non-null column. Consider this example:
(more…)

September 27, 2012

Row sizes

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 5:32 pm UTC Sep 27,2012

A recent question on Oracle-L (a performance problem creating indexes that turned into an issue with migrated rows) prompted me to revisit a piece of code I first wrote about 20 years ago for an Oracle 6 system, and first published in November 2000. It addresses the question of analysing row-lengths in detail and, if you look at the example I’ve linked to, you’ll see that this could be very useful when you’re trying to work out suitable settings for pctfree and why you’re suffering from row migration.

The script is strictly limited to “simple” tables, by which I mean heap tables with columns that are basic data types and don’t include such things as Longs, LOBs, nested tables and all the other nasty things that usually break simple utilities. All it does is estimate the length of each row, then aggregating by row length. The estimate doesn’t allow for nulls at the ends of rows or columns longer than 254 bytes – technically the former don’t need length bytes and the latter use 3 bytes for the length when the column is more than 250 bytes. I don’t use the script often, and haven’t got around to including all the column types it won’t work for – that’s left as an exercise for the reader.
(more…)

September 17, 2012

Private Redo

Filed under: Infrastructure,Oracle,redo — Jonathan Lewis @ 8:28 pm UTC Sep 17,2012

Following a question on the Oracle Core Addenda pages, here’s a little script to tell you about the sizes of the public and private redo threads currently active in the instance. It’s a minor variation of a script I published in Appendix D (Dumping and Debugging), page 237 to show the addresses of current activity in the various log buffers:
(more…)

September 8, 2012

DDL triggers

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 5:14 pm UTC Sep 8,2012

One of the partitioning features that Oracle introduced relatively recently was the “interval partition”, for example:

create table transactions (
	account_id		number(8)	not null,
	transaction_date	date		not null,
	transaction_type	varchar2(2)	not null,
	transaction_id		varchar2(10)	not null,
	amount			number(10,2)	not null,
	padding			varchar2(100)
)
partition by range (transaction_date)
interval (numtoyminterval(1,'MONTH'))
(
   partition p200801 values less than (to_date('01-FEB-2008','DD-MON-YYYY'))
)
;

When I insert data into this table for a partition that doesn’t yet exist, Oracle will work out which partition it should be and create it automatically before doing the insert. The benefit of this trick, of course, is that the DBAs and developers don’t have to write any code to add partitions in anticipation of time passing and new data appearing.
(more…)

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,391 other followers