I was at a client site recently where one of the end-users seemed to have discovered a cunning strategy for optimising a critical SQL statement. His problem was that his query screen times out after 2 minutes, so any query he runs has to complete in less than two minutes or he doesn’t see the results. Unfortunately he had a particular query which took nearly 32 minutes from cold to complete – partly because it’s a seven-table join using ANSI OUTER joins, against tables ranging through the 10s of millions of rows and gigabytes of data – the (necessary) tablescan of the table that had to be first in the join order took 70 seconds alone.
But our intrepid user seems to have made an important discovery and engineered a solution to his performance problem. I think he’s noticed that when you run a query twice in a row the second execution is often faster than the first. I can’t think of any other reason why the same person would run the same query roughly every four minutes between 8:00 and 9:00 am every morning (and then do the same again around 5:00 in the afternoon).
Looking at the SQL Monitoring screen around 10:00 the first day I was on-site I noticed this query with a very pretty graphic effect of gradually shrinking blue bars as 32 minutes of I/O turned into 2 minutes of CPU over the course of 8 consecutive executions which reported run times something like: 32 minutes, 25 minutes, 18 minutes, 12 minutes, 6 minutes, 4 minutes, 2.1 minutes, 2 minutes.
It’s lucky (for that user) that the db_cache_size is 60GB. On the other hand this machine is one of those Solaris boxes that likes to pretend that it’s got 128 CPUs when really it’s only 16 cores with 8 lightweight threads per core – you don’t want anyone running a query that uses 2 solid CPU minute on one of those boxes because it’s taking out 1/16th of your CPU availability, while reporting a load of 1/128 of your CPUs.
Footnote: the query can be optimised (properly) – it accessed roughly 100M rows of data to return roughly 300 rows (with no aggregation), so we just need to do a little bit of work on precise access paths.
Red Gate have asked me to write a few articles for their Oracle site, so I’ve sent them a short series on “traditional” compression in Oracle – which means I won’t be mentioning Exadata hybrid columnar compression (HCC a.k.a. EHCC). There will be five articles, published at the rate of one per week starting Tuesday (15th Jan). I’ll be supplying links for them as they are published.
Here’s a little puzzle if you have nothing better to do between now and the new year. The following extract came from 126.96.36.199, but could have come from 10.2.0.5 or 188.8.131.52 (and many others). I’ve got a simple heap table where the last thing I (or anyone) did was “alter table t1 move” before dumping the first (data) block of the table. Looking at the resulting trace file, I see the following:
If you need to have the acronyms decoded they are (according to my best guess):
- fsbo – free space, begin offset
- fseo – free space, end offset
- avsp – available space
- tosp – total space
Doing the arithmetic, the free space starts at offset 0x56e and ends at 0xf4d, which means the free space gap is 2,527 bytes; but the total space available for use is only 0x5f bytes, i.e. 95 bytes. So what has happened to the other 2,432 ?
Remember – I dumped the block immediately after issuing “alter table t1 move”, so there are no issues of delayed block cleanout, uncommitted transactions etc. to worry about.
Footnote: the reason why you have “available space” and “total space” is to keep track of the space made available by deleted rows. The “avsp” (usually) reports the size of the gap between the row directory and the row heap; the “tosp” includes the space in the holes left in the row heap after rows have been deleted (or updated in a way that moves them up to the top of the heap, leaving a gap behind them, or updated in situ in a way that reduces the row length leaving a little hole).
When I introduced Connor McDonald’s blog a few days ago, it was because we had exchanged a couple of email messages (through the Oak Table Network) about how to minimise the resource consumption when copying a load of data from one IOT to another of the same structure. His problem was the way in which the obvious way of copying the data resulted in a massive sort even though, in principle, it should not have been necessary to sort anything since the data could have been extracted in order by walking the existing IOT.
As a suggestion I referenced a comment I had made in the Addenda to Practical Oracle 8i about 12 years ago when I had first solved the problem of loading an IOT with minimal logging and no sorting. At the time I had been loading data from a sorted file into an empty table that was then going to be exchnaged into a partitioned IOT – but it crossed my mind that loading from a flat file and loading from a UNIX pipe were pretty much the same thing, so perhaps Connor could workaround his problem by making one session spool to a pipe while another session was reading it. In the end, he simply created a massive temporary tablespace, but I thought I’d modify a test script I wrote a few years ago to demonstrate my idea – and here it is:
There’s never enough time to read everything that’s worth reading, so even though Guy Harrison’s blog is one of the ones worth reading I find that it’s often months since I last read it. Visiting it late last night, I found an interesting batch of articles spread over the last year about the performance of SSD – the conclusions may not be what you expect, but make sure you read all the articles or you might end up with a completely misleading impression:
Don’t forget to read the comments as well. For other notes Guy has written about SSD, here’s a URL for his SSD tag.
In an earlier post I showed you how you could generate SQL to analyze the distribution of row sizes in a table. In the introduction to the code I made a comment about how it failed to “allow for nulls at the end of rows”; a feature of Oracle storage that isn’t commonly known is that a set of consecutive null columns at the end of a row take up no storage space, while any null columns followed by a non-null column take up one byte (holding the value 0xFF) per column so that Oracle can “count its way” through the null columns to the non-null column. Consider this example:
A recent question on Oracle-L (a performance problem creating indexes that turned into an issue with migrated rows) prompted me to revisit a piece of code I first wrote about 20 years ago for an Oracle 6 system, and first published in November 2000. It addresses the question of analysing row-lengths in detail and, if you look at the example I’ve linked to, you’ll see that this could be very useful when you’re trying to work out suitable settings for pctfree and why you’re suffering from row migration.
The script is strictly limited to “simple” tables, by which I mean heap tables with columns that are basic data types and don’t include such things as Longs, LOBs, nested tables and all the other nasty things that usually break simple utilities. All it does is estimate the length of each row, then aggregating by row length. The estimate doesn’t allow for nulls at the ends of rows or columns longer than 254 bytes – technically the former don’t need length bytes and the latter use 3 bytes for the length when the column is more than 250 bytes. I don’t use the script often, and haven’t got around to including all the column types it won’t work for – that’s left as an exercise for the reader.
Following a question on the Oracle Core Addenda pages, here’s a little script to tell you about the sizes of the public and private redo threads currently active in the instance. It’s a minor variation of a script I published in Appendix D (Dumping and Debugging), page 237 to show the addresses of current activity in the various log buffers:
One of the partitioning features that Oracle introduced relatively recently was the “interval partition”, for example:
create table transactions (
account_id number(8) not null,
transaction_date date not null,
transaction_type varchar2(2) not null,
transaction_id varchar2(10) not null,
amount number(10,2) not null,
partition by range (transaction_date)
partition p200801 values less than (to_date('01-FEB-2008','DD-MON-YYYY'))
When I insert data into this table for a partition that doesn’t yet exist, Oracle will work out which partition it should be and create it automatically before doing the insert. The benefit of this trick, of course, is that the DBAs and developers don’t have to write any code to add partitions in anticipation of time passing and new data appearing.
Here’s a simple piece of SQL that could, in theory, compare the current size of a table with the size it might be after a call to “alter table move” – and it’s followed by the results for a table that’s current in the database that I’m looking at:
blocks, num_rows, avg_row_len, pct_free,
ceil(num_rows * avg_row_len / (8000 * ((100 - pct_free)/100))) blocks_needed
table_name = 'T1'
BLOCKS NUM_ROWS AVG_ROW_LEN PCT_FREE BLOCKS_NEEDED
---------- ---------- ----------- ---------- -------------
25 1000 22 10 4
A question came up on the OTN database forum a few days ago asking why a very simple query against a hash partitioned table was doing so much physical I/O to get a very small amount of data from a single partition:
I have table hash-partitioned on an indexed column “id” which is non-unique and part of my primary key. Inside each partition, rows with the same id are located close to each other which is done by dbms_redefinition reorg using orderby_cols. The intention is to reduce the amount of physical reads since there are no queries that don’t filter on the id column.
What I’m seeing is a lot of physical reads though. The first partition has roughly 80K rows, an average row length of 347, block size of 8K and compression … resulting in 821 blocks. And when (after flushing buffered cache and shared pool) submit a query that filters on “id” only and results is 106 selected rows, I see roughly 1400 physical reads.
The poster has clearly applied some careful thought to minimising the work required – so what’s gone wrong ? To help us analyse the problem he posted the results from running a query through autotrace:
A few years ago (2007) I wrote about a problem that could appear when you mixed parallel execution with system managed extent allocation. A couple of years later I added a note that Christian Antognini had observed a patch in 184.108.40.206 that addressed the specific issue I had raised. Today, thanks to an email exchange with Christo Kutrovsky of Pythian, I can report that there is a variation of this issue still available even in 220.127.116.11.
The basic problem is that you can end up with a very large number of very small extents, leading to poor performance in parallel queries and a significant waste of space in a data segment. Here’s a simple, though not particularly realistic, way to demonstrate the problem.
There was a question on the OTN database forum a little while ago about the different between the relative and absolute files numbers for a data file as reported in view dba_data_files – where they appear as file_id and relative_fno – or v$datafile – where they appear as file# and rfile#. In response, someone resurrected a very old posting of mine from the Usenet newsgroup comp.databases.oracle.server, which says the following:
A recent posting on the OTN database forum (which I didn’t answer correctly on my first attempt) raised the problem of truncating partitions when there are referential integrity constraints in place. Let’s start with a demonstration of the problem:
Here’s another one of those little changes that sidles in when you’re not looking.
When locally managed tablespaces first appeared, there were a number of posts on Usenet (comp.databases.oracle.server) and the metalink lists (the OTN database forum didn’t exist at the time) about “missing” space in data files. The commonest sort of comment was along the lines of
“I’ve created a data file of 100 MB, how come I can only create 99 extents of 1 MB each?”
“I’ve created a data file of 10 GB, how come I can only create 9 extents of 1 GB each?”