Oracle Scratchpad

August 19, 2012

Compression Units – 5

Filed under: CBO,Exadata,HCC,Indexing,Oracle — Jonathan Lewis @ 6:02 pm BST Aug 19,2012

The Enkitec Extreme Exadata Expo (E4) event is over, but I still have plenty to say about the technology. The event was a great success, with plenty of interesting speakers and presentations. I was particularly keen to hear Frits Hoogland’s comments on Exadata and OLTP, Richard Foote on Indexes, and Maria Colgan’s comments on how Oracle is making changes to the optimizer to understand Exadata a little better.

All three presentations were interesting – but Maria’s was possibly the most important (and entertaining). In particular she told us about two patches for 11.2.0.3, one current and one that is yet to be released (unfortunately I forgot to take note of the patch numbers – ed: but they’ve been supplied by readers’ comments below).

The patch that is currently available introduces a new option for dbms_stats.gather_system_stats(). Pass in the parameter value ‘Exadata’ and (along with any other steps) the code will set the MBRC stored statistic to 128. If this looks like a familiar idea, check this comment on an earlier post:

It’s slightly worrying that when you set up a (datawarehouse) system on Exadata it might be a good idea – at least in the short term – to set the db_file_multiblock_read_count to 128 because you really do want Oracle to think that tablescans are, in general, a pretty good fast option – we’ve only just got over persuading people that you shouldn’t set this parameter in any version of 10g or above

It’s nice to know that with the right patch we now have a proper way to get the effect we need. (I can’t help thinking, though, that since a scan in Exadata is effectively “Parallel N” where N is the number of cell servers, it might be even better if the fix changed the MBRC to allow for a significant percentage of N.)
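For reference, here’s a sketch of what the patched call looks like, together with the pre-patch manual equivalent; the final query shows where the stored system statistics end up (sys.aux_stats$):

```sql
-- With the patch: gather system statistics in "Exadata mode"
begin
	dbms_stats.gather_system_stats('EXADATA');
end;
/

-- Without the patch: set the stored MBRC statistic directly
begin
	dbms_stats.set_system_stats('MBRC', 128);
end;
/

-- Check the result - MBRC appears among the stored system statistics
select	pname, pval1
from	sys.aux_stats$
where	sname = 'SYSSTATS_MAIN';
```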

The patch that is not yet available addresses the problem I described in the same post, that a sample query took 0.7 seconds to complete on an uncompressed table, 77 seconds if the table was compressed at “query high”, and 3,000 seconds when compressed at “archive high.” Here’s the slide I presented at the conference, where I explained why compression made the queries run so slowly, and how I had reduced the run-time from 3,000 seconds to 16 seconds by optimising the query:

As you can see, the query is just “select from single table by index range scan”. The bottom half of the slide gives you an indication of why the performance is so bad – the numbers give an indication of how the data is scattered through the table, and represent the order in which the rows are fetched as the query walks the index; and every “block” in the picture is a separate compression unit, which has to be de-compressed each time it is visited. Because of the data pattern, and because Exadata will only keep one decompressed unit in the PGA at any moment, my query decompresses one compression unit for every single row fetched – and that’s a lot of CPU disappearing from the compute node of an Exadata system.

So how can you rewrite the query to optimise it? Here’s how – and it’s a method that can have an effect whether or not you’re using Exadata, but it’s only with HCC that you’re really likely to see a dramatic difference in CPU usage (in more traditional systems it can have quite an impact on physical I/O):

select
	/*+
		leading(v1 t1_ah)
		use_nl(t1_ah)
		monitor
	*/
	max(t1_ah.padding)
from
	(
		select	/*+ index(t1_ah) no_merge no_eliminate_oby */
			rowid r1
		from
			t1_ah
		where
			n_128k between 1000 and 1999
		order by
			r1
	)	v1,
	t1_ah
where
	t1_ah.rowid = v1.r1
;

It’s my old “two-step” approach to visiting tables and indexes. Get the rowids you really need, and visit the table later. In this case, though, I’ve sorted the rowids before I visit the data. The effect of this sort is that I visit each table block (or, in the case of Exadata, each Compression Unit) just once, and pin it (or, in Exadata, keep the decompressed unit in the PGA) until I’ve extracted all the rows from it that I need.
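Since the rewritten query carries the monitor hint, a convenient way to compare the CPU usage of the two versions is to pull the SQL Monitor report after each run – a sketch, where the &m_sql_id substitution variable stands in for whatever sql_id your run produced:

```sql
set long 1000000 longchunksize 1000000 pagesize 0

select	dbms_sqltune.report_sql_monitor(
		sql_id	=> '&m_sql_id',
		type	=> 'TEXT'
	)
from	dual;
```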

So, you might ask, how will the patch optimise for indexed access in Exadata? The optimizer will be allowed to introduce a new operation between an “index range scan” and a “table access by rowid” – and the operation will be called (something like) “sort by datablock address”. (And if you think I’m feeling smug about pre-empting the optimizer group, you’d be right – but then I don’t have to change and test the source code, of course, so I had a timing advantage.)
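Until the patch arrives I can’t show you the new operation, but checking for it once it does should just be a matter of pulling the actual execution plan (with rowsource statistics) from memory after running the query:

```sql
-- Report the last plan executed in this session, with actual row counts
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
```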

Footnote: When I saw the “index pre-fetching” change to the nested loop join in Oracle 9i (and that’s long before the NLJ_batching feature of 11g), I had assumed that this sorting step was part of the strategy. So the Exadata strategy looks like something I’ve been waiting for, for the last 10 years or so.

Update: According to document 10248538.8 the call to dbms_stats.gather_system_stats(‘Exadata’) sets the MBRC statistic to the “true I/O size” – which seems to mean it will end up matching whatever has been set, or defaulted, for parameter db_file_multiblock_read_count.

Update 2:

A new hint appeared in 11.2.0.4 to address this issue: /*+ cluster_by_rowid(alias) */, possibly back-ported from 12.1.0.1 where it’s also available.
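So from 11.2.0.4 onwards the manual two-step rewrite above can (in principle) be replaced by a single hint on the simple form of the query – a sketch, reusing the table and column names from my test case:

```sql
select	/*+ cluster_by_rowid(t1_ah) */
	max(padding)
from	t1_ah
where	n_128k between 1000 and 1999
;
```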


8 Comments

  1. Jonathan,

Nice to meet you at E4 last week. It looks like the first fix (optimizer stats) is included in the August quarterly database patch for 11.2.0.3. The bug number is 10248538. It’s planned to be included in 11.2.0.2 BP18 (scheduled for release October 2012).

    Andy

    Comment by Andy Colvin (@acolvin) — August 19, 2012 @ 7:12 pm BST Aug 19,2012 | Reply

  2. Hello Jonathan,
When you say “hinted to smart scan”, what do you mean? By using _serial_direct_reads=true?

    thanks

    Comment by vnmaster — August 20, 2012 @ 1:03 am BST Aug 20,2012 | Reply

    • vnmaster,

      I was trying to pack too much into too few words.
      I just hinted a tablescan, which managed to operate as serial direct and therefore used a smart scan.

      Comment by Jonathan Lewis — August 29, 2012 @ 8:33 am BST Aug 29,2012 | Reply

The second bug number is 12780479. I tweeted most of Richard’s and Maria’s presentations while listening online (thanks to Kerry :)). They were awesome and Maria rocked.

So, you might ask, how will the patch optimise for indexed access in Exadata? The optimizer will be allowed to introduce a new operation between an “index range scan” and a “table access by rowid” – and the operation will be called (something like) “sort by datablock address”

I had the impression it will be done at run-time with no plan change. But I might be wrong – I was listening and writing concurrently.

    Comment by Timur Akhmadeev — August 20, 2012 @ 7:07 am BST Aug 20,2012 | Reply

    • Timur,

      Thanks for the bug number.

      I think I heard the word “operation” and assumed that an operation would automatically correspond to a line in an execution plan.

      It’s an important difference, and I’d hope that it would be visible and controllable by a hint because there are other ways that the side effects of a sort could make a big difference to performance.

      Comment by Jonathan Lewis — August 29, 2012 @ 8:38 am BST Aug 29,2012 | Reply

  4. […] to do with the costs of accessing compressed data by rowid on an Exadata system (see, for example, this posting) and it’s very easy to check what the hint […]

    Pingback by Caution – hints | Oracle Scratchpad — February 12, 2014 @ 6:57 pm BST Feb 12,2014 | Reply

  5. […] see if the figures were consistent). This type of rowid sorting is, of course, an important fix for an Exadata issue I described some time ago, and I had assumed that the “batched” concept in the 12c plan […]

    Pingback by NL History | Oracle Scratchpad — April 23, 2014 @ 6:43 pm BST Apr 23,2014 | Reply

