Oracle Scratchpad

September 20, 2021

Optimizer Tip

Filed under: CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 9:04 am BST Sep 20,2021

This is a note I drafted in March 2016, starting with a comment that it had been about that time the previous year that I had written:

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

I keep finding ancient drafts like this (there are still more than 730 drafts on my blog at present – which means one per day for the next 2 years!) and if they still seem relevant – even if they are a little behind the times – I’ve taken to dusting them down and publishing.

With the passing of time, though, new information becomes available, algorithms change, and (occasionally) I discover I’ve made a significant error in my inferences. In this case  there are a couple of important additions that I’ve added to the end of the note.

Optimizer Tips (IOUG Quick Tips 2015)

There are two very common reasons why the optimizer picks a bad execution plan. The first is that its estimate of the required data volume is bad, the second is that it has a misleading impression of how scattered that data is.

The first issue is often due to problems with the selectivity of complex predicates, the second to unsuitable values for the clustering_factor of potentially useful indexes. Recent [ed: i.e. pre-2015] versions of the Oracle software have given us features that try to address both these issues, and I’m going to comment on them in the following note.

As always, any change can have side effects – introducing a new feature might have no effect on 99% of what we do, a beneficial effect on 99% of the remainder, and a hideous effect on the 1% of 1% that’s left, so I will be commenting on both the pros and cons of both features.

Column Group Stats

The optimizer assumes that the data in two different columns of a single table are independent – for example the registration number on your car (probably) has nothing to do with the account number of your bank account. So when we execute queries like:

     colX = 'abcd'
and  colY = 'wxyz'

the optimizer’s calculations will be something like:

“one row in 5,000 is likely to have colX = ‘abcd’ and one row in 2,000 is likely to have colY = ‘wxyz’, so the combination will probably appear in roughly one row in ten million”.

On the other hand we often find tables that do things like storing post codes (zipcodes) in one column and city names in another, and there’s a strong correlation between post codes and city – for example the district code (first part of the post code) “OX1” will be in the city of Oxford (Oxfordshire, UK). So if we query a table of addresses for rows where:

     district_code = 'OX1'
and  city          = 'Oxford

there’s a degree of redundancy, but the optimizer will multiple the total number of distinct district codes in the UK by the total number of distinct city names in the UK as it tries to work out the number of addresses that match the combined predicate and will come up with a result that is far too small.

In cases like this we can define “column group” statistics about combinations of columns that we query together, using the function dbms_stats.create_extended_stats(). This function will create a type of virtual column for a table and report the system-generated name back to us, and we will be able to see that name in the view user_tab_cols, and the definition in the view user_stat_extensions. If we define a column group in this way we then need to gather stats on it, which we can do in one of two ways, either by using the generated name or by using the expression that created it.


SQL> create table addresses (district_code varchar2(8), city varchar2(40));

Table created.

SQL> execute dbms_output.put_line( -
>        dbms_stats.create_extended_stats( -
>            user,'addresses','(district_code, city)'))

SYS_STU12RZM_07240SN3V2667EQLW

PL/SQL procedure successfully completed.

begin
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns SYS_STU12RZM_07240SN3V2667EQLW size 1'
        );
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns (district_code, city) size 1'
        );
end;
/

I’ve included both options in the anonymous pl/sql block, but you only need one of them. In fact if you use the second one without calling create_extended_stats() first Oracle will create the column group implicitly, but you won’t know what it’s called until you query user_stat_extensions.

I’ve limited the stats collection to basic stats with the “size 1” option. You can collect a histogram on a column group but since the optimizer can only use a column group with equality predicates you should only create a histogram in the special cases where you know that you’re going to get a frequency histogram or “Top-N” histogram.

You can also define extended stats on expressions (e.g. trunc(delivery-date) – trunc(collection_date)) rather than column groups, but since you’re only allowed 20 column groups per table [see update 1] it would be better to use virtual columns for expressions since you can have as many virtual columns you like on a table provided the total column count stays below the limit of 1,000 columns per table.

Warnings

  • Column group statistics are only used for equality expressions. [see also update 2]
  • Column group statistics will not be used if you’ve created a histogram on any of the underlying columns unless there’s also a histogam on the column group itself.
  • Column group statistics will not be used if you query any of the underlying columns with an “out of range” value. This, perhaps, is the biggest instability threat with column groups. As time passes and new data appears you may find people querying the new data. If you haven’t kept the column stats up to date then plans can change dramatically as the optimizer switches from using column group stats to multiplying the selectivities of underlying columns.
  • The final warning arrives with 12c. If you have all the adaptive optimizer options enabled the optimizer will keep a look out for tables that it thinks could do with column group stats, and automatically creates them the next time you gather stats on the table. In principle this shouldn’t be a problem – the optimizer should only do this when it has seen that column stats should improve performance – but you might want to monitor your system for the arrival of new automatic columns.

Preference: table_cache_history

Even when the cardinality estimates are correct we may find that we get an inefficient execution plan because the optimizer doesn’t want to use an index that we think would be a really good choice. A common reason for this failure is that the clustering_factor on the index is unrealistically large.

The clustering_factor of an index is a measure of how randomly you will jump around the table as you do an index range scan through the index – and the algorithm Oracle uses to calculate this number has a serious flaw in it: it can’t tell the difference between a little bit of localised jumping and constant random leaps across the entire width of the table.

To calculate the clustering_factor Oracle basically walks the index in order using the rowid at the end of each index entry to check which table block it has to visit, and every time it has to visit a “new” table block it increments a counter. The trouble with this approach is that, by default, it doesn’t remember its recent history so, for example, it can’t tell the difference in quality between the following two sequences of table block visits:

Block 612, block 87, block 154, block 3,  block 1386, block 834, block 237
Block 98,  block 99, block 98,  block 99, block 98,   block 99,  block 98

In both cases Oracle would say that it had visited seven different blocks and the data was badly scattered. This has always been a problem, but it became much more of a problem when Oracle introduced ASSM (automatic segment space management). The point of ASSM is to ensure that concurrent inserts from different sessions tend to use different table blocks, the aim being to reduce contention due to buffer busy waits. As we’ve just seen, though, the clustering_factor doesn’t differentiate between “a little bit of scatter” and “a totally random disaster area”.

Oracle finally addressed this problem by introducing a “table preference” which allows you to tell it to “remember history” when calculating the clustering_factor. So, for example, a call like this:

execute dbms_stats.set_table_prefs(user,'t1','table_cached_blocks',16)

would tell Oracle that the next time you collect statistics on any indexes on table t1 the code to calculate the clustering_factor should remember the last 16 table blocks it had “visited” and not increment the counter if the “next” block was already in that list.

If you look at the two samples above, this means the counter for the first list of blocks would reach 7 while the counter for the second list would only reach 2. Suddenly the optimizer will be able to tell the difference between data that is “locally” scattered and data that really is randomly scattered. You and the optimizer may finally agree on what constitutes a good index.

It’s hard to say whether there’s a proper “default” value for this preference. If you’re using ASSM (and there can’t be many people left who aren’t) then the obvious choice for the parameter would be 16 since ASSM tends to format 16 consecutive blocks at a time when a segment needs to make more space available for users [but see Update 3]. However, if you know that the real level of insert concurrency on a table is higher than 16 then you might be better off setting the value to match the known level of concurrency.

Are there any special risks to setting this preference to a value like 16? I don’t think so; it’s going to result in plans changing, of course, but indexes which should have a large clustering_factor should still end up with a large clustering_factor after setting the preference and gathering of statistics; the indexes that ought to have a low clustering_factor are the ones most likely to change, and change in the right direction.

Footnote: “Danger, Will Robinson”.

I’ve highlighted two features that are incredibly useful as tools to give the optimizer better information about your data and allow it to get better execution plans with less manual intervention. The usual warning applies, though: “if you want to get there, you don’t want to start from here”. When you manipulate the information the optimizer is using it will give you some new plans; better information will normally result in better plans but it is almost inevitable that some of your current queries are running efficiently “by accident” (possibly because of bugs) and the new code paths will result in some plans changing for the worse.

Clearly it is necessary to do some thorough testing but fortunately both features are incremental and any changes can be backed out very quickly and easily. We can change the “table_cached_blocks” one table at a time (or even, with a little manual intervention, one index at a time) and watch the effects; we can add column groups one at a time and watch for side effects. All it takes to back out of a change is a call to gather index stats, or a call to drop extended stats. It’s never nice to live through change – especially change that can have a dramatic impact – but if we find after going to production that we missed a problem with our testing we can reverse the changes very quickly.

Updates

Update 1 – 20 sets of extended stats. In fact the limit is the larger of 20 and ceiling(column count/10), and the way the arithmetic is applied is a little odd so there are ways to hack around the limit.

Update 2 – Column groups and equality. It’s worth a special menton that the predicate colX is null is not an equality predicate, and column group stats will not apply but there can be unexpected side effects even for cases where you don’t use this “is null” predicate.

Update 3 – table_cache_history = 16. This suggestions doesn’t allow for systems running RAC.

2 Comments »

  1. It was easy to overlook table_cache_history preference, thank you for the tip!

    Comment by rbikblog — September 25, 2021 @ 1:10 pm BST Sep 25,2021 | Reply

    • Thanks for the feedback.

      You’re right about how easy it is to overlook tiny little details that turn out to be very useful and important. I’ve has quite a few cases where I’ve noticed something “new” and then discovered that it first appeared a couple of versions before the one I first noticed it in.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — September 26, 2021 @ 11:33 am BST Sep 26,2021 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: