Oracle Scratchpad

December 10, 2015

Upgrades

Filed under: 12c,Oracle,Upgrades — Jonathan Lewis @ 8:42 am GMT Dec 10,2015

One of the questions that came up at the Optimizer Round Table this year was about minimizing the performance-related** hassle of upgrading from 11g to 12c. Dealing with changes in the optimizer is always an interesting problem, but in 12c this is made more challenging by the automatic dynamic sampling that can introduce a significant amount of extra work at (hard) parse time, then generate SQL Plan Directives, and finally generate extended (column group) statistics the next time you (or the automatic job) collect stats.

Of course one of the traditional strategies for upgrades (especially if you don’t really have a technical need to upgrade but want your Oracle version to be current) is to set the optimizer_features_enable parameter to the older version and hope that this stops most of the new features from appearing, letting other customers identify the problems and raise the SRs that result in the next patch set.

There’s also the option of reading the manuals and white-papers carefully and identifying the new and enhanced optimizer features that might cause you problems – if only you can spot them in time in your code. (The enhancement to subquery unnesting I described in my previous post is such an example – do you know of any code in your application that was carefully engineered to maximise the benefit of scalar subquery caching? If so, you should check very carefully that 12c isn’t going to transform away your subquery and wreck your performance.)

As a general strategy, though, it’s worth reviewing your code for four generic features – histograms, multi-column joins, predicates that apply a function to a column, and “out-of-range” issues. The first three are features that are likely to make the 12c optimizer decide to do something “dynamic” that may ultimately give you a better execution plan, but may cost you extra resources or result in increased contention at just the wrong moment; the fourth is one that can disrupt the solution to the multi-column problem.

For the purposes of this note I am going to assume that you don’t use partitioned tables, or that you have already devised a programmatic method for minimising the work while maximising the effectiveness of their stats (and that your programs unlock and relock the stats so that they aren’t re-gathered by accident).

Histograms

I’m assuming by now that you’re using the approximate_ndv method with auto_sample_size to gather object statistics. In 11g this still means that any histograms gathered by Oracle will use a (surprisingly small) sample and even if you are gathering the histograms at the right time they could introduce a lot of instability to execution plans.
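
For reference, a minimal sketch of the configuration being assumed here (the table name t1 is purely illustrative):

begin
        -- global preference: use the one-pass approximate NDV mechanism
        dbms_stats.set_global_prefs('approximate_ndv', 'true');
        -- auto_sample_size (the default) is what makes approximate NDV kick in
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 't1',
                estimate_percent => dbms_stats.auto_sample_size
        );
end;
/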

In 12c, Oracle can use an approximate method to create a frequency or Top-N frequency histogram while doing the scan that generates all the other stats. So if you have columns where you know the number of distinct values is relatively small (default up to 254, though technically up to 2,048), or where almost all the data (except, broadly speaking, one bucket’s worth) comes from a relatively small number of distinct values, then on the upgrade you should be safe creating a frequency or Top-N histogram once on those columns and setting the method_opt to ‘for all columns size repeat’, as in the sketch below.
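
As a sketch of that idea – the table t1 and (skewed, low-cardinality) column status are hypothetical, you would pick the columns you know qualify:

begin
        -- one-off: create a frequency / Top-N histogram on the chosen column
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 't1',
                method_opt => 'for all columns size 1 for columns status size 254'
        );
        -- from now on, re-gather whatever histograms already exist
        dbms_stats.set_table_prefs(user, 't1', 'method_opt', 'for all columns size repeat');
end;
/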

Oracle 12c still uses a small sample size, though, when it detects the need for a “hybrid” histogram – which is the thing that replaces the height-balanced histogram. This means you may still need to write code to generate the histogram rather than allowing Oracle to collect it. It’s the classic compromise problem – a good histogram will be expensive to gather, while a cheaply gathered histogram could easily be inaccurate and unstable. In this case you may want to create a Top-N frequency histogram that is a good model of the data, rather than trying to create a correctly structured hybrid. For tables with such columns you need code that can detect changes to the stats and recreate the histogram. In fact, even in cases where Oracle could gather a (frequency) histogram efficiently and accurately, you may still want to detect changes to stats for some columns and write code to create a histogram that describes the data the way it will look when the users are accessing it, rather than the way the data looked when Oracle created the histogram (at 2:00 a.m. or whenever the automatic stats collection job hit it).
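
As a minimal sketch of hand-crafting such a histogram – the table, column, values, and counts below are all hypothetical, and you would pick figures that model the data as the users will see it:

declare
        srec    dbms_stats.statrec;
        c_array dbms_stats.chararray;
begin
        -- the column values (in sorted order) and the frequency of each
        c_array     := dbms_stats.chararray('CANCELLED', 'COMPLETE', 'OPEN');
        srec.bkvals := dbms_stats.numarray(10, 1000000, 50);
        srec.epc    := 3;
        dbms_stats.prepare_column_values(srec, c_array);
        dbms_stats.set_column_stats(
                ownname => user,
                tabname => 't1',
                colname => 'status',
                distcnt => 3,
                density => 0.000001,    -- illustrative only
                srec    => srec
        );
end;
/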

Broad strategy: set the global preference to method_opt => ‘for all columns size 1’, set a few table preferences to ‘for all columns size repeat’, and have code that checks the last_analyzed date on a few tables and recreates the histogram you want immediately after the stats have been gathered.
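
In outline (t1 is, again, a hypothetical example):

begin
        dbms_stats.set_global_prefs('method_opt', 'for all columns size 1');
        dbms_stats.set_table_prefs(user, 't1', 'method_opt', 'for all columns size repeat');
end;
/

with the scheduled check being little more than a query like:

select  table_name, last_analyzed
from    user_tables
where   table_name = 'T1'
and     last_analyzed > sysdate - 1/24    -- stats gathered in the last hour
;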

Note: as a general guideline, if you know the database needs histograms to produce the most appropriate execution plans, the front-end code has to co-operate with the database and not conceal useful information behind bind variables that hide the significance of special values or ranges – but that’s a topic for another blog note (if I haven’t written anything about it before).

Virtual Columns and Column Groups

In 12c, column expressions (where the optimizer guesses) and multi-column predicates (where the optimizer assumes independence) are key triggers for dynamic sampling and SQL Plan Directives. To engage in a pre-emptive strike the ideal is to know the code and know the data. Search for “important” SQL that applies functions (like trunc(date_col)) to columns in predicates, or does multi-column (equality) joins, or uses filter predicates that reference multiple columns from the same table.

If the execution plans you find for these statements produce estimates which are clearly far from reality then you may need to take pre-emptive action, even (or, perhaps, especially) if the resulting plans look good. The optimizer may (for example) know that it is guessing when it says to itself: “trunc(delivery_date) = trunc(sysdate) will return 1% of the data”, and do several things as a result: create an adaptive execution plan and switch plans in mid-execution; do a load of dynamic sampling to discover the guess was wrong; dump a SQL Plan Directive into the data dictionary that triggers subsequent dynamic sampling; and then create a virtual column through the extended stats mechanism.
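
If you want to keep an eye on what the optimizer has already decided behind the scenes, a simple sketch is to query the directives from the data dictionary:

select
        d.directive_id, o.object_name, o.subobject_name column_name,
        d.type, d.state, d.reason
from
        dba_sql_plan_directives        d,
        dba_sql_plan_dir_objects       o
where
        o.directive_id = d.directive_id
and     o.owner        = user
order by
        d.directive_id
;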

In cases like this you may want to create and document virtual columns explicitly, and create column groups explicitly before Oracle does its dynamic thing. In the case of the column groups you only have to worry about equality predicates, since a column group cannot be used with range-based predicates. Remember that you are only allowed a maximum of 20 column groups per table (although there is, effectively, no limit on the number of virtual columns), so you need to make your choice of column groups before Oracle goes onto auto-pilot.
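
A minimal sketch of the pre-emptive strike – the column names are hypothetical, and after creating the extensions you still have to gather stats on them before they can help:

declare
        m_name  varchar2(32);
begin
        -- a column group for a multi-column (equality) join or filter
        m_name := dbms_stats.create_extended_stats(user, 't1', '(col1, col2)');
        -- a virtual column matching a function applied to a column in predicates
        m_name := dbms_stats.create_extended_stats(user, 't1', '(trunc(delivery_date))');
end;
/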

Remember, there’s a trap waiting here if you don’t make this effort. You might run a test suite a couple of times and fix everything that seems to be necessary without realising that, in the course of your testing, Oracle has created a couple of hundred column groups and virtual columns. Then, when everything is working nicely, you upgrade the production system and all hell breaks loose until the production system has generated the same (we hope) set of virtual columns and column groups. Since there is a limit of 20 column groups per table (or number of columns / 10 if the table has more than 200 columns) you could get unlucky and find that Oracle wants to create 30 column groups, and the 20 that happen to be created first on production may not be the ones it created on the test system.
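
So it’s worth checking at the end of each test cycle what has appeared behind your back; the creator column distinguishes the extensions Oracle created automatically from the ones you created deliberately (T1 is, as ever, hypothetical):

select  extension_name, extension, creator, droppable
from    user_stat_extensions
where   table_name = 'T1'
order by
        extension_name
;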

Out of Range

When a predicate specifies a value that is above the known high value or below the known low value for a column, the optimizer uses a “linear decay” method for cardinality calculations that is based on how far out of the range the value is (as a percentage of the range). In many cases this introduces a slowly increasing error (though for some predicates it can create an immediate catastrophe). There is a special consideration, though, that means you need to be very careful about time-based or sequence-based columns that can go out of range: if you have a column group that includes the column then the optimizer stops using the column group the moment you go out of range. This is probably a rare condition to check for, because on such columns you probably use range-based predicates (e.g. “order_date < trunc(sysdate) - 7”) and column groups are not applicable to range-based predicates anyway, but you do need to be aware that in special cases an execution plan can go bad the moment you overshoot the high value.

For columns like these you need a strategy that allows you to set a high (or low) value that reflects a future high value for the column – even to the extent (for example) of running a scheduler task that calls dbms_stats.set_column_stats() every hour to push the high value forward a little bit.
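
A sketch of what that hourly task might execute – table t1 and column order_date are hypothetical, and note that this simple version discards any histogram on the column (probably not a problem for a time-based or sequence-based column):

declare
        m_distcnt       number;
        m_density       number;
        m_nullcnt       number;
        m_avgclen       number;
        srec            dbms_stats.statrec;
        m_low           date;
begin
        select min(order_date) into m_low from t1;

        dbms_stats.get_column_stats(
                ownname => user,      tabname => 't1',  colname => 'order_date',
                distcnt => m_distcnt, density => m_density,
                nullcnt => m_nullcnt, srec    => srec,  avgclen => m_avgclen
        );

        -- keep the low value, push the high value one hour into the future
        srec.epc    := 2;
        srec.bkvals := null;
        dbms_stats.prepare_column_values(
                srec, dbms_stats.datearray(m_low, sysdate + 1/24)
        );

        dbms_stats.set_column_stats(
                ownname => user,      tabname => 't1',  colname => 'order_date',
                distcnt => m_distcnt, density => m_density,
                nullcnt => m_nullcnt, srec    => srec,  avgclen => m_avgclen
        );
end;
/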

Update [11th Dec 2015]

Since this note is about pre-emptive strategies for reducing the number of problems you run into with 12c, it’s worth reminding you about the new table_cached_blocks table preference, which you can use to get more realistic values for the clustering_factor on indexes. For example:


begin
        -- one-off per table: track the 16 most recently visited table blocks
        dbms_stats.set_table_prefs(user, 't1', 'table_cached_blocks', 16);
        -- re-gathering the index stats picks up the new clustering_factor
        dbms_stats.gather_index_stats(user, 't1_i1');
end;
/

Setting the table_cached_blocks preference for a table means Oracle will remember a history of recently “visited” table blocks as it walks an index to generate the clustering_factor. Although there is no official word, I think that 16 is a good default value for this setting in single instance Oracle, and 16 * {number of instances} might be appropriate for RAC.

The fragment above is just for demo purposes – you wouldn’t set the value every time you gather stats on an index, it’s a one-off exercise for each table – though you could change it for all existing tables in a schema with a single call to dbms_stats.set_schema_prefs().
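
In other words something like the following one-off call – with the usual caveat that the “right” value is something you have to decide for your system:

begin
        dbms_stats.set_schema_prefs(user, 'table_cached_blocks', 16);
end;
/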

The reason why this preference becomes more important in 12c is that the more expensive an execution path is, the longer the optimizer will work to find a better path – and bad settings for the clustering_factor make otherwise good paths look expensive and encourage the optimizer to try all sorts of alternatives before deciding on the final path.

Wrap-up

This is just a brief note to pick up a few key points that you can identify as potential threats before you start upgrading. Obviously it will also be of some help after you’ve upgraded (and are still in test mode) to help you explain some of the changes to execution plans and increases in parse times that you are likely to see. The benefit of pre-emptive action, though, is that you may be able to minimise the number of hidden actions (creation of directives and extended stats) that Oracle takes during your testing. It used to be hard enough doing an upgrade in the past; going up to 12c it can be even harder, because Oracle may have changed your test database behind your back when you weren’t looking, leaving you running a production system that doesn’t match the system you’ve tested.

** Footnote: if you went to Tim Hall’s presentation you’ll know that part of your upgrade process should deal with the move to the multi-tenant architecture, even if you adopt just the single-PDB licence.

 

6 Comments

  1. I think OFE is sometimes a false comfort blanket unless you can find the specific fix control which caused a problem.
    I helped out on such an issue below where an upgrade seemed to cause a problem and OFE seemed to fix it, but the proper fix was to run away from nasty first_rows:
    https://orastory.wordpress.com/2015/11/16/did-it-really-fix-it-1-ofe-nasty-first_rows-plan-on-11-2-0-4/

    Comment by Dom Brooks — December 10, 2015 @ 11:16 am GMT Dec 10,2015 | Reply

  2. Completely perplexed as to why we keep hearing “your code” and other generalizations that totally ignore the reality of data centres nowadays.
    It’s NOT “our” code and hasn’t been in decades! It’s the application’s. Written by someone else. Mostly without any knowledge of SQL and best practices, and horribly optimized for anything larger than the minuscule samples of data in the laptops used to develop those. And most of the time under lock and key of licensing.
    Under these conditions, to suggest that we look at “our code” to fix it before an upgrade is firmly in la-la-land…

    Comment by Noons — December 10, 2015 @ 8:52 pm GMT Dec 10,2015 | Reply

    • Noons,

      First – if you’re the first port of call when the performance is bad then it’s “YOUR” code, whether or not you wrote it.

      Secondly – by a coincidence that wasn’t entirely due to luck the article says nothing about changing application code; it’s mostly about statistics, with some requirement to generate hidden columns and write management code to generate timely stats.

      Comment by Jonathan Lewis — December 11, 2015 @ 8:55 pm GMT Dec 11,2015 | Reply

  3. I ran into the same problem as described here: http://dbakevin.blogspot.be/2014/06/long-parsing-time-issue-bundles-of-sqls.html
    One specific (complex) query got parse times of about 20..30 seconds. After setting _optimizer_cost_based_transformation = off and _b_tree_bitmap_plans = false, parse times got below 1 sec again. I have a reproducible case in my test environment (but have had no time yet to open an SR for it).
    But this proves as well that you have to be careful with the optimizer when upgrading…

    Comment by Geert De Paep — December 11, 2015 @ 11:12 am GMT Dec 11,2015 | Reply

    • Geert,

      Thanks for the comment and link.

      It would be good if you could find time to file an SR. The optimizer group would probably like to hear about examples where an idea that is generally good hits a case where it causes too much time to be spent in optimisation. Without examples of nasty side effects the code won’t change.

      Comment by Jonathan Lewis — December 11, 2015 @ 8:58 pm GMT Dec 11,2015 | Reply

  4. […] to be creating column group stats involving char() columns – and then remember that 12c may generate column group stats automatically for you. If you use char() columns you will have to ensure that predicates using literal values […]

    Pingback by Column Groups | Oracle Scratchpad — April 20, 2016 @ 4:08 pm GMT Apr 20,2016 | Reply

