Oracle Scratchpad

April 10, 2018

exp catch

Filed under: 12c,Histograms,Oracle,Statistics,Upgrades — Jonathan Lewis @ 5:52 pm BST Apr 10,2018

No-one should be using exp/imp to export and import data any more, they should be using the datapump equivalents expdp/impdp – but if you’ve been in the habit of using exp/imp to do things like moving production statistics to test systems on older (pre-12c) versions of Oracle then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).

exp/imp will mess up some of your histograms if you use them in 12c (and above) to move tables/statistics.

Remember that 12c can create “Top-N” and “hybrid” histograms – and exp/imp were written long before these new histogram types came into existence. The code has not been updated to allow for the new histogram types so if you happen to generate any histograms of these type in a 12c system and then use exp/imp to move some table stats (and it’s particularly an issue relating to stats) from one system to another – the stats that arrive at the destination system won’t match the stats that left the source system.

Here’s a little sample code to build a model that I can use to demonstrate the problem. It creates a table with three columns that will make it easy for me to create one frequency histogram, one Top-N histogram and one hybrid histogram. I’ve included a couple of substitution variables in the code so that you can specify an Oracle instance to connect to and a directory for the export file that expdp is going to produce. Don’t forget to check that the directory I create in this script doesn’t overwrite a directory that already exists for other reasons on your test system.


rem
rem     Script:         12c_histograms.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015
rem
rem     Define m_service to be the service name you connect to
rem     Define m_directory to be the O/S directory you to use for
rem     the export/import/log files
rem
rem     Make sure this code is not over-writing an existing 
rem     definition for a directory called DMPDIR before you 
rem     start


define m_service = 'orcl'
define m_service = 'or32'

define m_directory = '/mnt/working'

host rm &m_directory/expdat.dmp
host rm &m_directory/expdp.dmp

create or replace directory dmpdir as '&m_directory';

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        trunc(sqrt(rownum + 0)) frequency,
        trunc(sqrt(rownum + 0)) top_n,
        trunc(sqrt(rownum + 0)) hybrid
from
        generator
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns frequency size 254 for columns top_n size 95 for columns hybrid size 50'
        );
end;
/

select
        column_name,
        num_distinct,
        histogram,
        num_buckets
from
        user_tab_cols
where
        table_name = 'T1'
;

column endpoint_actual_value format a22
break on column_name skip 1

select 
        column_name, 
        endpoint_number, endpoint_value, endpoint_actual_value, 
        endpoint_repeat_count 
from 
        user_tab_histograms 
where 
        table_name = 'T1'
order by 
        column_name, endpoint_number
;

Here’s an extract, from a 12.1.0.2 instance, of the results of the two queries with a large number of the rows from the histogram data deleted:


COLUMN_NAME              Distinct HISTOGRAM          Buckets
-------------------- ------------ --------------- ----------
FREQUENCY                     100 FREQUENCY              100
TOP_N                         100 TOP-FREQUENCY           95
HYBRID                        100 HYBRID                  50

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
-------------------- --------------- -------------- ---------------------- ---------------------
FREQUENCY                          3              1 1                                          0
                                   8              2 2                                          0
                                  15              3 3                                          0
...
                                9800             98 98                                         0
                                9999             99 99                                         0
                               10000            100 100                                        0


HYBRID                             3              1 1                                          3
                                 224             14 14                                        29
                                 440             20 20                                        41
...
                                9603             97 97                                       195
                                9800             98 98                                       197
                               10000            100 100                                        1

TOP_N                              1              1 1                                          0
                                  16              7 7                                          0
                                  33              8 8                                          0
...
                                9753             98 98                                         0
                                9952             99 99                                         0
                                9953            100 100                                        0

The most important detail is the endpoint_repeat_count column of the hybrid histogram, although you should note that the endpoint_actual_value column is populated with a copy of the endpoint_value for all three histograms.

Now I’m going to use exp / drop table / imp to export, drop, and re-import the table with (one hopes) the exact same statistics. To do this I’ll be using the imp command with the option “statistics=always” with the intention of copying the stats from the export file into the destination database (you’ll have to substitute your own userid/password, of course):


host exp   userid=test_user/test@&m_service file=expdat.dmp tables='(t1)'
-- host expdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'

drop table t1 purge;

host imp   userid=test_user/test@&m_service file=expdat.dmp tables='(t1)' statistics=always
-- host impdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'

So what do we see now when we re-run the two queries to report the histogram information. If you’ve used expdp/impdp the original output re-appears. If you’ve used exp/imp you’ll see the following:


COLUMN_NAME              Distinct HISTOGRAM          Buckets
-------------------- ------------ --------------- ----------
FREQUENCY                     100 FREQUENCY              100
TOP_N                         100 FREQUENCY               95
HYBRID                        100 FREQUENCY               50

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
-------------------- --------------- -------------- ---------------------- ---------------------
FREQUENCY                          3              1                                            0
                                   8              2                                            0
                                  15              3                                            0
...
                                9800             98                                            0
                                9999             99                                            0
                               10000            100                                            0

HYBRID                             3              1                                            0
                                 224             14                                            0
                                 440             20                                            0
...
                                9603             97                                            0
                                9800             98                                            0
                               10000            100                                            0

TOP_N                              1              1                                            0
                                  16              7                                            0
                                  33              8                                            0
...
                                9753             98                                            0
                                9952             99                                            0
                                9953            100                                            0

  • The histograms on all three columns are now labelled as FREQUENCY.
  • The endpoint_actual_value is null for all three – but that may be a purely cosmetic detail with no side effects.
  • The histogrm on column hybrid reports the endpoint_repeat_count as zero for every value – the histogram really has become a frequency histogram – and that’s a critical change with potentially huge side effects on execution plans.

tl;dr

If you’re still using exp/imp instead of expdp/impdp to move tables (and, more importantly, their statistics) from one database to another and don’t change tools after an upgrade to 12c you can end up with hybrid histograms on the source system that are “downgraded” to frequency histograms on the destination system, with the effect that execution plans vary between the two systems.

1 Comment »

  1. […] Importing Statistics (April 2018): If you’re still using exp/imp after upgrading to 12c you should make the change or you may lose hybrid and top-frequency histograms on the import. […]

    Pingback by Upgrade catalogue | Oracle Scratchpad — February 2, 2023 @ 12:33 pm GMT Feb 2,2023 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.