Oracle Scratchpad

April 10, 2018

exp catch

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 5:52 pm BST Apr 10,2018

No-one should be using exp/imp to export and import data any more, they should be using the datapump equivalents expdp/impdp – but if you’re on an older (pre-12c) version of Oracle and still using exp/imp to do things like moving tables with their production statistics over to test systems then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).

exp/imp will mess up some of your histograms if you’re still using them to move tables/statistics in 12c.

Remember that 12c can create “Top-N” and “hybrid” histograms – and exp/imp were written long before these new histogram types came into existence. The code has not been updated to allow for the new histogram types so if you happen to generate any histograms of these type in a 12c system and then use exp/imp to move some table stats (and it’s particularly an issue relating to stats) from one system to another – the stats that arrive at the destination system won’t match the stats that left the source system.

Here’s a little sample code to build a model that I can use to demonstrate the problem. It creates a table with three columns that will make it easy for me to create one frequency histogram, one Top-N histogram and one hybrid histogram. I’ve included a couple of substitution variables in the code so that you can specify an Oracle instance to connect to and a directory for the export file that expdp is going to produce. Don’t forget to check that the directory I create in this script doesn’t overwrite a directory that already exists for other reasons on your test system.


rem
rem     Script:         12c_histograms.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015
rem
rem     Define m_service to be the service name you connect to
rem     Define m_directory to be the O/S directory you to use for
rem     the export/import/log files
rem
rem     Make sure this code is not over-writing an existing 
rem     definition for a directory called DMPDIR before you 
rem     start


define m_service = 'orcl'
define m_service = 'or32'

define m_directory = '/mnt/working'

host rm &m_directory/expdat.dmp
host rm &m_directory/expdp.dmp

create or replace directory dmpdir as '&m_directory';

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        trunc(sqrt(rownum + 0)) frequency,
        trunc(sqrt(rownum + 0)) top_n,
        trunc(sqrt(rownum + 0)) hybrid
from
        generator
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns frequency size 254 for columns top_n size 95 for columns hybrid size 50'
        );
end;
/

select
        column_name,
        num_distinct,
        histogram,
        num_buckets
from
        user_tab_cols
where
        table_name = 'T1'
;

column endpoint_actual_value format a22
break on column_name skip 1

select 
        column_name, 
        endpoint_number, endpoint_value, endpoint_actual_value, 
        endpoint_repeat_count 
from 
        user_tab_histograms 
where 
        table_name = 'T1'
order by 
        column_name, endpoint_number
;

Here’s an extract, from a 12.1.0.2 instance, of the results of the two queries with a large number of the rows from the histogram data deleted:


COLUMN_NAME              Distinct HISTOGRAM          Buckets
-------------------- ------------ --------------- ----------
FREQUENCY                     100 FREQUENCY              100
TOP_N                         100 TOP-FREQUENCY           95
HYBRID                        100 HYBRID                  50

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
-------------------- --------------- -------------- ---------------------- ---------------------
FREQUENCY                          3              1 1                                          0
                                   8              2 2                                          0
                                  15              3 3                                          0
...
                                9800             98 98                                         0
                                9999             99 99                                         0
                               10000            100 100                                        0


HYBRID                             3              1 1                                          3
                                 224             14 14                                        29
                                 440             20 20                                        41
...
                                9603             97 97                                       195
                                9800             98 98                                       197
                               10000            100 100                                        1

TOP_N                              1              1 1                                          0
                                  16              7 7                                          0
                                  33              8 8                                          0
...
                                9753             98 98                                         0
                                9952             99 99                                         0
                                9953            100 100                                        0

The most important detail is the endpoint_repeat_count column of the hybrid histogram, although you should note that the endpoint_actual_value columns is populated with a copy of the endpoint_value for all three histograms.

Now I’m going to use exp / drop table / imp (or the datapump equivalents) to export, drop, and re-import the table with (one hopes) the exact same statistics. To do this I’ll be using the imp command with the option “statistics=always” with the intention of copying the stats from the export file into the destination database (you’ll have to substitute your own userid/password, of course):


host exp   userid=test_user/test@&m_service file=expdat.dmp tables='(t1)'
-- host expdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'

drop table t1 purge;

host imp   userid=test_user/test@&m_service file=expdat.dmp tables='(t1)' statistics=always
-- host impdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'

So what do we see now when we re-run the two queries to report the histogram information:


COLUMN_NAME              Distinct HISTOGRAM          Buckets
-------------------- ------------ --------------- ----------
FREQUENCY                     100 FREQUENCY              100
TOP_N                         100 FREQUENCY               95
HYBRID                        100 FREQUENCY               50

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
-------------------- --------------- -------------- ---------------------- ---------------------
FREQUENCY                          3              1                                            0
                                   8              2                                            0
                                  15              3                                            0
...
                                9800             98                                            0
                                9999             99                                            0
                               10000            100                                            0

HYBRID                             3              1                                            0
                                 224             14                                            0
                                 440             20                                            0
...
                                9603             97                                            0
                                9800             98                                            0
                               10000            100                                            0

TOP_N                              1              1                                            0
                                  16              7                                            0
                                  33              8                                            0
...
                                9753             98                                            0
                                9952             99                                            0
                                9953            100                                            0

The histograms on all three columns are now labelled as FREQUENCY.
The endpoint_actual_value is null for all three – but that may be a purely cosmetic detail with no side effects.
The “hybrid” column really has become a frequency histogram – and that’s the critical one – the endpoint_repeat_count columns are all zero.

tl;dr

If you’re still using exp/imp instead of expdp/impdp to move tables (and, more importantly, their statistics) from one database to another then the upgrade to 12c may mean you end up with hybrid histograms on the source system that are “downgraded” to frequency histograms on the destination system, with the effect that execution plans vary between the two systems.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

Powered by WordPress.com.