No-one should be using exp/imp to export and import data any more, they should be using the datapump equivalents expdp/impdp – but if you’ve been in the habit of using exp/imp to do things like moving production statistics to test systems on older (pre-12c) versions of Oracle then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).
exp/imp will mess up some of your histograms if you use them in 12c (and above) to move tables/statistics.
Remember that 12c can create “Top-N” and “hybrid” histograms – and exp/imp were written long before these new histogram types came into existence. The code has not been updated to allow for the new histogram types so if you happen to generate any histograms of these type in a 12c system and then use exp/imp to move some table stats (and it’s particularly an issue relating to stats) from one system to another – the stats that arrive at the destination system won’t match the stats that left the source system.
Here’s a little sample code to build a model that I can use to demonstrate the problem. It creates a table with three columns that will make it easy for me to create one frequency histogram, one Top-N histogram and one hybrid histogram. I’ve included a couple of substitution variables in the code so that you can specify an Oracle instance to connect to and a directory for the export file that expdp is going to produce. Don’t forget to check that the directory I create in this script doesn’t overwrite a directory that already exists for other reasons on your test system.
rem rem Script: 12c_histograms.sql rem Author: Jonathan Lewis rem Dated: Sep 2015 rem rem Define m_service to be the service name you connect to rem Define m_directory to be the O/S directory you to use for rem the export/import/log files rem rem Make sure this code is not over-writing an existing rem definition for a directory called DMPDIR before you rem start define m_service = 'orcl' define m_service = 'or32' define m_directory = '/mnt/working' host rm &m_directory/expdat.dmp host rm &m_directory/expdp.dmp create or replace directory dmpdir as '&m_directory'; drop table t1 purge; create table t1 nologging as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 -- > comment to avoid wordpress format issue ) select trunc(sqrt(rownum + 0)) frequency, trunc(sqrt(rownum + 0)) top_n, trunc(sqrt(rownum + 0)) hybrid from generator ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for columns frequency size 254 for columns top_n size 95 for columns hybrid size 50' ); end; / select column_name, num_distinct, histogram, num_buckets from user_tab_cols where table_name = 'T1' ; column endpoint_actual_value format a22 break on column_name skip 1 select column_name, endpoint_number, endpoint_value, endpoint_actual_value, endpoint_repeat_count from user_tab_histograms where table_name = 'T1' order by column_name, endpoint_number ;
Here’s an extract, from a 12.1.0.2 instance, of the results of the two queries with a large number of the rows from the histogram data deleted:
COLUMN_NAME Distinct HISTOGRAM Buckets -------------------- ------------ --------------- ---------- FREQUENCY 100 FREQUENCY 100 TOP_N 100 TOP-FREQUENCY 95 HYBRID 100 HYBRID 50 COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE ENDPOINT_REPEAT_COUNT -------------------- --------------- -------------- ---------------------- --------------------- FREQUENCY 3 1 1 0 8 2 2 0 15 3 3 0 ... 9800 98 98 0 9999 99 99 0 10000 100 100 0 HYBRID 3 1 1 3 224 14 14 29 440 20 20 41 ... 9603 97 97 195 9800 98 98 197 10000 100 100 1 TOP_N 1 1 1 0 16 7 7 0 33 8 8 0 ... 9753 98 98 0 9952 99 99 0 9953 100 100 0
The most important detail is the endpoint_repeat_count column of the hybrid histogram, although you should note that the endpoint_actual_value column is populated with a copy of the endpoint_value for all three histograms.
Now I’m going to use exp / drop table / imp to export, drop, and re-import the table with (one hopes) the exact same statistics. To do this I’ll be using the imp command with the option “statistics=always” with the intention of copying the stats from the export file into the destination database (you’ll have to substitute your own userid/password, of course):
host exp userid=test_user/test@&m_service file=expdat.dmp tables='(t1)' -- host expdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)' drop table t1 purge; host imp userid=test_user/test@&m_service file=expdat.dmp tables='(t1)' statistics=always -- host impdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'
So what do we see now when we re-run the two queries to report the histogram information. If you’ve used expdp/impdp the original output re-appears. If you’ve used exp/imp you’ll see the following:
COLUMN_NAME Distinct HISTOGRAM Buckets -------------------- ------------ --------------- ---------- FREQUENCY 100 FREQUENCY 100 TOP_N 100 FREQUENCY 95 HYBRID 100 FREQUENCY 50 COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE ENDPOINT_REPEAT_COUNT -------------------- --------------- -------------- ---------------------- --------------------- FREQUENCY 3 1 0 8 2 0 15 3 0 ... 9800 98 0 9999 99 0 10000 100 0 HYBRID 3 1 0 224 14 0 440 20 0 ... 9603 97 0 9800 98 0 10000 100 0 TOP_N 1 1 0 16 7 0 33 8 0 ... 9753 98 0 9952 99 0 9953 100 0
- The histograms on all three columns are now labelled as FREQUENCY.
- The endpoint_actual_value is null for all three – but that may be a purely cosmetic detail with no side effects.
- The histogrm on column hybrid reports the endpoint_repeat_count as zero for every value – the histogram really has become a frequency histogram – and that’s a critical change with potentially huge side effects on execution plans.
tl;dr
If you’re still using exp/imp instead of expdp/impdp to move tables (and, more importantly, their statistics) from one database to another and don’t change tools after an upgrade to 12c you can end up with hybrid histograms on the source system that are “downgraded” to frequency histograms on the destination system, with the effect that execution plans vary between the two systems.
[…] Importing Statistics (April 2018): If you’re still using exp/imp after upgrading to 12c you should make the change or you may lose hybrid and top-frequency histograms on the import. […]
Pingback by Upgrade catalogue | Oracle Scratchpad — February 2, 2023 @ 12:33 pm GMT Feb 2,2023 |