Oracle Scratchpad

February 16, 2017

Truncate 12c

Filed under: 12c,Infrastructure,Oracle — Jonathan Lewis @ 12:52 pm GMT Feb 16,2017

Here’s one of those little improvements in 12c (including 12.1) that will probably end up being described as “little known features” in about 3 years time. Arguably it’s one of those little things that no-one should care about because it’s not the sort of thing you should do on a production system, but that doesn’t mean it won’t be seen in the wild.

Rather than simply state the feature I’m going to demonstrate it, starting with a little code to build a couple of tables with referential integrity:


create table parent (
        id      number(4),
        name    varchar2(10),
        constraint par_pk primary key (id)
)
;

create table child(
        id_p    number(4)
                        constraint chi_fk_par
                        references parent
                        on delete cascade,
        id      number(4),
        name    varchar2(10),
        constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1,1,'Sally');
insert into child values(1,2,'Simon');

insert into child values(2,1,'Jack');
insert into child values(2,2,'Jill');

commit;


There’s one important detail in this code that isn’t taking the default and isn’t used very frequently – it’s the option on the foreign key to take the action “on delete cascade”. If you delete a row from the parent table then Oracle will automatically delete any referenced rows from the child table first thus avoiding the error ORA-02292: integrity constraint (TEST_USER.CHI_FK_PAR) violated – child record found. (Conveniently I have a suitable index on the child table that will bypass the problem of a mode 4 (or, where child rows already exist, mode 5) TM lock being taken on the child as the parent row is deleted.)
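As a quick sanity check of the cascade in action (a sketch, not part of the original demonstration):

delete from parent where id = 1;

select count(*) from child where id_p = 1;      -- returns 0: the child rows went too

rollback;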

And here’s the demonstration of the new feature – working in 12.1 onwards:


truncate table parent;

truncate table parent cascade;

The first command will raise Oracle error ORA-02266: unique/primary keys in table referenced by enabled foreign keys, but the second command will truncate the parent and child tables “simultaneously”: but only if the referential integrity constraint is set to “on delete cascade”. If the referential integrity constraint is left to its default action then the second command will raise error: ORA-14705: unique or primary keys referenced by enabled foreign keys in table “TEST_USER”.”CHILD”

This feature (and several broadly similar features) also works with matching partitions of equi-partitioned (or ref partitioned) tables – and that’s a context where the requirement is much more likely to appear than with non-partitioned tables.
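A minimal sketch of that partitioned case (12.1 onwards; the table names, partition names and dates are purely illustrative, not taken from a real system):

create table orders (
        id      number(8),
        dt      date,
        constraint ord_pk primary key (id)
)
partition by range (dt) (
        partition p2016 values less than (to_date('01-Jan-2017','dd-mon-yyyy')),
        partition p2017 values less than (to_date('01-Jan-2018','dd-mon-yyyy'))
)
;

create table order_lines (
        id_o    number(8)
                        constraint orl_fk_ord
                        references orders
                        on delete cascade,
        id      number(4),
        constraint orl_pk primary key (id_o, id)
)
partition by reference (orl_fk_ord)
;

-- truncate one parent partition and its matching child partition in one step
alter table orders truncate partition p2016 cascade;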

 

February 13, 2017

Band Join 12c

Filed under: 12c,Execution plans,Oracle,Performance,Upgrades — Jonathan Lewis @ 1:53 pm GMT Feb 13,2017

One of the optimizer enhancements that appeared in 12.2 for SQL is the “band join”, which makes certain types of merge join much more efficient. Consider the following query (I’ll supply the SQL to create the demonstration at the end of the posting) which joins two tables of 10,000 rows each using a “between” predicate on a column which (just to make it easy to understand the size of the result set) happens to be unique with sequential values, though there’s no index or constraint in place:

select
        t1.v1, t2.v1
from
        t1, t2
where
        t2.id between t1.id - 1
                  and t1.id + 2
;

This query returns nearly 40,000 rows. Except for the values at the extreme ends of the range each of the 10,000 rows in t2 will join to 4 rows in t1 thanks to the simple sequential nature of the data. In 12.2 the query, with rowsource execution stats enabled, completed in 1.48 seconds. In 12.1.0.2 the query, with rowsource execution stats OFF, took a little over 14 seconds. (With rowsource execution stats enabled it took 12.1.0.2 a little over 1 minute to return the first 5% of the data – I didn’t bother to wait for the rest, though the rate would have improved over time.)
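For reference, one standard way of capturing and reporting rowsource execution statistics is shown below – this isn’t necessarily the exact mechanism used for the timings above:

alter session set statistics_level = all;

select
        t1.v1, t2.v1
from
        t1, t2
where
        t2.id between t1.id - 1
                  and t1.id + 2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));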

Here are the two execution plans – spot the critical difference:


12.1.0.2
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   1 |  MERGE JOIN          |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   2 |   SORT JOIN          |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   3 |    TABLE ACCESS FULL | T1   | 10000 |   146K|    27   (4)| 00:00:01 |
|*  4 |   FILTER             |      |       |       |            |          |
|*  5 |    SORT JOIN         |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   | 10000 |   146K|    27   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ID"<="T1"."ID"+2)
   5 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID">="T1"."ID"-1)

12.2.0.1
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   1 |  MERGE JOIN         |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   2 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   146K|    25   (4)| 00:00:01 |
|*  4 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   | 10000 |   146K|    25   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID"<="T1"."ID"+2 AND "T2"."ID">="T1"."ID"-1)

Notice how the FILTER at operation 4 that appeared in 12.1 has disappeared in 12.2, and the filter predicate it used to hold is now part of the filter predicate of the SORT JOIN that has been promoted to operation 4 in the new plan.

As a reminder – the MERGE JOIN operates as follows: for each row returned by the SORT JOIN at operation 2 it calls operation 4. In 12.1 this example will then call operation 5 so the SORT JOIN there happens 10,000 times. It’s important to know, though, that the name of the operation is misleading; what’s really happening is that Oracle is “probing a sorted result set in local memory” 10,000 times – it’s only on the first probe that it finds it has to call operation 6 to read and move the data into local memory in sorted order.

So in 12.1 operation 5 probes (accesses) the in-memory data set starting at the point where t2.id >= t1.id – 1; I believe there’s an optimisation here because Oracle will recall where it started the probe last time and resume searching from that point. Having found the first point in the in-memory set where the access predicate is true, Oracle will walk through the list passing each row back to the FILTER operation as long as the access predicate is still true – and it will be true right up until the end of the list. As each row arrives at the FILTER operation Oracle checks to see if the filter predicate there is true and passes the row up to the MERGE JOIN operation if it is. We know that on each cycle the FILTER operation will start returning false after receiving 4 rows from the SORT JOIN operation – Oracle doesn’t. On average the SORT JOIN operation will send 5,000 rows to the FILTER operation (for a total of 50,000,000 values passed and discarded).

In 12.2, and for the special case here where the join predicate uses constants to define the range, Oracle has re-engineered the code to eliminate the FILTER operation and to test both parts of the between clause in the same subroutine it uses to probe and scan the rowsource. In 12.2 the SORT JOIN operation will pass 4 rows up to the MERGE JOIN operation and stop scanning on the fifth row it reaches. In my examples that’s an enormous (CPU) saving in subroutine calls and redundant tests.

Footnote:

This “band-join” mechanism only applies when the range is defined by constants (whether literal or bind variable). It doesn’t work with predicates like (e.g.):

where t2.id between t1.id - t1.step_back and t1.id + t1.step_forward

The astonishing difference in performance due to enabling rowsource execution statistics is basically due to the number of subroutine calls eliminated – I believe (subject to a hidden parameter that controls a “sampling frequency”) that Oracle will call the O/S clock twice each time it calls the second SORT JOIN operation from the FILTER operation to acquire the next row. In 12.1 we’re doing roughly 50M redundant calls to that SORT JOIN.

The dramatic difference in performance even when rowsource execution statistics aren’t enabled is probably something you won’t see very often in a production system – after all, I engineered a fairly extreme data set and query for the purposes of demonstration. Note, however, that the band join does seem to introduce a change in cost, so it’s possible that on the upgrade you may find a few cases where the optimizer will switch from a nested loop join to a merge join using a band join.
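A minimal sketch of a data set matching the description above (two 10,000-row tables with a sequential unique id and no indexes or constraints) – not necessarily the original script:

create table t1
as
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        all_objects
where
        rownum <= 10000
;

create table t2
as
select * from t1
;

begin
        dbms_stats.gather_table_stats(user,'t1');
        dbms_stats.gather_table_stats(user,'t2');
end;
/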

February 9, 2017

Index bouncy scan

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 1:05 pm GMT Feb 9,2017

There’s a thread running on OTN at present about deleting huge volumes of duplicated data from a table (to reduce it from 1.1 billion to about 22 million rows). The thread isn’t what I’m going to talk about, though, other than quoting some numbers from it to explain what this post is about.

An overview of the requirement suggests that a file of about 2.2 million rows is loaded into the table every week with (historically) no attempt to delete duplicates. As a file is loaded into the table every row gets the same timestamp, which is the sysdate at load time. I thought it would be useful to know how many different timestamps there were in the whole table. (From an averaging viewpoint, 1.1 billion rows at 2.2 million rows per week suggests about 500 dates/files/weeks – or about 9.5 years – but since the table relates to “customer accounts” it seems likely that the file was originally smaller and has grown over time, which means the history may be rather longer than that.)

Conveniently there is an index on the “input_user_date” column in the table so we might feel happy running a query that simply does:


select
        distinct input_user_date
from
        customer_account
order by
        input_user_date
;

We might then refine the query to do a count(*) aggregate, or do some analytics to find any strange gaps in the timing of the weekly loads. However, all I’m really interested in is the number of dates because I’ve suggested we could de-duplicate the data by running a PL/SQL process that does a simple job for each date in turn, and I want to get an idea of how many times that job will run so that I can estimate how long the entire process might take.

The trouble with the basic query is that the table is (as you probably noticed) rather large, and so is the index. If we assume 8 bytes (which includes the length byte) for a date, 7 bytes for the rowid, 4 bytes overhead, and 100% packing we get about 420 index entries per leaf block, so with 1.1 billion entries the index is about 2.6 million leaf blocks. If the index had been built with compression (which means you’d only be recording a date once per leaf block) it would still be about 1.6 million leaf blocks. Fortunately we wouldn’t have to do much “real” sorting to report just a list of distinct values, or even the count(*) for each date, if we made Oracle use an index full scan – but it’s still a lot of work to read 1.6 million blocks (possibly using single block reads) and do even something as simple as a running count as you go. So I whipped up a quick and dirty bit of PL/SQL to do the job.

declare
        m_d1 date := to_date('01-Jan-0001');
        m_d2 date := to_date('01-Jan-0001');
        m_ct number := 0;
begin
        loop
                select
                        min(input_user_date)
                into
                        m_d2
                from
                        customer_account
                where
                        input_user_date > m_d1
                ;

                exit when m_d2 is null;

                m_ct := m_ct + 1;
                dbms_output.put_line('Count: ' || m_ct || '  Date: ' || m_d2);
                m_d1 := m_d2;

        end loop;
end;
/

The code assumes that the input_user_date hasn’t gone back to a silly date in the past to represent a “null date” (which shouldn’t exist anyway). If you want to use code like this but have a problem with a special “low-value” then you would probably be safest adding a prequel SQL statement that selects the min(columnX) where columnX is not null to get the starting value instead of using a constant as I have done.
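A sketch of that safer initialisation – put a statement like this immediately after the begin (and drop the constant initialisation of m_d1):

        select
                min(input_user_date) - 1       -- subtract one day so the first pass of the loop still picks up the true minimum
        into
                m_d1
        from
                customer_account
        where
                input_user_date is not null
        ;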

The execution path for the SQL statement should be an index-only: “index range scan (min/max)” which typically requires only 3 or 4 logical I/Os to find the relevant item for each date (which compares well with the estimated 2,200,000 / 420 = 5,238 leaf blocks we would otherwise have to scan through for each date). Here’s the path you should see:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |            |          |
|   2 |   FIRST ROW                  |       |     1 |     8 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| CA_I1 |     1 |     8 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("INPUT_USER_DATE">:B1)

I did build a little data set as a proof of concept – and produced a wonderful example of how the scale and the preceding events make a difference that requires you to look very closely at what has happened. I used a table t1 in my example with a column d1, but apart from the change in names the PL/SQL block was as above. Here’s the code I used to create the data and prepare for the test:


create table t1 nologging
as
select
        trunc(sysdate) + trunc((rownum - 1)/100) d1,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 50000
;

execute dbms_stats.gather_table_stats(user,'t1')
alter table t1 modify d1 not null;

create index t1_i1 on t1(d1) nologging pctfree 95
;

select index_name, leaf_blocks from user_indexes;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

My data set has 500 dates with 100 rows per date, and the pctfree setting for the index gives me an average of about 8 leaf blocks per date (for a total of 4,167 leaf blocks). It’s only a small index so I’m expecting to see just 2 or 3 LIOs per date, and a total of about 500 physical reads (one per date plus a handful for reading branch blocks). Here’s the output from running tkprof against the trace file:


SELECT MIN(D1)
FROM
 T1 WHERE D1 > :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    501      0.00       0.01          0          0          0           0
Fetch      501      0.08       0.18       4093       1669          0         501
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1003      0.09       0.19       4093       1669          0         501

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=3 pr=64 pw=0 time=9131 us)
         1          1          1   FIRST ROW  (cr=3 pr=64 pw=0 time=9106 us cost=3 size=8 card=1)
         1          1          1    INDEX RANGE SCAN (MIN/MAX) T1_I1 (cr=3 pr=64 pw=0 time=9089 us cost=3 size=8 card=1)(object id 252520)

I’ve done a physical read of virtually every single block in the index; but I have done only 3 buffer gets per date – doing fewer buffer gets than physical reads.

I’ve been caught by two optimisations (which turned out to be “pessimisations” in my test): I’ve flushed the buffer cache, so the Oracle runtime engine has decided to consider “warming up” the cache by reading extra blocks from any popular-looking objects that I’m accessing, and the optimizer may have given the run-time engine enough information to allow it to recognise that this index is subject to range scans and could therefore be a suitable object to use while warming up. As you can see from the following extracts from session events and session activity stats – we’ve done a load of multiblock reads through the index.


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               1           0           0.03        .031           6
db file scattered read                              136           0          13.54        .100           1


Name                                                                     Value
----                                                                     -----
physical reads                                                           4,095
physical reads cache                                                     4,095
physical read IO requests                                                  137
physical reads cache prefetch                                            3,958
physical reads prefetch warmup                                           3,958

This isn’t likely to happen, of course, in the production system where we’ll be starting with a fully loaded cache and the leaf blocks we need are (logically) spaced apart by several thousand intervening blocks.

Footnote

I can’t remember who first brought this strategy to my attention – though I’m fairly sure it was one of my Russian colleagues, who has blogged about ways to work around what is effectively a limitation of the “index skip scan”. Apologies to the originator, and if you recognise your work here please add a comment with URL below.

February 2, 2017

Upgrades

Filed under: Uncategorized — Jonathan Lewis @ 1:38 pm GMT Feb 2,2017

This is a note I wrote a couple of years ago, but never published. Given the way it’s written I think it may have been the outline notes for a presentation that I was thinking about rather than an attempt to write a little essay. Since it covers a number of points that are worth considering and since I’ve just rediscovered it by accident I thought I’d publish it pretty much as is. Many of the examples of change are now quite old – the intent was to demonstrate how to be cautious rather than trying to supply every possible change that might affect your next upgrade.

We start with a few general observations:

  • The effort worth spending to minimise the risk of performance-related surprises on an upgrade depends on how critical the system is.
  • The effort needed to minimise the risk of performance-related surprises on an upgrade depends on how complex the system is.
  • The more “rare” features and “cunning strategies” you use, and the greater the level of concurrent activity, the more likely you are to find surprising effects.

Typical causes of problems are:

  • New automatic jobs installed by Oracle, which might cause extra load during overnight batch tasks
    • e.g. automatic stats collection (10g)
    • Automatic evolution of baselines (12c)
  • Changes to existing packaged procedures
    • e.g. switch to atomic refresh of MVs (11g)
      • changed the time take to do the refresh itself and added a new load to the redo log activity
  • automatic histograms (10g)
    • changed the time taken to collect stats
    • changed lots of execution plans as a side effect
    • changed the behaviour of cursor_sharing=similar as a side effect,
      • which increased CPU and library cache loading
  • “Notable changes” in behaviour
    • e.g. The standard audit trail no longer updates aud$ rows, it always inserts new records
      • This had side effects on redo generation
      • This also allowed Oracle to drop an index, with side effects on typical DBA queries
    • Oracle also introduced a timestamp with time zone as the “audit timestamp”
      • This had a catastrophic side effect on a particular OEM “failed logins” query
  • New optimizer features appear
    • This can introduce lots of new plans – and a few may perform badly for your data set
  • Optimizer bugs will be fixed
    • This can also introduce new plans – and a few may perform badly for your data set
  • New stats collection mechanisms
    • e.g. approximate_ndv with auto_sample_size (11g)
    • If introduced by default you could get unlucky
      • (See histogram comment above, for example).
    • If not introduced by default (now) you might want to test and adopt them on the upgrade
    • This may require changes to your current coding, and checks for bad changes in plans.
  • Concurrency issues
    • Could introduce side effects in locking, latching, mutexes on busy systems
    • Very hard to test without knowing the system
    • e.g. partition exchange and query optimisation colliding (various changes in 11g)

Background reading that can prompt you to look at particular features of your application that might be at risk:

  • Read the “New Features” guide
  • Read the “Changes in this release” section of the various reference guides
  • Check MoS for upgrade notes for the version you are aiming at
  • Search MoS for any documents with the text: “Notable changes {in|of} behaviour”
  • Check MoS for bug fix notes for later releases (e.g. 12.1 if upgrading to 11.2)

Test as much as possible, and as realistically as possible. You could take advantage of flashback database to repeat large test suites starting from the same point in time with the same data set, but making changes to database structures or code strategies between tests. Proper concurrency tests are ultimately going to be the hardest things to do right – but you have to work on it (or understand Oracle’s architecture really well) if you are upgrading a highly concurrent system.
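A sketch of the flashback database mechanics (standard commands, run as a suitably privileged user with flashback logging available):

create restore point before_test guarantee flashback database;

-- run the test suite, then rewind the whole database:

shutdown immediate
startup mount
flashback database to restore point before_test;
alter database open resetlogs;

-- when the restore point is no longer needed:
drop restore point before_test;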

 

January 30, 2017

ASSM Help

Filed under: 12c,Oracle,Troubleshooting — Jonathan Lewis @ 12:33 pm GMT Jan 30,2017

I’ve written a couple of articles in the past about the problems of ASSM spending a lot of time trying to find blocks with usable free space. Without doing a bit of rocket science with some x$ objects, or O/S tracing for the relevant calls, or enabling a couple of nasty events, it’s not easy proving that ASSM might be a significant factor in a performance problem – until you get to 12c Release 2 where a staggering number of related statistics appear in v$sysstat.

I’ve published the full list of statistics (without explanation) at the end of this note, but here’s just a short extract showing the changes in my session’s ASSM stats due to a little PL/SQL loop inserting 10,000 rows, one row at a time into an empty table with a single index:

Name                                  Value
----                                  -----
ASSM gsp:get free block                 185
ASSM cbk:blocks examined                185
ASSM gsp:L1 bitmaps examined            187
ASSM gsp:L2 bitmaps examined              2
ASSM gsp:Search hint                      2
ASSM gsp:good hint                      185

It looks like we’ve checked a couple of “level 2” bitmap blocks (one for the table, one for the index, presumably) to pick a sequence of “level 1” bitmap blocks that have been very good at taking us to a suitable data (table or index) block that can be used.

You might have expected to see numbers more like 10,000 in the output, but remember that PL/SQL has lots of little optimisations built into it and one of those is that it pins a few blocks while the anonymous block is running so it doesn’t have to keep finding blocks for every single row.
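For reference, the load was of this general form (a sketch – the table name and row content are assumptions rather than the original script):

begin
        for i in 1..10000 loop
                insert into t1 values (i, rpad('x',100));
        end loop;
        commit;
end;
/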

In comparison here’s the effect of the same data load when operated as 10,000 separate insert statements called from SQL*Plus:

Name                                  Value
----                                  -----
ASSM gsp:get free block              10,019
ASSM cbk:blocks examined             10,220
ASSM cbk:blocks marked full             201
ASSM gsp:L1 bitmaps examined         10,029
ASSM gsp:L2 bitmaps examined              6
ASSM gsp:L2 bitmap full                   1
ASSM gsp:Search all                       1
ASSM gsp:Search hint                      2
ASSM gsp:Search steal                     1
ASSM gsp:bump HWM                         1
ASSM gsp:good hint                   10,016
ASSM rsv:fill reserve                     1

It’s interesting to note that in this case we see (I assume) a few cases where we’ve done the check for an L1 bitmap block, gone to a data block that was apparently free, and discovered that our insert would make it more than full – hence the 201 “blocks marked full”.

Critically, of course, this is just another of the many little indications of how “client/server” chatter introduces lots of little bits of extra work when compared to the “Thick DB” approach.

One final set of figures. Going back to an example that first alerted me to the type of performance catastrophes that ASSM could contribute to, I re-ran my test case on 12.2 and checked the ASSM figures reported. The problem was that a switch from a 4KB or 8KB block size to a 16KB block size produced a performance disaster. A version of my test case and some timing results are available on Greg Rahn’s site.

In my test case I have 830,000 rows and do an update that sets column2 to column1, changing it from null to an 8-digit value. With a 16KB block size and PCTFREE set to a highly inappropriate value (in this case the default value of 10) this is what the new ASSM statistics look like:


Name                                  Value
----                                  -----
ASSM gsp:get free block             668,761
ASSM cbk:blocks examined            671,404
ASSM cbk:blocks marked full           2,643
ASSM gsp:L1 bitmaps examined      1,338,185
ASSM gsp:L2 bitmaps examined        672,413
ASSM gsp:Search all                     332
ASSM gsp:Search hint                668,760
ASSM gsp:Search steal                   332
ASSM gsp:bump HWM                       332
ASSM wasted db state change         669,395

I’d love to know what the figures would have looked like if they had been available in the original Oracle 9.2.0.8 case (my guess is that the “blocks examined” statistic would have been in the order of hundreds of millions); they look fairly harmless in this case even though the database (according to some of the other instance activity stats) did roughly 10 times the work you might expect from a perfect configuration.

Even here, though, where the original catastrophic bug has been addressed, the ASSM stats give you an important clue: we’ve been doing a simple update so why have we even been looking for free space (get free block); even stranger, how come we had to examine 1.3M L1 bitmaps when we’ve only updated 830,000 rows – surely the worst-case scenario shouldn’t have been worse than 1 to 1; and then there’s that “wasted db state change” – I don’t understand exactly what that last statistic is telling me but when I’m worried about performance I tend to worry about anything that’s being wasted.

In passing – if you want to insert a single row into an unindexed table you can expect Oracle to examine the segment header, then an L2 bitmap block, then an L1 bitmap block to find a data block for the insert. (In rare cases that might be segment header, L3, L2, L1.) There are then optimisation strategies for pinning blocks – the session will pin the L1 bitmap block briefly because it may have to check several of the data blocks it references, since they may turn out to be full even though they are flagged as having space; similarly the session will pin the L2 bitmap block because it may need to mark an L1 bitmap block as full and check another L1 block. The latter mechanism probably explains why we have examined more L1 bitmaps than L2 bitmaps.

Finally, the full monty

Just a list of all the instance statistics that start with “ASSM”:

ASSM bg: segment fix monitor
ASSM bg:create segment fix task
ASSM bg:mark segment for fix
ASSM bg:slave compress block
ASSM bg:slave fix one segment
ASSM bg:slave fix state
ASSM cbk:blocks accepted
ASSM cbk:blocks examined
ASSM cbk:blocks marked full
ASSM cbk:blocks rejected
ASSM fg: submit segment fix task
ASSM gsp:Alignment unavailable in space res
ASSM gsp:L1 bitmaps examined
ASSM gsp:L2 bitmap full
ASSM gsp:L2 bitmaps examined
ASSM gsp:L3 bitmaps examined
ASSM gsp:Optimized data block rejects
ASSM gsp:Optimized index block rejects
ASSM gsp:Optimized reject DB
ASSM gsp:Optimized reject l1
ASSM gsp:Optimized reject l2
ASSM gsp:Search all
ASSM gsp:Search hint
ASSM gsp:Search steal
ASSM gsp:add extent
ASSM gsp:blocks provided by space reservation
ASSM gsp:blocks rejected by access layer callback
ASSM gsp:blocks requested by space reservation
ASSM gsp:bump HWM
ASSM gsp:get free block
ASSM gsp:get free critical block
ASSM gsp:get free data block
ASSM gsp:get free index block
ASSM gsp:get free lob block
ASSM gsp:good hint
ASSM gsp:reject L1
ASSM gsp:reject L2
ASSM gsp:reject L3
ASSM gsp:reject db
ASSM gsp:space reservation success
ASSM gsp:use space reservation
ASSM rsv:alloc from reserve
ASSM rsv:alloc from reserve fail
ASSM rsv:alloc from reserve succ
ASSM rsv:clear reserve
ASSM rsv:fill reserve
ASSM wasted db state change
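If you want to report your own session’s figures, a simple query against v$mystat and v$statname does the job (a sketch):

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name like 'ASSM%'
and     ms.value != 0
order by
        sn.name
;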

January 27, 2017

DBaaS Performance

Filed under: 12c,Cloud,Oracle — Jonathan Lewis @ 7:58 am GMT Jan 27,2017

I don’t know how I missed it but Randolf Geist has been writing a series of posts on the performance of Oracle’s DBaaS offering, using a series of long-running tests to capture not only raw performance figures but also an indication of consistency. You can find all of these tests with a search URL on his blog, but I’ve also created a little index here to make it easier for me to access them in order.

Oracle Database Cloud (DBaaS) Performance Consistency

Oracle Database Cloud (DBaaS) Performance

h/t to Connor McDonald for the tweet that took me back to Randolf’s blog.

January 26, 2017

Basicfile LOBs

Filed under: Infrastructure,LOBs,Oracle — Jonathan Lewis @ 12:03 pm GMT Jan 26,2017

I wrote a short series a little while ago about some of the nasty things that can happen (and can’t really be avoided) with Basicfile LOBs and recently realised that it needed a directory entry so that I didn’t have to supply 6 URLs if I wanted to point someone to it; so here’s the catalogue:

At some stage I may also write a similar series about Securefile LOBs – because you do hit problems if you have a system that does a lot of work modifying a LOB segment whether or not it’s Basicfile or Securefile, and you need a strategy for damage limitation.

Footnote

At the time of creating this catalogue I’ve had an SR open with Oracle for about 4 months on the problem that triggered this series, basically asking if there was a way to limit the number of chunks that could be taken off the reusable part of the index. So far I haven’t had an answer to that question; however the client was able to switch the table into a partitioned table and now drops old partitions rather than deleting old data.

 

January 13, 2017

use_nl hint

Filed under: Hints,Oracle — Jonathan Lewis @ 8:52 am GMT Jan 13,2017

In response to a recent lamentation from Richard Foote about the degree of ignorance regarding the clustering_factor of indexes I commented on the similar level of understanding of a specific hint syntax, namely use_nl(a b) pointing out that this does not mean “do a nested loop from a to b”. My comment was underscored by a fairly prompt response asking what the hint did mean.

Surprisingly, although I’ve explained it many times over the last couple of decades (here’s one from 10 years ago), I couldn’t find an explanation on my blog though I did find a blog note where I’d made a passing comment about the equivalent misunderstanding of the use_hash(a b) syntax.

The misunderstanding is not entirely surprising since for many years the Oracle manuals seemed to suggest (in their examples) that the hint did have a multi-table meaning and it wasn’t until 10g that the manual gave an explicit statement of the single-table nature of the hint. The hint /*+ use_nl(a b) */ is a short-hand for the pair of hints /*+ use_nl(a) use_nl(b) */; it doesn’t say anything about whether a and b should be joined, or in what order. If you want to guarantee that a and b will be joined in that order by a nested loop you will have to work a lot harder with your hints – and almost certainly need to make use of the /*+ leading() */ hint.
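So, if what you actually want is “a nested loop from a to b”, the minimum you would typically have to write is something along these lines (a sketch using the aliases from the demonstration below):

select
	/*+ leading(a b) use_nl(b) */
	a.v1, b.v1
from
	a, b
where
	b.n100 = a.n2
;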

Consider the following query (I’ll put the table creation code at the end of the article if you want to experiment):

select
	/*+ use_nl(a b) */
	a.v1, b.v1, c.v1, d.v1
from
	a, b, c, d
where
	d.n100 = 0
and	a.n100 = d.id
and	b.n100= a.n2
and	c.id = a.id
;

Only one of the tables a and b can be the first table in the final execution plan, so at some point one of them will be “the next table in the join order”, which means this hint will guarantee that one of the tables will be the inner table of a nested loop join. Here’s the plan I happened to get with my data, indexing, version (11.2.0.4), etc.:

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      | 20000 |  1347K| 30125   (1)| 00:00:02 |
|   1 |  HASH JOIN                     |      | 20000 |  1347K| 30125   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL            | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|   3 |   HASH JOIN                    |      | 20000 |  1054K| 30098   (1)| 00:00:02 |
|   4 |    TABLE ACCESS FULL           | D    |   100 |  1800 |    26   (4)| 00:00:01 |
|   5 |    NESTED LOOPS                |      | 20000 |   703K| 30072   (1)| 00:00:02 |
|   6 |     NESTED LOOPS               |      | 20000 |   703K| 30072   (1)| 00:00:02 |
|   7 |      TABLE ACCESS FULL         | B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   8 |      INDEX RANGE SCAN          | A_I2 |     2 |       |     1   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID| A    |     2 |    44 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

In this case it’s table a that ends up in a position to be the inner table of a nested loop join.

You may be wondering why there seems to be a hash join into b when we’ve hinted a nested loop join – but the join order that Oracle is using is B -> A -> D -> C with a swap_join_inputs(d) and a swap_join_inputs(c), so b is never “the next table in the join order”.

If you want an even more confusing (at first sight) plan, here’s the plan I got when I changed the one hint to /*+ use_nl(a) */:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   1 |  HASH JOIN           |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   3 |   HASH JOIN          |      | 10000 |   537K|    78   (4)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|   5 |    HASH JOIN         |      | 10000 |   390K|    52   (4)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| D    |   100 |  1800 |    26   (4)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    | 10000 |   214K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------

This plan really looks as if Oracle should have done a nested loop into a but didn’t. Again appearances are deceptive thanks to the effects of swap_join_inputs(): the join order here is A -> D -> C -> B (note that we don’t have a use_nl(b) hint in this example).

If you want a plan where the optimizer produces a nested loop join between a and b you’ll need to put in a leading() hint which places b immediately after a somewhere in the list of tables with just use_nl(b) being sufficient to enforce the join method. Here, for example, is the plan with hints /*+ leading(d a b c) use_nl(b) */ for my data set:


----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        | 20000 |  1347K| 30164   (1)| 00:00:02 |
|   1 |  HASH JOIN                    |        | 20000 |  1347K| 30164   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL           | C      | 10000 |   146K|    26   (4)| 00:00:01 |
|   3 |   NESTED LOOPS                |        | 20000 |  1054K| 30137   (1)| 00:00:02 |
|   4 |    NESTED LOOPS               |        |  1000K|  1054K| 30137   (1)| 00:00:02 |
|   5 |     HASH JOIN                 |        | 10000 |   390K|    52   (4)| 00:00:01 |
|   6 |      TABLE ACCESS FULL        | D      |   100 |  1800 |    26   (4)| 00:00:01 |
|   7 |      TABLE ACCESS FULL        | A      | 10000 |   214K|    26   (4)| 00:00:01 |
|   8 |     INDEX RANGE SCAN          | B_I100 |   100 |       |     1   (0)| 00:00:01 |
|   9 |    TABLE ACCESS BY INDEX ROWID| B      |     2 |    28 |   101   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Notice, yet again, that Oracle has done a hash join to c with a swap_join_inputs().

Creation Script:


create table a
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
	rownum				id,
	mod(rownum,5000)		n2,
	mod(rownum,100)			n100,
	lpad(rownum,10,'0')		v1,
	lpad('x',100,'x')		padding
from
        generator       v1
;

create table b nologging as select * from a;
create table c nologging as select * from a;
create table d nologging as select * from a;

alter table a add constraint a_pk primary key(id);
alter table b add constraint b_pk primary key(id);
alter table c add constraint c_pk primary key(id);
alter table d add constraint d_pk primary key(id);

create index a_i2 on a(n2) nologging;
create index b_i2 on b(n2) nologging;
create index c_i2 on c(n2) nologging;
create index d_i2 on d(n2) nologging;

create index a_i100 on a(n100) nologging;
create index b_i100 on b(n100) nologging;
create index c_i100 on c(n100) nologging;
create index d_i100 on d(n100) nologging;
begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'A',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'B',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'C',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'D',
		method_opt	 => 'for all columns size 1'
	);
end;
/

January 10, 2017

Join Elimination 12.2

Filed under: 12c,Bugs,Execution plans,Oracle — Jonathan Lewis @ 1:03 pm GMT Jan 10,2017

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases where a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. when all other factors are equal) is dictated by the order the tables appear in the from clause; so if you have enough tables in the from clause it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

As you will see later on, the three tables grandparent, parent, child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so that looks like “single column primary key” join elimination has worked, but “multi-column” join elimination hasn’t appeared. On the other hand, I’ve not followed my usual rules for writing SQL so let’s try again. If I follow the pattern I usually follow, my from clause would have been in the order child -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this visual change to the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:


rem
rem	Script:		join_elimination_12c2.sql
rem	Author:		Jonathan Lewis
rem	

-- Environment details eliminated

define m_pad=100

/*
	IDs will be 1 to 1000
*/

create table grandparent 
as
select 
	rownum			id,
	trunc((rownum-1)/5)	small_num_g,
	rpad(rownum,10)		small_vc_g,
	rpad(rownum,&m_pad)	padding_g
from 
	all_objects 
where 
	rownum <= 1000
;

/*
	Each GP has two (scattered) children here
	Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
	1+mod(rownum,1000)	id_g,
	rownum			id,
	trunc((rownum-1)/5)	small_num_p,
	rpad(rownum,10)		small_vc_p,
	rpad(rownum,&m_pad)	padding_p
from 
	all_objects 
where 
	rownum <= 2000
;

/*
	Simple trick to get 5 (clustered) children per parent
	Child IDs are 1 to 12,000
*/

create table child 
as
select 
	id_g,
	id			id_p,
	rownum			id,
	trunc((rownum-1)/5)	small_num_c,
	rpad(rownum,10)		small_vc_c,
	rpad(rownum,&m_pad)	padding_c
from 
	parent	p,
	(
		select /*+ no_merge */ 
			rownum 
		from	parent p 
		where	rownum <= 5
	)	d
;

create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem	Don't need to collect stats because it's 12c
rem

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered gp -> p -> c
prompt	The final plan is GP->C, The optimizer eliminated P before
prompt	considering GP
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered c -> p -> gp
prompt	The final plan is a tablescan of C only. The optimizer managed 
prompt	to eliminate GP first and P second
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	child		c,
	parent		p,
	grandparent	g 
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	==================================================
prompt	Convert to ANSI standard in the order gp -> p -> c
prompt	and both gp and p eliminated.
prompt	==================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g
join
	parent		p
on	p.id_g = g.id
join
	child		c
on	c.id_p = p.id
and	c.id_g = p.id_g
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===================================================
prompt	Convert to ANSI standard in the order c -> p -> gp
prompt	and only p is eliminated. 
prompt	===================================================

select 
	count(c.small_vc_c)
from 
	child		c
join
	parent		p
on      p.id   = c.id_p 
and	p.id_g = c.id_g 
join
	grandparent	g
on	g.id = p.id_g 
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

It’s possible, of course, that with different system stats, or I/O calibration, or extent sizes, or segment space management, or block sizes, or sundry other parameter details you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.

Footnote:

If you’re wondering why the “traditional” and “ANSI” syntax should exhibit this flaw for joins in the opposite direction – remember that ANSI SQL is first transformed into an equivalent Oracle form and – in the simple cases – the first two tables form the first query block then each table after that introduces a new query block, so the optimizer strategy does (approximately) the following translation:


select ... from grandparent join parent join child

==>

select ... from (select ... from grandparent join parent) join child

The optimizer then optimizes the inline query, which eliminates grandparent leaving a join between parent and child, which then allows parent to be eliminated.

Conversely we get:

select ... from child join parent join grandparent 

==>

select ... from (select ... from child join parent) join grandparent

In this form the optimizer eliminates parent from the inline view and is left with a join between child and grandparent – so no further elimination.

December 13, 2016

Index Compression

Filed under: 12c,Indexing,Oracle — Jonathan Lewis @ 1:11 pm GMT Dec 13,2016

Richard Foote has published a couple of articles in the last few days on the new (licensed under the advanced compression option) compression mechanism in 12.2 for index leaf blocks. The second of these pointed out that the new “high compression” mechanism was even able to compress single-column unique indexes – a detail that doesn’t make sense and isn’t allowed for the older style “leading edge deduplication” mechanism for index compression.

In 12.2 an index can be created (or rebuilt) with the option “compress advanced high” – and at the leaf-block level this will create “compression units” (possibly just one per leaf block – based on my early testing) that take the complexity of compression far beyond the level of constructing a directory of prefixes. Richard demonstrated the benefit by creating a table with a numeric unique index – then compressed the index, reducing its size from 2,088 leaf blocks to 965 leaf blocks, which is a pretty dramatic difference.

It crossed my mind, though, to wonder whether the level of compression was a side effect of the very straightforward code that Richard had used to create the data set: the table was a million rows with a primary key that had been generated as the rownum selected from the now-classic “connect by..” query against dual, and the row length happened to allow for 242 rows per 8KB table block.

If you pause to think about this data set you realise that if you pick the correct starting point and walk through 242 consecutive entries of the index you will be walking through 242 consecutive rows in the table starting from the zeroth row in a particular table block and ending at the 241st row in that block. A rowid (as stored by Oracle in a simple B-tree index) consists of 6 bytes and the first five bytes of the rowid will be the same for the first 256 rows in any one table block (and the first four will still be the same for the remaining rows in the block). Richard’s data set will be very close to ideal for any byte-oriented, or bit-oriented, compression algorithm because (to use Oracle terminology) the data will have a perfect clustering_factor. (He might even have got a slightly better compression ratio if he’d used an /*+ append */ on the insert, or done a CTAS, and reduced the rowsize to get a uniform 256 rows per table block.)

So how do things change if you randomise the ordering of the key ? Here’s a variant on Richard’s code:


rem
rem	Script:		index_compression_12c_2.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Dec 2016
rem
rem	Last tested 
rem		12.2.0.1
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
	rownum				id,
	lpad('x',130,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
order by
	dbms_random.value
;

select table_name, blocks, round(num_rows/blocks,0) rpb from user_tables where table_name = 'T1';

drop index t1_i1;
create unique index t1_i1 on t1(id);
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

drop index t1_i1;
create unique index t1_i1 on t1(id) compress advanced high;
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

The initial drop index is obviously redundant, and the calls to gather_index_stats should also be redundant – but they’re there just to make it obvious I haven’t overlooked any checks for correctness in the build and stats.

You’ll notice that my row length is going to be slightly more “real-world” than Richard’s so that the degree of compression I get from nearly identical rowid values is likely to be reduced slightly, and I’ve completely randomised the order of key values.

So what do the results look like?

With the default pctfree = 10, and in a tablespace of uniform 1MB extents, 8KB blocks, utilising ASSM I get this:


TABLE_NAME               BLOCKS        RPB
-------------------- ---------- ----------
T1                        19782         51

INDEX_NAME           COMPRESSION     PCT_FREE LEAF_BLOCKS
-------------------- ------------- ---------- -----------
T1_I1                DISABLED              10        2088
T1_I1                ADVANCED HIGH         10        1303

Unsurprisingly the uncompressed index is exactly the same size as Richard’s (well, it was just the integers from 1 to 1M in both cases) but the compression ratio is significantly less – though still pretty impressive.

Of course, for this type of index my example probably goes to the opposite extreme from Richard’s. Realistically, if you have a sequence-based key with an OLTP pattern of data arrival then consecutive key values are likely to be scattered within a few blocks of each other rather than being scattered completely randomly across the entire width of the table; so a more subtle model (using a suitable number of concurrent processes to insert ids based on a sequence, perhaps) would probably get a better compression ratio than I did, though a worse one than Richard’s. There’s also the issue of the size of the key value itself – once you get to values in the order of 10 million to 100 million you’re looking at mostly 4 bytes (internal format) of storage where, for large runs of values, the first 3 bytes match, possibly leading to a better compression ratio.

Of course the question of globally partitioned indexes will be relevant for some people since the principal reason for global indexes on partitioned tables is to enforce uniqueness on column combinations that don’t include the partition key, and that introduces another possible benefit – the rowid goes up to 10 bytes, of which the first 4 bytes are the object id of the table partition: depending on the nature of the partitioning those repeated 4 bytes per row may virtually disappear after compression, giving you a better compression ratio on globally partitioned indexes than you get on any other type of single-column unique index.

Once you start looking at the details there are a surprising number of factors that can affect how well “compress advanced high” can work.

Footnote:

Once you’ve created the index, you can start poking around in all sorts of ways to try and work out what the compression algorithm does. A simple block dump is very informative, with lots of clues in the descriptive details – and lots of puzzles when you start looking too closely. There are hints that this type of index compression adopts a strategy similar to “OLTP compression” for tables in that compression occurs only as the leaf block becomes full – and possibly allows some sort of batching process within a leaf block before finally compressing to a single contiguous unit. (This is just conjecture, at present: the only firm statement I’ll make about the actual implementation of index compression is that it uses a lot of CPU; in my example the baseline create index took about 1.5 seconds of CPU, the compressed create took about 4.5 seconds of CPU.)
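
If you want to do the same sort of poking around, the usual starting point is a treedump of the index followed by a block dump of one of the leaf blocks it reports – the following is only a sketch of the mechanism (the file and block numbers in the final comment are made-up examples):


column object_id new_value m_object_id
select object_id from user_objects where object_name = 'T1_I1';

alter session set events 'immediate trace name treedump level &m_object_id';

rem
rem     Pick the rdba of a leaf block from the trace file, convert it to a
rem     (file, block) pair with dbms_utility.data_block_address_file() and
rem     dbms_utility.data_block_address_block(), then dump it, e.g.:
rem
rem             alter system dump datafile 7 block 131;
rem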

There are also a couple of amusing side effects that may confound those who use the old “validate index / query index_stats” two-step to decide when to rebuild indexes. Here’s what I got on the compressed index:


SQL> validate index t1_i1;

SQL> select blocks, lf_rows, lf_rows_len, btree_space, used_space, pct_used from index_stats;

    BLOCKS    LF_ROWS LF_ROWS_LEN BTREE_SPACE USED_SPACE   PCT_USED
---------- ---------- ----------- ----------- ---------- ----------
      1408    1000000    14979802    10416812   14994105        144

My index is using 144% of the space that it has been allocated. You don’t have to be very good at maths (or math, even) to realise that something’s wrong with that number.

December 7, 2016

Extended Stats

Filed under: extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 3:54 pm GMT Dec 7,2016

After my Masterclass on indexes at the UKOUG Tech2016 conference this morning I got into a conversation about creating extended stats on a table. I had pointed out in the masterclass that each time you dropped an index you really ought to be prepared to create a set of extended stats (specifically a column group) on the list of columns that had defined the index just in case the optimizer had been using the distinct_keys statistic from the index to help it calculate cardinalities.
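
As a reminder of the syntax, the defensive action after dropping a two-column index might look something like the following sketch – the table and column names are just placeholders:


select
        dbms_stats.create_extended_stats(
                ownname         => user,
                tabname         => 'MY_TABLE',
                extension       => '(col1, col2)'
        )       extension_name
from    dual
;

rem     The column group gets its num_distinct on the next stats collection, e.g.:
execute dbms_stats.gather_table_stats(user, 'MY_TABLE', method_opt => 'for all hidden columns size 1')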

Unfortunately there is a limit on the number of column groups (or any other type of extended stats) you can have on a table and that limit is the larger of 20 and ceiling(number of columns / 10) – so you typically run into a problem if you want to take defensive action after dropping more than twenty (multi-column) indexes. (And you wonder how Oracle’s adaptive dynamic stats process that silently creates column groups overnight handles the problem of needing far more column groups than are allowed.)

The conversation led on to the oddity that the column count includes the virtual columns representing the column groups so, for example, if you have 253 columns in your table you can create 26 column groups; but if you have 26 column groups that means you have a total of 279 columns, so you can actually create a total of 28 groups (an extra 2); but if you create those two column groups you now have a total of 281 columns in the table which means you’re allowed a total of 29 column groups so you can add one more column group for a total of 282 columns. Here’s some code (which I’ve run only on 11.2.0.4) to play with – to keep things very simple I’ve generated some trivial extended stats rather than column groups:


rem
rem     Script:         extended_stats_limit2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2016
rem

drop table t1 purge;

create table t1 (c001 number);

begin
        for i in 2..253 loop
                execute immediate
                'alter table t1 add (c' || to_char(i,'FM000') || ' number)';
        end loop;
end;
/

desc t1

prompt  ============================================================================================
prompt  This will raise an error on the 30th addition
prompt  ORA-20008: Number of extensions in table TEST_USER.T1 already reaches the upper limit (28.2)
prompt  ============================================================================================

declare
        ext_string varchar2(20);
begin
        for i in 1..30 loop
                ext_string := '(c001 + ' || i || ')';
                dbms_output.put_line(
                        dbms_stats.create_extended_stats(
                                ownname         => user,
                                tabname         => 'T1',
                                extension       => ext_string
                        )
                );
        end loop;
end;
/

column column_name format a32

select
        column_name, hidden_column, virtual_column, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;

This code results in a table with 253 segment columns, and 29 hidden, virtual columns (with names like SYS_STU0#$2X$X1M4NFZVM2O_5A3FC) representing the extended stats. What if I want more extended stats ? There is no limit on virtual columns in general, beyond the inherent table limit of 1,000 columns total, so what if I create a few virtual columns (another 39, say, taking my total column count to 321): would this allow me to increase the number of extended stats to 33 – and if so, what would happen if I then dropped the virtual columns:


prompt  ============================================
prompt  Now we add some virtual columns after which
prompt  we will be able to add more extended stats
prompt  and drop the virtual columns
prompt  ============================================

begin
        for i in 1..39 loop
                execute immediate
                'alter table t1 add (virt' || to_char(i,'fm000') ||
                        ' generated always as ( c002 + ' || i || ') virtual)'
                ;
        end loop;
end;
/

select
        column_name, hidden_column, virtual_column, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;

prompt  ============================================================================================
prompt  We can now get up to 33 extended stats
prompt  This will raise an error on the attempt to add the 34th set
prompt  ORA-20008: Number of extensions in table TEST_USER.T1 already reaches the upper limit (32.5)
prompt  ============================================================================================

declare
        ext_string varchar2(20);
begin
        for i in 30..34 loop
                ext_string := '(c001 + ' || i || ')';
                dbms_output.put_line(
                        dbms_stats.create_extended_stats(
                                ownname         => user,
                                tabname         => 'T1',
                                extension       => ext_string
                        )
                );
        end loop;
end;
/

select
        column_name, hidden_column, virtual_column, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;


select
        column_name, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
and     hidden_column = 'YES'
and     virtual_column = 'YES'
order by
        internal_column_id
;

prompt  ============================
prompt  Now drop the virtual columns
prompt  ============================

begin
        for r in (
                select column_name from user_tab_cols
                where  column_name like 'VIRT%'
        ) loop
                execute immediate
                'alter table t1 drop column ' || r.column_name;
        end loop;
end;
/

select
        column_name, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
and     virtual_column = 'YES'
order by
        internal_column_id
;

When I ran this code I ended up with a table consisting of 286 columns, of which 253 were my original columns and 33 – with internal column ids of 254 to 286 inclusive – were the extended stats. It seems there is a way to bypass the limit if you really want to – though I’m not sure I’d really want to do it on a production system.

Left as Exercise for the Reader:

Create a table with 5 real columns and the 26 column groups needed to represent all (multi-column) combinations of those five columns. (Remember that the order of columns in a column group is not really significant). (The 26 groups consist of: 1 x 5 column, 5 x 4 column, 10 x 3 column, 10 x 2 column – this may remind some of you of binomial expansions, others may remember it as a row from Pascal’s triangle, you could also view it as a particular subset of the binary representations of the integers from 1 to 31.)
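
If you want a head start on the exercise, here’s a sketch that generates the 26 combinations by walking the binary representations of the numbers 1 to 31 and keeping anything with at least two bits set. The table name T5 and columns col1 to col5 are hypothetical, and you’ll still have to use the virtual column trick above to get past the 20-extension limit part way through:


declare
        ext_string      varchar2(100);
begin
        for n in 1..31 loop
                -- build the column list implied by the set bits of n
                ext_string := null;
                for b in 1..5 loop
                        if bitand(n, power(2, b - 1)) != 0 then
                                ext_string :=
                                        ext_string ||
                                        case when ext_string is not null then ',' end ||
                                        'col' || b;
                        end if;
                end loop;

                -- single-bit values give single columns, which don't qualify as column groups
                if instr(ext_string, ',') != 0 then
                        dbms_output.put_line(
                                dbms_stats.create_extended_stats(
                                        ownname         => user,
                                        tabname         => 'T5',
                                        extension       => '(' || ext_string || ')'
                                )
                        );
                end if;
        end loop;
end;
/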

 

December 2, 2016

Histogram Upgrade

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 3:02 pm GMT Dec 2,2016

I’ve written notes in the past about the improvements 12c introduces for histograms – particularly the frequency and top-N histograms which can be generated “free of charge” while the basic “approximate NDV” scans are taking place to gather stats. Gathering histograms in 12c is much safer than it used to be in earlier versions of Oracle, even in the case of the new hybrid histograms (which are still built from a very small sample and are therefore still a stability risk).

There is a threat, though, recently highlighted by Franck Pachot, that sneaked in at 11.2.0.4 – the way in which the calculation of endpoint values for histograms on char() and nchar() columns has changed. When you upgrade from anything prior to 11.2.0.4 to either 11.2.0.4 or 12c you need to recreate any histograms on those columns; note that this does not apply to varchar2() and nvarchar2() columns, just the fixed-length character types. If you fail to do this then you may find that Oracle produces some very silly estimates of cardinality, which could result in some very inefficient tablescans – in particular you are likely to find (as Franck showed) cases where you “know” that a particular value is IN the histogram but the optimizer behaves as if it isn’t – which means it uses the “half the least popular” estimate for the cardinality.

Here’s a little demo to show the underlying difference:


rem
rem     Script:         histogram_change_11204.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem     Purpose:
rem

create table t1 (v1 varchar2(32), c1 char(32));
insert into t1
select
        case when rownum <= 100 then 'N' else 'Y' end,
        case when rownum <= 100 then 'N' else 'Y' end
from
        all_objects
where
        rownum <= 1000
;
begin
        dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 10'
        );
end;
/

column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999
break on column_name skip 1

select
        column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_number
;

Here are the results from an instance of 11.1.0.7 (though anything up to 11.2.0.3 should produce the same), and 11.2.0.4 (and later – including 12.2):

Results 11.1.0.7
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  405,650,737,316,592,000,000,000,000,000,000,000    4e20202020203a7bb119d5f6000000
                                1000  462,766,002,760,475,000,000,000,000,000,000,000    59202020202034d998ff0b5ae00000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Results 11.2.0.4
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Look particularly at the first 6 bytes of the Hex version of the endpoint values for the char() column c1. In 11.1.0.7 you see “4e2020202020”, “592020202020” – that’s ASCII ‘N’ and ‘Y’ respectively, padded to 6 characters with spaces. In 11.2.0.4 the spaces have disappeared – the char() columns are now padded to 6 characters with zeros (which is how varchar2() columns have always been treated).

In 11.1.0.7 the optimizer will find a histogram entry for c1 = ‘Y’ and produce a cardinality of 900; if you upgrade the database to 11.2.0.4 without recreating the histograms the optimizer won’t find a histogram entry for the predicate and will produce a cardinality of 50 (i.e. 100 / 2).
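
If you want a quick sanity check of which value each endpoint now represents, a decode of the leading bytes of endpoint_value does the job – this is just a sketch re-using the hex trick from the query above, with an rtrim to strip the space (or zero) padding:


select
        column_name,
        endpoint_number,
        rtrim(
                utl_raw.cast_to_varchar2(
                        hextoraw(
                                substr(
                                        ltrim(to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')),
                                        1, 12
                                )
                        )
                ),
                chr(0) || ' '
        )       decoded_prefix
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name, endpoint_number
;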

Footnote

There’s a brief summary of the algorithm Oracle uses to generate values for character-based histograms at this URL.

 

November 29, 2016

Delete/Insert #2

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 4:33 pm GMT Nov 29,2016

In the previous post I threw out a couple of options for addressing the requirement to transfer data from one table to another (“cut and paste” rather than just “copy”) without running into odd inconsistency errors. This triggered off a wonderful comment trail of alternatives based on how large the volume might be, how relaxed the concurrency requirements might be, and so on.

A comment by SydOracle1 picked up on my failure to get Oracle working with the “as of SCN” syntax because it kept reporting ORA-08187 and suggested a straightforward use of the VERSIONS strategy. I thought it was quite surprising that this could work given that “as of SCN” didn’t, so I whipped up a quick test to check it – adding a couple of little refinements to the supplied sample – and it worked.

create table t1
as
select	object_id, object_name, owner
from	all_objects
;

create table t2
as
select	* from t1
where	rownum = 0
;

alter table t1 add constraint t1_pk primary key(object_id);

execute dbms_stats.gather_table_stats(user,'t1')

execute dbms_lock.sleep(5)

That dbms_lock.sleep() is very important for the purposes of this demonstration; it needs to be at least a few seconds, otherwise the references back to earlier SCNs could report the error “ORA-01466: unable to read data – table definition has changed”. This is probably a side effect of the (roughly) 3-second granularity of the SCN to timestamp mapping that Oracle captures in the table smon_scn_time.
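
You can get a feel for that granularity with a quick round trip through the SCN/timestamp conversion functions – a purely illustrative query, not part of the test; the timestamp that comes back is typically a few seconds behind the one you started with:


select
        to_char(sysdate,'hh24:mi:ss')                                           right_now,
        to_char(scn_to_timestamp(timestamp_to_scn(sysdate)),'hh24:mi:ss')      mapped_back
from
        dual
;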

So now we do the following:

  • find the current SCN,
  • start a transaction,
  • get the transaction id,
  • delete the data from the source table,
  • find the current SCN again,
  • insert into the target table the data that was deleted by our transaction in the interval spanned by the two SCNs.

 


rem
rem     insert_delete_2.sql
rem     Jonathan Lewis
rem     Nov 2016
rem

column current_scn new_value m_start_scn 
select to_char(current_scn,'FM999999999999999999999') current_scn from v$database;

variable m_tx_id varchar2(20)
exec :m_tx_id := dbms_transaction.local_transaction_id(true)

column xid new_value m_xid
select xid from v$transaction where xidusn || '.' || xidslot || '.' || xidsqn = :m_tx_id;

delete from t1 where owner = 'SYSTEM';
commit;

column current_scn new_value m_end_scn format 999999999999999999
select to_char(current_scn,'FM999999999999999999999') current_scn from v$database;
 
insert	into t2
select	r.*
from	t1  versions between scn &m_start_scn and &m_end_scn r
where 
	versions_operation = 'D'
and	versions_xid = '&m_xid'
and	owner = 'SYSTEM'
;

commit;

There are a couple of variants on getting the transaction ID – I decided to use a function call to start a transaction without doing any work rather than doing the delete and then finding the transaction id that the delete initiated (I could have linked v$session for my SID to v$transaction after the delete). Because of the choice I made I have to do a little bit of messing around in the subsequent code – the function call returns the transaction ID in the form 31.16.19111 (that’s undo segment, slot number, sequence) but the VERSIONS mechanism wants a transaction ID in its HEX form which, for the example shown, would be ‘1F001000A74A0000’. I could have converted the three part form to the other using a messy bit of to_char(,’XXXXXXXX’) code, but I was feeling a little lazy.
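
For completeness, the “messy bit of code” might look something like the following sketch – each component of the usn.slot.sequence form is converted to hex and then byte-reversed, because the XID is stored in little-endian byte order (this reproduces the ‘1F001000A74A0000’ of my example, but I haven’t tried to make it bullet-proof):


with tx as (
        select
                to_number(regexp_substr(:m_tx_id, '[^.]+', 1, 1))       usn,
                to_number(regexp_substr(:m_tx_id, '[^.]+', 1, 2))       slt,
                to_number(regexp_substr(:m_tx_id, '[^.]+', 1, 3))       sqn
        from    dual
)
select
        -- usn and slot are 2 bytes each, the sequence is 4 bytes, all byte-reversed
        substr(to_char(usn,'FM0XXX'),3,2) || substr(to_char(usn,'FM0XXX'),1,2) ||
        substr(to_char(slt,'FM0XXX'),3,2) || substr(to_char(slt,'FM0XXX'),1,2) ||
        substr(to_char(sqn,'FM0XXXXXXX'),7,2) || substr(to_char(sqn,'FM0XXXXXXX'),5,2) ||
        substr(to_char(sqn,'FM0XXXXXXX'),3,2) || substr(to_char(sqn,'FM0XXXXXXX'),1,2)  xid_as_hex
from
        tx
;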

To my surprise I didn’t see any ORA-08187 errors – which made me look back at the notes I had jotted down on the couple of tests I’d initially tried to find out what I had been doing wrong. My first attempt did the insert first then tried to do the delete “as of SCN” and failed, so my second attempt tried to do the delete first just in case the problem related to using “as of SCN” in the middle of a transaction:


column current_scn new_value m_scn

prompt  ============
prompt  Insert first
prompt  ============

select to_char(current_scn,'FM999999999999999999') current_scn from V$database;

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

delete from t1 as of scn &m_scn r
where owner = 'SQLTXADMIN'
;

-- ORA-08187: snapshot expression not allowed here

rollback;

prompt  ============
prompt  Delete first
prompt  ============

select to_char(current_scn,'FM999999999999999999') current_scn from V$database;

delete from t1 as of scn &m_scn r
where owner = 'SQLTXADMIN'
;

-- ORA-08187: snapshot expression not allowed here

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

rollback;

Clearly “delete as of scn” is illegal.

Of course, if I’d gone a little further with this idea I might have tried starting with a delete that didn’t use “as of SCN”, and then the code would have succeeded. In fact, though, this wouldn’t be a perfect solution because it would allow a window for error: some other session might delete a relevant row between my call for SCN and my delete, which means my insert would insert a row deleted by another user.

The code could be modified though in its choice of SCN. Provided I started my transaction with the delete I could then query v$transaction for the start SCN for the transaction, and use that as the “as of” SCN for the insert:


delete from t1
where owner = 'SQLTXADMIN'
;

select
        to_char(
                start_scnw * power(2,32) + start_scnb,
                'FM999999999999999999'
        )       current_scn
from
        v$transaction
where   ses_addr = (
                select  saddr
                from    v$session
                where   sid = (
                        select  sid
                        from    V$mystat
                        where   rownum = 1
                )
        )
;

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

By using the delete to initiate the transaction and set the SCN I think we block any window of inconsistency and, apart from the messy little bit of code that finds the transaction entry, we have an even simpler piece of code than the example given by SydOracle1.

Collaboration or, kicking ideas around, is a wonderful way to learn.

 

November 22, 2016

Delete/Insert

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 12:59 pm GMT Nov 22,2016

Many of the questions that appear on OTN are deceptively simple until you start thinking carefully about the implications; one such showed up a little while ago:

What i want to do is to delete rows from table where it matches condition upper(CATEGORY_DESCRIPTION) like ‘%BOOK%’.

At the same time i want these rows to be inserted into other table.

The first problem is this: how carefully does the requirement need to be stated before you can decide how to address it? Trying to imagine awkward scenarios, or boundary conditions, can help to clarify the issue.

If you delete before you insert, how do you find the data to insert ?

If you insert before you delete, what happens if someone updates a row you’ve copied so that it no longer matches the condition? Would it matter if the update changed the row in a way that left it matching the condition (so that what you’ve inserted is not totally consistent with what you’ve deleted)?

If you insert before you delete, and someone executes some DML that makes another row match the requirement, should you delete it (and how would you avoid deleting it) or leave it in place?

Once you start kicking the problem about you’ll probably come to the conclusion that the requirement is for the delete and insert to be self-consistent – in other words what you delete has to be an exact match for what you insert as at the time you inserted it. You’ll ignore rows that come into scope in mid-process due to other activity, and you’ll have to stop people changing rows that are being transferred (in case there’s an audit trail that subsequently says that there was, at some point in time, a row that matched the condition but never arrived – and a row that has arrived that didn’t match the final condition of the rows that disappeared).

Somehow your code needs to lock the set of rows to be transferred and then transfer those rows and eliminate them. There are two “obvious” and simple strategies – readers are invited to propose others (or criticise the two I – or any of the comments – suggest). I’ll start with a simple data setup for testing:


create table t1
as
select  object_id, object_name, owner
from    all_objects
;

alter table t1 add constraint t1_pk primary key(object_id);

create table t2
as
select  * from t1
where   rownum = 0
;

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

Option 1:

The simplest approach is often the best – until, perhaps, you spot the drawbacks – do a basic delete of the data to be transferred (which handles the locking) but wrap the statement in a PL/SQL block that captures the data (using the returning clause) and then inserts it into the target table as efficiently as possible. With thanks to Andrew Sayer who prompted this blog post:

declare
        type t1_rows is table of t1%rowtype;
        t1_deleted t1_rows;

begin
        delete from t1 where owner = 'SYSTEM'
        returning object_id, object_name, owner bulk collect into t1_deleted;

        forall i in 1..t1_deleted.count
                insert into t2 values t1_deleted(i);

        commit;
end;
/

The drawback to this, of course, is that if the volume to be transferred is large (where “large” is probably a fairly subjective measure) then you might not want to risk the volume of memory (PGA) it takes to gather all the data with the bulk collect.
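
If you want to put a number on that risk before committing to this strategy, checking the session’s PGA figures around a test run gives a reasonable indication – just an illustrative query, run before and after the PL/SQL block:


select
        sn.name, round(st.value/1048576,1)      mb
from
        v$statname      sn,
        v$mystat        st
where
        sn.name in ('session pga memory', 'session pga memory max')
and     st.statistic# = sn.statistic#
;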

Option 2:

For large volumes of data we could reduce the threat to the PGA by gathering only the rowids of the rows to be transferred (locking the rows as we do so) then do the insert and delete based on the rowids:

declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;
        fetch c1 bulk collect into r;
        close c1;

        forall i in 1..r.count
                insert into t2 select * from t1 where rowid = r(i);

        forall i in 1..r.count
                delete from t1 where rowid = r(i);

        commit;
end;
/

Note, particularly, the “for update” in the driving select.

Inevitably there is a drawback to this strategy as well (on top of the threat that the requirement for memory might still be very large even when the return set is restricted to just rowids). We visit the source data (possibly through a convenient index and avoid visiting the table, [silly error deleted – see comment 8]) to collect rowids; then we visit the data again by rowid (which is usually quite efficient) to copy it, then we visit it again (by rowid) to delete it. That’s potentially a significant increase in buffer cache activity (especially latching) over the simple “delete returning” strategy; moreover the first strategy gives Oracle the option to use the index-driven optimisation for maintaining indexes and this method doesn’t. You might note, by the way, that you could include an “order by rowid” clause on the select; depending on your data distribution and indexes this might reduce the volume of random I/O you have to do as Oracle re-visits the table for the inserts and deletes.
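
If you want to try the “order by rowid” idea the only change needed is in the driving cursor – something like the following sketch (whether the cost of the sort is worth the saving in random I/O is a question for your data):


        cursor c1 is
                select  rowid
                from    t1
                where   owner = 'SYSTEM'
                order by
                        rowid           -- revisit the table in physical order
                for update
        ;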

We can address the PGA threat, of course, by fetching the rowids with a limit:


declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;

--      dbms_lock.sleep(60);

        loop
                fetch c1 bulk collect into r limit 5;

                forall i in 1..r.count
                        insert into t2 select * from t1 where rowid = r(i);

                forall i in 1..r.count
                        delete from t1 where rowid = r(i);

                exit when r.count != 5;
        end loop;

        close c1; 

        commit;
end;
/

One thing to be aware of is that even though we fetch the rowids in small batches we lock all the relevant rows when we open the cursor, so we don’t run into the problem of inserting thousands of rows into t2 and then finding that the next batch we select from t1 has been changed or deleted by another session. (The commented out call to dbms_lock.sleep() was something I included as a way of checking that this claim was true.) This doesn’t stop us running into a locking (or deadlocking) problem, of course; if it takes us 10 seconds to lock 1M rows in our select for update another user might manage to lock our millionth row before we get there; if, a few seconds later, that user then gets stuck in a TX/6 wait trying to lock one of our locked rows while we are waiting in a TX/6 wait for our millionth row, our session will time out after a further 3 seconds with an ORA-00060 deadlock error.

The limit of 5 is just for demonstration purposes, of course – there were 9 rows in all_objects that matched the select predicate; in a production system I’d probably raise the limit as high as 255 (which seems to be the limit of Oracle’s internal array-processing).

You’ll notice, of course, that we can’t use this limited fetch approach with the delete command – the entire delete would take place as we opened the equivalent cursor and, though we can use the bulk collect with the returning clause, there is no syntax that allows something like the fetch with limit to take place.

Discarded option

My first thought was to play around with the AS OF SCN clause.  Select the current SCN from v$database and then do things like delete “as of scn”, or “select for update as of scn” – there were ways of getting close, but invariably I ended up running into Oracle error: “ORA-08187: snapshot expression not allowed here”. But maybe someone else can come up with a way of doing this that doesn’t add significant overheads and doesn’t allow for inconsistent results.

Update:

Following a comment from SydOracle1 below I’ve revisited the “as of SCN” approach and discovered that I had fooled myself into discarding the option too quickly. The results are in a follow-up blog.

November 14, 2016

Distributed Trap

Filed under: distributed,Oracle — Jonathan Lewis @ 1:19 pm GMT Nov 14,2016

Here’s an interesting (and potentially very useful) observation from an OTN database forum thread that appeared at the end of last week. It concerns the problems of pulling data from remote systems, and I’ll start by building some data:

rem
rem     Script:         remote_insert_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(sysdate) + rownum d1,
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',50)            padding
from
        generator
where
        rownum <= 500
;

alter table t2 add constraint t2_pk primary key(id);

create table t3
as
select sysdate d1, t1.* from t1
where rownum = 0
;

--  Now gather stats if you need to (depending on version)

I’ve created three tables. Table t3 is an empty copy of table t1 with a date column added, and t2 is some sort of reference table that looks rather like table t1 but has a lot less data. Now I’m going to pretend that t1 and t2 are in a remote database while t3 is in the local database and copy data from t1 to t3, omitting any data that is referenced in t2. The SQL is simple:


define m_target=test@loopback

insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

----------------------------------------------------------------------
| Id  | Operation                | Name | Cost (%CPU)| Inst   |IN-OUT|
----------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     0   (0)|        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |            |        |      |
|   2 |   REMOTE                 |      |            |   TEST | R->S |
----------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - EXPLAIN PLAN INTO PLAN_TABLE@! FOR SELECT
       "A1"."ID","A1"."SMALL_VC","A1"."PADDING" FROM "T1" "A1" WHERE
       "A1"."ID"<>ALL (SELECT "A2"."ID" FROM "T2" "A2") (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

I’ve set up an SQL*Plus substitution variable to hold a database link name (and used a loopback qualifier so that I can pretend t1 and t2 are in a remote database). The execution plan (taken from an explain plan, but confirmed by running the query and calling dbms_xplan.display_cursor) shows that Oracle has executed the select part of the insert as a “fully remote” statement – which is nice.
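
For anyone who wants to reproduce the setup: the link name uses a “connection qualifier”, so its definition might look something like the following sketch – the credentials and connect string are whatever suits your test database:


create database link test@loopback
connect to test_user identified by test_user
using 'orcl'
;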

Unfortunately I forgot to include a datestamp as I copied the data over. So let’s modify the query to do that:


insert into t3(
        d1,
        id, small_vc, padding
)
select
        sysdate,
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

--------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |      1 |        |   123 (100)|      0 |00:00:27.42 |   10908 |
|   1 |  LOAD TABLE CONVENTIONAL |      |      1 |        |            |      0 |00:00:27.42 |   10908 |
|*  2 |   FILTER                 |      |      1 |        |            |  49500 |00:00:26.51 |       0 |
|   3 |    REMOTE                | T1   |      1 |  50000 |   113   (3)|  50000 |00:00:00.33 |       0 |
|   4 |    REMOTE                | T2   |  50000 |      1 |     0   (0)|    500 |00:00:25.29 |       0 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM  "T2" WHERE "T2"."ID"=:B1))

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","SMALL_VC","PADDING" FROM "T1" "T1"
       (accessing 'TEST.LOCALDOMAIN@LOOPBACK' )

   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID" FROM "T2" "T2" WHERE "ID"=:1 (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

Whoops, the plan just changed – it also took 27.4 seconds instead of the 1.1 seconds that it used to take – that’s because of the 50,000 remote calls to execute a remote query for the subquery filter. The addition of sysdate (which is the local sysdate@!) to the select list has made the select statement distributed instead of fully remote, and the query for a CTAS or “insert/select” has to be driven from the local site if it’s a distributed query.

Okay, plan (b), don’t insert sysdate, add it to the table as a default:


alter table t3 modify d1 default sysdate;

This doesn’t help; even though the query doesn’t mention sysdate explicitly it is still treated as a distributed query.

Okay, plan (c) – don’t insert sysdate, add a trigger to the table:

alter table t3 modify d1 default null;

create or replace trigger t3_bri
before insert on t3
for each row
begin
        :new.d1 := sysdate;
end;
/


And this works (in 11.2.0.4 and 12.1.0.2, at least, which are the versions I tested).

I could have made the trigger a little more sophisticated, of course, but the point of the post was simply to demonstrate a problem and a simple workaround.
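
For example, a marginally more sophisticated version might supply the timestamp only when the insert hasn’t provided a value, so that an explicit date can still get through:


create or replace trigger t3_bri
before insert on t3
for each row
begin
        if :new.d1 is null then
                :new.d1 := sysdate;
        end if;
end;
/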

There are probably several other commonly used features (various sys_context() calls, perhaps) that have the same effect.

 
