Oracle Scratchpad

August 23, 2021

Distributed Query

Filed under: distributed,Execution plans,Hints,Oracle,subqueries,Transformations,Troubleshooting — Jonathan Lewis @ 5:24 pm BST Aug 23,2021

Here’s an example that appeared on the Oracle Developer Community forum about a year ago that prompted me to do a little investigative work. The question involved a distributed query that was “misbehaving” – the interesting points were the appearance of the /*+ rule */ and /*+ driving_site() */ hints in the original query when combined with a suggestion to address the problem using the /*+ materialize */ hint with factored subqueries (common table expressions – CTEs), or when combined with my suggestion to use the /*+ no_merge */ hint.

If you don’t want to read the whole article there’s a tl;dr summary just before the end.

The original question was posed with a handful of poorly constructed code fragments that were supposed to describe the problem, viz:


select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ; -- query n°1

select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table); -- query n°2

select * from
select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ,
select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table)
where att_table_1 = att_table_2  -- sic

The crux of the problem was that the two separate statements individually produced an acceptable execution plan but the attempt to use the queries in inline views with a join resulted in a plan that (from the description) sounded like the result of Oracle merging the two inline views and running the two IN subqueries as FILTER (existence) subqueries.

We weren’t shown any execution plans and only had the title of the question (“Distributed sql query through multiple databases”) to give us the clue that there might be three different databases involved.

Obviously there are several questions worth asking when presented with this problem. The first is “can we have a more realistic piece of code?”, followed by “which version of Oracle?” and “where are the execution plans?”. I can’t help feeling that there’s more to the problem than just the three tables that seem to be suggested by the fragments supplied.

More significant, though, was the surprise that rule and driving_site should work together. There’s a long-standing (but incorrect) assertion that “any other hint invalidates the RULE hint”. I think I’ve published an example somewhere showing that /*+ unnest */ would affect an execution plan where the optimizer still obeyed the /*+ rule */ hint, and there’s an old post on this blog which points out that transformation and optimisation are (or were, at the time) independent of each other, implying that you could combine the rule hint with “transformational” hints and still end up with a rule-based execution plan.

Despite old memories suggesting the contrary my first thought was that the rule and driving_site hints couldn’t be working together – and that made it worth running a little test. Then one of the other specialists on the forums suggested using subquery factoring with the materialize hint – and I thought that probably wouldn’t help because when you insert into a global temporary table the driving site has to become the site that holds the global temporary tables (in fact this isn’t just a feature of GTTs). So there was another thing prompting me to run a test. (And then I suggested using the /*+ no_merge */ hint – but thought I’d check if that idea was going to work before I suggested it.)

So here’s a code sample to create some data, and the first two simple queries with calls for their predicted execution plans:

rem
rem     Script:         distributed_multi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

rem     create public database link test@loopback using 'test';
rem     create public database link test2@loopback using 'test2';

rem     create public database link orcl@loopback using 'orcl';
rem     create public database link orcl2@loopback using 'orcl2';

rem     create public database link orclpdb@loopback using 'orclpdb';
rem     create public database link orclpdb2@loopback using 'orclpdb2';

define m_target=test@loopback
define m_target2=test2@loopback

define m_target=orcl@loopback
define m_target2=orcl2@loopback

define m_target=orclpdb@loopback
define m_target2=orclpdb2@loopback

create table t0
as
select  *
from    all_objects
where   mod(object_id,4) = 1
;

create table t1
as
select  *
from    all_objects
where   mod(object_id,11) = 0
;

create table t2
as
select  *
from    all_Objects
where   mod(object_id,13) = 0
;

explain plan for
select  /*+ driving_site(t1) */
        t1.object_name, t1.object_id
from    t1@&m_target    t1
where
        t1.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

explain plan for
select
        /*+ rule driving_site(t2) */
        t2.object_name, t2.object_id
from    t2@&m_target2   t2
where
        t2.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

Reading from the top down – t0 is in the local database, t1 is in remote database 1, t2 is in remote database 2. I’ve indicated the creation and selection of a pair of public database links at the top of the script – in this case both of them are loopback links to the local database, but I’ve used substitution variables in the SQL to allow me to adjust which databases are the remote ones. Since there are no indexes on any of the tables the optimizer is very limited in its choice of execution plans, which are as follows in 19.3 (the orclpdb/orclpdb2 links).

First, the query against t1@orclpdb – which will run cost-based:


-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|*  1 |  HASH JOIN SEMI        |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   2 |   TABLE ACCESS FULL    | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   3 |   REMOTE               | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A1"."OBJECT_ID"="A2"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement

You’ll note that operation 3 is simply REMOTE, and t0 is the object accessed – which means this query is behaving as if the (local) t0 table is the remote one as far as the execution plan is concerned. The IN-OUT column tells us that this operation is “Remote to Serial” (R->S) and the instance called is named “!”, which is how the local database is identified in the plan from a remote database.

We can also see that the execution plan gives us the “Remote SQL Information” for operation 3 – and that’s the text of the query that gets sent by the driving site to the instance that holds the object of interest. In this case the query is simply selecting the object_id values from all the rows in t0.

Now the plan for the query against t2@orclpdb2 which includes a /*+ rule */ hint:

-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   SORT JOIN            |          |        |      |
|   3 |    TABLE ACCESS FULL   | T2       | ORCLP~ |      |
|*  4 |   SORT JOIN            |          |        |      |
|   5 |    VIEW                | VW_NSO_1 | ORCLP~ |      |
|   6 |     SORT UNIQUE        |          |        |      |
|   7 |      REMOTE            | T0       |      ! | R->S |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A1"."OBJECT_ID"="OBJECT_ID")
       filter("A1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   7 - SELECT /*+ RULE */ "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement
   - rule based optimizer used (consider using cbo)

The most striking feature of this plan is that it is an RBO (rule based optimizer) plan, not a cost-based plan – and the Note section confirms that observation. We can also see that the Remote SQL Information is echoing the /*+ RULE */ hint back in its query against t0. Since the query is operating rule-based the hash join mechanism is not available (it’s a costed path – it needs to know the size of the data that will be used in the build table), and that’s why the plan is using a sort/merge join.

Following the “incremental build” strategy for writing SQL all we have to do as the next step of producing the final code is put the two queries into separate views and join them:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ rule driving_site(t2) */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

And here’s the execution plan – which, I have to admit, gave me a bit of a surprise on two counts when I first saw it:


-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT       |          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   MERGE JOIN           |          |        |      |
|   3 |    MERGE JOIN          |          |        |      |
|   4 |     SORT JOIN          |          |        |      |
|   5 |      REMOTE            | T2       | ORCLP~ | R->S |
|*  6 |     SORT JOIN          |          |        |      |
|   7 |      REMOTE            | T1       | ORCLP~ | R->S |
|*  8 |    SORT JOIN           |          |        |      |
|   9 |     VIEW               | VW_NSO_1 |        |      |
|  10 |      SORT UNIQUE       |          |        |      |
|  11 |       TABLE ACCESS FULL| T0       |        |      |
|* 12 |   SORT JOIN            |          |        |      |
|  13 |    VIEW                | VW_NSO_2 |        |      |
|  14 |     SORT UNIQUE        |          |        |      |
|  15 |      TABLE ACCESS FULL | T0       |        |      |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
       filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   8 - access("T2"."OBJECT_ID"="OBJECT_ID")
       filter("T2"."OBJECT_ID"="OBJECT_ID")
  12 - access("T1"."OBJECT_ID"="OBJECT_ID")
       filter("T1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   5 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T2" "T2"
       (accessing 'ORCLPDB2.LOCALDOMAIN@LOOPBACK' )

   7 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T1" "T1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - rule based optimizer used (consider using cbo)

The two surprises were that (a) the entire plan was rule-based, and (b) the driving_site() selection has disappeared from the plan.

Of course as soon as I actually started thinking about what I’d written (instead of trusting the knee-jerk “just stick the two bits together”) the flaw in the strategy became obvious.

  • Either the whole query runs RBO or it runs CBO – you can’t split the planning.
  • In the words of The Highlander “There can be only one” (driving site that is) – only one of the databases involved will decide how to decompose and distribute the query.

It’s an interesting detail that the /*+ rule */ hint seems to have pushed the whole query into the arms of the RBO despite being buried somewhere in the depths of the query rather than being in the top level query block – but we’ve seen that before in some old data dictionary views.

The complete disregard for the driving_site() hints is less interesting – there is, after all, a comment in the manuals somewhere to the effect that when two hints contradict each other they are both ignored. (But I did wonder why the Hint Report that should appear with 19.3 plans didn’t tell me that the hints had been observed but not used.)
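
One way to check what the optimizer did with the hints is to ask for the Hint Report explicitly. The following is a minimal sketch, assuming 19c (where dbms_xplan accepts the hint_report format option) and assuming the explain plan call above has just populated the plan table in the same session. Whether the report says anything useful for a distributed statement is exactly the open question raised above.

```sql
-- Assumes the "explain plan for ..." statement above has just been run in this session.
-- The '+hint_report' modifier asks dbms_xplan (19c onwards) to include the Hint Report section.
select * from table(dbms_xplan.display(format => 'typical +hint_report'));
```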

The other problem (from the perspective of the OP) is that the two inline views have been merged so the join order no longer reflects the two isolated components we used to have. So let’s fiddle around a little bit to see how close we can get to what the OP wants. The first step would be to add the /*+ no_merge */ hint to both inline views, and eliminate one of the /*+ driving_site() */ hints to see what happens, and since we’re modern we’ll also get rid of the /*+ rule */ hint:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ qb_name(subq1) no_merge driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ qb_name(subq2) no_merge */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|*  1 |  HASH JOIN             |      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|   2 |   VIEW                 |      |  4342 |   334K|    14   (8)| 00:00:01 |        |      |
|   3 |    REMOTE              |      |       |       |            |          |      ! | R->S |
|   4 |   VIEW                 |      |  5168 |   398K|    57   (8)| 00:00:01 |        |      |
|*  5 |    HASH JOIN SEMI      |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   6 |     TABLE ACCESS FULL  | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   7 |     REMOTE             | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A2"."OBJECT_ID"="A1"."OBJECT_ID")
   5 - access("A3"."OBJECT_ID"="A6"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - EXPLAIN PLAN INTO "PLAN_TABLE" FOR SELECT /*+ QB_NAME ("SUBQ2") NO_MERGE */
       "A1"."OBJECT_NAME","A1"."OBJECT_ID" FROM  (SELECT DISTINCT "A3"."OBJECT_ID"
       "OBJECT_ID" FROM "T0" "A3") "A2","T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1" WHERE
       "A1"."OBJECT_ID"="A2"."OBJECT_ID" (accessing '!' )

   7 - SELECT "OBJECT_ID" FROM "T0" "A6" (accessing '!' )

Note
-----
   - fully remote statement

In this plan we can see that the /*+ driving_site() */ hint has been applied – the plan is presented from the point of view of orclpdb (the database holding t1). The order of the two inline views has apparently been reversed as we move from the statement to its plan – but that’s just a minor side effect of the hash join (picking the smaller result set as the build table).

Operations 5 – 7 tell us that t1 is treated as the local table and used for the build table in a hash semi-join, and then t0 is accessed by a call back to our database and its result set is used as the probe table.

From operation 3 (in the body of the plan, and in the Remote SQL Information) we see that orclpdb has handed off the entire t2 query block to a remote operation – which is “accessing '!'”. But there’s a problem (in my opinion) in the SQL that it’s handing off – the text is NOT the text of our inline view; it’s already been through a heuristic transformation that has unnested the IN subquery of our original text into a “join distinct view” – if we had used a hint to force this transformation it would have been the /*+ unnest(UNNEST_INNERJ_DISTINCT_VIEW) */ variant.

SELECT /*+ NO_MERGE */
        "A1"."OBJECT_NAME","A1"."OBJECT_ID"
FROM
       (SELECT DISTINCT "A3"."OBJECT_ID" "OBJECT_ID" FROM "T0" "A3") "A2",
       "T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1"
WHERE
        "A1"."OBJECT_ID"="A2"."OBJECT_ID"

I tried to change this by adding alternative versions of the /*+ unnest() */ hint to the original query, following the query block names indicated by the outline information (not shown), but it looks as if the code path that constructs the Remote SQL operates without considering the main query hints – perhaps the decomposition code is simply following the code path of the old heuristic “I’ll do it if it’s legal” unnest. The drawback to this is that if the original form of the text had been sent to the other site the optimizer that had to handle it could have used cost-based query transformation and might have come up with a better plan.

You may be wondering why I left the /*+ driving_site() */ hint in one of the inline views rather than inserting it in the main query block. The answer is simple – it didn’t seem to work (even in 19.3) when I put /*+ driving_site(t1@subq1) */ in the main query block.
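
For reference, the variant that didn’t seem to work would look something like the following. This is a reconstruction from the description above rather than the text of the original test: it reuses the tables, substitution variables and query block names from the earlier example, and simply moves the driving_site() hint into the main query block with a fully qualified alias.

```sql
explain plan for
select  /*+ driving_site(t1@subq1) */   -- hint moved to the main query block
        v1.*, v2.*
from    (
        select  /*+ qb_name(subq1) no_merge */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ qb_name(subq2) no_merge */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);
```

A quick check on the result: if operation 0 reports SELECT STATEMENT rather than SELECT STATEMENT REMOTE, and there is no “fully remote statement” note, the hint has not taken effect.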

tl;dr

The optimizer has to operate rule-based or cost-based, it can’t do a bit of both in the same query – so if you’ve got a /*+ RULE */ hint that takes effect anywhere in the query the entire query will be optimised under the rule-based optimizer.

There can be only one driving site for a query, and if you manage to get multiple driving_site() hints in a query that contradict each other the optimizer will ignore all of them.

When the optimizer decomposes a distributed query and produces non-trivial components to send to remote sites you may find that some of the queries constructed for the remote sites have been subject to transformations that you cannot influence by hinting.

Footnote

I mentioned factored subqueries and the /*+ materialize */ option in the opening notes. In plans where the attempt to specify the driving site failed (i.e. when the query ran locally) the factored subqueries did materialize. In any plans where the driving site was a remote site the factored subqueries were always inline. This may well be related to the documented (though not always implemented) restriction that temporary tables cannot take part in distributed transactions.
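
A sketch of the kind of statement this footnote refers to, assuming the same tables and substitution variables as the main test. The exact text used in the original tests isn’t shown, so the placement of the hints here is an assumption:

```sql
explain plan for
with v1 as (
        select  /*+ materialize driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
),
v2 as (
        select  /*+ materialize */
                t2.object_name, t2.object_id
        from    t2@&m_target2   t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
)
select  v1.*, v2.*
from    v1, v2
where   v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);
```

If the subqueries have materialized, the plan should include TEMP TABLE TRANSFORMATION and LOAD AS SELECT operations; if they have been inlined, those operations will be missing.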

June 1, 2021

Distributed Sequences

Filed under: distributed,Infrastructure,Oracle — Jonathan Lewis @ 5:55 pm BST Jun 1,2021

This is another detail to add to the note I wrote recently about the effects of adding a sequence number to a remote select when moving data from one database to another. It probably shouldn’t have much impact on performance unless it runs into code that does a lot of single row processing to handle a large batch of data.

We start with the database link, tables and sequence I created in the previous article, but this time I’m going to do nothing but a simple select from a single remote table – with, or without, a call to my local sequence.

alter system flush shared_pool;

select  /*+ id_only */
        id 
from    test_user.t1@&m_target
where   rownum < 4;

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate +remote -plan_hash'));

select  /*+ seq_only */
        s1.nextval 
from    test_user.t1@&m_target
where   rownum < 4;

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate +remote -plan_hash'));

select  /*+ both */
        id, s1.nextval
from    test_user.t1@&m_target
where   rownum < 4;

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate +remote -plan_hash'));


I’ve flushed the shared pool before running my three queries because I’m going to want to do a trivial search of the library cache to pick up some run-time information as easily as possible.

A key feature of this set of queries is the “rownum” predicate – and the surprise you get when sequences and remote queries don’t treat it nicely. (It’s also demonstrating a point about checking which bits of Oracle code run at what stage of a query.)

Here’s the execution plan output from the three queries:

select /*+ id_only */  id from test_user.t1@orclpdb@loopback where rownum < 4

NOTE: cannot fetch plan for SQL_ID: a3jkr0hqbyccp, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)



select  /*+ seq_only */  s1.nextval from test_user.t1@orclpdb@loopback where rownum < 4

-------------------------------------------------
| Id  | Operation        | Name | Inst   |IN-OUT|
-------------------------------------------------
|   0 | SELECT STATEMENT |      |        |      |
|   1 |  SEQUENCE        | S1   |        |      |
|*  2 |   COUNT STOPKEY  |      |        |      |
|   3 |    REMOTE        | T1   | ORCLP~ | R->S |
-------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<4)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT 0 FROM "TEST_USER"."T1" "T1" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )



select /*+ both */  id, s1.nextval from test_user.t1@orclpdb@loopback where rownum < 4

-------------------------------------------------
| Id  | Operation        | Name | Inst   |IN-OUT|
-------------------------------------------------
|   0 | SELECT STATEMENT |      |        |      |
|   1 |  SEQUENCE        | S1   |        |      |
|*  2 |   COUNT STOPKEY  |      |        |      |
|   3 |    REMOTE        | T1   | ORCLP~ | R->S |
-------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<4)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "ID" FROM "TEST_USER"."T1" "T1" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

The first query (id only) is fully remote, so the call to dbms_xplan.display_cursor() couldn’t find a plan in the local library cache for it, hence the Note.

The second query (sequence number only) has the sort of plan you’d expect – a fetch with stop key. Looking at the “Remote SQL information”, though, you can see that the “rownum” predicate has not reached the remote site – you might wonder what effect this will have.

The final query (id and sequence number) seems to have an identical plan, again without passing a rownum predicate to the remote.

But let’s search for the remote SQL in the library cache – we know from experience that it will be sent in all capitals (with identifiers double quoted). Here’s a suitable query with its results:

column sql_text format a60


select
        sql_id, parse_calls, fetches, executions, rows_processed, sql_text
from    v$sql
where   sql_text like 'SELECT%T1%'
;


SQL_ID        PARSE_CALLS    FETCHES EXECUTIONS ROWS_PROCESSED SQL_TEXT
------------- ----------- ---------- ---------- -------------- ------------------------------------------------------------
71bq0j03wtsmz           2          0          0              0 SELECT /*+ FULL(P) +*/ * FROM "TEST_USER"."T1" P
1kzgwm8cj1t1x           1          2          1              3 SELECT 0 FROM "TEST_USER"."T1" "T1"
dkxd06sct6y76           1          2          1              3 SELECT "A1"."ID" FROM "TEST_USER"."T1" "A1" WHERE ROWNUM<4
c6cavnnps7kn8           1          2          1          10000 SELECT "ID" FROM "TEST_USER"."T1" "T1"


There are 4 statements that have been parsed at the remote site. I think the first statement above is probably sent from the local to the remote to allow semantic checking at the local site. There’s a little oddity in the timing of when the parse_calls column is incremented: the statement is actually sent for all three of my queries, but always seems to report one less parse than the number of statements I actually test with.

The other three statements are the different effects we get from the three statements I executed. You’ll notice that they’ve all done 2 Fetch calls – this is the standard “get the first row, get more rows” effect that you usually see from SQL*Plus. The interesting points lie in the differences between the texts sent and in the numbers of rows fetched.

  • Selecting the id only results in the (3rd) text: SELECT "A1"."ID" FROM "TEST_USER"."T1" "A1" WHERE ROWNUM<4 which fetches exactly the 3 rows demanded by its rownum predicate.
  • Selecting the sequence only results in the (2nd) text: SELECT 0 FROM "TEST_USER"."T1" "T1" which has no rownum predicate, but manages to fetch only the 3 rows required.
  • Selecting both id and sequence results in the (4th) text: SELECT "ID" FROM "TEST_USER"."T1" "T1" which again has no rownum predicate, but in this case fetches all 10,000 rows from the table. This isn’t quite as bad as it appears at first glance: the data fetched is one SDU (session data unit), it’s not deliberately the entire table, and if we select id and v1 the rows_processed drops, in my case, to 3,879. It’s worth remembering, of course, that this could still be a lot of work at the remote server, and the SDU in 12.2 onwards has a maximum value of 2MB.

tl;dr

The bottom line on this little note, though, is simply this: even in the very simplest cases, sequences make a difference to the way that Oracle handles remote/distributed queries – so look very closely at how you’re using them in these circumstances.

May 9, 2021

Driving site patch

Filed under: distributed,Hints,Oracle — Jonathan Lewis @ 12:23 pm BST May 9,2021

A recent question on the Oracle-L list server asked “Is there a way I can use an SQL_PATCH to add a driving_site() hint to a query?” to which the reply was: “Just do it, it should work.” Unfortunately this produced the response: “I’m on 11.2.0.4, I’ve tried it, it doesn’t seem to be working.” I do approve of the “doesn’t seem to be” – it’s much more encouraging than a flat assertion that “it doesn’t work”, and helps encourage further dialogue.

I’ve come across this problem before (though I couldn’t find any notes I’d written about it – so possibly they’re only on a client site, or maybe it was a different hint displaying the same symptom) and it’s possible that the solution is very easy. Here’s a little data set to test with – created on 11.2.0.4 and then tested on 19.3.0.0:

rem
rem     Script:         patch_driving_site.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2021
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem

define m_target=orclpdb@loopback

execute sys.dbms_sqldiag.drop_sql_patch('driving_site');

create table t1
as
select
        *
from
        all_objects
where
        rownum <= 10000
;

alter table t1 add constraint t1_pk primary key (object_id);

create table t2
as
select
        *
from
        all_objects
where
        rownum <= 10000
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns owner size 254'
        );
end;
/

I’ve created two tables which I’m going to join, but I’ve created a loopback database link that I’ll use to make one of them appear to be at a remote database. The data comes from view all_objects, and includes just 7 rows out of 10,000 for owner=’OUTLN’, which is why I’ve created a histogram on the owner column – but only for one of the tables.

You’ll notice I’ve issued a call to the dbms_sqldiag package to drop an SQL patch called “driving_site” just to make sure I don’t confuse the issue (i.e. myself) if I re-run the test in a couple of days’ time (I’ve also got a similar call at the end of the script). To run this test you’ll have to grant execute privileges on this package to your test schema, and if you’re on 11g you’ll also have to grant execute privileges on the package dbms_sqldiag_internal so that you can create the SQL patch.
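
The grants might look like the following sketch, run from a suitably privileged account. The schema name test_user is a placeholder for whatever your test schema is called.

```sql
-- Run as a privileged user (e.g. SYS); TEST_USER is a placeholder schema name.
grant execute on sys.dbms_sqldiag to test_user;

-- Needed on 11g only, where the patch has to be created through the internal package.
grant execute on sys.dbms_sqldiag_internal to test_user;
```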

Here’s my query, with the default execution plan I got on the run against 19.3.0.0:

select
        t1.object_name,
        t1.object_type,
        t2.object_name,
        t2.object_type
from
        t1,
        t2@&m_target    t2
where
        t2.object_id = t1.object_id
and     t2.owner     = 'OUTLN'
/

select * from table(dbms_xplan.display_cursor(format=>'outline alias'));


SQL_ID  4dssxbjvzzrc0, child number 0
-------------------------------------
select  t1.object_name,  t1.object_type,  t2.object_name,
t2.object_type from  t1,  t2@orclpdb@loopback t2 where  t2.object_id =
t1.object_id and t2.owner     = 'OUTLN'

Plan hash value: 3828311863

------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |    54 (100)|          |        |      |
|   1 |  NESTED LOOPS                |       |  1111 |   165K|    54   (8)| 00:00:01 |        |      |
|   2 |   NESTED LOOPS               |       |       |       |            |          |        |      |
|   3 |    REMOTE                    | T2    |  1111 |   130K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |       |       |            |          |        |      |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    33 |    27   (8)| 00:00:01 |        |      |
------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   3 - SEL$1 / T2@SEL$1
   4 - SEL$1 / T1@SEL$1
   5 - SEL$1 / T1@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."OBJECT_ID"))
      NLJ_BATCHING(@"SEL$1" "T1"@"SEL$1")
      USE_NL(@"SEL$1" "T1"@"SEL$1")
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T2"@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "OWNER","OBJECT_NAME","OBJECT_ID","OBJECT_TYPE" FROM "T2" "T2" WHERE
       "OWNER"='OUTLN' (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - this is an adaptive plan

The optimizer has decided to do a nested loop join, fetching an estimated 1,111 rows (there are 10,000 rows in total and 9 distinct values for owner – and the optimizer doesn’t examine remote histograms!). The costing may seem a little surprising – an incremental cost of 27 for 1,111 probes of the index and table does seem a little low, but I’m not going to comment on that in this note.
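The 1,111 estimate is simply the row count divided by the number of distinct values, which is what the optimizer falls back on when it has no histogram to work with. As a minimal sketch (assuming the t2 table and owner column from the test, and relying on the fact that with a loopback link t2 is really a local table whose statistics can be read from the standard dictionary views), the following query reports the figures that feed the calculation:

```sql
-- Sketch only: shows the statistics behind the no-histogram estimate.
-- Table and column names are taken from the test case above.
select
        t.num_rows,
        c.num_distinct,
        c.histogram,
        round(t.num_rows / c.num_distinct)      est_rows_no_histogram
from
        user_tables             t,
        user_tab_col_statistics c
where
        t.table_name  = 'T2'
and     c.table_name  = t.table_name
and     c.column_name = 'OWNER'
;
```

If the statistics match the figures quoted (10,000 rows, 9 distinct values), the last column evaluates to 1,111, whatever the histogram column reports.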

We are actually going to get only 7 rows in the tablescan, so the path is a good one; but the Note tells us it is an adaptive plan and if at run-time the number of rows had been too large (as it would probably be for some other value of owner) Oracle would have switched to a hash join while the query was running.

Let us pretend, however, that we know that we could get better performance if the remote database optimised and executed the query. If we add the hint /*+ driving_site(t2) */ to the query we get the following from the call to dbms_xplan.display_cursor():

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID  86n3j9s9q9k47, child number 0

select  /*+ driving_site(t2) */  t1.object_name,  t1.object_type,
t2.object_name,  t2.object_type from  t1,  t2@orclpdb@loopback t2 where
 t2.object_id = t1.object_id and t2.owner     = 'OUTLN'

NOTE: cannot fetch plan for SQL_ID: 86n3j9s9q9k47, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)

The call can’t find the plan because it’s the remote database that generated it from a piece of text that the local database sent to it, and the plan “belongs to” that text, not to the original query. So that almost tells us that the driving_site() hint has done its job properly – but we can double-check by searching the remote database’s library cache for the SQL that it actually ran.

set linesize 132
column sql_text wrap word format a75

select  sql_id, sql_text 
from    V$sql 
where   sql_text like '%OUTLN%'
;


SQL_ID        SQL_TEXT
------------- ---------------------------------------------------------------------------
5hmjcxgt0jc8t SELECT
              "A2"."OBJECT_NAME","A2"."OBJECT_TYPE","A1"."OBJECT_NAME","A1"."OBJECT_TYPE"
              FROM "T1"@! "A2","T2" "A1" WHERE "A1"."OBJECT_ID"="A2"."OBJECT_ID" AND
              "A1"."OWNER"='OUTLN'


select * from table(dbms_xplan.display_cursor('5hmjcxgt0jc8t'));


SQL_ID  5hmjcxgt0jc8t, child number 0
-------------------------------------
SELECT "A2"."OBJECT_NAME","A2"."OBJECT_TYPE","A1"."OBJECT_NAME","A1"."OB
JECT_TYPE" FROM "T1"@! "A2","T2" "A1" WHERE
"A1"."OBJECT_ID"="A2"."OBJECT_ID" AND "A1"."OWNER"='OUTLN'

Plan hash value: 3485226535

-------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    33 (100)|          |        |      |
|   1 |  NESTED LOOPS      |      |     7 |   924 |    33   (4)| 00:00:01 |        |      |
|*  2 |   TABLE ACCESS FULL| T2   |     7 |   280 |    26   (4)| 00:00:01 |        |      |
|   3 |   REMOTE           | T1   |     1 |    92 |     1   (0)| 00:00:01 |      ! | R->S |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("A1"."OWNER"='OUTLN')

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "OBJECT_NAME","OBJECT_ID","OBJECT_TYPE" FROM "T1" "A2" WHERE
       :1="OBJECT_ID" (accessing '!' )


As we can see from the plan – which executed from the remote instance – the (now-local) histogram comes into play with a correct cardinality estimate for t2 and we do a nested loop requesting one row at a time from the (now remote) t1 by object_id. So the driving_site() hint works when written inline, which means we can work on creating a patch to apply it from “outside”.

begin
        dbms_output.put_line('19.3 patch name: ' || sys.dbms_sqldiag.create_sql_patch(
--      sys.dbms_sqldiag_internal.i_create_patch(
                sql_text        =>  
q'{
select
        t1.object_name,
        t1.object_type,
        t2.object_name,
        t2.object_type
from
        t1,
        t2@&m_target    t2
where
        t2.object_id = t1.object_id
and     t2.owner     = 'OUTLN'
}',
                hint_text       => 'driving_site(t2)',
--              hint_text       => 'driving_site(t2@sel$1)',
                name            => 'driving_site'
        ))       -- extra bracket for dbms_output.put_line
        ;
end;
/

There are a couple of funny format details in this code fragment. First, dbms_sqldiag in 19c uses a function to create an SQL patch while 11g uses a procedure in dbms_sqldiag_internal, so there’s a messy optional bit near the top of the code and the reminder to count closing brackets near the bottom.

Second – when I used an SQL patch to add the hint that had worked inline it didn’t do what it was supposed to do (even though the Note for the execution plan reported: “SQL patch “driving_site” used for this statement”). That’s why you see two versions (one commented) of the hint_text parameter. To get the SQL patch working I had to use the fully qualified alias of the remote table which, as you can see in the Query Block / Object Alias information from the first plan I reported, is t2@sel$1.

With the corrected SQL patch in place the text sent to the remote database was exactly the same as it was when I used the inline (but slightly lucky) /*+ driving_site(t2) */ hint. Quite possibly I should have gone one step further and made the hint_text include the query block as well, viz: driving_site(@sel$1 t2@sel$1). I suspect, though, that that might not always be necessary (or even correct) – at some point I’ll have to check what happens if the reported query block has appeared as a consequence of a transformation and no longer matches the original query block from the object alias.
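When experimenting with different versions of the hint_text it helps to check that the patch is registered, see which cursors picked it up, and drop it cleanly between attempts. The following is a sketch rather than part of the original script; it assumes the patch name 'driving_site' used above and a schema that can query dba_sql_patches and v$sql:

```sql
-- Is the patch registered, and is it enabled?
select  name, status, created
from    dba_sql_patches
where   name = 'driving_site'
;

-- Which cursors in the library cache had the patch attached?
select  sql_id, child_number, sql_patch
from    v$sql
where   sql_patch = 'driving_site'
;

-- Drop the patch before the next attempt; ignore => true suppresses
-- the error that would otherwise be raised if the patch does not exist.
begin
        dbms_sqldiag.drop_sql_patch(name => 'driving_site', ignore => true);
end;
/
```

Bear in mind that a non-null sql_patch in v$sql tells you only that the patch was attached to the cursor, not that the hint had the intended effect. That still has to be confirmed by checking the SQL that reaches the remote database.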

Summary

I’ve often made the point that you have to be very thorough with hints and this note demonstrates a variation of that theme. There are cases where you can get away with being a little sloppy when specifying an object alias in an “inline” hint but have to be more precise when imposing it from outside the original source SQL.

Adding the “originating” query block to construct the “fully qualified” object alias is the first refinement that you should think of; adding in a “target” query block name may also be necessary for some hints. Both these pieces of information can be reported in the various calls to dbms_xplan by including the “alias” option in the format parameter as you build your test cases towards a complete solution.

Footnote

Another example of the need for “refinement” to the simple in-line hint – a tweet thread from a couple of years ago where the hint use_nl_with_index() needed a fully qualified alias and/or the query block name when moved into an SQL patch.

March 12, 2021

Distributed Sequences

Filed under: distributed,Execution plans,Oracle,Performance,Problem Solving — Jonathan Lewis @ 9:09 am GMT Mar 12,2021

A request for help came up some time ago on ODC reporting a query that was hanging when it included a sequence.nextval. In fact the initial “query” was an “insert as select” with a select that was a join of two remote tables.

Making the fairly automatic assumption that many people say “hanging” when they really mean “hasn’t finished yet” the first thought I had about the structure of the statement was that it was just a variation of the standard problem of distributed DML. I haven’t written anything previously about how using sequences can introduce the problem so here’s a note to demonstrate the issue and suggest a workaround:

We start with a database link and a few tables:


rem
rem     Script:         distributed_sequence.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1
rem

rem     create public database link orcl@loopback using 'orcl';

define m_target=orcl@loopback

create sequence s1 cache 10000;
select s1.nextval from dual;

create table t1
segment creation immediate
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n2,
        lpad(rownum,10,'0')             v2,
        lpad('x',100,'x')               padding
from
        generator       v1
;

create table t3(
        id1     number(6,0),
        id2     number(6,0),
        n0      number(6,0),
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(10),
        v2      varchar2(10)
)
segment creation immediate
;

create or replace view v1 as
select
        t1.id id1,
        t2.id id2,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1, t2
where
        t2.id = t1.id
;

The command to create a public database link (one example of the many optional commands in my original source) has to be run by a suitably privileged schema as a one-off event. The definition of the substitution variable m_target that I’ll be using as my database link (again with many possibilities in my original script) means I only have to edit my choice of database link once in my script as I change Oracle instances. You’ll notice I’ve done my usual trick of using a loopback database link to emulate a distributed system.

I’ve then created two populated tables (t1, t2) and a third empty table (t3) that will be the target of an insert. I’ve also created a view (v1) that joins the first two tables and a sequence (s1) that I’ve primed with a single call to nextval. When I get to the tests I’ll be using t1, t2 and v1 as if they had been created in the remote database (referenced through my loopback database link) while t3 and s1 will be local objects.

So let’s run a couple of statements and see what their execution plans look like:


set serveroutput off

prompt  =======================
prompt  Insert without sequence
prompt  =======================

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        t1.id,
        t2.id,
        0,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1@&m_target    t1,
        t2@&m_target    t2
where
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));

prompt  =======================
prompt  Insert WITH sequence
prompt  =======================

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        t1.id,
        t2.id,
        s1.nextval,
        t1.n1,
        t2.n2,
        t1.v1,
        t2.v2
from
        t1@&m_target    t1,
        t2@&m_target    t2
where
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;

The code joins t1 and t2, selects a couple of columns and inserts them into t3 without or with a call to s1.nextval (the local sequence). Here’s the output (with minor cosmetic changes) from executing this code under 19.3.0.0 (the effects are the same in 18.3.0.0 and 12.2.0.1):


=======================
Insert without sequence
=======================

10000 rows created.


SQL_ID  373nz810u3frv, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  t1.id,  t2.id,
0,  t1.n1,  t2.n2,  t1.v1,  t2.v2 from  t1@orclpdb@loopback t1,
t2@orclpdb@loopback t2 where  t2.id = t1.id

-----------------------------------------------------------------
| Id  | Operation                | Name | Cost  | Inst   |IN-OUT|
-----------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |        |      |
|   2 |   REMOTE                 |      |       | ORCLP~ | R->S |
-----------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - SELECT "A2"."ID","A1"."ID",0,"A2"."N1","A1"."N2","A2"."V1","A1"."
       V2" FROM "T1" "A2","T2" "A1" WHERE "A1"."ID"="A2"."ID" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - cpu costing is off (consider enabling it)


=======================
Insert WITH sequence
=======================

10000 rows created.


SQL_ID  8jg23arujnh01, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  t1.id,  t2.id,
s1.nextval,  t1.n1,  t2.n2,  t1.v1,  t2.v2 from  t1@orclpdb@loopback
t1,  t2@orclpdb@loopback t2 where  t2.id = t1.id

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    54 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|*  3 |    HASH JOIN             |      | 10000 |   937K|    54   (8)| 00:00:01 |        |      |
|   4 |     REMOTE               | T1   | 10000 |   468K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
|   5 |     REMOTE               | T2   | 10000 |   468K|    26   (4)| 00:00:01 | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"="T1"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","N1","V1" FROM "T1" "A2" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

   5 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","N2","V2" FROM "T2" "A1" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

The key thing to notice is that when we want to insert the local sequence as a column in the select list Oracle breaks the hash join into two separate accesses to the remote database and pulls all the data we might need from the two tables before trying to join them locally. When the query is “fully remote” the local database can let the remote database deal with the join; when the query is distributed – which is a side effect of introducing the sequence – the local site becomes the driving site and has to work out the least worst way of handling the join, which might be much slower than the remote join. (It’s an odd little quirk that when the select is fully remote the optimizer thinks that it’s not using CPU costing. Possibly that’s because all the arithmetic happens somewhere else and the local cost of the query never gets above zero.)

This is one of those cases where we might work around the problem by creating a remote view to handle the join – hence the creation of the view v1; here’s a suitable statement and the resulting execution plan:

prompt  ==============================
prompt  Insert using view and sequence
prompt  ==============================

set serveroutput off

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        v1.id1,
        v1.id2,
        s1.nextval,
        v1.n1,
        v1.n2,
        v1.v1,
        v1.v2
from
        v1@&m_target v1
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;



==============================
Insert using view and sequence
==============================

10000 rows created.


SQL_ID  4tz0rrqt87nb8, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  v1.id1,  v1.id2,
s1.nextval,  v1.n1,  v1.n2,  v1.v1,  v1.v2 from  v1@orclpdb@loopback v1

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    27 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|   3 |    REMOTE                | V1   | 10000 |   937K|    27   (8)| 00:00:01 | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID1","ID2","N1","N2","V1","V2" FROM "V1" "V1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

As you can see, the local optimizer doesn’t know enough about the remote view to be able to split it into components and make a mess of the execution plan, it simply sees a query against a “single table” and leaves the remote database to worry about optimising it. But, of course, we don’t always have the luxury of being able to create objects on someone else’s database, so what’s the alternative?

Try rewriting the query to use an inline view with the /*+ no_merge() */ hint:


prompt  =======================================
prompt  Insert from a no-merge inline view with
prompt  a sequence.nextval in the outer query
prompt  ========================================

set serveroutput off

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select
        id1, id2, s1.nextval, n1, n2, v1, v2
from    (
        select  /*+ no_merge */
                t1.id   id1,    
                t2.id   id2,
                0       n0,
                t1.n1,
                t2.n2,
                t1.v1,
                t2.v2
        from
                t1@&m_target    t1,
                t2@&m_target    t2
        where
                t2.id = t1.id
        )       v1
;

select * from table(dbms_xplan.display_cursor(format=>'-plan_hash'));
commit;


=======================================
Insert from a no-merge inline view with
a sequence.nextval in the outer query
========================================

10000 rows created.


SQL_ID  20z81g550tbsk, child number 0
-------------------------------------
insert into t3 (id1, id2, n0, n1, n2, v1, v2) select  id1, id2,
s1.nextval, n1, n2, v1, v2 from (  select /*+ no_merge */   t1.id id1,
 t2.id id2,   0 n0,   t1.n1,   t2.n2,   t1.v1,   t2.v2  from
t1@orclpdb@loopback t1,   t2@orclpdb@loopback t2  where   t2.id = t1.id
 ) v1

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |       |       |    54 (100)|          |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |       |       |            |          |        |      |
|   2 |   SEQUENCE               | S1   |       |       |            |          |        |      |
|   3 |    VIEW                  |      | 10000 |   937K|    54   (8)| 00:00:01 |        |      |
|   4 |     REMOTE               |      |       |       |            |          | ORCLP~ | R->S |
-------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   4 - SELECT /*+ NO_MERGE */ "A2"."ID","A1"."ID",0,"A2"."N1","A1"."N2","A2"."V1","A1"."V
       2" FROM "T1" "A2","T2" "A1" WHERE "A1"."ID"="A2"."ID" (accessing
       'ORCLPDB.LOCALDOMAIN@LOOPBACK' )


The hint has done exactly what we needed: it has passed the text of the inline view to the remote database for optimisation so the join happens remotely, and the sequence number is then included after the result set comes back from the remote database. The SQL is a little messier, of course, mostly thanks to the doubled appearance of the columns in the select list.

Summary

Using a local sequence in DML that accesses a remote database makes the optimizer treat the underlying query as a distributed query, and this may mean it can’t find an efficient execution path unless you do some re-engineering of the code. If you can make the version of the insert that uses a constant efficient, then using that version of the code as an in-line no_merge view, with one extra layer that brings the sequence into play, may be all you need to do to make the DML operate efficiently.

July 3, 2019

DB links

Filed under: distributed,Oracle,Performance — Jonathan Lewis @ 1:16 pm BST Jul 3,2019

A recent thread on the ODC SQL & PL/SQL forum raised the question of how data moves across a database link – is it row by row or array based (or other)? It’s a good question as it’s fairly common knowledge that distributed queries can be very much slower than the same query when executed on an equivalent set of local tables, so it’s worth having a little poke around to see if there’s anything in the way the data moves across the link that could be tweaked to improve performance.

It’s fairly easy to get some ideas by running a couple of queries and checking session activity stats and wait events – so here’s a little test conducted between a database running 12.2.0.1 and a database running 11.2.0.4. For this type of testing it’s probably sensible to use two databases rather than faking things with a loopback database link in case the loopback introduces some special features (a.k.a. dirty tricks).


rem
rem     Script:         db_link_usage.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem
rem     Last tested 
rem             12.2.0.1 -> 11.2.0.4
rem

prompt  ==================================
prompt  SQL*Plus 20,000 rows, arraysize 20
prompt  ==================================

set feedback only
set arraysize 20

execute snap_my_stats.start_snap
execute snap_events.start_snap
select * from all_objects@d11g where rownum <= 20000;

set feedback on
execute snap_events.end_snap
execute snap_my_stats.end_snap

prompt  ====================================
prompt  SQL*Plus 20,000 rows, arraysize 2000
prompt  ====================================

set feedback only
set arraysize 2000

execute snap_my_stats.start_snap
execute snap_events.start_snap
select * from all_objects@d11g where rownum <= 20000;

set feedback on
execute snap_events.end_snap
execute snap_my_stats.end_snap

All I’ve done is select 20,000 rows from view all_objects in a remote database with two different settings for the array fetch size. The “feedback only” option is a 12c option that means the session doesn’t output the data, it shows only the final “N rows selected” report. The two “snap” packages are some code I wrote a couple of decades ago to report changes in my session’s activity stats and wait events.
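The snap packages aren’t shown here, but a rough equivalent for the wait events can be had from the standard dynamic performance view v$session_event. This is only a sketch, and it assumes the test schema has been granted select on the view. The figures are cumulative for the session, so the query would have to be run before and after the test and the differences worked out by hand:

```sql
-- Cumulative SQL*Net wait figures for the current session.
-- Run once before and once after the test query, then subtract.
select
        event,
        total_waits,
        total_timeouts,
        time_waited             -- reported in centiseconds
from
        v$session_event
where
        sid = sys_context('userenv','sid')
and     event like 'SQL*Net%'
order by
        event
;
```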

It turned out that there was very little of interest in the session activity stats, although the impact of SQL*Net compression is always worth a quick reminder (here and here); on the other hand the wait events (specifically the wait counts) are most revealing:

================================== 
SQL*Plus 20,000 rows, arraysize 20
================================== 
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                       1,004           0          15.69        .016     414,828
SQL*Net message to dblink                         1,004           0           0.04        .000           0
SQL*Net message from dblink                       1,004           0          61.02        .061          11
SQL*Net message to client                         1,004           0           0.11        .000           0

====================================
SQL*Plus 20,000 rows, arraysize 2000
====================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                          14           0           4.88        .349     414,828
SQL*Net message to dblink                            13           0           0.00        .000           0
SQL*Net message from dblink                          13           0           7.72        .594          11
SQL*Net message to client                            14           0           0.00        .000           0

SQL*Net more data from dblink                        96           0           6.26        .065           6
SQL*Net more data to client                          96           0           0.16        .002           4

Doing a little arithmetic (and allowing for a few “metadata” messages that introduce small variations in the numbers), we can see that when we fetched the 20,000 rows with an arraysize of 20 this turned into 1,000 (fetch) calls from SQL*Plus to the server, with a matching 1,000 calls from the local server to the remote server. When the arraysize goes up to 2,000 rows, though, the SDU (session data unit) setting for SQL*Net is too small to hold the whole of a single fetch and we see a single fetch from SQL*Plus turning into one “SQL*Net message to client” accompanied by 7 or 8 “SQL*Net more data to client” with exactly the same pattern of conversation between the local server and the remote server. You could imagine the conversation as something like:

  • Client to local server: “give me 2,000 rows”
  • Local server to remote server: “give me 2,000 rows”
  • Remote server to local server: “Here, I can manage 120 rows as the first installment”
  • Local server to client: “Here’s a first installment”
  • Client to local server: “Got it, where’s the rest?”     *** but no message recorded in the session activity stats
  • Local server to remote server: “Where’s the rest?”
  • Remote server to local server: “You want more – here’s another 120 rows”
  • Local server to client: “Here’s a second installment”
  • Client to local server: “Got it, where’s the rest?”     *** but no message recorded in the session activity stats
  • etc.

In this example the volume of data sent back to the client in each message was limited by the SDU size negotiated between the local server and the remote server as the link opens. In my case I happened to have the default (apparently 8KB) as the SDU_SIZE in the 12c and a non-default 32KB setting in the 11g sqlnet.ora, so the negotiated SDU between servers was 8KB (the lower of the two).

Here’s what the figures looked like after I had restarted with the SDU set to 16KB in the 12c tnsnames.ora, and 16KB in the 11g listener.ora:


==================================
SQL*Plus 20,000 rows, arraysize 20
==================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                       1,004           0          24.23        .024         588
SQL*Net message to dblink                         1,009           0           0.06        .000           0
SQL*Net message from dblink                       1,010           0          77.76        .077           4
SQL*Net message to client                         1,004           0           0.15        .000           0

====================================
SQL*Plus 20,000 rows, arraysize 2000
====================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                          14           0           1.61        .115         588
SQL*Net message to dblink                            13           0           0.00        .000           0
SQL*Net message from dblink                          13           0           4.21        .324           4
SQL*Net message to client                            14           0           0.00        .000           0

SQL*Net more data from dblink                        45           0          13.53        .301           6
SQL*Net more data to client                          45           0           0.13        .003           0

The first set of figures (arraysize 20) don’t change. If an 8KB SDU is large enough to hold an array of 20 rows then it’s not going to make a difference when the SDU is increased. In the second set of figures we see that for each “SQL*Net message from dblink” we now have roughly 3 “SQL*Net more data from dblink” (with matching counts for “SQL*Net more data to client”). With an SDU of double the size it now takes a total of roughly 4 packets to transmit the array fetch rather than the 8 or 9 we saw with the smaller SDU size.
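The packet counts per fetch can be checked directly from the wait figures. The query below is just the arithmetic, with the wait counts from the two arraysize 2000 reports hard-coded; dividing by the 13 “message from dblink” waits (rather than by the 10 fetches that 20,000 rows at 2,000 rows per fetch implies) is an assumption about how to allow for the metadata messages:

```sql
-- "more data from dblink" waits per "message from dblink" wait
select
        round(96 / 13, 1)       more_data_per_message_8kb,
        round(45 / 13, 1)       more_data_per_message_16kb
from
        dual
;
```

This returns 7.4 and 3.5. Adding one for the initial message in each case gives totals in line with the 8 or 9 packets for the 8KB SDU and the roughly 4 packets for the 16KB SDU.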

As a final test with SQL*Plus, what happens if we set the SDU size at 32K for the 12c database (and that means for the client calling in and the server calling out) and 8KB for the 11g database? The client negotiates a 32KB SDU with the 12c server, but the 12c server negotiates an 8KB SDU with the 11g server. Here are the stats for just the test with the large fetch size in this case:


====================================
SQL*Plus 20,000 rows, arraysize 2000
====================================
Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                          14           0           5.30        .379     214,570
SQL*Net message to dblink                            13           0           0.00        .000           0
SQL*Net message from dblink                          13           0           4.09        .314          13
SQL*Net message to client                            14           0           0.00        .000           0

SQL*Net more data from dblink                        96           0          14.46        .151           6
SQL*Net more data to client                          20           0           0.08        .004           0

We get the same 10 (plus a few) “message to/from client/dblink”, but now the “more data” waits are dramatically different. When the client calls for the “next fetch” the local server has to call the remote server 4 or 5 times to get enough 8KB data packets to fill a single 32KB packet to return to the client. You can confirm this (and some of my descriptions of the earlier behaviour) by enabling extended tracing for wait states. Here’s an extract of 5 consecutive lines from a suitable trace file (you have to ignore the “#bytes” parameter on the “SQL*Net more data from dblink” waits; they’ve always been garbage):


WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 2793 driver id=1413697536 #bytes=7 p3=0 obj#=-1 tim=703316387674
WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 34 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=703316388447
WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 8 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=703316389134
WAIT #140028098478296: nam='SQL*Net more data from dblink' ela= 16 driver id=1413697536 #bytes=12 p3=0 obj#=-1 tim=703316389818
WAIT #140028098478296: nam='SQL*Net more data to client' ela= 73 driver id=1413697536 #bytes=32671 p3=0 obj#=-1 tim=703316389960
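
If you want to reproduce an extract like this, one way to enable extended tracing for wait states in your own session is through the dbms_monitor package. The following is only a minimal sketch: the tracefile_identifier value is an arbitrary label chosen for illustration, and it assumes your account has execute privilege on dbms_monitor and can query v$diag_info.

```sql
-- Tag the trace file name so that it is easy to find (the label is arbitrary)
alter session set tracefile_identifier = 'sdu_test';

-- Enable extended SQL trace with wait events, but without bind values
execute dbms_monitor.session_trace_enable(waits => true, binds => false)

-- ... run the test query across the database link here ...

execute dbms_monitor.session_trace_disable

-- Report the name of the trace file written by the current session
select value from v$diag_info where name = 'Default Trace File';
```

The “SQL*Net more data” lines then appear in the trace file in the order the waits occurred, which is what makes it possible to see how many packets from the database link go into each packet sent to the client.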

Summary Note

Oracle will use the negotiated SDU to do array fetches across a database link. Since a larger arraysize can (for large volumes of data) reduce the total work load on the remote database and on the network you may want the local server to have a specifically crafted tnsnames.ora entry and the remote server to expose a specific service with matching SDU size to help improve the efficiency of transferring data between two databases.
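
As a sketch of the idea, a database link can carry its own connect descriptor (including an SDU request) instead of relying on a shared tnsnames.ora entry. Everything named here is hypothetical: the link name, the account, the host and the service are placeholders rather than details from the tests above, and 32767 is simply the 32KB setting used in the final SQL*Plus test.

```sql
-- Hypothetical names throughout: link name, account, host and service
-- are placeholders for illustration only.
create database link d11g_bulk
        connect to test_user identified by test_password
        using '(DESCRIPTION=(SDU=32767)(ADDRESS=(PROTOCOL=TCP)(HOST=remote-host)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=d11g_bulk)))'
;

-- Queries that move large volumes of data can then name the dedicated link
select count(*) from all_objects@d11g_bulk;
```

Remember that the SDU is negotiated, so the request in the descriptor has no effect unless the remote end is also configured to accept a size at least as large (for example through the SDU setting for the service at the listener, or DEFAULT_SDU_SIZE in the remote sqlnet.ora). As the 32KB/8KB test showed, the smaller of the two values is the one that gets used.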

Lagniappe

If you want to look at other cases of how array fetching and SDU sizes interact, here are a couple of examples of using PL/SQL to execute SQL that fetches data across database links. The first is a simple, embedded “cursor for loop” that (apparently) does “row by row” processing – although an enhancement appeared many versions ago to make this type of code use array fetching of 100 rows under the covers. The second demonstrates the effects of an explicit cursor with “fetch, bulk collect, limit”:


execute snap_events.start_snap

declare
        m_total number := 0;
        m_ct    number := 0;
begin
        for c1 in (select * from all_objects@d11g where rownum < 20000) loop
                m_total := m_total + c1.object_id;
                m_ct    := m_ct + 1;
        end loop;
end;
/

execute snap_events.end_snap

execute snap_events.start_snap

declare

        cursor c1 is  select * from all_objects@d11g where rownum <= 20000;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

        m_ct number := 0;

begin
        open c1;
        loop
                fetch   c1
                bulk collect
                into    m_tab
                limit   1000
                ;

                exit when c1%notfound;
                m_ct := m_ct + 1;

        end loop;
        close c1;
end;
/

execute snap_events.end_snap

Checking the session events for these two tests we see the following when the SDU_SIZE has been set at 16KB at both ends of the database link (which means the negotiated SDU will be 16KB):


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                           4           0           0.66        .165         588 
SQL*Net message to dblink                           203           0           0.05        .000           0
SQL*Net message from dblink                         203           0          38.51        .190           5
SQL*Net message to client                             4           0           0.00        .000           0

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message from client                           4           0           0.08        .021         588
SQL*Net message to dblink                            24           0           0.01        .000           0
SQL*Net message from dblink                          24           0           9.19        .383           5
SQL*Net more data from dblink                        40           0           8.47        .212           6
SQL*Net message to client                             4           0           0.00        .000           0

The critical information we can see in the first example is that getting 20,000 rows from the remote database “row by row” takes 200 (plus noise) round-trips – showing that Oracle really is converting our single-row processing loop into array fetches of 100 rows.

The second example shows us that (allowing for a little noise, again) it takes 20 messages to the remote database to fetch 20,000 rows – so 1,000 rows per message – but the SDU size is too small and we have to send each array in three consecutive packets, the “message from dblink” wait and two “more data from dblink” waits.

Footnote

I did point out that the session activity stats for these tests weren’t interesting. There is, however, one detail worth mentioning since you could otherwise be fooled into thinking the number of packets travelling between the databases was smaller than it really was. There are two statistics about network roundtrips:


Name                                         Value
----                                         -----
SQL*Net roundtrips to/from client               18         
SQL*Net roundtrips to/from dblink               13

A “roundtrip” is a “message to / message from” pair. It doesn’t account for traffic due to “more data” being requested or sent.
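
To see the discrepancy for yourself, you can compare the roundtrip statistics with the wait events for the same session. This is a minimal sketch that assumes you have select privileges on the relevant v$ views. Unlike the snap_events output shown above, these queries report cumulative values since the session started, so you would need to run them before and after the test and take the difference.

```sql
-- Roundtrip counts: one per "message to / message from" pair
select
        sn.name,
        ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic#   = sn.statistic#
and     sn.name like 'SQL*Net roundtrips%'
order by
        sn.name
;

-- Wait events for the same session, including the "more data" waits
-- that the roundtrip statistics do not count
select
        event,
        total_waits,
        total_timeouts,
        time_waited             csec
from
        v$session_event
where
        sid = to_number(sys_context('userenv','sid'))
and     event like 'SQL*Net%'
order by
        event
;
```

If the “more data” wait counts are large relative to the roundtrip counts, the SDU is probably too small for the array size being fetched.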

 

May 8, 2018

20 Indexes

Filed under: distributed,Indexing,Oracle — Jonathan Lewis @ 12:53 pm BST May 8,2018

If your system has to do a lot of distributed queries there’s a limit on indexes that might affect performance: when deriving an execution plan for a distributed query the optimizer will consider a maximum of twenty indexes on each remote table. If you have any tables with a ridiculous number of indexes (various 3rd party accounting and CRM systems spring to mind) and if you drop and recreate indexes on those tables in the wrong order then execution plans may change for the simple reason that the optimizer is considering a different subset of the available indexes.

Although the limit is stated in the manuals (a few lines into a section on managing statement transparency) there is no indication about which 20 indexes the optimizer is likely to choose – a couple of experiments, with tracing enabled and shared pool flushes, give a fairly strong indication that it’s the last 20 indexes created (or, to be more explicit, the ones with the 20 highest object_id values).

Here’s a little code to help demonstrate the point – first just the table and index creation


rem
rem	Script:		indexes_20.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Apr 2008
rem
rem	Last tested 
rem		12.2.0.1
rem

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects 
	where	rownum <= 3000 -- > comment to avoid WordPress format issue
)
select
	mod(rownum,trunc(5000/1))	n01,
	mod(rownum,trunc(5000/2))	n02,
	mod(rownum,trunc(5000/3))	n03,
	mod(rownum,trunc(5000/4))	n04,
	mod(rownum,trunc(5000/5))	n05,
	mod(rownum,trunc(5000/6))	n06,
	mod(rownum,trunc(5000/7))	n07,
	mod(rownum,trunc(5000/8))	n08,
	mod(rownum,trunc(5000/9))	n09,
	mod(rownum,trunc(5000/10))	n10,
	mod(rownum,trunc(5000/11))	n11,
	mod(rownum,trunc(5000/12))	n12,
	mod(rownum,trunc(5000/13))	n13,
	mod(rownum,trunc(5000/14))	n14,
	mod(rownum,trunc(5000/15))	n15,
	mod(rownum,trunc(5000/16))	n16,
	mod(rownum,trunc(5000/17))	n17,
	mod(rownum,trunc(5000/18))	n18,
	mod(rownum,trunc(5000/19))	n19,
	mod(rownum,trunc(5000/20))	n20,
	mod(rownum,trunc(5000/21))	n21,
	mod(rownum,trunc(5000/22))	n22,
	mod(rownum,trunc(5000/23))	n23,
	mod(rownum,trunc(5000/24))	n24,
	rownum				id,
	rpad('x',40)			padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5 -- > comment to avoid WordPress format issue
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects 
	where	rownum <= 3000 -- > comment to avoid WordPress format issue
)
select
	mod(rownum,trunc(5000/1))	n01,
	mod(rownum,trunc(5000/2))	n02,
	mod(rownum,trunc(5000/3))	n03,
	mod(rownum,trunc(5000/4))	n04,
	mod(rownum,trunc(5000/5))	n05,
	mod(rownum,trunc(5000/6))	n06,
	mod(rownum,trunc(5000/7))	n07,
	mod(rownum,trunc(5000/8))	n08,
	mod(rownum,trunc(5000/9))	n09,
	mod(rownum,trunc(5000/10))	n10,
	mod(rownum,trunc(5000/11))	n11,
	mod(rownum,trunc(5000/12))	n12,
	mod(rownum,trunc(5000/13))	n13,
	mod(rownum,trunc(5000/14))	n14,
	mod(rownum,trunc(5000/15))	n15,
	mod(rownum,trunc(5000/16))	n16,
	mod(rownum,trunc(5000/17))	n17,
	mod(rownum,trunc(5000/18))	n18,
	mod(rownum,trunc(5000/19))	n19,
	mod(rownum,trunc(5000/20))	n20,
	mod(rownum,trunc(5000/21))	n21,
	mod(rownum,trunc(5000/22))	n22,
	mod(rownum,trunc(5000/23))	n23,
	mod(rownum,trunc(5000/24))	n24,
	rownum				id,
	rpad('x',40)			padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5 -- > comment to avoid WordPress format issue
;

create index t2_a21 on t2(n21);
create index t2_a22 on t2(n22);
create index t2_a23 on t2(n23);
create index t2_a24 on t2(n24);

create index t2_z01 on t2(n01);
create index t2_z02 on t2(n02);
create index t2_z03 on t2(n03);
create index t2_z04 on t2(n04);
create index t2_z05 on t2(n05);
create index t2_z06 on t2(n06);
create index t2_z07 on t2(n07);
create index t2_z08 on t2(n08);
create index t2_z09 on t2(n09);
create index t2_z10 on t2(n10);

create index t2_i11 on t2(n11);
create index t2_i12 on t2(n12);
create index t2_i13 on t2(n13);
create index t2_i14 on t2(n14);
create index t2_i15 on t2(n15);
create index t2_i16 on t2(n16);
create index t2_i17 on t2(n17);
create index t2_i18 on t2(n18);
create index t2_i19 on t2(n19);
create index t2_i20 on t2(n20);

alter index t2_a21 rebuild;
alter index t2_a22 rebuild;
alter index t2_a23 rebuild;
alter index t2_a24 rebuild;
 

begin
        dbms_stats.gather_table_stats(
                ownname 	 => user,
		tabname		 =>'t1',
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'t2',
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);

end;
/

I’m going to use a loopback database link to join “local” table t1 to “remote” table t2 on all 24 of the nXX columns. I’ve created indexes on all the columns, messing around with index names, order of creation, and rebuilding, to cover possible selection criteria such as alphabetical order, ordering by data_object_id (rather than object_id), even ordering by name of indexed columns(!).

Now the code to run a test:


define m_target=orcl@loopback

alter session set events '10053 trace name context forever';
set serveroutput off

select
	t1.id,
	t2.id,
	t2.padding
from
	t1			t1,
	t2@&m_target		t2
where
	t1.id = 99
and	t2.n01 = t1.n01
and	t2.n02 = t1.n02
and	t2.n03 = t1.n03
and	t2.n04 = t1.n04
and	t2.n05 = t1.n05
and	t2.n06 = t1.n06
and	t2.n07 = t1.n07
and	t2.n08 = t1.n08
and	t2.n09 = t1.n09
and	t2.n10 = t1.n10
/*			*/
and	t2.n11 = t1.n11
and	t2.n12 = t1.n12
and	t2.n13 = t1.n13
and	t2.n14 = t1.n14
and	t2.n15 = t1.n15
and	t2.n16 = t1.n16
and	t2.n17 = t1.n17
and	t2.n18 = t1.n18
and	t2.n19 = t1.n19
and	t2.n20 = t1.n20
/*			*/
and	t2.n21 = t1.n21
and	t2.n22 = t1.n22
and	t2.n23 = t1.n23
and	t2.n24 = t1.n24
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

alter session set events '10053 trace name context off';

I’ve used a substitution variable for the name of the database link – it’s a convenience I have with all my distributed tests, a list of possible defines at the top of the script depending on which database I happen to be using at the time – then enabled the optimizer (10053) trace, set serveroutput off so that I can pull the execution plan from memory most easily, then executed the query.

Here’s the execution plan – including the Remote section and Outline.


-------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   270 (100)|          |        |      |
|   1 |  NESTED LOOPS      |      |     1 |   243 |   270   (6)| 00:00:01 |        |      |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |   101 |   268   (6)| 00:00:01 |        |      |
|   3 |   REMOTE           | T2   |     1 |   142 |     2   (0)| 00:00:01 | ORCL@~ | R->S |
-------------------------------------------------------------------------------------------


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "N01","N02","N03","N04","N05","N06","N07","N08","N09","N10","N11","N1
       2","N13","N14","N15","N16","N17","N18","N19","N20","N21","N22","N23","N24","ID","PA
       DDING" FROM "T2" "T2" WHERE "N01"=:1 AND "N02"=:2 AND "N03"=:3 AND "N04"=:4 AND
       "N05"=:5 AND "N06"=:6 AND "N07"=:7 AND "N08"=:8 AND "N09"=:9 AND "N10"=:10 AND
       "N11"=:11 AND "N12"=:12 AND "N13"=:13 AND "N14"=:14 AND "N15"=:15 AND "N16"=:16
       AND "N17"=:17 AND "N18"=:18 AND "N19"=:19 AND "N20"=:20 AND "N21"=:21 AND
       "N22"=:22 AND "N23"=:23 AND "N24"=:24 (accessing 'ORCL@LOOPBACK' )

There’s a little oddity with the plan – specifically in the Outline: there’s a “full(t2)” hint which is clearly inappropriate and isn’t consistent with the cost of 2 for the REMOTE operation reported in the body of the plan. Fortunately the SQL forwarded to the “remote” database doesn’t include this hint and (you’ll have to take my word for it) used an indexed access path into the table.

Where, though, is the indication that Oracle considered only 20 indexes? It’s in the 10053 trace file under the “Base Statistical Information” section in the subsection headed “Index Stats”:


Index Stats::
  Index: 0  Col#: 20    (NOT ANALYZED)
  LVLS: 1  #LB: 204  #DK: 250  LB/K: 1.00  DB/K: 400.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 19    (NOT ANALYZED)
  LVLS: 1  #LB: 204  #DK: 263  LB/K: 1.00  DB/K: 380.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 18    (NOT ANALYZED)
  LVLS: 1  #LB: 205  #DK: 277  LB/K: 1.00  DB/K: 361.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 17    (NOT ANALYZED)
  LVLS: 1  #LB: 205  #DK: 294  LB/K: 1.00  DB/K: 340.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 16    (NOT ANALYZED)
  LVLS: 1  #LB: 205  #DK: 312  LB/K: 1.00  DB/K: 320.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 15    (NOT ANALYZED)
  LVLS: 1  #LB: 205  #DK: 333  LB/K: 1.00  DB/K: 300.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 14    (NOT ANALYZED)
  LVLS: 1  #LB: 206  #DK: 357  LB/K: 1.00  DB/K: 280.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 13    (NOT ANALYZED)
  LVLS: 1  #LB: 206  #DK: 384  LB/K: 1.00  DB/K: 260.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 12    (NOT ANALYZED)
  LVLS: 1  #LB: 206  #DK: 416  LB/K: 1.00  DB/K: 240.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 11    (NOT ANALYZED)
  LVLS: 1  #LB: 206  #DK: 454  LB/K: 1.00  DB/K: 220.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 10    (NOT ANALYZED)
  LVLS: 1  #LB: 207  #DK: 500  LB/K: 1.00  DB/K: 200.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 9    (NOT ANALYZED)
  LVLS: 1  #LB: 207  #DK: 555  LB/K: 1.00  DB/K: 180.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 8    (NOT ANALYZED)
  LVLS: 1  #LB: 207  #DK: 625  LB/K: 1.00  DB/K: 160.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 7    (NOT ANALYZED)
  LVLS: 1  #LB: 208  #DK: 714  LB/K: 1.00  DB/K: 140.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 6    (NOT ANALYZED)
  LVLS: 1  #LB: 208  #DK: 833  LB/K: 1.00  DB/K: 120.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 5    (NOT ANALYZED)
  LVLS: 1  #LB: 208  #DK: 1000  LB/K: 1.00  DB/K: 100.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 4    (NOT ANALYZED)
  LVLS: 1  #LB: 208  #DK: 1250  LB/K: 1.00  DB/K: 80.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 3    (NOT ANALYZED)
  LVLS: 1  #LB: 209  #DK: 1666  LB/K: 1.00  DB/K: 60.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 2    (NOT ANALYZED)
  LVLS: 1  #LB: 209  #DK: 2500  LB/K: 1.00  DB/K: 40.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 
  Index: 0  Col#: 1    (NOT ANALYZED)
  LVLS: 1  #LB: 209  #DK: 5000  LB/K: 1.00  DB/K: 20.00  CLUF: 2002.00  NRW: 0.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 0 BSZ: 0
  KKEISFLG: 0 

We have 20 indexes listed, and while they’re all called “Index 0” (and reported as “Not Analyzed”) we can see from their column definitions that they are (in reverse order) the indexes on columns n01 through to n20 – i.e. the last 20 indexes created. The optimizer has created its plan based only on its knowledge of these indexes.

We might ask whether this matters or not – after all when the remote SQL gets to the remote database the remote optimizer is going to (re-)optimize it anyway and do the best it can with it, so at run-time Oracle could still end up using remote indexes that the local optimizer didn’t know about. So let’s get nasty and give the local optimizer a problem:


create index t2_id on t2(id);

select
        t1.id,
        t2.id,
        t2.padding
from
        t1                      t1,
        t2@&m_target            t2
where
        t1.id = 99
and     t2.n01 = t1.n01
;

I’ve created one more index on t2, which means the local optimizer is going to “forget” about the index that was the previous 20th index on the most recently created list for t2. That’s the index on (n01), which would have been a very good index for this query. If this query were to run locally the optimizer would do a nested loop from t1 to t2 using the index on (n01) – but the optimizer no longer knows about that index, so we get the following plan:


-------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   538 (100)|          |        |      |
|*  1 |  HASH JOIN         |      |    20 |  1140 |   538   (7)| 00:00:01 |        |      |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     9 |   268   (6)| 00:00:01 |        |      |
|   3 |   REMOTE           | T2   |   100K|  4687K|   268   (6)| 00:00:01 | ORCL@~ | R->S |
-------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$1" "T2"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N01"="T1"."N01")
   2 - filter("T1"."ID"=99)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "N01","ID","PADDING" FROM "T2" "T2" (accessing 'ORCL@LOOPBACK' )

Oracle is going to do a hash join and apply the join predicate late. Although the remote optimizer can sometimes rescue us from a mistake made by the local optimizer and use indexes that the local optimizer doesn’t know about, there are times when the remote SQL generated by the local optimizer is so rigidly associated with the expected plan that there’s no way the remote optimizer can work around the assumptions made by the local optimizer.

So when you create (or drop and recreate) an index, it’s just possible that a distributed plan will have to change because the local optimizer is no longer aware of an index that exists at the remote site.

tl;dr

Be very cautious about dropping and recreating indexes if the table in question

  1. has more than 20 indexes
  2. and is used at the remote end of a distributed execution plan

The optimizer will consider only 20 of the indexes on the table, choosing the ones with the highest object_ids. If you drop and recreate an index then it gets a new (highest) object_id and a plan may change because the index that Oracle was previously using is no longer in the top 20.
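
Before dropping and recreating an index it may be worth checking which tables are at risk and which of their indexes currently fall outside the 20 highest object_ids. The queries below are a sketch to be run in the schema that owns the remote table. The table name T2 comes from the demonstration above, the column alias remote_status and its labels are invented for illustration, and the ranking encodes the inference drawn from the experiments in this note rather than any documented rule.

```sql
-- Tables with more than 20 indexes, the only ones where the limit matters
select
        table_name,
        count(*)        index_count
from
        user_indexes
group by
        table_name
having
        count(*) > 20
order by
        count(*) desc
;

-- Rank the indexes on one table by object_id, highest first
select
        ind.index_name,
        obj.object_id,
        obj.created,
        case
                when row_number() over (order by obj.object_id desc) <= 20
                        then 'top 20'
                        else 'beyond 20'
        end             remote_status
from
        user_indexes    ind,
        user_objects    obj
where
        ind.table_name  = 'T2'
and     obj.object_name = ind.index_name
and     obj.object_type = 'INDEX'
order by
        obj.object_id desc
;
```

Any index reported as “beyond 20” is one that, on the evidence above, the local optimizer will not know about when it optimizes a distributed query.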

May 4, 2018

FBI Limitation

Filed under: CBO,distributed,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 9:19 am BST May 4,2018

Latest update: Dec 2020

A recent question on the ODC (OTN) database forum prompted me to point out that the optimizer doesn’t consider function-based indexes on remote tables in distributed joins. I then spent 20 minutes trying to find the blog note where I had demonstrated this effect, or an entry in the manuals reporting the limitation – but I couldn’t find anything, so I’ve written a quick demo which I’ve run on 12.2.0.1 to show the effect. First, the SQL to create a couple of tables and a couple of indexes:


rem
rem     Script:         fbi_limitation.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem

-- create public database link orcl@loopback using 'orcl'; 
define m_target = orcl@loopback

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

create table t2
nologging
as
select * from t1
;

alter table t1 add constraint t1_pk primary key(id);
alter table t2 add constraint t2_pk primary key(id);
create unique index t2_f1 on t2(id+1);

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T2',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/


The code is very simple: it creates a couple of identical tables with an id column that will produce an index with a very good clustering_factor. You’ll notice that I’ve (previously) created a public database link that is (in my case) a loopback to the current database and the code defines a variable that I can use as a substitution variable later on. If you want to do further tests with this model you’ll need to make some changes in these two lines.

So now I’m going to execute a query that should result in the optimizer choosing a nested loop between the tables – but I have two versions of the query, one which treats t2 as the local table it really is, and one that pretends (through the loopback) that t2 is remote.


set serveroutput off

select
        t1.v1, t2.v1
from
        t1,
        t2
--      t2@orcl@loopback
where
        t2.id+1 = t1.id
and     t1.n1 between 101 and 110
;


select * from table(dbms_xplan.display_cursor);

select
        t1.v1, t2.v1
from
        t1,
--      t2
        t2@orcl@loopback
where
        t2.id+1 = t1.id
and     t1.n1 between 101 and 110
;

select * from table(dbms_xplan.display_cursor);

Here are the two execution plans, pulled from memory – including the “remote” section in the distributed case:


SQL_ID  fthq1tqthq8js, child number 0
-------------------------------------
select  t1.v1, t2.v1 from  t1,  t2 -- t2@orcl@loopback where  t2.id+1 =
t1.id and t1.n1 between 101 and 110

Plan hash value: 1798294492

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |  2347 (100)|          |
|   1 |  NESTED LOOPS                |       |    11 |   407 |  2347   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL          | T1    |    11 |   231 |  2325   (4)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    16 |     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | T2_F1 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("T1"."N1"<=110 AND "T1"."N1">=101))
   4 - access("T2"."SYS_NC00005$"="T1"."ID")

Note
-----
   - this is an adaptive plan




SQL_ID  ftnmywddff1bb, child number 0
-------------------------------------
select  t1.v1, t2.v1 from  t1, -- t2  t2@orcl@loopback where  t2.id+1 =
t1.id and t1.n1 between 101 and 110

Plan hash value: 1770389500

-------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  4663 (100)|          |        |      |
|*  1 |  HASH JOIN         |      |    11 |   616 |  4663   (4)| 00:00:01 |        |      |
|*  2 |   TABLE ACCESS FULL| T1   |    11 |   231 |  2325   (4)| 00:00:01 |        |      |
|   3 |   REMOTE           | T2   |  1000K|    33M|  2319   (3)| 00:00:01 | ORCL@~ | R->S |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="T2"."ID"+1)
   2 - filter(("T1"."N1"<=110 AND "T1"."N1">=101))

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "ID","V1" FROM "T2" "T2" (accessing 'ORCL@LOOPBACK' )

Both plans show that the optimizer has estimated the number of rows that would be retrieved from t1 correctly (very nearly); but while the fully local query does a nested loop join using the high-precision, very efficient function-based index (reporting the internal supporting column referenced in the predicate section) the distributed query seems to have no idea about the remote function-based index, so it selects every row from the remote table and does a hash join.
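
If you need to find out whether a remote table depends on indexes of this type, the data dictionary at the remote site will tell you. This is a minimal sketch using the table name T2 from the demonstration; run it in the schema that owns the table.

```sql
-- Index types for the table: function-based and reverse key indexes
-- are identified in the index_type column
select
        index_name,
        index_type,
        uniqueness
from
        user_indexes
where
        table_name = 'T2'
order by
        index_name
;

-- The expressions behind any function-based indexes
select
        index_name,
        column_position,
        column_expression
from
        user_ind_expressions
where
        table_name = 'T2'
order by
        index_name, column_position
;

-- The hidden virtual columns that support those expressions
select
        column_name,
        hidden_column,
        virtual_column
from
        user_tab_cols
where
        table_name    = 'T2'
and     hidden_column = 'YES'
;
```

The last query should report the system-generated column that appears in the predicate section of the local plan, which is the column the local optimizer apparently cannot take into account when the table is at the far end of a database link.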

Footnote:

Another reason for changes in execution plan when you test fully local and then run distributed is due to the optimizer ignoring remote histograms, as demonstrated in a much older blog note (though still true in 12.2.0.1).

Addendum

After finishing this note, I discovered that I had written a similar note about reverse key indexes nearly five years ago. Arguably a reverse key is just a special case of a function-based index – except it’s not labelled as such in user_tab_cols, and doesn’t depend on a system-generated hidden column. I still haven’t been able to find any references in any manual since 8i stating the reverse key index limitation, and have yet to find an official comment about ignoring function-based indexes.


 
 

Update (Dec 2020)

In a recent thread on the Oracle-L mailing list someone pointed out that with the appropriate hints Oracle would use the function-based index in this example, suggesting that this was evidence that the problem was not a limitation but more of a plan selection issue.

This highlighted an important principle: when the local optimizer tries to generate a plan and decomposes the query to produce remote components it thinks it knows what will happen when those component SQL statements reach the remote optimizer, but the remote optimizer may do something different because it knows more about its own data than the local optimizer does.

Here’s the test query with “appropriate” hints added, and the resulting execution plan pulled from memory on an instance of 19.3.0.0:


select
        /*+ leading(t1 t2) use_nl(t2) index(t2 t2_f1) gather_plan_statistics */
        t1.v1, t2.v1
from
        t1,
        t2@&m_target
where
        t2.id+1 = t1.id
and     t1.n1 between 101 and 110
;

select * from table(dbms_xplan.display_cursor(format=>'cost allstats last hint_report remote outline'));


------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| Inst   |IN-OUT| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        | 29040 (100)|        |      |     10 |00:00:00.01 |   17864 |
|   1 |  NESTED LOOPS      |      |      1 |     11 | 29040   (7)|        |      |     10 |00:00:00.01 |   17864 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |     11 |  2394   (6)|        |      |     10 |00:00:00.01 |   17864 |
|   3 |   REMOTE           | T2   |     10 |      1 |  2422   (7)| ORCLP~ | R->S |     10 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1"<=110 AND "T1"."N1">=101))

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ GATHER_PLAN_STATISTICS INDEX ("T2" "T2_F1") USE_NL ("T2") */ "ID","V1" FROM "T2" "T2"
       WHERE :1="ID"+1 (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$1
           -  leading(t1 t2)

   3 -  SEL$1 / T2@SEL$1
         U -  index(t2 t2_f1) / index specified in the hint doesn't exist
           -  use_nl(t2)

The hint set I supplied was one that said:

  • consider only the join order t1 -> t2
  • use a nested loop to get to t2
  • use the function-based index t2_f1 to access t2

There are several details in the resulting output that tell us that the local optimizer doesn’t know anything about the remote function-based index.

From the top down:

  • the cost of the access to t2 is clearly a tablescan cost – the local optimizer thinks it has to do a tablescan
  • the outline information actually includes the hint FULL(@"SEL$1" "T2"@"SEL$1") – the local optimizer thinks it has to do a tablescan
  • the hint report tells us that the hint index(t2 t2_f1) is unused because the index doesn’t exist.

Despite this, the run-time activity was a nested loop join using the t2_f1 index to reach the t2 table. At first sight it would be easy to think that this was because the /*+ index(t2 t2_f1) */ hint was forwarded to the remote database – as shown in the Remote SQL Information section of the plan – but that’s not the case. Because the optimizer has been hinted to do a nested loop from t1 to t2 the join predicate WHERE :1="ID"+1 was included in the SQL sent to the remote database. Even if the index hint had not been present the remote optimizer would have recognised that the arriving query could best be satisfied by a unique index scan on t2_f1.

This is not a case of Oracle being able to use the correct plan when hinted; it’s a case of a set of highly suggestive hints happening to produce the right result. Of course, in a very simple query like this it’s fairly easy to get lucky, but in a more complex case (multiple remote tables with multiple candidate indexes) the fact that the optimizer is costing for a tablescan when it should be costing for an index is likely to make it much harder to get lucky.

In this case a search through the shared pool showed that the remote instance had actually used the index unique scan rather than doing the full tablescan the local instance was assuming would happen. It’s far more common, though, for this type of local/remote mismatch to result in the remote instance doing a full tablescan when the local instance was expecting it to do a high-precision indexed access, and this is usually because the remote histogram information isn’t available to the local instance.
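A search of that kind can be sketched as follows. This is a minimal sketch rather than the exact query used for the test: it assumes it is run at the remote instance by an account that can see v$sql, and both the text filter and the m_sql_id substitution variable are illustrative assumptions.

```sql
-- Run at the remote instance: find the statement that arrived through the link.
-- The filter text is an assumption based on the Remote SQL Information section above.
select  sql_id, child_number, executions, plan_hash_value, sql_text
from    v$sql
where   sql_text like '%FROM "T2" "T2"%'
and     sql_text not like 'select%sql_id%'
;

-- Report the plan the remote optimizer actually chose for that cursor.
-- &m_sql_id is a hypothetical substitution variable holding the sql_id found above.
select * from table(dbms_xplan.display_cursor('&m_sql_id', null, 'basic +predicate'));
```

If the remote plan shows an INDEX UNIQUE SCAN while the local plan was costed for a tablescan, you are looking at the same local/remote mismatch described above.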

To be checked

As time (and versions) pass, limitations in the optimizer are removed, so it’s good to revisit old notes and re-run tests. Another effect of coming back to an old test is that you think of new details that might be worth checking. I’ve said in the past that “there’s no such thing as a function-based index”. If the remote index is a multi-column index that starts with one or more base columns, and only introduces the function-based bit later in the index, will the local optimizer still fail to pick it up?
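One way to start checking is sketched below. It is an untested outline, not a result: the index name t2_f2 is hypothetical, and it assumes the t2 table from this test (with columns id and v1) and that the index is created at the remote site.

```sql
-- At the remote site: a two-column index that starts with a base column
-- and introduces the function-based expression second (t2_f2 is a made-up name).
create index t2_f2 on t2(v1, id + 1);

-- The expression is supported by a hidden virtual column; this lists it
-- alongside the ordinary columns of the table.
select  column_name, hidden_column, virtual_column, data_default
from    user_tab_cols
where   table_name = 'T2'
order by
        internal_column_id
;

-- Which columns the new index is built on, in order.
select  column_name, column_position
from    user_ind_columns
where   index_name = 'T2_F2'
order by
        column_position
;
```

The next step would be to repeat the distributed query with predicates on both v1 and id + 1, then check the cost of the REMOTE operation and the hint report to see whether the local optimizer recognises the index.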

 

November 14, 2016

Distributed Trap

Filed under: distributed,Oracle — Jonathan Lewis @ 1:19 pm GMT Nov 14,2016

Here’s an interesting (and potentially very useful) observation from an OTN database forum thread that appeared at the end of last week. It concerns the problems of pulling data from remote systems and is an interesting example because it highlights a case where you might want to ignore the normal performance penalties associated with row-level triggers. I’ll start by building some data:

rem
rem     Script:         remote_insert_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(sysdate) + rownum d1,
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',50)            padding
from
        generator
where
        rownum <= 500
;

alter table t2 add constraint t2_pk primary key(id);

create table t3
as
select sysdate d1, t1.* from t1
where rownum = 0
;

--  Now gather stats if you need to (depending on version)

I’ve created three tables. Table t3 is an empty copy of table t1 with a date column added, and t2 is some sort of reference table that looks rather like table t1 but has a lot less data. Now I’m going to pretend that t1 and t2 are in a remote database while t3 is in the local database and copy data from t1 to t3, omitting any data that is referenced in t2. The SQL is simple:


define m_target=test@loopback

insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

----------------------------------------------------------------------
| Id  | Operation                | Name | Cost (%CPU)| Inst   |IN-OUT|
----------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     0   (0)|        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |            |        |      |
|   2 |   REMOTE                 |      |            |   TEST | R->S |
----------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - EXPLAIN PLAN INTO PLAN_TABLE@! FOR SELECT
       "A1"."ID","A1"."SMALL_VC","A1"."PADDING" FROM "T1" "A1" WHERE
       "A1"."ID"<>ALL (SELECT "A2"."ID" FROM "T2" "A2") (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

I’ve set up an SQL*Plus substitution variable to hold a database link name (and used a loopback qualifier so that I can pretend t1 and t2 are in a remote database). The execution plan (taken from a call to explain plan, but confirmed by running the query and calling dbms_xplan.display_cursor) shows that Oracle has executed the select part of the insert as a “fully remote” statement – which is nice.
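A quick way to see whether a statement is fully remote before running it is to ask for the remote details in the explain plan output. The sketch below reuses the insert above and the m_target substitution variable; the format string is an assumption, not necessarily the one used to produce the plan shown.

```sql
explain plan for
insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

-- A single REMOTE operation under the load step means the whole select is
-- shipped to the remote site; several REMOTE operations under a local join
-- or FILTER mean the statement is being run as a distributed query.
select * from table(dbms_xplan.display(format => 'basic +cost +remote'));
```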

Unfortunately I forgot to include a datestamp as I copied the data over, so let’s modify the query to do that:


insert into t3(
        d1,
        id, small_vc, padding
)
select
        sysdate,
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

--------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |      1 |        |   123 (100)|      0 |00:00:27.42 |   10908 |
|   1 |  LOAD TABLE CONVENTIONAL |      |      1 |        |            |      0 |00:00:27.42 |   10908 |
|*  2 |   FILTER                 |      |      1 |        |            |  49500 |00:00:26.51 |       0 |
|   3 |    REMOTE                | T1   |      1 |  50000 |   113   (3)|  50000 |00:00:00.33 |       0 |
|   4 |    REMOTE                | T2   |  50000 |      1 |     0   (0)|    500 |00:00:25.29 |       0 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM  "T2" WHERE "T2"."ID"=:B1))

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","SMALL_VC","PADDING" FROM "T1" "T1"
       (accessing 'TEST.LOCALDOMAIN@LOOPBACK' )

   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID" FROM "T2" "T2" WHERE "ID"=:1 (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

Whoops, the plan just changed, and the insert took 27.4 seconds instead of the 1.1 seconds that it used to take. That’s because of the 50,000 remote calls to execute a remote query for the subquery filter. The addition of sysdate (which is the local sysdate@!) to the select list has made the select statement distributed instead of fully remote, and the query for a CTAS or “insert/select” has to be driven from the local site if it’s a distributed query.

Okay, plan (b), don’t insert sysdate, add it to the table as a default:


alter table t3 modify d1 default sysdate;

This doesn’t help. Even though the query doesn’t mention sysdate explicitly, the query is still treated as a distributed query.

Okay, plan (c) – don’t insert sysdate, add a trigger to the table:

alter table t3 modify d1 default null;

create or replace trigger t3_bri
before insert on t3
for each row
begin
        :new.d1 := sysdate;
end;
/


And this works (in 11.2.0.4 and 12.1.0.2, at least, which are the versions I tested).
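A minimal sketch of how the workaround might be verified, assuming the t3_bri trigger above is in place, t3 has been emptied (or earlier inserts rolled back), and serveroutput is off so that display_cursor reports the insert:

```sql
-- Re-run the original insert: d1 is not mentioned, so the trigger supplies it.
insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

-- The plan should be back to a single REMOTE operation feeding the load.
select * from table(dbms_xplan.display_cursor(format => 'basic +remote'));

-- Every row should carry a datestamp, so the two counts should match.
select  count(*) row_count, count(d1) stamped
from    t3
;
```

With the data set created above (50,000 rows in t1, 500 matching ids in t2) the insert should load 49,500 rows, the same number that the FILTER operation reported in the slow plan.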

I could have made the trigger a little more sophisticated, of course, but the point of the post was simply to demonstrate a problem and a simple workaround.

There are probably several other commonly used features (various sys_context() calls, perhaps) that have the same effect.

Update (July 2020)

This note is still relevant to 19c.

 

 

June 10, 2016

Uniquely parallel

Filed under: 12c,distributed,Execution plans,Oracle,Parallel Execution,Performance,Upgrades — Jonathan Lewis @ 7:36 am BST Jun 10,2016

Here’s a surprising (to me) execution plan from 12.1.0.2 – parallel execution to find one row in a table using a unique scan of a unique index – produced by running the following script (data creation SQL to follow):


set serveroutput off
set linesize 180
set trimspool on
set pagesize 60

alter session set statistics_level = all;

variable b1 number
exec :b1 := 50000

select /*+ parallel (3) */ id, v1 from t2 where id=:b1;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

break on dfo_number skip 1 on tq_id skip 1 on server_type

select
        dfo_number, tq_id, server_type, instance, process, num_rows
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

All I’ve done is enable rowsource execution statistics, set a bind variable to a value, query a table with a /*+ parallel(3) */ hint to find the one row that will be identified by primary key, and then reported the actual execution plan. When I first ran the test Oracle didn’t report the execution statistics correctly so I’ve also queried v$pq_tqstat to show the PX servers used and the flow of data through the plan. Here’s the plan, followed by the  results from v$pq_tqstat:


SQL_ID  0dzynh9d29pt9, child number 0
-------------------------------------
select /*+ parallel (3) */ id,v1 from t2 where id=:b1

Plan hash value: 247082613

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   1 |  PX COORDINATOR                   |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |  Q1,01 | P->S | QC (RAND)  |      0 |00:00:00.01 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | T2       |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   4 |     BUFFER SORT                   |          |      0 |        |  Q1,01 | PCWC |            |      0 |00:00:00.01 |
|   5 |      PX RECEIVE                   |          |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |
|   7 |        PX SELECTOR                |          |      0 |        |  Q1,00 | SCWC |            |      0 |00:00:00.01 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |      0 |      1 |  Q1,00 | SCWP |            |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("ID"=:B1)

Note
-----
   - Degree of Parallelism is 3 because of hint

DFO_NUMBER      TQ_ID SERVER_TYP   INSTANCE PROCES   NUM_ROWS
---------- ---------- ---------- ---------- ------ ----------
         1          0 Producer            1 P003            0
                                          1 P004            1
                                          1 P005            0
                      Consumer            1 P000            0
                                          1 P001            1
                                          1 P002            0

                    1 Producer            1 P000            0
                                          1 P001            1
                                          1 P002            0
                      Consumer            1 QC              1

As you can see the table access follows a unique scan of an index and, although the rowsource execution stats report zero starts for the unique scan, we can see from v$pq_tqstat that slave P004 acquired a “row” (actually a rowid) and passed it to slave P001 which then acquired a row from the table and passed that row to the query coordinator. Oracle really did execute a parallel query, starting and stopping a total of 6 sessions to perform a single unique index access.

You’ll notice operation 7 is one you’ve only seen in the latest version of Oracle. The PX SELECTOR was introduced in 12c to reduce the number of times a complex parallel query would funnel into the query coordinator (parallel to serial) and then fan out again (serial to parallel) generating a new data flow operation tree (DFO tree) spawning one or two new parallel server groups as it did so. To stop this happening a step that needs to serialise in a 12c parallel plan can nominate one of the existing PX server processes (from each set, if necessary) to do the job so that the same set of PX servers can carry on running the query without the need for a new DFO tree to appear.

This enhancement to parallel execution plans is a good idea – except when it appears in my silly little query and turns something that ought to be quick and cheap into a job that is far more resource-intensive than it should be.

At this point, of course, you’re probably wondering what kind of idiot would put a parallel() hint into a query that was doing nothing but selecting one row by primary key – the answer is: “the Oracle optimizer in 12c”. I discovered this anomaly while creating a demonstration of the way that a distributed parallel query has to serialise through a single database link even if the operations at the two ends of the link run parallel. Here’s the SQL I wrote for the full demonstration:


rem     Script:         distributed_pq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016

define m_remote='test@loopback'
define m_remote='orcl@loopback'

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid wordpress format issue
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,8,'0') as varchar2(8))         v1,
        cast(rpad('x',100) as varchar2(100))            padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5    -- > comment to avoid wordpress format issue
;

create table t2
as
select  *
from    t1
where   mod(id,100) = 0
;

alter table t2 add constraint t2_pk primary key(id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/


set serveroutput off

select
        /*+ parallel(3) */
        t1.v1, t2.v1
from
        t1,
        t2@&m_remote
where
        mod(t1.id,10) = 0
and     t2.id = t1.id
and     mod(to_number(t2.v1),10) = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

If you want to run this demo you’ll need to do something about formatting the output; more importantly you’ll have to create a database link (with a loopback link) and set up a define identifying it at the line where I’ve got orcl@loopback and test@loopback (which are my 12c and 11g loopback links respectively).
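For anyone who has not set up a loopback link before, the following is a rough sketch of what is needed. The account name, password and TNS alias are placeholders, not details from the original test, and the link name may need to match your database's global name if global_names is enabled.

```sql
-- Hypothetical example: the account, password and TNS alias are placeholders
-- for your own system. The "@loopback" part is a connection qualifier that
-- lets a link point back to the database it is created in.
create database link orcl@loopback
        connect to test_user identified by test_password
        using 'orcl'
;

-- Quick check that the link resolves and connects.
select * from dual@orcl@loopback;

-- Then point the script's substitution variable at the link.
define m_remote='orcl@loopback'
```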

Here’s the plan (with rowsource stats) I got from the 12c test:


----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   1 |  NESTED LOOPS         |          |      1 |     10 |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   2 |   PX COORDINATOR      |          |      1 |        |        |      |            |  10000 |00:00:00.11 |       7 |      0 |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |      0 |   1000 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
|   4 |     PX BLOCK ITERATOR |          |      3 |   1000 |  Q1,00 | PCWC |            |  10000 |00:03:17.72 |    1745 |   1667 |
|*  5 |      TABLE ACCESS FULL| T1       |     39 |   1000 |  Q1,00 | PCWP |            |  10000 |00:00:00.06 |    1745 |   1667 |
|   6 |   REMOTE              | T2       |  10000 |      1 |        |      |            |      0 |00:01:14.44 |       0 |      0 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(:Z>=:Z AND :Z<=:Z)
       filter(MOD("T1"."ID",10)=0)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   6 - SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1
       (accessing 'ORCL@LOOPBACK' )

Note
-----
   - Degree of Parallelism is 3 because of hint

I have hacked this output a little – the “Remote SQL” section didn’t get reported by display_cursor(), so I’ve inserted the remote sql I got from a call to dbms_xplan.display() after using explain plan to generate a plan. Note the /*+ shared(3) */ hint that appears in the remote SQL – that’s the internal version of a parallel(3) hint.

In 11g the query completed in 2.4 seconds; in 12c the query took nearly 75 seconds to run thanks to the 12c enhancement that allowed it to obey the hint! Looking at the time column (and ignoring the anomalous 3:17 at operation 4 – which might roughly be echoing 3 * 1:14) we can see that the time goes on the calls to the remote database (and a check of v$session_event shows this time spent in “SQL*Net message from db link”), so the obvious thing to do is check what actually happened at the remote database, and we can do that by searching the library cache for a recognizable piece of the remote SQL – here’s the SQL to do that, with the results from 11g followed by the results from 12c:


SQL> select sql_id, child_number, executions, px_servers_executions, sql_text from v$sql
  2  where sql_text like '%SHARED%' and sql_text not like 'select sql_id%';

11g results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
c0f292z5czhwk            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE MOD(TO_NUMBER("V1"),10)=1 AND "ID"=:1

12c results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
7bk51w7vtagwd            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

7bk51w7vtagwd            1          0                 59995
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

It’s not surprising to see that the query has executed 10,000 times – that’s what we were told by the Starts statistic from dbms_xplan.display_cursor() – but 12c has 60,000 (with a little error) PX server executions of the statement. That’s 10,000 executions * degree 3 * the 2 slave sets we saw in my original execution plan. (It’s an odd little quirk of the two versions of Oracle that the order of predicates in the remote SQL was reversed between 11g and 12c, leading to two different SQL_IDs.)

By enabling rowsource execution stats at the system level I was able to capture the remote execution plan with its stats:


SQL_ID  7bk51w7vtagwd, child number 0
-------------------------------------
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND
MOD(TO_NUMBER("V1"),10)=1

--------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  PX COORDINATOR                   |          |      0 |        |      0 |00:00:00.01 |       0 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  3 |    TABLE ACCESS BY INDEX ROWID    | T2       |  29983 |      1 |      0 |00:00:22.21 |    1000 |
|   4 |     BUFFER SORT                   |          |  29995 |        |    999 |00:00:21.78 |       0 |
|   5 |      PX RECEIVE                   |          |  29924 |      1 |    994 |00:00:21.21 |       0 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |
|   7 |        PX SELECTOR                |          |  29993 |        |    999 |00:00:06.08 |   19992 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |  29999 |      1 |   1000 |00:00:00.24 |   20000 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MOD(TO_NUMBER("V1"),10)=1)
   8 - access("ID"=:1)

Unlike the test case I started with, this output did show the number of starts (with a few missing) and the work done across the slaves. Our index probe had to do two buffer gets on every execution, and we have 10,000 executions of the query so 20,000 buffer gets on the index unique scan. Even though only one slave actually does any work with the PX Selector, all three slaves in that set seem to “start” the relevant operations. The definition of the data meant that only one index probe in 10 was successful so we only visited 1,000 buffers from the table. If you’re familiar with reading plans with rowsource execution stats you’ll appreciate that something has gone wrong in the reporting here – that 1,000 at operation 3 should read 21,000 because it ought to include the 20,000 from the index scan (at least, that’s what a serial plan would do).

If you’re still wondering why running this query as a parallel query should take so long – after all it’s only 10,000 executions in 70 seconds – bear in mind that Oracle has to allocate and deallocate 6 PX servers to new sessions each time it starts; the instance activity stats showed “logons cumulative” going up by 60,000 each time I ran the driving query: that’s about 850 logons (and log offs) per second. I don’t think my test machine would give a realistic impression of the impact of a couple of copies of this query running simultaneously, but when I tried it the contention introduced increased the run time to 93 seconds.
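The logon count can be checked with a simple before-and-after snapshot. This is a sketch, assuming access to v$sysstat at the instance that runs the remote SQL (with a loopback link that is the same instance as the local one):

```sql
-- Run once before and once after the driving query; the difference is the
-- number of sessions created in between (by everything on the instance,
-- not just this query, so use a quiet test system).
select  name, value
from    v$sysstat
where   name = 'logons cumulative'
;

-- Rough rate from the figures quoted above: 60,000 logons in about 70 seconds.
select  round(60000 / 70) logons_per_second from dual;
```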

tl;dr

Watch out for poor performance becoming disastrous for distributed parallel queries when you upgrade from 11g to 12c.

Update (May 2017)

Thanks to Mauro Pagano for identifying this as fix 13345888 (check v$system_fix_control), introduced in 12.1.0.1.
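A sketch of how the fix control might be inspected and switched off for a test session. The alter session command sets a hidden parameter, so treat it as something to try on a test system only:

```sql
-- Is the fix present, and is it enabled (value = 1)?
select  bugno, value, description, optimizer_feature_enable
from    v$system_fix_control
where   bugno = 13345888
;

-- Disable the fix for the current session only (hidden parameter: test systems only).
alter session set "_fix_control"='13345888:0';
```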

Update (Sept 2021)

Still does the same horrible thing in 19.11.0.0 – even with fix_control 13345888 set to zero. To avoid getting the shared(3) hint sent to the remote database the hint in the main query needs to go from query level to object level, viz: /*+ parallel(t1 3) */.
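Applied to the demonstration query from this note, the change would look like the sketch below. It assumes the same t1, t2 and m_remote definitions as the original script.

```sql
select
        /*+ parallel(t1 3) */   -- object-level: applies to local t1 only
        t1.v1, t2.v1
from
        t1,
        t2@&m_remote
where
        mod(t1.id,10) = 0
and     t2.id = t1.id
and     mod(to_number(t2.v1),10) = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));
```

Since display_cursor() did not report the Remote SQL section in the original test, the simplest way to confirm that the shared(3) hint has disappeared is probably to use explain plan followed by dbms_xplan.display(), or to search v$sql at the remote site as shown earlier.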

September 18, 2013

Distributed Sets

Filed under: distributed,Oracle,Performance,sorting — Jonathan Lewis @ 6:14 pm BST Sep 18,2013

In an earlier post I’ve described how a distributed query can operate at a remote site if it’s a simple select but has to operate at the local site if it’s a CTAS (create as select) or insert as select. There’s (at least) one special case where this turns out to be untrue … provided you write the query in the correct fashion. I discovered this only as a result of doing a few experiments in response to a question on the OTN database forum.

(more…)

August 19, 2013

Distributed Queries – 3

Filed under: distributed,Histograms,Oracle — Jonathan Lewis @ 7:25 am BST Aug 19,2013

A comment I’ve made many times in the past about distributed queries is that Oracle doesn’t try to retrieve histogram information from remote databases when optimizing a query. Checking back through previous posts, though, I don’t think I’ve ever mentioned it on the blog – so here’s an example demonstrating the point.

(more…)

November 24, 2010

Distributed Queries – 2

Filed under: CBO,distributed — Jonathan Lewis @ 9:46 am GMT Nov 24,2010

I have often said that the optimizer “forgets” that it is dealing with a distributed query once it has collected the stats that it can about the objects in the query, and that as a consequence the driving site for a distributed query will be the local database unless you use the /*+ driving_site */ hint to change it.

While investigating an oddity with a distributed query between two 11.1.0.7 databases a few days ago, I noticed something in the 10053 trace file that made me change my mind, and go back to look at earlier versions of Oracle.
(more…)

October 11, 2010

Distributed Objects

Filed under: distributed,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 7:12 pm BST Oct 11,2010

I recently came across a tidy solution to a common problem – how to minimise code maintenance in a procedure while maximising flexibility of the procedure. The task was fairly simple – create a ref cursor for a calling program to return data that (a) followed complex selection rules and (b) allowed the user to specify numerous types of input.

The principle was simple – the final ref cursor was driven by a list of (say) order ids – and the details to be returned about those orders required some fairly complex SQL to execute. To separate the complexity of constructing the list of columns from the complexity of identifying the required rows the developers had split the procedure into two stages. First, select the list of relevant order ids using one of several possible statements – the appropriate statement being derived from analysis of the inputs to the procedure; secondly open a ref cursor using that list of order ids. In this way if a new set of rules for selection appeared the only new code needed was a new query to select the ids – the main body of code didn’t need to be modified and re-optimised.
(more…)

October 7, 2010

Distributed Pipelines

Filed under: distributed,Oracle,Performance — Jonathan Lewis @ 6:06 pm BST Oct 7,2010

In an article that I wrote about the driving_site() hint a few months ago I pointed out that the hint was not supposed to work with “create table as select” (CTAS) and “insert as select”. One of the people commenting on the note mentioned pipelined functions as a workaround to this limitation – and I’ve finally got around to writing a note about the method.

The idea is simple. If you can write a distributed select statement that takes advantage of the /*+ driving_site(alias) */ hint to work efficiently, you can wrap the statement in a pl/sql cursor loop and stick that loop into a pipelined function to maximise the efficiency of create or insert as select. Here’s some sample code (tested on 11.1.0.6) to demonstrate the principle:
(more…)

May 19, 2010

Ignoring Hints

Filed under: distributed,Hints,Ignoring Hints,Troubleshooting,Tuning — Jonathan Lewis @ 9:04 pm BST May 19,2010

I’ve previously published a couple of notes (here and here) about the use of the driving_site() hint with distributed queries. The first note pointed out that the hint was deliberately ignored if you wrote a local CTAS or INSERT that did a distributed query. I’ve just found another case where the hint is ignored – this time in a simple SELECT statement.

Try running an ordinary distributed query from the SYS account, and then try using the driving_site() hint to make it run at the remote site. When I tried this a few days ago I ended up wasting half an hour translating some SQL from ANSI to Oracle dialect because I thought that the ANSI was making Oracle transform the query in a way that lost the hint – then I discovered that both versions of the code worked correctly if I logged in as a different user.

I was running my queries between two databases using 11.1.0.7 – I won’t guarantee you get the same results on other versions, but it looks like SYS doesn’t honour the driving_site() hint. I can’t think of a robust argument why this should be the case, but if I were forced to do some vague hand-waving I’d probably mumble something about potential security loopholes.

Footnote: I should, of course, have mentioned that there are all sorts of things that behave in unexpected ways if you are logged on as SYS, and that you shouldn’t be logged on as SYS – especially in a production system.

[Further reading on “ignoring hints”]
