Distributed Queries

July 1, 2009

Distributed Queries

Filed under: distributed,Execution plans,Hints — Jonathan Lewis @ 7:20 am BST Jul 1,2009

Some time ago I wrote a note about distributed DML, pointing out that the driving_site() hint works with distributed queries but not with DML based on distributed queries; so insert as select, or create as select and so on will “ignore” the hint.

This is just a little follow-up to give you an idea of what execution plans for distributed queries look like so that you can tell whether your query is going to work locally or remotely.

I have created a loopback database link (d10g@loopback) with a connection qualifier – another term for the glossary eventually – so the example runs on a single database, but the optimizer believes it is running on a distributed pair; and I’ve defined a couple of small tables to join.

Here’s the query with its execution plan when I run the join at the local database. I’ve used the driving_site() hint for textual clarity to identify the “home” database as the place to run the query, but in the absence of a hint I believe the query will always run at the local database – the optimizer appears to have no mechanism for evaluating the network cost of changing the location where the query runs. (I have seen a few books and a metalink note claiming that there is a network cost built into the cost arithmetic – but I’ve never seen anything in the 10053 trace that supports the claim. [Update Dec 2014: see footnote 2])

SQL  select
  2  	/*+ driving_site (dh) */
  3  	dh.small_vc,
  4  	da.large_vc
  5  from
  6  	dist_home		dh,
  7  	dist_away@d10g@loopback	da
  8  where
  9  	dh.small_vc like '12%'
 10  and	da.id = dh.id
 11  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1261259267

------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     2 |   260 |    21   (0)| 00:00:01 |        |      |
|   1 |  NESTED LOOPS      |           |     2 |   260 |    21   (0)| 00:00:01 |        |      |
|*  2 |   TABLE ACCESS FULL| DIST_HOME |     2 |    30 |    19   (0)| 00:00:01 |        |      |
|   3 |   REMOTE           | DIST_AWAY |     1 |   115 |     1   (0)| 00:00:01 |   D10G | R->S |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DH"."SMALL_VC" LIKE '12%')

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "ID","LARGE_VC" FROM "DIST_AWAY" "DA" WHERE "ID"=:1 (accessing
       'D10G.JLCOMP.CO.UK@LOOPBACK' )

Note, in particular, that the operation in line 3 is “REMOTE” – and the name of the remote object is dist_away. This should be enough to tell us (at least in this case) where the query is actually executing and which database is the remote one. The “Remote SQL Information” makes things even more clear: it tells us the SQL that is being sent to “the other” database, and tells us where that database is. This query is running at the database that holds table dist_home and, for each row selected from dist_home, is sending a query to d10g@jlcomp.demon.co.uk@loopback.

Now we repeat the test, but put into the driving_site() hint the alias of the table that is located at the remote database:

SQL  select
  2  	/*+ driving_site (da) */
  3  	dh.small_vc,
  4  	da.large_vc
  5  from
  6  	dist_home		dh,
  7  	dist_away@d10g@loopback	da
  8  where
  9  	dh.small_vc like '12%'
 10  and	da.id = dh.id
 11  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 4154226149

---------------------------------------------------------------------------------------------------------
| Id | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
---------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT REMOTE      |           |     2 |   450 |    15   (0)| 00:00:01 |        |      |
|  1 |  NESTED LOOPS                |           |     2 |   450 |    15   (0)| 00:00:01 |        |      |
|  2 |   REMOTE                     | DIST_HOME |     2 |    40 |    13   (0)| 00:00:01 |      ! | R->S |
|  3 |   TABLE ACCESS BY INDEX ROWID| DIST_AWAY |     1 |   205 |     1   (0)| 00:00:01 |   D10G |      |
|* 4 |    INDEX UNIQUE SCAN         | DA_PK     |     1 |       |     0   (0)| 00:00:01 |   D10G |      |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A1"."ID"="A2"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - SELECT "ID","SMALL_VC" FROM "DIST_HOME" "A2" WHERE "SMALL_VC" LIKE '12%' (accessing '!' )

Note
-----
   - fully remote statement

The “Note” section is a bit of a clue here – the query is running at the remote site. We also see at line 2 that the “REMOTE” operation is against a table called dist_home – which isn’t remote as far as we’re concerned, it’s in our local database! But we’re seeing the execution plan from the perspective of “the other” database – and to the other database we are remote. This is confirmed by the instance information and the “Remote SQL Information” where we see “!” as the location of dist_home. The “!” is my database’s name for itself: the remote database is running the query, and sending a select statement to me for each row it selects from dist_away.

Footnote: If you’ve ever seen sysdate, for example, turning into sysdate@! in the predicate section of an execution plan, you now know why. “!” is the database’s name for itself and the database is making it very clear whose value of sysdate it is using.

Footnote 2: Alerted to this blog note by a ping-back, I need to correct a significant error. I have now seen in recent versions of Oracle some information showing that the optimizer has considered the cost of remote vs. local access to data. I haven’t yet examined how far back this feature goes, or tried to work out the details of how the calculation is done, but will note that the presence of the calculation rarely, if ever, seems (based on a couple of examples I created) to change the access path. It’s possible that the calculation simple modifies the join method (HJ vs NLJ) without considering the choice of driving site.

Comments (11)

11 Comments »

[…] [Further reading: Distributed Queries] Comments (11) […]

Pingback by Distributed DML « Oracle Scratchpad — July 1, 2009 @ 7:21 am BST Jul 1,2009 | Reply
Info on connection qualifier is here: http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/ds_admin002.htm#i1008068

Comment by Colin 't Hart — July 1, 2009 @ 10:51 am BST Jul 1,2009 | Reply
[…] Jonathan Lewis -Distributed Queries […]

Pingback by Blogroll Report 26/06/2009 – 03/07/2006 « Coskan’s Approach to Oracle — July 3, 2009 @ 3:29 pm BST Jul 3,2009 | Reply
I wonder why Oracle does not simply query the remote data dictionary (or ask the remote CBO) to get the needed information.

Comment by Wolfgang — July 5, 2009 @ 5:36 am BST Jul 5,2009 | Reply
- Wolfgang,
  
  Can you be more precise in which bit of information you are thinking of ?
  
  The local database will request basic object statistics from the remote database (table stats, index stats, simple column stats) before optimising the query; but there are two “flaws” to the strategy:
  
  a) It doesn’t request histogram data from the remote database
  
  b) When doing the optimizer calculation it doesn’t allow any extra costs for round-trip time to the remote database. It simply uses the statistics as if the tables were all local tables.
  
  Comment by Jonathan Lewis — July 5, 2009 @ 7:33 am BST Jul 5,2009 | Reply
  - Jonathan,
    
    I’m thinking about a kind of dynamic sampling parameter for remote queries. Lets call it remote_stats. If remote_stats is set to 1 only basic informations are retrieved from the remote instance (you already mentioned them) and as you set remote_stats to a higher value more and more information about the remote tables are requested from the other instance.
    So the user can decide what’s more important: less network overhead or more round trips and probably a better execution plan.
    
    Comment by Wolfgang — July 5, 2009 @ 11:09 am BST Jul 5,2009 | Reply
> the remote database is running the query, and sending a select statement to me for each row it selects from dist_away.

My impression is that the remote database is sending us just the result in order to display it. It “gets” the rows from dist_home, performs the inner loop on-site and sends us the reult back.

Comment by todorbotev — August 22, 2009 @ 1:56 pm BST Aug 22,2009 | Reply
- >> the remote database is sending us just the result
  >> It “gets” the rows from dist_home
  
  Just trying to remove all ambiguity:
  “dist_home”, “me” and “us” are at the local database.
  The remote database “gets” the rows from dist_home [us/me] by sending us [dist_home/me] a select statement that asks for some rows each time it gets a row from dist_away.
  
  Comment by Jonathan Lewis — August 23, 2009 @ 8:32 am BST Aug 23,2009 | Reply
[…] Lewis @ 9:04 pm UTC May 19,2010 I’ve previously published a couple of notes (here and here) about the driving_site() hint. The first note pointed out that the hint was deliberately ignored […]

Pingback by Ignoring Hints « Oracle Scratchpad — May 19, 2010 @ 9:05 pm BST May 19,2010 | Reply
[…] to remote databases you may not always get the best execution (plan) available, because Oracle always runs the query on the local database as it has no way of estimating the cost of network traffic and […]

Pingback by Tuning Distributed Queries in Oracle | Databaseline — December 7, 2014 @ 5:13 am GMT Dec 7,2014 | Reply
[…] https://jonathanlewis.wordpress.com/2009/07/01/distributed-queries/ […]

Pingback by Execution Plan: Como interpretar as seções não estatísticas do plano de execução | DBTimeWizard — December 21, 2015 @ 12:47 am GMT Dec 21,2015 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

July 1, 2009