Oracle Scratchpad

February 20, 2018

Assumptions

Filed under: CBO,Oracle,Philosophy — Jonathan Lewis @ 8:57 am GMT Feb 20,2018

As the years roll on I’ve found it harder and harder to supply quick answers to “simple” questions on the Oracle-L list server and OTN/ODC forum because things are constantly changing and an answer that may have been right the last time I checked could now be wrong. A simple example of the consequences of change showed up recently on the OTN/ODC forum where one reply to a question started:

Just why do you need distinct in a subquery??? That’s the first thing that appears really shocking to me. If it’s a simple in (select …) adding a distinct to the subquery would just impose a sort unique(as you can see in the explain plan), which may be quite costly.

Three question-marks is already tip-toeing its way to the Pratchett limit – but “really shocking”? It’s bad enough that the comment goes operatic but going operatic in order to introduce an error pushes the thing into tragedy (or possibly farce – or maybe both). To make the self-inflicted injury worse there were two execution plans supplied in the original post anyway of which only one showed any attempt to achieve uniqueness.

Bottom line – when you’re about to correct someone for doing something that is “obviously” wrong be a little bit kind about it and then be kind to yourself and do a quick sanity check to see if your attempt at correction is itself correct. A good guideline would be to ask yourself: “How do I know what I know – and am I about to make myself look like an idiot (again).”

Check It

Question: Does a  “distinct” in a subquery impose a sort (or hash) unique ?

Answer: No – a uniqueness operation may appear but it’s not guaranteed to appear.

Here’s a quick example which does not result in any attempt at imposing uniqueness (running 11.2.0.4):

rem
rem     Script:         unnest_demo.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

drop table t2 purge;
drop table t1 purge;
create table t1 as select * from all_objects where rownum  <= 100;
create table t2 as select * from all_objects where rownum <= 100;

create index t1_i1 on t1(owner);
create index t2_i2 on t2(object_type);


set autotrace traceonly explain

select  * 
from    t1 
where   owner = 'OUTLN' 
and     object_name in (
                select distinct object_name 
                from   t2 
                where  object_type = 'TABLE'
        )
;


Execution Plan
----------------------------------------------------------
Plan hash value: 3169044451

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     3 |   558 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN SEMI              |       |     3 |   558 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |    12 |   336 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   5 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)


There’s no sign of a sort unique or hash unique. The optimizer has decided that the IN subquery can be transformed into an EXISTS subquery which can then be transformed into a SEMIJOIN.

I can think of three other execution plan strategies that might have appeared depending on the data, indexing, and statistics. Initially I had just hacked the text above to produce the following plans and hadn’t saved anything as a library script but following a request in the comments below I decided to recreate the whole text and report the hints I’d used. In all the following cases the hints I quote go in the subquery, not in the main body of the query:

a) Hinting /*+ no_unnest */ transforms the IN subquery to an EXISTS subquery which operates as a filter subquery (with no uniqueness imposed):


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   158 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                      |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    28 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

b) Hinting /*+ unnest no_merge no_semijoin */ gets a simple unnest with sort/hash unique and join


--------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |     3 |   525 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN                      |          |     3 |   525 |     5  (20)| 00:00:01 |
|   2 |   JOIN FILTER CREATE            | :BF0000  |     3 |   474 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID  | T1       |     3 |   474 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN            | T1_I1    |     3 |       |     1   (0)| 00:00:01 |
|   5 |   VIEW                          | VW_NSO_1 |    12 |   204 |     3  (34)| 00:00:01 |
|   6 |    HASH UNIQUE                  |          |    12 |   336 |     3  (34)| 00:00:01 |
|   7 |     JOIN FILTER USE             | :BF0000  |    12 |   336 |     2   (0)| 00:00:01 |
|   8 |      TABLE ACCESS BY INDEX ROWID| T2       |    12 |   336 |     2   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN          | T2_I2    |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   4 - access("OWNER"='OUTLN')
   9 - access("OBJECT_TYPE"='TABLE')


For my particular data set I had originally had to take the optimizer_features_enable back to ‘8.1.7’ to get the above  plan but on recreating the tests I realised that there was a way to get the plan from basic hints (though the modern version of Oracle has slipped a Bloom filter into the hash join). As you can see that there’s a HASH UNIQUE at operation 6, but that would have been there whether or not the DISTINCT keyword had appeared in the SQL. Effectively the query has been transformed to:


select  t1.*
from    (
                select  distinct t2.object_name object_name
                from    t2
                where   t2.object_type='TABLE'
        )
        vw_nso_1,
        t1
where   t1.owner = 'OUTLN'
and     t1.object_name = vw_nso_1.object_name
/

c) Hinting /*+ unnest no_semijoin merge */ results in unnesting then a “transform distinct aggregation” so that the distinct is applied after the join. (In the original text I had said this was using “place group by”. But that’s the transformation that pushes an aggregate inside a join while what’s happening here is one of the variants of the opposite transformation.)

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     3 |   474 |     5  (20)| 00:00:01 |
|   1 |  VIEW                          | VM_NWVW_1 |     3 |   474 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                  |           |     3 |   594 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN                   |           |     3 |   594 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1        |     3 |   510 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | T1_I1     |     3 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2        |    12 |   336 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | T2_I2     |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."OBJECT_NAME"="T2"."OBJECT_NAME")
   5 - access("T1"."OWNER"='OUTLN')
   7 - access("T2"."OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)

Again the plan would be the same whether or not the original subquery had a redundant distinct.

The things you think you know may have been true 10 years ago – but maybe they’re not true any longer, or maybe they’re still true on your version of the database but not every  version of the database. So I often end up looking at a question, thinking the poster’s claim can’t be right, and then working out and modelling the circumstances that might make the poster’s observations appear – and learning something new at the same time.

Remember: “I’ve never seen it” doesn’t mean “It doesn’t happen”.

Update (1st March 2018)

In a remarkably timely coincidence – showing that there’s always more to see no matter how carefully you think you’ve been looking – Nenad Noveljic shows us that sometimes it’s actually a positively good thing to have a “redundant” distinct, because it bypasses an optimizer bug.

 

5 Comments »

  1. Hi Jonathan,

    Just wondering how we can get the plan in the section “c) Unnest, then “place group by” so that the distinct is applied after the join” ?

    Comment by Wing — March 2, 2018 @ 6:41 am GMT Mar 2,2018 | Reply

    • Wing,

      I found that I hadn’t kept the script to run this demo, and I couldn’t remember what I’d done, so I had to start from scratch and also discovered how to get the 8.1.7.4 plan without having to set the optimizer_features_enable hint. I’m updating the text to add all the different hints.

      Comment by Jonathan Lewis — March 2, 2018 @ 10:46 am GMT Mar 2,2018 | Reply

  2. […] internally. The “place group by” transformation appeared in 11.1.0.6 in 2007, and the “transform distinct aggregation” appeared in 11.2.0.1 in […]

    Pingback by CBO++ | Oracle Scratchpad — May 13, 2019 @ 8:49 am BST May 13,2019 | Reply

  3. […] = ‘TABLE’ (which might be all of them). The script I started with was one I had used in a note I wrote about “distinct” appearing in the select list of subqueries, but the data it produced in the newer versions of […]

    Pingback by qbregistry | Oracle Scratchpad — August 24, 2021 @ 11:54 am BST Aug 24,2021 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.