Oracle Scratchpad

July 13, 2015

Missing Bloom

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 1:37 pm BST Jul 13,2015

Here’s a little surprise that came up on the OTN database forum a few days ago. Rather than describe it I’m just going to create a data set to demonstrate it, initially using 11.2.0.4, although the same thing happens on 12.1.0.2. The target is a query that joins to a range/hash composite partitioned table and uses a Bloom filter to do partition pruning at the subpartition level. (Note to self: is it possible to see Bloom filters that operate at both the partition and subpartition level from a single join? I suspect not.) Here’s my code to create and populate both the partitioned table and a driving table for the query:


rem
rem     Script: bloom_filter_fail.sql
rem     Dated:  4th July 2015
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

create table pt_composite_1 (
        part_key        number(8),
        subp_key        number(8),
        small_vc        varchar2(40),
        padding         varchar2(100)
)
nologging
partition by range(part_key)
subpartition by hash (subp_key)
subpartition template (
        subpartition g1,
        subpartition g2,
        subpartition g3,
        subpartition g4
)
(
        partition p01 values less than ( 10),
        partition p02 values less than ( 20),
        partition p03 values less than ( 30),
        partition p04 values less than ( 40),
        partition p05 values less than ( 50),
        partition p06 values less than ( 60),
        partition p07 values less than ( 70),
        partition p08 values less than ( 80),
        partition p09 values less than ( 90),
        partition p10 values less than (100),
        partition p11 values less than (110),
        partition p12 values less than (120)
)
;

insert into pt_composite_1 (
        part_key, subp_key, small_vc, padding
)
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        to_char(trunc((rownum-1)/20))   small_vc,
        rpad('x',100)                   padding
from
        dual
connect by
        rownum <= 25000
;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

create table driver (
        part_key        number(8),
        subp_key        number(8),
        test            number(4)
)
;

execute dbms_random.seed(0)

insert into driver
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        mod(rownum - 1, 30)
from
        dual
connect by
        level <= 60
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'driver',
                method_opt      => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'pt_composite_1',
                method_opt      => 'for all columns size 1',
                granularity     => 'all'
        );
end;
/
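
Before describing the data, here are a couple of quick queries (not part of the original script) that you could run to check the shape of what we’ve just built – rows per test value in the driver, and the total row count:

select  test, count(*)
from    driver
group by
        test
order by
        test
;

select  count(*)
from    pt_composite_1
;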

So I’ve got a table with 12 range partitions, each hash subpartitioned into 4 subpartitions, holding a total of 400,000 rows; and a driving table of 60 rows, two rows per value of column test, which probably means two separate subpartitions identified for most values of test. I set this data up to do a number of different experiments but the only result I’m going to report here is about the subpartition key. Here’s a query that selects all the data from the partitioned table that matches the subp_key values from a subset of the driver table:


select
        ptc.part_key, ptc.subp_key, count(*), max(ptc.small_vc)
from
        pt_composite_1  ptc
where
        (ptc.subp_key) in (
                select  subp_key
                from    driver
                where   test = 0
        )
group by
        ptc.part_key, ptc.subp_key
;
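
For reference, rowsource execution statistics like the ones in the plans below can be captured along the following lines (just a sketch – the gather_plan_statistics hint at the statement level is an alternative to setting statistics_level for the whole session):

set serveroutput off
alter session set statistics_level = all;

-- run the query, then:
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));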

The optimizer has the option to unnest the subquery and turn the query into a semi-join (specifically, as the plan shows, a hash join right semi), and we might hope to see Bloom filtering being used to restrict the hash subpartitions that we visit. (We’ve (probably) picked two values for subp_key, so we don’t expect to visit more than 2 of the hash subpartitions from each of the range partitions.) Here’s the execution plan I got, with rowsource execution statistics:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |       |       |    238 |00:00:01.74 |    2329 |       |       |          |
|   1 |  HASH GROUP BY                |                |      1 |    238 |       |       |    238 |00:00:01.74 |    2329 |   960K|   960K| 1377K (0)|
|*  2 |   HASH JOIN RIGHT SEMI        |                |      1 |  15997 |       |       |  15856 |00:00:01.69 |    2329 |  2440K|  2440K|  905K (0)|
|   3 |    PART JOIN FILTER CREATE    | :BF0000        |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|*  4 |     TABLE ACCESS FULL         | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   5 |    PARTITION RANGE ALL        |                |      1 |    400K|     1 |    12 |    104K|00:00:01.04 |    2306 |       |       |          |
|   6 |     PARTITION HASH JOIN-FILTER|                |     12 |    400K|:BF0000|:BF0000|    104K|00:00:00.63 |    2306 |       |       |          |
|   7 |      TABLE ACCESS FULL        | PT_COMPOSITE_1 |     12 |    400K|     1 |    48 |    104K|00:00:00.22 |    2306 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PTC"."SUBP_KEY"="SUBP_KEY")
   4 - filter("TEST"=0)

Oracle has unnested the subquery and converted it to a right semi-join executed as a hash join. While building the in-memory hash table it has constructed a Bloom filter at operation 3 of the plan to help it eliminate hash subpartitions, and used that Bloom filter at operation 6 of the plan. Our query does nothing to eliminate any of the range partitions, so operation 5 is a “partition range all”, and the application of the Bloom filter at operation 6 starts 12 times, once for each range partition. As we can see from operation 7, the Bloom filter generated by our selection from the driver table happened to identify just one subpartition – we start the TABLE (subpartition) ACCESS FULL 12 times, once for each range partition. If our driver data (and the Bloom filter) had identified 2 subpartitions we would have seen operation 7 start 24 times.
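
If you want to confirm the layout the Bloom filter is pruning against – 12 partitions of 4 subpartitions each – a quick query against the data dictionary (a sketch, assuming the table is in your own schema) will show it:

select  partition_name, count(*) subpartitions
from    user_tab_subpartitions
where   table_name = 'PT_COMPOSITE_1'
group by
        partition_name
order by
        partition_name
;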

So we’ve met our first target – demonstrating that we can get a Bloom filter to eliminate at the subpartition level. Now we need to break things – the OP had a problem with a query that used Bloom filters on one system but didn’t use them for (nominally) the same setup on another system. Here’s my first attempt, with the resulting execution plan:


alter table pt_composite_1 add partition p13 values less than (130) subpartitions 8;

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |      1 |        |       |       |    238 |00:00:01.75 |    2628 |       |       |          |
|   1 |  HASH GROUP BY             |                |      1 |   3310 |       |       |    238 |00:00:01.75 |    2628 |   960K|   960K| 2529K (0)|
|*  2 |   HASH JOIN RIGHT SEMI     |                |      1 |  16000 |       |       |  15856 |00:00:01.71 |    2628 |  2440K|  2440K|  743K (0)|
|*  3 |    TABLE ACCESS FULL       | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   4 |    PARTITION RANGE ALL     |                |      1 |    400K|     1 |    13 |    104K|00:00:01.05 |    2605 |       |       |          |
|   5 |     PARTITION HASH SUBQUERY|                |     13 |    400K|KEY(SQ)|KEY(SQ)|    104K|00:00:00.64 |    2605 |       |       |          |
|   6 |      TABLE ACCESS FULL     | PT_COMPOSITE_1 |     13 |    400K|     1 |    56 |    104K|00:00:00.22 |    2306 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

I’ve added a new partition – with a different number of subpartitions from the table default. The Bloom filter has disappeared and the optimizer has decided to do subquery pruning instead. Drop the partition and recreate it with 2 subpartitions and the same thing happens; drop it and recreate it with 4 subpartitions and we’re back to a Bloom filter. It seems that the Bloom filter depends on every partition having the same number of subpartitions. (That’s not too surprising – the code to handle a Bloom filter when there are a variable number of subpartitions could get a little messy, and there probably aren’t many sites that use variable numbers of subpartitions.)
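
For completeness, the drop/recreate cycle I’ve just described looks like this (remember to re-gather stats if you repeat the experiment):

alter table pt_composite_1 drop partition p13;
alter table pt_composite_1 add partition p13 values less than (130) subpartitions 4;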

You might note from the Starts value for operation 5 (the subquery line) that the subquery had to run 13 times. Checking the 10046 trace file we can see the following SQL:


SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 0, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 1, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 2, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 3, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 4, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 5, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 6, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 7, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 8, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 9, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 10, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 11, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 12, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1

This is the optimizer trying to work out, for each of the 13 partitions, which subpartitions it needs to visit. In my case this resulted in a full tablescan of the driver table for each partition. For hash subpartitions, at least, this does seem to be overkill (and can anyone say “bind variables”) – wouldn’t it be possible to run the query once for the partition with the most subpartitions and then derive the correct subpartition number for all other cases? Maybe, but perhaps that’s just too much special-case code, or maybe it’s on the todo list. Realistically we might guess that a driver table would be very small compared to the size of the subpartitions you were eventually going to scan, so the extra work may be a tiny fraction of the total workload – so the added complexity might be seen as too much investment (and risk) for too little return. Maybe in a future release there will be a bit of patching to reduce this overhead.
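
If you want to see the recursive SQL for yourself, the 10046 trace above can be captured along these lines (level 8, i.e. with wait events, is just one common choice):

alter session set events '10046 trace name context forever, level 8';

-- run the query, then:
alter session set events '10046 trace name context off';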

Conclusion

You may find that some execution plans involving hash subpartitions become less efficient if you don’t keep the number of subpartitions per partition constant across the entire table. I’ve only tested with range/hash composites, but there may be other variations of composite partitioning where a similar change in plans occurs.
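
A quick way of checking whether any of your tables are exposed is to look for partitioned tables whose partitions don’t all report the same subpartition count – a sketch using the standard dictionary view:

select  table_name
from    user_tab_partitions
where   subpartition_count > 0
group by
        table_name
having  count(distinct subpartition_count) > 1
;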

Footnote

I haven’t done any exhaustive investigation yet, but so far I haven’t been able to create a data set, or perhaps a query, that allows the optimizer to create a Bloom filter (or two) from the driving table and then filter both the range partitions and the hash subpartitions. The closest I’ve come is a plan that shows a Bloom filter being used to filter the range partitions followed by a pruning subquery for the hash subpartitions.

10 Comments

  1. Nice article, though I have just a slight quibble with the title. I think it might better be “Failure to Get a Bloom Filter.” When I saw your post title I was quite concerned I was about to read of a case where, in the course of applying a Bloom filter, Oracle failed in the sense of filtering out some valid results. That, of course, would be a wrong-information bug, rather than “merely” a missed optimization. To date (unless I’ve missed or forgotten something) we do not have to deal with paranoia that a Bloom filter applied during Oracle results projection will give wrong results, but rather only that it may sometimes not be applied when it would be useful, and may from time to time include more false positives than we’d like, which get filtered later in the process. Please do let me know if I’m wrong about that.

    Comment by rsiz — July 13, 2015 @ 4:11 pm BST Jul 13,2015

    • Mark,

      I think that’s a quibble worth addressing – I overlooked that interpretation in my constant aim for short titles, and I’d hate to be the source of a myth that Oracle’s Bloom filters do it wrong.
      They may return too much data (false positives) sometimes, but they don’t throw away data they should keep.

      Title modified.

      Comment by Jonathan Lewis — July 13, 2015 @ 4:17 pm BST Jul 13,2015

  2. Interesting case where Oracle falls back to SUBQUERY pruning.

    There are several “interesting” things regarding BLOOM FILTERS. One of them is the following: Oracle is not able to take advantage of a JOIN FILTER PREDICATE for bulk insert (i.e. an INSERT SELECT statement). SELECT is OK; interestingly, MULTI TABLE INSERT (aka INSERT ALL) works as well.

    So the workaround for 11.2.0.4 was to use INSERT ALL with just one destination table for a set of ETL jobs. Then it worked like a charm. Similarly to direct path inserts, by default Exadata is not able to take advantage of SMART SCANS for an INSERT SELECT statement; moreover it does not even go for a direct path read and sticks to buffered reads (cell multiblock read). A possible workaround is to modify the “_serial_direct_read” parameter, which is strange.
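
    A single-target INSERT ALL of the kind described above might look like this (a sketch – table and column names are made up for illustration):

    insert /*+ append */ all
            into target_t (id, val)
            values (id, val)
    select  id, val
    from    source_t
    ;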

    Regards
    Pavol Babel

    Comment by Pavol Babel — July 13, 2015 @ 5:43 pm BST Jul 13,2015

    • Pavol,

      Thanks for the note – I wonder if doing the select through a pipelined function would work (though your workaround seems more desirable since it avoids a couple of layers of obscurity).

      I also wonder if the Exadata thing is related to the comments I made a few days ago about update/select for update – setting the hidden parameter was a workaround for that too.

      Comment by Jonathan Lewis — July 13, 2015 @ 6:18 pm BST Jul 13,2015

      • Jonathan,

        interesting idea with the pipelined function, I will definitely give it a try.
        Regarding the “Exadata thing”… Well, this seems to be standard RDBMS behaviour, as far as I know. In 11g, for a bulk insert (INSERT as SELECT), Oracle always sticks to a buffered full table scan, which is strange. I read your SQL vs. PL/SQL post some days ago; I do not think it has anything to do with that. A bulk insert seems to stick to a buffered FTS always, no matter how huge the source table is compared with the size of the db cache. This applies even for a direct path insert, which is very strange to me. The note 1348116.1 suggests two workarounds: using PARALLEL DML or setting “_serial_direct_read” = “true”. I was only surprised Oracle development didn’t change it specially for Exadata / Engineered Systems.

        One more note on BF: another possible workaround for INSERT as SELECT with BLOOM FILTERS is CTAS, which we were not able to follow in our particular case.
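
        The two workarounds from that note, sketched (“_serial_direct_read” is a hidden parameter, so it would be wise to check with Oracle support before setting it in production):

        alter session enable parallel dml;

        -- or:
        alter session set "_serial_direct_read" = true;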

        Regards
        Pavol Babel

        Comment by Pavol Babel — July 13, 2015 @ 11:31 pm BST Jul 13,2015

  3. The BF trouble for INSERT as SELECT is mentioned in the following note: Performance Difference Between ‘select’ And ‘insert into..as select’ For The Same Query on Exadata (Doc ID 1303570.1).

    There is again the option of using Create Table as Select (CTAS), which again was not acceptable; and then the idea of INSERT /*+ APPEND */ ALL … restricted to one target table crossed my mind.
    It is very interesting to follow all these little differences between CTAS and INSERT as SELECT. The next (in)famous one can be observed with FIRST ROWS optimisation, where INSERT as SELECT always sticks to ALL ROWS optimisation (even when a rownum predicate is used), which is not the case for CTAS. Or maybe, in many cases, CTAS behaves more like a pure SELECT, as opposed to INSERT as SELECT.

    Comment by Pavol Babel — July 14, 2015 @ 12:36 am BST Jul 14,2015

  4. I have a problem with a missing Bloom filter when using a merge statement.
    Here is a script that reproduces this behaviour:

    SET PAGESIZE 0
    CREATE TABLE Small_Tab(Gl_Book_Dt DATE NOT NULL)
    /
    CREATE TABLE Big_Tab
    (
     Main_Org_Id VARCHAR2(8),
     Delv_Srce_Nm VARCHAR2(8),
     Gl_Book_Dt DATE,
     Verf_Id VARCHAR2(20),
     Verf_Seq_Nr NUMBER,
     Ar_Id VARCHAR2(20),
     Verf_Tp VARCHAR2(3),
     Part_Fld AS (Main_Org_Id || '-' || TO_CHAR(Gl_Book_Dt,'yyyymmdd'))
    )
    PARTITION BY LIST (Part_Fld)
    SUBPARTITION BY LIST (Delv_Srce_Nm)
    (
     PARTITION P_1234_20130101 VALUES ('1234-20130101')
     (
     SUBPARTITION P_1234_20130101_XXX VALUES ('XXX') TABLESPACE GLAS_1530_DATA
     )
    )
    /
    BEGIN
     FOR i IN 1 .. 100 LOOP
     EXECUTE IMMEDIATE 'ALTER TABLE Big_Tab ADD PARTITION P_1530_' || TO_CHAR(DATE '2013-01-01' + i, 'yyyymmdd') || ' VALUES(''1530-' || TO_CHAR(DATE '2013-01-01' + i, 'yyyymmdd') || ''')';
     END LOOP;
    END;
    /
    EXPLAIN PLAN FOR
     WITH Temp AS (SELECT g.Verf_Id,
     g.Gl_Book_Dt,
     MAX(MAX(g.Verf_Seq_Nr)) OVER(PARTITION BY g.Verf_Id) + RANK() OVER(PARTITION BY g.Verf_Id ORDER BY MAX(ROWNUM)) Next_Verf_Seq_Nr
     FROM Small_Tab c
     JOIN Big_Tab g ON g.Part_Fld = '1530-' || TO_CHAR(c.Gl_Book_Dt, 'yyyymmdd')
     GROUP BY g.Verf_Id,
     g.Gl_Book_Dt)
    SELECT t.Gl_Book_Dt,
     t.Verf_Id,
     t.Next_Verf_Seq_Nr
     FROM Temp t
    /
    SELECT * FROM table(Dbms_Xplan.Display(Format => 'BASIC'))
    /
    EXPLAIN PLAN FOR
    MERGE --+ leading(s)
     INTO Big_Tab t
    USING (WITH Temp AS (SELECT g.Verf_Id,
     g.Gl_Book_Dt,
     MAX(MAX(g.Verf_Seq_Nr)) OVER(PARTITION BY g.Verf_Id) + RANK() OVER(PARTITION BY g.Verf_Id ORDER BY MAX(ROWNUM)) Next_Verf_Seq_Nr
     FROM Small_Tab c
     JOIN Big_Tab g ON g.Part_Fld = '1530-' || TO_CHAR(c.Gl_Book_Dt, 'yyyymmdd')
     GROUP BY g.Verf_Id,
     g.Gl_Book_Dt)
    SELECT t.Gl_Book_Dt,
     t.Verf_Id,
     t.Next_Verf_Seq_Nr
     FROM Temp t) s
     ON (t.Part_Fld = '1530-' || TO_CHAR(s.Gl_Book_Dt, 'yyyymmdd')
     AND t.Delv_Srce_Nm = 'PPR'
     AND t.Verf_Id = s.Verf_Id
     AND t.Verf_Seq_Nr = s.Next_Verf_Seq_Nr)
     WHEN MATCHED THEN UPDATE
     SET t.Ar_Id = ''
    /
    SELECT * FROM table(Dbms_Xplan.Display(Format => 'BASIC'))
    /
    ROLLBACK
    /
    DROP TABLE Small_Tab PURGE
    /
    DROP TABLE Big_Tab PURGE
    /
    
    Just the select statement – the Bloom filter is here:
    -----------------------------------------------------
    | Id  | Operation                       | Name      |
    -----------------------------------------------------
    |   0 | SELECT STATEMENT                |           |
    |   1 |  VIEW                           |           |
    |   2 |   WINDOW SORT                   |           |
    |   3 |    HASH GROUP BY                |           |
    |   4 |     COUNT                       |           |
    |   5 |      HASH JOIN                  |           |
    |   6 |       PART JOIN FILTER CREATE   | :BF0000   |
    |   7 |        TABLE ACCESS FULL        | SMALL_TAB |
    |   8 |       PARTITION LIST JOIN-FILTER|           |
    |   9 |        PARTITION LIST SINGLE    |           |
    |  10 |         TABLE ACCESS FULL       | BIG_TAB   |
    -----------------------------------------------------
    
    The same select within the merge – the Bloom filter is gone:
    ----------------------------------------------------
    | Id  | Operation                      | Name      |
    ----------------------------------------------------
    |   0 | MERGE STATEMENT                |           |
    |   1 |  MERGE                         | BIG_TAB   |
    |   2 |   VIEW                         |           |
    |   3 |    HASH JOIN                   |           |
    |   4 |     VIEW                       |           |
    |   5 |      WINDOW SORT               |           |
    |   6 |       HASH GROUP BY            |           |
    |   7 |        COUNT                   |           |
    |   8 |         HASH JOIN              |           |
    |   9 |          TABLE ACCESS FULL     | SMALL_TAB |
    |  10 |          PARTITION LIST ALL    |           |
    |  11 |           PARTITION LIST SINGLE|           |
    |  12 |            TABLE ACCESS FULL   | BIG_TAB   |
    |  13 |     PARTITION LIST ALL         |           |
    |  14 |      PARTITION LIST SINGLE     |           |
    |  15 |       TABLE ACCESS FULL        | BIG_TAB   |
    ----------------------------------------------------
    

    Comment by Zilvinas — December 21, 2015 @ 1:37 pm BST Dec 21,2015

    • Zilvinas,

      It’s important to remember that the “orthogonality” of the cost based optimizer is not perfect; in other words you may see examples of mechanism “A” and mechanism “B”, but when you produce an example where you expect “B” to be inside “A” then something unexpected happens. In this case it might be the Bloom filter that is the surprising side effect rather than its absence.

      Secondly – both your tables are empty. It’s always possible that a special case (possibly an accidental special case) appears when the optimizer KNOWs that there is no data to be queried. I actually built your tables, created three rows in “small_tab” that should have matched the partitions of big_tab, and then ran your queries with rowsource execution stats enabled. In 11.2.0.4 with no data I saw the Bloom filter, with three rows in place the Bloom filter disappeared.

      Comment by Jonathan Lewis — December 21, 2015 @ 3:30 pm BST Dec 21,2015

      • Thanks for the answer,

        I understand that statistics make the difference. Of course in my development environment there are records; I just made this small case to demonstrate my problem. I know that this Bloom filter would be the perfect way to execute my query (I know by business rules that there will be only a few records to be merged; the big table is quite big, with many partitions, and has no indexes). It would be nice to have some hint to force Bloom filtering, but there is no such hint as far as I know. If I create a CTAS and then merge the results then everything takes less than a second, but the merge statement as it is takes about 30 seconds on my development environment. I could take the CTAS approach, but I do not really like creating tables dynamically in a production environment (besides, I’m not sure if I will have the create table privilege in the production environment). I tried to use the materialize hint, but that did not help. There is the option of using nested loops instead of a hash join; in that case partition pruning happens, but then I will full scan the same partition several times. I have an idea to try using recursive subquery factoring – maybe that way I will force partition pruning. I also have an idea to execute the small query in a separate sql statement, then construct the merge statement with “WHERE part_fld = any(‘xxx’, ‘yyy’)” and execute it dynamically with the dbms_sql package. I don’t like dynamic queries though. If someone could suggest a better solution I’d appreciate it a lot.

        Comment by Zilvinas — December 22, 2015 @ 6:53 am BST Dec 22,2015

  5. […] but I tried adding one to the insert statement anyway – and nothing changed; however there is a “subquery pruning” mechanism that’s been available to the optimizer for a very long time now so I decided to try hinting […]

    Pingback by DML and Bloom | Oracle Scratchpad — July 8, 2016 @ 1:01 pm BST Jul 8,2016

