Oracle Scratchpad

August 5, 2013

Bloom Filter

Filed under: 12c,CBO,Execution plans,Oracle — Jonathan Lewis @ 9:22 pm GMT Aug 5,2013

I’ve posted this note as a quick way of passing on an example prompted by a twitter conversation with Timur and Maria about Bloom filters:

The Bloom filter (capital B because it’s named after a person) is not supposed to appear in Oracle plans unless the query is executing in parallel but here’s an example which seems to use a serial Bloom filter.  Running in 11.2.0.3 and 12.1.0.1 (the results shown are the latter – the numbers are slightly different between versions):


create table t1 as
select
	rownum - 1			id1,
	trunc((rownum - 1)/10)		n1,
	lpad(rownum,10,'0')		small_vc,
	rpad('x',100)			padding
from
	all_objects
where
	rownum <= 5000
;

alter table t1 add constraint t1_pk primary key(id1);

create table t2 as
select
	trunc((rownum-1)/5)		id1,
	rownum				id2,
	lpad(rownum,10,'0')		small_vc,
	rpad('x',100)			padding
from
	all_objects
where
	rownum <= 25000
;

alter table t2 add constraint t2_pk primary key(id1, id2);

-- gather table stats for all columns size 1 (approximate NDV enabled).

create or replace view v2
as
select
 	t2.id1,
 	max(t2.id2) max_id2,
 	max(t2.small_vc) max_vc
from
 	t2
group by
 	t2.id1
;

set serveroutput off
set linesize 180
set pagesize 60
set trimspool on

select
 	/*+
 		qb_name(main)
 		no_merge(v2@main)
 		gather_plan_statistics
  	*/
 	t1.*,
 	v2.*
from
 	t1,
 	v2
where
 	v2.id1 = t1.id1
and	t1.n1 > 450
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

I’ve included a couple of hints to make sure that the optimizer doesn’t try complex view merging (with subsequent group by placement), and I’ve enabled stats collections. Here’s the execution plan, with predicates and execution stats.


SQL_ID  35nh459dsw88h, child number 1
-------------------------------------
select  /*+   qb_name(main)   no_merge(v2@main)
gather_plan_statistics  */  t1.*,  v2.* from  t1,  v2 where  v2.id1 =
t1.id1 and t1.n1 > 450

Plan hash value: 1196539158

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |    490 |00:00:00.02 |     532 |       |       |          |
|*  1 |  HASH JOIN            |         |      1 |    491 |    490 |00:00:00.02 |     532 |   823K|   823K| 1111K (0)|
|   2 |   JOIN FILTER CREATE  | :BF0000 |      1 |    491 |    490 |00:00:00.01 |      90 |       |       |          |
|*  3 |    TABLE ACCESS FULL  | T1      |      1 |    491 |    490 |00:00:00.01 |      90 |       |       |          |
|   4 |   VIEW                | V2      |      1 |   5000 |    527 |00:00:00.01 |     442 |       |       |          |
|   5 |    HASH GROUP BY      |         |      1 |   5000 |    527 |00:00:00.01 |     442 |   917K|   917K| 2507K (0)|
|   6 |     JOIN FILTER USE   | :BF0000 |      1 |  25000 |   2635 |00:00:00.26 |     442 |       |       |          |
|*  7 |      TABLE ACCESS FULL| T2      |      1 |  25000 |   2635 |00:00:00.26 |     442 |       |       |          |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("V2"."ID1"="T1"."ID1")
   3 - filter("T1"."N1">450)
   7 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID1"))

The 2,635 rows reported as A-rows in line 7 is consistent with the id1 values that would have been extracted in line 3, so it does look as if the Bloom filter has been created at line 2 and used as the predicate in line 7, even though this plan has run as a serial plan. (No slaves started, and no rows returned by querying v$pq_tqstat.)

16 Comments »

  1. Jonathan,

    as far I know bloom filters were introduced somewhere in 10gR2 to oracle database to support parallel queries. Moreover, they were used in 11gr1 (quite a long time ago) to optimize “partition range (list) subquery” access path, so it was definitely not supposed to occur only in parallel plans. I think it was Christian Antognini first, who described it in his book.
    However I’m simply amazed CBO is clever enough in your simple (and interesting) testcase to take advantage of Bloom filter, too. I suggest it will definitely have sense on Exadata platform while Bloom filter can be used to offload full scan on table t1

    Regards
    Pavol Babel

    Comment by Pavol Babel — August 5, 2013 @ 10:49 pm GMT Aug 5,2013 | Reply

  2. Last example on the russian forum was mine: http://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=997519&msg=13855348
    and it was even without an aggregation

    Comment by Sayan Malakshinov — August 6, 2013 @ 10:03 am GMT Aug 6,2013 | Reply

  3. Sayan,

    Can you reproduce the plan – it’s important when checking Bloom filter plans to be able to check the actual execution stats since a plan could claim to create and use a bloom filter without actually doing to. The dbms_photoshop one above your seems to be a case in point – it can’t be using a Bloom filter because none of the later plan lines show any predicates that would have to exist to operate the sys_op_bloom_filter() predicate. Your plan has a line which could be the relevant predicate (the table access line 12, which could eliminate the unique access). Execution stats could then tell us something about whether we were seeing early data elimination due to the filter.

    Comment by Jonathan Lewis — August 6, 2013 @ 10:28 am GMT Aug 6,2013 | Reply

  4. Jonathan,

    I’m not very experienced in formatting the code in your blog so please find my response here:

    http://www.sql.ru/forum/997519/bloom-filter?mid=14673341#14673341

    dbms_photoshop

    Comment by sqlmdx — August 7, 2013 @ 12:41 am GMT Aug 7,2013 | Reply

    • sqlmdx,

      Thanks for the follow-up – sorry I can’t reply on the Russian forum.

      I was wrong about your example.

      I didn’t want to assume that the appearance of a BLOOM FILTER CREATE / BLOOM FILTER USE was an absolute guarantee that a Bloom filter was created and used at run-time since we know that there are occasions when the run-time engine chooses a strategy that is not an exact match for the optimizer’s expression of its plan. (I have a couple of examples with plans that report Bloom filters but don’t seem to use them.)

      However, I did make a mistake in your case because I failed to consider the (highlighted) line 9 in your original plan which showed the A-rows dropping from 20,000 to 325 on the filter line – and that’s the most convincing indication that the Bloom filter was in use regardless of the fact that there was no reported predicate usage on that line.

      As the next comment on the forum points out – I think the Bloom filter applies only in the case of hash joins (at present), even though it could in principle be used for any join that that collected the entire first rowsource before visiting the second rowsource. It’s possible that a future version will make it possible in merge joins – but perhaps it isn’t implemented at present because of two implementation details that would have to be handled: (a) a merge join can use an indexed access path to get the first data set in order, so could visit the second data set before acquiring the whole of the first set, and (b) there may be an “entanglement” of operations in the second rowsource “sort join” operation that might make it a little complicated to insert the filter code without spending a more time than is considered desirable on the rewrite.

      Comment by Jonathan Lewis — August 7, 2013 @ 9:11 am GMT Aug 7,2013 | Reply

      • Hi Jonathan,

        Going back to px_join_filter hint.
        Let’s consider following modifications of my initial query:

        select --+ px_join_filter(t2)
         *
          from (select rownum id from dual connect by level <= 10) t1
          join (select id, count(value) cnt from x group by id) t2
            on t1.id = t2.id;
               
        select --+ leading(t1)
         *
          from (select rownum id from dual connect by level <= 10) t1
          join (select id, count(value) cnt from x group by id) t2
            on t1.id = t2.id;
               
        select --+ px_join_filter(t2) leading(t1)
         *
          from (select rownum id from dual connect by level <= 10) t1
          join (select id, count(value) cnt from x group by id) t2
            on t1.id = t2.id;
        

        Correspondent lines from the “Dumping Hints” section of 10053 trace for above queries
        are following:

          atom_hint=(@=0E3E0878 err=0 resol=1 used=0 token=1078 org=1 lvl=3 txt=PX_JOIN_FILTER ("T2" -1) )
          
          atom_hint=(@=0F780A8C err=0 resol=1 used=1 token=501 org=1 lvl=4 txt=LEADING ("T1") )  
          
          atom_hint=(@=0E4505F4 err=0 resol=1 used=0 token=1078 org=1 lvl=3 txt=PX_JOIN_FILTER ("T2" -1) )
          atom_hint=(@=0E450518 err=0 resol=1 used=1 token=501 org=1 lvl=4 txt=LEADING ("T1") )  
        

        It’s expected that used=0 for px_join_filter in first case but it’s strange why it remains 0 in the last case.
        Query plan and rowsource statistics demonstrate that bloom filter takes place though.

        dbms_photoshop

        Comment by sqlmdx — August 7, 2013 @ 11:01 pm GMT Aug 7,2013 | Reply

      • You wrote “I can’t reply on the Russian forum”.
        Due to some policy?
        Anyway, it doesn’t require registration.

        Comment by sqlmdx — August 7, 2013 @ 11:07 pm GMT Aug 7,2013 | Reply

  5. […] where Bloom filters can come into play – all involving hash joins: partition elimination, aggregate reduction on non-mergeable aggregate views, and […]

    Pingback by Quiz NIght | Oracle Scratchpad — September 13, 2013 @ 6:32 pm GMT Sep 13,2013 | Reply

  6. […] Note that the above examples are from Oracle 11.2.0.3 – in this version the Bloom filtering features should kick in only for parallel queries (although Jonathan Lewis has blogged about a case when this happens also on 11.2.0.3). […]

    Pingback by Combining Bloom Filter Offloading and Storage Indexes on Exadata | Tanel Poder's blog: Responsible data management — May 18, 2014 @ 5:15 am GMT May 18,2014 | Reply

  7. Hi Jonathan,
    As you have demonstrated by by the example that serial execution can also make use of Bloom Filters. I was wondering if there is any way to force the use of a Bloom Filter in a serial execution.

    Regards,
    Aveek

    Comment by Aveek — June 1, 2014 @ 5:36 pm GMT Jun 1,2014 | Reply

    • Asveek,

      I haven’t tested this thoroughly, it may only be a coincidence, and it’s not documented this way in the manuals BUT /*+ pq_filter (serial) */ may cause Oracle to choose Bloom filters if you’ve got enough hints to make sure that you’ve already controlled the join order and choice of build and probe tables.

      Comment by Jonathan Lewis — June 2, 2014 @ 12:59 pm GMT Jun 2,2014 | Reply

  8. Hi Jonathan,
    Many thanks for your response, I tried this with a two table join where my build table had around 250 records and the probed table had 10 million records in a hash join. But I could not get the query to use bloom filter, if possible can you explain this with an example.

    Regards,
    Aveek

    Comment by Aveek — June 17, 2014 @ 6:29 pm GMT Jun 17,2014 | Reply

    • Aveek,

      Two things – I forgot to point out that the pq_filter() hint is as 12c hint, so if you’re not running that version it’s not going to have any effect; secondly I would assume that the Bloom filter is only available in situations where it could make a difference, and I don’t think a simple serial hash join between two tables would be such a situation. (I could imagine that it would be of benefit in a merge join where filtering before sorting the second input might reduce the size of the sort significantly – but being able to imagine it doesn’t mean it will happen, of course.)

      Comment by Jonathan Lewis — June 17, 2014 @ 6:58 pm GMT Jun 17,2014 | Reply

  9. Thanks Jonathan, that explains it I am running Oracle 11.2.0.2 on Exadata V2.

    Comment by Aveek — June 18, 2014 @ 2:58 am GMT Jun 18,2014 | Reply

  10. Aveek,

    Now that you’ve said Exadata I’ve got a potential benefit. Exadata can use the Bloom filter to minimise the traffic between the storage cells and the compute nodes. If the second table access is a tablescan that does direct path reads the storage cells can read the blocks apply a bloom filter and send back only the columns and (probably needed) rows.

    It’s possible, depending on version and current caching, that your instance will be reading the second table through into the cache – if so you could try forcing your session to do serial direct path reads to see if this results in a bloom filter. I don’t have an Exadata to hand so I can’t do the experiment.

    Comment by Jonathan Lewis — June 18, 2014 @ 5:48 am GMT Jun 18,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,308 other followers