Oracle Scratchpad

July 13, 2015

Missing Bloom

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 1:37 pm BST Jul 13,2015

Here’s a little surprise that came up on the OTN database forum a few days ago. Rather than describe it, I’m just going to create a data set to demonstrate it, initially using 11.2.0.4 although the same thing happens on 12.1.0.2. The target is a query that joins to a range/hash composite partitioned table and uses a Bloom filter to do partition pruning at the subpartition level.  (Note to self: is it possible to see Bloom filters that operate at both the partition and subpartition level from a single join? I suspect not.). Here’s my code to create and populate both the partitioned table and a driving table for the query:


create table pt_composite_1 (
        part_key        number(8),
        subp_key        number(8),
        small_vc        varchar2(40),
        padding         varchar2(100)
)
nologging
partition by range(part_key)
subpartition by hash (subp_key)
subpartition template (
        subpartition g1,
        subpartition g2,
        subpartition g3,
        subpartition g4
)
(
        partition p01 values less than ( 10),
        partition p02 values less than ( 20),
        partition p03 values less than ( 30),
        partition p04 values less than ( 40),
        partition p05 values less than ( 50),
        partition p06 values less than ( 60),
        partition p07 values less than ( 70),
        partition p08 values less than ( 80),
        partition p09 values less than ( 90),
        partition p10 values less than (100),
        partition p11 values less than (110),
        partition p12 values less than (120)
)
;

insert into pt_composite_1 (
        part_key, subp_key, small_vc, padding
)
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        to_char(trunc((rownum-1)/20))   small_vc,
        rpad('x',100)                   padding
from
        dual
connect by
        rownum <= 25000
;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

create table driver (
        part_key        number(8),
        subp_key        number(8),
        test            number(4)
)
;

execute dbms_random.seed(0)

insert into driver
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        mod(rownum - 1, 30)
from
        dual
connect by
        level <= 60
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'driver',
                method_opt      => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'pt_composite_1',
                method_opt      => 'for all columns size 1',
                granularity     => 'all'
        );
end;
/

So I’ve got a table with 12 partitions, each hash subpartitioned into 4 subpartitions, a total of 400,000 rows, and a driving table with 60 rows with two rows per value for column test, which probably means two separate subpartitions identified for most values of test. I set this data up to do a number of different experiments but the only result I’m going to report here is about the sub-partition key. Here’s a query that selects all the data from the partitioned table that matches the subp_key value from a subset of the driver table:


select
        ptc.part_key, ptc.subp_key, count(*), max(ptc.small_vc)
from
        pt_composite_1  ptc
where
        (ptc.subp_key) in (
                select  subp_key
                from    driver
                where   test = 0
        )
group by
        ptc.part_key, ptc.subp_key
;

The optimizer has the option to unnest the subquery and turn the query into a semi-join (specifically a right outer join), and we might hope to see a hash join with Bloom filtering being used to restrict the hash subpartitions that we visit. (We’ve (probably) picked two values for the subp_key, so we don’t expect to visit more than 2 of the hash subpartitions from each of the range partitions.) Here’s the execution plan I got, with rowsource execution statistics:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |       |       |    238 |00:00:01.74 |    2329 |       |       |          |
|   1 |  HASH GROUP BY                |                |      1 |    238 |       |       |    238 |00:00:01.74 |    2329 |   960K|   960K| 1377K (0)|
|*  2 |   HASH JOIN RIGHT SEMI        |                |      1 |  15997 |       |       |  15856 |00:00:01.69 |    2329 |  2440K|  2440K|  905K (0)|
|   3 |    PART JOIN FILTER CREATE    | :BF0000        |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|*  4 |     TABLE ACCESS FULL         | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   5 |    PARTITION RANGE ALL        |                |      1 |    400K|     1 |    12 |    104K|00:00:01.04 |    2306 |       |       |          |
|   6 |     PARTITION HASH JOIN-FILTER|                |     12 |    400K|:BF0000|:BF0000|    104K|00:00:00.63 |    2306 |       |       |          |
|   7 |      TABLE ACCESS FULL        | PT_COMPOSITE_1 |     12 |    400K|     1 |    48 |    104K|00:00:00.22 |    2306 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (part_keyentified by operation part_key):
---------------------------------------------------
   2 - access("PTC"."SUBP_KEY"="SUBP_KEY")
   4 - filter("TEST"=0)

Oracle has unnested the subquery and converted to a right outer semi-join using a hash join. While building the in-memory hash table it has constructed a Bloom filter at operation 3 of the plan to help it eliminate hash subpartitions, and used that Bloom filter at operation 6 of the plan. Our query does nothing to eliminate any of the range partitions so we can see operation 5 is a “partition range all”, and the application of the Bloom filter at operation 6 starts 12 times, once for each range partition. As we can see from operation 7, the Bloom filter generated by our selection from the driver table happened to identify just one subpartition – we start the TABLE (subpartition) ACCESS FULL 12 times, once for each range scan. If our driver data (and the Bloom filter) had identified 2 subpartitions we would have seen operation 7 start 24 times.

So we’ve met our first target – demonstrating that we can get a Bloom filter to eliminate at the subpartition level. Now we need to break things – the OP had a problem with a query that used Bloom filters on one system but didn’t use them for (nominally) the same setup on another system. Here’s my first attempt, with the resulting execution plan:


alter table pt_composite_1 add partition p13 values less than (130) subpartitions 8;

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |      1 |        |       |       |    238 |00:00:01.75 |    2628 |       |       |          |
|   1 |  HASH GROUP BY             |                |      1 |   3310 |       |       |    238 |00:00:01.75 |    2628 |   960K|   960K| 2529K (0)|
|*  2 |   HASH JOIN RIGHT SEMI     |                |      1 |  16000 |       |       |  15856 |00:00:01.71 |    2628 |  2440K|  2440K|  743K (0)|
|*  3 |    TABLE ACCESS FULL       | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   4 |    PARTITION RANGE ALL     |                |      1 |    400K|     1 |    13 |    104K|00:00:01.05 |    2605 |       |       |          |
|   5 |     PARTITION HASH SUBQUERY|                |     13 |    400K|KEY(SQ)|KEY(SQ)|    104K|00:00:00.64 |    2605 |       |       |          |
|   6 |      TABLE ACCESS FULL     | PT_COMPOSITE_1 |     13 |    400K|     1 |    56 |    104K|00:00:00.22 |    2306 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

I’ve added a new partition – with a different number of subpartitions from the table default. The Bloom filter has disappeared and the optimizer has decided to do subquery pruning instead. Drop the partition and recreate it with 2 subpartitions and the same thing happens; drop it and recreate it with 4 subpartitions and we’re back to a Bloom filter. It seems that the Bloom filter depends on every partition having the same number of subpartitions. (That’s not too surprising – the code to handle a Bloom filter when there are a variable number of subpartitions could get a little messy, and there probably aren’t many sites that use variable numbers of subpartitions.)

You might note from the Starts value for operation 5 (the subquery line) that the subquery had to run 13 times. Checking the 10046 trace file we can see the following SQL:


SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 0, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 1, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 2, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 3, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 4, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 5, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 6, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 7, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 8, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 9, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 10, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 11, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 12, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1

This is the optimizer trying to work out, for each of the 12 partitions, which subpartitions it needs to visit. In my case this resulted in a full tablescan of the driver table for each partition. For hash subpartitions, at least, this does seem to be overkill (and can anyone say “bind variables”) – wouldn’t it be possible to run the query once for the partition with the most subpartitions and then derive the correct subpartition number for all other cases ? Maybe, but perhaps that’s just too much special-case code, or maybe it’s on the todo list. Realistically we might guess that a driver table would be very small compared to the size of the subpartitions you were eventually going to scan, so the excess extra work may be a tiny fraction of the total workload – so the added complexity might be seen as too much investment (and risk) for too little return. Maybe in a future release there will be a bit of patching to reduce this overhead.

Conclusion

You may find that some execution plans involving hash subpartitions become less efficient if you don’t keep the number of subpartitions per partition constant across the entire table. I’ve only tested with range/hash composites but there may be other variations of composite partitoning where a similar change in plans occurs.

Footnote

I haven’t done any exhaustive investigation yet, but so far I haven’t been able to create a data set, or perhaps a query, that allows the optimizer to create a Bloom filter (or two) from the driving table and then filter both the range partitions and the hash subpartitions. The closest I’ve come is a plan that shows a Bloom filter being used to filter the range partitions followed by a pruning subquery for the hash subpartitions.

June 17, 2015

Reverse Key

Filed under: Indexing,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 1:11 pm BST Jun 17,2015

A question came up on the OTN database forum recently asking if you could have a partitioned index on a non-partitioned table.

(Aside: I’m not sure whether it would be quicker to read the manuals or try the experiment – either would probably be quicker than posing the question to the forum. As so often happens in these RTFM questions the OP didn’t bother to acknowledge any of the responses)

The answer to the question is yes – you can create a globally partitioned index, though if it uses range partitioning you have to specify a MAXVALUE partition. The interesting thing about the question, though is that several people tried to guess why it had been asked and then made suggestions based on the most likely guess (and wouldn’t it have been nice to see some response from the OP ). The common guess was that there was a performance problem with the high-value block of a sequence-based (or time-based) index – a frequent source of “buffer busy wait” events and other nasty side effects.

Unfortunately too many people suggesting reverse key as a solution to this “right-hand” problem. If you’re licensed for partitioning it’s almost certain that a better option would simple be to use global hash partitioning (with 2^N for some N) partitions. Using reverse keys can result in a bigger performance than the one you’re trying to avoid – you may end up turning a little time spent on buffer busy waits into a large amount of time spent on db file sequential reads. To demonstrate the issue I’ve created a sample script – and adjusted my buffer cache down to the appropriate scale:

create table t1(
	id	not null
)
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		rownum <= 1e4
)
select
	1e7 + rownum	id
from
	generator	v1,
	generator	v2
where
	rownum <= 1e7 
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1'
	);
end;
/

alter table t1 add constraint t1_pk primary key(id) 
using index 
	reverse 
	nologging 
;

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';

begin
	for i in 20000001..20010000 loop
		insert into t1 values(i);
	end loop;
end;
/

I’ve created a table with 10,000,000 rows using a sequential value as the primary key, then inserted “the next” 10,000 rows into the table in order. The index occupied about about 22,000 blocks, so to make my demonstration show you the type of effect you could get from a busy production system with more tables and many indexes I ran my test with the buffer cache limited to 6,000 blocks – a fair fraction of the total index size. Here’s a small section of the trace file from the test running 10.2.0.3 on an elderly machine:


WAIT #43: nam='db file sequential read' ela= 13238 file#=6 block#=12653 blocks=1 obj#=63623 tim=3271125590
WAIT #43: nam='db file sequential read' ela=  7360 file#=6 block#=12749 blocks=1 obj#=63623 tim=3271133150
WAIT #43: nam='db file sequential read' ela=  5793 file#=6 block#=12844 blocks=1 obj#=63623 tim=3271139110
WAIT #43: nam='db file sequential read' ela=  5672 file#=6 block#=12940 blocks=1 obj#=63623 tim=3271145028
WAIT #43: nam='db file sequential read' ela= 15748 file#=5 block#=13037 blocks=1 obj#=63623 tim=3271160998
WAIT #43: nam='db file sequential read' ela=  8080 file#=5 block#=13133 blocks=1 obj#=63623 tim=3271169314
WAIT #43: nam='db file sequential read' ela=  8706 file#=5 block#=13228 blocks=1 obj#=63623 tim=3271178240
WAIT #43: nam='db file sequential read' ela=  7919 file#=5 block#=13325 blocks=1 obj#=63623 tim=3271186372
WAIT #43: nam='db file sequential read' ela= 15553 file#=6 block#=13549 blocks=1 obj#=63623 tim=3271202115
WAIT #43: nam='db file sequential read' ela=  7044 file#=6 block#=13644 blocks=1 obj#=63623 tim=3271209420
WAIT #43: nam='db file sequential read' ela=  6062 file#=6 block#=13741 blocks=1 obj#=63623 tim=3271215648
WAIT #43: nam='db file sequential read' ela=  6067 file#=6 block#=13837 blocks=1 obj#=63623 tim=3271221887
WAIT #43: nam='db file sequential read' ela= 11516 file#=5 block#=13932 blocks=1 obj#=63623 tim=3271234852
WAIT #43: nam='db file sequential read' ela=  9295 file#=5 block#=14028 blocks=1 obj#=63623 tim=3271244368
WAIT #43: nam='db file sequential read' ela=  9466 file#=5 block#=14125 blocks=1 obj#=63623 tim=3271254002
WAIT #43: nam='db file sequential read' ela=  7704 file#=5 block#=14221 blocks=1 obj#=63623 tim=3271261991
WAIT #43: nam='db file sequential read' ela= 16319 file#=6 block#=14444 blocks=1 obj#=63623 tim=3271278492
WAIT #43: nam='db file sequential read' ela=  7416 file#=6 block#=14541 blocks=1 obj#=63623 tim=3271286129
WAIT #43: nam='db file sequential read' ela=  5748 file#=6 block#=14637 blocks=1 obj#=63623 tim=3271292163
WAIT #43: nam='db file sequential read' ela=  7131 file#=6 block#=14732 blocks=1 obj#=63623 tim=3271299489
WAIT #43: nam='db file sequential read' ela= 16126 file#=5 block#=14829 blocks=1 obj#=63623 tim=3271315883
WAIT #43: nam='db file sequential read' ela=  7746 file#=5 block#=14925 blocks=1 obj#=63623 tim=3271323845
WAIT #43: nam='db file sequential read' ela=  9208 file#=5 block#=15020 blocks=1 obj#=63623 tim=3271333239
WAIT #43: nam='db file sequential read' ela=  7708 file#=5 block#=15116 blocks=1 obj#=63623 tim=3271341141
WAIT #43: nam='db file sequential read' ela= 15484 file#=6 block#=15341 blocks=1 obj#=63623 tim=3271356807
WAIT #43: nam='db file sequential read' ela=  5488 file#=6 block#=15437 blocks=1 obj#=63623 tim=3271362623
WAIT #43: nam='db file sequential read' ela= 10447 file#=6 block#=15532 blocks=1 obj#=63623 tim=3271373342
WAIT #43: nam='db file sequential read' ela= 12565 file#=6 block#=15629 blocks=1 obj#=63623 tim=3271386741
WAIT #43: nam='db file sequential read' ela= 17168 file#=5 block#=15725 blocks=1 obj#=63623 tim=3271404135
WAIT #43: nam='db file sequential read' ela=  7542 file#=5 block#=15820 blocks=1 obj#=63623 tim=3271411882
WAIT #43: nam='db file sequential read' ela=  9400 file#=5 block#=15917 blocks=1 obj#=63623 tim=3271421514
WAIT #43: nam='db file sequential read' ela=  7804 file#=5 block#=16013 blocks=1 obj#=63623 tim=3271429519
WAIT #43: nam='db file sequential read' ela= 14470 file#=6 block#=16237 blocks=1 obj#=63623 tim=3271444168
WAIT #43: nam='db file sequential read' ela=  5788 file#=6 block#=16333 blocks=1 obj#=63623 tim=3271450154
WAIT #43: nam='db file sequential read' ela=  9630 file#=6 block#=16429 blocks=1 obj#=63623 tim=3271460008
WAIT #43: nam='db file sequential read' ela= 10910 file#=6 block#=16525 blocks=1 obj#=63623 tim=3271471174
WAIT #43: nam='db file sequential read' ela= 15683 file#=5 block#=16620 blocks=1 obj#=63623 tim=3271487065
WAIT #43: nam='db file sequential read' ela=  8094 file#=5 block#=16717 blocks=1 obj#=63623 tim=3271495454
WAIT #43: nam='db file sequential read' ela=  6670 file#=5 block#=16813 blocks=1 obj#=63623 tim=3271502293
WAIT #43: nam='db file sequential read' ela=  7852 file#=5 block#=16908 blocks=1 obj#=63623 tim=3271510360
WAIT #43: nam='db file sequential read' ela= 10500 file#=6 block#=17133 blocks=1 obj#=63623 tim=3271521039
WAIT #43: nam='db file sequential read' ela= 11038 file#=6 block#=17229 blocks=1 obj#=63623 tim=3271532275
WAIT #43: nam='db file sequential read' ela= 12432 file#=6 block#=17325 blocks=1 obj#=63623 tim=3271544974
WAIT #43: nam='db file sequential read' ela=  7784 file#=6 block#=17421 blocks=1 obj#=63623 tim=3271553331
WAIT #43: nam='db file sequential read' ela=  7774 file#=5 block#=17517 blocks=1 obj#=63623 tim=3271561346
WAIT #43: nam='db file sequential read' ela=  6583 file#=5 block#=17613 blocks=1 obj#=63623 tim=3271568146
WAIT #43: nam='db file sequential read' ela=  7901 file#=5 block#=17708 blocks=1 obj#=63623 tim=3271576231
WAIT #43: nam='db file sequential read' ela=  6667 file#=5 block#=17805 blocks=1 obj#=63623 tim=3271583259
WAIT #43: nam='db file sequential read' ela=  9427 file#=6 block#=18029 blocks=1 obj#=63623 tim=3271592988
WAIT #43: nam='db file sequential read' ela= 52334 file#=6 block#=18125 blocks=1 obj#=63623 tim=3271646055
WAIT #43: nam='db file sequential read' ela= 50512 file#=6 block#=18221 blocks=1 obj#=63623 tim=3271697284
WAIT #43: nam='db file sequential read' ela= 10095 file#=6 block#=18317 blocks=1 obj#=63623 tim=3271708095

Check the block number for this list of single block reads – we’re jumping through the index about 100 blocks at a time to read the next block where an index entry has to go. The jumps are the expected (and designed) effect of reverse key indexes: the fact that the jumps turn into physical disc reads is the (possibly unexpected) side effect. Reversing an index makes adjacent values look very different (by reversing the bytes) and go to different index leaf blocks: the purpose of the exercise is to scatter concurrent similar inserts across multiple blocks, but if you scatter the index entries you need to buffer a lot more of the index to keep the most recently used values in memory. Reversing the index may eliminate buffer busy waits, but it may increase time lost of db file sequential reads dramatically.

Here’s a short list of interesting statistics from this test – this time running on 11.2.0.4 on a machine with SSDs) comparing the effects of reversing the index with those of not reversing the index – normal index first:


Normal index
------------
CPU used by this session               83
DB time                                97
db block gets                      40,732
physical reads                         51
db block changes                   40,657
redo entries                       20,174
redo size                       5,091,436
undo change vector size         1,649,648

Repeat with reverse key index
-----------------------------
CPU used by this session              115
DB time                               121
db block gets                      40,504
physical reads                     10,006
db block changes                   40,295
redo entries                       19,973
redo size                       4,974,820
undo change vector size         1,639,232

Because of the SSDs there’s little difference in timing between the two sets of data and, in fact, all the other measures of work done are very similar except for the physical read, and the increase in reads is probably the cause of the extra CPU time thanks to both the LRU manipulation and the interaction with the operating system.

If you want to check the effect of index reversal you can take advantage of the sys_op_lbid() function to sample a little of your data – in my case I’ve queried the last 10,000 rows (values) in the table:


select 
	/*+ 
		cursor_sharing_exact 
		dynamic_sampling(0) 
		no_monitoring 
		no_expand 
		index_ffs(t1,t1_i1) 
		noparallel_index(t,t1_i1) 
	*/ 
	count (distinct sys_op_lbid( &m_ind_id ,'L',t1.rowid)) as leaf_blocks
from 
	t1
where 
	id between 2e7 + 1 and 2e7 + 1e4
;

The &m_ind_id substition variable is the object_id of the index t1_i1.

In my case, with an index of 22,300 leaf blocks, my 10,000 consecutive values were scattered over 9,923 leaf blocks. If I want access to “recent data” to be as efficient as possible I need to keep that many blocks of the index cached, compared to (absolute) worst case for my data 100 leaf blocks. When you reverse key an index you have to think about how much bigger you have to make your buffer cache to keep the performance constant.

July 2, 2014

Comparisons

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 5:09 pm BST Jul 2,2014

Catching up (still) from the Trivadis CBO days, here’s a little detail which had never crossed my mind before.


where   (col1, col2) < (const1, const2)

This isn’t a legal construct in Oracle SQL, even though it’s legal in other dialects of SQL. The logic is simple (allowing for the usual anomaly with NULL): the predicate should evaluate to true if (col1 < const1), or if (col1 = const1 and col2 < const2). The thought that popped into my mind when Markus Winand showed a slide with this predicate on it – and then pointed out that equality was the only option that Oracle allowed for multi-column operators – was that, despite not enabling the syntax, Oracle does implement the mechanism.

If you’re struggling to think where, it’s in multi-column range partitioning: (value1, value2) belongs in the partition with high value (k1, k2) if (value1 < k1) or if (value1 = k1 and value2 < k2).

December 26, 2013

Current row

Filed under: Oracle,Partitioning — Jonathan Lewis @ 9:52 pm BST Dec 26,2013

Here’s a question that I’ve had on my todo (and draft posts) list for a few years – so I’m presenting it as a task for anyone who can demonstrate the answer.

If you’ve got a pl/sql cursor open and you’re using the “update current of” syntax, what happens if you update the same row twice but the row comes from a partitioned table and moves to a new partition on the first update ?

If you have a demo that you want to include in the comments then start with “sourcecode” and end with “/sourcecode” – in square brackets, without the quotation marks – to get a fixed font format and space preservation. If you have a good demonstration or reference article that you can link to, a simple URL will do nicely.#

December 13, 2013

Bitmap Question

Filed under: Indexing,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 6:09 pm BST Dec 13,2013

This question came up on the OTN database forum a couple of months ago: “Why doesn’t Oracle allow you to create globally partitioned bitmap indexes?” The obvius answer is “It just doesn’t, okay.” But it can be quite interesting to think of reasons why a particular mechanism might not have been implemented – sometimes the answer can give you an insight into how a feature has been implemented, or it might suggest cases where a feature might not work very well, it might give you some ideas on how to work around a particular limitation, and sometimes it might just help to pass the time on a short flight.

(more…)

November 5, 2013

Deadlock

Filed under: Oracle,Parallel Execution,Partitioning — Jonathan Lewis @ 6:14 am BST Nov 5,2013

There an interesting example of a deadlock on the OTN database forum:

DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
PS-00000001-00000011        92     423     S             33     128     S     X
BF-2ed08c01-00000000        33     128     S             92     423     S     X

(more…)

October 17, 2013

Virtual date partitions

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 7:40 pm BST Oct 17,2013

I posted this question on twitter earlier on today (It was a thought that crossed my mind during a (terrible) presentation on partitioning that I had to sit through a few weeks ago – it’s always possible to be prompted to think of some interesting questions no matter how bad the presentation is, though):

Quiz: if you create a virtual column as trunc(date_col,’W’) and partition on it – will a query on date_col result in partition elimination?

(more…)

October 11, 2013

Testing Partitions

Filed under: Oracle,Partitioning — Jonathan Lewis @ 7:47 am BST Oct 11,2013

A quick check-list on testing new partitioning features:

  • Step 1 – test with a small amount of data in place, just to see if it works at all
  • Step 2 – test with a reasonably large amount of data in place so that you can spot any unexpected time being lost
  • Step 3 – test with a few uncommitted transactions from other sessions so that you can spot locking problems easily
  • Step 4 – test with SQL tracing enabled so that you can check the trace files for any recursive SQL threats
  • Step 5 – (optional, and harder to recognise) test with event 10704 set so that you can check the trace files for locking threats

Whatever else you do, though, make sure that you always have some data in every object because Oracle has some special optimisations for dealing with partitions that are known to be empty so, in the absence of data, you may be testing something that will never happen in production.

(more…)

September 11, 2013

MV Refresh

Filed under: compression,Infrastructure,Materialized view,Oracle,Partitioning,Performance — Jonathan Lewis @ 8:47 pm BST Sep 11,2013

I have a fairly strong preference for choosing simple solutions over complex solutions, and using Oracle-supplied packages over writing custom code – provided the difference in cost (whether that’s in human effort, run-time resources or licence fees) is acceptable. Sometimes, though, the gap between simplicity and cost is so extreme that a hand-crafted solution is clearly the better choice. Here’s an idea prompted by a recent visit to a site that makes use of materialized views and also happens to be licensed for the partitioning option.

(more…)

June 25, 2013

12c

Filed under: 12c,Infrastructure,Oracle,Partitioning,redo — Jonathan Lewis @ 11:43 pm BST Jun 25,2013

The news is out that 12c is now available for download (Code, Docs and Tutorials). There are plenty of nice little bits in it, and several important additions or enhancements to the optimizer, but there’s one feature that might prove to be very popular:

SQL> alter table p1 move partition solo ONLINE;

Table altered.

(more…)

October 2, 2012

Partitioning 12c

Filed under: 12c,Oracle,Partitioning — Jonathan Lewis @ 8:11 pm BST Oct 2,2012

Most useful presentation of OOW so far, from Hermann Baer of Oracle on improvements in partitioning features in 12c – and there are lots of useful ones, including:

Online move of a partition – so easy to compress a partition when it has reached its final “read-only” state

multiple partition maintenance in one operation – e.g. merge 3 monthly partitions into one quarterly partition, or split one partition into 24 (think about “how do I partition a non-partitioned table”, and 12c has just made it easier and quicker – exchange it into an empty partitioned table, then do one massive split).

partial indexing – define which partitions of a partitioned table should be included in the indexes you create on the table – and the optimizer also knows that different partitions need different plans (an enhancement of “table expansion”.

interval-reference partitionining – you can create an interval partitioned table, and use ref-partitioning to create child (and further decendent) tables, and their partitions will automatically be generated, truncated, and dropped as the parent is extended, truncated or dropped (needs enabled foreign key constraints).

Lots more details – and lots of stress-testing to be done – but I’m off to hear “the optimizer lady” talk about hints.

Reminder:

Oracle has a “safe harbour” slide at the start of all presentations about future developments pointing out that the information presented is an indication of direction, but not guaranteed to make it into production.

July 13, 2012

Recursive SQL

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 4:41 pm BST Jul 13,2012

A question came up on the OTN database forum a few days ago asking why a very simple query against a hash partitioned table was doing so much physical I/O to get a very small amount of data from a single partition:

I have table hash-partitioned on an indexed column “id” which is non-unique and part of my primary key. Inside each partition, rows with the same id are located close to each other which is done by dbms_redefinition reorg using orderby_cols. The intention is to reduce the amount of physical reads since there are no queries that don’t filter on the id column.

What I’m seeing is a lot of physical reads though. The first partition has roughly 80K rows, an average row length of 347, block size of 8K and compression … resulting in 821 blocks. And when (after flushing buffered cache and shared pool) submit a query that filters on “id” only and results is 106 selected rows, I see roughly 1400 physical reads.

The poster has clearly applied some careful thought to minimising the work required – so what’s gone wrong ? To help us analyse the problem he posted the results from running a query through autotrace:

(more…)

June 5, 2012

Truncate Partition

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 5:11 pm BST Jun 5,2012

A recent posting on the OTN database forum (which I didn’t answer correctly on my first attempt) raised the problem of truncating partitions when there are referential integrity constraints in place. Let’s start with a demonstration of the problem:
(more…)

September 30, 2011

Table rebuilds

Filed under: Oracle,Partitioning — Jonathan Lewis @ 11:06 am BST Sep 30,2011

Here’s a question from the OTN forum that I’ve seen fairly frequently in the last couple of years:

I have a table abc(c_date date, c_id varchr2(20), c_name varchar2);
The table is already range partitoned on “c_date” column and I have lot of data in that table.
Now my task is to create the hash partions under range; hash partiton needs 16 partitons.
I dont want to create another table with range and hash partitons and migrate the data. The amount of data I have is 3 terabytes.

(more…)

July 6, 2011

Indexing

Filed under: Indexing,Oracle,Partitioning,Tuning — Jonathan Lewis @ 5:39 pm BST Jul 6,2011

A question about partitioning came up on OTN a few days ago – but the really interesting part of the thread was about the choice of indexes, and how the choice of partitioning, combined with the data patterns, could make a big difference to the choice of indexes. I think the comments I made about indexes are worth seeing, so I’ve linked to the thread.

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 5,217 other followers