Oracle Scratchpad

August 19, 2015

Oops

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 10:02 am BST Aug 19,2015

I made a mistake a few days ago following up a question on the OTN database forum. The question was about a problem creating a hash/list composite partitioned table, and one of the respondants suggested that perhaps the problem appeared because hash/list wasn’t a legal combination.

Spot on: so I confirmed that observation and supplied a link to the official Oracle white paper that listed the combinations that were legal in 11.2 for composite partitioning.  In fact, although I was fairly sure that hash/list wasn’t legal, I had even run up a quick test to check that the attempt would fail before I’d searched online for the document.

Fortunately other people had different ideas about the original failure, and one of them supplied a link to a thread on AskTom which actually included some SQL to create a hash/list composite partitioned table. Naturally I tested it (even though it was from Tom Kyte) and obviously it worked (after all it was from Tom Kyte) – and then I spotted the syntax error in the example I had created for myself.

Trust but verify … and then …

I had fallen into two traps – and one of them was documented in my own “Trust” posting from 2006.

The white paper was dated September 2009 (Tom’s example was dated June 2013) and as it says in my Trust note:

  • If its date is more than about 18 months old – don’t assume it’s (still) true
  • If it’s not your exact version number – don’t assume it’s (still) true

The second trap was an example of confirmation bias, I was fairly sure that my test was supposed to fail with Oracle error “ORA-00922: missing or invalid option”, so when it failed with exactly that error I didn’t check why it had failed and didn’t notice that I had swapped the order of a couple of clauses in the create table statement. It’s very easy to think you’ve done enough when testing – especially when your test results match your expectation.

Update – later that day

So I’ve had an email asking me how I got the ORA-00922.  Here’s the SQL as I wrote it – don’t give the answer in the comments, but how quickly can you spot what I did wrong ?

create table t1
(
        object_name,
        object_type,
        created
)
partition by hash(object_name) partitions 4
subpartition by list(object_type)
subpartition template (
        subpartition sp1 values ('TABLE'),
        subpartition sp2 values ('INDEX'),
        subpartition sp3 values (default)
)
as
select  object_name, object_type, created
from    all_objects
;

It’s a deceptive error (to me, at least) because even though I know it’s wrong it still looks right.

August 7, 2015

CBO catchup

Filed under: 12c,CBO,Oracle,Partitioning — Jonathan Lewis @ 1:10 pm BST Aug 7,2015

It’s interesting to watch the CBO evolving and see how an enhancement in one piece of code doesn’t necessarily echo through to all the other places it seems to fit. Here’s an example of an enhancement that spoiled (or, rather, made slightly more complicated) a little demonstration I had been running for about the last 15  years  – but (in a fashion akin to another partitioning limitation) doesn’t always work in exactly the way you might expect.

I wrote a note some time ago about the way that the optimizer could pre-compute the result of what I called a “fixed subquery” (such as “select 9100 from dual”) and take advantage of the value it derived to do a better job of estimating the cardinality for a query. That’s a neat feature (although it may cause some 3rd party applications a lot of pain as plans change on the upgrade to 11.2.0.4 or 12c) but it doesn’t work everywhere you might hope.

I’m going to create two (small) tables with the same data, but one of them is going to be a simple heap table and the other is going to be partitioned by range; then I’m going to run the same queries against the pair of them and show you the differences in execution plans. First the tables:


create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        generator       v1
where
        rownum <= 1e4
;

create table pt1(
        id, v1, padding
)
partition by range (id) (
        partition p02000 values less than ( 2001),
        partition p04000 values less than ( 4001),
        partition p06000 values less than ( 6001),
        partition p08000 values less than ( 8001),
        partition p10000 values less than (10001)
)
as
select * from t1
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PT1',
                granularity      =>'ALL',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

alter table  t1 add constraint  t1_pk primary key(id);
alter table pt1 add constraint pt1_pk primary key(id) using index local;

create or replace function f(i_in  number)
return number
is
begin
        return i_in;
end;
/

Note that I’ve used ‘ALL’ as my granularity option – for such small tables this should mean that the statistics at the partition and global level are as accurate as they can be. And since the data is defined to be uniform I don’t expect the partitioning to introduce any peculiarities in the optimizer’s calculations of selectivity and cardinality. I’ve created the indexes after gathering stats on the tables – this is 12c (and 11.2.0.4) so the index stats will be collected with a 100% sample as the indexes are created. Finally I’ve created a function that simply returns its numeric input.

Now let’s run a couple of queries against the simple table and check the cardinality (Rows) predicted by the optimizer – the two plans follow the code that generated them:

set serveroutput off

select  max(v1)
from    t1
where   id between (select 500 from dual)
           and     (select 599 from dual)
;

select * from table(dbms_xplan.display_cursor);

select  max(v1)
from    t1
where   id between (select f(500) from dual)
           and     (select f(599) from dual)
;

select * from table(dbms_xplan.display_cursor);

======================
Actual Execution Plans
======================

select max(v1) from t1 where id between (select 500 from dual)
  and     (select 599 from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   101 |  1515 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |   101 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

select max(v1) from t1 where id between (select f(500) from dual)
     and     (select f(599) from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    25 |   375 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |    45 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

In the first plan the optimizer has recognised the values 500 and 599, so its range-based calculation has produced a (matching, and nearly correct) prediction of 101 rows. In the second plan the function call has hidden the values so the optimizer has had to use the arithmetic for “ranges with unknown values” – which means it uses guesses for the selectivity of 0.45% for the index and 0.25% for the table. Maybe in a future release that f(500) will be evaluated in the same way that we can trigger in-list calculation with the precompute_subquery hint.

Now we repeat the query, but using the partitioned table – showing only the trimmed output from dbms_xplan.display_cursor():

select max(v1) from pt1 where id between (select 500 from dual)
   and     (select 599 from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     4 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |   101 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

select max(v1) from pt1 where id between (select f(500) from dual)
      and     (select f(599) from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     3 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |    45 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

It’s great to see that the predicted cardinalities match the simple heap version exactly – but can you see anything odd about either of these plans ?

 

 

Pause for thought …

 

 

There’s nothing odd about the second plan but there’s a little puzzle in the first.

In theory, it seems, the optimizer is aware that the first query covers the range 500 – 599; so why are the pstart/pstop columns for operations 2-4 showing KEY-KEY, which usually means the optimizer knows that it will have some partition key values at run-time and will be able to do run-time partition elimination, but it doesn’t know what those key values are at parse time.

In this very simple case it’s (probably) not going to make any difference to the performance – but it may be worth some careful experimentation in more complex cases where you might have been hoping to get strict identification of partitions and partition-wise joins taking place. Yet another topic to put on the todo list of “pre-emptive investigations” with a reminder to re-run the tests from time to time.

 

 

July 13, 2015

Missing Bloom

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 1:37 pm BST Jul 13,2015

Here’s a little surprise that came up on the OTN database forum a few days ago. Rather than describe it, I’m just going to create a data set to demonstrate it, initially using 11.2.0.4 although the same thing happens on 12.1.0.2. The target is a query that joins to a range/hash composite partitioned table and uses a Bloom filter to do partition pruning at the subpartition level.  (Note to self: is it possible to see Bloom filters that operate at both the partition and subpartition level from a single join? I suspect not.). Here’s my code to create and populate both the partitioned table and a driving table for the query:


create table pt_composite_1 (
        part_key        number(8),
        subp_key        number(8),
        small_vc        varchar2(40),
        padding         varchar2(100)
)
nologging
partition by range(part_key)
subpartition by hash (subp_key)
subpartition template (
        subpartition g1,
        subpartition g2,
        subpartition g3,
        subpartition g4
)
(
        partition p01 values less than ( 10),
        partition p02 values less than ( 20),
        partition p03 values less than ( 30),
        partition p04 values less than ( 40),
        partition p05 values less than ( 50),
        partition p06 values less than ( 60),
        partition p07 values less than ( 70),
        partition p08 values less than ( 80),
        partition p09 values less than ( 90),
        partition p10 values less than (100),
        partition p11 values less than (110),
        partition p12 values less than (120)
)
;

insert into pt_composite_1 (
        part_key, subp_key, small_vc, padding
)
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        to_char(trunc((rownum-1)/20))   small_vc,
        rpad('x',100)                   padding
from
        dual
connect by
        rownum <= 25000
;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

create table driver (
        part_key        number(8),
        subp_key        number(8),
        test            number(4)
)
;

execute dbms_random.seed(0)

insert into driver
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        mod(rownum - 1, 30)
from
        dual
connect by
        level <= 60
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'driver',
                method_opt      => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'pt_composite_1',
                method_opt      => 'for all columns size 1',
                granularity     => 'all'
        );
end;
/

So I’ve got a table with 12 partitions, each hash subpartitioned into 4 subpartitions, a total of 400,000 rows, and a driving table with 60 rows with two rows per value for column test, which probably means two separate subpartitions identified for most values of test. I set this data up to do a number of different experiments but the only result I’m going to report here is about the sub-partition key. Here’s a query that selects all the data from the partitioned table that matches the subp_key value from a subset of the driver table:


select
        ptc.part_key, ptc.subp_key, count(*), max(ptc.small_vc)
from
        pt_composite_1  ptc
where
        (ptc.subp_key) in (
                select  subp_key
                from    driver
                where   test = 0
        )
group by
        ptc.part_key, ptc.subp_key
;

The optimizer has the option to unnest the subquery and turn the query into a semi-join (specifically a right outer join), and we might hope to see a hash join with Bloom filtering being used to restrict the hash subpartitions that we visit. (We’ve (probably) picked two values for the subp_key, so we don’t expect to visit more than 2 of the hash subpartitions from each of the range partitions.) Here’s the execution plan I got, with rowsource execution statistics:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |       |       |    238 |00:00:01.74 |    2329 |       |       |          |
|   1 |  HASH GROUP BY                |                |      1 |    238 |       |       |    238 |00:00:01.74 |    2329 |   960K|   960K| 1377K (0)|
|*  2 |   HASH JOIN RIGHT SEMI        |                |      1 |  15997 |       |       |  15856 |00:00:01.69 |    2329 |  2440K|  2440K|  905K (0)|
|   3 |    PART JOIN FILTER CREATE    | :BF0000        |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|*  4 |     TABLE ACCESS FULL         | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   5 |    PARTITION RANGE ALL        |                |      1 |    400K|     1 |    12 |    104K|00:00:01.04 |    2306 |       |       |          |
|   6 |     PARTITION HASH JOIN-FILTER|                |     12 |    400K|:BF0000|:BF0000|    104K|00:00:00.63 |    2306 |       |       |          |
|   7 |      TABLE ACCESS FULL        | PT_COMPOSITE_1 |     12 |    400K|     1 |    48 |    104K|00:00:00.22 |    2306 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (part_keyentified by operation part_key):
---------------------------------------------------
   2 - access("PTC"."SUBP_KEY"="SUBP_KEY")
   4 - filter("TEST"=0)

Oracle has unnested the subquery and converted to a right outer semi-join using a hash join. While building the in-memory hash table it has constructed a Bloom filter at operation 3 of the plan to help it eliminate hash subpartitions, and used that Bloom filter at operation 6 of the plan. Our query does nothing to eliminate any of the range partitions so we can see operation 5 is a “partition range all”, and the application of the Bloom filter at operation 6 starts 12 times, once for each range partition. As we can see from operation 7, the Bloom filter generated by our selection from the driver table happened to identify just one subpartition – we start the TABLE (subpartition) ACCESS FULL 12 times, once for each range scan. If our driver data (and the Bloom filter) had identified 2 subpartitions we would have seen operation 7 start 24 times.

So we’ve met our first target – demonstrating that we can get a Bloom filter to eliminate at the subpartition level. Now we need to break things – the OP had a problem with a query that used Bloom filters on one system but didn’t use them for (nominally) the same setup on another system. Here’s my first attempt, with the resulting execution plan:


alter table pt_composite_1 add partition p13 values less than (130) subpartitions 8;

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |      1 |        |       |       |    238 |00:00:01.75 |    2628 |       |       |          |
|   1 |  HASH GROUP BY             |                |      1 |   3310 |       |       |    238 |00:00:01.75 |    2628 |   960K|   960K| 2529K (0)|
|*  2 |   HASH JOIN RIGHT SEMI     |                |      1 |  16000 |       |       |  15856 |00:00:01.71 |    2628 |  2440K|  2440K|  743K (0)|
|*  3 |    TABLE ACCESS FULL       | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   4 |    PARTITION RANGE ALL     |                |      1 |    400K|     1 |    13 |    104K|00:00:01.05 |    2605 |       |       |          |
|   5 |     PARTITION HASH SUBQUERY|                |     13 |    400K|KEY(SQ)|KEY(SQ)|    104K|00:00:00.64 |    2605 |       |       |          |
|   6 |      TABLE ACCESS FULL     | PT_COMPOSITE_1 |     13 |    400K|     1 |    56 |    104K|00:00:00.22 |    2306 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

I’ve added a new partition – with a different number of subpartitions from the table default. The Bloom filter has disappeared and the optimizer has decided to do subquery pruning instead. Drop the partition and recreate it with 2 subpartitions and the same thing happens; drop it and recreate it with 4 subpartitions and we’re back to a Bloom filter. It seems that the Bloom filter depends on every partition having the same number of subpartitions. (That’s not too surprising – the code to handle a Bloom filter when there are a variable number of subpartitions could get a little messy, and there probably aren’t many sites that use variable numbers of subpartitions.)

You might note from the Starts value for operation 5 (the subquery line) that the subquery had to run 13 times. Checking the 10046 trace file we can see the following SQL:


SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 0, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 1, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 2, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 3, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 4, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 5, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 6, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 7, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 8, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 9, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 10, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 11, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 12, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1

This is the optimizer trying to work out, for each of the 12 partitions, which subpartitions it needs to visit. In my case this resulted in a full tablescan of the driver table for each partition. For hash subpartitions, at least, this does seem to be overkill (and can anyone say “bind variables”) – wouldn’t it be possible to run the query once for the partition with the most subpartitions and then derive the correct subpartition number for all other cases ? Maybe, but perhaps that’s just too much special-case code, or maybe it’s on the todo list. Realistically we might guess that a driver table would be very small compared to the size of the subpartitions you were eventually going to scan, so the excess extra work may be a tiny fraction of the total workload – so the added complexity might be seen as too much investment (and risk) for too little return. Maybe in a future release there will be a bit of patching to reduce this overhead.

Conclusion

You may find that some execution plans involving hash subpartitions become less efficient if you don’t keep the number of subpartitions per partition constant across the entire table. I’ve only tested with range/hash composites but there may be other variations of composite partitoning where a similar change in plans occurs.

Footnote

I haven’t done any exhaustive investigation yet, but so far I haven’t been able to create a data set, or perhaps a query, that allows the optimizer to create a Bloom filter (or two) from the driving table and then filter both the range partitions and the hash subpartitions. The closest I’ve come is a plan that shows a Bloom filter being used to filter the range partitions followed by a pruning subquery for the hash subpartitions.

June 17, 2015

Reverse Key

Filed under: Indexing,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 1:11 pm BST Jun 17,2015

A question came up on the OTN database forum recently asking if you could have a partitioned index on a non-partitioned table.

(Aside: I’m not sure whether it would be quicker to read the manuals or try the experiment – either would probably be quicker than posing the question to the forum. As so often happens in these RTFM questions the OP didn’t bother to acknowledge any of the responses)

The answer to the question is yes – you can create a globally partitioned index, though if it uses range partitioning you have to specify a MAXVALUE partition. The interesting thing about the question, though is that several people tried to guess why it had been asked and then made suggestions based on the most likely guess (and wouldn’t it have been nice to see some response from the OP ). The common guess was that there was a performance problem with the high-value block of a sequence-based (or time-based) index – a frequent source of “buffer busy wait” events and other nasty side effects.

Unfortunately too many people suggesting reverse key as a solution to this “right-hand” problem. If you’re licensed for partitioning it’s almost certain that a better option would simple be to use global hash partitioning (with 2^N for some N) partitions. Using reverse keys can result in a bigger performance than the one you’re trying to avoid – you may end up turning a little time spent on buffer busy waits into a large amount of time spent on db file sequential reads. To demonstrate the issue I’ve created a sample script – and adjusted my buffer cache down to the appropriate scale:

create table t1(
	id	not null
)
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		rownum &lt;= 1e4
)
select
	1e7 + rownum	id
from
	generator	v1,
	generator	v2
where
	rownum &lt;= 1e7 
;

begin
	dbms_stats.gather_table_stats(
		ownname		 =&gt; user,
		tabname		 =&gt;'T1'
	);
end;
/

alter table t1 add constraint t1_pk primary key(id) 
using index 
	reverse 
	nologging 
;

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';

begin
	for i in 20000001..20010000 loop
		insert into t1 values(i);
	end loop;
end;
/

I’ve created a table with 10,000,000 rows using a sequential value as the primary key, then inserted “the next” 10,000 rows into the table in order. The index occupied about about 22,000 blocks, so to make my demonstration show you the type of effect you could get from a busy production system with more tables and many indexes I ran my test with the buffer cache limited to 6,000 blocks – a fair fraction of the total index size. Here’s a small section of the trace file from the test running 10.2.0.3 on an elderly machine:


WAIT #43: nam='db file sequential read' ela= 13238 file#=6 block#=12653 blocks=1 obj#=63623 tim=3271125590
WAIT #43: nam='db file sequential read' ela=  7360 file#=6 block#=12749 blocks=1 obj#=63623 tim=3271133150
WAIT #43: nam='db file sequential read' ela=  5793 file#=6 block#=12844 blocks=1 obj#=63623 tim=3271139110
WAIT #43: nam='db file sequential read' ela=  5672 file#=6 block#=12940 blocks=1 obj#=63623 tim=3271145028
WAIT #43: nam='db file sequential read' ela= 15748 file#=5 block#=13037 blocks=1 obj#=63623 tim=3271160998
WAIT #43: nam='db file sequential read' ela=  8080 file#=5 block#=13133 blocks=1 obj#=63623 tim=3271169314
WAIT #43: nam='db file sequential read' ela=  8706 file#=5 block#=13228 blocks=1 obj#=63623 tim=3271178240
WAIT #43: nam='db file sequential read' ela=  7919 file#=5 block#=13325 blocks=1 obj#=63623 tim=3271186372
WAIT #43: nam='db file sequential read' ela= 15553 file#=6 block#=13549 blocks=1 obj#=63623 tim=3271202115
WAIT #43: nam='db file sequential read' ela=  7044 file#=6 block#=13644 blocks=1 obj#=63623 tim=3271209420
WAIT #43: nam='db file sequential read' ela=  6062 file#=6 block#=13741 blocks=1 obj#=63623 tim=3271215648
WAIT #43: nam='db file sequential read' ela=  6067 file#=6 block#=13837 blocks=1 obj#=63623 tim=3271221887
WAIT #43: nam='db file sequential read' ela= 11516 file#=5 block#=13932 blocks=1 obj#=63623 tim=3271234852
WAIT #43: nam='db file sequential read' ela=  9295 file#=5 block#=14028 blocks=1 obj#=63623 tim=3271244368
WAIT #43: nam='db file sequential read' ela=  9466 file#=5 block#=14125 blocks=1 obj#=63623 tim=3271254002
WAIT #43: nam='db file sequential read' ela=  7704 file#=5 block#=14221 blocks=1 obj#=63623 tim=3271261991
WAIT #43: nam='db file sequential read' ela= 16319 file#=6 block#=14444 blocks=1 obj#=63623 tim=3271278492
WAIT #43: nam='db file sequential read' ela=  7416 file#=6 block#=14541 blocks=1 obj#=63623 tim=3271286129
WAIT #43: nam='db file sequential read' ela=  5748 file#=6 block#=14637 blocks=1 obj#=63623 tim=3271292163
WAIT #43: nam='db file sequential read' ela=  7131 file#=6 block#=14732 blocks=1 obj#=63623 tim=3271299489
WAIT #43: nam='db file sequential read' ela= 16126 file#=5 block#=14829 blocks=1 obj#=63623 tim=3271315883
WAIT #43: nam='db file sequential read' ela=  7746 file#=5 block#=14925 blocks=1 obj#=63623 tim=3271323845
WAIT #43: nam='db file sequential read' ela=  9208 file#=5 block#=15020 blocks=1 obj#=63623 tim=3271333239
WAIT #43: nam='db file sequential read' ela=  7708 file#=5 block#=15116 blocks=1 obj#=63623 tim=3271341141
WAIT #43: nam='db file sequential read' ela= 15484 file#=6 block#=15341 blocks=1 obj#=63623 tim=3271356807
WAIT #43: nam='db file sequential read' ela=  5488 file#=6 block#=15437 blocks=1 obj#=63623 tim=3271362623
WAIT #43: nam='db file sequential read' ela= 10447 file#=6 block#=15532 blocks=1 obj#=63623 tim=3271373342
WAIT #43: nam='db file sequential read' ela= 12565 file#=6 block#=15629 blocks=1 obj#=63623 tim=3271386741
WAIT #43: nam='db file sequential read' ela= 17168 file#=5 block#=15725 blocks=1 obj#=63623 tim=3271404135
WAIT #43: nam='db file sequential read' ela=  7542 file#=5 block#=15820 blocks=1 obj#=63623 tim=3271411882
WAIT #43: nam='db file sequential read' ela=  9400 file#=5 block#=15917 blocks=1 obj#=63623 tim=3271421514
WAIT #43: nam='db file sequential read' ela=  7804 file#=5 block#=16013 blocks=1 obj#=63623 tim=3271429519
WAIT #43: nam='db file sequential read' ela= 14470 file#=6 block#=16237 blocks=1 obj#=63623 tim=3271444168
WAIT #43: nam='db file sequential read' ela=  5788 file#=6 block#=16333 blocks=1 obj#=63623 tim=3271450154
WAIT #43: nam='db file sequential read' ela=  9630 file#=6 block#=16429 blocks=1 obj#=63623 tim=3271460008
WAIT #43: nam='db file sequential read' ela= 10910 file#=6 block#=16525 blocks=1 obj#=63623 tim=3271471174
WAIT #43: nam='db file sequential read' ela= 15683 file#=5 block#=16620 blocks=1 obj#=63623 tim=3271487065
WAIT #43: nam='db file sequential read' ela=  8094 file#=5 block#=16717 blocks=1 obj#=63623 tim=3271495454
WAIT #43: nam='db file sequential read' ela=  6670 file#=5 block#=16813 blocks=1 obj#=63623 tim=3271502293
WAIT #43: nam='db file sequential read' ela=  7852 file#=5 block#=16908 blocks=1 obj#=63623 tim=3271510360
WAIT #43: nam='db file sequential read' ela= 10500 file#=6 block#=17133 blocks=1 obj#=63623 tim=3271521039
WAIT #43: nam='db file sequential read' ela= 11038 file#=6 block#=17229 blocks=1 obj#=63623 tim=3271532275
WAIT #43: nam='db file sequential read' ela= 12432 file#=6 block#=17325 blocks=1 obj#=63623 tim=3271544974
WAIT #43: nam='db file sequential read' ela=  7784 file#=6 block#=17421 blocks=1 obj#=63623 tim=3271553331
WAIT #43: nam='db file sequential read' ela=  7774 file#=5 block#=17517 blocks=1 obj#=63623 tim=3271561346
WAIT #43: nam='db file sequential read' ela=  6583 file#=5 block#=17613 blocks=1 obj#=63623 tim=3271568146
WAIT #43: nam='db file sequential read' ela=  7901 file#=5 block#=17708 blocks=1 obj#=63623 tim=3271576231
WAIT #43: nam='db file sequential read' ela=  6667 file#=5 block#=17805 blocks=1 obj#=63623 tim=3271583259
WAIT #43: nam='db file sequential read' ela=  9427 file#=6 block#=18029 blocks=1 obj#=63623 tim=3271592988
WAIT #43: nam='db file sequential read' ela= 52334 file#=6 block#=18125 blocks=1 obj#=63623 tim=3271646055
WAIT #43: nam='db file sequential read' ela= 50512 file#=6 block#=18221 blocks=1 obj#=63623 tim=3271697284
WAIT #43: nam='db file sequential read' ela= 10095 file#=6 block#=18317 blocks=1 obj#=63623 tim=3271708095

Check the block number for this list of single block reads – we’re jumping through the index about 100 blocks at a time to read the next block where an index entry has to go. The jumps are the expected (and designed) effect of reverse key indexes: the fact that the jumps turn into physical disc reads is the (possibly unexpected) side effect. Reversing an index makes adjacent values look very different (by reversing the bytes) and go to different index leaf blocks: the purpose of the exercise is to scatter concurrent similar inserts across multiple blocks, but if you scatter the index entries you need to buffer a lot more of the index to keep the most recently used values in memory. Reversing the index may eliminate buffer busy waits, but it may increase time lost of db file sequential reads dramatically.

Here’s a short list of interesting statistics from this test – this time running on 11.2.0.4 on a machine with SSDs) comparing the effects of reversing the index with those of not reversing the index – normal index first:


Normal index
------------
CPU used by this session               83
DB time                                97
db block gets                      40,732
physical reads                         51
db block changes                   40,657
redo entries                       20,174
redo size                       5,091,436
undo change vector size         1,649,648

Repeat with reverse key index
-----------------------------
CPU used by this session              115
DB time                               121
db block gets                      40,504
physical reads                     10,006
db block changes                   40,295
redo entries                       19,973
redo size                       4,974,820
undo change vector size         1,639,232

Because of the SSDs there’s little difference in timing between the two sets of data and, in fact, all the other measures of work done are very similar except for the physical read, and the increase in reads is probably the cause of the extra CPU time thanks to both the LRU manipulation and the interaction with the operating system.

If you want to check the effect of index reversal you can take advantage of the sys_op_lbid() function to sample a little of your data – in my case I’ve queried the last 10,000 rows (values) in the table:


select 
	/*+ 
		cursor_sharing_exact 
		dynamic_sampling(0) 
		no_monitoring 
		no_expand 
		index_ffs(t1,t1_i1) 
		noparallel_index(t,t1_i1) 
	*/ 
	count (distinct sys_op_lbid( &m_ind_id ,'L',t1.rowid)) as leaf_blocks
from 
	t1
where 
	id between 2e7 + 1 and 2e7 + 1e4
;

The &m_ind_id substition variable is the object_id of the index t1_i1.

In my case, with an index of 22,300 leaf blocks, my 10,000 consecutive values were scattered over 9,923 leaf blocks. If I want access to “recent data” to be as efficient as possible I need to keep that many blocks of the index cached, compared to (absolute) worst case for my data 100 leaf blocks. When you reverse key an index you have to think about how much bigger you have to make your buffer cache to keep the performance constant.

July 2, 2014

Comparisons

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 5:09 pm BST Jul 2,2014

Catching up (still) from the Trivadis CBO days, here’s a little detail which had never crossed my mind before.


where   (col1, col2) < (const1, const2)

This isn’t a legal construct in Oracle SQL, even though it’s legal in other dialects of SQL. The logic is simple (allowing for the usual anomaly with NULL): the predicate should evaluate to true if (col1 < const1), or if (col1 = const1 and col2 < const2). The thought that popped into my mind when Markus Winand showed a slide with this predicate on it – and then pointed out that equality was the only option that Oracle allowed for multi-column operators – was that, despite not enabling the syntax, Oracle does implement the mechanism.

If you’re struggling to think where, it’s in multi-column range partitioning: (value1, value2) belongs in the partition with high value (k1, k2) if (value1 < k1) or if (value1 = k1 and value2 < k2).

December 26, 2013

Current row

Filed under: Oracle,Partitioning — Jonathan Lewis @ 9:52 pm BST Dec 26,2013

Here’s a question that I’ve had on my todo (and draft posts) list for a few years – so I’m presenting it as a task for anyone who can demonstrate the answer.

If you’ve got a pl/sql cursor open and you’re using the “update current of” syntax, what happens if you update the same row twice but the row comes from a partitioned table and moves to a new partition on the first update ?

If you have a demo that you want to include in the comments then start with “sourcecode” and end with “/sourcecode” – in square brackets, without the quotation marks – to get a fixed font format and space preservation. If you have a good demonstration or reference article that you can link to, a simple URL will do nicely.#

December 13, 2013

Bitmap Question

Filed under: Indexing,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 6:09 pm BST Dec 13,2013

This question came up on the OTN database forum a couple of months ago: “Why doesn’t Oracle allow you to create globally partitioned bitmap indexes?” The obvius answer is “It just doesn’t, okay.” But it can be quite interesting to think of reasons why a particular mechanism might not have been implemented – sometimes the answer can give you an insight into how a feature has been implemented, or it might suggest cases where a feature might not work very well, it might give you some ideas on how to work around a particular limitation, and sometimes it might just help to pass the time on a short flight.

(more…)

November 5, 2013

Deadlock

Filed under: Oracle,Parallel Execution,Partitioning — Jonathan Lewis @ 6:14 am BST Nov 5,2013

There an interesting example of a deadlock on the OTN database forum:

DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
PS-00000001-00000011        92     423     S             33     128     S     X
BF-2ed08c01-00000000        33     128     S             92     423     S     X

(more…)

October 17, 2013

Virtual date partitions

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 7:40 pm BST Oct 17,2013

I posted this question on twitter earlier on today (It was a thought that crossed my mind during a (terrible) presentation on partitioning that I had to sit through a few weeks ago – it’s always possible to be prompted to think of some interesting questions no matter how bad the presentation is, though):

Quiz: if you create a virtual column as trunc(date_col,’W’) and partition on it – will a query on date_col result in partition elimination?

(more…)

October 11, 2013

Testing Partitions

Filed under: Oracle,Partitioning — Jonathan Lewis @ 7:47 am BST Oct 11,2013

A quick check-list on testing new partitioning features:

  • Step 1 – test with a small amount of data in place, just to see if it works at all
  • Step 2 – test with a reasonably large amount of data in place so that you can spot any unexpected time being lost
  • Step 3 – test with a few uncommitted transactions from other sessions so that you can spot locking problems easily
  • Step 4 – test with SQL tracing enabled so that you can check the trace files for any recursive SQL threats
  • Step 5 – (optional, and harder to recognise) test with event 10704 set so that you can check the trace files for locking threats

Whatever else you do, though, make sure that you always have some data in every object because Oracle has some special optimisations for dealing with partitions that are known to be empty so, in the absence of data, you may be testing something that will never happen in production.

(more…)

September 11, 2013

MV Refresh

Filed under: compression,Infrastructure,Materialized view,Oracle,Partitioning,Performance — Jonathan Lewis @ 8:47 pm BST Sep 11,2013

I have a fairly strong preference for choosing simple solutions over complex solutions, and using Oracle-supplied packages over writing custom code – provided the difference in cost (whether that’s in human effort, run-time resources or licence fees) is acceptable. Sometimes, though, the gap between simplicity and cost is so extreme that a hand-crafted solution is clearly the better choice. Here’s an idea prompted by a recent visit to a site that makes use of materialized views and also happens to be licensed for the partitioning option.

(more…)

June 25, 2013

12c

Filed under: 12c,Infrastructure,Oracle,Partitioning,redo — Jonathan Lewis @ 11:43 pm BST Jun 25,2013

The news is out that 12c is now available for download (Code, Docs and Tutorials). There are plenty of nice little bits in it, and several important additions or enhancements to the optimizer, but there’s one feature that might prove to be very popular:

SQL> alter table p1 move partition solo ONLINE;

Table altered.

(more…)

October 2, 2012

Partitioning 12c

Filed under: 12c,Oracle,Partitioning — Jonathan Lewis @ 8:11 pm BST Oct 2,2012

Most useful presentation of OOW so far, from Hermann Baer of Oracle on improvements in partitioning features in 12c – and there are lots of useful ones, including:

Online move of a partition – so easy to compress a partition when it has reached its final “read-only” state

multiple partition maintenance in one operation – e.g. merge 3 monthly partitions into one quarterly partition, or split one partition into 24 (think about “how do I partition a non-partitioned table”, and 12c has just made it easier and quicker – exchange it into an empty partitioned table, then do one massive split).

partial indexing – define which partitions of a partitioned table should be included in the indexes you create on the table – and the optimizer also knows that different partitions need different plans (an enhancement of “table expansion”.

interval-reference partitionining – you can create an interval partitioned table, and use ref-partitioning to create child (and further decendent) tables, and their partitions will automatically be generated, truncated, and dropped as the parent is extended, truncated or dropped (needs enabled foreign key constraints).

Lots more details – and lots of stress-testing to be done – but I’m off to hear “the optimizer lady” talk about hints.

Reminder:

Oracle has a “safe harbour” slide at the start of all presentations about future developments pointing out that the information presented is an indication of direction, but not guaranteed to make it into production.

July 13, 2012

Recursive SQL

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 4:41 pm BST Jul 13,2012

A question came up on the OTN database forum a few days ago asking why a very simple query against a hash partitioned table was doing so much physical I/O to get a very small amount of data from a single partition:

I have table hash-partitioned on an indexed column “id” which is non-unique and part of my primary key. Inside each partition, rows with the same id are located close to each other which is done by dbms_redefinition reorg using orderby_cols. The intention is to reduce the amount of physical reads since there are no queries that don’t filter on the id column.

What I’m seeing is a lot of physical reads though. The first partition has roughly 80K rows, an average row length of 347, block size of 8K and compression … resulting in 821 blocks. And when (after flushing buffered cache and shared pool) submit a query that filters on “id” only and results is 106 selected rows, I see roughly 1400 physical reads.

The poster has clearly applied some careful thought to minimising the work required – so what’s gone wrong ? To help us analyse the problem he posted the results from running a query through autotrace:

(more…)

June 5, 2012

Truncate Partition

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 5:11 pm BST Jun 5,2012

A recent posting on the OTN database forum (which I didn’t answer correctly on my first attempt) raised the problem of truncating partitions when there are referential integrity constraints in place. Let’s start with a demonstration of the problem:
(more…)

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 5,383 other followers