Oracle Scratchpad

August 29, 2015

Index Usage – 2

Filed under: 12c,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 11:33 am BST Aug 29,2015

I’ve beeen a little slow in the follow-up to my previous posting on possibly redundant indexes. Before going into the slightly more complex stuff, there’s another peripheral point (but a very important one) that’s worth raising about how clever the optimizer can be. Here’s some code for 11.2.0.4 to demonstrate the point:

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(sysdate,'MM') + (rownum-1)/1440	date_time,
	rpad('x',100)				padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;


alter table t1 
add (
        date_only
	generated always as (trunc(date_time)) virtual 
)
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_i1 on t1(date_only) nologging;

So, in a two-step process, I’ve got an indexed virtual column that holds the value of the date_time column truncated to just the date. Would you expect the optimizer to use the index to execute the following query efficiently:


select
        max(id)
from
        t1
where
        date_time between sysdate-1 and sysdate
;

Note that the query references the real date_time column not the virtual column date_only, and it’s not using the expression that defines the index – yet the plan reads as follows:


-----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |     1 |    21 |    86   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE                       |       |     1 |    21 |            |          |
|*  2 |   FILTER                              |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |  1442 | 30282 |    86   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
   4 - access("T1"."DATE_ONLY">=TRUNC(SYSDATE@!-1) AND
              "T1"."DATE_ONLY"<=TRUNC(SYSDATE@!))

It’s a little odd that even though the optimizer in the newer versions of Oracle treats many simple expressions on sysdate as constants it still checks (operation 2) that “sysdate >= sysdate – 1” but perhaps that’s just a case of a piece of generic code that isn’t worth the risk or effort of changing.

The key point, of course, is that Oracle has manged to generate some extra predicates that allow it to use the “wrong” index to get a first approximation of the result set fairly efficiently, and then used the original predicate to reduce the approximation down to the correct result set.

If you want a quick sanity check on the access predicates used for operation 4:

  • If date_time >= sysdate-1, then trunc(date_time) >= trunc(sysdate-1)
  • If date_time <= sysdate, then trunc(date_time) <= trunc(sysdate)

This style of predicate manipulation also works numeric data types, but I think its greatest benefit (or convenience) is likely to come from date data types where the data has been created with a time component but there are frequent “date-only” queries. The days of creating two indexes as a workaround for handling generated code that wants to deal with both date_time and trunc(date_time) predicates should be numbered.

Footnote:

This enhancement probably appeared in 11.2.0.2, and I first saw it described in October 2013 in this blog note by Mohamed Houri; but 12c offers a delightful little enhancement – here’s what my table looks like in the 12c version of the code:


SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 DATE_TIME                              DATE
 PADDING                                VARCHAR2(100)

SQL> 

Where’s the virtual column ? The 12c version of my code had a slightly different definition for it:


alter table t1  
add (
        date_only
        invisible
        generated always as (trunc(date_time)) virtual
)
;

The transformation still works even when the virtual column is invisible. So (subject to searching for anomalies, boundary conditions and bugs) it looks as if you can change the table definition, and get the benefits of two indexes for the price of one without the application realising that anything has changed.

August 7, 2015

CBO catchup

Filed under: 12c,CBO,Oracle,Partitioning — Jonathan Lewis @ 1:10 pm BST Aug 7,2015

It’s interesting to watch the CBO evolving and see how an enhancement in one piece of code doesn’t necessarily echo through to all the other places it seems to fit. Here’s an example of an enhancement that spoiled (or, rather, made slightly more complicated) a little demonstration I had been running for about the last 15  years  – but (in a fashion akin to another partitioning limitation) doesn’t always work in exactly the way you might expect.

I wrote a note some time ago about the way that the optimizer could pre-compute the result of what I called a “fixed subquery” (such as “select 9100 from dual”) and take advantage of the value it derived to do a better job of estimating the cardinality for a query. That’s a neat feature (although it may cause some 3rd party applications a lot of pain as plans change on the upgrade to 11.2.0.4 or 12c) but it doesn’t work everywhere you might hope.

I’m going to create two (small) tables with the same data, but one of them is going to be a simple heap table and the other is going to be partitioned by range; then I’m going to run the same queries against the pair of them and show you the differences in execution plans. First the tables:


create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        generator       v1
where
        rownum <= 1e4
;

create table pt1(
        id, v1, padding
)
partition by range (id) (
        partition p02000 values less than ( 2001),
        partition p04000 values less than ( 4001),
        partition p06000 values less than ( 6001),
        partition p08000 values less than ( 8001),
        partition p10000 values less than (10001)
)
as
select * from t1
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PT1',
                granularity      =>'ALL',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

alter table  t1 add constraint  t1_pk primary key(id);
alter table pt1 add constraint pt1_pk primary key(id) using index local;

create or replace function f(i_in  number)
return number
is
begin
        return i_in;
end;
/

Note that I’ve used ‘ALL’ as my granularity option – for such small tables this should mean that the statistics at the partition and global level are as accurate as they can be. And since the data is defined to be uniform I don’t expect the partitioning to introduce any peculiarities in the optimizer’s calculations of selectivity and cardinality. I’ve created the indexes after gathering stats on the tables – this is 12c (and 11.2.0.4) so the index stats will be collected with a 100% sample as the indexes are created. Finally I’ve created a function that simply returns its numeric input.

Now let’s run a couple of queries against the simple table and check the cardinality (Rows) predicted by the optimizer – the two plans follow the code that generated them:

set serveroutput off

select  max(v1)
from    t1
where   id between (select 500 from dual)
           and     (select 599 from dual)
;

select * from table(dbms_xplan.display_cursor);

select  max(v1)
from    t1
where   id between (select f(500) from dual)
           and     (select f(599) from dual)
;

select * from table(dbms_xplan.display_cursor);

======================
Actual Execution Plans
======================

select max(v1) from t1 where id between (select 500 from dual)
  and     (select 599 from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   101 |  1515 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |   101 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

select max(v1) from t1 where id between (select f(500) from dual)
     and     (select f(599) from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    25 |   375 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |    45 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

In the first plan the optimizer has recognised the values 500 and 599, so its range-based calculation has produced a (matching, and nearly correct) prediction of 101 rows. In the second plan the function call has hidden the values so the optimizer has had to use the arithmetic for “ranges with unknown values” – which means it uses guesses for the selectivity of 0.45% for the index and 0.25% for the table. Maybe in a future release that f(500) will be evaluated in the same way that we can trigger in-list calculation with the precompute_subquery hint.

Now we repeat the query, but using the partitioned table – showing only the trimmed output from dbms_xplan.display_cursor():

select max(v1) from pt1 where id between (select 500 from dual)
   and     (select 599 from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     4 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |   101 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

select max(v1) from pt1 where id between (select f(500) from dual)
      and     (select f(599) from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     3 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |    45 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

It’s great to see that the predicted cardinalities match the simple heap version exactly – but can you see anything odd about either of these plans ?

 

 

Pause for thought …

 

 

There’s nothing odd about the second plan but there’s a little puzzle in the first.

In theory, it seems, the optimizer is aware that the first query covers the range 500 – 599; so why are the pstart/pstop columns for operations 2-4 showing KEY-KEY, which usually means the optimizer knows that it will have some partition key values at run-time and will be able to do run-time partition elimination, but it doesn’t know what those key values are at parse time.

In this very simple case it’s (probably) not going to make any difference to the performance – but it may be worth some careful experimentation in more complex cases where you might have been hoping to get strict identification of partitions and partition-wise joins taking place. Yet another topic to put on the todo list of “pre-emptive investigations” with a reminder to re-run the tests from time to time.

 

 

July 20, 2015

12c Downgrade

Filed under: 12c,Bugs,CBO,Oracle — Jonathan Lewis @ 1:12 pm BST Jul 20,2015

No, not really – but sometimes the optimizer gets better and gives you worse performance as a side effect when you upgrade. Here’s an example where 11.2.0.4 recognised (with a few hints) the case for a nested loop semi-join and 12c went a bit further and recognised the opportunity for doing a cunning “semi_to_inner” transformation … which just happened to do more work than the 11g plan.

Here’s a data set to get things going, I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:


create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum <= 1e3
;

alter table par modify id not null;
alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.

The code uses my standard framework that could generate a few million rows even though it’s only generating 1,000 in par and 10,000 in chi. The presence of the commented “deferrable” for the primary key constraint is for a secondary demonstration.

You’ll notice that the 1,000 values that appear in chi.n1 and chi.n2 are matched by the 1,000 rows that appear in the primary key of par – in some other experiment I’ve got two foreign keys from chi to par. Take note that the values in n1 are very well clustered because of the call to trunc() while the values in n2 are evenly scattered because of the call to dbms_random() – the data patterns are very different although the data content is very similar (the randomised data will still produce, on average, 10 rows per value).

So here’s the test code:


set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;

In both cases I’ve hinted the query quite heavily, using internally generated query block names, into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times returning (thanks to our perfect data match) one row for each start. Here are the run-time plans with rowsource execution stats from 11.2.0.4

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |     0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |     0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------

Notice how we do 1,000 starts of operation 4 when the data is well ordered, and 4,033 starts when the data is randomly ordered. For a semi-join nested loop the run-time engine uses the same caching mechanism as it does for scalar subqueries – a fact you can corroborate by removing the current hints and putting the /*+ no_unnest */ hint into the subquery so that you get a filter subquery plan, in which you will note exactly the same number of starts of the filter subquery.

As an extra benefit you’ll notice that the index probes for the well-ordered data have managed to take advantage of buffer pinning (statistic “buffer is pinned count”) – keeping the root block and most recent leaf block of the par_pk index pinned almost continually through the query; while the randomised data access unfortunately required Oracle to unpin and repin the index leaf blocks (even though there were only 2 in the index) as the scan of chi progessed.

Time to upgrade to 12.1.0.2 and see what happens:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------

Take a close look at operation 2 – it’s no longer a NESTED LOOP SEMI, the optimizer has got so smart (recognising the nature of the primary key on par) that it’s done a “semi_to_inner” transformation. But a side effect of the transformation is that the scalar subquery caching mechanism no longer applies so we probe the par table 10,000 times. When the driving data is well-ordered this hasn’t made much difference to the buffer gets (and related latch activity), but when the data is randomised the extra probes ramp the buffer gets up even further.

The timings (A-time) on these experiments are not particularly trustworthy – the differences between cached reads and direct path reads introduced more variation than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this difference won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.

Footnote

There is a hint /*+ no_semi_to_inner(@queryblock object_alias) */ that I thought might persuade the optimizer to stick with the semi-join, but it didn’t have any effect. Since the “semi to inner” transformation (and the associated hints) are available in 11.2.0.4 I was a little puzzled that (a) I didn’t see the same transformation in the 11g test, and (b) that I couldn’t hint the transformation.  This makes me wonder if there’s a defect in 11g that might be fixed in a future patch.

It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might eventually become available  to standard joins (in cases such as “join to parent”, perhaps).

March 26, 2015

12c MView refresh

Filed under: 12c,Infrastructure,Materialized view,Oracle — Jonathan Lewis @ 1:19 pm BST Mar 26,2015

Some time ago I wrote a blog note describing a hack for refreshing a large materialized view with minimum overhead by taking advantage of a single-partition partitioned table. This note describes how Oracle 12c now gives you an official way of doing something similar – the “out of place” refresh.

I’ll start by creating a matieralized view and creating a couple of indexes on the resulting underlying table; then show you three different calls to refresh the view. The materialized view is based on all_objects so it can’t be made available for query rewrite (ORA-30354: Query rewrite not allowed on SYS relations) , and I haven’t created any materialized view logs so there’s no question of fast refreshes – but all I intend to do here is show you the relative impact of a complete refresh.


create materialized view mv_objects nologging
build immediate
refresh on demand
as
select
        *
from
        all_objects
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'mv_objects',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

create index mv_obj_i1 on mv_objects(object_name) nologging compress;
create index mv_obj_i2 on mv_objects(object_type, owner, data_object_id) nologging compress 2;

This was a default install of 12c, so there were about 85,000 rows in the view. You’ll notice that I’ve created all the objects as “nologging” – this will have an effect on the work done during some of the refreshes.

Here are the three variants I used – all declared explicitly as complete refreshes:


begin
	dbms_mview.refresh(
		list			=> 'MV_OBJECTS',
		method			=> 'C',
		atomic_refresh		=> true
	);
end;
/

begin
	dbms_mview.refresh(
		list			=> 'MV_OBJECTS',
		method			=> 'C',
		atomic_refresh		=> false
	);
end;
/

begin
	dbms_mview.refresh(
		list			=> 'MV_OBJECTS',
		method			=> 'C',
		atomic_refresh		=> false,
		out_of_place		=> true
	);
end;
/

The first one (atomic_refresh=>true) is the one you have to use if you want to refresh several materialized views simultaneously and keep them self consistent, or if you want to ensure that the data doesn’t temporarily disappear if all you’re worried about is a single view. The refresh works by deleting all the rows from the materialized view then executing the definition to generate and insert the replacement rows before committing. This generates a lot of undo and redo – especially if you have indexes on the materialized view as these have to be maintained “row by row” and may leave users accessing and applying a lot of undo for read-consistency purposes. An example at a recent client site refreshed a table of 6.5M rows with two indexes, taking about 10 minutes to refresh, generating 7GB of redo as it ran, and performing 350,000 “physical reads for flashback new”. This strategy does not take advantage of the nologging nature of the objects – and as a side effect of the delete/insert cycle you’re likely to see the indexes grow to roughly twice their optimal size and you may see the statistic “recursive aborts on index block reclamation” climbing as the indexes are maintained.

The second option (atomic_refresh => false) is quick and efficient – but may result in wrong results showing up in any code that references the materialized view (whether explicitly or by rewrite). The session truncates the underlying table, sets any indexes on it unusable, then reloads the table with an insert /*+ append */. The append means you get virtually no undo generated, and if the table is declared nologging you get virtually no redo. In my case, the session then dispatched two jobs to rebuild the two indexes – and since the indexes were declared nologging the rebuilds generated virtually no redo. (I could have declared them with pctfree 0, which would also have made them as small as possible).

The final option is the 12c variant – the setting atomic_refresh => false is mandatory if we want  out_of_place => true. With these settings the session will create a new table with a name of the form RV$xxxxxx where xxxxxx is the hexadecimal version of the new object id, insert the new data into that table (though not using the /*+ append */ hint), create the indexes on that table (again with names like RV$xxxxxx – where xxxxxx is the index’s object_id). Once the new data has been indexed Oracle will do some name-switching in the data dictionary (shades of exchange partition) to make the new version of the materialized view visible. A quirky detail of the process is that the initial create of the new table and the final drop of the old table don’t show up in the trace file  [Ed: wrong, see comment #1] although the commands to drop and create indexes do appear. (The original table, though it’s dropped after the name switching, is not purged from the recyclebin.) The impact on undo and redo generation is significant – because the table is empty and has no indexes when the insert takes place the insert creates a lot less undo and redo than it would if the table had been emptied by a bulk delete – even though the insert is a normal insert and not an append; then the index creation honours my nologging definition, so produces very little redo. At the client site above, the redo generated dropped from 7GB to 200MB, and the time dropped to 200 seconds which was 99% CPU time.

Limitations, traps, and opportunities

The manuals say that the out of place refresh can only be used for materialized views that are joins or aggregates and, surprisingly, you actually can’t use the method on a view that simply extracts a subset of rows and columns from a single table.  There’s a simple workaround, though – join the table to DUAL (or some other single row table if you want to enable query rewrite).

Because the out of place refresh does an ordinary insert into a new table the resulting table will have no statistics – you’ll have to add a call to gather them. (If you’ve previously been using a non-atomic refreshes this won’t be a new problem, of course). The indexes will have up to date statistics, of course, because they will have been created after the table insert.

The big opportunity, of course, is to change a very expensive atomic refresh into a much cheaper out of place refresh – in some special cases. My client had to use the atomic_refresh=>true option in 11g because they couldn’t afford to leave the table truncated (empty) for the few minutes it took to rebuild; but they might be okay using the out_of_place => true with atomic_refresh=>false in 12c because:

  • the period when something might break is brief
  • if something does go wrong the users won’t get wrong (silently missing) results, they’ll an Oracle error (probably ORA-08103: object no longer exists)
  • the application uses this particular materialized view directly (i.e. not through query rewrite), and the query plans are all quick, light-weight indexed access paths
  • most queries will probably run correctly even if they run through the moment of exchange

I don’t think we could guarantee that last statement – and Oracle Corp. may not officially confirm it – and it doesn’t matter how many times I show queries succeeding but it’s true. Thanks to “cross-DDL read-consistency” as it was called in 8i when partition-exchange appeared and because the old objects still exist in the data files, provided your query doesn’t hit a block that has been overwritten by a new object, or request a space management block that was zero-ed out on the “drop” a running query can keep on using the old location for an object after it has been replaced by a newer version. If you want to make the mechanism as safe as possible you can help – put each relevant materialized view (along with its indexes) into its own tablespace so that the only thing that is going to overwrite an earlier version of the view is the stuff you create on the next refresh.

 

January 26, 2015

In-memory DB

Filed under: 12c,in-memory,Infrastructure,Oracle — Jonathan Lewis @ 8:18 am BST Jan 26,2015

A recent thread on the OTN database forum supplied some code that seemed to show that In-memory DB made no difference to performance when compared with the traditional row-store mechanism and asked why not.  (It looked as if the answer was that almost all the time for the tests was spent returning the 3M row result set to the SQL*Plus client 15 rows at a time.)

The responses on the thread led to the question:  Why would the in-memory (column-store) database be faster than simply having the (row-store) data fully cached in the buffer cache ?

Maria Colgan has addressed this question in part 3 of her series on In-Memory Database (see catalogue below), but I thought I’d repeat the basic ideas with a few imaginative numbers thrown in to give a better flavour of what’s going on. So imagine you have a table sized at 100GB, with 100 columns of data where every column holds data of a similar size and pattern; and want to execute a query of the form: select {list of columns} from big_table where colX >=  {some constant}.

Traditional Tablescan (approximation) with table fully cached

For each block of the 100GB, Oracle has to acquire the “cache buffers chains” latch, pin the block, drop the latch, and scan the block testing each row, then acquire the latch, unpin the block, and drop the latch.  Scanning the block requires a walk through the row directory and, for each row pointer, jumping to the correct location in the block for the row, stepping along the row one column at a time to get to the correct row, and then checking the column  value. If the column matches the predicate extract, format and return the required columns from that row.

It’s a lot of memory to scan, in a large number of small steps, involving a lot of latching and pinning – which translates into a lot of CPU. On the plus side, although it’s very expensive to identify the required rows, it’s very cheap to construct and return a row once you’ve identified it.

In-memory scan (approximation)

  1. Given the way I’ve described the table (100GB, 100 similar columns), Oracle can recreate it in memory as 100 lists of 1GB each; so we can identify the rows we want by scanning one of those lists and applying the predicate – so only 1GB of (fairly contigious) memory to scan, rather than 100GB, and virtually no latching and pinning to find that memory, and no jumping around following pointers and counting along rows.
  2. But it’s probably NOT 1GB of memory to scan, because Oracle has some compression/deduplication methods that it can apply to the data that often reduces the memory requirement by a factor of four of five – so perhaps it’s only 250 MB of memory to scan.
  3. But Oracle breaks large lists into chunks, so rather than 250MB of contiguous memory, it’s likely to be 250 chunks of 1MB; and as part of the description of each chunk Oracle records the lowest and highest value in the chunk; [ed:  Christian Antognini says that the metadata list every distinct value for the chunk] so it can check the predicate against the boundary values on the chunk and be in a position to discard entire chunks without looking at their content. So, depending on the actual content and pattern of the data, we may examine only a handful of chunks in detail, dropping the scan from 250MB to, perhaps, 10MB.
  4. And we still haven’t finished there, because the code that handles the column-store is able to take advantage of the SIMD (Single Instruction,  Multiple Data) operations in the CPU to check the predicate against 4, or possibly even 8, values simultaneously at a speed matching a single column comparison that has to be used for the traditional cached row-store. (Given the way that Oracle  (probably) handles the compression, I suspect that this final advantage is only significant for range-based predicates – but that’s purely conjectural).

So the huge benefit you get from In-Memory column store, compared to “fully cached row-store”  is that you are likely to scan far less memory to identify the rows that match your predicate, and do it with far less “infrastructure” activity like latching and pinning. The potential saving in CPU usage is huge.

There is, of course, a penalty to pay. As you identify the rows of interest you can (in effect) construct a bitmap representing the position of those rows in the table (and if you have predicates on more than 1 column you can use bitmap operations on the individual column bitmaps to identify the rows you want in the final result) but then you have to construct the row that goes into the result set. If your query is interested in just 5 columns that means using the bitmap to locate the correct entry from each of 5 separate column lists; if your query is interested in 99 column that means extracting the correct entry from each of 99 separate column lists. Identifying the rows you want can be very  quick, building the final result may be relatively slow.

Soundbite summary

  • Using the In-memory Database, you can identify the rows you want very quickly but it’s relatively slow to reconstruct them.
  • Using a fully cached traditional row-store, it’s relatively slow to identify the rows you want, but once you’ve found them you spend no time reconstructing them.

Bear in mind that this is an extremely simplified analysis and ignores all sorts of details about read-consistency, the probability of physical reads, the probability of scanning blocks instead of scanning chunks, and so on; my focus is only on the type of activity that differentiates row-store handling from column-store handling when all the data is in memory so that you can have some appreciation of why the benefits available from In-memory DB can vary with the pattern of the data and the way you use it.

Catalogue of blog posts by Maria Colgan and Andy Rivenes:

Other articles on In-memory DB:

 

January 5, 2015

Re-optimization

Filed under: 12c,Oracle,Performance — Jonathan Lewis @ 12:54 pm BST Jan 5,2015

The spelling is with a Z rather than an S because it’s an Oracle thing.

Tim Hall has just published a set of notes on Adaptive Query Optimization, so I thought I’d throw in one extra little detail.

When the optimizer decides that a query execution plan involves some guesswork the run-time engine can monitor the execution of the query and collect some information that may allow the optimizer to produce a better execution plan. The interaction between all the re-optimization mechanisms can get very messy, so I’m not going to try to cover all the possibilities – read Tim’s notes for that – but one of the ways in which this type of information can be kept is now visible in a dynamic performance view.


SQL> select hash_value, sql_id, child_number, hint_text from V$sql_reoptimization_hints;

HASH_VALUE SQL_ID        CHILD_NUMBER HINT_TEXT
---------- ------------- ------------ ----------------------------------------------------------------
3680288808 d5k20s7dpth18            0 OPT_ESTIMATE (@"SEL$6E65FD6A" GROUP_BY ROWS=28.000000 )

1 row selected.

SQL> select sql_id, child_number, is_reoptimizable, sql_text from v$sql where sql_id = 'd5k20s7dpth18';

SQL_ID        CHILD_NUMBER I SQL_TEXT
------------- ------------ - --------------------------------------------------
d5k20s7dpth18            0 Y select * from V$OPTIMIZER_PROCESSING_RATE
d5k20s7dpth18            1 N select * from V$OPTIMIZER_PROCESSING_RATE

2 rows selected.

To get the results above I had flushed the shared pool and then (using the SYS schema) executed the indicated query twice. Pulling the execution plans from memory I got (with a little cosmetic cleaning) these results:


SQL> select * from table(dbms_xplan.display_cursor('d5k20s7dpth18',null));

SQL_ID  d5k20s7dpth18, child number 0
-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                              |       |       |     1 (100)|          |
|   1 |  VIEW                | GV$OPTIMIZER_PROCESSING_RATE |    28 |  2072 |     1 (100)| 00:00:01 |
|   2 |   HASH GROUP BY PIVOT|                              |    28 |  1568 |     1 (100)| 00:00:01 |
|*  3 |    FIXED TABLE FULL  | X$OPTIM_CALIB_STATS          |    28 |  1568 |     0   (0)|          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("INST_ID"=USERENV('INSTANCE'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL_ID  d5k20s7dpth18, child number 1
-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                              |       |       |     1 (100)|          |
|   1 |  VIEW                | GV$OPTIMIZER_PROCESSING_RATE |    28 |  2072 |     1 (100)| 00:00:01 |
|   2 |   HASH GROUP BY PIVOT|                              |    28 |  1568 |     1 (100)| 00:00:01 |
|*  3 |    FIXED TABLE FULL  | X$OPTIM_CALIB_STATS          |    28 |  1568 |     0   (0)|          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("INST_ID"=USERENV('INSTANCE'))

Note
-----
   - statistics feedback used for this statement

The Note section for child zero tells us that we used “dynamic statistics” (and the other_xml column in v$sql_plan still calls it dynamic sampling) as part of the statement optimization; but the Note section for child one tells us that we used “statistics feedback” (and the other_xml column in v$sql_plan still calls it cardinality feedback) – in other words we didn’t do dynamic sampling, we re-used the sample (saved as that in-memory opt_estimate hint) from the previous child.

There’s no great significance in the SQL statement I chose to demonstrate this point – I just happened to be doing some investigation into the Optimizer Processing Rates when I paused to read Tim’s notes, so I based this post on one of the statements I had run a few minutes before.

October 19, 2014

Plan depth

Filed under: 12c,Bugs,Execution plans,Oracle,subqueries — Jonathan Lewis @ 6:20 pm BST Oct 19,2014

A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago referencing 11.2.0.3; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the depth calculated by Oracle doesn’t match the level that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:


create table t1
as
select
	rownum 			id,
	lpad(rownum,200)	padding
from	all_objects
where	rownum <= 2500
;

create table t2
as
select	* from t1
;

-- call dbms_stats to gather stats

explain plan for
select
	case mod(id,2)
		when 1 then (select max(t1.id) from t1 where t1.id <= t2.id)
		when 0 then (select max(t1.id) from t1 where t1.id >= t2.id)
	end id
from	t2
;

select * from table(dbms_xplan.display);

It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)

As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendents of line 0 – so they ought to have the same depth. The plan below is what you get if you run the 8i query from utlxpls.sql against the plan_table.


SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         0
         1          0
         2          1
         3          0
         4          3
         5          0

--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     2K|    9K|  28039 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2        |     2K|    9K|     11 |       |       |
--------------------------------------------------------------------------------

So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.

 

Update 20th Oct 2014

A question via twitter – does the error also show up with dbms_xplan.display_cursor(), SQL tuning sets, AWR, etc. or is it just a defect of explain plan. Since the depth is (probably) a derived value for display purposes that Oracle doesn’t use internally for executing the plan I would be inclined to assume that the defect is universal, but I’ve only checked it through explain plan/display, and through execution/display_cursor().

 

 

 

September 21, 2014

Partitioned Clusters

Filed under: 12c,clusters,Infrastructure,Oracle — Jonathan Lewis @ 7:28 pm BST Sep 21,2014

In case you hadn’t noticed it, partitioning has finally reached clusters in 12c – specifically 12.1.0.2. They’re limited to hash clusters with range partitioning, but it may be enough to encourage more people to use the technology. Here’s a simple example of the syntax:


create cluster pt_hash_cluster (
        id              number(8,0),
        d_date          date,
        small_vc        varchar2(8),
        padding         varchar2(100)
)
-- single table
hashkeys 10000
hash is id
size 700
partition by range (d_date) (
        partition p2011Jan values less than (to_date('01-Feb-2011','dd-mon-yyyy')),
        partition p2011Feb values less than (to_date('01-Mar-2011','dd-mon-yyyy')),
        partition p2011Mar values less than (to_date('01-Apr-2011','dd-mon-yyyy'))
)
;

I’ve been waiting for them to appear ever since 11.2.0.1 and the TPC-C benchmark that Oracle did with them – they’ve been a long time coming (check the partition dates – that gives you some idea of when I wrote this example).

Just to add choice (a.k.a. confusion) 12.1.0.2 has also introduce attribute clustering so you can cluster data in single tables without creating clusters – but only while doing direct path loads or table moves. The performance intent is similar, though the technology and circumstances of use are different.

September 4, 2014

Group By Bug

Filed under: 12c,Bugs,dbms_xplan,Execution plans,Oracle — Jonathan Lewis @ 5:11 pm BST Sep 4,2014

This just in from OTN Database Forum – a surprising little bug with “group by elimination” exclusive to 12c.


alter session set nls_date_format='dd-Mon-yyyy hh24:mi:ss';

select
       /* optimizer_features_enable('12.1.0.1')*/
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from (
  select
        ts, max(fieldb) fieldb
  from (
  select trunc(sysdate) - 1/24 ts, 1 fieldb from dual
  union all
  select trunc(sysdate) - 2/24 ts, 2 fieldb from dual
  union all
  select trunc(sysdate) - 3/24 ts, 3 fieldb from dual
  union all
  select trunc(sysdate) - 4/24 ts, 4 fieldb from dual
  union all
  select trunc(sysdate) - 5/24 ts, 5 fieldb from dual
  )
  group by ts
)
group by trunc (ts,'DD')
/

You might expect to get one row as the answer – but this is the result I got, with the execution plan pulled from memory:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00          1
03-Sep-2014 00:00:00          5
03-Sep-2014 00:00:00          4
03-Sep-2014 00:00:00          2
03-Sep-2014 00:00:00          3

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    11 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    11  (10)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    10   (0)| 00:00:01 |
|   3 |    UNION-ALL     |      |       |       |            |          |
|   4 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

You’ll notice that I’ve got an “optimizer_features_enable()” comment in the code: if I change it into a hint I get the following (correct) result and plan:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00         15

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    12 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    12  (17)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    11  (10)| 00:00:01 |
|   3 |    HASH GROUP BY |      |     5 |    60 |    11  (10)| 00:00:01 |
|   4 |     VIEW         |      |     5 |    60 |    10   (0)| 00:00:01 |
|   5 |      UNION-ALL   |      |       |       |            |          |
|   6 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Somehow 12.1.0.2 has managed to get confused by the combination of “group by ts” and “group by trunc(ts,’DD’)” and has performed “group-by elimination” when it shouldn’t have. If you use the ‘outline’ option for dbms_xplan.display_cursor() you’ll find that the bad result reports the hint elim_groupby(@sel$1), which leads to an alternative solution to hinting the optimizer_features level. Start the code like this:


select
       /*+ qb_name(main) no_elim_groupby(@main) */
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from  ...

The (no_)elim_groupby is a hint that appeared in v$sql_hints only in the 12.1.0.2.

August 27, 2014

In-memory Consistency

Filed under: 12c,in-memory,Oracle — Jonathan Lewis @ 7:00 pm BST Aug 27,2014

A comment on one of my early blogs about the 12c in-memory database option asked how Oracle would deal with read-consistency. I came up with a couple of comments outlining the sort of thing I would look for in a solution, and this note is an outline on how I started to tackle the question – with a couple of the subsequent observations. The data is (nearly) the same as the data I generated for my previous article on the in-memory database (and I’m running 12.1.0.2, of course):


create table t1 nologging
as
select  *
from    all_objects
where   rownum <= 50000
;

insert /*+ append */ into t1 select * from t1;
commit;

insert /*+ append */ into t1 select * from t1;
commit;

insert /*+ append */ into t1 select * from t1;
commit;

begin
        dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1');
end;
/

alter table t1
        inmemory priority high memcompress for query low
        inmemory memcompress for query high (object_type)
;

In this case I’ve made the inmemory priority high and I haven’t set any column to “no inmemory” although I have made one column different from the rest (v$_im_column_level doesn’t get populated unless there is some variation across columns). I have to say I couldn’t get very consistent behaviour in terms of when the data finally got into memory with this table creation – possibly something to do with using “alter table” rather than “create table” – but a second “alter table t1 inmemory;” seemed to do the trick if Oracle was playing hard to get.

Once I’d checked that the table was in memory I collected performance figures from v$mystat and v$session_event for the following query:


select
        /* Test Run */
        last_ddl_time
from
        t1
where   t1.created > trunc(sysdate)
and     t1.object_type = 'TABLE'
and     t1.subobject_name is not null
;

Once I was satisfied that the in-memory option was working correctly, I went through the following steps:

  • Session 1: set transaction read only;
  • Session 1: run the query and collect performance figures
  • Session 2: do several small, committed, updates, modifying a total of 30 or 40 random rows
  • Session 2: Flush the buffer cache – so that we can see future block acquisition
  • Session 1: re-run the query and collect performance figures – compare and contrast

The effect of the “set transaction read only;” was to force the session to do some extra work in the second execution of the query to make the data read-consistent back to the start of the “transaction”. The results were as follows (don’t forget that some of the numbers will relate to the action of collecting the performance figures):


First execution
===============
Name                                                                     Value
----                                                                     -----
Requests to/from client                                                      4
opened cursors cumulative                                                    5
user calls                                                                   6
recursive calls                                                              3
session logical reads                                                    6,680
DB time                                                                      1
non-idle wait count                                                          4
consistent gets                                                              3
consistent gets from cache                                                   3
consistent gets pin                                                          3
consistent gets pin (fastpath)                                               3
logical read bytes from cache                                           24,576
calls to kcmgcs                                                              7
calls to get snapshot scn: kcmgss                                            1
table scans (long tables)                                                    1
table scans (IM)                                                             1
IM scan CUs memcompress for query low                                        1
session logical reads - IM                                               6,677
IM scan bytes in-memory                                              5,155,309
IM scan bytes uncompressed                                          45,896,824
IM scan CUs columns theoretical max                                         18
IM scan rows                                                           399,984
IM scan rows optimized                                                 399,984
IM scan CUs split pieces                                                     1
IM scan CUs predicates received                                              3
IM scan CUs predicates applied                                               3
IM scan CUs predicates optimized                                             1
IM scan CUs pruned                                                           1
IM scan segments minmax eligible                                             1
session cursor cache hits                                                    5
workarea executions - optimal                                                1
parse count (total)                                                          4
execute count                                                                5
bytes sent via SQL*Net to client                                         1,150

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
SQL*Net message to client                             9           0           0.00        .000           0
SQL*Net message from client                           9           0           0.44        .049       8,408

Second Execution
================
Name                                                                     Value
----                                                                     -----
Requests to/from client                                                      4
opened cursors cumulative                                                    5
user calls                                                                   6
recursive calls                                                              3
session logical reads                                                    6,728
DB time                                                                      1
non-idle wait count                                                         35
enqueue requests                                                             2
enqueue releases                                                             2
physical read total IO requests                                             29
physical read total multi block requests                                    24
physical read total bytes                                            6,987,776
cell physical IO interconnect bytes                                  6,987,776
consistent gets                                                             92
consistent gets from cache                                                  92
consistent gets pin                                                         44
consistent gets pin (fastpath)                                               5
consistent gets examination                                                 48
logical read bytes from cache                                          753,664
physical reads                                                             853
physical reads cache                                                       853
physical read IO requests                                                   29
physical read bytes                                                  6,987,776
consistent changes                                                          48
free buffer requested                                                      894
CR blocks created                                                           41
physical reads cache prefetch                                              824
physical reads prefetch warmup                                             713
shared hash latch upgrades - no wait                                        43
calls to kcmgcs                                                              7
calls to get snapshot scn: kcmgss                                            1
file io wait time                                                        3,861
data blocks consistent reads - undo records applied                         48
rollbacks only - consistent read gets                                       41
table scans (long tables)                                                    1
table scans (IM)                                                             1
table scan rows gotten                                                   2,803
table scan blocks gotten                                                    41
IM scan CUs memcompress for query low                                        1
session logical reads - IM                                               6,636
IM scan bytes in-memory                                              5,155,309
IM scan bytes uncompressed                                          45,896,824
IM scan CUs columns theoretical max                                         18
IM scan rows                                                           399,984
IM scan rows optimized                                                 399,984
IM scan rows cache                                                          48
IM scan blocks cache                                                        41
IM scan CUs split pieces                                                     1
IM scan CUs predicates received                                              3
IM scan CUs predicates applied                                               3
IM scan CUs predicates optimized                                             1
IM scan CUs pruned                                                           1
IM scan segments minmax eligible                                             1
session cursor cache hits                                                    5
workarea executions - optimal                                                1
parse count (total)                                                          4
execute count                                                                5
bytes sent via SQL*Net to client                                         1,150
bytes received via SQL*Net from client                                   1,772
SQL*Net roundtrips to/from client                                            4

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
Disk file operations I/O                              2           0           0.01        .003           0
db file sequential read                               5           0           0.01        .001           0
db file scattered read                               24           0           0.38        .016           0
SQL*Net message to client                            10           0           0.01        .001           0
SQL*Net message from client                          10           0           0.76        .076       8,408

There’s quite a lot of stats which probably aren’t interesting – and there’s one detail that is important but doesn’t appear (at least not clearly) and that’s the fact that the table in question had about 6,800 blocks below its highwater mark.

So, what do the stats tell us? The most obvious change, of course, is that we had to do some physical reads to get a result set: 24 multiblock reads and 5 single block reads (the latter from the undo tablespace). This is echoed in the session stats as 853 “physical reads cache” from 29 “physical read IO requests”. We can then see the specific read-consistency work (in two ways – with a third close approximation):

consistent changes                                                          48
CR blocks created                                                           41

data blocks consistent reads - undo records applied                         48
rollbacks only - consistent read gets                                       41

IM scan rows cache                                                          48
IM scan blocks cache                                                        41

We applied 48 undo change vectors to fix up 41 blocks to the correct point in time and used them to read 48 rows – the last pair of figures won’t necessarily match the first two pairs, but they do give us a measure of how much data we had to acquire from the cache when trying to do an in-memory scan.

The number 41 actually appears a couple more times: it’s “table scan blocks gotten” (which might seem a little odd since we got far more than 41 blocks by multiblock reads – but we only really wanted 41), and it’s also the change (downwards) in “session logical reads – IM”. Even when Oracle does a pure in-memory query it calculates the number of blocks it would have been reading and reports that number as “session logical reads” and “session logical reads – IM” – so there’s another way to get confused about buffer visits and another statistic to cross-check when you’re trying to work out how to calculate “the buffer hit ratio” ;)

After the first read the scattered reads all seemed to be 32 blocks of “intermittent” tablescan – perhaps this is a measure of the number of blocks that are compressed into a single in-memory chunk (for query low), but perhaps it’s a side effect of the “physical reads prefetch warmup” that Oracle may do when the cache has a lot of empty space. I’ll leave it as an exercise to the reader to refine the test (or think of a different test) to determine whether it’s the former or latter; it’s quite important to find this out because if Oracle is tracking change at the “in-memory chunk” rather than at the block level then a small amount of high-precision change to an in-memory table could result in a relatively large amount of “redundant” I/O as a long-running query tried to stay read-consistent.

August 24, 2014

In-memory Aggregation

Filed under: 12c,in-memory,Oracle,Performance — Jonathan Lewis @ 8:05 pm BST Aug 24,2014

The title of this piece is the name given to a new feature in 12.1.0.2, and since I’ve recently blogged about a limitation of the in-memory option I thought I’d pick this feature as the next obvious thing to blog about. This is a bit of a non sequitur, though, as the feature seems to have nothing whatsoever to do with the in-memory option; instead it’s a cunning mechanism combining aspects of the star-transformation (but without the bitmap indexes), Bloom filters, and “group-by” placement to minimise the cost of aggregation over high-volume joins.

Here’s a small data set I’ll use to demonstrate the feature:

create table towns
as
select
        rownum                                          id,
        trunc(dbms_random.value(1,51))                  id_state,
        rpad(dbms_random.string('U',3),12)              name,
        cast (rpad('x',trunc(dbms_random.value(50,60)),'x') as varchar2(60))    padding
from
        all_objects
where
        rownum <= 2000
;

alter table towns add constraint to_pk primary key(id);
create index to_i1 on towns(name);

create table people(
        id_town_work    number(6,0)     not null
                constraint pe_fk_wo references towns,
        id_town_home    number(6,0)     not null
                constraint pe_fk_ho references towns,
        dummy1          varchar2(10),
        dummy2          varchar2(10),
        padding         varchar2(110)
);

insert /*+ append */  into people
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(1,2001)),
        trunc(dbms_random.value(1,2001)),
        lpad(rownum,10),
        lpad(rownum,10),
        cast (rpad('x',trunc(dbms_random.value(50,60)),'x') as varchar2(60))    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'Towns',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'People',
                method_opt       => 'for all columns size 1'
        );
end;
/

I have a “large” table of people, and people can live in one town and work in another. Towns are in states and I’m interested in a report about people who live in one specific state but work in another (e.g. New Hampshre vs. Massachusetts). There are a couple of “padding” columns to represent the data associated with each town and person that I might want in a report. To keep things simple I haven’t extended the query out to select the name of the state. Here’s the query I might use to get the report I want:

select
        wt.padding,
        ht.padding,
        max(pe.padding)
from
        towns   wt,
        towns   ht,
        people  pe
where
        wt.id_state     = 1
and     pe.id_town_work = wt.id
and     ht.id_state     = 2
and     pe.id_town_home = ht.id
group by
        wt.padding,
        ht.padding
;

You might expect something like the following as the execution plan:

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |    40 |  7600 |   179   (6)| 00:00:01 |
|   1 |  HASH GROUP BY       |        |    40 |  7600 |   179   (6)| 00:00:01 |
|*  2 |   HASH JOIN          |        |    40 |  7600 |   178   (6)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL | TOWNS  |    40 |  2520 |     5   (0)| 00:00:01 |
|*  4 |    HASH JOIN         |        |  2000 |   248K|   173   (6)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| TOWNS  |    40 |  2520 |     5   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| PEOPLE |   100K|  6250K|   165   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PE"."ID_TOWN_HOME"="HT"."ID")
   3 - filter("HT"."ID_STATE"=2)
   4 - access("PE"."ID_TOWN_WORK"="WT"."ID")
   5 - filter("WT"."ID_STATE"=1)

The order of operation (row source generation) is: 3,5,6,4,2,1 – we build a hash table from the towns in state 2; build a hash table from the towns in state 1; scan the people table and probe the state 1 hash table, any row that survives is used to probe the state 2 hash table, and the rows that survive the second probe are aggregated to produce the answer.

When you do this type of thing with very large data sets one of the potential performance threats comes from the volume of data you have to aggregate. As we’ve joined the three tables the row length grows significantly before we finally aggregate (admittedly my data set is small, and the number of rows we’re going to aggregate also appears to be very small according to the predictions). There’s also (in the early stages at least) the potential for passing a very large number of rows from the fact table through the first (and possibly subsequent) hash join, doing a lot of work to eliminate the rows you don’t need.

In 12c the optimizer can choose to minimise both these threat points using “vector transformation”. (The name may also reflect the possibility that the code path will take advantage of vector processing (SIMD) operations if they’re available in the CPU.) Here’s the execution path I got when I added the /*+ vector_transform(@sel$1) */ hint to my query – it’s not sensible for this tiny data set, of course, but the hint is a way of learning what Oracle can do:

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                           |    71 | 15975 |   184   (6)| 00:00:01 |

|   1 |  TEMP TABLE TRANSFORMATION    |                           |       |       |            |          |

|   2 |   LOAD AS SELECT              | SYS_TEMP_0FD9D6661_31399B |       |       |            |          |
|   3 |    VECTOR GROUP BY            |                           |    10 |   670 |     6  (17)| 00:00:01 |
|   4 |     KEY VECTOR CREATE BUFFERED| :KV0000                   |    40 |  2680 |     6  (17)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL        | TOWNS                     |    40 |  2520 |     5   (0)| 00:00:01 |

|   6 |   LOAD AS SELECT              | SYS_TEMP_0FD9D6662_31399B |       |       |            |          |
|   7 |    VECTOR GROUP BY            |                           |    10 |   670 |     6  (17)| 00:00:01 |
|   8 |     KEY VECTOR CREATE BUFFERED| :KV0001                   |    40 |  2680 |     6  (17)| 00:00:01 |
|*  9 |      TABLE ACCESS FULL        | TOWNS                     |    40 |  2520 |     5   (0)| 00:00:01 |

|  10 |   HASH GROUP BY               |                           |    71 | 15975 |   172   (6)| 00:00:01 |
|* 11 |    HASH JOIN                  |                           |    71 | 15975 |   171   (5)| 00:00:01 |
|  12 |     TABLE ACCESS FULL         | SYS_TEMP_0FD9D6662_31399B |    10 |   670 |     2   (0)| 00:00:01 |
|* 13 |     HASH JOIN                 |                           |    71 | 11218 |   169   (5)| 00:00:01 |
|  14 |      TABLE ACCESS FULL        | SYS_TEMP_0FD9D6661_31399B |    10 |   670 |     2   (0)| 00:00:01 |
|  15 |      VIEW                     | VW_VT_C444E4CB            |    71 |  6461 |   167   (5)| 00:00:01 |
|  16 |       HASH GROUP BY           |                           |    71 |  5112 |   167   (5)| 00:00:01 |
|  17 |        KEY VECTOR USE         | :KV0000                   |    71 |  5112 |   167   (5)| 00:00:01 |
|  18 |         KEY VECTOR USE        | :KV0001                   |  2000 |   132K|   167   (5)| 00:00:01 |
|* 19 |          TABLE ACCESS FULL    | PEOPLE                    |   100K|  6250K|   165   (4)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("WT"."ID_STATE"=1)
   9 - filter("HT"."ID_STATE"=2)
  11 - access("ITEM_10"=INTERNAL_FUNCTION("C0") AND "ITEM_11"="C2")
  13 - access("ITEM_8"=INTERNAL_FUNCTION("C0") AND "ITEM_9"="C2")
  19 - filter(SYS_OP_KEY_VECTOR_FILTER("PE"."ID_TOWN_HOME",:KV0001) AND
              SYS_OP_KEY_VECTOR_FILTER("PE"."ID_TOWN_WORK",:KV0000))

There are three critical components to this plan: first, we create a couple of “Key Vectors” from the towns table, then we use those key vectors while scanning the people table and aggregate a minimal data set, finally we join back to the data associated with the key vectors. Reprising my introductory paragraph: the creation and use of the key vectors is similar to the Bloom filter approach; the final join-back is similar to the strategy used in Star Transformations (especially the ones where temp tables appear), and the key vector allows the high-volume fact data to be aggregated as much as possible before adding extra row-length from the dimensions.

In outline Oracle does the following:

  • scan the towns table to extract the id, and padding columns for id_state = 1 / work town – this produced 50 rows with my data set
  • manipulate the result to extract the distinct values of padding, and give each value a unique numeric identifier – this is the information that goes into the temp table (with one extra column) – this produced 10 rows
  • manipulate the result again to produce an in-memory array of (town.id, temp_table.identifier) – this is the key vector, containing 50 elements.

The second temp table and key vector for (id_state = 2 /work town ) will be created in the same way.

As the fact table is scanned Oracle can apply the key vectors very efficiently (we hope) to pick out the people rows that would be involved in the final aggregate and associate with each relevant row the two padding identifiers that belong to that row (this step is a bit like doing 2 hash joins – but presumably much more efficient; Bloom filtering does something very similar). After selecting the minimum number of rows we can aggregate them on the  two padding identifiers (an example of the “aggregate early”/”place group by” principle – aggregate before joining); finally we join back to the two temporary tables to translate the short padding identifiers into the long padding values (just as we do in star transformations with temporary table transformation).

Strangely we aggregate again after the join-back. I don’t think it’s necessary in this case because I’m fairly sure that the join back is on a unique set of columns – but perhaps this is a generic strategy allowing for variations in the mechanism, including such things as cases where the vector transform is only applied to a subset of the dimension tables.

Technically you could almost emulate this strategy in any version of Oracle (and I probably have at various times over the last few years) with the major limitation that the “KEY VECTOR USE” operations at lines 17 and 18 would have to be replaced with hash joins; no doubt, though, the major CPU saving of this approach is the difference between consecutive hash joins and what appears to be (from the execution stats) concurrent vector filtering. At some point – if a client needs the extra performance edge before they get to 12c – I’ll have to see if I can engineer an example in 11g that emulates the whole plan but uses Bloom filtering to approximate the key vector filtering.

 

August 15, 2014

In-memory limitation

Filed under: 12c,in-memory,Oracle — Jonathan Lewis @ 8:51 pm BST Aug 15,2014

I’ve been struggling to find time to have any interaction with the Oracle community for the last couple of months – partly due to workload, partly due to family matters and (okay, I’ll admit it) I really did have a few days’ holiday this month. So making my comeback with a bang – here’s a quick comment about the 12.1.0.2 in-memory feature, and how it didn’t quite live up to my expectation; but it’s also a comment about assumptions, tests, and inventiveness.

One of the 12.1.0.2 manuals tells us that the optimizer can combine the in-memory columnar storage mechanism with the “traditional” row store mechanisms – unfortunately it turned out that this didn’t mean quite what I had hoped; I had expected too much of the first release. Here’s a quick demo of what doesn’t happen, what I wanted to happen, and how I made it happen, starting with a simple definition (note – this is running 12.1.02 and the inmemory_size parameter has been set to enable the feature):


create table t1 nologging
as
select	*
from	all_objects
where	rownum <= 50000
;

alter table t1 inmemory
no inmemory (object_id, object_name)
inmemory memcompress for query low (object_type)
-- all other columns implicitly inmemory default
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

begin
	dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1');
end;
/

rem
rem	Needs select on v$_im_column_level granted
rem

select
	table_name,
	column_name,
	inmemory_compression
from
	v$im_column_level
where	owner = user
and	table_name = 'T1'
order by
	segment_column_id
;

explain plan for
select
	last_ddl_time, created
from
	t1
where	t1.created > trunc(sysdate)
and	t1.object_type = 'TABLE'
and	t1.subobject_name is not null
;

select * from table(dbms_xplan.display);

All I’ve done at this point is create a table with most of its columns in-memory and a couple excluded from the columnar store. This is modelling a table with a very large number of columns where most queries are targeted at a relatively small subset of the data; I don’t want to have to store EVERY column in-memory in order to get the benefit of the feature, so I’m prepared to trade lower memory usage in general against slower performance for some queries. The query against v$im_column_level shows me which columns are in-memory, and how they are stored. The call to explain plan and dbms_xplan then shows that a query involving only columns that are declared in-memory could take advantage of the feature. Here’s the resulting execution plan:

-----------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |    27 |    73   (9)| 00:00:01 |
|*  1 |  TABLE ACCESS INMEMORY FULL| T1   |     1 |    27 |    73   (9)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - inmemory("T1"."SUBOBJECT_NAME" IS NOT NULL AND
              "T1"."OBJECT_TYPE"='TABLE' AND "T1"."CREATED">TRUNC(SYSDATE@!))
       filter("T1"."SUBOBJECT_NAME" IS NOT NULL AND
              "T1"."OBJECT_TYPE"='TABLE' AND "T1"."CREATED">TRUNC(SYSDATE@!))

Note that the table access full includes the inmemory keyword; and the predicate section shows the predicates that have taken advantage of in-memory columns. The question is – what happens if I add the object_id column (which I’ve declared as no inmemory) to the select list.  Here’s the resulting plan:


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    32 |  1818   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    32 |  1818   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."SUBOBJECT_NAME" IS NOT NULL AND
              "T1"."OBJECT_TYPE"='TABLE' AND "T1"."CREATED">TRUNC(SYSDATE@!))

There’s simply no sign of an in-memory strategy – it’s just a normal full tablescan (and I didn’t stop with execution plans, of course, I ran other tests with tracing, snapshots of dynamic performance views etc. to check what was actually happening at run-time).

In principle there’s no reason why Oracle couldn’t use the in-memory columns that appear in the where clause to determine the rowids of the rows that I need to select and then visit the rows by rowid but (at present) the optimizer doesn’t generate a plan to do that. There’s no reason, though, why we couldn’t try to manipulate the SQL to produce exactly that effect:


explain plan for
select
        /*+ no_eliminate_join(t1b) no_eliminate_join(t1a) */
        t1b.object_id, t1b.last_ddl_time, t1b.created
from
        t1 t1a, t1 t1b
where   t1a.created > trunc(sysdate)
and     t1a.object_type = 'TABLE'
and     t1a.subobject_name is not null
and     t1b.rowid = t1a.rowid
;

select * from table(dbms_xplan.display);

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    64 |    74   (9)| 00:00:01 |
|   1 |  NESTED LOOPS               |      |     1 |    64 |    74   (9)| 00:00:01 |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |     1 |    31 |    73   (9)| 00:00:01 |
|   3 |   TABLE ACCESS BY USER ROWID| T1   |     1 |    33 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory("T1A"."SUBOBJECT_NAME" IS NOT NULL AND
              "T1A"."OBJECT_TYPE"='TABLE' AND "T1A"."CREATED">TRUNC(SYSDATE@!))
       filter("T1A"."SUBOBJECT_NAME" IS NOT NULL AND
              "T1A"."OBJECT_TYPE"='TABLE' AND "T1A"."CREATED">TRUNC(SYSDATE@!))

I’ve joined the table to itself by rowid, hinting to stop the optimizer from getting too clever and eliminating the join. In the join I’ve ensured that one reference to the table can be met completely from the in-memory columns, isolating the no inmemory columns to the second reference to the table. It is significant that the in-memory tablescan is vastly lower in cost than the traditional tablescan – and there will be occasions when this difference (combined with the knowledge that the target is a relatively small number of rows) means that this is a very sensible strategy. Note – the hints I’ve used happen to be sufficient to demonstrate method but I’d be much more thorough in a production system (possibly using an SQL baseline to fix the execution plan).

Of course, this method is just another example of the “visit a table twice to improve the efficiency” strategy that I wrote about a long time ago; and it’s this particular variant of the strategy that allows you to think of the in-memory columnar option as an implementation of OLTP bitmap indexes.

August 3, 2014

Analogy – 2

Filed under: 12c,in-memory,Oracle — Jonathan Lewis @ 1:41 pm BST Aug 3,2014

I suggested a little while ago that thinking about the new in-memory columnar store as a variation on the principle of bitmap indexes was quite a good idea. I’ve had a couple of emails since then asking me to expand on the idea because “it’s wrong” – I will follow that one up as soon as I can, but in the meantime here’s another angle for connecting old technology with new technology:

It is a feature of in-memory column storage that the default strategy is to store all columns in memory. But it’s quite likely that you’ve got some tables where a subset of the columns are frequently accessed and other columns are rarely accessed and it might seem a waste of resources to keep all the columns in memory just for the few occasional queries. So the feature allows you to de-select columns with the “no inmemory({list of columns})” option – it’s also possible to use different degrees of compression for different columns, of course, which adds another dimension to design and planning – but that’s a thought for another day.

So where else do you see an example of being selective about where you put columns ?  Index Organized Tables (IOTs) – where you can choose to put popular columns in the index (IOT_TOP) segment, and the rest in the overflow segment, knowing that this can give you good performance for critical queries, but less desirable performance for the less important or less frequent queries. IOTs allow you to specify the (typically short) list of columns you want “in” – it might be quite nice if the same were true for the in-memory option, I can imagine cases where I would want to include a small set of columns and exclude a very large number of them (for reasons that bring me back to the bitmap index analogy).

 

July 27, 2014

Analogy

Filed under: 12c,in-memory,Oracle — Jonathan Lewis @ 8:02 am BST Jul 27,2014

So 12.1.0.2 is out with a number of interesting new features, of which the most noisily touted is the “in-memory columnar storage” feature. As ever the key to making best use of a feature is to have an intuitive grasp of what it gives you, and it’s often the case that a good analogy helps you reach that level of understanding; so here’s the first thought I had about the feature during one of the briefing days run by Maria Colgan.

“In-memory columnar storage gives you bitmap indexes on OLTP systems without the usual disastrous locking side effects.”

Obviously the analogy isn’t perfect … but I think it’s very close:  for each column stored you use a compression technique to pack the values for a large number of rows into a very small space, and for each stored row you can derive the rowid by arithmetic.  In highly concurrent OLTP systems there’s still room for some contention as the session journals are applied to the globally stored compressed columns (but then, private redo introduces some effects of that sort anyway); and the “indexes” have to be created dynamically as tables are initially accessed (but that’s a startup cost, it’s timing can be controlled, and it’s basically limited to a tablescan).

Whatever the technical hand-waving it introduces – thinking of the in-memory thing as enabling real-time bitmaps ought to help you think of ways to make good use of the feature.

 

 

May 15, 2014

Subquery with OR

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 6:23 pm BST May 15,2014

Prompted by a pingback on this post, followed in very short order by a related question (with a most gratifying result) on Oracle-L, I decided to write up a note about another little optimizer enhancement that appeared in 12c. Here’s a query that differs slightly from the query in the original article:


select
	id, modded, mod_15
from
	t1
where
	t1.mod_15 = 1                     -- originally t1.mod_15 > 0
and	(   t1.modded is null             -- originally t1.modded = 0
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

As a general principle, the “OR EXISTS” stops the optimizer from unnesting the subquery, so my original article suggested a workaround that required you to rewrite the query with a UNION ALL, using the lnnvl() function (where possible) as the easy way to eliminate accidental duplication. Take a look at the plans for my new query, though – first in 11.2.0.4, then in 12.1.0.1:


Execution Plan for 11.2.0.4
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    34 |   374 |    50   (0)| 00:00:01 |
|*  1 |  FILTER            |       |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T1    |   667 |  7337 |    50   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| T2_PK |     1 |     3 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."MODDED" IS NULL OR  EXISTS (SELECT 0 FROM "T2" "T2"
              WHERE "T2"."ID"=:B1))
   2 - filter("T1"."MOD_15"=1)
   3 - access("T2"."ID"=:B1)

Execution Plan for 12.1.0.1
------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |    27 |   378 |    50   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI NA|       |    27 |   378 |    50   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T1    |   667 |  7337 |    50   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN  | T2_PK |     1 |     3 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."MOD_15"=1)
   3 - access("T2"."ID"="T1"."MODDED")

As expected, 11.2.0.4 has had to use a filter subquery approach – but 12.1.0.1 has found a different path. For this special “is null” case the optimizer has unnested the subquery and used a “null aware (NA) semi-join”. In this very small example there is no change in the reported cost, and the mechanics of the execution plan will be quite similar at run time – but in real systems there are bound to be cases where the new strategy is more efficient.

Unfortunately …

Bug 18650065 (fixed in 12.2) rears it’s ugly head: WRONG RESULTS ON QUERY WITH SUBQUERY USING OR EXISTS.
I can demonstrate this with the following code:


update t1 set modded = null
where id <= 30;
commit;

select
	id, modded, mod_15
from
	t1
where
	t1.id = 1                     -- previously mod_15 = 1
and	(   t1.modded is null
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

alter table t1 add constraint t1_pk primary key(id);

select
	id, modded, mod_15
from
	t1
where
	t1.id = 1                     -- previously mod_15 = 1
and	(   t1.modded is null
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

And here’s the output from the above script:


30 rows updated.

Commit complete.

        ID     MODDED     MOD_15
---------- ---------- ----------
         1                     1

1 row selected.

Table altered.

no rows selected

I’ve modified a few rows so that the “null-aware” bit of the new transformation matters, but I’ve now got a data set and transformation where I get the wrong results because I’ve defined a primary key (unique would have done) on a critical column in the query. If you check the execution plan you’ll find that the optimizer has switched from a null aware semi-join to a simple nested loop join.

There is a workaround for this problem – disable the relevant feature:

alter session set "_optimizer_null_accepting_semijoin"=false;

For Reference:

Here’s the SQL to generate the data for the above demonstration:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 5000
)
select
	rownum			id,
	mod(rownum,999)		modded,
	mod(rownum,15)		mod_15,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

update t1 set modded = null where modded = 26;

create index t1_i1 on t1(id);
create index t1_i2 on t1(modded);

create table t2
as
select
	2 * rownum		id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	all_Objects
where
	rownum <= 20
;	

alter table t2 add constraint t2_pk primary key(id);

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 5,384 other followers