Oracle Scratchpad

May 23, 2016

Virtual Partitions

Filed under: 12c,CBO,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 1:16 pm BST May 23,2016

Here’s a story of (my) failure prompted by a recent OTN posting.

The OP wants to use composite partitioning based on two different date columns – the table should be partitioned by range on the first date and subpartitioned by month on the second date. Here’s the (slightly modified) table creation script he supplied:


rem
rem     Script: virtual_partition.sql
rem     Dated:  May 2016
rem

CREATE TABLE M_DTX
(
        R_ID    NUMBER(3),
        R_AMT   NUMBER(5),
        DATE1   DATE,
        DATE2   DATE,
        VC GENERATED ALWAYS AS (EXTRACT(MONTH FROM DATE2))
)
PARTITION BY RANGE (DATE1) interval (numtoyminterval(1,'MONTH'))
SUBPARTITION BY LIST (VC)
        SUBPARTITION TEMPLATE (
                SUBPARTITION M1 VALUES (1),
                SUBPARTITION M2 VALUES (2),
                SUBPARTITION M3 VALUES (3),
                SUBPARTITION M4 VALUES (4),
                SUBPARTITION M5 VALUES (5),
                SUBPARTITION M6 VALUES (6),
                SUBPARTITION M7 VALUES (7),
                SUBPARTITION M8 VALUES (8),
                SUBPARTITION M9 VALUES (9),
                SUBPARTITION M10 VALUES (10),
                SUBPARTITION M11 VALUES (11),
                SUBPARTITION M12 VALUES (12)
        )
        (
        PARTITION M_DTX_2015060100 VALUES LESS THAN (TO_DATE('2015-06-01 00:00:01', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
        )
;

There’s nothing particularly exciting about this – until you get to the query requirement – the user wants to query on date1 and date2, and doesn’t know about the virtual month column, e.g. (and, I know that there should be a to_date() or ANSI equivalent here):

SELECT * FROM m_dtx WHERE date1 = trunc(sysdate) AND date2 = '01-Jun-2016';

Now, as a general rule, you don’t expect partition elimination to occur unless the partitioning column appears with a predicate that makes elimination possible, so your first response to this query is that it could eliminate on date1, but can’t possibly eliminate on vc because vc isn’t in the where clause. However, it’s possible that the partitioning code might recognise that the subpartitioning is on a virtual column derived from date2, so perhaps it could generate a new predicate before optimising, for example:

date2 = '01-Jun-2016'  => vc = 6

Unfortunately, your first response is correct – the optimizer doesn’t get this clever, and doesn’t do the sub-partition elimination. Here’s the execution plan from 12.1.0.2 for the sample query, followed by the execution plan when I explicitly add the predicate vc = 6.


SQL_ID  8vk1a05uv16mb, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy')

Plan hash value: 3104206240

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST ALL   |       |     1 |    57 |    15   (7)| 00:00:01 |     1 |    12 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))



SQL_ID  33q012bdhjrpn, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy') and vc
= 6

Plan hash value: 938710559

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |     6 |     6 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))


Note how the predicate vc = 6 doesn’t show up in the predicate section in either case, but the execution plan shows PARTITION LIST ALL at operation 2 when we omit the predicate and PARTITION LIST SINGLE when we include it (with suitable values also appearing for Pstart and Pstop). (The cost, by the way, is the cost of scanning a whole (range) partition whether or not the optimizer expects to restrict that scan to just one sub-partition.)

So the optimizer isn’t quite clever enough (yet). BUT … the optimizer can be very clever with constraints, combining constraints with predicates and applying transitive closure to produce new predicates – so maybe we could get the optimizer to do this if we helped it a little bit. Given the table definition supplied I’m going to assume that the date2 column is supposed to be non-null, so let’s add some truthful constraints/declarations to the table definition:


alter table m_dtx modify date2 not null;
alter table m_dtx modify vc  not null;
alter table m_dtx add constraint md_ck_vc check (vc = extract(month from date2));

Alas, this didn’t make any difference to the execution plan. But it did do something surprising to my attempts to load data into the table:


insert into m_dtx (r_id, r_amt, date1, date2)
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum, 1000),
        rownum,
        trunc(sysdate,'yyyy') + dbms_random.value(0,365),
        trunc(sysdate,'yyyy') + dbms_random.value(0,365)
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-01400: cannot insert NULL into (???)

So the array insert with the virtual column doesn’t like the NOT NULL constraint on the virtual column because vc is, presumably, still null when the constraint is checked (though there’s no problem with single row inserts with the values() clause – I wonder what happens with the PL/SQL “FORALL” clause) – so let’s remove the not null constraint on vc and see what happens.


insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-02290: check constraint (TEST_USER.MD_CK_VC) violated

Unsurprisingly, given the fact that Oracle didn’t like the not null constraint, the critical check constraint also fails. This, by the way, is odd because a check constraint should accept a row when the constraint doesn’t evaluate to FALSE, so (a) vc can’t have been evaluated at this point or the constraint would evaluate to TRUE – which is not FALSE, and (b) vc at this point can no longer be null or the constraint would evaluate to NULL – which is not FALSE: so what “value” has vc got that makes the constraint check return FALSE ?
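
Just to be explicit about the follow-up test I have in mind (a sketch only – I haven’t pursued it for this note): with the constraints back in place you could compare a simple values() insert with a bulk-bound FORALL insert, to see whether the FORALL behaves like the array insert above or like the single-row case:

insert into m_dtx (r_id, r_amt, date1, date2)
values (1, 1, trunc(sysdate), date '2016-06-01');       -- single-row insert: reported to work

declare
        type t_dates is table of date index by pls_integer;
        m_dates t_dates;
begin
        for i in 1..100 loop
                m_dates(i) := trunc(sysdate,'yyyy') + i;
        end loop;

        forall i in 1..m_dates.count    -- does bulk binding behave like the array insert ?
                insert into m_dtx (r_id, r_amt, date1, date2)
                values (mod(i,1000), i, trunc(sysdate), m_dates(i));
end;
/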

Bottom line:

I can see some scope for an optimizer enhancement that tries to find eliminating predicates from virtual columns; and I think there’s a need for ensuring that we can safely add constraints to virtual columns – after all we might want to create an index on a virtual column and sometimes we need a NOT NULL declaration to ensure that an index-only execution path can be found. Unfortunately I have to end this blog without finding an immediate solution for the OP.

Despite this failure, though, there are cases (as I showed a couple of years ago) where the optimizer in 12c can get clever enough to recognize the connection between a queried date column and the virtual partitioning column based on that date column.

May 16, 2016

Cursor_Sharing problem

Filed under: 12c,Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 12:49 pm BST May 16,2016

Here’s a possible bug (though maybe “not a bug”) that came up over the weekend on the OTN database forum. An application generating lots of “literal string” SQL was tested with cursor_sharing set to force. This successfully forced the use of bind variable substitution, but a particular type of simple insert statement started generating very large numbers of child cursors – introducing a lot of mutex waits and library cache contention. Here’s a (substituted) statement that was offered as an example of the problem:


INSERT INTO schema.tableName (column1,columns2,..)
VALUES (:"SYS_B_0",:"SYS_B_1",:"SYS_B_2",:"SYS_B_3",:"SYS_B_4",:"SYS_B_5",:"SYS_B_6",timestamp:"SYS_B_7",SYSTIMESTAMP,:"SYS_B_8")

Note particularly the slightly odd-looking detail: timestamp:"SYS_B_7"; this is how bind variable substitution looks if you’ve used the “compact” ANSI mechanism for handling datetime literals. We were told, in fact, that the tables had only number, varchar2, and date columns – so it looks a little suspicious when we see timestamp values being inserted, but the implied coercion wasn’t the source of the problem. Here’s all it takes to see the problem (tested only on 12.1.0.2):


rem
rem     Script:         ansi_datetime_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem

create table t1 (
        n1      number(8,0),
        v1      varchar2(10),
        d1      date,
        t1      timestamp
);

insert into t1 values(-1,'x',sysdate, systimestamp);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

set serveroutput off
alter session set cursor_sharing = force;

prompt  ============================
prompt  Testing DATE literals
prompt  Expect to see child number 2
prompt  ============================

insert into t1 values(1, 'A', date'2016-01-01', null);
insert into t1 values(1, 'A', date'2016-01-02', null);
insert into t1 values(1, 'A', date'2016-01-03', null);

select * from table(dbms_xplan.display_cursor(null,null,'-note -plan_hash'));

prompt  ============================
prompt  Testing TIMESTAMP literals
prompt  Expect to see child number 2
prompt  ============================

insert into t1 values(1, 'A', null, timestamp'2016-01-01 00:00:00');
insert into t1 values(1, 'A', null, timestamp'2016-01-02 00:00:00');
insert into t1 values(1, 'A', null, timestamp'2016-01-03 00:00:00');

select * from table(dbms_xplan.display_cursor(null,null,'-note -plan_hash'));

prompt  ===============================
prompt  Need privilege to see this view
prompt  Pre-coded for the sql_ids above
prompt  ===============================

break on sql_id skip 1

select
        sql_id, child_number, hash_match_failed
from
        v$sql_shared_cursor
where
        sql_id in ('58udhcm270bhn', 'gssz5cbnt7mgn')
order by
        sql_id, child_number
;

A couple of points – there are several more tests in the script demonstrating things that do NOT cause multiple child cursors to appear. I probably didn’t cover all the options that I could have covered, but I hit a number of common cases to check that it wasn’t simply a case of cursor_sharing being severely broken in 12c. I’ve also allowed a side effect to demonstrate the presence of multiple child cursors rather than explicitly listing the child cursors. If the three statements (of each type) had produced shareable cursors then the child number reported by dbms_xplan.display_cursor() would have been zero in both cases. Here are the actual outputs:


SQL_ID  58udhcm270bhn, child number 2
-------------------------------------
insert into t1 values(:"SYS_B_0", :"SYS_B_1", date:"SYS_B_2", null)

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | T1   |       |
-------------------------------------------------


SQL_ID  gssz5cbnt7mgn, child number 2
-------------------------------------
insert into t1 values(:"SYS_B_0", :"SYS_B_1", null, timestamp:"SYS_B_2")


-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | T1   |       |
-------------------------------------------------


And, having pre-coded the script with the SQL_IDs of the two guilty statements, here’s the output identifying the cause of the failure to share from v$sql_shared_cursor:


SQL_ID        CHILD_NUMBER H
------------- ------------ -
58udhcm270bhn            0 N
                         1 Y
                         2 Y

gssz5cbnt7mgn            0 N
                         1 Y
                         2 Y

There is a bug on MoS relating to timestamp columns and failure to share cursors – it doesn’t really look like the same problem but it could be related in some way: Bug 13407937 : HIGH VERSION COUNT FOR INSERT WITH CURSOR_SHARING=FORCE AND TIMESTAMP COLUMN. It’s described as “not a bug” :(

April 25, 2016

DDL logging

Filed under: 12c,Oracle,Partitioning — Jonathan Lewis @ 1:05 pm BST Apr 25,2016

I was presenting at the UKOUG event in Manchester on Thursday last week (21st April 2016), and one of the sessions I attended was Carl Dudley’s presentation of some New Features in 12c. The one that caught my eye in particular was “DDL Logging” because it’s a feature that has come up fairly frequently in the past on OTN and other Oracle forums.

So today I decided to write a brief note about DDL Logging – and did a quick search of my blog to see if I had mentioned it before: and I found this note that I wrote in January last year but never got around to publishing – DDL Logging is convenient, but doesn’t do the one thing that I really want it to do:

DDL Logging – 12c

One of the little new features that should be most welcome in 12c is the ability to capture all DDL executed against the database. All it takes is a simple command (if you haven’t set the relevant parameter in the parameter file):

alter system set enable_ddl_logging = true;

All subsequent DDL will be logged to two different places (in two formats):

  • $ORACLE_BASE/diag/rdbms/{database}/{instance}/log/ddl/log.xml
  • $ORACLE_BASE/diag/rdbms/{database}/{instance}/log/ddl_{instance}.log

Unfortunately the one thing I really wanted to see doesn’t appear – probably because it doesn’t really count as DDL –  it’s the implicit DDL due to inserting into not-yet-existing partitions of an interval partitioned table.
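
For anyone who wants to see that gap for themselves, here’s a minimal sketch of the sort of test I mean (the table name is invented purely for the demonstration):

alter system set enable_ddl_logging = true;

create table t_interval (
        d1      date,
        n1      number
)
partition by range (d1) interval (numtoyminterval(1,'MONTH'))
(
        partition p0 values less than (date '2016-01-01')
);

-- the create table is explicit DDL and is logged; the following insert makes
-- Oracle create a new partition recursively, but that partition creation
-- doesn't appear in the DDL log
insert into t_interval values (date '2016-03-15', 1);
commit;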

Note: If you’re using a container database with pluggable databases then the DDL for all the pluggable databases goes into the same log file.

Update – Licensing

The following text in the Oracle 12c Database Licensing document has just been brought to my attention:

Licensed Parameters
The init.ora parameter ENABLE_DDL_LOGGING is licensed as part of the Database Lifecycle Management Pack when set to TRUE. When set to TRUE, the database reports schema changes in real time into the database alert log under the message group schema_ddl. The default setting is FALSE.

The licensing document is also linked to from the 12c online html page for the parameter.

The 11g parameter definition makes no mention of licensing, and the 11g “New Features” manual doesn’t mention the feature at all, but the parameter does get a special mention in the 11g licensing document where it is described as being part of the Change Management Pack.

init.ora Parameters
The use of the following init.ora parameter is licensed under Oracle Change Management Pack:
■ ENABLE_DDL_LOGGING: when set to TRUE (default: FALSE)

 

April 1, 2016

Set Operations

Filed under: 12c,CBO,Execution plans,Oracle — Jonathan Lewis @ 2:20 pm BST Apr 1,2016

A recent post on the OTN database forum highlights a couple of important ideas for optimising SQL. They are: (a) is there a logically equivalent way of stating the SQL, and (b) is there a different “natural language” way of posing the problem?

The posting starts with a query, part of an execution plan, and a request to “get rid of the tablescan”. I guessed originally that the query came from an 11g instance, and the OP gave us some code to create the tables and indexes, so I’ve modelled the tables to get the indicated plan (then filled in the original numbers). This is the query, and my cosmetically adjusted version of the plan output that the OP probably got:


SELECT a.hotel_code
  FROM lf_hotel_temp a
WHERE a.service_id = : p_service_id
       AND (NOT EXISTS (SELECT *
          FROM lf_ts_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code)
        or NOT EXISTS (SELECT *
          FROM lf_gta_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code) 
       or  NOT EXISTS (SELECT *
          FROM lf_hb_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code))

-------------------------------------------------------------------------------
| Id  | Operation          | Name                     | Rows  |  Bytes | Cost |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                          | 12613 | 113517 |  135 |
|*  1 |  FILTER            |                          |       |        |      |
|*  2 |   TABLE ACCESS FULL| LF_HOTEL_TEMP            | 88433 | 795897 |  135 |
|*  3 |   INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX  |     1 |      7 |    1 |
|*  4 |   INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX |     1 |      9 |    1 |
|*  5 |   INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX  |     2 |     14 |    3 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
              "B"."HOTEL_CODE"=:B1) OR  NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
              WHERE "B"."HOTEL_CODE"=:B2) OR  NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
              "B" WHERE "B"."HOTEL_CODE"=:B3))
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   3 - access("B"."HOTEL_CODE"=:B1)
   4 - access("B"."HOTEL_CODE"=:B1)
   5 - access("B"."HOTEL_CODE"=:B1)

We were told in the original posting that there’s a primary key on lf_hotel_temp declared on (hotel_code, service_id), and we were given the definitions, sizes, and index declarations of all the tables in a follow-up posting. It turns out that lf_hotel_temp consists of just those two columns and holds 278,000 rows: the optimizer’s estimate for the number of rows identified by a single service_id is over 88,000, and the nature of the query tells us that the optimizer would have to examine every one of those rows to check if it satisfied any of the three subqueries.

So how might Oracle access the rows ?  Given that the only columns used will all be in the primary key index (which implies not null constraints) there are four basic options: tablescan, index fast full scan, index full scan, and index skip scan. Given the most likely data content (i.e. lots of different hotel_codes), we can assume the skip scan would be a very bad idea. We can be sure that an index fast full scan will be lower cost than an index full scan – for anything except tiny indexes. Ultimately the question is really “why a tablescan instead of an index fast full scan?”. As I pointed out, though, the table consists of just those two columns – which means it’s perfectly reasonable for the index to be larger than the table as each entry of the index will consist of the two columns AND a rowid.
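
If you wanted to check the relative sizes for yourself, a quick query like the following sketch would do it (using the primary key index name that appears in the plans further down the page):

select  segment_name, segment_type, blocks
from    user_segments
where   segment_name in ('LF_HOTEL_TEMP', 'LF_HOTEL_TEMP_PK')
order by
        segment_name
;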

The first interesting bit

The question of why the access to lf_hotel_temp was by tablescan rather than some indexed method isn’t really interesting. The interesting bit is how (in principle) we might make the plan more efficient (if it really needs it); and this leads to two key, and general purpose, observations. As Andrew Sayer pointed out on the thread, we have a compound predicate:

    (not exists A OR not exists B OR not exists C)

and this is logically equivalent to

   not (exists A AND exists B AND exists C)

If we rewrite the query to reflect this equivalence could the optimizer find a different, better way of executing it:


select  /*+ dynamic_sampling(0) */
        a.hotel_code
from    lf_hotel_temp a
where
        a.service_id = :p_service_id
and     not(
                exists (
                        select  null
                        from    lf_ts_roomtype_properties ts
                        where   ts.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_gta_roomtype_properties gta
                        where   gta.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_hb_roomtype_properties hb
                        where   hb.hotel_code = a.hotel_code
                )
        )
;

Of course, I didn’t have the original data; so I copied the DDL supplied in the OTN thread and added a little DML to insert a few rows in the tables. The data I used looked like this:


insert into lf_hotel_temp (hotel_code, service_id) values ('A',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('B',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('C',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('D',1);

-- insert into lf_ts_roomtype_properties values ( 'A','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_gta_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_gta_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_hb_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'B','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_hb_roomtype_properties values ( 'D','x','x',0,1,'x');
commit;

It’s possible that with different data volumes you’d get different execution plans, but in 11g the optimizer transformed my query back into the original form – in other words it recognised the equivalence of “not (A and B and C)” and rewrote it as “(not A or not B or not C)” !

However, I also have 12c available, and I had created a script to build a model, so I ran the test on 12c. Both versions of the query produced the following plan:


----------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|   2 |   VIEW                | VW_SQ_1                    |    82 |   984 |     6   (0)| 00:00:01 |
|*  3 |    HASH JOIN SEMI     |                            |    82 |  2952 |     6   (0)| 00:00:01 |
|*  4 |     HASH JOIN         |                            |    82 |  1968 |     4   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES |    82 |   984 |     2   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| LF_HB_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL | LF_TS_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|*  8 |   INDEX FULL SCAN     | LF_HOTEL_TEMP_PK           |   101 |   198K|     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("VW_COL_1"="A"."HOTEL_CODE")
   3 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("TS"."HOTEL_CODE"))
   4 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("GTA"."HOTEL_CODE"))
   8 - access("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))

Ignore the numbers (I hadn’t collected stats, which is why I added the /*+ dynamic_sampling(0) */ hint – with stats in place 12c produced the FILTER plan that 11g had produced): the key feature is that Oracle has managed to transform my three filter subqueries into a single join subquery and then transformed the resulting subquery into an anti-join. It’s a pretty amazing transformation – the optimizer did it automatically in 12c, but if you are aware of the logical equivalence then you may find cases where you can turn “OR”s into “AND”s and help the optimizer to find transformations that it can’t find automatically.

The second interesting bit

If you think about the meaning behind the query (prompted, perhaps, by the logical equivalence described above) you might rephrase the question as “find me the hotel codes that fail to appear in all three related tables” – in English this is ambiguous and open to catastrophic mis-interpretation, so you might have another go and say “find me the hotel codes that appear in every one of the three related tables – those are the hotel codes I don’t want”. This latter expression, of course, is exactly what Oracle is doing by joining the three tables and then doing the “not exists”/anti-join against the result. Obviously you could translate the new English form into SQL by hand, with a three table join in a “not exists” subquery.
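
Just to make that option concrete, the hand-translated version might look something like this – a sketch I haven’t timed or compared with the other variants:

select  /*+ dynamic_sampling(0) */
        a.hotel_code
from    lf_hotel_temp a
where   a.service_id = :p_service_id
and     not exists (
                select  null
                from    lf_ts_roomtype_properties  ts,
                        lf_gta_roomtype_properties gta,
                        lf_hb_roomtype_properties  hb
                where   ts.hotel_code  = a.hotel_code
                and     gta.hotel_code = a.hotel_code
                and     hb.hotel_code  = a.hotel_code
        )
;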

I actually took a different approach (which might, or might not, be efficient – depending on the actual data and indexes).  I translated the new English statement into the following:


select  /*+ dynamic_sampling(0) */
        hotel_code
from    lf_hotel_temp
where   service_id = :p_service_id
minus   (
        select  hotel_code
        from    lf_ts_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_gta_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_hb_roomtype_properties
        where   hotel_code is not null
        )
;

The three way intersection gets me the list of hotels that appear in all three tables; the minus operator takes the list of hotel with the correct service_id and eliminates from it the hotels that appear in the intersection – giving me the result I want.

For my tiny data set, this is the plan I got:

--------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                          |     1 |  2159 |     8  (50)| 00:00:01 |
|   1 |  MINUS                |                          |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT  |                          |     1 |  2015 |     2  (50)| 00:00:01 |
|*  3 |    INDEX FULL SCAN    | LF_HOTEL_TEMP_PK         |     1 |  2015 |     1   (0)| 00:00:01 |
|   4 |   INTERSECTION        |                          |       |       |            |          |
|   5 |    INTERSECTION       |                          |       |       |            |          |
|   6 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  7 |      INDEX FULL SCAN  | LF_TS_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
|   8 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  9 |      INDEX FULL SCAN  | LF_GTA_ROOMTYPE_PROP_IDX |     4 |    48 |     1   (0)| 00:00:01 |
|  10 |    SORT UNIQUE NOSORT |                          |     4 |    48 |     2  (50)| 00:00:01 |
|* 11 |     INDEX FULL SCAN   | LF_HB_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   7 - filter("HOTEL_CODE" IS NOT NULL)
   9 - filter("HOTEL_CODE" IS NOT NULL)
  11 - filter("HOTEL_CODE" IS NOT NULL)

Important note: I am not claiming that this use of set operators will be more efficient than a filter subquery or anti-join/semi-join approach; performance ultimately depends on the volume and patterns in the data combined with the available indexing. In this case you can almost see the classic performance compromise that we often see in Oracle – even in the trade-off between something as simple as choosing between a hash join and a nested loop join – should we operate this query as a tiny number of “bulk” operations, or as a (potentially) large number of tiny, high-precision operations?

If the original query was spending all its time on CPU running lots of subqueries, or doing lots of single block random I/Os because of the random ordering of the subqueries, then perhaps a couple of brute force “db file parallel read” index full scans would be a friendlier use of the available resources, run more quickly, and have less impact on every other user.

 

March 29, 2016

Index Usage

Filed under: 12c,Exadata,HCC,in-memory,Indexing,Oracle,Performance — Jonathan Lewis @ 10:53 am BST Mar 29,2016

There are some questions about Oracle that are like the mythical Hydra – you think you’ve killed it, but for every head you cut off another two grow. The claim that “the optimizer will switch between using an index and doing a tablescan when you access more than X% of the data” re-appeared on the OTN database forum a little while ago – it doesn’t really matter what the specific value of X was – and it’s a statement that needs to be refuted very firmly because it’s more likely to cause problems than it is to help anyone understand what’s going on.

At a very informal level we may have an intuitive feeling that for a “precise” query accessing a “small” amount of data an indexed access path should make sense while for a “big” query accessing a “large” amount of data we might expect to see a tablescan, but any attempt to give a meaning to “small” and “large” that is both general purpose and strictly quantified will be wrong: there are too many variables involved.

Just as a quick demonstration of how badly we can be misled by a simple numeric claim, here’s a quick test I created on a newly installed instance of 11.2.0.4, which I happened to set up with a locally defined tablespace using uniform extents of 1MB, the default 8KB blocksize, and manual (freelist) space management:


rem
rem     Script:   index_usage_pct.sql
rem     Dated:    March 2016
rem     Author:   J P Lewis
rem

drop table t1;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4
)
select
        cast(rownum as number(8,0))                              id,
        cast(trunc(dbms_random.value(0,1e6)) as number(8,0))     n1,
        lpad(rownum,6,'0')              v1,
        rpad('x',10,'x')                small_vc,
        rpad('x',151,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(id);

spool index_usage_pct.lst

select  num_rows, blocks, avg_row_len, round(num_rows/blocks) rows_per_block
from    user_tables
where   table_name = 'T1'
;

set autotrace on explain
select count(v1) from t1 where id between 1 and 245000;
set autotrace off

spool off

I’ve created a table with 1 million rows; the rows are about 180 bytes long (you’ll see the sizes a few lines further down the page), so it’s not an unreasonable model for lots of tables in typical systems – if you want to experiment further you can adjust the rpad() in the padding column; and I’ve created an index on a sequentially (rownum) generated column. My call to autotrace will produce a truthful execution plan for the query supplied – there’s no risk of unexpected type conversion and no problems from bind variable peeking. As you can easily infer, my query will access 245,000 rows in the table of 1,000,000 – nearly a quarter of the table. Would you expect to see Oracle use the index ?

Here’s the output from the script on MY brand new database, instance, and tablespace:


  NUM_ROWS     BLOCKS AVG_ROW_LEN ROWS_PER_BLOCK
---------- ---------- ----------- --------------
   1000000      25642         180             39

1 row selected.


 COUNT(N1)
----------
    245000

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 269862921

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    10 |  6843   (1)| 00:01:23 |
|   1 |  SORT AGGREGATE              |       |     1 |    10 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   245K|  2392K|  6843   (1)| 00:01:23 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |   245K|       |   553   (1)| 00:00:07 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">=1 AND "ID"<=245000)


There are no tricks involved here, no cunning fiddling with data structures or parameters – this is just a simple, straightforward, test.

Of course the result is probably a little counter-intuitive; 24.5% of the data seems a lot for the optimizer to pick an index. There are many reasons for this, the first being that the data is very well clustered relative to the index – the index’s clustering_factor is the smallest it could be for a B-tree indexing every row in this table.

Another important feature, though, is that I haven’t done anything with the system statistics so the optimizer was using various default values which tell it that a multiblock read will be quite small (eight blocks) and a lot slower than a single block read (26 ms vs. 12 ms). One simple change that many people might have made during or shortly after installation (though it shouldn’t really be done in any modern version of Oracle) is to set the db_file_multiblock_read_count parameter to 128 – with just this change the optimizer would assume that a multiblock read really would be 128 blocks, but that it would now take 266 ms. That means the optimizer will assume that the read will be ten times slower than it was, but will read 32 times as much data – a fairly significant relative improvement thanks to which the access path for my initial query will switch to a full tablescan and won’t switch back to an index range scan until I reduce the range from 245,000 to something like 160,000.
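
If you want to see that switch for yourself the change can be made at session level – a sketch of the sort of thing I mean, re-using the test query above (the 160,000 is only an approximation, as I said):

alter session set db_file_multiblock_read_count = 128;

set autotrace on explain
select count(v1) from t1 where id between 1 and 245000;        -- now expected to be a tablescan
select count(v1) from t1 where id between 1 and 160000;        -- roughly where the index range scan reappears
set autotrace off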

I can go further, of course. With a few background queries running to exercise the database I executed the dbms_stats.gather_system_stats() procedure with the ‘start’ and ‘stop’ options to collect some figures about the hardware and expected workload. This gave me the following results,  interpreted from the sys.aux_stats$ table:


MBRC       :126
MREADTIM   :0.902
SREADTIM   :0.386
CPUSPEED   :976

With the optimizer using these figures to compare the relative speed and size of single and multiblock reads I had to reduce my selected range to roughly 51,000 before the optimizer would choose the index range scan.
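
For completeness, the workload statistics exercise I’ve just described boils down to something like the following sketch – you need a suitable privilege (e.g. the gather_system_statistics role) and a representative period of activity between the two calls:

execute dbms_stats.gather_system_stats('START')

rem     ... leave a representative workload running for a while, then:

execute dbms_stats.gather_system_stats('STOP')

select  pname, pval1
from    sys.aux_stats$
where   sname = 'SYSSTATS_MAIN'
order by
        pname
;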

I could go on to demonstrate the effects of the dbms_resource_manager.calibrate_io procedure and the effects of allowing different degrees of parallelism with different system stats, but I think I’ve probably made the point that there’s a lot of variation in the break point between index range scans and tablescans EVEN when you don’t change the data. With this very well ordered (perfect clustering_factor) data I’ve seen the break point vary between 51,000 rows and 245,000 rows (5% and 25%).

And finally …

Let’s just finish with a last (and probably the most important) variation: changing the pattern in the data we want from perfectly clustered to extremely scattered. If you check the query that generated the data you’ll see that we can do this by creating an index on column n1 instead of column id, and changing the where clause in the test query to n1 between 1 and 4500 (which, in my case, returned slightly more than 4,500 rows thanks to a small amount of duplication generated by the call to dbms_random.value()). At slightly under 0.5% of the data (and with my most recent settings for the system statistics) the optimizer chose to use a tablescan.

Remember, there are many factors involved in the optimizer choosing between a tablescan and an index range scan, and one of the most significant factors in the choice is the (apparent) clustering of the data; so, if you haven’t come across it before, you should examine the “table_cached_blocks” option that appeared in 11.2.0.4 for the procedure dbms_stats.set_table_prefs(), as this allows you to give the optimizer a better idea of how well your data really is clustered.
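
If you want to experiment with the option, a minimal sketch along the following lines (the value 16 is just an arbitrary choice for the demonstration) sets the preference, regathers the stats, and lets you see the effect on the clustering_factor:

begin
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'T1',
                pname   => 'TABLE_CACHED_BLOCKS',
                pvalue  => '16'
        );
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1'
        );
end;
/

select  index_name, clustering_factor
from    user_indexes
where   table_name = 'T1'
;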

Addendum (April 2016)

Following on from the demonstration of how changes in parameters, patterns and statistics can make a difference in what we (or the optimizer) might consider a “small” amount of data and whether an indexed access path would be appropriate, it’s worth mentioning that the Exadata technologies of smart scans and hybrid columnar compression and Oracle’s latest technology of the In-Memory Column Store do not change the way you think about indexes – they only change the (unspecifiable) volume at which an index ceases to be the better option to use.

 

February 11, 2016

Quiz Night

Filed under: 12c,Infrastructure,LOBs,Oracle,Statistics — Jonathan Lewis @ 3:28 pm BST Feb 11,2016

I was setting up a few tests on a copy of 12.1.0.2 recently when I made a mistake creating the table – I forgot to put in a couple of CAST() calls in the select list, so I just patched things up with a couple of “modify column” commands. Since I was planning to smash the table in all sorts of ways and it had taken me several minutes to create the data set (10 million rows) I decided to create a clean copy of the data so that I could just drop the original table and copy back the clean version – and after I’d done this I noticed something a little odd.

Here’s the code (cut down to just 10,000 rows), with a little output:

--
--      Script:  12c_vc32767_b.sql
--

create table t1 (
        id      not null,
        date_open,
        date_closed,
        deal_type,
        client_ref,
        small_vc,
        padding
)
nologging
as
select
        rownum                                          id,
        trunc(
                add_months(sysdate, -120) +
                        (rownum)* 3652 / 1e7
        )                                               date_open,
        trunc(
                add_months(
                        add_months(sysdate, -120) +
                                (rownum) * 3652 / 1e7,
                        12 * trunc(dbms_random.value(1,6))
                )
        )                                               date_closed,
        dbms_random.string('U',1)                       deal_type,
        dbms_random.string('U',4)                       client_ref,
        lpad(rownum,10)                                 vc_small,
        rpad('x',100,'x')                               vc_padding
from
        dual
connect by
        level <= 10000
;
alter table t1 modify deal_type varchar2(1);
alter table t1 modify client_ref varchar2(4);

create table t2 nologging as select * from t1;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt=>'for all columns size 1'
        );
end;
/

select table_name, num_rows, blocks, avg_row_len from user_tables;

TABLE_NAME             NUM_ROWS     BLOCKS AVG_ROW_LEN
-------------------- ---------- ---------- -----------
T1                        10000        304         139
T2                        10000        218         139

There’s no hidden code – this is the code I ran, and the tables both went into the same tablespace – yet one table has used about 30% more blocks than the other on a simple “create as select”, even though the row lengths and row counts are the same!

When I first did this test it just so happened that the only place in the database I had to make the t2 copy was in a tablespace using freelist management, and I’d created the original table in a tablespace using ASSM, so I wasted a little time examining the ASSM space management (bitmap) blocks before I realised what had happened.

Suggestions about what has happened, and how to confirm your hypothesis, invited in the comments.

Update (following morning)

The explanation is in the comments: I had enabled extended (32,767 byte) varchar2().  My calls to dbms_random.string() had returned character values, and the return type defaults to the maximum character length, which meant they were declared as varchar2(32767) and those are implemented as (unhackable) CLOBs in 12.1.0.2.

When I modified the table to change the varchar2(32767) to shorter lengths Oracle let me get away with it because none of the actual stored values at that time was longer than my target declaration – but it didn’t physically change the stored column values to “short” varchar2(), it left them as CLOBs, and a “short” CLOB includes a LOB Locator which starts out at about 30 bytes.

When I created t2 from t1 Oracle first copied the (short) column definitions, and then copied the data, reading and converting the CLOBs to normal varchar2() storage, and that explains why t2 needed fewer blocks.  The avg_row_len for the two tables matched because the code to gather stats simply applies the sys_op_opnsize() function to the varchar2() defined in t1, and doesn’t give you any indication about the LOB locator.

(If you use the extremely deprecated analyze command to gather stats on the two tables you’ll find that the avg_row_len of table t1 then allows for the LOB locator size.)
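
If you want to confirm the hypothesis for yourself, one option is a symbolic block dump of the kind shown in the footnote below. Roughly the following mechanism – a sketch only, and the dump ends up in your session’s trace file:

column rel_file_no new_value m_file_no
column block_no    new_value m_block_no

select
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no
from
        t1
where
        rownum = 1
;

alter system dump datafile &m_file_no block &m_block_no;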

“Workaround:”

I should have used: cast(dbms_random.string('U',4) as varchar2(4)) and the column definition would have been a proper varchar2(4) from the start. (As noted in a comment from Ivica Arsov below, substr(…, 1, 4) would also have worked.) Of course I should have used cast() for the numerics as well so that I could declare them as things like number(8,0) rather than the slightly dangerous “number”.

Note:  I’ve put “workaround” in quotes because it’s not really a workaround to a problem – it’s just an example of doing it right.
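
For reference, the corrected form of the relevant expressions would look something like this (a fragment only, not the whole script):

select
        cast(rownum as number(8,0))                     id,
        cast(dbms_random.string('U',1) as varchar2(1))  deal_type,
        cast(dbms_random.string('U',4) as varchar2(4))  client_ref
from
        dual
connect by
        level <= 10
;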

Footnote:

Here’s a symbolic dump of a single character (‘T’) being stored as a varchar2(32767):


LOB
Locator:
  Length:        84(31)
  Version:        1
  Byte Length:    1
  LobID: 00.00.00.01.00.00.01.6b.ba.d7
  Flags[ 0x01 0x0c 0x00 0x80 ]:
    Type: BLOB
    Storage: SecureFile
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: ReadWrite
  SecureFile Header:
    Length:   11
    Old Flag: 0x48 [ DataInRow SecureFile ]
    Flag 0:   0x90 [ INODE Valid ]
    Layers:
      Lengths Array: INODE:5
      INODE:
        00 00 01 01 54

Note the 84(31) at line 3: this is a reminder of how big a LOB locator could get for an out of line LOB, compared to the current size of the LOB locator. Rows that hold out of line LOBs can actually be much longer than avg_row_len tells you.

Addendum

Another side effect of enabling extended varchar2() types is the surprise you get when you use error logging in SQL (i.e. syntax like: “insert into t1 select … log errors”). When you create the “clone” table for the log it changes every column from the original into a varchar2(32767) e.g.:

desc target
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ID                                                                                NUMBER
 N1                                                                                NUMBER
 N2                                                                                NUMBER
 SMALL_VC                                                                          VARCHAR2(10)
 PADDING                                                                           VARCHAR2(100)


begin
        dbms_errlog.create_error_log('TARGET');
end;
/

desc err$_TARGET
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ORA_ERR_NUMBER$                                                                   NUMBER
 ORA_ERR_MESG$                                                                     VARCHAR2(2000)
 ORA_ERR_ROWID$                                                                    ROWID
 ORA_ERR_OPTYP$                                                                    VARCHAR2(2)
 ORA_ERR_TAG$                                                                      VARCHAR2(2000)
 ID                                                                                VARCHAR2(32767)
 N1                                                                                VARCHAR2(32767)
 N2                                                                                VARCHAR2(32767)
 SMALL_VC                                                                          VARCHAR2(32767)
 PADDING                                                                           VARCHAR2(32767)

You probably hope that you’re not going to see many rows inserted into the error log table, so the extra space taken up in the table by error rows probably won’t be a problem – but the fact that you get two segments (the LOB and the LOBINDEX) created for every column in the original table might cause a problem – even if they are completely empty.
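
If you want to see those extra objects for yourself, the LOB metadata ought to show up in the usual dictionary views – something like this sketch (I haven’t reproduced the output here):

select  column_name, segment_name, index_name
from    user_lobs
where   table_name = 'ERR$_TARGET'
order by
        column_name
;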

January 26, 2016

Trace file size

Filed under: 12c,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 8:30 am BST Jan 26,2016

Here’s a convenient enhancement for tracing that came up on Twitter a few days ago – first in a tweet that I retweeted, then in a question from Christian Antognini based on this bit of the 12c Oracle documentation (opens in separate tab). The question was – does it work for you ?

The new description for max_dump_file_size says that for large enough values Oracle will split the file into multiple chunks of a few megabytes, using a suffix to identify the sequence of the chunks, keeping only the first chunk and the most recent chunks. Unfortunately this doesn’t seem to be true. However, prompted by Chris’ question I ran a quick query against the full parameter list looking for parameters with the word “trace” in their name:


select
        /*+
                leading(nam val val2)
                full(name)
                full(val)  use_hash(val)  no_swap_join_inputs(val)
                full(val2) use_hash(val2) no_swap_join_inputs(val2)
        */
        nam.ksppinm                             name,
        val.ksppstvl                            ses_val,
        val2.ksppstvl                           sys_val,
        nam.ksppdesc                            description,
        nam.indx+1                              numb,
        nam.ksppity                             type,
        val.ksppstdf                            is_def,
        decode(bitand(nam.ksppiflg/256,1),
                1,'True',
                  'False'
        )                                       ses_mod,
        decode(bitand(nam.ksppiflg/65536,3),
                1,'Immediate',
                2,'Deferred' ,
                3,'Immediate',
                  'False'
        )                                       sys_mod,
        decode(bitand(val.ksppstvf,7),
                1,'Modified',
                4,'System Modified',
                  'False'
        )                                       is_mod,
        decode(bitand(val.ksppstvf,2),
                2,'True',
                  'False'
        )                                       is_adj,
        val.ksppstcmnt                          notes
from
        x$ksppi         nam,
        x$ksppcv        val,
        x$ksppsv        val2
where
        nam.indx = val.indx
and     val2.indx = val.indx
and     ksppinm like '%&m_search.%'
order by
        nam.ksppinm
;

Glancing through the result I spotted a couple of interesting parameters with the letters “uts” in their names, so I re-ran my query looking for all the “uts” parameters, getting the following (edited) list:


NAME                           SYS_VAL         DESCRIPTION    
------------------------------ --------------- ---------------------------------------------
_diag_uts_control              0               UTS control parameter
_uts_first_segment_retain      TRUE            Should we retain the first trace segment
_uts_first_segment_size        0               Maximum size (in bytes) of first segments 
_uts_trace_disk_threshold      0               Trace disk threshold parameter
_uts_trace_segment_size        0               Maximum size (in bytes) of a trace segment
_uts_trace_segments            5               Maximum number of trace segments 

Note particularly the “first segment size” and “trace segment size” – defaulting to zero (which often means a hidden internal setting, though that doesn’t seem to be the case here, but maybe that’s what the “diag control” is for). I haven’t investigated all the effects, but after a little experimentation I found that all I needed to do to get the behaviour attributed to max_dump_file_size was to set the following two parameters – which I could do at the session level.


alter session set "_uts_first_segment_size" = 5242880;
alter session set "_uts_trace_segment_size" = 5242880;

The minimum value for these parameters is the one I’ve shown above (5120 KB) and with the default value for _uts_trace_segments you will get a maximum of 5 trace files with sequential names like the following:

ls -ltr *4901*.trc

-rw-r----- 1 oracle oinstall 5243099 Jan 26 08:15 orcl_ora_4901_1.trc
-rw-r----- 1 oracle oinstall 5243064 Jan 26 08:15 orcl_ora_4901_12.trc
-rw-r----- 1 oracle oinstall 5243058 Jan 26 08:15 orcl_ora_4901_13.trc
-rw-r----- 1 oracle oinstall 5242993 Jan 26 08:15 orcl_ora_4901_14.trc
-rw-r----- 1 oracle oinstall 1363680 Jan 26 08:15 orcl_ora_4901.trc

As you can see I’m currently generating my 15th trace, and Oracle has kept the first one and the previous three. It’s always working on a file with no suffix to its name but as soon as that file hits its limiting size (plus or minus a few bytes) it gets its appropriate suffix, the oldest file is deleted, and a new trace file without a suffix is started.

Apart from the usual header information the trace files start and end with lines like:

*** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_11.trc ***
  
*** TRACE SEGMENT RENAMED TO /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_12.trc ***

There is one little trap to watch out for: if you set either of these parameters to be larger than max_dump_file_size, tracing stops as soon as one of the segments hits the max_dump_file_size and that trace file ends with the usual “overflow” message – e.g. when I changed the max_dump_file_size to 4M in mid-session:

*** DUMP FILE SIZE IS LIMITED TO 4194304 BYTES ***

In my case I had started with max_dump_file_size set to 20M, so I got lucky with my choice of 5M as the segment size.

Further investigation is left as an exercise to the interested reader.

 

December 21, 2015

Parallel Plans

Filed under: 12c,Oracle,Parallel Execution — Jonathan Lewis @ 8:40 am BST Dec 21,2015

This article was prompted by a pair of articles by Yasin Baskan of Oracle Corporation: PX Server Sets etc. and Multiple Parallelizers, plus a little extra prompting from a mistake that I made when reading the second of those two articles. The fact that I made a mistake is significant because, without it, I wouldn’t have created a model to check Yasin’s description of the parallel activity.

I want to examine the following query to find out the order of activity:


select
        distinct t1.n1
from
        t1
where 
        t1.n1 >= 500
and     t1.n2 > (select avg(t2.n2) from t2)
and     t1.n3 > (select avg(t3.n3) from t3)
;

I’m going to be using 12.1.0.2 for this demonstration (lots of changes appear in parallel execution as you move from 11g to 12c, so the version is most significant), with all default settings for the parallel-related parameters.

To get things started, here’s the code to generate the data that I used, and the serial execution plan:


create table t1 nologging as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 2e4
;

create table t2 nologging as
with generator as ( 
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 2e4
;

create table t3 nologging as
with generator as ( 
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 2e4 ;

-- don't need to gather stats as they are generated on CTAS in 12c.
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   150 (100)|          |
|   1 |  HASH UNIQUE         |      |    24 |   336 |   150   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T1   |    25 |   350 |    51   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 20000 |    97K|    49   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 20000 |    97K|    49   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">=500 AND "T1"."N2"> AND "T1"."N3">))

I’ve pulled the execution plan from memory using dbms_xplan.display_cursor(), which is why the predicate section shows the odd comparisons of “t1.n2 > {nothing there}” and “t1.n3 > {nothing there}”. The predicate section produced by a simple call to explain plan would help us to fill in the details:


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N1">=500 AND "T1"."N2"> (SELECT AVG("T2"."N2") FROM
              "T2" "T2") AND "T1"."N3"> (SELECT AVG("T3"."N3") FROM "T3" "T3"))

We can infer from the predicate section that Oracle executes the t2 subquery first and the t3 subquery second; but there is a slight oddity in the shape of the execution plan if we apply the “first child first / recursive descent” approach to reading the plan. It looks as if operations 3 and 5 are children of operation 2 and, by that rule, ought to run before operation 2. This is the consequence of “subquery pushing”. In fact the subqueries are both filter subqueries and, in principle (and ignoring scalar subquery caching for the moment), they will be run once for each row produced by the tablescan at operation 2.

The activity is as follows:

  • start scanning table t1
  • when the first row appears where n1 >= 500 run the subquery against t2 to derive the average of t2.n2 (10000.5) and check to see if the n2 value of the current t1 row exceeds that value (it doesn’t);
  • continue scanning t1 checking for rows where n1 >= 500 and n2 > 10000.5 (no need to rerun the subquery thanks to scalar subquery caching),
  • on finding the first row matching both predicates run the subquery against t3 to derive the average of t3.n3 (10000.5) and check to see if the n3 value of the current t1 row exceeds that value.
  • continue the scan of t1 checking for rows where n1 >= 500 and n2 > 10000.5 and n3 > 10000.5 (again scalar subquery caching means no further executions of either subquery).

We can add weight to the argument that this is the order of activity by flushing the buffer cache and enabling the extended trace before running the query. Here’s a section of a trace file that demonstrates the point – with no edits beyond the insertion of a few blank lines:


PARSE #140345652175312:c=1000,e=1813,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=115251305,tim=134216063372
EXEC #140345652175312:c=0,e=62,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=115251305,tim=134216063544
WAIT #140345652175312: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=134216063569
WAIT #140345652175312: nam='db file sequential read' ela= 9 file#=5 block#=131 blocks=1 obj#=133974 tim=134216063627
WAIT #140345652175312: nam='db file scattered read' ela= 315 file#=5 block#=132 blocks=112 obj#=133974 tim=134216064187

WAIT #140345652175312: nam='db file sequential read' ela= 9 file#=5 block#=515 blocks=1 obj#=133975 tim=134216064438
WAIT #140345652175312: nam='db file scattered read' ela= 115 file#=5 block#=516 blocks=112 obj#=133975 tim=134216064732
WAIT #140345652175312: nam='db file scattered read' ela= 18 file#=5 block#=628 blocks=12 obj#=133975 tim=134216066046
WAIT #140345652175312: nam='db file scattered read' ela= 123 file#=5 block#=642 blocks=112 obj#=133975 tim=134216066548
WAIT #140345652175312: nam='db file scattered read' ela= 20 file#=5 block#=754 blocks=14 obj#=133975 tim=134216067760
WAIT #140345652175312: nam='db file scattered read' ela= 124 file#=5 block#=770 blocks=112 obj#=133975 tim=134216068153
WAIT #140345652175312: nam='db file scattered read' ela= 8 file#=5 block#=882 blocks=2 obj#=133975 tim=134216069677

WAIT #140345652175312: nam='db file scattered read' ela= 18 file#=5 block#=244 blocks=12 obj#=133974 tim=134216070049
WAIT #140345652175312: nam='db file scattered read' ela= 127 file#=5 block#=258 blocks=112 obj#=133974 tim=134216070382

WAIT #140345652175312: nam='db file sequential read' ela= 6 file#=5 block#=899 blocks=1 obj#=133976 tim=134216070812
WAIT #140345652175312: nam='db file scattered read' ela= 41 file#=5 block#=900 blocks=112 obj#=133976 tim=134216071028
WAIT #140345652175312: nam='db file scattered read' ela= 42 file#=5 block#=1012 blocks=12 obj#=133976 tim=134216072956
WAIT #140345652175312: nam='db file scattered read' ela= 215 file#=5 block#=1026 blocks=112 obj#=133976 tim=134216073642
WAIT #140345652175312: nam='db file scattered read' ela= 29 file#=5 block#=1138 blocks=14 obj#=133976 tim=134216076334
WAIT #140345652175312: nam='db file scattered read' ela= 154 file#=5 block#=1154 blocks=112 obj#=133976 tim=134216076825
WAIT #140345652175312: nam='db file scattered read' ela= 14 file#=5 block#=1266 blocks=2 obj#=133976 tim=134216081673

WAIT #140345652175312: nam='db file scattered read' ela= 24 file#=5 block#=370 blocks=14 obj#=133974 tim=134216082815
WAIT #140345652175312: nam='db file scattered read' ela= 144 file#=5 block#=386 blocks=112 obj#=133974 tim=134216083232
WAIT #140345652175312: nam='db file scattered read' ela= 20 file#=5 block#=498 blocks=12 obj#=133974 tim=134216084494

FETCH #140345652175312:c=16998,e=21096,p=1105,cr=1111,cu=0,mis=0,r=1,dep=0,og=1,plh=115251305,tim=134216084683

Object 133974 is table t1, object 133975 is table t2, and object 133976 is table t3.

As you can see, we start by scanning t1, then we scan t2 once, then we scan more of t1, then we scan t3 once, then we finish off scanning t1.

We could chase the detail a little further, of course – we could check the session statistics to make sure that we haven’t been using the buffer cache to scan t2 and t3 repeatedly, or we could enable rowsource execution statistics to report the number of times each operation started, or we could check the STAT# lines dumped a little later in the trace files and infer that the t2 and t3 tablescans happened just once each.
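
For example – and this is just a sketch of the standard rowsource-statistics mechanism rather than something lifted from the original test – you could re-run the query with execution statistics enabled and check the Starts column of the resulting report to confirm that the t2 and t3 tablescans each started exactly once:

select
        /*+ gather_plan_statistics */
        distinct t1.n1
from
        t1
where
        t1.n1 >= 500
and     t1.n2 > (select avg(t2.n2) from t2)
and     t1.n3 > (select avg(t3.n3) from t3)
;

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));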

At this point I’ll mention my mistake: when I originally read Yasin’s version of the parallel equivalent of this plan my first thought was that the subquery sections of the plan operate from the bottom upwards (i.e. another one of those examples where “first child first” doesn’t apply) whereas Yasin described them as running top down. I was wrong – I was thinking of the case where the optimizer analyses the query from the bottom up, but it still produces a plan that then runs from the top down – but I didn’t realize my mistake until after I’d been poking around with the parallel version of the query and had made a surprising discovery.

Parallel Execution

So what happens when we manipulate this query into parallelism ? To make it a little easier to see some of the details I’ve added hints to each query block to use a different degree of parallelism; and then, because it all happens so fast, I’ve introduced a function call to slow down the progress of the query. Here’s the function I’ve created (you’ll find it elsewhere on the blog) and the final query I started testing with:


create or replace function wait_row (
        i_secs  number,
        i_return        number
) return number
parallel_enable
is
begin
        dbms_lock.sleep(i_secs);
        return i_return;
end;
/


select
        /*+ parallel(t1,5) tracking */ 
        distinct t1.n1
from
        t1
where   
        wait_row(0.01,t1.n2) > (select /*+ parallel(t2,3) */ avg(wait_row(0.01,t2.n2)) from t2)
and     wait_row(0.01,t1.n3) > (select /*+ parallel(t3,2) */ avg(wait_row(0.01,t3.n3)) from t3)
and     t1.n1 >= 1000
--      and     wait_row(0.01, t1.n1) >= 500
;

The wait_row() function simply waits for the number of seconds given in the first parameter then returns the (numeric) value of the second parameter. It calls the dbms_lock.sleep() function which has a granularity of 1/100th of a second, and since I’m planning to use it in a parallel query I’ve included the parallel_enable declaration.

Since the function has not been declared deterministic, or assigned to the pl/sql result cache, and since there are 20,000 rows in the t2 table the time to run the t2 subquery will be roughly 20,000 * 0.01 seconds, which (ignoring parallelism) would be about 200 seconds. I’ve actually hinted DOP = 3 for that subquery so I hope it will complete in about 70 seconds, and the subquery against t3 with hinted DOP = 2 should complete in about 100 seconds.

You’ll notice I have two possible predicates against t1 – the one shown uncommented here doesn’t call the wait_row() function, the other does. Here’s an interesting point, though, about the example that doesn’t call wait_row(): the simple predicate is going to eliminate ALL the data from t1 – there are no rows where n1 >= 1000 – and that’s an important point to remember.

Here’s the execution plan:


------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |       |       |    57 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)          | :TQ30001 |     1 |    14 |    57   (4)| 00:00:01 |  Q3,01 | P->S | QC (RAND)  |
|   3 |    HASH UNIQUE                 |          |     1 |    14 |    57   (4)| 00:00:01 |  Q3,01 | PCWP |            |
|   4 |     PX RECEIVE                 |          |     1 |    14 |    57   (4)| 00:00:01 |  Q3,01 | PCWP |            |
|   5 |      PX SEND HASH              | :TQ30000 |     1 |    14 |    57   (4)| 00:00:01 |  Q3,00 | P->P | HASH       |
|   6 |       HASH UNIQUE              |          |     1 |    14 |    57   (4)| 00:00:01 |  Q3,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR       |          |     1 |    14 |    11   (0)| 00:00:01 |  Q3,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL      | T1       |     1 |    14 |    11   (0)| 00:00:01 |  Q3,00 | PCWP |            |
|   9 |          SORT AGGREGATE        |          |     1 |     5 |            |          |  Q3,00 | PCWP |            |
|  10 |           PX COORDINATOR       |          |       |       |            |          |        |      |            |
|  11 |            PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|  12 |             SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|  13 |              PX BLOCK ITERATOR |          | 20000 |    97K|    18   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|* 14 |               TABLE ACCESS FULL| T2       | 20000 |    97K|    18   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|  15 |          SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|  16 |           PX COORDINATOR       |          |       |       |            |          |        |      |            |
|  17 |            PX SEND QC (RANDOM) | :TQ20000 |     1 |     5 |            |          |  Q2,00 | P->S | QC (RAND)  |
|  18 |             SORT AGGREGATE     |          |     1 |     5 |            |          |  Q2,00 | PCWP |            |
|  19 |              PX BLOCK ITERATOR |          | 20000 |    97K|    27   (0)| 00:00:01 |  Q2,00 | PCWC |            |
|* 20 |               TABLE ACCESS FULL| T3       | 20000 |    97K|    27   (0)| 00:00:01 |  Q2,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."N1">=1000 AND "WAIT_ROW"(.01,"T1"."N2")> AND "WAIT_ROW"(.01,"T1"."N3")>))
  14 - access(:Z>=:Z AND :Z<=:Z)
  20 - access(:Z>=:Z AND :Z<=:Z)

The odd “:Z>=:Z AND :Z <=:Z” predicate is an artefact of rowid-range scans in parallel tablescans, and we’ve seen the “disappearing subquery” problem with predicates pulled from memory so that’s ignorable. The interesting point is that the “T1.N1 >= 1000” predicate will eliminate all the data from t1 very quickly – so why did it take roughly 3 minutes and 30 seconds to run the query ?

Quick Answer – Oracle ran the two subqueries before it started the t1 tablescan !

Slow Answer – I need to start by filling in a lot of background information.

Notice how we have multiple DFO Trees in this execution plan. In the TQ column these are captured as (Q1,xx) (Q2,xx) and (Q3,xx).  In previous blogs I’ve explained how you can follow the order of activity by following the table queues (effectively the “xx” – also known as virtual tables, also known as Data Flow Operations) in order, but I haven’t explained how you deal with multiple DFO trees. It would be nice to think that DFO Trees implicitly told you the order of operation – unfortunately that’s not true (at least in current versions of Oracle), and this query demonstrates the point.

Table Queues, DFOs (data flow operations), and DFO Trees

The first of Yasin Baskan’s two articles describes the terminology to use when describing the movement of data through a parallel execution plan, but for the sake of familiarity I’ll walk through the plan above using the appropriate terms.  (Note: once upon a time the term “DFO tree” didn’t exist and the term DFO was used to describe what is now called a DFO tree, and a number of articles I’ve written in the past echo this out-dated usage).

Lines 1 – 8 show a DFO tree consisting of two DFOs – the tree is labelled as Q3. The first DFO in the tree (DFO 0) is described in lines 5 – 8 where one set of parallel execution slaves scans real table t1 and sends a data set to the other set of parallel execution slaves by writing into a virtual table (:TQ30000); the second DFO in the tree (DFO 1) is described in lines 1 – 4 where a set of parallel execution slaves reads from virtual table :TQ30000, does some work and sends a data set to the query co-ordinator by writing to virtual table :TQ30001.

Lines 9 – 14 show a second DFO tree consisting of a single DFO – this tree is labelled Q1;

Lines 15 – 20 show us a third DFO tree, labelled Q2, also consisting of a single DFO.

Essentially (as described by Yasin Baskan) the focal point of a single DFO (data flow operation) is a PX SEND, and the top of a DFO tree is a PX Coordinator.

The key to performance problems and interpreting tricky parallel execution plans is the fact that in the extreme case every DFO tree may operate at a different degree of parallelism, and all the DFO trees may be operating concurrently – an unlucky execution plan may acquire far more parallel execution slaves than you expect.

So what’s the order of execution in this query, how do we find out, how many parallel query slaves will we start (and stop) as the query runs, and what’s the maximum number of parallel query slaves that we will be holding concurrently ? There are lots of places you could look:

  • v$pq_tqstat might give you some idea of processes used and data moved – after you’ve run the query, but it’s a bit broken for multiple DFO trees.
  • v$px_sesstat will give you some information about the work done by currently live slave processes if you can query it while your query is still running (“set pause on” may help).
  • v$px_session joined to v$session_event will give you some information about the time lost to waits for currently live slave processes if you can query them while your query is still running (it would be nice if Oracle were to introduce a “v$px_sesevent”) – see the sketch after this list.
  • The SQL Monitor screen in OEM (or the dbms_sqltune.report_sql_monitor() procedure) is fantastic (but a little broken) if you’ve paid the licences for the diagnostic and performance packs.
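
Here, purely as a hedged sketch rather than a definitive script (you would have to supply the coordinator’s SID as qc_sid), is the sort of query against v$px_session and v$session_event that I have in mind – run from a second session while the parallel query is still active:

select
        pxs.server_group, pxs.server_set, pxs.server#,
        sev.event, sev.total_waits, sev.time_waited
from
        v$px_session            pxs,
        v$session_event         sev
where
        pxs.qcsid = &qc_sid
and     sev.sid   = pxs.sid
order by
        pxs.server_group, pxs.server_set, pxs.server#, sev.time_waited desc
;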

Here’s part of the text output from a call to dbms_sqltune.report_sql_monitor() for a slightly different query – one that doesn’t include any predicate at all on the t1 table:


SQL Plan Monitoring Details (Plan Hash Value=3828285674)
==================================================================================================================================================
| Id |            Operation             |   Name   |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                                  |          | (Estim) |      | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT                 |          |         |      |         1 |   +256 |    11 |     1000 |       |          |                 |
|  1 |   PX COORDINATOR                 |          |         |      |         1 |   +256 |    11 |     1000 |       |          |                 |
|  2 |    PX SEND QC (RANDOM)           | :TQ30001 |      49 |   60 |         1 |   +256 |     5 |     1000 |       |          |                 |
|  3 |     HASH UNIQUE                  |          |      49 |   60 |         1 |   +256 |     5 |     1000 |    7M |          |                 |
|  4 |      PX RECEIVE                  |          |      49 |   60 |         1 |   +256 |     5 |     4506 |       |          |                 |
|  5 |       PX SEND HASH               | :TQ30000 |      49 |   60 |         1 |   +256 |     5 |     4506 |       |          |                 |
|  6 |        HASH UNIQUE               |          |      49 |   60 |        68 |   +189 |     5 |     4506 |    7M |          |                 |
|  7 |         PX BLOCK ITERATOR        |          |      50 |   14 |        68 |   +189 |     5 |    10000 |       |          |                 |
|  8 |          TABLE ACCESS FULL       | T1       |      50 |   14 |        68 |   +189 |    63 |    10000 |       |          |                 |
|  9 |           SORT AGGREGATE         |          |       1 |      |         1 |   +187 |     1 |        1 |       |          |                 |
| 10 |            PX COORDINATOR        |          |         |      |         1 |   +187 |     4 |        3 |       |          |                 |
| 11 |             PX SEND QC (RANDOM)  | :TQ10000 |       1 |      |         3 |   +185 |     3 |        3 |       |          |                 |
| 12 |              SORT AGGREGATE      |          |       1 |      |        74 |   +114 |     3 |        3 |       |          |                 |
| 13 |               PX BLOCK ITERATOR  |          |   20000 |   18 |        74 |   +114 |     3 |    20000 |       |          |                 |
| 14 |                TABLE ACCESS FULL | T2       |   20000 |   18 |        74 |   +114 |    37 |    20000 |       |          |                 |
| 15 |           SORT AGGREGATE         |          |       1 |      |         1 |   +118 |     1 |        1 |       |          |                 |
| 16 |            PX COORDINATOR        |          |         |      |         1 |   +118 |     3 |        2 |       |          |                 |
| 17 |             PX SEND QC (RANDOM)  | :TQ20000 |       1 |      |         2 |   +111 |     2 |        2 |       |          |                 |
| 18 |              SORT AGGREGATE      |          |       1 |      |       111 |     +2 |     2 |        2 |       |          |                 |
| 19 |               PX BLOCK ITERATOR  |          |   20000 |   27 |       111 |     +2 |     2 |    20000 |       |          |                 |
| 20 |                TABLE ACCESS FULL | T3       |   20000 |   27 |       111 |     +2 |    26 |    20000 |       |          |                 |
==================================================================================================================================================

There’s a little inaccuracy in timings – in the order of the odd second or two – probably because of the granularity used, but the columns “Start Active” and “Time Active (s)” are, nevertheless, very revealing. The first one tells you when a line in the plan first did some work, the second one tells you how long that line has been active. As I walk through the plan remember that the timing granularity is fairly crude, so the numbers don’t quite add up as nicely as we might like.

If you check lines 17 – 20 you can see that DFO tree 2 (:TQ2xxxx) was the first thing to go active with its PX slaves starting at time t = 2 seconds and running for 111 seconds to do that scan and initial aggregation; the PX send line started at time t = 111 seconds and ran for 2 seconds to pass the data from the slaves to the coordinator.

There’s a little glitch (or hand-waving moment) around lines 15 and 16 where the PX Coordinator receives and aggregates data from the PX slaves – the co-ordinator seems to start doing this several seconds later than it should.

In lines 11 – 14 you can see that DFO tree 1 (:TQ1xxxx) was the second thing to go active with its PX slaves starting at time t = 114 (i.e. just after DFO tree 2 completes);  the full scan and initial aggregate ran for 74 seconds (taking us to time t = 188). The PX send started (a little early, this time) at t = 185 and ran for 3 seconds, with the co-ordinator starting at t = 187 and taking one second to receive and do the final aggregation.

In lines 2 – 8 we can see DFO tree 3 (:TQ3xxxx) starting up, but this DFO tree includes two DFOs. One set of slaves scans table t1 applying the predicates to filter the data then does the initial hash distinct before hash distributing the intermediate result to the next set of slaves that finish off the distinct aggregation.

When DFO :TQ30000 starts at t = 189, the previous DFO trees have completed and the PX coordinator has the results of the two subqueries which it passes to the slave set which can now scan and check for: “n1 >= 500 and  n2 > {derived constant} and n3 > {derived constant}”. The scan and initial aggregation takes 68 seconds completing at t = 255, and at t = 256 the results are distributed to the second set of PX slaves. The second set of slaves has virtually nothing to do and reports only 1 second of execution time before passing the data (DFO :TQ30001) to the query coordinator which does the final aggregation and report.

A key point to note is that this timing information tells us that (a) the subqueries both completed before the tablescan of t1 started, and – perhaps more importantly – (b) that we cannot rely on the TQXnnnn numbers to tell us the order of operation of the DFO trees. In this example DFO tree 2 ran first, DFO tree 1 ran second, and DFO tree 3 ran third – some queries might happen to show the names and the order in synch, but that would be a coincidence, not a deliberate design feature.

A little reminder, though – even though we cannot rely on the DFO tree names to tell us the ordering in which the DFO trees operate, when we look at the data flow operations inside the trees (i.e. the last few digits of a TQ name) the order of operation within a given tree matches the sequence indicated by the TQ name.
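
If you do want to check the table queue traffic after the event, the following is a minimal sketch (not quoted from the original) of the sort of query against v$pq_tqstat you might run from the same session immediately after the query completes – bearing in mind the earlier caveat that the view is a bit broken for multiple DFO trees:

select
        dfo_number, tq_id, server_type, process, num_rows, bytes
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
;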

Counting Slaves.

There’s more in the Monitor report that can help us understand the workload – and in this case I’ve got a lucky coincidence to help, and a little surprise to go with it.  For parallel queries the report includes a summary of parallel execution activity, and this is what I got from my example:

Parallel Execution Details (DOP=5 , Servers Allocated=15)
===========================================================================================================================
|      Name      | Type  | Group# | Server# | Elapsed |   Cpu   | Concurrency | PL/SQL  |  Other   | Buffer | Wait Events |
|                |       |        |         | Time(s) | Time(s) |  Waits(s)   | Time(s) | Waits(s) |  Gets  | (sample #)  |
===========================================================================================================================
| PX Coordinator | QC    |        |         |    0.03 |    0.02 |             |         |     0.02 |     15 |             |
| p00a           | Set 1 |      1 |       1 |    0.27 |    0.27 |             |    0.10 |          |    163 |             |
| p00b           | Set 1 |      1 |       2 |    0.25 |    0.25 |             |    0.09 |          |    156 |             |
| p00c           | Set 1 |      1 |       3 |    0.26 |    0.26 |             |    0.09 |          |    156 |             |

| p00a           | Set 1 |      2 |       1 |    0.43 |    0.43 |             |    0.15 |          |    221 |             |
| p00b           | Set 1 |      2 |       2 |    0.44 |    0.44 |             |    0.15 |          |    221 |             |

| p000           | Set 1 |      3 |       1 |    0.00 |    0.00 |             |         |          |        |             |
| p001           | Set 1 |      3 |       2 |    0.00 |    0.00 |             |         |          |        |             |
| p002           | Set 1 |      3 |       3 |    0.01 |    0.01 |             |         |          |        |             |
| p003           | Set 1 |      3 |       4 |    0.01 |    0.01 |             |         |          |        |             |
| p004           | Set 1 |      3 |       5 |    0.01 |    0.01 |             |         |          |        |             |
| p005           | Set 2 |      3 |       1 |    0.26 |    0.26 |             |    0.09 |          |    113 |             |
| p006           | Set 2 |      3 |       2 |    0.24 |    0.24 |        0.00 |    0.08 |          |    108 |             |
| p007           | Set 2 |      3 |       3 |    0.25 |    0.25 |             |    0.09 |          |    117 |             |
| p008           | Set 2 |      3 |       4 |    0.25 |    0.25 |             |    0.09 |          |    108 |             |
| p009           | Set 2 |      3 |       5 |    0.25 |    0.25 |        0.00 |    0.09 |          |    117 |             |
===========================================================================================================================

I’ve inserted two blank lines breaking the summary down into three separate groups, which you can identify by the heading “Group#”. In this table we see Group 1 has one slave set of three slaves – which corresponds to the tablescan of t2; Group 2 consists of one slave set of two slaves – which corresponds to the tablescan of t3; and Group 3 has two slave sets of 5 slaves each – which correspond to the tablescan and aggregation of t1. The Group numbers appear to align correctly with the DFO tree numbers.

Another detail that stands out from this list is that slaves p00a and p00b are used in the tablescan of t3, and in the tablescan of t2 (where they are joined by slave p00c). In this example Oracle has reused the slaves from one subquery to run the next. However we can also see that the slaves p000 to p009 that are used for the tablescan of t1 don’t have names that overlap with the slaves used to scan t2 and t3 – which tells us that we have some (in this case two) DFO trees running concurrently.

Another key point in this article is that not only does a DOP of N mean we could run 2*N slaves concurrently in a parallel query, but if we have a query that breaks into a plan that uses multiple DFO trees we might have 2 * N * {number of DFO trees} slaves allocated and running concurrently. (And, as we see here, it’s even possible that different DFO trees run with different DOPs, leaving Oracle very confused about what to report as the DOP of the query – dbms_xplan actually reported this one as DOP = 4 !)

A final implementation detail that is not necessarily obvious from this table, but which I can infer because I know the history of what was going on: although, as described above, the tablescan of t3 was the first activity that generated a row source, Oracle started up the 10 slaves for the parallel tablescan and aggregation of t1 before it started the two slaves it needed to scan t3. I can infer this because there were no slave processes active when I started running the query, so it is reasonable to assume that the slave names tell me something about the order in which they were allocated. In general you would not be able to notice this in a busy system where slaves were pre-allocated (parallel_min_servers != 0) and constantly being acquired and released. (I was also able to corroborate some of the details above by looking at v$px_sesstat and v$session_event for the relevant slave processes shortly after the query started.)
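
As a quick cross-check – again just a sketch of a standard query, not something reproduced from the original test – you can see which PX servers are currently allocated, and whether they are available or in use, by looking at v$px_process:

select
        server_name, status, sid, spid
from
        v$px_process
order by
        server_name
;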

Summary

There are a number of key points this example shows us about complex parallel queries:

  • A single query can produce multiple “DFO trees”
  • Each tree can run at a different degree of parallelism
  • The “tree number” (the X in TQXnnnn) may not reveal the order in which the trees are operated
  • The order of operation of the DFOs within a tree is revealed by the nnnn in TQXnnnn
  • It is not easy to see in a basic execution plan which DFO trees might be executing concurrently and which consecutively
  • PX server processes may be acquired by the query co-ordinator a long time before they actually become active

Footnote:

If you happen to have looked closely at the time information in the Parallel Execution Details you’ll note that the times are total rubbish when compared to the monitoring times. It looks as if the time spent in dbms_lock.sleep() has been ignored (perhaps as an “idle” wait).

 

December 10, 2015

Upgrades

Filed under: 12c,Oracle,Upgrades — Jonathan Lewis @ 8:42 am BST Dec 10,2015

One of the questions that came up at the Optimizer Round Table this year was about minimizing the performance-related** hassle of upgrading from 11g to 12c. Dealing with changes in the optimizer is always an interesting problem but in 12c this is made more challenging because of the automatic dynamic sampling that can introduce a significant amount of extra work at (hard) parse time, then generate SQL Directives, and finally generate extended (column group) statistics the next time you (or the automatic job) collect stats.

Of course one of the traditional strategies for upgrades (especially if you don’t really have a technical need to, but want your Oracle version to be current) is to set the optimizer_features_enable parameter to the older version and hope that this stops most of the new features from appearing, letting other customers identify the problems and raise the SRs that result in the next patch set.
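
For reference, here’s a minimal sketch of the usual mechanisms – the parameter at system or session level, or the equivalent hint for a single statement (the table name is just a placeholder):

-- database-wide (scope=both requires an spfile), or per-session
alter system  set optimizer_features_enable = '11.2.0.4' scope = both;
alter session set optimizer_features_enable = '11.2.0.4';

-- or per-statement
select /*+ optimizer_features_enable('11.2.0.4') */ count(*) from t1;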

There’s also the option of reading the manuals and white-papers carefully and identifying the new and enhanced optimizer features that might cause you problems if only you can spot them in time in your code. (The enhancement to subquery unnesting I described in my previous post is such an example – do you know of any code in your application that was carefully engineered to maximise the benefit of scalar subquery caching? If so you should check very carefully that 12c isn’t going to transform away your subquery and wreck your performance.)

As a general strategy, though, it’s worth reviewing your code for four generic features – histograms, multi-column joins, predicates that apply a function to a column, and “out-of-range” issues. The first three are features that are likely to make the 12c optimizer decide to do something “dynamic” that may ultimately give you a better execution plan, but may cost you extra resources or result in increased contention at just the wrong moment; the fourth is one that can disrupt the solution to the multi-column problem.

For the purposes of this note I am going to assume that you don’t use partitioned tables, or that you have already devised a programmatic method for minimising the work while maximising the effectiveness of their stats (and that your programs unlock and relock the stats so that they aren’t re-gathered by accident).

Histograms

I’m assuming by now that you’re using the approximate_ndv method with auto_sample_size to gather object statistics. In 11g this still means that any histograms gathered by Oracle will use a (surprisingly small) sample and even if you are gathering the histograms at the right time they could introduce a lot of instability to execution plans.

In 12c, Oracle can use an approximate method to create a frequency or Top-N frequency histogram while doing the scan that generates all the other stats – so if you have columns where you know the number of distinct values is relatively small (default up to 254, though technically up to 2048) or that almost all the data (except, broadly speaking, one bucket’s worth) comes from a relatively small number of distinct values then, on the upgrade, you should be safe creating a frequency or Top-N histogram once on those columns and setting the method_opt to ‘for all columns size repeat’.

Oracle 12c still uses a small sample size, though, when it detects the need for a “hybrid” histogram – which is the thing that replaces the height-balanced histogram. This means you may still need to write code to generate the histogram rather than allowing Oracle to collect it. It’s the classic compromise problem – a good histogram will be expensive to gather, a cheaply gathered histogram could easily be inaccurate and unstable. In this case you may want to create a Top-N frequency histogram that is a good model of the data, rather than trying to create a correctly structured hybrid. For tables with such columns you need code that can detect changes to the stats and recreates the histogram. In fact, even in cases where Oracle could gather a (frequency) histogram efficiently and accurately, you may still want to detect changes to stats for some columns and write code to create a histogram that describes the data the way it will look when the users are accessing it rather than the way the data looked when Oracle created the histogram (at 2:00 a.m. or whenever the automatic stats collection job hit it).

Broad strategy: set the global preference to method_opt => ‘for all columns size 1’, set a few table preferences to ‘for all columns size repeat’, and have code that checks the last_analyzed date on a few tables and recreates the histogram you want immediately after the stats have been gathered.
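
In outline – and this is only a sketch, with t1 and colx standing in for whatever tables and columns your own analysis identifies – the preference settings and the one-off histogram creation might look like this:

begin
        -- global default: no histograms
        dbms_stats.set_global_prefs('method_opt', 'for all columns size 1');

        -- this table keeps whatever histograms have already been created
        dbms_stats.set_table_prefs(user, 't1', 'method_opt', 'for all columns size repeat');

        -- one-off creation of a frequency / Top-N histogram on a known column
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 't1',
                method_opt => 'for columns colx size 254'
        );
end;
/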

Note: as a general guideline, if you know the database needs histograms to produce the most appropriate execution plans this means the front-end code has to co-operate with the database and not conceal useful information in SQL that uses bind variables that hide the significance of special values or ranges – but that’s a topic for another blog note (if I haven’t written anything about it before).

Virtual Columns and Column Groups

In 12c column expressions (where the optimizer guesses) and multi-column predicates (where the optimizer assumes independence) are key triggers to dynamic sampling and SQL Directives. To engage in a pre-emptive strike the ideal is to know the code and know the data. Search for “important” SQL that applies functions (like trunc(date_col)) to columns in predicates or does multi-column (equality) joins or uses filter predicates that reference multiple columns from the same table.

If the execution plans you find for these statements produce estimates which are clearly far from reality then you may need to take pre-emptive action, even (or, perhaps, especially) if the resulting plans look good. The optimizer may (for example) know that it is guessing when it says to itself: “trunc(delivery_date) = trunc(sysdate) will return 1% of the data” and do several things such as create an adaptive execution plan and switch plans in mid-execution, do a load of dynamic sampling to discover the guess was wrong, dump an SQL Plan Directive into the data dictionary that triggers subsequent dynamic sampling and then create a virtual column through the extended stats mechanism.

In cases like this you may want to create and document virtual columns explicitly, and create column groups explicitly before Oracle does its dynamic thing. In the case of the column groups you only have to worry about equality predicates – a column group cannot be used with range-based predicates. Remember you are only allowed a maximum of 20 column groups per table (although there is, effectively, no limit on the number of virtual columns) so you need to make your choices of column groups before Oracle goes onto auto-pilot.
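
A sketch of that pre-emptive strike – the table, column, and expression names here are purely illustrative:

declare
        m_name  varchar2(128);
begin
        -- create a column group explicitly (usable only with equality predicates)
        m_name := dbms_stats.create_extended_stats(user, 't1', '(colx, coly)');
        dbms_output.put_line('Column group created: ' || m_name);
end;
/

-- create (and document) a virtual column explicitly for a "function of column" predicate
alter table t1 add (delivery_day generated always as (trunc(delivery_date)) virtual);

-- then regather stats so the extension and the virtual column acquire statistics
begin
        dbms_stats.gather_table_stats(user, 't1');
end;
/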

Remember, there’s a trap waiting here if you don’t make this effort. You might run a test suite a couple of times and fix everything that seems to be necessary without realising that in the course of your testing Oracle has created a couple of hundred column groups and virtual columns. Then, when everything is working nicely, you upgrade the production system and all hell breaks loose until the production system has generated the same (we hope) set of virtual columns and column groups. Since there is a limit of 20 column groups per table (or number of columns / 10 if you have more than 200 columns) you could get unlucky and find that Oracle wants to create 30 column groups and the 20 that happen to be created first on production may not be the ones it created on the test system.

Out of Range

When a predicate specifies a value that is above the known high value or below the known low value for a column, the optimizer uses a “linear decay” method for cardinality calculations that is based on how far out of the range the value is (as a percentage of the range). In many cases this introduces a slowly increasing error (though for some predicates it can create an immediate catastrophe). There is a special consideration, though, that means you need to be very careful about time-based or sequence-based columns that can go out of range: if you have a column group that includes the column then the optimizer stops using the column group the moment you go out of range. This is probably a rare condition to check for because you probably use range-based predicates on such columns (e.g. “order_date < trunc(sysdate) - 7”) and column groups are not applicable to range-based predicates anyway, but you do need to be aware that in special cases an execution plan can go bad the moment you overshoot the high value.

For columns like these you need a strategy that allows you to set a high (or low) value that reflects a future high value for the column – even to the extent (for example) of running a scheduler task that calls dbms_stats.set_column_stats() every hour to push the high value forward a little bit.
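
Here’s a sketch of what such a scheduled task might do for a date column – the table and column names are hypothetical, and the low value is just a placeholder for whatever your data really starts at:

declare
        m_distcnt       number;
        m_density       number;
        m_nullcnt       number;
        m_avgclen       number;
        m_srec          dbms_stats.statrec;
        m_datevals      dbms_stats.datearray;
begin
        dbms_stats.get_column_stats(
                ownname => user,      tabname => 't1',      colname => 'order_date',
                distcnt => m_distcnt, density => m_density, nullcnt => m_nullcnt,
                srec    => m_srec,    avgclen => m_avgclen
        );

        -- keep the (placeholder) low value, push the high value a little into the future
        m_datevals := dbms_stats.datearray(
                to_date('01-Jan-2010','dd-mon-yyyy'),
                sysdate + 1/24
        );

        m_srec.epc    := 2;
        m_srec.bkvals := null;
        dbms_stats.prepare_column_values(m_srec, m_datevals);

        dbms_stats.set_column_stats(
                ownname => user,      tabname => 't1',      colname => 'order_date',
                distcnt => m_distcnt, density => m_density, nullcnt => m_nullcnt,
                srec    => m_srec,    avgclen => m_avgclen
        );
end;
/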

Update [11th Dec 2015]

Since this note is about preemptive strategies for reducing the number of problems you run into with 12c, it’s worth reminding you about the new “table preference” which you can use to get more realistic values for the clustering_factor on indexes. For example:


begin
        dbms_stats.set_table_prefs(user, 't1', 'table_cached_blocks', 16);
        dbms_stats.gather_index_stats(user, 't1_i1');
end;
/

Setting the table_cached_blocks preference for a table means Oracle will remember a history of recently “visited” table blocks as it walks an index to generate the clustering_factor. Although there is no official word, I think that 16 is a good default value for this setting in single instance Oracle, and 16 * {number of instances} might be appropriate for RAC.

The fragment above is just for demo purposes – you wouldn’t set the value every time you gather stats on an index, it’s a one-off exercise for each table – though you could change it for all existing tables in a schema with a single call to dbms_stats.set_schema_prefs().
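
For example, as a one-off sketch (assuming 16 is the value you’ve settled on):

begin
        dbms_stats.set_schema_prefs(user, 'table_cached_blocks', 16);
end;
/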

The reason why this preference becomes more important in 12c is that the more expensive an execution path is the longer the optimizer will work to find a better path – and bad settings for the clustering_factor make otherwise good paths look expensive and encourage the optimizer to try all sorts of alternatives before deciding on the final path.

Wrap-up

This is just a brief note to pick up a few key points that you can identify as potential threats before you start upgrading. Obviously it will also be of some help after you’ve upgraded (and are still in test mode) to help you explain some of the changes to execution plans and increases in parse times that you are likely to see. The benefit of pre-emptive action, though, is that you may be able to minimise the number of hidden actions (creation of directives and extended stats) that Oracle takes during your testing. It used to be hard enough doing an upgrade in the past; going up to 12c it can be even harder because Oracle may have changed your test database behind your back when you weren’t looking, leaving you running a production system that doesn’t match the system you’ve tested.

** Footnote: if you went to Tim Hall’s presentation you’ll know that part of your upgrade process should deal with the move to the multi-tenant architecture, even if you adopt just the single-PDB licence.

 

December 9, 2015

12c Scalar Subquery

Filed under: 12c,Execution plans,Oracle — Jonathan Lewis @ 2:25 pm BST Dec 9,2015

Every version of the optimizer enhances existing mechanisms and introduces new features, and 12c has introduced some of the most sophisticated transformations to date; in this note I want to demonstrate an enhancement to subquery unnesting that could give a significant performance boost to a certain query pattern but which might, unfortunately, result in worse performance.

Historically subquery unnesting turned subqueries (correlated or not) in the where clause into joins. In 12c subquery unnesting can also turn scalar subqueries in the select list into joins – we’ll discuss why this could be a good thing but might occasionally be a bad thing later on in the article, but let’s start with a test case.

Sample data.

In my demonstration I’m going to use three tables which, for convenience, are three clones of the same data.

create table t1
as
with generator as (
	select
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum				id,	
	mod(rownum-1,100)		mod100,
	trunc((rownum - 1)/100)		trunc100,
	rownum				n1,
	rownum				n2,
	lpad(rownum,6,'0')		vc1,
	rpad('x',100)			padding
from
	generator
where
	rownum <= 10000
;

create table t2 as select * from t1;
create table t3 as select * from t1;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id,mod100);
create index t3_i1 on t3(id,trunc100);

begin
	dbms_stats.gather_table_stats(user,'t1');
	dbms_stats.gather_table_stats(user,'t2');
	dbms_stats.gather_table_stats(user,'t3');
end;
/

I’ll be examining a query against t1 that includes two correlated scalar subqueries in the select list that reference one each of t2 and t3:


explain plan for
select
	/*+
		qb_name(main)
	*/
	n1, n2,
	(
		select	/*+ qb_name(sq1) */
			max(mod100)
		from	t2
		where	t2.id = t1.id
	) new_n1,
	(
		select	/*+ qb_name(sq2) */
			max(trunc100)
		from	t3
		where	t3.id = t1.id
	) new_n2
from
	t1
where
	t1.id between 101 and 200
;

select * from table(dbms_xplan.display);

11g Plan

This is the execution plan you might expect to see from 11g – in my case, with my system stats etc. and running 11.2.0.4:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   101 |  1212 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |     7 |            |          |
|   2 |   FIRST ROW                  |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T2_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   4 |  SORT AGGREGATE              |       |     1 |     7 |            |          |
|   5 |   FIRST ROW                  |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN (MIN/MAX)| T3_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   7 |  TABLE ACCESS BY INDEX ROWID | T1    |   101 |  1212 |     4   (0)| 00:00:01 |
|*  8 |   INDEX RANGE SCAN           | T1_I1 |   101 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("T1"."ID">=101 AND "T1"."ID"<=200)

As you can see, the operations for the subqueries appear first in the plan (lines 1-3, and 4-6), with the operations for the main query appearing as the last section of the plan (lines 7-8). You might note that the total cost of the plan doesn’t cater for the cost of the subqueries – technically we might expect to see the optimizer producing a cost of something like 408 on the basis that it’s going to run each subquery an estimated 101 times and each subquery has a cost of 2, and the 101 rows are generated from a query with a cost of 4 giving: 4 + 101 * (2 + 2) = 408.

12c Plan

On the upgrade to 12c, the same code produces the following plan:


--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |   101 |  6464 |     8   (0)| 00:00:01 |
|*  1 |  HASH JOIN OUTER                      |          |   101 |  6464 |     8   (0)| 00:00:01 |
|*  2 |   HASH JOIN OUTER                     |          |   101 |  3838 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   101 |  1212 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1    |   101 |       |     2   (0)| 00:00:01 |
|   5 |    VIEW                               | VW_SSQ_2 |   101 |  2626 |     2   (0)| 00:00:01 |
|   6 |     HASH GROUP BY                     |          |   101 |   707 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                 | T2_I1    |   101 |   707 |     2   (0)| 00:00:01 |
|   8 |   VIEW                                | VW_SSQ_1 |   101 |  2626 |     2   (0)| 00:00:01 |
|   9 |    HASH GROUP BY                      |          |   101 |   707 |     2   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN                  | T3_I1    |   101 |   707 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ITEM_1"(+)="T1"."ID")
   2 - access("ITEM_2"(+)="T1"."ID")
   4 - access("T1"."ID">=101 AND "T1"."ID"<=200)
   7 - access("T2"."ID">=101 AND "T2"."ID"<=200)
  10 - access("T3"."ID">=101 AND "T3"."ID"<=200)

As you can see, the separate plans for the subqueries have disappeared and the plan is showing a three-table join (with two outer joins, and two of the “tables” being the non-mergeable views vw_ssq_2 and vw_ssq_1).

There are several details to pick up in this plan (apart from the unnesting). The cost is only 8 – but in this case it isn’t the effect of the optimizer “ignoring” the cost of the subqueries, it’s the optimizer correctly working out the cost of the unnested subqueries with joins. The cost happens to be low in this case because the optimizer has used transitive closure to pass the predicate from the driving query into the subqueries – so we need only do a couple of short index range scans to get all the data we need in the unnested subqueries.

The outer joins are needed because it is valid for the original scalar subquery mechanism to return no data for a subquery and still report a row (with nulls) for t1. If the rewrite didn’t introduce the outer join then t1 rows for which there were no matching t2 or t3 rows would disappear from the result set.

Threats and workarounds

In this (lightweight) example it looks as if this transformation is a good idea, but it’s always possible that the optimizer might choose to do this when it’s a bad idea. In fact, a quick check of the optimizer trace (10053) suggests that this is an uncosted transformation that will take place “because it can”. Here are six highly suggestive consecutive lines from the trace file:


SU: Unnesting query blocks in query block MAIN (#1) that are valid to unnest.
Subquery Unnesting on query block MAIN (#1)
SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block MAIN (#1).
SU:   Unnesting  scalar subquery query block SQ2 (#2)
Registered qb: SEL$2E540226 0x50b1a950 (SUBQ INTO VIEW FOR COMPLEX UNNEST SQ2)
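
If you want to repeat the check on your own system, this is just the standard mechanism (a sketch, not quoted from the original note) for generating the 10053 trace file for the statement:

alter session set events '10053 trace name context forever, level 1';

explain plan for
select
        /*+ qb_name(main) */
        n1, n2,
        (select /*+ qb_name(sq1) */ max(mod100)   from t2 where t2.id = t1.id) new_n1,
        (select /*+ qb_name(sq2) */ max(trunc100) from t3 where t3.id = t1.id) new_n2
from    t1
where   t1.id between 101 and 200
;

alter session set events '10053 trace name context off';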

Even if this transformation is cost-based rather than heuristic it’s always possible for the optimizer to make a very poor estimate of cost and do the wrong thing. Fortunately it’s possible to block the unnesting with the “traditional” /*+ no_unnest */ hint:


select
	/*+
		qb_name(main)
		no_unnest(@sq1)
		no_unnest(@sq2)
	*/
	n1, n2, 
...

With these hints in place the execution plan changes back to the 11g form – though there is a notable change in the estimated final cost of the query:


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   101 |  1212 |   206   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                     |       |     1 |     7 |            |          |
|   2 |   FIRST ROW                         |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)       | T2_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   4 |  SORT AGGREGATE                     |       |     1 |     7 |            |          |
|   5 |   FIRST ROW                         |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN (MIN/MAX)       | T3_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   7 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   101 |  1212 |     4   (0)| 00:00:01 |
|*  8 |   INDEX RANGE SCAN                  | T1_I1 |   101 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("T1"."ID">=101 AND "T1"."ID"<=200)

It’s a little surprising that the final cost is (4 + 202) rather than the  (4 + 404) that we calculated earlier, but a few variations on the test suggest that the optimizer is using half the cost of each of the scalar subqueries in the final cost estimate – perhaps as a nod in the direction of scalar subquery caching.

As always it is important to remember that you cannot look at a query like this and know which plan is going to perform better unless you know the data content, the pattern of data distribution, and the probable effect of scalar subquery caching. In some cases it may be that an extremely “lucky” data pattern will mean that a scalar subquery will run a very small number of times while, with the same data arranged in a different order, the benefit of scalar subquery caching will disappear and the unnesting approach may be much better.

It is a convenience to be able to recognize that there is a new optimizer feature in 12c that might give you a performance boost but might hit you with an unlucky performance penalty (especially in cases, perhaps, where you have previously engineered your code very carefully to take advantage of scalar subquery caching) and know that there is a workaround available. In your search for pre-empting production performance problems, this code structure is another pattern to scan for in your source code.

November 24, 2015

Table Expansion

Filed under: 12c,Bugs,Oracle,Partitioning — Jonathan Lewis @ 12:13 pm BST Nov 24,2015

I’ve often found that while I’m investigating one Oracle feature I get waylaid by noticing anomalies in other parts of the code. I was caught by one of these events a little while ago while experimenting with the new (12.1.0.2) Inmemory Columnar Store.  After reading a posting by Martin Bach I asked the question:

“If you have a partitioned table with a local index and one of the table partitions has been declared INMEMORY, would a query that could use that index be able to apply table expansion to produce a plan that did a tablescan on the in-memory partition and an indexed access path on the partitions that weren’t in-memory?”

The answer was yes, with one important caveat – the first test I built to investigate the question did something very silly and produced the wrong results. In fact the error turned out to have nothing to do with the inmemory columnar store; it also appeared when I switched the inmemory feature off and nudged the optimizer into table expansion by making one of the index partitions unusable. Here’s the demo, followed by a few notes, if you want to test it for yourselves:


create table t1 (
        id,
        n1,
        padding
)
partition by range (id) interval (250000) (
        partition p_start values less than (250000)
        segment creation immediate
        inmemory priority critical
)
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum - 1              id,
        mod(rownum - 1,1250)    n1,             -- 200 rows per segment
        rpad('x',400,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(n1) local nologging;

I’ve created an interval-partitioned table holding a million rows with an interval of 250,000, and I’ve defined the first partition with an upper bound of 250,000 and the inmemory attribute, leaving Oracle to add the other three partitions which won’t have the inmemory attribute set.

I’ve created a local index on the n1 column, with 200 rows (1,250 distinct values) per partition. Because of the way I’ve defined n1 I can easily recreate the table to adjust the relative cost of the indexed access path and the tablescan path by adjusting the value I use in the mod() function. The use of the mod() function also means that every partition holds the same volume of data (with exactly the same degree of scattering) for any specific value of n1.
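
As an aside – and this is an assumption about how you might reproduce the non-inmemory variant mentioned above, not a statement quoted from the original – the easy way to nudge the optimizer into table expansion without the inmemory attribute is to mark one of the local index partitions unusable, e.g.:

alter index t1_i1 modify partition p_start unusable;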

To test the option for table expansion I’ve then checked the execution plan for (and run) four queries:


select
        /*+ full(t1) */
        id
from
        t1
where
        n1 = 1
and     id < 250000
;

select
        /*+ index(t1) */
        id
from
        t1
where
        n1 = 1
and     id < 250000 
;
 
select
        /*+ full(t1) */
        id 
from    t1
where   n1 = 1
and     id >= 250000
and     id <  500000
;

select
        id
from
        t1
where
        n1 = 1
;

The first two queries give me the cost of accessing the inmemory partition by tablescan compared to the cost of accessing it by index. The third query is there to demonstrate that the non-inmemory tablescan is a LOT more expensive than the inmemory one. The final query accesses all four partitions to see if the inmemory partition is treated differently from the other three. Here are the four plans:

Default tablescan when limited to the in-memory partition
----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   200 |  1800 |   184   (9)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE     |      |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory("N1"=1 AND "ID"<250000)
       filter("N1"=1 AND "ID"<250000)



Index access is more expensive than the inmemory tablescan
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   200 |  1800 |   201   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |       |   200 |  1800 |   201   (0)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   200 |  1800 |   201   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   200 |       |     1   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<250000)
   3 - access("N1"=1)



Tablescan on a non-inmemory partition is much higher than inmemory (and indexed access)
-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |   200 |  1800 |   1891  (2)| 00:00:01 |       |       |
|   1 | PARTITION RANGE SINGLE|       |   200 |  1800 |   1891  (2)| 00:00:01 |     2 |     2 |
|*  2 | TABLE ACCESS FULL     | T1    |   200 |  1800 |   1891  (2)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"=1 AND "ID">=250000 AND "ID"<500000)



Access all four partitions - no table expansion. Cost = 4 x cost of single partition indexed access path.
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   800 |  7200 |   807   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL                       |       |   800 |  7200 |   807   (1)| 00:00:01 |     1 |1048575|
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   800 |  7200 |   807   (1)| 00:00:01 |     1 |1048575|
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   800 |       |     6   (0)| 00:00:01 |     1 |1048575|
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=1)

So we can see that the inmemory tablescan of a single segment is cheaper than the indexed access path, and much cheaper than the tablescan of a single non-inmemory segment. When we execute the query that accesses all four segments, though, the optimizer chooses to use the same access path for all four partitions rather than splitting the query into one tablescan and three indexed accesses.

This is just a starting point for testing though – there are transformations where the optimizer will only use a particular transformation if the transformed query has a cost that is “X%” lower than it would be without the transformation (and some of these transformations have associated parameters – usually hidden – that can be set to adjust the percentage). Perhaps if I adjusted the data so that the relative benefit of a single inmemory scan was larger; or if I created a table with more partitions and had two inmemory and three not inmemory, or 3 inmemory and 4 not inmemory; or if I had more inmemory partitions than not then maybe the optimizer would spontaneously do table expansion.

Rather than fiddle about with the data my next step was to hint the final query with /*+ expand_table(t1) */. Here’s the resulting execution plan:


------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |       |       |  1797 (100)|          |       |       |
|   1 |  VIEW                                        | VW_TE_1 |  1600 | 20800 |  1797   (1)| 00:00:01 |       |       |
|   2 |   UNION-ALL                                  |         |       |       |            |          |       |       |
|   3 |    PARTITION RANGE SINGLE                    |         |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|*  4 |     TABLE ACCESS INMEMORY FULL               | T1      |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|   5 |    PARTITION RANGE ITERATOR                  |         |   600 |  5400 |   806   (1)| 00:00:01 |     2 |     4 |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |   600 |  5400 |   806   (1)| 00:00:01 |     2 |     4 |
|*  7 |      INDEX RANGE SCAN                        | T1_I1   |   800 |       |     5   (0)| 00:00:01 |     2 |     4 |
|   8 |    PARTITION RANGE INLIST                    |         |   800 | 16800 |   807   (1)| 00:00:01 |       |       |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |   800 | 16800 |   807   (1)| 00:00:01 |       |       |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |   800 |       |     6   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - inmemory(("N1"=1 AND "T1"."ID"<250000))
       filter(("N1"=1 AND "T1"."ID"<250000)) 6 - filter(("T1"."ID">=250000 AND "T1"."ID"<1000000))
   7 - access("N1"=1)
  10 - access("N1"=1)

Points to notice:
Table expansion has occurred – we did an inmemory full tablescan on the inmemory partition at a cost of 184, and index range scans on the other three partitions at a cost of 806 (which looks more like the cost of visiting 4 partitions), expecting 800 rowids from the local index (a failure of partition elimination) and 600 rows from the table. The partition start/stop columns do show that the optimizer expects to visit just the three correct partitions, although some of the cost and cardinality numbers seem to be appropriate to 4 partitions.

Worse still we see a third branch to the UNION ALL / table expansion – operations 8 to 10 – which don’t report a partition start and stop. What’s that doing ? Sometimes, of course, we see sections of plan that don’t actually run – usually preceded by a FILTER operation that can evaluate to FALSE – so maybe that’s what’s happening here. Re-running the query with rowsource execution stats enabled showed that the PARTITION RANGE INLIST started once, and operations 9 and 10 didn’t operate at all – so that’s okay.
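
For reference, the rowsource execution stats come from the usual method – the same one that produces the run-time plans below – i.e. adding the gather_plan_statistics hint to the hinted query and reporting its last execution:

select
        /*+ expand_table(t1) gather_plan_statistics */
        id
from    t1
where   n1 = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));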

But what happens if I execute a query that should only access the first two partitions ? Here’s the run-time plan to answer that question:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  5dgp982ffsph8, child number 0
-------------------------------------
select  /*+ expand_table(t1) gather_plan_statistics */         id from
       t1 where  n1 = 1 and id < 500000

Plan hash value: 2876620067

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |      1 |        |    800 |00:00:00.05 |     649 |
|   1 |  VIEW                                        | VW_TE_1 |      1 |   1200 |    800 |00:00:00.05 |     649 |
|   2 |   UNION-ALL                                  |         |      1 |        |    800 |00:00:00.01 |     649 |
|   3 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |       3 |
|*  4 |     TABLE ACCESS INMEMORY FULL               | T1      |      1 |    200 |    200 |00:00:00.01 |       3 |
|   5 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |     215 |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      1 |    200 |    200 |00:00:00.01 |     215 |
|*  7 |      INDEX RANGE SCAN                        | T1_I1   |      1 |    200 |    200 |00:00:00.01 |      15 |
|   8 |    PARTITION RANGE INLIST                    |         |      1 |    800 |    400 |00:00:00.01 |     431 |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      2 |    800 |    400 |00:00:00.01 |     431 |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |      2 |    800 |    400 |00:00:00.01 |      31 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - inmemory(("N1"=1 AND "T1"."ID"<250000))
       filter(("N1"=1 AND "T1"."ID"<250000))
   6 - filter(("T1"."ID"<500000 AND "T1"."ID">=250000))
   7 - access("N1"=1)
  10 - access("N1"=1)

My query should have returned 400 rows, accessing only the first two partitions. It returned 800 rows, accessing all 4 partitions. (Notice how there are no predicates on ID for operations 9 and 10).

Since I had started my investigation with the inmemory option I assumed at this point that the problem was somehow related to that feature – which shows how easy it is to get too focused and jump to conclusions. After raising the problem with Oracle I got a reply that the problem wasn’t about the inmemory columnar store – and here’s the next little change to test to demonstrate that point:


alter index t1_i1 indexing partial;

alter table t1 modify partition p_start indexing off;
alter table t1 modify partition p_start no inmemory;

select  partition_name, indexing, inmemory 
from    user_tab_partitions 
where   table_name = 'T1';

select  partition_name, segment_created    
from    user_ind_partitions 
where   index_name = 'T1_I1';

Enable partial indexing for the index, switch off the index on the first partition then disable the inmemory option for the partition. This is the plan I got from re-running the two-partition query:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |      1 |        |    800 |00:00:00.08 |   15370 |  14706 |
|   1 |  VIEW                                        | VW_TE_1 |      1 |   1200 |    800 |00:00:00.08 |   15370 |  14706 |
|   2 |   UNION-ALL                                  |         |      1 |        |    800 |00:00:00.07 |   15370 |  14706 |
|   3 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |     216 |      0 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      1 |    200 |    200 |00:00:00.01 |     216 |      0 |
|*  5 |      INDEX RANGE SCAN                        | T1_I1   |      1 |    200 |    200 |00:00:00.01 |      16 |      0 |
|   6 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.04 |   14723 |  14706 |
|*  7 |     TABLE ACCESS FULL                        | T1      |      1 |    200 |    200 |00:00:00.04 |   14723 |  14706 |
|   8 |    PARTITION RANGE INLIST                    |         |      1 |    800 |    400 |00:00:00.01 |     431 |      0 |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      2 |    800 |    400 |00:00:00.01 |     431 |      0 |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |      2 |    800 |    400 |00:00:00.01 |      31 |      0 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."ID"<500000 AND "T1"."ID">=250000))
   5 - access("N1"=1)
   7 - filter(("N1"=1 AND "T1"."ID"<250000))
  10 - access("N1"=1)


Again I accessed all 4 partitions and returned 800 rows. As an odd little detail the plan reversed the order of access of partitions 1 and 2. The problem isn’t about the inmemory option, it’s a problem with generic table expansion.

Continuing Tests
I won’t go into all the details of what I did next – once you get started it’s hard to stop, and easy to lose track of what you’ve done. So here’s a quick summary.

Given that the problem appeared without the inmemory columnar store enabled, I switched back to 11.2.0.4 (where table expansion is also possible) and emulated the problem by setting the first index partition unusable (since partial indexing is a 12c feature). Table expansion did not occur even when hinted: the plan was a full tablescan on both partitions.
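
For anyone repeating the test, the command to make that index partition unusable is simply the following (assuming the local index partition has inherited the table partition name, which it does by default):

alter index t1_i1 modify partition p_start unusable;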

So I switched the table from being interval partitioned to simple range partitioned, creating all 4 partitions as I created the data. In 11g I got table expansion and the correct answer without the extra branch to the union all; so I went back to 12c and did the same – pure range partitioning, partial indexing, and got table expansion with the correct result and no spare branches to the plan.
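
For completeness, here’s a sketch of the pure range-partitioned version of the table – it isn’t the exact script I ran (the partition names are invented) but the boundaries match the data:

create table t1 (
        id,
        n1,
        padding
)
partition by range (id) (
        partition p1 values less than ( 250000),
        partition p2 values less than ( 500000),
        partition p3 values less than ( 750000),
        partition p4 values less than (1000000)
)
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1              id,
        mod(rownum - 1,1250)    n1,
        rpad('x',400,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;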

Tentative Conclusion

Clearly my testing is by no means exhaustive – but 12c seems to allow table expansion for interval partitioning in cases where 11g does not; unfortunately it is possible for the table expansion code to go wrong for interval partitioning in cases where it works correctly for simple range partitioning. In my case this led to wrong results.

 

Reference Script: in_memory_table_expansion_*.sql

 

 

November 9, 2015

Wrong Results

Filed under: 12c,Bugs,Oracle — Jonathan Lewis @ 6:23 am BST Nov 9,2015

Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result depends on the SQL*Plus arraysize!

The email had a short description, and a script to create a small data set that would demonstrate the problem. I’m not going to show you the query, or the result set, but here’s a sample of the output from an SQL*Plus session after creating the data. This is, by the way, on a “single-user” system – there is no way that some other session is changing the data – especially after the opening “set transaction”:

SQL> set transaction read only;

Transaction set.

SQL> set arraysize 1
SQL> select ...

...

541 rows selected.

SQL> set arraysize 4
SQL> select ...

...

599 rows selected.

SQL> set arraysize 10
SQL> select ...

...

620 rows selected.

SQL> set arraysize 32
SQL> select ...

...

616 rows selected.

The correct result set should have had the 616 rows reported when the arraysize was set to 32 (of course, it’s possible that with an arraysize of 32 the 616 rows returned weren’t the correct 616 rows – rows seemed to get multiplied or deleted fairly arbitrarily as the arraysize changed).

The execution plan was a little unusual in that it forced a nested loop join with a tablescan on the inner table; and when I ran the query with rowsource execution statistics enabled the number of starts of the inner tablescan was 1,597 but the number of rows actually returned varied. My first thought was that some new mechanical optimisation of the tablescan code was losing track of where it had got to in the repeated tablescans – but it turned out I was wrong.

Here’s the execution plan (with camouflage) – the key detail is in a section I didn’t look at initially, the column projection:


select * from table(dbms_xplan.display_cursor('0dh0kh9qa88mz',1,'-note +projection'));

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       | 14118 (100)|          |
|   1 |  NESTED LOOPS       |      |    29 |  2958 | 14118   (2)| 00:00:01 |
|*  2 |   HASH JOIN         |      |   892 | 57088 |    35   (3)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ABC  |   549 | 21411 |    17   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| DEF  |   892 | 22300 |    17   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | XYZ  |     1 |    38 |    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEF"."ABC_FK"="ABC"."ABC_ID")
   5 - filter(("DEF"."COL0"="XYZ"."COL0" AND "XYZ"."COL1"="ABC"."COL1"
              AND "XYZ"."COL2"="ABC"."COL2"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20], "XYZ"."COL1"[NUMBER,22],
       "XYZ"."COL2"[NUMBER,22], "XYZ"."COL0"[VARCHAR2,20]
   2 - (#keys=1) "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20]
   3 - (rowset=200) "ABC"."ABC_ID"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22]
   4 - (rowset=200) "DEF"."ABC_FK"[NUMBER,22], "DEF"."COL0"[VARCHAR2,20]
   5 - "XYZ"."COL1"[NUMBER,22], "XYZ"."COL2"[NUMBER,22],
       "XYZ"."COL0"[VARCHAR2,20]

The predicate section is irrelevant in this case, and I’ve camouflaged the names of the tables and columns – the only interesting bit is the appearance of the (rowset=200) in the projection information. This is reporting a feature new in 12c (and not to be confused with Oracle Java Rowsets) that should improve the performance of some queries.

I didn’t actually look at the projection information until after I’d asked the Oak Table members if they had ever seen this type of anomaly before – and Stefan Koehler emailed back to suggest that the problem might be related to rowsets (there are a couple of similar bugs on MoS, e.g: 17016479 and 20960570) – so I checked the projection, then repeated my tests after disabling the feature with a call to: ‘alter session set “_rowsets_enabled”=false;’

Problem solved – although I’ve told the person who emailed me to report this discovery and workaround to Oracle support and see what they supply as the approved solution.

It is possible to affect the feature through event 10055 – different levels disable it at different locations in the code; the list of options is given in the $ORACLE_HOME/rdbms/mesg/oraus.msg file (if you’re not running Windows):


//            Level:
//            0x00000001 - turn off for table scan
//            0x00000002 - turn off for hash join consume
//            0x00000004 - turn off for hash join produce
//            0x00000008 - turn off for group by
//            0x00000010 - turn off for sort
//            0x00000020 - turn off for table-queue out
//            0x00000040 - turn off for table-queue in
//            0x00000080 - turn off for identity
//            0x00000100 - turn off for granule iterator
//            0x00000200 - turn off for EVA functions
//            0x00000400 - turn off for PL/SQL
//            0x00000800 - turn off for upgrade
//            0x00001000 - turn off for database startup
//            0x00002000 - turn off for blobs and clobs
//            0x00004000 - turn off for tracing row source
//            0x00008000 - turn off rowset information in explain plan
//            0x00010000 - disable hash join rowsets fast path
//            0x00020000 - turn off for bloom create
//            0x00040000 - turn off for bloom use
//            0x00080000 - disable prefetch for hash join
//            0x00100000 - disable prefetch for bloom
//            0x00200000 - disable semi blocking hash join
//            0x00400000 - turn off rowset for fixed table

I tried the first few levels and found that both levels 1 and 2 eliminated the problem (and eliminated the appearance of the (rowset=200) entry in the projection information). Given the shape of the plan I had thought that just one of 1,2 or 4 might have been relevant so I was a little surprised to find that both 1 and 2 were effective – but that’s probably just a question of interpretation of the brief descriptions.

Update

I’ve asked the owner of the problem if it’s okay to post the script to create the tables and data – and the answer was yes: the content was already heavily camouflaged anyway. So here’s a file you can download if you want to test other environments: insert_script_12c_bug

It’s declared as a “.doc” file to get past the upload process, but it’s really a flat text file.

Update 16th Nov

The official “minimum impact” workaround is to set event 10055 at level 2097152 (disable semi blocking hash join). Alternatively there is also a patch available. See Mike Dietrich’s blog for details: https://blogs.oracle.com/UPGRADE/entry/update_for_switch_off_rowsets
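
If you want to test the workaround at the session level, the usual event syntax would be something like:

alter session set events '10055 trace name context forever, level 2097152';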

Update 7th Dec

Mike Dietrich has updated his second posting to add details of a patch for this bug.

 

October 9, 2015

PL/SQL Functions

Filed under: 12c,Oracle,Performance — Jonathan Lewis @ 6:17 pm BST Oct 9,2015

Assuming everything else has been tuned to perfection, what’s the best you can do while calling PL/SQL functions from SQL ? Here’s a little code to create a table with some data, and a function that we can use to start an investigation:


create table t1
nologging
pctfree 0
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(1,1001))        n1
from
        generator       v1,
        generator       v2
where
        rownum <= 3e5
;

create or replace function func_normal(i_in number)
return number
as
begin
        return round(exp(ln(i_in)));
end;
/

That’s 300,000 rows in the table and a silly little function to use up some CPU to get nowhere. There are 1,000 distinct integer values scattered uniformly through the table, and the function returns the value it’s called with – but it does it a hard way.

Here’s some test code – with “set timing on” – followed by the results:


select count(distinct(n1)) from t1;

select 
        count(distinct(func_normal(n1)))
from    t1
;



COUNT(DISTINCT(N1))
-------------------
               1000

1 row selected.

Elapsed: 00:00:00.03

COUNT(DISTINCT(FUNC_NORMAL(N1)))
--------------------------------
                            1000

1 row selected.

Elapsed: 00:00:11.39

This is running on 12.1.0.2 on a machine with a CPU speed of about 3 GHz – you may want to adjust the number of rows in the table for your own testing.

The question is, what options can we use to improve the efficiency of the calls to the PL/SQL. One option, of course, is to use the new 12c “with PL/SQL” clause – we could embed the function in the SQL like this:


with
        function func_with (
                i_in    number
        ) return number
        is
        begin
                return round(exp(ln(i_in)));
        end;
select
        count(distinct(func_with(n1)))
from    t1
/

COUNT(DISTINCT(FUNC_WITH(N1)))
------------------------------
                          1000

1 row selected.

Elapsed: 00:00:09.77

So, for the cost of copying the function into the SQL we get a 10% improvement in performance – which we could reasonably attribute to an improved efficiency in the call mechanism. There are arguments for and against copying code like this, of course, and my improvement was only 1 second for 300,000 calls, but you may decide that the benefit of the “With-PL/SQL” method is sufficient to justify the effort.

If you don’t want to copy a PL/SQL function into the SQL, though, there is another alternative – the pragma UDF (for user-defined function) – which has the effect of reducing the cost of the so-called “context switch” as you call from SQL to PL/SQL or vice versa.

Technically I think what’s happening is that the stack formats for SQL and PL/SQL are different and the context switch is the work needed to reformat the stack as you pass from one environment to the other – by declaring the function as UDF you probably set it up to deal with the incoming SQL stack itself.

Here’s how to add the UDF pragma to the function, and the results showing the effects:


create or replace function func_udf(i_in number)
return number
as
pragma UDF;
begin
        return round(exp(ln(i_in)));
end;
/

COUNT(DISTINCT(FUNC_UDF(N1)))
-----------------------------
                         1000

1 row selected.

Elapsed: 00:00:09.55

With pragma UDF the standalone function call is slightly faster than the in-line “WITH” function. I did wonder whether the pragma UDF would make the function call slower if I simply called it from a loop in a PL/SQL block, but there didn’t seem to be any significant difference between the normal function and the UDF function.
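
The loop test was nothing sophisticated – something along the lines of the following sketch (not the exact script I used):

set serveroutput on
set timing on

declare
        m_total number := 0;
begin
        for i in 1..300000 loop
                m_total := m_total + func_udf(mod(i,1000) + 1);
        end loop;
        dbms_output.put_line(m_total);
end;
/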

Initially, then, it looks like UDF is faster than WITH, which is faster than basic; but there are other considerations. My sample data has only 1,000 possible inputs to the function – and Oracle has three different methods for caching that I might be able to benefit from:

  • Declaring the function as deterministic
  • Putting the function into the PL/SQL result cache
  • Modifying the SQL to take advantage of scalar subquery caching

Here’s what the function declaration looks like if I want to use the PL/SQL function cache:


create or replace function func_cached(i_in number)
return number
result_cache
as
begin
        return round(exp(ln(i_in)));
end;
/

Changing my query to use func_cached() (see the sketch below), the query completed in 1.65 seconds – a clear winner, but can anything else get close?
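
For reference, the cached test is just the original test query with the function name changed:

select
        count(distinct(func_cached(n1)))
from    t1
;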

To make the functions deterministic, I just have to add the word “deterministic” after the declaration of the return type:


create or replace function func_normal(i_in number) 
return number
deterministic
as
begin
        return round(exp(ln(i_in)));
end;
/

We can also add the deterministic keyword to the function defined in the WITH clause. Before reporting the results of testing the functions with determinism, there’s one more strategy to consider. Remove the deterministic key word from the functions, and introduce a scalar subquery to the test query, e.g.:


select
        count(distinct(select func_normal(n1) from dual))
from    t1

;

Here’s a table of results (times in seconds):

Method     Initial   Deterministic   Scalar Subquery
------     -------   -------------   ---------------
Basic        11.39            4.30              4.32
With          9.77            9.72              3.60
UDF           9.55            9.57              3.57
Cached        1.65            0.71              0.72

Summary

Before stating any conclusions it’s worth remembering that the sample data is based on a relatively small number of distinct input values. It is the repetition that allows us to benefit from things like caching and determinism. On top of that we need to consider the scale of the time-saving in light of the number of calls made (or not made).

Having said that, the PL/SQL function cache is clearly the thing that gives us the most benefit in return for a simple implementation. We should remember that the cache is “public” – i.e. stored in the SGA – and each input value takes up another bit of the public result cache: on the plus side this means that everyone else calling this function gets the benefit of our prior calculation; on the minus side this means if we use the result cache for the wrong function then we could take up a lot of space in the cache for very little benefit. Since the relevant result cache latch has no child latches it’s also possible to end up with latch contention if too many sessions are taking advantage of the result cache too frequently.

Although we might expect a deterministic function to give us a big benefit (in the “very repetitive inputs” case), we find that the deterministic keyword has no effect (as at 12.1.0.2) in functions declared in a WITH clause or declared as standalone with pragma UDF. Fortunately scalar subquery caching (which seems to use the same hashing algorithm as the deterministic caching algorithm) still works with WITH functions and UDF functions, and (slightly surprisingly) standalone functions declared with pragma UDF seem to have a very small edge over WITH functions.

Both of the latter two approaches use a local cache with a strictly limited size. The benefit is that the locality means they won’t interfere with other sessions or end up hogging an extreme amount of a public memory area; the drawback is that the size limit (which can be adjusted with a hidden parameter) means that you can get unlucky with hash collisions and end up with extremely variable performance from day to day because of a small change in the data being processed, or even a change in the order in which an unchanged data set is processed.

Footnote

I’ll leave you with one thought. Without declaring the original function as deterministic, I ran the following query to maximise the benefit from the scalar subquery caching algorithm:


select
        count(distinct(select func_normal(n1) from dual))
from    (
        select /*+ no_eliminate_oby */  n1 from t1 order by n1
        )
;

The query completed in 0.13 seconds: the subquery was called 1,000 times (once for each distinct value – see this ancient URL), and the benefit of eliminating the function calls outweighed the cost of having to sort the data set.

Ref: 12c_function_options.sql

September 28, 2015

Result Cache 2

Filed under: 12c,Flashback,Oracle,Performance — Jonathan Lewis @ 8:50 am BST Sep 28,2015

Following on from my earlier posting of problems with temporary tables and the PL/SQL result cache (a combination which the manuals warn you against) here’s another problem – again, to a large degree, self-inflicted.

Imagine you have a complex report involving a large number of financial transactions with a need to include calculations about current exchange rates. Unfortunately the rules about calculating the appropriate exchange rate for any transaction are complex and you find you have a choice between adding 6 tables with outer joins and a couple of aggregate (max) subqueries to the base query or calling a PL/SQL function to calculate the exchange rate for each row. I’m going to create an extremely simplified model of this requirement:

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        case mod(rownum,2) when 0 then 'USD' else 'GBP' end v1,
        case mod(rownum,2) when 0 then 'GBP' else 'USD' end v2
from
        generator       v1
where
        rownum <= 1e3
;

create table t2 (
        v1      varchar2(3),
        v2      varchar2(3),
        cvt     number(10,6),
        constraint t2_pk primary key(v1,v2)
)
organization index
;

insert into t2 values('USD','GBP',0);
insert into t2 values('GBP','USD',1);

commit;

create or replace function plsql_ordinary(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number is
        m_ret number;
begin
        select /*+ ordinary trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_ordinary;
/

execute dbms_stats.gather_table_stats(user,'t2')

My t1 table represents the set of transactions but only has to worry about two exchange rates, the dollar/sterling and its inverse. My t2 table is an exchange rate table and I’ve loaded it with the two exchange rates I’m interested in. My function plsql_ordinary() takes two currency codes as inputs and returns the exchange rate.

Here’s the modelling query, with a little infrastructure to examine the workload. Note that this will only run on 12c because of the inline PL/SQL function I’ve used for slowing the query down.

set pagesize 20
set arraysize 6

set serveroutput off
alter system flush shared_pool;
alter session set statistics_level = all;

with
        function wait_row_n (
                i_secs          number,
                i_return        number default -1
        ) return number
        is
        begin
                dbms_lock.sleep(i_secs);
                return i_return;
        end wait_row_n;
select
        /*+ driver trace_this */
        wait_row_n(0.3,id),
        plsql_ordinary(v1,v2),
        (select /*+ scalar trace_this */ t2.cvt from t2 where t2.v1 = t1.v1 and t2.v2 = t1.v2) scalar_sub
from
        t1
where
        rownum <= 50
;

set pagesize 40

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select  sql_id, executions, fetches, rows_processed, sql_text
from    v$sql
where   lower(sql_text) like 'select%trace_this%'
and     lower(sql_text) not like '%v$sql%'
;

The query includes a scalar subquery in the select list to get the same data as the PL/SQL function, and you’ll see the point of that in a while. Because of the arraysize and input parameters to wait_row_n() the query will produce output in batches of 6 rows roughly every two seconds for a total of about 18 seconds – which will give me plenty of time to play around in another session. Before I try to do any damage, though, let’s check the execution plan of the report and the statistics of the queries with “trace_this” in their text:


--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |     50 |00:00:00.01 |      11 |
|*  1 |  INDEX UNIQUE SCAN | T2_PK |      2 |      1 |      2 |00:00:00.01 |       2 |
|*  2 |  COUNT STOPKEY     |       |      1 |        |     50 |00:00:00.01 |      11 |
|   3 |   TABLE ACCESS FULL| T1    |      1 |     50 |     50 |00:00:00.01 |      11 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."V1"=:B1 AND "T2"."V2"=:B2)
   2 - filter(ROWNUM<=50)


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------
f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

Looking at operation 1 in the plan you can see that the scalar subquery has started just twice (once for each distinct combination of currency codes).
Looking at the results from v$sql you can see that the query in the PL/SQL function was executed 50 times – once for each row. Functions like this can be a massive drain of resources (typically CPU, but also through latch contention on buffer cache and shared pool latches).

Conveniently we realise that in our system the derived exchange rates change very slowly – so how about telling Oracle that the exchange rate function is a deterministic function (it’s nearly true), or, better still perhaps, experimenting with the PL/SQL Function Result Cache.

(Note very carefully, however, that the Database Administrators’ Manual for 11.2 makes the following comment about using the deterministic keyword with PL/SQL functions)

DETERMINISTIC

Tells the optimizer that the function returns the same value whenever it is invoked with the same parameter values (if this is not true, then specifying DETERMINISTIC causes unpredictable results).

Look carefully at that “unpredictable” – it’s true … but you might not realise it until too late. Our PL/SQL function is NOT deterministic – after all a function that queries the database to produce a result may produce a different result every time it executes if someone keeps changing the underlying data – but we might wave our hands a bit and say that the table isn’t going to change while we’re running our report so it’s okay to pretend it’s deterministic, we might even make it a read-only table for the duration. Similar thoughts should cross our minds about declaring a PL/SQL function to the result cache – even though the manual doesn’t say anything quite so explicitly threatening about the result cache. But what the heck, let’s just do it and see what happens:

create or replace function plsql_result_cache(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number
result_cache
is
        m_ret number;
begin
        select /*+ result cache trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_result_cache;
/

create or replace function plsql_deterministic(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number
deterministic
is
        m_ret number;
begin
        select /*+ det trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_deterministic;
/

...
select
        /*+ driver trace_this */
        wait_row_n(0.3,id),
        plsql_ordinary(v1,v2),
        plsql_deterministic(v1,v2),
        plsql_result_cache(v1,v2),
        (select /*+ scalar trace_this */ t2.cvt from t2 where t2.v1 = t1.v1 and t2.v2 = t1.v2) scalar_sub
from
...

All three functions returned the same set of results as the scalar subquery – and here’s the output from v$sql showing the impact of declaring a deterministic function and a result cache function (note that “result cache” is not a hint in the first statement, it’s just a convenient label):


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg          2          2              2
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1


The simple function ran 50 times, the deterministic function ran 18 times, and the result cache function ran twice. It required just two executions to get the two distinct results needed, after which the session used the result cache rather than calling the function again.

The deterministic function only remembers its results for the duration of a single database call – in this case the fetch – so on each fetch the session has to re-populate the session’s “deterministic cache”, which takes 2 calls for each fetch, a total of 9 fetches * 2 calls = 18 calls.

Clearly, if I can get away with it safely, the PL/SQL function result cache looks like a real winner, with the deterministic option coming a close second (given that real life ought to be using a significantly larger fetch arraysize). So what could possibly go wrong ? Two things – first, the results … and if the results can go wrong there’s not much point in talking about the second thing at the moment.

My query runs for 18 seconds, I’m going to start another session while it runs and update one of the rows in the t2 table a few seconds after my long-running query starts. Here’s the SQL I’ve run, an extract from the query output, and the results from v$sql:


update  t2 set
        cvt = 2
where   v1 = 'USD' 
;

commit;


WAIT_ROW_N(0.3,ID) PLSQL_ORDINARY(V1,V2) PLSQL_DETERMINISTIC(V1,V2) PLSQL_RESULT_CACHE(V1,V2) SCALAR_SUB
------------------ --------------------- -------------------------- ------------------------- ----------
                 1                     1                          1                         1          1
                 2                     0                          0                         0          0
                 3                     1                          1                         1          1
                 4                     0                          0                         0          0
                 5                     1                          1                         1          1
                 6                     0                          0                         0          0
                 7                     1                          1                         1          1
                 8                     0                          0                         0          0
                 9                     1                          1                         1          1
                10                     2                          0                         2          0
                11                     1                          1                         1          1
                12                     2                          0                         2          0
                13                     1                          1                         1          1
                14                     2                          2                         2          0
                15                     1                          1                         1          1
                16                     2                          2                         2          0


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg          4          4              4
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

The most important point is that we’ve got results that are not self-consistent – except for the scalar subquery results.

The SQL statements that are executed inside the PL/SQL functions do not run at the SCN of the driving query, each individual statement executes at its own starting SCN. This is an important point that is often overlooked when people write PL/SQL functions that are then called from SQL. The inline scalar subquery, on the other hand, always runs as at the start SCN of the driving query no matter how many times or how late in the lifetime of the driving query it runs.

If we examine the results we can see that the ordinary PL/SQL function and the result cache PL/SQL function changed their output the moment the commit took place (you’ll have to take my word on that, of course), but the deterministic function seemed to delay slightly. We can also see that the number of executions for the ordinary and deterministic functions didn’t change, but the result cache function doubled its executions.

Because of the way I’ve created my data and defined the function, the ordinary function executes its SQL once for every row, while the deterministic function executes its SQL twice per fetch – once for each pair of input values (though the first fetch from SQL*Plus is a special case) – and then remembers the outputs for the rest of that fetch. This is why there is a delay before the deterministic function changes its output, and why the change doesn’t introduce any extra calls to the SQL – it was going to re-execute for both values on the fetch starting at id 13 whatever went on around it. The result cache function, on the other hand, gets an invalidation warning the moment the other session commits, so it re-executes its SQL as many times as needed to repopulate the bits of the cache that have been discarded – and that’s why we see the number of SQL calls doubling: the relevant part of the cache was identified by some sort of hash value for the statement with SQL_ID = ’49buxp3gba3cg’, so both results were discarded and reloaded even though only one of them actually had to change.

Critically every execution of the recursive statements runs at the then-current SCN – so when the underlying data changes our report sees those changes, the report is not self-consistent.

Fortunately there’s something we can do about this – if we want the whole report to operate at the same SCN all we need to do is freeze our session at a specific point in time with the command “set transaction read only;”. This has to be executed as the first statement of a transaction but if we can slip it in just before we start running our report we can ensure that all our SQL statements (including the recursive ones) run as at the same SCN until we issue a commit. When I repeated the data-changing experiment after setting the transaction read only the report ran to completion showing the correct results.
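
In terms of the model above that means doing something as simple as the following sketch – start the read-only transaction, run the report, then commit to release the read-only state:

set transaction read only;

select
        plsql_ordinary(v1,v2),
        plsql_deterministic(v1,v2),
        plsql_result_cache(v1,v2)
from    t1
where   rownum <= 50
;

commit;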

But there was a significant change in the output from v$sql:


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg         44         44             44
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

I did my update just after the first batch of rows had appeared on screen – notice how the result cache SQL has executed 44 times instead of 2 (or 4) times. When I set my transaction to read only it looks as if my session stopped using the result cache the moment the other session committed – and that’s a GOOD THING. If my session were able to continue using the result cache that would mean one of two things: either I would be seeing a result created by another user – which would be too new for me, or every other session would be seeing the results from my session – which would (typically) be out of date for them. The session seems to have protected itself from the risk of a result cached PL/SQL function producing inconsistent results – but the workload changed the moment another session committed a change to the data we were interested in.

At that point I stopped investigating “set transaction read only” until a couple of days later when I realised that there was one more thing I had to test – when I changed the data from another session I didn’t check to see what that session would see when it executed the cached function, so I modified the code run by the other session to do the following:


update  t2 set 
        cvt = 2 
where   v1 = 'USD' 
;

commit;

execute dbms_lock.sleep(6)
execute dbms_output.put_line(plsql_result_cache('USD','GBP'))

The other session goes to sleep for a while (to let the reporting session get through a little more work) and then calls the function. I was very pleased to see that the session returned the correct result – the value 2 that it had just written to the table. But what I got from the reporting session wasn’t so good:

WAIT_ROW_N(0.3,ID) PLSQL_ORDINARY(V1,V2) PLSQL_DETERMINISTIC(V1,V2) PLSQL_RESULT_CACHE(V1,V2) SCALAR_SUB
------------------ --------------------- -------------------------- ------------------------- ----------
                 1                     1                          1                         1          1
                 2                     0                          0                         0          0
                 3                     1                          1                         1          1
                 4                     0                          0                         0          0
                 5                     1                          1                         1          1
                 6                     0                          0                         0          0
                 7                     1                          1                         1          1
                 8                     0                          0                         0          0
...
                24                     0                          0                         0          0
                25                     1                          1                         1          1
                26                     0                          0                         0          0
                27                     1                          1                         1          1
                28                     0                          0                         0          0
                29                     1                          1                         1          1
                30                     0                          0                         2          0
                31                     1                          1                         1          1
                32                     0                          0                         2          0

SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ---------- ---------- -------------- --------------------------------
49buxp3gba3cg         32         32             32 SELECT /*+ result cache trace_th
                                                   is */ CVT FROM T2 WHERE V1 = :B2
                                                    AND V2 = :B1

49buxp3gba3cg          1          1              1 SELECT /*+ result cache trace_th
                                                   is */ CVT FROM T2 WHERE V1 = :B2
                                                    AND V2 = :B1

2sh7bm59dkwhw         18         18             18 SELECT /*+ det trace_this */ CVT
                                                    FROM T2 WHERE V1 = :B2 AND V2 = 
                                                    :B1    

f1bz07bk5rbth         50         50             50 SELECT /*+ ordinary trace_this * 
                                                   / CVT FROM T2 WHERE V1 = :B2 AND
                                                    V2 = :B1

I changed t2 just after the first batch of rows had appeared (just after id 6), then called the function a few seconds later – and as I called the function from the other session it queried the data (the second child to 49buxp3gba3cg, executed just once above) and reloaded the result cache. At that moment (visible at id 30) the first session found data in the result cache and stopped re-executing its queries. When my session went read only it protected other sessions from the past by NOT re-populating the result cache as it ran its queries – but if it found something in the result cache it used it (notice how it has recorded 32 executions of the query rather than 44 – it found something in the result cache on the last 12 calls of the function). The protection only goes one way.

Observations

Using PL/SQL functions in the select list to encapsulate complex calculations that query the database is not something you can do casually. You have no guarantee that you will end up with a self-consistent result unless you take action to deal with the threats introduced by concurrent activity – ideally all tables involved should be set to read-only (possible from 11.1 onwards, though you can set a whole tablespace read-only in earlier versions: neither strategy is commonly viable). If you decide that you can work around those threats you still have the performance threat implied by the need to do some complex work for each driving row of your result set. For a report the simple solution to consistency is to “freeze” the session SCN by setting your session (transaction) into read only mode.

Once you’ve dealt with the consistency problem you can start to address the performance problem by claiming that your functions are deterministic. You might be tempted to use the PL/SQL Result Cache to give you an even larger performance boost, but if you do you really have to make the table (or tablespace) read-only to be protected from read-consistency problems. The deterministic strategy may not be as dramatic in its effects as the result cache strategy but, given a reasonably sized fetch array, the performance benefit you get may be all you really need.

Whatever else you do, there’s an important underlying threat to remember. The moment you create a PL/SQL function that uses the result cache or deterministic option you have to ensure that nobody uses that function without ensuring that their code has handled the consistency threat properly. It’s too easy to forget, with the passing of time, that certain procedures have to be adopted when particular coding strategies are used.

Left as Exercise

I was planning to write a further article going into some detail about using dbms_flashback.enable_at_time(systimestamp) instead of “set transaction read only” – a mechanism that might be used to achieve the same read-consistency requirement though, perhaps, used less frequently than the older, less flexible option. I was also planning to include notes in the same article about the effect of “alter session set isolation_level = serializable” which some systems probably use to get read-consistency across multiple statements while writing results back to the database.
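
For reference, the basic pattern of the flashback approach is a sketch like the following, used in place of the “set transaction read only” call (it needs execute privilege on dbms_flashback):

execute dbms_flashback.enable_at_time(systimestamp)

select  plsql_result_cache(v1,v2)
from    t1
where   rownum <= 50
;

execute dbms_flashback.disable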

Both strategies run into the same issue as “set transaction read only”, with the added effect that your session (the one that has tried to “fix” its SCN) will repopulate the cache, so not only could you see newer results from other sessions in the cache; other sessions could see older results because you put them into the cache.

I’m not going to write up these demonstrations (which require fairly simple modifications to the code supplied above) as all the phenomena have been recorded as bugs on MoS (along with the GTT problem from my previous post, and a documentation bug for the PL/SQL manual to clarify the various threats):

  • Bug 21905592 : PL/SQL RESULT CACHE MUST BE BYPASSSED WHEN SESSION SCN NOT CURRENT
  • Bug 21905695 : BAD X-SESSION RESULTS W/ RESULT-CACHED PL/SQL FN AND “SET TXN READ ONLY”
  • Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT
  • Bug 21885173 : THE LNPLS SECTION “RESTRICTIONS ON RESULT-CACHED FUNCTIONS” NEEDS RE-WRITE

Footnote

I haven’t mentioned it here, but another strategy for reducing the number of PL/SQL calls is simply to wrap the function call inside a scalar subquery, along the lines of:

select
       (select plsql_ordinary(v1, v2) from dual),
       ...

Provided you don’t have many combinations of (v1,v2) to handle, and with a little luck with Oracle’s internal hashing function, you could find that scalar subquery caching reduces your execution count from once per row to once per combination. Note that the function is the “ordinary” function, not the “fake deterministic” one; internally Oracle uses the same hashing strategy for remembering the results, but the duration of the scalar subquery cache is the statement rather than the fetch.
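To make the idea a little more concrete, here’s roughly what the two versions of a query might look like (driving_table and its columns are invented for illustration):

rem     Direct call: the function may execute once for every row fetched

select
        t.v1, t.v2,
        plsql_ordinary(t.v1, t.v2)                      result
from
        driving_table   t
;

rem     Wrapped in a scalar subquery: Oracle can cache the result for each distinct (v1, v2)

select
        t.v1, t.v2,
        (select plsql_ordinary(t.v1, t.v2) from dual)   result
from
        driving_table   t
;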

 

September 22, 2015

Result Cache

Filed under: 12c,Bugs,Infrastructure,Oracle,Performance — Jonathan Lewis @ 10:11 am BST Sep 22,2015

Yesterday I thought I’d spend half an hour before breakfast creating a little demonstration of a feature; some time about midnight I felt it was time to stop because I’d spent enough time chasing around a couple of bugs that produced wrong results in a variety of ways. Today’s short post is just a little warning: be VERY careful what you do with the PL/SQL result cache – if you use the results of database queries in the cache you may end up with inconsistent results in your application. Here’s one very simple example of what can go wrong, starting with a little script:


alter system flush shared_pool;

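-- truncate first: a GTT that still has rows bound to a session can't be dropped until the binding is released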
truncate table gtt1;
drop table gtt1;
create global temporary table gtt1 (n1 number)
on commit preserve rows
;

insert into gtt1 values(1);
commit;

execute dbms_stats.gather_table_stats(user,'gtt1');


create or replace function f_cache
return number 
result_cache 
-- relies_on (gtt1)
is
        m_ret number;
begin
        select  max(n1)
        into    f_cache.m_ret
        from    gtt1
        ;

        return f_cache.m_ret;
end;
/

execute dbms_output.put_line(f_cache)

Here’s the output from a session that’s just connected and run this script (the table already existed from an earlier run):

SQL> @temp

System altered.


Table truncated.


Table dropped.


Table created.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Function created.

1

PL/SQL procedure successfully completed.

SQL> 

Let’s call this session A, and I’m going to connect with another session which I’ll call session B. The following extracts are cut-and-pastes as I alternate between sessions:

Session B:

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

SQL> insert into gtt1 values(0);

1 row created.

SQL> execute dbms_output.put_line(f_cache);
0

PL/SQL procedure successfully completed.


Session B has just seen the data inserted into a global temporary table by session A; but after inserting a row into the table it now sees what it perceives to be the correct answer.

Session A:

SQL> truncate table gtt1;

Table truncated.

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

Despite truncating the table, session A still sees the data even though it has been irrecoverably eliminated.

Session B (where I hadn’t yet committed):


SQL> commit;

Commit complete.

SQL> 

Session A (where I’ve done nothing new):

SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL> 

The row has finally “disappeared” because session B committed.

Session B (where I haven’t done anything since committing):


SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL> 

Session B no longer sees the data because it’s now seeing what session A has just seen.

Warning.

This is just one of several ways I managed to get surprising inconsistencies when using the (cross-session) PL/SQL Result Cache. Oracle (in 12c) is supposed to know what the cache relies on and react accordingly, but it doesn’t manage to do it properly (even if you include the relies_on clause) – if you query the database in your PL/SQL you may find strange things happen.

The most important point to note in this example is that a session that wasn’t necessarily doing anything wrong got the wrong results because of the actions of another session that Oracle should have protected it from.

The good thing about this example is that it’s documented (sort of) – as it says in the manual: “to be result cached … does not reference … temporary tables ..”. So that should warn people off copying my example; on the other hand the problem I’m seeing arises because Oracle seems to be trying to use the result cache when the manual suggests it shouldn’t be.

Conclusion

Do not mix the PL/SQL result cache with database queries. The cache is public but (unlike the buffer cache) it is not guaranteed to give you read-consistency.

If you think this was an unrealistic example and you don’t need to worry about it – I’ll post a couple more examples in the next couple of weeks. They’ll be slightly special cases again, but I find the Oracle world is full of special cases.

Update

This behaviour is now visible on MoS as “Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT”
