Oracle Scratchpad

June 22, 2016

Conditional SQL- 6

Filed under: Conditional SQL,Execution plans,Oracle,Performance — Jonathan Lewis @ 2:16 pm BST Jun 22,2016

An odd little anomaly showed up on the OTN database forum a few days ago where a query involving a table covered by Oracle Label Security (OLS) seemed to wrap itself into a non-mergeable view when written using traditional Oracle SQL, but allowed for view-merging when accessed through ANSI standard SQL. I don’t know why there’s a difference but it did prompt a thought about non-mergeable views and what I’ve previously called “conditional SQL” – namely SQL which holds a predicate that should have been tested in the client code and not passed to the database engine.

The thought was this – could the database engine decide to do a lot of redundant work if you stuck a silly predicate inside a non-mergeable view: the answer turns out to be yes. Here’s a demonstration I’ve run on 11g and 12c:


rem
rem     Script:         conditional_fail.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016
rem

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1e5)         n1,
        lpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1e3)         n1,
        lpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;
-- gather simple statistics (not needed in 12c) 

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t2_i1 on t2(id);

variable b1 number
variable b2 number

exec :b1 := 1; :b2 := 0

There’s nothing terribly significant about the data, beyond the fact that I’ve got a “small” table and a “large” table that I can use to encourage the optimizer to do a hash join. I’ve also created a couple of bind variables and set them to values that ensure that we can see that b1 is definitely not smaller than b2. So here’s a simple query – with a mergeable inline view in the first instance which is then hinted to make the view non-mergeable.


select
        t1.n1, count(*), sum(v1.n1)
from
        t1,
        (select t2.id, t2.n1 from t2 where :b1 < :b2) v1
where
        t1.n1 = 0
and     v1.id = t1.id
group by
        t1.n1
;

select
        t1.n1, count(*), sum(v1.n1)
from
        t1,
        (select /*+ no_merge */ t2.id, t2.n1 from t2 where :b1 < :b2) v1
where
        t1.n1 = 0
and     v1.id = t1.id
group by
        t1.n1
;

Clearly, for our values of b1 and b2, the query will not return any data. In fact we can go further and say that the presence of the “bind variable predicate” in the inline view either has no effect on the volume of data returned or it eliminates all the data. But the presence of the no_merge hint makes a difference to how much work Oracle does for the “no data” option. Here are the two plans, pulled from the memory of an 11g instance after enabling rowsource execution statistics – first when the view is mergeable:


-----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |      1 |        |   508 (100)|      0 |00:00:00.01 |
|   1 |  SORT GROUP BY NOSORT          |       |      1 |      1 |   508   (2)|      0 |00:00:00.01 |
|*  2 |   FILTER                       |       |      1 |        |            |      0 |00:00:00.01 |
|   3 |    NESTED LOOPS                |       |      0 |    100 |   508   (2)|      0 |00:00:00.01 |
|   4 |     NESTED LOOPS               |       |      0 |    100 |   508   (2)|      0 |00:00:00.01 |
|*  5 |      TABLE ACCESS FULL         | T1    |      0 |    100 |   208   (4)|      0 |00:00:00.01 |
|*  6 |      INDEX RANGE SCAN          | T2_I1 |      0 |      1 |     2   (0)|      0 |00:00:00.01 |
|   7 |     TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |     3   (0)|      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:B1<:B2)
   5 - filter("T1"."N1"=0)
   6 - access("T2"."ID"="T1"."ID")

Notice how, despite t2 being the second table in the join, the bind variable predicate has worked its way to the top of the execution plan and execution has terminated after the run-time engine has determined that 1 is not less than zero.

Compare this with the plan when the view is non-mergeable:

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |      1 |        |  2300 (100)|      0 |00:00:00.01 |    1599 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |         |      1 |      1 |  2300   (4)|      0 |00:00:00.01 |    1599 |       |       |          |
|*  2 |   HASH JOIN            |         |      1 |    100 |  2300   (4)|      0 |00:00:00.01 |    1599 |  2061K|  2061K| 1109K (0)|
|   3 |    JOIN FILTER CREATE  | :BF0000 |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|*  4 |     TABLE ACCESS FULL  | T1      |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|   5 |    VIEW                |         |      1 |   1000K|  2072   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|*  6 |     FILTER             |         |      1 |        |            |      0 |00:00:00.01 |       0 |       |       |          |
|   7 |      JOIN FILTER USE   | :BF0000 |      0 |   1000K|  2072   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2      |      0 |   1000K|  2072   (3)|      0 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."ID"="T1"."ID")
   4 - filter("T1"."N1"=0)
   6 - filter(:B1<:B2)
   8 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))

Thanks to the no_merge hint the bind variable predicate has not been promoted to the top of the plan, so the run-time engine has produced a plan that requires it to access data from table t1 before visiting table t2. In fact the optimizer has decided to do a hash join gathering all the relevant data from t1 and building an in-memory hash table before deciding that 1 is greater than zero and terminating the query.

Be careful if you write SQL that compares bind variables (or other pseudo-constants such as calls to sys_context) with bind variables (etc.); you may find that you’ve managed to produce code that forces the optimizer to do work that it could have avoided if only it had found a way of doing that comparison at the earliest possible moment.

These plans were from 11g, but 12c can behave the same way although, with my specific data set, I had to add the no_push_pred() hint to the query to demonstrate the effect of hash join appearing.

Footnote 1

An interesting side effect of this plan is that it has also allowed a Bloom filter to appear in a serial hash join – not something you’d normally expect to see, so I changed the predicate to :b1 > :b2 to see if the no_merge hint combined with the Bloom filter was faster than merging without the Bloom filter. Here are the two plans, pulled from memory. Running this on 11g I had to hint the hash join when I removed the no_merge hint:


Non-mergeable view - Bloom filter appears
------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |      1 |        |  2300 (100)|      1 |00:00:00.12 |   17725 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |         |      1 |      1 |  2300   (4)|      1 |00:00:00.12 |   17725 |       |       |          |
|*  2 |   HASH JOIN            |         |      1 |    100 |  2300   (4)|    100 |00:00:00.12 |   17725 |  2061K|  2061K| 1106K (0)|
|   3 |    JOIN FILTER CREATE  | :BF0000 |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|*  4 |     TABLE ACCESS FULL  | T1      |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|   5 |    VIEW                |         |      1 |   1000K|  2072   (3)|   1605 |00:00:00.10 |   16126 |       |       |          |
|*  6 |     FILTER             |         |      1 |        |            |   1605 |00:00:00.09 |   16126 |       |       |          |
|   7 |      JOIN FILTER USE   | :BF0000 |      1 |   1000K|  2072   (3)|   1605 |00:00:00.08 |   16126 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2      |      1 |   1000K|  2072   (3)|   1605 |00:00:00.07 |   16126 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."ID"="T1"."ID")
   4 - filter("T1"."N1"=0)
   6 - filter(:B1<B2)
   8 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))

View merging allowed - no Bloom filter
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |  2300 (100)|      1 |00:00:07.56 |   17725 |       |       |          |
|   1 |  SORT GROUP BY NOSORT|      |      1 |      1 |  2300   (4)|      1 |00:00:07.56 |   17725 |       |       |          |
|*  2 |   FILTER             |      |      1 |        |            |    100 |00:00:07.56 |   17725 |       |       |          |
|*  3 |    HASH JOIN         |      |      1 |    100 |  2300   (4)|    100 |00:00:07.56 |   17725 |  2061K|  2061K| 1446K (0)|
|*  4 |     TABLE ACCESS FULL| T1   |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1000K|  2072   (3)|   1000K|00:00:01.94 |   16126 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:B1<:B2)
   3 - access("T2"."ID"="T1"."ID")
   4 - filter("T1"."N1"=0)

Things to note – the Bloom filter eliminated all but 1,605 rows from the tablescan before passing them to the hash join operation to probe the hash table; the run time of the query without filtering was 7.56 seconds (!) compared to 0.12 seconds with the Bloom filter. Fortunately I decided that this was too good to be true BEFORE I published the results and decided to re-run the tests with statistics_level set back to typical and found that most of the difference was CPU time spent on collecting rowsource execution statistics. The query with the Bloom filter was still faster, but only just – the times were more like 0.09 seconds vs. 0.12 seconds.

Footnote 2

The source of the problem on OTN was that as well as using OLS the query in question included a user-defined function. Since you can write a user-defined function that “spies” on the data content and uses (e.g.) dbms_output to write data to the terminal this poses a security risk; if a predicate calling that function executed before the security predicate had been tested then your function could output data that your query shouldn’t be able to report. To avoid this security loophole Oracle restricts the way it merges views (unless you set optimizer_secure_view_merging to false). For a more detailed explanation and demonstration of the issues, see this item on Christian Antognini’s blog.

I still don’t know why the ANSI form of the query managed to bypass this issue, but the predicate with the user-defined function was applied as a filter at the very last step of the plan, so perhaps there was something about the transformation Oracle took to get from ANSI syntax to its internal syntax (with cascading lateral views) that made it possible for the optimizer to recognize and eliminate the security threat efficiently.

June 20, 2016

Plan Shapes

Filed under: Execution plans,Oracle — Jonathan Lewis @ 12:58 pm BST Jun 20,2016

There are a number of articles, webinars, and blogs online about how to read execution plans, but many of them seem to stop after the the minimum description of the simplest type of plan, so I thought I’d throw out a brief comment on a couple of the slightly more complicated things that are likely to appear fairly commonly because you sometimes find plans with very similar shapes but extremely different interpretation.

First: select with scalar subqueries in the select list (there’s no need to worry about what the table definitions look like):


rem
rem     Script:         plan_shapes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem

select
        id, n1,
        (select /*+  no_unnest */ max(n1) from t2 where t2.id = t1.n1)  t2_max,
        (select /*+  no_unnest */ max(n1) from t3 where t3.id = t1.n1)  t3_max
from
        t1
where
        id between 100 and 119
;


--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE                      |       |     20 |      1 |     20 |00:00:00.01 |      63 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     20 |      2 |     40 |00:00:00.01 |      63 |
|*  3 |    INDEX RANGE SCAN                  | T2_I1 |     20 |      2 |     40 |00:00:00.01 |      23 |
|   4 |  SORT AGGREGATE                      |       |     20 |      1 |     20 |00:00:00.01 |      83 |
|   5 |   TABLE ACCESS BY INDEX ROWID BATCHED| T3    |     20 |      3 |     60 |00:00:00.01 |      83 |
|*  6 |    INDEX RANGE SCAN                  | T3_I1 |     20 |      3 |     60 |00:00:00.01 |      23 |
|   7 |  TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |     21 |     20 |00:00:00.01 |       8 |
|*  8 |   INDEX RANGE SCAN                   | T1_I1 |      1 |     21 |     20 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("ID">=100 AND "ID"<=119)

We have a select statement, with two columns in the select list generated by (correlated) scalar subqueries.

The overall shape of the plan shows the driving query as the last child plan for the SELECT (operations 7-8). The first and second child plans are the plans for the two scalar subqueries in turn (and the order the sub-plans appear is the order of the scalar subqueries in the select list). In this case the main query returned 20 rows (A-Rows=20), and the scalar subqueries executed 20 times each. There are a few other details we could mention, but the key feature of the plan is that the driver is the last sub-plan.

Second: update with scalar subqueries:


update t1
set
        n1 = (select max(n1) from t2 where t2.id = t1.n1),
        v1 = (select max(v1) from t3 where t3.id = t1.n1)
where
        id between 1000 and 1999
;

---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |       |      1 |        |      0 |00:00:00.13 |   10361 |
|   1 |  UPDATE                               | T1    |      1 |        |      0 |00:00:00.13 |   10361 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |   1001 |   1000 |00:00:00.01 |      22 |
|*  3 |    INDEX RANGE SCAN                   | T1_I1 |      1 |   1001 |   1000 |00:00:00.01 |       4 |
|   4 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.04 |    3672 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    916 |      2 |   1832 |00:00:00.03 |    3672 |
|*  6 |     INDEX RANGE SCAN                  | T2_I1 |    916 |      2 |   1832 |00:00:00.01 |    1840 |
|   7 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.05 |    4588 |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| T3    |    916 |      3 |   2748 |00:00:00.04 |    4588 |
|*  9 |     INDEX RANGE SCAN                  | T3_I1 |    916 |      3 |   2748 |00:00:00.01 |    1840 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1000 AND "ID"<=1999)
   6 - access("T2"."ID"=:B1)
   9 - access("T3"."ID"=:B1)


In this statement we update two columns by executing (correlated) scalar subqueries. The most important feature of interpreting this plan is that it is the exact opposite of the earlier select statement. In this plan the first subplan is the driving part of the statement – it’s the part of the plan that tells us how to find rows to be updated (and we find 1,000 of them); the 2nd and 3rd sub-plans correspond to the subqueries in the two columns whose value we set. In this case we happen to get some benefit from scalar subquery caching so the two subqueries each run 916 times. (The fact that both subqueries run the same number of times is not a coincidence – the caching benefit is dependent on the value(s) used for driving the subqueries, and that’s the t1.n1 column in both cases.)

Finally: because people do all sorts of complicated things, and it’s easy to get deceived if you read the plan without seeing the SQL, one more example (starting with the plan):


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |       |      1 |        |      0 |00:00:00.14 |   11257 |
|   1 |  UPDATE                               | T1    |      1 |        |      0 |00:00:00.14 |   11257 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |   1001 |   1000 |00:00:00.01 |      22 |
|*  3 |    INDEX RANGE SCAN                   | T1_I1 |      1 |   1001 |   1000 |00:00:00.01 |       4 |
|   4 |   SORT AGGREGATE                      |       |   1103 |      1 |   1103 |00:00:00.06 |    5519 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T3    |   1103 |      3 |   3306 |00:00:00.04 |    5519 |
|*  6 |     INDEX RANGE SCAN                  | T3_I1 |   1103 |      3 |   3306 |00:00:00.01 |    2213 |
|   7 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.11 |    9191 |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    916 |      2 |   1832 |00:00:00.03 |    3672 |
|*  9 |     INDEX RANGE SCAN                  | T2_I1 |    916 |      2 |   1832 |00:00:00.01 |    1840 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1000 AND "ID"<=1999)
   6 - access("T3"."ID"=:B1)
   9 - access("T2"."ID"=:B1)


This plan looks very like the previous plan – it’s an update so the first sub-plan will be identifying the rows to be updated, but then what ?

The easiest assumption – always one worth considering, providing you remember that there are other possibilities – is that we have two subqueries executing for each row we update – but in this example there’s a little clue in the numbers that this isn’what the statement does. The driving query (operations 2 – 3) identifies 1,000 rows, so how can operations 4 – 6 start 1,103 times each ? Something more complicated is going on.

I’ve combined the features of the first two examples. I have a single set subquery, but it contains a scalar subquery in the select list – so operations 4 – 9 are the sub-plan for a single select statement – and a select statement with a scalar subquery in the select list puts the main plan last (operations 7 – 9) and its scalar subquery sub-plan first (operations 4 – 6). Here’s the SQL:


update t1
set
        (n1,v1) = (
                select
                        max(n1),
                        max((select /*+  no_unnest */ max(v1) from t3 where t3.id = t2.n1))
                from    t2
                where   t2.id = t1.n1
        )
where
        id between 1000 and 1999
;

What happens here is that the driving body of the update statement identifies 1,000 rows so the scalar subquery against t2 should execute 1,000 times; thanks to scalar subquery caching, though, it only executes 916 times. Each time it executes it finds 2 row and for each of those rows it executes the scalar subquery against t3 which, fortunately, also benefits from its own scalar subquery caching and so runs only 1,103 times in total

I could go on, of course, with increasingly complex examples – for example scalar subqueries that contain decode() calls with scalar subqueries as their inputs; where clauses which compare the results of scalar subqueries, and so on. It can get quite difficult to see, even with the SQL in front of you, what the plan is trying to tell you so I’ll leave you with one thought: always use the qb_name() hint to name every single subquery so that, if interpretation of the plan gets a bit difficult, you can use the ‘alias’ formatting option in the call to dbms_xplan to connect each table reference in the plan with the query block it came from in the query.

 

 

June 10, 2016

Uniquely parallel

Filed under: 12c,distributed,Execution plans,Oracle,Parallel Execution,Performance,Upgrades — Jonathan Lewis @ 7:36 am BST Jun 10,2016

Here’s a surprising (to me) execution plan from 12.1.0.2 – parallel execution to find one row in a table using a unique scan of a unique index – produced by running the following script (data creation SQL to follow):


set serveroutput off
set linesize 180
set trimspool on
set pagesize 60

alter session set statistics_level = all;

variable b1 number
exec :b1 := 50000

select /*+ parallel (3) */ id, v1 from t2 where id=:b1;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

break on dfo_number skip 1 on tq_id skip 1 on server_type

select
        dfo_number, tq_id, server_type, instance, process, num_rows
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

All I’ve done is enable rowsource execution statistics, set a bind variable to a value, query a table with a /*+ parallel(3) */ hint to find the one row that will be identified by primary key, and then reported the actual execution plan. When I first ran the test Oracle didn’t report the execution statistics correctly so I’ve also queried v$pq_tqstat to show the PX servers used and the flow of data through the plan. Here’s the plan, followed by the  results from v$pq_tqstat:


SQL_ID  0dzynh9d29pt9, child number 0
-------------------------------------
select /*+ parallel (3) */ id,v1 from t2 where id=:b1

Plan hash value: 247082613

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   1 |  PX COORDINATOR                   |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |  Q1,01 | P->S | QC (RAND)  |      0 |00:00:00.01 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | T2       |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   4 |     BUFFER SORT                   |          |      0 |        |  Q1,01 | PCWC |            |      0 |00:00:00.01 |
|   5 |      PX RECEIVE                   |          |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |
|   7 |        PX SELECTOR                |          |      0 |        |  Q1,00 | SCWC |            |      0 |00:00:00.01 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |      0 |      1 |  Q1,00 | SCWP |            |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("ID"=:B1)

Note
-----
   - Degree of Parallelism is 3 because of hint

DFO_NUMBER      TQ_ID SERVER_TYP   INSTANCE PROCES   NUM_ROWS
---------- ---------- ---------- ---------- ------ ----------
         1          0 Producer            1 P003            0
                                          1 P004            1
                                          1 P005            0
                      Consumer            1 P000            0
                                          1 P001            1
                                          1 P002            0

                    1 Producer            1 P000            0
                                          1 P001            1
                                          1 P002            0
                      Consumer            1 QC              1

As you can see the table access follows a unique scan of an index and, although the rowsource execution stats report zero starts for the unique scan, we can see from v$pq_tqstat that slave P004 acquired a “row” (actually a rowid) and passed it to slave P001 which then acquired a row from the table and passed that row to the query coordinator. Oracle really did execute a parallel query, starting and stopping a total of 6 sessions to perform a single unique index access.

You’ll notice operation 7 is one you’ve only seen in the latest version of Oracle. The PX SELECTOR was introduced in 12c to reduce the number of times a complex parallel query would funnel into the query coordinator (parallel to serial) and then fan out again (serial to parallel) generating a new data flow operation tree (DFO tree) spawning one or two new parallel server groups as it did so. To stop this happening a step that needs to serialise in a 12c parallel plan can nominate one of the existing PX server processes (from each set, if necessary) to do the job so that the same set of PX servers can carry on running the query without the need for a new DFO tree to appear.

This enhancement to parallel execution plans is a good idea – except when it appears in my silly little query and turns something that ought to be quick and cheap into a job that is far more resource-intensive than it should be.

At this point, of course, you’re probably wondering what kind of idiot would put a parallel() hint into a query that was doing nothing but selecting one row by primary key – the answer is: “the Oracle optimizer in 12c”. I discovered this anomaly while creating a demonstration of the way that a distributed parallel query has to serialise through a single database link even if the operations at the two ends of the link run parallel. Here’s the SQL I wrote for the full demonstration:


rem     Script:         distributed_pq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016

define m_remote='test@loopback'
define m_remote='orcl@loopback'

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,8,'0') as varchar2(8))         v1,
        cast(rpad('x',100) as varchar2(100))            padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

create table t2
as
select  *
from    t1
where   mod(id,100) = 0
;

alter table t2 add constraint t2_pk primary key(id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

set serveroutput off

select
        /*+ parallel(3) */
        t1.v1, t2.v1
from
        t1,
        t2@&m_remote
where
        mod(t1.id,10) = 0
and     t2.id = t1.id
and     mod(to_number(t2.v1),10) = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

If you want to run this demo you’ll need to do something about formatting the output; more importantly you’ll have to create a database link (with a loopback link) and set up a define identifying it at the line where I’ve got orcl@loopback and test@loopback (which are my 12c and 11g loopback links respectively).

Here’s the plan (with rowsource stats) I got from the 12c test:


----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   1 |  NESTED LOOPS         |          |      1 |     10 |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   2 |   PX COORDINATOR      |          |      1 |        |        |      |            |  10000 |00:00:00.11 |       7 |      0 |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |      0 |   1000 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
|   4 |     PX BLOCK ITERATOR |          |      3 |   1000 |  Q1,00 | PCWC |            |  10000 |00:03:17.72 |    1745 |   1667 |
|*  5 |      TABLE ACCESS FULL| T1       |     39 |   1000 |  Q1,00 | PCWP |            |  10000 |00:00:00.06 |    1745 |   1667 |
|   6 |   REMOTE              | T2       |  10000 |      1 |        |      |            |      0 |00:01:14.44 |       0 |      0 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(:Z>=:Z AND :Z<=:Z)
       filter(MOD("T1"."ID",10)=0)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   6 - SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1
       (accessing 'ORCL@LOOPBACK' )

Note
-----
   - Degree of Parallelism is 3 because of hint

I have hacked this output a little – the “Remote SQL” section didn’t get reported by display_cursor(), so I’ve inserted the remote sql I got from a call to dbms_xplan.display() after using explain plan to generate a plan. Note the /*+ shared(3) */ hint that appears in the remote SQL – that’s the internal version of a parallel(3) hint.

In 11g the query complete in 2.4 seconds, in 12c the query took nearly 75 seconds to run thanks to the 12c enhancement that allowed it to obey the hint! Looking at the time column (and ignoring the anomalous 3:17 at operation 4 – which might roughly be echoing 3 * 1:14) we can see that the time goes on the calls to the remote database (and a check of v$session_event shows this time spent in “SQL*Net message from db link”), so the obvious thing to do is check what actually happened at the remote database and we can do that by searching the library cache for a recognizable piece of the remote SQL – here’s the SQL to do that, with the results from 11g followed by the results from 12c:


SQL> select sql_id, child_number, executions, px_servers_executions, sql_text from v$sql
  2  where sql_text like '%SHARED%' and sql_text not like 'select sql_id%';

11g results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
c0f292z5czhwk            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE MOD(TO_NUMBER("V1"),10)=1 AND "ID"=:1


12c results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
7bk51w7vtagwd            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

7bk51w7vtagwd            1          0                 59995
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

It’s not surprising to see that the query has executed 10,000 times – that’s what we were told by the Starts statistic from dbms_output.display_cursor(), but 12c has 60,000 (with a little error) PX Servers executions of the statement. That’s 10,000 executions * degree 3 * the 2 slave sets we saw in my original execution plan. (It’s an odd little quirk of the two versions of Oracle that the order of predicates in the remote SQL was reversed between 11g and 12c – leading to two different SQL_IDs).

By enabling rowsource execution stats at the system level I was able to capture the remote execution plan with its stats:


SQL_ID  7bk51w7vtagwd, child number 0
-------------------------------------
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND
MOD(TO_NUMBER("V1"),10)=1

--------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  PX COORDINATOR                   |          |      0 |        |      0 |00:00:00.01 |       0 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  3 |    TABLE ACCESS BY INDEX ROWID    | T2       |  29983 |      1 |      0 |00:00:22.21 |    1000 |
|   4 |     BUFFER SORT                   |          |  29995 |        |    999 |00:00:21.78 |       0 |
|   5 |      PX RECEIVE                   |          |  29924 |      1 |    994 |00:00:21.21 |       0 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |
|   7 |        PX SELECTOR                |          |  29993 |        |    999 |00:00:06.08 |   19992 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |  29999 |      1 |   1000 |00:00:00.24 |   20000 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MOD(TO_NUMBER("V1"),10)=1)
   8 - access("ID"=:1)

Unlike the test case I started with, this output did show the number of starts (with a few missing) and the work done across the slaves. Our index probe had to do two buffer gets on every execution, and we have 10,000 executions of the query so 20,000 buffer gets on the index unique scan. Even though only one slave actually does any work with the PX Selector, all three slaves in that set seem to “start” the relevant operations. The definition of the data meant that only one index probe in 10 was successful so we only visited 1,000 buffers from the table. If you’re familiar with reading plans with rowsource execution stats you’ll appreciate that something has gone wrong in the reporting here – that 1,000 at operation 3 should read 21,000 because it ought to include the 20,000 from the index scan (at least, that’s what a serial plan would do).

If you’re still wondering why running this query as a parallel query should take so long – after all it’s only 10,000 executions in 70 seconds – bear in mind that Oracle has to allocate and deallocate 6 PX servers to new sessions each time it starts; the instance activity stats showed “logons cumulative” going up by 60,000 each time I ran the driving query: that’s about 850 logons (and log offs) per second. I don’t think my test machine would give a realistic impression of the impact of a couple of copies of this query running simultaneously, but when I tried the contention introduce increased the run time to 93 seconds.

tl;dr

Watch out for poor performance becoming disastrous for distributed parallel queries when you upgrade from 11g to 12c

 

 

.

June 7, 2016

Quiz Night

Filed under: 12c,Execution plans,Oracle,subqueries — Jonathan Lewis @ 7:35 pm BST Jun 7,2016

Here’s an execution plan from a recent OTN database forum posting:

 
------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |                    |     1 |   231 |   160   (6)| 00:00:02 |
|   1 |  UPDATE                   | GS_TABLE           |       |       |            |          |
|*  2 |   HASH JOIN SEMI          |                    |     1 |   231 |   130   (0)| 00:00:02 |
|*  3 |    TABLE ACCESS FULL      | GS_TABLE           |     5 |   895 |   123   (0)| 00:00:02 |
|*  4 |    TABLE ACCESS FULL      | UPDATEDPROGRAMCODE |   850 | 44200 |     7   (0)| 00:00:01 |
|*  5 |   VIEW                    |                    |    11 |  2024 |     8  (13)| 00:00:01 |
|*  6 |    WINDOW SORT PUSHED RANK|                    |    11 |   440 |     8  (13)| 00:00:01 |
|*  7 |     FILTER                |                    |       |       |            |          |
|*  8 |      TABLE ACCESS FULL    | UPDATEDPROGRAMCODE |    11 |   440 |     7   (0)| 00:00:01 |
|   9 |   VIEW                    |                    |   850 |  1138K|     9  (23)| 00:00:01 |
|  10 |    SORT ORDER BY          |                    |   850 |   685K|     9  (23)| 00:00:01 |
|* 11 |     VIEW                  |                    |   850 |   685K|     8  (13)| 00:00:01 |
|  12 |      WINDOW SORT          |                    |   850 | 47600 |     8  (13)| 00:00:01 |
|* 13 |       FILTER              |                    |       |       |            |          |
|* 14 |        TABLE ACCESS FULL  | UPDATEDPROGRAMCODE |   850 | 47600 |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Unfortunately the originator of this plan declined to show us the query or answer any questions about where the work was going, but did confirm a speculative comment I had made that the instance was 12c. So the question is this: can you spot what it was that made me think that the plan came from 12c ?

I have to say, by the way, that there may be ways to get this plan from 11g, it was just that my first impression was that it was probably 12c and I didn’t attempt to come up with a way of getting a similar plan from 11g. (And, as far as the general shape of the plan is concerned, I can think of two different types of query that could produce it.)

Footnote

You are allowed to prove me wrong.

Answer

(Which might be me showing ignorance rather than inspiration)

The basic shape of the plan suggests to me that the query is of the form:

update gs_table 
set     col1 = (select from updatedprogramcode),
        col2 = (select from updatedprogramcode)
where   exists (select from updatedprogramcode)         -- possibly "where IN (subquery)"
;

There are a couple of variations in how the “set” subquery content might vary, and I’ll write up a short blog about that later.

Having noted this basic shape, I then noted that the subqueries involved analytic functions – as indicated by the WINDOW SORT operations; moreover one of them used a PUSHED RANK option and the other was embedded in a non-mergeable VIEW (operation 11). Updates with subqueries generally involve correlated columns – and prior to 12c there are some restrictions on how far up the tree the correlation can go. Here’s a sample query (using two tables that I’ve cloned from all_objects) to demonstrate:


update t1 set
        data_object_id = (
                select  objno
                from    (
                        select
                                object_id objno,
                                row_number() over (order by  object_id desc) rn
                        from
                                t2
                        where
                                t2.object_type = t1.object_type
                        )
                where rn = 1
        )
/

We need to embed the inner select statement in an inline view because we want to use the result of the row_number() analytic function in a filter predicate, but in Oracle 11g the reference to t1.object_id can’t correlate back to the outer t1 table, while in 12c this isn’t a problem. Here’s the 12c plan, followed by the 11g error:


12c Plan (autotrace)
----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |      |  5000 | 45000 | 70012  (15)| 00:04:34 |
|   1 |  UPDATE                   | T1   |       |       |            |          |
|   2 |   TABLE ACCESS FULL       | T1   |  5000 | 45000 |    12   (0)| 00:00:01 |
|*  3 |   VIEW                    |      |     1 |    26 |    13   (8)| 00:00:01 |
|*  4 |    WINDOW SORT PUSHED RANK|      |   556 |  6116 |    13   (8)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL     | T2   |   556 |  6116 |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("RN"=1)
   4 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID")
              DESC )<=1)
   5 - filter("T2"."OBJECT_TYPE"=:B1)


11g Error
---------
                                t2.object_type = t1.object_type
                                                 *
ERROR at line 11:
ORA-00904: "T1"."OBJECT_TYPE": invalid identifier

Notice, by the way that my predicate “rn = 1” has resulted in the WINDOW SORT PUSHED RANK that appeared in the original plan.

In case I haven’t said it enough times: this is a just a rapid inference I drew from looking briefly at the plan and I haven’t tried hard to work out whether there is a way to get a plan like this in 11g. It was nice being proved right by the follow-up post from the OP, but my guess may have been right by accident – I’d rather be proved wrong than carry on thinking I’d got it right when I hadn’t … so feel free to supply an example in the comments.

 

May 11, 2016

dbms_xplan

Filed under: dbms_xplan,Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 12:22 pm BST May 11,2016

My favourite format options for dbms_xplan.display_cursor().

This is another of those posts where I tell you about something that I’ve frequently mentioned but never documented explicitly as a good (or, at least, convenient) idea. It also another example of how easy it is to tell half the story most of the time when someone asks a “simple” question.

You’re probably familiar with the idea of “tuning by cardinality feedback” – comparing the predicted data volumes with the actual data volumes from an execution plan – and I wrote a short note about how to make that comparison last week; and you’re probably familiar with making a call to dbms_xplan.display_cursor() after enabling the capture of rowsource execution statistics (in one of three ways) for the execution of the query, and the format parameter usually suggested for the call is ‘allstats last’ to get the execution stats for the most recent execution of the query. I actually like to see the Cost column of the execution plan as well, so I usually add that to the format, so (with all three strategies shown for an SQL*Plus environment):

set linesize 180
set trimspool on
set pagesize 60
set serveroutput off

alter session set "_rowsource_execution_statistics"=true;
alter session set statistics_level=all;

select /*+ gather_plan_statistics */ * from user_tablespaces;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

So what do we often forget to mention:

  • For SQL*Plus it is important to ensure that serveroutput is off
  • The /*+ gather_plan_statistics */ option uses sampling, so may be a bit inaccurate
  • The two accurate strategies may add a significant, sometimes catastrophic, amount of CPU overhead
  • This isn’t appropriate if the query runs parallel

For a parallel query the “last” execution of a query is typically carried out by the query co-ordinator, so the rowsource execution stats of many (or all) of the parallel execution slaves are likely to disappear from the output. If you’re testing with parallel queries you need to add some “tag” text to the query to make it unique and omit the ‘last’ option from the format string.

Now, a common suggestion is that you need to add the ‘all’ format option instead – but this doesn’t mean “all executions” it means (though doesn’t actually deliver) all the data that’s available about the plan. So here’s an execution plans produced after running a parallel query and using ‘allstats all’ as the format option (t1 is a copy of all_objects, and this demo is running on 12.1.0.2).

SQL_ID  51u5j42rvnnfg, child number 1
-------------------------------------
select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
group by object_type order by object_type

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |       |   113 (100)|          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |       |            |          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   8 - SEL$1 / T1@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   2 - (#keys=0) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   3 - (#keys=1; rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   4 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   5 - (#keys=1) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   6 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   7 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]
   8 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]

Note
-----
   - Degree of Parallelism is 2 because of hint


48 rows selected.

You’ll notice we’ve reported the “alias” and “projection” information – those are two of the format options that you can use with a + or – to include or exclude if you want. We’ve also got E-Bytes and E-time columns in the body of the plan. In other words (at least in my opinion) we’ve got extra information that makes the output longer and wider and therefore harder to read.

The format string I tend to use for parallel query is ‘allstats parallel cost’ – which (typically) gives something like the following:

SQL_ID  51u5j42rvnnfg, child number 1
-------------------------------------
select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
group by object_type order by object_type

Plan hash value: 2919148568

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |   113 (100)|        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   113   (9)|  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   113   (9)|  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |     30 |   113   (9)|  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |   105   (2)|  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |   105   (2)|  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 2 because of hint


30 rows selected.

Of course you may prefer ‘allstats all’ – and sometimes I do actually want to see the alias or projection information – but I think there’s so much information available on the execution plan output that anything that makes it a little shorter, cleaner and tidier is a good thing.

You might have noticed, by the way, that the Buffers, Reads, and A-Time columns have still managed to lose information on the way up from operation 6; information that should have been summing up the plan has simply disappeared.  Make sure you do a sanity check for disappearing numbers when you’re looking at more complex plans.

 

May 5, 2016

E-rows / A-rows

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:26 pm BST May 5,2016

A recent post on the OTN database forum reminded me how easy it is to forget to keep repeating a piece of information after the first couple of hundred times you’ve explained it. No matter how “intuitively obvious” it is for one person, it’s new to someone else.

Here’s an execution plan that raised the question that prompted this note – it comes from calling dbms_xplan.display_cursor() with the ‘allstats last’ format option after enabling rowsource execution statisics (using hint gather_plan_statistics, or setting parameter statistics_level to all, or setting hidden parameter “_rowsource_execution_statistics” to true):

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |      1 |        |   1989 |00:00:04.96 |    9280 |    897 |
|   1 |  NESTED LOOPS OUTER                   |                        |      1 |   2125 |   1989 |00:00:04.96 |    9280 |    897 |
|   2 |   NESTED LOOPS OUTER                  |                        |      1 |   2125 |   1989 |00:00:04.93 |    9271 |    895 |
|   3 |    NESTED LOOPS OUTER                 |                        |      1 |   2125 |   1989 |00:00:00.03 |    5732 |      0 |
|   4 |     COLLECTION ITERATOR PICKLER FETCH |                        |      1 |   1989 |   1989 |00:00:00.01 |       0 |      0 |
|*  5 |     TABLE ACCESS BY INDEX ROWID       | TABLE1                 |   1989 |      1 |   1178 |00:00:00.03 |    5732 |      0 |
|*  6 |      INDEX RANGE SCAN                 | IDX_TABLE1             |   1989 |      2 |   2197 |00:00:00.02 |    3545 |      0 |
|   7 |    TABLE ACCESS BY INDEX ROWID        | TABLE2                 |   1989 |      1 |   1178 |00:00:03.26 |    3539 |    895 |
|*  8 |     INDEX UNIQUE SCAN                 | IDX_TABLE2_PK          |   1989 |      1 |   1178 |00:00:03.25 |    2359 |    895 |
|   9 |   TABLE ACCESS BY INDEX ROWID         | TABLE3                 |   1989 |      1 |      0 |00:00:00.03 |       9 |      2 |
|* 10 |    INDEX UNIQUE SCAN                  | IDX_TABLE3_PK          |   1989 |      1 |      0 |00:00:00.03 |       9 |      2 |
-----------------------------------------------------------------------------------------------------------------------------------

Many people have heard about “tuning by cardinality feedback” (a term first used, I think, by Wolfgang Breitling many years ago), and it’s fairly common knowledge that this requires you to compare the predicted (estimated) E-rows with the actual A-rows generated by a line of the plan. A critical detail that often fails to appear when this point is being explained is that there is a very important difference between the meaning of E-rows and A-rows. Although this is a point I make very clear in my presentations on how to read execution plans, it was only when I found myself explaining it to Tony Hasler a couple of years ago [ed: see comment 3 – at least 6 years ago!] that I realised that it’s not commonly known and that, in casual conversation, it’s easy to overlook the fact that it’s something that you might need to mention explicitly. So here’s the critical detail:

  • E-Rows is the number of rows the optimizer is expecting the operation to produce each time it is called
  • A-Rows is the cumulative number of rows summed across all calls to the operation

As a consequence of this difference any comparison you make should not be comparing A-Rows with E-rows, it should be comparing A-Rows with E-Rows * Starts so, for example,  operation 6 in the plan above (index range scan idx_table1) shows a reasonable estimate because 1,989 starts at 2 rows per start is fairly close to the final 2,197 rows produced (especially when you consider that there’s a lot of integer rounding going on to display the values).

 

May 3, 2016

Debugging

Filed under: CBO,compression,Execution plans,Infrastructure,Oracle,Uncategorized — Jonathan Lewis @ 8:11 am BST May 3,2016

The OTN database forum supplied a little puzzle a few days ago – starting with the old, old, question: “Why is the plan with the higher cost taking less time to run?”

The standard (usually correct) answer to this question is that the optimizer doesn’t know all it needs to know to predict what’s going to happen, and even if it had perfect information about your data the model used isn’t perfect anyway. This was the correct answer in this case, but with a little twist in the tail that made it a little more entertaining. Here’s the query, with the two execution plans and the execution statistics from autotrace:


SELECT  /* INDEX(D XPKCLIENT_ACCOUNT) */ 
        E.ECID,A.acct_nb
FROM    
        client_account d, 
        client         e, 
        account        a
where
        A.acct_nb ='00000000000000722616216'</li>


AND     D.CLNT_ID = E.CLNT_ID
AND     D.ACCT_ID=A.ACCT_ID;

Plan (A) with a full tablescan of client_account – cost 808, runtime 1.38 seconds, buffer gets 17,955


-------------------------------------------------------------------------------------------------
| Id | Operation                      | Name           | Rows  | Bytes  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                |     1 |    59  |   808 (14) | 00:00:10 |
|  1 |  NESTED LOOPS                  |                |     1 |    59  |   808 (14) | 00:00:10 |
|  2 |   NESTED LOOPS                 |                |     1 |    59  |   808 (14) | 00:00:10 |
|* 3 |    HASH JOIN                   |                |     1 |    42  |   806 (14) | 00:00:10 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT        |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT    |     1 |        |     4  (0) | 00:00:01 |
|  6 |     TABLE ACCESS FULL          | CLIENT_ACCOUNT |  9479K|   108M |   763 (10) | 00:00:09 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT      |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT         |     1 |    17  |     2  (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 17955  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Plan (B) with an index fast full scan on a client_account index – cost 1,190, runtime 0.86 seconds, buffer gets 28696


----------------------------------------------------------------------------------------------------
| Id | Operation                      | Name              | Rows  | Bytes  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  1 |  NESTED LOOPS                  |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  2 |   NESTED LOOPS                 |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|* 3 |    HASH JOIN                   |                   |     1 |    42  |  1188  (8) | 00:00:14 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT           |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT       |     1 |        |     4  (0) | 00:00:01 |
|  6 |     INDEX FAST FULL SCAN       | XPKCLIENT_ACCOUNT | 9479K |   108M |  1145  (5) | 00:00:13 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT         |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT            |     1 |    17  |     2  (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 28696  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Note, particularly, that the two plans are the same apart from operation 6 where a full tablescan changes to an index fast full scan, predicting the same number of rows but with an increase of 50% in the cost; the increase in cost is matched by an increase in the reported workload – a 60% increase in the number of consistent reads and no disk reads or recursive SQL in either case. Yet the execution time (on multiple repeated executions) dropped by nearly 40%.

So what’s interesting and informative about the plan ?

The cost of a tablescan or an index fast full scan is easy to calculate; broadly speaking it’s “size of object” / “multiblock read count” * k, where k is some constant relating to the hardware capability. The costs in these plans and the autotrace statistics seem to be telling us that the index is bigger than the table, while the actual run times seem to be telling us that the index has to be smaller than the table.

It’s easy for an index to be bigger than its underlying table, of course; for example, if this table consisted of nothing but two short columns the index could easily be bigger (even after a rebuild) because it would be two short columns plus a rowid. If that were the case here, though, we would expect the time to fast full scan the index to be higher than the time to scan the table.

So two thoughts crossed my mind as I looked at operation 6:

  • Mixing block sizes in a database really messes up the optimizer costing, particularly for tablescans and index fast full scans. Maybe the table had been built in a tablespace using 32KB  blocks while the index had been built in a tablespace using the more common 8KB blocksize – I didn’t want to start working out the arithmetic but that might be just enough to produce the contradiction.
  • Maybe the table was both bigger AND smaller than the index – bigger because it held more data, smaller because it had been compressed. If so then the difference in run-time would be the overhead of decompressing the rows before projecting and comparing the data.

Conveniently the OP has included an extract from the 10053 trace:


Table Stats::
  Table: CLIENT_ACCOUNT  Alias:  D
    #Rows: 9479811  #Blks:  18110  AvgRowLen:  71.00  ChainCnt:  0.00
  Column (#1): CLNT_ID(
    AvgLen: 6 NDV: 1261035 Nulls: 0 Density: 0.000001 Min: 0 Max: 4244786
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 239
  Column (#2): ACCT_ID(
    AvgLen: 6 NDV: 9479811 Nulls: 0 Density: 0.000000 Min: 1 Max: 22028568
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255

Index Stats::
  Index: XPKCLIENT_ACCOUNT  Col#: 1 2
    LVLS: 2  #LB: 28543  #DK: 9479811  LB/K: 1.00  DB/K: 1.00  CLUF: 1809449.00

Note that the index is called xpclient_account – which suggests “primary key” –  and the number of distinct keys in the index (#DK) matches the number of rows in the table(#Rows). The index and table stats seem to be consistent so we’re not looking at a problem of bad statistics.

Now to do some simple (ballpark) arithmetic: for the table can we check if  “rows * average row length / 8K =  blocks”. We can read the numbers directly from the trace file:  9,500,000 * 71 / 8,000 = 84,000.  It’s wrong by a factor of about 4 (so maybe it’s a 32K block, and maybe I could rule out that possibility by including more detail in the arithmetic – like allowing properly for the block header, row overheads, pctfree etc).

For the index – we believe it’s the primary key, so we know the number of rows in the index – it’s the same as the number of distinct keys. As for the length of an index entry, we have the index definition (col#: 1 2) and we happen to have the column stats about those columns so we know their average length. Allowing for the rowid and length bytes we can say that the average index entry is (6 +1) + (6 + 1) + 6 = 20 bytes.  So the number of leaf blocks should be roughy 9,500,000 * 20 / 8,000 = 23,750. That’s close enough given the reported 28,543 and the fact that I haven’t bothered to worry about row overheads, block overheads and pctfree.

The aritmetic provides an obvious guess – which turned out to be correct: the table is compressed, the index isn’t. The optimizer hasn’t allowed for the CPU cost of decompressing the compressed rows, so the time required to decompress 9.5M rows doesn’t appear in the execution plan.

Footnote.

Looking at the column stats, it looks like there are roughly 8 acct_ids for each clnt_id, so it would probably be sensible to compress the primary key index (clnt_id, acct_id) on the first column as this would probably reduce the size of the index by about 20%.

Better still – the client_account table has very short rows – it looks like a typical intersection table with a little extra data carried. Perhaps this is a table that should be an index-organized table with no overflow. It looks like there should also be an index (acct_id, clnt_id) on this table to optimse the path from account to client and this would become a secondary index – interestingly being one of those rare cases where the secondary index on an IOT might actually be a tiny bit smaller than the equivalent index on a heap table because (in recent versions of Oracle) primary key columns that are included in the secondary key are not repeated in the index structure. (It’s a little strange that this index doesn’t seem to exist already – you might have expected it to be there given the OP’s query, and given that it’s an “obvious” requirement as an index to protect the foreign key.)

The only argument against the IOT strategy is that the table clearly compresses very well as a heap table, so a compressed heap table plus two B-tree indexes might be more cost-effective than an IOT with a single secondary index.

 

April 1, 2016

Set Operations

Filed under: 12c,CBO,Execution plans,Oracle — Jonathan Lewis @ 2:20 pm BST Apr 1,2016

A recent post on the OTN database forum highlights a couple of important points ideas for optimising SQL. There are: (a) is there a logically equivalent way of stating the SQL and (b) is there a different “natural language” way of posing the problem.

The posting starts with a query, part of an execution plan, and a request to “get rid of the tablescan”. I guessed originally that the query came from an 11g instance, and the OP gave us some code to create the tables and indexes, so I’ve modelled the tables to get the indicated plan (then filled in the original numbers). This is the query, and my cosmetically adjusted version of the plan output that the OP probably got:


SELECT a.hotel_code
  FROM lf_hotel_temp a
WHERE a.service_id = : p_service_id
       AND (NOT EXISTS (SELECT *
          FROM lf_ts_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code)
        or NOT EXISTS (SELECT *
          FROM lf_gta_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code) 
       or  NOT EXISTS (SELECT *
          FROM lf_hb_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code))

-------------------------------------------------------------------------------
| Id  | Operation          | Name                     | Rows  |  Bytes | Cost |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                          | 12613 | 113517 |  135 |
|*  1 |  FILTER            |                          |       |        |      |
|*  2 |   TABLE ACCESS FULL| LF_HOTEL_TEMP            | 88433 | 795897 |  135 |
|*  3 |   INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX  |     1 |      7 |    1 |
|*  4 |   INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX |     1 |      9 |    1 |
|*  5 |   INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX  |     2 |     14 |    3 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
              "B"."HOTEL_CODE"=:B1) OR  NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
              WHERE "B"."HOTEL_CODE"=:B2) OR  NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
              "B" WHERE "B"."HOTEL_CODE"=:B3))
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   3 - access("B"."HOTEL_CODE"=:B1)
   4 - access("B"."HOTEL_CODE"=:B1)
   5 - access("B"."HOTEL_CODE"=:B1)

We were told in the original posting that there’s a primary key on lf_hotel_temp declared on (hotel_code, service_id), and we were given the definitions, sizes, and index declarations of all the table in a follow-up posting. It turns out that lf_hotel_temp consists of just those two columns and holds 278,000 rows: the optimizer’s estimate for the number of rows identified by a single service_id is over 88,000, and the nature of the query tells us that the optimizer would have to examine every one of those rows to check if it satisfied any of the three subqueries.

So how might Oracle access the rows ?  Given that the only columns used will all be in the primary key index (which implies not null constraints) there are four basic options: tablescan, index fast full scan, index full scan, and index skip scan. Given the most likely data content (i.e. lots of different hotel_codes), we can assume the skip scan would be a very bad idea. We can be sure that an index fast full scan will be lower cost than an index full scan – for anything except tiny indexes. Ultimately the question is really “why a tablescan instead of an index fast full scan?”. As I pointed out, though, the table consists of just those two columns – which means it’s perfectly reasonable for the index to be larger than the table as each entry of the index will consist of the two columns AND a rowid.

The first interesting bit

The question of why the access to lf_hotel_temp was by tablescan rather than some indexed method isn’t really interesting. The interesting bit is how (in principle) we might make the plan more efficient (if it really needs it); and this leads to two key, and general purpose, observations. As Andrew Sayer pointed out on the thread, we have a compound predicate:

    (not exists A OR not exists B OR not exists C)

and this is logically equivalent to

   not (exists A AND exists B AND exists C)

If we rewrite the query to reflect this equivalence could the optimizer find a different, better way of executing it:


select  /*+ dynamic_sampling(0) */
        a.hotel_code
from    lf_hotel_temp a
where
        a.service_id = :p_service_id
and     not(
                exists (
                        select  null
                        from    lf_ts_roomtype_properties ts
                        where   ts.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_gta_roomtype_properties gta
                        where   gta.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_hb_roomtype_properties hb
                        where   hb.hotel_code = a.hotel_code
                )
        )
;

Of course, I didn’t have the original data; so I copied the DDL supplied in the OTN thread and added a little DML to insert a few rows in the tables. The data I used looked like this:


insert into lf_hotel_temp (hotel_code, service_id) values ('A',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('B',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('C',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('D',1);

-- insert into lf_ts_roomtype_properties values ( 'A','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_gta_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_gta_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_hb_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'B','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_hb_roomtype_properties values ( 'D','x','x',0,1,'x');
commit;

It’s possible that with different data volumes you’d get different execution plans, but in 11g the optimizer transformed my query back into the original form – in other words it recognised the equivalence of “not (A and B and C)” and rewrote it as “(not A or not B or not C)” !

However, I also have 12c available, and I had created a script to build a model, so I ran the test on 12c. Both versions of the query produced the following plan:


----------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|   2 |   VIEW                | VW_SQ_1                    |    82 |   984 |     6   (0)| 00:00:01 |
|*  3 |    HASH JOIN SEMI     |                            |    82 |  2952 |     6   (0)| 00:00:01 |
|*  4 |     HASH JOIN         |                            |    82 |  1968 |     4   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES |    82 |   984 |     2   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| LF_HB_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL | LF_TS_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|*  8 |   INDEX FULL SCAN     | LF_HOTEL_TEMP_PK           |   101 |   198K|     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("VW_COL_1"="A"."HOTEL_CODE")
   3 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("TS"."HOTEL_CODE"))
   4 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("GTA"."HOTEL_CODE"))
   8 - access("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))

Ignore the numbers (I hadn’t collected stats, which is why I added the /*+ dynamic_sampling(0) */ hint – with stats in place 12c produced the FILTER plan that 11g had produced) the key feature is that Oracle has managed to transform my three filter subqueries into a single join subquery and then transformed the resulting subquery into an anti-join. It’s a pretty amazing transformation – the optimizer did it automatically in 12c, but if you are aware of the logical equivalence then you may find cases where you can turn “OR’s” into “AND’s” and help the optimizer to find transformations that it can’t find automatically.

The second interesting bit

If you think about the meaning behind the query (prompted, perhaps, by the logical equivalence described above) you might rephrase the question as “find me the hotel codes that fail to appear in all three related tables” – in English this is ambigious and open to catastrophic mis-interpretation so you might have another go and say “find me the hotel codes that appear in every one of the three related tables – those are the hotel codes I don’t want”. This latter expression, of course, is exactly what Oracle is doing by joining the three tables and then doing the “not exists”/anti-join against the result. Obviously you could translate the new English form into SQL by hand, with a three table join in a “not exists” subquery.

I actually took a different approach (which might, or might not, be efficient – depending on the actual data and indexes).  I translated the new English statement into the following:


select  /*+ dynamic_sampling(0) */
        hotel_code
from    lf_hotel_temp
where   service_id = :p_service_id
minus   (
        select  hotel_code
        from    lf_ts_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_gta_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_hb_roomtype_properties
        where   hotel_code is not null
        )
;

The three way intersection gets me the list of hotels that appear in all three tables; the minus operator takes the list of hotel with the correct service_id and eliminates from it the hotels that appear in the intersection – giving me the result I want.

For my tiny data set, this is the plan I got:

--------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                          |     1 |  2159 |     8  (50)| 00:00:01 |
|   1 |  MINUS                |                          |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT  |                          |     1 |  2015 |     2  (50)| 00:00:01 |
|*  3 |    INDEX FULL SCAN    | LF_HOTEL_TEMP_PK         |     1 |  2015 |     1   (0)| 00:00:01 |
|   4 |   INTERSECTION        |                          |       |       |            |          |
|   5 |    INTERSECTION       |                          |       |       |            |          |
|   6 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  7 |      INDEX FULL SCAN  | LF_TS_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
|   8 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  9 |      INDEX FULL SCAN  | LF_GTA_ROOMTYPE_PROP_IDX |     4 |    48 |     1   (0)| 00:00:01 |
|  10 |    SORT UNIQUE NOSORT |                          |     4 |    48 |     2  (50)| 00:00:01 |
|* 11 |     INDEX FULL SCAN   | LF_HB_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   7 - filter("HOTEL_CODE" IS NOT NULL)
   9 - filter("HOTEL_CODE" IS NOT NULL)
  11 - filter("HOTEL_CODE" IS NOT NULL)

Important note: I am not claiming that this use of set operators will be more efficient than a filter subquery or anti-join/semi-join approach, performance ultimately depends on the volume and patterns in the data combined with the available indexing. In this case you can almost see the classic performance compromise that we often see in Oracle – even in the trade-off between something as simple as choosing between a hash join and a nested loop join – should we operate this query as a tiny number of “bulk” operations, or as a (potentially) large number of tiny, high-precision operations.

If the original query was spending all it’s time on CPU running lots of subqueries, or doing lots of single block random I/Os because of the random ordering of the subqueries, then perhaps a couple of brute force “db file parallel read” index full scans would be a friendlier use of the available resources, run more quickly, and have less impact on every other user.

 

February 15, 2016

Connect By

Filed under: Execution plans,Hints,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 2:01 pm BST Feb 15,2016

I received an email a couple of days ago that was a little different from usual – although the obvious answer was “it’s the data”. A connect by query with any one of several hundred input values ran in just a few seconds, but with one specific input it was still running 4,000 seconds later using the same execution plan – was this a bug ?

There’s nothing to suggest that it should be, with skewed data anything can happen: even a single table access by exact index could take 1/100th of a second to return a result if there was only one row matching the requirement and 1,000 seconds if there were 100,000 rows in 100,000 different table blocks (and the table was VERY big). The same scaling problem could be true of any type of query – and “connect by” queries can expose you to a massive impact because their run time can increase geometrically as the recursion takes place.

So it was easy to answer the question – no it’s (probably) not a bug, check the data for that one value.

Then I decided to build a simple model. The original email had a four table join, but I just created a single table, and used a “no filtering” connect by which I had to hint. Here’s some code I ran on 11.2.0.4:


rem
rem     script: connect_by_skew.sql
rem     dated:  Feb 2016
rem     Last tested:
rem             12.1.0.2
rem

create table t1 nologging 
as
select 
        rownum id_p, 10 * rownum id
from
        all_objects
where 
        rownum <= 50000 ; execute dbms_stats.gather_table_stats(user,'t1', method_opt=>'for all columns size 1')

alter system flush shared_pool;

set serveroutput off
alter session set statistics_level = all;

select sum(ct) 
from    (
        select
                /*+ no_connect_by_filtering */
                count(id) ct
        from
                t1
        connect by
                id = 20 * prior id_p
        start with
                id_p = 1
        group by
                id
)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

update t1 set id_p = 0
where   id_p = 1
;

update t1 set id_p = 1
where   id_p > 45000
;

select sum(ct) 
from    (
        select
                /*+ no_connect_by_filtering */
                count(id) ct
        from
                t1
        connect by
                id = 20 * prior id_p
        start with
                id_p = 1
        group by
                id
)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

The sum() of the inline aggregate view emulates the original code – I don’t know what it was for, possibly it was a way of demonstrating the problem without producing a large output, I just copied it.

As you can see in my script every parent id (id_p) starts out unique, and if I look at the pattern of the raw data identified by the recursion from id_p = 1 (rather than looiking at the result of the actual query) this is what I’d get:

      ID_P         ID
---------- ----------
         1         10
         2         20
         4         40
         8         80
        16        160
        32        320
        64        640
       128       1280
       256       2560
       512       5120
      1024      10240
      2048      20480
      4096      40960
      8192      81920
     16384     163840
     32768     327680

When I modify the data so that I have exactly 5,000 rows with id_p = 1 the initial data generation will be 80,000 rows of data. If you want to try setting id_p = 1 for more rows make sure you do it to rows where id_p is already greater than 32768 or you’ll run into Oracle error ORA-01436: CONNECT BY loop in user data.

Here’s the execution plan, with rowsource execution stats I got for the first query (running 11.2.0.4):


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |      1 |        |    32 (100)|      1 |00:00:00.44 |     103 |       |       |          |
|   1 |  SORT AGGREGATE                            |      |      1 |      1 |            |      1 |00:00:00.44 |     103 |       |       |          |
|   2 |   VIEW                                     |      |      1 |      2 |    32   (7)|     16 |00:00:00.44 |     103 |       |       |          |
|   3 |    HASH GROUP BY                           |      |      1 |      2 |    32   (7)|     16 |00:00:00.44 |     103 |  1519K|  1519K| 1222K (0)|
|*  4 |     CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |            |     16 |00:00:00.44 |     103 |       |       |          |
|   5 |      TABLE ACCESS FULL                     | T1   |      1 |  50000 |    31   (4)|  50000 |00:00:00.10 |     103 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, this took 0.44 seconds, generated the expected 16 rows (still visible up to operation 2) which it then counted. Oracle followed the same execution plan when I set 5,000 rows to the critical value – here’s the new run-time plan:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |      1 |        |    32 (100)|      1 |00:05:39.25 |     103 |       |       |          |
|   1 |  SORT AGGREGATE                            |      |      1 |      1 |            |      1 |00:05:39.25 |     103 |       |       |          |
|   2 |   VIEW                                     |      |      1 |      2 |    32   (7)|   5015 |00:05:39.24 |     103 |       |       |          |
|   3 |    HASH GROUP BY                           |      |      1 |      2 |    32   (7)|   5015 |00:05:39.22 |     103 |  5312K|  2025K| 1347K (0)|
|*  4 |     CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |            |  80000 |00:05:38.56 |     103 |       |       |          |
|   5 |      TABLE ACCESS FULL                     | T1   |      1 |  50000 |    31   (4)|  50000 |00:00:00.09 |     103 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

As expected, 80,000 rows generated (5,000 * 16), aggregated down to 5,015, then aggregated again to the one row result. Time to complete: 5 minutes 39 seconds – and it was all CPU time. It’s not entirely surprising – a single recursive descent (with startup overheads) took 0.44 seconds – presumably a fairly large fraction of that was startup, but even 0.1 seconds adds up if you do it 5,000 times.

Everybody knows that skewed data can produced extremely variable response times. With a deeper tree and more rows with the special value it wouldn’t be hard for the total run time of this query to get to the 4,000 seconds reported in the original email. (I also tried running with 10,000 rows set to 1 and the run time went up to 18 minutes – of which a large fraction was reading from the TEMPORARY tablespace because something had overflowed to disc).

Was there a solution ?

I don’t know – but I did suggest two options
a) create a histogram on the data to show that there was one particular special value; since the code seemed to include literals perhaps the optimizer would notice the special case and choose a different plan.
b) hint the code to use a different strategy – the hint would be /*+ connect_by_filtering */. Here’s the resulting execution plan:


---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |    95 (100)|      1 |00:00:06.50 |    1751 |       |       |          |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |            |      1 |00:00:06.50 |    1751 |       |       |          |
|   2 |   VIEW                       |      |      1 |      2 |    95   (6)|   5015 |00:00:06.49 |    1751 |       |       |          |
|   3 |    HASH GROUP BY             |      |      1 |      2 |    95   (6)|   5015 |00:00:06.47 |    1751 |  5312K|  2025K| 1346K (0)|
|   4 |     CONNECT BY WITH FILTERING|      |      1 |        |            |  80000 |00:00:06.30 |    1751 |   337K|   337K|  299K (0)|
|*  5 |      TABLE ACCESS FULL       | T1   |      1 |      1 |    31   (4)|   5000 |00:00:00.01 |     103 |       |       |          |
|*  6 |      HASH JOIN               |      |     16 |      1 |    63   (5)|     15 |00:00:05.98 |    1648 |  1969K|  1969K|  741K (0)|
|   7 |       CONNECT BY PUMP        |      |     16 |        |            |     16 |00:00:00.01 |       0 |       |       |          |
|   8 |       TABLE ACCESS FULL      | T1   |     16 |  50000 |    31   (4)|    800K|00:00:01.49 |    1648 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

We get the result in 6.5 seconds! [UPDATE: but there’s a nice explanation for that – most of the time comes from the work done gathering rowsource execution statistics; with statistics_level set back to typical the run time dropped to 0.19 seconds.]

February 5, 2016

Parallel DML

Filed under: Execution plans,Oracle,Parallel Execution,Performance — Jonathan Lewis @ 1:02 pm BST Feb 5,2016

A recent posting on OTN presented a performance anomaly when comparing a parallel “insert /*+ append */” with a parallel “create table as select”.  The CTAS statement took about 4 minutes, the insert about 45 minutes. Since the process of getting the data into the data blocks would be the same in both cases something was clearly not working properly. Following Occam’s razor, the first check had to be the execution plans – when two statements that “ought” to do the same amount of work take very different times it’s probably something to do with the execution plans – so here are the two statements with their plans:

First the insert, which took 45 minutes:

insert  /*+ append parallel(a,16) */ into    
        dg.tiz_irdm_g02_cc  a
select
        /*+ parallel (a,16) parallel (b,16) */ 
        *
from    tgarstg.tst_irdm_g02_f01 a, 
        tgarstg.tst_irdm_g02_f02 b
where   a.ip_id = b.ip_id
;

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |
|   1 |  LOAD AS SELECT                  | TIZ_IRDM_G02_CC  |       |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR                 |                  |       |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->S | QC (RAND)  |
|*  4 |     HASH JOIN BUFFERED           |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE                  |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH               | :TQ10000         |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR         |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWP |            |
|   9 |      PX RECEIVE                  |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,02 | PCWP |            |
|  10 |       PX SEND HASH               | :TQ10001         |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | P->P | HASH       |
|  11 |        PX BLOCK ITERATOR         |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWC |            |
|  12 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------

And here’s the ‘create table’ at 4:00 minutes:

create table dg.tiz_irdm_g02_cc 
nologging 
parallel 16 
compress for query high 
as
select
        /*+ parallel (a,16) parallel (b,16) */ 
        *
from    tgarstg.tst_irdm_g02_f01 a , 
        tgarstg.tst_irdm_g02_f02 b 
where
        a.ip_id = b.ip_id

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |                  |    13M|    36G|       |   397K  (1)| 00:00:14 |        |      |            |
|   1 |  PX COORDINATOR                  |                  |       |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)            | :TQ10002         |    13M|    36G|       |   255K  (1)| 00:00:09 |  Q1,02 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT                | TIZ_IRDM_G02_CC  |       |       |       |            |          |  Q1,02 | PCWP |            |
|*  4 |     HASH JOIN                    |                  |    13M|    36G|  1842M|   255K  (1)| 00:00:09 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE                  |                  |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH               | :TQ10000         |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR         |                  |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,00 | PCWP |            |
|   9 |      PX RECEIVE                  |                  |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,02 | PCWP |            |
|  10 |       PX SEND HASH               | :TQ10001         |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,01 | P->P | HASH       |
|  11 |        PX BLOCK ITERATOR         |                  |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,01 | PCWC |            |
|  12 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the statements are supposed to operate with degree of parallelism 16, and we were assured that the pre-existing table had been declared as nologging with the same level of compression as that given in the CTAS so, assuming the queries did run with the degree expected, they should take virtually the same amount of time.

But there’s an important clue in the plan about why there was a difference, and why the difference could be so great. The first statement is DML, the second is DDL. Parallel DDL is automatically enabled, parallel DML has to be enabled explicitly otherwise the select will run in parallel but the insert will be serialized. Look at operations 1 – 4 of the insert – the query co-ordinator does the “load as select” of the rowsource sent to it by the parallel execution slaves. Not only does this mean that one process (rather than 16) does the insert, you also have all the extra time for all the messaging and the hash join (at line 4) has to be buffered – which means a HUGE amount of data could have been dumped to disc by each slave prior to the join actually taking place and then been read back from disc, joined, and forwarded.

Note that the hash join in the CTAS is not buffered – each slave does the join as the data arrives and writes the result directly to its local segment. Basically the insert could be doing something like twice the I/O of the CTAS (and this is Exadata, so reads from temp can be MUCH slower than the tablescans that supply the data to be joined).

So the OP checked, and found that (although he thought he had enabled parallel DML) he hadn’t actually done so. And after enabling parallel DML the timing was … just as bad. Ooops!! Something else must have gone wrong. Here’s the plan after enabling parallel DML:


--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |
|   1 |  PX COORDINATOR                    |                  |       |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10003         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,03 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT                  | TIZ_IRDM_G02_CC  |       |       |       |            |          |  Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                     |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,03 | PCWP |            |
|   5 |      PX SEND RANDOM LOCAL          | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->P | RANDOM LOCA|
|*  6 |       HASH JOIN BUFFERED           |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |
|   7 |        PX RECEIVE                  |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,02 | PCWP |            |
|   8 |         PX SEND HASH               | :TQ10000         |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   9 |          PX BLOCK ITERATOR         |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWC |            |
|  10 |           TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWP |            |
|  11 |        PX RECEIVE                  |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,02 | PCWP |            |
|  12 |         PX SEND HASH               | :TQ10001         |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | P->P | HASH       |
|  13 |          PX BLOCK ITERATOR         |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWC |            |
|  14 |           TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------------

As you can see, line 3 has the LOAD AS SELECT after which the slaves message the query co-ordinator – so the DML certainly was parallel even though it wasn’t any faster. But why is the hash join (line 6) still buffered, and why is there an extra data flow (lines 5 and 4 – PX SEND RANDOM LOCAL / PX RECEIVE). The hash join has to be buffered because of that extra data flow (which suggests that the buffering and messaging could still be the big problem) – but WHY is the data flow there at all, it shouldn’t be.

At this point I remembered that the first message in the thread had mentioned testing partitioned tables as well as non-partitioned tables – and if you do a parallel insert to a partitioned table and the data is going to be spread across several partitions, and the number of partitions is not a good match for the degree of parallelism then you’re likely to an extra stage of data distribution as Oracle tries to share the data and the partitions as efficiently as possible across slaves. One of the possible distribution methods is “local random” – which is fairly likely to appear if the number of slaves is larger than the number of partitions. This behaviour can be modified with the newer “single distribution” version of the pq_distribute hint. So I asked the OP if their latest test was on a partitioned table, and suggested they insert the hint /*+ pq_distribute(a none) */ just after the parallel hint.

The answer was yes, and the hint had the effect of dropping the run time down to 7 minutes – still not as good as the CTAS, but then the CTAS wasn’t creating a partitioned table so it’s still not a completely fair test. Here’s the (start of the) final plan:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |
|   1 |  PX COORDINATOR                    |                  |       |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT                  | TIZ_IRDM_G02_CC  |       |       |       |            |          |  Q1,02 | PCWP |            |
|*  4 |     HASH JOIN                      |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |

As you can see, we have a hash join that is NOT buffered; we don’t have a third distribution, and the slaves do the data load and then message the query co-ordinator.

It would be interesting to know if there was a significant skew in the data volumes that went into each partition of the partitioned table, and check where the time was spent for both the partitioned insert and the non-partitioned CTAS (and compare with a non-partitioned insert) – but real-world DBAs don’t necessarily have all the time for investigations that I do.

My reference: parallel_dml.sql

January 24, 2016

Semijoin_driver

Filed under: bitmaps,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:42 am BST Jan 24,2016

Here’s one of those odd little tricks that (a) may help in a couple of very special cases and (b) may show up at some future date – or maybe it already does – in the optimizer if it is recognised as a solution to a more popular problem. It’s about an apparent restriction on how the optimizer uses the BITMAP MERGE operation, and to demonstrate a very simple case I’ll start with a data set with just one bitmap index:


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index t1_b1 on t1(n1);

select index_name, leaf_blocks, num_rows from user_indexes;

/*
INDEX_NAME           LEAF_BLOCKS   NUM_ROWS
-------------------- ----------- ----------
T1_B1                        500       1000
*/

Realistically we don’t expect to use a single bitmap index to access data from a large table, usually we expect to have queries that give the optimizer the option to choose and combine several bitmap indexes (possibly driving through dimension tables first) to reduce the target row set in the table to a cost-effective level.

In this example, though, I’ve created a column data set that many people might view as “inappropriate” as the target for a bitmap index – in one million rows I have one thousand distinct values, it’s not a “low cardinality” column – but, as Richard Foote (among others) has often had to point out, it’s wrong to think that bitmap indexes are only suitable for columns with a very small number of distinct values. Moreover, it’s the only index on the table, so no chance of combining bitmaps.

Another thing to notice about my data set is that the n1 column has been generated by the mod() function; because of this the column cycles through the 1,000 values I’ve allowed for it, and this means that the rows for any given value are scattered widely across the table, but it also means that if I find a row with the value X in it then there could well be a row with the value X+4 (say) in the same block.

I’ve reported the statistics from user_indexes at the end of the sample code. This shows you that the index holds 1,000 “rows” – i.e. each key value requires only one bitmap entry to cover the whole table, with two rows per leaf block.  (By comparison, a B-tree index oon the column was 2,077 leaf block uncompressed, or 1,538 leaf blocks when compressed).

So here’s the query I want to play with, followed by the run-time execution plan with stats (in this case from a 12.1.0.2 instance):


alter session set statistics_level = all;

select
        /*+
                qb_name(main)
        */
        max(small_vc)
from
        t1
where
        n1 in (1,5)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |      1 |00:00:00.03 |    2006 |      4 |
|   1 |  SORT AGGREGATE                       |       |      1 |      1 |      1 |00:00:00.03 |    2006 |      4 |
|   2 |   INLIST ITERATOR                     |       |      1 |        |   2000 |00:00:00.03 |    2006 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      2 |   2000 |   2000 |00:00:00.02 |    2006 |      4 |
|   4 |     BITMAP CONVERSION TO ROWIDS       |       |      2 |        |   2000 |00:00:00.01 |       6 |      4 |
|*  5 |      BITMAP INDEX SINGLE VALUE        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |      4 |
------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("N1"=1 OR "N1"=5))

The query is selecting 2,000 rows from the table, for n1 = 1 and n1 = 5, and the plan shows us that the optimizer probes the bitmap index twice (operation 5), once for each value, fetching all the rows for n1 = 1, then fetching all the rows for n1 = 5. This entails 2,000 buffer gets. However, we know that for every row where n1 = 1 there is another row nearby (probably in the same block) where n1 = 5 – it would be nice if we could pick up the 1 and the 5 at the same time and do less work.

Technically the optimizer has the necessary facility to do this – it’s known as the BITMAP MERGE – Oracle can read two or more entries from a bitmap index, superimpose the bits (effectively a BITMAP OR), then convert to rowids and visit the table. Unfortunately there are cases (and it seems to be only the simple cases) where this doesn’t appear to be allowed even when we – the users – can see that it might be a very effective strategy. So can we make it happen – and since I’ve asked the question you know that the answer is almost sure to be yes.

Here’s an alternate (messier) SQL statement that achieves the same result:


select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        max(small_vc)
from
        t1
where
        n1 in (
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.02 |    1074 |       |       |          |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.02 |    1074 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |   2000 |   2000 |00:00:00.02 |    1074 |       |       |          |
|   3 |    BITMAP CONVERSION TO ROWIDS       |       |      1 |        |   2000 |00:00:00.01 |       6 |       |       |          |
|   4 |     BITMAP MERGE                     |       |      1 |        |      1 |00:00:00.01 |       6 |  1024K|   512K| 8192  (0)|
|   5 |      BITMAP KEY ITERATION            |       |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   6 |       VIEW                           |       |      1 |      2 |      2 |00:00:00.01 |       0 |       |       |          |
|   7 |        UNION-ALL                     |       |      1 |        |      2 |00:00:00.01 |       0 |       |       |          |
|   8 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 10 |       BITMAP INDEX RANGE SCAN        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("N1"="from$_subquery$_002"."1")

Key points from this plan – and I’ll comment on the SQL in a moment: The number of buffer visits is roughly halved (In many cases we picked up two rows as we visited each buffer); operation 4 shows us that we did a BITMAP MERGE, and we can see in operations 5 to 10 that we did a BITMAP KEY ITERATION (which is a bit like a nested loop join – “for each row returned by child 1 (operation 6) we executed child 2 (operation 10)”) to probe the index twice and get two strings of bits that operation 4 could merge before operation 3 converted to rowids.

For a clearer picture of how we visit the table, here are the first few rows and last few rows from a version of the two queries where we simply select the ID column rather than aggregating on the small_vc column:

select  id from ...

Original query structure
         1
      1001
      2001
      3001
...
    997005
    998005
    999005

2000 rows selected.

Modified query structure:

         1
         5
      1001
      1005
      2001
      2005
...
    998001
    998005
    999001
    999005
    
2000 rows selected.

As you can see, one query returns all the n1 = 1 rows then all the n1 = 5 rows while the other query alternates as it walks through the merged bitmap. You may recall the Exadata indexing problem (now addressed, of course) from a few years back where the order in which rows were visited after a (B-tree) index range scan made a big difference to performance. This is the same type of issue – when the optimizer’s default plan gets the right data in the wrong order we may be able to find ways of modifying the SQL to visit the data in a more efficient order. In this case we save only fractions of a second because all the data is buffered, but it’s possible that in a production environment with much larger tables many, or all, of the re-visits could turn into physical reads.

Coming back to the SQL, the key to the re-write is to turn my IN-list into a subquery, and then tell the optimizer to use that subquery as a “semijoin driver”. This is essentially the mechanism used by the Star Tranformation, where the optimizer rewrites a simple join so that each dimension table (typically) appears twice, first as an IN subquery driving the bitmap selection then as a “joinback”. But (according to the manuals) a star transformation requires at least two dimension tables to be involved in a join to the central fact table – and that may be why the semi-join approach is not considered in this (and slightly more complex) cases.

 

 

My reference: bitmap_merge.sql, star_hack3.sql

January 11, 2016

Subquery Effects

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 12:50 pm BST Jan 11,2016

Towards the end of last year I used a query with a couple of “constant” subqueries as a focal point for a blog note on reading parallel execution plans. One of the comments on that note raised a question about cardinality estimates and, coincidentally, I received an email about the cost calculations for a similar query a few days later.

Unfortunately there are all sorts of anomalies, special cases, and changes that show up across versions when subqueries come into play – it’s only in recent versions of 11.2, for example, that a very simple example I’ve got of three equivalent statements that produce the same execution plan report the same costs and cardinality. (The queries are:  table with IN subquery, table with EXISTS subquery, table joined to “manually unnested” subquery – the three plans take the unnested subquery shape.)

I’m just going to pick out one particular anomaly, which is a costing error with multiple subqueries when “OR-ed”. Here’s my sample data set:


create table t1
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 20000
;


create table t2
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 25000
;

create table t3
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 30000
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t2',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t3',
                method_opt       => 'for all columns size 1'
        );
end;
/

The three tables are slightly different sizes so that it will be easy to see different costs of tablescans, and there are no indexes to everything I do in the queries will be tablescans. Here are six queries I’m going to test – they all scan t1, with “constant” subqueries against t2 and/or t3. The first pair is just to show you the basic cost of the query with a single subquery, the second pair shows you the default action with two subqueries in two different orders, the final pair shows you what happens with two subqueries when you block subquery pushing.


select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n2 > (select avg(t2.n2) from t2)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n3 > (select avg(t3.n3) from t3)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select avg(t2.n2) from t2)
         or t1.n3 > (select avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select avg(t3.n3) from t3)
         or t1.n2 > (select avg(t2.n2) from t2)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
         or t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
         or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
        )
;

Here are the first two plans, pulled from memory (which you might have guessed thanks to the “disappearing subquery predicate” in the predicate section. These examples came from 12.1.0.2, but the same happens in 11.2.0.4:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N2">))

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   123 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N3">))

As you can see, the cost of the query is the cost of the t1 tablescan plus the cost of running the t2 or t3 subquery once: 111 = 49 + 62, and 123 = 49 + 74.

(As a general guideline, recent versions of the optimizer tend to allow for subqueries by including “cost of subquery” * “number of times the optimizer thinks it will execute” – in this case the optimizer knows that the subquery will run exactly once).

But what happens when we test the query that applies BOTH subqueries to the tablescan ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N2"> OR "T1"."N3">)))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N3"> OR "T1"."N2">)))

The cost of the query in both cases is just the cost of the tablescan of t1 – the subqueries are, apparently, free. You can check from the predicate section, by the way, that the subqueries are applied in the order they appear in original statement.

Does anything change if the subqueries are not pushed ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N2"> OR "T1"."N3">))
   3 - filter("T1"."N1">10000)

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   124 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N3"> OR "T1"."N2">))
   3 - filter("T1"."N1">10000)

The two plans have different costs – and the cost is the cost of the tablescan of t1 plus the cost of just the first subquery in the filter predciate list.

The non-pushed subqueries show up another anomaly: you’ll notice that the t1 tablescan reports 10,001 rows cardinality, but the FILTER operation doesn’t have an associated cardinality so we can’t see how many rows the optimizer thinks will survive the subqueries. So let’s run a query that allows us to see the surviving row estimate:


select
        max(n1)
from
        (
        select
                /*+ no_eliminate_oby */
                t1.n1
        from
                t1
        where
                t1.n1 > 10000
        and     (
                   t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
                or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
                )
        order by
                n1
        )
;

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |       |       |   126 (100)|          |
|   1 |  SORT AGGREGATE        |      |     1 |    13 |            |          |
|   2 |   VIEW                 |      | 10001 |   126K|   126   (5)| 00:00:01 |
|   3 |    SORT ORDER BY       |      | 10001 |   146K|   126   (5)| 00:00:01 |
|*  4 |     FILTER             |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   6 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |       TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   8 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   9 |       TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."N3"> OR "T1"."N2">))
   5 - filter("T1"."N1">10000)

As you can see, the SORT ORDER BY operation thinks it’s going to handle 10,0001 rows – it looks as if the optimizer arithmetic hasn’t applied the usual subquery guess of 5% for the two subqueries. (When the subqueries were automatically pushed you saw a cardinality of 975 – which is 5% for subquery t2 plus (due to OR) 5% for subquery t3 minus 5% of 5% (=25) for the overlap – this is the standard OR arithmetic)

tl;dr

Although the optimizer code has been enhanced in many places for dealing with subquery estimates, but there are still some odd errors and inconsistencies that you need to be aware of. The examples I’ve shown may not be particularly significant in terms of what they do, but the pattern is one that you may recognise in more complex queries.

 

Reference script: subq_cost_anomaly_2.sql

 

January 8, 2016

CTEs and Updates

Filed under: Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 1:01 pm BST Jan 8,2016

An important target of trouble-shooting, particularly when addressing performance problems, is to minimise the time and effort you have to spend to get a “good enough” result. A recent question on the OTN database forum struck me as a good demonstration of following this strategy; the problem featured a correlated update that had to access a view 84 times to update a small table; but the view was a complex view (apparently non-mergeable) and the update took several hours to complete even though the view, when instantiated, held only 63 rows.

The OP told us that the query “select * from view” took seven minutes to return those 63 rows, and wanted to know if we could find a nice way to perform the update in (approximately) that seven minutes, rather than using the correlated update approach that seemed to take something in the ballpark of 7 minutes per row updated.

Of course the OP could have given us all the details of the view definition, all the table and index definitions, with stats etc. and asked us if we could make the update run faster – but that could lead to a long and frustrating period of experimentation and testing, and a solution that might increase the general maintenance costs of the system (because a subsequent modification to the view might then have to be echoed into the code that did the update). Setting a strictly limited target that clearly ought to be achievable is (if nothing else) a very good starting point for improving the current situation.

I don’t know (as at the time of writing) if the OP implemented the strategy I suggested, but from his description it looked as if it should have been simple to use subquery factoring with materialization to achieve the required result in the most elegant way possible (meaning, in this case, simple SQL and no change to any surrounding code).

The OP has responded to my suggestion with a comment that “it didn’t work”, but it appeared to me that they were looking at and mis-interpreting the output from a call to “Explain Plan” rather than testing the query and pulling the plan from memory – so I thought I’d build a simple model to demonstrate the principle and show you how you could confirm (beyond just checking the clock) that the strategy had worked.

We start with a table to update, a non-mergeable view, and two tables to make up the non-mergeable view:


create table t1
as
select
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000
;


create table t2
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000;
create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt => 'for all columns size 1'
        );
end;
/

create or replace view v1
as
select distinct
        t1.n1 t1n1, t1.n2 t1n2, t2.n2 t2n2
from
        t1, t2
where
        t1.n1 = t2.n1
;

create table t3
as
select * from v1
;

begin
        dbms_stats.gather_table_stats(
                user,
                't3',
                method_opt => 'for all columns size 1'
        );
end;
/

I’ve created the table t3 by copying the content of the view v1 and I’m going to update every row in t3 from v1; I gathered stats on t1 and t2 before creating the view and table simply to avoid the need for Oracle to do dynamic sampling as it created t3. Depending on your version of Oracle, of course, the stats collections might be redundant.

Having set the scene with the data, here’s the “original” code for doing the required update, followed by its execution plan (pulled from the memory of a 12.1.0.2 instance):


set serveroutput off
set linesize 180
set trimspool on

alter session set statistics_level = all;

spool cte_update

update t3
        set t2n2 = (
                select  v1.t2n2
                from    v1
                where   v1.t1n1 = t3.t1n1
                and     v1.t1n2 = t3.t1n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                         |       |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   1 |  UPDATE                                  | T3    |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   2 |   TABLE ACCESS FULL                      | T3    |      1 |    200 |    200 |00:00:00.01 |       3 |       |       |          |
|   3 |   VIEW                                   | V1    |    200 |      1 |    200 |00:00:01.22 |   46332 |       |       |          |
|   4 |    SORT UNIQUE                           |       |    200 |      1 |    200 |00:00:01.21 |   46332 |  2048 |  2048 | 2048  (0)|
|   5 |     NESTED LOOPS                         |       |    200 |      1 |  45000 |00:00:01.11 |   46332 |       |       |          |
|   6 |      NESTED LOOPS                        |       |    200 |      1 |  45000 |00:00:00.34 |    1332 |       |       |          |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    200 |      1 |   3000 |00:00:00.02 |     684 |       |       |          |
|*  8 |        INDEX RANGE SCAN                  | T1_I1 |    200 |     15 |   3000 |00:00:00.01 |     408 |       |       |          |
|*  9 |       INDEX RANGE SCAN                   | T2_I1 |   3000 |      1 |  45000 |00:00:00.11 |     648 |       |       |          |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T2    |  45000 |      1 |  45000 |00:00:00.31 |   45000 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - filter("T1"."N2"=:B1)
   8 - access("T1"."N1"=:B1)
   9 - access("T2"."N1"=:B1)
       filter("T1"."N1"="T2"."N1")

Points to note from this execution plan: the VIEW operation at line 3 has started 200 times (there are 200 rows in table t3, the subquery runs once per row); and a simple measure of work done is the 46,745 buffer visits (of which, I can tell you, roughly 400 are current block gets) reported under Buffers in the top line of the plan.

It’s an interesting detail that although Oracle has pushed the correlation predicates inside the view (as shown by the predicate section for operations 7,8 and 9) it doesn’t report the operation at line 3 as “VIEW PUSHED PREDICATE”. It would be nice to see the explicit announcement of predicate pushing here, but that seems to be an expression reserved for pushing join predicates into views – fortunately we always check the predicate section, don’t we!

Now let’s see what the SQL and plan look like if we want Oracle to create the entire v1 result set and use that to update the t3 table.

update t3 
        set t2n2 = (
                with v0 as (
                        select
                                /*+ materialize */
                                t1n1, t1n2, t2n2
                        from v1
                )
                select
                        t2n2
                from
                        v0
                where   v0.t1n1 = t3.t1n1
                and     v0.t1n2 = t3.t1n2
        )
;

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT            |                            |      1 |        |      0 |00:00:00.19 |    1185 |      1 |      1 |       |       |          |
|   1 |  UPDATE                     | T3                         |      1 |        |      0 |00:00:00.19 |    1185 |      1 |      1 |       |       |          |
|   2 |   TABLE ACCESS FULL         | T3                         |      1 |    200 |    200 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|   3 |   TEMP TABLE TRANSFORMATION |                            |    200 |        |    200 |00:00:00.18 |     778 |      1 |      1 |       |       |          |
|   4 |    LOAD AS SELECT           |                            |      1 |        |      0 |00:00:00.01 |     171 |      0 |      1 |  1040K|  1040K|          |
|   5 |     VIEW                    | V1                         |      1 |  45000 |    200 |00:00:00.01 |     168 |      0 |      0 |       |       |          |
|   6 |      HASH UNIQUE            |                            |      1 |  45000 |    200 |00:00:00.01 |     168 |      0 |      0 |  1558K|  1558K| 3034K (0)|
|*  7 |       HASH JOIN             |                            |      1 |  45000 |  45000 |00:00:00.01 |     168 |      0 |      0 |  1969K|  1969K| 1642K (0)|
|   8 |        TABLE ACCESS FULL    | T1                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |      0 |      0 |       |       |          |
|   9 |        TABLE ACCESS FULL    | T2                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |      0 |      0 |       |       |          |
|* 10 |    VIEW                     |                            |    200 |  45000 |    200 |00:00:00.17 |     603 |      1 |      0 |       |       |          |
|  11 |     TABLE ACCESS FULL       | SYS_TEMP_0FD9D6618_911FB4C |    200 |  45000 |  40000 |00:00:00.08 |     603 |      1 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1"."N1"="T2"."N1")
  10 - filter(("V0"."T1N1"=:B1 AND "V0"."T1N2"=:B2))

The headline figure to note is that 1,185 Buffer visits – clearly we’ve done something very different (and possibly cheaper and faster, even in this tiny demonstration). Looking at operation 3 we see the “TEMP TABLE TRANSFORMATION”, which tells us that we’ve materialized our factored subquery. There is scope, though, for a little ambiguity and uncertainty – the Starts column for this operation says we started it 200 times, once for each row in t3. We might worry that we’ve actually recreated the result and written it to disc 200 times even though we might then notice that lines 4 – 9 tell us that we loaded the temporary table just once (Starts = 1).

You could take my word for it that we didn’t “do” the temp table transformation 200 time, we merely used the result of the temp table transformation 200 times; but I wasn’t prepared to make this assumption until I had done a little more checking, so there’s no reason why you shouldn’t still be a little suspicious. Lines 4 – 9 do seem to tell us (consistently) that we only load the data once, but there have been occasional bugs where counters have been reset to zero when they shouldn’t have been, so the fact that we see (for example, at operation 8) “1 full tablescan of t1 returning 3,000 rows after visiting 84 buffers” may mean that Oracle counted the work once and “forgot” to count it the other 199 times.

It’s easy enough to do a quick cross-check. Take a snapshot of v$mystat joined to v$statname before and after runnning the query, and check the difference in buffer visits, tablescans, and tablescan rows gotten – if those figures are broadly consistent with the figures in the execution plan I think we can be reasonably confident that the plan is telling us the truth.

Here’s what we get for a few key figures:

Name                                       Value
----                                       -----
session logical reads                      1,472
db block gets                                412
consistent gets                            1,060
consistent gets from cache                 1,060
db block changes                             410
table scans (short tables)                   205
table scan rows gotten                    46,213
table scan blocks gotten                     366

There are a number of oddities – not to mention version and feature dependent variations – in the numbers and a couple of discrepancies introduced by the code I was using to take the snapshot, but the “table scan rows gotten” figure is particularly easy to see in the execution plan:

46,213 = 3000 (t1) + 3000 (t2) + 200 (t3) + 200 * 200 (temp table)

With a small error the number of “table scans (short tables)” is also consistent with the plan Starts – and that’s perhaps the most important indicator, we scan t1 and t2 just once, and the temp table result 200 times. If we were creating the temp table 200 times we’d have to have done over 400 table scans (200 each for t1 and t2).

I won’t go into the details of how to compare the session logical I/O to the total Buffer gets for the plan – but the figures are in the right ballpark as far as matching is concerned – if the plan was deceiving us about the number of times the temporary table was created (rather than used) the session stats would have to report a figure more like 33,600 (200 * (84 + 84)) consistent gets.

Conclusion

We have managed to reduce the workload from “one view instantiation per row” to “one view instantiation” with a very small change to the SQL. In the case of the OP this should result in a small, easily comprehensible, change in the SQL statement leading to a drop in run-time from several hours to seven minutes – and maybe that’s good enough for the present.

Reference Script: cte_update.sql

 

January 6, 2016

NLS Mess

Filed under: Bugs,CBO,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:18 pm BST Jan 6,2016

The Oracle database has all sorts of little details built into it to help it deal with multi-national companies, but since they’re not commonly used you can find all sorts of odd “buggy” bits of behaviour when you start to look closely. I have to put “buggy” in quotes because some of the reported oddities are the inevitable consequences of (for example) how multi-byte character sets have to work; but some of the oddities look as if they simply wouldn’t be there if the programmer writing the relevant bit of code had remembered that they also had to cater for some NLS feature.

Here’s an example of the type of unexpected behaviour that can appear. There probably are some bugs in the area I’m going to demonstrate but, at first glance, I thought I was looking at an acceptable limitation imposed by a generic requirement. The example came from AskTom. which is why the data set isn’t my usual “t1” generation (and the formatting and capitalisation isn’t according to my usual standards).

The problem involves Case Insensitive indexing.


ALTER session SET nls_sort=binary_ci;
ALTER session SET nls_comp=linguistic;

CREATE TABLE log_data(
  account_id NUMBER,
  log_type NUMBER,
  sys_name VARCHAR2(30),
  log_time TIMESTAMP,
  msg varchar2(4000)
)
nologging
;

insert /*+ append */ into log_data(
  account_id,
  log_type,
  sys_name,
  log_time,
  msg
)
select
        5,
        2,
        dbms_random.string('a',1),
        sysdate + dbms_random.value,
        rpad('x',200)
from
        dual
connect by
        level <= 26000
;


create index log_date on log_data (
        account_id, 
        log_type, 
--      sys_name,
        NLSSORT(sys_name,'NLS_SORT=BINARY_CI'),
        log_time
)
nologging
;
  
rem     ======================================================================
rem     Need to gather stats AFTER index creation because of the hidden column
rem     ======================================================================
  
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'LOG_DATA',
                method_opt       => 'for all columns size 1'
        );
end;
/

And here’s the query I want to optimize:


SELECT 
        *
FROM
  (
    SELECT
        sys_name, log_time,  substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      log_time  desc
  )
WHERE
  rownum <= 10
;

The requirement of the query is that we see the ten most recent entries for a given combination of account_id, log_type and sys_name (ignoring case in sys_name). The orginal table has tens of millions of rows, of course, with many combinations, and some of the combinations have a very large number of entries hence the desire to find an access path that gets just the 10 rows we want without getting all the rows for a combination and sorting them before returning the ten.

Normally we would just create an index that started with the 3 columns used in the equality and ending with the column in the order by clause, and that would be enough for the optimizer to see the option for a “sort order by nosort” operation to get the required data through an index range scan; so that’s the index the code sample creates, except that since we’ve enabled case insensitive sorting we need to use a function-based index to hold the case-insensitive version of sys_name.

Here’s the execution plan we would get if we DIDN’T use the nlssort() function in the index – I’ve run the query in 11.2.0.4 and pulled the plan from memory with rowsource execution stats enabled:


---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   605 (100)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.02 |    1065 |       |       |          |
|   2 |   VIEW                         |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY       |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| LOG_DATA |      1 |    500 |   603   (1)|    966 |00:00:00.01 |    1065 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | LOG_DATE |      1 |    500 |   103   (3)|    966 |00:00:00.01 |     100 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2)
       filter(NLSSORT("SYS_NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6100') )

Notice particularly the filter predicate at operation 5: that’s the thing we need to get into the index before we can avoid picking up excess data and sorting it. Notice also in the A-Rows column that we acquired 966 rows from the table before sorting and discarding all but 10 of them at operation 3.

Notice especially how important it is to look at the predicate section of an execution plan to gain a full understanding of what’s happening.

So here’s the execution plan we get by default with the function-based index in place:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |    13 (100)|     10 |00:00:00.01 |     969 |       |       |          |
|*  1 |  COUNT STOPKEY                  |          |      1 |        |            |     10 |00:00:00.01 |     969 |       |       |          |
|   2 |   VIEW                          |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY        |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|    966 |00:00:00.01 |     969 |       |       |          |
|*  5 |      INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|    966 |00:00:00.01 |       5 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

It didn’t work ! (Check the A-Rows at operations 4 and 5, and the sort that we didn’t want at operation 3 where the data is finally reduced to 10 rows.

But there’s something odd going on here – look at the predicate section: our three predicates are all access predicates for the index range scan descending. We are doing exactly what we want to do with the index, but we’re not stopping after the 10 rows that we need, we’re getting all of them (in the order we want) and then doing a trivial sort and discard. Look at the Cost column – the cost at operation 4 is exactly what we might expect for the 10 rows we want to see, and the E-rows at line 5 is clearly based on our “first 10 rows” requirement.

This raises two questions:

  1. What’s gone wrong ?
  2. Can we work around the problem ?

The answer to (1) is, I think, that there’s a bug in the code. Looking at the 10053 trace file I can see the optimizer correctly handling the arithmetic of the virtual column (the sys_nc000006$) representing the function in the index and then getting to the point where it goes into a code section relating to “Recost for ORDER BY”, and brings back the original function as a filter predicate – I think that in the recosting it may be losing track of the fact that sys_nc000006$ and nlssort(sys_name, ‘nls_sort=binary_ci’) are the same thing and therefore can’t apply the rule about “Equality on 1st N columns, order by on the remainder”.

There are several answers to (2).

Workarounds

The honest hack

The first one is simply to fall back to the old (probably version 7, possibly version 8) requirement for getting the “sort order by nosort” operation – put all the index columns into the order by clause. Unfortunately the optimizer then did a tablescan rather than an index range scan because my data set was so small, so I had to hack the system stats temporarily to make the tablescan very expensive:


begin
        dbms_stats.set_system_stats('MBRC',2);
        dbms_stats.set_system_stats('MREADTIM',20); 
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000); 
end;
/

... order by account_id desc, log_type desc, sys_name desc, lot_time desc

Unfortunately the optimizer still went wrong – it did an ASCENDING index range scan sorting all the data. I actually had to hint the code to use the index in descending order to get the following execution plan:


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |  1215 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |   1000 |  1215   (1)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |  1006   (1)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |   1000 |     5   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

The A-Rows tells us we’ve accessed the minimum data set, and the absence of the SORT ORDER BY STOPKEY operation tells us that we’ve avoided doing the sort. Notice, though that the cost is the cost that would have been appropriate if we have accessed all 1,000 rows that matched the equality predicates. This is an example of a plan that you couldn’t really trust if all you had done was an “explain plan” rather than running the query and checking the rowsource execution stats. If you ignore the A-Rows it looks as if the plan WOULD get all the data in order and only eliminate the redundant rows at operation 1.

The silly surprise

The original author of the problem came up with this one. Put in two predicates which, between them are equivalent to the original requirement:


where ...
and     sys_name >= 'a'
and     sys_name <= 'a'

Clearly this is totally silly – the optimizer can fold this pair of predicates into the single predicate “sys_name = ‘a'”, so it shouldn’t make any difference. But here’s the execution plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

Yes, it’s (structurally) exactly the same plan, with exactly the same predicate section except that (a) it gets there without being hinted, (b) the Cost column looks appropriate all down the line, and (c) the E-Rows value for the VIEW operator would have helped us appreciate that the correct elimination was (probably) going to happen if all we had done was the Explain Plan.

The dirty hack

I know the name of the hidden column that’s causing the problem, and I know how to generate the value it has to be – so let’s give Oracle exactly what it needs to see rather than allowing its internal transformation to rewrite the SQL:

...
AND sys_nc00006$ = nlssort('a','nls_sort=binary_ci')
...


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "SYS_NC00006$"=HEXTORAW('6100') )

We get exactly the plan we need – and the silly thing about this example is that it’s a case where we get the plan we want by EXPLICITLY transforming the SQL to reproduce the transformation that Oracle had done IMPLICITLY and then messed up !

Final Choice
Of the three options – the dirty hack is definitely a no-no in production; the “double the predicate” trock is undesirable because it may depend in some unexpected way on a particular optimizer bug or on some statistical detail that could change; so I’d choose the hinted path with the (nominally) redundant columns.

One final point about this solution, we actually needed to include only the sys_name in the order by clause to use the descending range scan and early stop – which is basically another indication that it’s something about the function-based column that is breaking the normal code path.

Reference Script: nls_sort_anomaly.sql

December 22, 2015

Predicates

Filed under: Execution plans,Oracle,Tuning — Jonathan Lewis @ 12:58 pm BST Dec 22,2015

I received an email recently that started with the sort of opening sentence that I see far more often than I want to:

I have come across an interesting scenario that I would like to run by you, for your opinion.

It’s not that I object to being sent interesting scenarios, it’s just that they are rarely interesting – and this wasn’t one of those rare interesting ones. On the plus side it reminded me that I hadn’t vented one of my popular rants for some time.

Here’s the problem – see if you can work out the error before you get to the rant:

“I’ve got a table and a view on that table; and I’ve got a query that is supposed to use the view. Whether I use the table or the view in query the optimizer uses the primary key on the table to access the table – but when I use the table the query takes about 30 ms, when I use the view the query takes about 903 ms”.

The email included a stripped-down version of the problem (which I’ve stripped even further) – so score some brownie points on that one.  Here, in order, are the table, the view, and two variations of the query:


create table table_a (
	col_1  varchar2(20)	not null,
	col_2  number(10)	not null,
	col_3  varchar2(20)	not null,
	col_4  varchar2(100)
);

insert /*+ append */ into table_a
select
	lpad(mod(rownum-1,1000),10), mod(rownum-1,1000), lpad(rownum,20), rpad(rownum,100)
from
	all_objects
where
	rownum <= 10000
;
commit; 

alter table table_a add constraint ta_pk primary key(col_1, col_2, col_3); 
execute dbms_stats.gather_table_stats(user,'table_a',method_opt=>'for all columns size 1')

create or replace view view_a (
	col1,
	col2,
	col3,
	col4
)
as
select 
	col_1 as col1,
	cast(col_2 as number(9)) as col2,
	col_3 as col3,
	col_4 as col4
from
	table_a
;


variable b1 varchar2(10)
variable b2 number

exec :b1 := lpad(0,10)
exec :b2 := 0

select /*+ index(table_a) tracking_t2 */
	 *
from	table_a
where 
	col_1 = :b1
and	col_2 = :b2
;

select /*+ index(view_a.table_a) tracking_v2 */
	*
from	view_a
where 
	col1 = :b1
and	col2 = :b2
;

Question 1 (for no points): Why would there be a difference (though very small in this example) in performance ?

Question 2 (for a virtual pat on the head): What did the author of the email not do that made him think this was an interesting problem ?

Just to muddy the water for those who need a hint (that’s a hint hint, not an Oracle hint) – here are the two execution plans reprted from v$sql in version 12.1.0.2:


SQL_ID  514syc2mcb1wp, child number 0
-------------------------------------
select /*+ index(table_a) tracking_t2 */   * from table_a where  col_1
= :b1 and col_2 = :b2

Plan hash value: 3313752691

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |      1 |        |     10 |00:00:00.01 |      13 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TABLE_A |      1 |      1 |     10 |00:00:00.01 |      13 |
|*  2 |   INDEX RANGE SCAN                  | TA_PK   |      1 |      1 |     10 |00:00:00.01 |       3 |
---------------------------------------------------------------------------------------------------------


SQL_ID  ck0y3v9833wrh, child number 0
-------------------------------------
select /*+ index(view_a.table_a) tracking_v2 */  * from view_a where
col1 = :b1 and col2 = :b2

Plan hash value: 3313752691

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |      1 |        |     10 |00:00:00.01 |      13 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TABLE_A |      1 |      1 |     10 |00:00:00.01 |      13 |
|*  2 |   INDEX RANGE SCAN                  | TA_PK   |      1 |      1 |     10 |00:00:00.01 |       3 |
---------------------------------------------------------------------------------------------------------

I’ve even shown you the Plan Hash Values for the two queries so you can check that the execution plans were the same.

So what have I just NOT done in my attempt to make it harder for you to understand what is going on ?

Give yourself a pat on the head if you’ve been thinking “Where’s the predicate section for these plans ?”  (9 years old today).

Here are the two predicate sections (in the same order as the plans above):


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL_1"=:B1 AND "COL_2"=:B2)


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL_1"=:B1)
       filter(CAST("COL_2" AS number(9))=:B2)

Notice how the optimizer can use both predicates to probe the index when we query the table but, thanks to the function applied to the column in the view, can only probe the index on the first column of the view and has to check every index entry for the first input value to see of the result of the cast matches the second input value. The size of the range scan in the second case could be much larger than the size of the range scan in the first case – the difference in performance could simply be a reflection that col_1 is very repetitive with many different values of col_2 for every value of col_1.

Interesting

While the problem itself isn’t interesting – it does raise a couple of points worth mentioning (and I’m not going to ask why the view has that surprising cast() in it – but if pushed I could invent a reason)

First, what steps have been taken to ensure that a query against the view won’t crash with Oracle error 1438:

SQL> insert into table_a values(:b1, 1e9,'x','x');

1 row created.

SQL> select * from view_a where col1 = :b1;
ERROR:
ORA-01438: value larger than specified precision allowed for this column

Possibly there’s a check constraint on the column restricting it to values that can survive the cast to number(9).

Secondly, it’s often possible to use constraints or virtual columns (or both together) that allow the optimizer to get clever with expression substitution and come up with optimal execution plans even when there are traps like this put in the way. In this case I couldn’t manage to make the usual tricks work. Possibly the only way to get the hoped-for performance is to create a second index on (col_1, cast(col_2) as number(9), col_3).

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 6,553 other followers