Oracle Scratchpad

June 14, 2017

Unpivot

Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 3:46 pm BST Jun 14,2017

An interesting observation appeared recently as a side-channel on a question on the OTN database forum – how does Oracle execute an unpivot() operation. Here’s an example of such a query:

rem
rem     Script:         unpivot_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2017
rem

create table t1( id, col1, col2, col3, col4, col5, padding )
cache
pctfree 95 pctused 5
-- compress for query low
as
select
        1, 100 , 200 , 300 , 400 , 500,rpad('x',100)
from
        all_objects
where
        rownum <= 50000       -- > needed to bypass wordpress format glitch
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

select
        /*+ gather_plan_statistics find this */
        id, max(val) as high_val
from
        t1
unpivot include nulls (
        val for source in (col1, col2, col3, col4, col5)
)
group by id
order by id
; 

I’ve created a table with 50,000 rows (all_objects is quite large in 12.1.0.2 and 12.2.0.1), but with lots of free space per block so that I get three rows per block for a total of roughly 16,667 blocks which is going to make it fairly easy to spot any interesting session statistics. Then I’ve used an unpivot() call that has the effect of turning one row with five columns into five rows with one column.

Here’s the basic execution plan for the query (as pulled from memory):


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        | 10695 (100)|      1 |00:00:00.18 |   16671 |       |       |          |
|   1 |  SORT GROUP BY       |      |      1 |      1 | 10695   (2)|      1 |00:00:00.18 |   16671 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |    250K| 10653   (2)|    250K|00:00:01.41 |   16671 |       |       |          |
|   3 |    UNPIVOT           |      |      1 |        |            |    250K|00:00:00.52 |   16671 |       |       |          |
|   4 |     TABLE ACCESS FULL| T1   |      1 |  50000 |  2131   (2)|  50000 |00:00:00.12 |   16671 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

According to the plan Oracle has done one full tablescan on the data (Starts = 1), but the unpivot operation has used this to produce 250,000 rows of output from 50,000 rows (A-Rows=) of input. Note, however, that the cost of the view operation is 5 times the cost of the tablescan but, on the other hand, the number of buffers visited is 16,671 (which matches the size of the table). So have we done 5 tablescans with a union all, or have we done one tablescan ?

The next step is to look at the 10053 (optimizer) trace file, specifically for the “unparsed query” which (despite the final plan table showing the plan we’ve just seen above) looked like this – after applying a few purely cosmetic changes:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "from$_subquery$_002"."ID" "ID",MAX("from$_subquery$_002"."VAL") "HIGH_VAL"
FROM    (
                   (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL1' "SOURCE","T1"."COL1" "VAL" FROM "TEST_USER"."T1" "T1")
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL2' "SOURCE","T1"."COL2" "VAL" FROM "TEST_USER"."T1" "T1")
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL3' "SOURCE","T1"."COL3" "VAL" FROM "TEST_USER"."T1" "T1")
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL4' "SOURCE","T1"."COL4" "VAL" FROM "TEST_USER"."T1" "T1")
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL5' "SOURCE","T1"."COL5" "VAL" FROM "TEST_USER"."T1" "T1")
        ) "from$_subquery$_002"
GROUP BY "from$_subquery$_002"."ID"
ORDER BY "from$_subquery$_002"."ID"
;

And then there’s the outline (which I can see in the 10053 trace, or in the plan pulled from memory by a call to dbms_xplan.display_cursor()).


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$6")
      NO_ACCESS(@"SEL$6" "from$_subquery$_002"@"SEL$6")
      FULL(@"SEL$5" "T1"@"SEL$5")
      FULL(@"SEL$4" "T1"@"SEL$4")
      FULL(@"SEL$3" "T1"@"SEL$3")
      FULL(@"SEL$2" "T1"@"SEL$2")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

And if I also pull the alias (query block and fully qualified table name) information from memory:


Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$6
   2 - SET$1 / from$_subquery$_002@SEL$6
   3 - SET$1
   4 - SEL$5 / T1@SEL$5

Then there’s the alternative (though not necessarily correct) way to find an “unparsed” version of the query – dbms_sql2.expand_sql_text() in 11g, dbms_utility.expand_sql_text() in 12c:


variable m_sql_out clob

declare
        m_sql_in    clob :=
                'select /*+ gather_plan_statistics find this */ id, max(val) as high_val
                from    t1
                        unpivot include nulls ( val for source in (col1, col2, col3, col4, col5) )
                group by id
                order by id
                '
        ;
begin

--      dbms_sql2.expand_sql_text(        -- 11g
        dbms_utility.expand_sql_text(     -- 12c
                m_sql_in,
                :m_sql_out
        );

end;
/

set long 20000
print m_sql_out

M_SQL_OUT
--------------------------------------------------------------------------------
SELECT "A1"."ID" "ID",MAX("A1"."VAL") "HIGH_VAL" FROM  ( (SELECT "A3"."ID" "ID",
"A3"."PADDING" "PADDING",'COL1' "SOURCE","A3"."COL1" "VAL" FROM "TEST_USER"."T1"
 "A3") UNION ALL  (SELECT "A4"."ID" "ID","A4"."PADDING" "PADDING",'COL2' "SOURCE
","A4"."COL2" "VAL" FROM "TEST_USER"."T1" "A4") UNION ALL  (SELECT "A5"."ID" "ID
","A5"."PADDING" "PADDING",'COL3' "SOURCE","A5"."COL3" "VAL" FROM "TEST_USER"."T
1" "A5") UNION ALL  (SELECT "A6"."ID" "ID","A6"."PADDING" "PADDING",'COL4' "SOUR
CE","A6"."COL4" "VAL" FROM "TEST_USER"."T1" "A6") UNION ALL  (SELECT "A7"."ID" "
ID","A7"."PADDING" "PADDING",'COL5' "SOURCE","A7"."COL5" "VAL" FROM "TEST_USER".
"T1" "A7")) "A1" GROUP BY "A1"."ID" ORDER BY "A1"."ID"

So at some level the optimizer does think it’s doing 5 tablescans – and the t1 reported in the unpivot plan is from the fifth (i.e. last) copy of the table in the union all. But the execution statistics and the session statistics say I’ve done just one tablescan – and this persists even when I make the table so large that it (a) won’t fit the buffer cache, and/or (b) uses direct path reads, and/or (c) runs under Exadata with Hybrid columnar compression.

So what’s (probably) happening ?

I think Oracle has a code path that says it’s doing a union all of tablescans (at least for this example) but tells it that the union all is there as an expansion of an unpivot so (sticking with an example that does a tablescan into the buffer cache) Oracle reads the number of blocks dictated by the current multiblock read count into the cache, pins that one batch of blocks, scans the batch (or, possibly, each block in turn) 5 times (or as required), unpins the batch and then reads the next batch. So the session does five tablescans but does them in a way that lets you see only one tablescan in the statistics.

Footnote

There was a change in the results when I tested this on 12.2.0.1; the unparsed query reported only a two-part union all subquery, and the table alias information in the plan identified the referenced table as the copy of the table from the second (i.e. still last) subquery in the union all. More significantly the cost of the VIEW operation was a close match to the cost of a single tablescan, rather than being a multiple thereof:


select * from table(dbms_xplan.display_cursor('1k077bzp0t6mn',null,'outline alias cost 

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |  1117 (100)|      1 |00:00:00.13 |    8350 |   8334 |       |       |          |
|   1 |  SORT GROUP BY       |      |      1 |    250K|  1117   (6)|      1 |00:00:00.13 |    8350 |   8334 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |    250K|  1076   (2)|    250K|00:00:00.13 |    8350 |   8334 |       |       |          |
|   3 |    UNPIVOT           |      |      1 |        |            |    250K|00:00:00.09 |    8350 |   8334 |       |       |          |
|   4 |     TABLE ACCESS FULL| T1   |      1 |  50000 |  1074   (2)|  50000 |00:00:00.07 |    8350 |   8334 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$3
   2 - SET$1 / from$_subquery$_002@SEL$3
   3 - SET$1
   4 - SEL$2 / T1@SEL$2

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$3")
      NO_ACCESS(@"SEL$3" "from$_subquery$_002"@"SEL$3")
      FULL(@"SEL$2" "T1"@"SEL$2")
      END_OUTLINE_DATA
  */

M_SQL_OUT
--------------------------------------------------------------------------------
SELECT "A1"."ID" "ID",MAX("A1"."VAL") "HIGH_VAL" FROM  (SELECT "A3"."ID" "ID","A
3"."PADDING" "PADDING",'COL1' "SOURCE","A3"."COL1" "VAL" FROM "TEST_USER"."T1" "
A3" UNION ALL SELECT "A3"."ID" "ID","A3"."PADDING" "PADDING",'COL2' "SOURCE","A3
"."COL2" "VAL" FROM "TEST_USER"."T1" "A3" UNION ALL SELECT "A3"."ID" "ID","A3"."
PADDING" "PADDING",'COL3' "SOURCE","A3"."COL3" "VAL" FROM "TEST_USER"."T1" "A3"
UNION ALL SELECT "A3"."ID" "ID","A3"."PADDING" "PADDING",'COL4' "SOURCE","A3"."C
OL4" "VAL" FROM "TEST_USER"."T1" "A3" UNION ALL SELECT "A3"."ID" "ID","A3"."PADD
ING" "PADDING",'COL5' "SOURCE","A3"."COL5" "VAL" FROM "TEST_USER"."T1" "A3") "A1
" GROUP BY "A1"."ID" ORDER BY "A1"."ID"
allstats last'));

It used to be hard enough working out what the run-time engine was doing when the optimizer and the run-time engine claimed that they were working on the same (transformed version of the) SQL; but now we can get the impression that the optimizer is directing the run-time engine to execute a plan that couldn’t possibly match the (unparsed) query that the optimizer had been considering.

5 Comments »

  1. Jonathan,

    Thanks again for shedding light on this obscure corner. One detail I noticed:

    You mentioned in https://jonathanlewis.wordpress.com/2016/06/06/merge-precision/ that the optimizer doesn’t work out what columns are actually needed in the USING table, so it may scan the table even though a covering index would do.

    The same thing can happen here. If I code

    select name from hr.employees
    unpivot(name for descr in(first_name, last_name));

    then the table is scanned, but if I code

    select name from (select first_name, last_name from hr.employees)
    unpivot(name for descr in(first_name, last_name));

    then the optimizer scans the EMP_NAME_IX index instead.

    Best regards, Stew

    Comment by stewashton — June 16, 2017 @ 10:43 am BST Jun 16,2017 | Reply

  2. Stew,

    Thanks for the comment.

    I did a couple of checks – firstly by adding the “Projection” option to my current code, and this showed that the tablescan was projecting only the columns meeded.

    Then I created an index on (id, col1, col2, col3, col4, col5) on my test table and declaring id not null so that I had a covering index that could be used in an index-only query. Without changing the query Oracle switched to reporting (and using) an index fast full scan in both 11.2.0.4 and 12.1.0.2 (I haven’t tried 12.2.0.1 yet).

    However there was an oddity in the 10053 “unparsed query” text – it did actually show the padding column as part of the unparsed query, so maybe that’s connected somehow to your observation.

    Comment by Jonathan Lewis — June 16, 2017 @ 11:25 am BST Jun 16,2017 | Reply

  3. Thanks for this detailed analysis of what’s going on under the hood.

    I realize this is much more basic, but if could I ask a confirming question: Even though the Explain Plan for the UNPIVOT says the CBO doing the equivalent of a UNION ALL, would you expect the performance of the UNPIVOT to be significantly faster than someone hand-coding a UNION ALL? We have some old code that does the latter. I’ve been arguing that, in cases where performance is a problem, that it’s worth rewriting them to use UNPIVOT.

    Side-note: A shout-out to the only other Stew I’ve discovered in the Oracle world! :-)

    Comment by stewstryker — June 21, 2017 @ 4:08 pm BST Jun 21,2017 | Reply

    • Stew(2)

      Although the plan says “union all” it is an optimised variant which gets each block just once rather than repeating each subquery from the beginning, so even in the best “union all” case the unpivot should beat It for efficiency.

      Comment by Jonathan Lewis — June 21, 2017 @ 7:00 pm BST Jun 21,2017 | Reply

  4. […] 5. Unpivot […]

    Pingback by Summer Holiday Reading for OBIEE Fans. – OBIEE News — July 30, 2017 @ 8:32 am BST Jul 30,2017 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.