Oracle Scratchpad

June 14, 2017

Unpivot

Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 3:46 pm BST Jun 14,2017

See the May 2019 update for a critical performance detail.

An interesting observation appeared recently as a side-channel on a question on the OTN database forum – how does Oracle execute an unpivot() operation. Here’s an example of such a query:

rem
rem     Script:         unpivot_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2017
rem

create table t1( id, col1, col2, col3, col4, col5, padding )
cache
pctfree 95 pctused 5
-- compress for query low
as
select
        1, 100 , 200 , 300 , 400 , 500,rpad('x',100)
from
        all_objects
where
        rownum <= 50000 -- > needed to bypass wordpress format glitch
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

select
        /*+ gather_plan_statistics find this */
        id, max(val) as high_val
from
        t1
unpivot include nulls (
        val for source in (col1, col2, col3, col4, col5)
)
group by id
order by id
; 

I’ve created a table with 50,000 rows (all_objects is quite large in 12.1.0.2 and 12.2.0.1), but with lots of free space per block so that I get three rows per block for a total of roughly 16,667 blocks which is going to make it fairly easy to spot any interesting session statistics. Then I’ve used an unpivot() call that has the effect of turning one row with five columns into five rows with one column.

Here’s the basic execution plan for the query (as pulled from memory):


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        | 10695 (100)|      1 |00:00:00.18 |   16671 |       |       |          |
|   1 |  SORT GROUP BY       |      |      1 |      1 | 10695   (2)|      1 |00:00:00.18 |   16671 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |    250K| 10653   (2)|    250K|00:00:01.41 |   16671 |       |       |          |
|   3 |    UNPIVOT           |      |      1 |        |            |    250K|00:00:00.52 |   16671 |       |       |          |
|   4 |     TABLE ACCESS FULL| T1   |      1 |  50000 |  2131   (2)|  50000 |00:00:00.12 |   16671 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

According to the plan Oracle has done one full tablescan on the data (Starts = 1), but the unpivot operation has used this to produce 250,000 rows of output from 50,000 rows (A-Rows=) of input. Note, however, that the cost of the view operation is 5 times the cost of the tablescan but, on the other hand, the number of buffers visited is 16,671 (which matches the size of the table). So have we done 5 tablescans with a union all, or have we done one tablescan ?

The next step is to look at the 10053 (optimizer) trace file, specifically for the “unparsed query” which (despite the final plan table showing the plan we’ve just seen above) looked like this – after applying a few purely cosmetic changes:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT from$_subquery$_002.ID ID,MAX(from$_subquery$_002.VAL) HIGH_VAL
FROM    (
                   (SELECT T1.ID ID, T1.PADDING PADDING, 'COL1' SOURCE, T1.COL1 VAL FROM TEST_USER.T1 T1)
        UNION ALL  (SELECT T1.ID ID, T1.PADDING PADDING, 'COL2' SOURCE, T1.COL2 VAL FROM TEST_USER.T1 T1)
        UNION ALL  (SELECT T1.ID ID, T1.PADDING PADDING, 'COL3' SOURCE, T1.COL3 VAL FROM TEST_USER.T1 T1)
        UNION ALL  (SELECT T1.ID ID, T1.PADDING PADDING, 'COL4' SOURCE, T1.COL4 VAL FROM TEST_USER.T1 T1)
        UNION ALL  (SELECT T1.ID ID, T1.PADDING PADDING, 'COL5' SOURCE, T1.COL5 VAL FROM TEST_USER.T1 T1)
        ) from$_subquery$_002
GROUP BY from$_subquery$_002.ID
ORDER BY from$_subquery$_002.ID
;

And then there’s the outline (which I can see in the 10053 trace, or in the plan pulled from memory by a call to dbms_xplan.display_cursor()).


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$6")
      NO_ACCESS(@"SEL$6" "from$_subquery$_002"@"SEL$6")
      FULL(@"SEL$5" "T1"@"SEL$5")
      FULL(@"SEL$4" "T1"@"SEL$4")
      FULL(@"SEL$3" "T1"@"SEL$3")
      FULL(@"SEL$2" "T1"@"SEL$2")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

And if I also pull the alias (query block and fully qualified table name) information from memory:


Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$6
   2 - SET$1 / from$_subquery$_002@SEL$6
   3 - SET$1
   4 - SEL$5 / T1@SEL$5

Then there’s an alternative (though not necessarily correct) way to find the ‘expanded’ version of the query that the optimizer will receive – dbms_sql2.expand_sql_text() in 11g, dbms_utility.expand_sql_text() in 12c:


variable m_sql_out clob

declare
        m_sql_in    clob :=
                'select /*+ gather_plan_statistics find this */ id, max(val) as high_val
                from    t1
                        unpivot include nulls ( val for source in (col1, col2, col3, col4, col5) )
                group by id
                order by id
                '
        ;
begin

--      dbms_sql2.expand_sql_text(        -- 11g
        dbms_utility.expand_sql_text(     -- 12c
                m_sql_in,
                :m_sql_out
        );

end;
/

set long 20000
print m_sql_out

M_SQL_OUT
--------------------------------------------------------------------------------
SELECT "A1"."ID" "ID",MAX("A1"."VAL") "HIGH_VAL" FROM  ( (SELECT "A3"."ID" "ID",
"A3"."PADDING" "PADDING",'COL1' "SOURCE","A3"."COL1" "VAL" FROM "TEST_USER"."T1"
 "A3") UNION ALL  (SELECT "A4"."ID" "ID","A4"."PADDING" "PADDING",'COL2' "SOURCE
","A4"."COL2" "VAL" FROM "TEST_USER"."T1" "A4") UNION ALL  (SELECT "A5"."ID" "ID
","A5"."PADDING" "PADDING",'COL3' "SOURCE","A5"."COL3" "VAL" FROM "TEST_USER"."T
1" "A5") UNION ALL  (SELECT "A6"."ID" "ID","A6"."PADDING" "PADDING",'COL4' "SOUR
CE","A6"."COL4" "VAL" FROM "TEST_USER"."T1" "A6") UNION ALL  (SELECT "A7"."ID" "
ID","A7"."PADDING" "PADDING",'COL5' "SOURCE","A7"."COL5" "VAL" FROM "TEST_USER".
"T1" "A7")) "A1" GROUP BY "A1"."ID" ORDER BY "A1"."ID"

So at some level the optimizer does think it’s doing 5 tablescans – and the t1 reported in the unpivot plan is from the fifth (i.e. last) copy of the table in the union all. But the execution statistics and the session statistics say I’ve done just one tablescan – and this persists even when I make the table so large that it (a) won’t fit the buffer cache, and/or (b) uses direct path reads, and/or (c) runs under Exadata with Hybrid columnar compression.

So what’s (probably) happening ?

I think Oracle has a code path that says it’s doing a union all of tablescans (at least for this example) but tells it that the union all is there as an expansion of an unpivot so (sticking with an example that does a tablescan into the buffer cache) Oracle reads the number of blocks dictated by the current multiblock read count into the cache, pins that one batch of blocks, scans the batch (or, possibly, each block in turn) 5 times (or as required), unpins the batch and then reads the next batch. So the session does five tablescans but does them in a way that lets you see only one tablescan in the statistics.

Footnote

There was a change in the results when I tested this on 12.2.0.1; the unparsed query reported only a two-part union all subquery, and the table alias information in the plan identified the referenced table as the copy of the table from the second (i.e. still last) subquery in the union all. More significantly the cost of the VIEW operation was a close match to the cost of a single tablescan, rather than being a multiple thereof:


select * from table(dbms_xplan.display_cursor('1k077bzp0t6mn',null,'outline alias cost 

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |  1117 (100)|      1 |00:00:00.13 |    8350 |   8334 |       |       |          |
|   1 |  SORT GROUP BY       |      |      1 |    250K|  1117   (6)|      1 |00:00:00.13 |    8350 |   8334 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |    250K|  1076   (2)|    250K|00:00:00.13 |    8350 |   8334 |       |       |          |
|   3 |    UNPIVOT           |      |      1 |        |            |    250K|00:00:00.09 |    8350 |   8334 |       |       |          |
|   4 |     TABLE ACCESS FULL| T1   |      1 |  50000 |  1074   (2)|  50000 |00:00:00.07 |    8350 |   8334 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$3
   2 - SET$1 / from$_subquery$_002@SEL$3
   3 - SET$1
   4 - SEL$2 / T1@SEL$2

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$3")
      NO_ACCESS(@"SEL$3" "from$_subquery$_002"@"SEL$3")
      FULL(@"SEL$2" "T1"@"SEL$2")
      END_OUTLINE_DATA
  */

M_SQL_OUT
--------------------------------------------------------------------------------
SELECT "A1"."ID" "ID",MAX("A1"."VAL") "HIGH_VAL" FROM  (SELECT "A3"."ID" "ID","A
3"."PADDING" "PADDING",'COL1' "SOURCE","A3"."COL1" "VAL" FROM "TEST_USER"."T1" "
A3" UNION ALL SELECT "A3"."ID" "ID","A3"."PADDING" "PADDING",'COL2' "SOURCE","A3
"."COL2" "VAL" FROM "TEST_USER"."T1" "A3" UNION ALL SELECT "A3"."ID" "ID","A3"."
PADDING" "PADDING",'COL3' "SOURCE","A3"."COL3" "VAL" FROM "TEST_USER"."T1" "A3"
UNION ALL SELECT "A3"."ID" "ID","A3"."PADDING" "PADDING",'COL4' "SOURCE","A3"."C
OL4" "VAL" FROM "TEST_USER"."T1" "A3" UNION ALL SELECT "A3"."ID" "ID","A3"."PADD
ING" "PADDING",'COL5' "SOURCE","A3"."COL5" "VAL" FROM "TEST_USER"."T1" "A3") "A1
" GROUP BY "A1"."ID" ORDER BY "A1"."ID"
allstats last'));

It used to be hard enough working out what the run-time engine was doing when the optimizer and the run-time engine claimed that they were working on the same (transformed version of the) SQL; but now we can get the impression that the optimizer is directing the run-time engine to execute a plan that couldn’t possibly match the (unparsed) query that the optimizer had been considering.

Update (May 2019)

In a response to Stew Ashton in the comment section I pointed out that the unpivot feature (unlike the merge command) behaved nicely when it came to deciding which columns to project from the table you were operating on. I made the comment that I had tested 11.2.0.4 and 12.1.0.2 but not got round to testing 12.2.0.1.  A couple of years have now passed and today I finally tested 12.2.0.1 and 18.3.0.0 – they’ve taken a retrograde step, both versions project the padding column and the plan is forced to do a full tablescan even if there’s a suitable index that could be used for an index fast full scan.

Key point:  if you’re going to unpivot a few columns from a big table then don’t supply the table name in the “from” clause,  change it to an inline view that selects only the columns you need, for example:


select 
        id, max(val) as high_val 
from 
        (select id, col1, col2, col3, col4, col5 from t1) t1
unpivot include nulls ( 
        val for source in (col1, col2, col3, col4, col5)
) 
group by id 
order by id 
; 

With a new set of sample data the execution plan for a simple query referencing the table directly used a tablescan full, but the modified query above switched to an index fast full scan as I had a covering index available. Of course, you could use this strategy even if you don’t have a covering index as there will still be some CPU overhead of projecting redundant columns and passing them up the pipeline to the parent (unpivot) operation.

Update Mar 2020

Andrew Sayer has just brought Bug 22339954 – High Parse Time for Query Using UNPIVOT Operator (Doc ID 22339954.8) to my attention: present up to version 12.1.0.2, fixed in 12.2.0.1 with back ports for 11.2.0.3 and 12.1.0.2.

The details about the bug suggest that the bug also allows for errors in expansion and column projection (ORA-00904, ORA-00942), so part of the fix to bug 22339954 may be the underlying cause of the appearance of the redundant column in the projection list for my example.

Update Aug 2022

A recent thread on one of the Oracle forums has supplied an example of the expand_sql_text() utility producing a statement that is not equivalent to the input statement. The example was a simple “right outer join” (“ANSI” format) which was tranformed into a query in “Oracle” style that had lost the outer join and therefore produced the wrong results:

Input Text

select  d.deptno,ename 
from 
        emp e 
right outer join 
        dept d 
on 
        e.deptno=d.deptno 
where 
        ename is null

Output text (with cosmetic adjustments)

SELECT 
        A1.QCSJ_C000000000300001_2   DEPTNO,
        A1.ENAME_0                   ENAME 
FROM (
        SELECT 
                A3.ENAMEi     ENAME_0,
                A3.DEPTNO     QCSJ_C000000000300000,
                A2.DEPTNO     QCSJ_C000000000300001_2 
        FROM 
                EMP A3,
                DEPT A2 
        WHERE 
                A3.DEPTNO = A2.DEPTNO
        ) A1 
WHERE 
        A1.ENAME_0 IS NULL

This is with the data from the standard SCOTT schema, and I’ve reproduced the result on 19.11.0.0. Note the absence of a (+) on the where clause predicate at line 13, which should be: A3.DEPTNO(+) = A2.DEPTNO.

8 Comments »

  1. Jonathan,

    Thanks again for shedding light on this obscure corner. One detail I noticed:

    You mentioned in https://jonathanlewis.wordpress.com/2016/06/06/merge-precision/ that the optimizer doesn’t work out what columns are actually needed in the USING table, so it may scan the table even though a covering index would do.

    The same thing can happen here. If I code

    select name from hr.employees
    unpivot(name for descr in(first_name, last_name));

    then the table is scanned, but if I code

    select name from (select first_name, last_name from hr.employees)
    unpivot(name for descr in(first_name, last_name));

    then the optimizer scans the EMP_NAME_IX index instead.

    Best regards, Stew

    Comment by stewashton — June 16, 2017 @ 10:43 am BST Jun 16,2017 | Reply

  2. Stew,

    Thanks for the comment.

    I did a couple of checks – firstly by adding the “Projection” option to my current code, and this showed that the tablescan was projecting only the columns meeded.

    Then I created an index on (id, col1, col2, col3, col4, col5) on my test table and declaring id not null so that I had a covering index that could be used in an index-only query. Without changing the query Oracle switched to reporting (and using) an index fast full scan in both 11.2.0.4 and 12.1.0.2 (I haven’t tried 12.2.0.1 yet).

    However there was an oddity in the 10053 “unparsed query” text – it did actually show the padding column as part of the unparsed query, so maybe that’s connected somehow to your observation.

    Comment by Jonathan Lewis — June 16, 2017 @ 11:25 am BST Jun 16,2017 | Reply

  3. Thanks for this detailed analysis of what’s going on under the hood.

    I realize this is much more basic, but if could I ask a confirming question: Even though the Explain Plan for the UNPIVOT says the CBO doing the equivalent of a UNION ALL, would you expect the performance of the UNPIVOT to be significantly faster than someone hand-coding a UNION ALL? We have some old code that does the latter. I’ve been arguing that, in cases where performance is a problem, that it’s worth rewriting them to use UNPIVOT.

    Side-note: A shout-out to the only other Stew I’ve discovered in the Oracle world! :-)

    Comment by stewstryker — June 21, 2017 @ 4:08 pm BST Jun 21,2017 | Reply

    • Stew(2)

      Although the plan says “union all” it is an optimised variant which gets each block just once rather than repeating each subquery from the beginning, so even in the best “union all” case the unpivot should beat It for efficiency.

      Comment by Jonathan Lewis — June 21, 2017 @ 7:00 pm BST Jun 21,2017 | Reply

  4. […] 5. Unpivot […]

    Pingback by Summer Holiday Reading for OBIEE Fans. – OBIEE News — July 30, 2017 @ 8:32 am BST Jul 30,2017 | Reply

  5. […] hardly ever touched the pivot/unpivot feature in SQL, but a recent comment by Jason Bucata on a note I’d written about Java names […]

    Pingback by Pivot upgrade | Oracle Scratchpad — February 1, 2021 @ 1:35 pm GMT Feb 1,2021 | Reply

  6. […] Unpivot mechanics (June 2017): what does Oracle do internally to handle an unpivot? […]

    Pingback by Infrastructure Catalogue | Oracle Scratchpad — July 17, 2022 @ 8:22 pm BST Jul 17,2022 | Reply

  7. […] Unpivot mechanics (June 2017): what does Oracle do internally to handle an unpivot? […]

    Pingback by Performance catalogue | Oracle Scratchpad — July 17, 2022 @ 8:28 pm BST Jul 17,2022 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.