Oracle Scratchpad

February 16, 2014

Recursive subquery factoring

Filed under: Hints,Ignoring Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:11 pm GMT Feb 16,2014

Updated for 19c, May 2020

This is possibly my longest title to date – I try to keep them short enough to fit the right-hand column of the blog without wrapping – but I couldn’t think of a good way to shorten it. (Personally I prefer the expression CTE – common table expression – over “factored subquery”, “subquery factoring” or “with subquery”, and using it would have achieved my goal, but it might not have meant anything to most people.)

If you haven’t come across them before: recursive CTEs appeared in 11.2, are part of the ANSI standard, and are (probably) viewed by Oracle as the strategic replacement for “connect by” queries. Here, to get things started, is a simple (and silly) example:


rem
rem     Script: recursive_with_3.sql
rem     Author: Jonathan Lewis
rem     Dated:  Feb 2014
rem

with data(p) as (
        select 1 p from dual
        union all
        select p + 1 from data where p < 100
)
select  p
from    data
where   rownum <= 10
;



         P
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 37253879

---------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |     2 |    26 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                             |      |       |       |            |          |
|   2 |   VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|   4 |     FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|*  5 |     RECURSIVE WITH PUMP                    |      |       |       |            |          |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   5 - filter("P"<100)
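For comparison, the older syntax that recursive CTEs are positioned to replace would generate the same ten rows like this (a quick sketch, not part of the original script):

```sql
select  level p
from    dual
connect by
        level <= 10
;
```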

A recursive CTE has three features that identify it.

  • first, the query alias (“data” in this case) is followed by a list of column aliases;
  • secondly the query includes a UNION ALL; and
  • thirdly the second subquery in the UNION ALL references the query alias – i.e. it’s the recursive bit.

There are other optional bits but I’m not planning to go into those – all I want to talk about is how to control the materialization (or not) of a recursive CTE through hinting.

The reason I wrote this note was because Jeff Jacobs, in his presentation on “Performance Anti-patterns” at RMOUG last week, raised the question of whether or not the /*+ materialize */ and /*+ inline */ hints worked with recursive CTEs and gave an example of a UNION ALL query where the CTE always materialized, no matter how you applied the /*+ inline */ hint. The CTE seemed to be following the basic guideline for CTEs – if you use it once in the main query it goes inline, if you use it more than once it will (almost invariably) materialize – though there are some restrictions on materialization.
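For an ordinary (non-recursive) CTE overriding that guideline is easy, because the hint can go straight after the select of the CTE’s own query block – a sketch, using a hypothetical table t1:

```sql
with cte as (
        select
                /*+ materialize */      -- or /*+ inline */ to force the opposite choice
                *
        from    t1
)
select * from cte
;
```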

I’m always interested in examples where “the hint is ignored”, so I exchanged a couple of email messages with Jeff and he sent me an example (which I’ve simplified for this blog) of a query that demonstrated the issue. I spent a little while thinking about it and decided (initially) that it simply wasn’t possible to hint the code the way we wanted to, and that it was just one of those cases where it takes a bit of time for new features to catch up and fit in to the standard framework. Here’s a simplified version of the query, with its execution plan – note the “temp table transformation” that tells you that materialization has taken place:

with data(p) as (
        select 1 p from dual
        union all
        select p + 1 from data where p < 100
)
select  p
from    data
where   rownum <= 10
union all
select  p
from    data
where   rownum <= 10
;


-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                            |     4 |    52 |     4   (0)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION                 |                            |       |       |            |          |
|   2 |   LOAD AS SELECT                           | SYS_TEMP_0FD9D6608_7391CD7 |       |       |            |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|                            |       |       |            |          |
|   4 |     FAST DUAL                              |                            |     1 |       |     2   (0)| 00:00:01 |
|*  5 |     RECURSIVE WITH PUMP                    |                            |       |       |            |          |
|   6 |   UNION-ALL                                |                            |       |       |            |          |
|*  7 |    COUNT STOPKEY                           |                            |       |       |            |          |
|   8 |     VIEW                                   |                            |     2 |    26 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6608_7391CD7 |     2 |    12 |     2   (0)| 00:00:01 |
|* 10 |    COUNT STOPKEY                           |                            |       |       |            |          |
|  11 |     VIEW                                   |                            |     2 |    26 |     2   (0)| 00:00:01 |
|  12 |      TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6608_7391CD7 |     2 |    12 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("P"<100)
   7 - filter(ROWNUM<=10)
  10 - filter(ROWNUM<=10)

The following morning I woke up with one of those “overnight insights” where you seem to have worked out the answer in your sleep. To make a hint work you either have to put it in the right query block or you have to name the right query block in a hint that you’ve put in the main query block. In this case the right query block doesn’t exist in the text and it’s not possible to figure out what the name of the right query block would be if it came into existence.

If you try putting the /*+ inline */ hint into the query after the select at line 2 above, you’ve put the hint into the first query block of a union all, NOT into the query block of the recursive CTE.
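To illustrate: the only select you can physically attach the hint to is the one in the first branch of the union all, so the attempt ends up looking like this (a sketch of the placement that doesn’t work):

```sql
with data(p) as (
        select  /*+ inline */   -- lands in the first branch only, not the CTE as a whole
                1 p
        from    dual
        union all
        select p + 1 from data where p < 100
)
select p from data where rownum <= 10
union all
select p from data where rownum <= 10
;
```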

Having identified the problem, the solution (or, at least, a possible solution) was obvious – create the query block you need. This (with its execution plan from 11.2.0.4) is what worked:

with data(p) as (
        select 1 p from dual
        union all
        select p + 1 from data where p < 100
),
data1 as (
        select /*+ inline */ * from data
)
select  p
from    (
        select * from data1 where rownum <= 10
        union all
        select * from data1 where rownum <= 10
        )
;

-----------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |      |     4 |    52 |     8   (0)| 00:00:01 |
|   1 |  VIEW                                        |      |     4 |    52 |     8   (0)| 00:00:01 |
|   2 |   UNION-ALL                                  |      |       |       |            |          |
|*  3 |    COUNT STOPKEY                             |      |       |       |            |          |
|   4 |     VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|   6 |       FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|*  7 |       RECURSIVE WITH PUMP                    |      |       |       |            |          |
|*  8 |    COUNT STOPKEY                             |      |       |       |            |          |
|   9 |     VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|  10 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|  11 |       FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|* 12 |       RECURSIVE WITH PUMP                    |      |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(ROWNUM<=10)
   7 - filter("P"<100)
   8 - filter(ROWNUM<=10)
  12 - filter("P"<100)

All I’ve done is create a second CTE that selects from the first CTE. This is now a simple select, so I can attach a perfectly placed hint to it, in the hope that this would effectively require the dependent recursive CTE to be inlined inside it. It seems to be sufficient – the “temp table transformation” has disappeared and we now see two “UNION ALL (RECURSIVE WITH)” operations against dual.

I haven’t tested the strategy exhaustively, so I can give you no guarantee that it has to work. Unfortunately I did have another example that I applied the method to and, after several seconds of no response, it crashed with an ORA-00600 error; but that might have been a side effect of the nature of the query (it included a couple of the optional extras) rather than a specific feature of inlining.

Update May 2020

Prompted by a conversation that appeared in the comments in the last couple of days I’ve re-run the test on 12.2.0.1 and 19.3.0.0 and discovered that 19.3 (and it may apply to 18c as well) will accept the hint /*+ inline(@set$1) */ – note that that’s SET not SEL – as a directive to move the recursive subquery inline.


with data(p) as (
        select 1 p from dual
        union all
        select p + 1 from data where p < 100 
)
select  /*+ qb_name(main) inline(@set$1) */
        p
from    data 
where   rownum <= 10
union all
select  p
from    data 
where   rownum <= 10
;

This is a nicer strategy than my workaround of wrapping the subquery inside a second subquery, but the hint doesn’t work with 12.2.0.1 (unless it’s in a patch set that I haven’t installed).

There is a strategy that will work with 12.2, and that is to set the hidden parameter “_with_subquery” to ‘inline’ before running the query and set it back to (the default) ‘optimizer’ afterwards. This will force the optimizer to inline any CTEs while the parameter is set. It is also legal to set the parameter to ‘materialize’ to force materialization. However it’s not possible to set the parameter for the duration of the query through the /*+ opt_param() */ hint, nor is it possible to use the outline data to recreate the plan.
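As a sketch, the 12.2 approach looks like this – it’s a hidden parameter, so the usual caveats about using underscore parameters in production apply:

```sql
alter session set "_with_subquery" = 'inline';

-- run the recursive CTE query here

alter session set "_with_subquery" = 'optimizer';
```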

I did try generating the outline data while the parameter was set to inline and then using the resulting set of hints by itself – this didn’t work in 12c, but it did work in 19.3 provided I included the begin_outline_data / end_outline_data pair in the hint set.

22 Comments

  1. Sorry I missed the bit Jeff included about Recursive Subquery Factoring, as that was the basis of my presentation at RMOUG.
    I have not played with hints too much in RSF, other than to force a Temp Table Transformation where I know it will work better.
    What I have tried to do, so far unsuccessfully, is to get a MATERIALIZE hint to work with the older CONNECT BY syntax.

    The reason for this is to see if there is a way around the limitation with CONNECT BY where when the memory runs out, an ORA-30009 occurs.
    RSF has no such limitation, at least not in memory.
    This example will error out on my laptop when about 100M of memory has been consumed:

    with a as (
      select /*+ materialize */ level id
      from dual
      connect by level <= 5000000
    )
    select id from a
    

    This takes a couple of minutes to max out the memory on my 11.2.0.3 test database.

    Using Frits Hoogland’s profiler I can see that there are no calls to functions that write to disk.

    Monitoring v$tempseg_usage shows no usage either.

    Maybe there is another method to make this work with CONNECT BY, but I have not yet found one.

    Interestingly, if you replace ‘id’ in the main query with count(*), Oracle does a little math, calculates that there isn’t enough memory, and returns quite quickly.
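    The count(*) variant would, presumably, look like this (a sketch based on the example above):

    ```sql
    with a as (
      select level id
      from dual
      connect by level <= 5000000
    )
    select count(*) from a
    ```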

    RSF on the other hand, when it begins to run out of memory, just writes to TEMP.

    with gen (id) as (
       select 0 id from dual
       union all
       select gen.id + 1 as id
       from gen
       where id <= 5000000
    )
    select id from gen
    

    As the memory reaches its maximum, Oracle starts writing to a sort segment, as seen here.

      Oracle Memory:             14002664
    
    
             USERNAME    SID       TYPE     BYTES
      =============== ====== ========== =========
              JKSTILL   4365  LOB_INDEX  68157440
              JKSTILL   4365       SORT  24117248
    

    Perhaps more interesting is the LOB_INDEX, as RSF starts writing to this immediately.
    I have not yet tried to work out why RSF is using memory and a LOB_INDEX object right at the start.

    So RSF by default is using TEMP space as soon as it starts.

    Even for much smaller values (5000) in the comparison for the id column, TEMP usage can be observed.

    Add in the MATERIALIZE hint and it gets more interesting:

    with gen (id) as (
       select 0 id from dual
       union all
       select gen.id + 1 as id
       from gen
       where id <= 5000000
    ),
    d as (
       select /*+ materialize */ id from gen
    )
    select id from d
    
    
      Oracle Memory:             69101384
    
    
             USERNAME    SID       TYPE     BYTES
      =============== ====== ========== =========
              JKSTILL   4365       DATA  14680064
              JKSTILL   4365  LOB_INDEX  67108864
              JKSTILL   4365       SORT  24117248
    
    

    Another interesting apparent optimization of RSF is that when it completes, free() is called to release any memory used for the recursive operations.

    CONNECT BY does not do that; rather, it holds on to the memory until another operation clears it.

    Hmm, I may have strayed off topic.

    Also, the ANSI name does seem preferable.

    Comment by jkstill — February 17, 2014 @ 6:42 am GMT Feb 17,2014

    • Jared,

      Several interesting points – and not at all off-topic. It’s always useful to collate bits of information like this.

      The LOB_INDEX bit is odd – I wonder if that’s a tagging error in the structure or a reporting error in the view, because I don’t think you can see a LOB_INDEX that large without seeing a LOB SEGMENT that it points to.

      I wonder if the absence of a “free” call on the “connect by” memory allocation can be taken as a clue that that entire section of code has been frozen: maybe at some future date we’ll see an internal transformation that rewrites a connect by as a recursive CTE during semantic analysis.

      I’ll try to find some time to test your example on 11.2.0.4 and 12.1.0.1 (unless someone else does it first).

      Comment by Jonathan Lewis — February 18, 2014 @ 10:28 am GMT Feb 18,2014

      • Hi Jonathan,

        I dug into this a bit more in an attempt to find out how the TEMP space is being used.

        The LOB_INDEX value is correct when querying v$tempseg_usage.
        (Why not LOBINDEX though?)

        The RSF query was rerun with 10046 tracing enabled.
        Direct path writes were performed on objects 0 and 528 as shown

        
        [oracle@ora11203fs trace]$ grep 'direct path write temp' js01_ora_7043.trc | awk '{ print $15 }' | sort | uniq -c
             12 obj#=0
            541 obj#=528
        

        I don’t know what object 0 refers to, and have not been able to find out – there is no object 0 in dba_objects, obj$ or v$fixed_table.
        There may be a clue in ?/rdbms/admin scripts, but I have not yet looked for any info there on this.

        Object 528 however can be identified:

        
        16:27:58 ora11203fs.jks.com - sys@js01 SQL> select * from dba_objects where object_id = 528;
        
        OWNER           SYS
        OBJECT_NAME     KOTAD$
        OBJECT_ID       528
        DATA_OBJECT_ID  528
        OBJECT_TYPE     TABLE
        CREATED         01-JAN-13
        LAST_DDL_TIME   01-JAN-13
        TIMESTAMP       2013-01-01:10:12:39
        STATUS          VALID
        T G S           N N N
        NAMESPACE       1
        

        It does use a LOB:

        16:30:30 ora11203fs.jks.com - sys@js01 SQL> @findlob
        Enter value for table_name: KOTAD$
        Enter value for owner: SYS
        
        Owner  Table Name  Column Name      Segment Name               Segment Type  Segment Bytes
        -----  ----------  ---------------  -------------------------  ------------  -------------
        SYS    KOTAD$      SYS_NC_ROWINFO$  SYS_LOB0000000528C00002$$  LOBSEGMENT           65,536
        
        
        1 row selected.
        

        During the course of running the RSF query the number of rows stays constant, so an existing row is being reused.

        The data cannot be viewed directly from SQL*Plus

        
        16:33:09 ora11203fs.jks.com - sys@js01 SQL> select count(*) from sys.kotad$;
        
          COUNT(*)
        ----------
             16060
        
        
          1* select * from sys.kotad$
        16:36:52 ora11203fs.jks.com - sys@js01 SQL> /
        select * from sys.kotad$
                          *
        ERROR at line 1:
        ORA-30732: table contains no user-visible columns
        
        

        The table is created with type SYS.KOTAD, and there doesn’t seem to be any simple method to view the data.
        Exploring the type attributes results in a number of cryptically named attributes such as KOTADFLG, etc.

        Searching the data dictionary scripts resulted in the following method that can be used to create a copy of the table for exploration without fear of messing up the database.
        Note: Please don’t do this in any database that is not expendable

        
        09:41:59 SYS@js01 AS SYSDBA> alter session set events '22372 trace name context forever'
        09:42:02   2  /
        
        Session altered.
        
        Elapsed: 00:00:00.00
        09:42:02 SYS@js01 AS SYSDBA> create table kotad_temp$ of kotad;
        
        Table created.
        
        Elapsed: 00:00:00.21
        09:42:11 SYS@js01 AS SYSDBA> insert into kotad_temp$(sys_nc_oid$, sys_nc_rowinfo$) select sys_nc_oid$, sys_nc_rowinfo$ from kotad$;
        
        16060 rows created.
        
        Elapsed: 00:00:01.04
        09:42:21 SYS@js01 AS SYSDBA> commit;
        
        Commit complete.
        
        

        This is where I have run into a bit of a dead end.

        As per the oracle docs, temporary lobs such as this can only be manipulated via API or PL/SQL.
        I have worked very little with LOBs, and while I could manage creating and playing with LOBs in my own code,
        deciphering how to access the LOBs in this table to find out what is happening would consume too much time,
        or at least more than I have available to spend on it now.

        Perhaps someone else reading this will already know how to use SYS.KOTAD and DBMS_LOB to access this data.

        Comment by jkstill — February 25, 2014 @ 6:27 pm GMT Feb 25,2014

        • Jared,

          I just tried to repeat your experiment using 11.2.0.4
          My direct path writes reported obj# = 0 or obj# = -1, which I would take to mean “no object” – I think the obj# = 528 might be a red herring, a value left in parameter 3 from a previous wait perhaps.

          Comment by Jonathan Lewis — February 25, 2014 @ 8:17 pm GMT Feb 25,2014

        • Thanks Jonathan – that well could be a red herring.
          A different approach will need to be taken, just need to ponder it a bit more.

          Comment by jkstill — February 25, 2014 @ 8:23 pm GMT Feb 25,2014

        • Hi Jonathan,

          I took another look at the 10046 trace file, and obj#=528 is indeed a red herring.
          The obj# is from a previous wait.

          WAIT #139894016593672: nam='SQL*Net message from client' ela= 345 driver id=1650815232 #bytes=1 p3=0 obj#=528 tim=1393288033248613
          WAIT #139894016593672: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=528 tim=1393288033248657
          WAIT #139894016593672: nam='direct path write temp' ela= 3079 file number=201 first dba=167296 block cnt=31 obj#=528 tim=1393288033278367
          WAIT #139894016593672: nam='direct path write temp' ela= 2196 file number=201 first dba=167327 block cnt=31 obj#=528 tim=1393288033332418
          WAIT #139894016593672: nam='direct path write temp' ela= 3334 file number=201 first dba=167358 block cnt=31 obj#=528 tim=1393288033372929
          

          The waits on ‘direct path write temp’ are just sort block writes to temp.

          The file number is a bit of a mystery, as there is no file# 201.

          Using strace I can see that the TEMP file is opened, and the handle duplicated to 258

          open("/u01/oradata/JS01/datafile/o1_mf_temp_8g69rnkr_.tmp", O_RDWR|O_DSYNC) = 11
          getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
          fcntl(11, F_DUPFD, 256)                 = 258
          close(11)                               = 0
          fcntl(258, F_SETFD, FD_CLOEXEC)         = 0
          

          A quick google search shows that ‘file number=201’ is fairly common for direct path writes, so perhaps this is hardcoded.

          Comment by jkstill — February 27, 2014 @ 4:37 pm GMT Feb 27,2014

        • Jared,

          The 201 file number is the “default” for most systems. For tempfiles the file number reported is the file# from v$tempfile plus the value of the db_files parameter, and the default for that parameter is 200 … hence 201.
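          So the number reported in the wait can be reconstructed from the data dictionary – a sketch that returns one row per tempfile:

          ```sql
          select  f.file# + to_number(p.value)  file_number_in_waits
          from    v$tempfile  f,
                  v$parameter p
          where   p.name = 'db_files'
          ;
          ```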

          Comment by Jonathan Lewis — February 27, 2014 @ 7:36 pm GMT Feb 27,2014

  2. Thanks Jonathan – I thought the 201 must be a default value, but did not know how it was derived.

    Comment by jkstill — February 27, 2014 @ 11:19 pm GMT Feb 27,2014

    • Back to the question about RSF creating LOBINDEX segments:

      V$TEMPSEG_USAGE is a synonym for V_$SORT_USAGE, which is in turn a view based on GV_$SORT_USAGE.

      GV_$SORT_USAGE is a view based on V$SESSION and X$KTSSO.

      The space management bits are in the X$KTSSO table (KTS = kernel space management – ‘SO’ may be ‘SOrt’)

      The SEGTYPE column is the one that has identified the RSF query as generating LOB_INDEX segments:

       
      decode(ktssosegt, 1, 'SORT', 2, 'HASH', 3, 'DATA', 4, 'INDEX', 5, 'LOB_DATA', 6, 'LOB_INDEX' , 'UNDEFINED') 
      

      If identifying segments as LOB_INDEX is incorrect, then it would appear to be an error in the Oracle Kernel.

      Here is the data from one RSF query that spilled over into TEMP:

      
      16:36:41 ora11203fs.jks.com - sys@js01 SQL> select * from X$KTSSO;
      
      ADDR                   INDX    INST_ID KTSSOSES           KTSSOSNO KTSSOTSN     KTSSOCNT  KTSSOSEGT   KTSSOFNO   KTSSOBNO  KTSSOEXTS  KTSSOBLKS  KTSSORFNO  KTSSOOBJD  KTSSOOBJN KTSSOTSNUM KTSSOSQLID
      ---------------- ---------- ---------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------
      00007F0947860360          0          1 00000000917B3C00       3047 TEMP                1          6        201     164480         23       2944          1    4358784          0          3 bd6rf4gs1d3uk
      00007F0947860360          1          1 00000000917B3C00       3047 TEMP                1          1        201     126976          1        128          1    4321280          0          3 bd6rf4gs1d3uk
      00007F0947860360          2          1 00000000917B3C00       3047 TEMP                1          6        201     124416          2        256          1    4318720          0          3 bd6rf4gs1d3uk
      
      3 rows selected.
      
      

      As the X$KTSSO table contains the address of the values, one might start poking around in memory to explore this.

      At this point I am not sure it is worth pursuing further.

      Here is the full mapping of the columns in GV$SORT_USAGE for anyone interested:

       
      gv$sort_usage  : x$ktsso
      ==============================
      N/A            : ADDR
      INST_ID        : inst_id
      USERNAME       : v$session.username
      USER           : v$session.username
      SESSION_ADDR   : ktssoses
      SESSION_NUM    : ktssosno
      SQLADDR        : v$session.prev_sql_addr
      SQLHASH        : v$session.prev_hash_value
      SQL_ID         : v$session.prev_sql_id
      TABLESPACE     : ktssotsn
      CONTENTS       : decode(ktssocnt, 0, 'PERMANENT', 1, 'TEMPORARY')
      SEGTYPE        : decode(ktssosegt, 1, 'SORT', 2, 'HASH', 3, 'DATA', 4, 'INDEX', 5, 'LOB_DATA', 6, 'LOB_INDEX', 'UNDEFINED')
      SEGFILE#       : ktssofno
      SEGBLK#        : ktssobno
      EXTENTS        : ktssoexts
      BLOCKS         : ktssoblks
      SEGRFNO#       : ktssorfno
      

      You may have noticed the SQL_ID information is incorrect.

      The following Oracle Support document shows that the correct SQL_ID value can be found in the KTSSOSQLID column:

      Bug 17834663 – Include SQL ID for statement that created a temporary segment in GV$SORT_USAGE (Doc ID 17834663.8)

      Comment by jkstill — March 2, 2014 @ 12:55 am GMT Mar 2,2014

  3. […] common table expressions are still quite new to Oracle, hints can be ignored, as discussed here, which of course is of considerable interest to database developers who want to tune […]

    Pingback by How to Multiply Across a Hierarchy in Oracle: Performance (Part 2/2) | Databaseline — October 12, 2014 @ 8:15 am BST Oct 12,2014

  4. […] For more information on Recursive With clause(recursive subquery factoring), please check. […]

    Pingback by how to replace multiple replace functions? | Oracle Insight and Performance concepts — December 7, 2014 @ 6:32 am GMT Dec 7,2014

  5. Hi Jonathan

    Always a pleasure reading thru your blogs. Thanks in advance for reading thru this long post.

    I’m looking at a performance issue with this recursive WITH query.

    CREATE TABLE TARGET
    AS
    SELECT  *
    FROM    (
            WITH t_cte (hop, ID, ….  LEVEL_FIELD) AS (
                    SELECT 
                            1 AS hop,
                            stage.ID,
                            ….
                            stage.LEVEL_FIELD
                    FROM 
                            T1 stage
                    WHERE 
                            C1 || '.' || stage.C2 = 'XYZ.BHCK1765'
                    AND     stage.C3 = 'BHCK1765'
                    AND     stage.C4 = 'GCR_POSTING_AMT_GC_USD'
                    UNION ALL
                    SELECT
                            hop + 1,
                            stage.ID,
                            stage.PROC_DATA_SET_PHY_ID,
                            stage.C1,
                            stage.PARNT_PROC_LGCL_NAME,
                            stage.LEVEL_FIELD
                    FROM 
                            T1 stage
                    INNER JOIN 
                            t_cte lc
                    ON
                            lc.C5 = stage.C4
                    AND     lc.C6 = stage.C3
                    AND     lc.C7 = stage.C1 || '.' || stage.C2
            )
            SELECT 
                    *
            FROM 
                    t_cte
            WHERE 
                    PROC_DATA_SET_PHY_ID IS NOT NULL
            )
    ;
    
    
    
    Global Stats
    =====================================================================================================
    | Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read  | Read  | Write | Write |    Offload     |
    | Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes | Reqs  | Bytes | Returned Bytes |
    =====================================================================================================
    |   36059 |   35817 |       55 |      187 |    23G | 73001 |  36GB | 15797 |  10GB |           10GB |
    =====================================================================================================
    
    SQL Plan Monitoring Details (Plan Hash Value=2863067891)
    ==================================================================================================================================================================================================================================
    | Id   |                   Operation                   |       Name       |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Mem  | Temp  | Activity |         Activity Detail          |
    |      |                                               |                  | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |       |       |   (%)    |           (# samples)            |
    ==================================================================================================================================================================================================================================
    |    0 | CREATE TABLE STATEMENT                        |                  |         |      |      7815 |     +4 |     1 |        0 |       |       |       |       |     . |     . |          |                                  |
    |    1 |   LOAD AS SELECT                              | TARGET_TABLE1234 |         |      |      7815 |     +4 |     1 |        0 |       |       |  1325 |   1GB |   2MB |     . |     0.02 | log file switch completion (3)   |
    |      |                                               |                  |         |      |           |        |       |          |       |       |       |       |       |       |          | Cpu (2)                          |
    |      |                                               |                  |         |      |           |        |       |          |       |       |       |       |       |       |          | ASM IO for non-blocking poll (2) |
    |    2 |    OPTIMIZER STATISTICS GATHERING             |                  |       2 | 3205 |      7815 |     +4 |     1 |       3M |       |       |       |       |   5MB |     . |     0.02 | Cpu (6)                          |
    |    3 |     VIEW                                      |                  |       2 | 3205 |     36056 |     +4 |     1 |       3M |       |       |       |       |     . |     . |     0.07 | Cpu (26)                         |
    | -> 4 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST |                  |         |      |     36057 |     +4 |     1 |      97M |       |       | 13688 |   8GB |  17MB | 343MB |    99.16 | Cpu (35724)                      |
    |    5 |       TABLE ACCESS STORAGE FULL               | T1               |       1 |  347 |         1 |     +4 |     1 |        2 |       |       |       |       |     . |     . |          |                                  |
    |    6 |       HASH JOIN                               |                  |       1 | 2856 |     36056 |     +4 |    98 |      97M |   267 |  75MB |   267 |  75MB | 378MB |     . |     0.37 | Cpu (132)                        |
    |      |                                               |                  |         |      |           |        |       |          |       |       |       |       |       |       |          | direct path read temp (1)        |
    |    7 |        RECURSIVE WITH PUMP                    |                  |         |      |     35383 |     +4 |    98 |      96M | 16442 |   8GB |       |       |     . |     . |     0.06 | Cpu (10)                         |
    |      |                                               |                  |         |      |           |        |       |          |       |       |       |       |       |       |          | direct path read temp (13)       |
    |    8 |        BUFFER SORT (REUSE)                    |                  |         |      |     36056 |     +0 |    98 |      98M | 56236 |  27GB |   517 | 288MB |   2MB | 288MB |     0.29 | Cpu (59)                         |
    |      |                                               |                  |         |      |           |        |       |          |       |       |       |       |       |       |          | direct path read temp (46)       |
    |    9 |         TABLE ACCESS STORAGE FULL             | T1               |      1M |  355 |         4 |     +1 |     1 |       1M |       |       |       |       |     . |     . |     0.00 | Cpu (1)                          |
    ==================================================================================================================================================================================================================================
    
    
    

    I have a couple of questions here.

    1. This SQL never completes. The underlying table T1 has one million records, yet A-Rows at operations 7 and 6 is 96M. How is that possible when the total number of records in the table is 1M? Even if I had 1M levels, the total number of records fetched would still be 1M. And the anchor query, the first part of the union block, returns just 2 rows.
    Am I missing something here? At the time this SQL Monitor report was taken, I could see that Oracle had processed 98 levels and was still going. Do you think that Oracle is processing more records than it should?

    2. Even after creating indexes and using hints, I’m unable to force nested loop joins.


    Best Regards
    Vishal

    Comment by Vishal Beri — May 13, 2020 @ 5:37 pm BST May 13,2020 | Reply

    • Vishal,

      Thanks for the question – the topic is a large one and recursion is always difficult to think through in the abstract, so I don’t really have the space or time to give you a completely detailed answer. (And I’d need to know your data to be sure of giving you an exact answer.)

      The key points, though, are these:
      a) You’re doing a self join – which means you can easily make the output larger than the input
      b) Recursion means you’re repeating the self join – and you hope that each time you eliminate more data on the join so that the recursive descent ends

      In your case I’d work through a couple of steps by hand to try to clarify what’s going on. Run the query that returns the first two rows; run a query that joins those two rows back to the table (and saves the result in another table); then run the query that joins that temporary result table back to the base table – and repeat a couple more times to see what’s actually happening.

      Bear in mind that your predicate “PROC_DATA_SET_PHY_ID IS NOT NULL” is not applied until the VIEW operation – which is why you drop from 96M rows to 3M rows at that point.

      As a little example of the impact of self joins on data volume, consider:

      SQL> create table t1 as select * from all_objects where rownum <= 100;
      
      Table created.
      
      SQL> insert into t1 select * from t1;
      
      100 rows created.
      
      SQL> commit;
      
      Commit complete.
      
      SQL> select count(*) from t1 a, t1 b where a.object_id = b.object_id;
      
        COUNT(*)
      ----------
             400
      
      1 row selected.
      
      
      

      I’ve got a table with 200 rows in it, but because the join doesn’t have a uniqueness constraint (at either end) the volume of data coming out of the join is larger than the volume of data in the table.
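      The arithmetic behind that 400 can be sketched outside the database – this is a hypothetical Python model of the join cardinality, not anything Oracle runs:

```python
# A hypothetical model of the example above: 100 distinct object_ids,
# each duplicated once, so the table holds 200 rows.
from collections import Counter

counts = Counter(list(range(100)) * 2)   # every id appears twice

# An equi-join on a non-unique column returns n * m rows per value,
# where n and m are that value's duplicate counts on each side.
join_rows = sum(n * n for n in counts.values())
print(join_rows)   # 400 -- twice the size of the 200-row table
```

      With duplicate counts above two the effect grows quadratically: triplicate every id and the same join returns 900 rows from a 300-row table.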

      Regarding the second question: the first step is to look at the predicates as they are applied in the plan, the “alias” information, and the “outline” information. If it is possible to use an index you may have to define it very carefully and make sure you direct the hint at the correct query block.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — May 13, 2020 @ 9:01 pm BST May 13,2020 | Reply

      • Thanks Jonathan for taking a look. Appreciate your time.
        Well I did the recursion manually.

        1. Ran the first part of the union block, stored it in a table T2
        2. Joined T2 back to the original Table T1 – Got two rows again. Stored this in T3
        3. Joined T3 back to T1 – Again got two rows.

        Repeated it 5 times. Every time I just got 2 records.

        I understand that looking at the data is really important here. But consider two cases:

        1. Suppose I have 1 million levels – in that case the recursive with pump should not exceed 1M rows.
        2. Suppose I have just one level – then I fetch all 1M records and there are no more levels to probe.

        I also ran some queries on the basic emp table with the emp–mgr recursion. I tried all levels of skewness, but the recursive with pump never fetched more records than the number of rows in the emp table.

        Is there something else you would like me to test?


        Best Regards
        Vishal

        Comment by Vishal Beri — May 13, 2020 @ 9:46 pm BST May 13,2020 | Reply

        • I do not know your data, and I see nothing in your query that would cause it to end – though Oracle might run out of memory, or hit some limit on recursion that causes it to crash. What do you know about the data that makes you sure the query has a stopping point? (You’ve not shown the entire select list – in particular the derivation of c5, c6, c7 – so maybe there’s something there that makes it obvious.)

          Just to check a detail – you do realise that operation 8 reports 98M rows because it has averaged 1M rows on each of 98 starts, and operation 7 reports 96M rows because it has averaged a little under 1M rows on each of 98 starts. Operation 7 may be very deceptive because it looks as if it’s 1M rows per start, but it’s possible that it was a very small number of rows for the first few iterations and then started to grow more quickly. Your point that you have 827,592 distinct combinations for (c5,c6,c7) actually means you have to account for duplication across 172,408 rows.

          Since I don’t know anything about your data and how you derive the columns that go into the join, for all I know it’s possible that you have some arrangement with sets of values appearing 55,000 times each. If at some level you change from 2 rows to the first high duplicate count you get 55,000 rows; if they each point to one of the other duplicates then the next level could give you 3 billion rows. Clearly it’s not that extreme, but a small number of levels with a small number of repetitions can quickly take you to very large numbers of rows.
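          To put rough numbers on that escalation, here is a hypothetical Python sketch – the fanout figures are purely illustrative, not taken from your real duplicate counts:

```python
# Hypothetical sketch: rows produced per recursion level when each row
# at one level matches 'fanout' rows at the next level.
def level_sizes(start_rows, fanouts):
    sizes = [start_rows]
    for f in fanouts:
        sizes.append(sizes[-1] * f)   # every current row joins to f rows
    return sizes

# Two anchor rows hitting a value duplicated 55,000 times, whose
# duplicates then point at each other (illustrative fanouts):
print(level_sizes(2, [27_500, 55_000]))   # [2, 55000, 3025000000]
```

          Two levels of high duplication are enough to take 2 rows past 3 billion; even modest fanouts compound geometrically in the same way.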

          I was hoping you’d see some fairly rapid escalation in a couple of steps, but since you haven’t, the thing I’d do next is run the query again with the /*+ monitor */ hint and keep repeating a query for the SQL Monitor output every couple of seconds, to see if the number of Rows (Actual) increases slowly for a few iterations before ramping up to something more like 1M rows per iteration.

          Comment by Jonathan Lewis — May 13, 2020 @ 11:33 pm BST May 13,2020

      • And another bit.

        Just to rule out the possibility that the self join is exploding the data set because of non-uniqueness, I ran this:

        select count(*) from
        (select distinct C5, C6, C7 from T1);

        — And this count is 827,592 (very close to 1M).

        So most of the values are indeed unique, and the self join should not fetch a cartesian-join-like number of rows.

        Comment by Vishal Beri — May 13, 2020 @ 10:01 pm BST May 13,2020 | Reply

    • Vishal,

      I’ve had a little play with the SCOTT emp/dept schema to see what could be done, first using the default tables and indexes with the following query:

      
      with org_chart (
              empno, ename, mgr
      ) as (
              select  empno, ename, mgr
              from    emp
              where   mgr is null
              union  all
              select  e.empno, e.ename, e.mgr
              from    org_chart oc
              join    emp e
              on      e.mgr = oc.empno
      )
      select 
              * 
      from    org_chart
      where   mgr is not null
      /
      
      

      The plan I got (after adding a swap_join_inputs() hint for the hash join, to model your plan exactly) was:

      
      ---------------------------------------------------------------------------------------------------------------------------------------
      | Id  | Operation                                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
      ---------------------------------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                          |      |      1 |        |     13 |00:00:00.01 |      12 |       |       |          |
      |*  1 |  VIEW                                     |      |      1 |    203 |     13 |00:00:00.01 |      12 |       |       |          |
      |   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |      1 |        |     14 |00:00:00.01 |      12 |  2048 |  2048 | 2048  (0)|
      |*  3 |    TABLE ACCESS FULL                      | EMP  |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
      |*  4 |    HASH JOIN                              |      |      4 |    202 |     13 |00:00:00.01 |       6 |  1856K|  1856K| 1442K (0)|
      |   5 |     RECURSIVE WITH PUMP                   |      |      4 |        |     14 |00:00:00.01 |       0 |       |       |          |
      |   6 |     BUFFER SORT (REUSE)                   |      |      4 |        |     52 |00:00:00.01 |       6 | 73728 | 73728 |          |
      |*  7 |      TABLE ACCESS FULL                    | EMP  |      1 |     13 |     13 |00:00:00.01 |       6 |       |       |          |
      ---------------------------------------------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
          1 - filter("MGR" IS NOT NULL)
         3 - filter("MGR" IS NULL)
         4 - access("E"."MGR"="OC"."EMPNO")
         7 - filter("E"."MGR" IS NOT NULL)
      

      Now, when I hinted it to use a nested loop at operation 4 with an indexed access path into EMP at operation 7 – and in the best case I created an index on (mgr) first – I got the following:

      
      ----------------------------------------------------------------------------------------------------------------
      | Id  | Operation                                 | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
      ----------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                          |          |      1 |        |     13 |00:00:00.01 |      16 |
      |*  1 |  VIEW                                     |          |      1 |    203 |     13 |00:00:00.01 |      16 |
      |   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|          |      1 |        |     14 |00:00:00.01 |      16 |
      |*  3 |    TABLE ACCESS FULL                      | EMP      |      1 |      1 |      1 |00:00:00.01 |       6 |
      |   4 |    NESTED LOOPS                           |          |      4 |    202 |     13 |00:00:00.01 |      10 |
      |   5 |     RECURSIVE WITH PUMP                   |          |      4 |        |     14 |00:00:00.01 |       0 |
      |   6 |     TABLE ACCESS BY INDEX ROWID           | EMP      |     14 |      2 |     13 |00:00:00.01 |      10 |
      |*  7 |      INDEX RANGE SCAN                     | EMP_IDX1 |     14 |      2 |     13 |00:00:00.01 |       6 |
      ----------------------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
         1 - filter("MGR" IS NOT NULL)
         3 - filter("MGR" IS NULL)
         7 - access("E"."MGR"="OC"."EMPNO")
             filter("E"."MGR" IS NOT NULL)
      

      Critically, the index range scan starts 14 times – which is probably a measure of how many nodes there are in the tree before you get to the final leaf values, rather than the number of rows in the table.

      My choice of index definition was dictated by the predicates in operation 4 (hash join) and 7 (table access full) in the previous plan. So it is possible to bypass the hash join and buffer sort for this data. If you want to post the predicate and outline sections for your query I’ll work out the necessary hint set for you.

      For reference, I took the whole hint set from my first plan, which included the following lines:

      
            FULL(@"SEL$F1D6E378" "E"@"SEL$1")
            FULL(@"SEL$F1D6E378" "OC"@"SEL$1")
            LEADING(@"SEL$F1D6E378" "E"@"SEL$1" "OC"@"SEL$1")
            USE_HASH(@"SEL$F1D6E378" "OC"@"SEL$1")
            SWAP_JOIN_INPUTS(@"SEL$F1D6E378" "OC"@"SEL$1")
      
      

      (the swap_join_inputs() was my manual intervention – you may not see it) and replaced them with

      
            FULL(@"SEL$F1D6E378" "OC"@"SEL$1")
            INDEX_RS_ASC(@"SEL$F1D6E378" "E"@"SEL$1" ("EMP"."MGR"))
            LEADING(@"SEL$F1D6E378" "OC"@"SEL$1" "E"@"SEL$1")
            USE_NL(@"SEL$F1D6E378" "E"@"SEL$1")
      
      

      Then I copied the whole hint set into the query.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — May 14, 2020 @ 10:05 am BST May 14,2020 | Reply

      • Thanks Jonathan. I really appreciate your time.
        Based on your suggestion, I ran the SQL Monitor report every few seconds to see how quickly things escalate.
        For the first few iterations I saw a jump of 2 records each time.
        Then at the 20th iteration the number went up to 5,181.
        And then, around the 50th execution, it was 1M every time.

        You were spot on – it is indeed a data issue. I constructed this little test case:

        
        CREATE TABLE emp (empno NUMBER, mgrno NUMBER);
        
        BEGIN
            INSERT INTO emp
                 VALUES (1, NULL);
        
            INSERT INTO emp
                 VALUES (2, 1);
        
            FOR i IN 1 .. 100
            LOOP
                INSERT INTO emp
                     VALUES (3, 2);
        
                INSERT INTO emp
                     VALUES (4, 3);
        
                INSERT INTO emp
                     VALUES (5, 4);
            END LOOP;
        
            COMMIT;
        END;
        
        WITH e (xlevel, empno, mgrno)
              AS (
                SELECT 1, empno, mgrno
                FROM emp
                WHERE mgrno IS NULL
                UNION ALL
                SELECT mgr.xlevel + 1, emp.empno, emp.mgrno
                FROM emp, e mgr
                WHERE emp.mgrno = mgr.empno
            )
           SELECT /*+ MONITOR */ *
           FROM e;
        
        
        

        Just 302 records in the underlying table – but with only a few repetitions and a few levels, those 302 records quickly escalated to 1M rows.
        With 50K+ duplicates, this query would never complete.
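        The arithmetic of that escalation can be checked with a small, hypothetical Python model of the breadth-first expansion of this test data:

```python
# Hypothetical model of the test data: (1, NULL), (2, 1), plus
# 100 copies each of (3, 2), (4, 3), (5, 4) -- 302 rows in total.
from collections import Counter

rows = [(1, None), (2, 1)] + [(3, 2), (4, 3), (5, 4)] * 100

# children[m] = child empnos for manager m, duplicates included
children = {}
for e, m in rows:
    children.setdefault(m, []).append(e)

# Breadth-first recursion: track row counts per empno value at each level.
level = Counter(e for e, m in rows if m is None)   # anchor member: 1 row
total = sum(level.values())
while level:
    nxt = Counter()
    for empno, n in level.items():
        for child in children.get(empno, []):
            nxt[child] += n        # every parent row joins to every child row
    level = nxt
    total += sum(level.values())

print(total)   # 1,010,102 -- the final level alone is 100**3 = 1M rows
```

        The level sizes run 1, 1, 100, 10,000, 1,000,000 – three levels of 100 duplicates each multiply straight up to a million rows.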


        Best Regards
        Vishal

        Comment by Vishal Beri — May 14, 2020 @ 3:28 pm BST May 14,2020 | Reply

  6. […] A brief note on hinting for recursive subquery factoring […]

    Pingback by CTE Catalogue | Oracle Scratchpad — June 10, 2020 @ 6:46 pm BST Jun 10,2020 | Reply

  7. […] a notable change in the way the optimizer does cost and cardinality calculations for recursive subquery factoring that may make some of your execution plans change – with a massive impact on performance […]

    Pingback by Recursive WITH upgrade | Oracle Scratchpad — July 10, 2020 @ 4:19 pm BST Jul 10,2020 | Reply

  8. […] It turns out that I discovered this enhancement a few months ago while doing some experimentation with recursive subquery factoring. […]

    Pingback by Inline Hint | Oracle Scratchpad — October 12, 2020 @ 11:16 am BST Oct 12,2020 | Reply

  9. […] turn a pair of dates into a date range – which can be done in many ways but the OP had used a recursive “with subquery” (CTE/common table […]

    Pingback by SQL Macro | Oracle Scratchpad — July 22, 2021 @ 6:21 pm BST Jul 22,2021 | Reply

