Oracle Scratchpad

February 16, 2014

Recursive subquery factoring

Filed under: Hints,Ignoring Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:11 pm GMT Feb 16,2014

This is possibly my longest title to date – I try to keep them short enough to fit the right hand column of the blog without wrapping – but I couldn’t think of a good way to shorten it (Personally I prefer to use the expression CTE – common table expression – over “factored subquery” or “subquery factoring” or “with subquery”, and that would have achieved my goal, but might not have meant anything to most people.)

If you haven’t come across them before, recursive CTEs appeared in 11.2, are in the ANSI standard, and are (probably) viewed by Oracle as the strategic replacement for “connect by” queries. Here, to get things started, is a simple (and silly) example:


with data(p) as (
	select 1 p from dual
	union all
	select p + 1 from data where p < 100
)
select	p
from	data
where	rownum <= 10
;

         P
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 37253879

---------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |     2 |    26 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                             |      |       |       |            |          |
|   2 |   VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|   4 |     FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|*  5 |     RECURSIVE WITH PUMP                    |      |       |       |            |          |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   5 - filter("P"<100)

A recursive CTE has three features that identify it. First, the query alias (“data” in this case) is followed by a list of column aliases; secondly the query includes a UNION ALL; and thirdly the second subquery in the UNION ALL references the query alias – i.e. it’s the recursive bit. There are other optional bits but I’m not planning to go into those – all I want to talk about is how to control the materialization (or not) of a recursive CTE through hinting.

The reason I wrote this note was because Jeff Jacobs, in his presentation on “Performance Anti-patterns” at RMOUG last week, raised the question of whether or not the /*+ materialize */ and /*+ inline */ hints worked with recursive CTEs and gave an example of a UNION ALL query where the CTE always materialized, no matter how you applied the /*+ inline */ hint. The CTE seemed to be following the basic guideline for CTEs – if you use it once in the main query it goes inline, if you use it more than once it will (almost invariably) materialize.

I’m always interested in examples where “the hint is ignored”, so I exchanged a couple of email messages with Jeff and he sent me an example (which I’ve simplified for this blog) of a query that demonstrated the issue; and I spent a little while thinking about it and decided that it simply wasn’t possible to hint the code the way we wanted to and it was just one of those cases where it takes a bit of time for new features to catch up and fit in to the standard framework. Here’s a simplified version of the query, with its execution plan:

with data(p) as (
	select 1 p from dual
	union all
	select p + 1 from data where p < 100
)
select	p
from	data
where	rownum <= 10
union all
select	p
from	data
where	rownum <= 10
;

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                            |     4 |    52 |     4   (0)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION                 |                            |       |       |            |          |
|   2 |   LOAD AS SELECT                           | SYS_TEMP_0FD9D6608_7391CD7 |       |       |            |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|                            |       |       |            |          |
|   4 |     FAST DUAL                              |                            |     1 |       |     2   (0)| 00:00:01 |
|*  5 |     RECURSIVE WITH PUMP                    |                            |       |       |            |          |
|   6 |   UNION-ALL                                |                            |       |       |            |          |
|*  7 |    COUNT STOPKEY                           |                            |       |       |            |          |
|   8 |     VIEW                                   |                            |     2 |    26 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6608_7391CD7 |     2 |    12 |     2   (0)| 00:00:01 |
|* 10 |    COUNT STOPKEY                           |                            |       |       |            |          |
|  11 |     VIEW                                   |                            |     2 |    26 |     2   (0)| 00:00:01 |
|  12 |      TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6608_7391CD7 |     2 |    12 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("P"<100)
   7 - filter(ROWNUM<=10)
  10 - filter(ROWNUM<=10)

The following morning I woke up with one of those “overnight insights” where you seem to have worked out the answer in your sleep. To make a hint work you have to put it in the right query block, or you have to name the right query block in the main query block: in this case the right query block doesn’t exist in the text, and it’s not possible to figure out what the name of the right query block would be if it came into existence.

If you try putting the /*+ inline */ hint into the query after the select at line 2 above, you’ve put the hint into the first query block of a union all, NOT into the query block of the recursvie CTE.

Having identified the problem, the solution (or at least, a possible solution) was obvious – create the query block you need. This (with its execution plan from 11.2.0.4) is what worked:

with data(p) as (
	select 1 p from dual
	union all
	select p + 1 from data where p < 100
),
data1 as (
	select /*+ inline */ * from data
)
select	p
from	(
	select * from data1 where rownum <= 10
	union all
	select * from data1 where rownum <= 10
	)
;

-----------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |      |     4 |    52 |     8   (0)| 00:00:01 |
|   1 |  VIEW                                        |      |     4 |    52 |     8   (0)| 00:00:01 |
|   2 |   UNION-ALL                                  |      |       |       |            |          |
|*  3 |    COUNT STOPKEY                             |      |       |       |            |          |
|   4 |     VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|   6 |       FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|*  7 |       RECURSIVE WITH PUMP                    |      |       |       |            |          |
|*  8 |    COUNT STOPKEY                             |      |       |       |            |          |
|   9 |     VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|  10 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|  11 |       FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|* 12 |       RECURSIVE WITH PUMP                    |      |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(ROWNUM<=10)
   7 - filter("P"<100)
   8 - filter(ROWNUM<=10)
  12 - filter("P"<100)

All I’ve done is create a second CTE that selects from the first CTE. This is now a simple select, so I can add a perfectly placed hint to it – in the hope that this would effectively require the dependent recursive CTE to be inlined inside it. It seems to be sufficient.

I haven’t tested the strategy exhaustively – so I can give you no guarantee that this has to work – unfortunately I did have another example that I applied the method to, and after several seconds of no response it crashed with an ORA-00600 error :( but that might have been a side effect of the nature of query (it included a couple of the optional extras) rather than a specific feature of inlining.)

[Further reading on “ignoring hints”]

[Further reading on “subquery factoring”]

11 Comments »

  1. Sorry I missed Jeff’s included bit about Recursive Subquery Factoring, as that was the basis of my presentation at RMOUG.
    I have not played with hints too much in RSF, other than to force a Temp Table Transformation where I know it will work better.
    What I have tried to do, so far unsuccessfully so far, is to get a MATERIALIZE hint to work with the older CONNECT BY syntax.

    The reason for this is to see if there is a way around the limitation with CONNECT BY where when the memory runs out, an ORA-30009 occurs.
    RSF has no such limitation, at least not in memory.
    This example will error out on my laptop when about 100M of memory has been consumed:

    with a as (
      select level /*+ materialize */ id
      from dual
      connect by level <= 5000000
    )
    select id from a
    

    This takes a couple minutes to max out the memory on my 11.2.0.3 test database.

    Using Frits Hoogland’s profiler I can see that there are no calls to functions that write to disk.

    Monitoring v$temp_usage shows no usage either.

    Maybe there is another method to make this work with CONNECT BY, but I have not yet found one.

    Interestingly, if you replace ‘id’ in the main query with count(*), oracle does a little math and calculates there isn’t enough memory and returns quite quickly.

    RSF on the other hand, when it begins to run out of memory, just writes to TEMP.

    with gen (id) as (
       select 0 id from dual
       union all
       select gen.id + 1 as id
       from gen
       where id <= 5000000
    )
    select id from gen
    

    As the memory reaches max, oracle starts writing to a sort segment as seen here.

      Oracle Memory:             14002664
    
    
             USERNAME    SID       TYPE     BYTES
      =============== ====== ========== =========
              JKSTILL   4365  LOB_INDEX  68157440
              JKSTILL   4365       SORT  24117248
    

    Perhaps more interesting is the LOB_INDEX, as RSF starts writing to this immediately.
    I have not yet tried to work out why RSF is using memory and a LOB_INDEX object right at the start.

    So RSF by default is using TEMP space as soon as it starts.

    Even for much smaller values (5000) in the comparison for the id column, TEMP usage can be observed.

    Add in the MATERIALIZE hint and it gets more interesting:

    with gen (id) as (
       select 0 id from dual
       union all
       select gen.id + 1 as id
       from gen
       where id <= 5000000
    ),
    d as (
       select /*+ materialize */ id from gen
    )
    select id from d
    
    
      Oracle Memory:             69101384
    
    
             USERNAME    SID       TYPE     BYTES
      =============== ====== ========== =========
              JKSTILL   4365       DATA  14680064
              JKSTILL   4365  LOB_INDEX  67108864
              JKSTILL   4365       SORT  24117248
    
    

    Another interesting apparent optimization of RSF is that when it completes, free() is called to release any memory used for the recursive operations.

    CONNECT BY does not do that, rather it holds on to the memory until another operations clears it.

    Hmm, I may have strayed off topic.

    Also, the ANSI name does seem preferable.

    Comment by jkstill — February 17, 2014 @ 6:42 am GMT Feb 17,2014 | Reply

    • Jared,

      Several interesting points – and not at all off-topic. it’s always useful to collate bits of information like this.

      The LOB_INDEX bit is odd – I wonder if that’s a tagging error in the structure or a reporting error in the view, because I don’t think you can see a LOB_INDEX that large without seeing a LOB SEGMENT that it points to.

      I wonder if the absence of a “free” call on the “connect by” memory allocation can be taken as a clue that that entire section of code has been frozen: maybe at some future date we’ll see an internal transformation that rewrites a connect by as a recursive CTE during semantic analysis.

      I’ll try to find some time to test your example on 11.2.0.4 and 12.1.0.1 (unless someone else does it first).

      Comment by Jonathan Lewis — February 18, 2014 @ 10:28 am GMT Feb 18,2014 | Reply

      • Hi Jonathan,

        I dug into this a bit more in an attempt to find out how the TEMP space is being used.

        The LOB_INDEX value is correct when querying v$temp_usage.
        (Why not LOBINDEX though?)

        The RSF query was rerun with 10046 tracing enabled.
        Direct path writes were performed on objects 0 and 528 as shown

        
        [oracle@ora11203fs trace]$ grep 'direct path write temp' js01_ora_7043.trc | awk '{ print $15 }' | sort | uniq -c
             12 obj#=0
            541 obj#=528
        

        I don’t know what object 0 refers to, and have not been able to find out – there is no object 0 in dba_object, obj$ or v$fixed_table
        There may be a clue in ?/rdbms/admin scripts, but I have not yet looked for any info there on this.

        Object 528 however can be identified:

        
        16:27:58 ora11203fs.jks.com - sys@js01 SQL> select * from dba_objects where object_id = 528;
        
        OWNER	     OBJECT NAME		    SUBOBJECT_NAME		    OBJECT_ID DATA_OBJECT_ID
        ------------ ------------------------------ ------------------------------ ---------- --------------
        			      LAST DDL
        OBJECT_TYPE	    CREATED   TIME	TIME STAMP	    STATUS  T G S  NAMESPACE
        ------------------- --------- --------- ------------------- ------- - - - ----------
        EDITION_NAME
        ------------------------------
        SYS	     KOTAD$								  528		 528
        TABLE		    01-JAN-13 01-JAN-13 2013-01-01:10:12:39 VALID   N N N	   1
        

        It does use a LOB:

        16:30:30 ora11203fs.jks.com - sys@js01 SQL> @findlob
        Enter value for table_name: KOTAD$
        Enter value for owner: SYS
        
              Table Column				  Segment	      Segment
        Owner Name  Name       Segment Name		  Type			Bytes
        ----- ----- ---------- -------------------------- ---------- ----------------
        SYS   KOTAD SYS_NC_ROW SYS_LOB0000000528C00002$$  LOBSEGMENT	       65,536
              $     INFO$
        
        
        1 row selected.
        

        During the course of running the RSF query the number of rows stays constant, so an existing row is being reused.

        The data cannot be viewed directly from SQL*Plus

        
        16:33:09 ora11203fs.jks.com - sys@js01 SQL> select count(*) from sys.kotad$;
        
          COUNT(*)
        ----------
             16060
        
        
          1* select * from sys.kotad$
        16:36:52 ora11203fs.jks.com - sys@js01 SQL> /
        select * from sys.kotad$
                          *
        ERROR at line 1:
        ORA-30732: table contains no user-visible columns
        
        

        The table is created with type SYS.KOTAD, and there doesn’t seem to be any simple method to view the data.
        Exploring the type attributes results in a number of cryptically named attributes such as KOTADFLG, etc.

        Searching the data dictionary scripts resulted in the following method that can be used to create a copy of the table for exploration without fear of messing up the database.
        Note: Please don’t do this in any database that is not expendable

        
        09:41:59 SYS@js01 AS SYSDBA> alter session set events '22372 trace name context forever'
        09:42:02   2  /
        
        Session altered.
        
        Elapsed: 00:00:00.00
        09:42:02 SYS@js01 AS SYSDBA> create table kotad_temp$ of kotad;
        
        Table created.
        
        Elapsed: 00:00:00.21
        09:42:11 SYS@js01 AS SYSDBA> insert into kotad_temp$(sys_nc_oid$, sys_nc_rowinfo$) select sys_nc_oid$, sys_nc_rowinfo$ from kotad$;
        
        16060 rows created.
        
        Elapsed: 00:00:01.04
        09:42:21 SYS@js01 AS SYSDBA> commit;
        
        Commit complete.
        
        

        This is where I have run into a bit of a dead end.

        As per the oracle docs, temporary lobs such as this can only be manipulated via API or PL/SQL.
        I have worked very little with lobs, and while I could manage creating and playing LOBs in my own code,
        deciphering how to access the LOBs in this table to find out what is happening will consume too much time,
        or at least more than I have available to spend on it now.

        Perhaps someone else reading this will already know how to use SYS.KOTAD and DBMS_LOB to access this data.

        Comment by jkstill — February 25, 2014 @ 6:27 pm GMT Feb 25,2014 | Reply

        • Jared,

          I just tried to repeat your experiment using 11.2.0.4
          My direct path writes reported obj# = 0 or obj# = -1, which I would take to mean “no object” – I think the obj# = 528 might be a red herring, a value left in parameter 3 from a previous wait perhaps.

          Comment by Jonathan Lewis — February 25, 2014 @ 8:17 pm GMT Feb 25,2014

        • Thanks Jonathan – that well could be a red herring.
          A different approach will need to be taken, just need to ponder it a bit more.

          Comment by jkstill — February 25, 2014 @ 8:23 pm GMT Feb 25,2014

        • Hi Jonathan,

          I took another look at the 10046 trace file, and obj#=528 is indeed a red herring.
          The obj# is from a previous wait.

          WAIT #139894016593672: nam='SQL*Net message from client' ela= 345 driver id=1650815232 #bytes=1 p3=0 obj#=528 tim=1393288033248613
          WAIT #139894016593672: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=528 tim=1393288033248657
          WAIT #139894016593672: nam='direct path write temp' ela= 3079 file number=201 first dba=167296 block cnt=31 obj#=528 tim=1393288033278367
          WAIT #139894016593672: nam='direct path write temp' ela= 2196 file number=201 first dba=167327 block cnt=31 obj#=528 tim=1393288033332418
          WAIT #139894016593672: nam='direct path write temp' ela= 3334 file number=201 first dba=167358 block cnt=31 obj#=528 tim=1393288033372929
          

          The waits on ‘direct path write temp’ are just sort block writes to temp.

          The file number is a bit of a mystery, as there is no file# 201.

          Using strace I can see that the TEMP file is opened, and the handle duplicated to 258

          open("/u01/oradata/JS01/datafile/o1_mf_temp_8g69rnkr_.tmp", O_RDWR|O_DSYNC) = 11
          getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
          fcntl(11, F_DUPFD, 256)                 = 258
          close(11)                               = 0
          fcntl(258, F_SETFD, FD_CLOEXEC)         = 0
          

          A quick google search shows that ‘file number=201′ is fairly common for direct path writes, so perhaps this is hardcoded.

          Comment by jkstill — February 27, 2014 @ 4:37 pm GMT Feb 27,2014

        • Jared,

          The 201 file number is the “default” for most systems. File numbers for tempfiles are the “tempfile number: file#” from v$tempfile + parameter db_files, and the default for that parameter is 200 … hence 201.

          Comment by Jonathan Lewis — February 27, 2014 @ 7:36 pm GMT Feb 27,2014

  2. Thanks Jonathan – I thought the 201 must be a default value, but did not know how it was derived.

    Comment by jkstill — February 27, 2014 @ 11:19 pm GMT Feb 27,2014 | Reply

    • Back to the question about RSF creating LOBINDEX segments:

      V$TEMPSEG_USAGE is a synonym for V_$SORT_USAGE, which is in turn a view based on GV_$SORT_USAGE.

      GV_$SORT_USAGE is a view based on V$SESSION and X$KTSSO.

      The space management bits are in the X$KTSSO table (KTS = kernel space management – ‘SO’ may be ‘SOrt’)

      The SEGTYPE column is the one that has identified the RSF query as generating LOB_INDEX segments:

       
      decode(ktssosegt, 1, 'SORT', 2, 'HASH', 3, 'DATA', 4, 'INDEX', 5, 'LOB_DATA', 6, 'LOB_INDEX' , 'UNDEFINED') 
      

      If identifying segments as LOB_INDEX is incorrect, then it would appear to be an error in the Oracle Kernel.

      Here is the data from one RSF query that spilled over into TEMP:

      
      16:36:41 ora11203fs.jks.com - sys@js01 SQL&gt; select * from X$KTSSO;
      
      ADDR                   INDX    INST_ID KTSSOSES           KTSSOSNO KTSSOTSN     KTSSOCNT  KTSSOSEGT   KTSSOFNO   KTSSOBNO  KTSSOEXTS  KTSSOBLKS  KTSSORFNO  KTSSOOBJD  KTSSOOBJN KTSSOTSNUM KTSSOSQLID
      ---------------- ---------- ---------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------
      00007F0947860360          0          1 00000000917B3C00       3047 TEMP                1          6        201     164480         23       2944          1    4358784          0          3 bd6rf4gs1d3uk
      00007F0947860360          1          1 00000000917B3C00       3047 TEMP                1          1        201     126976          1        128          1    4321280          0          3 bd6rf4gs1d3uk
      00007F0947860360          2          1 00000000917B3C00       3047 TEMP                1          6        201     124416          2        256          1    4318720          0          3 bd6rf4gs1d3uk
      
      3 rows selected.
      
      

      As the X$KTSSO table contains the address of the values, one might start poking around in memory to explore this.

      At this point I am not sure it is worth pursuing further.

      Here is the full mapping of the columns in GV$SORT_USAGE for anyone interested:

       
      gv$sort_usage  : x$ktsso
      ==============================
      N/A            : ADDR
      INT_ID         : inst_id
      USERNAME       : v$sessionusername
      USER           : v$sessionusername 
      SESSION_ADDR   : ktssoses
      SESSION_NUM    : ktssosno
      SQLADDR        : v$sessionprev_sql_addr
      SQLHASH        : v$sessionprev_hash_value
      SQL_ID         : v$sessionprev_sql_id
      TABLESPACE     : ktssotsn
      CONTENTS       : decode(ktssocnt, 0, 'PERMANENT', 1, 'TEMPORARY')
      SEGTYPE        : decode(ktssosegt, 1, 'SORT', 2, 'HASH', 3, 'DATA', 4, 'INDEX', 5, 'LOB_DATA', 6, 'LOB_INDEX' , 'UNDEFINED')
      SEGFILE#       : ktssofno
      SEGBLK#        : ktssobno
      EXTENTS        : ktssoexts
      BLOCKS         : ktssoblks
      SEGRFNO#       : ktssorfno
      

      You may have noticed the SQL_ID information is incorrect.

      The following Oracle Support document shows that the correct SQL_ID value can be found in the KTSSOSQLID columns

      Bug 17834663 – Include SQL ID for statement that created a temporary segment in GV$SORT_USAGE (Doc ID 17834663.8)

      Comment by jkstill — March 2, 2014 @ 12:55 am GMT Mar 2,2014 | Reply

  3. […] common table expressions are still quite new to Oracle, hints can be ignored, as discussed here, which of course is of considerable interest to database developers who want to tune […]

    Pingback by How to Multiply Across a Hierarchy in Oracle: Performance (Part 2/2) | Databaseline — October 12, 2014 @ 8:15 am GMT Oct 12,2014 | Reply

  4. […] For more information on Recursive With clause(recursive subquery factoring), please check. […]

    Pingback by how to replace multiple replace functions? | Oracle Insight and Performance concepts — December 7, 2014 @ 6:32 am GMT Dec 7,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,521 other followers